CC BY-NC-ND 4.0 · Yearb Med Inform 2021; 30(01): 257-263
DOI: 10.1055/s-0041-1726528
Section 10: Natural Language Processing
Synopsis

Year 2020 (with COVID): Observation of Scientific Literature on Clinical Natural Language Processing

Natalia Grabar (1, 2), Cyril Grouin (1)
Section Editors of the IMIA Yearbook Section on Clinical Natural Language Processing

1   Université Paris Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay, France
2   STL, CNRS, Université de Lille, Domaine du Pont-de-bois, Villeneuve-d’Ascq cedex, France

Summary

Objectives: To analyze the content of publications within the medical NLP domain in 2020.

Methods: Automatic and manual preselection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues.

Results: Three best papers were selected for 2020. We also analyze the content of the NLP publications from 2020 across all topics.

Conclusion: The two main issues addressed in 2020 are the investigation of COVID-related questions and the further adaptation and use of transformer models. In addition, trends from past years continue, such as the diversification of the languages processed and the use of information from social networks.





Publication History

Article published online:
03 September 2021

© 2021. IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 