DOI: 10.1055/a-2742-2430
Transformer Language Models for Neurology Research with Electronic Health Records: Current State of the Science
Abstract
This review provides an overview of the emergence and application of transformer-based language models to electronic health records in neurology. Transformer architectures are well suited to neurological data because they can model complex spatiotemporal patterns and capture long-range dependencies, both characteristic of neurological conditions and their documentation. We introduce the foundational principles of transformer models and outline the model training and evaluation frameworks commonly used in clinical text processing. We then examine current applications of transformers in neurology, spanning disease detection and diagnosis, phenotyping and symptom extraction, and outcome and prognosis prediction, and synthesize emerging patterns in model adaptation and evaluation strategies. Additionally, we discuss the limitations of current models, including limited generalizability, model bias, and data privacy concerns, and propose future directions for research and implementation. By synthesizing recent advances, this review aims to guide future efforts to leverage transformer-based language models to improve neurological care and research.
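To make the model training framework referenced above concrete, the sketch below shows one common adaptation pattern: fine-tuning a publicly available clinical BERT-style encoder for a binary neurology phenotype label (for example, whether a note mentions seizures). This example is illustrative only and is not drawn from the review itself; the model identifier, the toy note snippets, the label scheme, and all hyperparameters are assumptions chosen for demonstration.

```python
# Minimal sketch (illustrative, not the review's method): fine-tuning a
# clinical BERT-style encoder on de-identified note text for a binary
# neurology phenotype label. Model name, data, and labels are assumptions.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # publicly available clinical BERT

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy, de-identified snippets standing in for EHR notes (hypothetical examples).
notes = ["Patient reports two focal seizures since the last clinic visit.",
         "No new neurological symptoms; gait and cognition remain stable."]
labels = [1, 0]  # 1 = seizure mentioned, 0 = not mentioned

def tokenize(batch):
    # Convert raw note text to fixed-length token ID sequences.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

ds = Dataset.from_dict({"text": notes, "label": labels}).map(tokenize, batched=True)

args = TrainingArguments(output_dir="clinical-bert-demo",
                         num_train_epochs=1,
                         per_device_train_batch_size=2,
                         logging_steps=1)

# Standard supervised fine-tuning loop over the labeled notes.
Trainer(model=model, args=args, train_dataset=ds).train()
```

In practice, such a classifier would be trained on a much larger institution-specific corpus and evaluated against expert chart review or other annotated reference labels, consistent with the evaluation frameworks the review describes.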
Keywords
natural language processing - electronic health records - large language models - text mining
Publication History
Received: 08 September 2025
Accepted: 10 November 2025
Accepted Manuscript online: 11 November 2025
Article published online: 28 November 2025
© 2025. Thieme. All rights reserved.
Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA