DOI: 10.1055/a-2616-9858
Extracting Social Determinants of Health from Dental Clinical Notes
Funding
This project was supported by the Research Training Program Scholarship provided by the Australian Government and the Postgraduate Top-up Scholarship provided by the Commonwealth Scientific and Industrial Research Organisation.

Abstract
Objectives
In dentistry, social determinants of health (SDoH) are potentially recorded in the clinical notes of electronic dental records. The objective of this study was to examine the availability of SDoH data in dental clinical notes and evaluate natural language processing methods to extract SDoH from dental clinical notes.
Methods
A set of 1,000 dental clinical notes was sampled from a dataset of 105,311 patient visits to a dental clinic and manually annotated for information pertaining to sugar, tobacco, alcohol, methamphetamine, housing, and employment. Annotations included temporality, dose, type, duration, and frequency where appropriate. Experiments compared extraction using fine-tuned pretrained language models (PLMs) with extraction using a rule-based approach. Performance was measured by F1-score.
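As a rough illustration of the rule-based side of this comparison and of how an F1-score is computed, the sketch below flags notes with keyword patterns and scores one category against gold annotations. The patterns, the example notes, and the note-level scoring are all assumptions made for illustration; the study's actual rules and scoring granularity are not described in this abstract.

```python
import re

# Hypothetical keyword patterns for three categories (not the study's rules).
PATTERNS = {
    "tobacco": re.compile(r"\b(smok\w*|cigarettes?|tobacco)\b", re.IGNORECASE),
    "alcohol": re.compile(r"\b(alcohol|etoh|beers?|wine)\b", re.IGNORECASE),
    "sugar": re.compile(r"\b(sugar\w*|soft drinks?|lollies|juice)\b", re.IGNORECASE),
}


def extract_labels(note):
    """Return the set of categories whose pattern matches the note text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(note)}


def f1_score(gold, predicted):
    """F1 for one category, given sets of note ids (note-level scoring is an assumption)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up notes; note 5 uses slang that the hypothetical rules miss.
    notes = {
        1: "Pt smokes 10 cigarettes/day.",
        2: "Exam and clean. No concerns.",
        3: "Drinks soft drink daily, advised to reduce sugar.",
        4: "Ex-smoker, occasional wine.",
        5: "Quit the durries last year.",
    }
    gold_tobacco = {1, 4, 5}
    predicted_tobacco = {
        note_id for note_id, text in notes.items()
        if "tobacco" in extract_labels(text)
    }
    print(round(f1_score(gold_tobacco, predicted_tobacco), 2))  # prints 0.8
```

In this toy example the rules find two of the three tobacco notes with no false positives, so precision is 1.0, recall is 2/3, and F1 is 0.8. Brief or informal wording like that in note 5 is the kind of inconsistency that can lower recall for fixed rules.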
Results
For identifying SDoH, the best-performing PLM method produced F1-scores of 0.75 (sugar), 0.69 (tobacco), 0.67 (alcohol), 0.42 (housing), and 0 (employment). The rule-based method produced F1-scores of 0.70 (sugar), 0.69 (tobacco), 0.53 (alcohol), 0.44 (housing), and 0 (employment). The overall difference in F1-score between the PLM and rule-based methods was 0.04 (95% confidence interval −0.01, 0.09). SDoH were relatively rare in dental clinical notes: sugar (9.1%), tobacco (3.9%), alcohol (1.2%), housing (1.2%), employment (0.2%), and methamphetamine use (0%).
Conclusion
The main challenges in extracting SDoH information from dental clinical notes were the low frequency with which SDoH are recorded and the brevity and inconsistency of the entries when they are recorded. Improved surveillance likely needs new ways to standardize how SDoH are reported in dental clinical notes.
Keywords
natural language processing - information extraction - electronic dental records - social determinants of health - dentistry
Protection of Human and Animal Subjects
Ethical approval and waiver of consent for this study were granted by the Nepean Blue Mountains Local Health District Human Research Ethics Committee to The University of Sydney on May 24, 2022 (2022/ETH00578). This study was conducted in accordance with the Australian National Statement on Ethical Conduct in Human Research (2007).
Data Availability
The data underlying this article cannot be shared publicly due to reidentification risks per ethics approval requirements.
Publication History
Received: 23 January 2025
Accepted: 20 May 2025
Accepted Manuscript online: 21 May 2025
Article published online: 03 October 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany