Appl Clin Inform 2020; 11(01): 172-181
DOI: 10.1055/s-0040-1702214
Research Article
Georg Thieme Verlag KG Stuttgart · New York

Detecting Social and Behavioral Determinants of Health with Structured and Free-Text Clinical Data

Daniel J. Feller¹, Oliver J. Bear Don't Walk IV¹, Jason Zucker², Michael T. Yin², Peter Gordon², Noémie Elhadad¹

¹ Department of Biomedical Informatics, Columbia University, New York, New York, United States
² Division of Infectious Diseases, Department of Internal Medicine, Columbia University Irving Medical Center, New York, New York, United States
Funding: This study was funded by the following sources:
National Library of Medicine—T15 LM007079: “Training in Biomedical Informatics at Columbia University.”
National Institute of Allergy and Infectious Diseases—T32AI007531 “Training in Pediatric Infectious Diseases.”
National Institute of General Medical Sciences—R01 GM114355.

Publication History

Received: 14 October 2019
Accepted: 07 January 2020
Published online: 04 March 2020

Abstract

Background: Social and behavioral determinants of health (SBDH) are environmental and behavioral factors that often impede disease management and result in sexually transmitted infections. Despite their importance, SBDH are inconsistently documented in electronic health records (EHRs) and are typically collected only in an unstructured format. Evidence suggests that structured data elements present in EHRs can further contribute to identifying SBDH in the patient record.

Objective: Explore the automated inference of both the presence of SBDH documentation and individual SBDH risk factors in patient records. Compare the relative ability of clinical notes and structured EHR data, such as laboratory measurements and diagnoses, to support inference.

Methods: We attempt to infer the presence of SBDH documentation in patient records, as well as patients' status for 11 SBDH, including alcohol abuse, homelessness, and sexual orientation. We compare classification performance when considering clinical notes only, structured data only, and notes and structured data together. We perform an error analysis across several SBDH risk factors.
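The abstract does not specify the feature pipeline or model. As a hypothetical illustration of the three comparison settings (notes only, structured data only, both), the sketch below builds a per-patient feature dictionary from note text, diagnosis codes, and lab values. The helper names, the regex tokenizer, the `dx=`/`lab=`/`word=` prefixes, and the example patient record (including the lab name `ethanol_level`) are all assumptions for illustration, not the authors' method.

```python
import re
from collections import Counter

def note_features(notes):
    # Hypothetical bag-of-words: lowercase alphabetic tokens counted across all notes.
    tokens = []
    for note in notes:
        tokens.extend(re.findall(r"[a-z]+", note.lower()))
    return {f"word={tok}": float(n) for tok, n in Counter(tokens).items()}

def structured_features(codes, labs):
    # Binary indicator per diagnosis code, raw numeric value per lab measurement.
    feats = {f"dx={code}": 1.0 for code in codes}
    feats.update({f"lab={name}": float(value) for name, value in labs.items()})
    return feats

def build_features(patient, setting):
    # `setting` selects one of the three comparison conditions.
    if setting not in ("notes", "structured", "both"):
        raise ValueError(f"unknown setting: {setting!r}")
    feats = {}
    if setting in ("notes", "both"):
        feats.update(note_features(patient["notes"]))
    if setting in ("structured", "both"):
        feats.update(structured_features(patient["codes"], patient["labs"]))
    return feats

# Hypothetical patient record (F10.20 is an ICD-10 alcohol dependence code).
patient = {
    "notes": ["Pt reports daily alcohol use.", "Alcohol counseling provided."],
    "codes": ["F10.20"],
    "labs": {"ethanol_level": 85.0},
}
for setting in ("notes", "structured", "both"):
    print(setting, len(build_features(patient, setting)))  # notes 7, structured 2, both 9
```

Prefixing each key by its source keeps note words, codes, and labs from colliding, so the combined setting is simply the union of the other two; any vectorizer and classifier applied afterward would be a separate, unspecified step.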

Results: Classification models inferring the presence of SBDH documentation achieved good performance (F1 score: 78.7–92.7; F1 was the primary evaluation metric). Performance was variable for models inferring patient SBDH risk status: results ranged from F1 = 82.7 for LGBT (lesbian, gay, bisexual, and transgender) status to F1 = 28.5 for intravenous drug use. Error analysis demonstrated that lexical diversity and documentation of historical SBDH status challenge inference of patient SBDH status. Three of five classifiers inferring topic-specific SBDH documentation and 10 of 11 patient SBDH status classifiers achieved their highest performance when trained using both clinical notes and structured data.
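F1 is the harmonic mean of precision and recall, reported here on a 0 to 100 scale. A minimal sketch follows, assuming binary labels and F1 computed for the positive class (the abstract does not state the averaging scheme). The toy labels are hypothetical; they show why F1 rather than accuracy is informative for a low-prevalence SBDH: predicting "absent" for everyone yields 70% accuracy but an F1 of 0.

```python
def f1_score(y_true, y_pred, positive=1):
    # Count true positives, false positives, and false negatives for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are 0 (or undefined); F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall, scaled to 0-100.
    return 100 * 2 * precision * recall / (precision + recall)

def accuracy(y_true, y_pred):
    # Percentage of patients whose label was predicted correctly.
    return 100 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical low-prevalence label: 3 positives among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
all_negative = [0] * 10
some_hits = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print(accuracy(y_true, all_negative), f1_score(y_true, all_negative))  # 70.0 0.0
print(round(f1_score(y_true, some_hits), 1))  # 40.0
```

In the second case precision is 1/2 and recall is 1/3, so F1 = 2(1/2)(1/3) / (1/2 + 1/3) = 0.4, or 40.0 on the reported scale.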

Conclusion: Our findings suggest that combining clinical free-text notes and structured data provides the best approach to classifying patient SBDH status. Inferring patient SBDH status is most challenging for SBDH with low prevalence and high lexical diversity.

Protection of Human and Animal Subjects

The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects, and was reviewed by the Institutional Review Board at Columbia University Medical Center.


Supplementary Material