Open Access
CC BY 4.0 · Appl Clin Inform 2025; 16(05): 1475-1485
DOI: 10.1055/a-2703-7227
Research Article

A Natural Language Processing Approach to Identify Negative Patient Descriptors in Electronic Health Records for Maternal Care

Authors

  • Azade Tabaie

    1   Center for Biostatistics, Informatics, and Data Science, MedStar Health Research Institute, Columbia, Maryland, United States
    2   Department of Emergency Medicine, Georgetown University Medical Center, Washington, District of Columbia, United States
  • Angela D. Thomas

    3   Healthcare Delivery Research, MedStar Health Research Institute, Washington, District of Columbia, United States
    4   Georgetown University School of Health, Washington, District of Columbia, United States
  • Emily K. Mutondo

    3   Healthcare Delivery Research, MedStar Health Research Institute, Washington, District of Columbia, United States
  • Allan Fong

    1   Center for Biostatistics, Informatics, and Data Science, MedStar Health Research Institute, Columbia, Maryland, United States

Abstract

Background

Maternal harm, especially for Black women, is a significant health care issue. Unstructured clinical notes in electronic health records (EHRs) may reveal unsafe maternal care. Prior studies using natural language processing (NLP) have shown that tone and sentiment in notes contribute to preventable safety events.

Objective

This study aimed to examine whether negative patient descriptors in EHR clinical notes are associated with adverse maternal outcomes and how their use varies by patient demographics.

Methods

We conducted a retrospective cohort study of women who delivered at two large birthing hospitals in Washington, DC, between January 1, 2016, and March 31, 2020. Using a predefined list of negative keywords (e.g., combative) and NLP, we identified sentences from clinical notes for manual review. Two subject matter experts labeled keywords as “negative descriptors” if they negatively described patients. A logistic regression model with elastic net regularization was trained on the labeled sentences to classify the remaining corpus. We evaluated the prevalence of negative descriptors by race, age, insurance type, and pregnancy outcomes, and calculated adjusted odds ratios.
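The keyword-based identification of candidate sentences could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the sentence splitter is a naive regular expression, the keyword list contains only the one example named in the abstract, the note text is invented, and the names `NEGATIVE_KEYWORDS`, `split_sentences`, and `find_candidate_sentences` are hypothetical.

```python
import re

# Hypothetical keyword list; the abstract names only "combative" as an example.
NEGATIVE_KEYWORDS = ["combative"]


def split_sentences(note):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", note.strip())
    return [p for p in parts if p]


def find_candidate_sentences(note, keywords=NEGATIVE_KEYWORDS):
    """Return (keyword, sentence) pairs for sentences containing a listed keyword.

    Matching is case-insensitive and restricted to whole words. Each sentence
    is reported once, with the first keyword found in it.
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    hits = []
    for sentence in split_sentences(note):
        match = pattern.search(sentence)
        if match:
            hits.append((match.group(1).lower(), sentence))
    return hits


# Invented note text, for illustration only.
note = "Patient arrived at triage. Patient was combative with staff. Vitals stable."
candidates = find_candidate_sentences(note)
for keyword, sentence in candidates:
    print(keyword, "->", sentence)
```

In a workflow like the one described, sentences flagged this way would go to manual review, since a keyword match alone does not establish that the word negatively describes the patient.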

Results

Among 190,026 clinical notes from 9,302 patients, 719 notes associated with 444 patients contained at least one negative descriptor. Of these 444 patients, 313 (70.5%) were Black, 45 (10.1%) were White, and 86 (19.4%) were from Other racial groups (p < 0.001). Negative descriptors were more common among younger patients (18–29 years: 49.3%) and those with Medicare/Medicaid insurance (65.3%). Although case patients (those with postpartum readmission or severe maternal morbidity) had slightly fewer descriptors overall, they had higher adjusted odds of having them. Black race was associated with higher odds, and commercial insurance with lower odds, of having negative descriptors.
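As a quick arithmetic check, the racial distribution reported above can be recomputed from the stated counts. The sketch uses only numbers given in the abstract; the variable names are illustrative.

```python
# Counts of patients with at least one negative descriptor, as reported above.
counts_by_race = {"Black": 313, "White": 45, "Other": 86}

# The three groups should account for all patients with a negative descriptor.
total_with_descriptor = sum(counts_by_race.values())

# Share of each group among those patients, as a percentage to one decimal.
shares = {
    race: round(100 * n / total_with_descriptor, 1)
    for race, n in counts_by_race.items()
}

print(total_with_descriptor)
print(shares)
```

The counts sum to the 444 patients stated in the text, and the rounded shares match the reported percentages.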

Conclusion

Negative descriptors appear disproportionately in the notes of Black patients and those with public insurance, suggesting implicit bias in documentation. Addressing biased language is essential for improving equity in maternal care.

Protection of Human and Animal Subjects

The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects and was reviewed by our Institutional Review Board.


Data Availability

The datasets generated and/or analyzed during this study are not publicly available because of the sensitive nature of the electronic health records data and the identifiable information embedded in clinical notes, but they are available from the corresponding author on reasonable request with a proper data use agreement.




Publication History

Received: 30 April 2025

Accepted: 17 September 2025

Article published online: 28 October 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany