Open Access
CC BY 4.0 · Appl Clin Inform 2025; 16(02): 314-326
DOI: 10.1055/a-2524-5216
Research Article

Application of an Externally Developed Algorithm to Identify Research Cases and Controls from EHR Data: Trials and Triumphs

Nelly Estefanie Garduno-Rapp*
1   Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States
Simone Herzberg*
2   Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States
3   Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, Tennessee, United States
Henry H. Ong
4   Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States
Cindy Kao
1   Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States
Christoph U. Lehmann
1   Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States
Srushti Gangireddy
4   Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States
Nitin B. Jain
5   Department of Physical Medicine and Rehabilitation, University of Michigan, Ann Arbor, Michigan, United States
Ayush Giri
2   Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States
6   Division of Quantitative and Clinical Sciences, Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Funding: This work received funding from the U.S. Department of Health and Human Services, National Institutes of Health, National Center for Advancing Translational Sciences (grant no.: UL1TR003163), and from the National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases (grant no.: R01AR074989).

Abstract

Background

The use of electronic health records (EHRs) in research demands robust and interoperable systems. By linking biorepositories to EHR algorithms, researchers can efficiently identify cases and controls for large observational studies (e.g., genome-wide association studies). This is critical for ensuring efficient and cost-effective research. However, the lack of standardized metadata and algorithms across different EHRs complicates their sharing and application. Our study presents an example of a successful implementation and validation process.

Objectives

This study aimed to implement and validate a rule-based algorithm from a tertiary medical center in Tennessee to classify cases and controls from a research study on rotator cuff tear (RCT) nested within a tertiary medical center in North Texas and to assess the algorithm's performance.

Methods

We applied a phenotypic algorithm (designed and validated in a tertiary medical center in Tennessee) to EHR data from 492 patients enrolled in a case-control study and recruited from a tertiary medical center in North Texas. The algorithm used International Classification of Diseases (ICD) and Current Procedural Terminology (CPT) codes to assign case or control status for degenerative RCT. A manual review was conducted to compare the algorithm's classification with a previously recorded gold standard documented by clinical researchers.
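To make the idea of a code-based, rule-based classifier concrete, the following is a minimal sketch in Python. It is not the study's algorithm: the abstract does not list the data dictionary or the exact rules, so the code lists (`CASE_ICD_PREFIXES`, `CASE_CPT_CODES`), the `PatientRecord` type, and the "any diagnosis or any procedure" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Illustrative code lists only (assumption): the study's actual data
# dictionary is not reproduced in the abstract.
CASE_ICD_PREFIXES = ("M75.1",)  # ICD-10-CM family: rotator cuff tear, not specified as traumatic
CASE_CPT_CODES = {"29827"}      # CPT: arthroscopic rotator cuff repair


@dataclass
class PatientRecord:
    """Hypothetical container for one patient's billing codes."""
    patient_id: str
    icd_codes: set = field(default_factory=set)
    cpt_codes: set = field(default_factory=set)


def classify(record: PatientRecord) -> str:
    """Return 'case' if any qualifying diagnosis or procedure code is present.

    The OR rule is an assumption; a real phenotype algorithm would usually
    add exclusion criteria and minimum code counts.
    """
    has_diagnosis = any(
        code.startswith(CASE_ICD_PREFIXES) for code in record.icd_codes
    )
    has_procedure = bool(record.cpt_codes & CASE_CPT_CODES)
    return "case" if (has_diagnosis or has_procedure) else "control"


if __name__ == "__main__":
    patient = PatientRecord("p1", icd_codes={"M75.120"}, cpt_codes=set())
    print(patient.patient_id, classify(patient))
```

Keeping the code lists as module-level constants mirrors the role of a data dictionary: adapting the sketch to another site's coding practices would mean editing the lists, not the rule.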

Results

Initially, the algorithm correctly classified 398 of the 492 patients (80.9%) as cases or controls. After fine-tuning the algorithm and correcting errors in our gold standard dataset, we calculated a sensitivity of 0.94 and a specificity of 0.76. Implementing the algorithm was challenging because coding practices varied between medical centers. To improve performance, we refined the algorithm's data dictionary by incorporating additional codes. The process highlighted the need for meticulous code verification and standardization in multi-center studies.
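The performance measures reported above follow from comparing the algorithm's labels with the gold standard. The sketch below shows one way to compute them in Python. The function names and the toy label lists are hypothetical; the abstract does not report the underlying confusion-matrix counts, so only the initial agreement figure (398 of 492) is taken from the text.

```python
def confusion_counts(gold, predicted, positive="case"):
    """Count true/false positives and negatives against the gold standard."""
    tp = fp = tn = fn = 0
    for g, p in zip(gold, predicted):
        if g == positive and p == positive:
            tp += 1
        elif g == positive:
            fn += 1  # gold-standard case labelled as control
        elif p == positive:
            fp += 1  # gold-standard control labelled as case
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of gold-standard cases the algorithm labels as cases."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of gold-standard controls the algorithm labels as controls."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Toy labels (made up for illustration, not study data).
    gold = ["case", "case", "case", "control", "control"]
    pred = ["case", "case", "control", "control", "case"]
    tp, fp, tn, fn = confusion_counts(gold, pred)
    print("sensitivity:", sensitivity(tp, fn))
    print("specificity:", specificity(tn, fp))

    # Initial agreement reported in the abstract: 398 of 492 patients.
    print("initial agreement (%):", round(398 / 492 * 100, 1))
```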

Conclusion

Sharing case-control algorithms boosts EHR research. Our rule-based algorithm improved multi-site patient identification and revealed 12 data entry errors, helping validate our results.

Protection of Human and Animal Subjects

Our study was approved by the Institutional Review Board (STU-2020-0689). Only patients who provided informed consent at the University of Texas Southwestern Medical Center (UTSW) were included in the data query. To ensure confidentiality, all patient information was de-identified and securely managed.


* These authors contributed equally.

Publication History

Received: 06 October 2024

Accepted: 15 January 2025

Accepted Manuscript online: 24 January 2025

Article published online: 26 March 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany