Appl Clin Inform 2021; 12(04): 816-825
DOI: 10.1055/s-0041-1733846
Research Article

A Framework for Systematic Assessment of Clinical Trial Population Representativeness Using Electronic Health Records Data

Yingcheng Sun (1), Alex Butler (1, 2), Ibrahim Diallo (1), Jae Hyun Kim (1), Casey Ta (1), James R. Rogers (1), Hao Liu (1), Chunhua Weng (1)

1  Department of Biomedical Informatics, Columbia University, New York, New York, United States
2  Department of Medicine, Columbia University, New York, New York, United States
Funding This work was supported by the National Library of Medicine grant R01LM009886–11 (Bridging the Semantic Gap Between Research Eligibility Criteria and Clinical Data) and National Center for Advancing Clinical and Translational Science grants UL1TR001873 and 3U24TR001579–05.

Abstract

Background Clinical trials are the gold standard for generating robust medical evidence, but their results often raise generalizability concerns, which can be attributed to a lack of population representativeness. Electronic health record (EHR) data are useful for estimating the population representativeness of a clinical trial's study population.

Objectives This research aims to estimate the population representativeness of clinical trials systematically using EHR data during the early design stage.

Methods We present an end-to-end analytical framework for transforming free-text clinical trial eligibility criteria into executable database queries conformant with the Observational Medical Outcomes Partnership Common Data Model and for systematically quantifying the population representativeness for each clinical trial.
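As a rough illustration of the kind of pipeline described above, the following Python sketch turns an already-parsed set of eligibility criteria into a SQL query and computes one possible representativeness score. The abstract does not specify the query format or the metric, so everything here is an assumption: the score is taken to be the fraction of patients with the target condition who remain eligible, the `CRITERIA` dictionary and function names are hypothetical, the concept IDs are placeholders, and the `person` and `condition_occurrence` tables are reduced to the few OMOP-style columns the example needs.

```python
import sqlite3

# Hypothetical structured output of a criteria parser; all values are placeholders.
TARGET_CONDITION = 1001            # placeholder concept ID for the target condition
REFERENCE_YEAR = 2021              # year used to derive age from year_of_birth
CRITERIA = {
    "min_age": 18,
    "max_age": 65,
    "excluded_conditions": [2002],  # placeholder concept ID; assumed non-empty
}


def build_eligibility_query(criteria):
    """Return (sql, params) selecting target-condition patients who meet the criteria."""
    marks = ",".join("?" for _ in criteria["excluded_conditions"])
    sql = f"""
        SELECT DISTINCT p.person_id
        FROM person p
        JOIN condition_occurrence c ON c.person_id = p.person_id
        WHERE c.condition_concept_id = ?
          AND (? - p.year_of_birth) BETWEEN ? AND ?
          AND p.person_id NOT IN (
              SELECT person_id FROM condition_occurrence
              WHERE condition_concept_id IN ({marks})
          )
    """
    params = [TARGET_CONDITION, REFERENCE_YEAR,
              criteria["min_age"], criteria["max_age"],
              *criteria["excluded_conditions"]]
    return sql, params


def eligible_fraction(conn, criteria):
    """Assumed metric: eligible patients / all patients with the target condition."""
    target = conn.execute(
        "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (TARGET_CONDITION,),
    ).fetchone()[0]
    sql, params = build_eligibility_query(criteria)
    eligible = len(conn.execute(sql, params).fetchall())
    return eligible / target if target else 0.0


def make_demo_db():
    """Build a tiny in-memory database with synthetic, made-up patients."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
        CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
    """)
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, 1980), (2, 1940), (3, 1990), (4, 1975)])
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?)",
                     [(1, 1001), (2, 1001), (3, 1001), (3, 2002), (4, 1001)])
    return conn


if __name__ == "__main__":
    print(eligible_fraction(make_demo_db(), CRITERIA))
```

In the synthetic data, four patients have the target condition; one is outside the age range and one has the excluded condition, so the assumed score is 0.5. A real deployment would need concept hierarchies, temporal constraints, and measurement-based criteria, none of which this sketch attempts.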

Results Using this framework, we calculated the population representativeness of 782 novel coronavirus disease 2019 (COVID-19) trials and 3,827 type 2 diabetes mellitus (T2DM) trials in the United States. Because of overly restrictive eligibility criteria, 85.7% of the COVID-19 trials and 30.1% of the T2DM trials had poor population representativeness.

Conclusion This research demonstrates the potential of using EHR data to assess the population representativeness of clinical trials, providing data-driven metrics to inform the selection and optimization of eligibility criteria.

Protection of Human and Animal Subjects

No human or animal subjects were involved in the project.




Publication History

Received: 18 April 2021

Accepted: 23 June 2021

Publication Date: 08 September 2021 (online)

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany