A Framework for Systematic Assessment of Clinical Trial Population Representativeness Using Electronic Health Records Data

Yingcheng Sun; Alex Butler; Ibrahim Diallo; Jae Hyun Kim; Casey Ta; James R. Rogers; Hao Liu; Chunhua Weng

doi:10.1055/s-0041-1733846


Appl Clin Inform 2021; 12(04): 816-825
DOI: 10.1055/s-0041-1733846

Research Article


Yingcheng Sun¹, Alex Butler¹,², Ibrahim Diallo¹, Jae Hyun Kim¹, Casey Ta¹, James R. Rogers¹, Hao Liu¹, Chunhua Weng¹

¹Department of Biomedical Informatics, Columbia University, New York, New York, United States

²Department of Medicine, Columbia University, New York, New York, United States

Funding This work was supported by the National Library of Medicine grant R01LM009886–11 (Bridging the Semantic Gap Between Research Eligibility Criteria and Clinical Data) and National Center for Advancing Clinical and Translational Science grants UL1TR001873 and 3U24TR001579–05.


Abstract

Background Clinical trials are the gold standard for generating robust medical evidence, but their results often raise generalizability concerns, which can be attributed to a lack of population representativeness. Electronic health record (EHR) data are useful for estimating the population representativeness of clinical trial study populations.

Objectives This research aims to estimate the population representativeness of clinical trials systematically using EHR data during the early design stage.

Methods We present an end-to-end analytical framework for transforming free-text clinical trial eligibility criteria into executable database queries conformant with the Observational Medical Outcomes Partnership Common Data Model and for systematically quantifying the population representativeness for each clinical trial.

Results Using this framework, we calculated the population representativeness of 782 novel coronavirus disease 2019 (COVID-19) trials and 3,827 type 2 diabetes mellitus (T2DM) trials in the United States. Owing to the use of overly restrictive eligibility criteria, 85.7% of the COVID-19 trials and 30.1% of the T2DM trials had poor population representativeness.

Conclusion This research demonstrates the potential of using EHR data to assess the population representativeness of clinical trials, providing data-driven metrics to inform the selection and optimization of eligibility criteria.

Keywords

clinical trials - eligibility criteria - generalizability assessment - population representativeness - information extraction - natural language processing

Protection of Human and Animal Subjects

No human or animal subjects were involved in the project.

Publication History

Received: 18 April 2021

Accepted: 23 June 2021

Article published online: 08 September 2021

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

