A Distribution-based Method for Assessing the Differences between Clinical Trial Target Populations and Patient Populations in Electronic Health Records
Received: 18 December 2013
Accepted: 09 April 2014
Published online: 21 December 2017
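Objective: To improve the transparency of clinical trial generalizability with a distribution-based method for comparing clinical trial target populations with patient populations in electronic health records, and to illustrate the method using Type 2 diabetes as an example.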
Methods: Our data included 1,761 diabetes clinical trials and the electronic health records (EHR) of 26,120 patients with Type 2 diabetes who visited Columbia University Medical Center of NewYork-Presbyterian Hospital. The two populations were compared using the Generalizability Index for Study Traits (GIST) with respect to age at earliest diagnosis and mean hemoglobin A1c (HbA1c) value.
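The two patient-level traits named above can be sketched as follows, assuming each patient's EHR data have already been reduced to (patient ID, age at a Type 2 diabetes diagnosis) pairs and (patient ID, HbA1c result) pairs. The function name `derive_traits` and this input layout are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean


def derive_traits(diagnoses, hba1c_results):
    """Hypothetical sketch: per-patient earliest diagnosis age and mean HbA1c.

    diagnoses: iterable of (patient_id, age_at_type2_diabetes_diagnosis)
    hba1c_results: iterable of (patient_id, hba1c_value)
    """
    # Earliest diagnosis age = smallest age among a patient's diagnosis records.
    earliest_age = {}
    for pid, age in diagnoses:
        if pid not in earliest_age or age < earliest_age[pid]:
            earliest_age[pid] = age

    # Mean HbA1c = arithmetic mean of all of a patient's HbA1c results.
    values = defaultdict(list)
    for pid, value in hba1c_results:
        values[pid].append(value)
    mean_hba1c = {pid: mean(vs) for pid, vs in values.items()}
    return earliest_age, mean_hba1c


ages, hba1c = derive_traits(
    diagnoses=[("p1", 52), ("p1", 49), ("p2", 67)],
    hba1c_results=[("p1", 7.0), ("p1", 8.0), ("p2", 6.2)],
)
print(ages)   # {'p1': 49, 'p2': 67}
print(hba1c)  # {'p1': 7.5, 'p2': 6.2}
```

Taking the minimum age, rather than the first record in file order, keeps the result independent of how the extract happens to be sorted.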
Results: More than 70% of Type 2 diabetes studies allow patients with HbA1c values between 7 and 10.5, but fewer than 40% of studies allow HbA1c<7 and fewer than 45% allow HbA1c>10.5. In the real-world population, only 38% of patients had HbA1c between 7 and 10.5, with 12% having values above this range and 52% having HbA1c<7. The GIST for HbA1c was 0.51. Most studies adopted broad age ranges, with the most common restrictions excluding patients >80 or <18 years. Most of the real-world population fell within this range, but 2% of patients were <18 at the time of first diagnosis and 8% were >80. The GIST for age was 0.75.
Conclusions: We contribute a scalable method to profile and compare aggregated clinical trial target populations with EHR patient populations. We demonstrate that Type 2 diabetes studies are more generalizable with regard to age than with regard to HbA1c. We also found that the generalizability of age increased from Phase 1 to Phase 3, while the generalizability of HbA1c decreased across the same phases. This method can be extended to other medical conditions and to other continuous or binary variables. We envision the potential use of EHR data for examining the generalizability of clinical trials and for defining population-representative clinical trial eligibility criteria.
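The abstract reports GIST values but does not state the formula. Below is a minimal sketch of one distribution-based reading, assuming GIST sums, over value intervals of a trait, the fraction of patients falling in each interval multiplied by the fraction of trials whose eligibility range covers that interval. Under that assumption the index lies between 0 (no overlap) and 1 (every trial admits every patient). The function `gist`, the bin edges, and the toy data are hypothetical.

```python
import bisect
import math


def gist(patient_values, trial_ranges, edges):
    """Hypothetical GIST sketch for one continuous trait.

    patient_values: one trait value per patient (e.g. mean HbA1c)
    trial_ranges: one (low, high) eligibility range per trial; use
                  -math.inf / math.inf for open-ended bounds
    edges: sorted bin boundaries; bin i is [edges[i], edges[i+1])
    """
    n_bins = len(edges) - 1

    # Share of patients whose value falls in each bin.
    counts = [0] * n_bins
    for v in patient_values:
        i = bisect.bisect_right(edges, v) - 1
        if 0 <= i < n_bins:
            counts[i] += 1
    patient_frac = [c / len(patient_values) for c in counts]

    # Share of trials whose eligibility range fully covers each bin.
    trial_frac = []
    for i in range(n_bins):
        covering = sum(1 for low, high in trial_ranges
                       if low <= edges[i] and edges[i + 1] <= high)
        trial_frac.append(covering / len(trial_ranges))

    # Weighted overlap of the patient and trial distributions.
    return sum(p * t for p, t in zip(patient_frac, trial_frac))


# Toy HbA1c example (made-up numbers) with bins <7, 7 to <10.5, >=10.5.
hba1c_edges = [-math.inf, 7, 10.5, math.inf]
patients = [6.0, 6.5, 8.0, 9.0, 11.0]
trials = [(7, 10.5), (7, math.inf), (-math.inf, math.inf), (7, 10.5)]
print(round(gist(patients, trials, hba1c_edges), 3))  # 0.6
```

With the toy data the sketch returns 0.6: 40% of patients sit in the lowest bin, admitted by 25% of trials; 40% sit in the middle bin, admitted by all trials; and 20% sit in the top bin, admitted by half. Coverage is checked at bin granularity, so a trial whose bounds fall inside a bin (say 6.5 to 10) counts as not covering it; finer bins reduce that approximation.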
Citation: Weng C, Li Y, Ryan P, Zhang Y, Liu F, Gao J, Bigger JT, Hripcsak G. A distribution-based method for assessing the differences between clinical trial target populations and patient populations in electronic health records. Appl Clin Inf 2014; 5: 463–479. http://dx.doi.org/10.4338/ACI-2013-12-RA-0105
Keywords: Clinical trials; selection bias; comparative effectiveness research; electronic health records; clinical research informatics; meta-analysis
* equal-contribution first authors