Appl Clin Inform 2014; 05(02): 463-479
DOI: 10.4338/ACI-2013-12-RA-0105
Research Article
Schattauer GmbH

A Distribution-based Method for Assessing The Differences between Clinical Trial Target Populations and Patient Populations in Electronic Health Records

C. Weng*
1  Department of Biomedical Informatics, Columbia University, New York, NY 10032
Y. Li*
2  Department of Computer Science, City College of New York, New York, NY 10031
P. Ryan
3  Janssen Research and Development, Titusville, New Jersey, 08560
4  Observational Health Data Sciences and Informatics, New York, NY, 10032
Y. Zhang
5  Department of Biostatistics, Columbia University, New York, NY 10032
F. Liu
1  Department of Biomedical Informatics, Columbia University, New York, NY 10032
J. Gao
6  Business School, Columbia University, New York, NY 10025
J.T. Bigger
7  Department of Medicine, Columbia University, New York, NY 10032
G. Hripcsak
1  Department of Biomedical Informatics, Columbia University, New York, NY 10032

Publication History

Received: 18 December 2013

Accepted: 09 April 2014

Publication Date: 21 December 2017 (online)



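Objective: To improve the transparency of clinical trial generalizability by means of a distribution-based method for comparing trial target populations with real-world patient populations, and to illustrate the method using Type 2 diabetes as an example.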

Methods: Our data included 1,761 diabetes clinical trials and the electronic health records (EHR) of 26,120 patients with Type 2 diabetes who visited Columbia University Medical Center of NewYork-Presbyterian Hospital. The two populations were compared using the Generalizability Index for Study Traits (GIST) with respect to age at earliest diagnosis and mean hemoglobin A1c (HbA1c) value.

Results: More than 70% of Type 2 diabetes studies allow patients with HbA1c measures between 7 and 10.5, but fewer than 40% of studies allow HbA1c<7 and fewer than 45% of studies allow HbA1c>10.5. In the real-world population, only 38% of patients had HbA1c between 7 and 10.5, with 12% having values above the range and 52% having HbA1c<7. The GIST for HbA1c was 0.51. Most studies adopted broad age ranges, with the most common restrictions excluding patients >80 or <18 years. Most of the real-world population fell within this range, but 2% of patients were <18 at the time of first diagnosis and 8% were >80. The GIST for age was 0.75.

Conclusions: We contribute a scalable method to profile and compare aggregated clinical trial target populations with EHR patient populations. We demonstrate that Type 2 diabetes studies are more generalizable with regard to age than with regard to HbA1c. We found that the generalizability of age increased from Phase 1 to Phase 3, while the generalizability of HbA1c decreased over those same phases. This method can generalize to other medical conditions and to other continuous or binary variables. We envision the potential use of EHR data for examining the generalizability of clinical trials and for defining population-representative clinical trial eligibility criteria.
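The abstract reports GIST values but does not state the formula. As a rough illustration only, the sketch below assumes a simplified form of the index: for each value interval, multiply the fraction of studies that admit the interval by the fraction of patients observed in it, then sum. The function name `gist_sketch`, the three HbA1c intervals, and the normalization step are all assumptions made for this example. The study fractions are the coarse bounds quoted in the Results ("fewer than 40%", "more than 70%", "fewer than 45%") used as point values, so the result is not expected to reproduce the reported 0.51.

```python
from typing import Sequence


def gist_sketch(study_fraction: Sequence[float],
                patient_fraction: Sequence[float]) -> float:
    """Assumed simplified GIST (not taken from the abstract).

    Sums, over value intervals, the fraction of studies admitting the
    interval weighted by the fraction of patients falling in it.
    """
    if len(study_fraction) != len(patient_fraction):
        raise ValueError("both inputs need one entry per interval")
    total = sum(patient_fraction)
    if total <= 0:
        raise ValueError("patient fractions must sum to a positive value")
    # Rounded percentages may not sum to exactly 1, so the patient
    # fractions are rescaled into a proper distribution (an assumption).
    return sum(s * p / total for s, p in zip(study_fraction, patient_fraction))


# Intervals: HbA1c < 7, 7 to 10.5, > 10.5.
# Study fractions are bounds from the Results used as point values (hypothetical).
studies = [0.40, 0.70, 0.45]
# Patient fractions as reported in the Results.
patients = [0.52, 0.38, 0.12]

score = gist_sketch(studies, patients)
print(f"illustrative HbA1c score: {score:.2f}")
```

Under this assumed form, the score is 1 when every study admits every interval that contains patients, and it falls as more patients sit in intervals that few studies admit. That matches the direction of the reported findings: broad age ranges give a higher index than the narrower HbA1c ranges.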

Citation: Weng C, Li Y, Ryan P, Zhang Y, Liu F, Gao J, Bigger JT, Hripcsak G. A distribution-based method for assessing the differences between clinical trial target populations and patient populations in electronic health records. Appl Clin Inf 2014; 5: 463–479

* equal-contribution first authors