Developing an Algorithm to Detect Early Childhood Obesity in Two Tertiary Pediatric Medical Centers
Funding
All phases of this study were supported by the United States National Institutes of Health (U1 1U01HG006828–01) as part of the Electronic Medical Record and Genomics project (eMERGE), NIH-NIDDK grants T32DK007699 and K12DK094721, and the Nutrition and Obesity Research Center at Harvard (P30-DK040561), as well as institutional funding from CCHMC, BCH, Vanderbilt University, Children’s Hospital of Philadelphia and Geisinger Health System.
received: 03 March 2016
accepted: 15 June 2016
19 December 2017 (online)
The objective of this study is to develop an algorithm to accurately identify children with severe early onset childhood obesity (ages 1–5.99 years) using structured and unstructured data from the electronic health record (EHR).
Childhood obesity increases risk factors for cardiovascular morbidity and vascular disease. Accurate definition of a high-precision phenotype through a standardized tool is critical to the success of large-scale genomic studies and to validating rare monogenic variants that cause severe early onset obesity.
Data and Methods
Rule-based and machine-learning-based algorithms were developed using structured and unstructured data from two EHR databases, at Boston Children’s Hospital (BCH) and Cincinnati Children’s Hospital and Medical Center (CCHMC). Exclusion criteria, including medications and comorbid diagnoses, were defined. Machine learning algorithms were developed using cross-site training and testing, and natural language processing features were also evaluated.
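As a rough illustration of how a rule-based phenotype with exclusion criteria can be expressed, the following is a minimal sketch and not the study's algorithm. The age window (1 to 5.99 years) comes from the objective above; the BMI percentile cutoff, the exclusion lists, and all names (`Patient`, `is_case`, `BMI_PERCENTILE_CUTOFF`) are hypothetical placeholders, since the abstract does not state them.

```python
from dataclasses import dataclass, field

# ASSUMPTIONS: the abstract gives no numeric cutoff or exclusion lists.
# These values are placeholders for illustration only.
BMI_PERCENTILE_CUTOFF = 99.0
EXCLUDED_MEDICATIONS = {"example_weight_gain_drug"}
EXCLUDED_DIAGNOSES = {"example_pathological_cause"}


@dataclass
class Patient:
    patient_id: str
    age_years: float
    bmi_percentile: float
    medications: set = field(default_factory=set)
    diagnoses: set = field(default_factory=set)


def is_case(p: Patient) -> bool:
    """Return True if the patient meets the illustrative inclusion rules."""
    # Inclusion: age window taken from the study objective (1 to 5.99 years).
    in_age_range = 1.0 <= p.age_years < 6.0
    # Inclusion: BMI percentile at or above the (hypothetical) cutoff.
    severe = p.bmi_percentile >= BMI_PERCENTILE_CUTOFF
    # Exclusion: any listed medication or comorbid diagnosis.
    excluded = bool(p.medications & EXCLUDED_MEDICATIONS) or bool(
        p.diagnoses & EXCLUDED_DIAGNOSES
    )
    return in_age_range and severe and not excluded


cohort = [
    Patient("a", 3.2, 99.5),
    Patient("b", 3.2, 99.5, medications={"example_weight_gain_drug"}),
    Patient("c", 7.0, 99.9),
    Patient("d", 2.0, 90.0),
]
cases = [p.patient_id for p in cohort if is_case(p)]
```

Here only patient "a" is retained: "b" is excluded by medication, "c" falls outside the age window, and "d" is below the cutoff. Applying exclusions after inclusion keeps the rule readable and favors precision over recall, in line with the goal of a high-fidelity cohort.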
Precision was emphasized in order to obtain a high-fidelity cohort. The rule-based algorithm performed best overall, scoring 0.895 (CCHMC) and 0.770 (BCH). The best feature set for machine learning combined Unified Medical Language System (UMLS) concept unique identifiers (CUIs), ICD-9 codes, and RxNorm codes.
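For readers less familiar with the metric, precision is the fraction of algorithm-flagged patients who are confirmed cases: true positives divided by all predicted positives. The sketch below uses the standard definition; the function name `precision` and the toy labels are hypothetical and are not study data.

```python
def precision(predicted: dict, gold: dict) -> float:
    """Precision = true positives / (true positives + false positives).

    `predicted` and `gold` map patient IDs to booleans (case / not case).
    Returns 0.0 when nothing is predicted positive.
    """
    predicted_positive = [pid for pid, label in predicted.items() if label]
    if not predicted_positive:
        return 0.0
    true_positive = sum(1 for pid in predicted_positive if gold.get(pid, False))
    return true_positive / len(predicted_positive)


# Toy example (not study data): four patients flagged, three confirmed on review.
predicted = {"p1": True, "p2": True, "p3": True, "p4": True, "p5": False}
gold = {"p1": True, "p2": True, "p3": True, "p4": False, "p5": True}
score = precision(predicted, gold)  # 3 / 4 = 0.75
```

Note that the missed case "p5" does not lower precision; it would lower recall, which is the trade-off accepted when a high-fidelity cohort is the priority.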
Detecting severe early childhood obesity is essential because of the potential for intervention in children at the highest long-term risk of developing obesity-related comorbidities. Excluding patients with underlying pathological and non-syndromic causes of obesity assists in developing a high-precision cohort for genetic study. Further, such phenotyping efforts inform future practical application in health care environments that use clinical decision support.
Citation: Lingren T, Thaker V, Brady C, Namjou B, Kennebeck S, Bickel J, Patibandla N, Ni Y, Van Driest SL, Chen L, Roach A, Cobb B, Kirby J, Denny J, Bailey-Davis L, Williams MS, Marsolo K, Solti I, Holm IA, Harley J, Kohane IS, Savova G, Crimmins N. Developing an algorithm to detect early childhood obesity in two tertiary pediatric medical centers.
* Equal contribution