A Rigorous Algorithm To Detect And Clean Inaccurate Adult Height Records Within EHR Systems
15 October 2013
accepted: 12 February 2013
20 December 2017 (online)
Background: Height is a critical variable for many biomedical analyses because it is an important component of Body Mass Index (BMI). Transforming height measurements from electronic health records (EHR) into meaningful, research-ready values is challenging, and there is limited information available on methods for “cleaning” these data.
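BMI is weight in kilograms divided by the square of height in meters, so a height error enters the calculation squared. To first order, a relative height error of ε changes BMI by about −2ε. For example, recording height 1% too low inflates BMI by roughly 2%. The short Python sketch below illustrates this. The function names and the example patient are hypothetical and are not taken from the paper.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body Mass Index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100.0  # convert centimeters to meters
    return weight_kg / height_m ** 2


def bmi_shift(weight_kg: float, true_height_cm: float, recorded_height_cm: float) -> float:
    """Change in BMI caused by using a mis-recorded height instead of the true one
    (illustrative helper; positive means the recorded height inflates BMI)."""
    return bmi(weight_kg, recorded_height_cm) - bmi(weight_kg, true_height_cm)


if __name__ == "__main__":
    # Hypothetical patient: 72.25 kg, true height 170 cm (BMI 25.0),
    # with a height mis-recorded as 160 cm.
    print(round(bmi(72.25, 170), 2))
    print(round(bmi_shift(72.25, 170, 160), 2))
```

For this hypothetical patient, a height recorded 10 cm too low raises the computed BMI from 25.0 to about 28.2 kg/m², which can be enough to move a person into a different BMI category.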
Objectives: We sought to develop an algorithm to clean adult height data extracted from EHR using only height values and associated ages.
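The abstract does not spell out the algorithm's internal rules, so the following is only a hypothetical sketch of how an age-aware check could work when height and age are the only inputs. It is not the published algorithm. It assumes a linear height loss after a chosen age (the `start_age` and `rate_cm_per_year` defaults are placeholders, not values from the paper) and adds that expected loss back to each record. It then flags any record whose adjusted value deviates from the patient's median adjusted height by more than a hypothetical `tolerance_cm`. Finally, it replaces each flagged value with the plausible value recorded closest in age. This replacement rule is also illustrative.

```python
from statistics import median
from typing import List, Tuple

Record = Tuple[float, float]  # (age in years, height in cm) for one patient


def expected_loss_cm(age: float, start_age: float = 50.0, rate_cm_per_year: float = 0.1) -> float:
    """Hypothetical cumulative age-related height loss (cm) by `age`, assuming a
    linear decline after `start_age`. Both defaults are placeholders."""
    return max(0.0, age - start_age) * rate_cm_per_year


def flag_heights(records: List[Record], tolerance_cm: float = 5.0,
                 start_age: float = 50.0, rate_cm_per_year: float = 0.1) -> List[bool]:
    """Return one bool per record: True if plausible, False if flagged as erroneous."""
    if not records:
        return []
    # Add back the assumed age-related loss so records from different ages are comparable.
    adjusted = [h + expected_loss_cm(a, start_age, rate_cm_per_year) for a, h in records]
    reference = median(adjusted)  # robust per-patient reference height
    return [abs(x - reference) <= tolerance_cm for x in adjusted]


def clean_heights(records: List[Record], **kwargs) -> List[Record]:
    """Replace flagged heights with the plausible height recorded closest in age
    (hypothetical replacement rule). Extra keyword arguments go to flag_heights."""
    ok = flag_heights(records, **kwargs)
    good = [rec for rec, keep in zip(records, ok) if keep]
    if not good:
        return list(records)  # nothing trustworthy to anchor on; leave untouched
    cleaned = []
    for (age, height), keep in zip(records, ok):
        if keep:
            cleaned.append((age, height))
        else:
            _, nearest_height = min(good, key=lambda r: abs(r[0] - age))
            cleaned.append((age, nearest_height))
    return cleaned
```

With the hypothetical records `[(40, 170.0), (45, 170.5), (50, 169.5), (55, 178.0), (90, 164.0)]`, the 178.0 cm value at age 55 is flagged and replaced by the 169.5 cm value from age 50. The 164.0 cm value at age 90 is kept only because of the assumed age adjustment. With `rate_cm_per_year=0.0`, it would be flagged as well, which shows why sensitivity to normal age-related decline matters.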
Results: The algorithm we developed is sensitive to normal decreases in adult height associated with aging, is implemented in an open-source software tool (and is thus easily modifiable), and is freely available. We evaluated the performance of the algorithm using data from the Northwestern biobank and a replication sample from the Marshfield Clinic biobank, obtained through our participation in the eMERGE consortium. The algorithm identified 1,262 erroneous values among a total of 33,937 records in the Northwestern sample. Replacing erroneous height values with those identified as correct by the algorithm resulted in meaningful changes in height and BMI records: the median change in recorded height after cleaning was 7.6 cm, and the median change in BMI was 2.9 kg/m². Comparison of cleaned EHR height values with observer-measured values showed that 94.5% (95% CI: 93.8%–95.2%) of cleaned values were within 3.5 cm of the observer-measured values.
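The agreement statistic reported above can be computed in outline as the share of paired values whose absolute difference is at most 3.5 cm, together with a 95% confidence interval. The abstract does not state which interval method was used. The sketch below assumes a simple normal-approximation (Wald) interval and treats the 3.5 cm threshold as inclusive, which is also an assumption. It also computes the median absolute change between pre- and post-cleaning values. The function names and toy data are hypothetical.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def agreement_with_ci(cleaned_cm: Sequence[float], observed_cm: Sequence[float],
                      threshold_cm: float = 3.5, z: float = 1.96) -> Tuple[float, float, float]:
    """Share of cleaned values within `threshold_cm` of observer-measured values,
    with a normal-approximation (Wald) confidence interval clipped to [0, 1]."""
    if len(cleaned_cm) != len(observed_cm) or not cleaned_cm:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(cleaned_cm)
    # Count pairs that agree within the threshold (inclusive boundary is an assumption).
    k = sum(abs(c - o) <= threshold_cm for c, o in zip(cleaned_cm, observed_cm))
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def median_abs_change(before: Sequence[float], after: Sequence[float]) -> float:
    """Median absolute difference between paired values before and after cleaning."""
    return median(abs(a - b) for b, a in zip(before, after))
```

The Wald interval is adequate for large samples with proportions away from 0 and 1. For small samples, a Wilson score interval is usually a safer choice, and swapping it in would only change the `half_width` calculation and the bounds.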
Conclusions: Our freely available height algorithm cleans EHR height data with only height and age inputs. Use of this algorithm will benefit groups trying to perform research with height and BMI data extracted from EHR.
Citation: Muthalagu A, Pacheco JA, Aufox S, Peissig PL, Fuehrer JT, Tromp G, Kho AN, Rasmussen-Torvik LJ. A rigorous algorithm to detect and clean inaccurate adult height records within EHR systems. Appl Clin Inf 2014; 5: 118–126 http://dx.doi.org/10.4338/ACI-2013-09-RA-0074