Summary
Background: Height is a critical variable for many biomedical analyses because it is an important
component of Body Mass Index (BMI). Transforming electronic health record (EHR) height measures into meaningful
research-ready values is challenging, and there is limited information available on
methods for “cleaning” these data.
Objectives: We sought to develop an algorithm to clean adult height data extracted from EHR using
only height values and associated ages.
Results: The algorithm we developed is sensitive to normal decreases in adult height associated
with aging, is implemented using an open-source software tool and is thus easily modifiable,
and is freely available. We checked the performance of our algorithm using data from
the Northwestern biobank and a replication sample from the Marshfield Clinic biobank
obtained through our participation in the eMERGE consortium. The algorithm identified
1262 erroneous values from a total of 33937 records in the Northwestern sample. Replacing
erroneous height values with those identified as correct by the algorithm resulted
in meaningful changes in height and BMI records; median change in recorded height
after cleaning was 7.6 cm and median change in BMI was 2.9 kg/m². Comparison of cleaned EHR height values with observer-measured values showed that
94.5% (95% CI: 93.8%–95.2%) of cleaned values were within 3.5 cm of observer-measured
values.
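To illustrate why a height correction of this size matters for BMI, the following minimal sketch applies the standard BMI definition (weight in kilograms divided by the square of height in meters) to a single hypothetical record. The weight and both height values are assumptions chosen for illustration only; they are not study data, and the sketch does not reproduce the cleaning algorithm itself.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Return Body Mass Index (kg/m^2) from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# Hypothetical values for illustration only (not taken from the study).
weight_kg = 80.0
recorded_height_cm = 162.4  # assumed erroneous EHR entry
cleaned_height_cm = 170.0   # assumed corrected value, 7.6 cm taller

bmi_recorded = bmi(weight_kg, recorded_height_cm)
bmi_cleaned = bmi(weight_kg, cleaned_height_cm)

# Difference in BMI attributable solely to the height correction.
bmi_shift = bmi_recorded - bmi_cleaned

print(f"BMI with recorded height: {bmi_recorded:.2f} kg/m^2")
print(f"BMI with cleaned height:  {bmi_cleaned:.2f} kg/m^2")
print(f"Shift due to height fix:  {bmi_shift:.2f} kg/m^2")
```

For this hypothetical record, a 7.6 cm height correction alone moves BMI by more than 2.5 kg/m², which for these assumed values is enough to cross the conventional overweight/obese boundary of 30 kg/m².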
Conclusions: Our freely available height algorithm cleans EHR height data with only height and
age inputs. Use of this algorithm will benefit groups trying to perform research with
height and BMI data extracted from EHR.
Citation: Muthalagu A, Pacheco JA, Aufox S, Peissig PL, Fuehrer JT, Tromp G, Kho AN, Rasmussen-Torvik
LJ. A rigorous algorithm to detect and clean inaccurate adult height records within
EHR systems. Appl Clin Inf 2014; 5: 118–126 http://dx.doi.org/10.4338/ACI-2013-09-RA-0074
Keywords
Height, dimensional measurement accuracy, electronic health record, body mass index,
electronic medical record, phenotyping