Summary
Objectives: Given the accelerating discovery of disease-driving variants from combined
patient genotype and phenotype data, the objective is to provide a methodology that uses
big data technology to support the definition of deep phenotypes in medical records.
Methods: As the vast stores of genomic information increase with next-generation sequencing,
the importance of deep phenotyping increases. The growth of genomic data and the adoption
of Electronic Health Records (EHRs) in medicine provide a unique opportunity to integrate
phenotype and genotype data into medical records. The method by which collections
of clinical findings and other health-related data are leveraged to form meaningful
phenotypes is an active area of research. Longitudinal data stored in EHRs provide
a wealth of information that can be used to construct phenotypes of patients. We focus
on a practical problem around data integration for deep phenotype identification within
EHR data. Big data approaches are described that enable scalable markup
of EHR events that can be used for semantic and temporal similarity analysis to support
the identification of phenotype and genotype relationships.
Conclusions: Stead and colleagues’ 2005 concept of using light standards to increase the productivity
of software systems by riding the wave of hardware processing power is described
as a harbinger for designing future healthcare systems. The big data solution, using
flexible markup, provides a route to improved utilization of processing power for
organizing patient records in genotype and phenotype research.
Keywords
Deep phenotype - ontology - big data - genome - electronic health record