EHR Big Data Deep Phenotyping

L. J. Frey; L. Lenert; G. Lopez-Campos

doi:10.15265/IY-2014-0006

Yearb Med Inform 2014; 23(01): 206-211
DOI: 10.15265/IY-2014-0006

Original Article

Georg Thieme Verlag KG Stuttgart

Contribution of the IMIA Genomic Medicine Working Group

L. J. Frey¹, L. Lenert², G. Lopez-Campos³

¹Chair IMIA Genomic Medicine WG, Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA

²Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA

³Vice-Chair IMIA Genomic Medicine WG, Health and Biomedical Informatics Centre, The University of Melbourne, Parkville, Victoria, Australia

Correspondence to:
Lewis J Frey
Chair IMIA Genomic Medicine WG
Biomedical Informatics Center
Public Health Sciences, Associate Professor
Hollings Cancer Center, Research Member
Medical University of South Carolina
135 Cannon Street, Suite 405K, MUSC 200
Charleston, SC 29425, USA
Phone: +1 843 792 4216
Fax: +1 843 792 5587
Email: Frey@musc.edu

Publication History

15 August 2014

Publication Date:
05 March 2018 (online)

Summary

Objectives: Given the quickening pace of discovery of variant disease drivers from combined patient genotype and phenotype data, the objective is to provide a methodology that uses big data technology to support the definition of deep phenotypes in medical records.

Methods: As the vast stores of genomic information increase with next generation sequencing, the importance of deep phenotyping increases. The growth of genomic data and the adoption of Electronic Health Records (EHR) in medicine provide a unique opportunity to integrate phenotype and genotype data into medical records. How collections of clinical findings and other health-related data can be leveraged to form meaningful phenotypes is an active area of research. Longitudinal data stored in EHRs provide a wealth of information that can be used to construct phenotypes of patients. We focus on a practical problem around data integration for deep phenotype identification within EHR data. We describe big data approaches that enable scalable markup of EHR events, which can then be used for semantic and temporal similarity analysis to support the identification of phenotype and genotype relationships.
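To make the idea of semantic and temporal similarity over marked-up EHR events more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each event has already been tagged with a single concept label and a day offset; the `Event` type, the concept labels, and both scoring functions (Jaccard overlap for semantic similarity, normalised longest common subsequence for temporal similarity) are illustrative choices that the summary does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    day: int       # assumed unit: days since the patient's first recorded encounter
    concept: str   # hypothetical concept label attached by the markup step


def semantic_similarity(a, b):
    """Jaccard overlap of the two patients' concept sets (event order ignored)."""
    ca = {e.concept for e in a}
    cb = {e.concept for e in b}
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def temporal_similarity(a, b):
    """Longest common subsequence of time-ordered concepts, scaled to [0, 1]."""
    sa = [e.concept for e in sorted(a, key=lambda e: e.day)]
    sb = [e.concept for e in sorted(b, key=lambda e: e.day)]
    if not sa or not sb:
        return 0.0
    # Standard dynamic-programming table for LCS length.
    table = [[0] * (len(sb) + 1) for _ in range(len(sa) + 1)]
    for i, x in enumerate(sa, 1):
        for j, y in enumerate(sb, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1] / max(len(sa), len(sb))


# Two made-up patients with the same findings recorded in a different order.
p1 = [Event(0, "seizure"), Event(30, "hypotonia"), Event(90, "microcephaly")]
p2 = [Event(5, "hypotonia"), Event(10, "seizure"), Event(60, "microcephaly")]

print(semantic_similarity(p1, p2))
print(temporal_similarity(p1, p2))
```

In this toy case the two patients share every concept, so the semantic score is maximal, while the temporal score is lower because the order of findings differs. A real pipeline would likely replace the exact-match comparison with an ontology-aware measure and run the pairwise scoring on a distributed platform.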

Conclusions: Stead and colleagues’ 2005 concept of using light standards to increase the productivity of software systems by riding on the wave of hardware/processing power is described as a harbinger for designing future healthcare systems. The big data solution, using flexible markup, provides a route to improved utilization of processing power for organizing patient records in genotype and phenotype research.

Keywords

Deep phenotype - ontology - big data - genome - electronic health record

References
1 Cases M, Furlong LI, Albanell J, Altman RB, Bellazzi R, Boyer S, et al. Improving data and knowledge management to better integrate health care and research. J Intern Med 2013; 321-8.
2 Starren J, Williams MS, Bottinger EP. Crossing the omic chasm: a time for omic ancillary systems. JAMA 2013; 309 (12) 1237-8.
3 Masys DR, Jarvik GP, Abernethy NF, Anderson NR, Papanicolaou GJ, Paltoo DN, et al. Technical desiderata for the integration of genomic data into Electronic Health Records. J Biomed Inform 2012; 45 (03) 419-22.
4 Stead WW, Kelly BJ, Kolodner RM. Achievable Steps Toward Building a National Health Information Infrastructure in the United States. J Am Med Inform Assoc 2005; 12 (02) 113-21.
5 Rabbani B, Mahdieh N, Hosomichi K, Nakaoka H, Inoue I. Next-generation sequencing: impact of exome sequencing in characterizing Mendelian disorders. J Hum Genet 2012; July: 621-32.
6 Robinson PN. Deep Phenotyping for Precision Medicine. Hum Mutat 2012; 33 (05) 777-80.
7 Dawkins R. The Extended Phenotype. Oxford: Oxford University Press; 1989.
8 Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc 2013; 20 (01) 117-21.
9 Darvasi A. Experimental strategies for the genetic dissection of complex traits in animal models. Nat Genet 1998; 18: 19-24.
10 Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, Black GCM, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res 2013; 1-9.
11 Girdea M, Dumitriu S, Fiume M, Bowdin S, Boycott KM, Chitayat D, et al. PhenoTips: Patient Phenotyping Software for Clinical and Research Use. Hum Mutat 2013; 34 (08) 1057-65.
12 Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 2004; 32 (Database issue) 267-70.
13 Bodenreider O, Burgun A. Aligning knowledge sources in the UMLS: methods, quantitative results, and applications. Medinfo Proc 2004; 327-31.
14 Wilson PS, Scichilone RA. LOINC as a data standard: how LOINC can be used in electronic environments. J AHIMA 2011; 82 (07) 44-47.
15 Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc 2011; 8: 441-8.
16 International Classification of Diseases version 9. [15 December 2013]. http://www.icd9data.com/2006/Volume1/
17 Melton GB, Parsons S, Morrison FP, Rothschild AS, Markatou M, Hripcsak G. Inter-patient distance metrics using SNOMED CT defining relationships. J Biomed Inform 2006; 39: 697-705.
18 Robinson PN, Mundlos S. The Human Phenotype Ontology. Clin Genet 2010; 77: 525-34.
19 McKusick VA. Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders. 12th edition. Baltimore: Johns Hopkins University Press; 1998.
20 Köhler S, Schulz MH, Krawitz P, Bauer S, Dölken S, Ott CE, et al. Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies. Am J Hum Genet 2009; 85: 457-64.
21 Hajihashemi Z, Popescu M. An Early Illness Recognition Framework Using a Temporal Smith Waterman Algorithm and NLP. Proc AMIA Fall Symp 2013; 549-57.
22 Wang TD, Plaisant C, et al. Aligning Temporal Data by Sentinel Events: Discovering Patterns in Electronic Health Records. CHI 2008 Proceedings: Health and Wellness, Florence, Italy; 2008.
23 Wongsuphasawat K, Gomez JAG, et al. LifeFlow: Visualizing an Overview of Event Sequences. CHI 2011, Vancouver, BC, Canada; 2011.
24 Wongsuphasawat K, Gotz DH. Outflow: Visualizing Patient Flow by Symptoms and Outcome. IBM; 2011. p. 1-4.
25 Lee WN, Das AK. Local Alignment Tool for Clinical History: Temporal Semantic Search of Clinical Databases. AMIA Annu Symp Proc 2010; 437-41.
26 Lee WN, Bridewell W, Das AK. Alignment and Clustering of Breast Cancer Patients by Longitudinal Treatment History. AMIA Annu Symp Proc 2011; 2011: 760-7.
27 Ayres J, Gehrke J, Yiu T, Flannick J. Sequential Pattern Mining using a Bitmap Representation. In: SIGKDD’02, Edmonton, Canada; July 2002.
28 Papapetrou P, Kollios G, Sclaroff S, Gunopulos D. Mining frequent arrangements of temporal intervals. Knowl Inf Syst 2009; 21 (02) 133-71.
29 Shahar Y, Goren-Bar D, Boaz D, Tahan G. Distributed, intelligent, interactive visualization and exploration of time-oriented clinical data. Artif Intell Med 2006; 38 (02) 115-35.
30 Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 2010; 42 (01) 30-5.
31 Reese MG, Moore B, Batchelor C, Salas F, Cunningham F, Marth GT, et al. A standard variation file format for human genome sequences. Genome Biol 2010; 11 (08) R88.
32 McCarty CA, Chisholm RL, Chute CG, Kullo IJ, Jarvik GP, Larson EB, et al. The eMERGE Network: A consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics 2011; 4 (01) 13.
33 Newton KM, Peissig PL, Kho AN, Bielinski SJ, Berg RL, Choudhary V, et al. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. Medical Informatics 2013; 20: e147-e154.
34 Ghemawat S, Gobioff H, et al. The Google File System. SOSP’03, Bolton Landing, New York, USA; 2003.
35 Chang F, Dean J, et al. Bigtable: A Distributed Storage System for Structured Data. OSDI ’06: 7th USENIX Symposium on Operating Systems Design and Implementation; 2006. p. 205-18.
36 Borthakur D, Sarma JS, et al. Apache Hadoop Goes Realtime at Facebook. SIGMOD ’11, Athens, Greece; 2011.
37 Seoane JA, Dorado J, Pazos A. Data Integration in Genomic Medicine: Trends and Applications. IMIA Yearbook 2012: Personal Health Informatics 2012; 117-25.
