Grappling with the Future Use of Big Data for Translational Medicine and Clinical Care
18 August 2017
11 September 2017 (online)
Objectives: Although patients may have a wealth of imaging, genomic, monitoring, and personal device data, it has yet to be fully integrated into clinical care.
Methods: We identify three reasons for the lack of integration. The first is that “Big Data” is poorly managed by most Electronic Medical Record Systems (EMRS). The data is mostly available on “cloud-native” platforms that are outside the scope of most EMRS, and even checking whether such data is available for a patient often must be done outside the EMRS. The second reason is that extracting features relevant to healthcare from the Big Data often requires complex machine learning algorithms, for example to determine whether a genomic variant is protein-altering. The third reason is that applications that present Big Data need to be modified constantly to reflect the current state of knowledge, for example in their guidance on when to order a new set of genomic tests. In some cases, applications need to be updated nightly.
Results: A new architecture for EMRS is evolving that could unite Big Data, machine learning, and clinical care. It is microservice-based and can host applications focused on quite specific aspects of clinical care, such as managing cancer immunotherapy.
Conclusion: Informatics innovation, medical research, and clinical care go hand in hand as we look to infuse science-based practice into healthcare. Innovative methods will lead to a new ecosystem of applications (Apps) interacting with healthcare providers to fulfill a promise that is still to be determined.