Yearb Med Inform 2014; 23(01): 08-13
DOI: 10.15265/IY-2014-0024
Original Article
Georg Thieme Verlag KG Stuttgart

Big Data and Biomedical Informatics: A Challenging Opportunity

Riccardo Bellazzi
1 Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy

Publication History

22 May 2014

Publication Date:
05 March 2018 (online)

Summary

Big data are receiving increasing attention in biomedicine and healthcare. It is therefore important to understand why big data are assuming a crucial role for the biomedical informatics community. The capability of handling big data is becoming an enabler for carrying out unprecedented research studies and implementing new models of healthcare delivery. It is first necessary to understand in depth the four elements that constitute big data, namely Volume, Variety, Velocity, and Veracity, and what they mean in practice. It is then necessary to understand where big data are present and where they can be beneficially collected. Some research fields, such as translational bioinformatics, need to rely on big data technologies to withstand the shock wave of data generated every day. Other areas, ranging from epidemiology to clinical care, can benefit from exploiting the large amounts of data now available, from personal monitoring to primary care. However, building big data-enabled systems carries relevant implications for the reproducibility of research studies and for the management of privacy and data access; proper actions should be taken to deal with these issues. An interesting consequence of the big data scenario is the availability of new software, methods, and tools, such as map-reduce, cloud computing, and machine learning algorithms that handle concept drift, which will not only contribute to big data research but may also be beneficial in many biomedical informatics applications. The way forward with the big data opportunity will require properly applied engineering principles to design studies and applications, to avoid preconceptions or over-enthusiasm, to fully exploit the available technologies, and to improve data processing and data management regulations.

 