Methods Inf Med 2016; 55(02): 125-135
DOI: 10.3414/ME15-01-0082
Original Articles
Schattauer GmbH

Integrated Data Repository Toolkit (IDRT)

A Suite of Programs to Facilitate Health Analytics on Heterogeneous Medical Data
C. R. K. D. Bauer* (1), T. Ganslandt* (2, 3), B. Baum (1), J. Christoph (3), I. Engel (3), M. Löbe (4), S. Mate (3), S. Stäubert (4), J. Drepper (5), H.-U. Prokosch (2, 3), A. Winter (4), U. Sax (1)

1   Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
2   Medical Center for Information and Communication Technology, Erlangen University Hospital, Erlangen, Germany
3   Chair of Medical Informatics, Friedrich-Alexander-University of Erlangen-Nuremberg, Erlangen, Germany
4   Institute for Medical Informatics, Statistics and Epidemiology (IMISE), University Leipzig, Leipzig, Germany
5   TMF – Technology, Methods, and Infrastructure for Networked Medical Research, Berlin, Germany

Publication History

received: 03 July 2015

accepted: 09 November 2015

Publication Date:
08 January 2018 (online)

Summary

Background: In recent years, research data warehouses have moved increasingly into the focus of medical research. Nevertheless, only a few center-independent infrastructure solutions are available. They aim to provide a consolidated view of medical data from various sources such as clinical trials, electronic health records, epidemiological registries or longitudinal cohorts. The i2b2 framework is a well-established solution for such repositories, but it lacks support for importing and integrating clinical data and metadata.

Objectives: The goal of this project was to develop a platform for easy integration and administration of data from heterogeneous sources, to provide capabilities for linking these data to medical terminologies, and to allow data streams to be transformed and mapped into user-specific views.

Methods: A suite of three tools has been developed: the i2b2 Wizard for simplifying the administration of i2b2, the IDRT Import and Mapping Tool for loading clinical data from various formats and sources such as CSV, SQL, CDISC ODM or biobanks, and the IDRT i2b2 Web Client Plugin for advanced export options. The Import and Mapping Tool also includes an ontology editor for rearranging and mapping patient data and structures as well as for annotating clinical data with medical terminologies, primarily those used in Germany (ICD-10-GM, OPS, ICD-O, etc.).
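As a rough illustration of the kind of transformation an import and mapping step performs, the following minimal Python sketch turns a wide CSV export into one terminology-prefixed fact per coded value. It is not the IDRT implementation: the column names, the `ICD10GM`/`OPS` prefixes, the `Fact` structure and the sample rows are all assumptions made up for this example.

```python
import csv
import io
from typing import Dict, Iterator, NamedTuple


class Fact(NamedTuple):
    """One observation: who, what (coded concept), when. Hypothetical layout."""
    patient_id: str
    concept_code: str
    start_date: str


# Hypothetical mapping from source column name to terminology prefix.
COLUMN_PREFIX: Dict[str, str] = {
    "diagnosis": "ICD10GM",
    "procedure": "OPS",
}


def csv_to_facts(text: str) -> Iterator[Fact]:
    """Yield one Fact per non-empty coded cell in a wide CSV export."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        for column, prefix in COLUMN_PREFIX.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells instead of emitting empty codes
                yield Fact(row["patient_id"], f"{prefix}:{value}", row["date"])


# Invented sample data, for illustration only.
SAMPLE = (
    "patient_id,date,diagnosis,procedure\n"
    "P1,2015-01-02,I10.00,\n"
    "P2,2015-02-03,E11.90,5-470.0\n"
)

facts = list(csv_to_facts(SAMPLE))
for fact in facts:
    print(fact)
```

Prefixing each code with its terminology keeps values from different coding systems distinguishable once they share a single fact list, which is one plausible way to support the terminology annotation described above.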

Results: With the three tools functional, new i2b2-based research projects can be created, populated and customized to researchers' needs in a few hours. Data and metadata from different databases can be amalgamated easily. With regard to data privacy, a pseudonymization service can be plugged in. Using common ontologies and reference terminologies rather than project-specific ones leads to a consistent understanding of the data semantics.
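To make the idea of a pluggable pseudonymization step more concrete, here is a minimal Python sketch that replaces a patient identifier with a keyed hash (HMAC-SHA256). This is a swapped-in technique chosen for brevity, not the service used by the authors; the function name `pseudonymize`, the key handling and the 16-character truncation are assumptions for illustration.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes, length: int = 16) -> str:
    """Return a deterministic pseudonym for patient_id.

    Hypothetical helper: the same identifier and key always give the same
    pseudonym, so records from different sources can still be linked, while
    the original identifier cannot be read off the pseudonym.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Illustrative key only; a real key would be generated and stored securely.
KEY = b"example-key-for-illustration"

pseudonym_a = pseudonymize("P1", KEY)
pseudonym_b = pseudonymize("P2", KEY)
print(pseudonym_a, pseudonym_b)
```

Because the mapping is deterministic for a given key, the same patient arriving from two source systems receives the same pseudonym, which is what allows data from heterogeneous sources to be merged after identifiers have been removed.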

Conclusions: i2b2’s promise is to enable clinical researchers to devise and test new hypotheses even without deep knowledge of statistical programming. The approach presented here has been tested in a number of scenarios with millions of observations and tens of thousands of patients. Although initially mostly observers, researchers were able, once trained, to construct new analyses on their own. Early feedback indicates that timely and extensive access to their “own” data is appreciated most, but the approach also lowers the barrier for other tasks, for instance checking data quality and completeness (missing data, wrong coding).

* Both authors contributed equally to the article.


 