Secure Secondary Use of Clinical Data with Cloud-based NLP Services

J. Christoph; L. Griebel; I. Leb; I. Engel; F. Köpcke; D. Toddenroth; H.-U. Prokosch; J. Laufer; K. Marquardt; M. Sedlmayr

doi:10.3414/ME13-01-0133

Methods Inf Med 2015; 54(03): 276-282
DOI: 10.3414/ME13-01-0133

Original Articles

Schattauer GmbH

Secure Secondary Use of Clinical Data with Cloud-based NLP Services

Towards a Highly Scalable Research Infrastructure

Authors

J. Christoph¹, L. Griebel¹, I. Leb¹, I. Engel¹, F. Köpcke¹, D. Toddenroth¹, H.-U. Prokosch¹, J. Laufer², K. Marquardt², M. Sedlmayr¹

¹Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
²Rhön-Klinikum AG, Bad Neustadt/Saale, Germany

Publication History

received: 01 December 2013

accepted: 08 October 2014

Publication Date: 22 January 2018 (online)

Summary

Objectives: The secondary use of clinical data offers substantial opportunities for clinical and translational research as well as for quality assurance projects. Such use requires a flexible and scalable infrastructure that complies with privacy requirements. The major goals of the cloud4health project are to define such an architecture, to implement a technical prototype that fulfills these requirements, and to evaluate it in three use cases.

Methods: The architecture provides components that allow multiple data provider sites, such as hospitals, to extract free text as well as structured data from local sources and to de-identify these data for further anonymous or pseudonymous processing. Free-text documentation is analyzed and transformed into structured information by text-mining services provided within a cloud-computing environment. The newly gained annotations can then be integrated with the already available structured data items, and the resulting data sets can be uploaded to a central study portal for further analysis.
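
The data flow described in the Methods can be illustrated with a minimal Python sketch. It is not the cloud4health implementation: every name in it (`Record`, `pseudonymize`, `scrub_text`, `cloud_text_mining`, `SITE_SECRET`) is hypothetical, the de-identification rules are toy regular expressions, and the text-mining step is a local keyword lookup standing in for a remote cloud service. It only shows the order of operations: pseudonymize and scrub at the data provider site, annotate the de-identified text, then combine the annotations with the structured data.

```python
import hashlib
import hmac
import re
from dataclasses import dataclass, field

# Hypothetical site-local secret; in this sketch it never leaves the data provider.
SITE_SECRET = b"replace-with-site-local-key"


@dataclass
class Record:
    patient_id: str
    structured: dict
    free_text: str
    annotations: list = field(default_factory=list)


def pseudonymize(patient_id: str) -> str:
    """Replace the local patient ID with a keyed hash that serves as a pseudonym."""
    digest = hmac.new(SITE_SECRET, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def scrub_text(text: str) -> str:
    """Toy de-identification: mask dates such as 01.12.2013 and titled names."""
    text = re.sub(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b", "[DATE]", text)
    return re.sub(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+[A-Z]\w+", "[NAME]", text)


def cloud_text_mining(text: str) -> list:
    """Stand-in for the remote text-mining service: simple keyword lookup."""
    vocabulary = {"diabetes": "DIAGNOSIS", "metformin": "MEDICATION"}
    lowered = text.lower()
    return [{"term": term, "type": kind}
            for term, kind in vocabulary.items() if term in lowered]


def process(record: Record) -> Record:
    """De-identify locally, annotate the scrubbed text, and merge the results."""
    clean_text = scrub_text(record.free_text)
    return Record(
        patient_id=pseudonymize(record.patient_id),
        structured=dict(record.structured),
        free_text=clean_text,
        annotations=cloud_text_mining(clean_text),
    )
```

In this sketch only the scrubbed text is passed to the text-mining function, which mirrors the idea that identifying data stay at the provider site. A record returned by `process` would correspond to one row of a data set that could be uploaded to a study portal.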

Results: Based on the architecture design, a prototype has been implemented and is under evaluation in three clinical use cases. Data from several hundred patients, provided by a university hospital and a private hospital chain, have already been processed.

Conclusions: Cloud4health has shown how existing components for the secondary use of structured data can be complemented with text mining in a privacy-compliant manner. The cloud-computing paradigm allows flexible and dynamically adaptable service provision, which makes it easier for data providers to adopt the services without investing in their own hardware resources and software tools.

Keywords

Cloud-computing - secondary use - text-mining - privacy - natural language processing - software design

