Information Extraction from Echocardiography Reports for a Clinical Follow-up Study—Comparison of Extracted Variables Intended for General Use in a Data Warehouse with Those Intended Specifically for the Study

Mathias Kaspar; Caroline Morbach; Georg Fette; Maximilian Ertl; Lea K. Seidlmayer; Jonathan Krebs; Georg Dietrich; Leon Liman; Frank Puppe; Stefan Störk

doi:10.1055/s-0039-3402069


Methods Inf Med 2019; 58(04/05): 140-150

Original Article

Georg Thieme Verlag KG Stuttgart · New York


Authors

Mathias Kaspar¹,²; Caroline Morbach¹,³; Georg Fette¹,²; Maximilian Ertl⁴; Lea K. Seidlmayer¹,³; Jonathan Krebs²; Georg Dietrich²; Leon Liman²; Frank Puppe²; Stefan Störk¹,³

¹Comprehensive Heart Failure Center, Würzburg University Hospital, Würzburg, Germany

²Chair for Artificial Intelligence and Applied Informatics, Würzburg University, Würzburg, Germany

³Department of Internal Medicine I, Würzburg University, Würzburg, Germany

⁴Service Centre Medical Informatics, Würzburg University, Würzburg, Germany

Funding This work was supported by the German Ministry of Education and Research (BMBF), Berlin (#01EO1004, #01EO1504).

Publication history

Received: 22 March 2019

Accepted: 12 November 2019

Published online: 30 January 2020


Abstract

Background Interest in information extraction from clinical reports for secondary data use is increasing, but experience with the productive use of information extraction processes over time is scarce. A clinical data warehouse has been in use at our university hospital for several years; among other functions, it provides information extraction from echocardiography reports that was developed for general use.

Objectives This study aims to illustrate the difficulties encountered when using data from a preexisting information extraction process for a large clinical study, and to compare these data with data obtained from a specially developed process designed to improve the quality and completeness of the study data.

Methods We extracted the echocardiography variables for 440 patients (678 reports) from the general-use information extraction of the data warehouse. We then developed an information extraction process for the same variables, but specifically for this study, with the aim of extracting as much information as possible from the text. The data extracted by both processes were compared with a newly created gold standard defined by a cardiologist with long-standing experience in heart failure.
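The abstract does not describe how either extraction process was implemented. Purely to illustrate why a process built to "extract as much information as possible" can recover values that a general-use process misses, the hypothetical Python sketch below contrasts a narrow pattern with a broader one for a single variable, left ventricular ejection fraction (LVEF). The choice of variable, the patterns `NARROW` and `BROAD`, the function `extract_ef`, and the report phrasings are all assumptions invented for this sketch, not the authors' rules.

```python
import re

# Hypothetical narrow rule, standing in for a general-use extraction:
# it only accepts the exact form "LVEF <number>%".
NARROW = re.compile(r"\bLVEF\s+(\d{1,2})\s*%")

# Hypothetical broader rule, standing in for a study-specific extraction:
# "LVEF" or "EF" in any case, an optional colon, an optional "ca.",
# and an optional range such as "40-45 %".
BROAD = re.compile(
    r"\b(?:LV)?EF\s*:?\s*(?:ca\.\s*)?(\d{1,2})(?:\s*-\s*(\d{1,2}))?\s*%",
    re.IGNORECASE,
)


def extract_ef(text, pattern):
    """Return the first ejection fraction (percent) matched in text, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    groups = match.groups()
    low = int(groups[0])
    # NARROW has one capture group, BROAD has two (the second may be None).
    high = groups[1] if len(groups) > 1 else None
    # A range is collapsed to its midpoint here; a study might keep both bounds instead.
    return (low + int(high)) / 2 if high else float(low)


if __name__ == "__main__":
    # Invented report phrasings, for illustration only.
    reports = ["LVEF 55%, normal LV size.", "EF: ca. 40-45 % with hypokinesia.", "lvef 35 %"]
    for report in reports:
        print(extract_ef(report, NARROW), extract_ef(report, BROAD))
    # prints: 55.0 55.0 / None 42.5 / None 35.0 (one pair per line)
```

A broader pattern recovers more values but also raises the risk of false matches, which is why each rule has to be checked against a gold standard as described above.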

Results Of 57 echocardiography variables considered relevant for the study, 50 were documented in the routine text reports and could be extracted. Twenty of these variables were not provided by the general-use extraction process, and some others were provided incorrectly. Across the 30 variables for which values were extracted, the median macro F1-score was 0.81 (precision 0.94, recall 0.77). Across all 50 variables relevant to the study, the median macro F1-score was only 0.49 (precision 0.56, recall 0.46). The study-specific approach considerably improved the quality and completeness of the variables, yielding an F1-score of 0.97 (precision 0.98, recall 0.96) across all variables.
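One plausible reading of the drop from 0.81 to 0.49 is that each variable the general-use process did not provide at all is scored as F1 = 0 before the median is taken; the abstract does not state this, so treat it as an assumption. The minimal Python sketch below computes, per variable, a macro F1-score (the unweighted mean of the per-class F1 = 2PR/(P + R) over the value classes found in the gold standard) and then the median across variables. The function names, variable names, and toy data are hypothetical.

```python
from statistics import median


def class_scores(gold, predicted, label):
    """Precision, recall and F1 for one value class of one variable.

    gold and predicted are parallel lists (one entry per report); None in
    predicted means the extraction returned nothing for that report.
    """
    tp = sum(1 for g, p in zip(gold, predicted) if p == label and g == label)
    fp = sum(1 for g, p in zip(gold, predicted) if p == label and g != label)
    fn = sum(1 for g, p in zip(gold, predicted) if p != label and g == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over the classes present in the gold standard."""
    labels = sorted(set(gold))
    return sum(class_scores(gold, predicted, lab)[2] for lab in labels) / len(labels)


def median_over_variables(f1_by_variable, all_variables):
    """Median F1 across variables; a variable with no extracted values counts as 0 (assumption)."""
    return median(f1_by_variable.get(name, 0.0) for name in all_variables)


if __name__ == "__main__":
    gold = ["normal", "normal", "reduced", "reduced"]
    predicted = ["normal", "reduced", "reduced", None]
    print(round(macro_f1(gold, predicted), 3))  # 0.583

    provided = {"var_a": 0.9, "var_b": 0.8, "var_c": 0.7}
    print(median(provided.values()))  # 0.8, median over provided variables only
    print(median_over_variables(provided, provided.keys() | {"var_d", "var_e"}))  # 0.7
```

Counting unprovided variables as zero measures usefulness for the study rather than the quality of the values that were actually extracted, which is the distinction between the two medians reported above.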

Conclusion Data obtained by information extraction can be used for large clinical studies. However, data from preexisting information extraction processes should be treated with caution, because it may not be clear how much time and effort went into defining each variable in the extraction process.

Keywords

information extraction - echocardiography reports - reevaluation - secondary data usage

