CC BY-NC-ND 4.0 · Appl Clin Inform 2024; 15(01): 111-118
DOI: 10.1055/s-0043-1777741
Research Article

Creating a Medication Therapy Observational Research Database from an Electronic Medical Record: Challenges and Data Curation

Wolfgang Rödle
1   Chair of Medical Informatics, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Hans-Ulrich Prokosch
1   Chair of Medical Informatics, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Eva Neumann
3   Dr Margarete Fischer Bosch Institute of Clinical Pharmacology, Stuttgart, Germany
Irmgard Toni
2   Department of Paediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Erlangen, Germany
Julia Haering-Zahn
2   Department of Paediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Erlangen, Germany
Antje Neubert
2   Department of Paediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Erlangen, Germany
Sonja Eberl
2   Department of Paediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Erlangen, Germany


Background Observational research has shown its potential to complement experimental research and clinical trials through secondary use of treatment data from hospital care processes. It can also be applied to better understand pediatric drug utilization and thereby help establish safer drug therapy. Clinical documentation processes often limit data quality in pediatric medical records, which necessitates data curation steps whose effort is mostly underestimated.

Objectives The objectives of this study were to transform and curate data from a departmental electronic medical record into an observational research database. In particular, we aimed to identify data quality problems, illustrate the reasons for such problems, and describe the systematic data curation process established to create high-quality data for observational research.

Methods Data were extracted from an electronic medical record used by four wards of a German university children's hospital from April 2012 to June 2020. A four-step data preparation, mapping, and curation process was established. Data quality of the generated dataset was assessed first by following an established 3 × 3 Data Quality Assessment guideline and second by comparing a sample subset of the database with an existing gold standard.
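As an illustration of the second assessment step, the sketch below computes Cohen's kappa between labels taken from a database sample and the corresponding gold-standard labels. The abstract does not state which agreement measure was used, so kappa is an assumption here, and the function name, variable names, and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(database_labels, gold_labels):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(database_labels) != len(gold_labels) or not database_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(database_labels)
    # Observed agreement: share of items on which both sources agree.
    observed = sum(a == b for a, b in zip(database_labels, gold_labels)) / n
    # Expected chance agreement from the marginal label frequencies.
    db_counts = Counter(database_labels)
    gold_counts = Counter(gold_labels)
    expected = sum(db_counts[c] * gold_counts[c] for c in db_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical example: was a given drug exposure recorded per encounter?
database = ["yes", "yes", "no", "no", "yes", "no"]
gold = ["yes", "yes", "no", "yes", "yes", "no"]
kappa = cohens_kappa(database, gold)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percentage agreement for agreement expected by chance, which matters when one category dominates the sample.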

Results The generated dataset consists of 770,158 medication dispensations associated with 89,955 different drug exposures from 21,285 clinical encounters. A total of 6,840 different narrative drug therapy descriptions were mapped to 1,139 standard terms for drug exposures. Regarding the quality criterion correctness, the database was consistent and showed high overall agreement with our gold standard.
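The mapping of narrative descriptions to standard terms can be pictured with a minimal sketch like the one below. It is not the authors' implementation: the lookup table, its entries, and all function names are hypothetical, and the abstract does not describe the actual matching rules. Descriptions without a match are collected for manual review, which is one simple way to read "semi-automated".

```python
import re

# Hypothetical curated lookup: normalized narrative text -> standard term.
STANDARD_TERMS = {
    "paracetamol supp": "paracetamol",
    "ben-u-ron saft": "paracetamol",
    "ibuprofen saft": "ibuprofen",
}


def normalize(text):
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def map_descriptions(descriptions):
    """Return (mapped, unmapped): matched descriptions and those needing review."""
    mapped, unmapped = {}, []
    for description in descriptions:
        key = normalize(description)
        if key in STANDARD_TERMS:
            mapped[description] = STANDARD_TERMS[key]
        else:
            unmapped.append(description)  # left for manual curation
    return mapped, unmapped


mapped, unmapped = map_descriptions(
    ["  Paracetamol   Supp ", "Ibuprofen Saft", "Unknown drug"]
)
print(mapped)
print(unmapped)
```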

Conclusion Despite large amounts of free-text descriptions and contextual knowledge implicitly included in the electronic medical record, we were able to identify relevant data quality issues and to establish a semi-automated data curation process leading to a high-quality observational research database. Because of inconsistent dosage information in the original documentation, this database is limited to a drug utilization database without detailed dosage information.

Protection of Human and Animal Subjects

A positive ethics vote exists for the study (Application No 561_20 BC, ethics commission of the Friedrich-Alexander-Universität chaired by Prof. Dr. med. Renke Maas).

Supplementary Materials

Supplementary Material 1 (available in the online version only) provides a detailed description of the thorough intra-database quality assessment. Supplementary Material 2 (available in the online version only) provides a detailed description of the thorough extra-database quality assessment.


The data that support the findings of this study are available from the senior author S.E. upon reasonable request.


Publication History

Received: 26 April 2023

Accepted: 28 August 2023

Article published online:
07 February 2024

© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
