Appl Clin Inform 2021; 12(01): 057-064
DOI: 10.1055/s-0040-1721481
Research Article

Patient Cohort Identification on Time Series Data Using the OMOP Common Data Model

Christian Maier
1   Chair of Medical Informatics, Friedrich–Alexander–Universität Erlangen–Nürnberg (FAU), Erlangen, Bayern, Germany
Lorenz A. Kapsner
2   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Bayern, Germany
Sebastian Mate
2   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Bayern, Germany
Hans-Ulrich Prokosch
1   Chair of Medical Informatics, Friedrich–Alexander–Universität Erlangen–Nürnberg (FAU), Erlangen, Bayern, Germany
2   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Bayern, Germany
Stefan Kraus
3   Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Baden-Württemberg, Germany
Funding This work was funded in part by the German Federal Ministry of Education and Research (BMBF) within the Medical Informatics Initiative (MIRACUM Consortium) under the Funding Number FKZ: 01ZZ1801A. The present work was performed in fulfillment of the requirements for obtaining the degree “Dr. rer. biol. hum.” from the Friedrich-Alexander-Universität Erlangen-Nürnberg (CM).


Background The identification of patient cohorts for recruiting patients into clinical trials requires an evaluation of study-specific inclusion and exclusion criteria. These criteria are defined in terms of clinical facts. Some of these facts may not be present in the clinical source systems and need to be calculated, either in advance or at the runtime of the cohort query (the so-called feasibility query).

Objectives We use the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) as the repository for our clinical data. However, Atlas, the graphical user interface of OMOP, does not offer the functionality to perform calculations on fact data. We therefore searched for a different approach. The objective of this study is to investigate whether the Arden Syntax can be used for feasibility queries on the OMOP CDM to enable on-the-fly calculations at query runtime, eliminating the need to precalculate the data elements involved in researchers' criteria specifications.

Methods We implemented a service that reads the facts from the OMOP repository and provides them in a form that an Arden Syntax Medical Logic Module (MLM) can process. We then implemented an MLM that applies the eligibility criteria to every patient data set and outputs the list of eligible cases (i.e., performs the feasibility query).

Results The study resulted in an MLM-based feasibility query that identifies cases of overventilation, as an example of how an on-the-fly calculation can be realized. The algorithm is split into two MLMs to make the approach reusable.

Conclusion We found that MLMs are a suitable technology for feasibility queries on the OMOP CDM. Our method of performing on-the-fly calculations can be employed with any OMOP instance and without touching existing infrastructure such as the Extract, Transform, and Load pipeline. We therefore consider it well suited to on-the-fly calculations on OMOP.
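
The idea of an on-the-fly calculation inside a feasibility query can be illustrated with a minimal sketch. It is written in Python rather than Arden Syntax and is not the authors' MLM. Everything in it is an assumption made for illustration: the `Fact` record, the labels `body_weight_kg` and `tidal_volume_ml`, the ratio used as the derived value, and the limit of 8.0. None of these come from the article, and the limit is not a clinical recommendation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Fact:
    """Hypothetical, flattened view of one stored measurement."""
    person_id: int
    name: str      # illustrative label, not an OMOP concept identifier
    minute: int    # illustrative timestamp: minutes since admission
    value: float


def eligible_patients(facts: Iterable[Fact], limit_ml_per_kg: float) -> list[int]:
    """Return patients with at least one derived value above the limit.

    The derived value (tidal volume divided by body weight) is not stored
    anywhere; it is computed here, while the query runs.
    """
    weights: dict[int, float] = {}
    volumes: dict[int, list[Fact]] = {}
    for fact in facts:
        if fact.name == "body_weight_kg":
            weights[fact.person_id] = fact.value
        elif fact.name == "tidal_volume_ml":
            volumes.setdefault(fact.person_id, []).append(fact)

    eligible = []
    for person_id, series in volumes.items():
        weight = weights.get(person_id)
        if not weight:
            continue  # missing or zero weight: the ratio cannot be calculated
        if any(f.value / weight > limit_ml_per_kg for f in series):
            eligible.append(person_id)
    return sorted(eligible)


# Made-up example data for three patients.
facts = [
    Fact(1, "body_weight_kg", 0, 70.0),
    Fact(1, "tidal_volume_ml", 10, 420.0),
    Fact(1, "tidal_volume_ml", 20, 630.0),
    Fact(2, "body_weight_kg", 0, 80.0),
    Fact(2, "tidal_volume_ml", 15, 480.0),
    Fact(3, "tidal_volume_ml", 5, 500.0),   # no weight recorded
]
result = eligible_patients(facts, limit_ml_per_kg=8.0)
print(result)
```

Because the ratio is derived inside the query, nothing has to be precalculated or written back to the repository, which is the property the abstract attributes to the MLM-based approach.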

Protection of Human and Animal Subjects

Only anonymized data was used. Therefore, the authors declare that the study was conducted in accordance with the ethical principles of the Helsinki Declaration.

Publication History

Received: 22 June 2020

Accepted: 04 November 2020

Article published online: 27 January 2021

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
