Methods Inf Med 2022; 61(01/02): 003-010
DOI: 10.1055/s-0041-1739361
Original Article

A Semi-Automated Term Harmonization Pipeline Applied to Pulmonary Arterial Hypertension Clinical Trials

Ryan J. Urbanowicz (1), John H. Holmes (1), Dina Appleby (1), Vanamala Narasimhan (2), Stephen Durborow (1), Nadine Al-Naamani (3), Melissa Fernando (1), Steven M. Kawut (3)

1   Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States
2   Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
3   Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States

Funding This work was supported by the Cardiovascular Medical Research and Education Fund (CMREF), the Aldrighetti Research Award for Young Investigators, and NIH grants 2K24HL103844-06 and 5K23HL141584-03.

Abstract

Objective Data harmonization is essential to integrate individual participant data from multiple sites, time periods, and trials for meta-analysis. The process of mapping terms and phrases to an ontology is complicated by typographic errors, abbreviations, truncation, and plurality. We sought to harmonize medical history (MH) and adverse events (AE) term records across 21 randomized clinical trials in pulmonary arterial hypertension and chronic thromboembolic pulmonary hypertension.

Methods We developed and applied a semi-automated harmonization pipeline for use with domain-expert annotators to resolve ambiguous term mappings using exact and fuzzy matching. We summarized MH and AE term mapping success, including map quality measures, and imputation of a generalizing term hierarchy as defined by the applied Medical Dictionary for Regulatory Activities (MedDRA) ontology standard.
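
The exact-then-fuzzy flow described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the similarity cutoff, and the toy term list are assumptions made for illustration, and Python's standard-library `difflib` stands in for whatever fuzzy matcher the pipeline actually uses. Exact matches are accepted automatically; near matches are returned as recommendations for an annotator to review.

```python
from difflib import get_close_matches


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block an exact match."""
    return " ".join(term.lower().split())


def map_term(raw, dictionary, n=3, cutoff=0.8):
    """Return (status, candidates) for one raw term.

    `dictionary` maps a normalized term name to its canonical label.
    `n` and `cutoff` are illustrative defaults, not values from the article.
    """
    key = normalize(raw)
    if key in dictionary:
        # Exact match: accepted without annotator involvement.
        return "exact", [dictionary[key]]
    # Fuzzy match: ranked recommendations for a domain-expert annotator.
    close = get_close_matches(key, list(dictionary), n=n, cutoff=cutoff)
    if close:
        return "review", [dictionary[c] for c in close]
    return "unmapped", []


# Toy term list invented for illustration; not real MedDRA content.
TOY_TERMS = ["Headache", "Nausea", "Dizziness", "Peripheral oedema"]
DICTIONARY = {normalize(t): t for t in TOY_TERMS}

if __name__ == "__main__":
    for raw in ["  HEADACHE ", "headach", "dizzyness", "xyzzy"]:
        print(repr(raw), map_term(raw, DICTIONARY))
```

Keeping the fuzzy step as a recommender rather than an automatic mapper leaves the final decision on ambiguous terms with the annotator, which is the sense in which such a pipeline is semi-automated.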

Results Over 99.6% of both MH (N = 37,105) and AE (N = 58,170) records were successfully mapped to MedDRA low-level terms. Automated exact matching accounted for 74.9% of MH and 85.5% of AE mappings. Term recommendations from fuzzy matching in the pipeline facilitated annotator mapping of the remaining 24.9% of MH and 13.8% of AE records. Imputation of the generalized MedDRA term hierarchy was unambiguous in 85.2% of high-level terms, 99.4% of high-level group terms, and 99.5% of system organ class in MH, and 75% of high-level terms, 98.3% of high-level group terms, and 98.4% of system organ class in AE.
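
One plausible reading of "unambiguous" imputation is that a mapped term has exactly one candidate parent at the next level of the hierarchy. The sketch below illustrates that reading only; the data structure, function names, and toy identifiers are hypothetical and do not reproduce MedDRA content or the authors' code.

```python
def impute_parent(term, parents_of):
    """Return (parent, is_unambiguous) for a mapped term.

    `parents_of` maps each term to the set of higher-level terms linked to it.
    The parent is imputed only when exactly one candidate exists.
    """
    candidates = parents_of.get(term, frozenset())
    if len(candidates) == 1:
        return next(iter(candidates)), True
    return None, False


def unambiguous_fraction(terms, parents_of):
    """Fraction of terms whose parent can be imputed without a manual choice."""
    if not terms:
        return 0.0
    hits = sum(1 for t in terms if impute_parent(t, parents_of)[1])
    return hits / len(terms)


# Hypothetical links invented for illustration; not real MedDRA content.
TOY_PARENTS = {
    "term_a": {"group_1"},
    "term_b": {"group_1", "group_2"},  # two candidate parents: ambiguous
    "term_c": {"group_3"},
    "term_d": {"group_3"},
}

if __name__ == "__main__":
    records = ["term_a", "term_b", "term_c", "term_d"]
    print(unambiguous_fraction(records, TOY_PARENTS))
```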

Conclusion This pipeline dramatically reduced the burden of manual annotation for MH and AE term harmonization and could be adapted to other data integration efforts.

Publication History

Received: 08 July 2021

Accepted: 04 October 2021

Article published online: 24 November 2021

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
