CC BY-NC-ND 4.0 · Appl Clin Inform 2022; 13(03): 521-531
DOI: 10.1055/s-0042-1748144
Research Article

Transforming Thyroid Cancer Diagnosis and Staging Information from Unstructured Reports to the Observational Medical Outcome Partnership Common Data Model

Sooyoung Yoo (1), Eunsil Yoon (1), Dachung Boo (1), Borham Kim (1), Seok Kim (1), Jin Chul Paeng (2), Ie Ryung Yoo (3), In Young Choi (4, 5), Kwangsoo Kim (6), Hyun Gee Ryoo (7, 8), Sun Jung Lee (4, 5), Eunhye Song (9), Young-Hwan Joo (10), Junmo Kim (11), Ho-Young Lee (1, 2)

Author Affiliations

1   Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
2   Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea
3   Division of Nuclear Medicine, Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea
4   Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea
5   Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
6   Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, South Korea
7   Department of Nuclear Medicine, Seoul National University Hospital, Seoul, South Korea
8   Department of Nuclear Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea
9   Department of Data Science Research, Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, South Korea
10   Biomedical Research Institute, Seoul National University Hospital, Seoul, South Korea
11   Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, South Korea

Funding This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI19C0378). This work was also partly supported by the Technology Innovation Program (grant number: 20003883, Advancing and expanding CDM based distributed biohealth data platform) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).

Abstract

Background Cancer staging information is an essential component of cancer research. However, it is stored primarily in fully free-text or semistructured clinical documents, which limits its use. Transforming cancer-specific data into the Observational Medical Outcome Partnership Common Data Model (OMOP CDM) allows this information to contribute to multicenter observational cancer studies. To the best of our knowledge, no study to date has addressed OMOP CDM transformation and natural language processing (NLP) for thyroid cancer.

Objective We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports.

Methods Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer stage group was then derived from the transformed CDM data and populated.
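
To illustrate the kind of rule-based extraction described in the Methods, the following Python sketch pulls a histological type and TNM modifiers out of a free-text pathology report. It is only a minimal illustration under stated assumptions: the pattern `TNM_PATTERN`, the keyword table `HISTOLOGY_KEYWORDS`, and the function `extract_modifiers` are hypothetical names introduced here, the sample wording is invented, and the rules actually used in the study are not given in this abstract.

```python
import re

# Hypothetical pattern for pathologic TNM strings such as "pT1aN1bM0" or "pT2 N0".
# The M component is optional because reports may omit it.
TNM_PATTERN = re.compile(
    r"\bp?T(?P<t>[0-4][ab]?|x)\s*,?\s*p?N(?P<n>[01][ab]?|x)"
    r"(?:\s*,?\s*p?M(?P<m>[01]|x))?",
    re.IGNORECASE,
)

# Hypothetical keyword table; the first keyword found in the text wins.
HISTOLOGY_KEYWORDS = {
    "papillary": "papillary carcinoma",
    "follicular": "follicular carcinoma",
    "medullary": "medullary carcinoma",
    "anaplastic": "anaplastic carcinoma",
}


def extract_modifiers(report_text: str) -> dict:
    """Return histology and T/N/M values found in a report, or None for each miss."""
    result = {"histology": None, "t": None, "n": None, "m": None}

    lowered = report_text.lower()
    for keyword, label in HISTOLOGY_KEYWORDS.items():
        if keyword in lowered:
            result["histology"] = label
            break

    match = TNM_PATTERN.search(report_text)
    if match:
        result["t"] = "T" + match.group("t").lower()
        result["n"] = "N" + match.group("n").lower()
        if match.group("m") is not None:
            result["m"] = "M" + match.group("m").lower()
    return result


if __name__ == "__main__":
    sample = "Papillary thyroid carcinoma, classic type. pT1aN1bM0"
    print(extract_modifiers(sample))
```

In a real pipeline each extracted value would additionally be mapped to a standard concept and written to the CDM tables defined by the oncology extension; that mapping is omitted here because the concept identifiers are not stated in this text.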

Results All extracted thyroid cancer data were converted into the OMOP CDM. The histopathological types of thyroid cancer were distributed approximately as follows: papillary carcinoma, 95.3 to 98.8%; follicular carcinoma, 0.9 to 3.7%; adenocarcinoma, 0.04 to 0.54%; medullary carcinoma, 0.17 to 0.81%; and anaplastic carcinoma, 0 to 0.3%. Regarding cancer staging, stage I thyroid cancer accounted for 55 to 64% of the cases and stage III for 24 to 26%. Stage II and stage IV thyroid cancers were detected at a low rate of 2 to 6%.
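
Stage groups such as those reported above are derived from the T, N, and M modifiers together with patient age. The Python sketch below shows one way such a derivation could look. It assumes the AJCC 7th edition grouping for differentiated (papillary or follicular) thyroid carcinoma with an age cutoff of 45 years; this abstract does not state which edition or derivation logic the study applied, so the rules here are an assumption. Medullary and anaplastic carcinoma follow different groupings and are not covered. The function name `derive_stage_group` is hypothetical.

```python
def derive_stage_group(t: str, n: str, m: str, age_at_diagnosis: int) -> str:
    """Derive a stage group for differentiated thyroid carcinoma.

    Assumes AJCC 7th edition rules (an assumption, not confirmed by the text).
    Returns "unknown" when the inputs do not determine a stage group.
    """
    t, n, m = t.upper(), n.upper(), m.upper()

    if m not in ("M0", "M1"):
        return "unknown"

    # Under 45 years: only distant metastasis separates stage I from stage II.
    if age_at_diagnosis < 45:
        return "II" if m == "M1" else "I"

    # 45 years and older.
    if m == "M1":
        return "IVC"
    if t == "T4B":
        return "IVB"
    if t == "T4A":
        return "IVA"
    if t not in ("T1", "T1A", "T1B", "T2", "T3"):
        return "unknown"
    if n == "N1B":
        return "IVA"
    if n == "N1A":
        return "III"
    if n != "N0":
        return "unknown"
    if t == "T3":
        return "III"
    if t == "T2":
        return "II"
    return "I"


if __name__ == "__main__":
    print(derive_stage_group("T3", "N0", "M0", 60))
```

Returning "unknown" rather than guessing keeps incomplete modifier sets (for example NX, or N1 without a subcategory) out of the stage distribution, which is one possible design choice rather than the study's documented behavior.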

Conclusion As the first study, to our knowledge, on OMOP CDM transformation and NLP for thyroid cancer, this work will help other institutions standardize thyroid cancer–specific data for retrospective observational research and participate in multicenter studies.

Protection of Human and Animal Subjects

The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects and was reviewed and approved by each institutional review board of the three medical institutions participating in the study. The Common Data Model (CDM) database was retained at each medical institution, and only summary results were shared. No patient-level data were exported in this study.



Publication History

Received: 20 July 2021

Accepted: 31 December 2021

Article published online: 15 June 2022

© 2022. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 