DOI: 10.1055/a-2544-3117
Using Electronic Health Records to Classify Cancer Site and Metastasis
Funding
E2C2 was funded as part of the Improving the Management of symPtoms during and following Cancer Treatment (IMPACT) Consortium, a National Cancer Institute Cancer Moonshot™ Research Initiative under the authorization of the 2016 United States 21st Century Cures Act. This research was supported by the National Cancer Institute of the NIH, UM1CA233033 (PI Cheville, Mayo Clinic, Rochester, MN). The findings expressed in this manuscript do not necessarily reflect the opinion of the NIH.

Abstract
Background
The Enhanced EHR-facilitated Cancer Symptom Control (E2C2) Trial is a pragmatic trial testing a collaborative care approach for managing common cancer symptoms. There were challenges in identifying cancer site and metastatic status.
Objectives
This study compares three different approaches to determine cancer site and six strategies for identifying the presence of metastasis using EHR and cancer registry data.
Methods
The E2C2 cohort included 50,559 patients seen in the medical oncology clinics of a large health system. SPPADE symptoms were assessed with 0 to 10 numeric rating scales (NRS). A multistep process was used to develop three approaches for representing cancer site: the single most prevalent International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) code, the two most prevalent codes, and any diagnostic code. Six approaches for identifying metastatic disease were compared: ICD-10 codes, natural language processing (NLP), cancer registry, medications typically prescribed for incurable disease, treatment plan, and evaluation for phase 1 trials.
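The three cancer site representations can be illustrated with a minimal Python sketch. It assumes each patient has a list of ICD-10 diagnosis codes already restricted to primary cancer site codes, and that a site is approximated by the three-character ICD-10 category. The function names, the alphabetical tie-breaking rule, and the sample codes are hypothetical and are not taken from the E2C2 multistep process.

```python
from collections import Counter


def site_of(code):
    """Reduce an ICD-10 code to its 3-character category, e.g. 'C50.911' -> 'C50'.

    Assumption: the category is an adequate stand-in for cancer site.
    """
    return code.strip().upper()[:3]


def classify_sites(codes):
    """Return the cancer sites for one patient under three approaches.

    top1: the single most prevalent site code
    top2: the two most prevalent site codes
    any:  every site code that appears at least once
    """
    counts = Counter(site_of(c) for c in codes)
    # Sort by descending frequency; ties are broken alphabetically
    # (a hypothetical rule chosen only to make the output deterministic).
    ranked = [s for s, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    return {
        "top1": set(ranked[:1]),
        "top2": set(ranked[:2]),
        "any": set(ranked),
    }


# Hypothetical patient: three breast, two lung, and one colon diagnosis codes.
patient_codes = ["C50.911", "C50.911", "C50.912", "C34.90", "C34.90", "C18.9"]
result = classify_sites(patient_codes)
print(result["top1"], result["top2"], result["any"])
```

Under this sketch, a patient counts as a case for a given site when that site appears in the set for the chosen approach, so the "any" approach always identifies at least as many cases per site as the other two.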
Results
The approach counting the two most prevalent ICD-10 cancer site diagnoses per patient detected a median of 92% of the cases identified by counting all cancer site diagnoses, whereas the approach counting only the single most prevalent cancer site diagnosis identified a median of 65%. However, agreement among the three approaches was very good (kappa > 0.80) for most cancer sites. ICD and NLP methods could be applied to the entire cohort and had the highest agreement (kappa = 0.53) for identifying metastasis. Cancer registry data was available for less than half of the patients.
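Agreement between two classifications of the same patients, such as ICD-based and NLP-based metastasis flags, is commonly summarized with Cohen's kappa: the observed agreement corrected for the agreement expected by chance. The following is a minimal sketch of that calculation; the function name and the example flags are hypothetical and do not reproduce the study's data or software.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e is the agreement expected by chance from the marginals.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: share of patients given the same label by both methods.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    p_e = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both methods use a single label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical metastasis flags (1 = metastatic) for four patients.
icd_flags = [1, 1, 0, 0]
nlp_flags = [1, 0, 0, 0]
print(cohens_kappa(icd_flags, nlp_flags))
```

In this toy example the two methods agree on three of four patients (p_o = 0.75) and the chance agreement is 0.5, giving kappa = 0.5.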
Conclusion
Identification of cancer site and metastatic disease using EHR data was feasible in this large and diverse cohort of patients with common cancer symptoms. The methods were pragmatic and may be acceptable for covariates, but likely require refinement for key dependent and independent variables.
Keywords
neoplasms - cancer site - metastasis - pragmatic clinical trial - electronic health records - natural language processing - cancer registry
Protection of Human and Animal Subjects
The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects, and was reviewed by the Mayo Clinic Institutional Review Board.
Publication History
Received: 11 January 2025
Accepted: 18 February 2025
Article published online: 18 June 2025
© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany