Open Access
CC BY 4.0 · Appl Clin Inform 2025; 16(03): 556-568
DOI: 10.1055/a-2544-3117
Research Article

Using Electronic Health Records to Classify Cancer Site and Metastasis

Authors

  • Kurt Kroenke

    1   Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, United States
    2   Regenstrief Institute, Inc., Indianapolis, Indiana, United States
  • Kathryn J. Ruddy

    3   Division of Medical Oncology, Mayo Clinic, Rochester, Minnesota, United States
  • Deirdre R. Pachman

    4   Division of Community Internal Medicine, Geriatrics, and Palliative Care, Mayo Clinic, Rochester, Minnesota, United States
  • Veronica Grzegorczyk

    5   Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, Minnesota, United States
  • Jeph Herrin

    6   Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut, United States
  • Parvez A. Rahman

    7   Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, United States
  • Kyle A. Tobin

    7   Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, United States
  • Joan M. Griffin

    7   Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, United States
    8   Division of Health Care Delivery Research, Mayo Clinic, Rochester, Minnesota, United States
  • Linda L. Chlan

    9   Division of Nursing Research, Department of Nursing, Mayo Clinic College of Medicine and Science, Mayo Clinic, Rochester, Minnesota, United States
  • Jessica D. Austin

    10   Department of Epidemiology, Mayo Clinic College of Medicine and Science, Scottsdale, Arizona, United States
  • Jennifer L. Ridgeway

    7   Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, United States
    8   Division of Health Care Delivery Research, Mayo Clinic, Rochester, Minnesota, United States
  • Sandra A. Mitchell

    11   Outcomes Research Branch, Healthcare Delivery Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, Maryland, United States
  • Keith A. Marsolo

    12   Department of Population Health Sciences, Duke University School of Medicine, Durham, North Carolina, United States
  • Andrea L. Cheville

    5   Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, Minnesota, United States
    7   Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, United States

Funding

E2C2 was funded as part of the Improving the Management of symPtoms during and following Cancer Treatment (IMPACT) Consortium, a National Cancer Institute Cancer Moonshot™ Research Initiative under the authorization of the 2016 United States 21st Century Cures Act. This research was supported by the National Cancer Institute of the NIH, UM1CA233033 (PI Cheville, Mayo Clinic, Rochester, MN). The findings expressed in this manuscript do not necessarily reflect the opinion of the NIH.

Abstract

Background

The Enhanced EHR-facilitated Cancer Symptom Control (E2C2) Trial is a pragmatic trial testing a collaborative care approach for managing common cancer symptoms. Identifying cancer site and metastatic status from electronic health record (EHR) data proved challenging in this trial.

Objectives

This study compares three approaches for determining cancer site and six strategies for identifying the presence of metastasis using EHR and cancer registry data.

Methods

The E2C2 cohort included 50,559 patients seen in the medical oncology clinics of a large health system. SPPADE symptoms (sleep disturbance, pain, physical function impairment, anxiety, depression, and energy deficit/fatigue) were assessed with 0 to 10 numeric rating scales (NRS). A multistep process was used to develop three approaches for representing cancer site: the single most prevalent International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) code, the two most prevalent codes, and any diagnostic code. Six approaches for identifying metastatic disease were compared: ICD-10 codes, natural language processing (NLP), cancer registry, medications typically prescribed for incurable disease, treatment plan, and evaluation for phase 1 trials.
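
The three cancer site approaches can be illustrated with a minimal sketch. This is not the authors' multistep process: it assumes each diagnosis entry has already been mapped from its ICD-10 code to a cancer site label, and the function name `classify_sites`, the output keys, and the tie-breaking rule (first site encountered wins) are hypothetical choices made for illustration only.

```python
from collections import Counter


def classify_sites(site_labels):
    """Return the cancer sites a patient is assigned under each approach.

    site_labels: list of cancer site labels, one per diagnosis entry.
    (Assumption: ICD-10 codes were mapped to site labels beforehand.)
    """
    counts = Counter(site_labels)
    # most_common() sorts by frequency; ties keep first-encountered order.
    ranked = [site for site, _ in counts.most_common()]
    return {
        "top1": set(ranked[:1]),  # single most prevalent site
        "top2": set(ranked[:2]),  # two most prevalent sites
        "any": set(ranked),       # every site with at least one code
    }


# Made-up diagnosis history for one hypothetical patient.
example = classify_sites(
    ["breast", "breast", "lung", "breast", "lung", "colon"]
)
print(example)
```

In this toy history, the colon diagnosis is captured only by the "any" approach, which is the kind of difference the comparison of approaches quantifies.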

Results

The approach counting the two most prevalent ICD-10 cancer site diagnoses per patient detected a median of 92% of the cases identified by counting all cancer site diagnoses, whereas the approach counting only the single most prevalent cancer site diagnosis identified a median of 65%. However, agreement among the three approaches was very good (kappa > 0.80) for most cancer sites. The ICD-10 and NLP methods could be applied to the entire cohort and showed the highest agreement (kappa = 0.53) for identifying metastasis. Cancer registry data were available for less than half of the patients.
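
For readers unfamiliar with the kappa statistic reported here, the following is a minimal sketch of Cohen's kappa for two methods that each label the same patients. The abstract does not state which kappa variant or software the authors used, so this is an assumption; the function name `cohen_kappa` and the flag vectors are made up for illustration and are not study data.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of patients labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of marginal proportions, summed over labels.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both methods gave one identical label to everyone
    return (observed - expected) / (1 - expected)


# Made-up metastasis flags (1 = metastatic) from two hypothetical methods.
icd_flags = [1, 1, 0, 0, 1, 0, 1, 0]
nlp_flags = [1, 0, 0, 0, 1, 0, 1, 1]
print(cohen_kappa(icd_flags, nlp_flags))  # 0.5
```

The two toy methods agree on 6 of 8 patients (0.75), but half of that agreement is expected by chance, giving kappa = (0.75 - 0.5) / (1 - 0.5) = 0.5.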

Conclusion

Identification of cancer site and metastatic disease using EHR data was feasible in this large and diverse cohort of patients with common cancer symptoms. The methods were pragmatic and may be acceptable for covariates, but likely require refinement for key dependent and independent variables.

Protection of Human and Animal Subjects

The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects, and was reviewed by the Mayo Clinic Institutional Review Board.



Publication History

Received: 11 January 2025

Accepted: 18 February 2025

Article published online: 18 June 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany