Methods Inf Med 2024; 63(05/06): 176-182
DOI: 10.1055/a-2590-6456
Original Article

Automated Information Extraction from Unstructured Hematopathology Reports to Support Response Assessment in Myeloproliferative Neoplasms

Spencer Krichevsky* 1, Evan T. Sholle* 2, Prakash M. Adekkanattu 2, Sajjad Abedian 2, Madhu Ouseph 3, Elwood Taylor 1, Ghaith Abu-Zeinah 1, Diana Jaber 1, Claudia Sosner 1, Marika M. Cusick 2, Niamh Savage 1, Richard T. Silver 1, Joseph M. Scandura* 1, Thomas R. Campion Jr.* 2, 4, 5

1   Division of Hematology and Medical Oncology, Richard T. Silver Myeloproliferative Neoplasms Center, Weill Cornell Medicine, New York, New York, United States
2   Information Technologies and Services, Weill Cornell Medicine, New York, New York, United States
3   Department of Pathology, Weill Cornell Medicine, New York, New York, United States
4   Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, United States
5   Clinical and Translational Science Center, Weill Cornell Medicine, New York, New York, United States

Funding: This study received support from New York-Presbyterian Hospital (NYPH) and Weill Cornell Medical College (WCMC), including the Clinical and Translational Science Center (CTSC) (UL1 TR000457) and Joint Clinical Trials Office (JCTO). This study was supported in part by the William and Judy Higgins Trust and the Johns Family Foundation of the Cancer Research and Treatment Fund, Inc., New York, New York, United States.

Abstract

Background

Assessing treatment response in patients with myeloproliferative neoplasms is difficult because the relevant data components reside in unstructured bone marrow pathology (hematopathology) reports, which require specialized manual annotation and interpretation. Although natural language processing (NLP) has been successfully implemented for the extraction of features from solid tumor reports, little is known about its application to hematopathology.

Methods

An open-source NLP framework called Leo was implemented to parse document segments and extract the concept phrases used to assess response in myeloproliferative neoplasms. A reference standard was generated through the manual review of hematopathology notes.
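To make the idea of segment parsing followed by concept-phrase extraction concrete, the following is a minimal sketch in Python. It does not use Leo or reproduce the authors' pipeline: the section headers, the regular expressions, the feature names, and the sample report text are all hypothetical assumptions chosen for illustration.

```python
import re

# Hypothetical section headers; real reports and the study's Leo pipeline may differ.
SECTION_PATTERN = re.compile(r"^(ASPIRATE|BIOPSY|FLOW CYTOMETRY):\s*", re.MULTILINE)

# Hypothetical concept patterns: a blast percentage and a fibrosis grade (e.g. "MF-2").
BLAST_PATTERN = re.compile(
    r"(?:myelo)?blasts?[^.\d]*?(\d+(?:\.\d+)?)\s*%", re.IGNORECASE
)
FIBROSIS_PATTERN = re.compile(
    r"(?:reticulin\s+)?fibrosis[^.]*?\b(?:MF-?|grade\s*)(\d)", re.IGNORECASE
)


def split_sections(report):
    """Split a report into {section name: section text} using the header pattern."""
    sections = {}
    matches = list(SECTION_PATTERN.finditer(report))
    for i, match in enumerate(matches):
        # A section runs from the end of its header to the start of the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        name = match.group(1).lower().replace(" ", "_")
        sections[name] = report[match.end():end].strip()
    return sections


def extract_features(report):
    """Extract illustrative features, keyed by the section they were found in."""
    sections = split_sections(report)
    features = {}
    for name, text in sections.items():
        blast = BLAST_PATTERN.search(text)
        if blast:
            features[f"{name}_myeloblasts_pct"] = float(blast.group(1))
    fibrosis = FIBROSIS_PATTERN.search(sections.get("biopsy", ""))
    if fibrosis:
        features["biopsy_reticulin_fibrosis_grade"] = int(fibrosis.group(1))
    return features


# Invented sample text, not taken from any real report.
report = """ASPIRATE: Trilineage hematopoiesis. Myeloblasts 2% of nucleated cells.
BIOPSY: Hypercellular marrow. Reticulin fibrosis is moderate, MF-2.
"""

features = extract_features(report)
print(features)
```

Scoping each pattern to its section is what allows the same concept (myeloblasts) to be reported separately for the aspirate, the biopsy, and flow cytometry; a section that never mentions the concept simply yields no value.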

Results

Compared with a reference standard (n = 300 reports), our NLP method extracted features such as aspirate myeloblasts (F1 = 98%) and biopsy reticulin fibrosis (F1 = 93%) with high accuracy. However, extraction of other values, such as myeloblasts from the biopsy (F1 = 6%) and via flow cytometry (F1 = 8%), was limited by sparse documentation, which reflects reporting conventions. The four features with the highest clinical importance were extracted with F1 scores exceeding 90%. Whereas manual annotation of 300 reports required 30 hours of staff effort, automated NLP required 3.5 hours of runtime for 34,301 reports.
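For readers less familiar with the metric, the sketch below shows how an F1 score is computed from true positives, false positives, and false negatives, and works through the per-report throughput implied by the effort figures above. The error counts passed to the function are invented for illustration and are not the study's confusion matrices; only the hours and report counts come from the text.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from counts, using 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: a well-documented feature versus a sparsely documented one.
dense = precision_recall_f1(tp=90, fp=10, fn=10)
sparse = precision_recall_f1(tp=1, fp=10, fn=20)

# Throughput from the reported figures: 30 hours for 300 reports (manual)
# and 3.5 hours for 34,301 reports (automated).
manual_seconds_per_report = 30 * 3600 / 300
nlp_seconds_per_report = 3.5 * 3600 / 34301
speedup = manual_seconds_per_report / nlp_seconds_per_report

print(f"dense F1:  {dense[2]:.3f}")
print(f"sparse F1: {sparse[2]:.3f}")
print(f"manual: {manual_seconds_per_report:.0f} s/report")
print(f"NLP:    {nlp_seconds_per_report:.2f} s/report")
print(f"per-report speedup: about {speedup:.0f}x")
```

Because F1 ignores true negatives, a feature that is rarely documented has few true positives to offset even a small number of errors, which is one way a sparsely reported value can score far lower than a routinely reported one.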

Conclusion

To the best of our knowledge, this is among the first studies to demonstrate the application of NLP to hematopathology for clinical feature extraction. The approach may inform efforts at other institutions, and the code is available at https://github.com/wcmc-research-informatics/BmrExtractor.

* These authors contributed equally to the work.



Publication History

Received: 23 September 2024

Accepted: 09 April 2025

Accepted Manuscript online: 17 April 2025

Article published online: 09 May 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany