Identification of Long Bone Fractures in Radiology Reports Using Natural Language Processing to Support Healthcare Quality Improvement
Funding This project was supported by the Agency for Healthcare Research and Quality (AHRQ) grant R01HS020270. The PECARN infrastructure was supported by the Health Resources and Services Administration (HRSA), the Maternal and Child Health Bureau (MCHB), and the Emergency Medical Services for Children (EMSC) Network Development Demonstration Program under cooperative agreements U03MC00008, U03MC00001, U03MC00003, U03MC00006, U03MC00007, U03MC22684, and U03MC22685. This information or content and conclusions are those of the author and should not be construed as the official position or policy of, nor should any endorsements be inferred by HRSA, HHS or the U.S. Government.
received: 01 August 2016
accepted: 26 September 2016
published online: 18 December 2017
Background Important information to support healthcare quality improvement is often recorded in free text documents such as radiology reports. Natural language processing (NLP) methods may help extract this information, but these methods have rarely been applied outside the research laboratories where they were developed.
Objective To implement and validate NLP tools to identify long bone fractures for pediatric emergency medicine quality improvement.
Methods Using freely available statistical software packages, we implemented NLP methods to identify long bone fractures from radiology reports. A sample of 1,000 radiology reports was used to construct three candidate classification models. A test set of 500 reports was used to validate model performance. Blinded manual review of radiology reports by two independent physicians provided the reference standard. Each radiology report was segmented, and word-stem and bigram features were constructed. Common English “stop words” and rare features were excluded. We used 10-fold cross-validation to select optimal configuration parameters for each model. Accuracy, recall, precision, and the F1 score were calculated. The final model was compared to the use of diagnosis codes for the identification of patients with long bone fractures.
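The feature construction described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the `crude_stem` function (a toy stand-in for a real stemmer), the example report texts, the choice to form bigrams after stop-word removal, and the `min_docs` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"the", "of", "is", "a", "an", "and", "there", "in", "with", "are"}

# Made-up example reports, not data from the study.
REPORTS = [
    "There is an acute fracture of the distal femur.",
    "Acute fractures of the distal femur are seen.",
    "No fracture is seen.",
]


def crude_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def extract_features(text):
    """Return word-stem features followed by bigrams of adjacent stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    bigrams = [f"{a} {b}" for a, b in zip(stems, stems[1:])]
    return stems + bigrams


def build_vocabulary(docs, min_docs=2):
    """Keep features that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(extract_features(doc)))
    return sorted(f for f, n in doc_freq.items() if n >= min_docs)


def vectorize(text, vocab):
    """Map a report to a vector of feature counts over the vocabulary."""
    counts = Counter(extract_features(text))
    return [counts[f] for f in vocab]


vocab = build_vocabulary(REPORTS, min_docs=2)
matrix = [vectorize(doc, vocab) for doc in REPORTS]
print(vocab)
for row in matrix:
    print(row)
```

Each row of `matrix` is one report expressed as counts over the retained features, which is the kind of input a classifier such as a support vector machine would be trained on. Dropping features seen in fewer than `min_docs` reports is one simple way to exclude rare features; the threshold used in the study is not stated here.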
Results There were 329 unique word stems and 344 bigrams in the training documents. A support vector machine classifier with Gaussian kernel performed best on the test set with accuracy=0.958, recall=0.969, precision=0.940, and F1 score=0.954. Optimal parameters for this model were cost=4 and gamma=0.005. The three classification models that we tested all performed better than diagnosis codes in terms of accuracy, precision, and F1 score (diagnosis code accuracy=0.932, recall=0.960, precision=0.896, and F1 score=0.927).
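As a consistency check on the reported figures, the sketch below uses the standard definitions of accuracy, recall, precision, and F1 (the harmonic mean of precision and recall). The function names are illustrative, and the confusion counts in the final example are made up; only the precision and recall values are taken from the Results above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, precision, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)        # share of true fractures that were found
    precision = tp / (tp + fp)     # share of flagged reports that were correct
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall for the support vector machine classifier.
svm_f1 = round(f1_from(precision=0.940, recall=0.969), 3)

# Reported precision and recall for diagnosis codes.
codes_f1 = round(f1_from(precision=0.896, recall=0.960), 3)

print(svm_f1, codes_f1)

# Hypothetical confusion counts, for illustration only (not study data).
toy = classification_metrics(tp=8, fp=2, fn=1, tn=9)
print(toy)
```

Both recomputed values match the reported F1 scores of 0.954 and 0.927, so the published precision, recall, and F1 figures are internally consistent.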
Conclusions NLP methods using a corpus of 1,000 training documents accurately identified acute long bone fractures from radiology reports. Strategic use of straightforward NLP methods, implemented with freely available software, offers quality improvement teams new opportunities to extract information from narrative documents.
Citation: Grundmeier RW, Masino AJ, Casper TC, Dean JM, Bell J, Enriquez R, Deakyne S, Chamberlain JM, Alpern ER. Identification of long bone fractures in radiology reports using natural language processing to support healthcare quality improvement.