The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance

Jeffrey P. Ferraro; Ye Ye; Per H. Gesteland; Peter J. Haug; Fuchiang Tsui; Gregory F. Cooper; Rudy Van Bree; Thomas Ginter; Andrew J. Nowalk; Michael Wagner

doi:10.4338/ACI-2016-12-RA-0211

Appl Clin Inform 2017; 08(02): 560-580
DOI: 10.4338/ACI-2016-12-RA-0211

Research Article

Schattauer GmbH

Authors

Jeffrey P. Ferraro

¹Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA

²Intermountain Healthcare, Salt Lake City, Utah, USA
Ye Ye

³Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

⁴Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Per H. Gesteland

¹Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA

⁵Department of Pediatrics, University of Utah, Salt Lake City, Utah, USA
Peter J. Haug

¹Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA

²Intermountain Healthcare, Salt Lake City, Utah, USA
Fuchiang Tsui

³Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

⁴Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Gregory F. Cooper

³Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Rudy Van Bree

²Intermountain Healthcare, Salt Lake City, Utah, USA
Thomas Ginter

⁶VA Salt Lake City Healthcare System, Salt Lake City, Utah
Andrew J. Nowalk

⁷Department of Pediatrics, Children's Hospital of Pittsburgh of University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Michael Wagner

³Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

⁴Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

Funding Research reported in this publication was supported by grant R01LM011370 from the National Library of Medicine.

Publication History

received: 31 December 2016

accepted: 11 March 2017

Publication Date:
21 December 2017 (online)

Summary

Objectives: This study evaluates the accuracy and portability of a natural language processing (NLP) tool for extracting clinical findings of influenza from clinical notes across two large healthcare systems. Effectiveness is evaluated on how well NLP supports downstream influenza case-detection for disease surveillance.

Methods: We independently developed two NLP parsers, one at Intermountain Healthcare (IH) in Utah and the other at the University of Pittsburgh Medical Center (UPMC), each using local clinical notes from emergency department (ED) encounters of influenza. We measured NLP parser performance for the presence and absence of 70 clinical findings indicative of influenza. We then developed Bayesian network models from NLP-processed reports and tested their ability to discriminate among cases of (1) influenza, (2) non-influenza influenza-like illness (NI-ILI), and (3) ‘other’ diagnosis.

Results: On Intermountain Healthcare reports, recall and precision of the IH NLP parser were 0.71 and 0.75, respectively, and those of the UPMC NLP parser were 0.67 and 0.79. On University of Pittsburgh Medical Center reports, recall and precision of the UPMC NLP parser were 0.73 and 0.80, respectively, and those of the IH NLP parser were 0.53 and 0.80. Bayesian case-detection performance, measured by AUROC for influenza versus non-influenza, was 0.93 (using IH NLP parser) and 0.93 (using UPMC NLP parser) on Intermountain Healthcare cases, and 0.95 (using UPMC NLP parser) and 0.83 (using IH NLP parser) on University of Pittsburgh Medical Center cases. For influenza versus NI-ILI, performance on Intermountain Healthcare cases was 0.70 (using IH NLP parser) and 0.76 (using UPMC NLP parser); on University of Pittsburgh Medical Center cases, it was 0.76 (using UPMC NLP parser) and 0.65 (using IH NLP parser).

Conclusion: In all but one instance (influenza versus NI-ILI using IH cases), local parsers were more effective at supporting case-detection, although the performance of non-local parsers was reasonable.
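
For readers less familiar with the metrics reported in the Results, the following is a minimal Python sketch of how recall, precision, and AUROC are conventionally computed. It is not the authors' evaluation code: the function names `precision_recall` and `auroc`, the counts, and the scores are hypothetical and chosen only for illustration. AUROC is computed here as the fraction of (positive, negative) case pairs in which the positive case receives the higher score, with ties counted as one half.

```python
from typing import Sequence, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auroc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison: the probability
    that a randomly chosen positive case scores higher than a randomly
    chosen negative case (ties count as 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical counts for one parser on one set of annotated findings.
    precision, recall = precision_recall(tp=8, fp=2, fn=2)
    print(f"precision={precision:.2f} recall={recall:.2f}")

    # Hypothetical classifier scores for influenza and non-influenza cases.
    influenza_scores = [0.9, 0.8, 0.4]
    other_scores = [0.7, 0.3, 0.2]
    print(f"AUROC={auroc(influenza_scores, other_scores):.3f}")
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; rank-based formulations give the same value more efficiently on large evaluation sets.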

Citation: Ferraro JP, Ye Y, Gesteland PH, Haug PJ, Tsui F(R), Cooper GF, Van Bree R, Ginter T, Nowalk AJ, Wagner M. The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance. Appl Clin Inform 2017; 8: 560–580 https://doi.org/10.4338/ACI-2016-12-RA-0211

Keywords

Natural language processing - case detection - disease surveillance - generalizability - portability

Human Subjects Protection

This study was conducted with Institutional Review Board (IRB) approval obtained from both healthcare systems governing protection of human and animal subjects.
