Automating case definitions using literature-based reasoning

T. Botsis; R. Ball

doi:10.4338/ACI-2013-04-RA-0028

Applied Clinical Informatics

Appl Clin Inform 2013; 04(04): 515-527

Research Article

Schattauer GmbH


Authors

T. Botsis¹,²
R. Ball¹

¹Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research (CBER), Food and Drug Administration (FDA), Rockville, MD
²Department of Computer Science, University of Tromsø, Tromsø, Norway

Abstract

Background: Establishing a Case Definition (CDef) is a first step in many epidemiological, clinical, surveillance, and research activities. Applying CDefs still relies on manual steps, which is a major source of inefficiency in surveillance and research.

Objective: To describe the need for, and propose an approach to, automating the useful representation of CDefs for medical conditions.

Methods: We translated the existing Brighton Collaboration CDef for anaphylaxis, relying mainly on the NLM MetaMap tool to identify synonyms for the criteria of the CDef. We also generated a CDef for the same condition by processing all related PubMed abstracts with a text mining tool and treating the synonyms with the same strategy. The co-occurrence of anaphylaxis with any other medical term within the same sentence of the abstracts supported the construction of a large semantic network. The ‘islands’ algorithm reduced the network and revealed its densest region, which included the nodes used to represent the key criteria of the CDef. We evaluated the ability of the “translated” and the “generated” CDef to classify a set of 6034 H1N1 reports for anaphylaxis using two similarity approaches, and compared them with our previous semi-automated classification approach.
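
The sentence-level co-occurrence step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `cooccurrence_edges`, the toy abstracts, the small vocabulary, the naive sentence splitter, and the plain substring matching are all assumptions made for illustration, standing in for the text mining tool and MetaMap-based synonym handling mentioned above. The network reduction with the ‘islands’ algorithm is not reproduced here.

```python
import re
from collections import Counter


def cooccurrence_edges(abstracts, target, vocabulary):
    """Count how often `target` shares a sentence with each vocabulary term.

    Returns a Counter keyed by (target, term) pairs; the counts can serve
    as edge weights in a term network centered on `target`.
    """
    edges = Counter()
    for text in abstracts:
        # Naive sentence split after ., ! or ? (an assumption, not a real tokenizer).
        for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
            if target not in sentence:
                continue
            for term in vocabulary:
                # Crude substring match; a real system would map terms to concepts.
                if term != target and term in sentence:
                    edges[(target, term)] += 1
    return edges


# Toy input, invented for illustration only.
abstracts = [
    "Anaphylaxis presented with urticaria and hypotension. Recovery was complete.",
    "Hypotension is a hallmark of anaphylaxis. Urticaria was absent.",
]
vocabulary = ["anaphylaxis", "urticaria", "hypotension", "wheezing"]
edges = cooccurrence_edges(abstracts, "anaphylaxis", vocabulary)
```

In this toy input, "urticaria" in the second abstract does not add an edge because it appears in a different sentence from "anaphylaxis", which is the point of restricting co-occurrence to the sentence level.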

Results: Overall classification performance across approaches to producing CDefs was similar, with the generated CDef and vector space model with cosine similarity having the highest accuracy (0.825±0.003) and the semi-automated approach and vector space model with cosine similarity having the highest recall (0.809±0.042). Precision was low for all approaches.
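
For readers unfamiliar with the vector space model named in the Results, the following Python sketch shows how cosine similarity between a CDef and a report could be computed. It is a sketch under stated assumptions, not the authors' implementation: the term lists, the raw term-count weighting, the name `cosine_similarity`, and the `THRESHOLD` value are all hypothetical, and the abstract does not state how terms were weighted or which cut-off was used.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical term vectors: one for a case definition, one for a report.
cdef_terms = Counter(["urticaria", "hypotension", "wheezing"])
report_terms = Counter(["urticaria", "hypotension", "fever"])

score = cosine_similarity(cdef_terms, report_terms)

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the paper
is_possible_case = score >= THRESHOLD
```

Two of the three terms overlap in this example, so the score is 2/3 and the report would be flagged under the assumed threshold. Raising or lowering the threshold trades precision against recall.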

Conclusion: The useful representation of CDefs is a complicated task but potentially offers substantial gains in efficiency to support safety and clinical surveillance.

Citation: Botsis T, Ball R. Automating case definitions using literature-based reasoning. Appl Clin Inf 2013; 4: 515–527

http://dx.doi.org/10.4338/ACI-2013-04-RA-0028

Keywords

Case definition - safety surveillance - semantic networks - literature-based reasoning - anaphylaxis - similarity

