Interactive Cohort Identification of Sleep Disorder Patients Using Natural Language Processing and i2b2

W. Chen; R. Kowatch; S. Lin; M. Splaingard; Y. Huang

doi:10.4338/ACI-2014-11-RA-0106


Appl Clin Inform 2015; 06(02): 345-363
DOI: 10.4338/ACI-2014-11-RA-0106

Research Article

Schattauer GmbH


Authors

W. Chen
¹Research Information Solutions and Innovations

R. Kowatch
²Center for Innovation in Pediatric Practice

S. Lin
¹Research Information Solutions and Innovations

M. Splaingard
³Sleep Disorder Center, Nationwide Children’s Hospital, Columbus, OH

Y. Huang
¹Research Information Solutions and Innovations

Further Information

Publication History

Received: 25 November 2014

Accepted: 23 February 2015

Publication date: 19 December 2017 (online)


Summary

Nationwide Children’s Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were extracted from semi-structured sleep study reports. The system proved more efficient than traditional manual chart review, and it enabled searches that were previously not possible.

Objective: We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents.

Methods: We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structure and domain knowledge.
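As an illustration of the regular expression parser for numeric concepts, the following is a minimal Python sketch, not the authors' implementation. It assumes that a report states each numeric value on its own line in the form "label: number unit". The pattern, the function name `parse_numeric_concepts`, and the sample report text are all hypothetical.

```python
import re
from typing import Dict, List

# Assumed line layout (hypothetical): "<label>: <number> [unit]".
NUMERIC_LINE = re.compile(
    r"^\s*(?P<label>[A-Za-z][A-Za-z0-9 /()%-]*?)\s*[:=]\s*"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z%/]+)?\s*$"
)


def parse_numeric_concepts(report: str) -> List[Dict[str, object]]:
    """Return one record per line that matches the assumed numeric layout."""
    facts = []
    for line in report.splitlines():
        match = NUMERIC_LINE.match(line)
        if match is None:
            # Free-text lines are skipped here; in the paper they are
            # handled by a separate tree parser.
            continue
        facts.append(
            {
                "concept": match.group("label").strip().lower(),
                "value": float(match.group("value")),
                "unit": match.group("unit"),
            }
        )
    return facts


# Invented sample text, for illustration only.
SAMPLE_REPORT = """\
Total Sleep Time: 412.5 min
Sleep Efficiency: 87 %
Apnea-Hypopnea Index: 3.2 /hr
Impression: mild obstructive sleep apnea
"""

facts = parse_numeric_concepts(SAMPLE_REPORT)
for fact in facts:
    print(fact)
```

In this sketch the "Impression" line is skipped because its value is textual. Each resulting record pairs a concept name with a numeric value and unit, which is the kind of discrete data that could then be loaded as a fact under an ontology node.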

Results: A total of 26,550 concepts were extracted, 99% of them textual. From the sleep study documents, 1.01 million facts were extracted, including demographic information, sleep study lab results, medications, procedures, and diagnoses, among others. The average accuracy of terminology parsing was over 83% when compared against parsing by experts. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification was reduced from a few weeks to a few seconds.

Conclusion: Natural language processing was shown to be powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts, which, in combination with intuitive domain-specific ontologies, allow fast and effective interactive cohort identification through the i2b2 platform for research and clinical use.

Citation: Chen W, Kowatch R, Lin S, Splaingard M, Huang Y. Interactive cohort identification of sleep disorder patients using natural language processing and i2b2. Appl Clin Inf 2015; 6: 345–363

http://dx.doi.org/10.4338/ACI-2014-11-RA-0106

Keywords

Sleep disorder - cohort identification - natural language processing (NLP) - i2b2 - clinical ontology

