Interactive Cohort Identification of Sleep Disorder Patients Using Natural Language Processing and i2b2

W. Chen; R. Kowatch; S. Lin; M. Splaingard; Y. Huang

doi:10.4338/ACI-2014-11-RA-0106

Appl Clin Inform 2015; 06(02): 345-363

Research Article

Schattauer GmbH

Author Affiliations

¹Research Information Solutions and Innovations (W. Chen, S. Lin, Y. Huang)
²Center for Innovation in Pediatric Practice (R. Kowatch)
³Sleep Disorder Center, Nationwide Children’s Hospital, Columbus, OH (M. Splaingard)

Publication History

received: 25 November 2014

accepted: 23 February 2015

Publication date: 19 December 2017 (online)

Summary

Nationwide Children’s Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were gleaned from semi-structured sleep study reports. The system proved more efficient than traditional manual chart review, and it also enabled searches that were previously not possible.

Objective: We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents.

Methods: We developed a natural language processing (NLP) approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structure and domain knowledge.
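The regular expression parser for numeric concepts can be illustrated with a minimal Python sketch. It assumes a hypothetical report layout in which each numeric result sits on its own line as `Label: value unit`; the sample lines and values are invented for illustration and are not taken from the study's reports. The `\Sleep Study\<section>\<label>\` concept path, the `NumericFact` record, and the `extract_numeric_facts` helper are likewise hypothetical, not the authors' actual ontology or code. The tree parser for textual concepts is not shown.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern for one "Label: value unit" line of a semi-structured report.
# The value must be a number directly after the colon; the unit is optional.
NUMERIC_LINE = re.compile(
    r"^[ \t]*(?P<label>[A-Za-z][A-Za-z0-9 ()/%-]*?)[ \t]*:[ \t]*"
    r"(?P<value>-?\d+(?:\.\d+)?)[ \t]*(?P<unit>[^\s\d][^\n]*)?$",
    re.MULTILINE,
)

# Invented sample text for illustration only (not study data).
SAMPLE_REPORT = """\
Total Sleep Time: 412.5 min
Sleep Efficiency: 87.3 %
AHI: 4.2 /hr
SpO2 Nadir: 91 %
Arousal Index: 12
Impression: Mild obstructive sleep apnea
"""


@dataclass
class NumericFact:
    concept_path: str    # i2b2-style backslash-delimited path (hypothetical layout)
    value: float
    unit: Optional[str]  # None when the line carries no unit


def extract_numeric_facts(report: str, section: str) -> List[NumericFact]:
    """Return one fact per line whose value after the colon is a number."""
    facts = []
    for match in NUMERIC_LINE.finditer(report):
        label = match.group("label").strip()
        unit = match.group("unit")
        facts.append(
            NumericFact(
                concept_path=f"\\Sleep Study\\{section}\\{label}\\",
                value=float(match.group("value")),
                unit=unit.strip() if unit else None,
            )
        )
    return facts


if __name__ == "__main__":
    for fact in extract_numeric_facts(SAMPLE_REPORT, section="Results"):
        print(fact)
```

Because the pattern only accepts a number immediately after the colon, free-text lines such as the impression are skipped and would be left to the textual-concept parser. Real reports would need patterns tuned to their actual templates.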

Results: A total of 26,550 concepts were extracted, 99% of which were textual concepts. In all, 1.01 million facts were extracted from sleep study documents, covering demographic information, sleep study lab results, medications, procedures, and diagnoses, among other categories. The average accuracy of terminology parsing, compared against parsing by experts, was over 83%. The system is capable of capturing both standard and non-standard terminologies. The time required for cohort identification was reduced significantly, from a few weeks to a few seconds.

Conclusion: Natural language processing proved powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts. Combined with intuitive, domain-specific ontologies, these concepts allow fast and effective interactive cohort identification through the i2b2 platform for research and clinical use.
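The query semantics behind interactive cohort identification can be sketched in a few lines. This is a toy in-memory model, not i2b2's database implementation. It assumes each patient's facts are stored as a set of backslash-delimited concept paths (the paths and patient IDs below are hypothetical), treats items within one query panel as alternatives (OR) and separate panels as required conditions (AND), as in the i2b2 query tool, and lets a parent path match all of its descendants, mirroring how selecting a folder in an i2b2 ontology includes its child concepts. The `matches_panel` and `find_cohort` helpers are illustrative names.

```python
from typing import Dict, List, Set

# Hypothetical toy data: patient id -> concept paths observed for that patient.
SAMPLE_FACTS: Dict[str, Set[str]] = {
    "p1": {"\\Diagnoses\\Obstructive Sleep Apnea\\", "\\Medications\\Melatonin\\"},
    "p2": {"\\Diagnoses\\Insomnia\\", "\\Medications\\Melatonin\\"},
    "p3": {"\\Diagnoses\\Obstructive Sleep Apnea\\"},
}


def matches_panel(concepts: Set[str], panel: List[str]) -> bool:
    """Items within one panel are OR-ed; a parent path also matches its children."""
    return any(concept.startswith(item) for item in panel for concept in concepts)


def find_cohort(facts: Dict[str, Set[str]], panels: List[List[str]]) -> Set[str]:
    """Panels are AND-ed: a patient must satisfy every panel."""
    return {
        patient_id
        for patient_id, concepts in facts.items()
        if all(matches_panel(concepts, panel) for panel in panels)
    }


if __name__ == "__main__":
    osa = "\\Diagnoses\\Obstructive Sleep Apnea\\"
    insomnia = "\\Diagnoses\\Insomnia\\"
    melatonin = "\\Medications\\Melatonin\\"
    # (OSA OR insomnia) AND melatonin
    print(sorted(find_cohort(SAMPLE_FACTS, [[osa, insomnia], [melatonin]])))  # → ['p1', 'p2']
    # Selecting the parent folder \Diagnoses\ matches every diagnosis beneath it
    print(sorted(find_cohort(SAMPLE_FACTS, [["\\Diagnoses\\"]])))  # → ['p1', 'p2', 'p3']
```

Ending every path with a backslash keeps prefix matching safe: `\Diagnoses\Insomnia\` cannot accidentally match a sibling concept whose name merely begins with "Insomnia".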

Citation: Chen W, Kowatch R, Lin S, Splaingard M, Huang Y. Interactive cohort identification of sleep disorder patients using natural language processing and i2b2. Appl Clin Inf 2015; 6: 345–363

http://dx.doi.org/10.4338/ACI-2014-11-RA-0106

Keywords

Sleep disorder - cohort identification - natural language processing (NLP) - i2b2 - clinical ontology

