Leveraging Electronic Dental Record Data to Classify Patients Based on Their Smoking Intensity

Funding This study was funded by Dr. Thankam Paul Thyvalikakath’s start-up funds from Indiana University School of Dentistry.
08 May 2018
11 October 2018
15 March 2019 (online)
Background Smoking is an established risk factor for oral diseases and, therefore, dental clinicians routinely assess and record their patients' detailed smoking status. Researchers have successfully extracted smoking history from electronic health records (EHRs) using text mining methods. However, they could not retrieve patients' smoking intensity because it is seldom recorded in the EHR. In the electronic dental record (EDR), detailed smoking information is often stored in a separate section, which allows it to be retrieved with less preprocessing.
Objective To determine patients' detailed smoking status based on smoking intensity from the EDR.
Methods First, the authors created a reference standard of 3,296 unique patients’ smoking histories from the EDR that classified patients based on their smoking intensity. Next, they trained three machine learning classifiers (support vector machine, random forest, and naïve Bayes) on the training set (2,176 patients) and evaluated their performance on the test set (1,120 patients) using precision (P), recall (R), and F-measure (F). Finally, they applied the best classifier to classify smoking status from an additional 3,114 patients’ smoking histories.
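The three evaluation measures named in the Methods can be illustrated with a minimal sketch. It assumes each patient has exactly one reference label and one predicted label, and it assumes F is the balanced harmonic mean of P and R (the abstract does not state the weighting). The function name `per_class_scores` and the toy labels are hypothetical; this is not the authors' code or data.

```python
from collections import Counter


def per_class_scores(reference, predicted):
    """Return {label: (precision, recall, f_measure)} for two equal-length label lists."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    ref_count = Counter(reference)    # how often each label occurs in the reference standard
    pred_count = Counter(predicted)   # how often the classifier assigned each label
    true_pos = Counter(r for r, p in zip(reference, predicted) if r == p)

    scores = {}
    for label in sorted(set(reference) | set(predicted)):
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / ref_count[label] if ref_count[label] else 0.0
        total = precision + recall
        # Assumption: balanced F-measure (harmonic mean of precision and recall).
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


# Hypothetical toy labels, not data from the study.
reference = ["heavy", "heavy", "light", "light", "past", "nonsmoker"]
predicted = ["heavy", "light", "light", "light", "past", "nonsmoker"]
scores = per_class_scores(reference, predicted)
```

In this toy example one of the two heavy smokers is labeled as a light smoker, so the heavy class keeps perfect precision but loses recall, the same pattern of high precision with low recall that a classifier can show on a rare class.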
Results The support vector machine performed best. It classified patients into smokers, nonsmokers, and unknowns (P, R, F: 98%); intermittent smokers (P: 95%, R: 98%, F: 96%); past smokers (P, R, F: 89%); light smokers (P, R, F: 87%); smokers with unknown intensity (P: 76%, R: 86%, F: 81%); and intermediate smokers (P: 90%, R: 88%, F: 89%). Its performance was moderate for heavy smokers, where precision was high but recall was low (P: 90%, R: 44%, F: 60%). The EDR could be a valuable source for obtaining patients’ detailed smoking information.
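As a quick arithmetic check of how low recall pulls down the F-measure, the heavy-smoker figures can be worked through by hand. This assumes F is the balanced harmonic mean of P and R, which the abstract does not state explicitly:

\[
F = \frac{2PR}{P + R} = \frac{2(0.90)(0.44)}{0.90 + 0.44} = \frac{0.792}{1.34} \approx 0.59
\]

This is close to the reported 60%. The small gap may reflect rounding of P and R before publication.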
Conclusion EDR data could serve as a valuable source for obtaining patients' detailed smoking information based on their smoking intensity that may not be readily available in the EHR.
Keywords electronic dental record - smoking intensity - information extraction - electronic health record - dental informatics - machine learning classifiers
Human Subjects Protections
This study was reviewed and approved by our institutional review board (IRB: 1310579094) and granted authorization.