Appl Clin Inform 2017; 08(02): 430-446
DOI: 10.4338/ACI-2016-05-RA-0078
Research Article
Schattauer GmbH

Combining Contrast Mining with Logistic Regression To Predict Healthcare Utilization in a Managed Care Population

Lincoln Sheets (1, 2), Gregory F. Petroski (2), Yan Zhuang (3), Michael A. Phinney (3), Bin Ge (2), Jerry C. Parker (2), Chi-Ren Shyu (1)

1  University of Missouri, MU Informatics Institute, Columbia, Missouri, USA
2  University of Missouri, School of Medicine, Columbia, Missouri, USA
3  University of Missouri, College of Engineering, Columbia, Missouri, USA
Funding: For LS and JCP, this publication was made possible by Grant Number 1C1CMS331001–01–00 from the Department of Health and Human Services, Centers for Medicare & Medicaid Services. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the U.S. Department of Health and Human Services or any of its agencies. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, and writing and publishing the report. MAP is supported by the US Department of Education Graduate Assistance in Areas of National Need (GAANN) Fellowship under grant number P200A100053. YZ and CRS are supported by the Shumaker Endowment for biomedical informatics. The high-performance computing infrastructure used in this research is supported by the National Science Foundation under grant number CNS-1429294.
Publication History

received: 26 May 2016

accepted: 21 February 2017

Publication Date:
21 December 2017 (online)

Summary

Background: Because 5% of patients incur 50% of healthcare expenses, population health managers need to be able to focus preventive and longitudinal care on those patients who are at highest risk of increased utilization. Predictive analytics can be used to identify these patients and to better manage their care. Data mining permits the development of models that surpass the size restrictions of traditional statistical methods and take advantage of the rich data available in the electronic health record (EHR), without limiting predictions to specific chronic conditions.

Objective: The objective was to demonstrate the usefulness of unrestricted EHR data for predictive analytics in managed healthcare.

Methods: In a population of 9,568 Medicare and Medicaid beneficiaries, patients in the highest 5% of charges were compared to equal numbers of patients with the lowest charges. Contrast mining was used to discover the combinations of clinical attributes frequently associated with high utilization and infrequently associated with low utilization. The attributes found in these combinations were then tested by multiple logistic regression, and the discrimination of the model was evaluated by the c-statistic.
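The contrast-mining step described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it enumerates attribute combinations by brute force on toy data, the attribute names (`chf`, `ckd`, `insulin`, `statin`) are hypothetical, and the `max_low` cutoff and `max_size` limit are assumptions chosen only for illustration. The 20% minimum support in the high-utilization group mirrors the threshold reported in Results.

```python
from itertools import combinations


def support(itemset, patients):
    """Fraction of patients whose attribute set contains every item in itemset."""
    if not patients:
        return 0.0
    return sum(1 for p in patients if itemset <= p) / len(patients)


def contrast_itemsets(high, low, min_high=0.20, max_low=0.05, max_size=2):
    """Find attribute combinations frequent in `high` but rare in `low`.

    min_high follows the 20% support threshold; max_low and max_size are
    assumptions made for this sketch.
    Returns {itemset: (support_in_high, support_in_low)}.
    """
    attributes = sorted(set().union(*high))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(attributes, size):
            items = frozenset(combo)
            s_high = support(items, high)
            s_low = support(items, low)
            if s_high > min_high and s_low < max_low:
                found[items] = (s_high, s_low)
    return found


# Toy data: each patient is a set of (hypothetical) EHR attributes.
high = [
    {"chf", "ckd", "insulin"},
    {"chf", "ckd"},
    {"chf", "insulin"},
    {"ckd", "statin"},
]
low = [
    {"statin"},
    {"statin", "insulin"},
    {"chf"},
    {"statin"},
]

patterns = contrast_itemsets(high, low)

# Attributes appearing in any contrast pattern become candidate predictors
# for the second step (multiple logistic regression).
candidate_attributes = sorted(set().union(*patterns))
```

On real data, exhaustive enumeration over thousands of attributes would be infeasible, which is why dedicated mining algorithms with pruning are used; the sketch only shows what "frequent in one group, infrequent in the other" means operationally.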

Results: Of 19,014 potential EHR patient attributes, 67 were found in combinations frequently associated with high utilization but not with low utilization (support > 20%). Eleven of these attributes were significantly associated with high utilization (p < 0.05). A prediction model composed of these eleven attributes had a discrimination (c-statistic) of 84%.
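The discrimination figure is the c-statistic named in Methods: the probability that a randomly chosen high-utilization patient receives a higher predicted risk than a randomly chosen low-utilization patient, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the predicted risks are invented toy values, not study data, and the function name is an assumption for illustration.

```python
def c_statistic(scores_high, scores_low):
    """Concordance (c-statistic) between two groups of predicted risks.

    Counts the share of (high, low) pairs in which the high-utilization
    patient has the larger score; tied pairs contribute 0.5.
    """
    pairs = len(scores_high) * len(scores_low)
    if pairs == 0:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for h in scores_high:
        for l in scores_low:
            if h > l:
                concordant += 1.0
            elif h == l:
                concordant += 0.5
    return concordant / pairs


# Toy predicted risks (hypothetical values for illustration only).
scores_high = [0.9, 0.8, 0.6, 0.4]
scores_low = [0.7, 0.5, 0.3, 0.2]

c = c_statistic(scores_high, scores_low)  # 13 of 16 pairs are concordant
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.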

Conclusions: EHR mining reduced an unusably large number of patient attributes to a manageable set of potential healthcare utilization predictors, without conjecturing about which attributes would be useful. Treating these results as hypotheses to be tested by conventional methods yielded a highly accurate predictive model. This novel, two-step methodology can help population health managers focus preventive and longitudinal care on those patients who are at highest risk for increased utilization.

Citation: Sheets L, Petroski GF, Zhuang Y, Phinney MA, Ge B, Parker JC, Shyu C-R. Combining contrast mining with logistic regression to predict healthcare utilization in a managed care population. Appl Clin Inform 2017; 8: 430–446. https://doi.org/10.4338/ACI-2016-05-RA-0078

Clinical Relevance Statement

Accurate prediction of the 5% of patients who incur 50% of healthcare expenses is needed so that population health managers can focus preventive and longitudinal care effectively. Combining contrast mining, which permits the use of the rich data available in the EHR, with testing by traditional statistical methods produced a flexible and highly accurate predictive model that can support population health management.


Protection of Human and Animal Subjects

This project was funded by the Centers for Medicare & Medicaid Services (CMS) to expand the scope of services to a population of CMS beneficiaries. Because the explicit purpose of data use was to improve patient care, the Health Sciences Institutional Review Board (IRB) deemed the project a quality improvement initiative that did not require a formal patient consent process. The IRB number is 2001677-QI.