Clinics in Colon and Rectal Surgery 2019; 32(01): 005-006
DOI: 10.1055/s-0038-1673347
Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

Introduction to Big Data in Colorectal Surgery

David A. Etzioni
1  Mayo Clinic College of Medicine and Science, College in Rochester, Minnesota

Publication History

Publication Date:
08 January 2019 (online)

With advances in health care technology, clinical phenomena are increasingly translated into a digital format and represented for the whole world to see. These advances have occurred incrementally over the last two decades, but the last few years have seen an inflection point in the rate of change.

Evidence of these unprecedented changes can be seen in several areas. As a result of the Affordable Care Act (ACA), a substantial volume of Medicare data are now freely available to parties in the public and private sectors.[1] The use of Medicare data is not new in itself: these data have been a source of valuable research and policy insights for quite some time. What is new is the inclination of the Centers for Medicare and Medicaid Services (CMS) to distribute data that include provider designation freely and at minimal cost.

This policy shift has spawned a cottage industry of health care analysts interested in using these data for a broad range of purposes. For some time, the CMS has used its own mechanisms to report a broad range of facility-level surgical and nonsurgical outcomes.[2] In April 2014, a newspaper-based site began reporting annual physician-level billing to Medicare.[3] This was one of the first large-scale private sector reports based on these data, and certainly not the last. In 2015, two other private sector sites, both considered consumer advocates, followed with reports of surgeon-specific outcomes.[4] [5]

These initial forays into the public representation of health care delivery data can reasonably be considered the first generation, with more on the horizon. Each of these initial efforts is based on Medicare data, and for good reason. The sheer number of patients, the relatively low cost of data acquisition, and the fact that the data are population based make Medicare data attractive to any group interested in analyzing and reporting the quality of surgical care in the United States.

How well suited are Medicare data for this purpose? Medicare data, like other sources and warehouses of data, are generally termed “administrative data,” as they are generated and collected primarily for the purposes of billing. The ability of these data to determine surgical outcomes in a way that is accurate, consistent, and unbiased has been questioned, especially by proponents of registry data. Registry data (e.g., the National Surgical Quality Improvement Program [NSQIP]) are collected by trained abstractors, using detailed clinical criteria. When compared with administrative data, registry data give very different results.[6] A recent study has elucidated some of the reasons for these differences, finding that the biggest source of discordance between administrative data and registry data (NSQIP) is the difference in the criteria used to determine whether or not a complication occurred.[7]
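
One way to quantify how far two data sources diverge is to compare their complication flags patient by patient. The following is a minimal sketch, not the method of the cited studies: the function name `agreement_stats` and all counts are hypothetical and chosen only for illustration. It computes raw agreement and Cohen's kappa (agreement corrected for chance) from a two-by-two table of administrative versus registry flags.

```python
def agreement_stats(both_yes, admin_only, registry_only, both_no):
    """Return (raw agreement, Cohen's kappa) for two binary complication flags.

    Arguments are patient counts from a 2x2 table:
      both_yes      - complication recorded in both sources
      admin_only    - recorded only in administrative data
      registry_only - recorded only in registry data
      both_no       - recorded in neither source
    """
    n = both_yes + admin_only + registry_only + both_no
    if n == 0:
        raise ValueError("no patients supplied")

    # Proportion of patients on whom the two sources agree.
    observed = (both_yes + both_no) / n

    # Agreement expected by chance, given each source's complication rate.
    admin_yes = both_yes + admin_only
    registry_yes = both_yes + registry_only
    chance = (admin_yes * registry_yes
              + (n - admin_yes) * (n - registry_yes)) / n**2
    if chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - chance) / (1 - chance)
    return observed, kappa


# Hypothetical counts for 100 patients (invented, not from any study).
raw_agreement, kappa = agreement_stats(
    both_yes=6, admin_only=4, registry_only=10, both_no=80
)
print(f"raw agreement = {raw_agreement:.2f}, kappa = {kappa:.2f}")
```

In this toy example, raw agreement looks high largely because most patients have no complication in either source; kappa, which discounts agreement expected by chance, is considerably lower.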

A tension is emerging between administrative data and registry data. Administrative data are inexpensive, but are widely considered to have significant problems with accuracy when compared with registry data. This tradeoff between cost and data quality is central to any discussion regarding quality monitoring and quality improvement in health care. Fundamentally, however, no data source is perfect. A question that is central to any party seeking to use “big data” to drive an agenda is: “How good does a particular data source have to be in order to be useful?” The answer to this question is well beyond the scope of this brief preface. Any party that interprets reports based on big data should be appropriately skeptical regarding the quality of any data source.

Data quality, however, is not the only problem that consumers of big data must overcome. Even high-quality data can be analyzed in a way that leads to problems with bias and unintended consequences. Probably the biggest area of potential problems with data analysis is risk adjustment. Any approach to risk adjustment is necessarily incomplete. There is no practical way to accurately characterize the risk associated with every known patient comorbidity (or combination of comorbidities). As a result, cohorts of patients with risk factors outside of standard risk-adjustment schemes (e.g., low socioeconomic status) may experience outcomes that are worse than the risk model predicts.
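
The effect of an unmodeled risk factor can be illustrated with the observed-to-expected (O/E) ratio that is commonly used in risk-adjusted reporting. The sketch below is a simplified illustration under stated assumptions: the function name `observed_to_expected`, the cohorts, and every number are hypothetical. Both cohorts receive the same modeled risk, but the second carries an additional risk factor that the model does not capture, so its O/E ratio rises even though nothing about the care delivered is assumed to differ.

```python
import math


def observed_to_expected(predicted_risks, outcomes):
    """Return observed complications divided by the model's expected count."""
    if len(predicted_risks) != len(outcomes):
        raise ValueError("need one predicted risk per outcome")
    # Expected count is the sum of each patient's modeled probability.
    expected = math.fsum(predicted_risks)
    if expected == 0:
        raise ValueError("expected complication count is zero")
    observed = sum(1 for had_complication in outcomes if had_complication)
    return observed / expected


# Hypothetical data: the risk model assigns every patient a 10% risk.
modeled_risk = [0.10] * 100

# Cohort A: the model captures these patients' risk well (10 complications).
cohort_a_outcomes = [True] * 10 + [False] * 90

# Cohort B: an unmodeled factor raises the true risk (15 complications),
# but the model still predicts 10% for each patient.
cohort_b_outcomes = [True] * 15 + [False] * 85

oe_a = observed_to_expected(modeled_risk, cohort_a_outcomes)
oe_b = observed_to_expected(modeled_risk, cohort_b_outcomes)
print(f"O/E cohort A = {oe_a:.2f}, O/E cohort B = {oe_b:.2f}")
```

An O/E ratio above 1 is usually read as worse-than-expected performance; here the excess in cohort B reflects the incomplete risk model rather than any assumed difference in care.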

For health care providers, however, the biggest challenge with the use of big data is translating these data into action. As the saying goes, “Weighing a pig does not make the pig fatter.”[8] In order for quality to improve, these data need to drive a quality improvement process. Something that care providers do has to actually change.

In this issue of Clinics in Colon and Rectal Surgery, experts in the use of big data contribute their insights regarding a broad range of data sources that are currently used to drive health care research and policy within the United States. These insights are important for clinicians, researchers, and policy analysts in interpreting reports that are based on each of these types of data.