Summary
Background: Clinical coding systems have been developed to translate real-world healthcare information
such as prescriptions, diagnoses and procedures into standardized codes suitable
for use in large healthcare datasets. Because information on coding system characteristics
is scarce and coding practices are insufficiently uniform, there is a growing need
to better understand coding systems and their use in pharmacoepidemiology
and observational real-world data research.
Objectives: To determine: 1) the number of available coding systems and their characteristics,
2) the pharmacoepidemiology databases in which they are adopted, 3) the outcomes and
exposures that can be identified with each coding system, and 4) how robust they are with
respect to consistency and validity in pharmacoepidemiology and observational database
studies.
Methods: Searches of electronic literature databases and unpublished literature, together with hand
searches of relevant journals, were conducted to identify eligible English-language articles
published between 1986 and 2016 that discussed the characteristics and applications of
coding systems in use. Characteristics considered included the type of information
captured by codes, clinical setting(s) of use, adoption by a pharmacoepidemiology
database, region, and available mappings. Application articles describing the use
and validity of specific codes, code lists, or algorithms were also included. Two reviewers
extracted data independently, and the findings were combined in a narrative synthesis.
Results: A total of 897 unique articles and 57 coding systems were identified; 17% of the coding
systems included country-specific modifications or multiple versions. Procedures (55%), site of
disease (39%), drugs (38%), and diagnoses (36%) were the types of information most commonly
captured directly by these coding systems. The systems were used to capture information from
the following clinical settings: inpatient (63%), ambulatory (55%), emergency department (ED, 34%),
and pharmacy (13%). More than half of all coding systems were used in Europe (59%)
and in North America (57%). Thirty-four percent of the reviewed coding systems were used in
at least 1 of the 16 pharmacoepidemiology databases of interest, and 21% had studies evaluating
the validity and consistency of their use in research within these databases. The most
prevalent validation method was comparison with a review of patient charts, case notes or
medical records (64% of reviewed validation studies). The reported performance measures varied
widely (PPV 0-100%, NPV 6-100%, sensitivity 0-100%, specificity 23-100% and accuracy 16-100%)
and depended on many factors, including the coding system(s), therapeutic area,
pharmacoepidemiology database, and outcome.
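To make the reported performance measures concrete, the following minimal Python sketch computes them from a 2x2 table that compares a code-based case definition with chart review used as the reference standard. The class and function names and the counts are hypothetical illustrations, not data or methods from the reviewed studies; the formulas are the standard definitions of PPV, NPV, sensitivity, specificity and accuracy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ValidationTable:
    """Hypothetical 2x2 counts: code-based definition vs. chart review (reference)."""
    tp: int  # coded as case, confirmed on chart review
    fp: int  # coded as case, not confirmed on chart review
    fn: int  # not coded, but a case on chart review
    tn: int  # not coded, and not a case on chart review


def _ratio(num: int, den: int) -> Optional[float]:
    # Return None when the denominator is zero, since the metric is undefined.
    return num / den if den else None


def performance(t: ValidationTable) -> Dict[str, Optional[float]]:
    """Standard validation measures, with chart review as the gold standard."""
    total = t.tp + t.fp + t.fn + t.tn
    return {
        "ppv": _ratio(t.tp, t.tp + t.fp),          # coded cases that are true cases
        "npv": _ratio(t.tn, t.tn + t.fn),          # coded non-cases that are true non-cases
        "sensitivity": _ratio(t.tp, t.tp + t.fn),  # true cases that were coded
        "specificity": _ratio(t.tn, t.tn + t.fp),  # true non-cases that were not coded
        "accuracy": _ratio(t.tp + t.tn, total),    # all correctly classified records
    }


# Usage with made-up counts (not taken from any reviewed study):
metrics = performance(ValidationTable(tp=90, fp=10, fn=30, tn=870))
print({k: round(v, 3) for k, v in metrics.items()})
# prints {'ppv': 0.9, 'npv': 0.967, 'sensitivity': 0.75, 'specificity': 0.989, 'accuracy': 0.96}
```

Undefined measures (zero denominators) are returned as None rather than 0 so that they are not mistaken for observed values.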
Conclusions: Coding systems vary by the type of information captured, clinical setting, pharmacoepidemiology
database, and region of use. Of the 57 reviewed coding systems, few are routinely and
widely applied in pharmacoepidemiology database research. The indication- and outcome-dependent
heterogeneity in coding system performance suggests that accurate definitions and algorithms
for capturing specific exposures and outcomes within large healthcare datasets should
be developed on a case-by-case basis and in consultation with clinical experts.
Keywords
Clinical coding/classification - clinical coding/standards - clinical coding/utilization - pharmacoepidemiology/methods - reproducibility of results