Methods Inf Med 2018; 57(01/02): 01-42
DOI: 10.3414/ME17-05-0006
Review Articles
Schattauer GmbH

A Systematic Review of Coding Systems Used in Pharmacoepidemiology and Database Research

Yong Chen
1   GlaxoSmithKline, Inc., Collegeville, PA, USA
Marko Zivkovic
2   Genesis Research, Hoboken, NJ, USA
Tongtong Wang
3   Merck & Co, Inc., Kenilworth, NJ, USA
Su Su
4   Merck Research Laboratories, Rahway, NJ, USA
Jianyi Lee
2   Genesis Research, Hoboken, NJ, USA
Edward A. Bortnichak
3   Merck & Co, Inc., Kenilworth, NJ, USA

Publication History

received: 29 June 2017

accepted: 09 February 2018

Publication Date:
05 April 2018 (online)


Background: Clinical coding systems have been developed to translate real-world healthcare information, such as prescriptions, diagnoses and procedures, into standardized codes suitable for use in large healthcare datasets. Because information on coding system characteristics is lacking and coding practices are insufficiently uniform, there is a growing need to better understand coding systems and their use in pharmacoepidemiology and observational real-world data research.

Objectives: To determine: 1) the number of available coding systems and their characteristics, 2) which pharmacoepidemiology databases they are adopted in, 3) what outcomes and exposures can be identified with each coding system, and 4) how robust they are with respect to consistency and validity in pharmacoepidemiology and observational database studies.

Methods: Electronic literature databases and unpublished literature were searched, and relevant journals were hand-searched, to identify eligible English-language articles published between 1986 and 2016 that discussed the characteristics and applications of coding systems in use. Characteristics considered included the type of information captured by codes, clinical setting(s) of use, adoption by a pharmacoepidemiology database, region, and available mappings. Application articles describing the use and validity of specific codes, code lists, or algorithms were also included. Data extraction was performed independently by two reviewers, and a narrative synthesis was conducted.

Results: A total of 897 unique articles and 57 coding systems were identified; 17% of the coding systems included country-specific modifications or multiple versions. Procedures (55%), diagnoses (36%), drugs (38%), and site of disease (39%) were the types of information most commonly and directly captured by these coding systems. The systems were used to capture information from the following clinical settings: inpatient (63%), ambulatory (55%), emergency department (ED, 34%), and pharmacy (13%). More than half of all coding systems were used in each of Europe (59%) and North America (57%). Of the reviewed coding systems, 34% were used in at least one of the 16 pharmacoepidemiology databases of interest, and 21% had studies evaluating the validity and consistency of their use in research within those databases. The most prevalent validation method was comparison with a review of patient charts, case notes, or medical records (64% of reviewed validation studies). The reported performance measures varied widely (positive predictive value [PPV] 0-100%, negative predictive value [NPV] 6-100%, sensitivity 0-100%, specificity 23-100%, and accuracy 16-100%) and depended on many factors, including coding system(s), therapeutic area, pharmacoepidemiology database, and outcome.
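The five performance measures reported above have standard definitions when a code-based case definition is compared against a reference standard such as chart review. The sketch below is a minimal illustration assuming a simple 2x2 table of true/false positives and negatives; the names `ValidationCounts` and `performance` and the example counts are hypothetical and are not taken from any reviewed study.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ValidationCounts:
    """Hypothetical 2x2 counts: code-based classification vs. chart review (reference standard)."""
    tp: int  # code-positive and confirmed by chart review
    fp: int  # code-positive but not confirmed by chart review
    fn: int  # code-negative but chart review shows the condition
    tn: int  # code-negative and chart review confirms absence


def _ratio(num: int, den: int) -> Optional[float]:
    # A measure is undefined (None) when its denominator is zero,
    # e.g. NPV cannot be estimated if no code-negative records were reviewed.
    return num / den if den else None


def performance(c: ValidationCounts) -> Dict[str, Optional[float]]:
    """Compute PPV, NPV, sensitivity, specificity and accuracy from 2x2 counts."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "ppv": _ratio(c.tp, c.tp + c.fp),           # TP / (TP + FP)
        "npv": _ratio(c.tn, c.tn + c.fn),           # TN / (TN + FN)
        "sensitivity": _ratio(c.tp, c.tp + c.fn),   # TP / (TP + FN)
        "specificity": _ratio(c.tn, c.tn + c.fp),   # TN / (TN + FP)
        "accuracy": _ratio(c.tp + c.tn, total),     # (TP + TN) / all records
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the review.
    m = performance(ValidationCounts(tp=90, fp=10, fn=30, tn=870))
    print({k: round(v, 3) for k, v in m.items()})
    # prints {'ppv': 0.9, 'npv': 0.967, 'sensitivity': 0.75, 'specificity': 0.989, 'accuracy': 0.96}
```

Returning `None` rather than raising on a zero denominator reflects a common validation design in which only code-positive records are sent for chart review, so that PPV is the only measure that can be estimated.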

Conclusions: Coding systems vary by type of information captured, clinical setting, pharmacoepidemiology database, and region of use. Of the 57 reviewed coding systems, few are routinely and widely applied in pharmacoepidemiology database research. The indication- and outcome-dependent heterogeneity in coding system performance suggests that accurate definitions and algorithms for capturing specific exposures and outcomes within large healthcare datasets should be developed on a case-by-case basis and in consultation with clinical experts.