Development and Evaluation of Record Linkage Rules in a Safety-Net Health System Serving Disadvantaged Communities
Received: 21 December 2018
Accepted: 22 May 2019
Published: 12 August 2019 (online)
Background There is a need for flexible, accurate record-linkage systems with transparent rules that work across diverse populations.
Objectives We developed rules responsive to challenges in linking records for an urban safety-net health system; we calculated performance characteristics for our algorithm.
Methods We evaluated encounters from January 1, 2012 through September 30, 2018. We compared our algorithm, which uses name (first-last), date of birth (DOB), and the last four digits of the social security number, to our electronic health record (EHR) system's reconciliation process. We applied our algorithm to unreconciled real-time Admission-Discharge-Transfer registration data and compared the match results to reconciled identities from our enterprise data warehouse. We manually validated matches for randomly sampled discordant pairs and calculated sensitivity and specificity. We evaluated predictors of discordance, including census tract information.
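As a minimal sketch of the sensitivity/specificity calculation mentioned above, the following assumes each manually validated pair is reduced to two booleans: whether the algorithm called it a match and whether manual review judged it a true match. The function name and the toy labels are hypothetical illustrations, not the study's data or code.

```python
def sensitivity_specificity(pairs):
    """Compute (sensitivity, specificity) from (predicted_match, true_match) booleans."""
    tp = fp = tn = fn = 0
    for predicted, truth in pairs:
        if predicted and truth:
            tp += 1  # algorithm matched, reviewer agreed
        elif predicted and not truth:
            fp += 1  # algorithm matched, reviewer disagreed
        elif not predicted and truth:
            fn += 1  # algorithm missed a true match
        else:
            tn += 1  # algorithm correctly left the pair unmatched
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy validation sample (made-up labels for illustration only).
sample = (
    [(True, True)] * 9
    + [(False, True)] * 1
    + [(False, False)] * 18
    + [(True, False)] * 2
)
sens, spec = sensitivity_specificity(sample)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity here is the share of true matches the algorithm found, and specificity is the share of true non-matches it correctly left unlinked.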
Results Of 771,477 unique medical record numbers, most (95%) were concordant between systems; a substantial minority (5%) were discordant. Of 38,993 discordant pairs, most (n = 36,539; 94%) were detected by our local algorithm. The sensitivity of our algorithm was higher than that of the EHR process (99% vs. 81%), but its specificity was lower (98.6% vs. 99.9%). Our highest-yield rules, beyond a match on full first and last name plus complete DOB, were a match on the first three letters of the first name, transposed first and last names, and DOB offsets (+1 and +365 days). Factors associated with discordance were homelessness (adjusted odds ratio [aOR] = 2.4; 95% confidence interval [CI], 2.2–2.6) and living in a census tract with high levels of poverty (aOR = 1.4; 95% CI, 1.3–1.4).
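The highest-yield rules named above can be sketched as a small deterministic matcher. This is an illustration under stated assumptions, not the authors' implementation: the `Record` type, the name normalization, and the function names are hypothetical; the DOB offsets are treated as an absolute difference of 1 or 365 days; and the abstract does not say how relaxed rules combine with the last four SSN digits, so the sketch assumes a relaxed rule counts only when those digits agree.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    first: str
    last: str
    dob: date
    ssn4: str  # last four digits of the SSN, "" if unknown


def norm(name):
    """Uppercase and keep letters only (assumed normalization)."""
    return "".join(ch for ch in name.upper() if ch.isalpha())


def name_rule(a, b):
    af, al, bf, bl = norm(a.first), norm(a.last), norm(b.first), norm(b.last)
    if not (af and al and bf and bl):
        return None
    if af == bf and al == bl:
        return "exact"
    if af[:3] == bf[:3] and al == bl:
        return "first3_of_first_name"
    if af == bl and al == bf:
        return "transposed_names"
    return None


def dob_rule(a, b):
    delta = abs((a.dob - b.dob).days)
    if delta == 0:
        return "exact"
    if delta in (1, 365):  # assumption: offsets apply in either direction
        return "dob_offset"
    return None


def match_rule(a, b):
    """Return the name of the rule that links a and b, or None."""
    n, d = name_rule(a, b), dob_rule(a, b)
    if n is None or d is None:
        return None
    if n == "exact" and d == "exact":
        return "full_name_dob"
    # Assumption: a relaxed rule requires agreement on a known SSN4,
    # and only one field may be relaxed at a time.
    if not a.ssn4 or a.ssn4 != b.ssn4:
        return None
    if d == "exact":
        return n
    if n == "exact":
        return d
    return None


base = Record("Jonathan", "Smith", date(1980, 3, 14), "1234")
short = Record("Jon", "Smith", date(1980, 3, 14), "1234")
swapped = Record("Smith", "Jonathan", date(1980, 3, 14), "1234")
next_day = Record("Jonathan", "Smith", date(1980, 3, 15), "1234")
other_ssn = Record("Jon", "Smith", date(1980, 3, 14), "9999")

for candidate in (base, short, swapped, next_day, other_ssn):
    print(candidate.first, candidate.last, "->", match_rule(base, candidate))
```

Returning the rule name rather than a bare boolean keeps the linkage transparent: each accepted pair records which rule produced it, which makes per-rule yield easy to tabulate.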
Conclusion Our algorithm had superior sensitivity compared to our EHR process. Homelessness and poverty were associated with unmatched records. The improved sensitivity was attributable to several critical input-variable processing steps that may be useful for similar difficult-to-link populations.
Protection of Human and Animal Subjects
This study was performed as a quality improvement project to improve record linkage within separate domains of a large integrated health system. We consulted the Institutional Review Board, which determined that review was not necessary.