The False Security of Blind Dates: Chrononymization’s Lack of Impact on Data Privacy of Laboratory Data
Received 13 July 2012
Accepted 01 October 2012
19 December 2017 (online)
Background: The reuse of clinical data for research purposes requires methods for the protection of personal privacy. One general approach is the removal of personal identifiers from the data. A frequent part of this anonymization process is the removal of times and dates, which we refer to as “chrononymization.” While this step can make the association with identified data (such as public information or a small sample of patient information) more difficult, it comes at a cost to the usefulness of the data for research.
Objectives: We sought to determine whether removal of dates from common laboratory test panels offers any advantage in protecting such data from re-identification.
Methods: We obtained a set of results for 5.9 million laboratory panels from the National Institutes of Health’s (NIH) Biomedical Translational Research Information System (BTRIS), selected a random set of 20,000 panels from the larger source sets, and then identified all matches between the sets.
Results: We found that while removal of dates could hinder the re-identification of a single test result, such removal had almost no effect when entire panels were used.
Conclusions: Our results suggest that reliance on chrononymization provides a false sense of security for the protection of laboratory test results. As a result of this study, the NIH has chosen to rely on policy solutions, such as strong data use agreements, rather than removal of dates when reusing clinical data for research purposes.
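The comparison described in the Methods can be illustrated with a minimal sketch. It does not use BTRIS data or reproduce the study's actual matching procedure: the panels are synthetic, and the panel size (7 tests), the value range, the one-year date window, the set sizes, and the names `make_panels` and `unique_fraction` are all assumptions chosen for illustration. For each sampled panel, the sketch counts how many source panels match it exactly, with and without the date, using either a single test result or the whole panel as the matching key.

```python
import random
from collections import Counter

def make_panels(n, n_tests=7, n_days=365, seed=0):
    """Build synthetic panels as (patient_id, day, results) tuples.

    All values are made up: integer results drawn uniformly from 60..139.
    """
    rng = random.Random(seed)
    return [
        (pid,
         rng.randrange(n_days),
         tuple(rng.randrange(60, 140) for _ in range(n_tests)))
        for pid in range(n)
    ]

def unique_fraction(source, sample, key):
    """Fraction of sampled panels whose key matches exactly one source panel."""
    counts = Counter(key(p) for p in source)
    return sum(1 for p in sample if counts[key(p)] == 1) / len(sample)

# Hypothetical sizes, far smaller than the 5.9 million panels in the study.
source = make_panels(50_000)
sample = random.Random(1).sample(source, 2_000)

# Four matching keys: one test or the full panel, each with or without the date.
keys = {
    "single test, no date": lambda p: p[2][0],
    "single test, with date": lambda p: (p[1], p[2][0]),
    "full panel, no date": lambda p: p[2],
    "full panel, with date": lambda p: (p[1], p[2]),
}

results = {name: unique_fraction(source, sample, key) for name, key in keys.items()}
for name, frac in results.items():
    print(f"{name}: {frac:.3f}")
```

In this toy setting the date matters only for the single-test key. Once the full panel is used, nearly every sampled panel is already unique without the date, which mirrors the direction of the reported result. Real laboratory values are correlated and unevenly distributed, so the magnitudes here should not be read as estimates.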
Citation: Cimino J.J. The false security of blind dates: Chrononymization’s lack of impact on data privacy of laboratory data. Appl Clin Inf 2012; 3: 392–403