Methods Inf Med 2021; 60(03/04): 084-094
DOI: 10.1055/s-0041-1735619
Original Article

Optimizing Identification of People Living with HIV from Electronic Medical Records: Computable Phenotype Development and Validation

Yiyang Liu
1   Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, United States
Khairul A. Siddiqi
2   Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, United States
Robert L. Cook
1   Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, United States
Jiang Bian
2   Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, United States
Patrick J. Squires
3   Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, Florida, United States
Elizabeth A. Shenkman
2   Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, United States
Mattia Prosperi
1   Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, United States
Dushyantha T. Jayaweera
4   Department of Medicine, Miller School of Medicine, University of Miami, Miami, Florida, United States
Funding This work was supported by the National Institute of Allergy and Infectious Diseases (NIAID) under Award Number R01AI145552 (Co-PIs: Salemi, Prosperi) and a pilot grant from the Center for AIDS Research (CFAR) (PI: Jayaweera) from the National Institute of Allergy and Infectious Diseases (NIAID) under Award Number 5P30AI073961-13 (PI: Pahwa). The work was also, in part, funded by CDC U18DP006512 and NCI R01CA246418. Additionally, the research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under University of Florida Clinical and Translational Science Awards UL1TR000064 and UL1TR001427. The OneFlorida Clinical Research Consortium was funded by the Patient-Centered Outcomes Research Institute number CDRN-1501-26692 and RI-CRN-2020-005; in part by the OneFlorida Cancer Control Alliance, funded by the Florida Department of Health's James and Esther King Biomedical Research Program #4KB16.


Background Electronic health record (EHR)-based computable phenotype algorithms allow researchers to efficiently identify a large virtual cohort of patients with human immunodeficiency virus (HIV). Building upon existing algorithms, we refined, improved, and validated an HIV phenotype algorithm using data from the OneFlorida Data Trust, a repository of linked claims data and EHRs from its clinical partners, which provide care to over 15 million patients across all 67 counties in Florida.

Methods Our computable phenotype examined information from multiple EHR domains, including clinical encounters with diagnoses, prescription medications, and laboratory tests. To identify an HIV case, the algorithm requires the patient to have at least one diagnostic code for HIV and to meet at least one of the following criteria: have 1+ positive HIV laboratory results, have been prescribed HIV medications, or have 3+ visits with HIV diagnostic codes. The computable phenotype was validated against a subset of clinical notes.
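The case definition described in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the PatientSummary record and its field names are hypothetical, and it assumes that each patient's EHR evidence has already been reduced to simple counts.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    """Hypothetical per-patient roll-up of EHR evidence (names are assumptions)."""
    hiv_dx_visits: int          # visits carrying an HIV diagnostic code
    positive_hiv_labs: int      # positive HIV laboratory results
    hiv_med_prescriptions: int  # prescriptions for HIV medications


def is_hiv_case(p: PatientSummary) -> bool:
    """Apply the rule: 1+ HIV diagnostic code AND at least one supporting criterion."""
    has_diagnosis = p.hiv_dx_visits >= 1
    has_support = (
        p.positive_hiv_labs >= 1          # 1+ positive HIV laboratory results
        or p.hiv_med_prescriptions >= 1   # prescribed HIV medications
        or p.hiv_dx_visits >= 3           # 3+ visits with HIV diagnostic codes
    )
    return has_diagnosis and has_support


# A single coded visit with no supporting evidence is not counted as a case.
print(is_hiv_case(PatientSummary(1, 0, 0)))  # False
# Three coded visits qualify even without laboratory or medication data.
print(is_hiv_case(PatientSummary(3, 0, 0)))  # True
```

Because the third criterion depends only on diagnostic codes, a patient with incomplete laboratory and prescribing history can still be identified in this sketch.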

Results Among the 15+ million patients from OneFlorida, we identified 61,313 patients with a confirmed HIV diagnosis. Among them, 8.05% met all four inclusion criteria, 69.7% met the 3+ HIV encounters criterion in addition to having an HIV diagnostic code, and 8.1% met all criteria except for having positive laboratory results. Our algorithm achieved higher sensitivity (98.9%) and comparable specificity (97.6%) relative to existing algorithms (77–83% sensitivity, 86–100% specificity). The mean age of the sample was 42.7 years; 58% were male, and about half were Black or African American. Patients' average follow-up period (the time between the first and last encounter in the EHRs) was approximately 4.6 years. The median numbers of all encounters and of HIV-related encounters were 79 and 21, respectively.
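Sensitivity and specificity follow the standard definitions when an algorithm's output is compared with a reference standard such as chart review. The sketch below shows the arithmetic only; the counts are made up for illustration and are not taken from the study's validation sample.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of reference-standard cases that the algorithm flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of reference-standard non-cases that the algorithm leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation counts (illustrative only, not study data).
tp, fn, tn, fp = 90, 10, 80, 20
print(f"sensitivity = {sensitivity(tp, fn):.1%}")  # sensitivity = 90.0%
print(f"specificity = {specificity(tn, fp):.1%}")  # specificity = 80.0%
```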

Conclusion By leveraging EHR data from multiple clinical partners and domains, with a considerably diverse population, our algorithm allows more flexible criteria for identifying patients with incomplete laboratory test results and medication prescribing history compared with prior studies.


The content is solely the responsibility of the authors and does not necessarily represent the official views of the Patient-Centered Outcomes Research Institute (PCORI), its Board of Governors or Methodology Committee, the OneFlorida Clinical Research Consortium, the University of Florida's Clinical and Translational Science Institute, the Florida Department of Health, or the National Institutes of Health.

Publication History

Received: 02 June 2021

Accepted: 20 July 2021

Article published online: 30 September 2021

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
