Appl Clin Inform 2014; 05(02): 557-570
DOI: 10.4338/ACI-2014-02-RA-0013
Research Article
Schattauer GmbH

Development and validation of a computer-based algorithm to identify foreign-born patients with HIV infection from the electronic medical record

J. Levison (1, 2, 3, 4, 5)
V. Triant (1, 2, 3, 5)
E. Losina (2, 3, 4, 5, 6, 7)
K. Keefe (2, 3, 5)
K. Freedberg (1, 2, 3, 5, 6, 7)
S. Regan (2, 3, 5)

1 Massachusetts General Hospital, Division of Infectious Diseases, Boston, Massachusetts, United States
2 Massachusetts General Hospital, Division of General Internal Medicine, Boston, Massachusetts, United States
3 Massachusetts General Hospital, Medical Practice Evaluation Center, Boston, Massachusetts, United States
4 Brigham and Women’s Hospital, Division of Infectious Diseases, Boston, Massachusetts, United States
5 Harvard Medical School, Boston, Massachusetts, United States
6 Boston University School of Public Health, Departments of Biostatistics and Epidemiology, Boston, Massachusetts, United States
7 Harvard University Center for AIDS Research, Harvard University, Boston, Massachusetts, United States

Correspondence to:

Julie Levison, MD, MPhil, MPH
Program in HIV Epidemiology and Outcomes Research
Medical Practice Evaluation Center
Massachusetts General Hospital
50 Staniford St, 9th Floor
Boston, Massachusetts 02114
Telephone: 617–724–4698
Fax: 617–726–2691


Received: February 16, 2014

Accepted: May 5, 2014

Published online: December 21, 2017



Objective: To develop and validate an efficient and accurate method to identify foreign-born patients from a large patient data registry in order to facilitate population-based health outcomes research.

Methods: We developed a three-stage algorithm for classifying foreign-born status in HIV-infected patients receiving care in a large US healthcare system between January 1, 2001 and March 31, 2012 (n = 9,114). In stage 1, we classified patients coded as non-English-speaking as foreign-born. In stage 2, we searched the free-text electronic medical record (EMR) notes of the remaining patients for keywords associated with place of birth and language spoken; patients without keywords were classified as US-born. In stage 3, for the patients who remained, we retrieved and reviewed the 50-character text window (i.e., token) around each keyword. To validate the algorithm, we performed a chart review and asked all HIV physicians (n = 37) to classify their patients (n = 957). We calculated algorithm sensitivity and specificity.
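The three stages above can be sketched as a simple rule cascade. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Patient` record, its `language` field, the keyword list, and the reading of the 50-character window as 25 characters on either side of the keyword are all hypothetical. Because stage 3 relied on human review of the tokens, the sketch stops at returning those tokens for a reviewer rather than labeling the patient.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# HYPOTHETICAL keyword list: the study's actual keywords are not given in the abstract.
KEYWORDS = ["born in", "immigrated", "native of", "moved from", "interpreter"]
# ASSUMPTION: the 50-character window is read as 25 characters on each side of the keyword.
WINDOW = 50


@dataclass
class Patient:
    patient_id: str
    language: Optional[str]  # hypothetical registry language field, e.g. "English"
    notes: List[str] = field(default_factory=list)  # free-text EMR notes


# One case-insensitive pattern matching any keyword literally.
_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def extract_tokens(text: str, window: int = WINDOW) -> List[str]:
    """Return the text window (token) around every keyword match in one note."""
    half = window // 2
    tokens = []
    for match in _PATTERN.finditer(text):
        start = max(0, match.start() - half)
        end = min(len(text), match.end() + half)
        tokens.append(text[start:end])
    return tokens


def classify(patient: Patient) -> Tuple[int, Optional[str], List[str]]:
    """Return (stage, label, tokens); label is None when stage 3 review is needed."""
    # Stage 1: a coded non-English language means foreign-born.
    # ASSUMPTION: a missing language code is treated like English and passed on.
    if patient.language and patient.language.strip().lower() != "english":
        return 1, "foreign-born", []
    # Stage 2: no keyword anywhere in the free-text notes means US-born.
    tokens = [tok for note in patient.notes for tok in extract_tokens(note)]
    if not tokens:
        return 2, "US-born", []
    # Stage 3: keywords found; queue the tokens for a human reviewer.
    return 3, None, tokens


if __name__ == "__main__":
    cohort = [
        Patient("A", "Spanish"),
        Patient("B", "English", ["Follow-up visit, no new complaints."]),
        Patient("C", "English", ["Patient was born in Haiti and moved to Boston in 1998."]),
    ]
    for p in cohort:
        print(p.patient_id, classify(p))
```

On this illustrative cohort, patient A is labeled in stage 1, B in stage 2, and C is queued for token review. Note that stage 1 as sketched would also label a Spanish-speaking patient born in Puerto Rico as foreign-born, the pattern behind most of the stage 1 disagreements reported in the results.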

Results: We excluded 160 of 957 patients because physicians indicated that the patient was not HIV-infected (n = 54), was “not my patient” (n = 103), or had an unknown place of birth (n = 3), leaving 797 for analysis. In stage 1, providers agreed that 71 of 95 foreign-language speakers were foreign-born; most disagreements (23/24) involved patients born in Puerto Rico, a US territory. In stage 2, chart review classified 49 of 50 patients without keywords as US-born. In stage 3, token review correctly classified 55 of 60 patients (92%), with 93% (CI: 84.4, 100%) sensitivity and 90% (CI: 74.3, 100%) specificity compared with full chart review. After the three-stage algorithm was applied, 2,102 of 9,114 patients (23%) were classified as foreign-born. Compared with physician responses, the algorithm's estimated sensitivity was 94% (CI: 90.9, 97.2%) and its specificity was 92% (CI: 89.7, 94.1%); 92% of patients were correctly classified.
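The results report sensitivity, specificity and the proportion correctly classified, each with a confidence interval, but the abstract does not state how the intervals were computed. The sketch below assumes 95% Wald (normal-approximation) intervals clipped to [0, 1], which would be consistent with the upper bounds of exactly 100% reported for stage 3; exact (Clopper-Pearson) or Wilson intervals are common alternatives. `wald_ci` and `diagnostic_summary` are hypothetical helper names, foreign-born is treated as the positive class, and the 2x2 counts in the usage section are invented for illustration, not study data.

```python
import math
from statistics import NormalDist
from typing import Dict, NamedTuple

# ASSUMPTION: 95% two-sided intervals; the abstract does not state level or method.
Z_95 = NormalDist().inv_cdf(0.975)


class Estimate(NamedTuple):
    point: float
    lower: float
    upper: float


def wald_ci(successes: int, n: int, z: float = Z_95) -> Estimate:
    """Proportion with a Wald (normal-approximation) interval clipped to [0, 1]."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return Estimate(p, max(0.0, p - half_width), min(1.0, p + half_width))


def diagnostic_summary(tp: int, fn: int, tn: int, fp: int) -> Dict[str, Estimate]:
    """Compare algorithm labels with a reference standard (chart review or physician).

    tp: foreign-born by reference, foreign-born by algorithm
    fn: foreign-born by reference, US-born by algorithm
    tn: US-born by reference, US-born by algorithm
    fp: US-born by reference, foreign-born by algorithm
    """
    return {
        "sensitivity": wald_ci(tp, tp + fn),
        "specificity": wald_ci(tn, tn + fp),
        "correctly_classified": wald_ci(tp + tn, tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Proportions quoted in the abstract (its 2x2 counts are not reported).
    print(f"stage 3 token review correct: {wald_ci(55, 60).point:.0%}")
    print(f"classified foreign-born: {wald_ci(2102, 9114).point:.0%}")
    # Purely illustrative counts, NOT study data.
    for name, est in diagnostic_summary(tp=90, fn=10, tn=80, fp=20).items():
        print(f"{name}: {est.point:.1%} ({est.lower:.1%}, {est.upper:.1%})")
```

Clipping matters for small denominators with high accuracy: with 9 of 10 correct, the unclipped Wald upper bound exceeds 1, so the interval is reported as ending at 100%.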

Conclusion: A computer-based algorithm classified foreign-born status in a large HIV-infected cohort efficiently and accurately. This approach can be used to improve EMR-based outcomes research.

Citation: Levison J, Triant V, Losina E, Keefe K, Freedberg K, Regan S. Development and validation of a computer-based algorithm to identify foreign-born patients with HIV infection from the electronic medical record. Appl Clin Inf 2014; 5: 557–570




Conflicts of interest

All authors have declared that no competing interests exist.
