Methods Inf Med 2022; 61(05/06): 167-173
DOI: 10.1055/a-1938-0436
Original Article

The Digital Analytic Patient Reviewer (DAPR) for COVID-19 Data Mart Validation

Heekyong Park (1), Taowei David Wang (1, 2), Nich Wattanasin (1), Victor M. Castro (1), Vivian Gainer (1), Sergey Goryachev (1), Shawn Murphy (1, 2)

1   Research Information Science and Computing, Mass General Brigham, Somerville, Massachusetts, United States
2   Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States

Abstract

Objective To provide high-quality data for coronavirus disease 2019 (COVID-19) research, we validated derived COVID-19 clinical indicators and 22 associated machine learning phenotypes in the Mass General Brigham (MGB) COVID-19 Data Mart.

Methods Fifteen reviewers performed a retrospective manual chart review for 150 COVID-19-positive patients in the data mart. To support rapid chart review for a wide range of target data, we offered a natural language processing (NLP)-based chart review tool, the Digital Analytic Patient Reviewer (DAPR). For this work, we designed a dedicated patient summary view and developed 127 new NLP logics to extract COVID-19-relevant medical concepts and target phenotypes. Moreover, we adapted DAPR for research use, so that patient information is used only for approved research purposes, and enabled fast access to the integrated patient information. Lastly, we conducted a survey to evaluate the difficulty of the validation and the usefulness of DAPR.

Results The concepts for COVID-19-positive cohort, COVID-19 index date, COVID-19-related admission, and the admission date showed high values in all evaluation metrics. However, three phenotypes showed a notable drop in performance relative to their positive predictive value in the prepandemic population. Based on these results, we removed these three phenotypes from our data mart. In the survey about using the tool, participants expressed positive attitudes toward using DAPR for chart review. They reported that the validation was easy and that DAPR helped them find relevant information. Some validation difficulties were also discussed.

Conclusion Use of NLP technology in the chart review helped to cope with the challenges of the COVID-19 data validation task and accelerated the process. As a result, we could provide more reliable research data promptly and respond to the COVID-19 crisis. DAPR's benefit can be expanded to other domains. We plan to operationalize it for wider research groups.

Publication History

Received: June 4, 2022

Accepted: August 30, 2022

Accepted Manuscript online:
September 7, 2022

Article published online:
December 20, 2022

© 2022. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 