Extracting Medical Information from Paper COVID-19 Assessment Forms

Colin G. White-Dzuro; Jacob D. Schultz; Cheng Ye; Joseph R. Coco; Janet M. Myers; Claude Shackelford; S. Trent Rosenbloom; Daniel Fabbri

doi:10.1055/s-0041-1723024

Appl Clin Inform 2021; 12(01): 170-178

Research Article

Authors

Colin G. White-Dzuro^*
¹Vanderbilt University School of Medicine, Nashville, Tennessee, United States

Jacob D. Schultz‡^*
¹Vanderbilt University School of Medicine, Nashville, Tennessee, United States

Cheng Ye
²Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Joseph R. Coco
²Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Janet M. Myers
³Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Claude Shackelford
³Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States

S. Trent Rosenbloom
¹Vanderbilt University School of Medicine, Nashville, Tennessee, United States
²Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Daniel Fabbri
²Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Funding None.

Abstract

Objective This study examines the validity of optical mark recognition, a novel user interface, and crowdsourced data validation to rapidly digitize and extract data from paper COVID-19 assessment forms at a large medical center.

Methods An optical mark recognition/optical character recognition (OMR/OCR) system was developed to identify the fields selected on 2,814 paper forms used to assess potential COVID-19 infections, each form containing 141 fields. To ease data validation, a novel user interface (UI) displayed mirrored forms: the scanned assessment form with OMR results superimposed on the left and an editable web form on the right. Crowdsourced participants validated the results of the OMR system. Overall error rate and time taken to validate were calculated. A subset of forms was validated by multiple participants to calculate agreement between participants.
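To make the idea of optical mark recognition concrete, the following is a minimal sketch of one common approach: measuring how much of a checkbox region is dark and treating the field as selected above a cutoff. It is not the system described in this study. The field names, box coordinates, the `fill_ratio` and `read_marks` helpers, and both thresholds are hypothetical, and the "scan" is a tiny synthetic grayscale grid rather than a real image.

```python
from typing import Dict, List, Tuple

# (top, left, height, width) in pixels; a hypothetical form layout.
Box = Tuple[int, int, int, int]
Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def fill_ratio(image: Image, box: Box, dark_below: int = 128) -> float:
    """Return the fraction of pixels inside `box` darker than `dark_below`."""
    top, left, height, width = box
    dark = 0
    for row in image[top:top + height]:
        for pixel in row[left:left + width]:
            if pixel < dark_below:
                dark += 1
    return dark / (height * width)


def read_marks(image: Image, fields: Dict[str, Box],
               min_fill: float = 0.4) -> Dict[str, bool]:
    """Mark a field as selected when enough of its box is dark.

    `min_fill` is an assumed cutoff; a real system would tune it on scans.
    """
    return {name: fill_ratio(image, box) >= min_fill
            for name, box in fields.items()}


# Synthetic 4x8 white page with a dark 2x2 mark inside the "fever" box.
page: Image = [[255] * 8 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        page[r][c] = 0

fields = {"fever": (1, 1, 2, 2), "cough": (1, 5, 2, 2)}
result = read_marks(page, fields)
print(result)
```

In this toy example the "fever" box is fully dark and the "cough" box is fully white, so only "fever" is reported as selected. Borderline fill ratios are the cases where human validation of the kind described above is most useful.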

Results The OMR/OCR tools correctly extracted data from scanned form fields with an average accuracy of 70% and a median accuracy of 78% when the OMR/OCR results were compared with the results from crowd validation. Scanned forms were crowd-validated at a mean rate of 157 seconds per document and a volume of approximately 108 documents per day. A randomly selected subset of documents was reviewed by multiple participants, producing an interobserver agreement of 97% for documents when narrative-text fields were included and 98% when only Boolean and multiple-choice fields were considered.
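The abstract does not state how accuracy and agreement were computed. As an illustration only, the sketch below assumes one plausible definition: the share of fields whose values match between two versions of the same form, summarized across forms by mean and median for accuracy, and compared between two reviewers for agreement. All field names, sample values, and helper functions are hypothetical.

```python
from statistics import mean, median
from typing import Dict, Iterable, List

Form = Dict[str, object]


def field_match_rate(a: Form, b: Form) -> float:
    """Share of fields with identical values in two versions of one form."""
    if not a or a.keys() != b.keys():
        raise ValueError("forms must be non-empty and share the same fields")
    return sum(a[k] == b[k] for k in a) / len(a)


def summarize_accuracy(extracted: List[Form],
                       validated: List[Form]) -> Dict[str, float]:
    """Per-form match rate of extracted vs. validated data, as mean and median."""
    rates = [field_match_rate(e, v) for e, v in zip(extracted, validated)]
    return {"mean": mean(rates), "median": median(rates)}


def without_fields(form: Form, names: Iterable[str]) -> Form:
    """Copy of `form` with the named (e.g., narrative-text) fields removed."""
    drop = set(names)
    return {k: v for k, v in form.items() if k not in drop}


# Hypothetical OMR/OCR output and crowd-validated values for two forms.
omr = [
    {"fever": True, "cough": False, "notes": ""},
    {"fever": False, "cough": False, "notes": "sore throat"},
]
crowd = [
    {"fever": True, "cough": True, "notes": ""},
    {"fever": False, "cough": False, "notes": "sore throat"},
]
accuracy = summarize_accuracy(omr, crowd)

# Two hypothetical reviewers validating the same form.
reviewer_a = {"fever": True, "cough": True, "notes": "sore throat"}
reviewer_b = {"fever": True, "cough": True, "notes": "Sore throat."}
agreement_all = field_match_rate(reviewer_a, reviewer_b)
agreement_structured = field_match_rate(
    without_fields(reviewer_a, ["notes"]),
    without_fields(reviewer_b, ["notes"]),
)
print(accuracy, agreement_all, agreement_structured)
```

Under this definition, free-text fields lower agreement whenever reviewers transcribe the same note with different capitalization or punctuation, which is consistent in direction with the slightly higher agreement reported when only Boolean and multiple-choice fields are considered.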

Conclusion Due to the COVID-19 pandemic, it may be challenging for health care workers wearing personal protective equipment to interact with electronic health records. The combination of OMR/OCR technology, a novel UI, and crowdsourced data validation allowed for the efficient extraction of data from a large volume of paper medical documents produced during the COVID-19 pandemic.

Keywords

COVID-19 - data processing - optical mark recognition - optical character recognition - data creation and storage - crowdsourcing - medical form extraction

Protection of Human and Animal Subjects

None.

^* Authors contributed equally to this study.

Publication History

Received: September 10, 2020

Accepted: December 25, 2020

Article published online:
March 10, 2021

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
