DOI: 10.1055/s-0044-1780667
Data Extraction from Unstructured Medical Records with the GPT-4 Chatbot
Background: This study presents a novel approach for using GPT-4 (generative pre-trained transformer 4) as a tool for data extraction from unstructured medical records and assesses its performance on this task.
Methods: Fifty fictitious patient medical records were drafted in German, and GPT-4 was given instructions on how to process each one. Data extraction involved text-mining and classification tasks for nine variables. The accuracy, recall, precision, and F1-Score of the GPT-4 model were assessed for each requested variable.
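The four metrics named in the Methods can be illustrated for a single yes/no variable (for example, stroke) with the minimal Python sketch below. It assumes binary labels and the standard confusion-matrix definitions; the abstract does not state which averaging scheme the authors used, so this is not necessarily their exact calculation. The function name `binary_metrics` and the sample labels are hypothetical and are not data from the study.

```python
def binary_metrics(truth, predicted):
    """Return accuracy, precision, recall, and F1 for paired yes/no labels.

    truth     -- reference labels from the (hypothetical) gold standard
    predicted -- labels extracted by the model
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for five records (illustration only, not study data).
truth = [True, True, False, False, True]
predicted = [True, False, False, False, True]
metrics = binary_metrics(truth, predicted)
print(metrics)
```

In this made-up example one positive record is missed, so recall drops below 1 while precision stays at 1. This shows why the metrics are reported separately rather than relying on accuracy alone.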
Results: The accuracy of GPT-4 in text-mining tasks was 100% for all requested variables (patient identifier, EuroSCORE 2, admission date, and exit date). In classification tasks, GPT-4 exhibited the following performance for each requested variable: stroke (accuracy 96%, F1-Score 95.80%), cardiac tamponade (accuracy 100%, F1-Score 100%), pacemaker implantation (accuracy 100%, F1-Score 100%), atrial fibrillation (accuracy 100%, F1-Score 100%), and death (accuracy 100%, F1-Score 100%).
Conclusion: GPT-4 exhibited excellent performance in both text-mining and classification tasks. Reliable data extraction from unstructured medical records seems possible with the GPT-4 chatbot.
Publication History
Article published online: 13 February 2024
© 2024. Thieme. All rights reserved.
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany