J Neurol Surg B Skull Base 2024; 85(S 01): S1-S398
DOI: 10.1055/s-0044-1780077
Presentation Abstracts
Oral Abstracts

Comparing ChatGPT-4 to BARD for Accuracy and Completeness of Responses to Questions Derived from the International Consensus Statement on Endoscopic Skull Base Surgery

Yavar Abgin
1   California Northstate University, Elk Grove, California, United States
,
Kayla Umemoto
1   California Northstate University, Elk Grove, California, United States
,
Sean Polster
2   University of Chicago, Chicago, Illinois, United States
,
Arthur W. Wu
3   Cedars-Sinai Medical Center, Los Angeles, California, United States
,
Andrew Goulian
1   California Northstate University, Elk Grove, California, United States
,
Christopher R. Roxbury
2   University of Chicago, Chicago, Illinois, United States
,
Omar G. Ahmed
4   Houston Methodist, Houston, Texas, United States
,
Pranay Soni
5   Cleveland Clinic, Cleveland, Ohio, United States
,
Dennis M. Tang
3   Cedars-Sinai Medical Center, Los Angeles, California, United States
 

Introduction: Artificial intelligence (AI) language models, such as Chat Generative Pre-Trained Transformer 4 (GPT-4) by OpenAI and Bard by Google, emerged in 2022 as tools for answering questions, providing information, and offering suggestions to the layperson. These programs are large language models trained on available data to synthesize responses. GPT-4 and Bard have the potential to greatly impact how information is disseminated to patients; however, it is essential to understand how their answers compare to those of experts in the corresponding field. The International Consensus Statement on Endoscopic Skull Base Surgery 2019 (ICAR:SB) is an international multidisciplinary collaboration to critically evaluate and grade the current literature. The goal of this study is to assess the accuracy and completeness of responses generated by GPT-4 and Bard to questions based on the ICAR:SB guidelines.

Methods: Endoscopic skull base surgery policy statements and their grades of evidence were extracted from the ICAR:SB (Table 1). A question was synthesized for each policy statement and input into GPT-4 and Bard. The GPT-4 and Bard answers were graded by a fellowship-trained rhinologist and a skull-base neurosurgeon using a 5-point Likert scale for accuracy and completeness. Statistical analysis included descriptive statistics and chi-square testing comparing the graded answers of GPT-4 with those of Bard.
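As a rough illustration of the workflow described above, the Python sketch below submits one ICAR:SB-derived question to GPT-4 and then runs a chi-square test on Likert-score counts. The abstract does not state how questions were submitted or how scores were tabulated; the OpenAI client call and all counts here are illustrative assumptions, not the study's method or data (Bard is omitted because it lacked a comparable public API at the time).

    # Illustrative sketch only: hypothetical setup, not the study's actual method or data.
    from openai import OpenAI                   # assumes the OpenAI Python SDK
    from scipy.stats import chi2_contingency

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_gpt4(question: str) -> str:
        """Submit one ICAR:SB-derived question to GPT-4 and return its answer."""
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question}],
        )
        return resp.choices[0].message.content

    answer = ask_gpt4(
        "When should a lumbar drain be used after endoscopic endonasal skull base surgery?"
    )

    # Hypothetical counts of answers scored 1..5 for accuracy (rows: GPT-4, Bard).
    accuracy_counts = [
        [0, 0, 2, 8, 45],   # GPT-4 (made-up numbers)
        [1, 3, 6, 16, 29],  # Bard (made-up numbers)
    ]
    chi2, p, dof, _ = chi2_contingency(accuracy_counts)
    print(f"chi-square = {chi2:.2f}, p = {p:.3f}")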

Results: The mean accuracy and completeness scores for GPT-4 were 4.76 and 4.51, respectively. The mean accuracy and completeness scores for Bard were 4.25 and 3.84, respectively. The distribution of the scores can be seen in Fig. 1. Chi-square testing comparing GPT-4 to Bard demonstrated statistically significant differences in accuracy (p = 0.005) and completeness (p = 0.004).

Discussion: Overall accuracy and completeness were high for responses generated by both GPT-4 and Bard; however, GPT-4 scored significantly higher in both domains. The capabilities of language models will continue to evolve as more up-to-date information is integrated into future iterations. As the popularity of AI programs continues to expand, patients may search for answers to their healthcare questions on these platforms, so it is critical for physicians to monitor the responses these programs provide.

Conclusion: This study demonstrates that GPT-4 and Bard generated accurate and complete responses when graded by a fellowship-trained rhinologist and a skull-base neurosurgeon. AI language models have the potential to be robust tools for disseminating information in the future.

Table 1 Example policy statement from the ICAR:SB and the associated question posed to the language models

Policy: X.C. Lumbar Drain after ESBS
Policy level: Option
Treatment option: LD placement before and/or after ESBS may be used during ESBS
Question: When should a lumbar drain be used after endoscopic endonasal skull base surgery?

Fig. 1 Bar graph showing counts of accuracy and completeness scores for GPT-4 and Bard answers on a 5-point Likert scale.


Publication History

Article published online:
February 5, 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany