Dear Editor, Dr. Ray, and Dr. Wiwanitkit,
I have carefully read your letter and the important issues it raises regarding
our work on the evaluation of custom ChatGPT agents in the management of pancreatic
cystic lesions [1, 2]. I agree with the limitations you have pointed out, keeping in mind that this study
is intended as a proof of concept for a novel feature of large language models (LLMs).
Although 60 scenarios is not a large sample size, most similar studies at the intersection
of gastroenterology and LLMs have examined a comparable number of scenarios [3-5].
The interrogation of discrepancies as part of a systematic approach to the
development and training of LLMs should indeed be further implemented and improved.
An example of such a systemic approach is described in a recent work that evaluated
various setups using ChatGPT for management of hepatitis C infection [5]. The ethical considerations including bias and error handling and user responsibility
should also be explored. Hopefully, these issues will be gradually addressed in future
studies as this novel field of research progresses from generated scenarios
to evaluation on clinical data and, ultimately, prospective interventional studies.