Background: Open-source large language models may offer a solution to the data privacy issues that hinder the use of large language models for processing health records. In this study, we assess the performance of a recently released state-of-the-art open-source and offline-capable large language model in extracting data from unstructured electronic health records.
Methods: Fifty fictitious patient medical records were drafted in German, and the open-source large language model (in all three of its differently sized variants: 405B, 70B, and 8B parameters) was instructed to process each record. Data extraction involved text-mining and classification tasks for nine variables. Two closed-source state-of-the-art large language models were used for comparison. All models were prompted and run via publicly available online deployments.
Results: Across all 450 requested values, the accuracy of the open-source large language model was 100% (no false predictions) for the 405B variant, 98.7% (6 false predictions, all in binary classification tasks) for the 70B variant, and 90.9% (41 false predictions, all in binary classification tasks) for the 8B variant. Both closed-source large language models achieved an accuracy of 100% (no false predictions).
Conclusion: The 405B variant of the open-source large language model exhibited excellent performance, on par with the two closed-source models used for comparison. Further research with a local, offline installation of the 405B variant on sufficiently powerful computing infrastructure, using real health records, is warranted to confirm these results.