
DOI: 10.1055/s-0045-1809617
Assessing the Reliability of ChatGPT and Gemini in Identifying Relevant Orthodontic Literature
Funding: The author appreciates the support and funding provided by Prince Sattam Bin Abdulaziz University for the entire research project.

Abstract
Objectives
Artificial intelligence (AI)-based solutions offer potential remedies to the issues encountered in conventional reference identification methods. However, the effectiveness of these AI models in helping orthodontic experts identify relevant literature remains unknown. The purpose of this study was to assess the validity of ChatGPT and Google Gemini in providing references for orthodontic literature searches.
Materials and Methods
This study utilized ChatGPT models (3.5 and 4) and Gemini to search for topics in orthodontics and specific subdomains. To verify the existence and precision of the cited references, several reputable sources were employed, including PubMed, Google Scholar, and Web of Science.
Statistical Analysis
Descriptive statistics were employed to present the data numerically and as percentages, focusing on three aspects: completeness, accuracy, and fabrication. Reliability analysis was conducted using Cronbach's α, and the results were visually presented as a correlation heat map.
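The study's raw scoring data are not included in the abstract, but the reliability statistic it uses is standard. A minimal sketch of Cronbach's α on a hypothetical observations-by-items score table (pure standard library; the example data are illustrative, not the study's):

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (observations) across k items.

    scores: list of equal-length rows, e.g. one row per reference,
    one column per scored aspect (hypothetical layout).
    """
    k = len(scores[0])
    # Sum of sample variances of each item (column).
    item_vars = sum(variance(col) for col in zip(*scores))
    # Sample variance of the per-observation total scores.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)
```

With perfectly consistent items (every column identical up to scale), α reaches 1.0; a value like the reported 0.418 would indicate low-to-moderate internal consistency.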
Results
Out of all references, only 15.76% were correct, whereas 71.92% were fake or fabricated references and 12.32% were inaccurate references. Gemini had the highest proportion of correct references (36.36%), followed by GPT-3.5 (15.76%) and GPT-4 (0.95%); the difference was statistically significant (p < 0.01). The reliability score of 0.418 indicates low-to-moderate consistency in the accuracy of the references.
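The abstract does not name the significance test behind the p < 0.01 comparison of correct-reference proportions across the three models; a chi-square test of independence on a 2 × 3 table (correct vs. not correct, by model) is one standard choice. A self-contained sketch with purely hypothetical counts:

```python
import math

def chi2_2x3(counts):
    """Chi-square test of independence for a 2x3 contingency table.

    counts: two rows (e.g. correct / not correct) over three columns
    (e.g. three models). Returns (statistic, p_value).
    """
    row_totals = [sum(r) for r in counts]
    col_totals = [sum(c) for c in zip(*counts)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            stat += (counts[i][j] - expected) ** 2 / expected
    # df = (2 - 1) * (3 - 1) = 2; for df = 2 the chi-square
    # survival function has the closed form exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value
```

Identical proportions across columns yield a statistic of 0 (p = 1); the larger the disparity between models, the smaller the p-value.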
Conclusion
While Gemini performed better than the GPT models, significant limitations remain in all three models in reference generation. These findings advocate for balanced and cautious use of AI tools in academic research related to orthodontics, emphasizing human validation of references and training of dental professionals and researchers in the efficient use of AI tools.
Publication History
Article published online:
08 August 2025
© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)
Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India