The outcomes of the current study, in which the quality, reliability, readability, and originality of the space maintainer information provided by ChatGPT-3.5 and ChatGPT-4 were assessed, revealed that both tools had similar mean values for the assessed parameters. ChatGPT-3.5 demonstrated outstanding quality and ChatGPT-4 demonstrated good quality, with median values of 5 and 4, respectively. Both tools also demonstrated high reliability.
In today's rapidly developing and changing world, information circulates continuously, and individuals can access any information that meets their curiosity with a single click. People previously tended to seek information on the internet and social media; however, with the explosive growth of natural language processing (NLP) and large language models (LLMs), interest has shifted toward AI-based chatbots. ChatGPT, one of the best known of these models, has gained phenomenal popularity, and according to our literature review, the features of the space maintainer-related information it provides, including reliability and quality, had not been assessed previously. Accordingly, the current study aimed to assess the quality, reliability, and other features of the space maintainer-related information provided by ChatGPT-3.5 and to compare these outcomes with those of the upgraded, paid version, ChatGPT-4.
In a similar previous study, Duran et al. [6] assessed the quality, reliability, readability, and originality of ChatGPT-generated information on cleft lip and palate. ChatGPT-4 was found to be a source with high reliability and good quality based on reliability and GQS analyses. FRES results indicated that the readability of the text created by this tool was 'difficult'. Plagiarism checks revealed acceptable levels of similarity [6]. Yurdakurban et al. [18] conducted a similar study in which they assessed the quality, reliability, readability, and originality of information provided by different AI chatbots (Open Evidence, MediSearch, ChatGPT-4) on the subject of orthognathic surgery. All of the assessed chatbots demonstrated high reliability and good quality. The SMOG readability index revealed that the provided information requires a college-level education or higher, and ChatGPT showed the highest originality [18]. These outcomes are consistent with the findings of the current study, which showed high reliability, good quality, and low similarity levels.
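For reference, the readability bands cited in these studies derive from the standard formulas of the two indices; the forms below are the commonly published definitions and are assumed here rather than taken from the cited papers.

\[
\text{FRES} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
\]
\[
\text{SMOG grade} = 1.0430\sqrt{\text{polysyllable count} \times \frac{30}{\text{sentence count}}} + 3.1291
\]

On the FRES scale, scores between 30 and 50 are conventionally interpreted as 'difficult' and best understood by readers with college-level education, which aligns with the ratings reported above.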
Buldur and Sezer [19] previously conducted a study in which they directed the frequently asked questions on the use of fluoride in dentistry, as determined by the ADA, to ChatGPT and compared the answers with those of the ADA. The outcomes revealed that the answers provided by ChatGPT were more detailed and scientific, and the authors reported that ChatGPT was reliable and sufficient on the subject of fluoride in dentistry [19]. These outcomes highlight the benefits of ChatGPT and also support the results of the current study.
Rokhshad et al. [20] conducted a study in which they directed 30 true-false questions to pediatric dentists, general dentists, dental students, and different chatbots, including ChatGPT-3.5. Pediatric dentists generated significantly more accurate answers than the other clinicians and the chatbots. The success of the groups in answering the questions, in descending order, was as follows: pediatric dentists, general dentists, chatbots, and students. Among the chatbots, ChatGPT-3.5 showed more acceptable consistency [20].
Ahmed et al. [21] also assessed the quality of dental caries-related multiple-choice questions generated by ChatGPT and Bard. They concluded that these tools could generate questions related to dental caries at the cognitive level of knowledge, with Bard displaying a higher cognitive level and generating more absolute terms on the related subject [21].
In another similar study, Abu Arqub et al. [22] assessed the accuracy of the answers provided by ChatGPT on the subject of orthodontic clear aligners. They reported that 58% of the answers to the queries were objectively true and 15% were false. They concluded that the overall accuracy of the answers was suboptimal and that the software has a limited ability to offer correct and up-to-date information on the searched subject; they also warned about the false claims provided by the tool [22]. Haita et al. [23] analysed clinical scenarios in interceptive orthodontics by directing 21 open-ended questions comprising various clinical cases to ChatGPT. Although the tool showed a good ability to generate answers to difficult clinical cases, they proposed that ChatGPT still cannot be regarded as sophisticated and is not intelligent enough to replace the mind of a human being [23]. Giannakopoulos et al. [24] examined the performance of LLMs in answering clinically relevant questions in different disciplines of dentistry. The outcomes revealed that, although ChatGPT-4 was more successful than ChatGPT-3.5, all chatbots exhibited inaccuracies and outdated content; the answers lacked reference sources, and irrelevant information was also detected [24]. The outcomes of these studies contradict the outcomes of the current study; however, this inconsistency might be related to the variations in the assessment tools used in the current study.
Since the available information on the use of space maintainers, their types, indications, and clinical applications is limited, it was not possible to verify the questions asked. The questions were directed to the assessed chatbots only once, so the consistency of the answers provided by the same tool at different times could not be evaluated. Furthermore, the answers of chatbots other than ChatGPT were not studied within this project. These issues can be listed as the limitations of the current study. Conducting further studies that eliminate these limitations would be a useful step before making any commitment to recruiting AI-based chatbots as advisors on health-related subjects specific to pediatric space maintainers.