In recent years, the use of AI systems for image interpretation in medicine and dentistry has attracted great interest, and CNN-based systems are rapidly changing these fields [23]. However, the number of studies in the field of periodontology is still limited. The current study aimed to evaluate panoramic radiographs automatically with an AI system and to determine how successfully this system detects radiographic findings of periodontal disease. The results of this study show that AI systems can serve as a decision-support mechanism for physicians/specialists in the diagnosis of periodontal disease, one of the most common diseases in the world.
Different architectures can be used in AI-based studies. The U-Net architecture, a CNN-based design, is an image segmentation technique that enables AI-assisted evaluation of medical images and achieves precise, successful results even with relatively small training sets [31, 32]. Therefore, in the current study, image processing was performed using the U-Net architecture.
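To make the encoder-decoder structure of U-Net concrete, the following minimal PyTorch sketch shows a two-level version with skip connections. It is an illustrative sketch only: the channel counts, depth, and input size are assumptions and do not reflect the configuration trained in this study.

```python
# A minimal, illustrative U-Net in PyTorch: a generic sketch of the
# encoder-decoder structure described in [31, 32], not this study's model.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self, n_classes=1):
        super().__init__()
        self.enc1 = double_conv(1, 32)    # grayscale radiograph input
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec2 = double_conv(128, 64)  # 64 upsampled + 64 skip channels
        self.up1 = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec1 = double_conv(64, 32)   # 32 upsampled + 32 skip channels
        self.head = nn.Conv2d(32, n_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # per-pixel logits; sigmoid yields a lesion mask

# One 256x256 grayscale image -> a segmentation map of the same size.
mask_logits = MiniUNet()(torch.randn(1, 1, 256, 256))
print(mask_logits.shape)  # torch.Size([1, 1, 256, 256])
```

The skip connections are what allow U-Net to recover fine boundary detail with comparatively little training data, which is why the architecture suits pixel-accurate delineation of bone defects.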
In the literature, there are many studies in which AI algorithms were used to process and interpret patient data for diagnosis and treatment planning, including dental radiographs [19, 33–35], intraoral photographs [36–39], and pathology images [40]. These systems have also been used for tasks such as pre-examination evaluation and risk estimation [41]. Among studies aiming to detect periodontal disease from 2D dental radiographs with AI systems, some were performed on periapical radiographs [20, 27, 42] and some on panoramic radiographs [3, 6, 24, 28, 29].
In one such study, Lee et al. (2018) used 1740 periapical radiographs to evaluate the predictive success of a CNN algorithm they developed and reported an accuracy of 81% for identifying periodontally compromised premolars and 76.7% for molars [26]. In a similar study, Lee et al. (2021) used 693 periapical radiographs and aimed to detect radiographic bone loss grouped by severity of destruction (stages I–III) [17]. Both studies reported high success rates in the detection of periodontal disease.
On the other hand, Khan et al. (2021) attempted to detect dental problems such as caries, alveolar bone loss, and interradicular bone loss on 206 periapical images using different AI architectures (U-Net, X-Net, and SegNet) [27]. Although that study resembled ours in using the segmentation method to detect periodontal bone loss, it did not evaluate the characteristics of bone destruction (horizontal, vertical) [27]. Another difference was that Khan et al. (2021) worked on periapical radiographs and with fewer data [27]. Although the evaluations were made on different types of radiographic images, the success rates of our study could be said to be higher. We think this may be due to the use of a larger data set, since the success of an AI model generally increases as the amount of training data grows [43].
In another study, Kurt Bayrakdar et al. (2021) used the GoogLeNet Inception-v3 architecture on a large data set (2276 panoramic radiographs) and evaluated the success of their AI system in identifying radiographs of patients with periodontitis/alveolar bone destruction [6]. In that study, 1137 radiographs of patients with bone destruction and 1139 panoramic radiographs of periodontally healthy individuals were used to develop the AI model, and both the F1 score and accuracy reached 91% [6]. However, no labeling was performed; only a classification approach was used, training the system to sort radiographs as patient/healthy. In this respect, that study's design differs considerably from ours. A review of the literature shows that computer vision offers different techniques, such as classification, object detection, and image segmentation [44, 45]. Classification operates at the yes/no level, deciding whether an image contains an object (pathology, anatomical structure, etc.). Object detection distinguishes between objects on an image, but it does so by simply surrounding each object with a box, without defining its boundaries. In segmentation, images are divided into pixel groupings, making it possible to delineate an object from its exact boundaries [44, 45]. The image segmentation technique plays a significant role in identifying a disorder, planning treatment, and performing routine follow-ups in the medical field because it presents disorders/pathologies with all their borders defined [46]. The reason it is not preferred in some studies is that the labeling is time-consuming and tedious [47]. Based on this information, the segmentation technique used in our study can be considered the most advantageous method, providing the physician with more detailed information for diagnosis and treatment planning.
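The following toy Python snippet contrasts the output forms of the three techniques for a single radiograph; all labels, coordinates, and mask values are hypothetical and serve only to make the distinction concrete.

```python
# Conceptual contrast of the three computer-vision output forms discussed
# above, using made-up toy values.
import numpy as np

# Classification: one label for the whole radiograph (yes/no level).
classification_output = "periodontitis"  # or "healthy"

# Object detection: a box around each finding, without exact boundaries.
# Each entry: (label, x_min, y_min, x_max, y_max) in pixel coordinates.
detection_output = [("vertical_defect", 412, 530, 468, 602)]

# Segmentation: a pixel-wise mask delineating the exact borders of the
# defect; here, a small binary mask for illustration.
segmentation_output = np.zeros((256, 256), dtype=np.uint8)
segmentation_output[130:150, 100:118] = 1  # pixels belonging to the defect

# Only the mask supports border-accurate measures such as defect area.
print("defect area (pixels):", int(segmentation_output.sum()))
```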
There are also studies in the literature comparing the diagnoses of physicians with different levels of experience against AI model predictions. Results obtained in this way are undoubtedly more interpretable and reveal the success of the system more clearly. For example, Krois et al. (2019) compared the evaluations of six dentists with the results of a CNN-based AI system on panoramic radiographs and reported accuracy, specificity, and sensitivity of 81% for the AI system [24]. Since that study involved the evaluations of many physicians, it was more comprehensive in terms of planning. However, it did not classify periodontal bone loss as horizontal or vertical. Unlike that study, one of the main purposes of our study was to guide the physician in treatment planning by enabling the system to identify defects as vertical, horizontal, or furcation defects.
In another similarly designed study, Kim et al. (2020) compared the evaluations of five clinicians with the performance of AI in determining bone resorption sites using 12,179 panoramic radiographs [28]. They reported that while the clinicians' average F1 score for determining bone destruction was 69%, the AI showed higher success with an F1 score of 75%. This study demonstrates the success of AI systems in radiograph interpretation and shows promise for their future use [28]. Also, Chang et al. (2020) attempted to determine bone loss from panoramic radiographs without evaluating bone destruction angulation or defect type [3]. In that study, staging according to the 2017 periodontitis classification was attempted by calculating the amount of bone destruction [3]. Bone destruction geometry is important for staging in the new classification; evaluations of vertical, horizontal, and furcation defects, as in our study, can therefore provide more accurate and systematic results. This should be taken into account in future studies.
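For readers unfamiliar with the metric, the short sketch below shows how an F1 score of this kind is derived from precision and recall; the counts are invented for illustration and are not taken from any of the cited studies.

```python
# How an F1 score like those reported by Kim et al. (2020) is computed
# from detection counts; the numbers below are toy values.
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)  # share of predicted defect sites that are real
    recall = tp / (tp + fn)     # share of real defect sites that were found
    return 2 * precision * recall / (precision + recall)

# Toy example: 75 true positives, 25 false positives, 25 false negatives
# give precision = recall = 0.75, hence F1 = 0.75 (i.e., 75%).
print(round(f1_score(75, 25, 25), 2))  # 0.75
```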
Finally, Jiang et al. (2022) attempted to detect periodontal bone destruction using a CNN model on 640 panoramic radiographs. This study is the most similar to ours in the literature [29], because Jiang et al. (2022) also classified periodontal defects as vertical, horizontal, or furcation defects [29]. Although it resembled our study in this respect, the object detection method was used for labeling. The segmentation method we used is much more advantageous because it delineates the defective area with its borders, providing more detailed information for assessing disease severity and planning subsequent treatment; in addition, the defect area is rendered like a detailed map, offering stronger visual diagnostic support. On the other hand, a detailed comparison of the two studies shows that Jiang et al. used multiple observers and two different AI architectures (U-Net and YOLOv4) and included disease severity in their assessment. In these respects, their study could be considered more detailed and superior to ours.
One limitation of the study was that the decisions of multiple observers were not separately compared with the predictions of the AI. Another was that no measurements were made to determine disease severity; the study aimed only to detect areas of bone loss. More extensive research could have been conducted using calibrated panoramic radiographs, based on direct measurement, or by determining the percentage of the root affected. However, it should be noted that in all 2D radiographic imaging techniques, the evaluation of bone craters, lamina dura, and periodontal bone level is limited by projection geometry and the superposition of adjacent anatomical structures [48]. In other words, these radiographs do not provide clear and reliable information for measurement and treatment planning. For this reason, performing AI-based studies with three-dimensional (3D) radiographic imaging techniques such as cone-beam computed tomography (CBCT) will overcome this limitation. Undoubtedly, as the number of studies in this field increases, AI algorithms with stronger decision-support capability, providing much more detailed information, will be developed. Despite its limitations, our study is promising for such future work.

In addition, the performance of the AI model in determining vertical bone defects in our study was weaker than for the other periodontal parameters. Vertical bone defects are known to be less common than horizontal bone loss [14]; consequently, the number of labels for vertical bone defects in our study was more limited, and we think this caused the model's weaker performance, since the success of AI studies depends above all on large data sets. This limitation could be mitigated, albeit to a limited extent, by using different AI architectures and different technical plans; for example, cross-validation techniques could be used during model development and multi-class training could be performed (see the sketch below). On the other hand, a more important reason for the low success rates for the vertical bone defect parameter may be the difficulty of discerning the outline of the alveolar bone crest in 1-, 2-, and 3-walled defects. We think that interobserver agreement was also lower for this parameter because of this limitation, which may stem from the limited view of panoramic radiography. This diagnostic difficulty could be eliminated in future studies by using radiographic techniques that provide more detailed images.
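As a hypothetical illustration of the cross-validation strategy suggested above, the following sketch uses scikit-learn's KFold with placeholder arrays (radiograph_features, labels); none of it reflects the actual pipeline of this study.

```python
# A minimal k-fold cross-validation sketch with scikit-learn; the data
# below are random placeholders, not radiographs from this study.
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
radiograph_features = rng.normal(size=(100, 16))  # 100 dummy samples
labels = rng.integers(0, 2, size=100)             # dummy defect/no-defect labels

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(radiograph_features)):
    # In a real study, a model would be trained on train_idx and evaluated
    # on val_idx; averaging performance across folds reduces the risk of an
    # unlucky single split when labeled data (e.g., vertical defects) are scarce.
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation")
```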