In recent years, AI has been used in the field of pathologic image diagnosis, and many studies have shown promising results in detecting and diagnosing cancers in a variety of organs, including the stomach, breast, skin, prostate, brain, and lung [13, 22–24]. As for cervical cancer, with the advancement in the management of preinvasive lesions, the increasing diagnostic workload of cervical biopsy calls for the development of high-performance algorithms with high sensitivity and specificity. Practically, many pathologists experienced more difficulty and burden in accurately classifying preinvasive CIN lesions than in distinguishing between invasive and non-neoplastic condition [4]. Therefore, we focused on developing an optimized CNN system for CIN grading.
Two CNN architectures, DenseNet-161 and EfficientNet-B7, were adopted in our study. EfficientNet-B7 is a recently developed heavy model and a state-of-the-art architecture; it showed better performance than DenseNet-161 [25]. However, DenseNet-161 also showed excellent performance and presented good cost-effectiveness [26]. Through repeated validation and tests of two CNN models, we determined the optimal image preprocessing conditions (640 × 480 pixels size and normalization of each RGB color channel in the ImageNet dataset) in which the CNN models can achieve better performance. In addition, we found that data augmentation and histogram equalization did not improve the model performance. In the 4-class classification, the mean accuracies for DenseNet161 and EfficientNet-B were 88.5% and 89.5%, respectively, and the performance was similar to that of human pathologists (93.2% and 89.7%, respectively). The mean AUC values of both CNN models were considerably high in all four classes (Table 1). Furthermore, the mean accuracies of both models for 3-class classification were increased to 91.4% and 92.6% by DenseNet-161 and EfficientNet-B7, respectively, which are almost the same levels as those of human pathologists (95.7% and 92.3%, respectively).
In the natural clinical course, CIN1 has a low potential for progression and a high potential for regression, whereas CIN2 and CIN3 have a higher potential for progression and a lower potential for regression [3]. CIN3 is a direct precursor of invasive cervical cancer, and active treatment is recommended. By contrast, CIN1 observation is the preferred approach. Therefore, it is much more critical to quickly determine CIN1 or CIN3 than CIN2 or CIN3. Despite some misclassification of CIN2, EfficientNet-B7 perfectly discriminated CIN1 from CIN3 based on the 4-class classification (Fig. 2), and the results demonstrated its clinical applicability.
We observed that the CNN models have a weakness in classifying CIN2 (75.2% and 73.0% sensitivities for DenseNet-161 and EfficientNet-B7, respectively); considering that it is often challenging for pathologists to distinguish CIN2 from CIN1 and CIN3 and inter-observer agreement is notoriously poor at this interface, even among experts [4], the performance of these CNN models is almost similar to that of human pathologists even in this respect. Moreover, the difficulty in classifying CIN2 can be attributed to its inherent nature, which is intermediate in the morphological spectrum of CIN. Due to the ambiguity of CIN2 diagnosis based on the hematoxylin and eosin (H&E) morphology, the LAST Project suggested that the addition of p16 immunohistochemical stain significantly improves the reliability of CIN2 diagnosis and advised the use of p16 staining to confirm the presence of a high-grade lesion when CIN2 is diagnosed based on H&E slide [2]. In future studies, analyzing H&E images along with the results of p16 immunohistochemical staining would be helpful to increase diagnostic accuracy of CNN models.
For determining the CIN1 lesions, the mean sensitivities were 82.1% and 85.2% by DenseNet-161 and EfficientNet-B7, respectively, which were lower than those of CIN3 and non-neoplasms. On histologic review, the scarcity of characteristic koilocytotic cells in CIN1, severe inflammation, and metaplastic changes might have contributed to the inaccuracy of CNN classification. For more precise detection of koilocytotic cells, the CNN model needs to be improved. To reduce the false-positive rate, more variable non-neoplastic lesions, such as chronic cervicitis, metaplastic mucosa, and atrophy, should be included in the study set, and a repeat validation would be helpful.
Automated screening machines have been developed for analyzing cervical cytology smears, and a few FDA-approved automated primary screening device are available [14]. However, it is more difficult to develop an automated tool for cervical tissue histology due to the complexity of the patterns observed and the structural associations between different tissue components [17]. Keenan et al. developed a machine vision system for histological grading of CIN using the KS400 macro programming language. It was a scoring system that analyzes geometric data, and 62.7% of the CIN cases with captured images were correctly classified [17]. Several previous studies have used multiclass support vector machines and gray-level co-occurrence matrices to analyze whole slide images (WSIs) or selected images [18, 21, 27]. Despite some promising results, the small data size of less than 100 cases with insufficiently validated or curated images and the extremely complicated methodology limited the applicability of the study results. Huang et al. proposed a method based on the least absolute shrinkage and selection operator and ensemble learning support vector machine [19]. They showed that the accuracy of normal-cancer classification was high (99.64%), but the accuracy of the low-grade squamous intraepithelial lesion (LSIL)-high-grade squamous intraepithelial lesion (HSIL) classification was 76.34%. A recent study that classified cervical tissue pathological images based on fusing deep convolution features has been published [28]. The researchers analyzed the dataset comprising small-sized images cropped from 468 WSIs, including those of normal tissues, LSIL, HSIL, and cancer; Resnet50v2 and DenseNet121(C5) showed excellent performance, with an average classification accuracy of 95.33%.
Pathologic classification is an image-based method, and CNN is an optimized AI tool for image learning. Our study showed that CNN is a robust instrument for pathologic classification, but some things must be considered. For CNN to be developed and to work properly, collecting a large amount of accurate data is of utmost importance. Because CNN produces results very faithfully in the learned input, the quality of the CNN output absolutely depends on the quality of the input data. In order to develop a clinically relevant CNN model for pathologic diagnosis, a superb dataset from expert pathologists must be constructed. Recently, Meng et al. provided a public cervical histopathology dataset for computer-aided diagnosis, called MTCHI [20]. Pathologic diagnosis is sometimes equivocal and might be challenging to perform in some lesions in the gray zone or lesions with reactive changes. Therefore, pathologists should continue to improve and make an objective pathological diagnosis. A limitation of high-quality H&E slide images is the need for using AI to perform a pathologic diagnosis. Although staining and mounting are automated, preparing pathology slides, sectioning, and embedding are still manually performed. Artifacts in the production process, such as tissue overlapping, tangential embedding, and poor sectioning, hinder the acquisition of focused images and cause AI to make diagnostic errors.
Hence, we aimed to develop an artificial technique for classifying CIN from the WSI of cervical biopsy, but some practical difficulties were observed. In the WSIs of tissues, grading of intraepithelial neoplasia or dysplasia is much more complicated than finding lesions or cancer. Because CIN is a morphological spectrum, cervical biopsy specimens show large differences in disease degrees and mix of lesions. This makes it difficult for pathologists to precisely annotate according to the CIN grade in small biopsies. Compared with other tissues such as the breast, colon, and stomach, the specimen used for cervical biopsy are tissue strips or appear irregular in shape and often include a small amount of epithelium. In addition, it is easily embedded in a disoriented or tangential manner. These were obstacles in making a standardized dataset using WSIs suitable for training and validation of the CNN model. In this study, we built a reliable dataset of CIN provided by three qualified pathologists and analyzed the CNN performance prior to its application in WSI. The dataset and the advanced CNN model, EfficientNet-B7, might be applied in future research using the WSI of cervical biopsy.
In summary, we built a reliable dataset for CIN classification and showed that EfficientNet-B7 and DenseNet-161 provided a promising performance in classifying cervical lesions on digital histology images. In terms of accuracy, EfficientNet-B7 had a functional advantage over DenseNet-161. Grad-CAM images used in the CNN models located the areas where CIN lesions can be found. Moreover, we realized that the accurate identification and classification of CIN by CNN relies entirely on the standardized diagnosis of pathologists, and the professional knowledge and analytical experience of pathologists are the cornerstone of technical advancement. An exquisite AI tool trained using a well-established and standardized dataset would be helpful in improving the pathology services worldwide.