In this study, we present SCC-Net, a new convolutional neural network designed specifically for the automatic segmentation of head and neck SCC. The proposed method successfully segmented the entire UADT, including the oral cavity, oropharynx, hypopharynx, and larynx, using a single model with high mIOU and DSC.
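For reference, both reported metrics quantify the overlap between the predicted tumor mask and the expert-annotated ground truth. The following is a minimal, illustrative sketch of how IoU and DSC are conventionally computed from binary masks; it is not the exact evaluation code used in this study.

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Overlap metrics for binary segmentation masks (True = tumor pixel)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = intersection / (union + eps)                       # |P ∩ G| / |P ∪ G|
    dice = 2 * intersection / (pred.sum() + gt.sum() + eps)  # 2|P ∩ G| / (|P| + |G|)
    return iou, dice

# mIOU is then the mean IoU taken over images (or over classes):
# miou = np.mean([iou_and_dice(p, g)[0] for p, g in zip(preds, gts)])
```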
Endoscopy plays a pivotal role in the diagnosis and management of head and neck cancer. However, applying deep neural networks to endoscopic image analysis poses significant challenges due to variable viewing angles, the presence of bubbles and fluid, optical artifacts such as light reflections and shadows, and variability in image quality, including inadequate resolution, lack of sharpness, and variations in RGB resolution. A robust training dataset encompassing these obstacles could address them, but no diverse and representative dataset capturing the variability of oral cancer cases has been published. A systematic review examined 332 published articles and datasets on oral cancer image analysis and identified only one publicly available oral cancer image dataset.[25] This scarcity contrasts sharply with the abundance of colonoscopy and panendoscopy datasets, which have been widely used in artificial intelligence research precisely because of their availability (Table 3). The lack of large oral cancer datasets poses a significant challenge for deep neural network training in this domain. Efforts to bridge this gap are crucial to unlock the full potential of artificial intelligence in oral cancer management, enabling more accurate and efficient diagnosis, treatment planning, and patient monitoring.
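Short of collecting a fully representative dataset, one common way to harden a model against such artifacts is aggressive photometric and geometric augmentation during training. The sketch below uses the open-source albumentations library; the specific transforms and parameters are illustrative assumptions, not the augmentation pipeline of this study.

```python
import albumentations as A

# Illustrative augmentations approximating endoscopic nuisances:
# variable viewing angles, illumination shifts, blur, and low resolution.
train_transform = A.Compose([
    A.Rotate(limit=30, p=0.5),          # variable scope angles
    A.RandomBrightnessContrast(p=0.5),  # illumination variability
    A.HueSaturationValue(p=0.3),        # RGB/color variation
    A.GaussianBlur(p=0.3),              # lack of sharpness
    A.Downscale(p=0.2),                 # inadequate resolution
])

# Applied jointly so the tumor mask stays aligned with the image:
# augmented = train_transform(image=image, mask=mask)
```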
Table 3. Openly available clinical photo datasets

| Domain | Dataset |
|---|---|
| Panendoscopy | Esophageal Endoscopy Images |
| | Capsule endoscopy dataset |
| | KVASIR, N=8,000 |
| | ERS, N=6,000 |
| Glottis | Benchmark for Automatic Glottis Segmentation (BAGLS), N=59,250 |
| Colonoscopy | MICCAI 2017 |
| | CVC colon DB, N=612 |
| | SUN-SEG, N=158,690 |
| | GLRC dataset |
| | KUMC dataset |
| Oral cancer | SHIVAM BAROT Oral Cancer images (tongue & lip), N=87 |
Segmentation is a crucial prerequisite for autonomous diagnosis and for various computer- and robot-aided interventions. However, head and neck cancer segmentation from endoscopic images is considered a challenging task because tumor size and shape vary widely among patients. Although oropharyngeal and glottic tumors are rare, the miss rate for smaller tumors during endoscopic examination, particularly in the oropharynx and hypopharynx, is another issue that needs to be addressed. An automatic algorithm that segments malignant lesions during endoscopic examination could therefore aid diagnosis, especially during pharyngeal examination.
In recent years, convolutional neural network (CNN)-based classification of endoscopic images has garnered significant attention. Song et al. developed a smartphone-based system for automatic classification of oral cavity (OC) lesions using CNNs, achieving an accuracy of 87%, sensitivity of 85%, and specificity of 89% by evaluating dual-modality images (white light and autofluorescence).[26] Mascharak et al. used naïve Bayesian classifiers trained on low-level image features to automatically detect and quantitatively analyze oropharyngeal SCC in narrow-band imaging (NBI) multispectral data, demonstrating higher diagnostic accuracy than conventional white-light video-endoscopy.[27] Ren et al. collected a large dataset of 24,667 laryngoscopic images and trained a CNN-based classifier that outperformed clinical visual assessment by 12 otolaryngologists, achieving an overall accuracy of 96.24%.[28] Inaba et al. employed RetinaNet to detect superficial laryngopharyngeal cancer against normal pharyngeal mucosa, achieving an accuracy of 97.3%.[29] Kono et al. used a combination of 1,243 white-light images and 3,316 NBI images to train a Mask R-CNN model for cancer detection, yielding sensitivity, specificity, PPV, NPV, and accuracy of 92%, 47%, 55%, 89%, and 66%, respectively.[30] Heo et al. trained 12 CNN classification algorithms on 5,576 tongue endoscopic images, including 1,941 pathologically proven cancer lesions; the deep learning model achieved an accuracy of 84.7%, while general physicians and oncology specialists achieved 75.9% and 91.2%, respectively.[10] Recently, Flügge et al. used a Swin Transformer-based deep learning approach to automatically detect oral SCC on clinical photographs; with a classification accuracy of 0.986 and an AUC of 0.99, the method shows promise for assisting clinicians in the early detection of oral cancer.[31]
Despite the successful detection and classification of oral SCC, publications on segmentation of oral SCC in clinical photographs remain scarce. This study represents a significant advancement as the first attempt to validate a deep learning segmentation model capable of achieving accurate results across the oral cavity, oropharynx, hypopharynx, and glottis in SCC. To date, only two publications have described deep learning segmentation of the UADT. Paderno et al. analyzed 34 and 45 narrow-band imaging endoscopic videos of oral cavity and oropharynx lesions, respectively, and reported a DSC of 0.7603.[13] Muhammad et al. published results from a novel deep learning segmentation model (SegMENT), reporting segmentation of laryngeal SCC with median values of 0.68 for intersection over union (IoU), 0.81 for dice similarity coefficient (DSC), 0.95 for recall, 0.78 for precision, and 0.97 for accuracy, with additional results for oral cavity and oropharynx SCC shown in Table 4. Our proposed SCC-Net demonstrated superior mIOU and recall compared with all previous deep learning results. However, its precision did not perform as well, for which we offer two possible explanations. First, because the model is intended as a screening tool in which high sensitivity (recall) is crucial, precision (positive predictive value) may have been compromised to avoid missing potential lesions. Second, our dataset consisted primarily of oral cavity photographs encompassing anatomically complex subsites such as the lip, tongue, buccal mucosa, gingiva, retromolar trigone, and hard palate; precision may have been affected by variability across these subsites. This observation aligns with previously reported results, in which the oral cavity dataset exhibited a poor precision of 0.602. Nevertheless, surgical management of SCC requires an adequate safety margin for excision; a predicted mask slightly larger than the ground truth should therefore not pose a risk of inadequate excision if used for surgical planning.
Table 4. Comparison of head and neck squamous cell carcinoma segmentation performance

| Publication | Task | mIOU | DSC | Recall | Precision |
|---|---|---|---|---|---|
| Paderno [13] | Laryngeal SCC NBI lesion | | 0.7603 | | |
| Muhammad [32] | Laryngeal SCC | 0.686 | 0.814 | 0.951 | 0.785 |
| Muhammad [32] | OCSCC | 0.749 | 0.598 | 0.905 | 0.602 |
| Muhammad [32] | OPSCC | 0.784 | 0.879 | 0.907 | 0.933 |
| SCC-Net | OC/OP/HP/Larynx SCC | 0.872 | 0.868 | 0.9715 | 0.69 |

SCC: squamous cell carcinoma; OC: oral cavity; OP: oropharynx; HP: hypopharynx
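The screening-oriented tradeoff discussed above can be made concrete: a pixel-wise segmentation network typically outputs a tumor probability map that is binarized at a decision threshold, and lowering that threshold raises recall at the expense of precision. The following is a minimal, hypothetical sketch of this behavior, not SCC-Net's actual post-processing.

```python
import numpy as np

def precision_recall(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Pixel-level precision and recall for binary masks."""
    tp = np.logical_and(pred, gt).sum()
    return tp / (pred.sum() + eps), tp / (gt.sum() + eps)

# Lowering the threshold marks more pixels as tumor, trading precision
# for recall -- the desirable direction for a screening tool.
# prob_map: per-pixel tumor probabilities output by the network.
# for t in (0.5, 0.3, 0.1):
#     p, r = precision_recall(prob_map >= t, gt_mask)
```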
In the era of multidisciplinary diagnosis and treatment for cancer patients, those with head and neck cancer routinely undergo check-ups across multiple subspecialties. While CT and MRI have been the gold standard for cancer staging, clear gross images of mucosal lesions remain indispensable for clinical decisions regarding surgical and adjuvant treatment. Our study proposes a new method for oral cancer image segmentation that combines neural architecture search (NAS) with U-Net for the automatic segmentation of tumors in the oral cavity, oropharynx, hypopharynx, and glottis. This represents the largest cohort of pathologically confirmed cancer segmentation from endoscopic photographs to date.
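For readers unfamiliar with the backbone topology, the sketch below shows a minimal U-Net-style encoder-decoder in PyTorch. It is purely illustrative and is not SCC-Net itself: in a NAS-based design the operations inside each block are selected from a search space rather than fixed as the plain 3x3 convolutions used here, and the deployed network is deeper and wider.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # In a NAS setting, these fixed 3x3 convolutions would be replaced
    # by operations chosen by the architecture search.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level U-Net: contracting path, bottleneck, expanding path with skip connections."""
    def __init__(self, in_ch=3, num_classes=1, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))  # per-pixel tumor probability map
```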
Our current study has several limitations. First, the datasets were relatively small and partially unbalanced across patients, implying a high level of variability. In addition, the ground truth was defined through secondary review opinions. Visual confounders such as shadows, reflections, blurriness, and varying illumination can adversely affect the performance of convolutional neural networks (CNNs) and the quality of segmented oral tumors; we observed that all CNNs tended to detect malignant areas more readily under certain illumination conditions. Independent annotation by multiple experts may yield a more accurate definition of endoscopic tumor margins.