Many skin lesion segmentation algorithms have been proposed in the past. These algorithms extract features from each image in a dataset and build a segmentation model accordingly. The model is then used to segment query images, and the accuracy and Dice score of the segmentation are analysed. Some of the related works are as follows.
The DullRazor algorithm (Lee, 1997) was one of the first solutions to the hair removal problem; using thresholds and morphological operations, it eliminates hairs from the lesion image. (Ali, 2022) presented a study on hair removal and lesion segmentation in skin cancer images. They stated that hair removal is critical because the CNN model would otherwise find correlations between the noise and the target (the skin cancer class); if the noise in the image is not removed, the CNN must learn to ignore it through gradient descent over a large image dataset. In another study (Alom, 2020), a Nabla-N network was proposed with enhanced fusion units in the decoder of the segmentation network. The segmented features are then fed into an Inception Recurrent Residual Convolutional Neural Network (IRRCNN) for final image classification; this study achieved an accuracy of 87% on the ISIC 2018 dataset. Anjum et al. first used a YOLOv2 model based on Open Neural Networks and the SqueezeNet model for localization of skin lesions. A pixel classification layer was used to compute the overlapping regions between the segmented and ground-truth images. Finally, classification was performed using ResNet-18 and ant colony optimization, with global accuracies of 0.93 and 0.95 on ISBI 2017 and ISBI 2018, respectively (Anjum, 2020). Using the Mask R-CNN architecture, Jojoa Acosta et al. first cropped the lesions from the skin image and then classified the cropped lesions using the pre-trained ResNet152 architecture; this model achieved 90.4% accuracy on the ISIC 2017 dataset (Acosta, 2021). In a different study, Malibari et al. proposed a deep model that used deep convolutional neural networks for skin lesion detection and classification. They used the U-Net architecture for segmentation, while pre-trained SqueezeNet-based whale optimization was applied in the classification step; they classified melanoma with 99% accuracy on the ISIC 2019 dataset (Malibari, 2022).
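The threshold-plus-morphology idea behind DullRazor can be illustrated with a minimal sketch. This is not the original implementation: it approximates hair detection with a grayscale blackhat transform (closing minus input) and replaces hair pixels with a median-filtered neighbourhood, whereas DullRazor itself uses bilinear interpolation along the hair mask. The `kernel_size` and `threshold` values are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def remove_hairs(gray, kernel_size=9, threshold=10):
    """Hedged sketch of DullRazor-style hair removal on a grayscale image.

    Hairs appear as thin dark structures: a morphological closing followed
    by subtraction (the "blackhat" transform) highlights them, a threshold
    yields a binary hair mask, and masked pixels are replaced from a
    median-filtered copy of the image.
    """
    closed = ndimage.grey_closing(gray, size=(kernel_size, kernel_size))
    blackhat = closed.astype(int) - gray.astype(int)  # strong response on thin dark hairs
    hair_mask = blackhat > threshold
    # Fill hair pixels from their local neighbourhood
    filled = ndimage.median_filter(gray, size=kernel_size)
    out = gray.copy()
    out[hair_mask] = filled[hair_mask]
    return out, hair_mask
```

On a flat skin patch with a one-pixel-wide dark line, the line is detected and inpainted while unaffected pixels are left untouched.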
A deep convolutional network was employed in a hybrid structure by Jayapriya and Jacob. The lesions were segmented using pre-trained fully convolutional networks, and a method was put forth to extract features from the lesions and classify them using an SVM. Experimental studies found that the accuracy of this hybrid technique was 88.92% on ISIC 2016 and 85.3% on ISIC 2017 (Jayapriya, 2019).
The authors of (Monika, 2020) proposed a method for the identification and classification of distinct forms of skin cancer using a multi-class support vector machine (MSVM). They used the ISIC 2019 challenge dataset and achieved an accuracy of about 96.25%. In (Vidya, 2020), the authors used different machine learning approaches, such as SVM, KNN, and a Naive Bayes classifier, to categorise skin lesions as benign or melanoma. Using SVM classifiers, a classification accuracy of 97.8% and an area under the curve of 0.94 were attained; with KNN, the sensitivity and specificity were 86.2% and 85%, respectively. Another research work (Vijayalakshmi, 2019) offered a method for removing hair, shading, and glare from images during the pre-processing stage, after which segmentation and feature extraction are performed. In the final step, the authors trained models based on the back-propagation technique (a feed-forward neural network), SVM, and CNN. After classification, the models were combined using image processing tools, yielding an accuracy of 85% on the ISIC dataset.
In (Kumar, 2022), images are pre-processed with a Gaussian filter and region-of-interest (ROI) extraction to remove noise and isolate the important regions. Segmentation is performed by integrating U-Net and RP-Net, and the outputs of both models are combined using a Jaccard similarity-based fusion model. Data augmentation is applied to increase the effectiveness of the detection process. SqueezeNet, trained with the proposed Aquila Whale Optimization (AWO) method, is then used to identify skin cancer; AWO was created by fusing the Aquila Optimizer (AO) and the Whale Optimization Algorithm (WOA). The AWO-based SqueezeNet outperformed the compared methods, with the best testing accuracy of 92.5%, sensitivity of 92.1%, and specificity of 91.7%. (Araújo, 2022) proposed a U-Net model along with LinkNet for skin cancer segmentation on three datasets, namely PH2, ISIC 2018, and DermIS. Compared in terms of the Dice coefficient, the proposed model obtained an average Dice of 0.923 on the PH2 dataset, 0.893 on ISIC 2018, and 0.879 on DermIS. In (Ries, 2022), the authors proposed a U-Net model for the segmentation of skin cancer images and an InSiNet model for the classification of melanoma images. The model was evaluated on the ISIC and HAM10000 skin cancer datasets and compared with similar techniques in terms of accuracy; the results reveal that the InSiNet architecture surpasses the other techniques, with 94.59%, 91.89%, and 90.54% accuracy on the ISIC 2018, 2019, and 2020 datasets, respectively. (Akyel, 2022) proposed a LinkNet-B7 model based on the EfficientNet-B7 encoder, evaluated on the ISIC and PH2 datasets. The EfficientNet model used is based on the mobile inverted bottleneck convolution, and this encoder is fused with a ResNet model for final segmentation. This model's training accuracy for noise removal and lesion segmentation was 95.72% and 97.80%, respectively.
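Since the works above are compared almost exclusively through the Dice and Jaccard scores, it may help to state both measures concretely. The following sketch computes them for binary masks; the small `eps` term, an assumption here, only guards against division by zero on empty masks.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def jaccard_index(pred, target, eps=1e-7):
    """Jaccard (IoU) = |A ∩ B| / |A ∪ B| for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)
```

The two scores are monotonically related (Dice = 2J / (1 + J)), which is why papers often report either one.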
A fully convolutional encoder-decoder network (FCEDN) with hyper-parameter optimization is suggested in (Muhakud, 2022) for the segmentation of dermatoscopy images. The novel Exponential Neighbourhood Grey Wolf Optimization (EN-GWO) algorithm is used to optimise the hyper-parameters of the FCEDN rather than setting them manually, with an emphasis on the right balance between exploration and exploitation. The authors validated EN-GWO against four variations of GWO-, GA-, and PSO-based hyper-parameter optimization approaches on the International Skin Imaging Collaboration (ISIC) 2016 and ISIC 2017 datasets. The proposed model segments skin cancer images on ISIC 2016 and ISIC 2017 with Jaccard coefficients of 96.41% and 86.85%, Dice coefficients of 98.48% and 87.23%, and accuracies of 98.32% and 95.25%, respectively. The authors of (Goyal, 2017) proposed an improved fully convolutional network that uses transfer learning and a custom hybrid loss function designed for multi-class semantic segmentation of skin lesion images. The results showed that the two-tier transfer learning FCN-8s achieved the best overall result, with Dice scores of 78.5% for the naevus category, 65.3% for melanoma, and 55.7% for seborrhoeic keratosis in multi-class segmentation, and an accuracy of 84.62% in melanoma recognition. In another study (Dimša, 2021), the authors used three types of U-Net, namely the base U-Net, U-Net++, and MultiResU-Net, for the segmentation of skin lesion images. In U-Net++ the skip connections are redesigned so that feature maps pass through a dense convolution block. In MultiResU-Net, 3x3 and 7x7 convolutions are carried out in parallel with 5x5 convolutions, and the outputs of all three convolutions are concatenated. In terms of the Dice coefficient, MultiResU-Net outperformed the other two U-Net variants.
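The parallel-convolution idea described for MultiResU-Net can be sketched outside any deep learning framework: the same input is filtered at three kernel sizes and the results are concatenated along a channel axis. The random kernels below are placeholders for learned weights, and the single-channel 2D input is a simplifying assumption.

```python
import numpy as np
from scipy import ndimage

def multires_block(feature_map, rng=None):
    """Hedged sketch of a MultiRes-style block: 3x3, 5x5, and 7x7
    convolutions run in parallel on the same input, with their outputs
    concatenated channel-wise. A real network learns the kernels; here
    they are random placeholders so the data flow can be demonstrated.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    outputs = []
    for k in (3, 5, 7):
        kernel = rng.standard_normal((k, k)) / (k * k)
        outputs.append(ndimage.convolve(feature_map, kernel, mode="reflect"))
    # Channel-wise concatenation: output shape (3, H, W)
    return np.stack(outputs, axis=0)
```

The concatenated output exposes responses at three receptive-field sizes to the next layer, which is the motivation given for the design.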
Dong et al. (Dong, 2021) introduced FAC-Net, a deep learning model that attained a 91.19% Dice coefficient on the ISIC 2018 dataset; unlike the U-Net architecture, FAC-Net employs upsampling techniques. Among the various modified LinkNet architectures, one notable example is D-LinkNet, which utilizes ResNet-34 as the encoder and incorporates a middle block containing dilated convolutional layers; in comparison to LinkNet34 (Zhou, 2018), D-LinkNet achieved a 2% higher accuracy rate. A related study by Xiong et al. (Xiong, 2021) presented DP-LinkNet, which shares similarities with D-LinkNet but utilizes a different centre block; it achieved a 0.9% higher accuracy rate than D-LinkNet, albeit with an increase in training time. Malik et al. (Malik, 2022) proposed a SegNet-based model for lesion segmentation. To address hair noise, they applied the DullRazor algorithm; however, the study revealed that DullRazor was not adequate for handling thin hairs. They nevertheless achieved a Dice accuracy of 88.43%, demonstrating that hair removal enhances accuracy. Hasan et al. (Hasan, 2020) introduced the Dermoscopic Skin Network (DSNet), an autonomous semantic segmentation network for skin lesion segmentation, along with a new loss function that combines intersection-over-union and binary cross-entropy; in their experiments, the proposed loss function produced higher true positive rates and had a significant impact on semantic segmentation. Meanwhile, Bagheri et al. (Bagheri, 2021) proposed a Mask R-CNN-based model in another study, achieving a notable 89.83% Dice accuracy on the PH2 dataset. Al-masni et al. (Al-Masni, 2018) suggested a novel segmentation methodology.
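A combined loss of the kind described for DSNet can be sketched as binary cross-entropy plus a soft (differentiable) intersection-over-union term. This is an illustration of the general technique, not DSNet's exact formulation: the 1:1 weighting of the two terms and the `eps` smoothing constant are assumptions.

```python
import numpy as np

def bce_iou_loss(pred, target, eps=1e-7):
    """Hedged sketch of a combined segmentation loss in the spirit of
    DSNet's objective: binary cross-entropy plus a soft IoU term.
    `pred` holds predicted foreground probabilities; `target` holds
    0/1 ground-truth labels.
    """
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    # Pixel-wise binary cross-entropy
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    # Soft IoU: products and sums over probabilities keep it differentiable
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    iou_loss = 1.0 - (intersection + eps) / (union + eps)
    return bce + iou_loss
```

The IoU term penalises region-level overlap errors that per-pixel cross-entropy alone underweights on small lesions, which matches the motivation reported for such combined losses.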
To increase pixel-wise segmentation performance, they employed full-resolution convolutional networks (FrCN), which learn the full-resolution features of each individual pixel of the input without the need for pre- or post-processing steps. They obtained high values on the evaluation metrics across the tested datasets.