In this study, we compare and evaluate the efficacy of different preprocessing techniques in CNN-based feature extraction and classification of DR stages, with the help of a baseline DCNN architecture. In Section 2, we investigated and identified different state-of-the-art preprocessing strategies commonly used by researchers in DL approaches for DR grading and classification tasks. In this work, we propose a new k-means-clustering-based retinal region extraction method and introduce two new preprocessing pipelines (combinations of preprocessing techniques) for contrast enhancement and intensity normalization.
4.1 Preprocessing Pipelines and Staging
Preprocessing strategies for enhancing and standardizing the retinal images precede the feature extraction and DR classification steps in the CNN (Figure 1).
4.1.1 Thresholding, Smoothing, Cropping and Resizing
The retinal region of interest (ROI) is extracted using a binary mask automatically generated for each input retinal image by a hybrid approach that relies on unsupervised learning as well as empirical estimation. The steps in the automated retinal ROI extraction are summarized as follows, with a code sketch after the list:
- k-means clustering (the optimal k was found empirically and is set to 3, where the cluster centers correspond to the background, the retinal region, and bright artifacts, respectively) is applied to the histogram-equalized (through CLAHE) and median-filtered image.
- The clustered image is then thresholded using the minimum non-zero cluster value, and the resulting binary mask is smoothed using morphological opening and closing operations.
- Another binary mask is generated by thresholding the input image at 10% of its maximum intensity; the resulting binary mask is smoothed by removing holes and white islands using morphological opening and closing operations.
- The final retinal ROI mask is generated by intersecting these two masks. The boundary of the circular mask is also eroded by 5% of its radius to remove illumination artifacts near the edges.
- Finally, the retinal images are masked with their corresponding final retinal ROI masks and cropped around the inner retinal circle using the mask boundaries, to obtain the retinal ROI and reject the unwanted background and noisy artifacts.
- The ROI-extracted images are scale-normalized and resized to 256×256 pixels using bilinear interpolation, while retaining fine-grained details for better feature extraction.
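A minimal OpenCV/NumPy sketch of this procedure, assuming a BGR input image, is given below. Only k = 3, the 10% intensity threshold, the 5% boundary erosion, and the 256×256 bilinear resize come from the text; the CLAHE settings, median-filter and morphology kernel sizes, and k-means termination criteria are illustrative assumptions, not the authors' exact values.

```python
import cv2
import numpy as np

def extract_roi(img, k=3, size=256):
    """Sketch of the k-means-based retinal ROI extraction (Section 4.1.1)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed
    enhanced = cv2.medianBlur(clahe.apply(gray), 5)              # assumed

    # k-means with k=3: background, retinal region, bright artifacts
    pixels = enhanced.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 3,
                                    cv2.KMEANS_PP_CENTERS)
    clustered = centers[labels.flatten()].reshape(enhanced.shape)

    # Mask 1: threshold at the minimum non-zero cluster centre, then smooth
    thresh = centers[centers > 0].min()
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    mask1 = (clustered >= thresh).astype(np.uint8) * 255
    mask1 = cv2.morphologyEx(mask1, cv2.MORPH_OPEN, kernel)
    mask1 = cv2.morphologyEx(mask1, cv2.MORPH_CLOSE, kernel)

    # Mask 2: threshold at 10% of the maximum intensity, then smooth
    mask2 = (gray >= 0.1 * gray.max()).astype(np.uint8) * 255
    mask2 = cv2.morphologyEx(mask2, cv2.MORPH_OPEN, kernel)
    mask2 = cv2.morphologyEx(mask2, cv2.MORPH_CLOSE, kernel)

    # Final mask: intersection, with the boundary eroded by ~5% of the radius
    mask = cv2.bitwise_and(mask1, mask2)
    x, y, w, h = cv2.boundingRect(mask)
    erode_px = max(1, int(0.05 * (max(w, h) // 2)))
    mask = cv2.erode(mask, np.ones((erode_px, erode_px), np.uint8))

    # Apply mask, crop to the ROI bounding box, resize bilinearly
    roi = cv2.bitwise_and(img, img, mask=mask)
    x, y, w, h = cv2.boundingRect(mask)
    roi = roi[y:y + h, x:x + w]
    return cv2.resize(roi, (size, size), interpolation=cv2.INTER_LINEAR)
```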
4.1.2 Combining Enhancement and Normalization
To identify the most effective preprocessing strategy for DR classification, we select several commonly used preprocessing strategies that have shown promising results in the reviewed DR works, and we also introduce two new preprocessing strategies for contrast enhancement and intensity normalization. In the preprocessing pipeline, we consider seven Contrast and Edge Enhancement (CEE) strategies –
Five existing preprocessing methods –
- Hybrid color space conversion (LGI)
- Method proposed by Graham et al.12 (GRAHAM)
- Non-Local Means Denoising (NLMD)
- Median filtering followed by CLAHE (MDNCLAHE)
- Distance-Based Illumination Equalization (DBIE) introduced by Zhou et al.18, followed by enhancement based on Graham et al.12 (DBIE_GRAHAM)
Two new preprocessing methods –
- Illumination Equalization on the median-filtered CLAHE output (MDNCLAHE_IE)
- Contrast and edge enhancement based on Graham et al.12 on the median-filtered CLAHE output (MDNCLAHE_GRAHAM)
In addition, applying no enhancement after ROI extraction (NONE) is also included as an option.
We use three normalization strategies (NORM) –
- Z-score normalization (ZScr)
- Min-Max normalization (MnMx)
- Rescaling (Rscl)
The different preprocessing pipelines, each consisting of a distinct combination of enhancement and normalization pairs {CEE, NORM}, are listed in Table 2.
The pipeline proceeds as follows: raw retinal images undergo ROI extraction and resizing, the output goes to the enhancement step (CEE), and the enhanced image then goes to the normalization (NORM) step, as sketched below. Each distinct preprocessing pipeline is applied to the train, validation, and test datasets before feeding the result to the ResNet-50. The outputs of ROI extraction and the different contrast and edge enhancement strategies are illustrated in Figure 4.
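A compact sketch of how a {CEE, NORM} pair could be dispatched is shown below. The normalization formulas are the standard ones; whether the statistics are computed per image (as assumed here) or dataset-wide is not specified in the text, and the CEE registry entries are hypothetical placeholders for the enhancement methods listed above.

```python
import numpy as np

# Hypothetical dispatch tables; each CEE entry would wrap one of the
# enhancement methods above (e.g. CEE["GRAHAM"] = graham_enhance).
CEE = {"NONE": lambda x: x}

NORM = {
    "ZScr": lambda x: (x - x.mean()) / (x.std() + 1e-8),           # z-score
    "MnMx": lambda x: (x - x.min()) / (x.max() - x.min() + 1e-8),  # min-max
    "Rscl": lambda x: x / 255.0,                                   # rescaling
}

def apply_pipeline(roi_img, cee_name, norm_name):
    """Apply one {CEE, NORM} pair from Table 2 to an ROI-extracted image."""
    x = CEE[cee_name](roi_img.astype(np.float32))
    return NORM[norm_name](x)
```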
Table 2. The {CEE, NORM} Pairs of the different Preprocessing Pipelines

SL. No. | {CEE, NORM}             | SL. No. | {CEE, NORM}
--------|-------------------------|---------|-------------------------
1       | {NONE, ZScr}            | 13      | {MDNCLAHE, MnMx}
2       | {LGI, ZScr}             | 14      | {MDNCLAHE_IE, MnMx}
3       | {GRAHAM, ZScr}          | 15      | {DBIE_GRAHAM, MnMx}
4       | {NLMD, ZScr}            | 16      | {MDNCLAHE_GRAHAM, MnMx}
5       | {MDNCLAHE, ZScr}        | 17      | {NONE, Rscl}
6       | {MDNCLAHE_IE, ZScr}     | 18      | {LGI, Rscl}
7       | {DBIE_GRAHAM, ZScr}     | 19      | {GRAHAM, Rscl}
8       | {MDNCLAHE_GRAHAM, ZScr} | 20      | {NLMD, Rscl}
9       | {NONE, MnMx}            | 21      | {MDNCLAHE, Rscl}
10      | {LGI, MnMx}             | 22      | {MDNCLAHE_IE, Rscl}
11      | {GRAHAM, MnMx}          | 23      | {DBIE_GRAHAM, Rscl}
12      | {NLMD, MnMx}            | 24      | {MDNCLAHE_GRAHAM, Rscl}
4.2 Implementation Details
The baseline ResNet-50 (pretrained on ImageNet21) is first trained on the preprocessed retinal images from the Kaggle EyePACs dataset with a 70%-30% split between train and validation data. For each of the 24 preprocessing pipelines, the model is trained separately for 100 epochs. Each of the 24 Kaggle EyePACs-pretrained ResNet-50 models is then further fine-tuned on the preprocessed retinal images from the APTOS training dataset with a 70%-10%-20% split between train, validation, and test data, for another 100 epochs per pipeline.
The top layers9 after the global average pooling layer of the pretrained model are removed and replaced by a dense layer with 1024 neurons, followed by a batch-normalization layer, a ReLU activation layer, and a dropout layer (dropout rate of 0.2). The final layer's weights are initialized according to He et al.22. Finally, a 5-class softmax classifier is added for complete DR grading.
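A sketch of this classification head in standalone Keras might look as follows. Placing He initialization on both dense layers and attaching the L2 regularizer (factor 0.001, stated later in this section) at construction time reflect our reading of the text, not verbatim author code.

```python
from keras.applications import ResNet50
from keras.layers import (Activation, BatchNormalization, Dense, Dropout,
                          GlobalAveragePooling2D)
from keras.models import Model
from keras.regularizers import l2

# ImageNet-pretrained backbone without its original top layers
base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(256, 256, 3))

x = GlobalAveragePooling2D()(base.output)
x = Dense(1024, kernel_initializer="he_normal",   # He et al. initialization
          kernel_regularizer=l2(0.001))(x)        # L2 factor from Section 4.2
x = BatchNormalization()(x)
x = Activation("relu")(x)
x = Dropout(0.2)(x)
outputs = Dense(5, activation="softmax",          # 5 DR grades
                kernel_initializer="he_normal",
                kernel_regularizer=l2(0.001))(x)

model = Model(inputs=base.input, outputs=outputs)
```

Extending the L2 penalty to the pretrained backbone layers would require rebuilding the base model from its config, a Keras caveat not shown here.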
For the binary classification tasks of DR screening and referable DR, the predicted labels and probabilities from the softmax classifier are grouped accordingly to produce the predicted classes and their probabilities (a grouping sketch follows). The schematic overview of the preprocessing pipelines and the DCNN framework for the classification task is illustrated in Figure 2. All models are trained and tested on a single NVIDIA GeForce GTX 1650 GPU using Keras 2.3.1 on the TensorFlow 1.14.0 backend. For each classification task and each preprocessing pipeline, the DCNN is fine-tuned end-to-end with the SGD momentum optimizer, with an initial learning rate of 0.001 and a fixed batch size of 8.
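A minimal sketch of the grouping for the two binary tasks is given below. The exact grade groupings (screening: grade 0 vs. grades 1-4; referable DR: grades 0-1 vs. grades 2-4) follow common DR convention and are our assumption, since the text says only that the outputs are grouped accordingly.

```python
import numpy as np

def group_binary(probs):
    """Collapse 5-class softmax outputs (grades 0-4) into binary scores.
    Assumed grouping: screening = any DR (grades 1-4) vs. none (grade 0);
    referable DR = grades 2-4 vs. grades 0-1."""
    p_screen = probs[:, 1:].sum(axis=1)
    p_referable = probs[:, 2:].sum(axis=1)
    return p_screen, p_referable
```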
The learning rate is reduced by a factor of 0.1 when the validation accuracy fails to improve for 10 consecutive epochs. An L2 weight-decay regularizer with a factor of 0.001 is applied to all layers.
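In Keras 2.3.1, this optimizer and schedule could be expressed as below; the momentum value is an assumption (the text specifies only SGD with momentum), and monitoring "val_accuracy" matches the stated validation-accuracy criterion.

```python
from keras.callbacks import ReduceLROnPlateau
from keras.optimizers import SGD

model.compile(optimizer=SGD(lr=0.001, momentum=0.9),  # momentum=0.9 assumed
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Reduce LR by 10x when validation accuracy plateaus for 10 epochs
lr_schedule = ReduceLROnPlateau(monitor="val_accuracy", factor=0.1,
                                patience=10)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, batch_size=8, callbacks=[lr_schedule])
```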
We also increase the effective number of training images to improve generalization and reduce overfitting. Random data augmentations, such as random rotations of 0-90 degrees, random horizontal and vertical flips, and random horizontal and vertical shifts, are employed to enforce rotation and translation invariance in the deep features. This also increases heterogeneity among the samples while preserving prognostic characteristics. Random oversampling of minority classes, together with augmentation, is used to address the class imbalance problem.
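These augmentations map directly onto Keras' ImageDataGenerator, as sketched below; the shift fractions and fill settings are assumed values, since the text does not quantify the shifts.

```python
from keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=90,       # random rotations in [0, 90] degrees
    horizontal_flip=True,    # random horizontal flips
    vertical_flip=True,      # random vertical flips
    width_shift_range=0.1,   # assumed: up to 10% horizontal shift
    height_shift_range=0.1,  # assumed: up to 10% vertical shift
    fill_mode="constant",    # assumed: pad with black, matching the ROI mask
    cval=0.0,
)

# Minority classes would be randomly oversampled (duplicated) in the
# training set before being fed to augmenter.flow(x_train, y_train,
# batch_size=8).
```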