This article develops an efficient framework for the classification of different lung diseases. Chest X-ray images were used as input for deep feature extraction using TL algorithms [28]. Features were extracted and selected using TL and GWO, respectively. The efficacy of the proposed methodology is evaluated using Precision, Sensitivity, and Specificity scores. In addition, the results were compared to existing state-of-the-art methods.
3.1 Dataset Description (NIH Chest X-Ray)
This research utilizes the NIH Chest X-ray Dataset, a collection of 112,120 X-ray scans of size 1024x1024, annotated with disease labels, from 30,805 individuals. These labels were generated by the dataset's authors using Natural Language Processing (NLP) to extract disease classifications from the radiology reports associated with the images. More detail about the dataset can be found on the official site at [https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345]. Out of this dataset, we have used only 3 classes, Cardiomegaly, Emphysema, and Hernia, totalling 5464 images of size 1024x1024, which are then processed further.
3.2 Data preprocessing
The NIH Chest X-ray Dataset originally consists of 14 different classes. This work focuses on only 3 of these 14 classes: Cardiomegaly, Emphysema, and Hernia. These classes were selected because they were the top-performing classes on the dataset in terms of AUC score.
Table 1
Detailed sample-wise distribution of the dataset before oversampling

| Dataset | Total Images | Cardiomegaly | Emphysema | Hernia |
| --- | --- | --- | --- | --- |
| NIH Chest X-ray | 5464 | Images = 2776 | Images = 2516 | Images = 227 |
| | | Class = 1 | Class = 2 | Class = 3 |
| | | Label = 0 | Label = 1 | Label = 2 |
It can be clearly seen from Table 1 and Figure 2(a) that the dataset is highly imbalanced. To solve this problem, we used the random oversampling method, which replicates minority-class samples. The balanced dataset is shown in Table 2 and Figure 2(b).
Table 2
Detailed distribution of the dataset after random oversampling

| Dataset | Total Images | Cardiomegaly | Emphysema | Hernia |
| --- | --- | --- | --- | --- |
| NIH Chest X-ray | 7734 | Images = 2846 | Images = 2556 | Images = 2497 |
| | | Class = 1 | Class = 2 | Class = 3 |
| | | Label = 0 | Label = 1 | Label = 2 |
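As a concrete illustration, random oversampling can be sketched as below. This is a minimal sketch, not the authors' exact implementation: it replicates minority-class samples until all classes are fully equalized, whereas the counts in Table 2 suggest the classes were only approximately balanced.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Replicate minority-class samples at random until every class
    matches the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for c, n in zip(classes, counts):
        idx = np.where(y == c)[0]
        # sample with replacement to make up the shortfall for class c
        extra = rng.choice(idx, size=target - n, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.concatenate(X_parts), np.concatenate(y_parts)
```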
3.2.1 Data Augmentation
Data augmentation [31] expands the training set by creating new versions of datasets from preexisting data. It involves making small adjustments to the dataset or generating new data points. After preprocessing, the dataset is passed to the image data generator, where images are rescaled by 1/255, augmented using shear_range = 0.2, zoom_range = 0.2, rotation_range = 24, horizontal_flip = True, and vertical_flip = True, and then resized to a target size of 256x256. Sample images after these augmentation operations are shown in Figure 3, organized by class label and class name.
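The augmentation settings above map directly onto Keras's ImageDataGenerator. A minimal sketch with the stated parameters follows; the directory path and batch size are placeholders, not values from the paper.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation as described above: rescaling plus shear, zoom,
# rotation, and horizontal/vertical flips.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    shear_range=0.2,
    zoom_range=0.2,
    rotation_range=24,
    horizontal_flip=True,
    vertical_flip=True,
)

# Hypothetical directory layout with one subfolder per class
# (Cardiomegaly, Emphysema, Hernia); images resized to 256x256.
train_gen = datagen.flow_from_directory(
    "data/train",            # placeholder path
    target_size=(256, 256),
    batch_size=32,           # assumed batch size
    class_mode="categorical",
)
```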
3.3 Proposed Framework
The framework of the study is demonstrated in Figure 4 and comprises four modules. The first module covers data genesis. In the second module, data preparation is carried out. The third module concerns making the data compatible with TL. The last module applies the proposed approach to subject classification. Finally, the proposed model is contrasted with different methods using evaluation metrics, comparing each class separately, and is also contrasted with earlier research.
3.3.1 Feature Extraction using Residual Learning Models
Dual feature extraction is applied in this study by implementing two fine-tuned TL models. Both models are trained for 50 epochs each with the whole model trainable (trainable = True), because the minority class has been oversampled and there are enough samples to train the full network.
-
The architecture of ResNet-50 [32] is a bottleneck architecture that serves as the basis for the 50-layer ResNet's building block. The inclusion of 1x1 convolutions, sometimes known as a "bottleneck", in a bottleneck residual block helps minimise the total number of parameters as well as the number of matrix multiplications. Because of this, each layer can be trained considerably more quickly. After removing the last two layers from the trained model, all of the data is passed through the model and features of shape 2048 are extracted (a sketch covering both models follows this list).
-
The ResNet-101 [33] network is a convolutional neural network with a total of 101 layers. A version of the network pretrained on more than a million images from the ImageNet database can be loaded. The pretrained network can classify images into one thousand distinct object categories, such as "keyboard", "mouse", "pencil", and many animals. As a result, the network has learned rich feature representations for a wide range of image types. The network's default input image size is 224x224. After removing the top two layers of the trained model, all of the data is passed through the model and features of shape 2048 are extracted.
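The dual extraction described above can be sketched as follows in Keras. This is a simplified sketch, not the authors' exact code: the width of the intermediate dense head (512) and the training configuration are assumptions, and the fine-tuning schedule is reduced to a single fit call.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_and_extract(base_cls, data_gen, epochs=50):
    """Fine-tune a ResNet backbone end-to-end, then strip the final dense
    layers and return the 2048-dim pooled features for every image.
    base_cls is tf.keras.applications.ResNet50 or ResNet101."""
    base = base_cls(weights="imagenet", include_top=False,
                    input_shape=(256, 256, 3))
    base.trainable = True                             # whole model trainable
    x = layers.GlobalAveragePooling2D()(base.output)  # 2048-dim vector
    x = layers.Dense(512, activation="relu")(x)       # assumed head width
    out = layers.Dense(3, activation="softmax")(x)    # 3 disease classes
    model = Model(base.input, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(data_gen, epochs=epochs)
    # "remove the last two layers": keep everything up to the pooling layer
    extractor = Model(base.input, model.layers[-3].output)
    # note: disable generator shuffling so feature rows align with labels
    return extractor.predict(data_gen)

# Usage (train_gen as in Section 3.2.1):
# features_50 = build_and_extract(tf.keras.applications.ResNet50, train_gen)
# features_101 = build_and_extract(tf.keras.applications.ResNet101, train_gen)
```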
3.3.2 Feature Selection Using the Grey Wolf Optimization (GWO) Technique
Among these 2048 features, many are unwanted and not essential for training; they can be removed using optimization techniques, as summarized below.
-
GWO on ResNet-50: The outputs from ResNet-50 are passed to GWO [34] for selection of the most suitable features; out of 2048, it produced 1689 features with a population size of 25 over 100 iterations.
-
GWO on ResNet-101: The outputs from ResNet-101 are passed to GWO for selection of the most suitable features; out of 2048, it produced 1463 features with a population size of 25 over 100 iterations (a sketch of the optimizer follows this list).
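A compact sketch of a binary GWO for feature selection is given below. The wrapper fitness function is an assumption (the paper does not specify it); a common choice is the validation error of a lightweight classifier plus a small penalty on the number of selected features.

```python
import numpy as np

def gwo_select(fitness, dim, pop_size=25, iters=100, seed=0):
    """Minimal binary Grey Wolf Optimizer. fitness(mask) returns a value
    to minimize; mask is a boolean vector over the dim features."""
    rng = np.random.default_rng(seed)
    pos = rng.random((pop_size, dim))            # wolves in [0, 1]^dim
    fit = np.array([fitness(p > 0.5) for p in pos])
    for t in range(iters):
        order = np.argsort(fit)
        alpha, beta, delta = pos[order[:3]]      # three best wolves
        a = 2.0 * (1 - t / iters)                # decreases from 2 to 0
        for i in range(pop_size):
            new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                new += leader - A * np.abs(C * leader - pos[i])
            pos[i] = np.clip(new / 3.0, 0.0, 1.0)
            fit[i] = fitness(pos[i] > 0.5)
    return pos[np.argmin(fit)] > 0.5             # best feature mask

# Hypothetical wrapper fitness: classifier error plus a small penalty
# proportional to the fraction of features kept, e.g.:
# def fitness(mask):
#     err = 1 - cross_val_score(clf, X[:, mask], y).mean()
#     return err + 0.01 * mask.mean()
```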
3.3.3 Concatenation of Two Feature Sets
After obtaining the best feature sets from both GWO runs, the features are linearly concatenated, giving a total of 3152 features. Every image in the dataset is now represented by 3152 features. The dataset is then split into train and test sets in the ratio 75:25 with shuffling = True.
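A minimal sketch of this concatenation and split, using scikit-learn's train_test_split; the feature matrices and labels here are random placeholders standing in for the real GWO-selected sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholders for the GWO-selected sets: F1 from ResNet-50 (1689
# features), F2 from ResNet-101 (1463 features), plus one-hot labels.
F1 = np.random.rand(7734, 1689)
F2 = np.random.rand(7734, 1463)
labels = np.eye(3)[np.random.randint(0, 3, 7734)]

features = np.concatenate([F1, F2], axis=1)      # -> (7734, 3152)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, shuffle=True, random_state=0)
# shapes roughly (5800, 3152) / (1934, 3152), as reported in Table 3
```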
3.4 Proposed Approach: Dual Feature Extraction using ResNet with GWO and Deep Dense Neural Network (ResNet-GWO-DD)
The proposed framework conducts the bilinear pooling (concatenation) of features extracted from two models, ResNet-50 and ResNet-101, which are then passed to a neural network to classify a three-class multilabel disease. From the NIH chest X-ray dataset, three classes were taken into consideration (Cardiomegaly, Emphysema, and Hernia), identified as significant medical conditions that can substantially impact a person's health. Using ImageDataGenerator for all three channels, the original 1024x1024 pixel data was converted to 256x256 pixels for quicker processing with minimal data loss. The Hernia class was then oversampled to address this unbalanced dataset, which was divided into 75 percent training and 25 percent testing. ResNet-50 and ResNet-101 are loaded with ImageNet weights along with one GlobalAveragePooling layer and (1 + 1) dense layers, the last of which produces the prediction; both models are trained on the training data with input shape (256, 256, 3). The last two layers are then removed from both trained models, and the total data (train + test) is passed through them, converting each input image into a feature vector of size 2048 for each model. These features are sent to GWO for optimal feature selection on both feature sets; out of 2048 each, approximately 1500 features are selected as F1 and F2, respectively. Both models are trained on 6187 images of size 256x256. The important calculations for input and output at each step are given in Table 3.
Table 3
Flow of the feature set at each step.

| Step | Details |
| --- | --- |
| Input to the ResNet models | ResNet-50 output features = 2048 features for each of the 7734 images. ResNet-101 output features = 2048 features for each of the 7734 images. |
| Input to GWO | GWO for ResNet-50: input of 2048 features, output of the 1689 best features (F1). GWO for ResNet-101: input of 2048 features, output of the 1463 best features (F2). |
| Input to the Neural Network | F1 and F2 are concatenated along the feature axis, giving a feature vector of 3152 per image for the 7734 images. This final feature set is then split into train and test sets in the ratio 75:25 (5800:1934). |
| Concatenation (Final Feature Set) | Final feature set (concatenated) = 1689 + 1463 = 3152. The neural network is trained on 5800 samples of 3152 features and tested on 1934 samples, producing an output of 3 classes. |
| Deep Dense Neural Network | Neural network = 2 layers: Layer 1 = Dense(2048, input_shape=(3152,), activation='relu'); Layer 2 = Dense(3, activation='sigmoid'). Trained for 10 epochs with three different optimizers: Adam, COCOB, and SGD. Shapes: X_train (5800, 3152), y_train (5800, 3), X_test (1934, 3152), y_test (1934, 3). |
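A sketch of the two-layer dense classifier from Table 3, reusing X_train/y_train and X_test/y_test from the split sketch in Section 3.3.3; only the Adam run is shown here:

```python
import tensorflow as tf

# Two-layer dense head over the concatenated 3152-dim feature vectors.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2048, activation="relu", input_shape=(3152,)),
    tf.keras.layers.Dense(3, activation="sigmoid"),
])
model.compile(optimizer="adam",  # Adam shown; SGD and COCOB (available
                                 # e.g. in tensorflow_addons) were also run
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10,
          validation_data=(X_test, y_test))
```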
The fundamental unit of the 50-layer ResNet is a bottleneck structure. A bottleneck of 1x1 convolutions is used in a bottleneck residual block to decrease the number of parameters and matrix multiplications, which allows each layer to be trained considerably more quickly. Deep feature extraction uses the activation layers to extract the relevant feature vectors, similar to the pre-trained TL model. Earlier activations capture basic image properties such as edges, whereas later or deeper layers present higher-level features suited to image identification. In ImageNet-trained networks, a feature representation is provided by the activations of the first and second fully connected layers. The detailed architecture is given in Figure 5.
With the advent of ResNets (residual networks), training extremely deep networks has become significantly easier. Residual blocks are the building blocks of ResNets; their foundation is the skip connection, a direct link that bypasses some intermediate layers (the number may vary between models). This skip connection changes the layer's final output. In the residual block demonstrated in Figure 7, the input is multiplied by the layer weights and a bias term is added; the activation function f(x) then produces H(x) as its output. ResNet prevents the vanishing gradient issue common to deep neural networks by including skip connections that provide a shorter alternative channel for gradient flow.
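A minimal Keras sketch of one bottleneck residual block, illustrating the skip connection described above (the batch-normalization placement follows the standard ResNet design, an assumption not spelled out in the text):

```python
from tensorflow.keras import layers, Input

def bottleneck_block(x, filters, stride=1):
    """One bottleneck residual block: 1x1 -> 3x3 -> 1x1 convolutions with
    a skip connection added before the final ReLU."""
    shortcut = x
    y = layers.Conv2D(filters, 1, strides=stride)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(4 * filters, 1)(y)
    y = layers.BatchNormalization()(y)
    # project the shortcut when shapes differ so the addition is valid
    if stride != 1 or shortcut.shape[-1] != 4 * filters:
        shortcut = layers.Conv2D(4 * filters, 1, strides=stride)(shortcut)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

# e.g. one stage-2 block of ResNet-50 on a 64-channel feature map:
# inputs = Input(shape=(64, 64, 64)); out = bottleneck_block(inputs, 64)
```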
ResNet-50 is a convolutional neural network that consists of 50 layers of computation (48 convolutional layers, 1 MaxPool layer, and 1 average pool layer). The 50-layer residual network is formed by stacking residual blocks. In the ResNet-50 architecture, the first stage is a single convolutional layer using a 7x7 kernel with 64 filters and a stride of 2. The next stage employs a 1x1, 64 kernel, then a 3x3, 64 kernel, and finally a 1x1, 256 kernel, with these three layers repeated three times for a total of nine layers. Following that, a 1x1, 128 kernel, then a 3x3, 128 kernel, and finally a 1x1, 512 kernel are repeated four times for a total of twelve layers. Then, a 1x1, 256 kernel is followed by 3x3, 256 and 1x1, 1024 kernels, repeated six times for a total of 18 layers. Finally, a 1x1, 512 kernel followed by 3x3, 512 and 1x1, 2048 kernels is used three times, for a total of 9 layers. These 48 convolutional layers are utilized with the final two layers removed, based on the ResNet-50 model depicted in Figure 4.
The same procedure is followed with ResNet-101, after which the outputs from both ResNet-50 and ResNet-101 are sent to GWO independently, where GWO chooses the optimal features. Finally, the combined optimal feature set is fed to a fully connected neural network with two dense layers. The first dense layer employs a ReLU activation, whereas the second dense layer employs a sigmoid activation, as shown in Figure 6.