This article develops an efficient framework for the classification of different lung diseases. Chest X-ray images were used as input for deep feature extraction using TL algorithms [28]. Features were extracted and selected using TL and GWO, respectively. The efficacy of the proposed methodology is evaluated using Precision, Sensitivity, and Specificity scores. In addition, the results were compared to existing state-of-the-art methods.
3.1 Dataset Description (NIH Chest X-Ray)
This research utilizes the NIH Chest X-ray Dataset, a collection of 112,120 X-ray scans of size 1024x1024, annotated with disease labels, from 30,805 individuals. These labels were generated by the dataset's authors using Natural Language Processing (NLP) to extract disease classifications from the radiology reports associated with the images. More detail about the dataset can be found on the official site at [https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345]. Out of this dataset, we have used only 3 classes, Cardiomegaly, Emphysema, and Hernia, totalling 5464 images of size 1024x1024, which are then processed further.
3.2 Data preprocessing
The NIH Chest X-ray Dataset originally consists of 14 different classes. This work focuses on only 3 of these 14 classes: Cardiomegaly, Emphysema, and Hernia. These classes were selected because they were the top-performing classes on the dataset in terms of AUC score.
Table 1
Detailed sample-wise distribution of the dataset before oversampling

| Dataset | Total Images | Cardiomegaly | Emphysema | Hernia |
| --- | --- | --- | --- | --- |
| NIH Chest X-ray | 5464 | Images = 2776 | Images = 2516 | Images = 227 |
| | | Class = 1 | Class = 2 | Class = 3 |
| | | Label = 0 | Label = 1 | Label = 2 |
It can be clearly seen from Table 1 and Figure 2(a) that the dataset is highly imbalanced. To solve this problem, we used the random oversampling method, which replicates minority-class samples. The balanced dataset is shown in Table 2 and Figure 2(b).
Table 2
Detailed distribution of the dataset after random oversampling

| Dataset | Total Images | Cardiomegaly | Emphysema | Hernia |
| --- | --- | --- | --- | --- |
| NIH Chest X-ray | 7734 | Images = 2846 | Images = 2556 | Images = 2497 |
| | | Class = 1 | Class = 2 | Class = 3 |
| | | Label = 0 | Label = 1 | Label = 2 |
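As a concrete illustration, random oversampling can be sketched as below. This is a minimal sketch, not the authors' exact implementation: it replicates minority-class samples until all classes are fully equalized, whereas the counts in Table 2 suggest the classes were only approximately balanced.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Replicate minority-class samples at random until every class
    matches the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for c, n in zip(classes, counts):
        idx = np.where(y == c)[0]
        # sample with replacement to make up the shortfall for class c
        extra = rng.choice(idx, size=target - n, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.concatenate(X_parts), np.concatenate(y_parts)
```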
3.2.1 Data Augmentation
Data augmentation [31] expands the training set by creating new versions of datasets from preexisting data. It involves making small adjustments to the dataset or generating new data points. After preprocessing, the dataset is passed to the image data generator, where images are rescaled by 1/255, augmented using shear_range = 0.2, zoom_range = 0.2, rotation_range = 24, horizontal_flip = True, and vertical_flip = True, and then resized to a target size of 256x256. Sample images after these augmentation operations are shown in Figure 3, organized by class label and class name.
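The augmentation settings above map directly onto Keras's ImageDataGenerator. A minimal sketch with the stated parameters follows; the directory path and batch size are placeholders, not values from the paper.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation as described above: rescaling plus shear, zoom,
# rotation, and horizontal/vertical flips.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    shear_range=0.2,
    zoom_range=0.2,
    rotation_range=24,
    horizontal_flip=True,
    vertical_flip=True,
)

# Hypothetical directory layout with one subfolder per class
# (Cardiomegaly, Emphysema, Hernia); images resized to 256x256.
train_gen = datagen.flow_from_directory(
    "data/train",            # placeholder path
    target_size=(256, 256),
    batch_size=32,           # assumed batch size
    class_mode="categorical",
)
```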
3.3 Proposed Framework
The framework of the study is demonstrated in Figure 4 and comprises four modules. The first module covers data genesis. In the second module, data preparation is carried out. The third module concerns making the data compatible with TL. The last module applies the proposed approach to subject classification. Finally, the proposed model is contrasted with different methods using evaluation metrics, comparing each class separately, and is also contrasted with earlier research.
3.3.1 Feature Extraction using Residual Learning Models
Dual feature extraction is applied in this study by implementing two fine-tuned TL models. Both models are trained for 50 epochs each with the whole model trainable (trainable = True), because the minority class has been oversampled and there are enough samples to train the full network.
-
The architecture of ResNet-50 [32] is a bottleneck architecture that serves as the basis for the 50-layer ResNet's building block. The inclusion of 1x1 convolutions, sometimes known as a "bottleneck", in a bottleneck residual block helps minimise the total number of parameters as well as the number of matrix multiplications. Because of this, each layer can be trained considerably more quickly. After removing the last two layers from the trained model, all of the data is passed through the model and features of shape 2048 are extracted (a sketch covering both models follows this list).
-
The ResNet-101 [33] network is a convolutional neural network with a total of 101 layers. A version of the network pretrained on more than a million images from the ImageNet database can be loaded. The pretrained network can classify images into one thousand distinct object categories, such as "keyboard", "mouse", "pencil", and many animals. As a result, the network has learned rich feature representations for a wide range of image types. The network's default input image size is 224x224. After removing the top two layers of the trained model, all of the data is passed through the model and features of shape 2048 are extracted.
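The dual extraction described above can be sketched as follows in Keras. This is a simplified sketch, not the authors' exact code: the width of the intermediate dense head (512) and the training configuration are assumptions, and the fine-tuning schedule is reduced to a single fit call.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_and_extract(base_cls, data_gen, epochs=50):
    """Fine-tune a ResNet backbone end-to-end, then strip the final dense
    layers and return the 2048-dim pooled features for every image.
    base_cls is tf.keras.applications.ResNet50 or ResNet101."""
    base = base_cls(weights="imagenet", include_top=False,
                    input_shape=(256, 256, 3))
    base.trainable = True                             # whole model trainable
    x = layers.GlobalAveragePooling2D()(base.output)  # 2048-dim vector
    x = layers.Dense(512, activation="relu")(x)       # assumed head width
    out = layers.Dense(3, activation="softmax")(x)    # 3 disease classes
    model = Model(base.input, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(data_gen, epochs=epochs)
    # "remove the last two layers": keep everything up to the pooling layer
    extractor = Model(base.input, model.layers[-3].output)
    # note: disable generator shuffling so feature rows align with labels
    return extractor.predict(data_gen)

# Usage (train_gen as in Section 3.2.1):
# features_50 = build_and_extract(tf.keras.applications.ResNet50, train_gen)
# features_101 = build_and_extract(tf.keras.applications.ResNet101, train_gen)
```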
3.3.2 Feature Selection Using the Grey Wolf Optimization (GWO) Technique
Among these 2048 features, many are unwanted and not essential for training; they can be removed using optimization techniques, as summarized below.
-
GWO on ResNet-50: The outputs from ResNet-50 are passed to GWO [34] for selection of the most suitable features; out of 2048, it produced 1689 features with a population size of 25 over 100 iterations.
-
GWO on ResNet-101: The outputs from ResNet-101 are passed to GWO for selection of the most suitable features; out of 2048, it produced 1463 features with a population size of 25 over 100 iterations (a sketch of the optimizer follows this list).
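A compact sketch of a binary GWO for feature selection is given below. The wrapper fitness function is an assumption (the paper does not specify it); a common choice is the validation error of a lightweight classifier plus a small penalty on the number of selected features.

```python
import numpy as np

def gwo_select(fitness, dim, pop_size=25, iters=100, seed=0):
    """Minimal binary Grey Wolf Optimizer. fitness(mask) returns a value
    to minimize; mask is a boolean vector over the dim features."""
    rng = np.random.default_rng(seed)
    pos = rng.random((pop_size, dim))            # wolves in [0, 1]^dim
    fit = np.array([fitness(p > 0.5) for p in pos])
    for t in range(iters):
        order = np.argsort(fit)
        alpha, beta, delta = pos[order[:3]]      # three best wolves
        a = 2.0 * (1 - t / iters)                # decreases from 2 to 0
        for i in range(pop_size):
            new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                new += leader - A * np.abs(C * leader - pos[i])
            pos[i] = np.clip(new / 3.0, 0.0, 1.0)
            fit[i] = fitness(pos[i] > 0.5)
    return pos[np.argmin(fit)] > 0.5             # best feature mask

# Hypothetical wrapper fitness: classifier error plus a small penalty
# proportional to the fraction of features kept, e.g.:
# def fitness(mask):
#     err = 1 - cross_val_score(clf, X[:, mask], y).mean()
#     return err + 0.01 * mask.mean()
```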
3.3.3 Concatenation of Two Feature Sets
After obtaining the best feature sets from both GWO runs, the features are linearly concatenated, giving a total of 3152 features. Every image in the dataset is now represented by 3152 features. The dataset is then split into train and test sets in the ratio 75:25 with shuffling = True.
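A minimal sketch of this concatenation and split, using scikit-learn's train_test_split; the feature matrices and labels here are random placeholders standing in for the real GWO-selected sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholders for the GWO-selected sets: F1 from ResNet-50 (1689
# features), F2 from ResNet-101 (1463 features), plus one-hot labels.
F1 = np.random.rand(7734, 1689)
F2 = np.random.rand(7734, 1463)
labels = np.eye(3)[np.random.randint(0, 3, 7734)]

features = np.concatenate([F1, F2], axis=1)      # -> (7734, 3152)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, shuffle=True, random_state=0)
# shapes roughly (5800, 3152) / (1934, 3152), as reported in Table 3
```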
3.4 Proposed Approach: Dual Feature Extraction using ResNet with GWO and Deep Dense Neural Network (ResNet-GWO-DD)
The proposed framework conducts the bilinear pooling (concatenation) of features extracted from two models, ResNet-50 and ResNet-101, which are then passed to a neural network to classify a three-class multilabel disease. From the NIH chest X-ray dataset, three classes were taken into consideration (Cardiomegaly, Emphysema, and Hernia), identified as significant medical conditions that can substantially impact a person's health. Using ImageDataGenerator for all three channels, the original 1024x1024 pixel data was converted to 256x256 pixels for quicker processing with minimal data loss. The Hernia class was then oversampled to address this unbalanced dataset, which was divided into 75 percent training and 25 percent testing. ResNet-50 and ResNet-101 are loaded with ImageNet weights along with one GlobalAveragePooling layer and (1 + 1) dense layers, the last of which produces the prediction; both models are trained on the training data with input shape (256, 256, 3). The last two layers are then removed from both trained models, and the total data (train + test) is passed through them, converting each input image into a feature vector of size 2048 for each model. These features are sent to GWO for optimal feature selection on both feature sets; out of 2048 each, approximately 1500 features are selected as F1 and F2, respectively. Both models are trained on 6187 images of size 256x256. The important calculations for input and output at each step are given in Table 3.
Table 3
Flow of the feature set at each step.

| Step | Details |
| --- | --- |
| Input to the ResNet models | ResNet-50 output features = 2048 features for each of the 7734 images. ResNet-101 output features = 2048 features for each of the 7734 images. |
| Input to GWO | GWO for ResNet-50: input of 2048 features, output of the 1689 best features (F1). GWO for ResNet-101: input of 2048 features, output of the 1463 best features (F2). |
| Input to the Neural Network | F1 and F2 are concatenated along the feature axis, giving a feature vector of 3152 per image for the 7734 images. This final feature set is then split into train and test sets in the ratio 75:25 (5800:1934). |
| Concatenation (Final Feature Set) | Final feature set (concatenated) = 1689 + 1463 = 3152. The neural network is trained on 5800 samples of 3152 features and tested on 1934 samples, producing an output of 3 classes. |
| Deep Dense Neural Network | Neural network = 2 layers: Layer 1 = Dense(2048, input_shape=(3152,), activation='relu'); Layer 2 = Dense(3, activation='sigmoid'). Trained for 10 epochs with three different optimizers: Adam, COCOB, and SGD. Shapes: X_train (5800, 3152), y_train (5800, 3), X_test (1934, 3152), y_test (1934, 3). |
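A sketch of the two-layer dense classifier from Table 3, reusing X_train/y_train and X_test/y_test from the split sketch in Section 3.3.3; only the Adam run is shown here:

```python
import tensorflow as tf

# Two-layer dense head over the concatenated 3152-dim feature vectors.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2048, activation="relu", input_shape=(3152,)),
    tf.keras.layers.Dense(3, activation="sigmoid"),
])
model.compile(optimizer="adam",  # Adam shown; SGD and COCOB (available
                                 # e.g. in tensorflow_addons) were also run
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10,
          validation_data=(X_test, y_test))
```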
The fundamental unit of the 50-layer ResNet is a bottleneck structure. A bottleneck of 1x1 convolutions is used in a bottleneck residual block to decrease the number of parameters and matrix multiplications, which allows each layer to be trained considerably more quickly. Deep feature extraction uses the activation layers to extract the relevant feature vectors, similar to the pre-trained TL model. Earlier activations capture basic image properties such as edges, whereas later or deeper layers present higher-level features suited to image identification. In ImageNet-trained networks, a feature representation is provided by the activations of the first and second fully connected layers. The detailed architecture is given in Figure 5.
With the advent of ResNets (residual networks), training extremely deep networks has become significantly easier. Residual blocks are the building blocks of ResNets; their foundation is the skip connection, a direct link that bypasses some intermediate layers (the number may vary between models). This skip connection changes the layer's final output. In the residual block demonstrated in Figure 7, the input is multiplied by the layer weights and a bias term is added; the activation function f(x) then produces H(x) as its output. ResNet prevents the vanishing gradient issue common to deep neural networks by including skip connections that provide a shorter alternative channel for gradient flow.
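A minimal Keras sketch of one bottleneck residual block, illustrating the skip connection described above (the batch-normalization placement follows the standard ResNet design, an assumption not spelled out in the text):

```python
from tensorflow.keras import layers, Input

def bottleneck_block(x, filters, stride=1):
    """One bottleneck residual block: 1x1 -> 3x3 -> 1x1 convolutions with
    a skip connection added before the final ReLU."""
    shortcut = x
    y = layers.Conv2D(filters, 1, strides=stride)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(4 * filters, 1)(y)
    y = layers.BatchNormalization()(y)
    # project the shortcut when shapes differ so the addition is valid
    if stride != 1 or shortcut.shape[-1] != 4 * filters:
        shortcut = layers.Conv2D(4 * filters, 1, strides=stride)(shortcut)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

# e.g. one stage-2 block of ResNet-50 on a 64-channel feature map:
# inputs = Input(shape=(64, 64, 64)); out = bottleneck_block(inputs, 64)
```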
ResNet-50 is a convolutional neural network that consists of 50 layers of computation (48 convolutional layers, 1 MaxPool layer, and 1 average pool layer). The 50-layer residual network is formed by stacking residual blocks. In the ResNet-50 architecture, the first stage is a single convolutional layer using a 7x7 kernel with 64 filters and a stride of 2. The next stage employs a 1x1, 64 kernel, then a 3x3, 64 kernel, and finally a 1x1, 256 kernel, with these three layers repeated three times for a total of nine layers. Following that, a 1x1, 128 kernel, then a 3x3, 128 kernel, and finally a 1x1, 512 kernel are repeated four times for a total of twelve layers. Then, a 1x1, 256 kernel is followed by 3x3, 256 and 1x1, 1024 kernels, repeated six times for a total of 18 layers. Finally, a 1x1, 512 kernel followed by 3x3, 512 and 1x1, 2048 kernels is used three times, for a total of 9 layers. These 48 convolutional layers are utilized with the final two layers removed, based on the ResNet-50 model depicted in Figure 4.
The same procedure is followed with ResNet-101, after which the outputs from both ResNet-50 and ResNet-101 are sent to GWO independently, where GWO chooses the optimal features. Finally, the combined optimal feature set is fed to a fully connected neural network with two dense layers. The first dense layer employs a ReLU activation, whereas the second dense layer employs a sigmoid activation, as shown in Figure 6.