Owing to the computational power of modern computers and the availability of enormous quantities of data, deep learning is in pervasive use in medical imaging applications. The most common type of deep neural network is the convolutional neural network (CNN). CNNs are modelled on the visual cortex of the human brain and are used extensively for image, video and audio applications. The authors of [6] conducted a detailed survey of the efficacy of CNNs in various object detection applications, discussing in detail the structure and operation of LeNet, AlexNet, ZFNet, GoogLeNet, VGGNet, ResNet, SENet, DenseNet, Xception and PNAS/ENAS, along with their variants.
One main application of CNNs is to identify salient features. For example, [7] studied the applicability of CNNs to component-based and age-invariant face recognition. The proposed methodology associated facial components with the corresponding face, with commendable results. Features extracted from CNNs were subsequently fed into two dimensionality reduction algorithms: Fisher linear discriminant analysis (FLDA) and locality preserving projections (LPP). The reduced features were classified using the nearest neighbor algorithm. The features extracted using CNNs yielded accuracies of 91% and 90%, respectively, outperforming histogram of oriented gradients (HOG) and Gabor transform features. Beyond feature extraction, CNNs have been studied for detection, classification and segmentation tasks in medical research. For example, [8] proposed a CNN-based double-branched model for multiple abnormality detection from medical images, wherein one branch was used for feature extraction and the other for segmentation. [9] proposed CemrgApp, a CNN model, to classify cardiovascular properties from cardiovascular magnetic resonance imaging (CMRI) scans of cardiac patients for efficient diagnosis and treatment. A multi-label CNN was used to segment the atria and atrial structures from the CMRI scans. The proposed framework was trained on 207 manually annotated CMRI scans, ultimately achieving a Dice score of 0.91 ± 0.02 for atrial blood pool segmentation. [10] and [11] implemented CNNs and their variants for automatic lesion detection and multiple abnormality detection from medical images. [12] developed a deconvolutional CNN for classification of acute lymphoblastic leukemia, a cancer of the white blood cells. [13] designed a multi-network feature extraction model using pre-trained deep CNNs to aid breast cancer diagnosis. [14] offered a concise introduction to multiscale CNNs and their applicability to the classification of cells from medical images. [15] developed a multiscale all-convolutional neural network (MA-CNN) for breast cancer classification using mammogram images. [16] designed deep CNN ensembles for segmentation of infant brain MRI images. [17] segmented anomalies in abdominal CT images with a CNN, and then classified them using a fuzzy SVM.
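The feature pipeline described in [7] can be illustrated with a minimal sketch: a pre-trained CNN with its classification head removed produces feature vectors, which are projected with Fisher LDA and classified with a nearest-neighbor rule. The ResNet-18 backbone, the `faces/` folder layout and all hyper-parameters below are illustrative assumptions, not the exact setup of [7].

```python
# Sketch: CNN features -> Fisher LDA projection -> 1-nearest-neighbour classifier.
import torch
import torchvision.models as models
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Pre-trained CNN with the classification head removed -> 512-d feature vectors.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

transform = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                       T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
dataset = ImageFolder("faces/", transform=transform)   # hypothetical class-per-folder layout
loader = DataLoader(dataset, batch_size=32, shuffle=False)

features, labels = [], []
with torch.no_grad():
    for x, y in loader:
        features.append(backbone(x))
        labels.append(y)
X = torch.cat(features).numpy()
y = torch.cat(labels).numpy()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Fisher LDA projection of the CNN features, then nearest-neighbour classification.
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
knn = KNeighborsClassifier(n_neighbors=1).fit(lda.transform(X_tr), y_tr)
print("accuracy:", knn.score(lda.transform(X_te), y_te))
```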
While the success of CNNs has attracted great attention in medical research, they are not without limitations. As indicated in [18], clinical studies often have limited sample sizes, which pose great challenges to CNN models. One solution is transfer learning, a technique for training deep networks on small datasets. Transfer learning refers to the migration of knowledge between applications. Owing to restrictions on sample quality, data availability, lack of domain knowledge and so on, it is often challenging to develop robust models from the resources available for the target application alone. In such scenarios, researchers take a model that has already acquired knowledge from similar tasks or datasets and adapt the pre-trained model to the dataset of interest. For example, [19] selected color optic-disc-centered fundus images using active learning, and subsequently identified glaucoma using transfer learning on a deep CNN. In [20], a problem-based deep CNN architecture called ChestNet was proposed; it was pre-trained on a set of relevant and irrelevant datasets before finally being trained on a pediatric chest X-ray dataset for detection of pulmonary consolidation. [21] gives a concise yet informative description of ChestNet and its applicability to the detection of thoracic diseases in chest radiographs. [22] developed an ensemble of five of the most commonly used deep CNN models (AlexNet, DenseNet121, InceptionV3, ResNet18 and GoogLeNet), pre-trained on ImageNet, for pneumonia detection on the Guangzhou Women and Children’s Medical Center chest X-ray dataset.
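A minimal transfer-learning sketch in the spirit of [19]-[22] is shown below: a CNN pre-trained on ImageNet has its head replaced and is fine-tuned on a small target dataset. The DenseNet-121 backbone, the `chest_xray/train/` folder path, the two-class head and the hyper-parameters are assumptions for illustration, not the configurations used in those studies.

```python
# Sketch: fine-tuning an ImageNet-pre-trained CNN on a small medical image dataset.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
for p in model.parameters():          # freeze the pre-trained feature extractor
    p.requires_grad = False
# New task-specific head (e.g., pneumonia vs. normal); only this layer is trained.
model.classifier = nn.Linear(model.classifier.in_features, 2)

transform = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                       T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
train_loader = DataLoader(ImageFolder("chest_xray/train/", transform=transform),
                          batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):                # a few epochs usually suffice with a frozen backbone
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```

Freezing the backbone and training only the new head is one common strategy when the target dataset is small; unfreezing the last few blocks for further fine-tuning is another.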
In many cases, it is challenging for humans or deep learning methods alone to extract the most important features from medical image datasets. Researchers therefore often combine machine learning and deep learning approaches to exploit the representational capabilities of deep models while avoiding overfitting. [23] followed one such approach. The researchers had access to 58 in-house brain MR images and 128 MR images from The Cancer Genome Atlas, all from patients with high-grade glioma. For each patient, they calculated 348 hand-crafted radiomics features and extracted 8192 features using a pre-trained deep CNN, then performed feature selection and Elastic Net-Cox modelling to classify patients into long- and short-term survivors. [24] was a detailed study of ROI-based opacity classification of diffuse lung diseases in chest CT images. It pre-trained deep CNNs on the CIFAR-10 and CIFAR-100 datasets and subsequently used a CT image dataset of diffuse lung diseases for parameter tuning and classification. The paper detailed the CNN structure and the pre-training and parameter-tuning procedures, and offered insights into the relationship between the type and characteristics of the datasets used for pre-training and parameter tuning and the effectiveness of the transfer learning model. In [25], the researchers combined transfer learning on a CNN with an extreme learning machine (ELM) to distinguish malignant from benign pulmonary nodules on CT images: a deep CNN pre-trained on the ImageNet dataset was used to extract high-level features of the nodules, which were then classified with the ELM. [26] utilized transfer learning and deep CNNs to classify COVID-19, pneumonia and normal patients from a small chest X-ray dataset. The authors of [27] used pre-trained VGG19, MobileNet v2, Inception, Xception and Inception-ResNet v2 models to train CNNs on small datasets of lung images, achieving a classification accuracy, sensitivity and specificity of 96.78%, 98.66% and 96.46%, respectively. The authors of [28] designed a deep model called AppendiXNet to detect appendicitis from a dataset of fewer than 500 CT images. The model was pre-trained on 500,000 video clips, each annotated with one of 600 human actions, and the pre-trained weights were then fine-tuned using 438 CT scans of the appendix. The study demonstrated that pre-training significantly improves results over training the model directly on the target data.
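The combined hand-crafted plus deep-feature strategy of [23] can be sketched as follows. For simplicity the Elastic Net-Cox survival model is replaced here by a plain elastic-net logistic regression on a binary long-/short-term survivor label, and the feature matrices are random placeholders; the array shapes follow the counts reported above, while all other parameters are assumptions.

```python
# Sketch: concatenating hand-crafted radiomics features with pre-extracted deep CNN
# features, then fitting a sparse (elastic-net) classifier under cross-validation.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_patients = 186                                  # 58 in-house + 128 TCGA cases
radiomics = rng.normal(size=(n_patients, 348))    # 348 hand-crafted radiomics features
deep = rng.normal(size=(n_patients, 8192))        # 8192 features from a pre-trained CNN
y = rng.integers(0, 2, size=n_patients)           # long- vs. short-term survivor label

X = np.hstack([radiomics, deep])                  # combined feature matrix

# Elastic-net regularisation keeps only a sparse subset of the combined features.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=0.1, max_iter=5000),
)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```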
Conventional machine learning classifiers depend heavily on manually extracted features, which are often poor representations of the data. CNNs, on the other hand, have an innate ability to extract meaningful feature vectors from images, even though the extracted features may not be readily interpretable to the human eye. As reviewed above, existing research shows some success in using deep learning models (e.g., pre-trained models taking advantage of transfer learning) to extract features for machine learning models. Motivated by this, we propose an integrated deep learning and machine learning pipeline that utilizes the representational capabilities of CNNs without overfitting the model on a small prostate imaging dataset. Specifically, we extract features from the raw images using a pre-trained CNN and feed the resulting feature vectors to a classical machine learning classifier. Prior to classification, we apply feature selection to the extracted features to retain only the most informative ones. The feature selection framework comprises a sequential combination of statistical and machine learning algorithms; within it, the Random Forest algorithm automatically selects the most important features and reports their importance values as percentages. The reduced feature set yields a classification accuracy of 76.19%, a precision of 81.08% and an F1-score of 75.93% with a decision tree classifier under 10-fold cross-validation.
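A minimal sketch of the final stage of the proposed pipeline is given below: features already extracted by a pre-trained CNN are pruned using Random Forest importances and classified with a decision tree under 10-fold cross-validation. The feature matrix is a random placeholder standing in for the prostate image features, and the importance threshold and other hyper-parameters are illustrative assumptions rather than the exact settings of our experiments.

```python
# Sketch: Random Forest-based feature selection followed by a decision tree
# classifier evaluated with 10-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4096))     # placeholder CNN feature vectors (one row per image)
y = rng.integers(0, 2, size=200)     # binary class label

# The Random Forest ranks the features; SelectFromModel keeps those above the median importance.
selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0),
                           threshold="median")

pipeline = make_pipeline(selector, DecisionTreeClassifier(random_state=0))
scores = cross_val_score(pipeline, X, y, cv=10)
print("10-fold CV accuracy: %.2f%%" % (100 * scores.mean()))

# Feature importances (as percentages) can also be inspected from a forest fitted directly.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances_pct = 100 * rf.feature_importances_
```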