Experiments were carried out on an Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz with 64GB of RAM, one Titan X Pascal GPU with 12GB of memory, and the TensorFlow/Keras framework for Python. The source code and pre-trained models are available at https://github.com/ufopcsilab/EfficientCovidNet. In the following subsections, we present the three experimental setups explored in this work.
In the first setup, in Section 4.1, we investigate the discrepancy between the results reported by the methods considered state-of-the-art for the two studied datasets. The best approach for the COVID-CT dataset reports 86.0% accuracy [9]. For the SARS-CoV-2 CT-scan dataset, the state-of-the-art method achieves 97.38% accuracy [14]. However, the SARS-CoV-2 CT-scan dataset has significantly more images than the COVID-CT dataset and the same number of patients (individuals). To assess whether this difference is due to the evaluation protocol, we perform two experiments: we first investigate the impact of selecting samples/images for the training and test sets at random, and in a second step, we evaluate the impact of performing the selection guided by individuals, that is, ensuring that no samples from the same individual appear simultaneously in the training and test sets.
In the second setup, in Section 4.2, we investigate a very important aspect: the generalization power of a model. A model is only useful if it can also generalize to data from other distributions or other datasets. In this regard, we evaluate how the model, trained with the SARS-CoV-2 CT-scan dataset, behaves when faced with images from another dataset, the COVID-CT dataset. We follow the data-split protocol proposed in [15].
Finally, in the third setup, we explore our EfficientCovidNet model only with the COVID-CT dataset, following the protocol proposed in [15]. This setup aims to expand the comparison of the proposed approach with the literature, since this dataset is the most popular to date. Here we also explore the impact of varying the size of the input images.
4.1 Setup 1: 5-fold evaluation on a large dataset
To evaluate the performance of the proposed approach, we tested the protocol proposed by Soares et al. [14] and three different scenarios using 5-fold cross-validation: (i) “Random”, (ii) “Slice”, and (iii) “Voting”. The “Random” evaluation divides the data into training and test sets randomly. The “Slice” evaluation considers all the CT images independently of each other but respects the patient division, that is, we prevent samples from one individual from appearing simultaneously in the training and test sets. In this manner, the model is always evaluated with samples from unknown individuals. Finally, the “Voting” evaluation considers all images of an individual and uses a voting scheme to reach a diagnosis per individual instead of per instance or image. Considering that several CT images are acquired in a single exam for a single individual, we believe that the disease patterns will not be present in all instances. Thus, an evaluation using a voting scheme, considering all instances of one individual, could increase the chances of success. A sketch contrasting the “Random” and “Slice” splitting strategies is shown below.
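As a minimal sketch (the array names are illustrative, not from the original code), the difference between the “Random” and “Slice” scenarios amounts to whether the 5-fold cross-validation folds are drawn over images or over patients, e.g., with scikit-learn:

```python
import numpy as np
from sklearn.model_selection import KFold, GroupKFold

# Illustrative placeholders: one entry per CT slice.
slices = np.zeros((1000, 1))                     # stand-in for 1000 CT slices
labels = np.random.randint(0, 2, size=1000)      # COVID / Non-COVID per slice
patients = np.random.randint(0, 120, size=1000)  # patient id per slice

# "Random" scenario: folds are drawn over slices, so images of the same
# patient can end up in both the training and the test set.
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=42).split(slices):
    pass  # train on slices[train_idx], evaluate on slices[test_idx]

# "Slice" scenario: GroupKFold keeps all slices of a patient in a single
# fold, so the model is always tested on unseen individuals.
for train_idx, test_idx in GroupKFold(n_splits=5).split(slices, labels,
                                                        groups=patients):
    assert set(patients[train_idx]).isdisjoint(set(patients[test_idx]))
```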
4.1.1 Results
Following the protocol proposed in [14], the approach proposed in this work improved all metrics, as shown in Table 5.
Table 5: Classification protocol proposed in [14].

| Approach           | Acc (%) | SeC (%) | +PC (%) |
|--------------------|---------|---------|---------|
| Soares et al. [14] | 97.38   | 95.53   | 99.16   |
| Proposed Approach  | 98.99   | 98.80   | 99.20   |
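For reference, the figures reported throughout this section can be computed from the binary confusion matrix. A minimal sketch follows, under the assumption (conventional in this literature) that SeC denotes sensitivity (recall) and +PC the positive predictive value (precision):

```python
import numpy as np

def report_metrics(y_true, y_pred):
    """Acc, SeC and +PC from binary ground truth and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    acc = (tp + tn) / len(y_true)
    sec = tp / (tp + fn)              # sensitivity / recall
    ppc = tp / (tp + fp)              # positive predictive value / precision
    f1 = 2 * sec * ppc / (sec + ppc)  # F1-score, used later in Table 8
    return acc, sec, ppc, f1
```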
Despite the outstanding results presented in Table 5, we believe that they are overestimated. For this reason, we introduce a 5-fold cross-validation and some changes to the original protocol, as described above, with the results presented in Table 6.
Table 6: 5-fold classification by slice and with voting.

| Approach | Acc (%)     | SeC (%)     | +PC (%)     |
|----------|-------------|-------------|-------------|
| Random   | 98.5 ± 0.4  | 98.6 ± 0.6  | 98.4 ± 0.6  |
| Slice    | 86.6 ± 10.1 | 94.8 ± 4.5  | 79.7 ± 20.9 |
| Voting   | 89.6 ± 5.1  | 92.0 ± 10.0 | 77.5 ± 23.3 |
The “Random” evaluation presents better results than the two other approaches (“Slice” and “Voting”). One of the reasons is the presence of data from the same patient/individual in both the training and test sets, which leads to an overestimated result. Our hypothesis is that such an approach tends to learn patterns related to the individuals instead of the COVID-19 patterns.
In the “Slice” evaluation, the samples are classified as isolated instances, as in the “Random” one, but ensuring that all samples of an individual are present in only one data partition: the training or the test set. A performance drop is observed, which clearly shows that the “Random” evaluation overestimates performance.
In contrast to the “Slice” evaluation, the “Voting” one considers all images of an individual to decide whether the individual is infected or not. It is worth emphasizing that the same model is used in both approaches, that is, the model trained per image (a single “slice” of the lung). The voting scheme can be implemented as a simple majority vote, as sketched below.
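A minimal sketch of the per-individual majority vote (the function and variable names are illustrative):

```python
import numpy as np

def patient_diagnosis(slice_preds, patient_ids):
    """Aggregate per-slice binary predictions into one diagnosis per individual."""
    diagnoses = {}
    for pid in np.unique(patient_ids):
        votes = slice_preds[patient_ids == pid]
        # An individual is flagged as COVID-positive when the majority
        # of their slices is predicted positive.
        diagnoses[pid] = int(votes.mean() >= 0.5)
    return diagnoses

# Example: three slices for patient 0, two for patient 1.
preds = np.array([1, 0, 1, 0, 0])
pids = np.array([0, 0, 0, 1, 1])
print(patient_diagnosis(preds, pids))  # patient 0 -> positive, patient 1 -> negative
```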
Due to the nature of CT scans, we believe the disease patterns will not manifest in all slices (instances/images) of an individual's CT exam, and the results of the “Slice” and “Voting” evaluations reflect that. We believe this can generate false positives and therefore impact the reported figures (see Table 6). Moreover, this problem can be seen as a multiple instance learning (MIL) problem [25], and a MIL-based approach may be a promising path for future work.
Comparing the results of Tables 5 and 6, we believe that the presence of samples from the same individual in both training and test sets tends to lead to an overestimation of an approach. To circumvent this issue, it is necessary to split the dataset by individual and to use a cross-dataset evaluation.
4.2 Setup 2: Cross-dataset evaluation
In this experiment, we investigate the impact of training a model on one data distribution and evaluating it on another. This scenario is closer to reality, since it is almost impossible to train a model with images acquired from all available sensors, environments, and individuals.
In this setup, the SARS-CoV-2 CT-scan dataset [14] is used only for training, and no image from this dataset is present in the test set. For testing, we use the dataset presented in [15], the COVID-CT, since it is used by several authors in the literature. We follow the protocol proposed in [15] to split the COVID-CT into training and test sets; however, we highlight that only images from the SARS-CoV-2 CT-scan dataset are used to train the model. We also evaluated other test configurations, such as using the COVID-CT training partition as a test set and combining both partitions of the COVID-CT dataset into a larger test set (see Table 7). We also tested the opposite scenario, in which we use all images from the COVID-CT dataset [15] for training and all images of the SARS-CoV-2 CT-scan dataset [14] for testing. A minimal sketch of this cross-dataset procedure is shown below.
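In Keras, the cross-dataset procedure only changes where the evaluation images come from; the directory paths, input size, and training hyper-parameters below are illustrative assumptions, not the original configuration:

```python
import tensorflow as tf

IMG_SIZE = (224, 224)  # illustrative input resolution

# Hypothetical directory layouts with one sub-folder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/sars-cov-2-ct-scan", image_size=IMG_SIZE, batch_size=32)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "data/covid-ct/test", image_size=IMG_SIZE, batch_size=32)

model = tf.keras.applications.EfficientNetB0(
    weights=None, input_shape=IMG_SIZE + (3,), classes=2)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train exclusively on one dataset and evaluate exclusively on the other.
model.fit(train_ds, epochs=10)
model.evaluate(test_ds)
```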
4.2.1 Results
Table 7: Cross-dataset evaluation.

| Training dataset                | Test dataset                    | Acc (%) | SeC (%) | +PC (%) |
|---------------------------------|---------------------------------|---------|---------|---------|
| SARS-CoV-2 CT-scan dataset [14] | COVID-CT [15] (Train)           | 59.12   | 64.14   | 54.95   |
| SARS-CoV-2 CT-scan dataset [14] | COVID-CT [15] (Test)            | 56.16   | 53.06   | 54.74   |
| SARS-CoV-2 CT-scan dataset [14] | COVID-CT [15] (Train + Test)    | 58.31   | 61.03   | 54.90   |
| COVID-CT [15] (Train + Test)    | SARS-CoV-2 CT-scan dataset [14] | 45.25   | 54.39   | 46.36   |
As one can see, the model performance is drastically reduced when we compare the cross-dataset evaluation against an intra-dataset one. We believe the reason for this behavior is the diversity of data acquisition. Images from different datasets can be acquired with different equipment and different image sensors, which changes relevant features in the images and impairs recognition. The model may learn to identify portions and patterns of an image that indicate the presence (or absence) of COVID-19; however, those patterns may not appear in a different dataset.
Training on COVID-CT [15] and testing on the SARS-CoV-2 CT-scan dataset [14] presents even worse results, since the COVID-CT training set is smaller.
We believe such a test should be mandatory for all methods aiming at COVID-19 recognition from CT images, since it is the one that most resembles a real-world scenario.
4.3 Setup 3: Impact of input resolution
In this setup, we evaluate the protocol presented in [15] only on the COVID-CT dataset. Zhao et al. [15] propose to divide the COVID-CT dataset into three sets: training, validation, and testing. We also applied data augmentation: rotation (at most 0.15 degrees to each side), random zooming (to 80% of the area) with a 20% chance, and horizontal flipping with a probability of 50%. We stress that the data augmentation is applied only to the training data, as sketched below. The final training set totaled 2968 images (1442 COVID and 1408 NonCOVID). Using the protocol in [15], the test set consists of 203 images (98 COVID and 105 NonCOVID).
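A possible Keras realisation of this augmentation policy is sketched below. ImageDataGenerator cannot express the 20% application chance for the zoom directly, so that part is only approximated, and the parameter mapping is our reading of the description above:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# rotation_range is given in degrees, matching "at most 0.15 degrees to
# each side"; zoom_range=0.2 draws zoom factors in [0.8, 1.2], where 0.8
# covers the "80% of the area" case (the 20% application chance is only
# approximated, since every batch is transformed); horizontal_flip
# applies with 50% probability.
train_datagen = ImageDataGenerator(
    rotation_range=0.15,
    zoom_range=0.2,
    horizontal_flip=True,
)
# The generator is used only for the training partition; validation and
# test images are fed to the model unaugmented.
```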
4.3.1 Results
In Table 8, we report the results of the proposed approach using the protocol described in [15]. One may observe that the experiments with the same approach used in Setups 1 and 2 (EfficientNet-B3) have worse performance when compared with the ones available in the literature.
Aiming to reduce overfitting during the training of “Architecture 1”, we propose a deeper network. In most cases, when the deeper network is used (see “Architecture 2” in Table 8) rather than the shallower one (see “Architecture 1” in Table 8), a gain is observed in all reported figures.
We emphasize that the architectures with the largest input size (550x550) present the worst performance among the experiments varying the input size, contrary to what one would expect. Our hypothesis is that some small images (281x202) are expanded and severely distorted, which obscures the COVID-19 patterns in the images.
Table 8: Custom input using the EfficientNet-B0 as the base network.

| Depth           | Input size | Acc (%) | SeC (%) | +PC (%) | F1 (%) |
|-----------------|------------|---------|---------|---------|--------|
| EfficientNet-B3 | 300x300    | 77.34   | 69.39   | 80.95   | 74.72  |
| Architecture 1  | 224x224    | 79.31   | 70.41   | 84.15   | 76.70  |
| Architecture 1  | 300x300    | 76.85   | 69.39   | 80.00   | 74.32  |
| Architecture 1  | 350x350    | 80.79   | 79.59   | 80.41   | 80.00  |
| Architecture 1  | 400x400    | 83.25   | 80.61   | 84.04   | 82.29  |
| Architecture 1  | 450x450    | 83.25   | 81.63   | 83.33   | 82.47  |
| Architecture 1  | 500x500    | 83.74   | 83.67   | 82.83   | 83.25  |
| Architecture 1  | 550x550    | 79.31   | 77.55   | 79.17   | 78.35  |
| Architecture 2  | 224x224    | 83.74   | 77.55   | 87.36   | 82.16  |
| Architecture 2  | 300x300    | 81.28   | 79.59   | 81.25   | 80.41  |
| Architecture 2  | 350x350    | 86.21   | 81.63   | 88.89   | 85.10  |
| Architecture 2  | 400x400    | 80.30   | 74.49   | 82.95   | 78.49  |
| Architecture 2  | 450x450    | 77.34   | 75.51   | 77.08   | 76.29  |
| Architecture 2  | 500x500    | 87.68   | 79.59   | 93.98   | 86.19  |
| Architecture 2  | 550x550    | 75.37   | 64.29   | 80.77   | 71.60  |
The best model is the one with Architecture 2 and an input size of 500x500 (source available at https://github.com/ufopcsilab/EfficientCovidNet). The ROC curve of the model is presented in Figure 8. A sketch of how such a custom-resolution model can be assembled is shown below.
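The varying input sizes of Table 8 are possible because EfficientNet-B0 is fully convolutional up to the pooling stage, so it accepts arbitrary spatial resolutions when loaded without its classification top. The sketch below assembles a 500x500 model; the dense head is a hypothetical stand-in, as the exact layers of “Architecture 2” are defined in the repository rather than in the text:

```python
import tensorflow as tf

def build_model(input_size=500):
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet",
        input_shape=(input_size, input_size, 3))
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    # Hypothetical classification head; the actual "Architecture 2"
    # layers are available in the repository.
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(base.input, out)

model = build_model(500)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```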
In Table 9, we present a comparison of the best proposed approach against the ones available in the literature. Although Amyar et al. [12] and Mobiny et al. [10] report strong results, both evaluated their approaches with only 105 images (47 COVID and 58 NonCovid) and, therefore, cannot be directly compared to the present work. Thus, the best result previously obtained in this setup was presented in [9]. The work proposed here surpasses it in terms of accuracy and F1-score on the COVID-CT dataset while using a significantly smaller model (about 3× smaller): the base model proposed in [9] requires 14,149,480 parameters, while the one proposed here requires only 4,779,038.
Table 9: Comparison with the literature. # - Evaluated with a different test set: only 105 images (47 COVID and 58 NonCovid).

| Approach               | Acc  | F1    | AUC  |
|------------------------|------|-------|------|
| #Amyar et al. [12]     | 86.0 | -     | 93.0 |
| #Mobiny et al. [10]    | 87.6 | 87.1  | 96.1 |
| Polsinelli et al. [11] | 83.0 | 83.3  | -    |
| He et al. [9]          | 86.0 | 85.0  | 94.0 |
| Proposed approach      | 87.6 | 86.19 | 90.5 |