Because of the shortage of CP images, we applied data augmentation to the ROIs of the CP images to balance their number against that of the COVID-19 images. Compared with the diverse features of COVID-19 CT images, CP samples are considerably easier to discriminate, which led to better classification results for CP than for COVID-19. Without data augmentation, however, the overall accuracy of the algorithms would decline. More CP cases should be collected for further study.
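The balancing step can be illustrated with a minimal sketch. The specific transforms applied to the CP ROIs are not stated here, so the horizontal flip and 90-degree rotation below, as well as the helper names `flip_horizontal`, `rotate_90`, and `augment_to_target`, are assumptions chosen only for illustration.

```python
# Illustrative only: the transforms and function names are assumptions,
# not the augmentation pipeline actually used in this study.

def flip_horizontal(roi):
    """Mirror a 2D ROI (a list of pixel rows) left to right."""
    return [row[::-1] for row in roi]


def rotate_90(roi):
    """Rotate a 2D ROI by 90 degrees clockwise."""
    return [list(col) for col in zip(*roi[::-1])]


def augment_to_target(rois, target_count):
    """Add transformed copies of minority-class ROIs until target_count is reached.

    The originals are kept first; copies are then generated by cycling through
    the ROIs with one transform, then the next. If target_count is much larger
    than len(rois), the same copies will eventually repeat.
    """
    if not rois:
        raise ValueError("at least one ROI is required")
    transforms = [flip_horizontal, rotate_90]
    augmented = list(rois)
    i = 0
    while len(augmented) < target_count:
        source = rois[i % len(rois)]
        transform = transforms[(i // len(rois)) % len(transforms)]
        augmented.append(transform(source))
        i += 1
    return augmented


# Toy example: two 2x2 "CP" ROIs grown to five samples.
cp_rois = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
balanced_cp = augment_to_target(cp_rois, 5)
```

In practice, `target_count` would be set to the number of COVID-19 ROIs so that both classes contribute equally during training.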
CT images of COVID-19 showed multiple small patches, obvious stromal changes in the lung exudate, and marked damage to the alveolar epithelium [5–8]. More specifically, many patients presented with ground-glass opacity (GGO), consolidation, GGO plus a reticular pattern, vacuolar sign, microvascular dilation sign, fibrotic streak, pleural thickening, pleural retraction sign, and pleural effusion [9–12]. Patients with common pneumonia generally have flaky high-density shadows in the lung, with relatively few of the ground-glass changes and alveolitis manifestations that are rather severe in COVID-19 CT images [13]. Multiple ground-glass shadows and consolidation in both lungs with a small amount of pleural effusion appear in common pneumonia CT images as well [14–16]. The differences in GGO change, consolidation, density, and other distinct characteristics are well reflected in statistical textural features. The superior performance of the proposed method benefits largely from this advantage.
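To make concrete how statistical textural features can separate patterns of similar overall density, the following sketch computes a few first-order statistics and one second-order statistic (contrast of a grey-level co-occurrence matrix for horizontally adjacent pixels). The choice of features and the function names are assumptions for illustration; they are not the exact feature set used in this study.

```python
import math
from collections import Counter


def first_order_features(roi):
    """Mean, variance and Shannon entropy (in bits) of the grey levels in a 2D ROI."""
    pixels = [p for row in roi for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


def glcm_contrast(roi):
    """Contrast of the co-occurrence matrix for horizontally adjacent pixel pairs.

    Assumes every row has at least two pixels, so at least one pair exists.
    """
    pairs = Counter()
    for row in roi:
        for left, right in zip(row, row[1:]):
            pairs[(left, right)] += 1
    total = sum(pairs.values())
    return sum(count * (i - j) ** 2 for (i, j), count in pairs.items()) / total


# Two toy ROIs with identical grey-level histograms but different spatial layout.
blocky = [[0, 0, 1, 1], [0, 0, 1, 1]]   # two homogeneous regions
patchy = [[0, 1, 0, 1], [1, 0, 1, 0]]   # finely alternating pattern

blocky_stats = first_order_features(blocky)
patchy_stats = first_order_features(patchy)
blocky_contrast = glcm_contrast(blocky)
patchy_contrast = glcm_contrast(patchy)
```

Both toy ROIs share the same mean, variance, and entropy, yet the co-occurrence contrast differs. This is the sense in which second-order textural features capture spatial arrangement that a grey-level histogram alone cannot.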
At present, many scholars have been working on the prediction and diagnosis of COVID-19, and most of the algorithms they have developed are based on deep learning, which has obtained impressive results [17, 18]. For example, the deep learning model (COVNet) developed by Li et al. [19] can accurately detect COVID-19 and distinguish it from other lung diseases such as community-acquired pneumonia (CAP). The sensitivity and specificity for detecting COVID-19 in the independent test set were 114 of 127 (90% [95% CI: 83%, 94%]) and 294 of 307 (96% [95% CI: 93%, 98%]), respectively, with an AUC of 0.96 (p-value < 0.001). The sensitivity and specificity for detecting CAP in the independent test set were 87% (152 of 175) and 92% (239 of 259), respectively, with an AUC of 0.95 (95% CI: 0.93, 0.97). However, developing a practical deep learning diagnostic system for epidemic response differs from developing a traditional deep learning diagnostic system: building such a system is time-consuming, and in the early stage of an outbreak COVID-19 samples are scarce and therefore insufficient to train deep learning algorithms, which often require large amounts of training data. Furthermore, deep neural networks require high-end Graphics Processing Units (GPUs), which are extremely expensive, to be trained within a reasonable amount of time. It is not practical to train deep neural networks to high performance without GPUs. To leverage such high-end GPUs effectively, fast Central Processing Units (CPUs), Solid State Disks (SSDs), and large-capacity random-access memory (RAM) are also required. In contrast, traditional machine learning is a more practical choice. It can achieve good results on a small data set. At the same time, it does not need high-performance hardware, so a variety of machine learning algorithms can be tried in a short time to choose the best one.
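As a worked example of how proportions and 95% confidence intervals of the kind quoted above for COVNet can be obtained from raw counts, the sketch below computes a Wilson score interval. The interval method used by Li et al. is not stated here, so the Wilson interval is an assumption, chosen because it is a common option for binomial proportions; the function name `wilson_interval` is likewise illustrative.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# Counts taken as 114 of 127 (sensitivity) and 294 of 307 (specificity).
sensitivity = 114 / 127
specificity = 294 / 307
sens_low, sens_high = wilson_interval(114, 127)
spec_low, spec_high = wilson_interval(294, 307)
```

Under this assumption, the sensitivity is about 90% with an interval of roughly 83% to 94%, and the specificity is about 96% with an interval of roughly 93% to 98%.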
Traditional machine learning algorithms are based on feature engineering, which is easier to explain and understand than deep networks, which lack transparency and interpretability (e.g., it is hard to determine which imaging features are being used to produce the output). Our proposed machine learning method, in combination with statistical textural features, achieved an accuracy of 91.66% in distinguishing COVID-19 from CP and an AUC of 0.98. It also attained a high sensitivity and specificity of 85.26% and 98.15%, respectively.
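For reference, the four reported metrics can be computed as in the following minimal sketch. The counts and scores are made-up toy values, not the data of this study, and the function names are illustrative. COVID-19 is treated as the positive class, which is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # share of positives correctly detected
        "specificity": tn / (tn + fp),   # share of negatives correctly rejected
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy values for illustration only (not results from this study).
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Sensitivity and specificity depend on the chosen decision threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the AUC can be higher than the accuracy at one operating point.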
Limitations
1. The ROI is manually delineated, which is time-consuming, especially when doctors are racing against time to save lives.
2. The CT images, particularly the CP images, are not sufficient in number. Although the proposed model attained state-of-the-art performance, more clinical images are required to test the generalizability of the machine learning model to other patients.
3. Beyond distinguishing CP from COVID-19, our model did not determine the specific type of pneumonia, such as viral or bacterial, mainly because of insufficient data and substantially overlapping CT manifestations. Prognosis of CPs will be considered in our future study.