We studied 199 adult patients admitted to the emergency department of the largest hospital in Milan, Lombardy, with symptoms compatible with COVID-19 during the first 3 weeks of the SARS-COV-2 outbreak in Italy. In the present manuscript, we describe this population and highlight the differences between patients who tested positive for SARS-COV-2 and those who did not. Since the outbreak, few attempts have been made to apply artificial intelligence to rapidly predict positivity or negativity to SARS-COV-2, mostly using CT imaging and laboratory results collected in Chinese populations (42-44). Here, we present the first European attempt, with promising results, to apply artificial intelligence to rapidly predict positivity or negativity to SARS-COV-2 using only basic clinical data, which are available in the vast majority of emergency departments worldwide. The wide application of this decision support tool could have a major clinical and organizational impact during the current pandemic.
In our study, several differences were observed between the two study groups (Table 3). However, none of these, alone or in combination, so far allows a clear distinction between patients with COVID-19 and patients with other diseases presenting with similar symptoms. A key finding in our data is the coronavirus-induced alteration of the white blood cell differential count (Table 3). On the one hand, in contrast to other reports (45-48), we did not observe marked lymphocytopenia, possibly because of the early stage of the viral disease. On the other hand, other subtypes, such as eosinophils, might play a key role in COVID-19 (49).
By applying artificial intelligence to our dataset, in particular artificial neural networks (ANNs) and other machine learning systems (MLS), we were able to predict the results of RT-PCR with high sensitivity and specificity (Table 6).
Artificial neural networks make predictions by learning the relationships between variables, including nonlinear ones (36, 50, 51). These systems first learn from a set of data with a known solution (training). Thereafter, the networks, inspired by the analytical processes of the human brain, are able to reconstruct imprecise rules that may underlie a complex dataset, and these rules are then evaluated on data not used for training (testing). Machine learning systems, and ANNs in particular, analyse real-world data very efficiently. The internal validity of their assessment is ensured by particularly strict validation protocols, which are seldom used in classical statistics (50, 52, 53).
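As a toy illustration of a network learning a nonlinear rule from data with a known solution, the sketch below trains a one-hidden-layer network on the XOR problem, whose output cannot be obtained from any linear combination of the inputs. It is a minimal pure-Python sketch under our own assumptions (tanh hidden units, sigmoid output, cross-entropy loss, plain full-batch gradient descent; the name `train_xor_network` and all hyperparameters are hypothetical), not the networks used in this study or in TWIST.

```python
import math
import random


def train_xor_network(hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a tiny one-hidden-layer network on XOR.

    Returns (initial_loss, final_loss, predict), where predict(x) gives P(y=1).
    All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(hidden)]
        z = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))  # sigmoid output

    def loss():
        eps = 1e-12  # avoids log(0)
        total = 0.0
        for x, y in data:
            _, p = forward(x)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(data)

    initial = loss()
    n = len(data)
    for _ in range(epochs):
        # Accumulate full-batch gradients before updating any weight
        g_w1 = [[0.0, 0.0] for _ in range(hidden)]
        g_b1 = [0.0] * hidden
        g_w2 = [0.0] * hidden
        g_b2 = 0.0
        for x, y in data:
            h, p = forward(x)
            d_out = p - y  # dLoss/dz for sigmoid + cross-entropy
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                d_h = d_out * w2[j] * (1 - h[j] ** 2)  # back through tanh
                g_b1[j] += d_h
                g_w1[j][0] += d_h * x[0]
                g_w1[j][1] += d_h * x[1]
        for j in range(hidden):
            w2[j] -= lr * g_w2[j] / n
            b1[j] -= lr * g_b1[j] / n
            w1[j][0] -= lr * g_w1[j][0] / n
            w1[j][1] -= lr * g_w1[j][1] / n
        b2 -= lr * g_b2 / n
    return initial, loss(), lambda x: forward(x)[1]


initial, final, predict = train_xor_network()
print(f"training loss: {initial:.3f} -> {final:.3f}")
for x in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(x, round(predict(x), 2))
```

No outputs are shown because whether all four cases end up correctly classified depends on the random initialisation; the loss should, however, fall during training.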
In the present manuscript, it was possible to predict with reasonable accuracy whether patients were positive or negative for SARS-COV-2 based on 42 simple variables. This was achieved using the TWIST algorithm, which is currently not as popular as other techniques, such as K-fold cross-validation and boosting. Nevertheless, it has been used extensively over the past 15 years in different contexts (54-56). Its limited diffusion is partly due to the complexity of programming TWIST, which includes two evolutionary algorithms that work together to manage a large population of ANNs, kNN and Naive Bayes algorithms. TWIST therefore needs to be programmed in C to run sufficiently fast. Because of its complexity and running time, TWIST is not well suited to implementation in Python, R or similar languages.
With the best machine learning system, TWIST achieved a global accuracy of 91.4%, with 94.1% sensitivity (correct prediction of positivity to SARS-COV-2) and 88.7% specificity (correct prediction of negativity to SARS-COV-2). Across the eight best machine learning systems, the average performance was 91.8% sensitivity, 89.6% specificity and 90.8% global accuracy.
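The global accuracies reported in this section (91.4% from 94.1% and 88.7%; 87.7% from 89.2% and 86.2%; 81.5% from 87.6% and 75.4%) agree with the arithmetic mean of sensitivity and specificity. Assuming that this is how global accuracy is defined (an inference from the figures, not something stated here), the three metrics can be computed from a confusion matrix as in the sketch below; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, global accuracy) in percent.

    tp/fn: RT-PCR-positive patients predicted positive/negative.
    tn/fp: RT-PCR-negative patients predicted negative/positive.
    Global accuracy is assumed to be the mean of sensitivity and specificity.
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    global_accuracy = (sensitivity + specificity) / 2.0
    return sensitivity, specificity, global_accuracy


# Hypothetical counts, not data from this study
print(classification_metrics(tp=90, fn=10, tn=80, fp=20))  # (90.0, 80.0, 85.0)

# Consistency check against the best system reported above
print(round((94.1 + 88.7) / 2, 1))  # 91.4
```

Unlike plain accuracy (correct predictions over all patients), this mean does not depend on the proportion of positive patients in the sample.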
The difference in predictive performance between the two testing procedures (A-B and B-A), described in the mathematical section, is small, which reasonably excludes overfitting of the model (57).
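The A-B/B-A scheme can be sketched as follows: a classifier is trained on subset A and tested on subset B, then the roles are swapped, and the two accuracies are compared. The sketch uses a 1-nearest-neighbour classifier purely as a simple stand-in (kNN is among the algorithms TWIST manages, but this is not TWIST); the function names and the one-feature toy subsets are hypothetical.

```python
def predict_1nn(train, x):
    """Label of the training record nearest to x (squared Euclidean distance)."""
    _, label = min(train, key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], x)))
    return label


def accuracy(train, test):
    """Fraction of test records whose label is predicted correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)


def ab_ba(subset_a, subset_b):
    """Train on A and test on B, then train on B and test on A."""
    return accuracy(subset_a, subset_b), accuracy(subset_b, subset_a)


# Hypothetical records: (features, label)
subset_a = [((0.0,), 0), ((0.2,), 0), ((1.0,), 1), ((1.2,), 1)]
subset_b = [((0.1,), 0), ((0.3,), 0), ((0.9,), 1), ((1.1,), 1)]
acc_ab, acc_ba = ab_ba(subset_a, subset_b)
print(acc_ab, acc_ba, abs(acc_ab - acc_ba))  # 1.0 1.0 0.0
```

The last value, the gap between the two directions, is the quantity the comparison above refers to.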
To analyze our dataset also with a more popular and widely applied procedure, we applied a 5-fold cross-validation protocol using a selected number of machine learning systems (Table 7). With this type of analysis, the best machine learning system obtained an overall accuracy of 87.7%, with a sensitivity and specificity of 89.2% and 86.2%, respectively. The average performance across systems was 87.6% sensitivity, 75.4% specificity and 81.5% global accuracy.
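A plain 5-fold protocol randomly partitions the records into five folds; each fold serves once as the test set while the other four are used for training. With the 199 patients of this study, each test fold holds only 39 or 40 records. The sketch assumes a non-stratified random partition with a fixed seed; the function names are hypothetical, and the exact fold assignment used in the study is not described here.

```python
import random


def k_fold_indices(n_records, k=5, seed=0):
    """Randomly partition record indices into k folds of near-equal size."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_records, k=5, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = k_fold_indices(n_records, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


for train_idx, test_idx in cross_validation_splits(199):
    print(len(train_idx), len(test_idx))  # 159 40 four times, then 160 39
```

Changing `seed` changes which patients share a fold, which is one way to probe how sensitive the results are to the random allocation.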
When the same machine learning systems were evaluated with the A-B/B-A training-testing protocol (Table 6), predictive results were slightly better, which is reasonably related to the optimized splitting of the records, with an average performance of 89.1% sensitivity, 82.2% specificity and 85.7% global accuracy. The high variance of the results obtained with the K-fold protocol, compared with the low variance of the same results under the TWIST protocol, suggests that with this kind of data the K-fold protocol is strongly affected by polarization, i.e., by how records happen to be randomly allocated to folds.
This is why we chose to rely on an optimized distribution of records in the training and testing subsets rather than on a random allocation. Nevertheless, standard K-fold cross-validation, a widely available method, also predicted the results of SARS-COV-2 RT-PCR accurately.
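The idea of choosing the split rather than drawing it at random can be illustrated with a simple hill-climbing search that swaps records between two halves and keeps a swap only if the halves become more similar, here measured as the summed absolute difference of feature means. This is a deliberately simplified stand-in, not TWIST, which is described above as two cooperating evolutionary algorithms managing a population of classifiers; the similarity criterion, function names and toy data are our assumptions. Including the outcome as one of the columns also pushes both halves toward a similar proportion of positive cases.

```python
import random


def mean_gap(data, left, right):
    """Sum over columns of |mean(left) - mean(right)|; lower means more similar halves."""
    gap = 0.0
    for f in range(len(data[0])):
        m_left = sum(data[i][f] for i in left) / len(left)
        m_right = sum(data[i][f] for i in right) / len(right)
        gap += abs(m_left - m_right)
    return gap


def balanced_split(data, iterations=2000, seed=0):
    """Start from a random half split and keep only swaps that do not increase the gap."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    half = len(idx) // 2
    left, right = idx[:half], idx[half:]
    start = best = mean_gap(data, left, right)
    for _ in range(iterations):
        i, j = rng.randrange(len(left)), rng.randrange(len(right))
        left[i], right[j] = right[j], left[i]  # try swapping one record each way
        gap = mean_gap(data, left, right)
        if gap <= best:
            best = gap
        else:
            left[i], right[j] = right[j], left[i]  # undo a worsening swap
    return left, right, start, best


rng = random.Random(1)
# Hypothetical records: two continuous features plus a 0/1 outcome column
data = [[rng.random(), rng.random(), float(rng.random() < 0.5)] for _ in range(40)]
left, right, start_gap, final_gap = balanced_split(data)
print(f"gap between halves: {start_gap:.3f} (random) -> {final_gap:.3f} (optimized)")
```

The two halves returned can then serve as subsets A and B in a train-on-A/test-on-B, train-on-B/test-on-A comparison.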
The variables selected by AI are worth analysing, as they carry specific clinical information. As mentioned above, the white blood cell count and its differential are highly informative.
Indeed, total white blood cell count (R = -0.46), lymphocytes (R = 0.26) and eosinophils (R = -0.34) correlated negatively (white blood cells, eosinophils) or positively (lymphocytes) with COVID-19 and were included in the model.
In addition, the final model also included variables with very low correlation with RT-PCR results, such as dyspnea (R = -0.07), basophils (R = -0.06), mean cell haemoglobin (R = -0.06), non-invasive ventilation (R = -0.05), monocytes (R = 0.05), age (R = 0.04), female sex (R = 0.02) and headache (R = 0.02). The inclusion of these variables in the model is consistent with the ability of ANNs to handle highly nonlinear functions.
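Assuming R denotes the Pearson correlation coefficient (for a 0/1 outcome such as RT-PCR positivity this is the point-biserial correlation), the toy example below shows why a near-zero R does not make a variable useless: the hypothetical outcome depends on the variable in a U-shaped way, so the linear correlation is exactly zero, while a simple nonlinear rule separates the classes perfectly. The values and the threshold are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical variable and 0/1 outcome: the outcome is 1 only at both extremes
x = [-2.0, -1.0, 1.0, 2.0]
y = [1, 0, 0, 1]
print(pearson_r(x, y))  # 0.0

# A nonlinear rule (distance from zero) recovers the outcome exactly
predicted = [int(abs(v) > 1.5) for v in x]
print(predicted == y)  # True
```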
Other authors have applied AI to the diagnosis of SARS-COV-2 infection. Rao et al. applied an AI framework to a mobile phone-based survey, based exclusively on pre-hospital clinical symptoms and demographic characteristics, to assess the probability of SARS-COV-2 infection (58). Three different research groups tried to predict positivity to SARS-COV-2 using, among other variables, CT scans (14, 42, 59). Li et al. analysed chest CT scans via deep learning to differentiate SARS-COV-2-induced viral pneumonia from other lung diseases (60). Two other research groups developed free online applications based on machine learning models that use only laboratory test results (43, 44).
Our model differs substantially from those mentioned above. First, it relies on basic clinical information that is available in almost every emergency department and quickly obtainable for every patient at hospital admission. For this reason, we included chest X-ray rather than CT in our model: although CT is certainly more sensitive in identifying alterations typical of viral pneumonia (13), not every patient with suspected SARS-COV-2 infection will have access to a CT scan. Second, our study is the first to analyse data from a European country. Although there is no evidence so far, different ethnicities might show slightly different responses to viral invasion.
This study has several limitations. First, it is a retrospective, single-center study. Second, the sample size is limited. Third, some fundamental clinical data, such as arterial blood gas analysis, were not available for all patients, so this information was not included in the model. Given the profound hypoxemia typical of patients with COVID-19, adding these variables could conceivably further improve the accuracy of the system. Finally, while no definitive data are available on the accuracy of RT-PCR testing for SARS-COV-2 (61), several studies have described a proportion of false-negative results (5-7). However, in our study every negative result was re-tested after 48 hours, which should reduce the risk of false-negative results.
Possible clinical and organizational implications
Facing a highly contagious viral outbreak requires a complex reorganization of political, economic, social and health systems (62, 63).
A fundamental aspect is to define a clear management protocol in order to separate infected from non-infected patients, i.e., those admitted for other clinical conditions. Indeed, it is of paramount importance to set up clearly separate pathways to avoid the spread of viral infection within the hospital. A quick and reliable system to identify SARS-COV-2-infected patients is therefore fundamental. Currently, the gold standard for diagnosis is an RT-PCR assay detecting the SARS-COV-2 genome (9). This type of molecular assay has several limitations. During the first month of the outbreak in Italy, sample processing became more efficient, theoretically reducing the technical time needed to obtain the result. Despite this, the problem of delayed diagnosis persists, owing to the limited availability of RT-PCR machines given the high demand during the outbreak. Furthermore, laboratories of referral hospitals, such as ours, also analyse samples arriving from smaller hospitals not equipped for SARS-COV-2 testing. Finally, RT-PCR is not a perfect test: false-negative results have been described, even in the presence of strong clinical suspicion of COVID-19 (5-7).
For these reasons, applying AI as a rapid decision support tool for the diagnosis of COVID-19, and therefore to speed up the sorting of infected from non-infected patients, would be of great clinical help. For example, simple online software fed with basic clinical data, easily obtainable in almost every emergency department, could apply trained ANNs to predict the RT-PCR result with high accuracy. The results obtained from such software should, of course, be integrated with the available clinical data.
The application of AI to clinical practice is still limited by its complexity and by the limited in-hospital availability of technical infrastructure and support. This could be particularly troublesome for small centres with limited resources. Decision support software based on the findings of the present manuscript could ideally retrieve data directly from the electronic patient management system. Otherwise, data could be entered manually into online software, although this considerably increases the risk of errors. Active support from, and collaboration with, the local information technology infrastructure is therefore fundamental to integrating AI into clinical practice in the future.
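Where data must be typed in manually, simple checks before a record reaches the model can catch some entry errors. The sketch below flags missing, non-numeric and out-of-range values; the field names and bounds are hypothetical placeholders, not clinical reference ranges or the actual variables of the model described here, and a real tool would take both from the trained model's specification.

```python
# Hypothetical fields with placeholder plausibility bounds (NOT clinical reference ranges)
REQUIRED_FIELDS = {
    "age_years": (18, 120),
    "wbc_count": (0.0, 100.0),
    "eosinophils": (0.0, 50.0),
}


def validate_record(record):
    """Return a list of problems in a manually entered record (empty if none)."""
    problems = []
    for field, (low, high) in REQUIRED_FIELDS.items():
        if field not in record or record[field] in (None, ""):
            problems.append(f"{field}: missing")
            continue
        try:
            value = float(record[field])  # accepts numbers and numeric strings
        except (TypeError, ValueError):
            problems.append(f"{field}: not a number ({record[field]!r})")
            continue
        if not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    return problems


print(validate_record({"age_years": 65, "wbc_count": "7.2", "eosinophils": 0.1}))  # []
print(validate_record({"age_years": 650, "wbc_count": "abc"}))
```

Only a record that returns an empty list would be passed on to the trained model; the others would be sent back for correction.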
Finally, the information obtained from the present study might also be useful after the end of the current pandemic. Indeed, SARS-COV-2 might become a seasonal virus, in which case the early identification of infected patients would be key to reducing the risk of further epidemic outbreaks.
In summary, our study suggests that basic clinical data might be sufficient for properly trained ANNs and MLS algorithms to predict positivity and negativity to SARS-COV-2 with good accuracy. If confirmed, this could have important clinical and organizational implications. Indeed, while not directly changing the treatment of COVID-19 patients, it could reduce the time patients spend unnecessarily in the emergency department, reduce the workload of intensive care staff and, finally, reduce the risk of healthcare systems collapsing.