Ironically, AI has not yet been counted as a technique that can be significantly relied on in the fight against COVID-19, despite its potential for treatment, diagnosis, and pharmaceutical development [23]. Yet because of its versatility, AI could greatly help scientists, scholars, and technologists in a variety of areas, including biomedicine, epidemiology, and socio-economics [24]. This adaptation and adjustment process can be facilitated by new forecasting tools and frameworks that promote efficient management of resources at the individual and institutional levels [25]. Control strategies mostly aim to avoid critical overload of health systems, because it is through measures such as disease containment and mitigation that the mortality rate can be brought under partial but promising control [26]. Whatever the scientific discipline, the ability to predict well and avoid chaos is essential to research and practice involving dynamical systems [27]. Physics is one example: certain laws determine the evolution of a physical system, which is modeled as a dynamical system according to a set of parameters and the state of the system before its evolution [27].
Having its roots in computer science, AI builds on human intelligence but extends beyond human limitations by reducing workload. For instance, in contrast to traditional statistics, which depends on organized and unified data, AI screens the original data and calculates the attributes that are important [28]. Traditional epidemic models analyze the infection rate from the changing number of infections to predict an epidemic's trend, but they fail to differentiate between levels of infection and assume the same level for all patients [29]. Lacking this insight, their output is limited to general trends. In contrast, AI is capable of identifying, tracking, and forecasting outbreaks as well as diagnosing the virus and processing healthcare claims [30]. AI aims at a better data representation for the desired ML algorithm, because the original representation might not be as detailed and thorough as intended. Both manual and automatic approaches are possible; in the former case, feature construction is one typical application [20, 31].
In addition, big data must be taken into account as an important infrastructure that facilitates modeling studies of viral activity and informs healthcare policymakers so that they can better prepare for an occurring or resurging outbreak [32]. Indeed, emerging intelligent techniques, such as those based on ML, have proven helpful in tracing the source of infectious diseases and predicting their spread [33]. This ability makes big data and intelligent analytics effective solutions for patients, health care providers, and public health [34]. Using analogies and ideas from atmospheric sciences, the authors in [35] critically assess the proposition that bigger data lead to better predictions and conclude that a compromise between modelling and quantitative analysis can yield a working forecasting strategy. Feature extraction is a process of dimensionality reduction in which an original set of raw data is refined into a smaller set that is more convenient to manage. A large data set typically has a large number of variables, so substantial computing resources are needed to process it [31].
On the other hand, associations and organizations increasingly use social media platforms to reach the public at both national and international levels. This heavy demand for communication leads to confusion caused by information overload and/or a flood of misinformation [30]. One study showed that better outcomes can be expected when technology is employed in strategies to bring the pandemic under control or to give the affected community enough support that the spread of the infection is decisively contained [36]. Admittedly, using big data to fight COVID-19 presupposes an important role for informatics specialists working side by side with physicians, nurses, paramedics, health practitioners, and all other professionals in the field to provide telehealth or virtual care [37]. Globally, health systems still rely on classic public-health measures to combat the COVID-19 pandemic. Nevertheless, a wide range of digital technologies is available that could enormously enhance public-health strategies if governments and health systems adopted them [32].
In [38], an improved ML-based model was employed to predict potential COVID-19 threats worldwide. The findings demonstrated that iterative weighting for fitting a Generalized Inverse Weibull (GIW) [39] distribution gave a better fit for developing a prediction framework. One of its main features is deployment on a cloud computing platform to achieve real-time and accurate forecasting of the epidemic's growth behavior [38]. To enhance the accuracy of the predictions, health facilities, population density, weather conditions, average and median age, etc. were integrated [38]. A new analysis of current DL and ML methods for diagnosing and predicting COVID-19 was presented in [40]. This research also compared the impact of ML with that of competing approaches such as mathematical and statistical models. Computing approaches for forecasting the pandemic, together with factors such as method type and the impact of disease-related research on the nature of the data, were presented in [40], while a systematic review of the epidemiological, clinical, chest imaging, and laboratory data available at the time was presented in [41].
4.1. ML and DL Methods
In this part, we take a glance at some ML- and DL-based methods for forecasting the spread of the virus. Table 1 summarizes the ML and DL methodologies applied to COVID-19 spread without being combined with statistical methods. An ANN was used in [42] to classify data obtained between 20 February 2020 and 9 March 2020. The proposed classifier used seven parameters, namely birth year, infection reason, country, group, confirmation date, sex, and region, and the variables most strongly related to recovered and fatal cases were analyzed with the ANN model [42]. In addition, a new susceptible-exposed-infected-removed (SEIR) model was utilized that incorporated domestic reduction data from before and after January 23rd as well as the latest COVID-19 pandemic data to predict the epidemiological spread. The model's prediction was corroborated by an ML approach trained on 2003 SARS coronavirus outbreak data [43]. C. Menni et al. presented a mobile application for tracking symptoms, showing that the virus has caused the loss of the senses of smell and taste in 2,618,862 individuals; this technique highlights the significance of big data and COVID-19-related technology [44]. The methods were investigated in [45].
On 24 March 2020, this tracker was launched in the United Kingdom, and it was later deployed in the United States. The app-based tracker was free and was installed on smartphones to collect information from symptomatic and asymptomatic persons. It works in real time to track the progress of the disease and to model its spread using health information that individuals report themselves on a daily basis. This self-reported information includes PCR test results, hospitalization, symptoms, prior medical conditions, and demographic data; logistic regressions adjusted for age, BMI, and sex were used to identify symptoms other than anosmia that could be related to SARS-CoV-2 infection [45]. The risk of a positive COVID-19 diagnosis was estimated with ML from the results of emergency care admission exams by X. Yuan et al. [40]. The data were collected from patients at a hospital in São Paulo, Brazil, from 17 March 2020 to 30 March [46].
A Convolutional Neural Network (CNN) was also presented by C.-J. Huang et al. [47] for analysis and prediction of confirmed cases. The study focused on the Chinese cities that experienced most of the confirmed cases and put forward a COVID-19 forecasting model based on a deep CNN. Additionally, a Multi-Layered Perceptron (MLP) network with two scenarios was utilized in [48] to forecast the outbreak, with a set of data points for each scenario used to train the network. In this research, 8, 12, and 16 internal neurons were tried in order to find the best possible responses. RMSE and the correlation coefficient were used for evaluation and for reducing the cost function value. Furthermore, a novel modified prediction model based on the Adaptive Neuro-Fuzzy Inference System (ANFIS) was presented in [49] to estimate the number of confirmed cases ten days ahead, on the basis of the confirmed cases recorded in China. The method utilized an enhanced Flower Pollination Algorithm (FPA) improved with the Salp Swarm Algorithm (SSA). The most important advantages of the proposed ANFIS-based technique are its flexibility in capturing nonlinearity in time series data and its integration of the properties of fuzzy logic systems and artificial neural networks (ANN). An example of a different forecasting application was presented in [50], in which ANFIS and empirical mode decomposition were used to propose a stock price forecasting model. Also, in [51], a Virus Optimization Algorithm (VOA) was combined with ANFIS to investigate the impact of population density and climate-related factors on COVID-19 spread. The data used in this study covered climate-related factors and confirmed COVID-19 cases across U.S. counties. Ref. [52] presented a modified ANFIS model for predicting the number of infections in four countries. This modified ANFIS is based on the marine predators algorithm (MPA), a new nature-inspired optimizer. The MPA is used to optimize the ANFIS parameters, which leads to better forecasting performance. To evaluate the proposed MPA-ANFIS, official datasets of the four countries were utilized [53].
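The two evaluation measures named for the MLP study in [48], RMSE and the correlation coefficient, can be illustrated with a minimal sketch. The observed values and the three candidate forecasts below are hypothetical numbers chosen only for illustration; they are not taken from [48], and the selection rule (lowest RMSE) is an assumption.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical observations and forecasts from networks with
# 8, 12, and 16 hidden neurons (illustrative values only).
observed = [10.0, 20.0, 30.0, 40.0]
candidates = {
    8: [12.0, 18.0, 33.0, 37.0],
    12: [11.0, 21.0, 29.0, 41.0],
    16: [15.0, 15.0, 35.0, 45.0],
}

# Score every candidate and keep the one with the lowest RMSE.
scores = {k: (rmse(observed, v), pearson_r(observed, v)) for k, v in candidates.items()}
best_size = min(scores, key=lambda k: scores[k][0])
```

A low RMSE and a correlation coefficient close to 1 point the same way here, but the two can disagree: a forecast that is consistently offset by a constant still has a correlation of 1.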
In [72], ML tools selected three biomarkers, namely lactic dehydrogenase (LDH), lymphocyte, and high-sensitivity C-reactive protein (hs-CRP), to forecast each patient's mortality more than ten days in advance with more than ninety percent precision. Relatively high LDH levels apparently played a significant role in identifying a large number of cases in dire need of immediate medical attention. This agrees with existing medical knowledge that high LDH levels are associated with the tissue breakdown that occurs in various diseases, including pulmonary disorders such as pneumonia. Taking the progressive trends of COVID-19 in China and South Korea into account, [57] relied on ANN-based curve fitting techniques to forecast the number of COVID-19 cases and deaths in France, the USA, India, and the UK.
The impact of the COVID-19 epidemic in Italy was investigated using ML models to identify mediators of perceived stress [55]. The findings could be used for early and targeted intervention and prevention programs. Ref. [73] used datasets from the WHO and Johns Hopkins University to create a training dataset. Recurrent neural networks (RNNs) were then used to develop two prediction models. Information from the first time steps was passed through a dense neural network layer and a subsequent regression output layer to determine the next predicted value. Moreover, Ref. [56] was a population-based case-control study carried out in the Lombardy region of Italy. There were 6272 patients with SARS-CoV-2 between 21/2/2020 and 11/3/2020. These patients were matched by age, sex, and municipality of residence to 30,759 beneficiaries of the Regional Health Service (controls). Information on the use of selected drugs and on the clinical profiles of the patients was collected from regional health care databases [56]. Ref. [58] utilized multiple ML algorithms to predict the occurrence of infection globally based on dataset analysis. Among Bayesian Ridge Regression, Polynomial Regression, and Support Vector Regressor, the Support Vector Regressor had the lowest R2 score (0.8273) and the highest RMSE (124328.5297), which places it last among the three models [58]. Using real-time COVID-19 time series data for the period from January 22, 2020, to May 18, 2020, [54] proposed a hybrid model combining ensemble empirical mode decomposition (EEMD) with an ANN to forecast the COVID-19 epidemic. The time series was first decomposed by EEMD into sub-signals to denoise the original data, and an ANN architecture was then trained on the denoised data [59].
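The R2 comparison reported for [58] rests on the coefficient of determination, which can be sketched as follows. This is an illustration of the metric only: the model names match those in the text, but the observed values and predictions are hypothetical and do not reproduce the figures in [58].

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observed values are constant")
    return 1.0 - ss_res / ss_tot

# Hypothetical cumulative counts and model outputs (illustrative only).
observed = [100.0, 200.0, 400.0, 800.0]
predictions = {
    "Polynomial Regression": [110.0, 190.0, 410.0, 790.0],
    "Bayesian Ridge Regression": [120.0, 230.0, 380.0, 760.0],
    "Support Vector Regressor": [300.0, 350.0, 400.0, 450.0],
}

# Rank models from highest to lowest R2; the last entry is the least preferred.
ranking = sorted(predictions, key=lambda m: r2_score(observed, predictions[m]), reverse=True)
```

An R2 of 1 means the predictions match the observations exactly, while an R2 of 0 means the model does no better than predicting the mean of the observations.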
AI-based and natural language processing methods were applied in [74] to unstructured patient data gathered during telehealth visits, in order to improve the efficiency of the computer algorithms used to screen for COVID-19 testing. The study consisted of segmenting and parsing documents and then analyzing the overrepresented words that appear in patients' symptoms. It also used a word embedding-based ANN to predict COVID-19 test results from the symptoms that patients had reported themselves.
A dataset of publicly available information covering 51 days (22/01/2020 to 12/03/2020) on the number of infections, recoveries, and deaths in 406 locations was used in [60]. Originally a time-series dataset, it had to be converted into a regression dataset before it could be used to train an MLP ANN. The training aimed at a global model of the maximal number of patients across all locations in each time unit. The hyperparameters of the MLP were selected with a grid search algorithm covering a total of 5376 hyperparameter combinations [60]. Also, Ref. [62] presented a multiple ensemble ANN model with fuzzy response aggregation for COVID-19 time series [62]. Ensemble neural networks comprise a set of modules that each produce a prediction, regardless of existing conditions. Fuzzy logic was used to aggregate the responses of the predictor modules, which improved the final prediction by combining the module outputs in an intelligent way. Fuzzy logic also deals with the uncertainty that may arise in the process of reaching the final prediction [62]. In addition, Ref. [75] proposes an ML-based approach that first helps doctors verify the disease within a short time and second predicts the worldwide growth of the disease in the near future. Two models were used to achieve this aim: the first was based on a convolutional ANN, and the second combined a convolutional ANN with an RNN. These models were evaluated and compared to verify the predicted results [63].
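One common way to turn a time series into a regression dataset, as described for [60], is a sliding window of lagged values. The sketch below assumes that approach; the exact transformation used in [60] is not given here, and `make_supervised`, `n_lags`, and the sample counts are hypothetical.

```python
def make_supervised(series, n_lags):
    """Convert a univariate series into (features, target) pairs.

    Each feature row holds the n_lags values that precede the target.
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be at least 1 and shorter than the series")
    features, targets = [], []
    for i in range(n_lags, len(series)):
        features.append(list(series[i - n_lags:i]))
        targets.append(series[i])
    return features, targets

# Hypothetical cumulative case counts for one location.
cases = [1, 3, 6, 10, 15, 21]
X, y = make_supervised(cases, n_lags=3)
# X holds the windows [1, 3, 6], [3, 6, 10], [6, 10, 15]
# y holds the values that follow each window: 10, 15, 21
```

Each row of `X` with its entry in `y` can then be fed to any regressor, such as an MLP, as an ordinary supervised example.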
Six ML-inspired and statistical time series approaches were developed in [76] to estimate the percentage of active cases relative to the total population. The forecasts looked one week ahead and covered the 10 countries with the highest number of confirmed cases as of 4 May 2020. An online questionnaire was developed and used as a data collection tool in [61]. The collected data were then used as input for forecasting models based on machine learning (SVM and MLP) and on statistics (Logistic Regression, LR). These models used signs and symptoms to predict potential COVID-19 patients. Ref. [77] was a case-control study covering patients whose COVID-19 infection was verified between 23/01/2020 and 06/02/2020, together with the emergency patients, outpatients, and inpatients seen during the same period, who formed the control group. In addition to describing the sources of infection, consultation time, and incubation period of the cases, this study calculated the secondary incidence in Gansu. Moreover, Ref. [78] investigated whether a simplified macroscopic virus-centric model can simulate the evolution of COVID-19 across a country, provided that conditions such as behaviors and containment policies evolve sufficiently homogeneously across the territory under study. A method for forecasting poor prognosis in COVID-19 patients using ML was suggested in [65]. The dataset for this study included information on 13,690 patients who had either died or recovered.
The development of comparative regression and ANN models designed to examine the impact of COVID-19 on China's demand for electricity and petroleum was reported in [79]. The analysis demonstrated that the severity of the pandemic has significantly affected China's demand for electricity and petroleum, both directly and indirectly [79]. The analysis in [80] was founded on a recent theory of momentum management of epidemics and employs Bessel functions. The parameters used were the initial transmission rate, which reflects viral fitness and the “normal” frequency of contacts in the infected areas, and the intensity of preventive measures [80].
The main characteristics of the trends and patterns of the COVID-19 outbreak in Canada were evaluated with an LSTM network [1]. Recurrent LSTM networks can overcome the limitations of conventional time series prediction techniques by accommodating the nonlinearities of the COVID-19 dataset. LSTM blocks operate at successive time steps, each passing its output to the next block, until the final LSTM block generates the sequential output [1]. In separate research, data obtained from the Google Trends website were used to predict COVID-19 in Iran [66]. LSTM and Linear Regression were employed to predict cases; k-fold cross-validation was used to validate all models, with Root Mean Square Error (RMSE) as the performance metric. The LSTM model showed fluctuating performance across folds even when training loss was low, which signals overfitting due to the limited training data [66].
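The validation scheme described for [66], k-fold cross-validation scored by RMSE, can be sketched with a simple least-squares line standing in for the models. The data, the contiguous fold layout, and the helper names are assumptions made for illustration and are not taken from [66].

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope

def k_fold_rmse(xs, ys, k):
    """Return the held-out RMSE of each of k contiguous folds."""
    n = len(xs)
    if k < 2 or k > n:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    scores, start = [], 0
    for size in fold_sizes:
        test_idx = range(start, start + size)
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        intercept, slope = fit_line(train_x, train_y)
        squared = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx]
        scores.append(math.sqrt(sum(squared) / len(squared)))
        start += size
    return scores

# Hypothetical, perfectly linear data: every fold should score close to zero.
days = list(range(10))
counts = [2.0 * d + 1.0 for d in days]
fold_scores = k_fold_rmse(days, counts, k=5)
```

Large differences between the per-fold scores, of the kind reported for the LSTM in [66], suggest that the model depends heavily on which samples it was trained on.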
The LSTM model, an RNN, was trained on 2003 SARS epidemic statistics and incorporates epidemiological features including transmission probability, incubation rate, recovery and death probability, and contact number. To predict COVID-19, a hybrid AI model was suggested in [29]. First, an improved SI (ISI) model was proposed to analyze how the infectiousness of virus carriers changes after infection. Next, taking into account the effects of prevention, rising public prevention awareness, and key control measures, a Natural Language Processing (NLP) module and an LSTM network were incorporated into the ISI model to build the hybrid AI-based technique for predicting COVID-19. In addition to the hybrid method integrating the LSTM network and the NLP module, this article introduced information on the efforts of local and central governments and on public support into the prediction calculation [29]. The LSTM network was also used to estimate the deviation of the epidemiological model and was combined with the ISI model to estimate the number of infections.
M. Niazkar et al. highlighted an AI-based response to the virus, including DL methods such as Extreme Learning Machine (ELM), Generative Adversarial Networks (GANs), and Long Short-Term Memory (LSTM) [81]. Relying on real-time data collected from the Johns Hopkins dashboard, [54] proposed ML and DL models designed to understand the day-to-day exponential behavior of COVID-19 and its future reach across nations. The models were chosen from ML, such as polynomial regression (PR) and support vector regression (SVR) [82], and from DL regression models, such as a standard RNN and a Deep Neural Network (DNN) using LSTM. Several significant climate parameters, such as relative humidity, daily average temperature, and wind speed, as well as urban parameters including population density, were taken into account to analyze their impact on confirmed COVID-19 cases. This analysis was carried out on three case studies in Italy, along with an investigation of the proposed method [83]. Moreover, an LSTM for time series was used to predict confirmed cases [64]. Seasonal Autoregressive Integrated Moving Average (SARIMA) [64], RNN, moving averages, and Holt-Winters Exponential Smoothing (HWES) approaches were used for comparison [64].
4.2. Combination of Statistics and Metaheuristic Algorithms with AI Methods
This part presents statistical and analytical methods that are combined with and empowered by AI techniques such as ML and DL. Table 2 lists techniques such as ARIMA, SEIR, MAE, SAE, and SARIMA that have been boosted by AI methods. Modified Auto-Encoders (MAE) were utilized by D. Charte et al. [20] to forecast the number of newly infected individuals. An Auto-Encoder (AE) is a kind of ANN used to learn an efficient coding of data without supervision [84]. While many of these are capable of generating reduced feature sets through fusion of the originals, AEs designed for other applications are also options to consider [20]. AEs aim to learn a representation of a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise”. Whereas in a classical AE the number of nodes per layer increases from the hidden layers toward the input layer, in the MAE the output layer, second hidden layer, first hidden layer, and input layer had 1, 4, 32, and 8 nodes, respectively [14]. The results demonstrated highly accurate prediction and subsequent multiple-step forecasting. In the authors' experience, a longer training time improved forecasting [14]. In addition, an MAE was designed and developed to deal with the existing limitations [85]. Each intervention variable was assigned a weight between 0 and 1 according to the degree of intervention, with zero indicating no intervention and one indicating complete intervention. Examining 152 countries, the study estimated the ending time, peak time, duration, peak number, and number of COVID-19 infections under four intervention scenarios, giving senior officials and health administrators critical information for immediate public health measures suited to slowing the spread of COVID-19. The results were consistent with the dire need for urgent, aggressive interventions.
An MAE-based approach is used in [86], which proposes alternative strategies for modeling COVID-19 dynamics [87]. The results show this approach to be superior to traditional and LSTM approaches. The approach starts with an initial clustering of the world regions for which data are available. The clustering identifies locations at an advanced stage of the pandemic, based on a set of manually engineered features indicating a country's response to the early stage of pandemic spread. The TM, FM (including medical staff and hospitals), and DM constructed in [69] are intended to predict COVID-19 spread in the ten most-affected countries. One main factor that directly affects the spread of COVID-19 is public knowledge and behavior. The proposed hybrid model used regional properties exclusively to provide robust estimates. Including extra modules and employing real data could substantially improve the performance of the models [69]. Moreover, six rolling grey Verhulst models were built in [88], using data sequences of seven to nine days to predict the daily growth trend of COVID-19 infections in China. Hubei province data were compared with the data of nine other provinces to analyze the characteristics and differences of the SV of COVID-19-related symptoms and to investigate the correlations between the SV of COVID-19 and the number of recent suspected/confirmed infection cases [89].
Table 2. Combination of AI techniques to enable ARIMA, SEIR, MAE, SAE, and SARIMA methods for forecasting COVID-19 spread
| Author | Technique | Country/Region | Description | Data | Results |
|---|---|---|---|---|---|
| Thadikamala Sathish, et al. [90] | ARIMA | India | Predictions of patients raise, recovery and death rate | From 30th Jan 2020 to 15th May 2020 | Forecasting was done by using the constructed models up to July 8th 2020 |
| Roseline Oluwaseun Ogundokun, et al. [91] | ARIMA; SVR, NN, and LR | India | Prediction | From January 2020 to April 2020 | The COVID-19 disease can correctly be predicted according to the obtained results |
| Vasilis Papastefanopoulos, et al. [76] | ARIMA, HWAAS, NBEATS, TBAT, Gluonts | USA, Spain, Italy, UK, France, Germany, Russia, Turkey, Brazil, Iran | Forecasting | As of 4 May 2020 | ARIMA and TBAT obtained better results compared with DL ones such as Deep AR and N-BEATS |
| Zohair Malki, et al. [92] | SARIMA | France, Italy, USA, UK | Predicting the End of Pandemic | Collected data from 22/1/2020 to the present time | Confirmed cases will slow down in October 2020 |
| Leila Moftakhar, et al. [93] | ANN, ARIMA | Iran | A comparison between ARIMA and ANN prediction | New cases from 19/2/2020 to 30/3/2020 | ARIMA model has better prediction result than ANN |
| Kabir Abdulmajeed, et al. [94] | ARIMA, GARCH | Nigeria | Online forecasting mechanism | Cases from February 27, 2020, to April 5, 2020 | Providing academic thrust in guiding the policymakers |
| George Xianzhi Yuan, et al. [95] | iSEIR model | China | Forecasting of the Critical Turning Period | From January 2020 to early March 2020 | Control the epidemic time should be around mid-February 2020 |
| İsmail Kırbaş, et al. [96] | NARNN, ARIMA, LSTM | Germany, Denmark, France, Belgium, UK, Turkey, Switzerland, and Finland | Comparative analysis and forecasting | The data covers 97, 67, 100, 90, 94, 55, 68, and 90 days respectively and ends on 3/5/2020 | The best model result has been obtained for LSTM |
| Zixin Hu, et al. [97] | MAE, ARIMAX, SEIR | 152 countries | Forecasting and Evaluating Multiple Interventions | From 20/1/2020 to 16/3/2020 | The obtained 2.5% average error of five-step ahead prediction |
| Farhan Mohammad Khan, et al. [98] | ARIMA, NAR, MoHFW | India | Forecasting model for time series analysis | From 31/1/2020 to 25/3/2020 | Estimating trend in the actual and approximately 1500 cases per day on 04th April 2020 |
| Igor G. Pereira, et al. [87] | LSTM-SAE, MAE | Brazil | Forecasting | From Feb 2020 to May 2020 | The pandemic is estimated to end (with 97% of cases reaching an outcome) in some states on 28 May and in the rest through 14 August |
| Amal I. Saba, et al. [99] | ARIMA, NARANN | Egypt | Forecasting the prevalence | Data collected between 1/3/2020 and 10/5/2020 | NARANN has acceptable error results of less than 5% |
| Zixin Hu, et al. [100] | SEIR; AE; IAE | USA | Estimating the peak time | From January 22, 2020 to April 24 | The COVID-19 peak time in the US is estimated |
| Zixin Hu, et al. [85] | MAE | Countries worldwide | Forecasting intervention | The number of cumulative cases, death cases, and new cases of COVID-19 in the period of January up to March 2020 | Number of cumulative cases by January 10, 2021; under later intervention: 255,392,154; under immediate intervention: 1,530,276 |
Statistical and AI-based approaches for modeling and predicting the epidemic in Egypt were presented in [101]. The approaches used in this study are Nonlinear Auto Regressive Artificial Neural Networks (NARANN) and ARIMA. Furthermore, Ref. [99] investigated the situation in and outside China to predict whether the epidemic would worsen. Official data on infections, deaths, and suspected COVID-19 cases were collected, and the findings demonstrated the seriousness of the situation in Hubei Province and Wuhan City. A trend comparison method, together with ARMA and ARIMA, was presented in [102] to analyze the data and make predictions. The comparison identified 19 February 2020 and 14 March 2020, when full control of the situation was achieved, as the key dates of COVID-19. Infection and death numbers and GDP growth were also predicted simultaneously. The ability of ML models to forecast the number of upcoming COVID-19 patients was demonstrated in [103]. In this study, four standard forecasting models, namely least absolute shrinkage and selection operator (LASSO), exponential smoothing (ES), SVM, and linear regression (LR), were utilized to forecast the threatening factors of COVID-19 [104]. The potential of data science [105] for assessing risk factors related to COVID-19 was investigated in [71], through an analysis of datasets obtained from the Oxford University database and of recently simulated datasets, followed by an analysis of different univariate LSTM models to forecast new cases and related deaths.
An online forecasting procedure that streams data from the Nigeria Center for Disease Control was employed in [106] to update the parameters of an ensemble model so that COVID-19 forecasts are refreshed every day. The ensemble combines an ARIMA model, Prophet (an additive regression model developed by Facebook), and a Holt-Winters Exponential Smoothing model combined with Generalized Autoregressive Conditional Heteroscedasticity (GARCH). The ensemble was intended to give public health officials and policymakers substantial academic guidance in establishing containment strategies and in assessing containment interventions against the spread of the disease in Nigeria [94]. A symptom-to-disease digital health assistant called Symptoma was used to differentiate over 20,000 different diseases with 90% accuracy. Symptoma's accuracy in identifying COVID-19 across various sets of clinical cases and similar diseases was tested in [107].
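The idea of combining several forecasters, as in the ensemble of [106], can be illustrated with a much simpler sketch. Holt's linear trend method is used here as a non-seasonal stand-in for Holt-Winters smoothing, and the ensemble is a plain average; both choices, the smoothing constants, and the sample series are assumptions, not the configuration of [106].

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear trend method (double exponential smoothing).

    alpha smooths the level, beta smooths the trend; both lie in (0, 1].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Extrapolate the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]

def average_ensemble(forecasts):
    """Average several equal-length forecasts step by step."""
    return [sum(step) / len(step) for step in zip(*forecasts)]

# Hypothetical daily counts with a steady upward trend.
history = [1.0, 2.0, 3.0, 4.0, 5.0]
holt = holt_forecast(history, alpha=0.5, beta=0.5, horizon=3)

# A second, hypothetical forecast standing in for another ensemble member.
other = [8.0, 9.0, 10.0]
combined = average_ensemble([holt, other])
```

Refitting on each new day's data and re-running the combination gives a daily updated forecast in the spirit of the online procedure described above.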
In [108], the database included 57 candidate explanatory variables for testing the performance of an MLP network in predicting the cumulative incidence rate of COVID-19 in the United States. Daily data for the period from 30th Jan 2020 to 15th May 2020, published by the Government of India, were used in [90] to implement an ARIMA model for forecasting rising case numbers, recoveries, and deaths in India. The autocorrelation function (ACF), partial autocorrelation function (PACF), and standardized residuals were used to determine whether the model was a good fit [90]. In [109], symptoms, transmission modes, and putative treatments for COVID-19 were investigated and reported [110]. The report summarized the relevant available information on the genome, evolution, and zoonosis of the coronavirus. In [111], the authors aimed to synthesize the challenges that retailers have to face and deal with during the COVID-19 pandemic. To create a guideline for retailers, the study approached the pandemic from the perspective of consumers and managers. Using two explainable AI methods, ECPI and SHAP, the three most significant measures in the countries and regions under study were investigated, with models constructed to forecast the instantaneous reproduction number (Rt) serving as surrogates for the real world [112].
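The ACF check mentioned for the ARIMA model in [90] can be sketched as a sample autocorrelation computed on the model residuals. The residual values below are hypothetical, and the approximate 95% band of ±1.96/√n is the usual rule of thumb for white noise rather than a detail taken from [90].

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    if c0 == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

def white_noise_bound(n):
    """Approximate 95% band for the ACF of white noise."""
    return 1.96 / math.sqrt(n)

# Hypothetical residuals that alternate in sign, a pattern a good fit should not leave.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
correlations = acf(residuals, max_lag=2)
bound = white_noise_bound(len(residuals))

# Lags (other than 0) whose autocorrelation falls outside the band.
suspicious_lags = [k for k, r in enumerate(correlations) if k > 0 and abs(r) > bound]
```

If the residuals of a fitted model show no lags outside the band, the model has captured most of the serial dependence in the data.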
In [92], a forecasting model was developed to estimate when the activity of the virus might halt, as well as the risk of a resurgence of the COVID-19 pandemic. A SARIMA model was adopted to predict the spread of the virus in selected countries and to estimate its life cycle and end date. Because the virus behaves similarly in different places, the study could be useful for countries around the world. It can help governments and public health officials make decisions and plan future policies and actions, thereby reducing the anxiety and tension that the pandemic imposes on COVID-19-stricken areas [92]. The COVID-19 outbreak triggered a surge of public opinion on the Chinese Sina microblog. To identify the important patterns of information propagation across social networks, [113] proposed a multiple-information susceptible-discussing-immune (M-SDI) model for designing effective communication strategies during a pandemic. The M-SDI model was developed on the basis of the quantity of public discussion. In addition, [114] proposed an underlying mathematical model, the individual SEIR (iSEIR) model, which is a set of differential equations that extends the classic SEIR model. In [98], an ARIMA model was fitted to the data collected between 31/01/2020 and 25/03/2020 and checked against the data collected between 26/03/2020 and 04/04/2020. A nonlinear autoregressive (NAR) ANN was also developed so that the accuracy of the prediction models could be compared. The model was used to predict the occurrence of COVID-19 over 50 days when no additional intervention was in place [98].
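For reference, the classic SEIR model that the iSEIR model extends can be written as the following system of differential equations. This is the standard textbook form without births and deaths, not the extended formulation of [114]; the symbols β (transmission rate), σ (rate at which exposed individuals become infectious), γ (recovery rate), and N (total population) are the usual conventions and are assumptions with respect to the cited work.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \qquad N = S + E + I + R .
\end{aligned}
```

In this form the basic reproduction number is R_0 = β/γ, and the four right-hand sides sum to zero, so the total population N stays constant.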
A Genetic Algorithm (GA) was used to estimate the parameters of compound, cubic, logarithmic, linear, logistic, quadratic, and exponential equations in order to develop the desired model [115]. The population size was 300 and, after a number of trial-and-error runs, the maximum number of generations (iterations) was set to 500 in order to decrease the value of the cost function. The cost function was defined as the Mean Square Error between the output values of the system and the target values. M. H. D. M. Ribeiro et al. demonstrated how stacking-ensemble learning, Support Vector Regression (SVR), ridge regression (RIDGE), Autoregressive Integrated Moving Average (ARIMA), random forest (RF), and cubist regression (CUBIST) can be applied to time series in order to predict the cumulative confirmed cases of COVID-19 in ten Brazilian states with a high rate of COVID-19 spread [7]. Ref. [50] proposed, for the first time, an AI model called multi-gene genetic programming (MGGP) to predict the COVID-19 outbreak. Although the significant fluctuations in the number of confirmed cases make the task complicated, the MGGP results were promising: the predicted numbers of confirmed cases were within an acceptable range of the values considered for the seven countries investigated in the study. MGGP could therefore be a good candidate for developing estimation approaches for COVID-19.
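The GA-based parameter estimation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [115]: the quadratic form stands in for any of the candidate equations, and the elitist selection, uniform crossover, Gaussian mutation, parameter bounds, and synthetic targets are all hypothetical choices. Only the MSE cost function and the default population size (300) and generation limit (500) follow the text.

```python
import random


def quadratic(t, a, b, c):
    """Quadratic trend a + b*t + c*t**2, one of the candidate equation forms."""
    return a + b * t + c * t * t


def mse(model, params, times, targets):
    """Cost function: mean square error between model output and targets."""
    errors = ((model(t, *params) - y) ** 2 for t, y in zip(times, targets))
    return sum(errors) / len(times)


def genetic_fit(model, times, targets, bounds,
                pop_size=300, generations=500, seed=0):
    """Minimise the MSE with a simple elitist GA (illustrative operators)."""
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    rng = random.Random(seed)

    def cost(params):
        return mse(model, params, times, targets)

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []  # best cost seen in each generation
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        elite = population[: max(2, pop_size // 5)]
        children = [list(p) for p in elite]  # elitism: best survive unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(elite, 2)
            child = []
            for (lo, hi), gene_m, gene_f in zip(bounds, mother, father):
                gene = rng.choice((gene_m, gene_f))       # uniform crossover
                gene += rng.gauss(0.0, 0.05 * (hi - lo))  # Gaussian mutation
                child.append(min(hi, max(lo, gene)))      # stay within bounds
            children.append(child)
        population = children
    best = min(population, key=cost)
    history.append(cost(best))
    return best, history


# Synthetic targets from known parameters; a small run keeps the demo fast.
times = list(range(15))
targets = [quadratic(t, 5.0, 2.0, 0.5) for t in times]
demo_bounds = [(0.0, 10.0), (0.0, 5.0), (0.0, 1.0)]
best, history = genetic_fit(quadratic, times, targets, demo_bounds,
                            pop_size=60, generations=40)
print("best parameters:", best, "final MSE:", history[-1])
```

Because the elite individuals are copied unchanged into the next generation, the best cost can never increase from one generation to the next, which makes the recorded history a convenient convergence check.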
Using data from Hungary, [116] presented a hybrid ML approach for COVID-19 prediction. The hybrid ML methods, an MLP combined with the Imperialist Competitive Algorithm (MLP-ICA) and ANFIS, predicted the time series of infected individuals as well as the mortality rate. The prediction indicated a significant drop in the outbreak and in total mortality by the end of May. In addition, Ref. [54] analyzed global COVID-19 data with ML techniques and demonstrated the covariates associated with confirmed cases. Moreover, [117] forecast the number of infected cases in the USA, UK, and Russia from the daily confirmed COVID-19 cases reported for these countries in the WHO database between January 22, 2020 and May 28, 2020. This research tested Autoregressive Distributed Lag Models (ADLM), ARIMA, and Double Exponential Smoothing (DES) [117]. Data on new cases in Iran were used in [93] to predict the number of patients, with ARIMA and Artificial Neural Network (ANN) models performing the prediction [93]. Open datasets provided by Johns Hopkins and the daily reports of the Iranian Ministry of Health were used to prepare the data. The Gompertz and Logistic mathematical models and the ANN computational model were applied to model the number of COVID-19 cases in Mexico between 27th February and 8th May [118].
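The two growth curves named above can be written down compactly. The sketch below uses common textbook parameterizations of the logistic and Gompertz models; the parameter names and the example values are assumptions and are not the fitted values of [118].

```python
import math


def logistic_cases(t, k, r, t0):
    """Logistic curve: k / (1 + exp(-r (t - t0))); inflection at t0, value k/2."""
    return k / (1.0 + math.exp(-r * (t - t0)))


def gompertz_cases(t, k, b, c):
    """Gompertz curve: k * exp(-b * exp(-c t)); inflection at ln(b)/c, value k/e."""
    return k * math.exp(-b * math.exp(-c * t))


# Hypothetical parameters: final size k = 1000 cumulative cases.
for day in (0, 20, 40, 80):
    print(day,
          round(logistic_cases(day, 1000.0, 0.15, 40.0), 1),
          round(gompertz_cases(day, 1000.0, 8.0, 0.06), 1))
```

Both curves saturate at the final size k, but the Gompertz curve is asymmetric around its inflection point, which is one reason the two models can give different estimates of the peak date for the same data.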
Support-Vector Machines (SVM) are supervised ML models with associated learning algorithms that analyze data for classification and regression analysis. A technique for creating nonlinear classifiers by applying the kernel trick to maximum-margin hyperplanes was presented in 1992 [119]. The current standard incarnation, however, was proposed by Corinna Cortes and Vapnik in 1993 and published in 1995 [120]. SVM models represent examples as points in space, mapped so that the examples of separate categories are divided by a clear gap that is as wide as possible [120]. The Synthetic Minority Over-sampling Technique (SMOTE) is designed for learning from imbalanced data sets [121]. In contrast to standard boosting, in which equal weights are assigned to all misclassified examples, SMOTE Boost creates synthetic examples from the rare or minority classes, which indirectly changes the weight updates and compensates for skewed distributions [121]. In [122], researchers tracked people's transit between Wuhan and the rest of mainland China until January 2020, using detailed cell phone geolocation data to calculate total population movements. The research used the geographical flow of people to anticipate the locations, severity, and timing of subsequent outbreaks in other parts of mainland China until February 2020. The population flow data proved more effective than measures such as wealth, population size, or distance from the source of the risk. Using population flows, the research also modeled the COVID-19 epidemic curve across different locales, while deviations from the model predictions were used as tools for detecting the burden of community movements [122].
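The oversampling step that SMOTE Boost relies on can be sketched as follows. This is a minimal illustration of the SMOTE interpolation idea only: the boosting loop is omitted, and the function name, the neighbour count `k`, and the sample points are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating towards neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (e.g. rare positive samples).
minority_points = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [2.5, 2.5]]
new_points = smote(minority_points, n_synthetic=10)
print(len(new_points), "synthetic samples, first:", new_points[0])
```

Each synthetic sample lies on the line segment between an existing minority sample and one of its nearest minority neighbours, so the minority region is filled in rather than merely duplicated.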
Group Method of Data Handling (GMDH) refers to a family of algorithms for the mathematical, computer-based modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models [123]. GMDH has been used in fields such as complex systems modeling, knowledge discovery, data mining, prediction, pattern recognition, and optimization. A main characteristic of GMDH algorithms is an inductive procedure that sorts out increasingly complicated polynomial models and selects the optimal solution by means of an external criterion [123]. In [124], the classification of confirmed COVID-19 cases was used to examine a serious challenge to the sustainable development process. Accordingly, a GMDH-type ANN, as one of the AI methods, was used for binary classification modeling [124]. S. Uhlig et al. proposed an empirical top-down method to model and forecast risks and to calculate (local) outbreaks [25]. This research used neural networks to develop leading indicators from the data available in different regions. The indicators were used to estimate the risk of (new) outbreaks or to determine at an early stage whether a measure is effective, but they could also be employed in parametric models to obtain a forecast together with its associated uncertainty [25]. In [61], an AI-backed strategy was developed that combines three methods, Support Vector Machines (SVM) [119, 120], SMOTE Boost [121], and Ensembling [125], to conduct initial screening of probable COVID-19 cases. It contains an ML classifier whose input consists of existing simple blood exams, which are classified as negative (not having SARS-CoV-2) or positive (having SARS-CoV-2) samples [126].
Ensembling integrates multiple models to build a predictive model. Ensemble methods are capable of improving prediction performance [125]. Statisticians, AI specialists, and researchers from other disciplines can use the ensemble methodology. It is based on weighting several individual pattern classifiers and combining them to reach a classification that is superior to the one obtained by each classifier separately [125]. Important aspects of an ensemble are the diversity of the mechanism that generates its members and the choice of the combination procedure. Z. Allam et al. documented the role of AI in the early detection of COVID-19 as performed by two companies, BlueDot and Metabiota, showing that AI-driven algorithms had been superior in rendering precise predictions and future readings through increased data sharing [17]. The findings demonstrate that, taking the sensitive issues of privacy and security into account, there is a dire need for increased data sharing in the urban health sector [19].
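The idea of weighting several individual classifiers and combining their outputs can be illustrated with a weighted vote. The sketch below assumes that each base classifier has already produced a label and has been assigned a weight (for example from its validation accuracy); the labels and weights are hypothetical, and weighted voting is only one of several possible combination procedures.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Three hypothetical base classifiers screening one blood sample.
labels = ["positive", "negative", "negative"]
print(weighted_vote(labels, [0.6, 0.25, 0.25]))  # one strong classifier
print(weighted_vote(labels, [1.0, 1.0, 1.0]))    # plain majority vote
```

With equal weights the procedure reduces to majority voting; unequal weights let a more reliable classifier outvote several weaker ones.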
In [127], a novel forecasting model that incorporates a Chaotic Learning (CL) strategy into a multi-layer Feed-Forward Neural Network (MFNN) was suggested; it uses the data reported from 22 Jan 2020 onward to analyze and predict the confirmed cases (CS) of COVID-19 for the coming days. This forecasting model, known as ISACL-MFNN, integrates an interior search algorithm (ISA) optimized with the CL strategy into an MFNN. The ISACL incorporates the CL strategy in order to enhance the performance of ISA and to avoid being trapped in local optima. The purpose of this approach is to tune the parameters of the neural network to optimal values during training so that highly precise forecasts can be achieved [127]. In another study [128], it was suggested that situational information could help both the authorities and the public respond to the epidemic. This study therefore employed natural language processing techniques and Weibo data to categorize information related to COVID-19 into 7 types of situational information. Specific features were identified for predicting the number of reposts of each type of information [128]. Because of the limited data, the authors trained only 3 traditional NLP-based classifiers to identify the content types of situational information.
The Differential Evolution (DE) algorithm and an ANN based on the Particle Swarm Optimization (PSO) algorithm were the two AI methods used in [129] to investigate and prioritize the parameters that affect the consequences of the COVID-19 outbreak. This research focused on prioritizing and analyzing the role of certain environmental parameters. For four cities in Italy, climate parameters such as relative humidity and daily average temperature, together with urban parameters such as population density, were used as the input dataset, while the confirmed COVID-19 cases were used as the output dataset [129]. Recent research on the prediction of COVID-19 with both statistical models and AI methods is summarized in Table 3.
Table 3. Combination of statistical techniques with AI-based approaches to predict the COVID-19 spread
| Author | Technique | Country/Region | Description | Data | Results |
| --- | --- | --- | --- | --- | --- |
| R. Sujath, et al. [130] | LR, MLP, VAR | India | Forecasting | 80 instances from the Kaggle dataset for prediction | MLP model has obtained better precision compared to LR and VAR models |
| Abolfazl Mollalo, et al. [131] | MLP | USA | Nationwide modeling of COVID-19 incidence | From 22/1/2020 to 25/4/2020 | The prediction capability of the model requires a significant improvement |
| Xuanchen Yan, et al. [132] | SPSS 25.0 | China | Big Data analysis | Between January 23 and February 6, 2020 | Middle-aged people (P=0.038) have a higher probability of being infected |
| Tajebe Tsega Mengistie [133] | Fbprophet | Countries worldwide | Analysis and Prediction Modeling | Starting from April 12, 2020 | The last 10 days and graphical analysis using data mining |
| Abdallah Alsayed, et al. [134] | SEIR, ANFIS, GA | Malaysia | Prediction of Epidemic Peak | From 25 January to 05 April 2020 | NRMSE of 0.041; MAPE of 2.45%; R2 of 0.9964 |
| Yu-Feng Zhao, et al. [88] | Rolling grey Verhulst models | China | Prediction | From 21 January to 20 February 2020 | The minimum and maximum MAPEs are 1.65% and 4.72%, respectively, for the test stage |
| Ali Behnood, et al. [135] | ANFIS, VOA | USA | Determinants of the infection rate | 1657 counties | The models could forecast the effects of the variables on the infection rate |
| Mohammed A. A. Al-qaness, et al. [53] | MPA-ANFIS, ANFIS | Italy, Iran, Korea, and USA | Forecasting | From 22 January 2020 to 7 April 2020 | MPA-ANFIS has better results compared with the other models in almost all performance measures |
| Xiuyi Fan, et al. [112] | SHAP and ECPI | 18 countries and regions | Spreading Factors | From 22/01/2020 to 02/04/2020 | Warm temperature helps reduce transmission |
| Salgotra, Rohit, et al. [136] | GP, CC, DC, the GEP-based models | India | Genetic Evolutionary Programming | Since 24 March 2020 | The GEP-based models have precise results for time series prediction |
| Lifang Li, et al. [128] | SVM, NB and RF | All countries | Characterizing the Situational Information Propagation | Weibo data: from 30/12/2019 to 1/2/2020 | Indicating the necessity of information publishing strategies for situational information |
| Ramon Gomes da Silva, et al. [137] | VMD | USA and Brazil | Forecasting | Cumulative cases of COVID-19 that occurred until 28/4/2020 | VMD-based models are very strong tools for prediction |
| Abhari, Reza S., et al. [138] | EnerPol | Switzerland | Containment Strategy and Growth Prediction | Available public data, adapted to Swiss demographics | Estimating deaths, recovered and cases between 22 February and 11 April 2020 |
| Ashis Kumar Das, et al. [139] | SVM, KNN, RF, GB, LR | South Korea | Development of a prediction tool | 3,128 patients | GB algorithm has the highest precision compared to the other studied models |
| Pokkuluri Kiran Sree, et al. [140] | HNLCA | India | Cellular automata classifier for trend prediction | 6785 datasets and 23,078 datasets are used for test and train, respectively | An average accuracy of 78.8% is reported |
| Gregory Baltas, et al. [141] | SIR, DNN | Spain | Monte Carlo DNN model for spread and peak prediction | Total infected until 28th of March | The simplicity of the DNN allows the SIR parameters to be identified for different COVID-19 evolution curves |
| Li Yan, et al. [142] | XGBoost ML Method | Wuhan, China | Prognostic prediction | Data collected between 10/1/2020 and 18/2/2020 | Quick prediction of high-risk patients using the suggested decision rule |
| Furqan Rustam, et al. [104] | LR, LASSO, SVM, ES | Canada, Australia, Algeria | Future Forecasting | Dataset from 22/1/2020 to 2/3/2020 is used for training of the model | ES has the best precision, while SVM performance is not acceptable |
| Alistair Martin, et al. [107] | Symptoma | Not mentioned | Digitally screening citizens for risks | BMJ cases: 1,112 cases; test cases: 1,142 medical test cases | Symptoma can accurately distinguish COVID-19 from other diseases |
| Mohammad Pourhomayoun, et al. [143] | SVM, KNN | Countries worldwide | Predicting Mortality Risk | 117,000 patients world-wide | Obtained 93% precision in forecasting the mortality rate |
| Behrouz Pirouz, et al. [124] | GMDH | China, Japan, South Korea, Italy | Confirmed cases analysis using binary classification | The environmental and urban parameters from January 2020 to February 2020 (1 month) | The most effective parameters on the confirmed cases are maximum daily temperature and relative humidity |
| Sina F. Ardabili, et al. [144] | MLP, ANFIS, GA, PSO and GWO | Iran, Germany, USA, Italy, and China | Outbreak Prediction | Data were collected for five countries on total cases in 1 month | ANFIS and MLP reported a high generalization ability for long-term forecasting |
| Majid Niazkar, et al. [145] | MGGP | China, South Korea, Iran, USA, Japan, and Italy | Country-based Prediction Models | The confirmed cases from 20 January to 5 April 2020 | Each infected country has a different trend |
| Rizk-Allah, et al. [127] | MFNN (GA, PSO, GWO, ISA, ISACL) | USA, Italy, and Spain | Forecasting the confirmed cases of three countries | The data referring to the period 22/1/2020 to 3/4/2020 | The presented ISACL-MFNN model has promising forecasting results for 4/4/2020 to 15/4/2020 |
| Hasinur Rahaman Khan, et al. [146] | ML Techniques | 133 countries | Demonstrating ML basics for analyzing global COVID-19 data | The data include 10 variables until 17th April, 2020 | The countries that play an important role in explaining 60% of the total variation include USA, Iran, UK, Germany, Spain, France, and Italy |
| K.M.U.B. Konarasinghe [117] | ARIMA, LBQ, DES and ADLM | USA, UK, and Russia | Modeling COVID-19 Epidemic | The data of 22nd January 2020 to 28th May 2020 | The ARIMA did not satisfy the model validation but the ADLM and DES did |
| Jayson S. Jia [122] | Statistical methods using mobile phone data | China | Spatio-temporal distribution | About 10 million counts of mobile phone data between 1/1/2020 and 24/1/2020 to 296 prefectures | Developing a spatio-temporal ‘risk source’ model |
| Gergo Pinter, et al. [116] | ANFIS, MLP-ICA | Hungary | Pandemic Prediction; A Hybrid ML Approach | The data from 24 March to 19 April | Prediction results from April 20 to July 30 |
| O. Torrealba-Rodriguez, et al. [118] | Gompertz, Logistic and ANN models | Mexico | Modeling and prediction | The data of 27 February to 8 May | R2 of 0.9998, 0.9996 and 0.9999; prediction of daily cases on 8 May, 25 June and 12 May |
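Several results in Table 3 are reported as MAPE, NRMSE, and R2 values. A minimal sketch of how such metrics are commonly computed is given below. The definitions are the usual textbook ones; in particular, normalizing the RMSE by the range of the observed values is an assumption, since individual studies may normalize by the mean instead, and the example numbers are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def nrmse(actual, predicted):
    """RMSE divided by the range of the observed values (assumed convention)."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    rmse = math.sqrt(squared / len(actual))
    return rmse / (max(actual) - min(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted cumulative case counts.
observed = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 300.0]
print(mape(observed, forecast), nrmse(observed, forecast), r_squared(observed, forecast))
```

Lower MAPE and NRMSE and an R2 closer to 1 indicate a closer fit, but the values are only comparable across studies when the same normalization and the same evaluation period are used.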
In [137], Bayesian regression neural network, quantile random forest, cubist regression, support vector regression, and k-nearest neighbors models were employed both on their own and together with variational mode decomposition (VMD), a recent pre-processing technique that breaks a time series into several intrinsic mode functions. Furthermore, to assess coronavirus transmission, the 8.57 million inhabitants of Switzerland, along with cross-border commuters and the simulated Swiss public and private transport network, were studied in [138]. Individual contacts and transmission pathways were established by simulating day-to-day activities calibrated with micro-census data. Publicly available statistical data, adapted to Swiss demographics, were used as the basis of the COVID-19 epidemiology [138].
In [97], AI-inspired methods were developed to model the transmission dynamics of the epidemic and to evaluate interventions for curbing the spread and impact of COVID-19. These methods used WHO data from March 16th, 2020 onward and processed both the new COVID-19 cases and the cumulative cases reported by this organization. The timing and degree of intervention were evaluated, and the average error of five-step-ahead prediction was 2.5%. With complete intervention starting 4 weeks after the initial date (March 16th, 2020), the global maximum number of cumulative cases, the number of new cases, and the total peak number of cumulative cases amounted to 255,392,154, 10,086,085, and 75,249,909, respectively [97]. In [139], five ML algorithms (SVM, gradient boosting, logistic regression, random forest, and K nearest neighbor) were used to predict mortality among patients with confirmed COVID-19 between 20/01/2020 and 07/04/2020 (n=3,022). After the performance of the algorithms was compared, the most suitable algorithm was deployed as an online prediction tool.
In addition, Ref. [147] suggested a preliminary classifier based on non-linear hybrid cellular automata, which was trained and tested to forecast the effects of COVID-19 in terms of the number of deaths, the number of infected individuals, the number of recovered individuals, etc. The datasets for this study were taken from Kaggle and other standard websites, and the classifier could predict the epidemic trend in India.
In [141], a DNN-based AI approach predicted the peak of the coronavirus in Spain. The method comprised data generation based on Monte Carlo simulations of SIR epidemiological models and the development of a DNN prediction model. The simplicity of the approach facilitated the identification of SIR parameters for various COVID-19 evolution curves, which could help researchers identify the curves associated with COVID-stricken populations of various sizes. Although the study is not definitive and further research is still needed, it obtained the SIR model parameters correctly and produced a population-dependent model.
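The data generation step described above can be sketched as follows: SIR parameters are drawn at random, each draw is simulated, and the resulting curve is stored together with its parameters as one training pair. This is a minimal illustration under stated assumptions, not the procedure of [141]; the forward-Euler integration, the sampling ranges for the transmission rate `beta` and recovery rate `gamma`, and the function names are hypothetical, and the DNN itself is omitted.

```python
import random


def simulate_sir(beta, gamma, population, initial_infected=1.0,
                 days=120, dt=0.1):
    """Forward-Euler integration of the SIR model; returns daily infected counts."""
    susceptible = population - initial_infected
    infected = initial_infected
    steps_per_day = round(1 / dt)
    curve = []
    for _ in range(days):
        curve.append(infected)
        for _ in range(steps_per_day):
            new_infections = beta * susceptible * infected / population * dt
            new_recoveries = gamma * infected * dt
            susceptible -= new_infections
            infected += new_infections - new_recoveries
    return curve


def monte_carlo_dataset(n_samples, population, seed=0):
    """Sample (beta, gamma) at random and pair each simulated curve with them."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        beta = rng.uniform(0.1, 0.6)     # assumed sampling range
        gamma = rng.uniform(0.05, 0.3)   # assumed sampling range
        curve = simulate_sir(beta, gamma, population)
        dataset.append((curve, (beta, gamma)))  # network input, network target
    return dataset


example_curve = simulate_sir(0.5, 0.1, 1000.0)
training_pairs = monte_carlo_dataset(5, 1000.0)
print("peak infected:", max(example_curve), "pairs:", len(training_pairs))
```

A network trained on such pairs learns the inverse mapping from an observed infection curve to the SIR parameters, which is why a broad coverage of parameter values and population sizes in the simulated data matters.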
Ref. [148] also proposed a model to predict the spread of COVID-19. To predict the epidemiological pattern of COVID-19 cases in India, this study applied MLP, vector autoregression, and linear regression methods to the COVID-19 Kaggle data. Statistical analysis demonstrated a correlation between the number of swab tests and the mild cases admitted to hospital, daily positive cases, recoveries, intensive care cases, and the death rate, which provided the foundation for an AI study [114]. A multivariate linear regression (MLR) method was used to validate the results. Also, Ref. [143] used the data of 117,000 patients with laboratory-confirmed COVID-19 to present an AI model that hospitals and medical facilities could use to determine which patients have a higher priority for hospitalization when the system is at risk of being overwhelmed by incoming patients, and thus to significantly reduce delays in the provision of care. In addition, [149] described the approach to forecasting COVID-19 along with the efforts of the Public Health Agency of Canada to model the effects of Non-Pharmaceutical Interventions (NPIs) on COVID-19 transmission in the Canadian population in support of public health decisions. Additionally, [150] examined the joint effort of health care organizations, government agencies, and industry partners from around the globe to address the challenges of the pandemic during social distancing. The challenges investigated included conducting treatment research, enabling virtual health care, and scaling high-quality laboratory tests during social distancing [151].