Tree-based machine learning approaches have shown high accuracy on both small and large datasets in previous studies [36–37]. In the analysis of disease data, the GBM has been used to predict the association of miRNAs [36]. The improved performance of the GBM in predictive modeling of the pandemic has also been discussed [36]. For these reasons, the GBM model was selected to predict COVID-19 cases in India from atmospheric factors and pollution levels. Because of India's large geographical area, atmospheric factors vary widely across its states (Fig. 1 and Table 1). Pollution levels also vary between states, as is evident from the variation in minimum and maximum PM10 and PM2.5 (Fig. 2 and Table 1). The basic statistics in Table 1 and Fig. 3 demonstrate the variation in COVID-19 cases across the states of India. Together, the basic statistics on atmospheric factors, pollution measures, and COVID-19 cases suggest that all three are unequally distributed.
On the combined dataset of the significant states of India, the training performance of the twinned GBM for infected cases was R² = 0.99 and RMSE = 834.90 with the Poisson distribution, R² = 0.97 and RMSE = 1527.28 with the Gaussian distribution, R² = 0.96 and RMSE = 1214.40 with the Tweedie distribution, and R² = 0.85 and RMSE = 1239.84 with the Gamma distribution. For recovered cases, the training performance was R² = 0.99 and RMSE = 712.99 with the Poisson distribution, R² = 0.98 and RMSE = 1244.66 with the Gaussian distribution, R² = 0.97 and RMSE = 1052.15 with the Tweedie distribution, and R² = 0.81 and RMSE = 3272.82 with the Gamma distribution. For mortality cases, the training performance was R² = 0.99 and RMSE = 8.49 with the Poisson distribution, R² = 0.97 and RMSE = 14.64 with the Gaussian distribution, R² = 0.98 and RMSE = 11.55 with the Tweedie distribution, and R² = 0.85 and RMSE = 38.20 with the Gamma distribution. The complete performance results for infected, recovered, and mortality cases are presented in Table 3 and in Fig. 4, Fig. 5, and Fig. 6, respectively. The twinned GBM performs well with all four selected distributions (Poisson, Gaussian, Tweedie, and Gamma), which suggests a close correlation among the atmospheric factors, air pollutants, and COVID-19 parameters and supports proceeding to further analysis.
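The two metrics reported throughout this section can be illustrated with a minimal sketch. It assumes the conventional definitions (RMSE as the square root of the mean squared residual, R² as one minus the ratio of residual to total sum of squares); the study does not state which exact formulas or software it used, and the case counts below are hypothetical values chosen only for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical daily case counts (not taken from the study's dataset).
observed_cases = [100, 150, 200, 250, 300]
predicted_cases = [110, 140, 210, 240, 310]

print(f"RMSE = {rmse(observed_cases, predicted_cases):.2f}")
print(f"R2   = {r_squared(observed_cases, predicted_cases):.2f}")
```

Because RMSE is expressed in the units of the target, values for infected or recovered counts are not directly comparable with values for mortality counts, whereas R² is scale-free.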
The trained model was then applied to the datasets of the seven most affected states of India for testing, in order to examine the correlation in more depth. First, Maharashtra, one of the worst-affected states, was tested. For infected cases, the results show a convincing correlation: R² = 0.90 and RMSE = 5161.50 with the Poisson distribution, R² = 0.90 and RMSE = 5235.36 with the Gaussian distribution, R² = 0.88 and RMSE = 5692.20 with the Tweedie distribution, and R² = 0.78 and RMSE = 7840.69 with the Gamma distribution. For recovered cases, the results also support the hypothesis, with R² = 0.87 and RMSE = 5935.77 with the Poisson distribution, R² = 0.89 and RMSE = 5432.84 with the Gaussian distribution, R² = 0.85 and RMSE = 6362.20 with the Tweedie distribution, and R² = 0.71 and RMSE = 8767.26 with the Gamma distribution. For mortality cases, the results are in line with the same hypothesis: R² = 0.84 and RMSE = 86.49 with the Poisson distribution, R² = 0.88 and RMSE = 75.96 with the Gaussian distribution, R² = 0.83 and RMSE = 90.38 with the Tweedie distribution, and R² = 0.65 and RMSE = 130.92 with the Gamma distribution. The complete performance results for Maharashtra are shown in Table 4 and Figs. 7–9.
Second, the model was tested on the heavily affected state of Delhi. For infected cases, the results are R² = 0.75 and RMSE = 2664.40 with the Poisson distribution, R² = 0.78 and RMSE = 2724.86 with the Gaussian distribution, R² = 0.74 and RMSE = 2501.72 with the Tweedie distribution, and R² = 0.69 and RMSE = 2957.83 with the Gamma distribution. For recovered cases, the results also support the hypothesis, with R² = 0.88 and RMSE = 5935.77 with the Poisson distribution, R² = 0.81 and RMSE = 5432.84 with the Gaussian distribution, R² = 0.85 and RMSE = 6362.20 with the Tweedie distribution, and R² = 0.67 and RMSE = 8767.26 with the Gamma distribution. For mortality cases, the results are in line with the same hypothesis: R² = 0.73 and RMSE = 36.07 with the Poisson distribution, R² = 0.72 and RMSE = 37.08 with the Gaussian distribution, R² = 0.69 and RMSE = 38.72 with the Tweedie distribution, and R² = 0.51 and RMSE = 49.22 with the Gamma distribution. The complete performance results for Delhi are shown in Table 5 and Figs. 10–12.
Third, the trained model was applied to the testing dataset of Karnataka. For infected cases, the results are R² = 0.79 and RMSE = 4456.73 with the Poisson distribution, R² = 0.84 and RMSE = 3786.50 with the Gaussian distribution, R² = 0.74 and RMSE = 4945.09 with the Tweedie distribution, and R² = 0.54 and RMSE = 6606.13 with the Gamma distribution. For recovered cases, the results also support the hypothesis, with R² = 0.55 and RMSE = 4969.39 with the Poisson distribution, R² = 0.63 and RMSE = 4463.39 with the Gaussian distribution, R² = 0.50 and RMSE = 5250 with the Tweedie distribution, and R² = 0.31 and RMSE = 6143.52 with the Gamma distribution. For mortality cases, the results are in line with the same hypothesis: R² = 0.64 and RMSE = 56.93 with the Poisson distribution, R² = 0.71 and RMSE = 51.71 with the Gaussian distribution, R² = 0.60 and RMSE = 60.03 with the Tweedie distribution, and R² = 0.39 and RMSE = 74.71 with the Gamma distribution. The complete performance results for Karnataka are shown in Table 6 and Figs. 13–15.
Fourth, the trained model was applied to the testing dataset of Kerala. For infected cases, the results are R² = 0.76 and RMSE = 3982.07 with the Poisson distribution, R² = 0.76 and RMSE = 4027.15 with the Gaussian distribution, R² = 0.74 and RMSE = 4159.93 with the Tweedie distribution, and R² = 0.59 and RMSE = 5251.54 with the Gamma distribution. For recovered cases, the results also support the hypothesis, with R² = 0.47 and RMSE = 5895.52 with the Poisson distribution, R² = 0.56 and RMSE = 5319.29 with the Gaussian distribution, R² = 0.43 and RMSE = 6212.97 with the Tweedie distribution, and R² = 0.19 and RMSE = 6835.58 with the Gamma distribution. For mortality cases, the results are in line with the same hypothesis: R² = 0.59 and RMSE = 11.37 with the Poisson distribution, R² = 0.37 and RMSE = 14.09 with the Gaussian distribution, R² = 0.58 and RMSE = 11.42 with the Tweedie distribution, and R² = 0.46 and RMSE = 13.08 with the Gamma distribution. The complete performance results for Kerala are shown in Table 7 and Figs. 16–18.
Fifth, the trained model was applied to the testing dataset of Madhya Pradesh. For infected cases, the results are R² = 0.87 and RMSE = 1048.39 with the Poisson distribution, R² = 0.80 and RMSE = 1317.66 with the Gaussian distribution, R² = 0.86 and RMSE = 1109.07 with the Tweedie distribution, and R² = 0.59 and RMSE = 1481.84 with the Gamma distribution. For recovered cases, the results also support the hypothesis, with R² = 0.88 and RMSE = 5895.52 with the Poisson distribution, R² = 0.81 and RMSE = 5319.29 with the Gaussian distribution, R² = 0.85 and RMSE = 6212.97 with the Tweedie distribution, and R² = 0.67 and RMSE = 6835.58 with the Gamma distribution. For mortality cases, the results are in line with the same hypothesis: R² = 0.84 and RMSE = 8.80 with the Poisson distribution, R² = 0.74 and RMSE = 11.29 with the Gaussian distribution, R² = 0.84 and RMSE = 8.70 with the Tweedie distribution, and R² = 0.65 and RMSE = 13.10 with the Gamma distribution. The complete performance results for Madhya Pradesh are shown in Table 8 and Figs. 19–21.
Sixth, the trained model was applied to the testing dataset of Uttar Pradesh. For infected cases, the results are R² = 0.79 and RMSE = 3552.24 with the Poisson distribution, R² = 0.80 and RMSE = 3225.86 with the Gaussian distribution, R² = 0.75 and RMSE = 3616.14 with the Tweedie distribution, and R² = 0.67 and RMSE = 4131.75 with the Gamma distribution. For recovered cases, the results also support the hypothesis, with R² = 0.88 and RMSE = 967.78 with the Poisson distribution, R² = 0.81 and RMSE = 1208.69 with the Gaussian distribution, R² = 0.85 and RMSE = 1068.95 with the Tweedie distribution, and R² = 0.67 and RMSE = 1588.11 with the Gamma distribution. For mortality cases, the results are in line with the same hypothesis: R² = 0.73 and RMSE = 36.07 with the Poisson distribution, R² = 0.72 and RMSE = 37.08 with the Gaussian distribution, R² = 0.69 and RMSE = 38.72 with the Tweedie distribution, and R² = 0.51 and RMSE = 49.22 with the Gamma distribution. The complete performance results for Uttar Pradesh are shown in Table 9 and Figs. 22–24.
Seventh, the trained model was applied to the testing dataset of West Bengal. For infected cases, the results are R² = 0.79 and RMSE = 2012.91 with the Poisson distribution, R² = 0.78 and RMSE = 2066.74 with the Gaussian distribution, R² = 0.64 and RMSE = 2640.02 with the Tweedie distribution, and R² = 0.68 and RMSE = 247127 with the Gamma distribution. For recovered cases, the results also support the hypothesis, with R² = 0.80 and RMSE = 1723.12 with the Poisson distribution, R² = 0.72 and RMSE = 2053.87 with the Gaussian distribution, R² = 0.78 and RMSE = 1825.22 with the Tweedie distribution, and R² = 0.60 and RMSE = 2475.82 with the Gamma distribution. For mortality cases, the results are in line with the same hypothesis: R² = 0.78 and RMSE = 14.81 with the Poisson distribution, R² = 0.63 and RMSE = 19.17 with the Gaussian distribution, R² = 0.75 and RMSE = 15.64 with the Tweedie distribution, and R² = 0.57 and RMSE = 20.60 with the Gamma distribution. The complete performance results for West Bengal are shown in Table 10 and Figs. 25–27.
The performance parameters discussed above, together with the remaining parameters presented in Tables 4–10 and Figs. 7–27, suggest that Maharashtra had an ideal atmosphere for infection, recovery, and mortality, with R² = 0.99 in all three with the Poisson distribution. For Delhi, the tested model performs less well on the infection and recovery rates but supports the mortality rate; the maximum performance was given by the Gaussian distribution, with R² = 0.78 for the infection rate, R² = 0.78 for the recovery rate, and R² = 0.83 for the mortality rate. For Karnataka, the Gaussian distribution gave R² = 0.84 for the infection rate, R² = 0.63 for the recovery rate, and R² = 0.71 for the mortality rate. For Kerala, the infection rate (R² = 0.71) and recovery rate (R² = 0.56) given by the Gaussian distribution and the mortality rate (R² = 0.59) given by the Poisson distribution do not support the hypothesis, possibly because the correct atmospheric or pollution data were unavailable or missing. For Madhya Pradesh, the maximum values for the infection, recovery, and mortality rates were R² = 0.87, R² = 0.88, and R² = 0.84, respectively, with the Poisson distribution. For Uttar Pradesh, the maximum values were R² = 0.80, R² = 0.88, and R² = 0.73, respectively, with the Poisson distribution. For West Bengal, the maximum values were R² = 0.79, R² = 0.80, and R² = 0.78, respectively, with the Poisson distribution.
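As an illustration of how the best-performing distribution per outcome can be read off the test metrics, the following minimal sketch uses the West Bengal R² values reported above. The dictionary layout and the helper name `best_distribution` are assumptions made for this example and are not part of the study's pipeline.

```python
# R2 values for West Bengal as reported in the text, keyed by outcome
# and then by GBM distribution.
west_bengal_r2 = {
    "infected": {"Poisson": 0.79, "Gaussian": 0.78, "Tweedie": 0.64, "Gamma": 0.68},
    "recovered": {"Poisson": 0.80, "Gaussian": 0.72, "Tweedie": 0.78, "Gamma": 0.60},
    "mortality": {"Poisson": 0.78, "Gaussian": 0.63, "Tweedie": 0.75, "Gamma": 0.57},
}


def best_distribution(r2_by_distribution):
    """Return (name, R2) of the distribution with the highest R2.

    Hypothetical helper: ties go to the distribution listed first.
    """
    name = max(r2_by_distribution, key=r2_by_distribution.get)
    return name, r2_by_distribution[name]


best = {outcome: best_distribution(scores)
        for outcome, scores in west_bengal_r2.items()}

for outcome, (name, r2) in best.items():
    print(f"{outcome}: {name} (R2 = {r2:.2f})")
```

Selecting by R² alone is a simplification; a distribution with the highest R² does not necessarily have the lowest RMSE, so both metrics should be checked when they disagree.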
Based on the testing performance, the states rank as follows for each COVID-19 parameter:
Infection Rate: Maharashtra > Madhya Pradesh > Uttar Pradesh > West Bengal > Karnataka > Delhi > Kerala
Recovery Rate: Maharashtra > Madhya Pradesh > Uttar Pradesh > West Bengal > Karnataka > Kerala
Mortality Rate: Maharashtra > Madhya Pradesh > Delhi > West Bengal > Uttar Pradesh > Delhi > Karnataka > Kerala
The effect of weather parameters such as temperature and humidity on COVID-19 cases has been reported in several recently published studies, for example a high spread rate at low temperature and humidity in Iran [11], a low spread rate at high humidity and temperature in China [16], and a low spread rate at high average humidity and temperature [15]. The impact of additional atmospheric factors such as air pressure and wind speed has not been properly examined in recent studies. A positive correlation between air pollution and COVID-19 cases has been established in some studies, such as those relating air pollution to the spread rate in Italy and China [20–22]. Moreover, atmospheric factors and air pollution levels are themselves related; therefore, the present study explored their combined effect on the rate of spread of COVID-19 in the major states/places of India using the twinned GBM model. States with lower mean temperature, humidity, and air pollution, such as Uttarakhand, Arunachal Pradesh, Himachal Pradesh, Sikkim, and Mizoram, were observed to have fewer infected and mortality cases and more recovered cases than states/places with high mean temperature, humidity, and air pollution, such as Maharashtra, Delhi, Karnataka, Kerala, and Madhya Pradesh. However, in some states it is still difficult to understand the correlation between the spread rate of COVID-19, atmospheric factors, and air pollution measures. The collected data and the outcomes of the different GBM distributions suggest a significant correlation between the spread rate of COVID-19, atmospheric factors, and air pollution measures in most of the states of India. In addition, the high population density of some states, people's behavior with respect to government regulations, the movement of migrant workers, and social gatherings during the lockdown period are also factors responsible for the spread of COVID-19.
Maharashtra, Delhi, Kerala, Karnataka, Madhya Pradesh, Uttar Pradesh, and West Bengal are more severely affected than the other states of India. The numbers of infected cases predicted for Maharashtra, Madhya Pradesh, and Uttar Pradesh by the different GBM distributions match their actual values on most days (Figs. 19–24). Therefore, Maharashtra was the ideal place for the spread and mortality. Missing information on atmospheric factors, air pollution measures, and COVID-19 cases during the data collection period may be one of the reasons for the average and poor forecast metrics of the different GBM distributions for some states.