Figure 2 presents a scatter plot matrix of the independent and dependent variables, used to examine the linear relationships among them. As Figure 2 shows, there is no linear relationship between the frequency of positive cases in the 31 provinces and the independent variables. Note, however, that a scatter plot only shows the overall pattern of the relationships in the data and does not reveal them in full detail (21). Therefore, to examine the exact relationship between environmental factors and the frequency of positive cases, the graphs for each province should be drawn separately. In this study, Khuzestan province was selected as a sample to study the trends in the frequency of positive cases and environmental factors. In addition to the scatter plot matrix, obtaining a suitable neural network model requires examining the relationships among the independent (input) variables, so the Pearson correlation coefficient was used for this purpose. The Pearson correlation coefficient, also called the product-moment correlation coefficient or zero-order correlation coefficient, measures the strength and direction of the linear relationship between two interval or ratio variables, or between an interval variable and a ratio variable (22). It was calculated with the following equation:
$$ r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} $$
where X and Y are the values of the two variables, $\bar{X}$ and $\bar{Y}$ are their respective means, and n is the number of observations. The denominator is the square root of the product of the sums of squared deviations of the two variables. The closer r is to +1, the stronger the direct (positive) linear relationship between the two variables. Values close to -1 indicate a strong inverse relationship, and values close to 0 indicate little or no linear relationship. Accordingly, the correlation coefficients in Table 2 account for the lack of relationship between the independent and dependent variables (Figure 1). The correlation coefficients between the frequency of positive cases and minimum temperature, average temperature, maximum temperature and RH are -0.021, -0.133, -0.091 and 0.037, respectively. All are close to zero, indicating a weak inverse relationship or no relationship. However, the correlation coefficients among the independent variables indicate a strong relationship between maximum temperature and average temperature. The coefficient is 0.817, which shows a very high dependence between these two variables. Since a correlation coefficient above 0.8 indicates a strong correlation between variables, the input variables were selected correctly in this study (23).
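The Pearson formula and the 0.8 threshold can be illustrated with a short Python sketch. It computes r for two series term by term and lists every pair of input variables whose |r| exceeds 0.8. Using the absolute value is an assumption, so that strong inverse pairs are also flagged. The function names and the toy series are hypothetical. They are not the study's code or data, and the study's actual coefficients are those reported in Table 2.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the two means
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    den = math.sqrt(sum((xi - mean_x) ** 2 for xi in x)
                    * sum((yi - mean_y) ** 2 for yi in y))
    if den == 0:
        raise ValueError("r is undefined when a series is constant")
    return num / den


def strong_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every distinct pair with |r| > threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy series, NOT the study's data
    inputs = {
        "t_min": [12.0, 14.5, 13.0, 16.0, 15.5, 17.0],
        "t_avg": [18.0, 20.5, 19.5, 22.0, 21.5, 23.5],
        "t_max": [24.0, 26.5, 26.0, 28.0, 27.5, 30.0],
        "rh": [45.0, 38.0, 50.0, 33.0, 41.0, 30.0],
    }
    for a, b, r in strong_pairs(inputs):
        print(f"{a} vs {b}: r = {r:.3f}")
```

Passing the dependent variable and each input to `pearson_r` gives the case-versus-factor coefficients. `strong_pairs` restricted to the inputs flags pairs such as maximum and average temperature.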
Artificial neural network model
The MLP neural network was implemented in MATLAB using the topologies described in the Methods section. The average accuracy over repeated runs in the training and testing stages is listed in Table 3. The maximum number of hidden layers and the maximum number of neurons in each hidden layer were determined by trial and error. The best average accuracy in the training stage was 87.25%, obtained by model 19. The best average accuracy in the testing stage was 86.4%, shared by models 10 and 15.
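The trial-and-error selection of topology, with accuracy averaged over repeated runs, can be sketched as a loop over candidate hidden-layer configurations. This is a hypothetical Python outline, not the authors' MATLAB code. `train_and_score` is an assumed placeholder for training and evaluating one MLP, and the sequential model numbers it produces will not match Table 3. `best_models` returns every model that shares the best average. This shows how a tie, such as models 10 and 15 in the testing stage, can arise.

```python
import math
from itertools import product
from statistics import mean


def evaluate_topologies(train_and_score, max_layers, max_neurons, runs):
    """Average train/test accuracy of every candidate MLP topology over repeated runs.

    train_and_score(hidden, run) is a caller-supplied (hypothetical) function that
    trains one MLP with hidden-layer sizes `hidden` and returns
    (train_accuracy, test_accuracy) for that run.
    """
    results = []
    for n_layers in range(1, max_layers + 1):
        # Every combination of 1..max_neurons neurons across n_layers hidden layers
        for hidden in product(range(1, max_neurons + 1), repeat=n_layers):
            scores = [train_and_score(hidden, run) for run in range(runs)]
            results.append({
                "model": len(results) + 1,  # sequential model number
                "hidden": hidden,
                "train": mean(s[0] for s in scores),
                "test": mean(s[1] for s in scores),
            })
    return results


def best_models(results, stage):
    """Model numbers that share the best average accuracy for 'train' or 'test'."""
    best = max(r[stage] for r in results)
    return [r["model"] for r in results if math.isclose(r[stage], best)]


if __name__ == "__main__":
    def stub_score(hidden, run):
        # Deterministic stand-in for a real training run (hypothetical values)
        return 0.80 + 0.01 * len(hidden), 0.78 + 0.01 * min(hidden)

    table = evaluate_topologies(stub_score, max_layers=2, max_neurons=3, runs=5)
    print("best train:", best_models(table, "train"))
    print("best test:", best_models(table, "test"))
```

The exhaustive enumeration grows quickly with `max_layers` and `max_neurons`, which is one reason such limits are usually kept small and chosen by trial and error.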
Multiple linear regression analysis
Multiple linear regression analysis was used to investigate the effect of environmental factors on the trend in the frequency of positive cases. For this purpose, environmental data for two cities were analyzed and compared. The first is Qom, the capital of the province with the first reported COVID-19 cases, which has a cold climate. The second is Ahvaz, the capital of a province in southwestern Iran, which has a tropical climate. The resulting regression models are given in Equation 5 (Qom) and Equation 6 (Ahvaz).
In these equations, T denotes temperature. The coefficients of determination (R²) of the first and second equations are 0.40 and 0.68, respectively. This indicates that, despite the acceptable R², the frequency of positive cases in Ahvaz increased. The predictor variables also played a strong role in predicting confirmed COVID-19 cases. Collinearity diagnostics for the second equation reveal multicollinearity, that is, dependence and overlap among some predictor variables. Using the regression model to explain changes in the trend of positive cases in Ahvaz may therefore be somewhat misleading. In contrast, the output of the first equation (Qom) shows no multicollinearity despite its lower R². In this equation, where the frequency of positive cases decreased relatively, average temperature, RH and maximum temperature have the largest share in changes in the frequency of positive cases, and minimum temperature has the smallest share.
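The regression and collinearity checks above can be sketched in plain Python. The sketch fits an ordinary least-squares model with an intercept, reports R², and computes variance inflation factors (VIF), which are one common collinearity diagnostic. The source does not state which diagnostic was used, so VIF is an assumption here. All function names and the sample rows are also hypothetical. The normal-equation solver is adequate for a handful of predictors but is less numerically stable than the QR-based routines in statistical packages.

```python
import math


def solve(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares; return (coefficients, R²).

    X is a list of rows, one per observation, each holding k predictor values.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations: (AᵀA) b = Aᵀy
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(ata, aty)
    y_hat = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant response")
    return coef, 1.0 - ss_res / ss_tot


def vif(X):
    """VIF of each predictor: 1 / (1 - R²_j), where R²_j comes from regressing
    predictor j on all the other predictors."""
    out = []
    for j in range(len(X[0])):
        others = [[v for i, v in enumerate(r) if i != j] for r in X]
        target = [r[j] for r in X]
        _, r2 = fit_ols(others, target)
        out.append(math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return out


if __name__ == "__main__":
    # Hypothetical daily rows (t_min, t_avg, t_max, rh) and positive cases, NOT study data
    X = [[10, 16, 22, 60], [12, 17, 23, 55], [11, 18, 25, 58], [14, 20, 26, 50],
         [13, 21, 29, 52], [15, 22, 28, 47], [16, 24, 31, 45], [17, 25, 32, 40]]
    y = [5, 7, 6, 9, 8, 11, 10, 13]
    coef, r2 = fit_ols(X, y)
    print("coefficients:", [round(c, 3) for c in coef], "R²:", round(r2, 3))
    print("VIF:", [round(v, 1) for v in vif(X)])
```

A VIF well above the often-cited rule-of-thumb thresholds of 5 or 10 flags the kind of predictor overlap reported for the Ahvaz model. Strongly correlated inputs, such as maximum and average temperature, tend to inflate the VIF of both.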
Correlation analysis charts
Figure 3 plots the frequency of positive cases against environmental factors for Ahvaz, the capital of Khuzestan province. From March 4 to March 13, 2020, coinciding with the initial outbreak of the disease, the number of positive cases increased despite rising temperature and RH. A growing trend in the frequency of positive cases was also reported from April 1 to 22, 2020.