Table 2 summarizes the statistical analysis of the data. Anion and cation concentrations are given in milligrams per liter; pH is dimensionless.
Table 2
Summary of the Karoon River water quality data at Ahvaz Hydrometer Station
Parameter | Max | Std | Avg | Min |
pH | 8.90 | 0.25 | 7.97 | 7.10 |
Na+ | 506.91 | 69.58 | 143.08 | 31.10 |
K+ | 7.20 | 1.17 | 1.96 | 0.00 |
Ca2+ | 229.45 | 24.84 | 81.56 | 29.45 |
Mg2+ | 66.78 | 10.58 | 27.25 | 6.69 |
TH | 625.50 | 84.5 | 315.46 | 155 |
TDS | 2000 | 283.95 | 801.92 | 278 |
EC | 3190 | 434.36 | 1263.93 | 537 |
CO32− | 31.51 | 2.10 | 0.30 | 0.00 |
HCO3− | 246.87 | 26.84 | 170.83 | 36.25 |
SO42− | 562 | 88 | 179 | 45 |
Cl− | 789.28 | 106.36 | 219.43 | 51.78 |
SAR | 7.90 | 1.25 | 3.67 | 1.12 |
Table 2 presents the maximum, standard deviation, mean, and minimum of each water quality parameter. Comparing these values with the quality standards allows the quality grade of the Karoon River water to be determined. Based on the sums of the cations and anions, the ion balance error was 0.0037; therefore, all data were used for the chemical quality analysis. A change in TDS causes a change in TH and SAR, and the TDS time series of the Karoon River water shows an increasing trend, which in turn increases TH and SAR. This result is consistent with the other methods and supports their accuracy. Figures 2 and 3 present the trend analysis for TH (mg/l) and SAR. In the figures, the x-axis is dimensionless.
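As an illustration of the quantities discussed above, the following minimal Python sketch computes an ion balance error, SAR, and TH from concentrations in mg/l, using the Table 2 means as input. The balance form (cations − anions)/(cations + anions), the equivalent weights, and the hardness factors 2.497 and 4.118 are standard textbook values assumed here; the text does not state which expressions were actually used.

```python
import math

# Equivalent weights (g/eq) derived from standard atomic masses.
# These constants are assumptions of this sketch, not values from the text.
EQ_WEIGHT = {
    "Na": 22.99, "K": 39.10, "Ca": 20.04, "Mg": 12.15,
    "Cl": 35.45, "SO4": 48.03, "HCO3": 61.02, "CO3": 30.00,
}
CATIONS = ("Na", "K", "Ca", "Mg")
ANIONS = ("Cl", "SO4", "HCO3", "CO3")


def to_meq(mg_per_l):
    """Convert a dict of ion concentrations from mg/l to meq/l."""
    return {ion: conc / EQ_WEIGHT[ion] for ion, conc in mg_per_l.items()}


def ion_balance_error(mg_per_l):
    """Assumed form: (sum of cations - sum of anions) / (sum of both), in meq/l."""
    meq = to_meq(mg_per_l)
    cations = sum(meq[i] for i in CATIONS)
    anions = sum(meq[i] for i in ANIONS)
    return (cations - anions) / (cations + anions)


def sar(mg_per_l):
    """Sodium adsorption ratio: Na / sqrt((Ca + Mg) / 2), all in meq/l."""
    meq = to_meq(mg_per_l)
    return meq["Na"] / math.sqrt((meq["Ca"] + meq["Mg"]) / 2.0)


def total_hardness(mg_per_l):
    """Total hardness as mg/l CaCO3 from Ca and Mg in mg/l."""
    return 2.497 * mg_per_l["Ca"] + 4.118 * mg_per_l["Mg"]


# Mean concentrations from Table 2 (mg/l).
mean_sample = {
    "Na": 143.08, "K": 1.96, "Ca": 81.56, "Mg": 27.25,
    "Cl": 219.43, "SO4": 179.0, "HCO3": 170.83, "CO3": 0.30,
}

print("ion balance error:", round(ion_balance_error(mean_sample), 4))
print("SAR of mean composition:", round(sar(mean_sample), 2))
print("TH of mean composition (mg/l):", round(total_hardness(mean_sample), 1))
```

Because SAR and the balance error are nonlinear in the concentrations, values computed from the mean composition need not coincide with the means of the per-sample values reported in Table 2.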
The equation of the TH time series over the statistical period is given by Eq. (5):
Yt = 261.1 + 0.829 t (5)
The equation of the SAR time series over the statistical period is given by Eq. (6):
Yt = 3.052 + 0.00262 t (6)
where t is the time.
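Trend lines of the form of Eqs. (5) and (6) can be obtained by ordinary least squares. The sketch below is illustrative only: it assumes a time index t = 1, 2, ..., n (the text does not specify the index) and uses a synthetic series generated from Eq. (5) rather than the measured data.

```python
def fit_linear_trend(values):
    """Least-squares fit of y_t = a + b*t for t = 1..n; returns (a, b)."""
    n = len(values)
    t = range(1, n + 1)  # assumed time index
    t_mean = sum(t) / n
    y_mean = sum(values) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


# Synthetic, noise-free series built from Eq. (5); not the measured TH data.
series = [261.1 + 0.829 * t for t in range(1, 101)]
intercept, slope = fit_linear_trend(series)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

A positive fitted slope corresponds to the upward trend described for TH and SAR.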
Comparing the mean TDS with the Schoeller diagram, the water quality of the Karoon River in Ahvaz City is considered acceptable. Similarly, comparing the mean EC with Table 1, the water poses a high risk to soil. In Fig. 4, the slope of the time series line is positive, so these parameters have an upward trend. Thus, the suitability of the water for drinking and agriculture is threatened. Table 3 presents the correlation coefficients and significance test results of the anions and cations with TH and SAR.
Table 3
Correlation coefficients and significance tests of anions and cations with TH and SAR
Ion | TH (mg/l): R2 | TH (mg/l): Significance | SAR: R2 | SAR: Significance |
SO42− | 0.921 | 0.000 | 0.584 | 0.000 |
Cl− | 0.730 | 0.000 | 0.949 | 0.000 |
CO32− | 0.104 | 0.025 | 0.218 | 0.000 |
HCO3− | 0.123 | 0.008 | 0.057 | 0.222 |
Na+ | 0.732 | 0.000 | 0.965 | 0.000 |
Ca2+ | 0.861 | 0.000 | 0.285 | 0.000 |
K+ | 0.420 | 0.000 | 0.223 | 0.000 |
Mg2+ | 0.666 | 0.000 | 0.649 | 0.000 |
The tests used a 95% confidence level. Table 3 shows that TH correlates significantly with all anions and cations. Based on the correlation coefficients, SO42− and Ca2+ were the dominant anion and cation for TH, respectively. SAR correlated significantly with all cations; among the anions, only HCO3− is not significantly associated with SAR. The correlation coefficients of Na+ and Cl− with SAR were greater than 0.8. In another study on the Karoon River, Nouraki et al. (2021) identified Na+ and Cl− as the quality variables affecting SAR, and Al Obaidi et al. (2020) identified Na+ as the ion with the greatest effect on SAR, which is consistent with the results of the present study. Azad et al. (2019) used Na+, SO4 and Cl− as model design variables to estimate SAR, and Na+, SO4, Cl− and Mg to predict TH. Among these anions and cations, Na+ had the greatest effect.
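The link between a correlation coefficient and its significance value can be sketched as follows. This is a hedged illustration, not the SPSS procedure: it treats the tabulated coefficients as Pearson r values, assumes a sample size of 469 (inferred from the total of 468 degrees of freedom in Tables 5 and 6), and replaces the t distribution with its large-sample normal approximation.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and the standard normal tail,
    which is close to the t distribution when n is large (assumption).
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


N_SAMPLES = 469  # assumed sample size, not stated in the text

# Coefficients taken from Table 3.
for label, r in [("HCO3- vs SAR", 0.057), ("CO3 2- vs TH", 0.104), ("HCO3- vs TH", 0.123)]:
    p = correlation_p_value(r, N_SAMPLES)
    print(f"{label}: r = {r:.3f}, p = {p:.3f}, significant at 0.05: {p < 0.05}")
```

Under these assumptions, the weakest coefficient in Table 3 (HCO3− with SAR) is the only one of the three that fails the 0.05 threshold, in line with the table.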
Before the anion and cation data were used, they were tested for normality using the Monte Carlo test, whose statistic is dimensionless. The Monte Carlo significance value was 0.000 for all variables, and on this basis the data were treated as normally distributed.
The tree model and the K-NN method were implemented under scenarios that use anions and cations as inputs for TH and SAR. The combined anion-and-cation scenarios, however, were not used to run the tree model. Table 4 lists these scenarios.
Table 4
Tree Model and K-NN Method Implementation Scenarios
Parameter | Scenario |
TH | SO42− |
TH | Ca2+ |
TH | SO42− and Ca2+ |
SAR | Na+ |
SAR | Cl− |
SAR | Na+ and Cl− |
3.1-Nearest Neighbor Method
This model was run in the SPSS package with 70% of the data used for training and the remaining 30% for validation and testing. Distances were calculated with the Euclidean metric. The upper limit of k was five, and values of k between 3 and 5 were examined in order to choose the optimal value, that is, the k for which the error of the TH and SAR values calculated by the nearest neighbor method is lowest.
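The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS implementation: the prediction is taken as the unweighted mean of the k nearest training targets, the 70/30 split is random with a fixed seed, and the data are synthetic (a hypothetical single-ion feature with a roughly linear target).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k):
    """Mean target of the k training points nearest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def train_test_split(x, y, train_fraction=0.7, seed=0):
    """Random split into training and testing parts (70/30 by default)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    tr, te = idx[:cut], idx[cut:]
    return ([x[i] for i in tr], [y[i] for i in tr],
            [x[i] for i in te], [y[i] for i in te])


# Hypothetical data: one ion concentration as the feature, noisy linear target.
rng = random.Random(1)
features = [[rng.uniform(1.0, 12.0)] for _ in range(200)]
targets = [150.0 + 30.0 * f[0] + rng.gauss(0.0, 10.0) for f in features]

x_tr, y_tr, x_te, y_te = train_test_split(features, targets)
for k in (3, 4, 5):
    preds = [knn_predict(x_tr, y_tr, q, k) for q in x_te]
    mse = sum((p - o) ** 2 for p, o in zip(preds, y_te)) / len(y_te)
    print(f"k = {k}: test MSE = {mse:.1f}")
```

The k with the lowest test error would be retained, which mirrors the selection among k = 3, 4 and 5 described in the text.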
TH: The linear regression equations between the real and calculated values by the nearest neighbor method using SO42- were determined as follows:
THact=59.23 + 0.817THKnn R2 = 0.815 k = 3 (7)
THact=25.5 + 0.917THKnn R2 = 0.947 k = 4 (8)
THact=19.56 + 0.946THKnn R2 = 0.935 k = 5 (9)
Using SO42− data, the correlation between the estimated and measured TH data was 0.867 (Fig. 5). Using Ca2+ data, the correlation between the estimated and measured data was 0.862. Similarly, Fig. 6 shows the correlation between TH, Ca2+ and SO42− data, together with the results of the K-NN model and the correlation between the estimated and measured data.
Fig. 7 shows the time series of the real data and the data estimated by the nearest neighbor method using SO42− for different k values.
The linear regression equations between the real and calculated values using the nearest neighbor method using Ca2+ were obtained as follows:
THact=24.94 + 0.917THKnn R2 = 0.698 k = 3 (10)
THact=24.68 + 0.919THKnn R2 = 0.719 k = 4 (11)
THact=-8.04 + 1.033THKnn R2 = 0.719 k = 5 (12)
The obtained correlation coefficients are all smaller than 0.8; therefore, Ca2+ data cannot be used to implement the K-NN model for predicting TH. Because the correlation coefficients for k = 4 and k = 5 are equal, other statistical parameters should be used to select the optimal k. Fig. 8 shows the time series of the real data and the data estimated by the nearest neighbor method using Ca2+ for different k values.
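When two k values give the same correlation coefficient, error-based measures can separate them. The text does not name the additional statistical parameters; the sketch below assumes root mean square error (RMSE) and mean absolute error (MAE) as two common choices, and the numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pick_k(observed, predictions_by_k):
    """Return the k with the lowest RMSE; ties are broken by MAE, then by smaller k."""
    return min(
        predictions_by_k,
        key=lambda k: (rmse(observed, predictions_by_k[k]),
                       mae(observed, predictions_by_k[k]), k),
    )


# Hypothetical TH values (mg/l) and K-NN estimates for two candidate k values.
observed = [300.0, 320.0, 280.0, 350.0]
predictions_by_k = {
    4: [310.0, 315.0, 270.0, 360.0],
    5: [305.0, 318.0, 276.0, 353.0],
}
best_k = pick_k(observed, predictions_by_k)
print("selected k:", best_k)
```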
Figure 9 presents the result of the K-NN method in the SPSS package using Ca2+ and SO42− to estimate TH. The results obtained with the different k values match the real data. The regression relationships between the real and calculated values of TH for k = 3, 4 and 5 were obtained as follows:
THact=5.364 + 0.988THKnn R2 = 0.886 k = 3 (13)
THact=-4.121 + 1.015THKnn R2 = 0.892 k = 4 (14)
THact=-17.34 + 1.06THKnn R2 = 0.873 k = 5 (15)
The correlation coefficients of the above relationships indicate the accuracy of the TH predictions. Based on the correlation coefficient, using SO42− data to estimate TH provides more accurate results than using Ca2+ or the combination of Ca2+ and SO42−. The obtained time series shows the high accuracy of the results. Fig. 10 shows the linear correlation between the measured TH and the TH predicted by the K-NN model with the combination of Ca2+ and SO42−.
In Fig. 11, the correlation coefficient of the TH estimation model using the combination of SO42− and Ca2+ was 0.992. Accordingly, the fitted line is close to the bisector of the first quadrant of the coordinate system, that is, the 45° line, which shows that the result is satisfactory. The linear regression equations between the real and calculated SAR values obtained by the nearest neighbor method using Cl− were as follows:
SARcal=0.303 + 0.923SARobs R2 = 0.92 k = 3 (16)
SARcal=0.275 + 0.926SARobs R2 = 0.921 k = 4 (17)
SARcal=0.25 + 0.935SARobs R2 = 0.927 k = 5 (18)
The three correlation coefficients differ only slightly. Figure 11 shows the time series of the real SAR data and the SAR data estimated by the K-NN method using Cl−.
Linear regression equations between real and calculated SAR values were determined by the nearest neighbor method using Na+ as follows:
SARcal=0.207 + 0.945SARobs R2 = 0.875 k = 3 (19)
SARcal=0.438 + 0.879SARobs R2 = 0.875 k = 4 (20)
SARcal=0.391 + 0.887SARobs R2 = 0.885 k = 5 (21)
Based on the correlation coefficient, the results for k = 5 were more favorable than those for k = 3 and k = 4. Figure 12 shows the K-NN outputs using Na+ and Cl− to estimate SAR.
According to Fig. 13, the correlation coefficient between the SAR data obtained from the K-NN method and the observed data was 0.991. Based on this and on the diagram in Fig. 14, the K-NN model provides favorable results for SAR estimation using Na+ and Cl−.
Based on Figs. 11 to 13, the results obtained from the K-NN method with Na+ and Cl− data used separately did not exhibit good agreement with the observed data, so a high-precision relation for estimating SAR cannot be obtained from either ion alone. Dezfooli et al. (2017) reported that K-NN had about 10% error in both the calibration and validation stages. Kim et al. (2015) showed that the disparity in predictive accuracy was around 5% under dry and wet weather conditions. Babbar and Babbar (2017) observed that the share of wrongly assigned water quality classes was around 2%-28% for the K-NN method, compared with 1%-38% and 10%-20% for artificial neural network and rule-based classifiers, respectively.
3.2-Decision Tree Model
In this analysis, 70% of the data were used in the training stage and the remaining 30% in the testing stage. TDS was taken as the effective parameter, and CHAID was used as the growing method. Figure 14 presents the output of the tree model for TDS estimation, including the training and testing stages using Ca2+. The correlation coefficients between TH and Ca2+ were 0.999 and 0.757 in the training and testing stages, respectively, and those between TH and SO42− were 1.00 and 0.974 (Fig. 15).
For SAR estimation with the tree model, the correlation coefficients using Na+ data were 0.919 and 0.983 in the training and testing stages, respectively, and those using Cl− were 0.999 and 0.998. Figure 16 shows the outputs of the tree model for estimating SAR using Na+ and Cl−.
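To make the idea of a tree model concrete, the following sketch grows a small regression tree on a single input. It is not CHAID: CHAID chooses splits with statistical tests, whereas this illustration uses a simpler CART-style criterion (minimum sum of squared errors), and all names, parameters and data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Threshold on x that minimizes the summed SSE of both sides, or None."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold)
    return None if best is None else best[1]


def grow(x, y, depth=0, max_depth=3, min_leaf=5):
    """Grow a binary regression tree; each leaf stores the mean target."""
    can_split = depth < max_depth and len(y) >= 2 * min_leaf
    threshold = best_split(x, y) if can_split else None
    if threshold is None:
        return {"value": sum(y) / len(y)}
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= threshold]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > threshold]
    if len(left) < min_leaf or len(right) < min_leaf:
        return {"value": sum(y) / len(y)}
    return {
        "threshold": threshold,
        "left": grow([p[0] for p in left], [p[1] for p in left], depth + 1, max_depth, min_leaf),
        "right": grow([p[0] for p in right], [p[1] for p in right], depth + 1, max_depth, min_leaf),
    }


def predict(tree, xi):
    """Follow the splits down to a leaf and return its stored mean."""
    while "value" not in tree:
        tree = tree["left"] if xi <= tree["threshold"] else tree["right"]
    return tree["value"]


# Hypothetical demo: a step-shaped relation between one ion and the target.
x_demo = [float(i) for i in range(1, 21)]
y_demo = [10.0 if xi <= 10 else 50.0 for xi in x_demo]
tree = grow(x_demo, y_demo, max_depth=2, min_leaf=2)
print(predict(tree, 3.0), predict(tree, 15.0))
```

Fitting such a tree on the training portion and comparing its predictions with the held-out portion gives training and testing correlations of the kind reported above.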
Assuming a linear relationship of Na+ and Cl− with SAR, single-variable regression was applied to estimate SAR as follows:
SAR = 0.518 Na+ R2 = 0.983 (22)
SAR = 0.52 Cl− R2 = 0.983 (23)
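Equations (22) and (23) have no intercept, so they correspond to regression through the origin. The sketch below shows the least-squares slope for that model and the uncentered R2 that is conventionally reported with it; whether the published R2 follows this convention is an assumption, and the sample values are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope b for the no-intercept model y = b*x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def r2_through_origin(x, y, b):
    """Uncentered R2 = 1 - SSE / sum(y^2), the usual convention without intercept."""
    residual_ss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - residual_ss / sum(yi * yi for yi in y)


# Hypothetical Na+ concentrations and SAR values, for illustration only.
na_values = [2.0, 4.0, 6.0, 8.0]
sar_values = [1.1, 2.0, 3.2, 4.1]
b = slope_through_origin(na_values, sar_values)
print(f"slope = {b:.3f}, R2 = {r2_through_origin(na_values, sar_values, b):.3f}")
```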
Tables 5 and 6 present the ANOVA results for the estimated regressions. The tables show the sums of squares and mean squares for regression and residuals, together with the significance level of the ANOVA test for each variable; F is the test statistic and df represents degrees of freedom. Ebadati and Hooshmandzadeh (2019) also identified Cl− and Na+ as the dominant ions for SAR, and obtained equations with coefficients of 0.946 for Na+ and 0.928 for Cl−. Sarani et al. (2012) obtained a lower coefficient than the present study. In their research, they regarded pH as a physical property of water.
Table 5
ANOVA test of Na+ and Cl− with SAR
Na+ | Sum of Squares | df | Mean Square | F | Sig. |
Regression | 6916.683 | 1 | 6916.683 | 27519.947 | 0.000 |
Residual | 117.373 | 467 | 0.251 | - | - |
Total | 7034.056 | 468 | - | - | - |
Cl− | | | | | |
Regression | 6904.418 | 1 | 6904.418 | 24871.966 | 0.000 |
Residual | 129.638 | 467 | 0.278 | - | - |
Total | 7034.056 | 468 | - | - | - |
Table 6
ANOVA test of Ca2+ and SO42− with TH
Ca2+ | Sum of Squares | df | Mean Square | F | Sig. |
Regression | 5.374E7 | 1 | 5.374E7 | 17342.261 | 0.000 |
Residual | 1447077.18 | 467 | 3098.666 | - | - |
Total | 5.518E7 | 468 | - | - | - |
SO42− | | | | | |
Regression | 6904.418 | 1 | 6904.418 | 24871.966 | 0.000 |
Residual | 129.638 | 467 | 0.278 | - | - |
Total | 7034.056 | 468 | - | - | - |
Based on Tables 5 and 6, the regression equations were significant. Table 7 presents the estimated coefficients of SO42− and Ca2+ with TH, and of Na+ and Cl− with SAR. Here, m is the test statistic, B is the regression coefficient, and Sig. is the significance level.
Table 7
Coefficients of correlation equations
Ion | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | m | Sig. |
Na+ | 0.518 | 0.003 | 0.992 | 165.891 | 0.000 |
Cl− | 0.520 | 0.003 | 0.991 | 157.708 | 0.000 |
Ca2+ | 3.747 | 0.570 | 0.987 | 131.69 | 0.000 |
SO42− | 1.558 | 0.873 | 0.972 | 89.292 | 0.000 |
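The entries of an ANOVA table are linked by simple arithmetic, which can be checked with the sums of squares from Table 5. In the sketch below, the relation t = sqrt(F) for a single regression degree of freedom is a standard result; identifying it with the statistic m of Table 7 is an observation from the numbers, not a statement in the text.

```python
import math


def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Mean squares, F statistic, R2 and sqrt(F) from the sums of squares."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_stat = ms_regression / ms_residual
    return {
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "F": f_stat,
        "R2": ss_regression / (ss_regression + ss_residual),
        "t": math.sqrt(f_stat),  # equals |t| of the slope when df_regression is 1
    }


# Sums of squares and degrees of freedom from Table 5.
na = anova_summary(6916.683, 117.373, 1, 467)
cl = anova_summary(6904.418, 129.638, 1, 467)
for name, result in (("Na+", na), ("Cl-", cl)):
    print(f"{name}: F = {result['F']:.1f}, R2 = {result['R2']:.3f}, t = {result['t']:.2f}")
```

Up to rounding of the tabulated inputs, the computed F values agree with Table 5, the R2 values with Eqs. (22) and (23), and sqrt(F) with the Na+ and Cl− statistics in Table 7.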
Since Na+ and Cl− tend to combine to form salt, the correlation coefficient between them is greater than 0.9. Consequently, multivariate regression using both Na+ and Cl− cannot be used to estimate SAR, because the "collinearity" problem occurs, and it was unable to produce accurate results. Collinearity is a near-linear relationship between predictor variables, which appears graphically as points lying close to a straight line. Fig. 17 shows the effect of collinearity, that is, the linear correlation of Na+ and Cl−.
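One common way to quantify collinearity between two predictors is the variance inflation factor (VIF). The sketch below is illustrative: the VIF is not mentioned in the text and is introduced here as an assumed diagnostic, and the Na+ and Cl− values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_inflation_factor(r):
    """VIF for a two-predictor model, where each predictor's R2 on the other is r^2."""
    return 1.0 / (1.0 - r * r)


# Hypothetical Na+ and Cl- concentrations that rise and fall together.
na_demo = [2.0, 4.1, 6.3, 8.0, 10.2]
cl_demo = [2.2, 4.0, 6.5, 8.3, 9.9]
r_demo = pearson_r(na_demo, cl_demo)
print(f"r = {r_demo:.3f}, VIF = {variance_inflation_factor(r_demo):.1f}")
```

The closer r is to 1, the larger the VIF and the less stable the individual coefficients of a regression that includes both ions.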
Table 8 presents the correlation coefficients between the real data and the data predicted by the tree method with different growth algorithms. SAR prediction using Cl− with the CHAID algorithm gives the highest correlation coefficient. For all three growth algorithms, the R2 coefficients obtained with Cl− are higher than those obtained with Na+. The smallest range of R2 values occurs for the combination of Ca2+ and SO42−, which suggests that when these two ions are combined, the choice of growth algorithm had almost no effect on the results of the tree model. In predicting TH using Ca2+ and SO42− separately, the CHAID algorithm has the highest correlation coefficient.
Table 8
Correlation coefficient of SAR and TH prediction tree model results with tree model growth algorithms
Growth algorithm \ Input model | Na+ | Cl− | Ca2+ | SO42− | Ca2+ and SO42− |
CHAID | 0.944 | 0.987 | 0.979 | 0.985 | 0.985 |
Exhaustive CHAID | 0.891 | 0.978 | 0.973 | 0.982 | 0.983 |
CART | 0.893 | 0.979 | 0.972 | 0.975 | 0.984 |
Linear regressions of TH on SO42− and on Ca2+ are given in Eqs. (24) and (25):
TH = 1.558SO42− R2 = 0.945 (24)
TH = 3.747 Ca2+ R2 = 0.974 (25)
SO42- and Ca2+ are in meq/l. The obtained coefficients indicate the high accuracy of the results of these equations.
Based on the linear relation between TH, SO42- and Ca2+ using the SPSS 18.00 Package, the following multivariate regression was obtained:
TH = 2.564Ca2++0.521SO42− p = 0.014 (26)
where p is the significance level of the regression.
Considering the obtained p-value, the multivariate regression is considered significant. Azad et al. (2019) showed that Ca2+, Mg2+ and Cl− have the highest correlation with TH. In contrast to the present research, they obtained a low correlation coefficient between TH and SO4 (R2 = 0.34). The difference between the results may be due to the different geological conditions and the effect of evaporite formations on the water quality of the Karoon River.
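A two-predictor regression of the form of Eq. (26) can be fitted by solving the 2×2 normal equations directly. The sketch below assumes a model without intercept, as Eq. (26) is written, and uses synthetic inputs generated from the published coefficients rather than the measured data.

```python
def fit_two_predictors_no_intercept(x1, x2, y):
    """Least-squares coefficients (b1, b2) for y = b1*x1 + b2*x2."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Hypothetical Ca2+ and SO42- values; TH is generated from Eq. (26) without noise.
ca_demo = [3.0, 4.5, 2.8, 5.1, 4.0, 3.6]
so4_demo = [2.5, 5.0, 1.9, 4.2, 6.1, 3.3]
th_demo = [2.564 * c + 0.521 * s for c, s in zip(ca_demo, so4_demo)]

b_ca, b_so4 = fit_two_predictors_no_intercept(ca_demo, so4_demo, th_demo)
print(f"TH = {b_ca:.3f} Ca2+ + {b_so4:.3f} SO42-")
```

The determinant check also exposes the collinearity issue: when the two predictors are nearly proportional, the determinant approaches zero and the coefficients become unstable.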
Ebadati and Hooshmandzadeh (2019) presented two bivariate regression equations to calculate TH in terms of TDS and EC with p = 0.001. In addition, they presented a three-variable regression on TDS, EC and SO42−, which yielded a correlation coefficient of 0.873. Ahmed et al. (2016) obtained 0.727 and 0.434 as well as 0.794 and 0.529 for the accuracy and precision of K-NN and tree models, respectively.