3.1. LULC Classification using SVM and RF
The two nonlinear supervised MLAs (Support Vector Machine and Random Forest) classified the watershed's land use into six major classes (Figs. 4–6): water bodies, settlements, forest, agriculture, shrubs, and bare land. The classes were defined based on previous research findings, the researchers' experience, and ground truth data. The results showed significant variation in area coverage among the classes (Fig. 7). Agricultural land is the dominant class under both machine learning algorithms, covering 47.92% of the watershed with SVM and 50.91% with RF (Fig. 7).
Water, settlements, forest, and bare land covered nearly the same share of the classified LULC_2022 image under both classifiers: 0.29% and 0.31%, 11.32% and 12.40%, 12.24% and 12.19%, and 7.58% and 7.31% for SVM and RF, respectively. In contrast, the agricultural land and shrubland classes differed markedly between SVM and RF (Fig. 7). In terms of total surface area, agriculture and shrubs ranked first and second, followed by forest and settlements (in that order for SVM; for RF, settlements at 12.40% slightly exceed forest at 12.19%), then bare land and water bodies. These results agree with earlier studies (Abdi 2020; Erbek, Özkan, and Taberner 2004; MohanRajan, Loganathan, and Manoharan 2020; Rogan et al. 2008; Talukdar et al. 2020), which reported that the area estimated for each LULC class differs with the classification algorithm applied, whether machine learning or otherwise.
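The class shares compared above are the proportions of pixels assigned to each class in a classified map. A minimal sketch of that calculation follows, assuming each classified image is a grid of class labels with equal-area pixels (so a pixel share equals an area share). The function names and the two small maps are hypothetical and are not the study's data or software.

```python
from collections import Counter


def class_area_percent(classified_map):
    """Return {class label: % of pixels} for a 2-D grid of class labels."""
    counts = Counter(label for row in classified_map for label in row)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def compare_classifiers(map_a, map_b):
    """Percentage-point difference (map_b minus map_a) for every class in either map."""
    pa, pb = class_area_percent(map_a), class_area_percent(map_b)
    classes = sorted(set(pa) | set(pb))
    return {c: pb.get(c, 0.0) - pa.get(c, 0.0) for c in classes}


# Hypothetical 3 x 4 maps (not study data): A = agriculture, S = shrub, W = water
svm_map = [["A", "A", "S", "S"],
           ["A", "S", "S", "W"],
           ["A", "A", "S", "S"]]
rf_map = [["A", "A", "A", "S"],
          ["A", "A", "S", "W"],
          ["A", "A", "S", "S"]]

print(class_area_percent(svm_map))
print(compare_classifiers(svm_map, rf_map))
```

In this toy case the water share is identical in both maps, while agriculture and shrub trade about 16.7 percentage points, mirroring the pattern of small differences for some classes and large ones for others.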
Woldemariam et al. (2022) evaluated three classification algorithms to identify an efficient classifier, or a combination of approaches, for producing more accurate land use maps. They found that although all of the supervised MLAs can be considered robust classifiers, SVM gave the best classification accuracy at the level of individual land use classes.
On the other hand, the Random Forest (RF) algorithm is one of the most widely used ensemble MLAs in digital image classification and has shown high classification accuracy (Ali et al. 2012; Goel et al. 2017). RF performs well on large datasets (Dronova, Gong, and Wang 2011; Mao et al. 2020) and achieves remarkable classification accuracy with remotely sensed data (Talukdar et al. 2020). Another of RF's benefits is the speed with which it applies its pre-generated trees, compared with several other MLAs used for LULC classification of remote sensing imagery (Talukdar et al. 2020). RF is a nonparametric technique that builds multiple tree-structured classifiers from independent, identically distributed random vectors; each tree then casts a unit vote for the most popular class at input x (Talukdar et al. 2020; Yulianto et al. 2021).
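The bootstrap-and-vote scheme described above can be illustrated with a minimal, standard-library Python sketch. To keep it short, each "tree" is a depth-1 decision stump split on Gini impurity rather than a fully grown tree, so this is a simplified illustration of bagging with random feature selection, not the RF implementation used in this study. The feature values, class names, and the parameters `n_trees` and `mtry` are illustrative assumptions.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Fit a depth-1 tree: the single (feature, threshold) split with the lowest weighted Gini."""
    best = (gini(y), None, None, majority(y), majority(y))  # no split = constant leaf
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    if f is None:
        return left_label
    return left_label if x[f] <= t else right_label


def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Each tree sees a bootstrap sample and a random subset of mtry features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        feats = rng.sample(range(p), mtry)          # a stump has one node, so per-tree = per-node sampling
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    """Every tree casts one unit vote; the most popular class is returned."""
    return majority([predict_stump(s, x) for s in forest])


# Hypothetical two-feature training pixels (not data from this study)
X = [[0.05, 0.03], [0.06, 0.04], [0.04, 0.05], [0.07, 0.02],
     [0.30, 0.45], [0.28, 0.50], [0.32, 0.48], [0.29, 0.52]]
y = ["water"] * 4 + ["agriculture"] * 4

forest = fit_forest(X, y, n_trees=25, mtry=1, seed=0)
print(predict_forest(forest, [0.05, 0.04]))
print(predict_forest(forest, [0.30, 0.49]))
```

Using an odd number of trees with two classes avoids tied votes; a production classifier would grow deep trees and sample features at every node.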
3.2. Accuracy Assessment of LULC Classification
Accuracy assessment is a crucial step in remote sensing data processing for determining the level of "correctness" of classified satellite images. According to Geremew (2013), the minimum accuracy for reliable land cover classification is 85%. For LULC_2022, both classifiers achieved overall accuracies above this level (87.46% for SVM and 91.19% for RF; Tables 3–6). Across all study years, RF exceeded 85% overall accuracy in every year, whereas SVM fell below it for 1992, 2002, and 2012 (Table 7).
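The threshold comparison can be checked directly. This sketch simply compares each reported overall accuracy with the 85% minimum; the variable and function names are illustrative, and the 2022 RF value is taken from the confusion matrix result in Table 6.

```python
MIN_OA = 85.0  # minimum overall accuracy (%) for reliable land cover classification (Geremew 2013)

# Overall accuracy (%) per study year (Table 7; the 2022 RF value as in Table 6)
OVERALL_ACCURACY = {
    "SVM": {1992: 81.12, 2002: 79.45, 2012: 84.45, 2022: 87.46},
    "RF": {1992: 86.84, 2002: 85.15, 2012: 86.59, 2022: 91.19},
}


def years_below(min_oa, oa_by_year):
    """Return the study years whose overall accuracy falls below min_oa, in order."""
    return [year for year, oa in sorted(oa_by_year.items()) if oa < min_oa]


for clf, oa_by_year in OVERALL_ACCURACY.items():
    print(clf, "below 85%:", years_below(MIN_OA, oa_by_year) or "none")
```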
The accuracy of the classified land use maps was evaluated quantitatively using overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA), and the kappa coefficient (K). Results for the individual LULC_2022 maps are given in Tables 4 and 6, and results for all study years in Table 7. Based on stratified random sampling, OA and K were 87.46% and 0.83 for SVM and 91.19% and 0.88 for RF (Tables 4 and 6). These kappa values indicate an excellent level of agreement for the land use maps classified by both MLAs. As shown in Tables 4 and 6, UA for individual LULC classes ranged widely from 71.93% (bare land, SVM) to 94.52% (settlement, RF), and PA ranged from 80.39% (bare land, SVM) to 100% (water body, both SVM and RF). The overall accuracies of the SVM classifier for LULC_1992, 2002, 2012, and 2022 were 81.12%, 79.45%, 84.45%, and 87.46%, and those of the RF classifier were 86.84%, 85.15%, 86.59%, and 91.19%, respectively (Table 7).
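These metrics follow from the confusion matrices in Tables 3 and 5, where rows are the classified classes and columns the reference classes (as the "Total (user)" and "Total (producer)" sums indicate). The sketch below recomputes them with the standard formulas: UA = diagonal / row total, PA = diagonal / column total, OA = sum of the diagonal / N, and Cohen's kappa K = (p_o - p_e) / (1 - p_e). The function name is hypothetical, and this is not necessarily the software used in the study.

```python
def accuracy_metrics(matrix):
    """UA, PA, OA and Cohen's kappa from a square confusion matrix.

    Rows are classified (map) classes and columns are reference classes,
    matching the layout of Tables 3 and 5.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diag = [matrix[i][i] for i in range(k)]
    ua = [100.0 * d / r for d, r in zip(diag, row_tot)]  # user's accuracy (%)
    pa = [100.0 * d / c for d, c in zip(diag, col_tot)]  # producer's accuracy (%)
    p_o = sum(diag) / n                                   # observed agreement
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    return 100.0 * p_o, kappa, ua, pa


CLASSES = ["Waterbody", "Settlement", "Forest", "Agricultural land", "Shrub land", "Bareland"]

svm_2022 = [  # Table 3
    [7, 1, 0, 1, 0, 0],
    [0, 216, 0, 8, 1, 7],
    [0, 0, 76, 0, 10, 0],
    [0, 10, 2, 85, 4, 2],
    [0, 0, 7, 3, 84, 1],
    [0, 8, 1, 7, 0, 41],
]
rf_2022 = [  # Table 5
    [7, 1, 0, 1, 0, 0],
    [0, 276, 0, 8, 1, 7],
    [0, 0, 96, 0, 10, 0],
    [0, 10, 2, 216, 4, 2],
    [0, 0, 7, 3, 84, 1],
    [0, 8, 1, 7, 0, 77],
]

for name, m in [("SVM", svm_2022), ("RF", rf_2022)]:
    oa, kappa, ua, pa = accuracy_metrics(m)
    print(f"{name}: OA = {oa:.2f}%, K = {kappa:.2f}")
    for cls, u, p in zip(CLASSES, ua, pa):
        print(f"  {cls}: UA = {u:.2f}%, PA = {p:.2f}%")
```

Run as is, it reproduces the LULC_2022 values in Tables 4 and 6: OA = 87.46% and K = 0.83 for SVM, and OA = 91.19% and K = 0.88 for RF.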
Therefore, the accuracy assessment results indicate that both MLAs are robust land-use classifiers; however, RF proved superior, matching or exceeding SVM in user's and producer's accuracy for every class and yielding higher OA and K values (Tables 4, 6, and 7). This agrees with earlier findings that RF, one of the most widely used ensemble MLAs for digital image classification, achieves high classification accuracy (Ali et al. 2012; Goel et al. 2017), particularly with large datasets and remotely sensed data (Dronova et al. 2011; Mao et al. 2020; Talukdar et al. 2020).
Random forest-based land use and land cover classification is effective because it combines a large number of decision trees to produce reliable results (Abdi 2020; Lukas, Melesse, and Kenea 2023). It can generate reliable results rapidly, handle very large datasets, and give a clearer picture of the spatial distribution of land use and land cover. Its main benefits include its non-parametric nature, high classification accuracy, and ability to assess variable importance. A further advantage is that RF requires only two user-defined parameters (the number of trees and the number of variables tested at each node split), whereas SVMs require several (Pal 2005). The classifier also reports the relative importance of each feature in the classification, which is helpful for feature selection (Pal 2005).
Kulkarni and Lowe (2016) found that RF was outperformed by neural network and SVM classifiers, possibly because of impure training sets; RF works well with large, homogeneous training data and is relatively robust to outliers. Similarly, Woldemariam et al. (2022) statistically evaluated SVM, RF, ANN, and OBIA in selected hydrological catchments of the Lake Haramaya Watershed, East Hararghe Zone, Ethiopia. In general, SVM produced the most accurate land use map, followed by RF and ANN; however, the accuracy of their RF results was comparable to that of the present study (92% and 91.19%, respectively). In the present study, RF achieved a higher overall accuracy and kappa coefficient than SVM, and it was therefore chosen for the LULC classification and change detection analysis.
Table 3
Confusion matrix for the SVM classification of LULC_2022 (rows: classified classes; columns: reference classes)
Class_Name | Waterbody | Settlement | Forest | Agricultural land | Shrub land | Bareland | Total (user) |
Waterbody | 7 | 1 | 0 | 1 | 0 | 0 | 9 |
Settlement | 0 | 216 | 0 | 8 | 1 | 7 | 232 |
Forest | 0 | 0 | 76 | 0 | 10 | 0 | 86 |
Agricultural land | 0 | 10 | 2 | 85 | 4 | 2 | 103 |
Shrub land | 0 | 0 | 7 | 3 | 84 | 1 | 95 |
Bareland | 0 | 8 | 1 | 7 | 0 | 41 | 57 |
Total (producer) | 7 | 235 | 86 | 104 | 99 | 51 | 582 |
Table 4
User’s, producer’s, and overall accuracy and kappa coefficient for the SVM classifier (LULC_2022)
Class_Name | User Accuracy (UA %) | Producer Accuracy (PA %) | Overall Accuracy and Kappa Coefficient |
Waterbody | 77.78 | 100.00 | |
Settlement | 93.10 | 91.91 | |
Forest | 88.37 | 88.37 | OA = 87.46% |
Agricultural land | 82.52 | 81.73 | K = 0.83 |
Shrub land | 88.42 | 84.85 | |
Bareland | 71.93 | 80.39 | |
Table 5
Confusion matrix for the RF classification of LULC_2022 (rows: classified classes; columns: reference classes)
Class_Name | Waterbody | Settlement | Forest | Agricultural land | Shrub land | Bareland | Total (user) |
Waterbody | 7 | 1 | 0 | 1 | 0 | 0 | 9 |
Settlement | 0 | 276 | 0 | 8 | 1 | 7 | 292 |
Forest | 0 | 0 | 96 | 0 | 10 | 0 | 106 |
Agricultural land | 0 | 10 | 2 | 216 | 4 | 2 | 234 |
Shrub land | 0 | 0 | 7 | 3 | 84 | 1 | 95 |
Bareland | 0 | 8 | 1 | 7 | 0 | 77 | 93 |
Total (producer) | 7 | 295 | 106 | 235 | 99 | 87 | 829 |
Table 6
User’s, producer’s, and overall accuracy and kappa coefficient for the RF classifier (LULC_2022)
Class_Name | User Accuracy (UA %) | Producer Accuracy (PA %) | Overall Accuracy and Kappa Coefficient |
Waterbody | 77.78 | 100.00 | |
Settlement | 94.52 | 93.56 | OA = 91.19% |
Forest | 90.57 | 90.57 | K = 0.88 |
Agricultural land | 92.31 | 91.91 | |
Shrub land | 88.42 | 84.85 | |
Bareland | 82.80 | 88.51 | |
Table 7
The overall accuracy and kappa coefficient using SVM and RF classifiers for the study years.
Study year | SVM OA (%) | SVM K | RF OA (%) | RF K |
LULC_1992 | 81.12 | 0.76 | 86.84 | 0.79 |
LULC_2002 | 79.45 | 0.73 | 85.15 | 0.77 |
LULC_2012 | 84.45 | 0.81 | 86.59 | 0.83 |
LULC_2022 | 87.46 | 0.83 | 91.19 | 0.88 |
3.3. LULC change detection analysis using RF
Land use/land cover change was analyzed for the periods 1992–2002, 2002–2012, 2012–2022, and 1992–2022. Table 8 and Figs. 4–7 show that the watershed underwent numerous land use and land cover changes over the study periods. Relative to each class's area at the start of the period, agricultural land increased by 8.53% from 1992 to 2002, by 6.44% from 2002 to 2012, and by 14.94% from 2012 to 2022. Settlement area increased markedly, by 69.91%, 72.17%, and 21.44% over the same three periods. Forest declined by 4.07% from 1992 to 2002 and by 48.27% from 2002 to 2012, but increased by 33.51% from 2012 to 2022. Shrub land declined by 9.28%, 8.16%, and 26.30% over the three periods. Bare land decreased by 31.97% from 1992 to 2002, increased by 74.05% from 2002 to 2012, and decreased again by 41.42% from 2012 to 2022. The decrease in bare land is attributed to increased afforestation, and its expansion to unprotected agricultural practices in the watershed.
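The percentages in this paragraph are changes relative to each class's share at the start of the interval, not percentage points. A short sketch reproduces them from the rounded shares in Table 8 (water body is omitted because the text does not discuss its relative change). Because the table is rounded to two decimals, results can differ from the text in the second decimal place, for example 14.95% rather than 14.94% for agriculture in 2012–2022. The names used are illustrative.

```python
# Area shares (% of watershed) from Table 8
YEARS = [1992, 2002, 2012, 2022]
SHARES = {
    "Agricultural land": [38.34, 41.61, 44.29, 50.91],
    "Settlement": [3.49, 5.93, 10.21, 12.40],
    "Forest": [18.40, 17.65, 9.13, 12.19],
    "Shrub land": [28.34, 25.71, 23.61, 17.40],
    "Bare land": [10.54, 7.17, 12.48, 7.31],
}


def relative_change(start, end):
    """Change as a percentage of the class's share at the start of the interval."""
    return 100.0 * (end - start) / start


for cls, s in SHARES.items():
    parts = [f"{y0}-{y1}: {relative_change(a, b):+.2f}%"
             for y0, y1, a, b in zip(YEARS, YEARS[1:], s, s[1:])]
    print(cls, "|", ", ".join(parts))
```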
According to this study (Table 9), shrubland decreased throughout the study periods because of the expansion of agricultural land and settlements. Similarly, Derebe, Hatiye, and Asres (2022) found that agriculture and settlements expanded continuously, whereas shrublands decreased, in the Abelti Watershed.
In general, numerous LULC changes occurred during the study period (1992–2022): agricultural land, settlements, and water bodies increased, whereas forest, shrubland, and bare land decreased (Table 9 and Fig. 8). The main causes of LULC change in the watershed were thought to be population growth, with its associated demands, and public awareness of management strategies. The comparison of changed and unchanged area for the different intervals shows that more area changed in the first interval (1992–2002) and over the whole period (1992–2022) than in the other intervals (Fig. 9).
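Changed and unchanged area of the kind summarized in Fig. 9 is commonly obtained by post-classification comparison: overlaying two classified maps and cross-tabulating classes pixel by pixel. The sketch below assumes two co-registered maps of identical shape and equal pixel area; the maps, class codes, and function name are hypothetical, and this is not necessarily the procedure used in the study.

```python
from collections import Counter


def change_summary(map_t1, map_t2):
    """Pixel-by-pixel comparison of two co-registered classified maps of equal shape."""
    pairs = Counter((a, b) for r1, r2 in zip(map_t1, map_t2) for a, b in zip(r1, r2))
    total = sum(pairs.values())
    unchanged = sum(n for (a, b), n in pairs.items() if a == b)
    return {
        "unchanged_pct": 100.0 * unchanged / total,
        "changed_pct": 100.0 * (total - unchanged) / total,
        "transitions": dict(pairs),  # (class at t1, class at t2) -> pixel count
    }


# Hypothetical 3 x 3 maps (not study data): F = forest, S = shrub, A = agriculture, U = settlement
map_1992 = [["F", "F", "S"],
            ["S", "S", "A"],
            ["A", "A", "A"]]
map_2022 = [["F", "A", "A"],
            ["S", "U", "A"],
            ["A", "A", "U"]]

summary = change_summary(map_1992, map_2022)
print(f"unchanged: {summary['unchanged_pct']:.1f}%, changed: {summary['changed_pct']:.1f}%")
for (c1, c2), n in sorted(summary["transitions"].items()):
    print(f"{c1} -> {c2}: {n} pixel(s)")
```

The off-diagonal transitions (for example F -> A or S -> U) are what identify which classes were converted, which net area totals alone cannot show.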
The results show significant change over the study period (1992–2022). Agricultural land, water bodies, and settlements increased by 12.57, 0.27, and 8.91 percentage points of the watershed area, respectively, while forest, shrubland, and bare land decreased by 6.21, 10.94, and 3.23 percentage points, respectively (Table 9; Fig. 10). These results suggest conversion of forest, shrubland, and bare land to agricultural land, water bodies, and settlements, which may cause problems including altered streamflow, soil degradation, and changes to the hydrological system of the basin.
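Unlike the relative changes discussed for 1992–2022 sub-periods above, the figures in this paragraph (and in Table 9) are net changes in percentage points: the 2022 share minus the 1992 share. A minimal sketch using the Table 8 values; the variable names are illustrative.

```python
# Area shares (% of the watershed) for 1992 and 2022, from Table 8
share_1992 = {"Water body": 0.04, "Settlement": 3.49, "Forest": 18.40,
              "Agricultural land": 38.34, "Shrub land": 28.34, "Bare land": 10.54}
share_2022 = {"Water body": 0.31, "Settlement": 12.40, "Forest": 12.19,
              "Agricultural land": 50.91, "Shrub land": 17.40, "Bare land": 7.31}

# Net change in percentage points (pp), rounded to two decimals as in Table 9
net_change_pp = {c: round(share_2022[c] - share_1992[c], 2) for c in share_1992}

gains = {c: d for c, d in net_change_pp.items() if d > 0}
losses = {c: d for c, d in net_change_pp.items() if d < 0}
print("gains (pp):", gains)
print("losses (pp):", losses)
```

It reproduces the 1992–2022 column of Table 9, including the 10.94 percentage-point loss of shrub land.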
Table 8
Percentage distributions and area coverage of the classified LULC types from 1992 to 2022
Class_Name | LULC_1992 (%) | LULC_2002 (%) | LULC_2012 (%) | LULC_2022 (%) |
Water body | 0.04 | 0.03 | 0.3 | 0.31 |
Settlement | 3.49 | 5.93 | 10.21 | 12.4 |
Forest | 18.40 | 17.65 | 9.13 | 12.19 |
Agricultural land | 38.34 | 41.61 | 44.29 | 50.91 |
Shrub land | 28.34 | 25.71 | 23.61 | 17.4 |
Bare land | 10.54 | 7.17 | 12.48 | 7.31 |
Table 9
Change in area coverage (percentage points of the watershed area) for the classified LULC types from 1992 to 2022
LULC class | 1992–2002 | 2002–2012 | 2012–2022 | 1992–2022 |
Water body | -0.01 | 0.27 | 0.01 | 0.27 |
Settlement | 2.44 | 4.28 | 2.19 | 8.91 |
Forest | -0.75 | -8.52 | 3.06 | -6.21 |
Agricultural land | 3.27 | 2.68 | 6.62 | 12.57 |
Shrub land | -2.63 | -2.1 | -6.21 | -10.94 |
Bare land | -3.37 | 5.31 | -5.17 | -3.23 |