Understanding land use land cover change dynamics using machine learning algorithms in the Abelti watershed, Omo-Gibe Basin, Ethiopia

doi:10.21203/rs.3.rs-5294673/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Accurate and precise land cover information is in high demand because it underpins many subsequent applications. The purpose of this study is to select the better land use land cover (LULC) classifier and to investigate LULC change. Support vector machine (SVM) and random forest (RF) algorithms were applied on the Google Earth Engine (GEE) platform to classify LULC from satellite data in the Abelti watershed. SVM and RF achieved overall classification accuracies of 87.46% and 91.19%, respectively, and the RF classifier was therefore selected for the LULC change detection analysis. Results show that agricultural land grew by 8.53% between 1992 and 2002, 6.44% between 2002 and 2012, and 14.94% between 2012 and 2022. The settlement area grew by 69.91% between 1992 and 2002, by 72.17% between 2002 and 2012, and by 21.44% between 2012 and 2022. Shrub land decreased by 38.60% between 1992 and 2022. Bare land decreased by 31.97% between 1992 and 2002, increased by 74.05% between 2002 and 2012, and decreased by 41.42% between 2012 and 2022. Over 1992–2022 as a whole, the shares of agriculture, waterbody, and settlement areas in the watershed increased by 12.57, 0.27 and 8.91%, respectively, while the shares of forest, shrubland, and bareland decreased by 6.21, 10.97 and 3.23%, respectively. Consequently, the RF algorithm is a valuable method for classifying multispectral satellite data and for detecting LULC changes. The study results provide useful information for policymakers and planners in the implementation of sustainable land resource planning and management in the context of environmental change.

Google Earth Engine

Land use land cover changes

Machine Learning

Random Forest

Support Vector Machine

LULC alterations illustrate the relationship between human activities and their effects on the environment. For land resource management to be successful over the long term, it is essential to comprehend the dynamics that cause LULC changes. Land use and land cover is a common indicator of ecological and economic processes. Policymakers' decisions are supported by the detection and modeling of land-use and land-cover change (LULCC), which makes it easier to understand the causes and effects of land-use dynamics. Thus, research on natural resource management, environmental evaluation, territorial and urban planning, and agricultural production management depends on the identification and modeling of LULCC (Jiang, Fu, and Lü 2020; Li et al. 2022; Lin et al. 2018; Rounsevell et al. 2006; Sasmito et al. 2019; Sohl and Claggett 2013; Wang et al. 2021; Zhang et al. 2022; Zhang and Sun 2019). Due to land cover changes, the soils of many watersheds and river basins are degraded, which decreases the soil infiltration rate and consequently increases the amount and rate of runoff. As a result, much rainwater flows to the sea during the rainy season without being used for human needs. Deforestation, urbanization, and other anthropogenic activities can significantly alter the seasonal and annual distribution of streamflow within a watershed. Understanding and measuring LULC changes therefore requires accurate and up-to-date land cover change data. Remotely sensed data have been recognized as one of the most important data sources for land cover mapping and for monitoring land cover change over time, with Landsat being the most frequently used data source (Li et al. 2017; Roy et al. 2014; Wulder et al. 2016).

Large-scale land cover mapping difficulties can be resolved by using Google Earth Engine (GEE), a cloud-based computing platform. With a web-based Integrated Development Environment (IDE) code editor, users can examine all accessible remotely sensed images without having to download the data to their local computer. In this way, users can easily access, select and process large volumes of data for a large study area (Gorelick et al. 2017). Besides fast processing, another aspect that makes GEE increasingly popular is the availability of several packages with many algorithms that simplify access to remote sensing tools for both experts and non-experts. According to Tamiminia et al. (2020), the number of publications using GEE has increased steadily since 2013. Among the datasets available in GEE, data from optical satellites, and particularly the approximately 40-year time series of Landsat, have been the most frequently used. GEE has been applied in various areas, ranging from agriculture, forestry and ecology to economics and medicine (Kumar and Mutanga 2017; Tamiminia et al. 2020). Among these, forest and vegetation studies were the most frequent application disciplines, followed by land use and land cover studies.

Machine learning algorithms have been employed since the launch of the first land observation satellite, Landsat-1, in 1972 to classify pixels in Thematic Mapper (TM) images (Mitchell, Downie, and Diesing 2018). There are many methods for analyzing a multispectral image. Classification algorithms include traditional parametric supervised methods such as maximum likelihood, unsupervised methods, and non-parametric machine learning techniques such as neural networks, support vector machines, decision trees, and combinations of these. Neural networks are powerful, offer greater robustness and tolerance than conventional classifiers, and are a reasonable alternative to them. They are also suitable for classifying large multispectral images (Kulkarni 2016; Raczko and Zagajewski 2017).

More specifically, ensemble learning techniques (bootstrap, boosting, etc.) have attracted a lot of attention lately (Pande 2022; Pelletier et al. 2016). They consist of learning several weak classifiers to generate a classifier with a strong decision rule. A well-known ensemble learning method is Random Forests (RF), which has demonstrated its ability to yield accurate land cover maps (Anon n.d.; Peters et al. 2009) and underlies more recent approaches to land cover classification (Gu et al. 2023). RF is an ensemble learning algorithm based on the idea that a combination of bootstrap-aggregated classifiers performs better than a single classifier (Jin et al. 2020). Tree classifiers are the foundation of the RF algorithm, and many classification trees are grown in a random forest. RF is regarded as one of the most accurate classification algorithms; it can also be applied efficiently to large datasets, and its structure can readily be saved for later use (Chakraborty, Sachdeva, and Joshi 2016; Kulkarni and Lowe 2016; Kumari et al. 2023).

Ethiopia has faced significant environmental problems as a result of increased land degradation and soil erosion. In previous years, scholars such as Takala, Adugna, and Tamam (2016) and Tesfaye (2014) tried to identify factors affecting soil erosion, sedimentation problems, and land use/cover classes in the Omo-Gibe Basin. The upper Omo-Gibe Basin is likewise seeing unprecedented changes in its natural surroundings, such as grasslands and woods, as a result of the expansion of agricultural areas in response to the population's rapid growth and the rising need for agricultural land. This paper therefore concentrates on the upper part of the basin at the watershed level. A large number of earlier studies used conventional parametric statistics to classify multispectral imagery for mapping land use and to examine land cover change in different parts of the world, and they also used classified satellite data for a variety of socioeconomic and environmental purposes. However, machine learning algorithms have not yet been applied to LULC classification and change detection analysis in the Abelti watershed. Therefore, the Support Vector Machine and Random Forest methods were employed in this study to classify multispectral Landsat images and to detect LULC changes in the Abelti watershed. The purpose of this work is thus to detect LULC change over the Abelti watershed using multispectral Landsat satellite imagery and to test the robustness of machine learning techniques for producing accurate land cover maps.

2.1. Description of the Study Area

The Abelti watershed is located in the upper Omo-Gibe Basin, Ethiopia, between 7.35°N and 9.36°N latitude and 36.5°E and 38.13°E longitude. It is the main tributary of the Omo-Gibe Basin, with maximum and minimum elevations of 3259 and 1090 m, respectively. The Gilgel Gibe River, found in the watershed, is an important runoff contributor compared with the others, with a drainage area of 15746 km². The topography of the Omo-Gibe basin is characterized by its physical variation. The northern two-thirds of the basin has mountainous to hilly terrain cut by deeply incised gorges of the Omo, Gojeb, and Gilgel-Gibe rivers, while the southern one-third of the basin is a flat alluvial plain punctuated by hilly areas. The basin in general lies at an altitude range of 333–3570 m.a.s.l., while the study site (watershed) has an altitude range of 681–3570 m.a.s.l. and the plains of the Lower Omo lie between 400 and 500 m.a.s.l. (EEPCO, 2009). The Gibe River flows southwards to the Omo River and Lake Turkana, a fault feature (Woodroofe 1996).

The climate of the Omo-Gibe River Basin varies from a hot arid climate in the southern part of the floodplain to a tropical humid one in the highlands, which include the extreme north and north-western part of the Basin. Between these extremes, and for the greatest part of the basin, the climate is tropical sub-humid. Mean annual rainfall decreases with decreasing elevation throughout the Omo-Gibe catchments and ranges from 1,200 to 1,900 mm. Moreover, the rainfall regime is uni-modal for the northern and central parts of the basin and bimodal for the south. The average annual rainfall calculated over the whole Gibe III dam watershed is 1,426 mm, and 75–80% of the annual rainfall occurs during the five months from May to September.

The major land-use types in the study area are forest lands, urban areas, rangeland, agricultural land, water body, and other built-up areas (Takala et al. 2016). In very broad terms, most of the northern catchments of the Omo-Gibe basin are under extensive cultivation with increased land pressure, i.e., the expansion of cultivated areas into marginal lands at the expense of woodlands. The northern catchments' flatter and poorly drained bottomlands are usually not cultivated but used for dry season grazing and eucalyptus tree plantations. The main gorges of the basin are relatively unpopulated and support open woodland and bush land with grasses. The eastern part of the basin has some of the most densely populated and intensively farmed areas.

Subsistence agriculture is the main source of livelihood for the population in the basin. The farming system differs between the highlands and lowlands. In the highland areas, it is a typical mixed crop-livestock system in which cereals are the dominant crops, of which maize accounts for 30%. Some root crops such as potato and cassava are also produced in the wet northern part of the basin. This part of the basin is also known for its perennial crops: coffee, enset (Ensete ventricosum) and chat (Catha edulis). On the other hand, pastoralism and agro-pastoralism are the dominant farming systems in the southern lowland areas of the basin (MoWR 1996).

2.2. Methods

2.2.1. Study periods

The study period ranges from 1992 to 2022. Landsat data were selected for the years 1992, 2002, 2012 and 2022, a period in which the watershed experienced substantial LULC change (Ateka, Agegn, and Belayneh 2022).

2.2.2. Data Used

The spatial data required for the study were a Digital Elevation Model (DEM), Landsat images, and field data. The Landsat images were obtained through Google Earth Engine/USGS (http://glovis.usgs.gov/), and the 30-meter resolution DEM was obtained from the Ethiopian Ministry of Water, Irrigation and Energy (MoWIE). The DEM was used to derive elevation, slope and other hydrological parameters of the watershed. The analysis was performed using Landsat-5 TM imagery from 1992, Landsat-7 ETM⁺ imagery from 2002, and Landsat-8 OLI/TIRS imagery from 2012 and 2022. The images were acquired in January, which corresponds to the dry season in Ethiopia, when clear skies yield images with fewer clouds and help to minimize extreme differences in land cover reflectance.

2.2.3. Landsat data used for LULC classification

The USGS produces data in three categories for each satellite (Tier 1, Tier 2 and RT). We used Landsat 8 Collection 2 Tier 1 DN values, representing scaled, calibrated at-sensor radiance. Landsat scenes with the highest available data quality are placed into Tier 1 and are considered suitable for time-series analysis. Tier 1 includes Level-1 Precision Terrain (L1TP) processed data that have well-characterized radiometry and are inter-calibrated across the different Landsat sensors. The geo-registration of Tier 1 scenes is consistent and within prescribed tolerances [≤ 12 m root mean square error (RMSE)]. All Tier 1 Landsat data can be considered consistent and inter-calibrated (regardless of sensor) across the full collection.

Table 1

Description of the bands used for creating training data
Name	Pixel size	Wavelength	Description
B1	30 meters	0.43–0.45 µm	Coastal aerosol
B2	30 meters	0.45–0.51 µm	Blue
B3	30 meters	0.53–0.59 µm	Green
B4	30 meters	0.64–0.67 µm	Red
B5	30 meters	0.85–0.88 µm	Near infrared
B6	30 meters	1.57–1.65 µm	Shortwave infrared 1
B7	30 meters	2.11–2.29 µm	Shortwave infrared 2
B8	15 meters	0.52–0.90 µm	Band 8 Panchromatic

2.2.4. Image preprocessing and classification

Image classification automatically assigns every pixel in the Landsat images to an LULC class in order to extract useful thematic information (Al-sharif and Pradhan 2014). All training and validation samples were collected by manual visual interpretation of high-resolution images from Google Earth, a method that is widely applied and reported in the literature. The samples were split randomly into a training set (trainSet, 80%) and a test/validation set (testSet, 20%). Simple random sampling draws data points from a dataset with no specific pattern, so every data point has an equal chance of being selected. The model is first fit on the training set, which is the set of examples used to fit the parameters of the model. The fitted model is then used to predict the responses for the observations in the second set, the validation data set.
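The random 80/20 split described above can be sketched in a few lines of Python. This is an illustration only, not the study's Google Earth Engine script: the function name `split_samples`, the seed, and the integer sample identifiers are assumptions introduced here.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=42):
    """Randomly split labelled samples into training and validation subsets.

    Every sample has the same chance of landing in either subset
    (simple random sampling without replacement).
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for repeatability
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers standing in for digitized reference points.
samples = list(range(1000))
train_set, test_set = split_samples(samples)
print(len(train_set), len(test_set))
```

The two subsets are disjoint and together contain every sample, so the validation accuracy is computed on points the classifier never saw during training.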

Table 2

Description of LULC classification classes
LULC_Classes	Descriptions
Agricultural land	Lands occupied by crops, farmland, plantation, and fallow land.
Bareland	Lands without vegetation, crops or grasses, and barren soils
Forest	Evergreen broad-leafed and evergreen needle-leafed forest, deciduous, woodland, open forest, dense forest, and afro-alpine.
Settlements	Built-up areas and roads
Shrubland	Areas covered by sparsely distributed scrubs, bushlands, and grasslands
Waterbody	Areas covered by perennial rivers, lakes, ponds, reservoirs

2.2.5. Application of Support Vector Machine (SVM) and Random Forest (RF) for LULC classification

Support vector machines (SVMs), also known as support vector networks, are supervised classifiers derived from statistical learning theory that can use nonlinear kernels (Awad and Khanna 2015). SVM is one of the most powerful supervised machine learning algorithms (MLAs) available out of the box. In its basic form, the trained model assigns each new example to one of two categories, which keeps the decision simple and minimizes error. SVM constructs a hyperplane that separates samples of different classes with a gap that is as wide as possible. It uses the kernel approach, which maps the data to a high-dimensional space through a nonlinear transformation. SVM does not need feature selection in the preprocessing stage, as it can work on the full dimensionality of hyperspectral data. By designing a suitable kernel function, SVM can be applied to complex data types (Gualtieri and Cromp 1999).
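To make the kernel idea concrete, the following Python sketch shows that a kernel function evaluated on the original data equals an ordinary dot product in a higher-dimensional feature space. The degree-2 polynomial kernel and the two sample vectors are assumptions chosen for illustration; the text does not state which kernel was used in this study.

```python
import math


def poly_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: k(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map phi with phi(x) . phi(y) = (x . y)^2."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


# Two hypothetical 2-band pixel vectors.
x, y = (1.0, 2.0), (3.0, 4.0)
k_implicit = poly_kernel(x, y)                      # computed in the 2-D input space
k_explicit = dot(feature_map(x), feature_map(y))    # computed in the 3-D feature space
print(k_implicit, k_explicit)
```

Both routes give the same similarity value, which is why an SVM can search for a separating hyperplane in the high-dimensional space without ever constructing the mapped vectors.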

RF is an ensemble learning algorithm based on the idea that a combination of bootstrap-aggregated classifiers performs better than a single classifier (Jin et al. 2020). Tree classifiers are the foundation of the RF algorithm, and many classification trees are grown in a random forest. RF is regarded as one of the most accurate classification algorithms; it can also be applied efficiently to large datasets, and its structure can readily be saved for later use (Chakraborty et al. 2016; Kulkarni and Lowe 2016; Kumari et al. 2023).
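The bootstrap-aggregation idea behind RF can be sketched with very small trees. The Python example below is a simplified illustration under stated assumptions: it uses one-threshold trees (stumps) on a single made-up feature and omits the random feature selection at each split that a full random forest also performs. The data values, function names, and number of trees are hypothetical.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-threshold tree on (value, label) pairs with labels 0 or 1."""
    best = None
    for candidate, _ in points:
        for low_label in (0, 1):
            errors = sum(
                (low_label if value <= candidate else 1 - low_label) != label
                for value, label in points
            )
            if best is None or errors < best[0]:
                best = (errors, candidate, low_label)
    _, threshold, low = best
    return lambda value: low if value <= threshold else 1 - low


def fit_bagged(points, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(points) for _ in points]) for _ in range(n_trees)]


def predict(trees, value):
    """Each tree casts one vote; the most common class wins."""
    return Counter(tree(value) for tree in trees).most_common(1)[0][0]


# Hypothetical index values: low values labelled 0, high values labelled 1.
points = [(0.05, 0), (0.10, 0), (0.15, 0), (0.20, 0), (0.25, 0),
          (0.55, 1), (0.60, 1), (0.70, 1), (0.80, 1), (0.90, 1)]
trees = fit_bagged(points)
print(predict(trees, 0.02), predict(trees, 0.95))
```

Because every tree sees a different resample of the training points, individual trees disagree near the class boundary, and the majority vote smooths out those differences.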

SVM and RF were applied to classify LULC, and their classification accuracies were compared. The classification was carried out with these machine learning algorithms in Google Earth Engine.

2.2.6. Accuracy assessment

Accuracy assessment tells us to what extent the ground truth is represented on the equivalent classified image. Since land use maps derived from image classification usually contain some errors, the classification results' accuracy must be assessed. Assessing the classification accuracy provides confidence in the results and the subsequent change detection (Nguyen 2015). The common and most effective method used to measure the accuracy of the classified image from remotely sensed imagery is an error/confusion matrix (Morales-Barquero et al. 2019). The confusion matrix provides overall accuracy, user accuracy, producer accuracy, and kappa statistics. Landsat image year 2022 was used for the classification and accuracy assessment for both SVM and RF algorithms.

Confusion Matrix. The main diagonal of the confusion matrix lists the correctly classified pixels. One of the basic accuracy measures is the overall accuracy, which is estimated by dividing the number of correctly classified pixels by the total number of pixels checked. An accuracy level accepted by some users may not be acceptable to other users with different purposes (Maina et al. 2020).

Producer’s Accuracy. The producer’s accuracy is estimated by dividing the number of correctly classified pixels in a class by the total number of reference pixels of that class, i.e. the column total of the reference data (Maina et al. 2020).

Accuracy of producer (%) = 100% - error of omission (%) (1)

User’s Accuracy. The user’s accuracy is the number of correctly classified pixels in a class divided by the total number of pixels that were assigned to that class on the map. A single class on the map may correspond to more than one class on the ground; pixels wrongly included in a mapped class are referred to as errors of commission (Maina et al. 2020).

Accuracy of user (%) = 100% - error of commission (%) (2)
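As a worked example of these measures, the Python sketch below recomputes the overall, user's and producer's accuracies from the SVM confusion matrix for LULC_2022 reported in Table 3. The matrix values are copied from that table; the variable names are introduced here for illustration, and the code is not part of the study's own workflow.

```python
# Rows are mapped classes and columns are reference classes (Table 3, SVM, LULC_2022).
classes = ["Waterbody", "Settlement", "Forest",
           "Agricultural land", "Shrub land", "Bareland"]
matrix = [
    [7, 1, 0, 1, 0, 0],
    [0, 216, 0, 8, 1, 7],
    [0, 0, 76, 0, 10, 0],
    [0, 10, 2, 85, 4, 2],
    [0, 0, 7, 3, 84, 1],
    [0, 8, 1, 7, 0, 41],
]

n_classes = len(matrix)
total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(n_classes))
overall_accuracy = 100.0 * correct / total

row_totals = [sum(row) for row in matrix]          # pixels mapped to each class
col_totals = [sum(col) for col in zip(*matrix)]    # reference pixels of each class

# User's accuracy = 100% - commission error; producer's accuracy = 100% - omission error.
user_accuracy = [100.0 * matrix[i][i] / row_totals[i] for i in range(n_classes)]
producer_accuracy = [100.0 * matrix[i][i] / col_totals[i] for i in range(n_classes)]

for name, ua, pa in zip(classes, user_accuracy, producer_accuracy):
    print(f"{name}: UA = {ua:.2f}%, PA = {pa:.2f}%")
print(f"OA = {overall_accuracy:.2f}%")
```

The results reproduce the values listed in Table 4, for example an overall accuracy of 87.46% and a user's accuracy of 71.93% for bareland.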

Kappa Coefficient. The kappa coefficient measures the agreement between the classified map and the reference data after accounting for the agreement expected by chance; it is computed from the diagonal values and the row and column totals of the matrix. Kappa values are classified into three groups. Group A: a value greater than 0.80 (80%) represents strong agreement; Group B: a value between 0.40 and 0.80 (40 to 80%) represents moderate agreement; and Group C: a value below 0.40 (40%) represents poor agreement (Vivekananda, Swathi, and Sujith 2021). It is determined using Eq. (3) (Jenness and Wynne 2005).

$$K=\frac{N\sum\limits_{i=1}^{r} x_{ii} - \sum\limits_{i=1}^{r} \left( x_{i+} \times x_{+i} \right)}{N^{2} - \sum\limits_{i=1}^{r} \left( x_{i+} \times x_{+i} \right)} \qquad (3)$$

$$K=\frac{\left( \text{Total} \times \text{sum of correctly classified} \right) - \text{sum of all the } \left( \text{row total} \times \text{column total} \right)}{\text{Total squared} - \text{sum of all the } \left( \text{row total} \times \text{column total} \right)}$$

where $r$ is the number of rows in the matrix, $x_{ii}$ is the number of observations in row $i$ and column $i$ (the diagonal elements), $x_{i+}$ and $x_{+i}$ are the marginal totals of row $i$ and column $i$, respectively, and $N$ is the total number of observations.
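The kappa formula can be checked numerically against the confusion matrices in Tables 3 and 5. The Python sketch below is illustrative: the function name `kappa` and the variable names are assumptions, while the matrix values are copied from the two tables.

```python
def kappa(matrix):
    """Kappa coefficient of a square confusion matrix (rows: map, columns: reference)."""
    n = sum(sum(row) for row in matrix)                        # N
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))   # sum of x_ii
    row_totals = [sum(row) for row in matrix]                  # x_i+
    col_totals = [sum(col) for col in zip(*matrix)]            # x_+i
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    return (n * diagonal - chance) / (n * n - chance)


svm_2022 = [  # Table 3
    [7, 1, 0, 1, 0, 0],
    [0, 216, 0, 8, 1, 7],
    [0, 0, 76, 0, 10, 0],
    [0, 10, 2, 85, 4, 2],
    [0, 0, 7, 3, 84, 1],
    [0, 8, 1, 7, 0, 41],
]
rf_2022 = [  # Table 5
    [7, 1, 0, 1, 0, 0],
    [0, 276, 0, 8, 1, 7],
    [0, 0, 96, 0, 10, 0],
    [0, 10, 2, 216, 4, 2],
    [0, 0, 7, 3, 84, 1],
    [0, 8, 1, 7, 0, 77],
]

print(round(kappa(svm_2022), 2), round(kappa(rf_2022), 2))
```

Rounded to two decimals, the sketch gives 0.83 for SVM and 0.88 for RF, matching the kappa values reported in Tables 4 and 6.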

2.2.7. LULC change detection using Machine Learning

After selecting the best LULC classifier, the LULC change detection analysis was carried out as shown in Fig. 3.

3.1. LULC Classification using SVM and RF

The two nonlinear supervised MLAs (Support Vector Machine and Random Forest) classified the watershed’s land use into six major classes (Fig. 4–6): waterbodies, settlements, forest, agriculture, shrubs, and bare land. The classes were defined on the basis of previous research findings, the researchers' experience, and ground truth data. The results showed significant variation in area coverage (Fig. 7). As shown in Fig. 7, agricultural land is the dominant class with both machine learning algorithms (47.92% for SVM and 50.91% for RF), covering the largest share of the studied watershed.

The areas covered by water, settlements, forest, and bare land in the LULC_2022 maps were almost the same for the two classifiers (0.29% and 0.31%, 11.32% and 12.40%, 12.24% and 12.19%, and 7.58% and 7.31% for SVM and RF, respectively). The proportions of these four classes thus showed only minor variations between SVM and RF, whereas the agricultural land and shrubland classes showed significant differences (Fig. 7). Agriculture, shrubs, forests, settlements, bare land, and water bodies were ranked first to sixth in terms of the total surface area they covered in the watershed. Our results agree with earlier studies (Abdi 2020; Erbek, Özkan, and Taberner 2004; MohanRajan, Loganathan, and Manoharan 2020; Rogan et al. 2008; Talukdar et al. 2020), which stated that the areas of the LULC classes are not the same in satellite image classification, whether ML or other classification algorithms are applied.

Woldemariam et al. (2022) evaluated three different classification algorithms in order to select an efficient classifier, or a combination of approaches, for obtaining more accurate land use maps. They found that although all supervised MLAs can be considered robust classifiers, the SVM algorithm was the best classifier for improving classification accuracy at the level of individual land use classes.

On the other hand, the Random Forest (RF) algorithm is one of the most widely used ensemble MLAs in digital image classification and has demonstrated excellent classification accuracy (Ali et al. 2012; Goel et al. 2017). RF provides a robust algorithm for obtaining good results from large datasets (Dronova, Gong, and Wang 2011; Mao et al. 2020), through which a remarkable classification accuracy is achieved with remotely sensed data (Talukdar et al. 2020). In addition, among RF's many benefits is a structure that allows pre-generated trees to be applied quickly, compared with several other MLAs used for LULC classification of remote sensing imagery (Talukdar et al. 2020). The RF algorithm applies a nonparametric technique to create multiple tree-structured classifiers using independent, identically distributed random vectors, and each tree casts a unit vote for the most popular class at input x (Talukdar et al. 2020; Yulianto et al. 2021).

3.2. Accuracy Assessment of LULC Classification.

Accuracy evaluation is a crucial stage in determining the level of "correctness" of classified satellite images in remote sensing data processing. According to Geremew (2013), the minimum accuracy for reliable land cover classification is 85%. As shown in Tables 3–6, the LULC_2022 classification carried out in this study produced a Kappa coefficient and an overall accuracy that meet this minimum level for both classifiers. Across the study years, the RF classifier met the 85% level throughout, whereas the SVM overall accuracies for 1992, 2002 and 2012 fell below it (Table 7).

The current study used quantitative methods to evaluate the accuracy of the classified land use maps. The accuracy assessment results are provided in Tables 4 and 6 for LULC_2022 and in Table 7 for all study years. Classification accuracy was assessed with the overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA), and kappa coefficient (K). The OA and K values achieved with the stratified random sampling technique were 87.46% and 0.83 for SVM and 91.19% and 0.88 for RF (Tables 4 and 6). The K statistics indicate a strong level of agreement for the classified land use maps from both MLAs. As shown in Tables 4 and 6, the UA values computed for individual LULC classes ranged from 71.93% (bareland classified by SVM) to 94.52% (settlement classified by RF), and the PA values ranged from 80.39% (bareland classified by SVM) to 100% (waterbody classified by both SVM and RF). The overall accuracies for LULC_1992, 2002, 2012 and 2022 were 81.12%, 79.45%, 84.45% and 87.46% for the SVM classifier and 86.84%, 85.15%, 86.59% and 91.19% for the RF classifier, respectively (Table 7).

The results of the accuracy assessment therefore indicate that both MLAs are robust land-use classifiers; however, the RF algorithm proved to be the superior classifier, matching or outperforming SVM in every individual class (Tables 4 and 6) and producing higher OA and K values (Table 7). This agrees with the literature discussed in Section 3.1, which identifies RF as one of the most popular ensemble MLAs for classifying digital images and reports high classification accuracy on large remotely sensed datasets (Ali et al. 2012; Dronova et al. 2011; Goel et al. 2017; Mao et al. 2020; Talukdar et al. 2020; Yulianto et al. 2021).

Random forest-based land use and cover classification is an effective way to classify land cover because it uses a large number of decision trees to produce reliable results (Abdi 2020; Lukas, Melesse, and Kenea 2023). It can also be used to generate reliable findings rapidly and manage enormous datasets. It also provides a clearer picture of the spatial distribution of land cover and usage. The non-parametric nature of RF, its high classification accuracy, and its ability to assess variable relevance are some of its main benefits. Another advantage of the random forest classifier is that it requires only two parameters to be set, whereas the SVMs require several user-defined parameters (Pal 2005). The relative value of various features throughout the classification process is also provided by this classifier, which is helpful for feature selection (Pal 2005).

Kulkarni and Lowe (2016) show that Random Forest was outperformed by the neural network and support vector machine, which could be due to impure training sets. Random Forest works well given large homogeneous training data and is relatively robust to outliers. Similarly, SVM, RF, ANN, and OBIA were statistically evaluated in selected hydrological catchments of the Lake Haramaya Watershed, East Hararghe Zone, Ethiopia (Woldemariam et al. 2022). The results showed that, in general, the SVM algorithm produced a more accurate land use map, followed by the RF and ANN algorithms; however, the overall accuracy of their RF result was comparable to that of this study (92% and 91.19%, respectively). In this work, the RF classifier exhibited a higher overall accuracy and kappa coefficient than the support vector machine classifier, and it was therefore chosen for the LULC classification and change detection analysis.

Table 3

Confusion matrix accuracy for the classification using SVM classifier for LULC_2022
Class_Name	Waterbody	Settlement	Forest	Agricultural land	Shrub land	Bareland	Total (user)
Waterbody	7	1	0	1	0	0	9
Settlement	0	216	0	8	1	7	232
Forest	0	0	76	0	10	0	86
Agricultural land	0	10	2	85	4	2	103
Shrub land	0	0	7	3	84	1	95
Bareland	0	8	1	7	0	41	57
Total (producer)	7	235	86	104	99	51	582

Table 4

User’s, producer’s, overall accuracy and kappa coefficient for SVM classifier using LULC_2022
Class_Name	User Accuracy (UA %)	Producer Accuracy (PA %)	Overall Accuracy and Kappa Coefficient
Waterbody	77.78	100.00
Settlement	93.10	91.91
Forest	88.37	88.37	OA = 87.46%
Agricultural land	82.52	81.73	K = 0.83
Shrubland	88.42	84.85
Bareland	71.93	80.39

Table 5

Confusion matrix accuracy for the classification using RF classifier for LULC_2022
Class_Name	Waterbody	Settlement	Forest	Agricultural land	Shrub land	Bareland	Total (user)
Waterbody	7	1	0	1	0	0	9
Settlement	0	276	0	8	1	7	292
Forest	0	0	96	0	10	0	106
Agricultural land	0	10	2	216	4	2	234
Shrub lands	0	0	7	3	84	1	95
Bareland	0	8	1	7	0	77	93
Total (producer)	7	295	106	235	99	87	829

Table 6

User’s, producer’s, overall accuracy and kappa coefficient for RF classifier using LULC_2022
Class_Name	User Accuracy (UA %)	Producer Accuracy (PA %)	Overall Accuracy and Kappa Coefficient
Waterbody	77.78	100.00
Settlement	94.52	93.56	OA = 91.19%
Forest	90.57	90.57	K = 0.88
Agricultural land	92.31	91.91
Shrub land	88.42	84.85
Bareland	82.80	88.51

Table 7

The overall accuracy and kappa coefficient using SVM and RF classifiers for the study years.
	Using SVM classifier		Using RF classifier
LULC map	OA (%)	K	OA (%)	K
LULC_1992	81.12	0.76	86.84	0.79
LULC_2002	79.45	0.73	85.15	0.77
LULC_2012	84.45	0.81	86.59	0.83
LULC_2022	87.46	0.83	91.19	0.88

3.3. LULC change detection analysis using RF

Land use/cover change was analyzed for the periods 1992 to 2002, 2002 to 2012, 2012 to 2022, and 1992 to 2022, with changes expressed relative to each class's area at the start of the period. Table 8 and Figs. 4–7 show that the watershed underwent numerous land use and cover changes during the study periods. Agricultural land increased by 8.53% from 1992 to 2002, by 6.44% from 2002 to 2012, and by 14.94% from 2012 to 2022. The settlement area increased significantly, by 69.91% from 1992 to 2002, by 72.17% from 2002 to 2012, and by 21.44% from 2012 to 2022. Forest declined by 4.07% from 1992 to 2002 and by 48.27% from 2002 to 2012, but increased by 33.51% from 2012 to 2022. Shrub land declined by 9.28% from 1992 to 2002, by 8.16% from 2002 to 2012, and by 26.30% from 2012 to 2022. Bare land declined by 31.97% from 1992 to 2002, increased by 74.05% from 2002 to 2012, and then decreased by 41.42% from 2012 to 2022. The decrease in bare land is attributed to increased afforestation, and its expansion to unprotected agricultural practices in the watershed.

According to this study (Table 9), shrub land decreased throughout the study periods because of the expansion of agricultural lands and settlements. Derebe, Hatiye, and Asres (2022) likewise found that agriculture and settlements expanded continuously whereas shrublands decreased in the Abelti watershed.

In general, numerous land use land cover changes occurred during the study period (1992–2022): agricultural lands, settlements, and waterbodies increased, whereas forest, shrublands, and bare lands decreased (Table 9 and Fig. 8). The main causes of LULC change in the watershed were thought to be population growth with its associated demands and public awareness of management strategies. The unchanged and changed area coverage for the different intervals shows that more of the area changed in the first (1992–2002) and last (1992–2022) intervals than in the others (Fig. 9).

The results show that significant changes were observed during the study period (1992–2022). Agriculture, waterbody, and settlement areas showed an increasing trend of 12.57, 0.27, and 8.91%, respectively, while forest, shrubland, and bareland showed a decreasing trend of 6.21, 10.94, and 3.23%, respectively (Table 9 and Fig. 10). This result reveals a conversion of forest, shrubland, and bareland to agricultural, waterbody, and settlement areas, which may cause problems including changes in streamflow, soil degradation, and changes to the hydrological system in the basin.

Table 8

Percentage distributions and area coverage of the classified LULC types from 1992 to 2022
	Area coverage in percentage (%)
Class_Name	LULC_1992	LULC_2002	LULC_2012	LULC_2022
Water body	0.04	0.03	0.3	0.31
Settlement	3.49	5.93	10.21	12.4
Forest	18.40	17.65	9.13	12.19
Agricultural land	38.34	41.61	44.29	50.91
Shrub land	28.34	25.71	23.61	17.4
Bare land	10.54	7.17	12.48	7.31

Table 9

Area coverage changes in percentage for the classified LULC types from 1992 to 2022
Area coverage change in percentage (%)
LULC class	1992–2002	2002–2012	2012–2022	1992–2022
Water body	-0.01	0.27	0.01	0.27
Settlement	2.44	4.28	2.19	8.91
Forest	-0.75	-8.52	3.06	-6.21
Agricultural land	3.27	2.68	6.62	12.57
Shrub land	-2.63	-2.1	-6.21	-10.94
Bare land	-3.37	5.31	-5.17	-3.23
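
The entries of Table 9 are differences between the area shares in Table 8, so they are changes in percentage points of the watershed area rather than relative rates. The short sketch below recomputes them under that reading. The helper name `net_change` is hypothetical and used only for this illustration.

```python
# Area shares (% of the watershed) for 1992, 2002, 2012 and 2022, from Table 8.
table8 = {
    "Water body":        [0.04, 0.03, 0.30, 0.31],
    "Settlement":        [3.49, 5.93, 10.21, 12.40],
    "Forest":            [18.40, 17.65, 9.13, 12.19],
    "Agricultural land": [38.34, 41.61, 44.29, 50.91],
    "Shrub land":        [28.34, 25.71, 23.61, 17.40],
    "Bare land":         [10.54, 7.17, 12.48, 7.31],
}


def net_change(shares):
    """Percentage-point changes for the three decades, then the full period."""
    steps = [round(later - earlier, 2) for earlier, later in zip(shares, shares[1:])]
    return steps + [round(shares[-1] - shares[0], 2)]


# One row per class: 1992-2002, 2002-2012, 2012-2022, 1992-2022.
table9 = {name: net_change(shares) for name, shares in table8.items()}

for name, changes in table9.items():
    print(f"{name:18s}", "  ".join(f"{c:+.2f}" for c in changes))
```

Each class's three decadal changes add up to its 1992–2022 change, which gives a quick consistency check on the table.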

4. CONCLUSION

The derivation of land use maps from the classification of multi-resolution satellite images remains a widely applied approach in environmental studies. In this study, the land use classification performance of two nonlinear supervised machine learning algorithms (MLAs), SVM and RF, was statistically evaluated, and the RF algorithm was selected for LULC change detection analysis in the Abelti watershed, Omo-Gibe Basin, Ethiopia. Ensemble-based random forest algorithms handle complex computational problems and provide novel approaches to categorizing land use and cover patterns. The random forest (RF) classifier was chosen for the LULC classification and change detection analysis in this work because it exhibited a higher overall accuracy and kappa coefficient than the support vector machine classifier in the study watershed.

LULC change was examined over four time periods using Landsat images from 1992, 2002, 2012, and 2022. The main LULC classes found in the watershed were agricultural lands, bare lands, forests, settlements, shrublands, and water bodies. Owing to the watershed's rapid population growth, agricultural lands and settlement areas grew greatly at the expense of shrub lands, which declined between 1992 and 2022, and forest lands, which declined between 1992 and 2012. In addition, the construction of the Gilgel Gibe I reservoir increased the basin's water bodies between 2002 and 2022, which in turn changed the land use type. In general, agricultural lands, settlements, and waterbodies increased through the study period, whereas forest, bare lands, and shrublands decreased. The conversion of forest lands to agricultural lands and the use of forests for fuel are further effects of population growth. Furthermore, a rise in construction activity such as reservoirs, cities, and roads would result in the removal of trees and the conversion of these areas to agricultural land.

In general, the unchanged and changed area coverage for the different intervals shows that more of the area changed in the first (1992–2002) and last (1992–2022) intervals than in the other two (2002–2012 and 2012–2022). Significant changes were observed during the study period. Agriculture, waterbody, and settlement areas showed an increasing trend of 12.57, 0.27, and 8.91%, respectively, while forest, shrubland, and bareland showed a decreasing trend of 6.21, 10.94, and 3.23%, respectively, over the interval 1992–2022. This result reveals a conversion of forest, shrubland, and bareland to agricultural, waterbody, and settlement areas, which may cause problems including changes in streamflow, soil degradation, and changes to the hydrological system in the basin.

Finally, the study found that, in comparison to support vector machines, random forest classification of multispectral satellite images is a powerful and economical technique for classifying land use and land cover. With this technique, areas can be mapped quickly and accurately for land use planning and conservation. It can also be used to analyze land use and land cover change, and for large land use land cover data sets random forest (RF) was the better-performing classifier in this study.

This research focused on the classification of LULC using support vector machine (SVM) and random forest (RF) algorithms. The random forest classifier performed better than the support vector machine for LULC classification in the study watershed. Researchers should therefore apply machine learning algorithms, including RF, to classify LULC in other watersheds and evaluate their performance. In addition, this study showed that the watershed is undergoing land use land cover change. Decision-makers, planners, and other stakeholders should therefore design strategies to ensure the sustainability of the watershed and to protect development activities, such as agriculture and related activities, within the watershed area.

Acknowledgment

First and foremost, I am extremely grateful to my supervisors, Asfaw Kebede (Associate Professor) and Gebremedhin Gebremeskel (Dr.), for their invaluable advice, continuous support, and patience during my PhD study. Their immense knowledge and plentiful experience have encouraged me throughout my academic research and daily life.

FUNDING

The study did not receive any external funding.

DISCLOSURE STATEMENT

The authors declare no conflict of interest.

Author Contribution

Author 1: prepared the whole manuscript
Author 2: supervised, reviewed, and gave constructive comments on the manuscript before submission
Author 3: supervised, reviewed, and gave constructive comments on the manuscript before submission

References

Abdi AM (2020) Land Cover and Land Use Classification Performance of Machine Learning Algorithms in a Boreal Landscape Using Sentinel-2 Data. GIScience Remote Sens 57(1):1–20
Al-sharif AAA, Pradhan B (2014) Monitoring and Predicting Land Use Change in Tripoli Metropolitan City Using an Integrated Markov Chain and Cellular Automata Models in GIS. Arab J Geosci 7:4291–4301
Ali J, Khan R, Ahmad N, Maqsood I (2012) Random Forests and Decision Trees. Int J Comput Sci Issues (IJCSI) 9(5):272
Ateka M, Agegn L, Belayneh A (2022) Evaluating the Effects of Land Use and Land Cover Change on Watershed Surface Runoff: Case of Abelti Watershed, Omo Basin, Ethiopia. International Journal of Earth Sciences Knowledge and Application 3(2021):32–42
Awad M, Khanna R (2015) Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers. Springer Nature
Chakraborty A, Sachdeva K, Joshi PK (2016) Mapping Long-Term Land Use and Land Cover Change in the Central Himalayan Region Using a Tree-Based Ensemble Classification Approach. Appl Geogr 74:136–150
Derebe MA, Hatiye SD, Asres LA (2022) Dynamics and Prediction of Land Use and Land Cover Changes Using Geospatial Techniques in Abelti Watershed, Omo Gibe River Basin, Ethiopia. Advances in Agriculture 2022
Dronova I, Gong P, Wang L (2011) Object-Based Analysis and Change Detection of Major Wetland Cover Types and Their Classification Uncertainty during the Low Water Period at Poyang Lake, China. Remote Sens Environ 115(12):3220–3236
Erbek FS, Özkan C, Taberner M (2004) Comparison of Maximum Likelihood Classification Method with Supervised Artificial Neural Network Algorithms for Land Use Activities. Int J Remote Sens 25(9):1733–1748
Geremew AA (2013) Assessing the Impacts of Land Use and Land Cover Change on Hydrology of Watershed: A Case Study on Gigel-Abbay Watershed, Lake Tana Basin, Ethiopia
Goel E, Abhilasha E (2017) Random Forest: A Review. Int J Adv Res Comput Sci Softw Eng 7(1):251–257
Gorelick N, Hancher M, Dixon M, Ilyushchenko S, Thau D, Moore R (2017) Google Earth Engine: Planetary-Scale Geospatial Analysis for Everyone. Remote Sens Environ 202:18–27. 10.1016/j.rse.2017.06.031
Gu Q, Sun W, Li X, Jiang S, Tian J (2023) A New Ensemble Classification Approach Based on Rotation Forest and LightGBM. Neural Comput Appl 35(15):11287–11308
Gualtieri JA, Cromp RF (1999) Support Vector Machines for Hyperspectral Remote Sensing Classification. 27th AIPR workshop: Advances in computer-assisted recognition, vol 3584. SPIE, pp 221–232
Jenness J, Wynne JJ (2005) Cohen’s Kappa and Classification Table Metrics 2.0: An ArcView 3x Extension for Accuracy Assessment of Spatially Explicit Models
Jiang W, Fu B, Lü Y (2020) Assessing Impacts of Land Use/Land Cover Conversion on Changes in Ecosystem Services Value on the Loess Plateau, China. Sustainability 12(17):7128
Jin Z, Shang J, Zhu Q, Ling C, Xie W, Qiang B (2020) RFRSF: Employee Turnover Prediction Based on Random Forests and Survival Analysis. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12343 LNCS:503–15. 10.1007/978-3-030-62008-0_35
Kulkarni AD, Lowe B (2016) Random Forest Algorithm for Land Cover Classification. Int J Recent Innov Trends Comput Communication 4(3):58–63
Kumar L, Mutanga O (2017) Remote Sensing of Above-Ground Biomass. Remote Sens 9(9):1–8. 10.3390/rs9090935
Kumari S, Kumar D, Kumar M, Pande CB (2023) Modeling of Standardized Groundwater Index of Bihar Using Machine Learning Techniques. Phys Chem Earth Parts A/B/C 130:103395
Li C, Gong P, Wang J, Zhu Z, Biging GS, Yuan C, Hu T, Zhang H, Wang Q, Li X, Liu X, Xu Y, Guo J, Liu C, Hackman KO, Zhang M, Cheng Y, Yu L, Yang J, Huang H, Clinton N (2017) The First All-Season Sample Set for Mapping Global Land Cover with Landsat-8 Data. Sci Bull 62(7):508–515. 10.1016/j.scib.2017.03.011
Li L, Zhu A, Huang L, Wang Q, Chen Y, Ooi MCG, Wang M, Wang Y, Chan A (2022) Modeling the Impacts of Land Use/Land Cover Change on Meteorology and Air Quality during 2000–2018 in the Yangtze River Delta Region, China. Sci Total Environ 829:154669
Lin X, Xu M, Cao C, Singh RP, Chen W, Ju H (2018) Land-Use/Land-Cover Changes and Their Influence on the Ecosystem in Chengdu City, China during the Period of 1992–2018. Sustainability 10(10):3580
Lukas P, Melesse AM, Kenea TT (2023) Prediction of Future Land Use/Land Cover Changes Using a Coupled CA-ANN Model in the Upper Omo–Gibe River Basin, Ethiopia. Remote Sens 15(4):1148
Maina J, Wandiga S, Gyampoh B, Charles KKG (2020) Assessment of Land Use and Land Cover Change Using GIS and Remote Sensing: A Case Study of Kieni, Central Kenya. J Remote Sens GIS 9(1):1–5
Mao W, Lu D, Hou L, Liu X, Yue W (2020) Comparison of Machine-Learning Methods for Urban Land-Use Mapping in Hangzhou City, China. Remote Sens 12(17):2817
Mitchell PJ, Downie A-L, Diesing M (2018) How Good Is My Map? A Tool for Semi-Automated Thematic Mapping and Spatially Explicit Confidence Assessment. Environ Model Softw 108:111–122
MohanRajan SN, Loganathan A, Manoharan P (2020) Survey on Land Use/Land Cover (LU/LC) Change Analysis in Remote Sensing and GIS Environment: Techniques and Challenges. Environ Sci Pollut Res 27(24):29900–29926
Morales-Barquero L, Lyons MB, Phinn SR, Roelfsema CM (2019) Trends in Remote Sensing Accuracy Assessment Approaches in the Context of Natural Resources. Remote Sens 11(19):2305
Nguyen T (2015) Optimal Ground Control Points for Geometric Correction Using Genetic Algorithm with Global Accuracy. Eur J Remote Sens 48(1):101–120
Pal M (2005) Random Forest Classifier for Remote Sensing Classification. Int J Remote Sens 26(1):217–222
Pande CB (2022) Land Use/Land Cover and Change Detection Mapping in Rahuri Watershed Area (MS), India Using the Google Earth Engine and Machine Learning Approach. Geocarto Int 37(26):13860–13880
Pelletier C, Valero S, Inglada J, Champion N, Dedieu G (2016) Assessing the Robustness of Random Forests to Map Land Cover with High Resolution Satellite Image Time Series over Large Areas. Remote Sens Environ 187:156–168
Peters J, Verhoest NEC, Samson R, Van Meirvenne M, Cockx L, De Baets B (2009) Uncertainty Propagation in Vegetation Distribution Models Based on Ensemble Classifiers. Ecol Model 220(6):791–804. 10.1016/j.ecolmodel.2008.12.022
Raczko E, Zagajewski B (2017) Comparison of Support Vector Machine, Random Forest and Neural Network Classifiers for Tree Species Classification on Airborne Hyperspectral APEX Images. Eur J Remote Sens 50(1):144–154. 10.1080/22797254.2017.1299557
Rogan J, Franklin J, Stow D, Miller J, Woodcock C, Roberts D (2008) Mapping Land-Cover Modifications over Large Areas: A Comparison of Machine Learning Algorithms. Remote Sens Environ 112(5):2272–2283
Rounsevell MDA, Reginster I, Araújo MB, Carter TR, Dendoncker N, Ewert F, House JI, Kankaanpää S, Leemans R, Metzger MJ (2006) A Coherent Set of Future Land Use Change Scenarios for Europe. Agric Ecosyst Environ 114(1):57–68
Roy DP, Wulder MA, Loveland TR, Woodcock CE, Allen RG, Anderson MC, Helder D, Irons JR, Johnson DM, Kennedy R, Scambos TA, Schaaf CB, Schott JR, Sheng Y, Vermote EF, Belward AS, Bindschadler R, Cohen WB, Gao F, Hipple JD, Hostert P, Huntington J, Justice CO, Kilic A, Kovalskyy V, Lee ZP, Lymburner L, Masek JG, McCorkel J, Shuai Y, Trezza R, Vogelmann J, Wynne RH, Zhu Z (2014) Landsat-8: Science and Product Vision for Terrestrial Global Change Research. Remote Sens Environ 145:154–172. 10.1016/j.rse.2014.02.001
Sasmito SD, Taillardat P, Clendenning JN, Cameron C, Friess DA, Murdiyarso D, Hutley LB (2019) Effect of Land-use and Land‐cover Change on Mangrove Blue Carbon: A Systematic Review. Glob Change Biol 25(12):4291–4302
Sohl TL, Claggett PR (2013) Clarity versus Complexity: Land-Use Modeling as a Practical Tool for Decision-Makers. J Environ Manage 129:235–243
Takala W, Adugna T, Tamam D (2016) Land Use Land Cover Change Analysis Using Multi Temporal Landsat Data in Gilgel Gibe, Omo Gibe Basin, Ethiopia. Int J Sci Technol 5(7):309–323
Talukdar S, Singha P, Mahato S, Pal S, Liou Y-A, Rahman A (2020) Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations—A Review. Remote Sens 12(7):1135
Tamiminia H, Salehi B, Mahdianpari M, Quackenbush L, Adeli S, Brisco B (2020) Google Earth Engine for Geo-Big Data Applications: A Meta-Analysis and Systematic Review. ISPRS J Photogrammetry Remote Sens 164:152–170. 10.1016/j.isprsjprs.2020.04.001
Tesfaye H (2014) Modeling-Impact of Land Use/Cover Change on Reservoir (Case Study on Omo-Gibe Basin, Gilgel Gibe III Watershed, Ethiopia). Master Thesis, National Academic Digital Repository of Ethiopia, Addis Ababa, Ethiopia
Vivekananda GN, Swathi R, Sujith AVLN (2021) Multi-Temporal Image Analysis for LULC Classification and Change Detection. Eur J Remote Sens 54(sup2):189–199
Wang J, Shrestha NK, Delavar MA, Meshesha TW, Bhanja SN (2021) Modelling Watershed and River Basin Processes in Cold Climate Regions: A Review. Water 13(4):518
Woldemariam GW, Tibebe D, Mengesha TE, Gelete TB (2022) Machine-Learning Algorithms for Land Use Dynamics in Lake Haramaya Watershed, Ethiopia. Model Earth Syst Environ 8(3):3719–3736
Woodroofe R (1996) Omo-Gibe River Basin Integrated Development Master Plan Study Final Report Vol. VI Water Resources Surveys and Inventories, Ministry of Water Resources. AA
Wulder MA, White JC, Loveland TR, Woodcock CE, Belward AS, Cohen WB, Fosnight EA, Shaw J, Masek JG, Roy DP (2016) The Global Landsat Archive: Status, Consolidation, and Direction. Remote Sens Environ 185:271–283. 10.1016/j.rse.2015.11.032
Yulianto F, Nugroho G, Chulafak GA, Suwarsono S (2021) Improvement in the Accuracy of the Postclassification of Land Use and Land Cover Using Landsat 8 Data Based on the Majority of Segment-Based Filtering Approach. The Scientific World Journal 2021
Zhang H, Yin Y, An H, Lei J, Li M, Song J, Han W (2022) Surface Urban Heat Island and Its Relationship with Land Cover Change in Five Urban Agglomerations in China Based on GEE. Environ Sci Pollut Res 29(54):82271–82285
Zhang Y, Sun L (2019) Spatial-Temporal Impacts of Urban Land Use Land Cover on Land Surface Temperature: Case Studies of Two Canadian Urban Areas. Int J Appl Earth Obs Geoinf 75:171–181

No competing interests reported.
