3.1 Supervised Classification of the major land-uses in the Ruiru catchment
A Landsat 8 image Operational Land Imager— OLI—for August was downloaded from the Earth explorer (EarthExplorer—USGS 2019) using the semi-automatic classification plugin in QGIS 3.4 (QGIS.org 2019). All the bands were then clipped to the study area size and converted to reflectance (Young et al. 2017). A band set with band5, band4 and band3 (Bands 5-4-3), which represents a temporary virtual raster that allows for the display of composite colours, was created (NASA 2013). Next, the training sites were defined by creating regions of interest—ROI. In addition to georeferenced points obtained during the field campaign, Google Earth was used to increase the number of ROIs. The main defined land-use classes were water—WTR, Bare land—BARE, Cropland—AG, Grassland and Shrubland—GR/SH, Built-up—BLD and Forests—FOR. The cropland land-use was further split into the mixed cropland—AG and Tea—TEA. The classification was carried out by using the maximum likelihood algorithm—MLA (Benediktsson et al. 1990). The MLA is a rule-based algorithm that is based on the probability that a given pixel belongs to a particular class. This was done iteratively and with the Band 5-4-3 (see description in Table 1) band combination.
Table 1 Landsat 8 band description and combination
Band Number
|
Description
|
Band 1
|
Coastal
|
Band 2–4
|
Visible blue, green, red
|
Band 5–7
|
Infrared (Near, shortwave (1.56–1.66 µm), shortwave (2.10–2.30 µm)
|
Band 8
|
Panchromatic
|
Band 9
|
Cirrus
|
Band 10–11
|
Infrared (Longwave (10–11.3 µm), longwave (11.5–12.5 µm)
|
Source: https://landsat.gsfc.nasa.gov/landsat-8/landsat-8-bands/
Each time the classification was done, the spectral signatures (reflectance as a function of the shortwave and different objects have unique signatures which can be used for classification) (NASA 1999) were assessed. However, during classification similar spectral signatures may be recorded for different materials which could lead to misclassification (Hadi et al. 2018). To overcome this, more training sites were delineated to allow MLA to discriminate between the various vegetation cover and between bare land and built-up areas. The post-processing step included the removal of raster polygons smaller than the minimum mapping unit—MMU, which was set at 25. As the GR/SH vegetation was limited, a new class "Natural Vegetation—NV" was created by aggregating the grasslands, shrublands and forests. A final land-use map with an accuracy of 71% obtained from the confusion matrix, was created.
3.2 Calculating the individual RUSLE factors
The RUSLE (Renard et al. 2017) was determined by multiplying all the erosion risk factors represented below:
This equation was modified to include the temporally dynamic soil erosion factors based on (Schmidt et al. 2019) (Eq. 1)
The data sources and derivation for the RUSLE parameters are given in Table 2.
Table 2 Overview of the individual erosion risk factors, datasets and formula used
R-factor
The required rainfall data for the determination of for the years between 2011 and 2017 was obtained from the Upland Station located in the western part of the Ruiru Reservoir catchment. This is the only existing station in the catchment. These data were complemented with the gridded daily rainfall of the Climate Hazard Group Infrared Precipitation—CHIRPS (Climate Hazards Center - UC Santa Barbara 2020). The Modified Fournier Index—MFI— (Anoldus 1980) and consequently the factor were determined using Eq.3 (Table 2). For each month, was assumed to be spatially static. It was deemed adequate as the area of the river catchment is 51 km2 and does not experience a large monthly rainfall variation.
K-factor
Soil texture, permeability and organic matter of the topsoil obtained from the laboratory analysis were employed to determine the K-factor for the 90 sampled points using the Schwertmann (1987) method (Eq. 4). The cubist method (Quinlan 1992) with the regression kriging interpolation technique with selected covariates—Cubist-kriging— (Malone et al. 2017) presented in Table 3 was applied to create a continuous soil erodibility map.
Table 3 Covariates used in cubist-kriging
The cubist model divides data into partitions based on rules associated with covariates and fits a regression equation to each subset. Predictions are then determined based on the relative importance of the covariates. Moreover, three parameters must be established: rules-maximum number of partitions allowed, committees-maximum number of boosting iterations and extrapolations-model constraints (Malone et al. 2017). On the other hand, regression kriging is a spatial interpolation technique that uses a semivariogram to quantify the spatial structure of residuals (difference between the predicted and observed values) (Ma et al. 2017). In this study, soil erodibility data from the 90 sampling sites were split into two segments representing the training and testing data (∼70% and ∼30% respectively). First, the Cubist model was applied where the data was partitioned based on the most relevant covariates present. This represented the deterministic component of the predictions. Regression kriging, representing the stochastic component, was then undertaken. Finally, these two components were added together to arrive at the final prediction. A leave-one-out cross-validation scheme was applied to assess the accuracy of the predictions, which were represented using the coefficient of determination-R2. A detailed description can be found in Malone et al. (2017).
C-factor
The monthly C-factors were adjusted based on the Normalized Difference Vegetation Index (NDVI) (Jong 1994), which was calculated from Landsat 8 images for the year 2017. The NDVI values range between –1, for almost bare surfaces and water bodies, to 1 for densely vegetated surfaces. The NDVIs were afterwards converted to the C-factor using Eq. 4 (Almagro et al. 2019), by applying the raster calculator tool in ArcGIS (ArcMap version 10.2).
In the months where cloud cover from the Landsat 8+ was greater than 10% (April, May, November), the C-factor values were obtained from spline interpolation. The spline interpolation method is a minimum curvature function that passes through the data with the accuracy of their mean errors. In this method, all the data points influence the value of the interpolated point, with those closest to the main station having the greatest impact on the value of the interpolated point (Niedzielski 2015). Using the polynomials, the first and second derivatives can easily be derived, making them applicable in biological modelling such as developing plant growth curves (Quero et al. 2015).
LS-factor and P- factor
Using the DEM as an input, the LS-factor was determined by using Eq.7 in Table 2 in ArcGIS (ArcMap version 10.2). A few sporadic conservation practices were observed in the Ruiru reservoir catchment. This included 'fanya juu' terraces on steep agricultural lands and grass strips along the riparian region. To account for the 'fanya juu' terraces a threshold of 25% slope was set first for agricultural lands. This means that this conservation measure would be implemented in areas with a slope above 25%. As not all farmers have adopted this conservation measure, 10% of all the possible selections were randomly selected. Using the dply package in R statistical software version 3.5.1 (R core Team 2019), all pixels were loaded and the first filter according to the slope was implemented. The 10% of remaining pixels were selected using random sampling and a value of 0.7 was assigned according to Angima et al. (2003). Furthermore, the P-factor value for all pixels within 30 m of the Ruiru River was adjusted to 0.9 according to Mwangi et al. (2015) in ArcGIS (ArcMap version 10.8) to account for grass strips. These have been developed along the Ruiru River as part of an integrated water resource protection initiative by Water Rural User Association—WRUA (Kamamia et al. 2021). For all other LULC, this value was set to 1.
Plausibility check with GloSEM data
As a plausibility check, the monthly soil loss values—REG_SL— were compared with the global average soil loss estimates (GloSEM) obtained from the European Soil Data Centre- ESDAC (Borrelli et al. 2017). As presented in the flowchart (Fig. 2), the individual RUSLE factors were overlaid and multiplied with each other using the raster calculator tool in ArcGIS (ArcMap version 10.2). The K-factor, LS-factor and P-factors remained static while the R-factor and C-factor were substituted for each month. Monthly soil loss was then compared for the different LULC by using the land-use map created from Landsat 8+ data (Section 3.1).