2.1 Study area
Geographically, the West Nayar Basin (WNB) extends from 29° 54′ 42.351″ N to 30° 13′ 14.243″ N and from 29° 56′ 6.259″ E to 30° 9′ 18.617″ E, covering a total area of approximately 746.83 km². Altitudinally, the catchment ranges from 558 m to 2972 m (see Fig. 1). The basin's rugged terrain, severe climatic conditions, geological instability, and the turbulent flow of its streams and tributaries make it highly susceptible to natural hazards such as landslides and flash floods. These hazards, triggered by cloudbursts and heavy rainfall, often cause significant loss of human life and damage to infrastructure. Situated in the north and northeast of the Pauri Garhwal District of Uttarakhand, India, the Himalayan landscape of the WNB is administratively divided into the blocks of Thalisain, Pabau, Ekeshwar, and Kaljikhal. The basin receives rainfall from the tropical monsoon in summer and from western disturbances in winter.
2.2 Inventory of landslides and background points
In this study, we employed the widely used Maximum Entropy (MaxEnt) algorithm to combine the input data for predicting landslide susceptibility in the WNB. The methodological flowchart of the step-wise modeling process is shown in Fig. 2. Landslide occurrences were recorded as point data across the entire basin using a Garmin Oregon 650 GPS device. Because landslides are easy to identify in freely available software such as Google Earth (Slingsby and Slingsby, 2019), we used a very high-resolution Google Earth Pro image from 2020 to locate previously observed landslides within the WNB. Every landslide identified in the investigation area was marked, yielding a total of 121 potential landslide sites for ground verification.
Among them, a randomly selected subset comprising 30% of the landslide sites was verified through ground-truthing (see Fig. 3) between December 2018 and August 2022 as an accuracy assessment. After spatial autocorrelation was removed with SDMtoolbox (Brown, 2014), a total of 121 locations remained. Of these locations, 70% were randomly assigned as training points for developing the landslide susceptibility models, while the remaining 30% were reserved as a test dataset to assess the performance of each model. Comprehensive field surveys were conducted to collect ground data and verify sites, supporting the accuracy and reliability of the models. All processing and calibration for this assessment was carried out in ArcGIS Pro 3.0.1 (ESRI 2022). Likewise, 121 presence records from across the entire WNB were used for training in the subsequent modeling. In addition, a total of 10,105 points, randomly permuted by the algorithm and comprising both background and presence points, were used for MaxEnt distribution processing.
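The random 70/30 partition described above can be illustrated with a minimal sketch. The function name `split_presence_points`, the fixed seed, and the integer site IDs are assumptions made for illustration; the study itself performed this step in its GIS workflow, and the exact rounding rule used there is not stated.

```python
import random


def split_presence_points(points, train_fraction=0.7, seed=42):
    """Randomly partition presence points into training and test subsets.

    The seed is a hypothetical choice that only makes the sketch repeatable.
    """
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    # Rounding to the nearest integer is an assumption of this sketch.
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical stand-in IDs for the 121 landslide locations.
landslide_ids = list(range(121))
train_ids, test_ids = split_presence_points(landslide_ids)
print(len(train_ids), len(test_ids))
```

Every site ends up in exactly one subset, so the test points remain independent of the points used to fit the model.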
2.3 Landslide Predictor Variables
Eight predictor variables were taken into account for the landslide susceptibility assessment, viz., lithology, geomorphology, elevation, precipitation, river, drainage density, slope angle, and aspect (Fig. 4).
2.4 Statistical Analysis of Correlation
The Pearson correlation coefficient (PCC or r) was used to assess multicollinearity, i.e. pairwise correlation, among all predictor variables in ArcGIS Pro 3.0.1 (Benesty et al., 2009). PCC values range from −1 to 1: a value of 1 indicates a perfect positive linear relationship between two variables, a value of −1 indicates a perfect negative linear relationship, and a value of 0 indicates no linear relationship (Guo et al., 2014). The PCC can be calculated using Eq. 1.
\( r_{xy} = \frac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}\,\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}} \) (Eq. 1)
where \(r_{xy}\) is the Pearson correlation coefficient (PCC) between the variables X and Y, \(x_i\) and \(y_i\) are the ith sample values of X and Y, \(\bar{x}\) and \(\bar{y}\) are the sample means of X and Y, and n is the number of samples.
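As an illustration of Eq. 1, the following minimal sketch computes the coefficient for two short lists of values. The function name `pearson_r` and the sample values are hypothetical and are not taken from the study's data; the study itself computed the coefficients in ArcGIS Pro.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples (Eq. 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of summed squared deviations.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical predictor values sampled at the same five locations.
slope_deg = [5.0, 12.0, 20.0, 31.0, 44.0]
elevation_m = [600.0, 900.0, 1400.0, 2100.0, 2900.0]
print(round(pearson_r(slope_deg, elevation_m), 3))
```

Pairs of predictors with a large absolute r carry largely redundant information, which is why such a check precedes model building.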
2.4 Model Building: Maximum Entropy Model (MaxEnt)
The Maximum Entropy (MaxEnt) model is rooted in the entropy maximization principle of statistical physics (Banavar et al., 2010), originating from an information-theoretic approach (Ruddell et al., 2013) that is applicable to predicting spatiotemporal patterns of landslide occurrence. Initially employed in ecosystem studies to forecast species distribution using species occurrence data (Grimm et al., 2005; Phillips and Dudík, 2008), MaxEnt has gained popularity in various fields (Elith et al., 2011). However, its application in landslide susceptibility modeling has not been thoroughly explored (Felicísimo et al., 2013).
MaxEnt identifies potential sets of predictors from which a pattern might emerge. It then uses these predictor variables together with the landslide occurrence data to identify the most 'susceptible' conditions. In this case, eight categorical variables were selected as the most favorable predictors of landslide-susceptible conditions. The MaxEnt model, based on information theory and statistics, was run with default parameters to estimate landslide probability values ranging from 0 to 1. It assigns importance weights to locations in the sample area, thereby expressing the landslide distribution as a probability distribution (Felicísimo et al., 2013). By considering both the presence and absence of landslide sites, MaxEnt evaluates landslide probability distributions. Then, following the maximum entropy rule, it extends the distribution function to the most likely sites (Felicísimo et al., 2013). The primary aim of MaxEnt modeling is to find the Gibbs distribution that maximizes the regularized log-likelihood given in Eq. 2 (Phillips and Dudík, 2008; Park, 2015).
\( \frac{1}{m}\sum_{i=1}^{m}\ln\left[q_{\lambda}\left(x_i\right)\right]-\sum_{j=1}^{n}\beta_j\left|\lambda_j\right| \) (Eq. 2)
Here, \(q_{\lambda}\) is the Gibbs distribution, \(x_i\) is the ith of the m landslide presence sites, \(\lambda_j\) is the weight of the jth of the n features, and \(\beta_j\) is the regularization parameter for the jth feature. The first term of the equation is the average log-likelihood of the presence sites, which rewards a close fit to the data. The second term is the regularization penalty, which limits the feature weights and so guards against overfitting. The Gibbs distribution that maximizes this expression is the one MaxEnt selects as the best fit for the data.
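To make the two terms of Eq. 2 concrete, the following minimal sketch evaluates the objective for a toy set of sites. It assumes the Gibbs distribution is normalized over a finite set of sites, \(q_{\lambda}(x) = \exp(\lambda \cdot f(x)) / Z\); the function names, feature values, weights, and regularization values are all hypothetical. It is not the MaxEnt software's implementation, which also searches for the weights that maximize this quantity.

```python
import math


def gibbs_distribution(features, weights):
    """Gibbs distribution over all sites: q(x) proportional to exp(weights . f(x))."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    max_score = max(scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - max_score) for s in scores]
    z = sum(exp_scores)  # normalization constant Z
    return [e / z for e in exp_scores]


def regularized_log_likelihood(features, presence_idx, weights, betas):
    """Objective of Eq. 2: mean log-probability at presence sites minus L1 penalty."""
    q = gibbs_distribution(features, weights)
    fit = sum(math.log(q[i]) for i in presence_idx) / len(presence_idx)
    penalty = sum(b * abs(w) for b, w in zip(betas, weights))
    return fit - penalty


# Hypothetical toy data: four sites, two features each (values scaled to 0-1).
site_features = [[0.9, 0.2], [0.7, 0.4], [0.2, 0.8], [0.1, 0.9]]
presence_sites = [0, 1]   # indices of sites with observed landslides
betas = [0.05, 0.05]      # hypothetical regularization parameters

print(regularized_log_likelihood(site_features, presence_sites, [0.0, 0.0], betas))
print(regularized_log_likelihood(site_features, presence_sites, [2.0, -1.0], betas))
```

With all weights at zero the distribution is uniform, so the objective equals the negative logarithm of the number of sites. Weights that concentrate probability on the presence sites raise the first term but are charged by the second.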
2.5 Model performance
In this analysis, the Receiver Operating Characteristic (ROC) curve is used to assess the accuracy of the statistical models (Chen et al., 2017; Pandey et al., 2020). This approach provides a diagnostic evaluation of how well a classifier discriminates between two classes of events (Swets, 1988). The curve illustrates the trade-off between the true-positive rate, the probability of correctly predicting an event, and the false-positive rate, the probability of predicting an event where none occurred. The ROC curve helps evaluate how well the model predicts future landslide occurrence, providing insight into its quality and predictive capability (Brenning, 2005). Model accuracy is summarized by the area under the ROC curve (AUC), which ranges from 0 to 1. AUC values near 1 indicate excellent model performance, while values around 0.5 indicate performance no better than random (Boussouf et al., 2023; Pramanik et al., 2021; Pradhan, 2013).
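The AUC can be illustrated with a minimal sketch that uses its rank-based interpretation: the probability that a randomly chosen landslide site receives a higher susceptibility score than a randomly chosen background site, with ties counted as one half. The function name `auc_score` and the scores are hypothetical; this pairwise form is equivalent to the area under the ROC curve but is not necessarily how the MaxEnt software computes it.

```python
def auc_score(presence_scores, background_scores):
    """AUC as the fraction of (presence, background) pairs ranked correctly."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0   # presence site ranked above background site
            elif p == b:
                wins += 0.5   # ties count as half
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical susceptibility scores in the 0-1 range.
test_presence = [0.9, 0.7, 0.4]
background = [0.8, 0.3, 0.2, 0.1]
print(auc_score(test_presence, background))
```

Of the 12 possible pairs in this toy example, 10 are ranked correctly, which gives an AUC of about 0.83.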
2.6 Ensemble Model
The MaxEnt model achieved an AUC value of 0.928, indicating the high reliability of the susceptibility maps. However, while the susceptibility map provides valuable information with values ranging from 0 to 1, it is often more practical for decision-makers to view the landscape in distinct categories, such as low, moderate, high, and very high susceptibility to landslides. Output of this kind, derived from a continuous susceptibility index, conveys more information than a simple presence/absence map. It is also more convenient for landslide management because it allows more nuanced risk assessment and prioritization of mitigation efforts. Stakeholders can better understand how risk varies across different areas, which supports more informed decision-making and resource allocation for landslide prevention and response (Wang et al., 2024). Real curves often exhibit steps or exponential patterns, underscoring the need for a method that objectively determines the thresholds used to reclassify susceptibility intensity into classes. Such thresholds improve the landslide susceptibility map by providing clear distinctions between levels of risk.
In the current assessment, the spatial layer of model output indicating the varying intensity of landslide susceptibility was reclassified into four classes. Predicted-to-expected ratios (Fi) for each category were then determined using Eq. 3.
\( F_i = \frac{P_i}{E_i} \) (Eq. 3)
where \(P_i\) is the predicted frequency of assessment points in class i, and \(E_i\) is the expected frequency, i.e. the proportion of the study area covered by class i. The \(F_i\) values were plotted against the class intervals.
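The ratio in Eq. 3 can be illustrated with a minimal sketch in which \(P_i\) is taken as the share of assessment points falling in class i and \(E_i\) as the share of the total area occupied by class i. The function name `predicted_to_expected`, the point counts, and the class areas are hypothetical and are not results from this study.

```python
def predicted_to_expected(point_counts, class_areas):
    """Predicted-to-expected ratio F_i = P_i / E_i for each susceptibility class."""
    if len(point_counts) != len(class_areas):
        raise ValueError("one point count and one area are needed per class")
    total_points = sum(point_counts)
    total_area = sum(class_areas)
    ratios = []
    for count, area in zip(point_counts, class_areas):
        p_i = count / total_points  # share of assessment points in the class
        e_i = area / total_area     # share of the study area in the class
        ratios.append(p_i / e_i)
    return ratios


# Hypothetical values for four classes: low, moderate, high, very high.
counts = [2, 4, 10, 20]               # assessment points per class
areas_km2 = [40.0, 30.0, 20.0, 10.0]  # area per class
for name, f_i in zip(["low", "moderate", "high", "very high"],
                     predicted_to_expected(counts, areas_km2)):
    print(name, round(f_i, 2))
```

A ratio above 1 means a class holds more landslide points than its area alone would suggest, so ratios that rise steadily from the low to the very high class indicate a well-ordered classification.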
The output susceptibility layer was reclassified into four zones indicating very high, high, moderate, and low susceptibility to landslide occurrence. The spatial prediction of susceptibility was validated using ground-truthing data, showing that out of 121 locations, 80 (96.8%) fell within the risk zone. The ensemble-derived landslide susceptibility map revealed that the areas most prone to landslides were concentrated along the lower valley zone at low to middle elevations, extending towards the southwestern region and along riverbanks. Higher-elevation areas showed minimal landslide occurrence.