2.1 Supervised learning (SL) algorithm and validation sets
Datasets processed by the IR strategies were used to train the SL models (Fig. 2).
A decision tree (DT) 28 is a tree-structured classifier for classification problems. A DT learns decision rules inferred from the data features to predict the value of a target variable. Specifically, an internal node represents a test on a feature, a branch represents the outcome of the test, and a leaf node represents a class label. In this study, the CART algorithm is implemented to construct a binary DT by calculating the Gini index as follows:
$$Gini \left(D\right)=1-\sum _{i=1}^{m}{p}_{i}^{2}$$
3
where pi denotes the probability that a training sample in D belongs to class Ci and is calculated as |Ci,D|/|D|; m is the total number of classes.
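For illustration, a minimal Python sketch of Eq. (3) follows; the function name `gini` and the example class counts are hypothetical and not taken from the study data, and scikit-learn's DecisionTreeClassifier is used here as an optimized CART implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def gini(labels):
    """Gini index of a node: 1 - sum_i p_i^2 (Eq. 3)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Hypothetical node with 8 majority-class (0) and 2 minority-class (1) samples
print(gini([0] * 8 + [1] * 2))  # 0.32

# CART-style binary tree split on the Gini criterion
dt = DecisionTreeClassifier(criterion="gini", random_state=0)
```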
RF 29 is a meta-estimator that ensembles a number of DT classifiers. RF decreases the variance and the tendency to overfit by introducing two sources of randomness, building each tree on a bootstrap sample of the training set and considering a random subset of features at each split, and by averaging the predictions of all decision tree classifiers.
ET 30 is an ensemble algorithm similar to RF: it keeps the strategies of RF but further increases the randomness by drawing random thresholds for the candidate features when splitting. ET normally decreases the variance but increases the bias compared with RF.
ADB 31 ensembles a sequence of weak classifiers (DTs in this study) to improve model performance by iteratively modifying the dataset. At each boosting iteration, ADB reweights every sample of the training set: the weights of samples that were incorrectly predicted by the model are increased, whereas the weights of correctly predicted samples are decreased.
Gradient tree boosting (GB) 32 is a boosting algorithm that generalizes the ensembling of a sequence of weak classifiers (DTs in this study) to an arbitrary differentiable loss function, such as:
$${L}_{MSE}=\frac{1}{n}\sum _{i=1}^{n}{({y}_{i}-F({x}_{i}\left)\right)}^{2}$$
4
where F(xi) is the prediction of classifier F for sample xi, and the training goal is to minimize the mean squared error (MSE).
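The four tree-based ensembles above are available in scikit-learn; a minimal sketch (the hyperparameter values are illustrative, not those used in the study, and scikit-learn's GB classifier optimizes the log-loss rather than the regression MSE of Eq. (4)) is:

```python
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)

models = {
    # Bagging-style ensembles: bootstrap samples + random feature subsets
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    # ET adds random split thresholds on top of RF's randomness
    "ET": ExtraTreesClassifier(n_estimators=100, random_state=0),
    # Boosting: ADB reweights samples, GB fits the gradient of a loss
    "ADB": AdaBoostClassifier(n_estimators=100, random_state=0),
    "GB": GradientBoostingClassifier(n_estimators=100, random_state=0),
}
# for name, clf in models.items():
#     clf.fit(X_train, y_train)
```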
Assume that Xi is an n-dimensional attribute vector and that yi takes values in the set {0, 1}. Logistic regression (LR) predicts the probability of the positive class P(yi = 1 | Xi) as:
$$P\left({X}_{i}\right)=\frac{1}{1+\text{e}\text{x}\text{p}(-{X}_{i}\omega -{\omega }_{0})}$$
5
where ω and ω0 are parameters that can be estimated by minimizing the following cost function with regularization term r(ω):
$$\underset{{\omega }}{\text{min}}C\sum _{i=1}^{n}\left(-{y}_{i}\text{log}\left(P\left({X}_{i}\right)\right)-(1-{y}_{i})\text{log}(1-P({X}_{i}\left)\right)\right)+r\left({\omega }\right)$$
6
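A minimal scikit-learn sketch of this formulation follows; the hyperparameter values are illustrative, the `C` argument corresponds to C in Eq. (6), and `penalty` selects the regularization term r(ω).

```python
from sklearn.linear_model import LogisticRegression

# C is the inverse regularization strength of Eq. (6); penalty chooses r(w)
lr = LogisticRegression(C=1.0, penalty="l2", max_iter=1000)
# lr.fit(X_train, y_train); lr.predict_proba(X_valid)[:, 1] gives P(y=1|X)
```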
SVM 33 applies a hyperplane ωTx+b=0 to separate samples into two classes for a binary problem:
$$\left\{\begin{array}{c}{\omega }^{T}{x}_{i}+b\ge +1,\ {y}_{i}=1\\ {\omega }^{T}{x}_{i}+b\le -1,\ {y}_{i}=0\end{array}\right.$$
7
where xi is the attribute vector of the samples and yi is the label of each one.
The support vectors are defined as the samples that lie closest to the hyperplane. The margin is defined as the distance between the support vectors of the two classes:
$$\gamma =\frac{2}{\left|\right|\omega \left|\right|}$$
8
The optimal hyperplane is the one that maximizes \(\gamma\).
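A minimal scikit-learn sketch of a linear SVM (the kernel choice and C value are illustrative assumptions, not necessarily those used in the study):

```python
from sklearn.svm import SVC

# Linear SVM; C controls the trade-off between margin width and violations
svm = SVC(kernel="linear", C=1.0, probability=True)
# svm.fit(X_train, y_train)
```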
KNN 34 is an instance-based algorithm. For a multifeature dataset, each sample is treated as a point in a d-dimensional feature space Rd, X={x1,…,xN}, xn ∈ Rd. The difference between two samples x and y is calculated with the Euclidean distance:
$$d\left(x,y\right)=\sqrt{\sum _{i=1}^{d}{({x}_{i}-{y}_{i})}^{2}}$$
9
and the k samples closest to an instance are called its k nearest neighbours (KNN).
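A minimal scikit-learn sketch (the value k = 5 is illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier

# k = 5 neighbours, compared with the Euclidean distance of Eq. (9)
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
# knn.fit(X_train, y_train)
```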
ANN 35 implements the back-propagation algorithm, a gradient-descent-based algorithm with one input layer, one output layer and one or more hidden layers, each composed of one or more neuron nodes. The training procedure contains three steps: the weights are randomly initialized; each input unit feeds forward and broadcasts its signal to the neurons of the hidden layers; and the error is propagated backwards from the output layer, updating the weights between the neurons of the input and hidden layers. The sigmoid function is used in this study to determine the output state:
$${a}_{i}=\frac{1}{1+{e}^{-ne{t}_{i}}}$$
10
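A minimal scikit-learn sketch of such a network follows; the hidden-layer size is an assumption, and the "logistic" activation corresponds to the sigmoid of Eq. (10).

```python
from sklearn.neural_network import MLPClassifier

# One hidden layer trained with stochastic gradient descent (back-propagation);
# "logistic" is the sigmoid activation of Eq. (10)
ann = MLPClassifier(hidden_layer_sizes=(16,), activation="logistic",
                    solver="sgd", max_iter=500)
```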
The convolutional neural network (CNN) 36 architecture comprises convolutional layers, pooling layers and fully connected layers. Specifically, the convolutional layer generates the output of its neurons by computing the scalar product between the neuron weights and the local region of the input volume to which each neuron is connected; the pooling layer downsamples the input along its spatial dimensions; and the fully connected layer operates in the same way as in an ANN, computing the class scores from the activations.
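A minimal Keras sketch of such an architecture follows. This is an assumption for illustration only: the paper does not specify the framework or layer sizes, and the tabular feature vector is assumed to be reshaped to a (n_features, 1) sequence so that 1-D convolutions can be applied.

```python
import tensorflow as tf

n_features = 26  # assumed input length (number of selected features, Table 3)
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu",
                           input_shape=(n_features, 1)),   # convolutional layer
    tf.keras.layers.MaxPooling1D(pool_size=2),              # spatial downsampling
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),          # fully connected output
])
cnn.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=[tf.keras.metrics.AUC()])
```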
The gated recurrent unit (GRU) 37 addresses the vanishing gradient problem of a simple recurrent neural network (RNN) and, in the encoder-decoder formulation, consists of two RNNs. Assuming a hidden state h and an optional output y operating on a variable-length sequence x={x1,…,xT}, one RNN encodes the sequence of symbols into a fixed-length vector representation:
$${h}_{t}=f({h}_{t-1},{x}_{t})$$
11
where f is a nonlinear activation function.
The other decodes the representation into another sequence of symbols:
$${h}_{t}=f({h}_{t-1},{y}_{t-1},c)$$
12
where c is a summary vector of the whole input sequence produced by the first RNN after it reads the end of the sequence. The two RNNs are jointly trained to maximize:
$$\underset{\theta }{\text{max}}\frac{1}{N}\sum _{n=1}^{N}\text{log}\,{p}_{\theta }\left({y}_{n}|{x}_{n}\right)$$
13
where θ is the set of model parameters and each (xn, yn) is a pair of input and output sequences.
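The encoder-decoder form above targets sequence-to-sequence learning; for the binary classification task in this study, a GRU layer can be used directly. The following Keras sketch is an assumption for illustration only (framework, layer size and the reshaping of the feature vector into a length-n_features sequence are not specified in the paper):

```python
import tensorflow as tf

n_features = 26  # assumed input length, as in the CNN sketch above
gru = tf.keras.Sequential([
    tf.keras.layers.GRU(32, input_shape=(n_features, 1)),   # gated recurrent layer
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
gru.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=[tf.keras.metrics.AUC()])
```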
The validation sets of all models are subsets of the original data because the IR strategies are prone to introduce estimation error. A reasonable and straightforward approach is to randomly select 30% of the original data as a validation set that is excluded from the imbalanced-learning (IBL) training sets; this split is repeated five times, and the final result is the average over the 5 validation sets.
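A minimal sketch of this hold-out scheme follows, assuming a feature matrix X and labels y are already loaded; SMOTE and RF are used here only as placeholder IR strategy and classifier, and AUC as a placeholder metric.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE

aucs = []
for seed in range(5):                      # five random 70/30 splits
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    # Resampling is applied to the training portion only
    X_res, y_res = SMOTE(random_state=seed).fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(random_state=seed).fit(X_res, y_res)
    aucs.append(roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))
print(np.mean(aucs))                       # final result: average of 5 validation sets
```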
2.3 IR
2.3.1 Undersampling
The cluster centroid (CC) 43 method is an extension of k-means clustering. Assume that the dataset has been clustered into M disjoint subsets C1,…,CM, each with a centroid mk. The most widely used criterion for k-means clustering is the sum of squared Euclidean distances between the samples and their cluster centroids; this criterion is named the clustering error and depends on the cluster centres m1,…,mM:
$$E\left({m}_{1},\dots ,{m}_{M}\right)=\sum _{i=1}^{N}\sum _{k=1}^{M}I\left({x}_{i}\in {C}_{k}\right){‖{x}_{i}-{m}_{k}‖}^{2}$$
1
where I(X) = 1 if X is true and I(X) = 0 otherwise. The k-means algorithm finds locally optimal solutions to the clustering error. CC finds the centroids of the majority class by k-means and keeps only the centroid data, or uses the kNN rule to keep the majority samples that lie within the k nearest neighbours of the centroids 44.
Near Miss (NM) 20 (Fig. 3) applies three different strategies, all based on kNN rules, to balance the dataset.
NM strategy 1 (NM1) samples from the majority class using the kNN algorithm: it retains the majority samples with the smallest average distance to the minority group.
NM strategy 2 (NM2) samples from the majority class using the kNN algorithm: it first selects the minority samples that are farthest from the majority group as a subset and then retains the majority samples with the smallest average distance to this subset.
NM strategy 3 (NM3) preselects a group of minority samples as a subset and then retains the majority samples with the largest average distance to this group.
Tomek’s link 45 (Fig. 3) applies the 1NN rule to remove noise and borderline samples: a Tomek’s link exists if two samples of different classes are each other’s 1NN, and such samples are regarded as noise or borderline samples that can distort the decision boundary of the model.
One-Sided Selection (OSS) 41 runs the following procedure after Tomek’s link: add all minority samples to a set C, add one sample from the majority class to C, and place all other samples in a set S; traverse S sample by sample, classifying each sample with the 1-nearest-neighbour rule; if a sample is misclassified, add it to C; repeat the above steps until no sample is added 46.
ENN 42 is a technique to remove noise and borderline samples. ENN traverses every sample and uses kNN to decide whether it is noise, the criterion being the class proportion among its neighbours: a sample is not considered noise if at least 2/3 of its k nearest neighbours belong to the majority class; otherwise, it is removed as noise.
The neighbourhood cleaning rule (NCL) 46 is similar to OSS but applies ENN rather than Tomek’s link. A threshold parameter T is added to avoid excessive data cleaning: a class is cleaned only if Ci > C·T, where Ci is the number of samples of that class and C is the number of samples in the dataset. The subsequent process is the same as that of OSS 46.
The instance hardness threshold (IHT) 47 formulates the learning problem as maximizing a probability value through Bayes’ theorem. Instance hardness (IH) means that an instance is prone to be classified incorrectly under the hypothesis h. A representative set of learning algorithms and their associated parameters ι are weighted a priori with nonzero probability, and all other learning algorithms are assigned zero probability, to approximate the unknown distribution p(h|t), or equivalently p(g(t, α)):
$$I{H}_{\iota }\left(⟨{x}_{i},{y}_{i}⟩\right)=1-\frac{1}{\left|\iota \right|}\sum _{j=1}^{\left|\iota \right|}p(\left.{y}_{i}\right|{x}_{i},{g}_{j}(t,{\alpha }\left)\right)$$
2
Specifically, a pretrained classifier determines the probability value IHι of each majority class sample on the dataset, and those with low probability are considered indistinguishable samples that need to be removed.
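All of the undersampling strategies described above are available in the imbalanced-learn package; the following sketch lists them side by side (parameter values are illustrative, not those used in the study).

```python
from imblearn.under_sampling import (
    ClusterCentroids, NearMiss, TomekLinks, OneSidedSelection,
    EditedNearestNeighbours, NeighbourhoodCleaningRule,
    InstanceHardnessThreshold)

undersamplers = {
    "CC":  ClusterCentroids(random_state=0),
    "NM1": NearMiss(version=1),               # NM2/NM3 via version=2, version=3
    "TL":  TomekLinks(),
    "OSS": OneSidedSelection(random_state=0),
    "ENN": EditedNearestNeighbours(),
    "NCL": NeighbourhoodCleaningRule(),
    "IHT": InstanceHardnessThreshold(random_state=0),
}
# X_res, y_res = undersamplers["NM1"].fit_resample(X_train, y_train)
```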
2.3.2 Oversampling
SMOTE 26 is a technique that artificially synthesizes minority class samples by interpolating in the feature space of the minority class. Compared with random oversampling (ROS), the advantage of SMOTE is that it effectively makes the decision region of the minority class more general and smoother 48.
There are two steps in SMOTE (Fig. 3). First, SMOTE randomly selects, for each minority sample, one of its k nearest minority neighbours according to the required synthesis ratio; for example, 1 of the 5 nearest neighbours of each minority sample is randomly selected if 100% of the minority class needs to be synthesized. Second, a new sample is randomly generated on the line segment joining the two samples by taking the difference between their feature vectors, multiplying this difference by a random number between 0 and 1 and adding it to the original sample. SMOTE cannot be applied to categorical data.
The Synthetic Minority Oversampling Technique for Nominal and Continuous features (SMOTENC) 43 is designed only for datasets with mixed variables. In SMOTENC, continuous features are synthesized in the same way as in SMOTE, while the categorical value of a new sample is set to the most common category among the nearest-neighbour samples.
The Synthetic Minority Oversampling Technique for Nominal features (SMOTEN) does not apply the Euclidean distance rule but the value difference metric (VDM) 49.
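The three oversamplers are likewise available in imbalanced-learn; a minimal sketch (the categorical feature indices passed to SMOTENC are hypothetical):

```python
from imblearn.over_sampling import SMOTE, SMOTENC, SMOTEN

smote    = SMOTE(k_neighbors=5, random_state=0)                   # continuous features
smote_nc = SMOTENC(categorical_features=[3, 4], random_state=0)   # mixed features
smote_n  = SMOTEN(random_state=0)                                  # nominal features
# X_res, y_res = smote_nc.fit_resample(X_train, y_train)
```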
2.3.3 Hybrid sampling
Although SMOTE reduces overfitting compared with ROS, it can still generate noisy samples (Menardi and Torelli, 2014). Hybrid strategies that combine oversampling and undersampling have therefore been proposed.
SMOTE & Tomek’s link (ST) 50 implements SMOTE for the minority class and Tomek’s link for both classes.
SMOTE & ENN (SE) 51 implements SMOTE for the minority class and ENN for both classes.
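Both hybrid strategies are available in imbalanced-learn; a minimal sketch:

```python
from imblearn.combine import SMOTETomek, SMOTEENN

st = SMOTETomek(random_state=0)   # SMOTE oversampling + Tomek's link cleaning
se = SMOTEENN(random_state=0)     # SMOTE oversampling + ENN cleaning
# X_res, y_res = st.fit_resample(X_train, y_train)
```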
2.4 Performance measure
For a binary classification problem, the confusion matrix defines the basis for the performance metrics, and the ROC curve is a plot of the true positive rate (TPR) against the false positive rate (FPR). To compute the AUC, the model's predicted probability P of belonging to the positive class is obtained for each entity, these values are sorted, and each value is used in turn as a threshold ϴ above which an entity is assigned to the positive category; the area under the resulting ROC curve is calculated with the trapezoidal rule 52.
$$TPR=\frac{TP}{TP+FN}$$
3
$$FPR=\frac{FP}{FP+TN}$$
4
Recall and precision are derived from the confusion matrix (Buckland and Gey, 1994) and are defined as:
$$Recall=TPR=\frac{TP}{TP+FN}$$
5
$$Precision=\frac{TP}{TP+FP}$$
6
Normally, a model requires both recall and precision to be high. The F-measure 54 is an equation that balances recall and precision. When β = 1, the F1 score weights recall and precision equally; if β > 1, recall is considered more important than precision, and if β < 1, precision is considered more important.
$$F-measure=\frac{({\beta }^{2}+1)PR}{{\beta }^{2}*P+R}$$
7
The G-mean 52 aggregates the per-class accuracies by taking their geometric mean to offset the dominance of the majority class.
$$Gmean\left(Ts\right)=\sqrt[m]{\prod _{i=1}^{m}\frac{corr\left({Ts}_{i}\right)}{\left|{Ts}_{i}\right|}}$$
8
Cohen’s kappa 55 measures the agreement between predicted and true labels while accounting for the possibility that the agreement occurs by chance. P0 denotes the empirical probability of agreement, which equals the accuracy, and Pe is the expected probability of agreement when labels are assigned randomly.
$$Kappa=\frac{{P}_{0}-{P}_{e}}{1-{P}_{e}}$$
9
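A minimal sketch that collects these metrics with scikit-learn and imbalanced-learn (the helper name `evaluate` is hypothetical; y_true and y_pred are hard labels and y_prob the positive-class probabilities):

```python
from sklearn.metrics import (recall_score, precision_score, f1_score,
                             roc_auc_score, cohen_kappa_score)
from imblearn.metrics import geometric_mean_score

def evaluate(y_true, y_pred, y_prob):
    return {
        "Recall":    recall_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "F1":        f1_score(y_true, y_pred),           # beta = 1
        "AUC":       roc_auc_score(y_true, y_prob),      # trapezoidal rule
        "G-mean":    geometric_mean_score(y_true, y_pred),
        "Kappa":     cohen_kappa_score(y_true, y_pred),
    }
```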
2.5 Post hoc test
The Friedman test 56 is a widely applied approach for comparing the performance of multiple algorithms on multiple datasets on the basis of their performance metrics.
Let \({{R}_{i}}^{j}\) be the rank of the j-th algorithm on the i-th of the N datasets (for example, the model with the largest AUC ranks 1 and the second largest ranks 2). The Friedman test compares the algorithms through their ranking values Rj under the null hypothesis that all algorithms are equivalent, in which case their rankings should be equal. When N and k are large enough, the statistic \({{\widehat{{\chi }}}^{2}}_{F}\) follows a chi-square distribution with k − 1 degrees of freedom (a derived variant follows an F distribution with k − 1 and (k − 1)(N − 1) degrees of freedom) 57.
$${{\widehat{{\chi }}}^{2}}_{F}=\left[\frac{12}{nk(k+1)}\right.\left.\sum _{j}^{k}{{R}_{j}}^{2}\right]-3n(k+1)$$
10
Because it is nonparametric, the Friedman test does not require commensurability of measures across different datasets, does not assume normality of the sample means, and is robust to outliers.
A further assessment is necessary when the Friedman test rejects the null hypothesis, i.e., when the differences are significant. Nemenyi 58 proposed a method to calculate a threshold, the critical difference (CD), for the average ranking values (ARVs) of the algorithms: for the ranking averages Ri and Rj of two algorithms across multiple datasets, if their difference exceeds the CD, the hypothesis Ri = Rj is rejected:
$$\left|{\overline{R}}_{i}-{\overline{R}}_{j}\right|>{q}_{\alpha }\sqrt{\frac{k(k+1)}{6n}}=CD$$
11
where qα is determined by the number of classifiers and the confidence level (Table 1):
Table 1
Critical values for the two-tailed Nemenyi test 56
Classifiers | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
q0.05 | 1.960 | 2.343 | 2.569 | 2.728 | 2.850 | 2.949 | 3.031 | 3.102 | 3.164 |
q0.10 | 1.645 | 2.052 | 2.291 | 2.459 | 2.589 | 2.693 | 2.780 | 2.855 | 2.920 |
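A minimal sketch of the Friedman test and the Nemenyi critical difference follows, assuming a score matrix with one row per dataset and one column per algorithm; the score values are hypothetical and the q value is taken from Table 1 for k = 3 classifiers at α = 0.05.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# scores: shape (N datasets, k algorithms), e.g. AUC values (hypothetical)
scores = np.array([[0.81, 0.78, 0.84],
                   [0.75, 0.74, 0.80],
                   [0.88, 0.85, 0.90]])
stat, p = friedmanchisquare(*scores.T)        # one sample per algorithm

N, k = scores.shape
avg_ranks = rankdata(-scores, axis=1).mean(axis=0)   # rank 1 = largest AUC
q_alpha = 2.343                                       # Table 1, k = 3, alpha = 0.05
cd = q_alpha * np.sqrt(k * (k + 1) / (6 * N))         # Eq. (11)
print(p, avg_ranks, cd)
```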
2.6 Case Study and available data
Global landslide susceptibility (LS) was used as the case study.
Table 2: Description of Explanatory Variables to Develop the Global Susceptibility Map
Data type | Dataset | Resolution | Explanatory variable | Extent | Source and details |
Elevation | GMTED2010: Global Multi-resolution Terrain Elevation Data 2010 | 1 km | Elevation, General Curvature, Slope Aspect | Global | https://developers.google.com/earth-engine/datasets/catalog/USGS_GMTED2010?hl=en |
Slope | GMTED2010: Global Multi-resolution Terrain Elevation Data 2010 | 1 km | Slope | Global | https://developers.google.com/earth-engine/datasets/catalog/USGS_GMTED2010?hl=en |
Rainfall | Global Precipitation Measurement | 10 km | Global precipitation measurement (GPM) | Global | https://gpm.nasa.gov/data/directory |
Landcover | FROM-GLC 2017v1 | 300 m | Landcover type | Global | http://data.ess.tsinghua.edu.cn/ |
Soil type | Global Soil Regions Map | 1:5,000,000 | Soil classification | Global | https://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/use/?cid=nrcs142p2_054013 |
NDVI | MOD13A2.006 Terra Vegetation Indices 16-Day Global 1km | 1 km | Forest cover | Global | https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD13A2 |
Lithology | Global Lithological Map Database v1.0 (gridded to 0.5° spatial resolution), PANGAEA | 0.5° | Lithologic classification | Global | https://doi.pangaea.de/10.1594/PANGAEA.788537 |
Climate Classes | World Map of the Köppen-Geiger Climate Classification Updated | 10 km | Climate classes | Global | http://koeppen-geiger.vu-wien.ac.at/present.htm |
River network | Major River Basins of the World, 2nd ed. (GRDC, 2020) | Variable | Euclidean distance to rivers | Global | https://www.bafg.de/GRDC/EN/02_srvcs/22_gslrs/221_MRB/riverbasins_node.html;jsessionid=63179A36F6128A65D1D6355B9035421D.live21323#doc2731742bodyText3 |
Landslide Catalog | Global Landslide Catalog (GLC) & Landslide Reporter Catalog (LRC) | Variable | Landslide report | Global | https://gpm.nasa.gov/landslides/coolrdata.html |
The landslide location data for this study are from NASA GLC and LRC 59–61. The digital elevation model (DEM), precipitation, slope, NDVI, land cover, lithology, soil types and climate classes are from the Google Earth Engine (GEE) platform (Table 2).
Slope was calculated with the terrain function of the GEE platform. Slope aspect and general curvature (GC) were derived from the DEM with the third-order partial derivative method in QGIS 62 (Fig. 4). The Euclidean distance to rivers (EDTR) of the landslide points was calculated with the Euclidean Distance tool in ArcMap and the Global River network variable (Table 2).
This study applied RF, AdaBoost and L1 regularization 63 for feature selection, and 26 features were selected (Table 3); a sketch of this step is given after Table 3.
Table 3
Numerical Variables | Climate Classes | Landcover | Lithology | Soil |
GPM | Aw | Cropland | Unconsolidated Sediments | Gelisols |
slope | Csb | Forest | Pyroclastics | Ultisols |
GC | Cwa | Grassland | Carbonate Sedimentary Rock | Inceptisols |
DEM | Dfc | | Evaporites | Alfisols |
NDVI | ET | | Basic Volcanic Rocks | |
EDTR | | I | Basic Plutonic Rocks | |
Slope aspect | | | Acid Plutonic Rocks | |
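A minimal scikit-learn sketch of the feature-selection step referred to above follows. It only illustrates the three components (RF importances, AdaBoost importances and L1-penalized logistic regression coefficients); how their selections were combined to reach the 26 features is not specified here, and the thresholds and hyperparameters are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

# A feature is kept when its importance/coefficient exceeds each selector's
# default threshold; combining the three selections is a separate design choice.
selectors = [
    SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0)),
    SelectFromModel(AdaBoostClassifier(random_state=0)),
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=1.0)),
]
# selected_masks = [s.fit(X, y).get_support() for s in selectors]
```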