The Research on Landslide Detection in Remote Sensing Images Based on Improved DeepLabv3+ Method

doi:10.21203/rs.3.rs-5297582/v1

To address the shortcomings of existing classical semantic segmentation models, such as inaccurate landslide edge extraction in high-resolution images, large numbers of network parameters, and long training times, this paper proposes a lightweight landslide detection model, LDNet (Landslide Detection Network), based on DeepLabv3+ and a dual attention mechanism. LDNet replaces the Xception backbone of DeepLabv3+ with the lightweight network MobileNetv2, thereby reducing model parameters and improving training speed. Additionally, the model incorporates the dual (channel and spatial) attention mechanism of the lightweight Convolutional Block Attention Module (CBAM) to detect landslide features more accurately and efficiently. The workflow comprised dataset creation, model training, detection, and accuracy evaluation. Results show that LDNet achieves 93.37%, 91.93%, 86.30%, 89.79%, and 95.28% for P, R, IoU, mIoU, and OA, respectively, which are improvements of 14.81, 13.25, 14.58, 14.27, and 13.71 percentage points over the original DeepLabv3+ network. Moreover, LDNet outperforms classical semantic segmentation models such as UNet and PSPNet in recognition accuracy, while having significantly fewer parameters and shorter training times. The model also demonstrates good generalization capability in tests conducted in other regions, maintaining extraction accuracy while significantly reducing the number of parameters. It meets real-time requirements, enabling rapid and accurate landslide detection, and shows promising potential for widespread application.

Earth and environmental sciences/Environmental sciences

Earth and environmental sciences/Natural hazards

Lightweight

DeepLabv3+

Attention Mechanism

Landslide Disaster

Landslides, as a common geological disaster, are characterized by the rapid movement of large amounts of rock, debris, or soil down a slope, with high frequency and speed^1,2. In recent years, with global climate change and increased human activity, landslide disasters have become more frequent, posing a serious threat to human life and property^3,4. According to statistics from the World Health Organization (WHO), between 1998 and 2017, 4.8 million people were affected by landslides, and more than 18,000 people lost their lives as a result⁵. Landslides not only pose direct harm to residents in mountainous and hilly areas but also damage transportation infrastructure, block rivers, and trigger secondary disasters. To effectively mitigate the losses caused by landslides, large-scale landslide detection and monitoring have become increasingly important^6,7.

Landslides typically occur in steep mountainous areas, and the factors that trigger landslides usually include excessive rainfall, earthquakes, volcanic eruptions, and snowmelt. The core of landslide identification and detection is determining the location and boundaries of landslides based on various features, including morphological characteristics (such as texture and shape), structural characteristics (such as faults, cracks, and steep slopes), and kinematic characteristics (such as surface movement), to enable large-scale landslide detection^3,8–10.

Early landslide identification methods primarily relied on field surveys, which provided detailed data on landslide extent, shape, and structural characteristics^11,12. However, field surveys are costly and inefficient, making it difficult to quickly detect and identify large-scale landslide disasters³. Traditional landslide monitoring methods, such as ground observation and geological surveys, are also often limited by complex terrain and harsh environmental conditions, making rapid monitoring over large areas challenging. In contrast, remote sensing technology, with its wide spatial coverage and high efficiency, can obtain surface information over large scales and long time sequences, and thus shows tremendous potential in landslide detection¹³. It provides multi-source and multi-temporal data that offer rich informational support for landslide detection. Optical remote sensing images can identify characteristics such as the shape, size, spectrum, texture, and patterns of landslides^9,14. Therefore, optical remote sensing images and their derived products (such as digital elevation models, slope, and surface roughness) have become the primary data sources for landslide detection^15,16.

Deep learning, as a branch of artificial intelligence, provides new ideas for the precise identification of disasters. Landslide feature extraction based on deep learning requires little to no human intervention to extract the essential and abstract features of the target, and its computational capabilities are well suited to big data processing. Convolutional Neural Networks (CNNs) are widely used deep learning models in the field of computer vision, demonstrating excellent performance in image classification, object detection, and semantic segmentation tasks¹⁷. CNNs also show significant advantages in processing remote sensing image data. Many studies have combined CNNs with optical remote sensing images for landslide detection, achieving promising results^18–21. The advantage of deep learning over traditional algorithms lies in the ability of multi-layer neural networks to automatically extract useful features. In particular, CNN models extract high-level semantic information from images in addition to local detail features²². A CNN consists of multiple nonlinear mapping layers; it improves the accuracy of landslide recognition by exploring the spatial correlations between target pixels and combining them to obtain high-dimensional features of the target.

These methods can be divided into object detection-based methods and semantic segmentation-based methods. Object detection-based methods use bounding boxes to locate landslides. For example, by combining the YOLO V4 model with attention mechanisms, researchers have used optical remote sensing images from Google Earth™ to detect landslides²³. Tanatipuknon et al. found that combining two Faster R-CNN models with a simple decision tree can achieve better landslide detection performance²⁴.

Semantic segmentation-based methods, such as UNet and FCN networks, classify landslide and non-landslide pixels to delineate landslide boundaries. For example, Ghorbanzadeh et al. proposed a new strategy that combines rule-based object-based image analysis (OBIA) with FCN for detecting landslides from multi-temporal Sentinel-2 images²⁵. Lu et al. proposed a dual-encoder UNet landslide detection method based on Sentinel-2 and DEM data²⁶. These studies demonstrate the potential of the UNet model in landslide detection tasks, but they also reveal certain limitations of the UNet model itself, such as the semantic gap between the encoder and decoder and the loss of spatial information during the upsampling process.

The DeepLabv3+ semantic segmentation model is a typical and high-precision network in the field of semantic segmentation, showing excellent performance in remote sensing image processing. However, the DeepLabv3+ model also has some drawbacks. First, its feature extraction network, Xception, has many layers and parameters, resulting in high computational cost and resource consumption, making it difficult for the model to meet the demands of large-scale real-time detection²⁷. Secondly, as the encoder extracts features, the spatial dimensions of the input data gradually decrease, leading to the loss of valuable information. Additionally, the decoder cannot fully recover the details during decoding, ultimately resulting in lower accuracy in identifying the edges of the target²⁷.

In summary, this study aims to address the issues of inaccurate landslide edge extraction in high-resolution images and the large number of parameters and long training times in existing classical semantic segmentation models. To this end, this paper proposes a lightweight landslide disaster detection model, LDNet, based on DeepLabv3+ and attention mechanisms. The main contributions are as follows:

(1) The lightweight network MobileNetv2 is used to replace the Xception backbone of DeepLabv3+, effectively reducing the number of parameters and speeding up the training process.

(2) A lightweight Convolutional Block Attention Module (CBAM) is introduced to filter out background information, allowing the model to focus more on key feature information, thereby improving landslide detection accuracy in complex terrain.

(3) Landslide dataset construction: utilizing multi-source remote sensing data to build a training and testing dataset that covers various types of landslides and different geographical environments, providing strong support for model training and evaluation.

(4) The improved model can quickly and accurately extract landslide locations from high-resolution remote sensing images, providing effective technical support for landslide disaster monitoring.

2.1 Research Area

Sichuan Province is located in the southwestern part of China, situated in the upper reaches of the Yangtze River, between longitudes 97°21' to 108°33' east and latitudes 26°03' to 34°19' north. The administrative area covers 486,000 square kilometers, extending 1,075 kilometers from east to west and 921 kilometers from north to south (as shown in Fig. 1). This region spans the first and second tiers of the topography of the Chinese mainland and lies in the transition zone between the Tibetan Plateau and the middle and lower reaches of the Yangtze River, exhibiting a significant west-high and east-low topographical feature. The western part is dominated by plateaus and mountains with altitudes exceeding 3,000 meters, while the eastern part consists mainly of basins and hills with altitudes ranging from 500 to 2,000 meters. The geomorphological types are complex and diverse, primarily including mountains, hills, plains, and plateaus, which account for 74.2%, 10.3%, 8.2%, and 7.3% of the total area of the province, respectively.

Due to the complex topography and variable lithology of Sichuan, coupled with active neotectonic movements, frequent earthquakes, and abundant rainfall, landslides are widely developed along the tectonic fault zones of rivers such as the Jinsha River, Yalong River, Dadu River, and Min River, as well as their tributaries. Sichuan is one of the provinces in China most severely affected by collapses and landslides, with the highest number of landslides in the country, accounting for over 5% of the national total. Additionally, large landslides with volumes exceeding 10 million cubic meters make up nearly 50% of similar landslides nationwide.

2.2 Landslide Dataset

First, aerial orthophoto data from drones and manned aircraft of landslide disasters that occurred in Sichuan Province since 2008 were collected. By systematically organizing and uniformly processing this data, the landslide disaster bodies were interpreted and sample annotations were created using ArcGIS software and visual interpretation methods. Using individual landslide disaster bodies as units, a high-precision image and interpretation sample annotation dataset of 118 landslide disasters was completed, along with corresponding disaster information description texts. Subsequently, the data quality was tested and evaluated, and the technical route is shown in Fig. 2.

Secondly, to ensure the completeness and accuracy of the data, information on significant landslide and debris flow disaster events in Sichuan Province and its surrounding areas since 2008 was further collected. Based on the occurrence time and location of these events, corresponding aerial imagery data were selected from the historical image database (see Table 1), following the selection criteria outlined below:

(1) The image data must be acquired within 10 days after the disaster occurrence to avoid the impact of water erosion on the integrity of the samples.

(2) The images should cover typical or key disaster areas, such as the "6·24" Mao County Diexi Landslide in 2017 and the "10·11" Jinsha River Landslide in 2018.

(3) The image resolution must be better than 1 meter, with most images having a resolution better than 0.5 meters, and should be free from clouds or other obstructions.

Table 1: Description of Image Data

Serial Number	Image Acquisition Time	Ground Resolution (m)	Disaster Event
1	May 16, 2008	0.9	2008 “5·12” Wenchuan Earthquake Secondary Geological Disasters
2	April 21, 2013	0.2	2013 “4·20” Lushan Earthquake Secondary Geological Disasters
3	January 20, 2016	0.2	2016 “1·19” Mianzhu Xiaogang Jianshan Landslide Disaster
4	June 26, 2017	0.2	2017 “6·24” Maoxian Diexi Landslide Disaster
5	August 11, 2017	0.2	2017 “8·8” Jiuzhaigou Earthquake Secondary Geological Disasters
6	October 12, 2018	0.2	2018 “10·11” Jinsha River Baige Landslide Disaster
7	August 24, 2020	0.2	2020 “8·20” Pingwu County Heavy Rain Landslide Disaster
8	August 26, 2020	0.2	2020 “8·20” Pingwu County Heavy Rain Landslide Disaster

The image data were processed in the following steps:

(1) Coordinate Transformation: The planar coordinates of the image data were uniformly converted to the WGS 84 coordinate system, with the elevation reference based on the 1985 National Elevation Datum, and the unit of measurement in meters.

(2) Image Matching: Geometric correction of the image data was performed using the basic scale topographic map or high-quality digital orthophoto map (DOM) produced by foundational surveying and mapping as the reference.

(3) Image Stitching: Multiple images of the same area acquired during the same time period were stitched together to ensure the integrity of the disaster body.

(4) Image Fusion: Adjustments were made to the tone, ghosting, and noise of the images to ensure clarity, rich texture, uniform tone, and no obvious seam marks or other quality issues.

To ensure the accuracy of the interpretation, landslide interpretation symbols were established with the assistance of geological disaster experts. These symbols were combined with the province-wide 3D model data to visually interpret the landslide and debris flow disaster bodies. The boundaries of the disaster bodies were outlined using ArcGIS software, generating vector polygon data, which were then converted into binary raster images (with 1 representing the landslide and 0 representing other objects). During the interpretation process, strict adherence to the principles of disaster body integrity and accuracy was maintained, focusing primarily on the obvious differences between the landslide and the surrounding environment in terms of morphology, tone, texture, vegetation condition, and other aspects. For instance, landslide bodies tend to be spoon- or tongue-shaped, with steep, often arc-shaped or zigzagging backwalls, sparse vegetation within the landslide body, and trees that are tilted or toppled.
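The conversion from an interpreted vector polygon to a binary raster label can be illustrated with a minimal sketch. It is not the ArcGIS workflow used in this study: it assumes the polygon is already expressed in pixel coordinates, uses a plain ray-casting test on pixel centres, and the outline and function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygon(polygon, height, width):
    """Return a height x width label: 1 = landslide, 0 = other objects."""
    mask = []
    for row in range(height):
        mask.append([
            # Each pixel is classified by the position of its centre.
            1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
            for col in range(width)
        ])
    return mask


# Hypothetical landslide outline in pixel coordinates (x, y).
outline = [(1, 1), (5, 1), (5, 4), (1, 4)]
mask = rasterize_polygon(outline, height=6, width=6)
for line in mask:
    print(line)
```

In practice a GIS tool would also handle the georeferencing between map coordinates and pixel indices, which this sketch leaves out.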

Finally, following the principle of landslide disaster body integrity, the image data and the corresponding interpretation label data of each disaster body were cropped into standard rectangles, generating image and label pairs that are equal in size and aligned in position. A total of 118 landslide samples were produced, some of which are shown in Fig. 3.
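Cropping an image and its label to the same rectangle can be sketched as follows. This is only an illustration of the idea of keeping the pair equal in size and aligned; the nested-list representation, the `margin` parameter, and the function names are assumptions, not details taken from the study.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the 1-pixels; bottom/right exclusive."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for line in mask for c, v in enumerate(line) if v]
    if not rows:
        raise ValueError("mask contains no landslide pixels")
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def crop_pair(image, mask, margin=0):
    """Crop image and label with one shared rectangle so they stay aligned."""
    top, left, bottom, right = bounding_box(mask)
    # Optionally grow the rectangle, clamped to the raster extent.
    top = max(top - margin, 0)
    left = max(left - margin, 0)
    bottom = min(bottom + margin, len(mask))
    right = min(right + margin, len(mask[0]))

    def crop(grid):
        return [line[left:right] for line in grid[top:bottom]]

    return crop(image), crop(mask)


# Hypothetical 5 x 5 single-band image and its label.
image = [[r * 10 + c for c in range(5)] for r in range(5)]
label = [[0] * 5 for _ in range(5)]
label[1][2] = label[2][2] = label[2][3] = 1

image_crop, label_crop = crop_pair(image, label)
print(image_crop)
print(label_crop)
```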

3.1 DeepLabv3+ Model

The DeepLabv3+ model is a semantic segmentation model proposed in 2018, and it is a typical, high-accuracy network architecture in the field of semantic segmentation²⁸. It improves on the DeepLabv3 model mainly by introducing an encoder-decoder structure, changing the dilation rates of the ASPP module, and using a different backbone network²⁷.

DeepLabv3+ uses Xception²⁹ as the backbone network and combines it with the Atrous Spatial Pyramid Pooling (ASPP) module to form the encoder. The primary function of the encoder is to extract deep feature information from the input image. It typically consists of multiple convolutional layers, which learn increasingly abstract features of the image layer by layer, from shallow to deep.

The encoder reduces the size of the feature maps through convolution and pooling operations while increasing the number of feature channels, thus achieving compression and extraction of image features. The ASPP module captures multi-scale contextual information of the image by using dilated convolutions with different dilation rates, enabling better understanding of objects of various sizes and scales³⁰. DeepLabv3+ optimizes the ASPP module by employing larger dilation rates, thereby expanding the receptive field and further enhancing image segmentation performance.
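How the dilation rate expands the receptive field can be shown with a short calculation. A k×k kernel with dilation rate r covers k + (k − 1)(r − 1) pixels per side while keeping the same number of weights. The rates 6, 12, and 18 below are the values commonly reported for ASPP at an output stride of 16 and are an assumption here, since this paper does not list the rates it uses.

```python
def effective_kernel_size(kernel_size, dilation):
    """Side length covered by a dilated kernel: k + (k - 1) * (r - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumed ASPP dilation rates (not stated in this paper).
ASSUMED_RATES = (1, 6, 12, 18)

for rate in ASSUMED_RATES:
    span = effective_kernel_size(3, rate)
    # The number of weights stays 3 * 3 = 9 regardless of the rate.
    print(f"rate {rate:2d}: 3x3 kernel covers {span} x {span} pixels")
```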

In the decoder section, the model first performs fourfold upsampling of the fused deep feature map through bilinear interpolation, and then concatenates it with the low-level feature map extracted from Xception. Finally, after a 3×3 convolution and another fourfold bilinear interpolation upsampling, the final segmentation result is obtained, matching the size of the original image. The network structure is shown in Fig. 4.
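The spatial sizes along the decoder path can be traced with a small sketch, using the 512×512 input and downsampling factor of 16 listed in the experimental settings. The low-level feature stride of 4 is an assumption; it is the value required for the concatenation with the fourfold-upsampled map to line up.

```python
def decoder_shapes(input_size=512, output_stride=16, low_level_stride=4):
    """Trace feature-map side lengths through the DeepLabv3+ decoder."""
    encoder_output = input_size // output_stride      # deep features from the encoder
    low_level = input_size // low_level_stride        # shallow backbone features (assumed stride)
    after_first_upsample = encoder_output * 4         # first 4x bilinear upsampling
    if after_first_upsample != low_level:
        raise ValueError("feature maps must match in size before concatenation")
    final = after_first_upsample * 4                  # second 4x bilinear upsampling
    return {
        "encoder_output": encoder_output,
        "after_first_upsample": after_first_upsample,
        "low_level": low_level,
        "final": final,
    }


print(decoder_shapes())
```

With these settings the final map returns to the 512×512 input size, consistent with the description above.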

3.2 Improved Model

3.2.1 Improved LDNet Model

Landslide disasters typically present a spoon-shaped or tongue-shaped planar morphology, with geomorphological features including landslide walls, scraping marks along the flow path, and tongue-shaped deposition areas. The landslide body differs significantly from the surrounding environment in morphology, tone, texture, vegetation development, and growth status. Although the semantic information is relatively simple, the details are rich, which places high demands on the detail extraction capability of the segmentation network. Classical segmentation networks such as DeepLabv3+ have a complex feature extraction part, because they must handle diverse datasets, variable object features, and vast amounts of data in order to learn more complex feature patterns. The original DeepLabv3+ structure and its large number of parameters make it suitable for multi-object segmentation tasks, but they also result in a high computational load and training difficulty, which makes it less suitable for landslide recognition tasks.

In response to the landslide features in remote sensing images, this study applies the lightweight MobileNetv2³¹ backbone network to the encoder-decoder structure of DeepLabv3+, where it serves as the encoder for feature extraction. MobileNetv2 introduces more direct (shortcut) connections and, compared to Xception, has fewer parameters, is easier to train, and converges faster. Additionally, the incorporation of dilated convolutions increases the receptive field, enabling the network to better combine the semantic information of the image and significantly enhancing its semantic extraction capability. This network runs quickly, making it particularly suitable for landslide recognition tasks in remote sensing images.

At the same time, the introduction of CBAM³² allows the model to focus more on key feature information, thereby extracting landslide features more efficiently and accurately. By utilizing the detail information of the image and the spatial correlation of pixels over a large area, the model can more accurately identify most regions, improving the recognition performance for landslides of different sizes. Figure 5 illustrates the network model structure of this method.

3.2.2 Backbone Network

The traditional DeepLabv3+ semantic segmentation model typically uses Xception as the backbone network. In contrast, this study selects a modified lightweight MobileNetv2 network as the backbone of the semantic segmentation model. MobileNetv2 is a lightweight network architecture proposed by Google. It is built on depthwise separable convolutions and introduces inverted residual modules and linear bottleneck layers, which effectively reduce the number of model parameters and accelerate the convergence of the network.

In the MobileNetv2 network, low-dimensional convolutional layers do not use activation functions, while other layers employ the ReLU6 activation function to prevent nonlinear operations from excessively damaging feature information³¹. The feature extraction network first obtains features of the same dimension through a 3×3 depthwise convolution combined with the ReLU activation function. Next, a 1×1 convolution with the ReLU6 activation function processes the data to obtain reduced-dimensional features. Finally, another 1×1 convolution is performed to expand the dimensions. The inverted residual module is primarily used to enhance the network's feature extraction capabilities and facilitate efficient transmission of multi-level feature information. Specifically, it first increases the dimensions through a 1×1 convolution, followed by feature extraction using a 3×3 depthwise convolution. Lastly, a 1×1 convolution is used to reduce the dimensions to extract effective feature information³¹. The network structure is shown in Fig. 6.
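The parameter savings from depthwise separable convolutions and the expand, depthwise, project layout of the inverted residual module can be made concrete with a weight count. The sketch below ignores biases and batch normalization; the channel widths and the expansion factor of 6 are illustrative assumptions rather than the configuration used in this study.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard k x k convolution."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return kernel * kernel * c_in + c_in * c_out


def inverted_residual_params(c_in, c_out, expansion=6):
    """Weights of: 1x1 expand -> 3x3 depthwise -> 1x1 linear projection."""
    hidden = c_in * expansion
    expand = c_in * hidden       # 1 x 1 convolution raises the dimension
    depthwise = 3 * 3 * hidden   # one 3 x 3 filter per expanded channel
    project = hidden * c_out     # 1 x 1 convolution reduces the dimension
    return expand + depthwise + project


# Illustrative widths: 32 channels expanded to 32 * 6 = 192.
print(standard_conv_params(3, 192, 192))        # 331776
print(depthwise_separable_params(3, 192, 192))  # 38592
print(inverted_residual_params(32, 32))         # 14016
```

At the same width of 192 channels, the separable form needs roughly one ninth of the weights of a standard 3×3 convolution, which is the main source of the backbone's small size.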

3.2.3 Attention Mechanism

The attention mechanism captures the internal correlations of an image by learning its contextual information. Its core idea is to enable the model to focus on information from key areas while ignoring irrelevant information³². The attention mechanism consists of channel attention modules and spatial attention modules, with the former concentrating on which features are more important in the image, while the latter focuses on which regions of the image contain critical features. To accurately extract landslide disaster areas, this study utilizes the attention mechanism to enhance the accuracy and efficiency of extracting landslide edge features.

In this research, the Convolutional Block Attention Module (CBAM) combines both of the aforementioned modules and integrates them into the semantic segmentation network. This module not only pays attention to the features of each channel but also to the features of each pixel point, enabling adaptive optimization based on the characteristics of the input image. A notable advantage of CBAM is its lightweight design, allowing it to be seamlessly integrated into any neural network, truly achieving plug-and-play functionality.

As shown in Fig. 7, CBAM is composed of two modules: the Channel Attention Module (CAM) and the Spatial Attention Module (SAM)³². Figures 8 and 9 depict the basic structures of CAM and SAM, respectively. The channel attention module first performs pooling operations on the input feature map to calculate the weight of each channel, which is then applied to the spatial attention module. The spatial attention module subsequently extracts the maximum and average values of each feature point across all channels and calculates the weight of each feature point using the same operations as the channel attention. Finally, these weights are multiplied by the original input feature map and convolved to obtain deep features that contain multi-scale contextual information.
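The sequence of channel attention followed by spatial attention can be sketched in plain Python on nested lists. This is a minimal illustration of the general CBAM scheme (a shared two-layer MLP on pooled channel descriptors, then a convolution over the channel-wise average and maximum maps), not the implementation used in LDNet. All weights and the tiny feature map are hypothetical, and a 3×3 kernel replaces the larger kernel usually used for the spatial branch to keep the example small.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp(vec, w1, w2):
    """Shared two-layer MLP: linear -> ReLU -> linear (no biases)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """One weight per channel from average- and max-pooled descriptors."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    return [sigmoid(a + m) for a, m in zip(mlp(avg, w1, w2), mlp(mx, w1, w2))]


def spatial_attention(feat, kernel):
    """One weight per pixel; kernel[0] / kernel[1] filter the avg / max maps."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(ch[i][j] for ch in feat) / c for j in range(w)] for i in range(h)]
    mx = [[max(ch[i][j] for ch in feat) for j in range(w)] for i in range(h)]
    k = len(kernel[0])
    pad = k // 2
    weights = []
    for i in range(h):
        row = []
        for j in range(w):
            total = 0.0
            for plane, filt in zip((avg, mx), kernel):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding at borders
                            total += filt[di][dj] * plane[y][x]
            row.append(sigmoid(total))
        weights.append(row)
    return weights


def cbam(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to feat[c][i][j]."""
    mc = channel_attention(feat, w1, w2)
    refined = [[[mc[c] * v for v in line] for line in ch]
               for c, ch in enumerate(feat)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(line)]
             for i, line in enumerate(ch)] for ch in refined]


# Hypothetical 2-channel, 2 x 2 feature map and weights.
feat = [[[1.0, 2.0], [3.0, 4.0]], [[-1.0, 0.0], [2.0, 8.0]]]
w1 = [[1.0, 0.0]]        # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [0.0]]      # restores 2 channel scores
kernel = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]
print(cbam(feat, w1, w2, kernel))
```

Because both attention maps pass through a sigmoid, every output value is the input value scaled by a factor between 0 and 1, and the feature map keeps its shape, which is what allows the module to be inserted into an existing network.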

4.1 Environmental Configuration

The required environment for this experiment is shown in Table 2, with the operating system being Windows 11.

Table 2: Experimental Environment Software and Hardware Configuration

Parameters	Configuration
CPU	13th Gen Intel(R) Core(TM) i7-13620H 2.40 GHz
RAM	64 GB
GPU	NVIDIA GeForce RTX 4060 (8 GB)
CUDA	12.1
Deep learning framework	PyTorch 2.4.0, Python 3.11

The experimental parameter settings are shown in Table 3.

Table 3: Experimental Parameter Settings

Parameters	Configuration	Description
down_sample_factor	16	Downsampling Factor
input_shape	[512, 512]	Input Image Size
num_epoch	300	Number of Training Epochs
batch_size	2	Batch Size
init_lr	0.01	Initial Learning Rate
min_lr	init_lr * 0.01	Minimum Learning Rate
optimizer_type	AdamW	Optimizer
momentum	0.9	Momentum
weight_decay	0.0001	Weight Decay
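
Table 3 gives the initial and minimum learning rates but not how the rate decays between them. The sketch below collects the listed settings and, purely as an assumption, applies a cosine decay from init_lr to min_lr over the 300 epochs; the schedule and the function name are illustrative and are not taken from the paper.

```python
import math

# Settings copied from Table 3.
CONFIG = {
    "down_sample_factor": 16,
    "input_shape": (512, 512),
    "num_epoch": 300,
    "batch_size": 2,
    "init_lr": 0.01,
    "optimizer_type": "AdamW",
    "momentum": 0.9,
    "weight_decay": 0.0001,
}
CONFIG["min_lr"] = CONFIG["init_lr"] * 0.01


def cosine_lr(epoch, cfg=CONFIG):
    """Assumed cosine decay: init_lr at epoch 0, min_lr at the last epoch."""
    last = cfg["num_epoch"] - 1
    progress = min(max(epoch, 0), last) / last
    span = cfg["init_lr"] - cfg["min_lr"]
    return cfg["min_lr"] + 0.5 * span * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 100, 200, 299):
    print(epoch, f"{cosine_lr(epoch):.6f}")
```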

4.2 Evaluation Metrics

To validate the performance of the improved LDNet model, five accuracy evaluation metrics were employed: precision (P), recall (R), Intersection over Union (IoU), mean IoU (mIoU), and overall accuracy (OA). These metrics are used to assess the performance of the improved landslide detection model and evaluate its recognition accuracy.

P represents the proportion of actual landslides among the pixels predicted as landslides by the model, calculated using formula (1):

$$P = \frac{TP}{TP + FP} \tag{1}$$

The remaining metrics follow their standard confusion-matrix definitions: $R = TP / (TP + FN)$, $IoU = TP / (TP + FP + FN)$, and $OA = (TP + TN) / (TP + TN + FP + FN)$, while mIoU is the IoU averaged over the $n + 1$ classes, including the background.

Here, n is the number of identified classes excluding the background. TP is true positive; TN is true negative; FP is false positive; and FN is false negative. TP and TN represent the number of pixels correctly predicted as “landslide” or “non-landslide,” respectively. FP represents the number of “non-landslide” pixels incorrectly judged as “landslide,” and FN represents the number of “landslide” pixels incorrectly judged as “non-landslide.”
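The five metrics can be computed directly from the four pixel counts. The sketch below assumes the standard confusion-matrix definitions for the binary landslide/background case (so mIoU averages the landslide and background IoU) and uses made-up counts; it also assumes no denominator is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """P, R, IoU, mIoU and OA for a landslide / non-landslide segmentation."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou_landslide = tp / (tp + fp + fn)
    # For the background class the roles of TP and TN are swapped.
    iou_background = tn / (tn + fn + fp)
    miou = (iou_landslide + iou_background) / 2
    oa = (tp + tn) / (tp + tn + fp + fn)
    return {"P": precision, "R": recall, "IoU": iou_landslide,
            "mIoU": miou, "OA": oa}


# Hypothetical pixel counts for a 1000-pixel tile.
metrics = binary_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```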

4.3 Comparative Experiments

To verify the effectiveness of the LDNet model in landslide point identification, we compared the LDNet model based on the MobileNetv2 backbone network with the classic semantic segmentation models UNet, PSPNet, and the original DeepLabv3+ model. All models were tested under the same training parameters. The results show that the improved lightweight LDNet model achieved the highest recognition accuracy among the four models, as detailed in Table 4 and Figure 10. The final results regarding the parameter sizes and training times of the four models are shown in Table 5.

Table 4: Comparison of Landslide Recognition Results of Different Models

Model	P	R	IoU	mIoU	OA
UNet	71.84%	71.15%	64.63%	68.46%	74.52%
PSPNet	75.02%	73.24%	66.96%	70.99%	77.32%
DeepLabv3+	78.56%	78.68%	71.72%	75.52%	81.57%
LDNet	93.37%	91.93%	86.30%	89.79%	95.28%

Table 4 presents the accuracy of the improved lightweight LDNet model compared with other classic semantic segmentation models in landslide recognition. As can be seen from Table 4, the overall accuracy (OA) of all four models exceeds 70%, indicating that deep learning performs well in landslide recognition. Across all five evaluation metrics, the improved lightweight LDNet model outperforms the other three models. Specifically, the OA and mean Intersection over Union (mIoU) of the improved LDNet model reached 95.28% and 89.79%, respectively, which are increases of 20.76 and 21.33 percentage points over the UNet model, 17.96 and 18.80 percentage points over the PSPNet model, and 13.71 and 14.27 percentage points over the original DeepLabv3+ model. In addition, the improved LDNet model achieved precision (P), recall (R), and IoU values of 93.37%, 91.93%, and 86.30%, respectively, which are increases of 21.53, 20.78, and 21.67 percentage points over the UNet model, 18.35, 18.69, and 19.34 percentage points over the PSPNet model, and 14.81, 13.25, and 14.58 percentage points over the original DeepLabv3+ model. These results indicate that the improved lightweight LDNet model employed in this study exhibits superior performance in landslide recognition and extraction.
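The reported gains are differences between the percentages in Table 4, that is, percentage points. As a quick check, the sketch below recomputes them from the table values; the function name is illustrative.

```python
# Values copied from Table 4 (percent).
TABLE4 = {
    "UNet":       {"P": 71.84, "R": 71.15, "IoU": 64.63, "mIoU": 68.46, "OA": 74.52},
    "PSPNet":     {"P": 75.02, "R": 73.24, "IoU": 66.96, "mIoU": 70.99, "OA": 77.32},
    "DeepLabv3+": {"P": 78.56, "R": 78.68, "IoU": 71.72, "mIoU": 75.52, "OA": 81.57},
    "LDNet":      {"P": 93.37, "R": 91.93, "IoU": 86.30, "mIoU": 89.79, "OA": 95.28},
}


def gain_over(baseline, model="LDNet"):
    """Percentage-point difference between model and baseline per metric."""
    return {name: round(TABLE4[model][name] - TABLE4[baseline][name], 2)
            for name in TABLE4[model]}


for baseline in ("UNet", "PSPNet", "DeepLabv3+"):
    print(baseline, gain_over(baseline))
```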

From the recognition results in Figure 10, it can be observed that the UNet and PSPNet models yield relatively poor landslide extraction results, with many misclassifications and omissions. The original DeepLabv3+ semantic segmentation model employs dilated convolutions and multi-scale strategies, which significantly enlarge the receptive field and enable it to identify most landslide disasters. However, it identifies landslides reliably only where the reflectance within the landslide varies little. When the reflectance differences among objects at landslide locations increase, omissions may occur, leading to significant errors. In addition, roads and other objects whose spectral features resemble those of landslides may be misidentified as scattered landslide zones. The improved lightweight LDNet model outperforms the original DeepLabv3+ model, achieving better recognition results.

Table 5: Parameter Sizes and Training Times of Different Models

Model	Model Parameters (MB)	Training Time (h)
UNet	93.97	3.42
PSPNet	179.32	5.26
DeepLabv3+	208.68	3.44
LDNet	22.52	2.13

From Table 5, it can be seen that the improved LDNet model has the fewest parameters among the four models, with a parameter size of only 22.52 MB. This is much smaller than the other three models: 10.79% of the original DeepLabv3+ model, 23.97% of UNet, and 12.56% of PSPNet. Additionally, the training time of the improved model is 2.13 hours, the shortest among the four models, which is 1.31 hours less than the original DeepLabv3+ model, 1.29 hours less than UNet, and 3.13 hours less than PSPNet.
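The size ratios and time savings quoted above follow directly from Table 5, as the short check below shows; the variable and function names are illustrative.

```python
# Values copied from Table 5: (parameter size in MB, training time in hours).
TABLE5 = {
    "UNet": (93.97, 3.42),
    "PSPNet": (179.32, 5.26),
    "DeepLabv3+": (208.68, 3.44),
    "LDNet": (22.52, 2.13),
}


def compare_to_ldnet(baseline):
    """Return (LDNet size as % of baseline, hours saved versus baseline)."""
    base_size, base_time = TABLE5[baseline]
    ld_size, ld_time = TABLE5["LDNet"]
    return round(ld_size / base_size * 100, 2), round(base_time - ld_time, 2)


for baseline in ("DeepLabv3+", "UNet", "PSPNet"):
    size_pct, hours_saved = compare_to_ldnet(baseline)
    print(f"{baseline}: {size_pct}% of the size, {hours_saved} h faster")
```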

Therefore, the LDNet model significantly improves the speed of image segmentation while maintaining accuracy, and it greatly reduces the model's parameter count, achieving rapid and accurate extraction of landslide information.

4.4 Generalization Experiment

To further validate the generalization ability of the improved model, we conducted tests in Yichang City, Hubei Province, China. The test results are shown in Figure 11. The results indicate that the improved model's extraction accuracy for the landslide zone in the Three Gorges Reservoir area is comparable to the results obtained in the research area, demonstrating that the model has strong generalization capability and potential for broader application.

This study proposes a lightweight landslide detection model, LDNet, based on the DeepLabv3+ semantic segmentation model and the CBAM attention module. The model uses the lightweight MobileNetv2 as the backbone network and incorporates the lightweight CBAM module, addressing the inaccurate extraction of landslide edges in high-resolution images as well as the high parameter counts and long training times of classic semantic segmentation models. In the landslide extraction results for the research area, the precision (P), recall (R), Intersection over Union (IoU), mean IoU (mIoU), and overall accuracy (OA) reached 93.37%, 91.93%, 86.30%, 89.79%, and 95.28%, respectively, outperforming the three comparison models while having the fewest parameters and the shortest training time. The model's generalization ability was also validated in tests conducted in another region. In summary, the model significantly reduces parameter count and training time while maintaining extraction accuracy and generalizes well, making it worthy of further promotion and application.

Data availability

The datasets generated and/or analyzed during this study are not publicly available due to confidentiality restrictions. Access to these datasets can be granted upon reasonable request from the corresponding author, subject to appropriate data use agreements.

Author contributions

L.Y.: Data Collection, Dataset Creation, Data Management, Methodology, Drafting the Manuscript, Review and Editing.

References

1. Keefer, D. K. & Larsen, M. C. Assessing Landslide Hazards. Science 316, 1136–1138 (2007).
2. Petley, D. Global patterns of loss of life from landslides. Geology 40, 927–930 (2012).
3. Guzzetti, F. et al. Landslide inventory maps: New tools for an old problem. Earth-Sci. Rev. 112, 42–66 (2012).
4. Froude, M. J. & Petley, D. N. Global fatal landslide occurrence from 2004 to 2016. Nat. Hazards Earth Syst. Sci. 18, 2161–2181 (2018).
5. S., S., S. S., V. C. & Shaji, E. Landslide identification using machine learning techniques: Review, motivation, and future prospects. Earth Sci. Inform. 15, 2063–2090 (2022).
6. Van Westen, C. J., Castellanos, E. & Kuriakose, S. L. Spatial data for landslide susceptibility, hazard, and vulnerability assessment: An overview. Eng. Geol. 102, 112–131 (2008).
7. Chen, Z., Zhang, Y., Ouyang, C., Zhang, F. & Ma, J. Automated Landslides Detection for Mountain Cities Using Multi-Temporal Remote Sensing Imagery. Sensors 18, 821 (2018).
8. Pawluszek, K. Landslide features identification and morphology investigation using high-resolution DEM derivatives. Nat. Hazards 96, 311–330 (2019).
9. Zhong, C. et al. Landslide mapping with remote sensing: challenges and opportunities. Int. J. Remote Sens. 41, 1555–1581 (2020).
10. Mohan, A., Singh, A. K., Kumar, B. & Dwivedi, R. Review on remote sensing methods for landslide detection using machine and deep learning. Trans. Emerg. Telecommun. Technol. 32, e3998 (2021).
11. Alexander, D. E. A brief survey of GIS in mass-movement studies, with reflections on theory and methods. Geomorphology 94, 261–267 (2008).
12. Santangelo, M., Cardinali, M., Rossi, M., Mondini, A. C. & Guzzetti, F. Remote landslide mapping using a laser rangefinder binocular and GPS. Nat. Hazards Earth Syst. Sci. 10, 2539–2546 (2010).
13. Sameen, M. I. & Pradhan, B. Landslide Detection Using Residual Networks and the Fusion of Spectral and Topographic Information. IEEE Access 7, 114363–114373 (2019).
14. Yu, B. & Chen, F. A new technique for landslide mapping from a large-scale remote sensed image: A case study of Central Nepal. Comput. Geosci. 100, 115–124 (2017).
15. Schulz, W. H. Landslide susceptibility revealed by LIDAR imagery and historical records, Seattle, Washington. Eng. Geol. 89, 67–87 (2007).
16. Haneberg, W. C., Cole, W. F. & Kasali, G. High-resolution lidar-based landslide hazard mapping and modeling, UCSF Parnassus Campus, San Francisco, USA. Bull. Eng. Geol. Environ. 68, 263–276 (2009).
17. Hu, F., Xia, G.-S., Hu, J. & Zhang, L. Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery. Remote Sens. 7, 14680–14707 (2015).
Ji, S., Yu, D., Shen, C., Li, W. & Xu, Q. Landslide detection from an open satellite imagery and digital elevation model dataset using attention boosted convolutional neural networks. Landslides 17, 1337–1352 (2020).
Gao, X., Chen, T., Niu, R. & Plaza, A. Recognition and Mapping of Landslide Using a Fully Convolutional DenseNet and Influencing Factors. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 14, 7881–7894 (2021).
Ullo, S. et al. A New Mask R-CNN-Based Method for Improved Landslide Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 14, 3799–3810 (2021).
Meena, S. R. et al. Landslide detection in the Himalayas using machine learning algorithms and U-Net. Landslides 19, 1209–1229 (2022).
Chu, X. et al. Glacier extraction based on high-spatial-resolution remote-sensing images using a deep-learning approach with attention mechanism. The Cryosphere 16, 4273–4289 (2022).
Cheng, L., Li, J., Duan, P. & Wang, M. A small attentional YOLO model for landslide detection from satellite remote sensing images. Landslides 18, 2751–2765 (2021).
Tanatipuknon, A. et al. Study on Combining Two Faster R-CNN Models for Landslide Detection with a Classification Decision Tree to Improve the Detection Performance. J. Disaster Res. 16, 588–595 (2021).
Ghorbanzadeh, O., Gholamnia, K. & Ghamisi, P. The application of ResU-net and OBIA for landslide detection from multi-temporal Sentinel-2 images. Big Earth Data 7, 961–985 (2023).
Lu, W. A dual-encoder U-Net for landslide detection using Sentinel-2 and DEM data. doi:10.1007/s10346-023-02089-5.
Mo, L. et al. DeepMDSCBA: An Improved Semantic Segmentation Model Based on DeepLabV3+ for Apple Images. Foods 11, 3999 (2022).
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Preprint at http://arxiv.org/abs/1802.02611 (2018).
Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1800–1807 (IEEE, Honolulu, HI, 2017). doi:10.1109/CVPR.2017.195.
Liu, J., Zhang, Y., Liu, C. & Liu, X. Monitoring Impervious Surface Area Dynamics in Urban Areas Using Sentinel-2 Data and Improved Deeplabv3+ Model: A Case Study of Jinan City, China. Remote Sens. 15, 1976 (2023).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 4510–4520 (IEEE, Salt Lake City, UT, 2018). doi:10.1109/CVPR.2018.00474.
Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. CBAM: Convolutional Block Attention Module. Preprint at http://arxiv.org/abs/1807.06521 (2018).

No competing interests reported.
