Study area
Guichi, located in the lower-middle section of the Yangtze River Basin, is a focal area for schistosomiasis japonica research in China. The district covers 2,516 km². Its northern part is crossed by numerous rivers, including the Qiupu, Jiuhua, Qingtong, and Baiyang, whereas the central and southern parts are predominantly hilly. The combination of a favorable climate, plentiful water resources, and lush vegetation creates an environment conducive to snail reproduction, a key factor in schistosomiasis transmission [13]. This study focuses on the lake areas of Guichi District, as shown in Figure 1.
Fig. 1 Study area location for snail habitat identification in the lake area of Guichi, Chizhou City, Anhui Province, China. The study area is located in the northern part of Guichi, near the south bank of the middle and lower reaches of the Yangtze River.
Data source
Snail data
Snail distribution data were obtained from field surveys conducted throughout Guichi between March and May 2021. The dataset records the geographic coordinates (latitude and longitude) of snail presence, snail population types, and the environmental characteristics of the habitats. It comprises 108 survey sites with snail populations in lakes and marshlands. For the control group, an equal number of snail-absent points were selected from areas where snails do not occur, including rooftops, roads, construction sites, and permanent water bodies.
Covariate data
Environmental data
Covariates previously identified as significant determinants of snail habitat distribution were chosen [5]. The environmental data include precipitation, distance to bottomland, distance to water bodies, land surface temperature, normalized difference vegetation index, wetness, land use, and nighttime light, abbreviated in the snail habitat identification model as PRE, DB, DW, LST, NDVI, WET, LU, and NL, respectively. Table 1 details the sources and resolution of all environmental data.
Natural environmental factors in the study area were derived from Landsat 8 images. NDVI was used to assess vegetation cover [14], the mono-window algorithm to estimate land surface temperature [15], and the Kauth-Thomas (K-T) transformation, also known as the tasseled cap transformation, to gauge wetness [16]. The wetness component of the K-T transformation reflects soil and vegetation moisture. The DW variable was computed using spatial analysis in ArcGIS 10.2 (ESRI Inc., Redlands, CA, USA).
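The NDVI band math can be illustrated with a minimal sketch. It assumes surface reflectance arrays for the red and near-infrared bands (bands 4 and 5 on Landsat 8); the function name `ndvi` and the sample values are hypothetical and are not taken from the study's processing chain.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Pixels with a zero denominator stay NaN instead of raising a warning.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical 2x2 reflectance patches (not real Landsat 8 data).
red_band = np.array([[0.10, 0.20], [0.30, 0.00]])
nir_band = np.array([[0.50, 0.20], [0.10, 0.00]])
ndvi_patch = ndvi(red_band, nir_band)
```

Values range from -1 to 1, with dense vegetation toward the upper end; the no-data pixel in the lower-right corner is returned as NaN.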
Previous studies explored the relationship between snail habitats and waterbody distribution [17]. However, they often overlooked the effect of seasonal changes in water distribution, which is critical for snail reproduction. The present study addressed this gap by focusing on bottomland distribution, which reflects these seasonal variations more accurately. To identify bottomland, Sentinel-1 radar images, which are sensitive to water bodies, were used, and the Sentinel-1 dual-polarized water index (SDWI) algorithm proposed by Jia et al. was applied to separate water from land in the study area [18]. Water bodies were segmented for the high-water period (May to October) and the low-water period (November to April), and spatial analysis in ArcGIS 10.2 identified bottomland areas characterized by "winter land, summer water." The distance from each sampling point to the nearest bottomland was calculated using the Coverage tool in ArcGIS 10.2, yielding the variable DB.
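The "winter land, summer water" rule and the nearest-distance calculation can be sketched on a raster grid as follows. This is an illustration only, not the ArcGIS workflow used in the study: it assumes two binary water masks have already been produced (for example by thresholding SDWI), and the function names, the toy masks, and the 30 m pixel size are hypothetical.

```python
import numpy as np

def bottomland_mask(wet_water, dry_water):
    """Pixels that are water in the high-water period but land in the low-water period."""
    wet = np.asarray(wet_water, dtype=bool)
    dry = np.asarray(dry_water, dtype=bool)
    return wet & ~dry

def distance_to_bottomland(points_rc, mask, pixel_size=1.0):
    """Distance from each (row, col) point to the nearest bottomland pixel centre."""
    targets = np.argwhere(mask).astype(float)
    if targets.size == 0:
        raise ValueError("mask contains no bottomland pixels")
    pts = np.asarray(points_rc, dtype=float)
    # Brute-force pairwise differences, shape (n_points, n_targets, 2);
    # adequate for a small example, not for a full scene.
    diff = pts[:, None, :] - targets[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1) * pixel_size

# Toy masks: 1 = water, 0 = land (hypothetical values).
wet = np.array([[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
dry = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
bottomland = bottomland_mask(wet, dry)
db = distance_to_bottomland([(0, 1), (2, 3)], bottomland, pixel_size=30.0)
```

The first column is permanent water in both periods and is therefore excluded from the bottomland mask, which is what distinguishes DB from the plain distance to water bodies (DW).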
Table 1. Summary of the environmental variables used in this study.
Covariate | Spatial resolution | Source | Reference
PRE | 30 m | China Meteorological Data Service Centre | http://data.cma.cn [41]
DB | / | SDWI | Jia et al., 2019 [18]
DW | / | Open Street Map | http://www.openstreetmap.org [42]
NDVI | 30 m | Band math | Rouse et al., 1974 [14]
LST | 30 m | Mono-window algorithm | Qin et al., 2003 [15]
WET | 30 m | Optical image inversion | Huang et al., 2002 [16]
LU | 10 m | EULUC-China | Gong et al., 2020 [33]
NL | 500 m | NPP-VIIRS | Chen et al., 2021 [40]
Note: PRE, precipitation; DB, distance to bottomland; DW, distance to waterbody; NDVI, normalized difference vegetation index; LST, land surface temperature; WET, wetness; LU, land use; NL, nighttime light.
Texture data
Texture captures the spatial arrangement of ground objects and is essential for distinguishing them. We therefore introduced texture information to characterize the spatial distribution of snail habitats. Eight texture indicators were used: mean (B1), variance (B2), uniformity (B3), contrast (B4), dissimilarity (B5), entropy (B6), second-order moment (B7), and correlation (B8). They were extracted from Landsat 8 images using the gray-level co-occurrence matrix (GLCM), a commonly used texture analysis method that is particularly suitable for optical imagery [19]. The eight indicators were computed in ENVI 5.3 (Exelis Inc., Boulder, CO, USA). Table 2 lists the texture indicators and their labels.
Table 2. Types of texture indicators used in the study.
Acronym | Textural indicator | Meaning of the indicator
B1 | Mean | Degree of regularity of texture
B2 | Variance | Deviation of the image element value from the mean value
B3 | Standard Deviation | Deviation of the image element value from the mean value
B4 | Contrast | Local gray level uniformity of an image
B5 | Dissimilarity | Local gray level uniformity of an image
B6 | Entropy | Amount of information an image has
B7 | Angular Second Moment | Uniformity of the gray level distribution of an image
B8 | Correlation | Degree of similarity between elements
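To make the GLCM indicators concrete, the sketch below builds a co-occurrence matrix for a small quantized image and derives eight statistics using their textbook definitions. It is not the ENVI implementation used in the study: the single horizontal offset, the symmetric counting, the four gray levels, the inverse-difference ("homogeneity") form assumed for the uniformity indicator, and all function names are assumptions made for illustration.

```python
import numpy as np

def glcm(image, levels, dr=0, dc=1):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset."""
    img = np.asarray(image)
    rows, cols = img.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r, c], img[r2, c2]] += 1
    counts += counts.T  # count every pair in both directions
    return counts / counts.sum()

def glcm_features(p):
    """Eight texture statistics from a normalized GLCM (textbook formulas)."""
    i, j = np.indices(p.shape)
    mean_i, mean_j = (i * p).sum(), (j * p).sum()
    var_i = ((i - mean_i) ** 2 * p).sum()
    var_j = ((j - mean_j) ** 2 * p).sum()
    nonzero = p[p > 0]
    return {
        "mean": mean_i,
        "variance": var_i,
        "homogeneity": (p / (1.0 + (i - j) ** 2)).sum(),
        "contrast": ((i - j) ** 2 * p).sum(),
        "dissimilarity": (np.abs(i - j) * p).sum(),
        "entropy": -(nonzero * np.log(nonzero)).sum(),
        "second_moment": (p ** 2).sum(),
        # Undefined (division by zero) for a perfectly uniform window.
        "correlation": ((i - mean_i) * (j - mean_j) * p).sum() / np.sqrt(var_i * var_j),
    }

# Hypothetical 4x4 window already quantized to gray levels 0..3.
window = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]])
p = glcm(window, levels=4)
features = glcm_features(p)
```

In practice these statistics are computed in a moving window over each band, so every pixel receives its own value for B1 to B8; the sketch covers a single window only.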
Methodology
To rapidly generate a probabilistic risk map of snail habitats, the random forest (RF) method was used to identify snail habitats from multi-source data. The workflow is shown in Figure 2.
Fig. 2 Technical workflow of snail habitat identification study. D: input data, I: model indicators, B: model building, V: validation indicators.
The following research hypothesis was tested: incorporating texture information from Landsat 8 images and quantifying the proximity of snail habitats to bottomland improves the accuracy of snail habitat identification. The baseline model, Model 0, integrated environmental variables, ground-surface texture information, and the DB variable. For comparison, three additional models were constructed: Model 1, a traditional model that incorporated only environmental variables (PRE, DW, LST, NDVI, WET, LU, and NL); Model 2, which combined environmental variables with ground-surface texture information; and Model 3, which combined environmental variables with the DB variable. To keep the number of covariates consistent, randomly permuted texture information was added as a control in the models lacking texture information (Model 1 and Model 3) [20]. RF models were built with Python's scikit-learn machine learning library [21], with the number of trees set to 200.
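The four model configurations can be sketched with scikit-learn as below. The data are synthetic random numbers standing in for the real covariates, so no accuracy figures should be read from the sketch. The helper `build_design_matrix`, the choice to permute each texture column independently, the fixed random seeds, and the sample size of 216 (108 presence plus 108 absence points) arranged as shown are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ENV = ["PRE", "DW", "LST", "NDVI", "WET", "LU", "NL"]
TEX = [f"B{k}" for k in range(1, 9)]
MODEL_SPECS = {  # name: (use_texture, use_db)
    "Model 0": (True, True),
    "Model 1": (False, False),
    "Model 2": (True, False),
    "Model 3": (False, True),
}

def build_design_matrix(data, use_texture, use_db, rng):
    """Stack covariate columns; permute texture values when texture is excluded."""
    cols = [data[name] for name in ENV]
    if use_db:
        cols.append(data["DB"])
    for name in TEX:
        values = data[name]
        # A permuted column keeps the covariate count but carries no real signal.
        cols.append(values if use_texture else rng.permutation(values))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n_sites = 216
# Synthetic stand-ins for the real covariates (LU would be categorical in practice).
data = {name: rng.normal(size=n_sites) for name in ENV + ["DB"] + TEX}
labels = np.repeat([1, 0], n_sites // 2)  # 1 = snail present, 0 = absent

models = {}
for name, (use_texture, use_db) in MODEL_SPECS.items():
    X = build_design_matrix(data, use_texture, use_db, rng)
    models[name] = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
```

After fitting, `models[name].predict_proba(X)[:, 1]` would give the predicted probability of snail presence for each row, which is the quantity mapped in a probabilistic risk map.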
The dataset was partitioned into two parts: 75% for the training set and 25% for the validation set. The training set was used to fit the model, allowing it to learn the relationship between the input features and the target variable. To reduce the influence of random partitioning, 10-fold cross-validation was used to determine the optimal parameters, and the models were evaluated on the held-out data. Model performance was assessed using four metrics: true skill statistic (TSS) [22], accuracy (ACC) [23], kappa [24], and area under the curve (AUC) [25]. These metrics allowed the performance of the models to be compared [17,26]. Subsequently, the best-performing model was applied to predict snail habitat probabilities across the study area.
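For reference, the four metrics can be computed from binary labels and predicted probabilities as in the following sketch, which uses the standard definitions (TSS = sensitivity + specificity - 1; kappa = chance-corrected agreement; AUC = probability that a presence point scores higher than an absence point). The function names, the 0.5 classification threshold, and the example values are assumptions, not figures from the study.

```python
def classification_metrics(y_true, y_pred):
    """ACC, TSS, and Cohen's kappa from binary labels (1 = present, 0 = absent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    acc = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "ACC": acc,
        "TSS": sensitivity + specificity - 1.0,
        "kappa": (acc - expected) / (1.0 - expected),
    }

def auc(y_true, scores):
    """AUC as the share of presence/absence pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation labels and predicted presence probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
results = classification_metrics(y_true, y_pred)
results["AUC"] = auc(y_true, scores)
```

ACC, TSS, and kappa depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason to report them together.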