Study area and occurrence data collection
This study focused on South Kivu in eastern DR Congo, located between 1º36’ and 5º South latitude and between 26º47’ and 29º20’ East longitude. Biological data on FAW occurrence were linked to geo-referenced locations. Occurrence data were collected in the Kalehe, Kabare, Walungu, Uvira, Fizi, Mwenga and Idjwi territories in collaboration with local farmers, who observed FAW larvae and reported every affected field in their localities. All suspected cases of FAW attack were verified through field surveys. To confirm that the observed larvae were indeed those of FAW, we examined their morphological characteristics as described by EPPO [16] and Sharanabasappa et al. [26]. Geographic coordinates of infested areas were retained only after positive FAW confirmation. Presence records were collected between February 2018 and September 2019 in 156 fields where FAW had been reported. Latitude and longitude in the WGS84 system were recorded using a Garmin 64s GPS receiver. The occurrence points are mapped in Fig. 1.
Environmental variables
In this study, we used elevation and potential evapotranspiration data combined with 19 bioclimatic variables. Elevation (ASTER digital elevation model, ASTER DEM) with 30 m spatial resolution was obtained from the USGS database (https://earthexplorer.usgs.gov), and the bioclimatic data were obtained from the Africlim database (https://www.york.ac.uk/environment/research/kite/resources/), which provides high-resolution climate data for Africa. These layers were used to build the species distribution model and identify areas suitable for FAW. Bioclimatic data consisted of 21 environmental variables (Table 1) that were obtained from interpolations of monthly averages of precipitation and temperature, based on climate data collected over a long period (1950 - 2000) [23]. The Africlim spatial database includes monthly grids of temperature and rainfall, from which bioclimatic summary variables such as moisture indices and dry season length are derived. All environmental variables were in raster format with a resolution of 30 arc seconds (0.93 km × 0.93 km ≈ 0.86 km² at the equator). Both ArcGIS Desktop 10.6 and QGIS 3.10 were used to process the spatial data: extracting the data to the extent of South Kivu province, managing the data in geographic coordinates (datum: WGS84), and resampling all raster layers to the same resolution in preparation for mapping.
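The cell size quoted for the 30 arc second rasters can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km, which is an approximation introduced here and not a value taken from the study; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def cell_size_km(arc_seconds, latitude_deg=0.0):
    """Return (east-west, north-south) edge lengths in km of a raster cell."""
    angle_rad = math.radians(arc_seconds / 3600.0)
    north_south = EARTH_RADIUS_KM * angle_rad
    # East-west edges shrink with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


ew_equator, ns_equator = cell_size_km(30.0)
area_equator = ew_equator * ns_equator

# South Kivu lies within a few degrees of the equator, so cells there
# are only marginally narrower than at the equator itself.
ew_5s, _ = cell_size_km(30.0, latitude_deg=-5.0)

print(round(ns_equator, 2), round(area_equator, 2), round(ew_5s, 3))
```

Under this approximation the cell edge rounds to 0.93 km and the cell area to 0.86 km² at the equator, in line with the figures given in the text.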
Bioclimatic zonation
Initially, all the environmental variables (n = 21) were clipped to the extent of South Kivu province. The geographic coordinates of the raster pixel centroids were then used to extract the value of each variable at each pixel, producing the dataset used to delineate the bioclimatic zones. This bioclimatic dataset was subjected to Principal Component Analysis (PCA) using the PCA procedure of the FactoMineR package [31] in R version 3.5.3 [48]. Based on Kaiser's criterion, only the first 5 principal components were retained for further analysis. The scores (coordinates) of the pixel centroids on these 5 principal components were then used to perform an ascending hierarchical clustering with the HCPC (Hierarchical Clustering on Principal Components) procedure of the FactoMineR package. Hierarchical clustering was carried out using the Euclidean distance as the metric and Ward's aggregation method, and was used to determine the optimal number of clusters. The k-means procedure was then used to consolidate the resulting clusters. Clustering results were imported into QGIS 3.10 to produce a bioclimatic zone map of South Kivu province.
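The clustering and consolidation steps can be sketched in plain Python, assuming the principal component scores have already been computed. This is an illustrative sketch and not the FactoMineR HCPC implementation: the number of clusters k is supplied by hand here, the function names are hypothetical, and the sample scores are made up.

```python
from itertools import combinations


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * sq_dist(centroid(a), centroid(b))


def ward_clusters(points, k):
    """Merge the cheapest pair (Ward's criterion) until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def kmeans_consolidate(points, clusters, max_iter=100):
    """Refine the partition with k-means, seeded by the hierarchical centroids."""
    centers = [centroid(c) for c in clusters]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


# Made-up scores of seven pixels on two principal components.
scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1),
          (5.2, 4.9), (9.0, 0.0), (9.1, 0.3)]
zone_labels = kmeans_consolidate(scores, ward_clusters(scores, 3))
print(zone_labels)
```

Seeding k-means with the centroids of the hierarchical partition is what makes the second step a consolidation of the tree cut and not an independent clustering.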
Selection of environmental predictors
Prior to distribution modeling, all the environmental variables were subjected to a correlation test in order to select those suitable for use as predictors of FAW distribution. Only variables with pairwise Pearson correlation coefficients within the open interval ]-0.75, 0.75[ (that is, |r| < 0.75) were retained for modeling, in order to control for multicollinearity among the environmental predictors [58].
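One way to apply such a threshold is a greedy filter, sketched below. The text does not state which variable of a correlated pair was kept, so the rule used here (variables are examined in the order given, and a variable is kept only if it is weakly correlated with everything already kept) is an assumption. The layer names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_predictors(layers, threshold=0.75):
    """Keep variables whose |r| with every already-kept variable is below threshold."""
    kept = []
    for name in layers:  # dicts preserve insertion order, which sets the priority
        if all(abs(pearson(layers[name], layers[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical pixel values for three layers (names are illustrative only).
layers = {
    "bio1": [20.0, 22.0, 24.0, 26.0, 28.0],
    "bio5": [30.0, 32.1, 33.9, 36.0, 38.2],      # nearly collinear with bio1
    "bio12": [1200.0, 900.0, 1500.0, 1000.0, 1300.0],
}
selected = select_predictors(layers)
print(selected)
```

Because the outcome depends on the order in which variables are examined, the ordering would normally reflect ecological relevance to the species.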
Species distribution modeling
The MaxEnt (Maximum Entropy) program, version 3.3.3 [43, 44], was used to establish the current climate envelope for natural FAW occurrence in South Kivu. MaxEnt is a common species distribution modeling (SDM) tool used to predict the distribution of a species from a set of occurrence records and environmental predictors [19]. It uses known occurrence locations (presence-only data) and a set of gridded environmental layers to produce an output map of the predicted ecological niche of the species on a scale from 0 (lowest suitability) to 1 (highest suitability). The technique is based on entropy, a measure of ‘how much choice’ is involved in the selection of an event [44, 45], and is a general-purpose method for characterizing probability distributions from incomplete information. In estimating the probability distribution that defines a species distribution across a study area, MaxEnt formalizes the principle that the estimated distribution must agree with everything that is known (or inferred from the environmental conditions where the species has been observed) but should avoid any assumptions that are not supported by the data [44]. The approach amounts to finding the probability distribution of maximum entropy (the distribution that is most spread out, or closest to uniform) subject to the constraints imposed by the available information on the observed distribution of the species and the environmental conditions across the study area [44]. MaxEnt has been reported as one of the highest-performing SDM methods [7].
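The maximum entropy principle can be illustrated with a toy problem that has a single environmental feature. The sketch below is not the MaxEnt program, which combines several feature types with regularization; it only finds the distribution over grid cells that has maximum entropy subject to one constraint, namely that the expected feature value equals the mean observed at the presence records. Under such a constraint the solution has an exponential form, so only one coefficient has to be found. All values and function names are hypothetical.

```python
import math


def gibbs(features, lam):
    """Distribution over cells proportional to exp(lam * feature)."""
    weights = [math.exp(lam * f) for f in features]
    total = sum(weights)
    return [w / total for w in weights]


def expected(features, probs):
    """Expected feature value under a distribution over cells."""
    return sum(f * p for f, p in zip(features, probs))


def fit_maxent(features, target_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisect on lam until the expected feature value matches target_mean."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The expected value increases monotonically with lam.
        if expected(features, gibbs(features, mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return gibbs(features, (lo + hi) / 2)


# Feature values of five grid cells, scaled to [0, 1] (made up).
cells = [0.0, 0.25, 0.5, 0.75, 1.0]
# Feature values at four hypothetical presence records.
presence = [0.5, 0.75, 0.75, 1.0]
target = sum(presence) / len(presence)

probs = fit_maxent(cells, target)
unconstrained = fit_maxent(cells, 0.5)  # constraint equal to the plain cell average

print([round(p, 3) for p in probs])
```

When the constraint matches the plain average of the cells, the fitted distribution is uniform, which is the "closest to uniform" behaviour described above; a higher target mean shifts probability towards cells with higher feature values.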
We ran 100 models, each trained on a bootstrap sample drawn at random from the occurrence dataset. A prediction map was generated from each model in order to calculate the mean prediction and the standard deviation for each pixel. Model predictions were imported into ArcGIS 10.6 to generate maps of the probability of FAW occurrence in South Kivu.
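The bootstrap summary can be sketched as follows. The function toy_model is a hypothetical stand-in for a single MaxEnt run and is not part of the study; the resampling and the per-pixel mean and standard deviation are the parts being illustrated. The use of the sample standard deviation is an assumption, as are all data values.

```python
import math
import random
import statistics


def toy_model(train_values, grid_values):
    """Hypothetical stand-in for one model run: Gaussian-shaped suitability."""
    mu = statistics.fmean(train_values)
    sigma = statistics.pstdev(train_values) or 1.0  # guard against zero spread
    return [math.exp(-0.5 * ((g - mu) / sigma) ** 2) for g in grid_values]


def bootstrap_predictions(occurrences, grid_values, n_models=100, seed=42):
    """Fit n_models on bootstrap resamples and summarise each pixel."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_models):
        # Resample the occurrence records with replacement.
        sample = rng.choices(occurrences, k=len(occurrences))
        runs.append(toy_model(sample, grid_values))
    per_pixel = list(zip(*runs))  # one tuple of n_models predictions per pixel
    mean_map = [statistics.fmean(v) for v in per_pixel]
    sd_map = [statistics.stdev(v) for v in per_pixel]  # sample standard deviation
    return mean_map, sd_map


# Made-up environmental values at occurrence records and at three pixels.
occurrences = [18.0, 19.0, 20.0, 21.0, 22.0, 20.0, 19.0, 21.0]
grid = [10.0, 20.0, 30.0]
mean_map, sd_map = bootstrap_predictions(occurrences, grid)
print([round(m, 3) for m in mean_map], [round(s, 3) for s in sd_map])
```

The standard deviation map indicates where predictions are sensitive to which records happen to be drawn, which is useful when interpreting the mean map.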
Model evaluation
In this study, the Receiver Operating Characteristic (ROC) curve method was used to assess model performance [11, 40, 42]. One of the parameters used to evaluate the predictive capacity of a model generated by MaxEnt is the area under the ROC curve (AUC). The AUC can be interpreted as the probability that a randomly selected presence point falls in a raster cell with a higher predicted probability of species occurrence than a randomly selected background point [44]. The AUC is an effective threshold-independent index of a model's ability to discriminate presence from absence (or background) locations. In addition, the AUC is reported not to be affected by collinearity and spatiotemporal autocorrelation [11]. The closer the AUC is to 1, the more predictive the model; a random prediction has an AUC of 0.5. The overall AUC value can be used to evaluate the final model: values of 0.5 - 0.7 indicate low accuracy, 0.7 - 0.9 useful applications, and > 0.9 high accuracy [33].
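The probabilistic reading of the AUC can be made concrete by comparing every presence score with every background score. The sketch below counts ties as half a win, which is a common convention and an assumption here; the handling of the exact class boundaries (0.7 and 0.9) is also an assumption, and the scores are made up.

```python
def auc(presence_scores, background_scores):
    """Share of (presence, background) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def interpret(value):
    """Map an AUC value to the accuracy classes quoted in the text."""
    if value > 0.9:
        return "high accuracy"
    if value >= 0.7:
        return "useful applications"
    if value >= 0.5:
        return "low accuracy"
    return "worse than random"


# Made-up suitability scores at presence and background points.
presence = [0.9, 0.8, 0.6, 0.7]
background = [0.1, 0.4, 0.65, 0.3, 0.2]

value = auc(presence, background)
print(value, interpret(value))
```

In this example 19 of the 20 pairs are ranked correctly, giving an AUC of 0.95.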
Assessment of variable contribution
Model evaluation was completed by assessing the contribution of each variable with a jackknife test, which was applied to the climate variables to identify the major contributors to the prediction model. A more detailed evaluation can be carried out during model construction by analyzing the AUC obtained in the different jackknife scenarios: AUC values from models built with a single variable can be compared with those from global models from which one variable has been deliberately removed. The goal is to identify which variables, when added to or removed from the model, change the AUC the most. In this study, the jackknife method was used to analyze the effects of the environmental variables on model results in order to select the dominant factors. Specifically, the process involves 3 independent steps:
- Calculating the training gain for the model with only one variable. A higher training gain indicates that the variable has high predictive power and contributes greatly to the species distribution;
- Calculating the training gain for the model without a specific variable and analyzing the correlation between the removed variable and the omission error. If the removal of an environmental variable leads to a significant increase in the omission error, it indicates that the variable has a significant effect on the model's prediction;
- Calculating the training gain for the model with all variables.
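The three steps above can be sketched generically. In the sketch below, gain is any function that returns a training gain for a list of variable names; toy_gain is a hypothetical additive stand-in with made-up weights and does not represent MaxEnt's gain, and the variable names are illustrative only.

```python
def jackknife(variables, gain):
    """Return the full-model gain and, per variable, the 'only' and 'without' gains."""
    full = gain(list(variables))  # step 3: model with all variables
    report = {}
    for v in variables:
        report[v] = {
            "only": gain([v]),                                   # step 1
            "without": gain([u for u in variables if u != v]),   # step 2
        }
    return full, report


# Hypothetical contribution of each variable to the training gain.
WEIGHTS = {"bio1": 0.6, "bio12": 0.3, "elevation": 0.1}


def toy_gain(subset):
    """Stand-in for fitting a model on `subset` and reading its training gain."""
    return sum(WEIGHTS[name] for name in subset)


full_gain, report = jackknife(list(WEIGHTS), toy_gain)

# Variable that performs best on its own, and variable whose removal hurts most.
best_alone = max(report, key=lambda v: report[v]["only"])
most_missed = min(report, key=lambda v: report[v]["without"])
print(full_gain, best_alone, most_missed)
```

With real models the two rankings need not agree: a variable can be informative on its own yet largely redundant once correlated variables are present, which is why both the "only" and the "without" gains are examined.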