Information Included in Each Module
EcoNicheS, developed with Shiny Dashboard, provides an open-access platform featuring an intuitive interface for building ecological niche models (ENMs) and species distribution models (SDMs). By streamlining the modeling process, EcoNicheS reduces the complexity inherent in each step. The platform integrates 12 essential R libraries, enabling users to gather species records, develop ensemble models, evaluate model performance, visualize results, and optionally generate ecological connectivity models. The workflow is structured into 12 modules: Environmental Data; Occurrence Processing; Load and Plot Maps (performed twice); Correlation Layers; Background Points and Pseudoabsences; SDM/ENM using the biomod2 library; Partial ROC Analysis; Removal of Urbanization Effects; Calculation of Area and Gains/Losses; Niche Overlap Analysis via ENMTools; and Connectivity Analysis (Fig. 2, Table 1). A user manual with screenshots of each module and its results is available, along with the package code and installation instructions, at https://github.com/armandosunny/EcoNicheS.
Module 1: Environmental Data
This module facilitates the download and processing of the 19 bioclimatic layers from WorldClim using the geodata (Hijmans et al. 2024) and terra (Hijmans 2024) packages for R, with options to obtain global data or country-specific layers. Monthly variable layers, including Tmin, Tmax, precipitation, wind, and vapor pressure (vapr), can also be accessed. Layers are available at four spatial resolutions: 10, 5, 2.5, and 0.5 arc-minutes. Additionally, the platform allows users to upload custom layers containing variables of interest. The downloaded layers can be clipped using an area of interest defined through an interactive map, by country, or by uploading a mask in .shp or .asc format. The clipped layers are then saved in .asc format.
Module 2: Occurrence Processing
The first step in niche modeling is obtaining occurrence records for the target species. EcoNicheS obtains these records in two ways: by querying GBIF with the rgbif (Chamberlain et al. 2024) package for R or by importing user-provided data from a comma-delimited text file (.csv). In both cases, the records are then processed to minimize sampling bias. Bias in SDM/ENM can significantly affect the accuracy and reliability of predictions, particularly when occurrence data are derived from museum collections, heavily sampled areas, easily accessible locations, or, less commonly, systematic sampling (Araujo & Guisan, 2006; Boria et al., 2014; Sillero & Barbosa 2021b). In addition, databases often contain inaccuracies in temporal and geographic information as well as errors in taxonomic identification (Araujo & Guisan, 2006; Boria et al., 2014; Sillero & Barbosa 2021b). This presents a significant challenge in modeling, as spatial bias can lead to inflated performance metrics and potential overfitting (Boria et al., 2014). To improve the quality of the data obtained from GBIF or provided by the user, EcoNicheS uses the CoordinateCleaner package (Zizka et al., 2020) for R. This package allows the identification and removal of temporal and spatial errors, such as records assigned to country or province centroids, urban areas, open oceans, or those containing atypical or invalid coordinates. Additionally, to reduce sampling bias and remove duplicate records, EcoNicheS can select a single occurrence per pixel of the environmental layers (e.g., one record per 1 km²). For spatial filtering of species presence records, the platform uses the spThin package (Aiello-Lammens et al. 2015) in R, which decreases the density of records in heavily sampled areas, thereby reducing spatial bias in the models (Aiello-Lammens et al., 2015).
During this process, users can visualize the points downloaded from GBIF or uploaded by the user on an interactive map and see how many records remain after data cleaning. The resulting cleaned dataset can then be downloaded in .csv format and is ready for use in Module 5 for the addition of pseudoabsences.
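The "one occurrence per pixel" filter described above can be illustrated with a minimal, language-neutral sketch. It is written in plain Python rather than R and is not the EcoNicheS implementation; the function name, the `cell_size_deg` parameter, and the coordinates are hypothetical and chosen only for illustration.

```python
import math

def one_record_per_cell(records, cell_size_deg):
    """Keep the first record that falls in each grid cell.

    records: iterable of (longitude, latitude) pairs in decimal degrees.
    cell_size_deg: cell width in degrees (hypothetical parameter; for
    example, 2.5 arc-minutes is 2.5 / 60 of a degree).
    """
    kept = {}
    for lon, lat in records:
        # Column and row of the cell containing the point, counted from (-180, -90).
        cell = (math.floor((lon + 180.0) / cell_size_deg),
                math.floor((lat + 90.0) / cell_size_deg))
        kept.setdefault(cell, (lon, lat))  # later records in the same cell are dropped
    return list(kept.values())

# Invented coordinates, for illustration only.
records = [
    (-89.51, 17.21),
    (-89.51, 17.21),    # exact duplicate
    (-89.511, 17.211),  # falls in the same 2.5 arc-minute cell as the first record
    (-88.90, 18.05),
]
thinned = one_record_per_cell(records, cell_size_deg=2.5 / 60)
print(len(records), "records before,", len(thinned), "after filtering")
```

Because exact duplicates share a cell, this single pass removes duplicates and reduces local clustering at the same time. Distance-based thinning, as performed by spThin, is a different technique and is not shown here.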
Module 3: Load and Plot Maps
This module is designed for visualization purposes. Users can load files in .asc format for a specific area of interest, view the data with the raster package (Hijmans 2023) for R, and save the resulting map as a .pdf image. It is particularly useful for displaying variables intended for use in constructing SDM/ENMs. Additionally, this module allows zooming into the area of interest for a more detailed examination of the map with the leaflet (Cheng et al. 2024) and leaflet.extras (Gatscha et al. 2024) packages for R.
Module 4: Correlation Layers
Correlations between environmental layers can significantly affect species distribution modeling, potentially leading to multicollinearity (Graham, 2003; Feng, 2019a) or model overfitting to the training data. The inclusion of correlated variables can also result in redundant or overly complex models (Dormann et al., 2013). Therefore, assessing variable collinearity is crucial for reducing the number of layers, enhancing model efficiency, and simplifying interpretation. Various strategies exist to evaluate the acceptable degree of correlation between layers. In this module, users can set a correlation threshold between 0 and 1 before performing correlation analysis. To assess the collinearity of environmental layers, EcoNicheS calculates the Pearson correlation coefficient and displays the resulting correlation matrix as a heatmap, illustrating the degree of collinearity between layers. Additionally, collinearity issues can be identified through variance inflation factor (VIF) analysis, performed with the usdm package (Naimi et al., 2014), which quantifies the increase in the variance of a regression coefficient caused by collinearity. Layers with a VIF greater than 10 are typically excluded from further analysis (Naimi et al., 2014).
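To make the two collinearity diagnostics concrete, the following plain-Python sketch computes a Pearson correlation matrix and derives each layer's VIF as the corresponding diagonal element of the inverse correlation matrix, which is equivalent to 1 / (1 − R²) from regressing that layer on the others. It is an illustration under stated assumptions, not the usdm or EcoNicheS code; the layer values and the 0.7 threshold are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(layers):
    names = list(layers)
    return [[pearson(layers[a], layers[b]) for b in names] for a in names]

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(layers):
    """VIF of each layer: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(layers))
    return {name: inv[i][i] for i, name in enumerate(layers)}

# Invented cell values for three layers, for illustration only.
layers = {
    "bio4": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bio6": [2.0, 1.0, 4.0, 3.0, 5.0],
    "bio15": [2.0, 5.0, 1.0, 3.0, 4.0],
}
threshold = 0.7  # hypothetical user-defined correlation threshold
names = list(layers)
flagged = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
           if abs(pearson(layers[a], layers[b])) > threshold]
vifs = vif(layers)
print("pairs above threshold:", flagged)
print({name: round(value, 2) for name, value in vifs.items()})
```

A layer can show a high VIF even when none of its pairwise correlations exceeds the threshold, which is why the two diagnostics are usually inspected together.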
Module 5: Background Points and Pseudoabsences
In species distribution modeling, pseudoabsences are generated to sample the environmental space of a region (Phillips et al., 2009; Whitford, 2024). Various methods exist for generating these points, the most prevalent being the use of numerous random samples across the study area or the accessible area (M) (Araújo et al., 2019; Valavi et al., 2022) to ensure representation of all environmental conditions within the region of interest (Valavi et al., 2022). The EcoNicheS platform includes a module for generating pseudoabsences using the dismo (Hijmans et al. 2023) package for R, allowing users to create points that are randomly distributed within a user-defined geographical area. It is advisable to select a sufficient number of background points to provide comprehensive coverage of the accessible geographic range of the species (Merow et al., 2013; Franklin, 2023). However, users should be aware that a larger number of background samples will increase computational demands (Valavi et al., 2022). The module also produces a comma-separated values (.csv) file, which can be used for further data manipulation in other R packages or spreadsheet editors. This dataset is formatted for integration with Module 6, where it can be employed in SDM/ENM analysis using the biomod2 (Thuiller et al. 2024) package. The module first shows a basic map of the distribution of the generated pseudoabsences, which can be downloaded as a PDF. Below this, an interactive map allows a visual assessment of the generated database stored in the working directory. This interactive map displays both the original presence points and the added pseudoabsence points. In the .csv database, points are identified by the "response" value associated with each coordinate: a value of 1 denotes presence points, while a value of 0 corresponds to pseudoabsence points.
Prior to running the analysis, users should change the default name of the generated database while retaining the .csv extension, so that the file is created and saved properly.
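As a rough illustration of the output format described above, the sketch below draws random points inside a rectangular extent and labels them with a "response" of 0, alongside presences labeled 1. It is a plain-Python approximation, not the dismo-based routine used by EcoNicheS; the function name, the column names other than "response", the extent, and the coordinates are assumptions.

```python
import csv
import io
import random

def add_pseudoabsences(presences, extent, n_points, seed=0):
    """Return rows (longitude, latitude, response) for presences and random points.

    extent: (xmin, xmax, ymin, ymax) in decimal degrees (hypothetical study area).
    Points are uniform in degrees, which is only approximately uniform in area.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    xmin, xmax, ymin, ymax = extent
    rows = [(lon, lat, 1) for lon, lat in presences]
    for _ in range(n_points):
        rows.append((rng.uniform(xmin, xmax), rng.uniform(ymin, ymax), 0))
    return rows

# Invented presence coordinates and extent, for illustration only.
presences = [(-89.51, 17.21), (-88.90, 18.05)]
extent = (-92.0, -87.0, 15.0, 20.0)
rows = add_pseudoabsences(presences, extent, n_points=5)

# Write the table to an in-memory CSV; a real run would write to a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["longitude", "latitude", "response"])
writer.writerows(rows)
print(buffer.getvalue())
```

This sketch does not exclude pseudoabsences that fall on presence cells or outside a mask; a production workflow would normally sample from the valid cells of an environmental raster instead of a bounding box.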
Module 6: SDM/ENMs via biomod2
A variety of algorithms are available for performing SDM/ENM, each using different methodologies. Ensemble models, which combine multiple algorithms, can enhance predictive accuracy by compensating for the weaknesses of individual approaches. EcoNicheS facilitates the creation, calibration, and evaluation of these ensemble models by streamlining the process through an intuitive interface. The tool employs the biomod2 platform, which supports species distribution modeling via 12 correlative modeling techniques: generalized linear models (GLM), generalized additive models (GAM), generalized boosting models (often called boosted regression trees; GBM), classification tree analysis (CTA), artificial neural networks (ANN), surface range envelope or BIOCLIM (SRE), flexible discriminant analysis (FDA), multiple adaptive regression splines (MARS), random forest (RF), maximum entropy (MAXENT, using the MIAmaxent R package; Vollering et al. 2019), MAXNET, and eXtreme Gradient Boosting (XGBOOST) (Huang et al., 2023). Users can select multiple techniques to execute analyses, depending on the specific problem and available data, as each technique has its own strengths and weaknesses. EcoNicheS also incorporates model evaluation metrics available in biomod2, offering two types of analysis: goodness-of-fit (e.g., ANOVA, AIC) and model accuracy, which includes calculating metrics such as the area under the curve (AUC), Cohen's kappa, and the true skill statistic (TSS). The TSS is defined as sensitivity + specificity − 1, where sensitivity and specificity are calculated at the probability threshold that maximizes their sum. TSS values range from −1 to +1, with values close to +1 indicating perfect agreement between the observations and predictions, while values of 0 or lower suggest that the model performance is no better than random (Allouche et al., 2006; Franklin, 2009).
Similarly, the kappa statistic ranges from −1 to +1, where +1 represents perfect agreement and values of zero or lower indicate performance equivalent to random chance (Thuiller et al., 2009). At the end of the process, users can obtain graphical representations of each model's response curves, evaluation metrics, and results. Additionally, the module generates maps in .tiff format and deposits them in a designated folder, allowing easy access to the necessary files (Fig. 3) for the subsequent steps in the workflow (Fig. 2).
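The two accuracy metrics can be written out explicitly. The sketch below uses the standard definitions, TSS = sensitivity + specificity − 1 (Allouche et al., 2006) and Cohen's kappa = (observed agreement − expected agreement) / (1 − expected agreement), and picks the threshold that maximizes TSS. It is plain Python for illustration, not biomod2 code, and the observations and scores are invented.

```python
def confusion(observed, predicted, threshold):
    """Counts (tp, fp, fn, tn) when scores >= threshold are called presences."""
    tp = fp = fn = tn = 0
    for obs, score in zip(observed, predicted):
        called_present = score >= threshold
        if obs and called_present:
            tp += 1
        elif obs:
            fn += 1
        elif called_present:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def tss(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0

def kappa(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed_agreement - expected_agreement) / (1.0 - expected_agreement)

def best_threshold(observed, predicted):
    """Threshold (among the predicted scores) that maximizes TSS."""
    candidates = sorted(set(predicted))
    return max(candidates, key=lambda t: tss(*confusion(observed, predicted, t)))

# Invented data: 1 = presence, 0 = pseudoabsence, with model suitability scores.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.2, 0.1]

threshold = best_threshold(observed, predicted)
counts = confusion(observed, predicted, threshold)
print("threshold:", threshold)
print("TSS:", tss(*counts), "kappa:", kappa(*counts))
```

The sketch assumes that both presences and absences occur in the data; otherwise sensitivity or specificity is undefined and the divisions fail.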
Module 7: Load and Plot Maps
This module enables the visualization of potential distribution maps generated in the EcoNicheS biomod2 module. Users can save these visualizations as images in .pdf format. It is also useful for displaying the environmental variables used in the creation of SDMs/ENMs. Additionally, this module includes a zoom function, allowing users to focus on specific areas of interest for a more detailed examination of the map.
Module 8: Partial ROC Analysis
The receiver operating characteristic (ROC) curve has long been the gold standard for evaluating model performance in species distribution modeling, primarily due to its intuitive interpretation: it indicates the probability that the model ranks a random presence higher than a random absence. However, when reliable absence data are unavailable, the probabilistic interpretation of area under the curve (AUC) values becomes problematic, necessitating a reconsideration of model comparisons in species distribution modeling (Jiménez and Soberón, 2020). Partial ROC analysis, a threshold-independent evaluation proposed by Peterson et al. (2008), addresses this issue by focusing on specific subsectors of the ROC space through the setting of an acceptable omission error threshold (E). The pROC method evaluates the relationship between the omission error for independent points and the proportion of the area predicted to be suitable for the species, but this approach is valid only under conditions of low omission error. The AUC ratios (the partial AUC divided by random expectations) range from 0 to 2, with a value of 1 indicating random performance (Peterson, 2012; Peterson et al., 2008). This approach enhances model evaluation by avoiding artificially low AUC values, particularly in situations involving presence-only data where conventional ROC analysis and AUC interpretation can be misleading. EcoNicheS facilitates model performance assessment by calculating the partial ROC curve using the pROC (Robin et al. 2011) and ntbox (Osorio-Olvera et al., 2020) packages. This module allows users to supply a raster of continuous values and compare it with species presence points following the method proposed by Peterson et al. (2008).
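The AUC ratio can be sketched as follows, assuming the common formulation in which the x-axis is the proportion of the area predicted suitable, the y-axis is sensitivity on independent presence points, and only the part of the curve with sensitivity of at least 1 − E is retained. This plain-Python version is a simplified illustration without the bootstrap resampling that ntbox performs; the function name and the data are hypothetical.

```python
def partial_roc_ratio(area_scores, presence_scores, omission=0.05):
    """Partial AUC of the model divided by the partial AUC of the line y = x.

    area_scores: suitability of every cell in the study area.
    presence_scores: suitability at independent presence points.
    omission: admissible omission error E; only thresholds with
    sensitivity >= 1 - E contribute to the areas.
    """
    n_area, n_pres = len(area_scores), len(presence_scores)
    curve = []
    # Descending thresholds make the predicted area grow from left to right.
    for t in sorted(set(area_scores), reverse=True):
        frac_area = sum(s >= t for s in area_scores) / n_area
        sensitivity = sum(s >= t for s in presence_scores) / n_pres
        if sensitivity >= 1.0 - omission:
            curve.append((frac_area, sensitivity))
    if len(curve) < 2:
        raise ValueError("too few thresholds meet the omission criterion")
    model_auc = null_auc = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        width = x1 - x0
        model_auc += width * (y0 + y1) / 2.0  # trapezoid under the model curve
        null_auc += width * (x0 + x1) / 2.0   # trapezoid under the random line
    return model_auc / null_auc

# Invented suitability surface with 100 cells and values 0.01 ... 1.00.
area_scores = [i / 100 for i in range(1, 101)]

# Presences spread exactly like the landscape: no better than random.
ratio_random = partial_roc_ratio(area_scores, area_scores)
# Presences concentrated in the most suitable cells: better than random.
ratio_good = partial_roc_ratio(area_scores, [0.90, 0.95, 1.00])
print(ratio_random, ratio_good)
```

In practice the ratio is computed on many bootstrap replicates of the presence points and the proportion of replicates with a ratio at or below 1 is used as a significance test; that step is omitted here.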
Module 8: Remove Urbanization
This module allows users to exclude urbanized areas, or other features of interest, from the species distribution model. This is done with the raster package, using a raster layer containing the urbanized areas within the region of interest. Users can thus quantify the distribution area or ecological niche of a species while disregarding urban zones with potentially adverse environments. The module also allows users to visualize the generated map and save it in .asc format.
Module 9: Calculate the Area
This module calculates the area of suitability (in km²) for the species of interest using the raster package. Users can upload the .asc file created in the previous module or use an existing file. The suitability threshold value can be adjusted according to the specific requirements of the study. The output consists of the area of suitable habitat, displayed in the main panel of the tab.
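For a raster in geographic coordinates, the area calculation amounts to summing the areas of the cells at or above the threshold, taking into account that cells shrink toward the poles. The plain-Python sketch below uses a spherical Earth and is an assumption about how such a calculation can be done, not a description of the raster package internals; the grid, the extent, and the threshold are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation

def cell_area_km2(lat_south, lat_north, width_deg):
    """Area of a longitude/latitude cell on a sphere, in square kilometres."""
    return (EARTH_RADIUS_KM ** 2
            * math.radians(width_deg)
            * (math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))))

def suitable_area_km2(grid, threshold, lat_top, cell_size_deg):
    """Sum the area of cells whose suitability is at or above the threshold.

    grid: list of rows ordered from north to south; None marks NoData cells.
    lat_top: latitude of the northern edge of the first row.
    """
    total = 0.0
    for i, row in enumerate(grid):
        north = lat_top - i * cell_size_deg
        south = north - cell_size_deg
        n_suitable = sum(1 for v in row if v is not None and v >= threshold)
        total += n_suitable * cell_area_km2(south, north, cell_size_deg)
    return total

# Invented 2 x 3 suitability grid with 0.5 degree cells, northern edge at 18 N.
grid = [
    [0.8, 0.2, None],
    [0.6, 0.7, 0.1],
]
area = suitable_area_km2(grid, threshold=0.5, lat_top=18.0, cell_size_deg=0.5)
print(round(area, 1), "km2 at or above the threshold")
```

For rasters in a projected coordinate system with equal-area cells, the calculation reduces to the number of suitable cells multiplied by the cell area.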
Module 10: Gains and Losses Plot
This module requires two raster files in .asc format. The first file represents the geographic data or environmental characteristics of the area of interest, while the second file contains predictions for future landscape conditions or another map of interest. Using the raster package, the module enables users to visualize areas of change, allowing analysis of potential gains or losses in species distribution between two time points. Users can also download the generated maps for further use.
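A gains-and-losses comparison can be sketched as a cell-by-cell classification of two aligned suitability grids after applying a threshold. The plain-Python example below is a minimal illustration with hypothetical class labels, grids, and threshold; it is not the EcoNicheS code and assumes that both rasters share the same extent and resolution.

```python
from collections import Counter

def classify_change(first, second, threshold):
    """Label each cell as 'stable', 'loss', 'gain', or 'unsuitable'.

    first, second: aligned grids (lists of rows); None marks NoData cells,
    which stay None in the output.
    """
    result = []
    for row_first, row_second in zip(first, second):
        out = []
        for a, b in zip(row_first, row_second):
            if a is None or b is None:
                out.append(None)
                continue
            suitable_first, suitable_second = a >= threshold, b >= threshold
            if suitable_first and suitable_second:
                out.append("stable")
            elif suitable_first:
                out.append("loss")        # suitable only in the first map
            elif suitable_second:
                out.append("gain")        # suitable only in the second map
            else:
                out.append("unsuitable")
        result.append(out)
    return result

# Invented current and future suitability grids, for illustration only.
current = [[0.9, 0.8, 0.1],
           [0.7, 0.2, None]]
future = [[0.9, 0.3, 0.6],
          [0.1, 0.2, 0.4]]

changes = classify_change(current, future, threshold=0.5)
summary = Counter(label for row in changes for label in row if label is not None)
print(changes)
print(dict(summary))
```

Cell counts per class can be converted to areas by multiplying by the cell area, which links this comparison to the area calculation of the previous module.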
Module 11: Niche Overlap Analysis via ENMTools
Niche overlap occurs when coexisting species share portions of their ecological niche. According to ecological and evolutionary displacement models, species should exhibit less niche overlap due to competition (Schoener, 1974). Low niche overlap, indicating differential resource utilization, is crucial for the coexistence of syntopic species and promotes biodiversity (Pianka, 1974). EcoNicheS utilizes the ENMTools package (Warren & Dinnage 2024) to quantify the similarity between the niches of two species. It employs measures such as Schoener’s D (1968) and Hellinger’s distance-based metric ‘I’ (Warren et al., 2008). Additionally, EcoNicheS reports the newer metrics env.I, env.D, and env.cor (Warren et al. 2021), which are calculated in the n-dimensional space of all combinations of environmental variables instead of restricting the measures of model similarity to those sets of conditions that appear in the training region. These metrics compare habitat suitability estimates across grid cells using Maxent-generated SDMs/ENMs. The SDMs/ENMs are normalized so that the suitability scores sum to 1 within the geographic space. Additionally, ENMTools provides other similarity measures, including Spearman’s rank correlation coefficient, and enables hypothesis testing related to equivalence, similarity, and environmental barriers to species distributions using linear, blob, and ribbon rangebreak tests (Glor and Warren, 2011). Linear and blob tests are two versions of a test that assesses whether the geographic regions occupied by two species are more environmentally different than expected by chance. The ribbon test, in turn, is designed to test whether the ranges of two species are divided by a region that is relatively unsuitable for one or both forms.
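The two geographic overlap metrics have simple closed forms once each suitability surface is normalized to sum to 1: Schoener's D = 1 − ½ Σ|p1 − p2| and I = 1 − ½ Σ(√p1 − √p2)², following the corrected formula of Warren et al. (2008). The plain-Python sketch below illustrates them on invented values; it is not ENMTools code and does not cover env.D, env.I, or env.cor.

```python
import math

def normalize(suitability):
    """Rescale suitability scores so that they sum to 1 over all cells."""
    total = sum(suitability)
    return [v / total for v in suitability]

def schoener_d(p1, p2):
    """Schoener's D for two normalized surfaces (0 = no overlap, 1 = identical)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))

def hellinger_i(p1, p2):
    """Hellinger-based I for two normalized surfaces (0 = no overlap, 1 = identical)."""
    h_squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p1, p2))
    return 1.0 - 0.5 * h_squared

# Invented suitability scores for the same four grid cells.
species_a = normalize([1.0, 4.0, 3.0, 2.0])
species_b = normalize([4.0, 1.0, 2.0, 3.0])

d_value = schoener_d(species_a, species_b)
i_value = hellinger_i(species_a, species_b)
print("D:", d_value, "I:", i_value)
```

Both metrics require that the two surfaces cover exactly the same cells in the same order, so rasters must be aligned and NoData cells removed consistently before the comparison.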
Module 12: Ecological Connectivity
Landscape connectivity refers to the degree to which an environment facilitates or impedes the movement of organisms between different locations (Taylor et al., 1993; Tischendorf and Fahrig, 2000). In EcoNicheS, users can generate ecological connectivity models by creating flow maps. The connectivity analysis uses habitat suitability as a resistance surface and employs conductance analysis through the gdistance package (van Etten, 2017). The module requires species occurrence data and a resistance raster, which may contain species distribution model data or landscape elements with resistance values. The resulting flow maps combine least-cost path and Circuitscape-type analyses to evaluate potential connectivity (Baecher, 2024), identifying the shortest (least-cost) paths between locations within the area of interest.
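The least-cost component of such an analysis can be illustrated with Dijkstra's algorithm on a small resistance grid. The plain-Python sketch below is not gdistance code and does not implement the circuit-theoretic (Circuitscape-type) part; the four-neighbour rule, the use of the mean resistance of two adjacent cells as the step cost, and the grid values are assumptions made for illustration.

```python
import heapq

def least_cost_path(resistance, start, goal):
    """Dijkstra's algorithm over a 4-neighbour grid.

    resistance: list of rows with positive resistance values.
    start, goal: (row, column) tuples.
    The cost of a step is the mean resistance of the two cells it connects.
    Returns (total_cost, path) with the path as a list of cells.
    """
    n_rows, n_cols = len(resistance), len(resistance[0])
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if cost > best[cell]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                step = (resistance[r][c] + resistance[nr][nc]) / 2.0
                new_cost = cost + step
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    if goal not in best:
        raise ValueError("goal is unreachable")
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]

# Invented resistance grid: the middle column is costly except in the last row.
resistance = [
    [1.0, 9.0, 1.0],
    [1.0, 9.0, 1.0],
    [1.0, 1.0, 1.0],
]
cost, path = least_cost_path(resistance, start=(0, 0), goal=(0, 2))
print(cost, path)
```

In this example the cheapest route goes around the high-resistance cells rather than straight across them. When a suitability map is used as input, it first has to be converted to resistance, for example by inverting its values as done in the case study.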
Empirical example: the distribution and connectivity of Tapirus bairdii in the Selva Maya.
We present an example of implementing EcoNicheS through a case study on the Central American tapir (Tapirus bairdii) in the Selva Maya, a concept developed by ecological actors to conserve tropical rainforests in the border region of Mexico, Guatemala, and Belize (Laako et al., 2022). We modeled the species distribution in this area, demonstrating the effectiveness of EcoNicheS in processing bioclimatic layers and occurrence records, building the model, and evaluating it. Additionally, we analyzed the ecological niche overlap between T. bairdii and Tayassu pecari, both of which have experienced significant population declines due to overhunting and deforestation (Reyna-Hurtado et al., 2008; Falconi-Briones et al. 2022). To construct the SDM/ENM, we gathered occurrence data from GBIF and supplemented them with our own records. The dataset was cleaned using the "Clean My Own database" module of EcoNicheS, where we removed occurrences located within 5 km of each other, a distance consistent with the movement range of the Central American tapir in the region (Reyna-Hurtado et al. 2012; Rivero et al. 2022). This process resulted in a total of 356 occurrence points. Next, we assessed the correlations among the environmental layers using the "Correlation layers" module, obtaining Pearson correlation values and the variance inflation factor (VIF). We excluded environmental layers with a VIF greater than 8, and the final layers used for modeling were bio3 (isothermality), bio4 (temperature seasonality), bio6 (minimum temperature of the coldest month), bio7 (annual temperature range), bio9 (mean temperature of the driest quarter), bio15 (precipitation seasonality), and bio18 (precipitation of the warmest quarter). Subsequently, we employed the "Points and pseudoabsences" module to generate 100,000 background points, enhancing coverage of the study area (Valavi et al. 2022).
The SDM/ENM was developed using the biomod2 module. Model calibration was conducted using the block strategy, and evaluation metrics included kappa, TSS, and ROC, with four repetitions and a model selection threshold of 0.4 (Fig. 4A). The ensemble model combined the Maxent, GLM, and RF algorithms (Fig. 4B, 4C, 4D). We also explored the niche overlap between Tapirus bairdii and Tayassu pecari using the "Niche Overlap Analysis" module, which relies on ENMTools (Fig. 5). Hypothesis tests for niche identity or equivalence were performed following Warren et al. (2021). These included three hypothesis tests (niche identity and the symmetric and asymmetric background similarity tests), along with the rangebreak test (Fig. 6). The tests were conducted using Maxent with four replicates. Additionally, we assessed the functional connectivity of the Central American tapir (Fig. 7) using the "Ecological Connectivity" module. For this analysis, we took the species distribution model previously generated with biomod2 and inverted the raster values with the QGIS 3.22 "invert raster" tool. The connectivity model was constructed using 80% of the occurrence data for training and the remaining 20% for testing. The resulting raster was processed in QGIS to visualize the connectivity between protected areas (PAs) within the region for the Central American tapir. Additional examples of EcoNicheS applications include several articles published during its early development stages (Sunny et al. 2023; Martinez-Martinez et al. 2024; Rubio-Blanco et al. 2024).