Unless otherwise indicated, all analyses took place in the R coding language (R Core Team 2021).
Collection and Sanitization of Occurrence Data
Occurrence records for Chrysodeixis chalcites were obtained through the Global Biodiversity Information Facility (GBIF 2022). All occurrence records in the dataset without latitude or longitude were removed. Chrysodeixis chalcites most commonly occurs in Europe, the Middle East, and North Africa (Sullivan and Molet 2007, CABI 2022), and so models were trained to occurrence points in this range. Occurrence points in sub-Saharan Africa represent invasive populations that may not have fully occupied the entire landscape available to them violating the assumption of species-environment equilibrium (Guisan and Thuiller 2005), and so were not included. Occurrence points from Asia and the Pacific may be C. eriosoma, sometimes considered to be a separate species from C. chalcites (Murillo et al. 2013), and so these were also excluded from the analysis. Removal outside of the home range left 4312 occurrence points. Duplicate points were then removed, resulting in 2156 records. After this, we stratified the data set by removing all points with duplicate latitudes and longitudes when rounded to a tenth of a degree. This was done to mitigate the effects of collector bias (Phillips et al. 2009). After this cleaning process, the dataset contained 679 occurrence records. 20% of these records (137) were randomly removed from the data set and set aside to assess model accuracy. The remaining 80% (542) were used to train the models (Fig. 1)
Generation of Background/Absence Data
All models used in this study require background/absence points. To mitigate the effects of sampling bias inherent in occurrence data, background data were sampled from within the range of presence data (Phillips et al. 2009). First, k-means spatial clusters were identified (k = 1:15), and the smallest number of clusters was chosen that simultaneously minimized the sum of squared average geographical distance. Circles were generated around each occurrence point in the training data set, with the radius being equal to the average distance among occurrence points within each cluster. Circles were merged into a buffer representing the range of occurrence points, and absence and background points were drawn randomly from within this buffer.
Predictor Variables
Thirty-five environmental variables were obtained from the CliMond Bioclim dataset (Kriticos et al. 2012). GTOPO30 elevation data were also included (USGS-EROS 2018) along with a layer summarizing human impact (HI), as a composite of nine global data layers covering human population pressure (population density), human land use and infrastructure (built-up areas, nighttime lights, land use/land cover), and human access in the form of coastlines, roads, railroads, navigable rivers (Wildlife Conservation Society, 2005). The use of such a human impact layer may be confounding. On one hand, many invasive pests are well adapted to disturbance. However, such a pattern could also be generated by collection practices that are biased toward locations close to human-made structures such as roads and settlements (Kadmon et al. 2004; Moerman and Estabrook 2006; Williams and Lutterschmidt 2006, Phillips et al. 2009; Daru et al. 2018). Therefore, models were trained both with and without HI predictor data.
To avoid model inaccuracy due to collinearity of predictor variables (De Marco et al. 2018) contribution of each variable to overall collinearity was estimated as the Variance Inflation Factor (VIF) (Guisan et al. 2002, 2017). A stepwise algorithm for reducing collinearity was performed by first estimating VIF for all predictor values associated with occurrences, removing the layer contributing most to collinearity, and recalculating VIF, until remaining layers had VIF values less than 10 (Chatterjee and Yilmaz 1992); this was performed using the vifstep function in the ‘usdm’ package (Naimi et al. 2014).
For GAM, the relative effect of each predictor was measured as a z-score, or the number of standard deviations from the mean under the null expectation of no-effect. On the other hand, predictor importance for the machine learning models ME and BRT was measured by relative contribution. These evaluations are clearly meant to establish relative importance of predictors within rather than among trained models. However, for uniformity of presentation, z-scores for the GAM model were converted to relative contribution scores by calculating their contribution to overall variance.
Ecological Niche Models
We employed three models that differed markedly in underlying statistical properties, the Generalized Additive Model (GAM), Maximum Entropy (ME), and Boosted Regression Trees (BRT). The Generalized Additive Model (GAM) was trained using the ‘mgcv’ package (Wood 2022). Maximum Entropy models (ME) were constructed using MaxEnt software (Phillips et al. 2006, 2023), run through the ‘dismo’ package (Hijmans et al. 2023). Booted Regression Tree (BRT) models were constructed using the ‘dismo’ package as well using tree complexity of 5 with a learning rate of 0.001. To train the Generalized Additive Model (GAM), we sampled absence points equal to the number of occurrences used to train the model. For Maximum Entropy (ME) and Boosted Regression Tree (BRT) models, we used 10,000 background/pseudoabsence points. Models were then used to predict probability of presence, called ‘habitat suitability’ here, for the contiguous US.
For GAM, the relative effect of each predictor was measured as a z-score, or the number of standard deviations from the mean under the null expectation of no-effect. On the other hand, predictor importance for the machine learning models ME and BRT was measured by relative contribution. These evaluations are clearly meant to establish relative importance of predictors within rather than among trained models. However, for uniformity of presentation, z-scores for the GAM model were converted to relative contribution scores by calculating their contribution to overall variance.
Model Evaluation and Ensemble Prediction
Accuracy of the models in predicting withheld occurrence data was evaluated using three metrics: Area Under the receiver operating characteristic Curve (AUC), Pearson’s correlation coefficient (COR), and Cohen’s kappa (kappa). Because these three modelling approaches are philosophically and mathematically different, agreement in prediction among them is likely due to correct discrimination between signal and noise. Therefore, we incorporated an ensemble approach (Araujo and New 2007; Stohlgren et al. 2010) to summarize models, by calculating the AUC-weighted average of GAM, ME, and BRT predictions of habitat suitability. User accuracy metrics were also calculated for ensemble predictions.