2.1 Study Area
Our research was located in the tidal marshes of the Fraser River Estuary, British Columbia, Canada. At over 21,000 hectares, the Fraser is one of the largest estuaries in Pacific North America, and the largest of western Canada. Widely recognized for its ecological importance, the FRE historically supported the largest annual salmon run of any river in North America, with multiple species using highly-productive estuarine tidal marshes for their juvenile life stages (Chalifour et al. 2019). Globally recognized as an Important Bird and Biodiversity Area (IBA), the FRE hosts over 300 species of birds, the highest concentration of migratory birds in Canada, including over 600,000 Western Sandpipers and 200,000 Dunlins annually, and over 100 plants and animals designated as “at risk” (Kehoe et al. 2020). The FRE has also been assigned the term “Living Working River” for its high economic and social value that predates European arrival. For millennia, First Nations have traded, resided, and been sustained by the abundant resources of the FRE, which range from fisheries to waterfowl, to extensive cultivation of marsh plants for both food and technology. The modern economic value of the FRE is also significant. It contains the largest port by export tonnage in North America, an estimated $50 billion CAD worth of human development, and fertile agricultural land that generates $1.6 billion CAD annually (Richmond Chamber of Commerce 2014). In addition, it supports the most populous metropolitan area in western Canada, with over 2.5 million residents in Greater Vancouver, 300,000 of which reside on the floodplain.
As a result of urban and agricultural development, the FRE has experienced wetland losses of 70-90% since European settlement (Hoos & Packman 1974; Boyle 1997). Remnant marsh habitats, approximately 70% of which are now protected by governments and NGOs, continue to be degraded by a multitude of historical and emerging threats, including invasive plants, sea level rise (Bornhold et al. 2008; Kirwan & Murray 2008), climate change (Taylor 2004), habitat loss (Balke 2017), habitat fragmentation, and excessive waterfowl herbivory (Demarchi 2006). These threats are likely to compound, because the human population in the region is projected to increase at a 1.4% annual rate until 2041 (Ip & Lavoie 2019). Research that investigates and improves our understanding of these threats is therefore critical for conserving and restoring these important remnant habitats.
2.2 Spectral Analysis
Imagery for our analysis was provided by the Vancouver Fraser Port Authority (VFPA) and GeoBC. The datasets were identical in their spatial (10 cm) and spectral (8bit RGB stereo) resolutions, however VFPA imagery was taken 19-24 April 2018 while GeoBC imagery was taken in summer of 2016. Both imagery datasets were acquired at low tide using manned aerial vehicles. Before analyzing the imagery, we catalogued the dominant land cover types of the estuary, which varied from monoculture-forming herbaceous species (e.g. Phalaris arundinacea, Carex lyngbyei) and shrubs (e.g. Myrica gale) to exposed substrates and log debris accumulations. Training polygons of each land cover type were manually drawn in ArcMap (Esri Inc. 2020) using 2019 field data and notes from our previous work throughout the study area. These ground-verified data were used as training sites to calculate the unique spectral signature of each land cover class. Training data varied from 55 - 1,925 pixels per class (x̃ = 663.6). Low training sizes were infrequent, occurring only in instances where verified training sites were lacking, but deemed important to include. High training sizes reflect instances where larger samples were necessary for effectively distinguishing similar classes. Separate classes and training polygons were created for each imagery dataset due to temporal, technological, and spatial differences, with 21 classes in the GeoBC dataset, and 31 in the VFPA respectively.
To enhance processing performance, we removed all non-suitable habitat from the analysis prior to classification. Non-suitable habitat was defined as (1) terrestrial habitats above the high-tide mark, (2) open water, including significant tidal channels (3) tidal mudflats, (4) areas shaded/covered by nearby woody plants, and (5) anthropogenic structures (e.g. riprap, docks, pavement). Remaining imagery was classified in ArcMap using a supervised Bayes Maximum Likelihood Classification, a probabilistic approach that characterizes each land cover class by their mean vector and covariance matrix, and then assigns each cell to the class to which it has the greatest likelihood of being a member (Richards 2013). Classified rasters were aggregated from 0.1 to 1.0 m pixel size using median class values, as the high spatial resolution of the imagery, combined with the effects of varying stem densities, substrates and shadows, produced an undesirable amount of noise in our outputs.
The GeoBC dataset was more effective at classifying vegetation in much of the study area, likely because it was taken later in the growing season when aboveground biomass was fully developed. We therefore used it wherever possible, and used the VFPA imagery in the outer estuary where GeoBC imagery was not available. Non-native T. angustifolia and T. x glauca often had overlapping spectral signatures in our exploratory analyses, perhaps due to similarities in their respective stem densities and flower abundances. To avoid confusion, the two taxa were combined as “non-native cattail” for the final classification. Native T. latifolia was rarely confused with non-native Typha, likely due to factors such as leaf colour, fewer flowering plants (Grace & Wetzel 1982), and physical differences, as T. latifolia rarely achieves densities that are comparable to its exotic congeners in the FRE.
Classification accuracy was determined through post-analysis field verification conducted between 14 November 2019 and 3 January 2020. We divided the study area into 500 m x 500 m grid cells, equalling a total of 486 cells, and randomly selected 30 for field verification, with 8 occurring in the lower estuary classified dataset and 22 in the upper estuary. The disparity of grid cells between datasets reflected the marsh being less abundant in the upper estuary (some grid cells contained no marsh), so more grid cells had to be sampled to verify a comparable amount of habitat between datasets. Field verification included visiting all 30 grid cells, and mapping all non-native cattail. As a hybrid, T. x glauca possesses many intermediate traits that confound its field identification, particularly with T. angustifolia. Molecular analyses and microscopic traits, such as stigma width (Kuehn & White 1999) and pollen shape (Finkelstein 2003) have been used to definitively identify the hybrid, but a combination of macroscopic traits can also be accurate (Smith 1967; Tompkins & Taylor 1983; Grace & Harrison 1986; Kuehn & White 1999). To assist in our cattail identification, we created a multi-access key of macroscopic traits, based on available literature (see Supplementary Material). We then evaluated the accuracy of this key by testing it on several plants of each presumed taxa in July 2019. These identifications were then compared to pollen collected from the same plants, and the key was accurate in all instances.
The accuracy assessment focused on two simplified classes: (1) non-native cattail and (2) other. A total of 200 random-stratified assessment points were generated within the grid cells of each dataset (100 per class), and at each point the classified value (i.e. non-native cattail, Y/N), was compared to field-verified data using a confusion matrix (Congalton 1991). Confusion matrix outputs included the overall accuracy (fraction of correct values assigned to a class), commission errors (fraction of incorrect values assigned to a class), omission errors (fraction of values that were incorrectly assigned to a different class) and Kappa Coefficient, a statistic used in classifications to compare observed accuracy with expected accuracy, thereby accounting for random chance (Cohen 1960).
2.3 Species Distribution Model
To investigate the potential distribution of non-native cattail in the FRE we constructed an SDM that predicts habitat suitability (i.e. probability of establishment and persistence given environmental conditions) and habitat susceptibility (i.e. probability of colonization given propagule pressure and habitat suitability; Martin et al. 2015). This model accounted for introduction, establishment and persistence requirements using a combination of distance-based probabilities and environmental data (Fig. 1).
Predictions were restricted to tidal marsh habitats from the Port Mann Bridge west to the estuary mouth. Marsh habitats were delineated using the BC Land Use spatial layer provided by GeoBC for high water boundaries. Low water boundaries were based on 0.2 m ASL elevation using a Digital Elevation Model (DEM) data (see 2.3.1 for specifications), an elevation determined in our baseline surveys as generally demarking the lower limits of marsh vegetation. Though included in our spectral analysis, a portion of Roberts Bank Wildlife Management Area, located in the southwest corner of the estuary, was excluded from predictions due to missing elevation data. Given that estuarine marshes are highly heterogenous with environmental changes occuring along minute spatial scales, we ran our predictive models at 1.0 m2 resolution, the highest resolution available based on the specifications of our predictor data. The result was a modeled marsh area of 12.3 km2, which at a resolution of 1.0 m2 represented >12 million data points.
2.3.1 Habitat Suitability
Habitat suitability models are dependent on presence or abundance data that can be related to various environmental predictors; accurate presence data are critical. Thus, we refrained from using the outputs of our spectral analysis (see 2.2) as a proxy for presence data, as it would introduce error into our suitability predictions. Verified occurrence data for non-native cattail were not available for our study area from any external source, so we used our personal observations from >5 years of field experience in the FRE. Given that many remote areas of the estuary were visited over this period, and many marsh habitats were surveyed systematically for this and other studies (Lievesley et al. 2016), we concluded that any effects of spatial bias were marginal. The high resolution of our model resulted in 344,846 presence data points produced from our 239 verified polygons. To improve processing performance we selected a randomized subsample of 100,000 data points for use in the model, after comparing the processing time and model performances of models that included 1,000, 10,000, 100,000 and all data points.
Though presence data were ascertained from our personal field records, accurate and spatially-unbiased absence points were unavailable without additional surveys. As an alternative, we created and utilized background data, which characterizes the environmental conditions of the modeled area, regardless of whether cattail is present or not (Phillips et al. 2009). Because identically-sized datasets are optimal for regression and machine learning models (Barbet-Massin et al. 2012), we randomly selected 100,000 data points within our study area to match our presence data sub-sample.
The distribution and abundance of estuarine plants are driven by many interdependent factors including elevation, proximity to channel margins salinity, available nutrients, soil sulfide concentration, sediment grain size, organic content, competition, facilitation, grazing, disturbance, tides, climate, and atmospheric CO2 concentrations (Adams 1963; Hutchinson 1982; Snow & Vince 1984; Vince & Snow 1984; Bertness & Leonard 1997; Zedler et al. 1999; Sanderson et al. 2000; Crain et al. 2004; Sullivan et al. 2010; Hood 2013). For this study we were limited to datasets that were deemed relevant to Typha distributions and freely available, so our predictors were elevation, proximity to nearest channel, and percent sand.
Elevation is considered a powerful predictor of estuarine marsh plant distributions, including T. angustifolia (Hood 2013), likely due to correlations with a host of factors including inundation duration and frequency, and edaphic properties such as soil salinity. Elevation data were comprised of a 1 m gridded bare earth Digital Elevation Model (DEM) provided by GeoBC from their National Disaster Mitigation Program dataset, generated from Lidar Data acquired during low tides between June - September 2016 (12 points/m2, vertical accuracy +/- 15 cm RMS).
Tidal channels also have a significant influence on marsh vegetation due to edaphic correlations (Zedler et al. 1999; Sanderson et al. 2000), and likely other mechanisms such as disturbance and seed dispersal. No channel dataset was accessible at the time of this study, so we used our DEM to designate all pixels with elevations below 0.2 m ASL as channels, an elevation deemed to be the lower limit of emergent marsh vegetation based on our preliminary surveys. Proximity to nearest channel was then calculated in ArcMap using the Euclidean Distance Tool, resulting in a Euclidean distance raster where each 1 m2 pixel was assigned a distance value to the nearest channel.
Percent sand data were acquired through National Resources Canada, and were based on data collation and interpretation by Barrie & Currie (2000) for the Geological Survey of Canada. At 50 m2 resolution, their interpretations were based on textural analyses of over 1500 surficial grab samples collected from throughout the river and delta front, as well as geophysical and core data provided by Hart & Barrie (1995) and Hart et al. (1998).
Before selecting our final models, we evaluated the results of several approaches including profile, regression, machine learning, presence-only, and presence-absence models in RStudio v. 4.0 (R Core Team 2020). The predictive performance of models were evaluated using a five-fold verification process (Fielding & Bell 1997), which partitioned the presence and background data into five sets, one of which was used for training while the remainder were combined for testing purposes for each model run. Each model was run five times so that each partition was used as training data once, and the Area Under the Curve (AUC) values of each model was averaged across the five runs to produce a mean AUC value with variance. Area Under the Curve is frequently used to evaluate SDMs, and measures the ability of a model to discriminate between sites where a species is present versus where it is absent on a 0 to 1 scale, with 1 being perfect and 0.5 equal to random (Hanley & McNeil 1982; Elith et al. 2006). To increase accuracy, we excluded models whose minimum mean Area Under the Curve (AUC) fell below a threshold of 0.8. The result was the inclusion of three modelling methods: Random Forest, Maxent, and Support Vector, which were run using randomForest (version 4.6.14; Liaw & Wiener 2002), dismo (version 1.14; Hijmans et al. 2017), and kernlab (version 0.9.29; Karatzoglou et al. 2004) packages. Maxent was run using default settings, while the number of trees in our Random Forest model was reduced from 500 to 200 to improve processing speeds, as out of bag error stabilized around 200 trees. A radial basis kernel (Gaussian) was selected in our Support Vector model due to its widespread use in literature, and it produced the highest mean AUC value among all kernel options. Finally, we combined the predictions of the methods with AUC values above the threshold into an ensemble model weighted by their AUC values.
2.3.2 Habitat Susceptibility
We defined susceptibility as the risk of marsh habitat being colonized by non-native cattail within 25 years, accounting for both propagule pressure and habitat suitability. The duration of 25 years was selected because greater durations were likely to diminish prediction effectiveness, due to the highly-fecund nature of cattail (i.e. seeds would likely reach most habitats given enough time) and due to the unpredictable environmental changes expected to occur this century, such as sea level rise (Bornhold et al. 2008; Kirwan & Murray 2008), altered river flows due to climate change (Taylor 2004), and chronic herbivory by overabundant waterfowl (Demarchi 2006). Susceptibility was calculated by multiplying the estimated probability of propagule introduction in 25 years with the probability outputs of our habitat suitability model (Fig. 1; see 2.3.1).
The probability of propagule introduction was estimated based on proximity to nearest non-native cattail occurrence, and non-native cattail density in nearby marsh. Proximity was calculated using a cost distance analysis in ArcMap, a variation of Euclidean distance that calculates the least-costly path between a pixel and an object (i.e.,.,. known cattail occurrences), accounting for both the actual distance and any associated costs of travelling through a heterogenous landscape. We based our distance calculations on the locations of non-native cattail determined in our spectral analysis (see 2.2). To account for spatial differences in seed dispersal effectiveness, we generated a cost raster based on land cover types, assigning cost values ranging from 1 for marsh (no added cost) to 5 for developed terrestrial areas (5 x cost; Table 1). These cost values accounted for both water and air-dispersed propagules and factors that inhibit their dispersal in each land cover type. Distances were then calculated for each pixel of our study area, ranging from 0 – 3114 least cost meters (i.e. cost-adjusted euclidean distance to nearest cattail).
Table 1 Land cover types and their associated distance costs used for estimating non-native cattail propagule dispersal probabilities. A cost value of 1 is equal to the actual (Euclidean) distance.
Land Cover Type
|
Cost
|
Cost Value Rationale
|
Marsh
|
1
|
Seeds and plant fragments freely move via wind and water. Sheltered from excessive wind and river flows.
|
Open Water
|
3
|
Seeds and plant fragments freely move via wind and water, but are subjected to deleterious effects of river flows.
|
Agriculture Fields
|
4
|
Waterborne plant fragments and seeds excluded. Airborne seeds freely move, with no tall structures or forests present. Agricultural ditches may transport seeds to suitable habitats.
|
Developed Areas
|
5
|
Waterborne plant fragments and seeds excluded. Airborne seeds potentially inhibited by developed landscape including tall buildings and tall vegetation.
|
To translate cost distances to probabilities we considered two potential pathways of expansion: clonal growth of existing patches, and dispersal through seeds and plant fragments. Clonal growth rates vary in the literature from 0.76 m/year (Bansal et al. 2019) to patch diameter increases of 4 m/year (Boers & Zedler 2008), and are likely driven by environmental variables such as inundation, and available nutrients (Woo & Zedler 2002). To obtain an annual clonal growth rate specific to conditions in the FRE we compared the size of eleven verified patches over time using historical imagery. Lateral growth rates varied from 0.53 – 1.52 m/year and averaged 1 m/year (x̅ = 1.05, SD = 0.36).
Seed recruitment is not considered the primary means of reproduction by T. x glauca during early site colonization (Travis et al. 2010, 2011), and it is often considered a partially sterile hybrid (Smith 1967; Kuehn et al. 1999). This may be the case in the FRE, as pistillate flowers have been observed to contain very few or no seeds, and advanced hybrid generations are yet to be molecularly confirmed. However, in their study of cattail clones, Travis et al. (2011) observed that T. x glauca stands grew more genetically diverse over time, suggesting seedling recruitment of backcrossed and advanced-generation hybrids could supersede vegetative recruitment over time. Though the reproductive status of T. x glauca remains uncertain, the presence of T. angustifolia facilitates dispersal of both viable T. angustifolia seed and where T. latifolia is sympatric, viable F1 hybrids seeds, as evidenced by the abundance of these taxa in spatially-isolated habitats in and near the FRE. For these reasons, we still considered seed dispersal to be an important consideration for susceptibility.
Based on the above considerations, we assigned the maximum probability value to any pixel that was < 250 least cost meters to non-native cattail (Table 2). The minimum probability value (0.4) was assigned to pixels > 2000 least cost meters, the value of which we justify by (1) the 25-year duration of our predictions, which increase the probability of dispersal to these isolated locations, (2) the possibility of cattail occurring in neighbouring inland habitats (e.g. ditches, ponds), and (3) the proven fecundity and dispersal ability of cattail in literature and in our personal observations.
Table 2 Least cost distance classes and associated probabilities used for
estimating the probability of propagule arrival.
Least Cost Meters to Nearest
Non-Native Cattail
|
Probability of Propagule
Arrival in 25 years
|
0 – 250
|
1.0
|
>250 – 1000
|
0.8
|
>1000 – 2000
|
0.6
|
>2000 – 3000
|
0.4
|
In addition to cost distance, we calculated cattail density within a 1000 m circular neighbourhood of each raster cell using the Point Density Tool in ArcMap. Neighbourhood density values varied from 0 – 1.61% and were classified using a geometric scale into four probability categories (Table 3). With the same rationale as our distance probabilities, we elected to set the minimum probability value to 0.4.
Table 3 Density classes and associated probabilities used for estimating the probability of propagule arrival.
% Area Occupied by Non-Native Cattail in Surrounding 1000 m Radius
|
Probability of Propagule
Arrival in 25 years
|
>0.500 – 2.000%
|
1.0
|
>0.200 – 0.500%
|
0.8
|
>0.005 - 0.200%
|
0.6
|
0.000 – 0.005%
|
0.4
|
Proximity and density-based probability values were then averaged to generate an overall propagule introduction probability. This value was then multiplied with our habitat suitability values to generate susceptibility probabilities for each raster cell in our modelled area: