Transmitter deployment
We tracked 141 birds (SNI = 89, RSB = 52) using satellite telemetry with tracking durations ranging from 30 – 2,000 days per bird and spanning October 2016 – January 2024. Individuals were captured and fitted with transmitters in eight wetlands of the Murray-Darling Basin (MDB), which spans approximately 14% of Australia’s landmass and is subject to intensive water and wetland management (Figure 1). We fitted birds with solar-powered GPS transmitter units from three sources: Druid (Druid Technology Co., Ltd., Chengdu, China), Geotrak (Geotrak Inc. North Carolina, USA), or Ornitela (Ornitela, UAB Vilnius. Lithuania). Transmitters weighed 12–40 g and ranged from < 1% to 5% of bird bodyweight. GPS fix resolution was 15–26 m and fix frequency ranged from one minute to 6 hours (depending on transmitter type and programmed duty cycle). Fix frequency was considered in analyses, with interpolation or down-scaling applied when appropriate. The data were transmitted via either the Argos satellite network (for Geotrak units) or the 3G network (for Ornitela and Druid units).
We captured birds either by hand, or with leg-nooses, or by using a net launcher. We attached transmitters as a ‘backpack’ using harnesses made of Teflon ribbon or Spectra ribbon (Bally Ribbon MillsTM), fitted either as wing-loops with a join at the keel (SNI and some RSB), or as leg-loops (RSB). Harness design was based on designs used in other species (Jirinec et al. 2021; Karl and Clout 1986; Roshier and Asmus 2009; Thaxter et al. 2014) and modified and improved over time.
Data pre-processing
Data were filtered based on dilution of precision, removal of outliers from visual inspection of tracks, and removal of points based on extreme outliers on distance travelled per unit time and precise vertical or horizontal movements. To reduce temporal autocorrelation and produce a standardised temporal fix frequency and timing, we interpolated and thinned the data to one fix every six hours. This reduced the initial data set of ~ 1.6 million presence points to ~ 123,000, with movements spanning ~ 4 million km2 in eastern Australia (Figure 2). Most of the fix removals were for individuals tracked at very high frequency with Druid units. We removed data tracking movements from both nesting birds and points from pre-dispersal from capture site, since this study focused on habitat selection between breeding events, not during nesting or associated with capture sites (frequently nest sites for adults). This left 109,034 presence points across the two species.
Background data
We conducted data analysis and modelling using R Statistical Software version 4.4.0 (R Core Team 2024). To compare habitat selection as indicated by species presence (hereafter ‘presence’ data) to background habitat availability (i,e., pseudo-absence; hereafter ‘background’ data), we generated utilisation distribution grids for each individual. We fitted the ‘kernelUD’ function in the adehabitatHR package to create utilisation distribution from all presence points for each individual (Calenge & Fortmann-Roe, 2023). We then used the ‘getverticeshr’ function to extract the 99% contour of the utilisation distribution and sampled points on a regular grid such that the ratio of background points to presence points was approximately 10:1. The total dataset after initial quality control consisted of 84,425/ 900,149 and 24,609/ 284,909 presence/background locations for SNI and RSB respectively.
We annotated presence and background data with several environmental covariates, including: Water Observations from Space (WOfS; (GEOSCIENCE AUSTRALIA, 2015; Mueller et al., 2016); Multi-Resolution Valley Bottom Flatness (MrVBF; Gallant et al., 2012); the Australian National Aquatic Ecosystem classification (ANAE; (Brooks 2021)); the Catchment Scale Land Use of Australia (CLUM; ABARES (2021)); and, the National Vegetation Information System (NVIS; NVIS technical Working Group, 2013). These datasets are extensively described in their respective sources, but briefly: WOfS summarises 30 years of Landsat imagery to provide a long-term understanding of the recurrence of water in the landscape (25 m resolution); MrVBF is a topographic index that uses digital elevation models to classify degrees of valley bottom flatness (30 m resolution); the ANAE integrates the best available mapping data from multiple sources combined with simple rules to define aquatic ecosystem types using a number of relevant attributes (e.g. water regime, water source, salinity, landform and dominant vegetation); CLUM summarises a single dominant land use for a defined area based on the objective of the land manager as identified by state and territory agencies and classified according to a three-tiered hierarchical structure based on potential degree of modification and the impact on a putative ‘natural state’ (50 m resolution); and, NVIS summarises the extent and distribution of vegetation types in the landscape (100 m resolution). For WOfS, we used the recurrence frequency of water as a percentage of the total number of clear observations for each grid cell. We rescaled WOfS to have a mean of 0 and a variance of 1 to assist with model convergence.
To improve computational efficiency and facilitate interpretation, we chose the highest level of classification for each of the multilevel covariates which left 9 levels for MrVBF, 32 levels for CLUM and 30 levels for NVIS. To retain as many complete cases as possible, we imputed a small proportion (~1%) of points that reported no value for each covariate with either a new dummy variable (for MrVBF) or the unknown/no data variables for NVIS and CLUM. A summary of proportions of presence and background points stratified by the levels of each of the variables was initially performed as reference for model outputs.
Habitat selection modelling
For each species, we fitted an infinitely weighted logistic regression mixed model (IWGLMM) to model habitat selection with the environmental covariates as predictors using presence and background data with the glmmTMB package (Brooks et al. 2017). IWGLMM performs similarly to inhomogeneous Poisson point process and maximum entropy (maxent) for habitat selection modelling (Fithian and Hastie 2013). WOfS was continuous while the other four predictors (MrVBF, ANAE, CLUM, and NVIS) were categorical. Quadratic terms (fixed and random) were included for WOfS to model potential nonlinearity; linear terms were used for the categorical variables.
Models were also explored within species split by age (juvenile or adult), season (summer = December – February, autumn = March – May, winter = June – August, spring = September – November), and time of day (diurnal = points from 12pm; nocturnal = points from 12am) fitted by partitioning the data by these variables. Night fixes for SNI represented roosting locations because they are known to roost in one place from sunset to sunrise; conversely, RSB are less predictable, often travelling or feeding at night, and either feeding or resting during the day. We used a generalized linear mixed model (GLMM) framework with random intercepts and random slopes stratified by individual to account for between individual variation and reduce pseudo-replication. We fixed the variance at 103 rather than shrinking individual-specific intercepts towards the overall mean, following Muff et al. (2020). We set the weight of background points as W = 105, in line with previous IWGLMM applications (Muff et al. 2020; Renner et al. 2015).
Initial analyses showed poor convergence when fitting all covariates in a single model due to complete or near-complete separation of factor levels and highly correlated variables. Highly correlated between-covariate (e.g. WOfS and water land use variables) water-based variables were also difficult to interpret in a single model. To overcome this numerical instability, we created a ‘low count’ factor level for all factor variables if at least one cell count of the two-way table of the factor level versus presence-availability status was less than 50. To further improve numerical stability and aid interpretation, we fitted each of the environmental covariates separately.
For each species fitted with the IWGLMM we summarised fixed and random parameters and performed multiple comparison correction for fixed effect parameters following Benjamini (2001).
Habitat predictions
We constructed maps of relative probability of occurrence using a regular grid of points for the MDB (long. min = 138.57, long. max = 152.49, lat. min -37.68, lat. max = -24.59) at ~ 500 m resolution (~ 4 million total pixels). We constrained the predictions to the MDB because waterbird capture sites were targeted to this area and results are likely more representative of preferences for birds originating from this area. We matched environmental covariates to these points and used the ‘predict.glmmTMB’ from the glmmTMB package function along with the fitted GLMM models to produce population-level predictions (i.e., setting all random effects to zero). Model predictions were made separately for each of the environmental covariates (WOfS, MRVBF, ANAE, CLUM, NVIS) and combined into a single prediction using a weighted average of predictions, with the weights taken as the squared correlation Spearman's rank from five-fold cross-validation.
Model validation
We compared predictive performance of the fitted model using five-fold cross-validation to assess out-of-sample predictive error and model weights of each of the fitted models. Each fold consisted of an approximately equal number of individual birds. We trained models iteratively on four of five folds and made predictions with the fifth. We investigated overlap of presence points with the predicted relative habitat preference at a population scale by binning the predicted relative occurrence output over all presence absence points into deciles and computing the frequency of observed presence points within each decile. We then estimated the Spearman's rank correlation coefficient between frequency of the validation presence points within each bin and the bin rank (1-low to 10-high use), also referred to as the ‘Boyce Index’ (Hirzel et al. 2006). A strong positive correlation indicates the model assigns presence points well to preferred habitat (McCabe et al. 2021; Peters et al. 2015).
Bi-monthly surface water relationships
To explore potential relationships between short-term surface water presence and habitat selection, we intersected telemetry points with the Multi Index Method (MIM) surface water dataset for the MDB. This dataset represents a spatial time series of estimated water depth (in mm, error ± 500 mm) and maximum surface water extent at a single point in time every two months, based on multi-index surface water extents, a digital elevation model, and gauged water depths (Teng et al. 2023).
After matching the six-hourly gridded telemetry data used for the prior analyses to the spatial and temporal boundaries of the MIM dataset, 64,220 telemetry points remained for analysis from 97 individuals (SNI, N = 60; RSB = 30). Given that the MIM data are time varying, we generated control points differently to the static variables used in the static habitat selection analyses. We sampled points uniformly at a ratio of 5:1 in a circular buffer around each point with a radius equal to the distance to the next point. For each bird, we intersected both observed and control telemetry points with the MIM datasets. For each point, we produced water presence/absence and depth (mm) datasets. We also produced water depth summary statistics (mean, median, max) and calculated the proportion of wet points for 2 km and 20 km radii around each point. We then used the mgcv package (Wood 2017) to produce generalised additive models (GAMs) that were used to explore relationships between tracked bird presence and: 1) water presence/absence and depth at the point/pixel; 2) water depth statistics within a) 2km and b) 20 km radii around each point; and, 3) the proportion of points wet within a) 2km and b) 20 km radii around each point.