Florida bonneted bat sampling and DNA extraction
We obtained Florida bonneted bat samples from four core regions of the species’ range in Southern Florida (see Supplemental File for details). These four regions were: 1) the vicinity of Avon Park Air Force Range, Polk County (PC), 2) Fred C. Babcock/Cecil M. Webb Wildlife Management Area, Charlotte County (BW), 3) Collier County (CC), and 4) areas in Miami-Dade County (MD) (Fig. 1). The first three regions are in central and southwestern Florida (hereafter referred to as “western populations”) and represent largely natural habitat types including pine flatwoods, freshwater prairies, cypress swamp, and hardwood hammock. Sampling in MD occurred in the developed environment of the Miami metropolitan area.
DNA was obtained from live bats from oral swabs (Catch-All™ Sample Collection Swab, Epicentre cat. QEC89100), wing punches (4 mm biopsy punch of wing membrane, Worthington Wilmer and Barratt 1996), or both. After optimizing PCR conditions (Supplemental File) we did parallel DNA isolations on both swab and wing punch samples collected simultaneously from 20 bonneted bats to test for genotyping consistency across sampling methods.
Primer development and marker assessment
We included one published locus (TabrD15; Russell et al. 2005) and developed microsatellite loci de novo from PacBio sequence reads following the methodology outlined in Saarinen and Austin (2010). Primer information, including details on development, Genbank Accession numbers, and PCR conditions are provided in the Supplemental File.
Null alleles were estimated within each regional sample and globally using the method of Brookfield (1996) implemented in the R (vers. 3.6.1, R Core Team 2019) program PopGenReport vers. 3.0.4 (Adamack and Gruber 2014). Pairwise tests for genotypic linkage disequilibrium (LD) were estimated using Fisher’s exact test implemented in Genepop R v. 1.13 (Rousset 2008). Departures from the null hypothesis of no LD was evaluated using sequential Bonferroni correction at α = 0.05.
The power of these 22 loci to detect genetic heterogeneity among regional samples was tested using POWSIM vers. 4.0 (Ryman and Palm 2006). We simulated empirical sample sizes at effective population sizes of 500, 1000, 3000, and 5000, with FST calculated at generation 0, 20, 35, 50, 75 and 100. Tests for homogeneity were assessed using Fisher’s exact tests with tests at generation 0 expected to give false significance (α) at approximately 0.05. The proportion of significant tests over 1,000 replicates at each value of FST was used to indicate power of the markers to detect differentiation due to drift. Departures from Hardy Weinberg Equilibrium (HWE) were assessed within regional samples and globally using the hw.test function in the R program pegas vers. 0.13 (Pradis 2010). Significance was assessed using exact tests (Guo and Thompson 1992) with 1000 permutations.
Assessing genetic structure and population genetic diversity
We visualized the level of genetic clustering among sample regions using STRUCTURE vers 2.3.4 (Pritchard et al. 2000). We applied the linkage model and evaluated K ranging from 1 to 10 with 20 replicate runs per K, 100,000 burnin iterations and 200,000 sampling iterations per run. Model (K) selection included 1) the maximal estimate of the posterior probability of the data for a given K, Pr(X | K), hereafter L(K); 2) The adhoc ∆K method (Evanno et al. 2005); and 3) The supervised measures (MedMedK, MedMeanK, MaxMedK, and MaxMeanK) proposed by Puechmaille (2016). For the latter, we evaluated inclusion thresholds of 0.5, 0.6, and 0.7. (Puechmaille 2016). Finally, based on the results from L(K), ∆K, and the supervised measures, we inspected the results from a range of K values (Novembre 2016) to evaluate their potential implications for the interpretation of latent genetic structure.
Principal component analysis (PCA) was conducted using the function dudi.pca in the R package ade4 vers. 12.7-16 (Dray and Dufour 2007) applying weighted vectors and mean centering. We transformed absolute variance of principal component axes to percentage of total variance represented in the genotype data.
Finally, we performed a discriminant analysis of principal components (DAPC) in the R package adegenet vers. 2.1.3 (Jombart 2008; Jombart and Ahmed 2011).DAPC reduces intra-group principal component variation in order to maximally explain between group variation (Jombart et al. 2010). We first performed cross-validation for discriminant analysis using the xvalDapc function. We tested up to a maximum of 200 PCA components, retaining three axes for each DA (one less than the number of regional groups). DA clusters were plotted in adegenet with an overlayed minimal spanning network, implemented in the R package ade4, and we examined posterior assignment values for individual bats to assess structure.
We calculated summary statistics on global and latent genetic clusters using the R program diveRsity vers. 1.9.90 (Keenan et al. 2013). These included observed (Ho) and expected heterozygosity (He), unbiased heterozygosity (uHe), inbreeding coefficient (FIS), number of alleles per locus (A) and the 95% confidence interval (CI) based on 999 bootstrap intervals. Allelic richness was calculated using rarefaction to the smallest sample size (Ar15).
We evaluated the proportion of total genetic variation that was contained within and among regional genetic clusters using analysis of molecular variance (AMOVA) implemented in GENODIVE vers 3.04 (Meirmans and Van Tienderen 2004). We performed a standard AMOVA to assess the proportion of variance among regions and a second AMOVA in which western populations (BW, CC, PC) were grouped together to evaluate the variance among MD and western locations (assuming hierarchical genetic structure, see results). FST (Nei 1987) and its 95% CI was calculated using 9,999 bootstraps replicates. We calculated pairwise F-statistics and DJost using the function fastDivPart in PopGenReport (Adamack and Gruber 2014).
Non-spatial estimates of structure (i.e., STRUCTURE and AMOVA) assume an island model (Wright 1943) which includes equal migration regardless of the spatial proximity of populations. We tested for isolation by distance among regional populations by regression of genetic differentiation (FST/(1-FST)) to log geographic distance (centroid of sampling location within each region) using 1000 bootstrap permutations to estimate the 95% CI of the regression slope in Genepop. We calculated a standard Mantel test which tests the correlation of genetic (Euclidian) and geographic distance matrices, and we ran a stratified Mantel test permuting individuals within BW+CC+PC to allow for the potential influence of hierarchical structure as inferred from PCA, DAPC and DIYABC results. Both Mantel tests utilized 9,999 permutations and were conducted in GENODIVE.
Estimates of migration and gene flow
Estimates of genetic differentiation may be biased when subpopulations have high levels of unique allelic variation (i.e., display negative dependence; Jost 2008). We used the corPlot function in diveRsity to plot FST (Θ)/DJost calculated for individual loci against the number of alleles per locus. Contrasting slopes between estimators (e.g., negative FST and positive DJost) was taken as evidence of bias stemming from elevated mutation rates (Keenan et al. 2013).
We used the GENSBACK option in STRUCTURE to evaluate the probability of recent migration or mixed ancestry. This analysis used USEPOPINFO = 1 to identify which individuals were unlikely to be residents of their sampled population or were a descendant of a recent migrant. STRUCTURE was run for K=4 based on outputs from previous STRUCTURE and multivariate analyses (see results). We set GENBACK=2 to assign migrants (0) and recent migrant ancestry (parental=1, grandparent=2). We evaluated different migration rate priors (MIGPRIOR) to assess sensitivity to this variable as an initial condition (Pritchard and Wen 2004), using a range of values (0.01, 0.05, and 0.1) and ran 200,000 and 500,000 iterations for burnin and run length, respectively.
We further calculated contemporary migration using BayesAss ver. 3.0 (Wilson and Rannala 2003). BayesAss estimates the posterior mean migration rates based on posterior individual ancestry. We used a total of 1.0 × 107 iterations, discarding the first 1.0 × 106 as burnin and a sampling frequency of 100 to avoid autocorrelation between samples. Final parameters were dF = 0.3, dM = 0.1, dA = 0.3. Migration rate estimates were considered non-zero where the 95% credibility set (mean ± 1.96 × stdev) did not overlap zero.
While the previous approaches can identify contemporary (0-2 generations) migration, we also calculated the directional relative migration among populations and the symmetry of gene flow using the divMigrate function in diveRsity (Alcala et al. 2014; Sundqvist et al. 2016). We calculated effective number of migrants (Nm) to quantify gene flow and assessed the asymmetry of gene flow among pairs of populations by looking for overlap of 95% CI around estimates of migration generated using 1,000 bootstrap values (Sundqvist et al. 2016). Overlapping confidence intervals were interpreted as evidence for symmetrical gene flow.
Demographic analyses
To inform conservation and planning efforts we conducted multiple Ne analyses, namely: 1) we estimated contemporary (i.e., previous 1-2 generations) Ne based on patterns of linkage disequilibrium (LD); and 2) we tested for evidence of historical (i.e., over the past 500 generations) changes in Ne based on microsatellite repeat motif frequencies and differences between alleles.
Contemporary Ne was estimated using the LD method (Waples and Do 2008) implemented in NeEstimator vers. 2 (Do et al. 2014). We calculated Ne under both a random mating and monogamy models as mating system can have a large impact of inferences of Ne (Weir and Hill 1980).
We used VarEff vers. 1.0 (Nikolic and Chevalet 2014) to test for changes in Ne over the past 500 generations, encompassing the period of major anthropogenic change in south Florida. For the final analyses we applied a generic mutation rate prior of 0.001 and used a two-phase mutation model (‘T 0.15’). Additional population-specific parameters are in Supplemental File, Table S2). Following 10,000 burnin iterations we ran 10,000 steps with 10 steps per batch and 100 steps between sampled steps to avoid autocorrelation for a total of 10,000,000 MCMC iterations.
To assess the relative support for competing demographic models of population history we used approximate Bayesian computation implemented in DIYABC (Cornuet et al. 2014). We developed four models based on outputs from genetic clustering and tests of genetic differentiation, with competing models differing in patterns and times since divergence from a common ancestral population (Fig. 2). The four models were: (M1) Simultaneous radiation of four populations from an ancestral population at time tA in the past; (M2) Ancestral divergence leading to MD and an unsampled ancestral population NW, with the latter subsequently radiating into three populations (BW, PC, CC); (M3) an ancestral population diverged into MD and BW populations with subsequent colonization of the CC population from BW at t2, followed by the colonization of PC from BW at t1; (M4) an ancestral population diverged forming MD and BW followed by the serial colonization of CC at t2 and PC at t1. Prior selection, summary statistics and assessment of confidence in model choice are provided in the Supplemental File.
Mitochondrial sequencing and analysis
Given the relatively high microsatellite differentiation between MD and western populations, we generated sequence data for the cytochrome b (cyt b) gene to improve the resolution of deeper historical patterns of divergence. Cyt b has been demonstrated to perform well at species level delimitation in bats including Eumops (Caraballo et al. 2020). We sequenced a subset of 49 samples including (13 MD, 11 BW, 17 CC, and 8 PC). Details on PCR and sequencing methods are provided in the Supplemental File. Sequences were aligned using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) then imported into Mesquite vers. 3.7 (Maddison and Maddison 2021) for manual error correction and trimming.
Due to the extent of haplotype divergence detected within Florida bonneted bats, and to improve our phylogeographic inference, we added available cyt b sequences of closely related Eumops lineages representing Cuba (N = 14), Jamaica (N = 3), and Mexico (N = 13) that have been shown to form a well-supported clade that includes E. floridanus (McDonough et al. 2008; Bartlett et al. 2013). We also included three previously sequenced Florida bonneted bats from near Fort Myers, Lee County FL, and from Miami-Dade County, and E. glaucinus (N = 6) from South America (See Supplemental File Table S4).
The number of unique haplotypes (h), polymorphic nucleotide sites (S), haplotype diversity (the probability that two randomly selected haplotypes will be identical, Hd; Nei 1987), and nucleotide diversity (the average number of nucleotide differences per site, π; Nei 1987) were calculated for each location and overall with DnaSP vers. 6.12.03 (Rozas et al. 2017). Mesquite was used to visually examine codon polymorphisms. Pairwise differentiation was calculated as GST (Nei 1973) and the pairwise average number of nucleotide differences among populations. Evolutionary relationships among haplotypes were visualized using the statistical parsimony network (Templeton et al. 1992; Clement et al. 2000) implemented in PopART vers. 1.7 (Leigh and Bryant 2015) which estimates the most parsimonious pairwise connections among haplotypes at ≥ 95% probability (Templeton et al. 1987). In addition, distance networks were created using the NeighborNet algorithm (Bryant and Moulton 2004) in SplitsTree vers. 4.11.3 (Huson and Bryant 2006). We used an uncorrected p-distance as input to identify similar haplotypes. The algorithm generates clusters of haplotypes and illustrates ambiguity of connections (‘nets’) where necessary.
We used DIYABC to test support for two competing models that contrast the matrilineal history of MD bats relative to the western populations of E. floridanus and that of the Caribbean bats. The first model (Mt1) assumed that the diversity of haplotypes and differentiation between MD and the other Florida populations reflected the retention of ancestral polymorphism from a Pleistocene bonneted bat population in Florida. The second model (Mt2) assumed that western E. floridanus had an independent population history from MD, and that the latter derived separately and more recently from a Caribbean ancestor (Fig. 2; see Supplemental File for modeling details).