Pollen specialist bee species are accurately predicted from visitation, occurrence and phylogenetic data

doi:10.21203/rs.3.rs-3851958/v1

Research Article

This work is licensed under a CC BY 4.0 License

An animal’s diet breadth is a central aspect of its life history. Yet information about which species have narrow dietary breadths (specialists) and which have comparatively broad dietary breadths (generalists) is missing for many taxa and regions. One possible way to address this gap is to leverage interaction data found on museum specimens and published in the literature. Here, we use bees as our focal taxon to predict dietary specialization and generalization using machine learning models and interaction data, along with a bee phylogeny, and occurrence data for 682 bee species native to the United States. To assess whether our models can transfer to new regions or taxa, we used spatial and phylogenetic blocking in assessing model performance. We found that specialist bees mostly visit their host plants, and that they can be predicted with high accuracy (mean 92% accuracy). Overall model performance was high (mean AUC = 0.84), and our models did a moderate job of predicting generalist bee species, the minority class in our dataset (mean 62% accuracy). Models tested on spatially and phylogenetically blocked data had comparable performance to models tested on randomly blocked data. Our results suggest it is possible to predict specialist bee species in regions and for taxonomic groups where they are unknown but it may be more challenging to predict generalists. Researchers looking to identify pollen specialist and generalist species can generate candidate lists of these species by training models on bees from nearby regions or closely related taxa.

Keywords: diet breadth, generalist, bee, oligolecty, interactions, pollen

An animal’s diet breadth affects many aspects of its ecology and evolution. Animals with narrow diet breadths, known as dietary specialists, tend to be rarer (Slatyer et al. 2013), are less likely to be targeted by predators and more likely to evolve defense mechanisms against them (Singer et al. 2014), and may be more vulnerable to anthropogenic change (Clavel et al. 2011). In regions and for taxa where we know species’ diet breadths, scientists have asked important ecological and evolutionary questions about why species specialize (Hardy and Otto 2014; Danforth et al. 2019), about how species’ diet breadths change as humans alter the environment (Wood and Roberts 2017), and about whether coevolution between specialist herbivores and their host plants caused radiations in angiosperm and insect lineages (Ehrlich and Raven 1964).

For many regions and taxa, however, we do not know which species are dietary specialists, or what their host resources are. Most regions are under-sampled and missing the interaction datasets needed to categorize species’ diet breadths, with western Europe and the North American coasts being exceptions (Hortal et al. 2015; Etard et al. 2020; Poisot et al. 2021). Invertebrates suffer from particularly high data deficiency. For example, in a global traits database for ants (Parr et al. 2017) only 36% of species have their dietary niches categorized, and ant species from Africa, Asia and Oceania have no dietary niche data at all. A recent synthesis of bee pollen diet breadth suggests that of 20,000 bee species globally, only 860 have sufficient pollen data to categorize species’ diet breadths (Wood et al. 2023).

Yet even in places that have interaction datasets, identifying species’ diet breadths poses challenges. An interaction dataset typically contains only a subset of the interactions occurring in the area that was sampled, because data collectors will inevitably miss interactions between species (Chacoff et al. 2012). For example, Chacoff et al. (2012) found that even when they sampled 80% of all pollinator species in a study area, they still missed 45% of the pollinators’ interactions with plants; they estimated that sampling 90% of interactions would require a five-fold increase in sampling effort. When many interactions go undetected, generalist species may be incorrectly classified as specialists (Blüthgen 2010; Dorado et al. 2011). This problem is especially pronounced for rare species. For instance, a species observed only once in a dataset can be recorded using only a single resource species. Thus, regardless of its actual diet breadth, it may be classified as a specialist. Most ecological datasets have many of these singletons (Novotný and Basset 2000; McGill 2003).

When it comes to pollinators such as bees, identifying dietary specialists and generalists holds special importance. Bees frequently specialize on one plant taxon for pollen, their main source of proteins and lipids. Here, we define dietary specialist bees (hereafter, pollen specialists) as species that consume pollen from within a single family of plants, and we define generalists as species that consume pollen from plant species in more than one family, following Robertson (1925). Identifying pollen specialist and generalist bees is important in supporting populations of specialist bees, as it allows restoration and conservation practitioners to strategically monitor and plant the host plants specialists rely on (e.g., as has been done with butterflies; Pelton et al. 2019). In addition, a bee species’ diet breadth can influence its effectiveness as a pollinator. Pollen specialist bees have the potential to deplete a plant’s pollen supply by feeding the host plant’s pollen to their offspring, thereby reducing the success of a plant’s male gametes (Parker et al. 2016). By contrast, generalist bees may transfer more heterospecific pollen from different plant species, clogging a plant’s stigmas and leading to a decrease in seed production (Morales and Traveset 2008; Smith et al. 2019). Both processes have implications for the overall fitness and reproductive success of plants.

As with other animals, bees’ diet breadths are challenging to identify because of missing interaction data (Dorado et al. 2011; Chacoff et al. 2012). But for bees, identifying diet breadths poses an additional challenge: pollen specialists can, superficially, appear to be generalists because they will visit flowers from non-host plants to obtain nectar. In his influential work from 1925, the bee biologist Charles Robertson noted that pollen specialist bee species made almost as many visits to alternative floral hosts for nectar as they did to their pollen hosts (253 non-host nectar visits vs 317 visits to host plants; Robertson 1925). Later studies have found specialists’ nectar-feeding behavior to be more variable, with some pollen specialist bee species nectaring exclusively at host plants and others nectaring from a variety of plant taxa (Neff and Danforth 1991; Pekkarinen 1997). When specialists visit non-host plants for nectar, it can create the perception that specialist species are generalists, and Robertson (1925) emphasized the need to differentiate between pollen and nectar visits when identifying pollen specialist and generalist bees. Since Robertson, bee biologists have typically identified pollen specialist and generalist bees by examining the pollen contained in bee species’ larval provisions, or in pollen scopal or corbiculate loads that bees carry back to their nests (e.g., Müller and Kuhlmann 2008, Sedivy et al. 2013, Wood and Roberts 2017). However, pollen datasets like these are challenging to collect, and are relatively rare. Much more widely available are visitation datasets, which provide records of the flowering plants that a bee species is observed visiting and typically do not identify if the bee is collecting or carrying pollen from the visited plant. A separate body of literature has used visitation data to identify generalist and specialist pollinators of plants (Blüthgen et al. 2006; Dormann 2011). These visitation-based categories are distinct from pollen specialization: a pollen specialist might visit, and even pollinate, a range of plant genera but collect pollen from only one of them. We are not aware of any studies using visitation data to broadly identify pollen specialist and generalist bee species across multiple bee genera and families.

In sum, there are two challenges when identifying pollen specialist and generalist bees using visitation data. First, rare generalists may be incorrectly classified as specialists due to missing data. Second, specialist bee species may be incorrectly classified as generalists because they will visit non-host plants to obtain nectar. Two additional types of data may help with these challenges. First, phylogenetic data may improve predictions of specialists and generalists. In bees, diet breadth shows a strong phylogenetic signal, meaning closely related bees tend to share the same pollen hosts and dietary breadths (Sipes and Tepedino 2005; Sedivy et al. 2013). Some bee genera, such as Perdita and Dufourea, are composed almost entirely of pollen specialists, while others, like Bombus and Lasioglossum, are predominantly composed of generalists (Fowler 2020a, b, Fowler and Droege 2020, this paper). Second, an animal’s geographical distribution and phenology are known to be associated with its diet breadth (e.g., Slatyer et al. 2013, Danforth et al. 2019, Glaum et al. 2021). Specialists tend to be active later in the season compared to generalists (Pelletier and Forrest 2023), and they have shorter adult activity periods (Glaum et al. 2021). They also tend to have narrower geographic ranges than generalists (Slatyer et al. 2013) and, in the United States, exhibit greater diversity in more western longitudes (Danforth et al. 2019).

Here, we use bees of the United States as a focal taxon to investigate whether specialist and generalist species can be predicted from flower visitation data, a bee phylogeny and geographical and phenological data. Our dataset contains a total of 682 specialist and generalist bee species whose diet breadths are already known, based on classifications made using bee pollen loads, provisions or observations of pollen foraging. Our approach enables us to see if visitation and other widely available types of data can be used to make predictions of bee diet breadth without more resource-intensive pollen data. We ask: 1) How often do pollen specialist bees visit their host plants? 2) Can we predict pollen specialist and generalist bees? 3) What variables are most important for predicting specialist and generalist bees: interaction, phylogenetic, phenological or geographic variables?

Diet breadth datasets

To predict pollen specialist and generalist bee species we fit predictive models using bee species with known diet breadths for the United States. Our diet breadth data came from two sources. First, we used a list of pollen specialist bee species and their host plants in the eastern, central and western United States (compiled from Fowler 2020a, b, Fowler and Droege 2020); this list categorizes bee species as pollen specialists using multiple lines of evidence, of which pollen data are a main part. The host plants of some bee species on this list spanned multiple plant families, and we reclassified these bee species as pollen generalists.

Second, we used a bee-pollen database we compiled ourselves using a literature survey. We searched the scientific literature between September 2019 and August 2022 for articles or books in English, German, French and Portuguese that reported descriptions of bees’ pollen hosts using primary pollen data or that were secondary sources synthesizing bees’ pollen hosts from primary data. Here, primary pollen data are defined as the plant genera or families found in bees’ scopal/corbiculate loads or nest provisions, or, rarely, the plants that the authors observed bees pollen-foraging from. We searched Google Scholar and Web of Science using the search terms “bee,” “mono/oligo/poly,” “lecty/lege/lectic,” “pollen host plant,” “pollen host,” “pollen specialization,” “pollen diet breadth,” and “host preference,” or the comparable search terms in German, French and Portuguese. We also searched for papers cited within the articles found using this search. For each bee species we classified a plant taxon as its pollen host if it made up at least 5% of the total scopal/corbiculate load or nest provision, following Cane and Sipes (2006). We also considered a plant taxon to be a pollen host if the authors of the paper observed the bee collecting pollen from that plant taxon (although such studies were a minority). We classified bee species on this list as pollen specialists if their pollen hosts came from within one family and as pollen generalists if their host plants spanned more than one plant family. Our full list of specialist and generalist bees and the sources used to classify them is provided in Appendix S2. Because this second dataset relied on a broader range of studies, occasionally a bee species classified as a specialist by the first dataset was classified as a generalist by the second dataset. In such cases we used the classification of the second dataset.
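The 5% host threshold and the single-family rule can be illustrated with a minimal Python sketch. This is not the authors' code (their analyses were run in R); the function name, the input layout, the pooling of all pollen for a species into one set of counts, and the toy numbers are assumptions made for illustration only.

```python
def classify_diet_breadth(pollen_counts, plant_family, threshold=0.05):
    """Classify one bee species from pooled pollen counts (hypothetical layout).

    pollen_counts: dict mapping plant taxon -> amount of pollen recorded
    plant_family:  dict mapping plant taxon -> plant family
    A taxon counts as a pollen host if it makes up at least `threshold`
    of the total; the bee is a specialist if all hosts share one family.
    """
    total = sum(pollen_counts.values())
    if total <= 0:
        raise ValueError("no pollen recorded for this species")
    hosts = {t for t, n in pollen_counts.items() if n / total >= threshold}
    host_families = {plant_family[t] for t in hosts}
    label = "specialist" if len(host_families) == 1 else "generalist"
    return label, sorted(hosts)


# Toy data, invented for illustration.
families = {"Helianthus": "Asteraceae", "Solidago": "Asteraceae", "Salix": "Salicaceae"}
loads = {"Helianthus": 900, "Solidago": 80, "Salix": 20}  # Salix is 2% of the load
label, hosts = classify_diet_breadth(loads, families)
```

In the toy data, Salix falls below the 5% threshold, so the two remaining hosts belong to one family and the species is labeled a specialist.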

Visitation dataset

The visitation data we used to predict bee species’ diet breadths came from bee species visitation records found in Global Biotic Interactions (GloBI) (https://www.globalbioticinteractions.org/; Poelen et al. 2014). GloBI is an open dataset indexer that unifies species interactions across scientific literature, specimens from natural history collections and online observations. GloBI provided bee species visitation records across large spatial scales and with large sample sizes – the total dataset contains 259,210 bee visitation records between bees and plants (i.e., prior to filtering by geography or other variables, see below). We used version 0.5 of the GloBI indexed dataset (GloBI Community 2022).

We first filtered the GloBI database to only include interactions between bees and plants, searching for interactions between individuals within the seven families of bees and individuals within the kingdom Plantae. We excluded records where plants were not identified to at least the genus level and bees to the species level, as well as all duplicate records from the data. Hereafter, we refer to this dataset as the ‘visitation dataset.’ Note, this may exclude some species that have unresolved taxonomy and/or are rare – and thus harder to classify.

We further filtered the data to only include bee species that occur in the contiguous United States. Because not all records in the visitation dataset have geographic coordinates, we determined which bee species occur in the contiguous United States using a list from Chesshire et al. (2023), which was compiled using specimen records from GBIF and SCAN (GBIF.org 2021a, b, c). We then excluded cleptoparasites (Michener 2000; Appendix S2) and non-native bee species, using a list of non-native bee species for the United States (Russo 2016; Appendix S1: Table S1). We also removed records of eight bee species for which we did not have any phenological data (see ‘Estimating geographical and phenological predictors’). Finally, we excluded species we did not have diet breadth data for (Appendix S2).

We updated bee taxon names from the visitation dataset to the current valid name following the same methodology outlined in Chesshire et al. (2023; see Appendix S1 Supporting Methods for more details). We updated plant taxon names from the visitation dataset using The Plant List (www.theplantlist.org). Although this list is outdated (last updated in 2013), we opted to use it because this was the taxonomic standardization method employed by the plant phylogeny we used (Jin and Qian 2019; see next section). We accessed The Plant List using the R package Taxonstand (Cayuela et al. 2012) and updated plant family names separately, using the World Flora Online (http://www.worldfloraonline.org/). Plant taxa not on The Plant List were also updated using the World Flora Online.

After these filtering steps and taxon name updates, our visitation dataset contained 150,880 records of 682 bee species visiting 1,185 plant genera: 50,858 were records of pollen specialist bees, from 477 bee species, and 100,022 were records of generalist bees from 205 bee species. The records came from 40 sources total (Appendix S1: Table S2), with 22.6% of the records coming from observations, such as iNaturalist, 65.3% of the records coming from museum specimens (typically the flowering plant species the specimen was collected from), and 12.1% of the records coming from the scientific literature or unknown, compiled, sources (note, it is possible that some records in the “unknown” category may contain pollen data).

Bee and plant phylogenies

Since there are not species-level phylogenies of bees for all genera, we used a genus-level bee phylogeny from Hedtke et al. (2013). This phylogeny was missing 5 bee genera from our visitation dataset (of 63 total). To add them, we reconstructed the phylogeny using the R package U.PhyloMaker (Jin and Qian 2022), using the Hedtke phylogeny as the megatree. To add new taxa to this megatree, we used “scenario2,” from U.PhyloMaker; this adds genera randomly within their families (Jin and Qian 2022).

To make a phylogeny of plants, we used the megatree from Smith and Brown (2018), which we accessed using the R package V.PhyloMaker (Jin and Qian 2019). We pruned the tree to a genus-level phylogeny. This phylogeny was missing 103 of the 1,185 plant genera in our visitation dataset, and we added them randomly within their families using scenario 2 from V.PhyloMaker (Jin and Qian 2019).

Occurrence dataset

To obtain geographic and phenological predictors for all bee species in our interaction dataset, we used specimen records from North America (Chesshire et al. 2023). This dataset included specimens in Canada, Mexico and Alaska. To ensure independence of the specimen records, we filtered the dataset to have one record of each species per combination of latitude, longitude and collection date. Latitude and longitude coordinates were rounded to three decimal places prior to this filtering step. We removed geographic outliers, defined as specimens collected at least 1500-km from any other specimen of the same species. Hereafter, we refer to this as the Chesshire et al. 2023 occurrence dataset.
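The two filtering steps described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' R code: the record layout, the use of great-circle (haversine) distance, the single non-iterative pass, and the choice to keep species with only one record are all assumptions.

```python
import math


def deduplicate(records):
    """Keep one record per species x rounded latitude x rounded longitude x date.

    records: iterable of (species, lat, lon, date) tuples (hypothetical layout).
    """
    seen, kept = set(), []
    for species, lat, lon, date in records:
        key = (species, round(lat, 3), round(lon, 3), date)
        if key not in seen:
            seen.add(key)
            kept.append((species, lat, lon, date))
    return kept


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def drop_outliers(points, min_km=1500.0):
    """Drop points whose nearest conspecific record is at least min_km away.

    points: list of (lat, lon) for a single species.
    """
    kept = []
    for i, (lat, lon) in enumerate(points):
        dists = [haversine_km(lat, lon, la, lo)
                 for j, (la, lo) in enumerate(points) if j != i]
        if not dists or min(dists) < min_km:
            kept.append((lat, lon))
    return kept


# Toy records, invented for illustration.
raw = [("sp_a", 40.12345, -100.5, "2020-06-01"),
       ("sp_a", 40.12349, -100.5, "2020-06-01"),  # same cell and date: dropped
       ("sp_a", 40.12345, -100.5, "2020-06-02")]
unique = deduplicate(raw)
cleaned = drop_outliers([(40.0, -100.0), (40.1, -100.1), (60.0, -150.0)])
```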

Estimating visitation predictors

We used the visitation dataset to identify which plant taxa a bee species visits and how many.

To quantitatively measure which plant taxa a bee species visits, we used a multivariate approach: we first built a matrix with bee species as rows and plant genera as columns, with matrix cells filled by the number of interactions observed. We used the matrix and the Morisita-Horn index to calculate the difference between each pair of bee species in the plant genera visited. We then took the first two eigenvectors of the resulting distance matrix and used each bee species’ scores on these eigenvectors as predictors in our models (Table 1 refers to these scores as eigenvalues). Similar scores indicate that bee species visit similar plant taxa. We also estimated these scores for interactions between bee species and plants at the plant-family level.
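As an illustration of the pairwise step, the following Python sketch computes the Morisita-Horn dissimilarity between rows of a bee-by-plant count matrix. It is a sketch under stated assumptions, not the authors' R code: the function names and toy matrix are invented, and the eigen-decomposition of the resulting distance matrix is not shown.

```python
def morisita_horn(x, y):
    """Morisita-Horn dissimilarity between two vectors of visit counts.

    Returns 0 when the two bees visit plants in identical proportions
    and 1 when they share no plant taxa.
    """
    total_x, total_y = sum(x), sum(y)
    if total_x == 0 or total_y == 0:
        raise ValueError("each bee species needs at least one visit")
    lam_x = sum(v * v for v in x) / total_x ** 2
    lam_y = sum(v * v for v in y) / total_y ** 2
    shared = sum(a * b for a, b in zip(x, y))
    similarity = 2 * shared / ((lam_x + lam_y) * total_x * total_y)
    return 1 - similarity


def distance_matrix(matrix):
    """Pairwise dissimilarities for a list of rows (one row per bee species)."""
    n = len(matrix)
    return [[0.0 if i == j else morisita_horn(matrix[i], matrix[j])
             for j in range(n)] for i in range(n)]


# Toy matrix: rows are bee species, columns are plant genera (invented counts).
visits = [[10, 0, 5],
          [20, 0, 10],   # same proportions as the first row
          [0, 7, 0]]     # shares no plant genera with the others
dist = distance_matrix(visits)
```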

To estimate how many plant taxa each bee species visits, we calculated the diversity of plant genera and families visited using the inverse Simpson index. The diversity of plant families a bee visited was strongly correlated with the phylogenetic diversity of plant genera it visited (r > 0.7) and the Simpson diversity of plant genera visited (r > 0.7). We thus excluded the Simpson diversity of plant families from our final model.
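For reference, the inverse Simpson index is the reciprocal of the sum of squared visit proportions. A minimal Python sketch (the function name and toy counts are illustrative, not taken from the authors' code):

```python
def inverse_simpson(counts):
    """Inverse Simpson diversity, 1 / sum(p_i^2), from visit counts per plant taxon."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no visits recorded")
    return 1.0 / sum((c / total) ** 2 for c in counts)


even = inverse_simpson([10, 10])       # two genera visited equally
single = inverse_simpson([5, 0, 0])    # all visits to one genus
skewed = inverse_simpson([8, 1, 1])    # mostly one genus
```

The index equals the number of taxa when visits are spread evenly and approaches 1 as visits concentrate on a single taxon.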

Finally, we estimated the phylogenetic diversity of plant genera a bee species visits. In contrast to taxonomic diversity, phylogenetic diversity will be higher for bee species that visit distantly related plant genera than for bee species that visit closely related plant genera. We estimated phylogenetic diversity using a phylogenetic generalization of the inverse Simpson index (Chao et al. 2010, 2014), and hereafter refer to this metric as “phylogenetic Simpson diversity.” We also estimated phylogenetic richness. However, this variable was strongly correlated with a number of other variables in the model (r > 0.7), so we excluded it from the final model (Table 1). We estimated phylogenetic Simpson diversity using the function ‘hill_phylo’ from the R package hillR (Li 2018). More details about how we calculated this metric are provided in the Supporting Methods (Appendix S1).

Estimating bee phylogenetic predictors

We included bee phylogenetic information in our model, following the approach used in Lucas (2020). For each bee species, we calculated the phylogenetic distance to each bee genus in the dataset, and used these distances as predictor variables, resulting in 63 phylogenetic predictor variables, one for each bee genus. We used the function ‘cophenetic’ from the package ape to calculate pairwise phylogenetic distance between bee genera (Paradis and Schliep 2019).
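The idea of turning a phylogeny into one distance predictor per genus can be sketched in Python. This is not the authors' implementation (they used ‘cophenetic’ from the R package ape); the tree encoding, the genus labels and the branch lengths below are invented for illustration.

```python
def path_to_root(tree, tip):
    """Map every ancestor of `tip` (including itself) to its distance from `tip`.

    tree: dict mapping node -> (parent, branch_length); the root has parent None.
    """
    dist, node, d = {}, tip, 0.0
    while node is not None:
        dist[node] = d
        parent, length = tree[node]
        d += length
        node = parent
    return dist


def cophenetic_distance(tree, a, b):
    """Sum of branch lengths on the path between tips a and b."""
    da, db = path_to_root(tree, a), path_to_root(tree, b)
    # With non-negative branch lengths the shared ancestor minimizing the
    # summed distance is the most recent common ancestor.
    return min(da[n] + db[n] for n in da if n in db)


def phylo_predictors(tree, genus, all_genera):
    """One predictor per genus: distance from `genus` to each genus in the dataset."""
    return {g: cophenetic_distance(tree, genus, g) for g in all_genera}


# Toy genus-level tree (hypothetical topology and branch lengths).
toy_tree = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "GenusA": ("n1", 2.0),
    "GenusB": ("n1", 2.0),
    "GenusC": ("root", 3.0),
}
predictors = phylo_predictors(toy_tree, "GenusA", ["GenusA", "GenusB", "GenusC"])
```

Every species in the same genus receives the same vector, which is a consequence of working with a genus-level tree.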

Estimating geographic and phenological predictors

To estimate each bee species’ approximate geographic location, we used the median latitude and longitude of the specimen records in the Chesshire et al. 2023 occurrence dataset. To estimate each bee species’ extent of occurrence, we created a minimum convex polygon from all records and calculated the area in hectares. For species with fewer than four unique latitude-longitude combinations, there were too few points to create a minimum convex polygon. For these, we randomly added points within 100-km of existing specimen records to reach the four points needed to create the minimum convex polygon. We also calculated the sample size of each bee species in the dataset, a measure of the bee species’ regional abundance. Geographic analyses were conducted using the R packages sf (Pebesma 2018) and sp (Pebesma and Bivand 2005; Bivand et al. 2013). The minimum convex polygons were created using the function ‘chull’ from the R package grDevices.
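A minimal Python sketch of the extent-of-occurrence calculation follows. It is an illustration, not the authors' R workflow: it assumes the coordinates have already been projected to a planar system in meters, uses a standard monotone-chain convex hull with the shoelace formula, and omits the step that adds random points for species with fewer than four localities.

```python
def _cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; area in squared coordinate units."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))
    return abs(twice_area) / 2


def extent_of_occurrence_ha(points_m):
    """Minimum convex polygon area in hectares for projected points in meters."""
    return polygon_area(convex_hull(points_m)) / 10_000


# Toy localities: a 1 km x 1 km square plus one interior point.
localities = [(0, 0), (1000, 0), (1000, 1000), (0, 1000), (500, 500)]
eoo = extent_of_occurrence_ha(localities)
```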

To calculate phenological predictors, we excluded specimen records without collection dates (12.4% of records). We estimated the approximate time of year the bee was active by calculating the median date of collection. To estimate the length of the bee’s flight period, we subtracted the 10th percentile of the bee species’ collection dates (the beginning of its activity period) from the 90th percentile (the end of its activity period), following Harrison et al. (2019).
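The two phenological predictors can be sketched in Python as below. The sketch assumes collection dates have been converted to day-of-year and uses linear interpolation between order statistics for the percentiles; the interpolation rule and the toy data are assumptions, not details taken from the authors' code.

```python
import statistics


def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def phenology(days_of_year):
    """Return (median day-of-year, flight-season duration in days)."""
    median_day = statistics.median(days_of_year)
    duration = quantile(days_of_year, 0.9) - quantile(days_of_year, 0.1)
    return median_day, duration


# Toy data: one specimen per day from day 100 to day 200.
median_day, duration = phenology(list(range(100, 201)))
```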

Analyses

All analyses were conducted in R version 4.2.1 (R Core Team 2022). The code and data for running the analyses are available on Zenodo: https://doi.org/10.5281/zenodo.8347146.

1) How often do pollen specialist bees visit their host plants?

To assess how often specialist bees visit their host plants, we used the visitation dataset. We calculated the proportion of times a specialist bee species was visiting its host plant out of all visits recorded. For this analysis, we excluded bee species with fewer than 20 records, to avoid assessing the visitation records of incompletely sampled species. This left us with a sample size of 300 specialist bee species and 49,710 records.
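The host-fidelity calculation can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: it assumes each visit has been reduced to a (bee species, plant family) pair and that hosts are matched at the plant-family level, whereas the host lists may also name hosts at finer taxonomic ranks.

```python
from collections import defaultdict


def host_fidelity(visits, host_family, min_records=20):
    """Proportion of each specialist's visits that were to its host family.

    visits:      iterable of (bee_species, plant_family) pairs
    host_family: dict mapping specialist bee species -> host plant family
    Species with fewer than `min_records` visits are excluded.
    """
    totals, on_host = defaultdict(int), defaultdict(int)
    for bee, family in visits:
        if bee not in host_family:
            continue  # not a specialist with a known host
        totals[bee] += 1
        if family == host_family[bee]:
            on_host[bee] += 1
    return {bee: on_host[bee] / n for bee, n in totals.items() if n >= min_records}


# Toy records, invented for illustration.
toy_visits = ([("bee_a", "Asteraceae")] * 15 + [("bee_a", "Fabaceae")] * 5
              + [("bee_b", "Salicaceae")] * 3)
fidelity = host_fidelity(toy_visits, {"bee_a": "Asteraceae", "bee_b": "Salicaceae"})
```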

We also conducted two post-hoc analyses to investigate why some pollen specialist bee species mostly visited non-host plants. First, we investigated if this was a statistical artifact driven by bees with small sample sizes, which are less likely to be representative of their true population. To do this, we visually examined the relationship between a bee species’ sample size and the proportion of visits to its host plant. We also calculated the Pearson correlation coefficient of this relationship.

Second, we examined whether this pattern is explained by male bees, which do not collect pollen for their offspring and, as a result, might nectar at non-host plants more frequently. Because the sex of the bee was not specified in most records in our visitation dataset, we restricted this analysis to bee species with at least 10 records each for male and female bees, resulting in 260 specialist bee species from 34,822 records. We then compared the percentage of visits made by male and female bees to their pollen hosts using a paired Wilcoxon signed-rank test.
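To make the paired comparison concrete, the following Python sketch computes the signed-rank statistic V for paired values, here defined as the sum of the ranks of the positive differences, with zero differences dropped and tied absolute differences given average ranks. This is one common definition and is shown for illustration only; the p-value calculation is not included, and the toy percentages are invented.

```python
def signed_rank_statistic(x, y):
    """Sum of ranks of the positive paired differences x - y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


# Toy data: percent of visits to host plants, one pair per bee species.
female = [80, 90, 70, 60]
male = [70, 95, 50, 60]
v_stat = signed_rank_statistic(female, male)
```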

2) Can we predict pollen specialist and generalist bees?

To predict whether a bee species is a specialist or generalist, we used a random forest model for binary classification, using the R package randomForest (Liaw and Wiener 2002). Random forests are a supervised machine learning method that makes no distributional assumptions and can detect complex, non-linear relationships. In our random forest, we used the default parameters from the R package: decision trees were created using bootstrapped samples the same size as the data, and ten random predictor variables were considered at each tree split. The decision trees were optimized by finding the tree with the smallest node impurity. The full set of predictor variables used in our random forest model is described in Table 1.

Table 1. Predictor variables considered in the random forest models to predict bee diet breadth. We removed some predictors due to collinearity (Pearson correlation coefficient > 0.7). Variables we excluded for this reason are indicated with a “no” in the “Included?” column.

Predictor variable	Description	Included?	Dataset used
Phylogenetic richness	Faith's phylogenetic diversity of plant genera visited	no	Global Biotic Interactions
Phylogenetic Simpson diversity	Phylogenetic Simpson diversity of plant genera visited	yes	Global Biotic Interactions
Simpson diversity (plant genus)	Simpson diversity of plant genera visited	yes	Global Biotic Interactions
Simpson diversity (plant family)	Simpson diversity of plant families visited	no	Global Biotic Interactions
Identity of plant genera visited	First and second eigenvalues of Morisita-Horn distance-matrix for plant genera visited	yes	Global Biotic Interactions
Identities of plant families visited	First and second eigenvalues of Morisita-Horn distance-matrix for plant families visited	yes	Global Biotic Interactions
Median latitude	Median latitude of bee specimen records in North America	yes	Chesshire et al. 2023
Median longitude	Median longitude of bee specimen records in North America	yes	Chesshire et al. 2023
Regional abundance	Number of specimen records in North America	yes	Chesshire et al. 2023
Extent of occurrence	Area in hectares of minimum convex polygon for specimen records in North America	yes	Chesshire et al. 2023
Median day-of-year	Median day-of-year of collection	yes	Chesshire et al. 2023
Duration of flight season	90% quantile of day-of-year of collection - 10% quantile of day-of-year of collection	yes	Chesshire et al. 2023
Pairwise phylogenetic distance	Phylogenetic distance to each bee genus in the dataset	yes	Hedtke et al. 2013

To assess model performance, we used k-fold cross validation, in which separate datasets are used to train and test the model. In this process, the data are divided into k folds: k-1 folds are used to train the model and the remaining fold is used to test the model. This is repeated until all k folds have been used to test the model.

To evaluate how effectively our model predicts specialist bees in novel regions or phylogenetic groups, we used a special type of k-fold cross validation. While dividing our data into training and testing sets, we used spatial and phylogenetic blocking (Bahn and McGill 2013; Roberts et al. 2017). This approach creates training and testing datasets that are either spatially or phylogenetically independent of each other. It provides a more accurate assessment of predictive power than the conventional random selection of folds (Bahn and McGill 2013; Roberts et al. 2017). By using this technique, we can assess how well our model performs when dealing with bee species located in different regions or originating from distinct families compared to those used to train the model. As a baseline, we also used random-stratified blocking to see how blocking methods affected our results.

For phylogenetic blocking, we blocked bees by family. However, the smallest family in our dataset, Melittidae, had only three generalist bee species. We therefore combined this family with Colletidae, the second smallest bee family in our dataset. We grouped by sample size rather than by phylogenetic distance because Melittidae is the likely basal family of bees (Danforth et al. 2013).

For the spatial blocking methods, we removed all spatial predictors from the models. For the phylogenetic blocking methods, we removed all phylogenetic predictors from the models. We did this to avoid extrapolating outside the predictor space used to train the model. The three blocking methods (random, spatial, phylogenetic) are described in more detail in the Supporting Methods (Appendix S1).
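Phylogenetic blocking can be sketched in Python as a leave-one-block-out split. Treating each block as its own test fold is an assumption made here for illustration (the exact fold construction is described in the Supporting Methods); the merge of Melittidae into Colletidae follows the text, and the toy species list is invented.

```python
def assign_blocks(families, merge=None):
    """Map each species' family to a block label, merging small families."""
    merge = merge or {}
    return [merge.get(f, f) for f in families]


def blocked_folds(blocks):
    """Yield (block, train_indices, test_indices), holding out one block at a time."""
    for block in sorted(set(blocks)):
        test = [i for i, b in enumerate(blocks) if b == block]
        train = [i for i, b in enumerate(blocks) if b != block]
        yield block, train, test


# Toy family assignments for seven bee species.
species_families = ["Apidae", "Apidae", "Andrenidae", "Colletidae",
                    "Melittidae", "Halictidae", "Megachilidae"]
blocks = assign_blocks(species_families, merge={"Melittidae": "Colletidae"})
folds = list(blocked_folds(blocks))
```

Because every species in a held-out block is absent from the training set, each test fold contains only families the model has not seen.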

We used the same metrics to assess model performance for all blocking methods. As measures of overall model performance, we used the area under the receiver operating characteristic curve (AUC) and balanced accuracy (the arithmetic mean of specialist and generalist prediction accuracies); both are insensitive to class imbalance, which we had in our dataset (70% specialist species and 30% generalist species). We also calculated the prediction accuracies of specialists and generalists.
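The metrics named above can be written out in a short Python sketch. The function names and toy predictions are illustrative; AUC is computed here as the probability that a randomly chosen specialist receives a higher specialist score than a randomly chosen generalist, with ties counted as one half.

```python
def class_accuracy(y_true, y_pred, label):
    """Fraction of species with true class `label` that were predicted correctly."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


def balanced_accuracy(y_true, y_pred, labels=("specialist", "generalist")):
    """Arithmetic mean of the per-class accuracies."""
    return sum(class_accuracy(y_true, y_pred, lab) for lab in labels) / len(labels)


def auc(y_true, scores, positive="specialist"):
    """Area under the ROC curve from pairwise comparisons of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions, invented for illustration.
truth = ["specialist", "specialist", "specialist", "generalist"]
predicted = ["specialist", "specialist", "generalist", "generalist"]
specialist_scores = [0.9, 0.8, 0.3, 0.4]  # predicted probability of "specialist"

spec_acc = class_accuracy(truth, predicted, "specialist")
gen_acc = class_accuracy(truth, predicted, "generalist")
bal_acc = balanced_accuracy(truth, predicted)
toy_auc = auc(truth, specialist_scores)
```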

We found that model performance was similar between random-stratified blocking and the other two blocking methods (see Appendix S1: Figure S1). For simplicity, we report here model performance metrics for spatial and phylogenetic blocking methods only and provide the metrics for random blocking in Appendix S1 (Figure S1).

We also conducted a comparison between the random forest models and a simpler phylogenetic model. In this simpler model, we predicted that the diet breadth of a bee species was the same as the diet breadth of the majority of bee species within its genus, based on the training data. For bee species with no congeners in the training data, we predicted its diet breadth to be the same as the majority of bee species within its family. We evaluated the performance of this simpler model using spatial cross validation, employing the same methods as for the random forest model.
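The simpler phylogenetic model amounts to a majority vote within genus, with a fallback to family. A minimal Python sketch follows; the data layout, the tie handling (the first label encountered wins a tie) and the behaviour when neither genus nor family appears in the training data (returning None) are assumptions, and the toy labels are invented.

```python
from collections import Counter, defaultdict


def fit_majority(train):
    """Count diet-breadth labels per genus and per family.

    train: iterable of (genus, family, label) tuples for training species.
    """
    by_genus, by_family = defaultdict(Counter), defaultdict(Counter)
    for genus, family, label in train:
        by_genus[genus][label] += 1
        by_family[family][label] += 1
    return by_genus, by_family


def predict_majority(model, genus, family):
    """Predict the majority label of the genus, else of the family, else None."""
    by_genus, by_family = model
    counts = by_genus.get(genus) or by_family.get(family)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy training data with invented labels.
training = [("Perdita", "Andrenidae", "specialist"),
            ("Perdita", "Andrenidae", "specialist"),
            ("Andrena", "Andrenidae", "generalist"),
            ("Bombus", "Apidae", "generalist")]
model = fit_majority(training)
```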

3) What variables are most important for predicting specialist and generalist bees?

To determine what variables are important for predicting specialist and generalist bees we used the “importance” function in the package randomForest to calculate each variable’s importance. The function calculates the change in the error rate of the model when a predictor variable is permuted, divided by the standard deviation of the difference. To rank the importance values, we took the mean of the importance values for each predictor. The means were calculated by aggregating across all model runs from all blocking methods. We also assessed the importance of phylogenetic predictors in aggregate by removing them from spatially blocked models and calculating the change in the models’ mean accuracy and AUC.
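The general idea of permutation importance can be illustrated with a simplified Python sketch. It differs from the randomForest implementation described above in ways that should be kept in mind: it permutes columns of a supplied dataset rather than each tree's out-of-bag sample, it reports the raw mean drop in accuracy without dividing by a standard deviation, and it works with any prediction function. All names and the toy data are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each predictor column in turn.

    X: list of rows, each a list of predictor values.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy example: the label depends only on the first predictor.
X_toy = [[i % 2, i % 3] for i in range(30)]
y_toy = [row[0] for row in X_toy]
toy_importance = permutation_importance(lambda row: row[0], X_toy, y_toy)
```

In the toy example the second predictor is never used by the prediction function, so shuffling it cannot change accuracy and its importance is exactly zero.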

1) How often do pollen specialist bees visit their host plants?

In our visitation dataset, we found that on average 72.3% of the visits made by pollen specialist bees were to their host plants (median = 82.8%; Fig. 1), with approximately 10.0% of specialist bees having 100% of visitation records to their host plant. For eight specialist bee species, there were no records of the bee visiting their pollen hosts. These species were Dufourea virgata (Cockerell, 1898), Megachile frigida (Smith, 1853), Perdita hurdi (Timberlake, 1956), Perdita layiae (Cockerell, 1938), Svastra sila (LaBerge, 1956), Svastra atripes (Cresson, 1872), Perdita zebrata (Cresson, 1878) and Perdita wilmattae (Cockerell, 1906).

Many pollen specialist bee species that mostly visited non-host plants had smaller sample sizes (see bottom left cluster of points in Fig. 2a), suggesting that their apparent low host fidelity may be a statistical artifact. Overall, however, the relationship between host plant fidelity and sample size was weakly negative (r = -0.10), and some common bee species visited their host plants less than 50% of the time. Twelve species had over 200 records with less than half to the putative pollen host, including Perdita zebrata (n = 1951; 0% of visits to its pollen host), Protoxaea gloriosa (Fox, 1893; n = 1366, 3% of visits to its pollen host), Megachile brevis (Say, 1837; n = 1148, 30% of visits to its pollen host) and Megachile mendica (Cresson, 1878; n = 934, 33% of visits to its pollen host; see Appendix S1: Table S3 for full list and host plants).

Male bees were also significantly more likely to visit nonhost plants than female bees of the same species (Wilcoxon signed ranks test: V = 15,404; p = 0.00006). However, the effect size was small: 77% of visits were to host plants for females vs 72% for males (Fig. 2).
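For readers unfamiliar with the test statistic, the sketch below shows how V is computed in a Wilcoxon signed-rank test on paired values: zero differences are dropped, absolute differences are ranked (ties share the average rank), and V is the sum of ranks belonging to positive differences. The function name and the toy percentages are hypothetical, and the direction of subtraction used here (female minus male) is an assumption.

```python
def wilcoxon_v(x, y):
    """Sum of ranks of positive paired differences (x - y); ties get average ranks."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied absolute differences
        while j + 1 < len(ordered) and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)

# Hypothetical host-visit percentages for five species (females vs. conspecific males)
females = [80, 70, 90, 60, 75]
males = [70, 75, 70, 60, 60]
v_stat = wilcoxon_v(females, males)
```

The sketch stops at the statistic itself; obtaining a p-value additionally requires the null distribution of V, which is omitted here.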

2) Can we predict pollen specialist and generalist bees?

In our random forest models, we achieved 84.8% mean overall accuracy (14 percentage points better than a naïve majority guessing approach), and a mean balanced accuracy of 77.0%. Moreover, we achieved 91.8% mean accuracy at predicting specialists, 62.2% mean accuracy at predicting generalists, and a mean AUC score of 0.84.
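To make the relationship between these metrics explicit, the sketch below derives them from a single confusion matrix. The counts are hypothetical and chosen only to resemble our class balance; the function name is likewise illustrative. Balanced accuracy is the mean of the two class-specific accuracies, which is consistent with the reported means: (91.8% + 62.2%) / 2 = 77.0%.

```python
def classification_summary(tp, fn, tn, fp):
    """Accuracy metrics with specialists treated as the positive class."""
    total = tp + fn + tn + fp
    specialist_acc = tp / (tp + fn)  # share of true specialists predicted as specialists
    generalist_acc = tn / (tn + fp)  # share of true generalists predicted as generalists
    return {
        "overall": (tp + tn) / total,
        "specialist": specialist_acc,
        "generalist": generalist_acc,
        "balanced": (specialist_acc + generalist_acc) / 2,
        # Accuracy of always guessing the more common class
        "majority_baseline": max(tp + fn, tn + fp) / total,
    }

# Hypothetical test fold: 70 specialists and 30 generalists
summary = classification_summary(tp=64, fn=6, tn=19, fp=11)
```

With these toy counts, the majority baseline is 70% because always predicting "specialist" is correct for 70 of 100 species.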

Blocking methods did not strongly affect overall model performance or the model’s ability to predict specialists. Models achieved a mean AUC of 0.83 and mean specialist accuracy of 91.3% when trained and tested on phylogenetically independent sets of data (phylogenetic blocking); they achieved a mean AUC of 0.85 and a mean specialist accuracy of 92.1% when trained and tested on spatially independent sets of data (spatial blocking). However, the spatially blocked models tended to perform worse at predicting generalists (60% mean accuracy vs 66% for phylogenetically blocked models). At times, the spatially blocked models’ predictions of generalists were worse than a coin toss (minimum prediction accuracy = 38%).
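The idea behind blocking can be sketched as a leave-one-block-out split, in which all species sharing a block label (for example a bee family or a region) are held out together so that the test set is independent of the training set. This is a simplified illustration under that assumption; the function name and the toy labels are hypothetical, and the exact fold construction used in our analyses may differ.

```python
from collections import defaultdict

def leave_one_block_out(block_labels):
    """Yield (held_out_block, train_idx, test_idx), holding out each block once."""
    members = defaultdict(list)
    for idx, block in enumerate(block_labels):
        members[block].append(idx)
    for held_out in sorted(members):
        test_idx = members[held_out]
        # Train on every species that is not in the held-out block
        train_idx = [i for b, idxs in members.items() if b != held_out for i in idxs]
        yield held_out, train_idx, test_idx

# Hypothetical block labels for five species (phylogenetic blocking by family)
families = ["Andrenidae", "Andrenidae", "Halictidae", "Melittidae", "Halictidae"]
folds = list(leave_one_block_out(families))
```

Replacing the family labels with region labels gives the spatial analogue; random blocking corresponds to assigning block labels at random.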

Our simple phylogenetic models performed well at predicting specialists (mean accuracy = 93.4%) and had moderate overall performance (mean balanced accuracy = 65.9%), but they performed poorly at predicting generalists (mean accuracy = 38.4%; minimum accuracy = 9.1%).

Table S4 (Appendix S1) provides a list of specialist bee species for which our random forest models provide a specialist classification probability of less than 50%. Table S5 (Appendix S1) provides the same information for generalists.

3) What variables are most important for predicting specialist and generalist bees?

The two most important variables for predicting specialist and generalist bees were the phylogenetic and taxonomic diversity of plant genera visited (Fig. 4). On average, the phylogenetically blocked models predicted that a bee species had a 72.5% chance of being a specialist if it visited the smallest phylogenetic diversity of plants in the dataset vs a 56.0% chance if it visited the greatest (holding other covariates at their true values; Appendix S1: Figure S1). Similarly, for Simpson diversity, the phylogenetically blocked models predicted that a bee species had a 76.0% chance of being a specialist if it visited the smallest taxonomic diversity of plants in the dataset vs a 57.6% chance if it visited the greatest (Appendix S1: Figure S1). Other important variables for predicting bee diet breadth included the identities of the plant genera a bee visited, the bee species’ extent of occurrence and its regional abundance (Fig. 4). The full list of mean importance values for all predictors is provided in the Supporting information (Appendix S2: Table S6).
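As a rough illustration of what a Simpson-type taxonomic diversity predictor captures, the sketch below computes the inverse Simpson index (the Hill number of order 2) from the plant genera in a bee species' visitation records. Whether this exact form matches the predictor used in our models is an assumption, and the function name and genus labels are hypothetical.

```python
from collections import Counter

def inverse_simpson(plant_genera):
    """Inverse Simpson index: 1 / sum of squared proportions of records per genus."""
    counts = Counter(plant_genera)
    total = sum(counts.values())
    return 1 / sum((n / total) ** 2 for n in counts.values())

# Hypothetical visitation records
narrow_visitor = ["genus_a"] * 9 + ["genus_b"]                # 90% of visits to one genus
broad_visitor = ["genus_a", "genus_b", "genus_c", "genus_d"]  # visits spread evenly

narrow_diversity = inverse_simpson(narrow_visitor)
broad_diversity = inverse_simpson(broad_visitor)
```

A bee that concentrates its visits on one genus receives a value close to 1, whereas a bee that spreads visits evenly over k genera receives a value of k.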

Overall, we found that bee phylogenetic variables in aggregate had minimal effect on model performance. When bee phylogenetic predictors were removed from the spatially blocked models, overall model performance changed little (change in average AUC from 0.85 to 0.81, change in overall accuracy from 85.6% to 83.2%), as did prediction accuracy for specialists and generalists (change in generalist prediction accuracy from 60.1% to 57.4%; change in specialist prediction accuracy from 92.1% to 91.1%).

Discussion

Our analyses reveal that pollen specialist bees visit their pollen host plants 72.3% of the time and that these species can be predicted using visitation, geographic, phenological and phylogenetic data. Performance for predicting generalists was moderate, likely due to class imbalance in our data (30% generalists vs 70% specialists), but overall model performance was high, with AUC scores above 0.8. The random forest models performed substantially better than simpler phylogenetic models without geographic, phenological or visitation data. Below, we discuss the significance and applications of our findings.

In our study, we find that visitation data – used alongside phylogenetic and occurrence data – can be used to determine if a bee species is a pollen specialist. This conclusion is different from the one bee biologist Charles Robertson came to almost a century ago (Robertson 1925). Robertson found that, in his smaller sample (n = 570 records total), almost half of all visits that specialist bees made were to non-host plants, leading him to conclude that visitation data cannot be used to differentiate between pollen specialists and generalists. But our much larger dataset (n = 150,880 records total, 52,550 of them from pollen specialists) shows that pollen specialist bees are generally more faithful visitors to their pollen hosts than what Robertson found, with pollen specialists making about 72% of visits to their pollen hosts. Pollen generalist bee species, by contrast, visited a wide phylogenetic diversity of plants, 15.3% greater than what specialists visited (Appendix S1: Figure S3). These differences in visitation, along with phylogenetic and biogeographic differences between generalists and specialists, allowed us to predict specialist bees with an average accuracy of 92% and generalist bees with an average accuracy of 62%.

We found that generalists were more challenging to predict than specialists (Fig. 3). One potential reason is that rare generalist species are harder to classify (see Introduction). A second potential reason is that generalist species made up the minority of bee species in our dataset. Random forest models for binary classification generally perform worse at predicting the minority class (Elrahman and Abraham 2013), and the more imbalanced the data, the worse the performance. Although our data are not strongly imbalanced (70% of bee species in our dataset are specialists), a model fitted with our data could have an overall accuracy of 70% by predicting that bee species are specialists 100% of the time. Our models performed better than that, but future research could improve model performance by utilizing methods to explicitly deal with class imbalance (Elrahman and Abraham 2013).

Significance

Predicting specialist and generalist bee species is important for several reasons. First, predicting specialists allows for more targeted conservation because specialists require their host resources to complete their lifecycle. For example, planting milkweed, the larval host of the declining monarch butterfly, is seen as a critical part of this specialist’s recovery strategy (Pelton et al. 2019). In the United States, there are 29 specialist bee species currently rated by the website NatureServe as critically imperiled or imperiled, though the overwhelming majority (88%) of specialist bee species have not been assessed for their conservation status (www.natureserve.org; though see Bartomeus et al. [2013], Harrison et al. [2019], Lane et al. [2023] for studies that have assessed solitary bee species for their degree of rarity or relative decline). For imperiled species, planting host plants may play a crucial role in promoting population recovery. In addition to supporting species conservation, predicting specialist and generalist bee species can tell us more about the quality of pollination services provided by a bee species, with specialists potentially depleting more pollen, but also transferring more conspecific pollen than generalists (Parker et al. 2016; Smith et al. 2019).

Applications

Our proposed modeling approach can help guide data collection on specialist and generalist bees for taxa or regions where pollen data are missing. For instance, researchers can fit models to existing data for bee genera where lists of specialist and generalist bees have been generated using pollen data. Once the models are fit using the training data, researchers can then use these models to predict specialists and generalists in a closely related bee genus where such lists are not available. The accuracy of the model’s predictions should improve the more closely related the bees in the training and testing data are (Houlahan et al. 2017). Although our findings suggest that models phylogenetically blocked in this way will misclassify ~16% of species (in our data, 34% of generalists and 8% of specialists), their predictions can provide guidance for future research by pointing researchers towards the bee species most likely to be specialists or generalists. To validate model predictions, researchers can collect pollen data from the scopal loads or nest provisions of the bees (Cane and Sipes 2006). Pollen data are essential because they reveal the plant taxa from which the bee collects pollen. This approach enables focused research efforts and directs researchers towards the bee species that are most likely to specialize.
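The overall misclassification figure follows from weighting the class-specific error rates by the class proportions in our data (30% generalists, 70% specialists). The short calculation below reproduces that arithmetic; the variable names are illustrative only.

```python
# Class proportions in the analyzed dataset
share_generalists = 0.30
share_specialists = 0.70

# Class-specific misclassification rates for phylogenetically blocked models
error_generalists = 0.34
error_specialists = 0.08

# Overall misclassification rate is the proportion-weighted average of the two
overall_error = (share_generalists * error_generalists
                 + share_specialists * error_specialists)
```

This gives 0.102 + 0.056 = 0.158, or roughly 16% of species. In a region or taxon with a different share of generalists, the same class-specific error rates would imply a different overall rate.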

Approximately 30% of the bees in our visitation dataset (n = 389 species total) were missing a specialist or generalist classification from our bee-pollen dataset and were thus excluded from our analyses. Because we had visitation, geographic and phylogenetic data for these bee species, their diet breadths might be predicted using our own modeling approach. However, the approach would need to be modified to account for the likelihood that pollen generalist bee species are over-represented in the missing data and under-represented in the training dataset. This is because we had a comprehensive list of specialist species for the United States, but not generalists (Fowler 2020a, b; Fowler and Droege 2020). In fact, generalist bees made up only 30% of the bee species in our analyzed dataset, even though they likely represent 50–58% of bee species (Wood et al. 2023; note this paper was at a global scale). One possible way to address this class imbalance issue is to sub-set the training data so that the proportion of generalists is higher, and matches our best guess for what we expect from the data being predicted (as in Elrahman and Abraham 2013).
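One simple way to sub-set training data toward a target class proportion is random undersampling of the majority class. The sketch below keeps every generalist and draws just enough specialists to reach a chosen generalist share. It is a minimal illustration under that assumption rather than a method we applied; the function name, the target value and the toy labels are hypothetical.

```python
import random

def subsample_to_target(labels, target_generalist_share, seed=0):
    """Return sorted indices keeping all generalists and a random subset of specialists."""
    rng = random.Random(seed)
    generalists = [i for i, y in enumerate(labels) if y == "generalist"]
    specialists = [i for i, y in enumerate(labels) if y == "specialist"]
    # Number of specialists s such that g / (g + s) equals the target share
    n_keep = round(len(generalists) * (1 - target_generalist_share) / target_generalist_share)
    n_keep = min(n_keep, len(specialists))  # cannot keep more specialists than exist
    kept = rng.sample(specialists, n_keep)
    return sorted(generalists + kept)

# Hypothetical training labels mirroring our 30/70 class split
labels = ["generalist"] * 30 + ["specialist"] * 70
subset = subsample_to_target(labels, target_generalist_share=0.5)
generalist_share = sum(labels[i] == "generalist" for i in subset) / len(subset)
```

The trade-off is that undersampling discards training data for specialists, so alternatives such as class weighting may be preferable when the dataset is small.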

Models trained and tested on spatially and phylogenetically blocked data had comparable performance to models trained and tested on data that were blocked randomly (Appendix S1: Figure S1), suggesting that models can transfer from one set of bee families to a separate family, and from one set of regions to a separate region. This was the case despite strong differences between families and regions in the proportion of specialists. For example, pollen specialist bees comprised at least three quarters of species in the families Melittidae (85.7% of species) and Andrenidae (80.5% of species), but fewer than half of the species in Halictidae (45.1% of species). Among bee genera there was even greater skew: 19.0% of the bee genera in our data consisted entirely of generalists and almost half (49.2%) consisted entirely of specialists. Similarly, there are large differences between different regions of the United States in the proportion of specialists. The western United States, known for its high bee species diversity, hosts a greater number and proportion of specialist bee species than the eastern half of the country (Danforth et al. 2019; Fowler 2020a, b; Fowler and Droege 2020). Notably, the Chihuahuan and Sonoran deserts are hotspots of specialist bee diversity, likely because these desert regions experience significant seasonal variation in rainfall that may promote the evolution of specialists (Minckley et al. 2000). By contrast, the eastern United States hosts fewer bee species overall and a smaller proportion of pollen specialists. Our data were consistent with these overall trends in bee biogeography: the northwesternmost and northeasternmost regions in our dataset had the lowest proportion of specialists (45% and 44% specialists, respectively) while the three regions in the southwest (spanning California to Texas) had the greatest proportion of specialists (94%, 90% and 90% specialists).

Unfaithful pollen specialists

There were some specialist bee species in our data that hardly ever visited their pollen hosts, and eight species that never visited them (Fig. 1). These bee species might be dominated by records of male bees, which do not provision nests with pollen, and likely make fewer trips to their pollen hosts than females. However, we found that males probably do not explain the pattern: they were less likely – but not dramatically less likely – to visit pollen hosts than conspecific females (Fig. 2b). Many of these “unfaithful” pollen specialists were species with small sample sizes and are thus probably statistical anomalies that do not reflect the populations they are drawn from (Fig. 2a). However, twelve pollen specialist species had more than 200 records, with fewer than half of those to the putative pollen host (Table S3). These bee species may not be pollen specialists. Consistent with this, others have found that some species of putative pollen specialists carry large proportions of non-host pollen in their pollen loads (Michener and Rettenmeyer 1956; Ritchie et al. 2016; Smith et al. 2019). We emphasize that rigorous methods are needed to definitively identify pollen specialist bees, including examining the pollen contained in bee nest provisions, scopae or corbiculae, and sampling bees from across a species’ entire geographic range.

It may also be that some putative pollen specialists are really what have been called "facultative oligoleges" (Cane and Sipes 2006). Cane and Sipes (2006) define this type of specialist bee species as one that has a strong preference for its host plant but will collect non-host pollen when its host plant is not in bloom, or to supplement the pollen of its host plant. Alternatively, putative pollen specialists may be geographic specialists, or bees that specialize in one location, but use other host plants in other parts of their range (Davis et al. 2012; Gaiarsa et al. 2022; Mesler and Carothers 2023).

Conclusion

Our findings suggest that machine learning models can provide a starting point for predicting specialist and generalist bee species. Identifying bee species’ diet breadths for taxa and for regions where they are unknown can help us answer important questions in bee conservation ecology and plant-pollinator ecology, lead to improved species conservation outcomes, and provide a better understanding of the pollination services that bee species provide.

Author Contributions

KCS conceived the idea for the study; CS designed the study with support from all authors; ALR led the collection of the pollen dataset with support from VM and ARM. CS curated other datasets and led the analysis with support from NB and KCS. CS wrote the manuscript with editorial advice from all authors.

Acknowledgements

We express our gratitude to Michael Orr (State Museum of Natural History Stuttgart) and Alice Hughes (University of Hong Kong) for their valuable discussions on predicting specialist bees. Additionally, we appreciate Yon Visell and Gregory Reardon (RE TOUCH Lab; UCSB Biological Engineering) for their informative conversation that contributed to our model blocking methods. We thank Jorrit Poelen for the development and maintenance of GloBI. This research received support from a National Science Foundation (NSF) Award Extending Anthophila research through image and trait digitization (Big-Bee; DBI-2102006) granted to KCS, as well as funding for undergraduate research through the National Science Foundation's Harnessing the Data Revolution Data Science Corps (HDR DSC Award #1924205 and Award #1924008).

Data accessibility statement

All code and relevant data files are available on Zenodo: https://zenodo.org/records/10420917.

References

Bahn V, McGill BJ (2013) Testing the predictive performance of distribution models. Oikos 122:321–331. https://doi.org/10.1111/j.1600-0706.2012.00299.x
Bartomeus I, Ascher JS, Gibbs J, et al (2013) Historical changes in northeastern US bee pollinators related to shared ecological traits. Proceedings of the National Academy of Sciences 110:4656–4660. https://doi.org/10.1073/pnas.1218503110
Bivand RS, Pebesma E, Gomez-Rubio V (2013) Applied spatial data analysis with R, 2nd edn. Springer, NY
Blüthgen N (2010) Why network analysis is often disconnected from community ecology: A critique and an ecologist’s guide. Basic and Applied Ecology 11:185–195. https://doi.org/10.1016/j.baae.2010.01.001
Blüthgen N, Menzel F, Blüthgen N (2006) Measuring specialization in species interaction networks. BMC Ecol 6:. https://doi.org/10.1186/1472-6785-6-9
Cane J, Sipes SD (2006) Floral specialization by bees: analytical methodologies and a revised lexicon for oligolecty. In: Waser N, Ollerton J (eds) Plant-Pollinator Interactions: From Specialization to Generalization. University of Chicago Press, pp 99–122
Cayuela L, Granzow-de la Cerda Í, Albuquerque FS, Golicher DJ (2012) Taxonstand: An r package for species names standardisation in vegetation databases. Methods in Ecology and Evolution 3:1078–1083. https://doi.org/10.1111/j.2041-210X.2012.00232.x
Chacoff NP, Vázquez DP, Lomáscolo SB, et al (2012) Evaluating sampling completeness in a desert plant-pollinator network. Journal of Animal Ecology 81:190–200. https://doi.org/10.1111/j.1365-2656.2011.01883.x
Chao A, Chiu C-H, Jost L (2010) Phylogenetic diversity measures based on Hill numbers. Phil Trans R Soc B 365:3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao A, Chiu C-H, Jost L (2014) Unifying Species Diversity, Phylogenetic Diversity, Functional Diversity, and Related Similarity and Differentiation Measures Through Hill Numbers. Annu Rev Ecol Evol Syst 45:297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chesshire PR, Fischer EE, Dowdy NJ, et al (2023) Completeness analysis for over 3000 United States bee species identifies persistent data gap. Ecography. https://doi.org/10.1111/ecog.06584
Clavel J, Julliard R, Devictor V (2011) Worldwide decline of specialist species: Toward a global functional homogenization? Frontiers in Ecology and the Environment 9:222–228. https://doi.org/10.1890/080216
Danforth BN, Cardinal S, Praz C, et al (2013) The impact of molecular data on our understanding of bee phylogeny and evolution. Annual Review of Entomology 58:57–78. https://doi.org/10.1146/annurev-ento-120811-153633
Danforth BN, Minckley RL, Neff JL (2019) The Solitary Bees: Biology, Evolution, Conservation. Princeton University Press, Princeton and Oxford
Davis ES, Reid N, Paxton RJ (2012) Quantifying forage specialisation in polyphagic insects: the polylectic and rare solitary bee, Colletes floralis (Hymenoptera: Colletidae). Insect Conserv Diversity 5:289–297. https://doi.org/10.1111/j.1752-4598.2011.00166.x
Dorado J, Vázquez DP, Stevani EL, Chacoff NP (2011) Rareness and specialization in plant-pollinator networks. Ecology 92:19–25. https://doi.org/10.1890/10-0794.1
Dormann CF (2011) How to be a specialist? Quantifying specialisation in pollination networks. Network Biology
Ehrlich PR, Raven PH (1964) Butterflies and plants: a study in coevolution. Evolution 18:586. https://doi.org/10.2307/2406212
Elrahman SMA, Abraham A (2013) A Review of Class Imbalance Problem. Journal of Network and Innovative Computing 1:332–340
Etard A, Morrill S, Newbold T (2020) Global gaps in trait data for terrestrial vertebrates. Global Ecol Biogeogr 29:2143–2158. https://doi.org/10.1111/geb.13184
Fowler J (2020a) Pollen specialist bees of the central United States. https://jarrodfowler.com/bees_pollen.html
Fowler J (2020b) Pollen specialist bees of the western United States. https://jarrodfowler.com/pollen_specialist.html
Fowler J, Droege S (2020) Pollen specialist bees of the eastern United States. https://jarrodfowler.com/specialist_bees.html
Gaiarsa MP, Rehan S, Barbour MA, McFrederick QS (2022) Individual dietary specialization in a generalist bee varies across populations but has no effect on the richness of associated microbial communities. The American Naturalist 200:730–737. https://doi.org/10.1086/721023
GBIF.org (2021a) GBIF Occurrence Download https://doi.org/10.15468/dl.6cxfsw
GBIF.org (2021b) GBIF Occurrence Download https://doi.org/10.15468/dl.b9rfa7
GBIF.org (2021c) GBIF Occurrence Download https://doi.org/10.15468/dl.b9rfa7
Glaum P, Wood TJ, Morris JR, Valdovinos FS (2021) Phenology and flowering overlap drive specialisation in plant–pollinator networks. Ecology Letters 24:2648–2659. https://doi.org/10.1111/ele.13884
GloBI Community (2022) Global Biotic Interactions: Interpreted Data Products
Hardy NB, Otto SP (2014) Specialization and generalization in the diversification of phytophagous insects: Tests of the musical chairs and oscillation hypotheses. Proceedings of the Royal Society B: Biological Sciences 281:. https://doi.org/10.1098/rspb.2013.2960
Harrison T, Gibbs J, Winfree R (2019) Anthropogenic landscapes support fewer rare bee species. Landscape Ecol 34:967–978. https://doi.org/10.1007/s10980-017-0592-x
Hortal J, De Bello F, Diniz-Filho JAF, et al (2015) Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity. Annu Rev Ecol Evol Syst 46:523–549. https://doi.org/10.1146/annurev-ecolsys-112414-054400
Houlahan JE, McKinney ST, Anderson TM, McGill BJ (2017) The priority of prediction in ecological understanding. Oikos 126:1–7. https://doi.org/10.1111/oik.03726
Jin Y, Qian H (2019) V.PhyloMaker: an R package that can generate very large phylogenies for vascular plants. Ecography 42:1353–1359. https://doi.org/10.1111/ecog.04434
Jin Y, Qian H (2022) U.PhyloMaker: An R package that can generate large phylogenetic trees for plants and animals. Plant Diversity S2468265922001329. https://doi.org/10.1016/j.pld.2022.12.007
Lane IG, Portman ZM, Herron-Sweet CR, et al (2023) Higher floral richness promotes rarer bee communities across remnant and reconstructed tallgrass prairies, though remnants contain higher abundances of a threatened bumble bee (Bombus Latreille). Biological Conservation 279:109862. https://doi.org/10.1016/j.biocon.2022.109862
Li D (2018) hillR: taxonomic, functional, and phylogenetic diversity and similarity through Hill Numbers. JOSS 3:1041. https://doi.org/10.21105/joss.01041
Liaw A, Wiener M (2002) Classification and Regression by randomForest. R News 2:18–22
Lucas TCD (2020) A translucent box: interpretable machine learning in ecology. Ecol Monogr 90:. https://doi.org/10.1002/ecm.1422
McGill BJ (2003) Does Mother Nature really prefer rare species or are log-left-skewed SADs a sampling artefact? Ecology Letters 6:766–773. https://doi.org/10.1046/j.1461-0248.2003.00491.x
Mesler MR, Carothers SK (2023) Host-switching by a bee where its usual pollen host is not present: Diadasia diminuta (Cresson, 1878) (Apidae: Eucerinae: Emphorini) uses the rare mallow, Iliamna latibracteata Wiggins (Malvaceae), as its pollen host in northwestern California and southwestern Oregon. The Pan-Pacific Entomologist 99:. https://doi.org/10.3956/2022-99.3.192
Michener CD (2000) Bees of the World. The Johns Hopkins University Press, Baltimore, Maryland
Michener CD, Rettenmeyer CW (1956) The ethology of Andrena erythronii with comparative data on other species (Hymenoptera, Andrenidae). The University of Kansas Science Bulletin 37:645–684
Minckley RL, Cane JH, Kervin L (2000) Origins and ecological consequences of pollen specialization among desert bees. Proceedings of the Royal Society B: Biological Sciences 267:265–271. https://doi.org/10.1098/rspb.2000.0996
Morales C, Traveset A (2008) Interspecific pollen transfer: magnitude, prevalence and consequences for plant fitness. Critical Reviews in Plant Sciences 27:221–238. https://doi.org/10.1080/07352680802205631
Müller A, Kuhlmann M (2008) Pollen hosts of western palaearctic bees of the genus Colletes (Hymenoptera: Colletidae): the Asteraceae paradox. Biological Journal of the Linnean Society 95:719–733. https://doi.org/10.1111/j.1095-8312.2008.01113.x
Neff JL, Danforth BN (1991) The nesting and foraging Behavior of Perdita texana (Cresson) (Hymenoptera: Andrenidae). Journal of the Kansas Entomological Society 64:394–405
Novotný V, Basset Y (2000) Rare species in communities of tropical insect herbivores: pondering the mystery of singletons. Oikos 89:564–572. https://doi.org/10.1034/j.1600-0706.2000.890316.x
Paradis E, Schliep K (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35:526–528. https://doi.org/10.1093/bioinformatics/bty633
Parker AJ, Williams NM, Thomson JD (2016) Specialist pollinators deplete pollen in the spring ephemeral wildflower Claytonia virginica. Ecol Evol 6:5169–5177. https://doi.org/10.1002/ece3.2252
Parr CL, Dunn RR, Sanders NJ, et al (2017) GlobalAnts: a new database on the geography of ant traits (Hymenoptera: Formicidae). Insect Conserv Divers 10:5–20. https://doi.org/10.1111/icad.12211
Pebesma E, Bivand RS (2005) Classes and methods for spatial data in R. R News 5:
Pebesma EJ (2018) Simple features for R: standardized support for spatial vector data. The R Journal 10:439–446. https://doi.org/10.32614/RJ-2018-009
Pekkarinen A (1997) Oligolectic bee species in Northern Europe (Hymenoptera, Apoidea). Entomol Fennica 8:205–214. https://doi.org/10.33338/ef.83945
Pelletier D, Forrest JRK (2023) Pollen specialisation is associated with later phenology in Osmia bees (Hymenoptera: Megachilidae). Ecological Entomology 48:164–173. https://doi.org/10.1111/een.13211
Pelton EM, Schultz CB, Jepsen SJ, et al (2019) Western monarch population plummets: Status, probable causes, and recommended conservation actions. Frontiers in Ecology and Evolution 7:. https://doi.org/10.3389/fevo.2019.00258
Poelen JH, Simons JD, Mungall CJ (2014) Global biotic interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics 24:148–159. https://doi.org/10.1016/j.ecoinf.2014.08.005
Poisot T, Bergeron G, Cazelles K, et al (2021) Global knowledge gaps in species interaction networks data. Journal of Biogeography 48:1552–1563. https://doi.org/10.1111/jbi.14127
R Core Team (2022) R: a language and environment for statistical computing
Ritchie AD, Ruppel R, Jha S (2016) Generalist behavior describes pollen foraging for perceived oligolectic and polylectic bees. Environmental Entomology 45:909–919. https://doi.org/10.1093/ee/nvw032
Roberts DR, Bahn V, Ciuti S, et al (2017) Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40:913–929. https://doi.org/10.1111/ecog.02881
Robertson C (1925) Heterotropic Bees. Ecology 6:412–436. https://doi.org/10.2307/1929107
Russo L (2016) Positive and negative impacts of non-native bee species around the world. Insects 7:. https://doi.org/10.3390/insects7040069
Sedivy C, Dorn S, Widmer A, Müller A (2013) Host range evolution in a selected group of osmiine bees (Hymenoptera: Megachilidae): the Boraginaceae-Fabaceae paradox. Biological Journal of the Linnean Society 108:35–54. https://doi.org/10.1111/j.1095-8312.2012.02024.x
Singer MS, Lichter-Marck IH, Farkas TE, et al (2014) Herbivore diet breadth mediates the cascading effects of carnivores in food webs. Proceedings of the National Academy of Sciences of the United States of America 111:9521–9526. https://doi.org/10.1073/pnas.1401949111
Sipes SD, Tepedino VJ (2005) Pollen-host specificity and evolutionary patterns of host switching in a clade of specialist bees (Apoidea: Diadasia). Biological Journal of the Linnean Society 86:487–505. https://doi.org/10.1111/j.1095-8312.2005.00544.x
Slatyer RA, Hirst M, Sexton JP (2013) Niche breadth predicts geographical range size: A general ecological pattern. Ecology Letters 16:1104–1114. https://doi.org/10.1111/ele.12140
Smith C, Weinman L, Gibbs J, Winfree R (2019) Specialist foragers in forest bee communities are small, social or emerge early. Journal of Animal Ecology 88:1158–1167. https://doi.org/10.1111/1365-2656.13003
Smith SA, Brown JW (2018) Constructing a broadly inclusive seed plant phylogeny. Am J Bot 105:302–314. https://doi.org/10.1002/ajb2.1019
Wood T, Roberts S (2017) An assessment of historical and contemporary diet breadth in polylectic Andrena bee species. Biological Conservation 215:72–80. https://doi.org/10.1016/j.biocon.2017.09.009
Wood TJ, Müller A, Praz C, Michez D (2023) Elevated rates of dietary generalization in eusocial lineages of the secondarily herbivorous bees. BMC Ecol Evo 23:67. https://doi.org/10.1186/s12862-023-02175-1


Reviewers agreed at journal
05 Feb, 2024
Reviewers invited by journal
28 Jan, 2024
Editor assigned by journal
11 Jan, 2024
First submitted to journal
10 Jan, 2024
