Specimen Bank Data
The German Environmental Specimen Bank (ESB) has been in operation since the early 1980s. The ESB collects samples of indicator species from various terrestrial and aquatic ecosystems. These species serve as accumulators of environmental chemicals and provide a detailed image of pollution and ecosystem health. The long operational history of ESBs results in exceptionally long time series of environmental pollution. To make pollution analysis comparable between years, ESB samples are collected according to highly standardized protocols. Samples are taken at the same time of the year, at identical sites and using identical protocols. ESB collection is done using sterile equipment to avoid carryover of even trace amounts of pollutants between samples. To ensure preservation of unstable chemical compounds, the samples are stored over liquid nitrogen after collection and for the long term, halting all chemical and biological degradation. To acquire an integrative view of pollution in an ecosystem, ESB samples are large, each one including hundreds to thousands of specimens or tissue compartments (in case of trees)18,31-36. Each sample is cryo-milled to a fine powder of a grain size of 200 µm, thoroughly homogenizing all traces of chemicals37.
Recent work shows that ESBs, being set up as chemical pollution archives, also makes them ideal for studies on biodiversity change. ESB indicator species can be considered metaorganisms or natural community DNA samplers, which preserve an imprint of the surrounding biological community at the time of sampling. The highly standardized and contamination-free sampling and sample processing conditions, coupled with storage at ultra-low temperatures, make ESB samples perfectly suited for metabarcoding. The cryomilling of large ESB samples also guarantees an even distribution of community DNA traces among the sample and breaks open cell walls of various microorganisms, whose DNA is then uniformly released. Former studies have already extensively tested and highlighted the suitability of different ESB samples for retrospective biodiversity monitoring9,10,12. Here, we use four different types of ESB samples from terrestrial, limnic and marine habitats as natural community DNA samplers to measure biodiversity change across four decades.
1) Tree leaves: The ESB collects leaves from three common tree species in Germany, namely European beech (Fagus sylvatica), Norway spruce (Picea abies) and Lombardy poplar (Populus nigra). The leaves are collected once annually or biannually and serve as samplers for aerial pollutants deposited on the leaves31,32,33. ESB leaf samples are sampled from different forest ecosystem types, spanning a land use gradient from core zones of national parks to timber forests, forests neighboring agricultural sites, and urban parks. Sample series from nine sites were included in this study, starting from 1985. Each sample contains hundreds of leaves from at least 15 individual trees, milled to a fine powder. These samples contain DNA traces of all organisms that interacted with the tree canopy at the time of collection9. Here we characterize communities of canopy-associated arthropods, fungi and bacteria.
2) Bladderwrack (Fucus vesiculosus): This macroalgae is widespread along the European coastline and makes up a significant part of the biomass in Europe’s coastal ecosystems. Marine pollutants are enriched in the tissue of the algae, making it an ideal sentinel species for pollution34. Three sites have been sampled annually or biannually for bladderwrack thalli beginning in 1985. ESB samples from two North Sea sites are collected at intervals of two months, six times a year, and then merged into a pooled annual sample. The third site at the Baltic Sea is sampled twice a year. Bladderwrack is a critical species in coastal ecosystems, providing a habitat for countless taxa. All these taxa leave detectable DNA traces in the ESB sample12. Here we characterize communities of animals and bacteria that interacted with the bladderwrack.
3) Blue mussels (Mytilus edulis): This is the most common mussel in coastal ecosystems of Northern and Central Europe. Blue mussels constantly filter the water column for planktonic organisms and organic particles. In doing so, they enrich pollutants in their tissue, making them an excellent sentinel species for pollution monitoring35. The ESB has collected blue mussels at three coastal sites in Germany since 1985. The mussel’s entire soft tissue including respiratory water is used for the sample. Annual or biannual samples of hundreds of mussels are compiled from six sampling events at the North Sea and two at the Baltic Sea. With each mussel filtering roughly 1 liter of water per hour, these samples contain a comprehensive imprint of the annual planktonic biodiversity at the sampling site12. Here we characterize communities of eukaryotic plankton and mussel-associated bacteria.
4) Zebra mussels (Dreissena polymorpha): The limnic zebra mussel is an invasive species from the Black Sea region, which has colonized nearly all major rivers of Germany since the 1960s. Like the blue mussel, zebra mussels are highly efficient filter feeders. Since the 1990s, zebra mussels are reared by the ESB on special plate stacks, which are then placed in four major German rivers for about one year, allowing the mussels to accumulate pollutants in their tissue. The mussels are then collected from the plate stacks, immediately deep-frozen and a sample of soft tissue including respiratory water is compiled from thousands of mussels36. The samples from nine sites used here provide an overview of limnic biodiversity from major rivers of Germany12. Here we characterize communities of animals, eukaryotic plankton and mussel-associated bacteria.
DNA extraction, library preparation, sequencing and sequence processing
Samples were processed as described in Krehenwinkel et al. (2022a) and Junk et al. (2023). Work steps were performed under clean benches to avoid carryover and cross-contamination. We isolated DNA from 200 mg of homogenate from each sample, which was shown to be sufficient to recover the sample-associated diversity9. DNA was extracted in one or two replicates depending on the sample type (Supplementary Table 5) using a CTAB protocol (OPS Diagnostics, New Jersey, USA), which proved best suited to extract high purity DNA from these sample types. The DNA extracts were then amplified for different DNA barcode markers to enrich various taxonomic groups from the samples (for a list of barcode markers and PCR conditions see Supplementary Table 5). PCR was performed using the Qiagen Multiplex PCR kit (Hilden, Germany) in 10-µl volumes according to the manufacturer’s protocols. Primers were chosen to amplify the associated community, but not the ESB indicator species itself, whose DNA dominates the extract. To characterize bacterial communities (all sample types), we amplified the V1 or V5-7 region of 16SrDNA39,40,41. For terrestrial arthropods (tree leaf samples), we used a mitochondrial Cytochrome Oxidase I (COI) marker38. For fungi (tree leaf samples) we used the ITS1 region of the nuclear ribosomal cluster42,43. The variable V9 region of nuclear 18SrDNA was targeted to characterize communities of aquatic animals and eukaryotic plankton (bladderwrack and mussel samples12; for primer details, see Supplementary Table 5). PCR success was checked on 1.5 % agarose gels, and the PCR products were then amplified in another round of PCR to add Illumina Truseq adapters and unique combinations of dual indexes44 (Supplementary Table 5). All final libraries were pooled in approximately equimolar amounts, cleaned of leftover primers using 1X AMPure beads XP (Beckman-Coulter, California, USA), and then sequenced on an Illumina Miseq (California, USA) using paired-end sequencing with 500-cycle V2 and 600-cycle V3 kits. To ensure reproducibility of our data and to recover rare species, we ran several PCR replicates for every sample, which were indexed and sequenced separately. The number of PCR replicates was adapted based on sample type and marker and varied between three and six (Supplementary Table 5). Blank DNA extractions were included in every batch of extractions, and non-template control PCRs were run alongside all PCR reactions. All controls were sequenced along with the samples to provide a baseline for carryover or cross-contamination during processing.
Forward and reverse reads were merged using PEAR45 with a minimum quality of 20 and a minimum overlap of 50 bp. The merged reads were then quality-filtered by limiting the number of expected errors in a sequence to 146 and transformed to fasta format using USEARCH47. Primer sequences were trimmed off using UNIX scripts. Long 18SrDNA amplicons (~350 bp) of aquaticmetazoan and microeukaryotic plankton generated from zebra mussels were trimmed to match the corresponding short amplicon of ~150 bp. Since both metazoan and phytoplankton amplicons span exactly the same nuclear 18SrDNA region, all sequences were combined into one file. Likewise, reads generated from the three tree species were saved to one file for each marker. After trimming, the resulting file for each marker and sample type was dereplicated and clustered into zero radius OTUs (hereafter OTUs) using the USEARCH pipeline. OTU tables were built for each sample type and marker, also using USEARCH. Taxonomy was annotated using blast2taxonomy script v1.4.2.48 after BLAST searching49 against the entire NCBI Genbank database for 18SrDNA and COI with a maximum number of 10 target sequences. The Silva database50 was used for annotating 16SrDNA sequences, and the UNITE database51 for fungal ITS1. The FungalTraits database52 was used for the functional annotation of fungi. Taxonomic assignments based on BLAST hits with a base pair length of less than 80 % of the amplicon length and/or less than 85 % sequence identity were removed. Wed excluded all taxa except Bacteria, Algae/Protozoa, Metazoa or Fungi from the respective datasets.
Only OTUs with a minimum of 3 reads per sample replicate were retained. The OTU tables were checked for contamination using negative controls. PCR replicates were merged and all datasets were checked for sufficient sequencing depth and sampling size (Extended Data Figure 1; for resulting sampling and OTU count as well as number of phyla and orders see Supplementary Table 2 “cleaned dataset”).
Statistical model and analyses of community diversity
For each type of sample, we only (i) selected sites that were sampled for at least 5 years, (ii) only kept sampling years represented by at least 50 % of the sites and (iii) removed years that were isolated from the others (>2 years). We also removed samples with low read coverage (less than 50 % of the median number of reads). Finally, because OTU read abundances from metabarcoding datasets are subject to many biases53, we converted OTU abundances into binary presence/absence data and only analyzed trends in terms of OTU occurrence. To limit cross-contamination, we considered an OTU as present in a sample if it represents at least 0.01 % of the total reads (for resulting sampling and OTU count as well as number of phyla and orders see Supplementary Table 2 “filtered dataset (model)”; Extended Data Table 1).
We measured community diversity trends in three different ways. First, in each community, in each year, we computed the ɑ-diversity using the OTU richness as a measure of local diversity at a given time. Second, we computed 𝛽-diversities between pairs of communities (1) sampled at the same site in different years or (2) sampled at different sites in the same year. (1) gives an idea of temporal turnover in community composition (temporal 𝛽-diversity), whereas (2) indicates the spatial turnover of the communities (spatial 𝛽-diversity). We measured 𝛽-diversities using the turnover component of the Jaccard distances (R-package betapart 54).
To identify temporal trends in these diversity indices, temporal models were fitted using mixed linear models accounting for the temporal autocorrelation between sampling years and the effect of the different sampling sites. We used the lme function (R-package nlme55) with the corAR1 temporal correlation and the different sites as random effects. We fitted these temporal models with either the ɑ-diversities or the temporal and spatial 𝛽-diversities as response variables.
The significance of the observed trends was evaluated by comparing them to null expectations of community changes. Changes in community composition occur as a result of immigration and extinction events, which are influenced by neutral and/or niche factors. This generates a dynamic equilibrium with ever-changing communities, even in the absence of any kind of disturbance. Following Dornelas et al. (2014), we built upon the equilibrium theory of island biogeography (ETIB)19 to set up a dynamic model for community assembly that generates null expectations of diversity trends in the absence of disturbance (Extended Data Figure 3A). The ETIB is a lineage-based model of species colonization of a local community (the island) from a metacommunity (the continent). In its simplest form19, at each time step, it assumes that each OTU has a probability to migrate from the metacommunity to the local community , and once settled in the community, each OTU has a probability to go extinct. The number of new immigration events (i.e. of species not already in the community) per time step is given by where is the total number of OTUs in the metacommunity and is the number of OTUs already present in community ; it declines as the number of OTUs in the community increases. The number of extinction events per time step, given by , increases with the number of OTUs in the community. An equilibrium is reached when the number of immigration events per time step equals the number of extinction events, i.e. . The equilibrium number of OTUs in the community is given by . This simple form of ETIB implies a linear decrease or increase of the number of new immigration events (resp. extinction events) per unit of time with the number of settled species in the local community. It assumes that all species have the same probabilities to migrate or go extinct (neutrality), and that these probabilities do not depend on the number of species in the community, implying that there is a negligible influence of interspecific competition on immigration and extinction. It thus applies best to communities that are far from carrying capacity. This model has the advantage of being very straightforward to simulate using a simple discrete-time Markov chain.
Given the incomplete sampling of communities typically achieved with metabarcoding techniques, we assumed that each OTU present in community is observed at each time step with a fixed probability . This extra parameter can be handled using hidden Markov models. We assumed that the rates , , and vary from one community to the other due to various extrinsic and intrinsic factors (e.g. distance to the metacommunity, community size, and environmental factors).
Assuming neutrality, i.e. that all OTUs have the same immigration and extinction rates, is a strong assumption often proven to be unrealistic56. We therefore relaxed the assumption of neutrality. In our non-neutral model, we assumed that immigration (resp. extinction) rates for each OTU are sampled from a beta distribution with parameters and (resp. ). The rates are therefore sampled around (resp. ) with a variance that is inversely proportional to : a large corresponds to scenarios of neutrality whereas closer to 0 indicates that immigration and extinction rates are very different across OTUs. While each OTU has specific immigration and extinction rates in each community, we assumed that the ranks of the OTUs in terms of immigration and extinction rates are conserved across communities (i.e. an OTU with a low extinction rate in community compared with other OTUs will also have a low extinction rate in community ). We thus obtained a non-neutral model derived from the equilibrium theory of island biogeography, which assumes that the presence of an OTU in a community results from the balance between immigration and extinction, and that each OTU is characterized by specific rates of immigration and extinction centered around the average rates. At equilibrium, some OTUs are more likely to be present in the community (e.g. OTUs with higher immigration and lower extinction rates).
Instead of testing different parameters values chosen a priori5, we implemented a new inference strategy to adjust the model parameters to the empirical data using a sequential technique. First, in each site, the sampling fraction was inferred using ACE (R-package vegan57). Second, we estimated the average rates and by fitting the neutral model to each community using a hidden Markov model (R-package seqHMM58). We used these estimates as community-specific estimates of the average rates of immigration and extinction of the non-neutral model. Third, given , , and , we used simulation-based inferences using artificial neural networks to estimate the parameters and . We generated a large number of simulated datasets by sampling and from uniform prior distributions and simulating the corresponding non-neutral model of community assembly, and for each of these simulations, we recorded ɑ-, spatial and temporal 𝛽-, and γ-diversities through time across all the sampled sites. We specifically incorporated subsampling into the simulations (with probability ), such that not all OTUs present in the local communities are observed, mimicking the detection bias of metabarcoding: simulated ɑ-, spatial and temporal 𝛽-, and γ-diversities are therefore directly comparable to empirical diversites. For , we used a uniform prior distribution between the number of observed OTUs and three times the ACE estimate of γ-diversity; for , we used a uniform prior between 1 and 5. We started the simulations at year 1500 with a random community composition at each site (each OTU has a probability to be initially present in the community, where is the theoretical number of species at equilibrium in the neutral model, given , and : ). Next, we simulated community composition over time in each site until 2023, sampled the communities according to , and recorded community composition through time for the years of sampling. We then computed for each simulation the ɑ- and 𝛽-diversities across all the sampled sites. We used the same methods and sampling scheme as for empirical data (the number of sampling sites varied through time) to obtain measures of ɑ-, spatial and temporal 𝛽-diversities. We trained an artificial neural network to estimate and from time series of ɑ-, 𝛽-, and γ-diversities using the Python library Keras59, with 100,000 simulations per dataset until reaching a sufficient predictive power. Once trained, the artificial neural network takes as input ɑ- and 𝛽-diversities through time and outputs estimates of and . We used a neural network with 3 intermediate layers, containing 132, 64, and 32 neurons respectively, and with ELU activation functions. We prevented overfitting by using a dropout of 0.5 at each intermediate layer. Input and output data were scaled between 0 and 1 before fitting and the simulations were split between the training set (90 %) and the test set (10 %). Once validated (Extended Data Figure 7), we finally applied the trained neural network to our empirical data, and obtained corresponding and values.
Given the estimated and values, we simulated our non-neutral model 1000 times with these parameters to generate time series of ɑ- and 𝛽-diversities under an equilibrium model of community assembly. We then compared empirical and simulated temporal trends and considered an empirical trend to be significant if it was above or below 95 % of the simulated trends. We interpreted significant deviations from the simulated equilibrium model of community assembly as indicative of out-of-equilibrium dynamics in the empirical data, potentially driven by anthropogenic disturbances.
We then investigated the temporal trends of OTU occurrences, as some OTUs may be recent invaders and others may have gone extinct at the regional scale. In each ecosystem, we only looked at OTUs present in at least 10 % of the samples and across at least 2 sites. For each OTU, we first tested whether its occurrence in the communities tended to vary through time. To do so, we fitted a generalized linear mixed model with a binomial response (presence/absence of the OTU in a community) and considered the sampling site as random effects.
References Methods
- Tarricone, K., Klein, R., Paulus, M. & Teubner, D. Guideline for Sampling and Sample Processing Red beech (Fagus sylvatica). Umweltbundesamt (2018b).
https://www.umweltprobenbank.de/upb_static/fck/download/SOP_ESB_Red_Beech_V2.0.3_2018_en.pdf.
- Klein, R., Tarricone, K., Teubner, D. & Paulus, M. Guideline for Sampling and Sample Processing Norway Spruce (Picea abies)/ Scots Pine (Pinus sylvestris). Umweltbundesamt (2018).
https://www.umweltprobenbank.de/upb_static/fck/download/SOP_ESB_Spruce_Pine_V2.0.2_2018_en.pdf.
- Tarricone, K., Klein, R., Paulus, M. & Teubner, D. Guideline for Sampling and Sample Processing Lombardy Poplar (Populus nigra ‘Italica’). Umweltbundesamt (2018a).
https://www.umweltprobenbank.de/upb_static/fck/download/SOP_ESB_Lombardy_Poplar_V2.0.3_2018_en.pdf.
- Quack, M. et al. Guideline for Sampling and Sample Processing Bladderwrack (Fucus vesiculosus). Umweltbundesamt (2010).
https://www.umweltprobenbank.de/upb_static/fck/download/SOP_Blasentang.pdf.
- Paulus, M., Klein., R. & Teubner, D. Guideline for Sampling and Sample Processing Blue Mussel (Mytilus eduliscomplex). Umweltbundesamt (2018).
https://www.umweltprobenbank.de/upb_static/fck/download/SOP_ESB_Blue_Mussel_V2.1.0_2018_en.pdf.
- Teubner, D., Klein, R., Tarricone, K. & Paulus, M. Guideline for Sampling and Sample Processing Zebra Mussel (Dreissena polymorpha). Umweltbundesamt (2018).
https://www.umweltprobenbank.de/upb_static/fck/download/SOP_ESB_Zebra_Mussel_V2.1.1_2018_en.pdf.
- Rüdel, H., Uhlig, S. & Weingärtner, M. Zerkleinerung und Homogenisierung von Umweltproben durch Cryomahlung. Fraunhofer-Institut für Molekularbiologie und Angewandte Oekologie (2008).
https://www.umweltprobenbank.de/upb_static/fck/download/IME_SOP_Probenvorbereitung_Dez2008_V200.pdf
- Krehenwinkel, H., Weber, S., Künzel, S. & Kennedy, S. R. The bug in a teacup-monitoring arthropod-plant associations with environmental DNA from dried plant material. Biology Letters, 18, 20220091 (2022b).
https://doi.org/10.1098/rsbl.2022.0091.
- Donia, M. S. Fricke, W. F., Partensky, F. & Schmidt, E. W. Complex microbiome underlying secondary and primary metabolism in the tunicate Prochloron symbiosis. PNAS, 108, 1423-1432 (2011). https://doi.org/10.1073/pnas.1111712108.
- Bodenhausen, N., Horton, M. W. & Bergelson, J. Bacterial communities associated with the leaves and the roots of Arabidopsis thaliana. PloS One, 8, e56329 (2013). https://doi.org/10.1371/journal.pone.0056329.
- Chelius, M. K. & Triplett, E. W. The Diversity of Archaea and Bacteria in Association with the Roots of Zea mays L. Microbial Ecology, 41, 252-263 (2001). DOI: 10.1007/s002480000087.
- 42. Gardes, M. & Bruns, T. D. ITS primers with enhanced specificity for basidiomycetes--application to the identification of mycorrhizae and rusts. Molecular Ecology,2, 113-118 (1993). DOI: 10.1111/j.1365-294x.1993.tb00005.x.
- White, T. J., Bruns, T., Lee, S.& Taylor, J. Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In : T. J. White, coord.: PCR Protocols: Elsevier, 315-322 (1990). https://doi.org/10.1016/B978-0-12-372180-8.50042-1.
- Lange, V. et al. Data from: Cost-efficient high-throughput HLA typing by MiSeq amplicon sequencing. BMC Genomics15, 63 (2014). https://doi.org/10.1186/1471-2164-15-63.
- Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics, 30, 614-620 (2014). https://doi.org/10.1093/bioinformatics/btt593.
- Edgar, R. C. & Flyvbjerg, H. Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics, 31, 3476-3482 (2015). https://doi.org/10.1093/bioinformatics/btv401.
- Edgar, R. Usearch. Lawrence Berkeley National Lab (LBNL), Berkeley, CA (United States) (2010).
- Schöneberg, Y. „yschoeneberg/blast2taxonomy“. (2023). Zenodo. doi: 10.5281/zenodo.10009721.
- Altschul S. F., Gish W., Miller W., Myers E. W. & Lipman D. J. Basic local alignment search tool. Journal of Molecular Biology, 215, 403-410 (1990). https://doi.org/10.1016/S0022-2836(05)80360-2.
- Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research, 41, 590-596 (2012). https://doi.org/10.1093/nar/gks1219.
- Abarenkov, K. et al. The UNITE database for molecular identification and taxonomic communication of fungi and other eukaryotes: sequences, taxa and classifications reconsidered. Nucleic Acids Research, 52, 791-797 (2024). https://doi.org/10.1093/nar/gkad1039.
- Põlme, S. et al. FungalTraits: a user-friendly traits database of fungi and fungus-like stramenopiles. Fungal Diversity105, 1-16 (2020). https://doi.org/10.1007/s13225-020-00466-2.
- Piñol J., Mir G., Gomez-Polo P. & Agustí N. Universal and blocking primer mismatches limit the use of high-throughput DNA sequencing for the quantitative metabarcoding of arthropods. Molecular Ecology Resources, 15, 819-830 (2015). https://doi.org/10.1111/1755-0998.12355.
- Baselga, A. et al. Package ‘betapart’: Partitioning Beta Diversity into Turnover and Nestedness Components (2023). https://cran.r-project.org/web/packages/betapart/index.html.
- Pinheiro, J. et al. Package ‘nlme’: Linear and Nonlinear Mixed Effects Models (2024). https://cran.r-project.org/web/packages/nlme/index.html.
- Lomolino M. V. A call for a new paradigm of island biogeography. Global Ecology and Biogeography, 9, 1-6 (2001). https://doi.org/10.1046/j.1365-2699.2000.00185.x.
- Oksanen, J. et al. Package ‘vegan’: Community Ecology Package (2024). https://cran.r-project.org/web/packages/vegan/vegan.pdf.
- Helske, S. & Helske, J. Mixture Hidden Markov Models for Sequence Data : The seqHMM Package in R. Journal of Statistical Software, 88, Article 3 (2019). https://doi.org/10.18637/jss.v088.i03.
- Chollet, F. et al. Keras. (2015). https://github.com/fchollet/keras.