1. Sample selection
Fecal pellets from 96 individuals sampled in seven locations in Manitoba and Saskatchewan, Canada, were selected from an existing extensive collection held by the EcoGenomics Team and used to generate population demographic parameters42. Samples from four to eight males and females from each sampling location were included in the analysis (Fig. 1, Supplementary Table S1). Individuals were identified by genotyping based on variable microsatellite loci amplified from fecal DNA, as previously described43–46, and caribou-specific primers Zfx/Zfy were used for sex identification44. Samples were shipped on dry ice and kept frozen at -80°C.
2. Selection of taxonomic markers and primers
Appropriate taxonomic markers and corresponding primers were selected to ensure the detection of group of organisms of interest from caribou fecal DNA. Broad coverage 16S rRNA gene primers targeting the V4-V5 regions47 were selected to characterize the gut microbiota, which is mainly composed of bacteria in ruminants31,48. Parasites colonizing the caribou digestive tract are members of various taxonomic groups, including the Apicomplexa, Archamoebae, Platyhelminthes, Nematoda, and Arthropoda49. To simultaneously detect all these organisms, broad coverage primers targeting the 18S rRNA gene of eukaryotes50 were selected. These primers have the potential to also inform on the caribou’s winter diet, which is composed of eukaryotic organisms: lichens ̶ a symbiotic relationship between a fungi and at least one photosynthetic partner, cyanobacteria and/or green algae, and plants10. However, more specific markers are commonly used to identify lichens and plants, so complementary sets of primers were selected for these groups. The identification of lichens is based on the taxonomy of the fungal symbiont51 and can be achieved through the sequencing of the nuclear ribosomal internal transcribed spacer (ITS)52, and primers targeting the ITS2 region were selected53,54. For plants, no single widely used marker gene is currently accepted, but sequencing of the plant ITS2 region has been shown suitable for a large variety of plants, allow for an appropriate taxonomic identification at the genus level55, and is supported by well-developed gene databases56. Previously published primers targeting the plant ITS2 were therefore included in the selection57. A list of al primers used in this study is provided in Supplementary Table S2.
3. DNA isolation, library preparation, and sequencing
For each individual sample, three frozen fecal pellets were ground in liquid nitrogen using a mortar and pestle. DNA isolation was performed on 100 mg of ground homogenized samples, using the DNeasy PowerSoil kit (QIAGEN, Netherlands) according to the manufacturer’s protocol (May 2017) for QIACubes instruments (QIAGEN). The Qubit 3.0 fluorometer (Thermo Fisher Scientific, MA, USA) was used for the quantitation of DNA using the dsDNA BR (broad range) assay kit (Life Technologies, CA, USA). DNA extracts were diluted to 5 ng/µL with 10 mM Tris pH 8.0 solution using the QIAgility automated system (QIAGEN).
Library preparation for Illumina sequencing was conducted following the manufacturer’s protocol (16S Metagenomic Sequencing Library Preparation, Part 15044223 Rev. B) with some modifications. The 5’ end of the primers (Supplementary Table S2) was flanked with one or the other of the following Illumina overhang adapter sequences: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG (forward overhang) and GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG (reverse overhang). PCR reactions were set up by mixing 25.0 µL of 2X HotStarTaq Plus Master Mix (QIAGEN, Germany), 0.5 µL of each 10 µM HPLC-purified (or desalted) primer (Invitrogen, USA), 19 µL of RNase-free water (QIAGEN) and 5.0 µL of DNA diluted at 5 ng/µL, for a total volume of 50 µL. Thermal cycling conditions for the 16S rRNA gene were as follows: initial denaturation at 95°C for 5 min; 35 cycles (40 cycles for the 18S, fungal ITS2 and plant ITS2) at 94°C for 45 s, 50°C (47°C for 18S, 50°C for fungal ITS2 and 49°C for plant ITS2) for 45 s, 72°C for 1 min; and a final elongation step at 72°C for 10 min. PCR products were visualized on GelRed-stained 1% agarose gel, using the ChemiGenius Bioimaging System (Syngene, Cambridge, UK). Target fragment sizes were ~ 479 bp for the 16S rRNA gene, ~ 484 bp for the 18S rRNA gene, ~ 347–487 bp for the fungal ITS2 region and ~ 230–378 bp for the plant ITS2 region.
PCR products were purified using AMPure XP reagent (Agencourt, Beckman Coulter Life Science, CA, USA) according to Illumina’s protocol, with gentle pipetting as the preferred option for mixing. Combinatorial dual-index barcodes were added by amplifying 5 µL of the purified PCR product with 25 µL of 2X KAPA HIFI HotStart Ready mix (Roche, South Africa), 5 µL of each Nextera XT index primer (Nextera XT Index kits v2, Illumina Inc., CA, USA) and 10 µL of RNase-free water, for a total volume of 50 µL. Thermal cycling conditions were as follows: 3 min at 95°C; 10 cycles of 30 sec at 95°C, 30 sec at 55°C, 30 sec at 72°C; and a final elongation step of 5 min at 72°C. The resulting dual-indexed libraries were purified with the AMPure XP reagent following the procedure described above, quantified using the SYNERGY Mx system (BioTek Instruments, VT, USA) and pooled at equimolar ratio. To compensate for low base diversity in libraries, 15% of PhiX Control V3 (Illumina) was spiked in the pool. Sequencing was performed using a Miseq v3 600 cycle kit on an Illumina MiSeq system at the Next-Generation Sequencing Platform, Genomics Centre, CHU de Québec-Université Laval Research Centre, Québec, Canada.
4. Pipeline development and database optimization
The pipeline Q2Pipe was developed to ensure a user-friendly and standardized treatment of amplicon sequencing data with QIIME258. Briefly, QIIME2 and its dependencies were loaded to a Singularity image (https://zenodo.org/records/4667718), which allows users to work in a batch environment by installing Singularity, downloading the Singularity image and cloning the Q2Pipe Github repository (https://github.com/NRCan/Q2Pipe). Users can also build the Singularity image by using the image recipe provided in the Q2Pipe repository. The newly built pipeline allows for the selection and modification of all options available in QIIME2 in a single text file which can easily be shared among users and published, while also offering the option to run secondary external programs like FUNGuild59, directly from the pipeline. For compatibility purposes, Q2Pipe can also be used with the QIIME2 Anaconda installation method if the user installs the external dependencies in the Anaconda virtual environment.
As several errors have been detected in the genus name of Eukaryota within the SILVA SSU database 138–9960, a custom python script was used to screen and correct taxonomical disparities between the name at level 6 (i.e. Genus) and binomial name at level 7 (i.e. species). This correction affected 56% of the Eukaryota genera (not applied to Bacteria and Archaea). Furthermore, a low number of entries for lichens were observed in the UNITE database61, which is intended for the molecular identification of fungi based on sequencing of the ITS region. To improve the taxonomic assignment for this group, UNITE was enriched with sequences from the OLICH database62 as well as from twenty lichen specimens collected in Newfoundland, Canada, and formally identified based on anatomical/morphological features63. Protocols for DNA isolation, amplification, sequencing and bioinformatic treatment of sequences from lichen samples was as described above for fecal samples. Only ITS2 region sequences corresponding to the genus identified by anatomical/morphological features and representing more than 10% of sequences from a sample were conserved. This resulted in twenty-two lichen sequences from nineteen species that have been added to the customized database: Alectoria sarmentosa, Cetraria aculeata, Cladonia amaurocraea, Cladonia arbuscula, Cladonia gracilis var gracilis, Cladonia maxima, Cladonia rangiferina (three sequences from two specimens), Cladonia squamosa, Cladonia stellaris, Cladonia uncialis, Lasalia papulosa, Lobaria pulmonaria, Lobaria quercizans, Platismatia norvegica, Sphaerophorus globosus, Stereocaulon tomentosum, Tuckermannopsis americana and Usnea filipendula. Lichen sequences are available on NCBI under OQ792092 to OQ792112 accession numbers. Lichen sample metadata are available on NCBI under BIOProject PRJNA904780, BIOSamples numbers SAMN34138716 to SAMN34138735.
5. Bioinformatics analyses on sequences from caribou fecal samples
Bioinformatics analyses for each sequence dataset obtained from caribou feces were performed using QIIME2, version 2020.8 within Q2Pipe. Option files for each gene analyzed can be found as Supplementary Material 1. Although parameters were slightly different for each taxonomic marker, the overall procedure was similar. Briefly, demultiplexed, per-sample, gzipped fastq files were imported and sequencing reads were truncated at their 5’ and 3’ ends based on 1) the length of the primer (16S and 18S) or specific primer sequence (ITS2) and 2) per base sequence quality score. The QIIME2 plugin DADA264 was selected as the denoising method, which allowed for filtering, dereplication, merging paired-end reads and chimera identification. This resulted in inferences of amplicon sequence variants (ASVs). The least abundant ASVs were filtered out if their frequency was less than 0.05% of the mean ASV frequency. Naive Bayes trained classifier were generated from the corrected and amended SILVA SSU database 138–9960, the amended UNITE v8.3 dynamic fungal database61, and the PLANiTS v2020.3 database56. Taxonomic assignment of 16S and 18S rRNA gene ASVs (SILVA database), fungal ITS2 ASVs (UNITE database) and plant ITS2 ASVs (PLANiTS) was then performed using the QIIME2 plugin feature-classifier classify-sklearn, which is a machine-learning-based classification method that requires trained classifiers. For the 16S rRNA gene, ASVs assigned to Eukaryota, mitochondria, chloroplast and unassigned ASVs at the kingdom level were filtered out, while ASVs from Bacteria, Mammalia and unassigned ASVs at the kingdom level were filtered out for the 18S rRNA gene. For plant and fungal ITS2 regions, taxonomic assignment followed the same process as for the 16S and 18S rRNA genes but using respectively the and the customized, respectively, whereas taxonomic filtering was performed to keep only either Streptophyta or Fungi. ASV tables of the 18S rRNA gene and fungal ITS2 region were further analyzed using FUNGuild59 to infer the ecological guilds of fungi based on their taxonomy. Raw sequence data are available in the Sequence Read Archive (SRA) of the NCBI under accession numbers SRR23356982 to SRR23357367. Sample metadata are also available on NCBI under BIOSamples numbers SAMN32246636 to SAMN32246732, BIOProject PRJNA904780.
6. Characterization of the diet and detection of potential parasites
Potential components of the diet were filtered from non-rarefied ASV tables among plants (plant ITS2 and 18S datasets), lichen-forming fungi (fungal ITS2 and 18S datasets) and lichen-forming algae (18S dataset). Lichen-forming fungal ASVs were identified based on FUNGuild trophic mode assignment, while ASVs from lichen-forming algae were identified based on the list published by Sanders & Masumoto65. Following steps were carried out on filtered datasets containing only ASVs belonging to these potential components of the diet. To increase the reliability of our data and discard potential contaminants, ASVs were considered as part of the diet when accounting for more than 1% of the sequences in a sample66. Then, relative abundance data was converted to presence/absence, as numerous factors may lead to bias in sequence representation of food items in DNA isolated from fecal samples67. The frequency of occurrence (i.e. the number of samples that contain a given food item, referred to here as FOO), as proposed by Ando et al.68, was used as a semi-quantitative approach to underline diet differences between locations. Plant and lichen ASVs were classified according to their morphology (hereby called classes: herbaceous, non-vascular or lignified for plants; crustose, epiphyte foliose, epiphyte fruticose, foliose or fruticose for lichens). The number of ASVs for the plant and lichen components of the diet as well as for each class were calculated and used as a proxy for the richness of the diet.
To identify parasites, ASVs belonging to the Apicomplexa, Archamoebae, Platyhelminthe, Nematoda and Arthropoda were extracted from the non-rarefied 18S rRNA ASV table; ASVs from these groups were then listed and parasites were identified based on the list of caribou parasites from Tryland and Kutz49. ASVs assigned to the Entamoeba genus, a common parasite in mammals, were also included in the parasite list even if, to the best of our knowledge, it has never been reported as a caribou parasite. Moreover, ASVs assigned to the Blastocystis genus, a ubiquitous anaerobic protist which develops in the gastrointestinal tract of a wide variety of hosts such as mammals (including some ungulates), birds and reptiles69,70 were also included, although it belongs to the Stramenopila lineage71 and was not identified through our initial taxa selection. As errors in the taxonomic assignment of Eukaryotes was previously observed with the SILVA database, each taxonomic assignment was verified by conducting a BLAST analysis of ASV sequences against the NCBI Genbank database. When the BLAST results were not concordant with the taxonomic assignment obtained at the genus level by our bioinformatic pipeline, the sequences were discarded from the list. As a result, all sequences assigned to the Cryptosporidium genus were discarded as they matched to uncultured eukaryote clones, the first Cryptosporidium hit in NCBI sharing only around 91% similarity with sequences from our dataset. Overall, five genera of potential parasites were detected and included in further analyses. As for the diet, parasite detection was described by the frequency of occurrence and the number of ASVs assigned to each genus.
7. Statistical analyses
All statistical analyses were performed in R (4.2.2 Patched) using 0.05 as a significance threshold value.
For alpha and beta diversity analyses, datasets were normalized by rarefaction in QIIME2 to control for uneven sequencing across samples72 (see Q2Pipe Option files in Supplementary Material 1 for details). Optimal library size for each taxonomic marker was selected considering both the rarefaction curves and the number of reads of the smallest libraries. Sampling depth was set at 18,430 sequences for the 16S rRNA gene, resulting in 1,769,280 sequences clustered into 7,192 ASVs observed among 96 samples (100%); 2,750 sequences for the 18S rRNA gene dataset, resulting in 264,000 sequences clustered into 5,174 ASVs observed among 96 samples (100%); 3,700 sequences for the plant ITS2 region, resulting in 351,500 sequences clustered into 1,585 ASVs observed among 95 samples (99%); and 7,100 sequences for the fungal ITS2 region, resulting in 674,500 sequences clustered into 7,326 ASVs observed among 95 samples (99%). The ASV richness (Chao1 index) and diversity (Shannon index) for all markers were estimated with the “vegan” package. To evaluate differences in these indices as well as in richness (i.e., number of ASVs) of different functional groups (parasites, plant diet and lichen diet) among locations and between sex, Kruskal-Wallis tests (kruskal.test function) were performed, followed by Dunn post hoc test (dunn.test function) when significant. The same approach was used to assess differences in the relative abundance of various taxonomic groups (i.e. eukaryotic, fungal and bacterial phyla, plant families, and the twenty most abundant genera of each dataset) among locations and between sex. Correlations between the abundance of the twenty most abundant bacterial genera and diet and parasite richness were also tested using the Spearman's Rank correlation (cor.mtest function).
The effects of sampling locations and sex on beta diversity for each marker gene were tested by permutational multivariate analysis of variance (PERMANOVA) using the “adonis” function of the “vegan” package, based on Bray-Curtis dissimilarity matrices obtained from the “vegdist” function. When significant, differences between sampling locations were tested by pairwise PERMANOVA using the pairwise.adonis function. Non-metric multidimensional scaling (NMDS) plots were generated for each taxonomic marker using the “metaMDS” function, based on Bray-Curtis dissimilarity to visualize differences among locations and between sex. The “envfit” function of the “vegan” package was used to test whether the parasite, plant and lichen richness were correlated to the bacterial NMDS space. Only significant correlations were mapped on the NMDS plots.