2.1 Sampling and fieldwork
We collected stickleback with minnow traps from both lakes on 14 June 2019, and transported them to the fish facility at Verið aquaculture station of Hólar University (Saudárkrókur, Iceland), where they were housed by population, with 30 fish in each 19 l aerated bucket, at 12 degrees C , which is close to the temperature of the lakes. Fish were fed ad libitum once per day with frozen bloodworms. The experiment started on 21 June 2019 and ended on 13 July 2019. We selected subjects in the mornings before feeding and moved them from their home tanks to ensure their guts were free of digesting food before behavioral trials and gut dissection.
2.2 Behavioral experiments
Food-deprived adult sticklebacks for one day were recorded for behavior in tanks of 150 x 50 cm with a water depth of 21 cm and a distance from the camera of 155 cm. The experiment was divided into two treatments: predation stress and control treatment. For the predation stress group, the stickleback fish had two minutes of adjustment time to behave normally in the tank. After that, the activity of the fish was observed for 4 minutes. A Silicone model fish was used as a visual predator simulating an attack. The predation cue was triggered when the stickleback fish passed by in front of the silicone fish. After triggering the predation cue, the behavior was recorded for 8 minutes. In a similar way, for the control group, the stickleback fish had two minutes of adjustment to behave normally in the tank. Then, the activity was recorded and followed for 4 minutes. The behavior was recorded for an extra 8 minutes in the same way as the treatment group but there were no silicone fish in the tank therefore no predator cue to be triggered. For both treatments, the stickleback fish was measured and dissected to extract the gut for further analyses.
2.3 Parasite identification
Parasites were identified by visual inspection during the dissection for gut extraction, focusing on a major stickleback parasite in the body cavity, Schistocephalus solidus. Parasitism was higher in Galtaból compared to Þristikla (27.3% vs 15.4%), resulting in an unbalanced study design when testing for associations with parasite status. Final sample numbers for the behavioral analyses were as follows: 40 Galtaból fish in the control group (30 non-parasitized, 10 parasitized), 48 Galtaból fish in the predator-exposed group (34 non-parasitized, 14 parasitized), 38 Þristikla fish in the control group (38 non-parasitized, 0 parasitized) and 40 Þristikla fish in the predator-exposed group (28 non-parasitized, 12 parasitized).
2.3 Characterization of behaviour
2.3.1 Arena for control condition
The arena was divided into five virtual zones (Figure 1A) using the software EthoVision XT 15 (Noldus, The Netherlands) :
-Three equal-sized zones : Start (left part of the arena where the tested fish is placed at the beginning of experiment), Opstart (right part of the arena at the opposite side of start), Middle (zone located between Start and OpStart)
-Two additional zones : Center (The centre zone was considered a risky area, a common measure indicative of a high degree of boldness in such an apparatus), Border (this zone is associated with thigmotaxis that is staying close to the walls of an arena, a common measure indicative of a high degree of shyness in such an apparatus)(Dahlbom et al. 2011). The variables of interest extracted with EthoVision XT 15 were as follows: i)The time spent in each zone previously described, ii)The distance travelled by each fish in the device (Dtot in mm), iii)The absolute angular velocity of the fish expressed in degrees per second (Vang in ° s−1), and its mean velocity expressed in body length per second (Vel in BL s−1).
2.3.2 Arena for treatment condition
Two reversed arena settings were designed in Ethovision for the situations before and after triggering the fake predator. So, we only describe the situation before triggering the predator (Figure 1B).
The same five zones as described for the control were also used here. In addition to these zones, we defined Pred (hidden zone under the fake predator) and Entry (zone located all around the predator). The variables of interest extracted with EthoVision XT 15 were as follows: i)The time spent in each zone previously described, ii)The time spent in Pred and Entry, iii)The distance travelled by each fish in the device (Dtot in mm), iv)The absolute angular velocity of the fish expressed in degrees per second (Vang in ° s−1), and its mean velocity expressed in body length per second (Vel in BL s−1).
2.4 Nucleic Acids Extraction, Library Preparation and Sequencing
For nucleic acid extraction, whole guts previously stored in RNA later were homogenized and lysed with a TissueLyser for 1.5 min at 25–30 Hz in 1.6 ml Buffer RLT (Qiagen). DNA and RNA were then simultaneously extracted from 450 µl of lysate per sample using the Allprep® DNA/RNA Mini kit (Qiagen, ID/No:80204) following the manufacturer's protocol. The extractions were performed in six batches and each batch included a blank negative control containing only Buffer RLT, which were taken through all stages of library preparation and sequencing.
DNA extract concentrations and fragment length distributions were quantified. Based on DNA quantity and quality, 71 gut extracts were selected for sequencing, including 27 non-parasitized Galtaból individuals, 13 parasitized Galtaból individuals and 31 non-parasitized Þristikla individuals. The parasitized Þristikla group was excluded from sequencing due to low sample numbers (only four extracts were available). TruSeq Nano DNA libraries (Illumina) were prepared and sequenced by SciLifeLab Uppsala on two lanes of a Illumina NovaSeq 6000 SP flowcell using paired-end 150 bp read length v1 sequencing chemistry.
2.5 DNA Sequence data processing, taxonomic assignment and taxa filtering
Adapter and quality trimming of sequenced DNA reads was performed using AdapterRemoval v.2.2.4 (Schubert, Lindgreen, and Orlando 2016), trimming consecutive bases with quality scores of <30 and removing reads <50 bp after trimming. Paired reads from both lanes were concatenated into a single fasta file per sample. Reads with mean base quality < 30 and exact PCR duplicates (original or reverse complement) were filtered out with PrinSeq-Lite v0.20.4 (Schmieder and Edwards 2011). Reads resulting from the phage PhiX, a positive control spiked in during Illumina sequencing, were removed by mapping to the PhiX reference genome (GCF_000819615.1) with bwa mem v0.7.17 (Li 2013). The unmapped reads were retained with SAMTools v1.9 (Li et al. 2009) and BEDTools v2.27.1 (Quinlan and Hall 2010) downstream analyses. In the same manner, reads from human contamination were removed by mapping to the Homo sapiens reference genome (GCF_000001405.38) (Schneider et al. 2017) and reads from the host stickleback were removed by mapping to the G. aculeatus reference genome (GCF_016920845.1) (Nath, Shaw, and White 2021).
Microbial taxonomic assignment was then performed on the unmapped, non-host reads, using the k-mer based classifier Kraken2 v2.0.8 (Wood, Lu, and Langmead 2019) with the standard Kraken2 database (all archaea, bacteria, viruses and the human genome in RefSeq; built 2020-10-01) and default parameters. Bracken v2.0 (Lu et al. 2017) was used to estimate taxa abundances from the Kraken read assignments at the species level (-l S) using a read length of 150 bp (-r 150), a k-mer length of 35 bp (-k 35) and without an abundance threshold (-t 0). Kraken-biom (https://github.com/smdabdoub/kraken-biom) was used to extract the summarized abundances assigned at the species levels.
Quality control was then performed on the taxa abundances in RStudio v402 using R v4.3.1. To reduce noise, taxa present at < 0.05% relative abundance (Bracken abundance divided by sum of Bracken abundance in a sample) were filtered out. The community compositions of the blank control samples were then compared to the stickleback gut samples, with the aim to identify and remove putative laboratory contaminants. All six negative blank controls included in the DNA extractions had low numbers of microbial reads (mean: 403, range: 113–722). One blank had a microbial community more similar to the fish gut samples than the other blanks (Supplementary Figure S1). Since there was no clustering by extraction batch (Supplementary Figure S2), we determined that low-level cross-contamination between samples and the blank had occurred during this extraction batch. We therefore excluded this blank from contamination identification. Using the other five blanks, contaminants were identified using the decontam ‘prevalence’ function (N. M. Davis et al. 2018) with the default threshold of 0.1, resulting in 134 taxa identified as contaminants and removed from the dataset. Using the decontam ‘frequency’ function (N. M. Davis et al. 2018) with DNA extraction concentration, an additional 38 taxa were identified and removed. The final taxa abundances were summarized at the genus-level for subsequent statistical analysis.
2.6 Statistical analysis
2.6.1 Behavior
To test if there were differences between fish populations in the control treatment and between the control and the period before triggering the robotic predator, a linear model was performed. First, to test the swimming activity, we used Total Distance Swum, Velocity, and Angular Velocity as response variables. Second, to test tank use, Center Duration, and Border Duration were the response variables. For both swimming activity and tank use, the fixed factors included Population (Galtaból and Þristikla), Type (Control and Predator) and their interaction (Population by Type). Posthoc Tukey tests were used using the emmeans R package to observe differences between treatments (Lenth et al. 2020). This analysis is a post hoc of multiple comparison test on the population by type interaction (Hothorn et al. 2006) to assess only biologically meaningful pairwise differences between populations, i.e., Galtaból-Control vs Þristikla-Control; Galtaból-Predator vs Þristikla-Predator; Galtaból-Predator vs Galtaból -Predator; Þristikla -Predator vs Þristikla -Predator.
To further test the differences in behavior under the robotic fish stress, a linear mixed effect model was performed using Total Distance swum, Velocity, and Angular Velocity as response variables of Swimming activities parameters. Center duration and Border duration were used as well as response variables to test the fish boldness. Population (Galtaból and Þristikla), Period (Before and after triggering the robotic predator), and Parasite (presence-absence) and their interactions were used as fixed factors. The individual fish ID was used as a random effect factor. Diagnostics based on residuals of the model were performed to assess the adequacy of the reduced model and compliance with the underlying assumptions. A contrast list post hoc test was performed as previously described by using the R package emmeans (Lenth et al. 2020). This analysis is a multiple comparison post hoc test on the population by Period by Parasite interaction (Hothorn et al. 2006)to assess only biologically meaningful pairwise differences between populations, Period and Parasite (full results reported in Supplementary Tables S1-2).
The dependent variables were transformed whenever necessary to ensure that the residuals followed the assumed error distribution. Finally, the effects of the independent variables were estimated from the models and their significance was tested by likelihood ratio tests (LRT) between models, respecting the marginality of the effects that are supposed to follow a chi-2 distribution under the null hypothesis (type II tests; (Fox and Weisberg 2011)).
2.5.2 Microbiome
Alpha diversity metrics were calculated from Braken relative abundances using the Hill numbers framework with the R package hillR (Alberdi and Gilbert 2019), using q = 0 to estimate richness and q = 1 to estimate the Shannon index. Statistically significant differences (p < 0.05) in alpha diversity metrics between groups were evaluated using Wilcoxon rank-sum tests implemented in the R package ggsignif (Ahlmann-Eltze and Patil 2021). For unsupervised analyses, Bracken abundance counts were normalized by the center-log ratio (CLR) transformation, using a pseudocount of 1 added to all taxa in all samples to resolve the problem of zero values. Euclidean distance matrices were calculated from the CLR-transformed data with the phyloseq (McMurdie and Holmes 2013) function distance. Principal Coordinates Analysis (PCoA) was performed using the phyloseq function ordinate, using Euclidean distances. Permutational multivariate analysis of variance (PERMANOVA) was performed on the Euclidean distances using the adonis2 function in the R package vegan v.2.6–2 (https://github.com/vegandevs/vegan), including the following as covariates: sequencing depth (total Braken abundance counts in a sample), length of individual, population or parasite (depending on the comparison), type (control or predator-stimulated) and extraction date. As there was still an effect of extraction batch on our community composition data (Supplementary Figure S3, Supplementary Table S3), we used limma’s function removeBatchEffect (Ritchie et al. 2015) to regress out the DNA extraction batch effect from the normalized data and repeated the ordination and PERMANOVA as described.
General linear models implemented through MaAsLin2 (Mallick et al. 2021) were used to identify genera with significantly different relative abundance (calculated via total sum scaling) between 1) non-parasitized Galtaból and Þristikla individuals and 2) non-parasitized and parasitized Galtaból individuals. In both models, type (control vs predator) was included as a fixed effect while extraction batch was included as a random effect. MaAsLin2 was run without additional normalization or transformation steps and otherwise default parameters. Genera with adjusted p-values (MaAsLin2 q-value) < 0.05 were classed as significantly differentially abundant.
To integrate the behavioral and microbiome datasets, multi-omic factor analysis (MOFA) was performed using the R package MOFA2 (Argelaguet et al., 2018). MOFA is similar to PCA, where matrices of different types of data generated from the same individuals are reduced to a small number of latent factors representing the key contributors of variation across the datasets (Argelaguet et al. 2018). In this study, MOFA was performed on the control individuals and predator-triggered individuals separately. For the behavioral dataset, all variables were included, both before and after the trigger. The behavioural variables were converted to approximate a normal distribution using the inverse normal transformation. For the microbiome dataset, the 100 most variable genera in each MOFA run (control and predator) were included, using CLR-transformed abundances after regressing out the extraction batch effect (as described for PCoA). Both MOFA models were trained with 7 factors, using Gaussian distributions for both datasets, scaling each dataset to have similar variances and otherwise using default values. After the MOFA models were trained, the function correlate_factors_with_covariates in the MOFA2 package was used to identify factors that were significantly correlated (alpha = 0.05) with metadata variables (fish population, parasite status, length of individual and sample extraction batch). Factors that explained >1% of the variation in both datasets were investigated further. The top 10 features (behavioural variables and microbial genera) contributing to the variation captured by each factor (5 with positive weights and 5 with negative weights) were extracted and classified.