Cohort Metadata & Behavioural Scores
A total of 494 dog owners completed the Diet & Lifestyle Questionnaire via Qualtrics, with 235 participants continuing to complete the C-BARQ. After filtering respondents based on age, location, and consistency of diet and living arrangements, 72 dogs remained for the matching process (described above). Before matching, the behavioural scores for these 72 dogs were evaluated; the summary is displayed in Table 1. The mean composite anxiety score was 0.955, with a median of 0.782. Dogs with a composite anxiety score less than 0.782 were assigned to the lower anxiety group, with those with an anxiety score equal to or greater than 0.782 were assigned to the higher anxiety group. Similarly, the dogs with a composite
Table 1
C-BARQ behavioural scores from 72 pet dogs used to calculate median split for pair-wise matching of dogs in higher vs lower anxiety or aggression groups. The cohort displayed was selected from questionnaire respondents based on location, age (2–7 years), consistent diet and living arrangements for 3 + months.
| Anxiety Score (C-BARQ) |
| Composite | Stranger Directed | Dog Directed | Nonsocial Fear | Separation Related |
N | 72 | 72 | 72 | 72 | 72 |
Mean ± S.E. | 0.955 ± 0.07 | 0.82 ± 0.121 | 1.155 ± 0.119 | 1.05 ± 0.107 | 0.82 ± 0.101 |
Median | 0.782 | 0.5 | 1 | 0.83 | 0.5 |
Range | 0.0425–2.625 | 0–3.25 | 0–3.25 | 0–3.17 | 0–3.25 |
| Aggression Score (C-BARQ) |
| Composite | Stranger Directed | Dog Directed | Owner Directed | Familiar Dog |
N | 72 | 72 | 70 | 72 | 55 |
Mean ± S.E. | 0.565 ± 0.05 | 0.618 ± 0.08 | 1.061 ± 0.117 | 0.158 ± 0.03 | 0.406 ± 0.08 |
Median | 0.455 | 0.4 | 0.875 | 0 | 0 |
Range | 0–1.8825 | 0–2.9 | 0–4 | 0–1.38 | 0–1.75 |
aggression score less than the 0.455 median were assigned to the lower aggression group, with those scoring equal to or greater than 0.455 placed in the higher aggression group. The median values were overall low scores with respect to the maximum possible score for the most extreme aggression and anxiety cases (C-BARQ is scored from ‘no concern’ values of 0, to ‘most concern’ values of 4), indicating a clustering of dogs scoring close to 0 for both behavioural scales.
While fewer dogs had more concerning scores of 3–4 on C-BARQ, values for the n = 72 group ranged from 0-3.25 for individual anxiety scores on scales for stranger-directed fear, dog-directed fear, and separation-related fear, 0–4 for dog-directed aggression, and 0-2.9 for stranger-directed aggression.
Table 2
Metadata for 48 pet dogs used in microbiome analysis. a denotes statistical difference between groups (p = 0.020, Mann-Whitney U).
| | Anxiety Group | Aggression Group |
| All Dogs (n = 48) | Lower (n = 25) | Higher (n = 23) | Lower (n = 23) | Higher (n = 25) |
Age (years) ± S.E.M. | 3.95 ± 0.23 | 3.70 ± 0.33 | 4.22 ± 0.30 | 3.41 ± 0.33a | 4.44 ± 0.28a |
Male | 30 | 15 | 15 | 14 | 16 |
Female | 18 | 10 | 8 | 9 | 9 |
The mean age of the 48 dogs included in the microbiome analysis was 3.95 years (± 0.23 S.E.), and included 30 males and 18 females (Table 2). Of these dogs, the majority were spayed or neutered (n = 45) with 3 dogs remaining intact. Half of the cohort (n = 24) were regularly using a dewormer, while only 5 dogs were regularly supplemented with a commercial probiotic. Abundance analysis comparisons between dogs using probiotics and those not using probiotics were not found to be significant, so we left these in the dataset for further analyses.
Sequence Data Quality
A total of 4,405,983 reads were obtained from Illumina sequencing (91,791 ± 4016 reads per sample ± SE). After filtering, denoising, merging and removal of chimeras using DADA2, a total of 1,737,507 reads remained for the analysis (36,198 ± 1448 reads per sample) (Table S1). These sequences were clustered into 5508 taxa by seven taxonomic ranks. The most reads per genus identified across the cohort were Bacteroides, Fusobacterium, Prevotella_9, Megamonas and Alloprevotella (Fig. 1). Some genera such as Bacteroides and Fusobacterium have relatively low variance among the 48 samples, while others such as Prevotella_9 and Alloprevotella have a wider range across the samples (Fig. 1).
Figure 1. The top 20 most abundant genera, as per total number of reads, identified across the entire cohort of dogs (n = 48). The horizontal line within each box indicates the median and the diamond indicates the mean of the log2 of the number of reads.
Descriptive Statistics for Relative Abundance and Diversity Across Taxonomic Levels
The most abundant phyla detected across the entire cohort were Bacteroidota (relative abundance ± SE, 53.6 ± 2.3%), Firmicutes (23.9 ± 1.7%), Fusobacteriota (18.5 ± 1.8%) and Proteobacteria (3.7 ± 0.3%), with all other subdominant phyla having a relative abundance below 1%. At the class level, Bacteroidia were most abundant across the entire cohort (53.6 ± 2.3%), followed by Fusobacteria (18.5 ± 1.8%), Negativicutes (11.8 ± 1.6%), Clostridia (9.3 ± 0.6%), Gammaproteobacteria (3.7 ± 0.3%) and Bacilli (2.7 ± 0.4%). The order Bacteroidales was most abundant across the cohort (53.6 ± 2.3%), followed by Fusobacteriales (18.5 ± 1.8%). Veillonellales-Selenomonadales (11 ± 1.6%), Lachnospirales (4.4 ± 0.5%), Oscillospirales (3.9 ± 0.4%), Burkholderiales (3.1 ± 0.3%) and Erysipelotrichales (2.4 ± 0.3%). The seven most abundant families identified across all samples in this study were Bacteroidaceae (29.9 ± 2.5%), Prevotellaceae (23.5 ± 3.1%), Fusobacteriaceae (18.5 ± 1.8%), Selonomondaceae (11 ± 1.6%), Lachnospiraceae (4.4 ± 0.5%), Ruminococcaceae (3.6 ± 0.4%) and Sutterellaceae (3.1 ± 0.3%).
Both higher anxiety and higher aggression groups showed a greater number of reads in alpha diversity metrics ACE and Chao1 (Figs. 2A and 3A, respectively), with the aggression groups showing a greater distinction between the two curves. Overall, there were few statistically significant differences in relative abundance between behavioural groups across taxonomic levels (Table S2), with both anxiety and aggression groups displaying similar profiles at the family level (Figs. 2B and 3B). When comparing higher and lower anxiety dogs, the class Bacilli had a greater relative abundance in higher anxiety dogs (2.9 ± 0.4%) than in lower anxiety dogs (2.4 ± 0.7%, p = 0.043), as did the order Erysipelotrichales (2.7 ± 0.3% in higher anxiety dogs, 2.2 ± 0.6% in lower anxiety dogs, p = 0.039). The most notable differences in
relative abundance between aggression groups were the order Oscillospirales (4.8 ± 0.5% in higher aggression dogs, 3.0 ± 0.5% in lower aggression dogs; p = 0.05) and the family Ruminococcaceae (4.5 ± 0.5% in higher aggression dogs, 2.7 ± 0.5% in lower aggression dogs, p = 0.055).
The linear discriminant analysis (LDA) highlighted a difference in relative abundance of the genus Faecalibacterium as an important distinction between both higher and lower anxiety and aggression groups, with the genus Blautia also differing between anxiety groups (Table 3). In addition to Faecalibacterium and Blautia, the relative abundances of phylum Firmicutes, class Clostridia, order Erysipelotrichales and order Oscillospirales were highlighted as important distinctions between anxiety groups, while the order Oscillospirales and family Ruminococcaceae showed differing abundances between higher and lower aggression groups (Figure S1). While these were all considered interesting findings for the abundance LDA, only those indicated as different across two or more analyses are displayed in Table 3.
Microbial balances associated with behaviour groups
Microbiome data are compositional due to the total number of reads being constrained by the sequencing technology. This constraint introduces strong dependencies among the abundances of different microbes: when the proportion of one microorganism increases, the proportion of others must decrease in the data for the total number of reads to remain within the limit. Note, however, that those microbes whose abundance decrease might not be related to the trait or treatment of interest. Thus, considering the abundances independently can lead to the discovery of false associations. Balance approaches are aware of the compositionality of microbiome data and test for differences in the log ratios between microbial abundances (called balances). Balance approaches vary in how balances are calculated and how testing for differences in the balances is performed. We used two balance approaches: PhILR [28], which applies evolutionary models to guide the calculation of the log ratios, and Selbal [29]. Selbal searches for the two OTUs whose balance is most associated with the trait of interest, then adds other OTUs to this best balance to see if the new balance is better associated to the trait of interest in terms of the area under the receiver operating characteristic curve (AUC-ROC) for classification or the mean squared error (MSE) for regression.
We then compared Random Forest models generated with either the log ratios calculated by PhILR or the raw abundances to assess the benefits of using balances for classification and identify the most informative features. To further reduce the likelihood of discovering false associations, we only consider as likely true associations those taxa identified as associated with the behaviour group in two or more analyses (abundance LDA, PhILR, Selbal-classification, Selbal-regression and the two best Random Forest models) as displayed in Table 3. The genus Blautia was identified by all but one of the analyses, indicating a strong support for an association between this genus and anxiety level in dogs. The family Oscillospiraceae was associated with anxiety score in both Selbal analyses, and the phylum Firmicutes and family Peptostreptococcaceae were also indicated in the PhILR and Random Forest analysis.
Table 3 Summary of bacteria identified across two or more analyses. Analyses were linear discriminant analysis (LDA), phylogenetic isometric log ratio transform (PhILR), Selbal classification (Class), Selbal regression (Regr) & Random Forest (Abundance + Feature Selection (Ab+FS); Isometric Log Ratio Transform + Feature Selection (ILR+FS)).
| AGGRESSION | ANXIETY |
Taxonomic Level | Bacteria | LDA | PhILR | Selbal (Class) | Selbal (Regr) | Random Forest (Ab + FS) | Random Forest (ILR + FS) | LDA | PhILR | Selbal (Class) | Selbal (Regr) | Random Forest (Ab + FS) | Random Forest (ILR + FS) |
Phylum | Firmicutes | | * | | | | * | | * | | | | * |
Order | Burkholderiales | | * | | | | * | | * | | | | |
Family | Oscillospiraceae | | | | | | | | | * | * | | |
Family | Peptostreptococcaceae | | * | | | | * | | * | | | | * |
Genus | Bacteroides | | | | | | * | | * | | | | * |
Genus | Blautia | | * | | | * | | * | * | * | * | * | |
Genus | Faecalibacterium | * | | | | * | | * | | | | * | |
Genus | Faecalitalea | | * | | | | * | | | | * | | * |
Genus | Parasutterella | | * | | | | * | | * | | | | |
Genus | Turicibacter | | | * | * | | | | | | | * | |
Based on the balance between Blautia and the mean of Oscillospiraceae and Negativicutes (Fig. 4), Selbal was able to assign a dog based on the bacteria found in its fecal sample to the higher or lower anxiety group with AUC-ROC of 0.856. The AUC-ROC indicates the probability that a random high-anxiety dog will be considered by the classifier more likely to belong to the higher anxiety group than a random low-anxiety dog. A perfect classifier has an AUC-ROC of 1 and a random classifier has an AUC-ROC of 0.5. According to Selbal, higher anxiety dogs typically had greater amounts of Blautia with respect to Oscillospiraceae and Negativicutes than lower anxiety dogs.
Figure 4. (A) Selbal analysis identified the balance between Oscillospiraceae and Negativicutes (numerator) and Blautia (denominator) as an important distinguishing factor between higher and lower anxiety dogs. (B) The balance between these bacteria could predict the assigned behavioural group (higher or lower anxiety) with an AUC-ROC of 0.856.
The genus Turicibacter was associated with aggression score in both classification and regression Selbal analyses, and the phylum Firmicutes was an important distinguishing factor between higher and lower aggression groups in PhILR and Random Forest. However, there are fewer taxa associated with higher and lower aggression groups overall when compared to the anxiety analysis (Table 3).
Using Selbal, further investigation into individual anxiety-related C-BARQ scores for dog-directed, stranger-directed, and nonsocial fear (Table 4) indicated Oscillospiraceae as the family most closely associated with stranger-directed fear, along with the genus Faecalitalea and Phascolarctobacterium succinatens. Blautia and Parasutterella were associated with nonsocial fear at the genus level, and interestingly, Blautia was further identified to the species level as Blautia hansenii when associated with the stranger-directed fear analysis. Phylum Campylobacterota, and genus Clostridium sensu stricto 1 were associated with dog-directed fear.
Table 4
Further Selbal analyses based on continuous anxiety-related C-BARQ scores dog directed fear, stranger directed fear, and nonsocial fear.
Further Selbal Results (Anxiety) |
Taxonomic Level | Bacteria | Dog Directed Fear | Stranger Directed Fear | Nonsocial Fear |
Phylum | Campylobacterota | * | | |
Family | Oscillospiraceae | | * | |
Genus | Faecalitalea | | * | |
Genus | Clostridium sensu stricto 1 | * | | |
Genus | Parasutterella | | | * |
Genus | Blautia | | | * |
Species | Blautia hansenii | | * | |
Species | Phascolarctobacterium succinatens | | * | |
We generated four Random Forest models per trait: two using as input PhILR log ratios and two using as input the abundances. In addition, for two of the models we removed all those features whose permutation did not cause a decrease in classification accuracy (this process is called feature selection), as these features presumably are un-informative. When comparing higher and lower anxiety groups, the models with the highest classification performance were those generated using the PhILR log ratios (ILR) and feature selection (FS) (Fig. 5). This ILR + FS model for anxiety achieved an AUPRC of 0.82 and AUC-ROC of 0.87. Using this model one can identify half of higher-anxiety dogs with a precision of around 87% (Fig. 5). Genera
Lachnoclostridium, Fusobacterium, Bacteroides, Butyricicoccus, Escherichia-Shigella, Catenibacterium, and Faecalitalea, family Peptostreptococcaceae, and phylum Firmicutes were associated by this model with anxiety levels. The ILR + FS model for aggression achieved an AUPRC of 0.74 and AUC-ROC of 0.73. Using this model, one can identify half of higher-aggression dogs with a precision of around 75% (Fig. 6). Genera Bacteroides, Prevotella_9, Faecalitalea, Blautia and Parasutterella, family Peptostreptococcaceae and Lachnospiraceae, order Burkholderiales, and phylum Firmicutes, were associated by this model with aggression levels.
Figure 6. Precision-Recall Curves for Random Forest models predicting assignment of dogs based on their fecal microbiota to higher aggression group, generated using abundance, abundance + feature selection (FS), PhILR log ratios (ILR), and PhILR log ratios with feature selection (ILR + FS). The solid line indicates the average cross-validation Precision-Recall curve and the shaded area indicates the performance range per model observed during cross-validation.