Characteristics of the primary cohort
The primary cohort included 1,262 participants (38.43% women) after all exclusions (see method for details). In 537 (42.55%) participants at least one thyroid nodule was detected. The mean ages of the control and TN groups were 42 and 47 years old. The participants were then classified into four groups according to the thyroid ultrasonographic findings: (1) 725 demonstrated no remarkable findings (controls); (2) 172 had Thyroid Imaging Reporting and Data System (TI-RADS) level 2 (TN2); (3) 328 had TI-RADS level 3 (TN3); and (4) 34 had TI-RADS level 4a and 3 had TI-RADS level 4b (TN4), of which six TI-RADS level 4a and 3 TI-RADS level 4b underwent fine-needle aspiration biopsy, which did not identify malignancy. TI-RADS 4b were not included in the downstream analysis, despite having no malignancy. Participants with more than one thyroid nodule were classified according to the highest TI-RADS level. The demography and laboratory test data are shown in Table S1.
Age and blood pressure were higher in the TN group. In addition, when compared with female controls, female TN had higher body mass index (BMI), waist and hip circumference(WC and HC), triglyceride (TC), total cholesterol (TG), high-density lipoprotein-cholesterol (HDL-C) concentrations, and red blood cell counts (RBC). No significant differences were found between the TN2 and control groups, or between the TN3 and TN4 groups, except for higher systolic blood pressure in female TN4 and higher RBC in male TN4. Thyroid function and lifestyle habits, including smoking and drinking, were not related to TN incidence. Principal component analysis (PCA) based on demographics identified the principal components PC1 and PC2, which correlated with TI-RADS. Pulse pressure (PP) and age were with accordant contribution, together with BMI, WC and HC (Figure S1).
Gut microbial characteristics of the primary cohort
A total of 928 species presented in at least five samples were used for downstream analysis. 343 species of these were viruses, six were Archaea, and 575 were bacteria. 61 species had a median relative abundance >0.01%, comprising 76.4% the total microbiome. 839,310 gene families and 459 metabolic pathways were represented in at least five samples (Figure S2). The representation of metabolic pathways was more equal than that of species, demonstrated by mean Gini indexes of 0.768 and 0.983, respectively (Table S2). In addition, the intra-group and inter-group similarities and distances standard deviations were lower, with respect to the metabolic pathways represented, compared with the similarities based on species composition. (Figure S3). Principal coordinates analysis (PCOA) conducted on the basis of metabolic pathway composition showed that the principal coordinates PCO1 and PCO2 correlated with the presence of TNs and TI-RADS (Figure 1a), but not species composition.
TN2 were more similar to controls when inter-group distances were compared with TN3 and TN4 (Figure S4), except with respect to Spearman coefficient distance (SCD) based on species composition. When the distribution of PCOs was analysed, few significant differences were found between the control and TN2 groups(Table S3 and Table S4). There were significant differences between TN2 and TN3 in PCO1 for metabolic pathways (Table S4). Because >99% of TNs with TI-RADS level 2 are benign and do not require treatment[13], we then classified control and TN2 as having no or low grade TIRADS (Group LTN) and those with TN3 or TN4 as having advanced grade TIRADS (Group HTN).
1259 individuals could be clustered into two enterotypes on the basis of species composition (Figure S5). Enterotype 1 had higher microbial diversity, and contained a higher proportion of women (41.1%, compared with 30.9% in enterotype 2). None of TN incidence, TI-RADS level, or thyroid hormone concentrations (serum FT3 and FT4) were significantly associated with enterotype, except that men with enterotype 1 had higher serum TSH concentrations (Figure S6).
Lower microbial richness in TI-RADS level 4a
The number of observed species (OBS) was significantly lower in TN4 than in the other groups (Wilcoxon’s rank sum test, p = 0.005, p = 0.01, and p = 0.011, compared with control, TN2, and TN3, respectively). 56.91% (247 of 434) of the species not detected in TN4 were viruses, and these were less abundant than other species in all the participants: the median fold-differences (common species versus species not found in TN4), based on the relative abundance in all the participants (including TN4), were 121.52 and 2.67 for bacteria and viruses, respectively. The majority of the viruses not detected in TN4 were phages that might be hosted by Enterobacteriaceae. The number of gene families identified and viruses ratio were also lower in TN4 (Figure 1b-e).
The OBS was significantly higher in women in the control and TN2 groups, and was also associated with the age of the women and the BMI of the men in the control group (Figure S7). In considering female ratio, BMI and age were higher in the TN4 group than in the control and TN2 groups. In addition, the TN4 group was smaller; therefore, we used bootstrap sampling to generate sample number-, sex-, age- and BMI-matched subsets of controls and TN2, such that each subset contained 40 randomly chosen samples. However, the OBS in the TN4 group was still significantly lower after resampling (Wilcoxon’s rank sum test, median p = 0.0365 and median p = 0.0165, compared with controls and TN2, respectively) (Figure S8). We also used analysis of variance (ANOVA) to confirm the lower microbial richness in the TN4 group, after adjusting for age, sex, and BMI (p = 0.005, p= 0.014, and p= 0.017, compared with the control, TN2, and TN3 groups; Table S5). No ultrasonographic parameters were associated with OBS, except that male TN without thyroid echo had higher OBS (p = 0.004) (Figure S9).
Gut microbiota changes in thyroid nodules
We analysed 418 prokaryote species and 440 metabolic pathways that had been identified in ≥20 samples. Sixteen species and 24 pathways withstood adjustment for multiple covariates (sex, age, and BMI) and false discovery rate(FDR) controlling multiple testing to show distinct relative abundances between LTN and HTN, by using multivariate association with linear models(MaAsLin2, see method, q value < 0.25) (Figure 2a and Table S6), and these comprised 0.67% and 4.68% of the total numbers of microbial species and pathways, respectively (Figure S10).
We found that butanoate formation pathways (P163-PWY and PWY-5676) and Butyrivibrio (unclassified) were over-represented in LTNs (Figure 2c), as was the essential transfer RNA charging pathway (TRNA-CHARGING-PWY), which incorporates amino acids into growing polypeptides, could promote butyrate formation by the intestinal microbiota[14,15]. L-isoleucine biosynthesis (PWY-3001) was enriched in HTNs, and in vitro studies of ruminal fermentation have shown that butyrate production increases with isoleucine addition[16].
The metabolic pathways that were perturbed in HTNs were mainly involved in amino acid degradation and glycolysis. Several metabolomic studies have shown the accumulation of L-glutamate[17] and lactate[18,19] in benign cases of TN, which might be contributed to by the L-histidine and 4-amino butanoate (GABA) degradation (by PWY-5030, HISDEG-PWY and GLUDEG-I-PWY), glycolysis (Glycolysis-I, PWY-5484) and homolactic fermentation (ANAEROFRUCAT-PWY) pathways (Figure 2b). The human intestinal microbiome can ferment lactate to generate butyrate in the presence of polysaccharide[20]. However, the accumulation of lactate and the resulting low pH environment would inhibit butyrate formation[21,22]. In addition, inter-individual variations in short-chain fatty acid (SCFA) synthesis ability are typical[23]. Although L-glutamate degradation (P162-PWY) could also yield butyrate via butanoyl-CoA, P162-PWY was far less abundant (mean relative abundance 0.0077%) than PWY-5676 (0.108%, detected in 867 LTN and 341 HTN), as was another LTN-enriched butyrate synthesis pathway (P163-PWY; 0.00018%) and a neutral pathway CENTFERM-PWY (0.012%) that can also ferment pyruvate to produce butanoate (Figure S11). The gut microbiota participating in PWY-5676, which was the main butanoate produce pathway via acetyl-CoA fermentation, could not be classified into any known species, but were positively associated with several important butyrate-producers (Table S7 and Figure S12). As a well-recognized histone deacetylase inhibitor[24,25], sodium butyrate could induce growth arrest in a variety of normal cell types[26,27], enhance radiosensitivity[28], and regulate the expression of several tumour-suppressor genes[29,30,31], as well as increasing I125 uptake[32], which has been shown in thyroid follicular carcinoma cells to occur by restoring human sodium-iodide symporter function (nHIS). It has also been shown to increase thyroid hormone receptor expression in GH3[33] and glial C6[34] cells.
Furthermore, we identified Bacteroidesplebeius, which is of marine origin and was found to be ubiquitous in a Japanese cohort[30] and with porphyran-degrading ability[31], to be enriched in male LTNs after covariates (BMI and age) adjusting (multiple linear regression, adjusted p < 0.05). B. plebeius positively correlated with BMI[35,36], despite LTNs having lower BMI. Numerous in vitro studies have demonstrated that red or brown seaweeds can be fermented to produce butyrate in the human gut, but porphyran or green seaweeds are not effective as substrates[37]. Increases in the populations of Bifidobacterium and Lactobacillus were accompanied by the loss of butyrate-producing bacteria in patients with inflammatory bowel disease[38]. We found that the populations of two Lactobacillus species (L. salivarius and L. sanfranciscensis) were higher in whole HTNs (see method, q < 0.25) and that Bifidobacterium longum was enriched in female TN4 (multiple linear regression, adjusted p < 0.05, Table S8 and Figure S13).
Association of gut microbiota and thyroid function
Data from 624 participants with normal thyroid function were used for association analysis. All the thyroid parameters measured correlated with age and sex (Table S9). 200 pathways and ten species were associated with at least one thyroid parameter after adjustment for covariates by using multivariate association with linear models(MaAsLin2, see method, q value < 0.25)(Table S10). TSH inversely correlated with FT3 and FT4 concentrations, and the microbial features that positively correlated with TSH also showed inverse associations with FT4 and FT3, but for the majority no significant associations were identified (Figure 3a).
Thyroid function was associated with a broad range of metabolic pathways(Figure 3b). L-histidine biosynthesis (HISTSYN-PWY) was positively associated with serum TSH, perhaps because it is an essential component of thyrotropin releasing hormone (TRH) and essential amino acid that is not synthesized de novo in humans, and therefore could increase TSH synthesis and secretion from the anterior pituitary[39]. Tyrosine biosynthesis (PWY-6629, PWY-6630) and biosynthesis (PWY-6628) were also positively associated with FT3, tyrosine is composed of iodinated tyrosine residues (Figure 3c), phenylalanine is an essencial amimo acid which could be converted into tyrosine via hydroxylation in human. Previous studies have shown that low serum tyrosine is associated with reduced circulating FT3 concentration and thyroid dysfunction[40,41,42]. Otherwise, the majority of amino acid and adenosine biosynthetic pathways negatively correlated with FT3. Caloric restriction could reduce thyroid hormone concentrations by inhibiting 5’-deiodinase activity, which might be caused by lower production of NADPH (PWY-7269)[9], accompanied by lower ATP content and synthase activity [43,44]. Conversely, administration of thyroid hormones could reduce plasma total free amino acid concentration[45] and inhibit ATP hydrolysis[46].
Ubiquinol (Coenzyme Q, CoQ), menaquinone (Vitamin K2), various B vitamins (including B9, B12, B1, and B7) which are only synthesized by plants, yeasts and bacteria but not by human host, and some essential pathways involved in CoQ and vitamin biosynthesis were positively associated with FT3 (Table S10). We found Biotin (vitamin B7), which is a commonly used supplement in practice, might induce abnormal thyroid function test results[47], via its direct effect to increase thyroid hormone and reduce TSH concentrations [48,49].
Models to distinguish advanced TIRADS thyroid nodules
To evaluate the potential for the gut microbiome to be used to distinguish HTN, we constructed two types of classification models: a) Demographic models based on age, gender, BMI and SP. b) Mixed models based on the microbial species present and their metabolic pathways, together with the demographics for 1,259 samples. All the available 180 LTNs and 114 HTNs from the CHIP cohort were used as an independent validation cohort; the corresponding samples were still in sequencing while we were analyzing the primary cohort data. No obvious batch effect bias was detected between the train and the independent validation cohorts, so we did not filter out any samples (Figure 4a). Model performance was assessed by receiver operating characteristic (ROC) curve analysis using the area under the curve (AUC).
We used a combination of gradient boosting machine (GBM) feature importance ranking and random forest backward elimination to remove irrelevant features that might have hampered model performance in a 10-fold cross-validation in train samples (Figure S14), then randomly divided 1,259 samples into 886 train and 373 cross-validated samples. Models constructed on train samples performed the best with 53 selected features (Figure S15 and Table S11), which achieved an AUC of 0.734 (95% confidence interval 0.678–0.789) and AUC of 0.739 (0.672-0.806) on cross validate samples and independent validate samples (Figure 4c-f). Demographic models did not filter out any features and could achieve AUC of 0.7016 (0.645-0.7581, p = 0.086) and 0.6867(0.6175-0.7559, p = 0.039) compared with mix models. Age, gender, BMI and SP still had significant contribution to mix models and 10 selected microbiota features were significantly associated with HTN group (Figure 4b). A generalized AUC of 0.736 was obtained after iteratively choosing 200 random samples from the independent cohort 100 times (Figure 4g).