Microbial composition in three fermented soy products
A total of 93 fermented soy products (DC, DJ and FR) were purchased from online and offline supermarkets across 20 provinces (Table S1). All samples were subjected to shotgun metagenomic sequencing, generating a total of 203.4 Gb data, with an average of 12,691,369 reads per sample.
Compared to DC and FR, DJ samples exhibited a notably higher abundance of eukaryotic organisms (Fig. 1A). To minimize false positives in Kraken2 classification, only species with assigned reads greater than 500 in at least 10 samples per group were retained for further analysis. Notably, DJ samples contained a diverse range of 10 distinct eukaryotic species, whereas DC and FR samples contained only 3 and 1 identified species, respectively (Figure S1A). All three fermented foods shared a single eukaryotic species, Aspergillus oryzae, which was highly prevalent in each (Figure S1B).
In regard to bacterial composition, the dominant phyla found in DC, FR, and DJ samples were Firmicutes (80.9%, 48.8%, 46.2%, respectively) and Proteobacteira (9%, 41.1%, 25.3%, respectively) (Fig. 1A). In DC, the dominant order was Bacillales (66%). On the other hand, Lactobacillales was the most dominant order in FR and DJ (39.9% and 30.1%, respectively), and was also the second most abundant order in DC (14.4%) (Fig. 1A). Notably, Lactobacillales encompassed a diverse group of lactic acid bacteria that play crucial roles in food production, fermentation, and development of probiotic products. As highlighted in Figure S1C, the primary genera in DC were Bacillus (50.3%), Caldibacillus (6.54%), Staphylococcus (5.8%), Tetragenococcus (5.31%), and Weissella (2.46%). In DJ samples, the dominant genera were Staphylococcus (9.38%), Pantoea (7.75%), Tetragenococcus (6.60%), Weissella (5.49%), and Leuconostoc (3.56%). Primary genera of FR samples included Tetragenococcus (15.4%), Enterobacter (9.19%), Lactococcus (8.04%), Pseudomonas (7.22%), and Leuconostoc (5.33%). FR had the highest proportion of identified species (76.2% of total bacterial species) among the three types of fermented foods, in which 52.9% can only be exclusively found in FR samples (Figure S1A). Prevalent bacterial species, present in over 90% of samples in each group, include Bacillus, Enterococcus, Lactococcus, Leuconostoc, Staphylococcus, Tetragenococcus, and Weissella. All of these genera fall within the lactic acid bacteria category and are classified under the class Bacilli, with the majority belonging to the order Lactobacillales.
Alpha diversity analysis showed that the DC group exhibited the least species diversity (Shannon index) among the three types of fermented soy products (Fig. 1B). This observation is justifiable considering the predominantly solid-state nature of DC samples. To investigate the microbial community structures among samples, we performed non-metric multidimensional scaling (NMDS) based on Bray-Curtis dissimilarity. The NMDS plot showed significant differences in microbial communities among the three fermented foods (Fig. 1C), in which PRMANOVA analysis indicated a 19% contribution to the total variability (R2 = 0.19, p = 0.001). Notably, an even greater contribution to the dissimilarity was attributed to geographic origin (R2 = 0.26, p = 0.001). These results indicated that the composition and structure of fermented soy products were dependent on both food type and geographic origin. A plausible hypothesis is that close proximity in regions leads to the adoption of similar fermentation processes. To validate this theory, the Mantel test was used to examine the correlation between geographical distance and Bray-Curtis dissimilarity of the microbiome. The results of the Mantel test, whether applied to the entire sample set (r = 0.092, p = 0.0045) or subgroup samples, supported this relationship (Fig. 1D). This indicated a strong positive correlation between the geographic distance and Bray-Curtis dissimilarity matrices (Figure S1D).
Linear discriminant analysis Effect Size (LEfSe) was used to analyze the microbial taxa in fermented foods. The results highlighted that Firmicutes and Bacteroidetes were the dominant microbial taxa in DC and FR samples respectively, and a multitude of other phyla were enriched in DJ samples (Fig. 1E). Notably, there were 56, 89, and 115 species enriched in DC, DJ, and FR samples respectively (Linear Discriminant Analysis, LDA > 2). The most enriched species in the three fermented foods were fermentation starters (DC: Bacillus subtilis (LDA = 5.07); DJ: Aspergillus oryzae (LDA = 4.88); FR: Tetragenococcus halophilus (LDA = 4.81). B.subtilis was identified as a product component in several DC products. Apart from B.subtilis, DC-enriched Bacillus species such as B.velegensis, B.amyloliquefaciens, and B.licheniformis were also commonly used in fermented soy products. All Bacillus species are able to produce bioactive compounds, which are known to exert health benefits.[21, 22] FR samples were enriched with several lactic acid bacteria, including Lactococcus lactis, Ligilactobacillus acidipiscis, Leuconostoc citreum, Leuconostoc lactis, Leuconostoc mesenteroides (Table S2).
On the other hand, a notable presence of opportunistic pathogenic species mostly from the Proteobacteria phylum and the Enterobacteriaceae family was observed in each group. This includes Klebsiella pneumoniae, Klebsiella aerogenes, Klebsiella oxytoca, Klebsiella michiganensis and Enterobacter hormaechei enriched in FR, as well as Enterobacter cancerogenus enriched in DJ. Using a read count of > 1000 as cut-off, we also detected several foodborne pathogens including Bacillus cereus, Clostridium botulinum, Clostridium perfringens, Cronobacter sakazakii, Escherichia coli, Listeria monocytogenes, Salmonella enterica, Staphylococcus aureus, Vibrio anguillarum and Yersinia enterocolitica in our fermented food samples (Fig. 1F). Amongst these, Salmonella enterica was notably present in FR (LDA = 2.63).
Functional landscapes of microbiome in fermented soy products
To better understand the functions of microbiome in fermented soy products, sequenced reads were assembled into contigs and the genes predicted from these contigs were dereplicated. This resulted in the Soybean Fermented Food non-redundant Gene Catalogue (SFFGC, see Methods) which contained a total of 2,359,387 genes. The functional annotations of these genes were accomplished using a multiple of available databases, including Clusters of Orthologous Genes (COGs), Kyoto Encyclopedia of Genes and Genomes (KEGG), Enzyme Commission Categories (ECs), Comprehensive Antibiotic Resistance Database (CARD), and Carbohydrate Active Enzymes (CAZys).
Our analysis revealed that 84.4% of the SFFGC genes could be successfully annotated using at least one of the aforementioned databases (Fig. 2A). Among these genes, only 8.54% originated from DJ samples, while over half (53.4%) were derived from FR, and 38.1% were from DC. A total of 864 KEGG pathways, 734 KEGG modules, 11930 KEGG Orthology (KO) groups, and 327 CAZy gene families were annotated to the SFFGC genes. For each functional component, the abundance was calculated based on the collective abundance of SFFGC genes associated with that element. Additionally, the abundance of 562 MetaCyc pathways were determined using HUMAnN3. NMDS plot of Bray-Curtis dissimilarity based on KO abundances revealed significant microbial function differences among the three types of fermented foods (Fig. 2B).
Using a LDA score of > 2 as threshold, the LEfSe results showed discernible differences in the abundance of genetic elements among the three groups. Notably, these differences encompassed 7.4% of KO genes, 79.8% of CAZy gene families, 61.7% of KEGG modules, 21.6% of KEGG pathways, and 21.5% of MetaCyc pathways. Interestingly, there was a notably higher count of DC-enriched KO genes, FR-enriched antibiotic genes, and DJ-enriched CAZy gene families (Fig. 2C). We also observed that most reads (66.7% on average) of MetaCyc pathways enriched in DJ were assigned to “UNMAPPED” by HUMAnN3. In contrast, 40% of reads in DC and 35.4% in FR were unassigned, indicating a significant proportion of unknown microbial functions in fermented foods. A significant increase in AA and GT counts were observed in DJ-enriched CAZy gene families (Fig. 2D). GTs are known for facilitating the synthesis of glycosidic linkages by transferring sugar moieties from phospho-activated sugar donors to various acceptors, including both saccharide and non-saccharide molecules. This enzyme family is crucial in the biosynthesis of disaccharides, polysaccharides, and oligosaccharides.
In regards to the resistome of the fermented soy products, a total of 2406 distinct antibiotic resistance genes were identified and annotated with 734 terms from the Antibiotic Resistance Ontology (ARO). Notably, the resistance-nodulation-cell division (RND) antibiotic efflux pump resistance genes were the most prominent (26.6% (640 genes)), followed by the major facilitator superfamily (MFS) antibiotic efflux pump genes (17.6% (424 genes)). 373 of the 734 ARO terms were present in all three fermented soy products, each with non-zero abundances in at least 10% of samples (Fig. 2E). Additionally, LEfSe analysis (LDA score > 2) revealed that 54.4% (399) of these terms varied in abundance across the different food types (Fig. 2C).
Antagonistic activity of Lactobacillales or Bacillales against Enterobacterales
To uncover the inter-relationships among these species, co-abundance microbial species networks were constructed for each food category. Only species present in at least 5 samples in each group were included in the microbial network inference using SparCC algorithm. For species networks, we identified a total of 458 correlations in DC, 371 in DJ, and 401 in FR (FDR < 0.05) (DC in Fig. 3A, DJ and FR in Figure S2). After clustering, we observed that species exhibited a tendency to form “bacteria clique” based on their taxonomic order affiliations (Fig. 3A, 3B).
Analysis revealed a notable competition between Lactobacillales and Enterobacterales across all three categories of fermented foods. In each group of fermented soy products, species belonging to the same taxonomic order demonstrated positive correlations, whilst those from distinct orders-such as Bacillales and Enterobacterales, or Enterobacterales and Lactobacillales-displayed negative correlations. An exception existed in the relationship between Bacillales and Lactobacillales, which exhibited predominantly positive correlations in DC and DJ but displayed negative correlations in FR (Fig. 3C). In the constructed networks, species with a degree exceeding the 80th percentile were designated as keystone species. Notably, in the context of DJ and FR networks, approximately 50% of the identified keystone species originated from the Enterobacterales order (Fig. 3D). This observation highlighted the significant role of Enterobacterales species in the microbial ecosystem of fermented foods.
Recovery of fermented food bacteria genomes from metagenomes
To better understand the distinctions among the three fermented soy products at the genomic level, we used the binning method in metaWRAP on assembled contigs to generate metagenomic assembled genomes (MAGs). Following this pipeline, we recovered a total of 707 high-quality MAGs, with completeness greater than 70% (average: 89.08%) and contamination less than 10% (average: 2.5%) (Figure S3A).
Median size of 707 MAGs was 2.31 megabases (MB) (2.08–3.75 MB), the N50 values ranging from 1.7 kilobases to 2.8 Mb. 363 MAGs (51.3% of the total dataset) were > 90% complete and < 5% contaminated; 10 MAGs even reached 100% completeness and thus referred to as near-complete genomes. Among these MAGs, 36%, 12%, and 52% were from DC, DJ and FR, respectively. The 707 MAGs were dereplicated at a 99% average nucleotide identity (ANI), resulting in 304 dereplicated MAGs, which were used to construct the phylogenetic tree (Fig. 4).
The 707 MAGs were spread over 2 archaea and 705 bacterial genomes. The majority of bacterial genomes belonged to the Phylum Firmicutes (533, 76%), Proteobacteria (87, 12%), Actinobacteriota (50, 7%) and Bacteroidota (30, 4%). A substantial number of MAGs (n = 648 or 91.6%) were successfully allocated to 184 known species, and the remaining 59 MAGs (8.4%) were currently unclassified (Figure S3B).
We estimated ANI against 666 reconstructed food MAGs from Pasolli et al,[10] using a 95% ANI cut-off defined as intraspecies genomes. The 666 food MAGs were reconstructed from 303 metagenomes of different fermented food types from 12 datasets, which were comprised mostly of fermented foods in the Western culture. Only 69 (22.7%) of our dereplicated MAGs matched 275 (41.3%) of the 666 food MAGs.
We used metaProbiotics to predict whether MAGs could potentially act as probiotics.[23] This method employed innovative biological sequence representation by converting 8-mers of DNA sequences into word vectors using natural language processing (NLP) techniques, and built prediction models using random forests. 138 of the 304 dereplicated MAGs were predicted as potential probiotics (Fig. 4). The majority of predicted probiotics were Actinobacteriota, the ratio of predicted probiotics to non-probiotics were 31:6, followed by Firmicutes (88:86), Proteobacteria (12:59), and Bacteroidota (6:15). These results suggested that fermented foods may be a source of potential probiotics.
Among the 2406 antibiotic resistance genes (ARGs) annotated, a total of 721 were detected in our set of 209 MAGs, and 342 of these genes (47.43%) were classified under the Enterobacterales order. Notably, the five most prevalent species that possessed ARGs were Klebsiella pneumoniae (32 instances), Enterobacter hormaechei_A (31 instances), Acinetobacter baumannii (28 instances), Enterobacter cloacae (27 instances), and Enterobacter kobei (26 instances). It is worth noting that ARGgene families within the Enterobacterales species included resistance-nodulation-cell division antibiotic efflux pump and major facilitator superfamily antibiotic efflux pump (Figure S4). These findings collectively suggest that Enterobacterales species serve as the principal reservoir of antimicrobial resistance in fermented soy products.
Comparative analysis indicated Klebsiella strains in gut of Chinese individuals may originate from fermented foods
To investigate the prevalence of fermented food bacteria within the gut microbiome of healthy Chinese individuals, we accessed the taxonomic profile of 689 individuals from 8 publicly available Chinese metagenomic datasets curated in CurateMetagenomicData.[24] We used MetaPhlan3 for taxonomic profiling of our metagenomic data derived from fermented foods to ensure comparability with data in CurateMetagenomicData. Only species present in more than 50% of samples in each food group were considered, resulting in a total of 67 species included in the analysis.
The results showed that the most prevalent species were from the order Enterobacterales, including three Klebsiella species: Klebsiella pneumoniae (62.4%), Klebsiella variicola (50.5%) and Klebsiella quasipneumoniae (42.5%). Additionally, the Enterobacter cloacae complex (16.9%) and Citrobacter freundii (12.04%) were also significant (Fig. 5A).
The most common lactic acid bacteria were Weissella confusa (7.55%) and Lactococcus lactis (4.35%). Notably, Lactococcus lactis in the Chinese population was observed to be lower than the global average of 7.5%.[10] To determine potential associations between Enterobacterales species and lactic acid bacteria in the gut microbiome, samples containing both K. pneumoniae and either W. confusa or L. lactis were selected for analysis. Findings indicated that the co-occurrence of Weissella confusa/Lactococcus lactis and Klebsiella pneumoniae was found to be mutually exclusive (validated through a chi-squared test, both p-values < 2.2e-16). This observation was consistent with the network analysis results mentioned above, revealing an antagonistic relationship between Enterobacterales and lactic acid bacteria.
We conducted a genomic level investigation to gain a deeper understanding of the complex relationship between bacterial species present in fermented foods and those in the gut of Chinese individuals. 24417 gut MAGs of Chinese individuals were retrieved from 154,723 human MAGs by Pasolli et al.[25] Using fastANI (95% ANI intra-species cutoff), we identified 203 hits from the 303 food-derived MAGs in this study. Remarkably, MAGs annotated with K. pneumoniae showed the highest number of matches (Fig. 5B). This supported the idea that K. pneumoniae found in both fermented foods and the gut belonged to the same species. Additionally, we observed that potentially harmful species were more prevalent than putative beneficial species in the gut metagenome-assembled genomes. Examples include Proteus mirabilis, a prevalent pathogen that can cause a range of diseases,[26] and Morganella morganii, a colorectal cancer-associated species enriched in the gut microbiota of inflammatory bowel disease (IBD) and colorectal cancer (CRC) patients.[27]
In order to analyze the relationship between fermented food and gut microbial species at the strain level, StrainPhlan3[28] was used to construct the phylogenetic relationships of common species in both fermented foods and gut samples of Chinese individuals. Only two phylogenetic trees (Klebsiella quasipneumoniae and Klebsiella pneumoniae)—had terminal branches representing lineages found in both food and gut samples (Fig. 5C). Notably, the phylogenetic tree related to K. pneumoniae showed a distinct branch specific to gut samples. However, it is noteworthy that several gut samples were scattered among the branch predominantly populated by fermented foods samples.
Potential Horizontal Gene Transfer of Antibiotic Resistance Genes from Fermented Foods to the Human Gut Microbiome
A comprehensive analysis was conducted to evaluate the risk of ARGs in fermented foods being transmitted to the human gut microbiota. A BLAST search was conducted on our set of 2,406 ARG protein sequences against the extensive collection of 216,849 ARGs from the human microbiome, as curated by Lee et al.[29]
Our analysis revealed 898 matches between ARGs in fermented foods and 4,518 analogous ARGs in the human gut microbiome. Of these, 4,219 hits (93.38% of the total) originated from fecal metagenomes. The hit matches spanned 140 ARG families, representing 32.4% of human gut ARG families. Given that Pasolli et al.'s Species-level genome bins (SGBs) were also used by Lee et al.,[10] we extended our analysis to explore potential routes for ARG transfer. Specifically, we linked the ARGs identified in fermented foods to Chinese SGB genomes from Pasolli et al.'s dataset to uncover potential pathways for ARG transfer. Interestingly, it was found that ARGs from fermented foods that matched Chinese SGB genomes were primarily associated with Enterobacteriaceae bacteria in the gut. The ARGs in K. pneumoniae from fermented foods showed striking similarities to those found in the gut, much like the similarities observed in Proteus mirabilis and Morganella morganii. Furthermore, the notable presence of ARGs in Enterobacter hormaechei highlighted the increase of gene transfer activity, as shown by gene transfer to various species in the gut microbiota (Fig. 6). Collectively, these findings strongly support the idea that ARGs in fermented foods can be horizontally transfer to intestinal microorganisms, thereby suggesting complex cross-microbial interactions.
Resistome risk scores were computed using MetaCompare2,[30] which generated two resistance risk assessments: ecological resistome risk (ERR) and human health resistome risk (HHRR). ERR scores account for various pathogens and ARGs, reflecting the potential mobility of ARGs within an environment. In contrast, HHRR specifically focused on pathogens and high-risk ARGs associated with antibiotic-resistant infections in humans (defined as Rank I ARGs by an omics-based framework). Our results indicated that FR exhibited the highest ERR, followed by DC. Meanwhile, DC exhibited a relatively higher HHRR compared to the other groups (Figure S5). Five samples with the highest HHRR in DC were further examined and it was revealed that 3 of them originated from natto in Japan. These results highlighted the diverse antibiotic resistance risks associated with different fermented foods.