Chemicals
SQ and 3-sulfolactate were synthesized by MCAT GmbH (Donaueschingen, Germany). DHPS was synthesized and validated by NMR and HPLC-MS as reported previously18. Taurine, isethionate and all other purchased, routine chemicals were of at least puriss p.a. grade if not otherwise stated.
Human fecal microcosms
Human fecal samples were collected from healthy individuals adhering to ovo-lacto-vegetarian diets and who had not received antibiotics in the prior 6 months. Sampling and microbiota analysis of human fecal samples was approved by the University of Vienna Ethics Committee (reference #00161). Study participants provided informed consent and self-sampled using an adhesive paper- based feces catcher (FecesCatcher, Tag Hemi, Zeijen, NL) and a sterile 107 x 25 mm polypropylene tube with a screw-cap-attached sampling spoon (Sarstedt, Nümbrecht, DE). Samples were kept cool with ice during transport periods and stored at 4°C for up to a 48 h period. Feces from eight individuals were transferred to an anaerobic chamber (Coy, Grass Lake, MI) having a mixed-gas atmosphere of 5% H2, 10% CO2, 85% N2 (Air Liquide, Austria). Fecal materials were pooled and a final mass of 3.9 g was homogenized in 30 mM anaerobic bicarbonate buffer, pH 6.8. Fecal homogenate was distributed into 20 ml hungate tubes across five treatment conditions in triplicate including buffer alone, 10 mM glucose, 10 mM taurine with 10 mM formate, 10 mM sulfoquinovose, and 10 mM sulfoquinovose with 50% D2O (99.9 atom % D, Sigma-Aldrich) having a final volume of 10 ml with a fecal concentration of 19.5 mg/ml. Tubes were sealed with butyl-rubber stoppers and screw-on caps and incubated at 37°C. One ml of each incubation was subsampled at 0, 6, 20, 28, 44, 52, 73, 99, 114, and 140 h for metabolite and nucleic acid analyses. Twenty µl of each subsample was added to 100 µl of 2 g/l zinc acetate for fixation of H2S. Two-hundred µl were fixed with paraformaldehyde (PFA) for FISH-based analyses as described previously 47,48. The remaining sample volume was centrifuged at 20,000 X g at 4°C to pellet biomass and store at -80°C for nucleic acid extraction. For SQ and DHPS quantification, 500 µl of the biomass-free supernatant was transferred to a 2 ml glass vial containing 170 µl acetonitrile (Sigma-Aldrich, Austria) and crimp-sealed with butyl-rubber stoppers and stored at - 20°C. The remaining supernatant (~300 µl) was transferred to a new microcentrifuge tube for quantification of anionic metabolites (e.g. short-chain fatty acids).
Pure culture and defined co-culture experiments
E. rectale DSM 17629 (purchased from the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany) and B. wadsworthia strain 3.1.6 (kindly provided by Dr. Emma Allen-Vercoe, University of Guelph, Canada) were grown under conditions as described previously21,29 (BF, AWF, Oehler SR, BTH, AL, Franchini P, Spiteller D, DS, submitted). Briefly, a carbonate-buffered mineral-salts medium reduced with titanium(III)- nitriloacetate (Ti(III)-NTA) was used under a N2/CO2 gas phase (80:20) in butyl-rubber stopperedserum flasks (20-ml scale) or culture tubes (10-ml scale). All cultures were incubated at 37°C in the dark without shaking. For growth of E. rectale with SQ or glucose as substrates for fermentation, the medium was supplemented with yeast extract (0.1% w/v), hemin (0.5 µg/ml) and vitamin K1 (0.1 µg/ml). For growth of B. wadsworthia with DHPS, taurine, 3-sulfolactate, or isethionate as electron acceptors (10 mM each), the medium was supplemented with 1,4‑naphthochinone (0.2 µg/ml) and with lactate (20 mM) (or formate) as an electron donor. At intervals, growth was monitored as optical density (OD580nm) of the resuspended cultures (aftermixing by inversion), of which samples were taken to determine sulfonate disappearance and formation of products (see below). Cells were harvested from B. wadsworthia 3.1.6 replicate growth experiments in the late exponential growth phase for differential proteomics when grown with the four different organosulfonates (Extended Data Fig. 5c) or for preparation of anoxic cell- free extracts for DHPS sulfite-lyase enzyme tests (Fig. 2b) (described below). For co-cultivation experiments, medium containing SQ as the sole substrate for fermentation and organosulfonate respiration plus all supplements (above) was used and either both strains were co-inoculated at the start of the growth experiment or (Fig. 1f) E. rectale DSM 17629 was inoculated first (n=3) and after 81 hours B. wadsworthia 3.1.6 was co-inoculated.
Anoxic cell-free extracts and enzyme tests
Preparation of anoxic cell-free extracts of B. wadsworthia 3.1.6 and DHPS sulfite-lyase assays were done under conditions as described previously for the isethionate sulfite-lyase (IslA) reaction29. Briefly, cultures were harvested in the late exponential growth phase (OD580 appox. 0.3 – 0.4) by centrifugation of the serum bottles. Cells were resuspended in anoxic Tris-HCl buffer (50mM, pH 8.0) containing MgCl2 (5 mM) and disrupted by three passages through a cooled French pressure cell, which had been flushed with N2 gas. The enzyme was assayed discontinuously at room temperature by monitoring the formation of the two products sulfite and hydroxyacetone. The reaction mixture (1 ml) contained 20 mM DHPS, 1 mM S‑adenosylmethionine chloride (SAM), 1 mM Ti(III)-NTA, and about 0.2 – 0.9 mg total protein in 50 mM Tris-HCl (pH 8.0) containing MgCl2 (5 mM). The buffer, including DHPS and SAM, was initially degassed under vacuum in glass cuvettes with rubber stoppers, and flushed with nitrogen gas, three times each. Then, Ti(III)-NTA was added, and the reaction was initiated by the addition of anoxic crude extract, each with a syringe and needle through the rubber stoppers. At appropriate time intervals, samples (100 μl) were taken by syringes for quantification of sulfite by a colorimetric assay and of hydroxyacetone by derivatization and HPLC-UV analysis (see below).
Differential proteomics
Total proteomic analyses of cell-free extracts of B. wadsworthia 3.1.6 were done as previously described29 at the Proteomics Centre of the University of Konstanz (https://www.biologie.uni-konstanz.de/proteomics-centre/). Briefly, each trypsin-digested and purified sample was analyzed twice on a Orbitrap Fusion with EASY-nLC 1200 (Thermo Fisher Scientific), and tandem mass spectra were searched against the protein database using Mascot (Matrix Science) and Proteome Discoverer v1.3 (Thermo Fisher Scientific) with “Trypsin” enzyme cleavage, static cysteine alkylation by chloroacetamide, and variable methionine oxidation.
Metabolite analyses
H2S was quantified colorimetrically49. Absorbances at 670 nm were measured on a transparent 96- well plate (Greiner Bio-One, Austria) using an Infinite 200 PRO spectrophotometric microplate reader (TECAN Group Ltd., Männedorf, Switzerland). The samples were quantified with external standards prepared in parallel using sodium sulfide (Sigma-Aldrich, St. Louis, Missouri, USA). Biomass-free supernatants were subjected to capillary electrophoresis for separation and quantification of short-chain fatty acids on a P/ACE–MDQ apparatus (Beckman Coulter, Krefeld, Germany) equipped with a UV-detector with a wavelength filter at 230 nm. The capillary was a fused silica column (TSP075375; 75 µm ID, Polymicro Technologies) 60 cm long (50 cm to the detector) and 75 µm in diameter. Samples were processed using a CEofix Anions 5 Kit (Beckmann Coulter, Krefeld, Germany) according to the manufacturer's instructions. Analytes were separated in reverse polarity mode at 30 kV (ramp: 0.5 kV/s) for 10 min. All samples were diluted 1:10 with a working solution consisting of 0.01 M NaOH, 0.5 mM CaCl2 and 0.1 mM caproate (internal standard). Analytes were quantified using an external standard mixture of SQ, Na2SO4, formate, succinate, acetate, lactate, propionate, butyrate, and valerate. SQ, DHPS, 3-sulfolactate, taurine and isethionate were quantified in culture experiments by HPLC against authentic standards, as described previously18, using a hydrophilic interaction ZIC-HILIC (Merck, Darmstadt, Germany) column and an evaporative light scattering detector. Hydroxyacetone, short-chain fatty acids, and alcohols in culture experiments were analyzed by a HPLC method described previously21, using an Aminex column (BioRad) and a refractive index detector. Hydroxyacetone was quantified also by derivatization with dinitrophenylhydrazine and HPLC-UV, using a method described for acetaldehyde29. Sulfite in enzyme reactions was quantified using a colorimetric assay (Fuchsin assay) or by derivatization with N-(9-acridinyl)maleimide and HPLC-UV29.
Fluorescence cell counting
FISH probes targeting E. rectale (EREC996-Cy5, EREC1252-Cy5) and B. wadsworthia (BWA829-Cy3) were designed using ARB50 and evaluated using SILVA TestProbe 3.0 against the SSU REF database (Release 132; December 13, 2017) (Supplementary Table 10)51. Optimal hybridization conditions were evaluated for fluorescently labeled probes with formamide concentrations ranging 0-70% using PFA-fixed cells of pure cultures of E. rectale DSM 17629 and B. wadsworthia DSM 11045 (purchased from the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany). For microcosm cell counts, aliquots of PFA-fixed cells (5 or 20 µl) were filtered onto black polycarbonate filters (0.2 µm pore size, GTBP Isopore Membrane Filters, Millipore, Cork Ireland) using a 25 mm glass vacuum filter holder (Sartorius Stedim Biotech, Goettingen, Germany). One-sixth filter sections were cut and stained with FISH probes using standard procedures47. Samples were stained with DAPI (1 μg/mL; Sigma– Aldrich) for 5 min and washed with deionized water. Filter sections were placed on a glass microscope slide and overlain with one drop of Citifluor AF1 (Electron Microscopy Services, Hatfield, PA) and a coverslip. Cells were imaged using a Zeiss Axioplan 2 equipped with a Plan- Neofluar 100 x oil objective with a 1.3 numerical aperture (Zeiss, Germany). Cells from twelve fields of view were manually counted in a grid measuring 0.1225 mm2.
Single-cell stable-isotope probing by FISH-Raman microspectroscopy
PFA-fixed cells (5-10 µl) from fecal microcosms incubated with 10 mM SQ and 50% D2O were diluted into 100 µl of 1X PBS and sonicated at 35 kHz for 2 min at 20°C (Sonorex, Brandelin Electronic, Berlin, Germany) followed by centrifugation for 20 min at 20°C and 20,913 X g. Liquid-based FISH was performed as previously described23 using FLUOS-labeled probes targeting E. rectale (Supplementary Table 10) and most Bacteria (EUB338mix)52. Stained cell pellets were resuspended in 5-10 µl sterile filtered water. One µl of a 1:10 dilution of each sample was spotted onto an aluminum-coated glass slide (Al136; EMF Corporation) and allowed to air dry, followed by a 2 s wash in cold sterile filtered water. After air drying, fluorescence was used to identify cells for Raman analyses. Single cell Raman spectra were acquired using a LabRAM HR800 confocal Raman microscope (Horiba Jobin-Yvon) equipped with a 532 nm neodymium- yttrium aluminium garnet laser and a 300 grooves/mm diffraction grating. D-labeling (%CD) was quantified from integrated C-D and C-H peak areas (2,040-2,300 and 2,800-3,100 cm-1, respectively) as previously described23. Cells having %CD values greater than the average plus 3X the standard deviation of unlabeled cells, measured at the 0 h time point (threshold 3.54% CD), were considered D-labeled and metabolically active. Significant enrichment of %CD-labeling in FISH-positive E. rectale cells was evaluated using a Wilcoxon rank-sum test comparing FISH- positive cell counts for each time point with the 0 h time point.
Extraction of nucleic acids
DNA for amplicon and metagenome sequencing and RNA for metatranscriptome sequencing were extracted following standard procedures53 that included bead beating for 30 s on dry ice using a Lysing Matrix E tube (MP Biomedicals). Purified nucleic acid pellets were air dried at ambient room conditions and resuspended in 30 µl DNase/RNase-free H2O (Carl Roth GmbH, Germany) and stored at -80°C.
16S rRNA gene and dsrB amplicon sequencing
Established two-step PCR barcoding protocols were used for amplicon sequencing of the 16S rRNA gene (with primers H_515F_mod-5’-(head) GCTATGCGCGAGCTGC- GTGYCAGCMGCCGCGGTAA and H_806R_mod-5’-(head) GCTATGCGCGAGCTGC-GGACTACNVGGGTWTCTAAT that target most Bacteria and Archaea)54 and the dissimilatory sulfite reductase gene dsrB55. Blank nucleic acid extractions and negative (water only) PCR reactions were included as controls. Barcoded amplicon libraries were pooled at equivalent copy numbers (20x109) for 300 bp paired-end sequencing on an Illumina MiSeq sequencer (Microsynth AG, Balgach, Switzerland). Sequencing results were analyzed according to the procedures outlined previously54,55 and clustered into operational taxonomic units (OTU) according to sequence similarity cut-off of 97% using Uparse56, which also screens out chimeras. 16S rRNA amplicons were classified using RDPclassifier57 as implemented in Mothur v1.42.358 using Release 132 of the SILVA database. 16S rRNA gene amplicons that showed significant increases over time in the SQ-amended microcosms were classified to the species-level by phylogenetic inference by constructing a maximum likelihood tree (MEGA7 v7.0.18)59 using a collated dataset of amplicons and their top 10 subjects from querying against the NCBI 16S rRNA reference gene database. Classification of dsrB sequences was performed with phylogenetic placement using a curated DsrB reference sequence database and corresponding consensus tree55,60. 16S rRNA gene and dsrB libraries were analysed using the software package Phyloseq61 for R Studio v1.0.143 (rstudio.com). Statistical tests between treatment time points were performed with DESeq2 using the variance stabilizing transformation (vst)62.
Metagenomics
Samples from three time points (28, 73, and 114 h) from the SQ-amended microcosms were chosen for metagenome sequencing. DNA samples were diluted to 0.1 ng/µl in 130 µl and sheared using a Covaris M220 Focused-ultrasonicator Instrument (Covaris, Woburn Massachusetts, USA) to a target length of 350 bp. Library preparation was conducted according to the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England BioLabs Inc., Ipswich, Massachusetts) protocol. Indexing primers were ligated conforming to NEBNext Multiplex Oligos for Illumina manual (Index Primers Set 1, New England BioLabs Inc., Ipswich, Massachusetts). Library fragment sizes and purity were screened using a Bioanalyzer 2100 and the High Sensitivity DNA kit (Agilent Technologies, Santa Clara, CA). Samples were pooled for 150 bp paired-end sequencing on an Illumina HiSeq 3000/4000 at the Biomedical Sequencing Facility (BSF) of the Research Center for Molecular Medicine (CeMM), Vienna, Austria. Metagenomic datasets consisted of 87, 93, and 158 M reads (28, 73, and 114 h time points, respectively). For assembly and binning of metagenome-assembled genomes (MAGs), adapters and low quality base-calls were trimmed from Illumina reads using BBduk (https://jgi.doe.gov/data-and-tools/bbtools/ - right -trimmed, q=15, minlength = 149 bp). Trimmed reads were error-corrected using BayesHammer63 and assembled using metaSPAdes (SPAdes v3.13.1)64. MAGs were binned using MetaBAT v2.12.2 using composition and abundance under all presets on each dataset65. MAGs were assessed for quality using checkM v1.0.1266 and de-replicated using drep (v1.4.3)67. The highly fragmented and incomplete original Bilophila sp. MAG_14 MAG was missing dslA, however an unbinned 3913 bp contig (99.3% identity to B. wadsworthia 3.1.6) containing the dslA sequence had a similar coverage to the MAG (19.7 vs. 20.4 fold coverage). This contig was tested for consistency with the original MAG through iterative reassembly using MAGspinner v0.10 (github.com/hexaquo/MAGspinner). MAGspinner removes contigs with abnormal coverage and/or tetranucleotide signatures during the reassembly process. The inclusion of the contig containing the dslA sequence in the final bin (30 rounds of reassembly) was interpreted as evidence that the contig belongs to Bilophila sp. MAG_14. Representative MAGs were assigned taxonomy through alignment and phylogenetic placement of concatenated marker genes into the Genome Taxonomy Database reference tree (GTDB-Tk)22. 16S rRNA sequences were extracted from each MAG with nhmmer68 using rfam69 models for bacterial and archaeal small subunit rRNAs (RFAM: RF00177, RF01959). Minimum overlap between sequence and model was 300 nucleotides. Sequences were classified using the RDP classifier57 as implemented in Mothur. The number of tRNAs and associated amino acids were determined using tRNAscan-SE70. Pairwise, whole- genome, average nucleotide identity (ANI) between genomes of Eubacterium rectale and between genomes of 16 representative dsrAB-containing bacteria from the gut, including Bilophila, were calculated using FastANI (v1.2) using >95% ANI and <83% as the intra- and inter-species cutoffs71, respectively.
Metatranscriptomics
Metatranscriptome libraries (n = 9) were prepared from nucleic acid extracts from three timepoints (6, 20, and 52 h) of the triplicate SQ-amended microcosms. DNase digestion of nucleic acid extracts was performed up to three times (ezDNase; Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. Removal of DNA was confirmed using a standard 16S rRNA gene PCR followed by electrophoretic analysis. Total RNA quality was evaluated using a Bioanalyzer 2100 and Agilent RNA 6000 Pico kit (Agilent Technologies, Santa Clara, CA). rRNA was removed from total RNA using the Ribo-Zero Gold rRNA Removal Kit (Epidemiology) (Illumina Inc., San Diego, CA) according to the manufacturer’s instructions. Purified mRNA pools were quantified (Quant-iT RiboGreen RNA Assay Kit, ThermoFisher Scientific) and 1-100 ng was used in each library preparation for multiplexed Illumina RNASeq analysis (NEBNext Ultra RNA Library Prep Kit for Illumina with NEBNext Multiplex Oligos for Illumina primer set 1, New England BioLabs Inc., Ipswich, MA). Indexed libraries were purified from adapter sequences (Agencourt AMPure XP Beads, Beckman Coulter), quantified (Quant-iT PicoGreen RNA Assay Kit, ThermoFisher Scientific), evaluated for size and purity (Bioanalyzer 2100 and the High Sensitivity DNA kit, Agilent Technologies, Santa Clara, CA), and pooled at equimolar contributions. Sequencing (50 bp, single-end) was performed using Illumina HiSeq 3000/4000 (BSF, Vienna, Austria). The reads were demultiplexed and bam files were converted to fastq using SAMtools (v1.9)72 followed with trimming using Trimmomatic (v0.39)73 with the following settings: LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:40. The trimmed reads were then mapped to the genomes of Eubacterium rectale ATCC 33656 (GCF_000020605.1_ASM2060v1) and Bilophila wadsworthia 3.1.6 (GCF_000185705.2_Bilo_wads_3_1_6_V2) using bbmap (v37.61; sourceforge.net/projects/bbmap/) with a minid of 97%. Feature counting was done using featureCounts within the Subread package (v1.6.3)74; reads that mapped equally well to multiple locations were assigned as fractions of the uniquely mapping reads for the same region (-M -- fraction). After removal of reads that mapped to any rRNA, the raw read counts for all libraries were then converted to copies per million counts (cpm) and normalized using a trimmed mean of M-values (TMM) between each pair of samples using calcNormFactors (edgeR, v3.9)75. Differential expression analysis was done by pair-wise comparison of the samples from 20 h incubation to the samples from the 6 h incubation using the exactTest function (edgeR, v3.9). Genes were considered differentially expressed if the adjusted p-value was below 0.05 (P < .05).
Construction of a bacterial YihQ/SftG profile hidden markov-model (HMM)
Bacterial genomes and MAGs that were publicly available in Genbank as of July 27, 2017 (N=103411) were screened using hmmsearch (v3.1b2)76 for genes that contained the Glycosyl hydrolases family 31 pfam (PF01055.25). The resulting dataset (N=127608) was further screened using a YihQ/SftG-specific motif that we constructed using the signature sulfoquinovosidase residues reported by Speciale et al.27 corresponding to positions 290-508 in YihQ (Genbank accession AAB03011.1) from E.coli: W-X(2)-D-W-X-G-X(6)-G-X(4)-W-X-W-X(1,300)-D-X-G-G-[YWF] (expressed as a PROSITE pattern). Subsequences that matched this motif (N=585, mean length = 225 nt, length standard deviation = 2 nt) were extracted from the full dataset using a simple grep command in a Unix environment and aligned using MAFFT v7.39777. Positions corresponding to the variable-length linker corresponding to positions 308-505 in AAB03011.1 (corresponding to -X(1,300)- in the prosite pattern) were manually removed from the alignment. Remaining subsequences were clustered at 70% identity using Usearch78. Centroid sequences (N=15, length=26) were aligned using MAFFT v7.397. The resulting alignment was used to construct an HMM using hmmbuild (v3.1b2)76. The performance of the hmm for identifying the clade of sulfoquinovosidase proposed by Speciale et al.27 was examined phylogenetically. The set of genes that contained the Glycosyl hydrolases family 31 pfam (PF01055.25, N=127608) were length-filtered (min. 50 amino acids) and clustered at 90% identity with Usearch. Centroid sequences (N=2591) were aligned using MAFFT v7.397 and a phylogenetic tree was constructed in FastTree279. E-values < 1 were diagnostic for the proposed clade of sulfoquinovosidase. Of the 2591 Glycosyl hydrolases family 31 test sequences, 103 belonged to the YihQ/SftG clade and 2488 did not. There were 4 false negatives and 1 false positive, resulting in a balanced accuracy of 0.98.
Genome analyses and comparative genomics
We searched all contigs and MAGs from the SQ-amended microcosm for known SQ metabolism genes using blastp and the YihQ/SftG profile HMM.
To identify publically available genomes that encode the SFT pathway, we searched the NCBI genome database using blastp for organisms that contain all four of the essential genes for the pathway - the sulfoquinovosidase (YihQ/SftG), SQ isomerase (SftI), 6-deoxy-6-sulfofructose transaldolase (SftT), and the SLA reductase (SftR). Each of these genes from the E. rectale ATCC 33656 genome was searched against all NCBI genomes using blastp with -max_target_seqs set to 20,000, an e-value cutoff of 1e-30, and a minimum 70% query coverage. All hits for each gene were then collated, sorted, and cross referenced to identify organisms which had hits for all four genes. Each genome was then checked for co-localization and synteny of the genes of interest to identify those that potentially encoded the complete pathway. This analysis identified 83 bacterial genomes that encode all four core enzymes of the SFT fermentation pathway (Supplementary Table 11).
Additionally, the presence of the SFT gene cluster in 73,798 high-quality MAGs from several thousands of publically available human gut metagenomes28, was searched using a two- step approach. Firstly, all translated open reading frames, as inferred with Prokka v1.14.080/Prodigal v2.6.381, were queried with our YihQ/SftG profile HMM (Supplementary file YihQ-SftG.hmm). Among 1,976 hits, all had an e-value of less than 0.1, and 84% had an e-value of less than 10-5. We did not apply an e-value filter in this step since we expected that the proximity check (see below) removes false positives. Secondly, blast-based searches including the three additional SFT pathway core protein sequences (SftI, SftT, and SftR) were performed against all translated open reading frames of genomes with an YihQ/SftG HMM hit. Physical proximity of the four genes in the genome was checked to validate SQ-gene clusters. We retained blast hits if they were above 75% of the mean reference gene length and had a mean identity of at least 70%. For each query genome and gene, we then kept only that blast hit with the largest e-value. After these steps, all blast hits had an e-value < 10-114. We then considered only those genomes where all blast hits were found on the same contig and ensured that all of them were within a 10-ORF window. 1118 MAGs passed these filters (Supplementary Table 12). The reference genome set used for taxonomic annotation of MAG clusters consisted of 80,853 genomes obtained from GenBank (March 2018) corresponding to 17,607 microbial species for which at least one proteome was available in UniProt. 137 isolate genomes82 were added to this set for a total of 80,990 isolate genomes with confident taxonomic annotation. Concatenated marker gene alignments using representative MAGs from each of the 23 species-level MAG clusters containing the SQ operon were generated using checkM v1.0.1266 and placed into a phylogenomic tree using FastTree279. Genes that enable taurine (tpa), DHPS (dslAB, hpsO, hpsN, dphA), 3-sulfolactaldehyde (slaB), 3- sulfolactate (suyAB, slsC, comC), sulfoacetaldehyde (xsc, sarD), and isethionate (islAB) catabolism and sulfite reduction (dsrABC) were initially identified in genomes of 16 representative dsrAB-containing human gut bacteria by blastp screening all protein coding genes from the genomes of interest against custom amino acid sequence databases established for each gene. The blastp results were filtered with an e-value cutoff of 1e-30 and a minimum 70% query coverage. Neighbourhoods of some genes of interest were checked manually for synteny: tpa with sarD, adhE with islAB, dhpA with slaB, hpsOPN, and slsC with comC.
Metatranscriptome analysis of publically available datasets
Reads from a total of 1,090 paired gut metagenomes/metatranscriptomes were downloaded from https://ibdmdb.org/ (725 paired samples from 105 individuals8) and the sequence read archive at https://www.ncbi.nlm.nih.gov/sra (365 samples from 96 individuals, BioProject accession PRJNA3542356. The downloaded reads, which were already screened for reads originating from the host, were then trimmed using Trimmomatic (v0.39)73 with the following settings:
LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:80. Taxonomic and functional profiling of the metagenomic and metatranscriptomic samples was done using HUMAnN2 v0.11.183 and MetaPhlAn2 v2.7.784. To assess the contribution of various SFT pathway encoding organisms to total transcription of the SFT pathway, we mapped all of the trimmed reads from the 1090 metatranscriptomes to all of the SFT pathway-encoding genomes we identified in our NCBI search (total of 83) with bbmap (v37.61, sourceforge.net/projects/bbmap/) using a minid of 97%. Additionally, we used the same pipeline and settings to map the trimmed reads from the 1090 metatranscriptomes to dslAB, islAB, and dsrABC genes identified in genomes of 16 representative dsrAB-containing human gut bacteria (see previous section). Feature counting was done using featureCounts within the Subread package (v1.6.3)74; reads that mapped equally well to multiple locations were assigned as fractions of the uniquely mapping reads for the same region (-M -- fraction). Read counts were then converted to reads per kilobase per million reads (RPKM) or reads per kilobase (RPK). To estimate the relative abundance of the SFT pathway in the 1,090 metatranscriptomes, we analyzed all of the metatranscriptome samples using HUMANn2 with default settings and then manually added the RPK (read per kilobase) values from each organism found to be expressing the SFT pathway as well as the sum pathway RPK values to the pathway abundance output from HUMANn2 prior to normalizing all of the RPK values to relative abundance using the humann2_renorm_table script provided in HUMAnN2. Proportional contribution of each organism to total expression of the SFT pathway was calculated as a ratio between the RPKM value for the organism of interest divided by the total RPKM value for the pathway in that sample. We compared mean expression (normalized RPKM values) of the SFT pathway and the DsrAB-DsrC pathway between the different cohorts in the metatranscriptomes (non-IBD vs UC, non-IBD vs CD, and UC vs CD) using a one-factor ANOVA test followed by Tukey’s test; there were no statistically discernible differences in the expression means of both pathways between the different cohorts.
E. rectale transcription network analysis
Downloaded and trimmed reads (processed as described above) from the 1090 gut metatranscriptomes were mapped to the E. rectale ATCC 33656 genome using bbmap (v37.61, sourceforge.net/projects/bbmap/) with at a minid of 97%. Feature counting was done using featureCounts within the Subread package (v1.6.3)74. Samples with greater than ⅝ of all E. rectale ATCC 33656 genes with at least 1 read mapped (n = 244) were selected for network construction.
Raw read counts for these 244 samples were then converted to copies per million counts (cpm) and normalized using a trimmed mean of M-values (TMM) between each pair of samples using calcNormFactors (edgeR, v3.9)75. The normalized matrix was then used as input into SPIEC-EASI (v1.0.5)85 to construct a sparse and compositionally robust gene expression network using the following settings: method='mb', lambda.min.ratio=0.1, nlambda=10, rep.num=60. Combinations of different lambda min ratios (1e-3 to 1e-1) and nlambda (10 to 100) were used to find a stability threshold near the target (0.05); a lambda min ratio of 0.1 and nlambda of 10 yielded an optimal stability threshold of 0.047. Significant correlations between gene pairs (positive or negative interaction strength of >0.099) with any functional prediction (n=1349 unique genes, 2326 gene- pairs) were then imported into Cytoscape (v3.7.1)86 to visualize the network. Significant clusters were identified using ClusterONE87 with the following settings: min. size=4, min.density=auto, edge.weights=unweighted. The list of significantly upregulated E. rectale ATCC 33656 genes from the SQ-amended fecal microcosms (see metatranscriptomics above) was then cross referenced with this network to identify co-expressed gene clusters that respond to SQ amendment.
Phylogenetic and phylogenomic tree construction
Phylogenomic analysis of the 93 MAGs from the fecal incubations was performed with GToTree (v1.4.2)88 using 74 bacterial single-copy genes and a minimum 70% hit fraction for a MAG to be included in the analysis. With this restriction, 66 MAGs were retained for further analysis. FastTree279 was then used to construct approximate maximum likelihood trees using the JTT+CAT model.
Comparative phylogenetic analysis of the glycoside hydrolase family 31 (which includes YihQ/SftG sulfoquinovosidases) was done by downloading the glycoside hydrolase family 31 seed alignment from the pfam database (780 amino acid sequences), aligning the sequences using MAFFT (v7.427)77 (FFT-NS-i-x2 strategy), and then adding 219 putative novel YihQ sequences (70 characterized glycoside hydrolase family 31 proteins, 70 amino acid hits from the Uniprot database using the yihQ HMM model at an e-value of 1e-6, and 79 unique YihQ homologs from the 83 genomes encoding the SFT pathway) and an additional 8696 glycoside hydrolase 31 sequences using the --add and --keeplength options. The 8696 glycoside hydrolase family 31 sequences correspond to 9126 amino acid sequences from uniprot reference proteomes that matched the glycoside hydrolase HMM with an e-value of 1e-40 or less and then clustered at 90% identity using usearch. TrimAl (v1.4.rev15) was used in strict mode to trim the alignment to a final length of 284 amino acid positions. An approximate maximum likelihood tree was then constructed from the alignment using the JTT+CAT model.
For comparative phylogenetic analysis of pyruvate formate-lyase like enzymes and to distinguish isethionate sulfite-lyases (IslA) from DHPS sulfite-lyases (DslA), we retrieved amino acid sequences of known choline trimethylamine-lyases, 4-hydroxyphenylacetate decarboxylases, hydroxyproline dehydratases, aryl- alkyl-succinate synthases, glycerol dehydratases, and pyruvate formate lyases from the KEGG ligand enzyme nomenclature database (abbreviation E.C.) and clustered them at 90% identity using usearch (v11.0.667)78. All of the sequences were aligned with MAFFT (v7.427)77 using the FFT-NS-i-x2 strategy. 23 glycyl radical enzyme sequences, including known isethionate sulfite-lyases, from genomes of members of the family Desulfovibrionaceae originating from the human gut were added to the 1204 reference amino acid sequence alignment with MAFFT using the --add and --keeplength options. TrimAl (v1.4.rev15)89 was used in strict mode to trim the alignment to a final length of 569 positions. An approximate maximum likelihood tree was then constructed from the alignment with FastTree2 using the JTT+CAT model.