Subject details
This study involved 13 patients with COVID-19 and 21 healthy controls. The associated clinical data and metadata are provided in Table S1 and Fig. 1A. SARS-CoV-2 infection was confirmed by two consecutive real-time reverse transcription PCR (RT-PCR) tests. These patients were classified into three groups according to the severity of their symptoms: mild (mild clinical symptoms without pneumonia manifestations in CT imaging (7 patients)); moderate (respiratory symptoms, fever, and imaging features of COVID-19 pneumonia (5 patients); severe (respiratory distress (respiratory rate ≥ 30 breaths/min), oxygen saturation ≤ 93% and arterial oxygen tension (PaO2)/fractional inspired oxygen (FiO2) ratio ≤ 300 mm Hg (1 patient)). We collected 53 stool samples from COVID-19 patients with a range of one to nine longitudinal time-points that occurred 1–94 days post symptom onset (Fig. 1A, Table S1). Stool samples from 21 healthy subjects severed as controls. A total of 73 stool samples were subjected to multi-omics analysis.
Metaproteomics sample preparation
Fecal sample (~ 150 mg) was lysed by boiling for 5 min at 95°C in 800 µl of lysis buffer (6 M Guanidinium hydrochloride (GdmCl), 10 mM tris(2-carboxyethyl)phosphine (TCEP), 40 mM chloroacetamide, 100 mM Tris pH 8.5) in Eppendorf protein LoBind tubes. The lysate was then sonicated for 15 min using a waterbath sonicator. The crude protein extract was centrifuged at 16,000 g for 5 min with the clarified lysate subjected to ultrafiltration (cut off 30 kD, Millipore), diluted 1:10 with dilution buffer (10% (v/v) acetonitrile (ACN), 25 mM Tris pH 8.5) containing 1 µg sequencing grade trypsin (1/50, w/w), and digested overnight at 37°C. The digest was acidified to an end-concentration of 1% trifluoroacetic acid (TFA) and debris were removed after centrifugation at 16,000 g for 5 min. Finally, the peptides were desalted on StageTips assembled by Empore C18 disk, dried using a SpeedVac centrifuge at 45°C, and suspended in 2% ACN and 0.1% formic acid (FA).
Metaproteomics data acquisition
Peptides were trapped onto an Acclaim PepMap 100 C18 column (75 µm × 20 mm, 3 µm, 100 Å, Thermo Scientific) at a flow rate of 8 µL/min and separated using an Acclaim PepMap 100 C18 column (75 µm × 250 mm, 2 µm, 100 Å, Thermo Scientific) at 300 nL/min on a Thermo Scientific Dionex UltiMate 3000 RSLCnano LC system. Mobile phase solvents were 0.1% formic acid in water (A) and 0.1% formic acid in 80% acetonitrile (B). The separation gradient was as follows: 15% B rising to 30% B at 130 min, rising to 98% B at 137 min, and keeping at 98% B for 27 min. Finally, the separation column was equilibrated using 15% B for 6 min. The trap column was switched online with the separation column at 3 min and switched back to load position at 168 min. Data were acquired on an Orbitrap Fusion Lumos Tribrid mass spectrometer with a Nanospray Flex ion source in positive ionization mode with a spray voltage of + 2600 V using Xcalibur software (Thermo Scientific, San Jose, CA, USA). The ion transfer tube temperature was 300°C, the vaporized temperature was 325°C, the sheath gas flow was 40 units, the auxiliary gas flow was 15 arbitrary units, and the sweep gas was 1 unit. Full scan MS spectra was acquired in the 400 − 1,600 m/z range with an AGC target of 5 × 104, a maximum injection time of 50 ms, and a resolution of 60K at m/z 200. MS/MS spectra were acquired using higher-energy collisional dissociation (HCD) with a normalized collision energy (NCE) of 30% and a resolution of 15K with an AGC target of 5× 104 and a maximum ion injection time of 22 ms.
Glycopeptide enrichment
Glycopeptides were enriched using hydrophilic interaction liquid chromatography (HILIC) cartridges packed with the C18 plug followed by microcrystalline cellulose resins [45]. The resin was washed with 300 µL of 0.1% TFA and initialized using 300 µL of 0.1% TFA in 80% acetonitrile. After loading ~ 200 µg of peptides in 300 µL 80% acetonitrile/0.1% TFA), the resin was washed with 80% acetonitrile/0.1% TFA three times to remove non-specific peptides. Then glycopeptides were eluted by 300 µL of H2O, followed by 200 µL of 80% acetonitrile. Peptides were dried using a SpeedVac centrifuge at 45°C, and suspended in 2% ACN and 0.1% formic acid (FA).
Glycoproteomics data acquisition
Peptides were trapped onto an Acclaim PepMap 100 C18 column (75 µm × 20 mm, 3 µm, 100 Å, Thermo Scientific) at a flow rate of 8 µL/min and separated using an Acclaim PepMap 100 C18 column (75 µm × 250 mm, 2 µm, 100 Å, Thermo Scientific) at 300 nL/min. Mobile phase solvents were 0.1% formic acid in water (A) and 0.1% formic acid in 50% acetonitrile and 40% isopropanol (B). The separation gradient was as follows: 5% B at 0–15 min, 20%-30% B at 90–100 min, and 98% B at 107 min and kept for 20 min. Data were acquired on an Orbitrap Fusion Lumos Tribrid mass spectrometer with a Nanospray Flex ion source in positive ionization mode with a spray voltage of + 2600 V using Xcalibur software (Thermo Scientific, San Jose, CA, USA). The ion transfer tube temperature was 300°C, the vaporized temperature was 325°C, the sheath gas flow was 40 units, the auxiliary gas flow was 15 arbitrary units, and the sweep gas was 1 unit. Full scan MS spectra was acquired in the 400 − 1,600 m/z range with a maximum injection time of 50 ms and a resolution of 60K at m/z 200. MS/MS spectra were acquired using HCD with stepped NCE at 20%, 30%, and 40% to generate fragment ions of both glycan and peptide of a glycopeptide in a single spectrum MS/MS spectra. The resolution of HCD was 15K with a maximum ion injection time of 22 ms.
Metabolomics sample preparation
Samples were prepared from ~ 150 mg stool samples by protein precipitation with the addition of 1 ml of 80% methanol. Samples were centrifuged (10 min, 16,000g, 4°C) and the supernatants were evaporated to dryness under nitrogen, reconstituted in 150 µl of 0.1% formic acid in 5% acetonitrile, and kept at -80°C until analysis.
Metabolomics data acquisition
Metabolic extracts were separated on a Thermo Scientific Dionex UltiMate 3000 Rapid Separation LC (RSLC) using an ACQUITY UPLC HSS T3 analytical column (2.1 × 150 mm, 1.8µm, 100 Å, Waters) protected by an ACQUITY UPLC HSS T3 VanGuard pre-column (2.1 × 5 mm, 1.8 µm, 100 Å, Waters). Mobile phase solvents for positive ionization mode were 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B); mobile phase solvents for negative ionization mode were 0.01% formic acid in water (A) and acetonitrile (B). The following gradient elution was used: 0–3 min, 95% A; 5–13 min, 80%-30% A; 15–18 min, 2% A; 18.1–22 min, 5% A. The flow rate was 0.3 mL/min, the injection volume was 2 µL and the column oven was set at 35°C. Data were acquired on an Orbitrap Fusion Lumos Tribrid mass spectrometer fitted with a HESI source in both positive and negative ionization modes with an independent run for each polarity and a spray voltage of + 3500 V and − 3500 V, respectively (Thermo Scientific, San Jose, CA, USA). The ion transfer tube temperature was 300°C, the vaporized temperature was 350°C, the sheath gas flow was 40 units, the auxiliary gas flow was 15 arbitrary units, and the sweep gas was 1 unit. Metabolite profiling was profiled in full scan mode using a mass range of m/z 100–1000 with a resolution of 120K at m/z 200, an AGC target of 5 × 104, and a maximum injection time of 50 ms. For metabolite identification, data dependent MS/MS data were acquired on quality control samples (QC) containing equally volumes of all samples used in this study. In-depth MS/MS was performed using nine staggered gas-phase fractionations (sGPFs) to allow more homogeneous selection of precursor ions in low, medium, and high m/z ranges [46]. This was achieved in nine separated LC-MS runs: (run 1) 100–110, 200–210, 300–310, 400–410, 500–510, 600–610, 700–710, 800–810; (run 2) 110–120, 210–220, 310–320, 410–420, 510–520, 610–620, 710–720, 810–820; (run 3) 120–130, 220–230, 320–330, 420–430, 520–530, 620–630, 720–730, 820–830; (run 4) 130–140, 230–240, 330–340, 430–440, 530–540, 630–640, 730–740, 830–840; (run 5) 140–150, 240–250, 340–350, 440–450, 540–550, 640–650, 740–750, 840–850; (run 6) 150–160, 250–260, 350–360, 450–460, 550–560, 650–660, 750–760, 850–860; (run 6) 160–170, 260–270, 360–370, 460–470, 560–570, 660–670, 760–770, 860–870; (run 7) 170–180, 270–280, 370–380, 470–480, 570–580, 670–680, 770–780, 870–880; (run 8) 180–190, 280–290, 380–390, 480–490, 580–590, 680–690, 780–790, 880–890; (run 9) 190–200, 290–300, 390–400, 490–500, 590–600, 690–700, 790–800, 890–900. Each sGPF LC-MS run was performed twice. Quadrupole isolation window was 1.4 m/z and dynamic exclusion was enabled for 10s. The stepped NCE at 10%, 25%, and 40% was employed to obtain information-rich MS/MS spectra. The run order was the blank first (0.1% formic acid in 5% acetonitrile), pooled QC samples for DDA-MS/MS, and a pooled QC every 12 randomized clinical samples.
Lipidomics sample preparation
Extraction of lipids started with the addition of 1 mL of methanol to 150 mg of fecal samples and the tube was vigorously shaken with a vortex for 30 s [47]. Subsequently, 5 mL of methyl tertbutyl ether was added, vortexed for another 30 s, and shaken for 20 min at 200 rpm at room temperature. Next, phase separation was induced by adding 3 mL of ultrapure water with 2.5% trichloroacetic acid (w/v) and centrifugation for 5 min at 3000 rpm. Thereafter, 1 mL of the upper layer (consisting of methyl tert-butyl ether) was transferred and evaporated to dryness at 37°C under a gentle stream of nitrogen. The residue was sequentially resuspended in 250 µL of chloroform and 650 µL of methanol.
Lipidomics data acquisition
Lipid extracts (2 µL) were separated on a Thermo Scientific Dionex UltiMate 3000 Rapid Separation LC (RSLC) using an ACQUITY UPLC HSS T3 analytical column (2.1 × 150 mm, 1.8µm, 100 Å, Waters) protected by an ACQUITY UPLC HSS T3 VanGuard pre-column (2.1 × 5 mm, 1.8 µm, 100 Å, Waters). Mobile phase solvents A and B were ACN:H2O (6:4 v/v) and isopropanol: ACN (9:1 v/v), respectively, both contained 10 mM ammonium acetate and 0.1% acetic acid. The separation was performed at 55°C with a flow rate of 0.35 mL/min using the following gradient: 0–3.0 min, 30%-35% A; 5.0–14.0 min, 65%-98% A; 18.0-18.1 min, 98%-30% A; 18.1–22.0 min, 30% A. Data were acquired on an Orbitrap Fusion Lumos Tribrid mass spectrometer fitted with a HESI source in both positive and negative ionization modes with an independent run for each polarity and a spray voltage of + 3500 V and − 3500 V, respectively (Thermo Scientific, San Jose, CA, USA). The ion transfer tube temperature was 300°C, the vaporized temperature was 350°C, the sheath gas flow was 40 units, the auxiliary gas flow was 15 arbitrary units, and the sweep gas was 1 unit. Lipid profiling was profiled in DDA mode using a full MS scan range of m/z 150–2000 (resolution was 60K at m/z 200) with top ranked precursor ions subjected to DDA-MS/MS using a maximum injection time of 22 ms. The stepped normalized collision energy (NCE) at 25, 30, and 35 was employed to obtain information-rich MS/MS spectra with a resolution of 15K at m/z 200. Quadrupole isolation window was 1.6 m/z and dynamic exclusion was enabled for 10s. To promote lipid identification, in-depth DDA MS/MS of QC sample was performed using the following four sGPFs which was performed in four separated runs (Yan and Yan, 2015): (run 1) 150–250, 550–650, 950–1050, 1350–1450, 1750–1850; (run 2) 250–350, 650–750, 1050–1150, 1450–1550, 1850–1950; (run 3) 350–450, 750–850, 1150–1250, 1550–1650; (run 4) 450–550, 850–950, 1250–1350, 1650–1750.
Metaproteomics data analysis
Peptide identifications were performed using the search engine PEAKS DB combined with PEAKS de novo sequencing [48] (De Novo ALC(%) threshold was 15). False discovery rate (FDR) was set to 1% using the decoy fusion approach. Raw files were refined by precursor ion mass correction and resolving chimeric MS/MS spectra. The precursor mass tolerance was set to 15 ppm and the fragment mass tolerance to 0.03 Da. Enzyme specificity was set to trypsin and up to three missed cleavage sites were allowed. The maximum number of variable posttranslational modifications per peptide was three, including acetylation of protein N-terminus, carbamidomethylation of Cys, oxidation of Met, deamidation of Asn and Gln as well as Pyro-glu from Gln. PEAKS PTM search tool [49] was used to search for peptides with unspecified modifications (313 built-in post-translational modifications), and the SPIDER [50] search tool was used for exploring novel peptides that are homologous to peptides in the protein database.
Database search was performed using a comprehensive meta-database containing human, microbial, and dietary organism sequences [51]. The gut microbial protein database was generated by combining the following parts: (1) the integrated gene catalog of 1,267 human fecal metagenomes [52]; (2) the 1,520 reference genomes of > 6,000 cultivated human fecal bacteria isolates [53]; (3) the genomes of 215 human fecal bacteria isolates [54]; (4) all Archaea, Bacteria, and Fungi sequences in NCBI RefSeq (Release 90) and UniProtKB (Release 2017_06). The microbial database was appended by the SARS-COV-2 protein sequences [55], a UniProt human reference proteome (downloaded on 2017_06), and a food database of common dietary organisms. A total number of 130,975,891 non-redundant sequences were obtained after dereplicating at 100% amino acid identity using USEARCH v11.0.667 (–fastx_uniques) [56]. Proteins identified by at least one unique peptides (1% false discovery rate (FDR) using the decoy fusion approach) was considered for further analysis. Label-free quantification of protein groups was performed based on the number of peptide spectrum matches (PSM).
Taxonomy and functional analysis of gut microbiota
Taxonomy and functional analysis of peptides was performed with UniPept (version 4.3.7) [57] based on the lowest common ancestor (LCA) algorithm using the following parameters: Equate I and L, Advanced missing cleavage handling. Peptide functional annotations were performed using Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. The relative abundance of microbial taxonomic and functional groups were determined using the normalized number of corresponding peptides. Functions of the unannotated microbial proteins were predicted using protein-protein BLAST (BlastP, https://blast.ncbi.nlm.nih.gov/Blast.cgi) against the non-redundant protein sequences (nr) with an E-value threshold of 1e-10.
Glycoproteomics data analysis
High-confidence identification of intact N-glycopeptides was performed by pGlyco 2.0 [20]. Sequences of proteins identified in the above metaproteomics analysis as well as the SARS-COV-2 protein sequences were used in glycopeptide identification. The precursor mass tolerance was set to 10 ppm and the fragment mass tolerance to 20 ppm. For N-glycopeptide analysis, a FDR of 5% at the glycan level and a FDR of 1% at the peptide level was used. Match between run and intensity based label-free quantification was not supported by pGlyco. To calculate the frequency of N-glycosylation at a specific site, any sample containing a glycopeptide bearing an N-glycan at that site was considered positive regard less of the variations of glycopeptide sequences and glycans. To calculate the frequency of a specific N-glycan at a specific site, any sample containing a glycopeptide bearing that glycan at that site was considered positive regard less of glycopeptide sequence variations. The frequency was defined as the number of positive samples relative to the total number of samples in each group (18 for the control and 49 for the COVID-19 groups, respectively).
Global identification of both O-glycopeptides and N-glycopeptides were performed using PEAKS. Acetylation (protein N-term), carbamidomethylation, deamidation (NQ), pyro-glu from Q, oxidation (M), HexNAcylation (ST), Hex1HexNAc1, Hex1HexNAc2, Hex2HexNAc2, Hex3HexNAc2, Hex1HexNAc(3), Hex(2)HexNAc(3), Hex(3)HexNAc(3), Hex(4)HexNAc(2), Hex(6)HexNAc(2), Hex(7)HexNAc(2), Hex(8)HexNAc(2), Hex(9)HexNAc(2), and dHexHex(3)HexNAc(2) were set as variable posttranslational modifications. The precursor mass tolerance was set to 10 ppm and the fragment mass tolerance to 0.02 Da. The PEAKS output file gave the monosaccharide composition of the attached glycan. Label-free quantification was performed using match between runs with a mass error tolerance of 20 ppm and a retention time shift tolerance of 1 min.
Metabolomics data analysis
Metabolomics features were extracted, aligned, identified and quantified using Compound Discoverer (v3.1, Thermo Fisher Scientific). The analysis employed the following major steps and parameters: retention time alignment (alignment model = adaptive curve, mass tolerance = 5 ppm, maximum shift = 2 min), unknown compound detection (mass tolerance = 5 ppm, intensity threshold = 30%, S/N threshold = 3, minimum peak intensity = 1 × 106, adducts ions = [M + H]+1, [M + H-H2O]+1, [M + H-NH3]+1, [M + K]+1, [M + Na]+1, [M + NH4]+1, [2M + H]+1, [2M + K]+1, [2M + Na]+1, [2M + NH4]+1, [M + 2H]+2, [M-H]−1, [M-2H]−2, [M-H + HAc]−1, [M-H-H2O]−1, [M-H + FA]−1, [M + Cl]−1, [2M-H]−1, [2M-H + HAc]−1), compound grouping (mass tolerance = 5 ppm, RT tolerance = 0.2 min), prediction of elemental compositions (mass tolerance = 5 ppm, maximum element counts = 90 × C, 190 × H, 10 × N, 15 × O, 5 × S and 3 × P), filling gaps across all samples (mass tolerance = 5 ppm,, S/N threshold = 1.5), chemical background subtraction (using blank samples), identifying compounds by searching ChemSpider (by formula or mass, https://www.chemspider.com/), mzVault and mzCloud (by MS and MS/MS data, precursor mass tolerance = 10 ppm, fragment mass tolerance = 10 ppm, match factor threshold = 60, https://www.mzcloud.org), and QC-based batch normalization (regression model = Cubic Spline). The mzCloud and mzVault match were performed base on Similarity Forward method and HighChem-HighRes search algorithm, respectively. Extracted ion chromatogram (EIC) and MS/MS spectra of all metabolites of interests were manually inspected.
Lipidomics data analysis
Raw data files were processed using the LipidSearch software (version 4.1) (Thermo Fisher Scientific) to identify and quantify lipid molecular species. Peak detection was performed as follows: Recalc Isotope, on; RT interval (min), 0.01. Lipid identification was as follows: Search type, Product; Exp type, LC-MS; Precursor tolerance, 10 ppm; Product tolerance, 10 ppm; Intensity threshold, 1.0%; Target class, ALL lipid classes; Ion adducts (positive ion mode) of + H, +NH4, +Na, +H-H2O, and + 2 H; Ion adducts (negative Ion mode) of − H, +HCOO, +CH3COO, +Cl, and − 2H; Top rank filter, On; Main node filter, Main isomer peak; m-Score threshold, 5.0; FA priority, On. ID Quality Filter, Check A, B, C, D (A: lipid class and FA are identified, B: lipid class and some FA were identified, C: Lipid class or FA were identified, D: Lipid identified by other fragment ions (H2O loss, and other non-specific neutral losses). Quantitation was performed using a m/z tolerance of −/+ 5.0 ppm and a RT range of − 0.5/+ 0.5 min. Peak alignment was performed using the following parameters: Alignment Method, Max; RT Tolerance, 0.25 min; Calculate unassigned peak area, On; Filter type, New filter; Top rank filter, On; Main node filter, Main isomer peak; m-Score threshold, 5.0; ID quality filter, A, B and C.
Statistical analysis
The raw quantification data matrix of different omics was imported to MetaboAnalyst [58] for further processing and analysis. Data filtering was performed using interquantile range (IQR) to remove baseline noises. Missing values were imputed using KNN. Quantile normalization and pareto scaling were employed. Unsupervised multivariate data analysis was performed using principal component analysis (PCA) and hierarchical cluster analysis (HCA). Significantly differentiated omics features between COVID-19 and control groups (should present in at least 50% of samples) were detected using Wilcoxon’s rank sum test (FDR adjusted p value (q) < 0.05). Microbial taxonomic and functional groups were normalized by total abundance. Statistical significance of microbial taxonomic groups was calculated using Mann-Whitney U test (p value < 0.05). Protein-microbiome and metabolite-microbiome correlations were determined using Pearson correlation (q < 0.05) using R. The association between differentiating omics features with categorical confounding variables (gender and medicine) were determined using Wilcoxon’s rank sum test. The association between differentiating omics features with continuous confounding variables (age and Body Mass Index (BMI)) were determined using Pearson correlation.