Study Participants and Ethics Declaration
The study participants are healthy children of 1–2 years of age. Blood samples were collected during participant’s annual wellness health check visit in the pediatric clinic at Children’s hospital Dallas Texas, USA. All research protocols and experimental methods used in the present study were approved by the University of Texas Southwestern Medical Center Review Board (IRB#: STU-2016-081). All participants provided their written and informed consent to participate in the present research study. The demographics and medical records were collected for all the participants.
Blood DNA Extraction
DNA was extracted from 20–200 microliter of blood using ZymoBIOMICS DNA Miniprep Kit (D4300, Zymo Research, CA, United States) according to manufacturer’s instructions. The blood was treated with Proteinase K (20 mg/mL) at 55°C for 30 minutes for unbiased metagenome extraction as per manufacture’s recommendation. The DNA concentration was measured using the Picogreen method (Invitrogen Quant-iT™ Picogreen dsDNA Assay Kit Reference No. P11496 on Perkin Elmer 2030 Multilabel Reader Victor X3) and DNA integrity number (DIN) was determined on 4150 Tapestation from Agilent using Agilent’s gDNA Screen Tape (Reference No. 5067–5365) and Agilent’s gDNA Reagents (Reference No. 5067–5366).
16S rRNA v3-v4 amplicon sequencing
The 16S rRNA gene, a component of the prokaryotic ribosome, is conserved in most of the bacteria and archaea. This gene is about 1,500 bp long with nine hypervariable regions (V1–V9). Bacterium specific nucleotide polymorphisms in these hypervariable intervals allow assessment of relative abundance of various bacteria or OTUs (Operational Taxonomic Units) in a given specimen. 16S rRNA gene sequencing is widely used to profile the microbiome using next-generation sequencing (NGS) (14). The 16S gene specific primers selectively amplify prokaryotic fragments from total DNA, thus enriching for bacterial DNA sequences. We used 20 ng of purified DNA to amplify hypervariable region V3-V4 of the bacterial 16S rRNA gene using Zymo Research Quick-16S NGS Library Prep Kit (Catalog No. D6400). The PCR was stopped using the Reaction Clean-Up Solution per the kit’s recommendations. Then the libraries were brought to a volume of 50ul with the addition of DNase/RNase free water and purified using 56ul of Beckman Coulter’s AMPure XP beads (Catalog No. A63881). The final libraries were eluted using 27.5ul of Tris-HCl buffer 10mM, pH 8.5. Quality and quantity of each sequencing library were assessed using Agilent’s 4150 Tapestation using gDNA Screen Tape and gDNA Reagents both from Agilent and picogreen measurements, respectively. The libraries were then pooled in equal concentrations according to picogreen measurements. Each pool was quantified using KAPA Biosystems Library Quant Kit (Illumina) ROX Low qPCR Mix (Reference No. 07960336001) on an Applied Biosystems 7500 Fast Real-Time PCR system. According to the qPCR measurements, 6 pM of pooled libraries was loaded onto a MiSeqDX flow cell and sequenced using MiSeq Reagent Kit v3 600 Cycles PE (Paired end 300 bp). Raw FASTQ files were demultiplexed based on unique barcodes and assessed for quality.
Full length 16S rRNA gene sequencing
For high resolution species level taxonomic classification of bacterial signature, we performed complete 16S rRNA gene sequencing using MinION, Oxford Nanopore Technology (ONT). About 100 ng of high molecular weight DNA was used to amplify and barcode the entire (1.5Kb) segment of 16S rRNA gene using Nanopore’s 16S Barcoding Kit (SQK-16S024). The PCR product was then purified using Beckman Coulter’s AMPure XP beads (Catalog No. A63881). The purified amplicon libraries were quantified using the Qubit dsDNA HS Assay kit (Catalog No. Q32854) on an Invitrogen Qubit 4 fluorometer. Then sequencing adapters were ligated to the pooled barcoded libraries according to the manufacturer’s instructions. Sequencing was performed using R9.4.1 flow cell for 24 hours on the MinION, from Oxford Nanopore Technologies (ONT). Nanopore sequencing data was analyzed with EPI2ME Agent v2020.2.10. 16S sequences were assigned taxonomy using Whats In My Pot (WIMP) workflow of ONT.
Amplicon and full-length 16S sequencing data analysis
Samples with QC pass sequencing reads were used for 16S OTU analysis. Out of total sequencing data per sample, 100K randomly subsampled sequencing reads were used for taxonomic classification and Operational Taxonomic Units (OTUs) analysis to determine relative abundances using the CLC Bio microbial genomics module (https://www.qiagenbioinformatics.com/plugins/clc-microbial-genomics-module/). Greengenes v13 database at 97% similarity index was used for annotation. Alpha and beta diversity analysis was done to understand within- and between-treatment group diversity, respectively. Nanopore 16S data was analyzed using EPI2ME pipeline and WIMP (What’s In My Pot) workflow from Oxford Nanopore Technology (ONT). Raw FASTQ files from Illumina and Nanopore sequencing have been deposited in the Sequence Read Archive (SRA) with accession no. PRJNA970378. Details on other methods are provided in Supplemental Methods.
Quantitative Real-Time PCR (qPCR) for Streptococcus
To validate the 16s sequencing detected Strep DNA signal in the blood, we performed qPCR analysis on subset samples. Each reaction for qPCR consisted of 10ul of KAPA SYBR FAST qPCR Master Mix (2x), 10uM forward primer, 10uM reverse primer with total volume of 20 ul. The qPCR was performed with a 7500 Fast Real-Time PCR system (Applied Biosystems) with the following 2-step conditions: hot-start of 1 cycle at 950C for 3 min, then 40 cycles of denaturation at 950C for 3 seconds and combined annealing/extension at 600C for 30 seconds. The qPCR primer pairs used for detecting Streptococcus (rpoB) are as following: Forward Primer 5’-- GAA CGG CGT TGA GCA TGA AG-3’, Reverse Primer 5’- GCA ACG ATT GGA CGT TGG TT-3’. The DNA from cultured Streptococcus was used as a positive control, and the nuclear-free water was used as negative control to set qPCR base signals. These results are presented in Suppl. Figure 2. We also used oral commensal species i.e. Streptococcus oralis and Streptococcus mitis specific primer to confirm the species specific qPCR signals using GDH1 primer described by others (15).
Blood RNA Sequencing
Total RNA was extracted from 100 microliter whole blood sample using the ZymoBIOMICS Direct-zol Microprep Kit (Catalogue No: R2060, Zymo Research, CA, United States) according to Manufacturer’s instructions. About 100 microliter blood sample were subjected to mechanical lysis using ZR BashingBead lysis tubes (Catalogue No: ZS6003, Zymo Research, CA, United States) followed by Proteinase K (20 mg/mL) at Room Temperature for 30 minutes for improved RNA yield and quality. The quality and quantity of RNA was checked on Nanodrop2000 UV-Vis Spectrophotometer (Thermoscientific, CA, USA) and RNA integrity number (RIN) was determined on 4150 Tapestation from Agilent using Agilent’s RNA Screen Tape (Reference No. 5067–5576) and Agilent’s RNA Reagents (Reference No. 5067–55577).
The total input of 100ng RNA was used for the library preparation on each sample. The library was prepared using Zymo-Seq RiboFree® Total RNA Library Kit (Zymo Research, CA, United States) according to Manufacturer’s protocol. The library indexing with Zymo-Seq UDI primers and pooling were also done according to Manufacturer’s instructions. Quality and quantity of each sequencing final library were assessed using Agilent 2100 BioAnalyzer system (Agilent, CA, USA). The pooled libraries were sequenced on Illumina NexSeq platform in SE75 (75 bp single end) run with the NextSeq reagent kit v2.5 for 75 cycles. About 30–40 Million sequencing reads were generated per sample for the transcriptome analysis.
Multiplex Cytokine Quantification
The serum separated from blood samples was used for cytokine quantification using 48-flex magnetic microspheres (bead) based luminex assay (Bio-Plex pro Human cytokine Assay, Bio-Rad). It assays 48 different pro-inflammatory and anti-inflammatory cytokines. For each blood sample, 15uL of serum was used in each well of a 96-well plate. Analytes were quantified using a Luminex analytical instrument which utilizes xMAP technology (Luminex Corp., Austin, TX, USA) and Bioplex Manager (Biorad) Concentrations of cytokines (pg/ml) were determined based on the fit of a standard curve for mean fluorescence intensity versus pg/ml.
Antibody measurements in serum
Serum IgG and IgM reactivity was tested with a microfluidic antigen array comprising 110 antigens, including 53 autoantigens, 26 pathogen agents, 17 vaccine antigens, 14 allergens along with two controls (HEL, BSA). Serially diluted human IgG and IgM were added as internal controls. In addition, serial diluted anti-human IgG was added in 4 additional wells to calculate the relative serum IgG levels. This customized antigen panel was named as Kids Panel I and is available from the Microarray and Immunophenotyping Core Facility at University of Texas Southwestern Medical Center. The 110 antigens were printed in duplicate on nitrocellulose membrane-coated slides. Each slide can be used with 16 samples, 15 different serum samples were compared in one slide along with a PBS control (16, 17). Such microfluidic antigen arrays were developed as previously described (17, 18). In brief, 2 µl of serum was diluted 1:100 in PBS with 0.01% Tween 20 and applied to the slide array using microfluidic applications. After washing, IgG and IgM isotype antibodies binding to the diverse antigens were detected with Cy-3 conjugated anti-human IgG (1:1000) and Cy-5 conjugated anti-human IgM (1:1000) secondary antibodies (Jackson ImmunoResearch Laboratory). Array slides were scanned with GenePix 4400A scanner (Molecular Devices) to generate images for each array. This was converted to Genepix Report file (GPR) using Genepix Pro7.0 software (Molecular Devices). The averaged fluorescent signal intensity of each antigen was subtracted by local background and the PBS control signal. The net fluorescent intensity of each antigen was normalized by robust linear model (19) using internal positive controls with different dilutions to generate the normalized fluorescent intensity (NFI) values. The heatmaps were generated by R Heatmap package. The ward’s hierarchical cluster analysis was performed by R cluster package and optimal number of clusters (in our case, k = 3) was determined by gap statistics.
Bioinformatic Analysis
The 16S V3-V4 short reads sequenced on the MiSeq platform were processed using the CLC Genomics workbench v20.0.04 (CLC Bio, Denmark). The quality metric used for the analysis was the samples with less than 100,000 raw reads were not included in the analysis and only 100,000 random reads from each sample were subjected for the taxonomic assignment. Qualified and trimmed reads were alone mapped to reference 16S rRNA SILVA database (SILVA SSU 138.1 - Ref). Microbe identification and Taxonomic assignment was done with K-mer spectra application (Microbial Genomics Module, CLC genomics workbench). In addition, the short-read sequences were analyzed on QIIME2 v2022.2 (Bolyen et al., 2019) of the QIIME software for the comparisons. In brief, FASTQ sequences were imported into QIIME2 as a qza artifact and amplicon primers were removed using cutadapt plugin (Marcel, 2011). DADA2 plugin (Callahan et al., 2016) was used to denoise, filter and trim the reads. Amplicon Sequence Variants (ASVs) were classified and with a classifier trained using the SILVA v138 database (Quast et al., 2013). The 18S rRNA (ITS) sequence analyses was also followed the same pipeline which was used for 16S rRNA analyses on CLC genomics workbench and QIIME2 platform except the classifier was trained with ITS3f and ITS4r primer sequences for QIIME2 taxonomy assignment.
The long-read sequences (passed FASTQs) were analyzed on EPI2ME desktop agent (https://epi2me.nanoporetech.com/), a cloud-based algorithm including analytical workflow for classification of 16S rRNA (Fastq 16S). The passed FASTQ data (distinguished barcodes) were uploaded to EPI2ME-AWS cloud using EPI2ME desktop agent using default parameters and accessed through a web-interface workflow. The taxonomic assessment was further confirmed with Emu pipeline and Emu database (Curry et al., 2022). Both raw FASTQ files from Illumina and Nanopore sequencing have been deposited in the Sequence Read Archive (SRA) with accession no. PRJNA970378.
Total RNA-Seq Mapping
Single end demultiplexed fastq files were generated using bcl2fastq2 (Illumina, v2.17), from NextSeq550 high output v3 reagent’s bcl files. Initial quality control was performed using FastQC v0.11.8 and multiqc v1.7. Fastq files were imported batch wise, trimmed for adapter sequences followed by quality trimming using CLC Genomics Workbench (CLC Bio, v23.0.3). The imported high-quality reads were mapped against gene regions and transcripts annotated by ENSEMBL v99 hg38 using the RNA-Seq Analysis tool v2.7 (RNA-seq module, CLC Genomics Workbench), only matches to the reverse strand of the genes were accepted (using the strand-specific reverse option). Pathway analysis of differentially expressed genes was done with IPA software (QIAGEN, Hilden, Germany). Unmapped reads were then mapped against Streptococcus pyogenes (STAB902, Accession No. CP007041.1) reference genome.
Statistical Analyses
To compare the microbiome diversity between samples and treatments, we applied PERMANOVA Analysis (PERmutational Multivariate ANalysis Of VAriance, also known as non-parameteric MANOVA(20) available in CLC Bio Microbial Genomics Module 20.0. This measures effect size and significance on beta diversity for variable. The significance is obtained by a permutation test. Statistical analysis was performed using GraphPad Prism software version 10 (GraphPad). T-test was performed and two-tailed p value of < 0.05 was considered significant. The cytokine raw and antigen data were normalized with Min-Max normalization function deployed as R script. The differentially expressed gene list was entered into IPA software (QIAGEN, Hilden, Germany), which employed the program’s core analysis’ feature to analyze the data regards canonical pathways and networks.