Study cohort description
Recruitment. Seven German medical centers participated in the recruitment of inpatient subjects (both sexes, 37-85 years of age) with laboratory confirmed COVID-19 (n=93): University Clinic Heidelberg (Thoraxklinik and Department of Gastroenterology and Infectious Diseases), University Hospital Mannheim, University Clinic Regensburg, University Clinic Frankfurt, Klinikum rechts der Isar of Technical University Munich, and University Heart Center Freiburg. As a control for patients with severe COVID-19 that were admitted to an intensive care unit, we included oropharyngeal biospecimen from 30 SARS-CoV-2 negative patients (both sexes, 29-94 years of age) that were treated at the ICU at the University Clinic Heidelberg because of pulmonary infections and respiratory distress (see Extended Data Fig. 4 for demographic details). Recruitment of outpatient subjects (n=148, both sexes, 18-86 years of age; Extended Data Fig. 1) was performed in COVID-19 test centers run by the Heidelberg public health department. Testing was initiated by health authorities because of emerging, new flew-like symptoms, individual contact with SARS-CoV-2 infected people or travel to COVID-19 outbreak areas. Adult healthy controls (n=74, both sexes, 23-70 years of age; no SARS-CoV-2 infection, no symptoms for upper respiratory tract infections) were recruited among volunteers from DKFZ, University Clinic Heidelberg and the National Center for Tumor Diseases Heidelberg.
Regulatory compliance. The study was approved by the Ethics Committee of the University of Heidelberg (# S-194/2020, S-240/2020 and S-148/2020) according to the principles of the Declaration of Helsinki and is registered at the German Clinical Trials Register (# DRKS00021424). Informed consent was obtained from all the participants of the study.
Participant metadata. Epidemiological, clinical, laboratory, treatment and outcome data of inpatient study participants were extracted from medical records using a standardized version of the WHO/International Severe Acute Respiratory and Emerging Infection Consortium case record form for severe acute respiratory infections6. Baseline epidemiologic characteristics, symptoms of infection and antibiotic use were captured from healthy volunteers and outpatient participants.
Assessment of COVID-19 severity according to the WHO guidance document for clinical management of COVID-19 (May 27th, 2020). Symptomatic patients who were tested positive for SARS-CoV-2 infection without evidence of pneumonia or hypoxia were categorized as mild COVID-19. These were largely patients that were screened for the infection in the outpatient test center. Patients admitted to the hospital with signs of pneumonia but no signs of severe pneumonia (including SpO2 ≥ 93% on room air) and tested positive for SARS-CoV-2 were considered to have a moderate disease. Severe COVID-19 was assigned to cases with severe pneumonia (respiratory rate > 30 breaths/min; or SpO2 < 93% on room air), acute respiratory distress syndrome, sepsis or septic shock. For six patients in the severe COVID-19 group that received mechanical ventilation, we do not have gender, age or clinical outcome information. Clinical outcomes were monitored until June 8th, 2020.
Observational study design consideration. This manuscript describes an observational cohort study and was prepared, where applicable, according to the reporting recommendations for observational studies (STROBE Statement)23. The oropharyngeal specimens were collected prospectively between 30 March 2020 and 20 May 2020 at the different study centers either with the goal of microbiome sequencing and CoV-2 detection, or with the goal of assembling biospecimen banks that would facilitate many different analyses.
Specimen collection and processing
Collection. Oropharyngeal (throat) swabs for microbiome analysis were collected upon testing in an outpatient COVID-19 test center using a DNA-RNA collection kit (Zymo Research, CA, USA). The swabs were collected in addition to diagnostic swabs for laboratory confirmation of SARS-CoV-2 infection. For patients admitted to the hospital oropharyngeal specimen for microbiome analyses were collected primarily at admission day again with the Zymo DNA-RNA collection kit; 15 samples from the University Clinic Regensburg were collected by pharyngeal rinses with sterile water. The rinse solutions were centrifuged, and the pellet was resuspended in an RNA-DNA stabilizing buffer. The oropharyngeal specimens from healthy volunteers were also sampled using the Zymo Research collection kit, and these biospecimens were used for both virus detection and microbiome sequencing.
SARS-CoV-2 RT-PCR. Laboratory confirmation of SARS-CoV-2 for hospitalized patients was performed in certified diagnostic departments of the participating university hospitals. RT-PCR assays were performed in accordance with the protocol established by the WHO / Dr. Drosten (Charité, Universitätsmedizin Berlin, Germany)24 applying an assay kit from Tib-Molbiol (Berlin, Germany) with the E and RdRP genes as RT-PCR targets. The same method was applied to assess the SARS-CoV-2 status of outpatients and healthy volunteers after RNA isolation using the Viral RNA Mini kit (Qiagen, Hilden, Germany).
Metagenome sequencing
For microbiome analyses, DNA was extracted from 200 μL of one aliquot of collected swab material using the QIAamp Fast DNA tissue kit (Qiagen) as per manufacturer protocol. DNA concentrations were measured using a Qubit 4.0 fluorometer with a Qubit™ 1X dsDNA HS Assay-Kit (Life Technologies, Grand Island, NY, USA, Cat. No Q33230). Metagenomes were generated from the resulting DNA, and whole-genome libraries were prepared as follows. DNA was normalized to a concentration of 0,2 ng/µl. Illumina sequencing libraries were prepared from 1 ng DNA using the Nextera XT DNA Library Preparation kit (Illumina, San Diego, CA, USA, FC-131-1096) according to the manufacturer’s recommended protocol, with reaction volumes scaled accordingly. Concentrations for each pooled library were determined using Qubit™ 1X dsDNA HS Assay-Kit (Life Technologies, Grand Island, NY). Prior to sequencing, libraries were pooled by collecting equal volumes of normalized libraries (2 µl) from batches of 96 samples. The resulting multiplexes were sequenced on Illumina NovaSeq 6000 to obtain 100 bp paired-ends reads with Unique Dual Barcodes to yield ~30-80 million paired end reads per sample. Post-sequencing demultiplexing and generation of FASTQ files was generated using the bcl2fastq Conversion Software (Illumina, San Diego, CA, USA).
Analyses of microbial sequencing data
Metagenomic data processing. Sequencing reads from each sample in a pool were de-multiplexed based on their associated IDT barcode sequence. Next, in order to remove host contaminant reads, sequencing reads mapping to the human genome were first filtered out using KneadData version 0.7.2. Taxonomic profiles of shotgun metagenomes were generated using Kraken 2 version 2.0.8-beta25 followed by re-estimation of species abundance using Bracken version 2.5.3, using the MiniKraken2_v1_8GB kraken database built from the Refseq bacteria, archaea and viral libraries (version of database update: 29/04/23, https://ccb.jhu.edu/software/kraken2/downloads.shtml). Functional profiling of metagenomes was performed by HUMAnN2 (http://huttenhower.sph.harvard.edu/humann2)26. Here, reads are mapped against a UniRef-based protein sequence catalogue to extract the abundance profiles of genes and gene families (UniRef90). Rarefaction of the metagenome data was done at described below.
Microbiota diversity indices. Alpha-diversity is a mathematical value that summarizes a microbial community according to the count of unique species within a sample and how evenly their frequencies are distributed in the community. Here we calculated alpha-diversity using the Shannon index at the species level that weights also relatively rare abundant species27. To analyze differences in the microbiome composition (i.e., beta-diversity) between samples, we performed a principle coordinates analysis (PCoA) based on species-level Bray-Curtis dissimilarity. These differences were also visualized using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm where samples with similar compositions are located relatively close to each other, and clusters of distinct microbiome compositions can be separated16.
Dysbiosis index. The extent of microbiome dysbiosis in each sample was quantified based on Bray-Curtis distances, following a previously described strategy15. First, the set of all samples was subjected to principal coordinate analysis (PCoA) using Bray-Curtis distances. Next, the centroid of the healthy control samples subset was derived as the median of their values along all PCoA axes. Then, the dysbiosis score of each sample was calculated as the Euclidian distance between its position in the PCoA space and the healthy control centroid.
Quantification and statistical analysis
Association testing. We assessed differences in relative abundances for species between two different subgroups by Wilcoxon tests and applied multiple hypothesis testing correction by the Benjamini-Hochberg False discovery rate (FDR) method as implemented in SIAMCAT R package28. We also performed a confounder analysis herein computing associations using Fisher’s exact tests or Wilcoxon tests (for discrete or continuous variables) between different meta-variables and species abundances and illustrated the variance in relative abundance as explained by different meta-variables. For association analyses, taxonomic input data were filtered using abundance filtering with 10-3 as cutoff for minimal abundance and after log-transformation. To associate microbiological taxa with age, gender or antibiotic treatment as clinical cofactors we used a multivariate statistical linear regression analysis by the R package MaAsLin2 with default parameters (abundance filtering with
10-3, min. prevalence 0.1, arcsine-sqrt transformation of relative abundances to normalize microbiome profiles)29. Statistical significance was determined by FDR multiple hypothesis testing correction p< 0.05.
Logistic regression. Univariate and multivariate logistic regression analyses were used to predict severity of COVID-19 and clinical outcomes stratified by Shannon alpha diversity or the dysbiosis index of the oropharyngeal microbiome of the COVID-19 inpatient cohort (i.e., patients with moderate and severe COVID-19). Shannon, dysbiosis index and age were used as continuous variables.
Validation and training cohort. The multi-center cohort of hospitalized patients with moderate and severe COVID-19 was divided into a training and validation sub-cohort based on geographic areas. The training set consisted of patients from Heidelberg area (Heidelberg University Clinic, University Hospital Mannheim and Thoraxklinik Heidelberg; n=15 moderate, 33 severe cases), and the patients from the other medical centers were grouped into the validation data set (University Clinic Regensburg, University Clinic Frankfurt, Klinikum rechts der Isar of Technical University Munich, and University Heart Center Freiburg; n=12 moderate and 33 severe COVID-19 cases). Clinical features of the two cohorts are summarized in Extended Data Fig. 4.
Random forest models. Random forest (RF) analysis was performed using the microbiome and clinical data of patients with moderate and severe SARS-CoV-2 infection. Microbiome features were filtered based on fractional abundance above, excluding taxa ≤ 0.0005, gene ≤ 0.00005 and pathways ≤ 0.001 relative abundance. Due to the sensitivity of the RF classification to the sampling depth, this was performed after samples were rarefied after removal of human reads (rarefication of taxa data to 16K Kraken-Bracken counts and genes and pathways to 190K HUMAnN2 gene counts). Following these filtrations, the species, microbial gene and pathway abundance matrices were each renormalized to have a sum fractional abundance of 1 per each sample. Random forest classification was applied using the random Forest R package (https://cran.r-project.org/web/packages/randomForest/; A. Liaw, M. Wiener, Classification and regression by random Forest R News, 2/33 (2002), pp. 18-22), embedded in a multi-layer machine learning architecture (Extended Data Fig. 8). Machine learning training and validation were performed on the validation and training cohort as described above (Heidelberg area patients [n=43] vs. patients from the other medical centers [n=33 patients] including only patients with clinically annotated data and HUMAnN2-derived gene and pathway counts). In the first machine learning layer, RF models were trained to predict one clinical outcome – either ICU admission, mechanical ventilation or mortality based on one type of microbiome features – i.e. either taxa, gene, or pathways. Hence 9 different combinations of feature type X outcome type were addressed. For each such combination, 100 forests were trained independently, each containing 10000 trees. For each forest, the data was divided randomly and equally to a training set and testing data sets, such that each set contains an equal number of the two possible states of the given outcome. For reproducibly, the randomization seed was set sequentially to a seed value from 1 to 100 for the 100 forests. In the second layer, RF models were trained to predict the ICU and ventilation clinical outcomes using outcome-specific sets of input features, combining together the taxa, gene and pathway features that had Gini importance > 0.01 for the prediction of the given clinical outcome in the first layer, as well as Shannon index and dysbiosis index as additional features. In the third layer, RF models were trained to predict the ICU and ventilation clinical outcomes, using the features that had Gini importance > 0.01 for the prediction of the corresponding clinical outcome in the second layer. For the ventilation outcome, the RF model that was trained at the third layer was taken as the final predictive model. For the ICU and mortality outcomes, an additional, forth, layer was applied, including as input features for each sample the percentages of trees voting for ventilation-positive and ICU-positive states in the corresponding random forests of the third layer, as well as the Shannon and dysbiosis indices. The two RF models that were trained in this forth layer were taken as the final predictive models for the ICU and mortality outcomes, correspondingly. The performance of the random forests models were assessed by (i) confusion matrices, enumerating the total number of true-positive, true-negative, false-positive and false-negative predictions, (ii) mean error rates, calculated based on the enumerating confusion matrices as (false positive + false negative)/( true positive + true negative + false positive + false negative), (iii) recall, calculated based on the enumerating confusion matrices as true positive / (true positive + false negative), (iv) the area under the receiver operating characteristic curve (ROC AUC) using the ROCR R package30, derived from 100 repeats of random subsampling (starting from randomization seed 1) of the patients to enforce equal representation of the positive and negative cases of the given outcome in each subset to be predicted.