Study Design and Population
A prospective cohort of patients were consecutively recruited at Intensive Care Units (ICUs) in Guangdong Provincial People's Hospital from March 2017 through October 2017. Patients were admitted from general or emergency wards. Participants were excluded if any of the following criteria are met: (1) the participant was under age 18 years; (2) the participant received systematic antibiotic treatment within 48 hours before ICU admission; (3) the participant did not defecate within 24 hours of ICU admission; (4) the participant was admitted to ICU for end-stage chronic diseases; (5) the participant died within 24 hours of ICU admission; (6) the participant was discharged from hospital with discontinued follow-up. The study was approved by the local ethics committee. All patients provided informed consent form before participating in the study.
Sample size consideration
The sample size was calculated according to the one in ten rule, a rule of thumb recommended by Peduzzi et al. [8] and Harrell et al. [9], namely, events per variable being 10 or greater in the setting of a multivariate logistic regression model. Our prediction model covers two variables - Bifidobacterium abundance and APACHE II or SOFA score. This would have required a minimum sample size of 20 participants who had events (dying in hospital) to predict the outcome of in-hospital death.
Data Collection
At admission to the ICU, demographic, laboratory and clinical variables were recorded. SOFA score and APACHE II score were assessed twice within 24 hours of ICU admission with the worst value chosen. Stool samples were collected at ICU admission and immediately frozen at -80°C. Taxonomic compositions of the intestinal microbiome were determined using the 16S rDNA gene sequencing of the stool sample.
Intestinal microbiota analysis
Intestinal flora analysis was performed by the 16S rDNA amplicon sequencing method, the most suitable tool for bacterial phylogeny and classification identification [10-12]. The genomic DNA of the sample was extracted by the sodium dodecyl sulphate (SDS) method. The purity and concentration of the DNA were detected on 1% agarose gel electrophoresis. Dilute the DNA to 1 ng/ml with sterile water depending on the concentration. Specific primer (e.g. 16S V4: 515F-806R, 18S V4: 528F-706R, 18S V9: 1380F-1510R, et. al) with the barcode was used to amplify the 16S rDNA gene in different regions (16S V4/16S V3/16S V3-V4/16S V4-V5, 18S V4/18S V9, ITS1/ITS2, Arc V4). All PCR reactions were performed using Phusion® High-Fidelity PCR Master Mix (New England Biolabs, Ipswich, MA, USA). The same volume of 1X loading buffer (containing SYB green) was mixed with the PCR product and electrophoresed on a 2% agarose gel. Samples with a bright master band between 400-450 base paris were selected for further experiments. The mixture PCR product was then purified using a Qiagen Gel Extraction Kit (Qiagen, Duesseldorf, Germany). Following the manufacturer's recommendations, TruSeq® DNA PCR-Free Sample Preparation Kit (Illumina, San Diego, CA, USA) was used to generate sequencing libraries and index codes were added. Qubit 2.0 fluorometer (Thermo Scientific, Waltham, MA, USA) and Agilent Bioanalyzer 2100 system (Agilent Technologies, Beijing, China) were used to evaluate the library quality [13]. The library was sequenced by an IlluminaHiSeq2500 (Illumina, San Diego, CA, USA) platform and 250 base pair paired-end reads were generated [14].
Sequences analysis was performed with Uparse software (Uparse v7.0.1001, http://drive5.com/uparse/) [15]. By default, the sequence was clustered into operational taxonomic units (OTUs) with 97% identity. Meanwhile, the representative sequence of OTUs was selected. According to the algorithm principle, the sequence with the highest frequency in OTUs was selected as the representative of OTUs. The Mothur method (with a cut-off value of 0.8) and Silva database (version 128, https://www.arb-silva.de) were used to annotate OTUS representative sequences and analyze species annotations to obtain classification information of each classification level: kingdom, phylum, taxa, order, family, genus and species [15-17]. Fast multi-sequence alignments were performed using MUSCLE (Version 3.8.31, http://www.drive5.com/muscle/) software to obtain phylogenetic relationships for all OTUs representative sequences [17]. At last, the data of each sample was homogenized, and the sample was homogenized according to the minimum amount of data in the sample.
Data analysis
Patients participating in the study were categorized into two groups according to their survival status in hospital: survival and death groups. All baseline data between groups were assessed. Based on the OTUs clustering results, the abundance analysis of OTUs was carried out to explore the differences in community structure between groups. The difference between the beta diversity index groups was then analyzed. Metastats analysis was performed at each classification level (Phylum, Class, Order, Family, Genus, Species). P values were obtained by performing permutation tests between groups, and abundance distribution boxes of different species among groups were plotted.
The prognosis prediction models for Bifidobacterium abundance, APACHE II score, SOFA score were established with univariate logistic regression. The prognosis prediction models for APACHE II plus Bifidobacterium abundance, and SOFA score plus Bifidobacterium abundance were established with multivariate logistic regression. All prediction models were evaluated by the area under the receiver operating characteristic curves (AUROC). To avoid overfitting effect due to relative small sample size, The bootstrap method was used with 1000 resamples, and the bootstrap-corrected AUROC and 95% CI were reported. The Youden index [18] was used to determine the optimal cut-off points, at which sensitivity and specificity were calculated. Comparisons between groups of AUROCs were performed using the DeLong method [19]. using bootstrapping methods [19].
Continuous variables are presented as mean ± standard deviation (SD), and were compared using the independent sample Student's t test. Categorical variables were presented as number (percentage) and were compared using the Pearson Χ2 test or Fisher's exact test, as appropriate. The significance level was set at P < 0.05. The analysis was performed with SPSS 24.0 software (SPSS, Chicago, IL, USA) and MedCalc 12.5.0 software (MedCalc Software, Ostend, Belgium) , and R 3.3.1 software (R Foundation for Statistical Computing, Vienna, Austria) using RStudio v1.0.136 (RStudio Inc, Boston, MA, USA).