Study design and ethics statement
This study is part of the Human Functional Genomics Project (HFGP) (https://hfgp.bbmri.nl/) and uses the 500 Functional Genomics (500FG) cohort, a population-based cohort comprising of 500 healthy Western European adults, which aims to identify the determinants accountable for the variability of immune responses in human body (Data publicly available at: https://hfgp.bbmri.nl/ menu/main/ home). A detailed description of the design and protocol of the 500FG cohort has been described previously (19-21). Briefly, sampling of 534 healthy participants (237 males and 296 females) of Western European ancestry aged > 18 years was performed at Radboud University Medical Center, Netherlands from August 2013 to December 2014. After blood collection in hospital, participants completed a self-administered questionnaire that contained questions related to demographics, lifestyles, food intake and medical histories (Supplementary Table 1). Primary exclusion criteria in the final analysis were participants who had been treated with medication, and/or had a history of kidney disease or diabetes mellitus (n=45) and those whose food questionnaire were not completed (n = 7), leaving 482 eligible individuals for the analysis of diet-circulating markers correlations, and 471 subjects of them for GM analysis.
The study received ethical clearance from Ethical Committee of Radboud University Nijmegen (NL42561.091.12, 2012/550) to ensure subjects protection and adherence to the principles expressed in Declaration of Helsinki. All participants provided written informed consent at study entry.
Assessment of dietary intake
A semiquantitative food questionnaire was employed to assess usual intake frequency of 10 core foodstuff and beverage groups during the previous year, with an overall response rate of 96.5 %. The 4 responses ranged from “never” to “everyday” for the questions on processed meat, fish, fruit, legumes, vegetable and sweeten beverage, while information on consumption of chocolate (bar of 200 gram/month), milk (glass of 200 milliliters/day), alcohol at weekdays(glass/day) and weekends(glass/day) was collected in a form of options with increased amount. Legumes were not included as vegetables (22), due to the inconsistent classification of legumes as a subcategory of vegetables (23) (See Supplementary material for more details). All responses were regarded as ordered categorical variables, as described later.
Lab assessments
Circulating markers:
Three adipokines (leptin, adiponectin, and resistin) and two acute phase proteins (CRP and alpha-1 antitrypsin (AAT)) were measured with DuoSet enzyme-linked immune- sorbent assay (ELISA) kits following standards from R&D Systems as described previously (19). Plasma concentrations of IL-1Ra and IL-18 binding protein (IL-18BP) were determined by commercially available Quantikine® kits and protocols (R&D Systems). The serum levels of IL-1β, IL-6, IL-18 and vascular endothelial growth factor (VEGF) were assayed in Simple Plex™ microfluidic cartridges by using Ella instruments (ProteinSimple).
Fecal samples and Sequencing
Participants were provided with kits and instructions for stool collection at home and storage in the fridge. After delivery to the hospital, stool samples were aliquoted and stored at -80 ℃ until DNA extraction. DNA isolation from stool was done with the use of the AllPrep DNA/RNA Mini Kit (QIAGEN) with the addition of mechanical lysis. DNA concentrations were subsequently measured with the use of the Quant-iT dsDNA Assay Kit and normalized to a concentration of 50 pg/mL. Libraries were prepped for metagenome sequencing on an Illumina HiSeq 2000 platform, sequencing paired-end reads (2*101 base pairs).
Statistical Approach
Relation between dietary patterns and immune markers
The population-level dietary patterns were extracted with exploratory factor analysis, using polychoric correlation analysis with the R package “psych” (24), a data-driven technique to handle ordered categorical data (25) (see http://dwoll.de/rexrepos/posts/multFApoly.html for more details). The number of factors retained was based on the output of R function “fa.parallel.poly” by parallel analysis and empirical interpretability of the factors. Factor loadings were derived by varimax rotation and food items were considered to load on a factor dependent on the highest absolute correlation. Dietary score (namely factor score) coefficients were estimated by regression approach and saved for the individual values of the food pattern. The naming of the rotated factors was based on the variables with the strongest contribution and based on prior literature.
Because the concentrations of circulating immune markers were not normally distributed, circulating markers were log-transformed before regression analysis. Values that were below the lower limit of detection were set to the threshold value of the kit. The percentage of missing value is as follows: dietary questionnaire 3.1%, and circulating markers 1.5%. Missing values in diet were imputed with the mode and immune markers were not imputed.
The associations between the derived dietary patterns and the immune biomarkers were evaluated using multiple regression analysis with ‘lm’ function in R language. The regression analysis was conducted separately for each pattern considered, with the continuous dietary score as independent variable and immune characteristics as dependent variables. The potential confounders were chosen on the basis of their associations of the immune markers as indicated in literature: sex, age, smoking status (current smoker, former smoker, passive smoker and never smoker), weekly physical exercise (no exercise, less than twice per week and twice or more exercise/week), body mass index (BMI, calculated as weight/height2), educational attainment (primary or secondary, current college training or college). Moreover, we mutually adjusted for other food pattern.
Gut microbiota selection
As part of the QC samples with lower than 4 million sequencing reads were removed. Furthermore, forward and reverse reads were combined, filtered based on a minimal read length, and non-bacterial DNA was removed. Taxonomic profiling was done using MetaPhlAn 2.2. For more information on these procedures see (26). The analysis focused on taxa at species level, encompassing the highest resolution possible in this dataset. Absolute abundance was available of 408 microbial species. These abundances were made compositional by dividing the absolute abundance per species by the total read count across all species per sample. To reduce the number of zeros in the abundance data, we selected only the species with a prevalence of ≥3% in all samples, with a count detection threshold of 0.01. After this cutoff, we were left with a total of 75 prevalent species. We used this set of species as a starting point for the mediation analyses.
To account for multiple testing between biomarkers and food patterns, an adjusted p-value was calculated with Benjamini-Hochberg procedure (27) and a threshold of false discovery rate (FDR) < .05 was considered as indicating statistical significance. All other statistical analyses were conducted in the R programming language (28).
Mediation analysis
Mediation analysis was used to investigate whether the identified diet-immune relations are mediated by microbial species. In general, mediation analysis aims to extract the mechanisms by which an exposure (X) impacts the outcome variable (Y) by considering a set of potential variables or mediators which may mediate the effect (M) (Figure 1).
The characteristics of GM data such as compositionality, high dimensionality and sparsity violate the assumptions made by linear regression. Therefore, non-parametric entropy mediation (NPEM) was employed to perform mediation analysis, which is based on information theory and uses entropy to calculate the variation and thereby the information that variables contain (29). NPEM analysis was performed in R with a function supplied by Carter et.al (29). The NPEM algorithm identifies relevant mediators explicitly in the context of noise. Therefore, it relies on the inclusion of the complete dataset rather than for example the selection of specific microbial targets. The NPEM analysis was performed with all selected species and diets at once, but separately for each immune factor, yielding a total of five analyses. Accordingly, the input to the algorithm consisted of: X = three dietary pattern adherence scores, M = log transformed absolute abundances of 75 species, and Y = concentration of one of the immune factors. Information score was a score calculated based on information theory and was comparable to coefficient in parametric mediation analysis (29).