We reported this study in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology statement.
Study population
Between November 1993 and September 2001, nearly 155000 American adults aged 55 to 74 years were enrolled in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, a multicenter randomized controlled trial designed to investigate whether screening for prostate, lung, colorectal, and ovarian cancer could reduce mortality from these cancers. The design of this trial has been reported elsewhere [17]. The PLCO Cancer Screening Trial was approved by the Institutional Review Boards of the US National Cancer Institute and each recruitment center. All participants provided written informed consent.
The following subjects were excluded from the present study: (1) subjects diagnosed with any cancer before completing the baseline questionnaire or the diet history questionnaire (DHQ; n = 11882); (2) subjects with an invalid DHQ, defined as extreme energy intake (i.e., in the first or last percentile), a DHQ completion date after the date of death, a missing DHQ completion date, or ≥ 8 missing DHQ items (n = 4841); (3) subjects with an incomplete DHQ (n = 34401); (4) subjects with a history of stroke or heart attack at baseline (n = 9932); and (5) subjects who did not return the baseline questionnaire (n = 1940). After these exclusions, 91891 subjects were included (Figure S1). Baseline characteristics did not differ significantly between included (n = 91891) and excluded (n = 62966) subjects in age, sex, race, educational level, body mass index (BMI), smoking status, history of diabetes, or history of hypertension (all P for difference > 0.05), suggesting a low potential for nonparticipation bias. For all included subjects, follow-up time was calculated from the date of DHQ completion to the date of death, study dropout, or the end of follow-up (December 31, 2015), whichever came first (Fig. 1).
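The follow-up rule above (exit at the earliest of death, dropout, or December 31, 2015) can be sketched as follows. This is a minimal illustration, not the study's code: the function name `follow_up_years` is hypothetical, and the conversion of days to years with a 365.25-day year is an assumption.

```python
from datetime import date
from typing import Optional

# Administrative end of follow-up stated in the text.
END_OF_FOLLOW_UP = date(2015, 12, 31)


def follow_up_years(
    dhq_date: date,
    death_date: Optional[date] = None,
    dropout_date: Optional[date] = None,
) -> float:
    """Person-years from DHQ completion to the earliest exit event.

    Hypothetical helper; a 365.25-day year is assumed for the conversion.
    """
    exits = [END_OF_FOLLOW_UP]
    if death_date is not None:
        exits.append(death_date)
    if dropout_date is not None:
        exits.append(dropout_date)
    exit_date = min(exits)  # whichever came first
    return (exit_date - dhq_date).days / 365.25


# Hypothetical participant: DHQ completed 2000-01-01, died 2005-01-01
print(round(follow_up_years(date(2000, 1, 1), death_date=date(2005, 1, 1)), 2))
```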
Data collection
Baseline data, including sex, marital status, race, height, body weight, educational level, history of diabetes or hypertension, and smoking status, were collected with a self-administered baseline questionnaire. BMI was calculated as body weight (kg) divided by height squared (m²). Age at DHQ completion, alcohol intake, food consumption, nutrient intake, and energy intake from diet were collected with the DHQ (version 1.0, National Cancer Institute, 2007), a self-administered 137-item food frequency questionnaire designed to assess food consumption (frequency and portion size) and nutrient intake over the past year. The Eating at America’s Table Study validated the DHQ against four 24-hour dietary recalls in a nationally representative sample of 1640 subjects and found that it performed well in estimating dietary intake [18]. Daily consumption of each food in the DHQ was estimated by multiplying food frequency by portion size; daily intake of each nutrient was estimated using the approach described by Subar et al. [19], based on the USDA's 1994–96 Continuing Survey of Food Intakes by Individuals [20] and the Nutrition Data System for Research [21]. The Healthy Eating Index-2005, a measure of diet quality, was calculated as previously described [22]. Physical activity level was estimated from the frequency and duration of moderate and strenuous activities reported in a self-administered supplemental questionnaire.
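The two arithmetic rules in this paragraph (BMI, and daily consumption as frequency × portion size) are illustrated below as a minimal sketch. The frequency labels and their per-day weights in `TIMES_PER_DAY` are hypothetical placeholders; the actual DHQ response options and conversion factors are not given in the text.

```python
# Hypothetical conversion of DHQ frequency categories to times per day;
# the real DHQ response options and weights are not given in the text.
TIMES_PER_DAY = {
    "never": 0.0,
    "1 time per week": 1 / 7,
    "2-4 times per week": 3 / 7,  # midpoint of the range, assumed
    "1 time per day": 1.0,
    "2 or more times per day": 2.0,
}


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m²)."""
    return weight_kg / height_m ** 2


def daily_consumption(frequency: str, portion_g: float) -> float:
    """Daily amount of one food = frequency (times/day) x portion size."""
    return TIMES_PER_DAY[frequency] * portion_g


print(round(bmi(70, 1.75), 1))                                 # 22.9
print(round(daily_consumption("2-4 times per week", 140), 1))  # 60.0
```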
Assessment of ultra-processed food consumption
Two dietitians classified each food and drink item of the DHQ into one of the four food groups defined by the NOVA classification [23]. Based on the purpose, nature, and degree of food processing, NOVA distinguishes four groups: unprocessed or minimally processed foods, processed culinary ingredients, processed foods, and ultra-processed foods. Detailed definitions and examples for each group are available elsewhere [23]. The present study focused on ultra-processed foods, which include sour cream, cream cheese, ice cream, frozen yogurt, fried foods, breads, cookies, cakes, pastries, salty snacks, breakfast cereals, instant noodles and soups, sauces, oils and fats, candy, soft drinks, fruit drinks, restaurant/industrial hamburgers, hot dogs, and pizza. Following a previously reported categorization method [24], ultra-processed foods were further grouped into nine food groups for relevant analyses: soft drinks, cereals, ultra-processed fruits and vegetables, ultra-processed dairy products, meat and fish, sauces and dressings, salty snacks, sugary products, and oils and fats. Table S1 lists all ultra-processed foods in each food group.
The amounts consumed of all ultra-processed food items (78 items; see Table S1) were summed to obtain each individual’s overall ultra-processed food consumption. Similarly, the energy content (kcal) of each item, estimated from the USDA Food and Nutrient Database for Dietary Studies 2015–2016 [25], was summed to obtain total energy from ultra-processed foods. In all analyses, ultra-processed food consumption was adjusted for energy intake from diet using the residual method [26].
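The summation and the residual method can be sketched as follows. This is a minimal illustration under stated assumptions: item names and amounts are hypothetical, the regression is a simple ordinary least squares fit of intake on total energy, and the predicted intake at mean energy is added back to each residual (a common convention that keeps adjusted values on the original scale). The text does not say whether intakes were transformed before adjustment.

```python
from statistics import fmean

# Hypothetical item-level intakes (servings/day) for one participant;
# names are illustrative, not the 78 DHQ items listed in Table S1.
upf_items = {"soft drinks": 1.0, "cookies": 0.5, "hot dogs": 0.25}
total_upf = sum(upf_items.values())  # overall ultra-processed food consumption


def residual_energy_adjust(intakes, energies):
    """Residual method: regress intake on total energy (OLS), then add the
    predicted intake at mean energy back to each participant's residual."""
    mean_e = fmean(energies)
    mean_y = fmean(intakes)
    sxx = sum((e - mean_e) ** 2 for e in energies)
    sxy = sum((e - mean_e) * (y - mean_y) for e, y in zip(energies, intakes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_e
    predicted_at_mean = intercept + slope * mean_e
    return [
        (y - (intercept + slope * e)) + predicted_at_mean
        for e, y in zip(energies, intakes)
    ]


# Hypothetical cohort: total UPF servings/day and total energy (kcal/day)
adjusted = residual_energy_adjust([3.0, 5.5, 5.0], [1500, 2000, 3000])
print([round(a, 2) for a in adjusted])
```

By construction, the adjusted values are uncorrelated with total energy intake while keeping the cohort mean of the original intakes.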
Outcome assessment
Vital status was ascertained primarily through a mailed annual study update form. Individuals who did not return the form were contacted repeatedly by telephone or e-mail. In addition, vital status data were supplemented by periodic linkage to the US National Death Index to improve completeness. Underlying causes of death obtained from death certificates were defined using the International Classification of Diseases, Ninth Revision (ICD-9): CVD (codes 390–459), heart disease (codes 390–398, 402, 404, and 410–429), and cerebrovascular disease (codes 430–438).
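A minimal sketch of how the ICD-9 ranges above could be applied to underlying-cause codes. It assumes codes arrive as strings in dotted form (e.g. "410.9") and classifies them by their three-digit category; the function name `cvd_categories` is hypothetical, and undotted four-character codes would need different parsing.

```python
def cvd_categories(icd9_code: str) -> set:
    """Map an ICD-9 underlying-cause code (dotted form, e.g. "410.9") to the
    outcome groups defined in the text, using its three-digit category."""
    head = icd9_code.strip().split(".")[0]
    if not head.isdigit():  # E and V codes fall outside all three groups
        return set()
    category = int(head)
    groups = set()
    if 390 <= category <= 459:
        groups.add("CVD")
    if 390 <= category <= 398 or category in (402, 404) or 410 <= category <= 429:
        groups.add("heart disease")
    if 430 <= category <= 438:
        groups.add("cerebrovascular disease")
    return groups


print(sorted(cvd_categories("410.9")))  # ['CVD', 'heart disease']
```

Note that a code such as 401 (essential hypertension) counts toward CVD but not toward heart disease under these definitions.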
Statistical analysis
Because seven covariates had missing data (Table S2), we used multiple imputation by chained equations with 25 imputed data sets [27] to increase statistical power and reduce potential bias, assuming that these data were missing at random. All variables involved in the statistical analyses were included in the imputation models. Main analyses were repeated in participants with complete data for comparison.
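The text does not state how estimates from the 25 imputed data sets were combined. The standard approach, and the one Stata's `mi estimate` applies, is Rubin's rules; the sketch below assumes it, pooling a single log HR. The input numbers are hypothetical, and the 95% CI uses a normal approximation rather than a t reference distribution.

```python
import math
from statistics import fmean, variance


def pool_rubin(estimates, variances):
    """Combine one coefficient (e.g. a log HR) across m imputed data sets
    with Rubin's rules; returns the pooled estimate and its standard error."""
    m = len(estimates)
    q_bar = fmean(estimates)       # pooled point estimate
    w = fmean(variances)           # within-imputation variance
    b = variance(estimates)        # between-imputation variance (n - 1 divisor)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, math.sqrt(t)


# Hypothetical log HRs and their sampling variances from four imputations
log_hrs = [0.10, 0.12, 0.09, 0.11]
sampling_vars = [0.0025, 0.0026, 0.0024, 0.0025]
est, se = pool_rubin(log_hrs, sampling_vars)
low, high = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
print(f"HR {math.exp(est):.2f} (95% CI {low:.2f}-{high:.2f})")
```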
Cox proportional hazards regression was used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for the association between ultra-processed food consumption and cardiovascular mortality, with person-years as the underlying time metric. Ultra-processed food consumption was divided into quintiles, with the lowest quintile as the reference group. To test for linear trend across quintiles, the median value of each quintile was assigned to all participants in that quintile and entered into the regression models as a continuous variable. In multivariable analyses, covariates were selected based on the change-in-estimate approach [28] and the existing literature. Specifically, model 1 was adjusted for age, sex, race, educational level, marital status, and study center; model 2 was further adjusted for aspirin use, history of hypertension, history of diabetes, smoking status, alcohol consumption, BMI, physical activity level, and energy intake from diet. To assess the robustness of our results to potential unmeasured confounding, we calculated the E-value using an online calculator (https://mmathur.shinyapps.io/evalue/) [29], assuming an outcome prevalence of less than 15%. The E-value is the minimum strength of association, on the HR scale, that an unmeasured confounder would need to have with both ultra-processed food consumption and cardiovascular mortality, conditional on the measured covariates, to fully explain away the observed association. Tests based on Schoenfeld residuals showed no violation of the proportional hazards assumption (all P > 0.05). In all main analyses, ultra-processed food consumption was expressed in servings per day, based on the USDA Pyramid Servings Database [30]. In supplementary analyses, it was expressed in servings per day per kilogram of body weight to examine the potential influence of body size.
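Two steps in this paragraph are mechanical enough to sketch: building the median-based trend variable from quintiles, and the E-value formula. For a rare outcome (prevalence below 15%, as assumed here) the HR can be treated as an approximate risk ratio. The E-value for a risk ratio RR ≥ 1 is RR + sqrt(RR × (RR − 1)), with 1/RR substituted when RR < 1. For a CI, it is computed for the limit closest to 1, and it equals 1 when the CI includes 1. Function names are hypothetical, and the percentile definition in Python's `statistics.quantiles` may place cut points slightly differently from the software used in the study.

```python
import math
from bisect import bisect_left
from statistics import median, quantiles


def quintile_trend_variable(values):
    """Assign quintiles (1-5) and give each participant their quintile median.

    Values equal to a cut point go to the lower quintile.
    """
    cuts = quantiles(values, n=5)  # four cut points
    groups = [bisect_left(cuts, v) + 1 for v in values]
    medians = {
        g: median([v for v, gv in zip(values, groups) if gv == g])
        for g in set(groups)
    }
    return groups, [medians[g] for g in groups]


def e_value(hr):
    """E-value for a point estimate, treating the HR as a risk ratio."""
    rr = hr if hr >= 1 else 1 / hr
    return rr + math.sqrt(rr * (rr - 1))


def e_value_ci(lower, upper):
    """E-value for the CI limit closest to the null (1.0 if the CI spans 1)."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value(lower if lower > 1 else upper)


print(round(e_value(1.5), 2))  # 2.37 for a hypothetical HR of 1.5
```

The trend variable returned by `quintile_trend_variable` would then enter the Cox model as a single continuous term.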
For comparison with previously published studies, we also examined the association between the proportion of total daily energy intake derived from ultra-processed foods (% energy) and cardiovascular mortality.
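The % energy exposure is a simple ratio of ultra-processed food energy to total energy intake. A minimal sketch, with a hypothetical function name and both inputs in kcal/day:

```python
def percent_energy_from_upf(upf_kcal: float, total_kcal: float) -> float:
    """Share of total daily energy intake from ultra-processed foods (%)."""
    if total_kcal <= 0:
        raise ValueError("total energy intake must be positive")
    return 100.0 * upf_kcal / total_kcal


print(percent_energy_from_upf(500, 2000))  # 25.0
```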
Prespecified subgroup analyses were conducted to assess whether the association of ultra-processed food consumption with cardiovascular mortality was modified by age (≥ 65 vs. < 65 years), sex (male vs. female), BMI (≥ 25 vs. < 25 kg/m²), smoking status (current or former vs. never), and alcohol consumption (none, light, or moderate vs. heavy). Light alcohol consumption was defined as ≤ 6 g/day; moderate as > 6–28 g/day for men and > 6–14 g/day for women; and heavy as > 28 g/day for men and > 14 g/day for women [31]. P for interaction was obtained from likelihood ratio tests comparing models with and without the interaction terms.
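The sex-specific alcohol categories and the likelihood ratio test can be sketched with the standard library alone. This is an illustration, not the study's code. The function names are hypothetical, and the log-likelihoods would come from the fitted Cox models with and without interaction terms. The degrees of freedom equal the number of interaction terms added: for example, 1 for a continuous exposure × binary modifier, or 4 for quintile indicators × binary modifier. `chi2_sf` uses the closed-form chi-square upper tail for integer degrees of freedom.

```python
import math


def alcohol_category(grams_per_day: float, sex: str) -> str:
    """Alcohol categories as defined in the text (sex-specific upper cut-offs)."""
    if grams_per_day <= 0:
        return "none"
    if grams_per_day <= 6:
        return "light"
    upper = 28 if sex == "male" else 14
    return "moderate" if grams_per_day <= upper else "heavy"


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for a positive integer df."""
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def lr_test(loglik_without: float, loglik_with: float, n_interaction_terms: int):
    """Likelihood ratio test comparing nested models; returns (statistic, P)."""
    stat = 2 * (loglik_with - loglik_without)
    return stat, chi2_sf(stat, n_interaction_terms)


# Hypothetical partial log-likelihoods, one interaction term added
stat, p = lr_test(-10234.6, -10232.1, 1)
print(round(stat, 2), round(p, 3))
```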
Restricted cubic spline regression [32] with four knots at the 5th, 35th, 65th, and 95th percentiles was used to explore the potential dose–response relationship between ultra-processed food consumption and cardiovascular mortality, with 0 servings/day as the reference level. P for nonlinearity was obtained by testing the null hypothesis that the regression coefficients of the second and third spline terms are equal to zero [32].
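A minimal sketch of a restricted cubic spline basis in Harrell's parameterization, with knots at the 5th, 35th, 65th, and 95th percentiles. With four knots the basis has three terms: the linear term and two nonlinear terms. The test for nonlinearity is therefore a joint 2-df test that the coefficients of the second and third terms are zero. The log HR relative to the reference of 0 servings/day is the coefficient-weighted difference between the basis evaluated at x and at 0. Function names and the optional normalization by the squared outer-knot range are assumptions; normalization rescales the coefficients but not the fitted curve.

```python
from statistics import quantiles


def percentile_knots(values, pcts=(5, 35, 65, 95)):
    """Knots at the given percentiles (default: the four used in the text)."""
    cuts = quantiles(values, n=100)  # 1st..99th percentiles
    return [cuts[p - 1] for p in pcts]


def rcs_basis(x, knots, normalize=True):
    """Restricted cubic spline basis [x, S_1(x), ..., S_{k-2}(x)] for k knots.

    Cubic between knots, linear beyond the outer knots.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least three knots")

    def cube_plus(u):
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2 if normalize else 1.0
    span = t[-1] - t[-2]
    terms = [x]
    for j in range(k - 2):
        s = (cube_plus(x - t[j])
             - cube_plus(x - t[-2]) * (t[-1] - t[j]) / span
             + cube_plus(x - t[-1]) * (t[-2] - t[j]) / span)
        terms.append(s / scale)
    return terms


def log_hr_vs_reference(x, ref, coefs, knots):
    """Log HR at x relative to ref, given fitted coefficients for the basis."""
    bx, br = rcs_basis(x, knots), rcs_basis(ref, knots)
    return sum(b * (u - v) for b, u, v in zip(coefs, bx, br))
```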
Sensitivity analyses were performed to evaluate the robustness of our results: (1) excluding deaths that occurred within the first five years of follow-up to assess potential reverse causation; (2) excluding subjects with extreme energy intake, defined as < 800 or > 4000 kcal/day for men and < 500 or > 3500 kcal/day for women [33]; (3) including subjects with a history of cancer at baseline; (4) including subjects with a history of heart attack or stroke at baseline; (5) repeating the main analyses with competing risk regression [34], treating deaths from non-CVD causes as competing events, to assess potential competing risk bias; (6) adjusting only for a propensity score calculated from all covariates in model 2; and (7) additionally adjusting for several indicators of diet quality, including the Healthy Eating Index-2005; intakes of sodium, added sugars, and saturated fatty acids; and consumption of red meat, processed meat, whole grains, fruits, vegetables, dietary fiber, and dairy.
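Exclusion criterion (2) is a sex-specific range check. A minimal sketch, with a hypothetical function name and hypothetical participants:

```python
# Plausible daily energy ranges (kcal/day) from sensitivity analysis (2).
PLAUSIBLE_ENERGY = {"male": (800, 4000), "female": (500, 3500)}


def has_plausible_energy(kcal: float, sex: str) -> bool:
    """True if energy intake lies within the sex-specific range (inclusive)."""
    low, high = PLAUSIBLE_ENERGY[sex]
    return low <= kcal <= high


# Hypothetical participants: (id, sex, energy intake in kcal/day)
cohort = [(1, "male", 3800), (2, "female", 3800), (3, "female", 600)]
kept = [pid for pid, sex, kcal in cohort if has_plausible_energy(kcal, sex)]
print(kept)  # [1, 3]
```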
To quantify the contribution of each food group to ultra-processed food consumption, we calculated the proportion of total ultra-processed food intake accounted for by each group. We also examined the association between consumption of each ultra-processed food group and cardiovascular mortality.
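Assuming the proportions were computed from cohort-wide totals (the text does not say whether they were instead averaged per participant), the contribution of each food group could be calculated as below. The function name and the example amounts are hypothetical.

```python
from collections import Counter


def group_contributions(participants):
    """Share of cohort-wide ultra-processed food servings from each group."""
    totals = Counter()
    for intake in participants:  # intake: {food group: servings/day}
        totals.update(intake)
    grand_total = sum(totals.values())
    return {group: amount / grand_total for group, amount in totals.items()}


example = [
    {"soft drinks": 1.0, "sugary products": 1.0},
    {"soft drinks": 2.0},
]
print(group_contributions(example))  # {'soft drinks': 0.75, 'sugary products': 0.25}
```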
Because cancer was the most common cause of death in this study population, we also performed a supplementary analysis to examine the association between ultra-processed food consumption and overall cancer mortality. To validate our study design and methods, we used all-cause mortality as a positive control outcome, given the well-established association of ultra-processed food consumption with all-cause mortality [15, 16, 24, 35]. A two-sided P < 0.05 was considered statistically significant. Statistical analyses were performed using STATA version 12.0 (StataCorp, College Station, TX).