Study design and participants
This study is a longitudinal analysis of the Health Workers Cohort Study (HWCS), an ongoing prospective cohort study established in January 2004 with two waves of cohort follow-up at six-year intervals on average. The participants are employees from three different health and academic institutions, as well as their relatives, from the cities of Cuernavaca and Toluca, Mexico.
Study population
Details of the HWCS study population and the full study design have been described elsewhere (16). Briefly, from January 7, 2004 to November 27, 2007, 10,729 participants aged 6-94 years were recruited. However, because of financial constraints, only 2,500 (23.3%) of the initially enrolled participants from Cuernavaca were invited to the first follow-up phase between 2010 and 2013, with a response rate of 83% (n = 2,070). Figure 1 shows the flow of participants through the study. For our analysis, we excluded participants who at baseline were younger than 19 years (n = 169), had missing data on soft drink intake (n = 22), or were pregnant (n = 3). Participants with missing type 2 diabetes data at baseline (n = 36) or with previously known or newly diagnosed diabetes (n = 127), heart disease (n = 43), or cancer (n = 6) (except skin or melanoma) at baseline were also excluded. We further excluded 80 participants who answered <75% of the food-frequency questionnaire (FFQ), had missing data in an entire section of the FFQ, or had implausible energy intake, defined as below a predefined limit of 500 kcal/d or above 6,400 kcal/d following the standard deviation method (17), as previously used in studies from this cohort (18; 19). After excluding 144 participants with incomplete data for disease outcome at the six-year follow-up, 1,445 participants formed our analytic sample.
Soft drinks intake
Soft drink consumption was assessed at baseline and at subsequent examinations with a semi-quantitative 116-item FFQ that has been validated in the Mexican population (20). Participants were asked to report how often they had consumed a standard portion of each food in the previous 12 months using ten possible responses (never; <1 time/month; 1-3 times/month; 1, 2-4, or 5-6 times/week; 1, 2-3, 4-5, or 6 or more times/d). Soft drinks were defined as cola soft drinks and flavored carbonated soft drinks, with a standard serving of 355 ml. We converted the reported frequency of soft drink consumption into a daily intake. The frequency was first grouped into four categories of intake (<1/month, 1-4/month, 2-6/week, and ≥1/d) to obtain data comparable with previous studies (21). However, because most participants fell in the middle two categories (74.1%), we reclassified the exposure as follows: <1 time/week, 1-4 times/week, and ≥5 times/week.
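The frequency conversion described above can be sketched as follows. This is a minimal illustration only: the servings-per-week weights assigned to each FFQ response (range midpoints), the response labels, and all function names are assumptions, since the exact weights used in the study are not stated here.

```python
# Hypothetical weights: servings per week assigned to each of the ten FFQ
# responses. Range midpoints are an assumption, not taken from the study.
SERVINGS_PER_WEEK = {
    "never": 0.0,
    "<1/month": 0.5 * 12 / 52,
    "1-3/month": 2.0 * 12 / 52,
    "1/week": 1.0,
    "2-4/week": 3.0,
    "5-6/week": 5.5,
    "1/day": 7.0,
    "2-3/day": 17.5,
    "4-5/day": 31.5,
    ">=6/day": 42.0,
}

SERVING_ML = 355  # standard serving size stated in the text


def daily_servings(response):
    """Convert an FFQ response into servings per day."""
    return SERVINGS_PER_WEEK[response] / 7


def daily_ml(response):
    """Convert an FFQ response into millilitres of soft drink per day."""
    return daily_servings(response) * SERVING_ML


def exposure_category(response):
    """Assign one of the three final exposure categories."""
    per_week = SERVINGS_PER_WEEK[response]
    if per_week < 1:
        return "<1 time/week"
    if per_week < 5:
        return "1-4 times/week"
    return ">=5 times/week"
```

Keeping the weights on a weekly scale means the category boundaries (1 and 5 servings per week) are compared against exact values rather than rounded daily fractions.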
Type 2 diabetes
Incident type 2 diabetes was defined as meeting at least one of the following three criteria during follow-up: self-report of physician-diagnosed type 2 diabetes, new use of hypoglycemic medication, or fasting glucose >126 mg/dL at the examination (22). A fasting venous blood sample (fasting time ≥8 hours) was collected from each participant. Fasting glucose was measured by the enzymatic colorimetric glucose oxidase method with a Selectra XL instrument (Randox, ELITechGroup, Delhi, India). The onset of type 2 diabetes was defined as either the date of the follow-up examination or the year of physician diagnosis self-reported by the participant. The questionnaire recorded the time since type 2 diabetes diagnosis in one-year intervals covering the period between the two examinations. June 30th was set as the diagnosis date for each year. We estimated the date of physician diagnosis by subtracting the reported time since diagnosis from the date on which the completed questionnaire was returned.
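The case definition and the onset-date rule above can be illustrated with a short sketch. The function names, argument names, and the use of whole years for the self-reported time since diagnosis are assumptions made for illustration; they are not taken from the study's analysis code.

```python
from datetime import date


def is_incident_diabetes(self_reported_dx, new_hypoglycemic_use,
                         fasting_glucose_mg_dl, threshold=126.0):
    """Return True if any of the three criteria in the text is met."""
    high_glucose = (fasting_glucose_mg_dl is not None
                    and fasting_glucose_mg_dl > threshold)
    return bool(self_reported_dx or new_hypoglycemic_use or high_glucose)


def estimated_onset_date(questionnaire_date, years_since_diagnosis=None):
    """Estimate the onset date of type 2 diabetes.

    With a self-reported time since diagnosis (assumed here to be in whole
    years), the diagnosis year is the questionnaire year minus that time,
    and the date is fixed at June 30th of that year. Without it, the date
    of the follow-up examination is used.
    """
    if years_since_diagnosis is None:
        return questionnaire_date
    return date(questionnaire_date.year - years_since_diagnosis, 6, 30)
```

For example, a participant who returned the questionnaire on March 15, 2011 and reported a diagnosis two years earlier would be assigned an onset date of June 30, 2009 under these assumptions.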
Covariates
At each wave, participants completed a self-administered questionnaire that collected information on demographic characteristics (age, sex), previous and current illnesses, family history of diabetes, medication use, and lifestyle habits (smoking status and physical activity). We used the same measurement instruments for time-varying covariates to ensure comparability across waves. Participants were classified according to smoking status as never, former, or current smokers. Alcohol consumption (in g/d) was estimated from the FFQ and categorized into tertiles. We calculated total energy intake in kilocalories by multiplying the frequency of consumption of each food by its energy content and summing over all foods. Leisure-time physical activity was assessed with a validated physical activity questionnaire (23). Participants were asked to report the weekly leisure time they devoted to 16 activities, such as walking, running, and cycling. Participants were classified as active if their leisure-time physical activity was ≥150 min/week (24).
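As a minimal sketch of the energy calculation and the activity classification described above: the food names, frequencies, and energy values below are invented for illustration, and the function names are hypothetical.

```python
def total_energy_kcal(daily_frequency, kcal_per_serving):
    """Sum frequency x energy content over all foods (kcal/d).

    daily_frequency: dict mapping food -> servings per day
    kcal_per_serving: dict mapping food -> kcal in one standard serving
    """
    return sum(freq * kcal_per_serving[food]
               for food, freq in daily_frequency.items())


def is_active(leisure_minutes_per_week):
    """Classify a participant as active at >=150 min/week of leisure-time activity."""
    return leisure_minutes_per_week >= 150


# Illustrative values only (not from the study's food composition tables).
example_frequency = {"soft drink": 1.0, "tortilla": 3.0}
example_energy = {"soft drink": 150.0, "tortilla": 50.0}
example_total = total_energy_kcal(example_frequency, example_energy)
```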
Medical examinations and anthropometric measurements were also performed. All anthropometric measurements were performed by nurses trained in standardized procedures (reproducibility was evaluated, yielding concordance coefficients between 0.83 and 0.90). Weight was measured with participants wearing minimal clothing, using a previously calibrated electronic TANITA scale. Height was measured with a conventional stadiometer. Body mass index (BMI) was calculated as weight (kg) divided by the square of height (m²). Waist circumference (WC) was measured midway between the lowest border of the rib cage and the upper border of the iliac crest while the participant was standing. We defined abdominal obesity as waist circumference >90 cm for men and >80 cm for women (25). Resting blood pressure (mmHg) was measured twice using an automatic digital blood pressure monitor, and the average of the two measurements was calculated. Participants with systolic blood pressure >140 mmHg or diastolic blood pressure >90 mmHg, as well as those who reported use of antihypertensive medication, were classified as hypertensive.
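The derived anthropometric and blood pressure variables defined above can be expressed as a short sketch. The function names and the "male"/"female" coding are assumptions; the cut-offs are those given in the text.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def has_abdominal_obesity(waist_cm, sex):
    """Waist circumference >90 cm for men or >80 cm for women."""
    cutoff = 90.0 if sex == "male" else 80.0
    return waist_cm > cutoff


def is_hypertensive(systolic_readings, diastolic_readings, on_medication):
    """Average the readings, then apply the >140 / >90 mmHg rule.

    Reported use of antihypertensive medication also classifies a
    participant as hypertensive.
    """
    mean_sbp = sum(systolic_readings) / len(systolic_readings)
    mean_dbp = sum(diastolic_readings) / len(diastolic_readings)
    return mean_sbp > 140 or mean_dbp > 90 or bool(on_medication)
```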
Statistical analysis
The characteristics of the analytic sample across categories of soft drink intake were described as means and standard deviations, as medians with interquartile ranges (IQR) for skewed distributions, or as percentages for categorical variables. Because data were missing at baseline for smoking status (3.5%) and abdominal obesity (1.4%), we used a missing-indicator category for these covariates to minimize the reduction in sample size. Linear trends across categories of soft drink intake were assessed with a Chi-square test for linear trend. Person-years of follow-up were calculated from the date the baseline questionnaire was returned to the date of type 2 diabetes diagnosis; participants who did not develop diabetes were censored on the date of their final follow-up visit. To examine the association of soft drink consumption at baseline with type 2 diabetes, hazard ratios (HRs) with 95% confidence intervals (CIs) were estimated using Cox proportional hazards regression with time on study as the time scale. The category of <1 time/week was the reference group in all analyses.
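The follow-up time and event indicator that feed the Cox model can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, and the conversion of days to years using 365.25 days per year is an assumption, not a detail reported in the text.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed conversion factor


def years_between(start, end):
    """Elapsed time between two dates, in years."""
    return (end - start).days / DAYS_PER_YEAR


def follow_up(baseline_date, diagnosis_date, last_visit_date):
    """Return (person_years, event) for one participant.

    Follow-up starts when the baseline questionnaire is returned. It ends at
    the type 2 diabetes diagnosis date (event = 1) or, for participants
    without diabetes, at the final follow-up visit (event = 0, censored).
    """
    if diagnosis_date is not None:
        return years_between(baseline_date, diagnosis_date), 1
    return years_between(baseline_date, last_visit_date), 0
```

The resulting pair is the time-on-study and event indicator that a Cox regression routine would take as its outcome.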
Two separate models were fitted to assess the relationship between soft drink intake and incidence of type 2 diabetes. Model 1 was adjusted only for age (continuous). Model 2 was further adjusted for potential confounders identified by reviewing the literature and by using causal diagrams to select all variables related to the exposure and the outcome. We considered the following covariates in the multivariable-adjusted analyses: age (continuous), sex, total energy intake (continuous), smoking status at baseline (never, former, current, missing), leisure-time physical activity (active, ≥150 min/week), family history of diabetes (no, yes, unknown), and alcohol intake at baseline (tertiles of g/d). The potential modifying effect of first-degree family history of diabetes, as a proxy for genetic susceptibility to type 2 diabetes (14; 26), was evaluated by including interaction terms between this covariate and soft drink intake in model 2. This analysis included only participants who answered yes or no to the family history of diabetes question (n = 1,341). For all models, we tested for a linear trend in the HR by assigning the median value of each category of soft drink intake to its members and treating this as a single continuous variable in separate Cox regression models (adjusted for the same covariates). The proportional hazards assumption was assessed graphically by plotting the log cumulative hazard against time and tested using Schoenfeld residuals (27), which test the null hypothesis of zero slope for individual covariates and globally for each regression model. The assumption of proportional hazards was not violated (P > 0.05).
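The trend variable described above, in which each participant receives the median intake of his or her category, can be sketched as follows. The function name and the example values are hypothetical; the resulting score would then be entered as a single continuous term in a Cox model, which is not shown here.

```python
from statistics import median


def trend_scores(intakes, categories):
    """Replace each participant's intake with the median of their category.

    intakes: list of continuous soft drink intakes (e.g. servings/week)
    categories: list of category labels, one per participant
    """
    grouped = {}
    for intake, category in zip(intakes, categories):
        grouped.setdefault(category, []).append(intake)
    medians = {category: median(values) for category, values in grouped.items()}
    return [medians[category] for category in categories]


# Illustrative data only: servings per week for nine invented participants.
example_intakes = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 5.0, 7.0, 14.0]
example_categories = ["low"] * 3 + ["mid"] * 3 + ["high"] * 3
example_scores = trend_scores(example_intakes, example_categories)
```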
Sensitivity analysis
Several sensitivity analyses were performed. First, the multivariable-adjusted model 2 was further adjusted for hypertension status (no/yes) to test whether hypertension confounded the association between soft drinks and type 2 diabetes. Some studies have suggested that hypertension increases the risk of type 2 diabetes, and hypertensive individuals may at the same time alter their soft drink consumption (28; 29). Second, the multivariable-adjusted model 2 was further adjusted for BMI or abdominal obesity at baseline, in separate models, to assess the potential confounding effect of these adiposity indicators. Finally, we conducted a complete-case analysis using data from participants with complete follow-up data from 2004 to 2018 (n = 600). We did not use the complete-case data for the main analysis because the large loss to follow-up could bias the estimates through selection.
All P-values were two-tailed, and P < 0.05 was considered statistically significant. Statistical analysis was performed using Stata version 14.0 (StataCorp, College Station, TX, USA).