Study population
The National Health and Nutrition Examination Survey (NHANES) is a cross-sectional survey administered by the Centers for Disease Control and Prevention (CDC) that collects nationally representative health data. The study protocol was approved by the ethics review committee responsible for NHANES, and all participants provided written informed consent.
We used publicly available data from three NHANES cycles: 2013–2014, 2015–2016, and 2017–2018. Participants aged 18 years and above were included if their demographic information, such as education, income, and body mass index (BMI), was fully documented. These individuals voluntarily provided blood and urine specimens for the measurement of heavy metal concentrations. The limit of detection (LOD) for each heavy metal is reported in the NHANES database files. For heavy metal concentrations below the LOD, we used the LOD divided by the square root of 2. In total, 1617 participants had complete heavy metal and Type 2 Diabetes (T2D) data, as shown in Fig. 1.
T2D definition
The presence of diabetes was assessed using the Diabetes Questionnaire (DIQ). Trained interviewers administered the DIQ through the Computer-Assisted Personal Interview (CAPI) system integrated into the Mobile Examination Center (MEC) questionnaire, following standardized protocols. The questionnaire collects data on diabetes, prediabetes, and related treatments, including insulin and oral hypoglycemic agents, as well as on diabetic retinopathy. It also captures the occurrence and prevalence of diabetes over the previous year. Responses to the DIQ are coded as 1 for "yes", 2 for "no", 3 for "borderline", 7 for "refused to answer", and 9 for "unknown". Diabetes was defined as either a self-reported prior medical diagnosis or an HbA1c level of 6.5% (48 mmol/mol) or above. In NHANES, most diabetes cases identified in the general population are Type 2 Diabetes (T2D).
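The case definition above can be sketched as a small rule. This is only an illustration: the function and argument names are hypothetical, and treating "borderline", "refused", and "unknown" responses as not self-reported is an assumption that the text does not state.

```python
from typing import Optional

DIQ_YES = 1  # DIQ response code for "yes", as listed in the text


def has_diabetes(diq_response: Optional[int], hba1c_percent: Optional[float]) -> bool:
    """Return True if either criterion from the text is met.

    Assumption: only a "yes" response counts as a self-reported diagnosis;
    codes 2, 3, 7, and 9 (and missing values) do not.
    """
    self_reported = diq_response == DIQ_YES
    elevated_hba1c = hba1c_percent is not None and hba1c_percent >= 6.5
    return self_reported or elevated_hba1c
```

For example, a participant who answers "no" but has an HbA1c of 6.5% would be classified as having diabetes under this rule.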
Measurement of metal concentrations
Urine and blood samples were analyzed by the Division of Laboratory Sciences, National Center for Environmental Health, in Atlanta, Georgia. Urine samples were analyzed for creatinine and for the following metals: barium, cadmium, cobalt, cesium, molybdenum, manganese, lead, antimony, tin, tungsten, strontium, and uranium. Blood samples were analyzed for lead, cadmium, selenium, manganese, and mercury. The analyses used inductively coupled plasma dynamic reaction cell mass spectrometry (ICP-DRC-MS), as described in the 2018 CDC/NCHS report. Urinary arsenic acid, arsenocholine, beryllium, and platinum were excluded from the study because their concentrations were below the limit of detection (LOD) in more than 80% of samples. Following NHANES practice, metal concentrations below the LOD were replaced by the LOD divided by the square root of two, consistent with the methodology described in the CDC's national report on human exposure to environmental chemicals.
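The substitution rule for values below the LOD is simple arithmetic and can be sketched as follows. The function name and the example values are hypothetical and are not taken from the NHANES files.

```python
import math


def impute_below_lod(value: float, lod: float) -> float:
    """Replace a concentration below the LOD with LOD / sqrt(2); keep others unchanged."""
    if value < lod:
        return lod / math.sqrt(2)
    return value


# Hypothetical urinary concentrations (ug/L) and a hypothetical LOD of 0.06 ug/L
raw = [0.01, 0.20, 0.05, 0.31]
imputed = [impute_below_lod(v, lod=0.06) for v in raw]
```

With a LOD of 0.06, the two values below the LOD are both replaced by roughly 0.042, while the detected values are left unchanged.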
Covariates
Demographic and individual characteristics were considered as covariates. These included gender (male or female), age (18 years and above), educational attainment (below high school, high school completion, or beyond high school), and BMI category, defined as underweight (BMI below 18.5), normal weight (18.5 to 24.9), overweight (25 to 29.9), and obese (above 29.9). Ethnicity was classified as Mexican American, other Hispanic, non-Hispanic white, non-Hispanic black, or other. Annual household income was stratified into the following brackets: $5,000–$9,999; $10,000–$14,999; $15,000–$19,999; $20,000–$24,999; $25,000–$34,999; $35,000–$44,999; $45,000–$49,999; and above $50,000.
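The BMI categories listed above can be written as a small lookup. This is a sketch with a hypothetical function name. The text gives the upper bounds as 24.9 and 29.9, so assigning values that fall between the stated bounds (for example 24.95) to the lower category is an assumption.

```python
def bmi_category(bmi: float) -> str:
    """Map a BMI value (kg/m^2) to the four categories used as a covariate."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:      # assumption: 24.9 < BMI < 25 counts as normal weight
        return "normal weight"
    if bmi < 30:      # assumption: 29.9 < BMI < 30 counts as overweight
        return "overweight"
    return "obese"
```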
Statistical analysis
Data were analyzed with SPSS version 25.0 and R version 4.0, with statistical significance set at an alpha level of 0.05. Descriptive statistics are presented as median values and standard deviations. Age, treated as a continuous variable, was compared using t-tests, and chi-square tests were used to assess differences in the distribution of categorical variables between participants with and without diabetes. To improve the distribution of the data, continuous variables were ln-transformed and then normalized. Pearson correlation coefficients were computed to assess the associations among ln-transformed heavy metal concentrations. Table S1 presents the blood and urine levels of 12 heavy metals in the 1617 participants, including average concentration, limit of detection (LOD), and range. The average concentrations were as follows: 0.50 µg/L for blood lead, 1.67 µg/L for urinary barium, 0.31 µg/L for urinary cadmium, 0.55 µg/L for urinary cobalt, 0.16 µg/L for urinary manganese, 0.07 µg/L for urinary lead, 1.43 µg/L for urinary antimony, 0.17 µg/L for urinary tin, 0.10 µg/L for urinary thallium, 119.6 µg/L for urinary tungsten, and 0.012 µg/L for urinary uranium. The LOD values for these heavy metals ranged from 32.8% to 80.7%. The relationships between these metal concentrations are shown in Table 1. A noteworthy association was observed between blood lead levels and the likelihood of diabetes. Particularly strong correlations were identified between the risk of diabetes and urinary concentrations of strontium and uranium.
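The transformation and correlation steps described above can be illustrated with a short standard-library sketch. It assumes that "normalization" means z-score standardization, which the text does not specify. The function names and the example concentrations are hypothetical.

```python
import math
import statistics


def ln_standardize(values):
    """Natural-log transform, then z-score standardize (assumed meaning of 'normalization')."""
    logs = [math.log(v) for v in values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation
    return [(x - mean) / sd for x in logs]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical concentrations of two metals in four participants
metal_a = [1.0, 2.0, 4.0, 8.0]
metal_b = [3.0, 6.0, 12.0, 24.0]
r = pearson_r(ln_standardize(metal_a), ln_standardize(metal_b))
```

Because the second series is a constant multiple of the first, their log-transformed values differ only by a constant shift, so the correlation in this toy example is 1 up to floating-point error.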
Generalized linear regression
We first used linear regression to investigate the association between heavy metal exposure and diabetes scores. The ln-transformed concentration of each individual chemical served as the continuous independent variable, and the diabetes score served as the continuous outcome variable. Covariate adjustment was carried out in four steps, using gender, ethnicity, age, educational attainment, annual household income, body mass index, and ln-transformed heavy metal concentrations. The first model was unadjusted. The second model was adjusted for gender, age, and ethnicity. The third model extended the second with adjustments for body mass index, level of education, and annual household income. The fourth model was adjusted for all of the covariates mentioned. For the linear regression forest plot, the standard error (SE) of each coefficient β was back-calculated from β and the corresponding p value using the chi-square quantile function.
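One way to recover an SE from a coefficient and its p value is sketched below. It assumes a two-sided Wald test with one degree of freedom, for which (β/SE)² equals the chi-square quantile at 1 − p. The authors' exact procedure is not given in the text, so this is an assumed reading. The chi-square quantile with one degree of freedom is computed here as the square of the standard normal quantile at 1 − p/2, which is an exact identity. The function name is hypothetical.

```python
from statistics import NormalDist


def se_from_beta_and_p(beta: float, p_value: float) -> float:
    """Back-calculate SE from beta and a two-sided p value (Wald test, 1 df assumed).

    Requires 0 < p_value < 1; p_value = 1 would imply a zero test statistic.
    """
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    chi2_quantile = z * z  # chi-square(1 df) quantile at 1 - p_value
    return abs(beta) / chi2_quantile ** 0.5


# Round trip: beta = 0.5 with SE = 0.25 gives z = 2 and p of about 0.0455
p = 2.0 * (1.0 - NormalDist().cdf(2.0))
se = se_from_beta_and_p(0.5, p)
```

The recovered SE can then be used to draw the confidence interval β ± 1.96 × SE for each row of a forest plot.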
Quantile regression
Quantile regression was used to assess the potential impact of heavy metal exposure on the risk of diabetes, considering both the general population and those exposed to secondary sources. This method, traditionally used for estimation at predetermined quantiles, helps assess how exposure affects the outcome distribution differently across segments of the population. We categorized exposure levels into quartiles: the lowest quartile comprised all values below the limit of detection (LOD), and the remaining observations were divided equally among the other three quartiles. Heavy metal concentrations were converted into quartile rankings, and a generalized linear regression model was used to examine the quartile mean values. This approach yields an overall effect that shows the direction of the trend in outcomes as heavy metal concentrations increase across quartiles.
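The quartile coding described above, with one category for values below the LOD and three equal-sized groups for the detected values, can be sketched as follows. The function name and example values are hypothetical, and breaking ties by position in the input is an assumption.

```python
def lod_quartile_scores(values, lod):
    """Score 0 for values below the LOD; split detected values into three
    equal-sized groups by rank and score them 1, 2, 3."""
    scores = [0] * len(values)
    # Sort detected values, keeping their original positions (ties broken by position)
    detected = sorted((v, i) for i, v in enumerate(values) if v >= lod)
    n = len(detected)
    for rank, (_, i) in enumerate(detected):
        scores[i] = 1 + (3 * rank) // n
    return scores


# Hypothetical concentrations with a hypothetical LOD of 0.05
conc = [0.01, 0.5, 0.02, 1.0, 2.0, 0.3, 4.0, 0.7]
quartiles = lod_quartile_scores(conc, lod=0.05)
```

The resulting integer scores can be entered into a regression model as an ordinal exposure term to test for a trend across categories.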
Weighted quantile sum
In the Weighted Quantile Sum (WQS) model, we constructed a weighted index to evaluate the combined influence of six chemicals on the outcome. The association between this index and the outcome is estimated in a regression model that also includes the relevant covariates. The approach considers all of the measured chemicals. To be included in the model, a chemical must show a consistent, discernible association with the progression of diabetes.
The WQS approach scores each chemical by decile and combines the scores into a weighted linear index that represents the cumulative chemical burden. The contribution of each chemical to the overall index effect is then assessed from the relative magnitude of its assigned weight. The formula for the WQS model is outlined below:
$$g\left(\mu\right)=\beta_{0}+\beta_{1}\left(\sum_{i=1}^{c}w_{i}q_{i}\right)+Z^{\prime}\phi\,\Big|_{b}$$
$$WQS=\sum_{i=1}^{c}w_{i}q_{i}$$
where \(\beta_{0}\) was the intercept; \(Z^{\prime}\) represented the matrix of covariates and \(\phi\) the covariate coefficients; and \(c\) was the number of chemicals (here, six). The weights \(w_{i}\) summed to 1, with each value between 0 and 1 (\(\sum_{i=1}^{c}w_{i}\big|_{b}=1\), \(0\le w_{i}\le 1\)), for each bootstrap sample \(b\). The regression coefficient of the WQS index was \(\beta_{1}\). \(q_{i}\) denotes the decile score of a chemical (\(q_{i}\) = 0, 1, 2, …, 9 for the first, second, …, tenth deciles, respectively). For continuous outcomes, a linear link function \(g(\mu)\) was assumed. Forty percent of the dataset was used for training, and 60% was used for validation. In a further step, \(\beta_{1}\) was specified as a positive coefficient. The \(w_{i}\) for each chemical in each bootstrap sample were estimated from the training set. After 1,000 bootstraps, we obtained 1,000 sets of weights. When \(\beta_{1}\) was positive, the iterations of the same group converged successfully, and the parameter conversion was not equal to two, the empirical weights were obtained by averaging all the obtained \(w_{i}\) for each chemical. The averaged \(w_{i}\) were then applied to the validation dataset to compute the WQS index and test its statistical significance.
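To make the index concrete, the sketch below scores each chemical by decile and forms the weighted sum \(\sum_{i} w_{i}q_{i}\) for each participant. It does not estimate the weights, which in the WQS model come from the bootstrap step on the training set. The weights shown are hypothetical, the function names are illustrative, and decile scores of 0 to 9 based on ranks are assumed.

```python
def decile_scores(values):
    """Score each value 0..9 according to its rank-based decile (assumed coding)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = (10 * rank) // n
    return scores


def wqs_index(decile_matrix, weights):
    """Weighted sum of decile scores per participant.

    decile_matrix: one list of decile scores per chemical (c lists of length n).
    weights: c weights, each in [0, 1], summing to 1.
    """
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    n = len(decile_matrix[0])
    return [sum(w * q[j] for w, q in zip(weights, decile_matrix)) for j in range(n)]


# Two hypothetical chemicals, two participants, hypothetical weights
index = wqs_index([[0, 9], [9, 0]], weights=[0.25, 0.75])
```

In this toy case the first participant is in the top decile of the more heavily weighted chemical and therefore receives the larger index value.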