Cross‑sectional study
Database and study subjects
NHANES is a large-scale, multi-stage, ongoing, nationally representative survey conducted by the National Center for Health Statistics, part of the Centers for Disease Control and Prevention in the United States. Details of the sampling methods and data collection have been published, and the research protocol was approved by the National Center for Health Statistics Ethics Review Committee. All participants signed informed consent forms during the survey[10]. This study used NHANES data from 2011 to 2020 for adults (age ≥ 20 years) and collected information on Hb, blood uric acid, age, gender, education level, marital status, chronic disease, smoking status, body mass index, etc. Participants lacking any of the above information were excluded.
Definition of diseases
HUA was defined as a uric acid level above 6 mg/dL (360 μmol/L) in women and above 7 mg/dL (420 μmol/L) in men[11].
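The sex-specific cutoffs above can be expressed as a small classification rule. The following is a minimal sketch; the function name and the "female"/"male" labels are illustrative assumptions, not NHANES variable names.

```python
def has_hyperuricemia(uric_acid_mg_dl: float, sex: str) -> bool:
    """Return True if uric acid exceeds the sex-specific cutoff.

    Cutoffs follow the definition in the text: >6 mg/dL for women,
    >7 mg/dL for men. The sex labels are hypothetical.
    """
    cutoffs_mg_dl = {"female": 6.0, "male": 7.0}
    # Strictly greater than the cutoff, matching "more than" in the text.
    return uric_acid_mg_dl > cutoffs_mg_dl[sex]
```

For example, a value of 6.5 mg/dL would be classified as HUA in a woman but not in a man.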
Other covariates
According to the data definitions in the report, gender was categorized as male or female. Race was categorized as Mexican American, other Hispanic, non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, and other races. Educational level was categorized as less than 9th grade, 9-11th grade (including 12th grade with no diploma), high school graduate/GED or equivalent, some college or AA degree, and college graduate or above. Marital status was categorized as married/living with partner, widowed/divorced/separated, and never married. Body mass index (BMI) was calculated as measured weight (kg) divided by the square of measured height (m²). Participants who had smoked more than 100 cigarettes in their lifetime were defined as smokers[12]. Participants who reported hypertension, diabetes, or hyperlipidemia were defined as having the related chronic diseases.
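The BMI and smoking definitions above can be sketched as follows. The function and parameter names are hypothetical; the sketch assumes height is already in metres and that a lifetime cigarette count is available.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


def is_smoker(lifetime_cigarettes: int) -> bool:
    """Smoker = more than 100 cigarettes smoked in one's life (as in the text)."""
    return lifetime_cigarettes > 100
```

A participant weighing 70 kg with a height of 1.75 m would have a BMI of about 22.9 kg/m².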
Statistical analyses
Mean values and proportions of baseline characteristics were compared using linear regression for continuous variables and logistic regression for categorical variables. Covariates with statistical significance were included in the multivariate logistic regression analysis to examine the association between Hb and HUA. Odds ratios (ORs) with 95% confidence intervals (CIs) and P values were calculated. A two-tailed P<0.05 was considered statistically significant.
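As an illustration of how an OR, its 95% CI, and a two-tailed P value follow from a fitted logistic regression coefficient, the sketch below converts a coefficient and its standard error using the usual Wald approach. It does not fit the model itself, and the function name and inputs are assumptions for illustration only.

```python
import math


def odds_ratio_summary(beta: float, se: float, z_crit: float = 1.96):
    """Convert a logistic coefficient into (OR, CI lower, CI upper, p value).

    beta: log-odds coefficient; se: its standard error.
    z_crit = 1.96 corresponds to a 95% Wald confidence interval.
    """
    odds_ratio = math.exp(beta)
    ci_lower = math.exp(beta - z_crit * se)
    ci_upper = math.exp(beta + z_crit * se)
    # Two-tailed p value from the standard normal distribution.
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, ci_lower, ci_upper, p_value
```

A coefficient of zero corresponds to an OR of 1 and a P value of 1, that is, no association.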
Mendelian randomization study
Data sources
This study is based on summary-level genome-wide association study (GWAS) data from participants of the European Large Genome Consortium (EUR) to assess the association between plasma Hb and HUA. The HUA dataset (GWAS ID ebi-a-GCST90018977) comprises 343,836 samples and 19,041,286 SNPs. The Hb dataset (GWAS ID ebi-a-GCST90025969) comprises 445,373 samples and 4,234,826 SNPs. Detailed data are available at https://gwas.mrcieu.ac.uk/. Because this study used publicly available GWAS datasets for MR analysis, details of ethics committee approval and participants' informed consent can be found in the original study for each data source[13][14].
Selection of SNPs
SNPs associated with Hb concentration were used as the exposure data. Instrumental variables (IVs) were screened according to the three core assumptions of MR (relevance, independence, and exclusion restriction). SNPs were selected at a significance level of P<5×10⁻⁸ and clumped with r²<0.001 within a window of 1,000 kb, so that the selected SNPs were not in linkage disequilibrium and were independent of each other. The selected SNPs were exported to Excel to calculate the F-statistic: F = [R²/(1−R²)] × [(N−K−1)/K], where R² = 2 × MAF × (1−MAF) × β²/sd². MAF denotes the minor allele frequency and EAF the effect allele frequency: if EAF>0.5, then MAF=1−EAF; if EAF<0.5, then MAF=EAF. All weak instrumental variables with F<10 were removed. SNPs related to confounding factors such as metabolic disorders, obesity, gout, smoking, and alcohol consumption were then removed, for example "rs477992", "rs760077", "rs1209384", "rs3791020", "rs1264081", "rs2275426", etc. (see supplementary materials for details). Finally, only SNPs whose association with the outcome had P>5×10⁻⁸ were retained.
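The F-statistic calculation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the spreadsheet actually used: N is taken to be the sample size of the exposure GWAS and K the number of instruments (K=1 when F is computed per SNP), neither of which is defined explicitly in the text, and all function names are hypothetical.

```python
def minor_allele_frequency(eaf: float) -> float:
    """MAF = 1 - EAF if EAF > 0.5, otherwise MAF = EAF."""
    return 1 - eaf if eaf > 0.5 else eaf


def r_squared(eaf: float, beta: float, sd: float = 1.0) -> float:
    """R^2 = 2 * MAF * (1 - MAF) * beta^2 / sd^2, as in the text."""
    maf = minor_allele_frequency(eaf)
    return 2 * maf * (1 - maf) * beta ** 2 / sd ** 2


def f_statistic(r2: float, n: int, k: int = 1) -> float:
    """F = R^2 / (1 - R^2) * (N - K - 1) / K.

    n: sample size (assumed: exposure GWAS); k: number of instruments (assumed).
    """
    return r2 / (1 - r2) * (n - k - 1) / k


def is_weak_instrument(f_value: float, threshold: float = 10.0) -> bool:
    """Instruments with F < 10 are treated as weak and removed."""
    return f_value < threshold
```

With these definitions, a SNP with EAF=0.3 and β=0.05 (sd=1) in a sample of 445,373 would have R²=0.00105 and an F-statistic far above 10, so it would be retained.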
Statistical analyses
We used four MR methods in the TwoSampleMR software package. Each method calculates the ratio of the SNP effect on the outcome to the SNP effect on the exposure and then combines the per-SNP results to evaluate the causal relationship between exposure and outcome. The methods were inverse variance weighting (IVW), MR Egger regression, the weighted median method, and the weighted mode method. Among them, IVW provides unbiased estimates when the IVs have no horizontal pleiotropy, has the greatest statistical power, and is considered the most effective method for MR analysis. Therefore, this study used the IVW results as the main reference, with P<0.05 indicating statistical significance. The weighted median method is suited to situations with many invalid IVs and can generate reliable causal estimates even when up to 50% of the IVs are invalid. The MR Egger method is highly tolerant of SNPs with horizontal pleiotropy.
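To make the per-SNP ratio and its combination concrete, the sketch below computes Wald ratios and a fixed-effects IVW estimate in plain Python. It is a simplified illustration, not the TwoSampleMR implementation: the standard error of each ratio uses a first-order approximation that ignores uncertainty in the exposure effect, and all names are hypothetical.

```python
import math


def wald_ratios(beta_exposure, beta_outcome, se_outcome):
    """Per-SNP Wald ratio and its approximate standard error."""
    ratios = [bo / be for be, bo in zip(beta_exposure, beta_outcome)]
    # First-order approximation: se(ratio) = se_outcome / |beta_exposure|.
    ses = [so / abs(be) for be, so in zip(beta_exposure, se_outcome)]
    return ratios, ses


def ivw_fixed(ratios, ses):
    """Inverse-variance weighted mean of the Wald ratios (fixed effects)."""
    weights = [1 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return estimate, se


# Toy inputs (made up for illustration only).
ratios, ses = wald_ratios([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
estimate, se = ivw_fixed(ratios, ses)
```

SNPs with more precise ratio estimates receive larger weights, which is why IVW is efficient when all instruments are valid.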
To evaluate heterogeneity among the selected IVs, Cochran's Q test was conducted first. If P>0.05, indicating no significant heterogeneity, the IVW fixed-effects model was used to combine the Wald ratio estimates of the individual SNPs; otherwise, a random-effects model was used. Horizontal pleiotropy was evaluated with the intercept of the MR Egger regression, that is, by testing whether the intercept in the scatter plot differs from zero. P>0.05 indicates no horizontal pleiotropy; otherwise, horizontal pleiotropy is present. In addition, MR-PRESSO analysis was conducted for exposures with ≥4 SNPs, with K=10000. Outlier SNPs were excluded, and the corrected results were estimated. Finally, leave-one-out sensitivity analysis was conducted to evaluate the influence of individual SNPs on the MR results: SNPs were removed one at a time and the MR analysis was repeated to ensure the stability of the results.
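The heterogeneity and leave-one-out steps can be illustrated with the following sketch, which takes per-SNP Wald ratios and their standard errors as inputs. It is a simplified stand-in for the corresponding TwoSampleMR routines, with hypothetical function names. The P value for Q is not computed here; it would come from a chi-square distribution with (number of SNPs − 1) degrees of freedom.

```python
def ivw_estimate(ratios, ses):
    """Fixed-effects inverse-variance weighted mean of per-SNP ratios."""
    weights = [1 / s ** 2 for s in ses]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def cochran_q(ratios, ses):
    """Cochran's Q: weighted squared deviations from the IVW estimate.

    Returns (Q, degrees of freedom).
    """
    pooled = ivw_estimate(ratios, ses)
    q = sum((r - pooled) ** 2 / s ** 2 for r, s in zip(ratios, ses))
    return q, len(ratios) - 1


def leave_one_out(ratios, ses):
    """IVW estimate recomputed with each SNP removed in turn."""
    estimates = []
    for i in range(len(ratios)):
        kept_ratios = ratios[:i] + ratios[i + 1:]
        kept_ses = ses[:i] + ses[i + 1:]
        estimates.append(ivw_estimate(kept_ratios, kept_ses))
    return estimates
```

If the leave-one-out estimates stay close to the full IVW estimate, no single SNP is driving the result.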