Baseline characteristics
The baseline characteristics of the dyslipidaemia and non-dyslipidaemia populations are shown in Table 1. The mean age of all subjects was 50.49±12.16 years, and the prevalence of dyslipidaemia was 44.38%. Family history of diabetes, BMI, and lipid levels differed significantly between the dyslipidaemia and non-dyslipidaemia populations (all P <0.05).
Association between GRS and dyslipidaemia
The mean GRS across all participants was 1.33 (SD: 0.34). Overall, the GRS was significantly associated with dyslipidaemia, with a crude HR (95% CI) of 1.366 (1.187, 1.572) and an adjusted HR (95% CI) of 1.353 (1.172, 1.561) (Table 2). The GRS was then divided into quartiles. Compared with Q1, subjects in the Q2, Q3, and Q4 groups had HRs (95% CI) of 1.043 (0.900, 1.210), 1.188 (1.028, 1.374), and 1.229 (1.069, 1.412), respectively, after adjustment for age, family history of diabetes, physical activity, BMI, and blood lipid indicators. This pattern suggested that the risk of developing dyslipidaemia increased steadily as the GRS increased, although the confidence interval for Q2 included 1. Similarly, the crude and adjusted HRs showed the same steady increase in both the training set and the testing set.
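The quartile grouping described above can be sketched as follows. This is a minimal illustration only: the function name `assign_quartiles`, the toy scores, and the choice of the "inclusive" quantile method are assumptions, and the cut points actually used in the study are not given in the text.

```python
from statistics import quantiles


def assign_quartiles(scores):
    """Label each score Q1..Q4 using the sample quartile cut points.

    Hypothetical helper: the quantile method ("inclusive") is an
    assumption, not the rule reported in the study.
    """
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    labels = []
    for s in scores:
        if s <= q1:
            labels.append("Q1")
        elif s <= q2:
            labels.append("Q2")
        elif s <= q3:
            labels.append("Q3")
        else:
            labels.append("Q4")
    return labels


# Toy GRS values (made up for illustration, not study data).
toy_grs = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
toy_groups = assign_quartiles(toy_grs)
```

Q1 then serves as the reference group, and each of Q2 to Q4 enters the regression model as an indicator variable.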
Development and evaluation of the conventional models
In the training set, the 14 reported predictors were analysed using simple Cox regression, and 8 variables (age, family history of diabetes, physical activity, WC, BMI, TGs, HDL-C, and LDL-C) were significantly related to dyslipidaemia. Because WC and BMI were collinear, WC was excluded, and the final conventional models comprised age, family history of diabetes, physical activity, BMI, TGs, HDL-C, and LDL-C (Table 3, above). Notably, there was no collinearity among TGs, HDL-C, and LDL-C. The AUCs of the 4 conventional models with different classifiers, and the differences between them, are shown in Figure 1 and Table 4. In the testing set, the AUCs of the conventional models with the Cox, ANN, RF, and GBM classifiers were 0.702 (0.673, 0.729), 0.736 (0.708, 0.762), 0.787 (0.762, 0.811), and 0.816 (0.792, 0.839), respectively, indicating that the conventional models performed reasonably well in predicting dyslipidaemia, especially the model with the GBM classifier. Because using blood lipid indicators to predict dyslipidaemia may not be practical, the AUCs of prediction models without the blood lipid indices were also calculated for the conventional and conventional+GRS models; with the Cox classifier, these AUCs were 0.553 (0.523, 0.583) and 0.569 (0.539, 0.598), respectively. The prediction models using machine learning methods showed similarly poor performance (see Table S4).
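For readers unfamiliar with the AUC, the sketch below computes it for a binary outcome as the proportion of case/non-case pairs in which the case receives the higher predicted risk (ties count as one half). It is a simplified illustration that ignores follow-up time and censoring, so it is not necessarily the procedure used for the models above; the function name `auc` and the toy data are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for subjects with the outcome, 0 otherwise.
    scores: predicted risks; a higher score means higher predicted risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # case ranked above non-case
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


# Toy example (made-up values, not study data).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values reported above should be read.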
Development and evaluation of conventional models with GRS
The conventional+GRS model combined the conventional factors with the GRS (Table 3, below). Table 4 shows the differences in discrimination between the conventional and conventional+GRS models. With the Cox classifier, adding GRS improved the predictive ability of the conventional model only to a limited extent. The conventional model showed moderate discrimination, and the AUC increased slightly to 0.707 (0.679, 0.734) with the addition of GRS; the difference in AUC was 0.00491, which was not statistically significant (P =0.0549). Nevertheless, the addition of GRS yielded a statistically significant continuous NRI of 25.6% (13.8%, 35.8%) and IDI of 2.3% (1.1%, 3.7%). For the ANN classifier, the addition of GRS increased the AUC to 0.754 (0.727, 0.779); the difference in AUC was 0.0183 (P =0.0031). However, the continuous NRI and IDI were 7.8% (-2.7%, 18.5%) and 1.0% (-0.3%, 2.4%), respectively, neither of which was statistically significant. The conventional+GRS model with the RF classifier also showed significant improvements (NRI: 14.1% (1.1%, 26.1%); IDI: 2.5% (0.5%, 4.2%)), indicating that the GRS contributed to the prediction of dyslipidaemia. With the GBM classifier, the discrimination of the prediction model also improved significantly when GRS was added to the conventional model. Figure 2 provides the receiver operating characteristic (ROC) curves for the conventional and conventional+GRS models with different classifiers. Overall, the results suggested that adding GRS could improve some aspects of the prediction performance of the conventional models with most classifiers. In addition, the GBM classifier showed the best performance among all classifiers, with an AUC of 0.831 (0.808, 0.853).
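The continuous NRI and the IDI compare the predicted risks of two models for the same subjects. The sketch below uses the standard category-free definitions for a binary outcome; it ignores censoring and does not compute confidence intervals, so it illustrates the quantities rather than reproducing the study's analysis. The function names and toy data are assumptions.

```python
def continuous_nri(labels, p_old, p_new):
    """Category-free NRI: net upward movement of predicted risk among
    events plus net downward movement among non-events."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for y, old, new in zip(labels, p_old, p_new):
        if y == 1:
            n_e += 1
            if new > old:
                up_e += 1
            elif new < old:
                down_e += 1
        else:
            n_n += 1
            if new > old:
                up_n += 1
            elif new < old:
                down_n += 1
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


def idi(labels, p_old, p_new):
    """IDI: mean change in predicted risk among events minus the mean
    change among non-events."""
    d_e = [new - old for y, old, new in zip(labels, p_old, p_new) if y == 1]
    d_n = [new - old for y, old, new in zip(labels, p_old, p_new) if y == 0]
    return sum(d_e) / len(d_e) - sum(d_n) / len(d_n)


# Toy example (made-up risks, not study data).
y = [1, 1, 0, 0]
old = [0.6, 0.5, 0.4, 0.3]   # risks from a model without the extra marker
new = [0.7, 0.4, 0.3, 0.2]   # risks from a model with the extra marker
toy_nri = continuous_nri(y, old, new)
toy_idi = idi(y, old, new)
```

Because the NRI counts only the direction of each change while the IDI averages its size, the two can disagree with the AUC difference, as the classifier-specific results above illustrate.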
Figure 3 shows the calibration of the conventional and conventional+GRS models. The calibration curves of the conventional+GRS models were closer to the reference line than those of the conventional models. The Brier score, which can be considered a measure of the calibration of a set of probabilistic predictions (the lower the Brier score, the better the calibration), also declined with the addition of GRS (by 0.048 for Cox, 0.005 for ANN, and 0.006 for GBM), indicating that the conventional models were better calibrated after incorporating GRS. Other statistics, such as sensitivity and specificity, are provided in Table S3. These metrics provided further evidence that the predictive ability of the models was improved by adding GRS.
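As a reminder of what the Brier score measures, the sketch below computes it for a binary outcome as the mean squared difference between predicted probability and observed outcome. It is a simplified illustration that ignores censoring; the function name `brier_score` and the toy values are assumptions, not study data.

```python
def brier_score(labels, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    0 is a perfect score; lower values indicate better predictions.
    """
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equal length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Toy example: squared errors 0.01, 0.01, 0.04, 0.09 average to 0.0375.
toy_brier = brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
```

A decline in the Brier score after adding a predictor therefore means that the predicted probabilities moved, on average, closer to the observed outcomes.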