Descriptive Analysis and Univariate Analysis
A total of 2,403 people completed the research survey. Of these, 199 (8.3%) were HIV positive and 2,204 (91.7%) were HIV negative. The distributions of several demographic, behavioral, sociological and psychological characteristics differed by HIV infection status. The results of the descriptive and univariate analyses are shown in Table 1.
The chi-square tests showed that several demographic characteristics, including age, education, marriage and occupation, were associated with HIV infection status. HIV prevalence differed among age groups (p=0.008), with the highest prevalence (11.5%) in MSM aged 35-44 years. HIV prevalence decreased as educational level increased (p<0.001), from 15.9% among those with junior high school education or below to 7.2% among those with college education or above. In terms of marital status, HIV prevalence was significantly higher in the divorced or widowed group (15.0%) than in the other categories (p=0.027). For occupation, physical laborers and freelancers had higher HIV prevalence (8.5% and 10.1%, respectively; p=0.015).
Regarding behavioral characteristics, MSM with a history of STDs had a prevalence of 30.2% (p<0.001), and those who used drugs before sex had a prevalence of 27.5% (p<0.001). HIV prevalence was also higher among MSM who smoked every day (13.9%, p<0.001), had MSP (10.1%, p=0.001), had UAI (11.1%, p<0.001), engaged in group sex (10.2%, p=0.032), had a higher frequency of sex (14.3%, p=0.009), assumed the receptive role in same-sex intercourse (11.6%, p<0.001) or drank before sex (9.7%, p=0.046).
For psychological characteristics, low self-esteem, depression and anxiety were associated with HIV infection (p<0.05). The HIV prevalence was 12.7% in MSM with low self-esteem. MSM with moderate or severe levels of anxiety or depression had significantly higher HIV prevalence than those in the other categories.
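As an illustration of the univariate comparisons above, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table (exposure by HIV status). It assumes no continuity correction, which may differ from the software settings used in this study, and the counts are hypothetical and are not taken from Table 1. The function name `chi_square_2x2` is ours.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows are exposure groups, columns are HIV status:
        exposed:    a positive, b negative
        unexposed:  c positive, d negative
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of exposure and HIV status.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for illustration only (NOT the study data).
chi2, p = chi_square_2x2(30, 70, 170, 2130)
print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
```

Variables with more than two categories (for example age group or education) would need the general r x c version of the test, with (r-1)(c-1) degrees of freedom.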
Multivariate Analysis
Univariate analysis identified 18 factors associated with HIV infection. Using forward likelihood-ratio (LR) selection in the multivariate analysis, we reduced these to eight key factors: age, education, marriage, smoking, sexual pattern, STD, using drugs before sex and depression. These variables were then used to construct the HIV infection predictive model.
Predictive Model
In our preliminary research, we found that using stepwise regression to screen predictive factors yielded the best-performing predictive model, with good predictive accuracy and clinical application value (21). We therefore adopted the same method for model construction in this study. All eight variables retained by the stepwise procedure were entered into the model: age, education, marriage, smoking, sexual pattern, STD, using drugs before sex and depression. The results of the model are shown in Table 2.
Regarding age, MSM aged 35-44 had a higher risk of HIV infection compared to other age groups (OR=1.842, 95% CI 1.147-2.958, p=0.012). Compared to MSM with an education level of junior high school or below, higher educational levels may be associated with a reduced risk of HIV infection (college and above: OR=0.386, 95% CI 0.229-0.650, p<0.001). Smoking sometimes (OR=1.799, 95% CI 1.193-2.713, p=0.005) and smoking every day (OR=2.225, 95% CI 1.473-3.363, p<0.001) may be related to HIV infection. In terms of sexual patterns, compared to those who engage only as receptive partners, being only the insertive partner (OR=0.364, 95% CI 0.245-0.543, p<0.001) and engaging in both roles (OR=0.613, 95% CI 0.424-0.885, p=0.009) may be associated with a lower risk of HIV infection. Regarding depression, moderate (OR=1.748, 95% CI 1.067-2.864, p=0.027) and moderately severe (OR=2.398, 95% CI 1.414-4.068, p=0.001) levels of depression may be related to HIV infection. In addition, having an STD (OR=5.672, 95% CI 3.868-8.319, p<0.001) and using drugs before sex (OR=3.017, 95% CI 1.575-5.777, p=0.001) may be associated with an increased risk of HIV infection. We also visualized the OR values of each variable to clarify their impact on HIV infection.
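For readers who wish to relate the reported odds ratios to the underlying logistic regression coefficients, the sketch below shows the standard conversion OR = exp(beta), with 95% CI = exp(beta ± 1.96 × SE). It assumes the reported intervals are Wald intervals, which is the usual default but is not stated in the text. The coefficient and standard error are back-calculated from the reported interval for STD history rather than taken from the model output, and the function names are ours.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a logistic coefficient and its standard error to an OR and Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def back_out_beta_se(ci_low, ci_high, z=Z_95):
    """Recover beta and SE from a reported CI, assuming it is symmetric on the log scale."""
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    beta = (log_low + log_high) / 2
    se = (log_high - log_low) / (2 * z)
    return beta, se

# Reported for STD history: OR = 5.672, 95% CI 3.868-8.319.
beta, se = back_out_beta_se(3.868, 8.319)
or_hat, ci_low, ci_high = odds_ratio_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"OR = {or_hat:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The OR recovered from the midpoint of the log-scale interval agrees with the reported 5.672 to within rounding, which is consistent with the Wald assumption.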
The Hosmer-Lemeshow test gave a p-value of 0.3834. Because this is greater than 0.05, there was no evidence of poor fit between the model and the observed data. We constructed a nomogram based on the model, incorporating the following 10 variables: age, education, marriage, occupation, smoking, STD, MSP, sexual patterns, using drugs before sex and depression. The nomogram is illustrated in Figure 1. The Area Under the Curve (AUC) of the nomogram was 0.783 (95% CI: 0.749-0.816), indicating good predictive performance, as shown in the Receiver Operating Characteristic (ROC) curve in Figure 2.
The calibration curve of the nomogram, depicted in Figure 3, shows the consistency between predicted and observed outcomes. The curve aligns well with the diagonal line in the low probability region, with some deviation in the high probability region. Considering that the prevalence of HIV infection in the study population typically ranges from 5% to 10%, most predictions fall in the low probability region, where the model exhibits good consistency.
Additionally, we evaluated the clinical utility of the predictive model through Decision Curve Analysis (DCA). The DCA results indicate that for HIV infection in the MSM population, the net benefit of using the predictive model is higher than that of the treat-all or treat-none strategies, demonstrating the model's clinical value. The DCA curve is shown in Figure 4. In summary, based on these evaluation results, we conclude that this HIV predictive model fits well.
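The discrimination and clinical-utility measures described above can be sketched as follows. This is a minimal illustration that assumes the usual definitions: AUC as the probability that a randomly chosen HIV-positive individual receives a higher predicted probability than a randomly chosen HIV-negative individual, and net benefit at threshold probability pt as TP/n - (FP/n) × pt/(1 - pt). The outcomes and predicted probabilities are toy values, not study data, and the function names are ours rather than those of the software used for Figures 2 and 4.

```python
def auc(y_true, y_prob):
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on everyone whose predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the treat-all strategy; treat-none is 0 by definition."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy outcomes (1 = HIV positive) and predicted probabilities, for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.40, 0.05, 0.10, 0.25, 0.08, 0.30, 0.03, 0.12, 0.06, 0.04]

threshold = 0.10
nb_model = net_benefit(y_true, y_prob, threshold)
nb_all = net_benefit_treat_all(y_true, threshold)
print(f"AUC = {auc(y_true, y_prob):.3f}")
print(f"Net benefit at pt = {threshold}: model {nb_model:.3f}, treat all {nb_all:.3f}, treat none 0")
```

A decision curve repeats the net-benefit calculation over a range of thresholds and plots the model against the treat-all and treat-none strategies.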