Study design
The data used in this study were extracted from patients who received ART between 2003 and 2019 from Nanjing AIDS Prevention and Control Information System (AIDS-PCIS). All patients received a free combination antiviral treatment (cART) containing at least three antiviral medicines. The follow-up started after treatment initiation and the participants were visited every three months. The observation end point was December 31, 2019, and the outcome was death. The survival time was defined as the duration from ART initiation to death or December 31, 2019. The inclusion criteria included: 1) living in Nanjing; 2) being visited at least once; 3) being over 18 years old when ART started; 4) having complete laboratory test data before starting ART.
At the end of the observation, 4573 patients met the inclusion criteria, and a total of 120 patients died from HIV/AIDS-related diseases during the follow-up.
Data collection
Demographic data and clinical information were retrieved from face-to-face surveys at the patients’ enrollment or extracted from their medical records using a structured questionnaire designed specifically for AIDS-PCIS. The information included the date of birth, gender, height, weight, marital status, infection route and WHO clinical stage. The age of the patient was calculated from the date of birth to the date of starting ART. Body mass index (BMI) was calculated using the following formula: BMI = weight (kg) / (height (m) * height (m)).
The laboratory testing information was obtained from the Nanjing Center for Disease Control and Prevention (CDC) or local hospitals. The laboratory testing indicators included CD4, white blood cell (WBC), blood platelets (PLT), HB, serum creatinine (CR), triglycerides (TG), serum total cholesterol (TC), fasting blood glucose (GLU), aspartate aminotransferase (AST), alanine aminotransferase (ALT), total bilirubin (TBIL). All these laboratory tests were carried out by the trained technical personnel strictly following clinical guidelines at each visit in the central laboratory of local hospitals or Nanjing CDC.
Routine blood biochemical indexes, such as TG, TC, GLU, CR, AST, ALT, and TBIL, were measured using a Beckman AU5800 automatic biochemical analyzer (Beckman COULTER K., Japan). Other indexes including WBC, HB and PLT were evaluated by Sysmex Xe-2100 automatic blood cell analyzer (Sysmex Corporation, Japan). CD4 was determined by the BD FACSCalibur flow cytometer (Becton Dickinson Corporation, USA).
Statistical analysis
Data processing
Since the prediction model is based on a multi-factor regression model, there is no simple method to estimate the sample size. When the number of predictors is much larger than the observations of outcome, overfitting may occur. Previous literature showed that in the conservative estimation, one prediction factor requires at least 10 effective outcomes. In this study, there were 120 cases with effective outcomes, so the number of predictors should be no more than 12.
Since directly dropping the observation with missing values might not only lead to selection bias, but also decrease the power of a test, missing value imputation was applied to obtain suitable values by employing the values of other variables before data analysis. The results were listed in Fig. 1. A sensitivity analysis was carried out to evaluate the filling effect of the missing values (Table 1).
A total of 120 deaths caused by AIDS-related diseases were contained in the case group of the nested case-control study. To ensure that all the subjects in the case group could have a matching control, PSM was applied at a ratio of 1:4 to determine the participants (a case was well matched by age and gender with 4 controls) [28]. Finally, 600 subjects were included in this study with 120 dead and 480 alive PLHIV who were separated into 120 blocks.
Table 1
Sensitivity analysis in imputation for missing data
Variables | Before imputation(mean ± standard deviation) | After imputation(mean ± standard deviation) | P value |
Weight, kg | 64.8 ± 10.2 | 64.8 ± 10.1 | 0.9903 |
Height, cm | 172.0 ± 6.1 | 172.0 ± 6.0 | 0.9405 |
BMI, kg/m2 | 21.9 ± 3.0 | 21.8 ± 3.0 | 0.5320 |
White blood cell count, 109/L | 5.65 ± 2.0 | 5.65 ± 2.0 | 0.9950 |
Blood platelet, 109/L | 188.7 ± 62.5 | 188.7 ± 62.4 | 0.9884 |
Hemoglobin, g/L | 142.7 ± 30.9 | 142.7 ± 30.8 | 0.9729 |
Creatinine, mmol/L | 71.3 ± 19.4 | 71.3 ± 19.3 | 0.9817 |
Triglyceride, mmol/L | 1.7 ± 2.3 | 1.7 ± 2.3 | 0.8398 |
Total cholesterol, mmol/L | 4.3 ± 1.3 | 4.3 ± 1.3 | 0.7732 |
Fasting plasma glucose, mmol/L | 5.6 ± 1.3 | 5.6 ± 1.3 | 0.9293 |
Aspartate aminotransferase, U/L | 27.0 ± 27.1 | 27.1 ± 27.1 | 0.9891 |
Alanine aminotransferase, U/L | 30.2 ± 29.2 | 30.2 ± 29.2 | 0.9964 |
Total bilirubin, mmol/L | 12.0 ± 6.2 | 12.0 ± 6.2 | 0.9905 |
Establishment and validation of prediction model
The patients were randomly split into a training set and a validation set at a ratio of 7:3. The comparability of the training set and validation set was then evaluated. Continuous variables with normal distribution were presented as mean ± standard deviation, and t-tests were used to infer the differences between the training and validation sets. The continuous variables subjecting to skewed distribution were described using median (first quartile, second quartile). The Wilcoxon rank-sum tests were employed for comparisons. Frequency (ratio) was utilized to describe the characteristics of categorical variables, and comparisons between the two sets were performed using chi-square tests or Fisher’s exact tests.
Then the data in the training set were used to fit a model and the data in validation set were applied to evaluate the efficacy of the model. Based on the data in the training set, a univariate Cox regression analysis was performed on the variables. P-values of the variables were calculated based on the univariate Cox proportion hazard regression model. The variables with p-values less than or equal to 0.2 were included in a multivariate Cox proportion hazard regression model. After multivariate analysis, the factors with p-value less than or equal to 0.05 were included in a prediction model. According to Occam’s Razor, the model with the fewest variables is the best [29]. Finally, we combined the statistically significant risk factors with professionally significant factors, such as the difficulty of index measurement, the cost of measurement and the difficulty of application, to determine the predictive factors and select a prediction model with better predictive performance.
The repeatability and extrapolation of an established prediction model should be evaluated to assess its performance. A strict evaluation of the prediction model should include internal validation and external validation. The internal validation is performed using the same dataset as the training set. This study employed the bootstrap resampling [30] for internal validation because there lacked extra data to verify the model. The 1,000 resampling performances of the model were averaged as the internal validation performance.
Discrimination and calibration are the two most common evaluation indicators. The discrimination of the prediction model is quantified using the area under the curve (AUC) and C-Index. The C-Index value ranges from 0 to 1. The closer C-Index is to 1, the better the discrimination of the model is. A C-Index of 0.5 indicates that the model has no predictive ability. When C-Index is less than 0.5, the model prediction is contrary to the actual results. In general, a C-Index of 0.7 indicates a good prediction performance of the model. However, discrimination cannot reflect whether the estimate of absolute risk of prediction model is accurate or not, because it is only based on risk scores or the ranking of prediction probabilities. Therefore, calibration, a more accurate indicator, is needed to qualify the prediction model. In this study, the calibration of the model was evaluated using the calibration curve.
We sorted the predicted probabilities of all participants from the smallest to the largest, and divided the patients into ten equal parts.
The average predicted probability of patients in each divided part was used as x-axis and the proportion of actual events as y-axis. Ideally, the calibration graph was a straight line with an intercept of 0 and a slope of 1. The predictive ability of the model was also evaluated using decision curve analysis (DCA).
Integrated discrimination improvement (IDI), net reclassification index or improvement (NRI) and other indicators that are used to compare models or evaluate the increase in predictive performance of individual predictors were not discussed in the present study.
Presentation of nomogram
The prediction model was visualized and presented by a nomogram. To calculate the score of each variable in each level, a scoring standard was developed based on the standard regression coefficients of all variables. Then using the scores of these factors, a total score was calculated to indicate the survival probability of each patient.
All data analyses and figures were made using R software version 3.6.3. All hypothesis tests were two-sided, with an α level of 0.05.