Patients
We included RA patients seen between October 2018 and August 2019 at four hospitals: the Third Affiliated Hospital of Sun Yat-sen University, Zhuhai Hospital of Guangdong Chinese Medicine, Ganzhou Municipal Hospital, and Fujian Provincial Hospital. Over the same period, we also reviewed randomly selected health-check records from these hospitals. The population of interest was patients aged 18 years or older who were diagnosed with RA according to the 2010 ACR/EULAR classification criteria[18]. For RA patients with more than one admission during the study period, only data from the first admission were analyzed. Participants were excluded if they were unable to answer questions, were pregnant, had a parathyroid disorder or a malignant tumor, were chronically using medication that could affect bone mineral density (such as bisphosphonates or vitamin D/calcium supplements), or refused to provide written informed consent or to undergo dual-energy X-ray absorptiometry (DXA). The principal center was the Third Affiliated Hospital of Sun Yat-sen University. The detailed study flow diagram is shown in Figure 1.
Main outcome variable
BMD, T-scores, and Z-scores of the lumbar spine (L2-L4), femoral neck, and total hip were collected from DXA reports (Hologic Discovery A densitometer, Bedford, MA, USA) after blood samples had been taken. In our study, the BMD-based diagnosis was the outcome variable. BMD results were classified from the T-score and the Z-score as defined by the World Health Organization[19]. A T-score ≥ −1.0, between −1.0 and −2.5, and ≤ −2.5 represents normal BMD, osteopenia, and osteoporosis, respectively; this is the diagnostic standard for men aged 50 and over and for postmenopausal women. The Z-score is used for premenopausal women and men aged under 50: a Z-score of −2.0 or lower indicates low BMD compared with peers ('below the expected range for age'). Therefore, both healthy controls (HC) and RA patients were divided into two subgroups according to the applicable diagnostic criterion (T-score or Z-score), and these subgroups were then stratified into five categories by BMD results.
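The classification rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the code used in the study; the function names, argument names, and category labels are hypothetical.

```python
def uses_t_score(sex, age, postmenopausal=False):
    """Decide which score applies (hypothetical helper).

    T-score: men aged 50 and over, and postmenopausal women.
    Z-score: premenopausal women and men aged under 50.
    """
    if sex == "male":
        return age >= 50
    return postmenopausal


def classify_bmd(score, use_t_score):
    """Map a T-score or Z-score to a BMD category using the thresholds in the text."""
    if use_t_score:
        if score >= -1.0:
            return "normal"
        if score > -2.5:
            return "osteopenia"
        return "osteoporosis"
    # Z-score branch: -2.0 or lower is below the expected range for age.
    if score <= -2.0:
        return "below expected range for age"
    return "within expected range for age"
```

Under this sketch, a 62-year-old man with a T-score of −1.8 would be classified by calling `classify_bmd(-1.8, uses_t_score("male", 62))`. The two score types together yield the five categories used for stratification.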
Study factors
Thirty-three independent variables (Table 2) were also collected. Smoking and drinking habits, medical and medication history, and laboratory examinations were taken from each participant's history. Dyslipidemia included hypercholesterolemia and hypertriglyceridemia. 'Chronic usage' of non-steroidal anti-inflammatory drugs (NSAIDs) or glucocorticoids (GC) was defined as taking these medications continuously for at least the previous three months. 'Rheumatoid factor positive' was defined as a concentration of 30 IU/ml or higher; anti-cyclic citrullinated peptide antibodies (anti-CCP), antikeratin antibodies (AKA), and anti-RA33 antibodies (RA33) were considered 'positive' at concentrations of 20 IU/ml or higher. Ethical approval was obtained from the Ethics Committee of the Third Affiliated Hospital of Sun Yat-sen University (Guangzhou, China). The registration number of the ethics approval was [2018]02-283-01. Written informed consent was obtained from all individuals participating in this study.
Sample size
A systematic sampling design was used to select the participants. Sample sizes were estimated with PASS 15 software (https://www.ncss.com), with statistical power (1-β) set at 0.90, type I error (α) set at 0.05, and an assumed prevalence of concomitant OP of 40%[20] among RA patients. The software indicated that a total sample size of at least 312 would suffice. To ensure adequate events in each subgroup, we ultimately recruited 405 patients with RA and 198 healthy subjects for the present study.
Statistical analysis
Data were manually entered into EpiData (http://www.epidata.dk/) and then imported into Microsoft Office Excel (version 2016). Two physicians rechecked the data and transferred them to R (version 3.6.1) for analysis. Continuous variables are presented as the mean ± standard deviation (SD), while categorical variables are presented as frequency and percentage. The dependent variables (primary outcomes) were the T-score, Z-score, and corresponding BMD diagnoses at the lumbar spine, femoral neck, and total hip, divided and stratified as described above. A two-tailed t-test was used to compare normally distributed continuous variables, and the Kruskal-Wallis H test was used for non-normally distributed ones. Pearson's χ² test or Fisher's exact test was performed for categorical variables, and the Cochran-Armitage trend test was used for appropriate ordinal variables. Statistical significance was assumed at the p < 0.05 level.
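As a rough illustration of the categorical comparison, the Pearson χ² statistic for a table of counts can be computed as below. The study's analyses were run in R; this Python sketch is an assumption-laden stand-in with hypothetical function names, and the p-value helper covers only the 1-degree-of-freedom case (a 2 × 2 table).

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))
```

The expected counts computed inside the loop are also what one would inspect when choosing between Pearson's test and Fisher's exact test; the source does not state which rule the authors applied.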
Model development
Owing to the insufficient number of younger RA patients, predictive models were created only for RA patients whose BMD was diagnosed by T-score. To ensure the robustness and validity of the models, we took three different approaches to model development: clinical knowledge-driven conventional logistic regression (model A), least absolute shrinkage and selection operator (LASSO, model B), and random forest (RF, model C). We randomly separated the data of each subgroup into a training set (70%) and a validation set (30%) with the same proportion of positive events; the training set was used for modeling and the validation set for evaluation by the C-statistic, calibration slope, and accuracy.
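The 70/30 split with a preserved positive-event proportion, and the C-statistic used for evaluation, can be sketched as follows. This is a minimal Python illustration under assumed details (binary 0/1 outcome labels, a fixed random seed, hypothetical function names); the study itself used R, and its exact splitting procedure is not described here.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx), splitting each outcome class separately
    so both sets keep roughly the same positive-event proportion."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)


def c_statistic(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive case has
    the higher predicted score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))
```

For a binary outcome, the C-statistic computed this way equals the area under the ROC curve; the pairwise loop is quadratic in sample size but adequate for cohorts of a few hundred patients.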
Model A: Candidate variables were preselected on the basis of existing literature or well-established risk factors and then entered into logistic regression models. The final set of variables included only those with a p-value < 0.05 in the regression analysis.
Model B: LASSO is a well-suited method for handling multicollinearity[21]. The LASSO procedure used 5-fold cross-validation to avoid over-fitting. We entered all 33 candidate variables into the LASSO models.
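For reference, the standard L1-penalized logistic regression objective is written out below, assuming a binary outcome $y_i \in \{0, 1\}$, a covariate vector $x_i$ of the $p = 33$ candidate variables, and a penalty weight $\lambda$ chosen by the 5-fold cross-validation. The notation is ours, and the rule used to pick $\lambda$ is not stated in the text.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta}
\left\{
  -\frac{1}{n} \sum_{i=1}^{n}
  \left[ y_i \left( \beta_0 + x_i^{\top} \beta \right)
         - \log\!\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right) \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

The L1 penalty shrinks some coefficients exactly to zero, which is why LASSO performs variable selection and tends to keep only one of a group of highly correlated predictors.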
Model C: A random forest model assembles hundreds or more classification trees, each built with a random selection of correlates[22]. We entered all 33 variables into the random forest models. Out-of-bag (OOB) samples were used to estimate error rates, and the Gini index served as a reference for the relative permutation importance[23] of the correlates. Factors with a Gini index > 5 were selected as important.
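To make the Gini-based selection concrete, the sketch below computes the Gini impurity of a node, the impurity decrease from a single split, and a filter that keeps variables whose importance exceeds 5. It is an illustrative Python sketch with hypothetical function and variable names; how the study's software aggregates and scales decreases across trees is not specified here and may differ.

```python
def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Drop in impurity when a parent node is split into left and right,
    with each child weighted by its share of the parent's samples."""
    n = len(parent)
    weighted_children = (
        len(left) * gini_impurity(left) + len(right) * gini_impurity(right)
    ) / n
    return gini_impurity(parent) - weighted_children


def select_important(importance_by_variable, threshold=5.0):
    """Keep variables whose importance is strictly above the threshold."""
    return [
        name
        for name, value in importance_by_variable.items()
        if value > threshold
    ]
```

A variable's Gini importance in a forest is built from decreases like the one computed by `gini_decrease`, accumulated over every split that uses the variable.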