Clinical data and patient characteristics
The demographic information of the study participants was summarized in Table 1. Among 225 RA patients, the positive rate of RF and ACPA was 57.8% and 42.2%, respectively. These samples were randomly divided into two independent cohorts. The discovery set consisted of 172 RA patients and 71 normal controls (NCs), and the validation set consisted of 53 RA patients and 29 NCs (Figure 1A). The alluvial plot shown in the Figure 1B displayed the number of individuals across the disease status, gender, disease activity score (DAS28-CRP and DAS28-ESR), as well as the status of autoantibodies such as rheumatoid factor (RF), ACPA, anti-nuclear antibodies (ANA), and anti-keratin antibodies (AKA). The positive rates of RF, ACPA, ANA, AKA in the RA patients were 57.8%, 42.2%, 49.4% and 49.4% (Table 1 and Figure 1B).
Metabolomic and lipidomic profile (MLP)
In four datasets generated from four different LC-MS runs, there were 1697 and 1960 high-confident peaks identified for polar metabolites with method 1 and 2, and 4771 and 3973 high-confident peaks identified for lipids with method 3 and 4. From this 12401 high-confident peaks we identified 265 candidate disease-associated metabolites and lipids (Table S1), including 38 organic acids, 10 amines, 37 amino acids, 5 nucleotides, 4 bile acids, 26 acyl-carnitines (AcCa), 118 glycerophospholipids, 9 sphingolipids and 2 glycerolipids. These metabolites represent enriched metabolic pathways involving carnitine synthesis, oxidation of branched-chain fatty acids, biotin metabolism, malate-aspartate shuttle, citric acid cycle, urea cycle, phenylalanine and tyrosine metabolism, phospholipid biosynthesis and histidine metabolism (Figure S2).
Clustering of test samples using MLP and pathway enrichment analysis
To analyze whether any of the identified metabolites or lipids are associated with RA, we carried out the univariate and multivariate analysis of the identified metabolites and lipids. The well-established partial least squares-discriminant analysis (PLS-DA) demonstrated excellent separation of NCs from seropositive RA and seronegative RA based on MLP (Figure 2A). The permutation tests (n = 1000) were performed to validate the PLS-DA model we used (Figure S3). This separation clearly demonstrated the difference in serum metabolite and lipid levels that existed between RA patients and normal control subjects. Statistical analysis identified 68 significantly upregulated and 38 downregulated serum metabolites and lipids that were correlated with RA (Figure 2B). Pathway enrichment analysis (Figure 2C) highlighted the top enriched metabolic pathways in RA. Specifically, metabolic products associated with Warburg effect, pentose phosphate metabolism, glycolysis, and lipid metabolism products involving phospholipid, sphingolipid, oxidation of branched chain fatty acids and carnitine synthesis were upregulated in the RA group, while histidine metabolism was downregulated.
Constructing a multivariate classification model for RA
We constructed a multivariate classification model including 26 metabolites and lipids that could separate seropositive and seronegative RA cases from normal controls (Figure 2D-E). We evaluated three machine learning algorithms such as binary logistic regression, random forest and support vector machine on the above-mentioned metabolites and lipids for the classification of RA cases using the discovery set (n = 243). The binary logistic regression algorithm based model trained with leave-one-out cross-validation had an accuracy of 100% (Figure 2E, AUC = 1.00). This model was tested using the independent validation set (n = 82, Table S2), and showed AUC = 0.91 with test sensitivity of 89.7% and specificity of 90.6% (Figure 2E). We did an analysis of the 5 misclassified RA cases, and it is noteworthy that all of the 5 cases had borderline to positive RF antibody levels (Table S3).
Association of MLP and clinical parameters
To understand the association of MLP with clinical parameters of RA, we used a correlation-based network approach to explore the co-occurrence patterns between 28 clinical variables and serum concentrations of 265 identified metabolites and lipids in the RA patients, which could help to explore the correlation of metabolic products and clinical confounders. In the co-occurrence network (Figure 3), nodes represent clinical parameters or metabolites and lipids, and two nodes are connected if they are significantly correlated (adjusted p-value < 0.05 and r-value > 0.2). The resulting network consisted of 117 nodes and 274 edges. The modularity index was 0.467, which is above 0.4 suggested for a modularly structured network [23]. Overall, the entire network could be parsed into six modules. As shown in Figure 3, module 1 contained CRP, leukocyte, neutrophil and ACPA related to inflammatory activity and 17 metabolites and lipids. Modules 2 and 3 contained RF, IgG, IgM, IgA, globulin, ESR, ANA and AKA related to the immune response and 22 metabolites and lipids. The general clinical variables such as age, gender and BMI were clustered in modules 4-5 and were far away from modules 1-3. Therefore, the co-occurrence network analysis revealed that there was a clear association between the serum inflammation/immune markers and concentrations of metabolites and lipids in the RA patients.
Stratification of disease activity using MLP
To analyze the utility of using MLP for stratification of RA cases based on disease activity, twenty metabolites and lipids strongly correlated to the inflammatory/immune activity were selected from modules 1 and 2 in the co-occurrence network (Figure 3). Sixteen metabolites and lipids had significantly increased odd ratios for RA status, while 4 metabolites and lipids had significantly decreased odd ratios (Figure 4). Disease activity score DAS28-CRP is often utilized to evaluate the disease activity of RA patients [24,25]. Using DAS28-CRP to stratify RA patients, we further identify 7 metabolites and lipids that were strongly associated with disease activity categories by using the ordinal regression method (Figure 5). Among these, AcCa (20:3), aspartyl-phenylalanine (asp-phe), pipecolic acid, lysophosphatidylethanolamine (LPE 18:1) and LPE (20:3) appeared to be positively correlated with higher RA disease activity, while histidine and PA (28:0) were negatively correlated with RA disease activity (Figure 5A). AcCa (20:3), LPE (18:1) and LPE (20:3) were significantly increased in the remission-low risk (R-L) group compared to the NCs group (p < 0.05, AUC > 0.7). A small peptide asp-phe and lipid LPE (18:1) were significantly increased in the DAS28-CRP high disease activity (HIGH) group compared with the R-L group (p < 0.05, AUC > 0.7), indicating their potential role as biomarkers in RA disease activity stratification (Figure 5B).