Association of baseline gene expression signatures and biomarkers with response to anti-TNF treatment
In order to decrease the heterogeneity between analyzed groups for all the presented analyses, we included only female patients diagnosed with RA in our study (18 responders and 10 non-responders). The gene expression analysis was performed for the baseline data between the group of future responders compared to the group of non-responders. The analysis identified 59 differentially expressed genes and most of these genes achieved significance (FDR ≤ 0.1) due to a single outlier sample showing different gene expression profile (Additional file 1: Fig. S2A-B). Therefore, we decided to exclude this patient sample (both anti-TNF naïve and treated) from the subsequent analyses. The further analysis (without the outlier sample) yielded 192 differentially expressed genes (FDR ≤ 0.1), including 103 genes with higher expression in the group of responders and 89 genes with higher expression in the group of non-responders. The top differentially expressed genes are represented in Fig. 1A and listed in Table 2 (also see Additional file 2: Table S1).
Many differentially expressed genes showed significant differences between responders and non-responders. However, this was found to be due to outlier expression in one or more patient samples. In order to assess possible heterogeneity and to detect the genes with stable association to response, we employed a leave-one-out (LOO) approach, where we removed one patient sample in each iteration and repeated the association analysis. The genes are considered in each iteration if the gene meets statistical significance with a false discovery rate less than 0.1. After performing the LOO approach, only two genes, Epiplakin (EPPK1) and FosB Proto-Oncogene, AP-1 Transcription Factor Subunit (FOSB) with higher expression in future responders showed significance in all possible 28 iterations (FDR < 0.1 across all 28 LOO iterations, Table 2). And additional five genes, EGR1, EGR2, BCL6-AS1, IGLV10-54 and IGKV1D-39 showed significance (FDR < 0.1) in 27 LOO iterations. The genes EGR1, EGR2 and BCL6-AS1 were expressed higher in future responders, whereas immunoglobulin light chain genes IGLV10-54 and IKV1D-39 were expressed lower in future responders. We plotted the normalized expression counts of these seven differentially expressed genes and found only genes EPPK1 and BCL6-AS1 showed clear difference in expression between responders and non-responders (Fig. 1B) whereas FOSB, EGR1 and EGR2 did not show clear differences (Additional file 1: Fig. S2C). Also we noticed 6 immunoglobulin light chain genes (IGLV10-54, IGKV1D-39, IGKV3-20, IGLV3-1, IGKV1-17, IGKV2-24) and 1 heavy chain gene (IGHV5-10-1) showing lower expression in group of future responders compared to group of non-responders, however these genes did not pass the LOO analysis (FDR < 0.1) (Fig. 1A). The number of differentially expressed genes that showed significance across the 28 LOO iterations varied from 15 to 1617, indicating that the statistical analyses in DESeq2 were sensitive to outlier samples.
Table 2
Top differentially expressed genes in PBMCs of female future anti-TNF responders and non-responders before treatment inititation.
Genes | Description | log2FoldChange | p-value | Iteration count |
FOSB | FosB Proto-Oncogene, AP-1 Transcription Factor Subunit | 3.88 | 6.25E-09 | 28 |
EPPK1 | Epiplakin 1 | 1.89 | 6.06E-07 | 28 |
EGR2 | Early Growth Response 2 | 3.98 | 1.63E-07 | 27 |
BCL6-AS1 | BCL6 Antisense 1 | 2.14 | 2.95E-07 | 27 |
EGR1 | Early Growth Response 1 | 3.68 | 5.91E-06 | 27 |
IGLV10-54 | Immunoglobulin Lambda Variable 10–54 | -2.73 | 5.46E-06 | 27 |
IGKV1D-39 | Immunoglobulin Kappa Variable 1D-39 | -2.68 | 2.13E-05 | 27 |
PDIA4 | Protein Disulfide Isomerase Family A Member 4 | -0.45 | 5.59E-06 | 26 |
HSP90B1 | Heat shock protein 90 kDa beta member 1 | -0.60 | 1.17E-05 | 26 |
FAM46C | family with sequence similarity 46, member C | -1.03 | 1.42E-05 | 26 |
KDM6B | Lysine Demethylase 6B | 0.61 | 1.92E-05 | 26 |
FBXO7 | F-Box Protein 7 | -0.37 | 2.57E-05 | 26 |
PSAT1 | Phosphoserine Aminotransferase 1 | -0.78 | 2.49E-05 | 26 |
CDC20 | Cell Division Cycle 20 | -1.95 | 8.70E-06 | 25 |
NDC80 | NDC80 Kinetochore Complex Component | -0.80 | 1.46E-05 | 25 |
CHEK1 | Checkpoint Kinase 1 | -0.80 | 1.91E-05 | 25 |
ITM2C | Integral Membrane Protein 2C | -0.89 | 4.86E-05 | 25 |
SOGA1 | Suppressor Of Glucose, Autophagy associated 1 | 0.66 | 5.61E-05 | 25 |
TXNDC15 | Thioredoxin Domain Containing 15 | -0.42 | 6.42E-05 | 25 |
IGLV3-1 | Immunoglobulin Lambda Variable 3 − 1 | -1.80 | 8.19E-05 | 25 |
MTCO2P12 | MT-CO2 Pseudogene 12 | 2.76 | 7.17E-04 | 25 |
Table 2: The column represents genes, description of genes, log2 fold change, p-value and iteration count. Iteration count is the number of leave-one-out iterations where the gene remained significant .
To understand the characteristics of responders and non-responders, we performed gene set enrichment analysis (GSEA) and identified a total of 127 regulated pathways (FDR ≤ 0.01, Additional file 2: Table S2). We found positive and negative enrichment characteristics for responders compared to non-responders. Notably, response to therapy was preferentially characterized by higher expression of genes involved in graft versus host disease, antigen processing and presentation, and neutrophil degranulation. Future non-responders were characterized by higher expression of genes involved in cell cycle pathways, mainly - cell cycle mitotic activity, cell cycle checkpoints and also for DNA replication and protein translation. The top 15 most upregulated and downregulated pathways are represented in Additional file 1: Fig. S3A; Additional file 2: Table S2).
We further studied the association of immune phenotypes measured by flow cytometry and also the association of blood plasma protein levels to clinically defined response before anti-TNF treatment. Neither flow cytometry measurements nor protein measurements showed any significant difference between responders and non-responders at baseline (data not shown).
Association of gene expression signatures and response upon 3 months of anti-TNF treatment
With the aim to identify gene expression signatures associated with response upon therapy with TNF-blockade, we performed differential expression analysis in PBMC between groups of responders and non-responders after 3 months of the treatment. This analysis identified significant differences in expression for 19 genes (FDR ≤ 0.1) in PBMC (Additional file 1: Fig. S4A). We investigated the effects of outliers on differential gene expression among the follow up samples using a LOO approach. None of the previously found 19 genes remained significantly differentially expressed in all 32 iterations. However, three genes, BRDOS (Additional file 1: Fig. S4B), C2orf42 and HBA2 showed significance (FDR ≤ 0.1) in 31 iterations. All three genes are expressed lower in the group of responders compared to the group of non-responders. Additionally, five genes, EPHB3, MKS1, NCK1-AS1 (Additional file 1: Fig. S4B), SLC25A39, FBXO7 showed significance in 30 iterations. The list of differentially expressed genes and number of iterations where each gene showed significance are presented in Additional file 2: Table S3.
The gene set enrichment analysis was performed using differentially expressed genes, sorted based on log2 fold change and the analysis identified 27 regulated pathways. Interestingly, all the 27 pathways that showed significance had a lower expression in the group of responders. The regulated pathways were predominantly enriched for metabolism of RNA, metabolism of proteins and metabolism of amino acids and derivatives (Additional file 1: Fig. S3B; Additional file 2: Table S4) suggesting overall downregulation of biosynthesis in PBMCs of patients who have responded to anti-TNF treatment.
Further, our association analysis of immune phenotypes and plasma protein levels to clinically defined response of anti-TNF treatment did not show any significant association between responders and non-responders after three months (data not shown).
Effect on gene expression in PBMCs during anti-TNF treatment
Since biological and technical variations between individuals may significantly affect the analyses, paired samples of PBMC from the same patient before and after treatment is the most preferable approach for addressing changes related to the treatment. We analyzed the effect of anti-TNF treatment on gene expression using paired PBMC samples from all 25 RA patients, not considering response. The analysis identified 25 genes that showed significant treatment effect on gene expression (FDR ≤ 0.1). The expression of 14 genes were suppressed and 11 genes showed a slight increase in expression compared to baseline with 3 months of treatment with TNF blockade (Additional file 2: Table S5). The transcription factor BHLHE40 that controls cytokine production by T cells and Chitinase-6-like protein, CHI3L1were suppressed during treatment whereas the B cell novel protein 1 (alias FAM129C) and Tetratricopeptide Repeat Domain 21A (TTC21A) were induced by treatment (Fig. 2A). The pathway analysis of differentially expressed genes did not show any enrichment of gene sets (FDR ≤ 0.01), but when using a less conservative threshold of FDR ≤ 0.1, we detected the regulation of 114 gene sets during treatment across all patients (Additional file 2: Table S6). Induced genes are significantly enriched for genes involved in the RNA and protein metabolism, and interferon signaling, whereas suppressed genes are predominantly enriched for genes involved in neutrophil degranulation, hemostasis and signaling by GPCR (Fig. 2B).
Effect of anti-TNF treatment on gene expression in PBMCs in relation to treatment response
The trajectory of gene expression changes may correlate with measured clinical outcomes. Therefore, we investigated the transcriptional changes in paired samples of responders (n = 17) and non-responders (n = 8) to anti-TNF treatment separately. Our analysis identified five genes that were significantly regulated in responders, whereas no significant regulation was identified in non-responders. Of the five regulated genes in responders, C-X-C Motif Chemokine Receptor 2 (CXCR2), Myeloperoxidase (MPO), Myeloid Associated Differentiation Marker (MYADM), TNF Alpha Induced Protein 6 (TNFAIP6) were suppressed by treatment, whereas gene Low Affinity Immunoglobulin Gamma Fc Region Receptor II-B (FCGR2B) was induced during anti-TNF treatment. The gene expression plots for all five genes using normalized counts shows a clear difference in responders before and after anti-TNF treatment (Fig. 3A). We observed regulation of these genes in the same direction for non-responders, but with less significance (Additional file 1: Fig. S5).
Further, we performed gene set enrichment analysis of differentially expressed genes in responders and found 78 pathways that are significantly enriched (Additional file 2: Table S7). In responders, we noticed induction of pathways involved in regulation of cell cycle mitotic and protein metabolism, whereas genes involved in extracellular matrix organization, neutrophil degranulation, signaling by GPCR, signaling by interleukins, hemostasis and immune responses such as Toll like receptor cascades were downregulated (Fig. 3B; Additional file 2: Table S7).
Changes in cell phenotypes during anti-TNF treatment
Using a linear model we studied the changes in 422 immune phenotypes measured by flow cytometry during treatment with anti-TNF. When analyzing the effect of treatment in responders and non-responders, we observed differences in seven cell phenotypes in responders and no significant differences in non-responders (Additional file 2: Table S8). After treatment in responders, we detected a strong suppression of the proportion of granulocytes among leukocytes (defined as CD45 + cells), as well as a decrease in overall concentration of neutrophils in whole blood. The proportion of T cells (defined as CD3 + cells) and B cells (CD3-CD19+) among leukocytes was instead up-regulated during anti-TNF treatment among responders, along with the proportion of NKG2A + NKp44 + NK cells out of all NKp44+ NK cells (Fig. 3C). We compared beta coefficient values of responders and non-responders for the selected 58 cell phenotypes (p-value < 0.05) and we observed a positive correlation between response groups (r = 0.46, p value = 0.0008) indicating similar directionality of regulation for the most cell phenotypes.
Effect of anti-TNF treatment on levels of different proteins blood plasma
To identify the proteins that are regulated during treatment in responders and non-responders separately, we performed longitudinal analysis on paired patient samples. In responders, we identified regulation of 12 proteins in plasma compared to one protein in non-responders. Proteins were predominantly downregulated by anti-TNF treatment in responders, including CRP, IL-6, MMP-1, MMP-3, SAA, TNF-RI, VEGF, TNFR2, MIG, MIP-1 beta and YKL-40 (Fig. 3D). Interestingly, we observed downregulation of protein matrix metalloproteinase-3 (MMP-3) measured using two different methods. The protein adiponectin is induced during anti-TNF treatment among non-responders, as well as with less significance in responders (FDR: 0.11). The list of proteins that were regulated during anti-TNF treatment is shown in Additional file 2: Table S9.
Classifier performance for anti-TNF response data
We investigated the utility of machine learning algorithms to predict anti-TNF response using clinical data, flow cytometry measurements, protein measurements and transcriptomic data. At baseline, the model based on transcriptomics data predicted response fairly accurately with linear model (ROC AUC ± SEM : 0.81 ± 0.17) (Fig. 4) whereas the models based on clinical data, flow cytometry data and protein data showed limited predictive utility. The kernel method at baseline predicted response with ROC AUC: 0.73 ± 0.17 for clinical data, ROC AUC: 0.72 ± 0.18 for flow cytometry and ROC AUC: 0.72 ± 0.15 for protein data (Fig. 4). We further studied the classifier performance of the models based on all four data types at three months (after treatment) and we observed limited classifier utility of models based on flow cytometry, protein as well as transcriptomics data. In contrast, the linear model based on clinical data showed good classifier performance with ROC AUC: 0.85 ± 0.15 at three months. For the FACS data, we found maximum classifier utility of ROC AUC: 0.68 ± 0.17 using linear model, whereas the models based on proteins and transcriptomics data showed maximum classifier utility using non-linear method with ROC AUC: 0.73 ± 0.15 and ROC AUC: 0.72 ± 0.18 respectively.