Gene expression signatures of response to anti-TNF treatment at baseline
The gene expression analysis identified 59 differentially expressed genes between future responders and non-responders, but most of these genes achieved significance (FDR≤0.1) due to a single outlier sample showing different gene expression profile (Supplementary Fig. S2A-B). Therefore, we decided to exclude this patient sample (both anti-TNF naïve and treated) and a subsequent analysis (without the outlier sample) yielded 192 differentially expressed genes (FDR≤0.1), including 103 genes with higher expression, and 89 genes with lower expression, in future responders. The top differentially expressed genes are represented in Fig. 1A (also see Table 2 and Supplementary Table. S1A).
To assess possible heterogeneity and detect consistent differential gene expression, we employed a leave-one-out (LOO) approach, where one sample is removed in each iteration and the association analysis was repeated. The genes are considered if they meets statistical significance (FDR <0.1) in each iteration. The LOO approach yielded two genes, Epiplakin (EPPK1) and FosB Proto-Oncogene, AP-1 Transcription Factor Subunit (FOSB) with higher expression in responders in all 28 iterations (Table 2). An additional five genes, EGR1, EGR2, BCL6-AS1, IGLV10-54, and IGKV1D-39 showed significance (FDR <0.1) in 27 LOO iterations. The genes EGR1, EGR2, and BCL6-AS1 were expressed higher in future responders, whereas immunoglobulin light chain genes IGLV10-54 and IKV1D-39 had lower expression in future responders. The genes EPPK1, BCL6-AS1 and CDC20 showed a clear expression difference between responders and non-responders (Fig. 1B). We also noticed 6 immunoglobulin light chain genes (IGLV10-54, IGKV1D-39, IGKV3-20, IGLV3-1, IGKV1-17, IGKV2-24) and 1 heavy chain gene (IGHV5-10-1) expressed lower in responders compared to non-responders, however these genes did not pass the LOO analysis criteria (Fig. 1A).
Genes
|
Description
|
log2FoldChange
|
p-value
|
Iteration count
|
FOSB
|
FosB Proto-Oncogene, AP-1 Transcription Factor Subunit
|
3.88
|
6.25E-09
|
28
|
EPPK1
|
Epiplakin 1
|
1.89
|
6.06E-07
|
28
|
EGR2
|
Early Growth Response 2
|
3.98
|
1.63E-07
|
27
|
BCL6-AS1
|
BCL6 Antisense 1
|
2.14
|
2.95E-07
|
27
|
EGR1
|
Early Growth Response 1
|
3.68
|
5.91E-06
|
27
|
IGLV10-54
|
Immunoglobulin Lambda Variable 10-54
|
-2.73
|
5.46E-06
|
27
|
IGKV1D-39
|
Immunoglobulin Kappa Variable 1D-39
|
-2.68
|
2.13E-05
|
27
|
PDIA4
|
Protein Disulfide Isomerase Family A Member 4
|
-0.45
|
5.59E-06
|
26
|
HSP90B1
|
Heat shock protein 90kDa beta member 1
|
-0.60
|
1.17E-05
|
26
|
FAM46C
|
family with sequence similarity 46, member C
|
-1.03
|
1.42E-05
|
26
|
KDM6B
|
Lysine Demethylase 6B
|
0.61
|
1.92E-05
|
26
|
FBXO7
|
F-Box Protein 7
|
-0.37
|
2.57E-05
|
26
|
PSAT1
|
Phosphoserine Aminotransferase 1
|
-0.78
|
2.49E-05
|
26
|
CDC20
|
Cell Division Cycle 20
|
-1.95
|
8.70E-06
|
25
|
NDC80
|
NDC80 Kinetochore Complex Component
|
-0.80
|
1.46E-05
|
25
|
CHEK1
|
Checkpoint Kinase 1
|
-0.80
|
1.91E-05
|
25
|
ITM2C
|
Integral Membrane Protein 2C
|
-0.89
|
4.86E-05
|
25
|
SOGA1
|
Suppressor Of Glucose, Autophagy associated 1
|
0.66
|
5.61E-05
|
25
|
TXNDC15
|
Thioredoxin Domain Containing 15
|
-0.42
|
6.42E-05
|
25
|
IGLV3-1
|
Immunoglobulin Lambda Variable 3-1
|
-1.80
|
8.19E-05
|
25
|
MTCO2P12
|
MT-CO2 Pseudogene 12
|
2.76
|
7.17E-04
|
25
|
Table 2: Differentially expressed genes in future responders and non-responders before anti-TNF treatment. The iteration count is the number of leave-one-out iterations where the gene remained significant.
The gene set enrichment analysis based on ranking of differentially expressed genes (log2fold change) identified a total of 127 regulated pathways (Supplementary Table. S1B) between response groups. The pathway analysis reveal that responders were preferentially characterized by higher expression of genes involved in graft versus host disease, antigen processing and presentation, and AP-1 transcription factor network, whereas non-responders were characterized by higher expression of genes involved in cell cycle pathways, mainly - cell cycle mitotic activity, cell cycle checkpoints. The top 15 most upregulated and downregulated pathways are represented in Supplementary Fig. S3A; Supplementary Table. S1B).
Effect on gene expression in PBMCs during anti-TNF treatment
Since biological and technical confounders between individuals may significantly affect downstream analyses, using paired samples before and after treatment is the most preferable approach to address changes related to the treatment. We analyzed treatment effects on gene expression using 25 paired RA patients without considering the response. The analysis identified 25 differentially expressed genes, of which 14 genes were suppressed whereas 11 genes showed a slight increase in expression (Supplementary Table. S1C). The genes BHLHE40 that controls cytokine production by T cells and CHI3L1, Chitinase-3-like protein were suppressed during treatment whereas the B cell novel protein 1 (alias FAM129C) and TTC21A were induced by treatment (Fig. 2A). The pathway analysis of differentially expressed genes did not show any enrichment of gene sets, but when using a less conservative threshold of FDR<0.1, we detected 114 gene sets regulated during the treatment (Supplementary Table. S1D). Induced genes are significantly enriched for genes involved in the RNA and protein metabolism, and interferon signaling, whereas suppressed genes are predominantly enriched for genes involved in hemostasis and signaling by GPCR (Fig. 2B).
Distinct gene expression signatures for response during anti-TNF treatment
The trajectory of gene expression changes may correlate with measured clinical outcomes. Therefore, we investigated the transcriptional changes in paired samples of responders (n=17) and non-responders (n=8) to anti-TNF treatment separately. Our analysis identified five genes regulated in responders, whereas no significant regulation was identified in non-responders. Of the five regulated genes in responders, C-X-C Motif Chemokine Receptor 2 (CXCR2), Myeloperoxidase (MPO), Myeloid Associated Differentiation Marker (MYADM), TNF Alpha Induced Protein 6 (TNFAIP6) were suppressed by treatment, whereas gene Low-Affinity Immunoglobulin Gamma Fc Region Receptor II-B (FCGR2B) was induced during anti-TNF treatment. The gene expression plots for all five genes using normalized counts show a clear difference in responders before and after anti-TNF treatment (Fig. 3A). Interestingly, for these genes we observed a similar trend of regulation in non-responders (same directionality), but it does not reach statistical significance (Supplementary Fig. S4). The gene set enrichment analysis of differentially expressed genes ranked based on foldchange identified 78 regulated pathways in responders (Supplementary Table. S1E). We noticed induction of pathways involved in the regulation of cell cycle mitotic and protein metabolism, whereas genes involved in the extracellular matrix organization, signaling by GPCR and Toll-like receptor cascades were downregulated in responders (Fig. 3B; Supplementary Table. S1E).
Replication of gene expression associations at baseline
To validate our study, we sought to replicate our findings in an independent cohort (BiOCURA) of gene expression on PBMCs with 80 RA patients before they began adalimumab or etanercept anti-TNF treatment (12). From the replication cohort, we selected only female patients gene expression data for the analysis. At baseline, we replicate the gene CDC20 that showed lower expression in responders compared to non-responders (p-value<0.087). None of the remaining 20 genes (Table 2) achieved statistical significance with p-value<0.1. In the study cohort, we reported the suppression of the gene, CHI3L1 upon anti-TNF treatment in responders, we observed similar regulation in replication cohort with lower expression in responders compared to non-responders (p-value<0.096) at baseline.
Changes in cell phenotypes and protein blood plasma during anti-TNF treatment
We studied the changes in 422 immune phenotypes measured by flow cytometry during anti- treatment using paired samples. When analyzing the effect of treatment in responders and non-responders, we observed differences in seven cell phenotypes in responders and no significant differences in non-responders (Supplementary Table. S1F). With treatment, responders showed a strong suppression of the proportion of granulocytes among leukocytes (defined as CD45+ cells), and the overall concentration of neutrophils in whole blood. The proportion of T cells (defined as CD3+ cells) and B cells (CD3-CD19+) among leukocytes and the proportion of NKG2A+ NKp44+ NK cells out of all NKp44+ NK cells were instead up-regulated(Fig. 3C).
The longitudinal analysis using paired samples identified regulation of 12 proteins in plasma in responders and one protein in non-responders. The proteins were predominantly downregulated by anti-TNF treatment in responders, including CRP, IL-6, MMP-1, MMP-3, SAA, TNF-RI, VEGF, YKL-40, MIG, and MIP-1 beta, whereas TNFR2 is upregulated (Fig. 3D). Moreover, we identified the downregulation of protein matrix metalloproteinase-3 (MMP-3) measured using two different methods (Supplementary Data S1), whereas protein adiponectin is induced during anti-TNF treatment in non-responders, while showing a trend for induction in responders (FDR: 0.11). The list of proteins that were regulated during anti-TNF treatment is shown in Supplementary Table. S1G.
Further, our association analysis of immune phenotypes and plasma protein levels to clinically defined response to anti-TNF treatment did not show any significant association between responders and non-responders before treatment.
Classifier performance for anti-TNF response data
We investigated the utility of machine learning methods to predict anti-TNF response using clinical data, flow cytometry measurements, protein measurements, and transcriptomic data. At baseline, the linear model predicted response with high accuracy for transcriptomics data (ROC AUC ± SEM: 0.81 ± 0.17) (Fig. 4) whereas the kernel based method predicted response with highest accuracy for clinical data (ROC AUC: 0.73 ± 0.17), for flow cytometry (ROC AUC: 0.72 ± 0.18), and for protein data (ROC AUC: 0.72 ± 0.15 )(Fig. 4). We further studied the classifier performance for the anti-TNF treated data and observed limited classifier utility for models based on flow cytometry, protein as well as transcriptomics data. In contrast, as expected, the linear model based on clinical data showed high classifier performance with ROC AUC: 0.85 ± 0.15. For the FACS data, we found maximum classifier utility of ROC AUC: 0.68 ± 0.17 while using linear model, whereas the non-linear models based on proteins and transcriptomics data showed maximum classifier utility with ROC AUC: 0.73 ± 0.15 and ROC AUC: 0.72 ± 0.18 respectively.