Molecular subtypes related to immune infiltration in DLBCL
This analysis included 1158 DLBCL patients with available survival data in 4 cohorts obtained from the TCGA and GEO databases. A total of 738 DLBCL patients, who were included in the TGCA database (n=47), GSE87371 (n=221) and GSE31312 (n=470), were considered the training cohort. GSE10846 (n=420) was considered the validation cohort. To further stratify the DLBCL patients into different immune infiltration groups, consensus cluster analysis was conducted. The optimal number of clusters was two, which was defined by CDF curves (Fig. 1A–C).
An unsupervised hierarchical cluster analysis of DLBCL patients based on immune infiltration-related genes separated DLBCL into high- and low-immune cell infiltration groups. The high immune cell infiltration group (Immunity-H, n=376) was enriched for almost all selected immune cell subtypes. Cases in the low immune cell infiltration group (Immunity-L, n=362) exhibited lower selected immune cell infiltration, except activated B cells (Fig. 1D-E). A comprehensive summary of the patient characteristics of the two groups is shown in Supplementary Table 1. Group membership within the two subtypes was associated with similar clinical characteristics, such as age, sex, stage, and COO. The Immunity-H group, with a longer overall survival, contained more patients with lower IPI scores. The Kaplan–Meier survival curves between the Immunity-H and Immunity-L groups are shown in Fig. 1F.
Immune Infiltration Defines a Biologically Distinct Subgroup Within DLBCL
To determine the accuracy of this grouping strategy, we determined the major histocompatibility complex (MHC), immune coinhibitor checkpoint (IAP), and immune costimulator checkpoint (ICP)-related gene expression in the two groups (Fig.2A-C). For most of the related genes, the bar chart analysis showed that there was a significant positive correlation between the Immunity-H group and MHC, IAP and ICP. These results indicated that impaired immune surveillance of tumor cells is an escape mechanism in the Immunity-L group. Furthermore, we analyzed the functional context of these two subtypes by conducting Gene Ontology (GO). GO analysis based on the differentially expressed genes between the groups (Fig.2D, false discovery rate < 0.05) showed that the highly-expressed genes in Immunity-L group were mainly enriched in immune-related functions, such as activation of immune response, lymphocyte differentiation, neutrophil mediated immunity, positive regulation of MAPK cascade. We also analyzed somatic mutation data from the TCGA database to explore the difference in significantly mutated genes (SMGs) between these two subtypes. The SMG mutational profile of the DLBCL sample showed a distinct mutation ratio in IGLV3-1 (15/36 [41.7%]), BTG2 (13/36 [36.1%] and KMD2D (12/36 [33.3%]) (Supplementary Fig.1). The Immunity-L group showed more frequent mutations (101/169, [59.8%]). Because of the low sample size, we did not perform further detailed analysis between these two groups. In conclusion, these results identified extensive immune heterogeneity in DLBCL.
Identification of a classification-related prognostic signature for DLBCL
Considering the convenient application of the immune infiltration subtypes for prognostic prediction, we developed an immune risk score based on the differentially expressed genes between the Immunity-H and Immunity-L groups. According to the cutoff thresholds of |Log2 Fold Change|>0.5 and FDR < 0.05, a total of 489 mRNAs that were differentially expressed were obtained, of which 18 were upregulated and 471 were downregulated (Fig.3A). We then performed univariate analysis and LASSO analysis to select the gene set with the best prognostic value (Fig.3B-C). A 16-gene signature (ANTXR1, CD3D, TIMP1, FPR3, NID2, CTLA4, LPAR6, GPR183, LYZ, PTGDS, ITK, FBN1, FRMD6, PLAU, MICAL2, C1S) was identified and the risk score of each case was computed with the gene expression level and regression coefficient (Fig.3D). The Kaplan–Meier analysis of the 16 immune-related genes is shown in Supplementary Fig.2. Next, patients in training cohorts were assigned to a low-risk or high-risk group based on the median value of risk scores, which was used as the cutoff value. We also calculated the risk scores of patients in the validation cohort with the same coefficients to validate this signature. According to the distribution of the risk score and survival status, we detected the association between this signature and the proportion of deaths (p<0.05). A higher risk score was associated with a higher proportion of deaths (Fig.3E-F). Subsequently, we analyzed the expression of the 16 genes included in the signature in the high- and low-risk groups. To test the value of the risk score for DLBCL, we performed survival analyses (Kaplan–Meier test). The results showed that the survival time of the high-risk group was significantly shorter than that of the low-risk group in the training and validation cohorts (P < 0.05) (Fig.4A). To address the question of whether the risk score based on the newly established molecular immune signature is a unique feature of DLBCL, we tested its role in different IPI scores. We found that the survival time of the high-risk group was also significantly shorter than that of the low-risk group in both the IPI high and IPI low groups (P < 0.05) (Fig.4B). Moreover, all 16 genes were expressed at low levels in the high-risk group in both the training and validation cohorts (Fig.4C, Supplementary Fig.3)
The Prognostic Signature Is Related to Immune Infiltration and Clinical Characteristics
Notably, the risk score was significantly correlated with immune infiltration. A high-risk score was associated with lower immune infiltration (Fig. 5A). To further confirm the immune heterogeneity between these two risk groups, we examined the distribution of stromal and immune content of each group. The risk score was significantly negatively correlated with the immune score and ESTIMATE score(Fig.5B) (P<0.05). Patients in the low-risk group had a lower tumor purity, suggesting that low-risk group patients contained a higher number of immune cells (Fig.5B). Pearson’s analysis was performed to determine the correlations between the risk score and the specific type of each immune cell. Except for activated B cells, a negative correlation was observed between the risk score and immune infiltration cells, such as central memory CD4 T cell and Myeloid-derived suppressor cells(MDSC) (Fig.5C, Supplementary Fig. 4).By stratified analysis, we found that the risk score showed excellent prognostic value among subjects with different baseline characteristics. There was a tightly correlation between the risk score and IPI, age (Fig.5D). For high-risk group, there was a higher proportion of patients in the IPI high, age>60 years old groups. Stratified analysis further revealed that the risk score was lower in CR patients than in PR patients, but there were no statistically significant differences between PR and PD/SD patients (Fig. 5D). In addition, the risk score was higher in ABC-DLBCL patients than in GCB-DLBCL patients (Fig. 5D).
The prognostic signature is related to the response to chemotherapy and immunotherapy
We next performed a prediction analysis of response to therapy in the two-risk group by applying the “pRRophetic” method. There were obvious differences in the risk score and response to different drugs (Fig.6A-H). Patients in the high-risk group had a lower estimated IC50 than those in the low-risk groups for the following chemotherapy drugs: etoposide, gemcitabine, lenalidomide, vorinostat and obRO.3306. Conversely, patients in the low-risk group had a lower estimated IC50 than those in the high-risk groups for the following chemotherapy drugs: bortezomib and AZD8055. (P<0.05).
The functional analysis of differentially expressed genes
Next, we compared gene expression between low- and high-risk patients to determine the functional differences. A total of 308 genes were differentially expressed between the two groups. Compared with the low-risk group, 189 genes were downregulated and 119 were upregulated in the high-risk group (Supplementary Fig.5). The KEGG enrichment analysis indicated that many of these pathways were linked to the immune response in DLBCL, including the T cell receptor signaling pathway, B cell receptor signaling pathway, NF−kappa B signaling pathway, PD−L1 expression and PD−1 checkpoint pathway, which were significantly enriched in the low-risk group (Fig.7A-D). These results indicated that the signature derived from the immune infiltration classification could represent similar biological differences in DLBCL.
Prognostic value of the established signature
Uni- and multivariable analyses were performed to evaluate the correlation between the risk score and survival. Univariate Cox regression analysis showed that the risk score, age and IPI were both important risk factors for DLBCL (HR>1, P<0.001) (Fig.8A). Additionally, multivariate Cox analysis revealed that the risk score and IPI were independent prognostic factors for DLBCL survival (HR>1, P<0.001) (Fig.8B). Both univariate and multivariate Cox regression analyses indicated that the HR value of the risk score was greater than that of the IPI, which is currently considered to be the best prognostic criterion for DLBCL. We next constructed a survival prediction nomogram comprising the risk score and IPI to predict 3- and 5-year OS (Fig.8C-D). Calibration curves for the probability of 3- and 5-year survival showed good agreement between predictions and observations, which indicated that the established nomogram was reliable in predicting the prognosis of DLBCL (Fig.8E). Next, an ROC curve was plotted to observe the predicted value of the nomogram with or without the risk score. Interestingly, we noticed that the nomogram risk score AUC was 0.775, which was better than the IPI AUC (0.714) and risk score AUC (0.725) (Fig.8F). These data suggest that the nomogram was a better predictor for a poor prognosis than IPI in DLBCL.