PCA validation
PCAtest was run for 100 permutations and bootstrapping replicates and estimated an empirical Ψ value of 51848193.065, with a max null Ψ = 979604.238 and a min null Ψ = 978848.881, for a p-value < 0.00001. Likewise, the ϕ value was estimated at 0.337, with a max null ϕ = 0.047 and min null ϕ = 0.047, for a p-value < 0.00001. These results indicate highly significant non-random correlations in the data, justifying the use of PCA and confirming that the analysis is biologically meaningful. Moreover, the analysis showed that the first eight principal components (PC) explained 80% of the total original variance (from 1.8–33% for individual PC). Finally, the number of genes with significant correlations with each of these eight principal components ranged from 2830 to 7546. Together, these results strongly support the use of PCA on this dataset.
Polytope fitting
The elbow method applied by the PartiCode package suggested the presence of four archetypes, thus defining a tetrahedron (3-d simplex) (Fig. 1). The identified polytope was highly significant with the t-ratio test indicating a p-value < 0.00001.
Archetype-defining genes selection
For each of the four archetypes, ordered coefficients were plotted and revealed a clear pattern of abrupt change from coefficients close to zero for the vast majority of genes, to a few genes showing strongly positive or negative coefficients (Supp. Figure 1). Archetype-defining genes were therefore selected by considering a horizontal line (slope = 0) for genes with a coefficient close to zero and a line with a slope of 1 (vertical) for genes with strongly positive or negative coefficients. These lines were extrapolated and the line bisecting these right-angled extrapolated horizontal and vertical lines at 45 degrees was determined. The mid-point of this 45-degree line, when positioned so it just touched the data, was considered as the inflexion point (elbow point) and used as the coefficient cut-off to select the positive and negative archetype defining genes (Supp. Figure 2).
Enrichment analysis
The above procedure identified between 23 and 132 archetype-defining genes with positive loadings, and between 12 and 85 archetype-defining genes with negative loadings depending on the archetype (Supp. Table 1). For each archetype, both positive and negative gene lists were submitted to the Panther database for enrichment analysis. All eight archetype-defining gene lists (one positive and one negative loading list for each of the four archetypes) were significantly enriched (after FDR adjustment) in some functions. The 10% functions with the most significant enrichment p-value for each of the eight lists are presented in supplementary table 2. GO terms associated with each individual archetype-defining genes are listed in supplementary table 3.
The first archetype was positively enriched in aerobic energy production (three most significant GO terms: oxidative phosphorylation/aerobic respiration/aerobic electron transport chain) and negatively enriched in protein translation (three most significant GO terms: cytoplasmic translation/translation/peptide biosynthetic process). The second archetype was positively enriched in protein translation (three most significant GO terms: cytoplasmic translation/translation/peptide biosynthetic process) and negatively enriched in energy production and immune functions (three most significant GO terms: oxidative phosphorylation/adaptive immune response/cellular respiration). The third and fourth archetype were positively enriched in a mixture of aerobic energy production and immune functions, and negatively enriched in immune functions. The 10% lowest FDR-adjusted p-values of the enrichment analysis were very small, ranging from 0.0025 to 5.67x10− 95, indicating that, from a functional point of view, archetype-defining genes represented highly non-random genomic sub-samples. As predicted, a significant proportion (49%) of archetype-defining genes were shared between at least two archetypes, albeit in different combinations and proportions (Fig. 2).
Biological interpretability
One of the most striking features of the functional enrichment analysis was the contrast between archetype 1, which showed specialization for oxidative phosphorylation (OXPHOS) at the expense of protein synthesis, and archetype 2 which was specialized in protein synthesis at the expense of both OXPHOS and immune functions (Fig. 3a), b)). A detailed analysis of the genes with expression positively correlated with archetype 2 and negatively correlated with archetype 1 revealed 34 genes involved in protein synthesis. These genes consisted of various ribosomal subunits and two translation elongation factors, as well as CHCHD2. Interestingly, it was recently shown that in stress conditions produced by carbonyl cyanide m-chlorophenylhydrazone (CCCP) treatment, a decoupling agent known to induce oxidative stress 24, CHCHD2 knockdown triggered the integrative stress response (ISR) in cultured HeLa cells 25. The role of the ISR is to maintain homeostasis under stress conditions, including oxidative stress, and is known to slow down protein synthesis via the phosphorylation of eIF2α 26. Indeed, the ISR was recently observed to inhibit protein synthesis in the context of oxidative stress from mitochondria-derived production of reactive oxygen species (ROS) in a cardiac ischemia/reperfusion model 27. Because of the high rate of energy production via OXPHOS and the associated ROS generated, archetype 1 cells may down-regulate CHCHD2 in order to restore homeostasis via the ISR, at the expense of protein synthesis. In contrast, archetype 2 cells may overexpress CHCHD2 in order to benefit from its anti-apoptotic effect 28. Recently, the importance of CHCHD2 in cancer and its potential as drug target was highlighted by Gundamaraju et al. 29. Consistent with the hypothesis of a driving role for increased ROS production in archetype 1, the expression of ROMO1 was negatively correlated with this archetype. ROMO1 is known to increase ROS production 30 and thus might be downregulated by archetype 1 as an adaptation to compensate for the high ROS production generated by energy production via the electron transport chain.
Another noticeable contrast between archetypes 1 and 2 was the positive correlation of the expression of CKS2 and TPT1 with archetype 2, while the expression of these two genes were negatively correlated with archetype 1. CKS2 and TPT1 have both been implicated in cancer, including as potential therapeutic targets 31, 32, 33. TPT1 encodes the translationally controlled tumor protein (TCTP) and CKS2 encodes the cyclin-dependent kinase regulatory subunit 2, which is known to bind to and be necessary for the activity of the cyclin B1-CDK1 protein kinase, an essential factor for cells to progress past the G2 phase of the cell cycle 34. Interestingly, both TPT1 and CKS2 have also been associated with OXPHOS-induced oxidative stress 35, 36, as well as with cell-cycle regulation and with protein synthesis/degradation 37, 38. In an hematopoietic cells (HSC) mouse model, CKS2 knockout was associated with an accelerated cell cycle 39 (which may contribute to the malignant phenotype) but also with an increase ROS production 38. Moreover, the same study also found that CKS2 was involved in proteostasis of HSC, which may be related to the trade-off in protein synthesis identified for archetype 1 38. The role of TCTP in protein synthesis regulation is known to involve its interaction with elongation factors eEF1A and eEF1B 37. The fact that both eEF1A1 and eEF1B2 followed the same correlation of expression patterns between archetypes 1 and 2 as TPT1 and CKS2 supports this interpretation.
In general, TPT1/TCTP is thought to protect cells against apoptosis and oxidative stress 40. It might thus seem surprising that its expression should be negatively correlated with archetype 1 given the proposed trade-offs involving increased OXPHOS-induced ROS and ISR activation. However, while mild oxidative stress was found to upregulate TCTP, strong oxidative stress was found to downregulate its expression 36, and both CKS2 and TPT1 were downregulated in a butyrate resistant cell line conferring tumorigenesis, apoptosis and stress resistance in a colon adenocarcinoma model 41. Thus, the level of oxidative stress in archetype 1 might reach levels associated with reduced TCTP expression.
TCTP and CKS2 expression patterns are linked through the TCTP/CDC25C/CDK1 pathway, which was shown to be dysregulated in hepatocellular carcinoma 42. Overexpression of TPT1 has been associated with reduced CDK1 activity via ubiquitin-proteasome degradation of CDC25C, which is necessary for the dephosphorylation and activation of CDK1 42. By contrast, CKS2 is thought to promote CDK1 expression 31 and to be required for its function. Thus, the reciprocal correlation patterns seen between archetype 1 and 2 might reflect the need to compensate for the reduced activation of CDK1 by CDC25C from the increased expression of TPT1 by an increase in CKS2 expression and vice versa.
In addition, TCTP and CKS2 were both found to exhibit reciprocal repression with p53. In the case of TPT1, p53 is repressed via TCTP ubiquitin-mediated degradation of p53 while p53 directly represses TPT1 transcription 33, 43. Likewise, CKS2 expression was found to be repressed by p53 44, while the overexpression of CKS2 was associated with reduced p53 protein abundance in gastric cancer 45. However, the mechanism by which TPT1 and CKS2/CDK1 are repressed by p53 has been questioned and might be due to the indirect DREAM pathway rather than direct interaction with p53 46. Interestingly, CDC25C was also found to be repressed by p53 47 and thus complex TPT1/CDC25C/CKS2/CDK1/p53 interactions might be behind the TPT1-CKS2 opposite correlation pattern seen in these two archetypes. A schematic and synthetic representation of the hypothetical model outlined above for the metabolic trade-offs suggested by the analysis of archetypes 1 and 2’s defining genes is provided on Fig. 3.
A breakdown of the genes characterizing archetype 3 and 4 was also conducted (Fig. 3c),d)). Many of these genes turned out to be known for their involvement in cancer in general and/or lymphoma in particular. For instance, LMO2 expression was negatively correlated with archetype 4, while IGHM expression was positively correlated with this archetype. This expression pattern has previously been associated with the activated B-cell (ABC) DLBCL subtype 48. LMO2 expression reduces double-strand break DNA repair mechanisms and has been associated with a better prognosis in DLBCL patients treated with poly(ADP-ribose) polymerase (PARP) inhibitors 49. Interestingly, the expression of HLA-A,B,C,E, B2M and HLA-DRA/DRB1 was positively correlated with archetype 3, while expression of immunoglobulins (Ig) IGHM, IGHV4-34, IGHV5-51, IGKV3-20, IGLC2, IGLV1-47, IGLV3-1, IGLV3-19, IGLV3-21, JCHAIN were negatively correlated. In contrast, the expression of several Ig genes was positively correlated with archetype 4, while only the invariant HLA-DRA was positively correlated with the archetype. This pattern would be consistent with an immune system escape from MHC loss 50 via a partial plasmablast cell differentiation pathway 51 in archetype 4. Such a strategy would also be consistent with the many pro-tumorigenesis effect of cancer derived Ig that have been identified, including proliferation, migration, invasion, survival, and immune evasion through inhibitory effect on antibody-dependent cell-cytotoxicity (ADCC) from NK cells 52.
However, since Ig are common neo-antigens in B-cell malignancies and Ig-derived neoantigen presentation by MHC is a general phenomenon in lymphomas including DLBCL 53, archetype 3 may limit the production of Ig in the context of retained MHC expression to avoid displaying neo-antigen Ig to the immune system 54. Moreover, archetype 3 was also positively correlated with the expression of CCL18, CCL19, CXCL9 and CXCL10. CCL18 is known to increase the proliferation of B-cell lymphoma 55 and may contribute to immune evasion from its effect on immune surveillance mediated by macrophages and DC and by simultaneously favoring T cell-tolerance 55, 56, 57. CCL19 directs B-cell migration after activation via antigen binding and is known to be upregulated in both gcb and abc DLBCL subtypes. Recently, autocrine CCR7-CCL19 signaling was proposed to significantly contribute to lymphomagenesis under malignant conditions via a stronger activation of the survival pathways 58. More generally, CCR7 signalling upon binding to its ligands (CCL19/21) is associated with many pro-tumorigenic effects in hematological malignancies, including migration, proliferation, survival and immune evasion 59. Interestingly, increased expression of CCL19, CXCL9 and CXCL10 is associated with recruitment of immature CD56bright NK cells with low perforin content in the tumor microenvironment (TME), which is thought to protect tumor cells from NK cells 60. Additionally, MHC expression is a well-known way for tumor cells to avoid immune detection and destruction by NK cells 60. This correlation pattern involving MHC/CXCL9-10/CCL18-19 might thus represent the signature of an alternative immune evasion strategy for archetype 3, distinct form the one displayed by archetype 4. Finally, both archetypes were positively correlated with the expression of CD74, a gene involved in B-cell differentiation, proliferation and survival 61. An integrative schematic summary of the immune evasion trade-offs suggested by the above analysis of archetypes 3 and 4 is provided on Fig. 4.
By looking at some of the genes defining each archetype, trade-offs and expression patterns noted by other studies using different approaches could be detected. Besides the cases already highlighted, the proportion of archetype-defining genes found in nine comparative DLBCL or non-Hodgkin lymphoma gene expression studies 62, 63, 64, 65, 66, 67, 68, 69, 70 was assessed. Overall, 23% of the archetype defining genes identified by ParTI were part of the significantly differently expressed genes reported by these studies. Of note, the recently comparative transcriptomic study of Rapier-Sharman et al. 66 compared RNASeq gene expression data from 322 samples, including 134 B-cell lymphoma samples (of which 123 were LBCL/DLBCL) and 188 healthy B-cell controls. Among the 20 most differently expressed genes between B-lymphoma and normal control samples, 7 (35%) were found among the archetype defining genes identified here by the ParTI algorithm (CXCL9, CXCL13, C1QA, C1QB, C1QC, CCL18, CCL19). An additional three (15%) archetype defining genes identified by ParTI were among the 20 genes showing the most significant differences in the presence of splice variants (APOE, COL1A1 and RPL5). Although the tasks and trade-offs identified by ParTI are not necessarily expected to match differently expressed genes between malignant and normal cells, finding a substantial overlap in genes identified by these two kinds of analyses is convincing evidence that the Pareto optimality theoretical framework is uncovering biologically meaningful and interpretable information at the scale of systems organization, with potential therapeutic relevance.
Uncovering cancer cell vulnerabilities in the form of trade-offs represents a promising avenue to avoid the emergence of treatment resistance. The presence of trade-offs implies that certain tasks cannot be completely avoided by the cells, and yet cannot be simultaneously optimized to the maximum level allowable in principle by the genetic potential. The genes underlying these trade-offs are thus attractive therapeutic targets that could make resistance more difficult to acquire for the malignant cells. The Pareto task inference approach predicts that whenever such trade-offs exist, they should produce detectable geometrical structures in the data. We further predicted that besides geometrical patterns in trait space, phenotypic optimums (archetypes) under trade-offs would be significantly enriched in biological functions, and characterized by patterns of different combinations of a certain proportion of shared tasks/genes.
The data analyzed here do indeed confirm those three predictions at a high level of statistical significance. The t-ratio test empirically calculates the probability of observing a polytope providing as good or better fit to the data as compared to the best possible fit defined as the convex hull. To this end, the data are randomized and the best fitting polytope and its ratio to the convex hull for each replicate data set are re-estimated. The value of the observed ratio is then compared to the distribution of ratios from the simulated randomized data to derive its probability. This test strongly supports the presence of a polytope defined by 4 vertices (tetrahedron) in this transcriptomic dataset.
According to the theory, the vertices of this polytope should represent optimal specialist phenotypes in trait space. Thus, these archetypes are predicted to be significantly enriched in certain particular functions, some of which being either different or performed by different genes, and others shared among different archetypes. FDR-adjusted Fisher’s exact-tests on archetype defining gene lists clearly show that these genes are non-random genomic sub-samples, being statistically highly significantly enriched in particular functions. Some of the most significantly enriched functions were different among archetypes, although the third and fourth archetypes showed substantial overlap in broadly defined functions, but little overlap in the genes underlying these functions within each archetype (the overall percentage of positively and negatively correlated genes unique to either archetype in pairwise comparison was 72.5%). The fact that different archetypes are statistically significantly enriched in different functions and/or gene combinations is inconsistent with an enrichment simply reflecting a general “B-lymphocyte” functional category. Given the typical complete effacement of lymph node architecture and extensive infiltration by malignant B-lymphocytes in DLBCL, these specialized functional sub-categories are likely to reflect, at least partly, various B-cell malignant phenotypic strategies, although some contribution from other cell types from the TME such as dendritic cells, macrophages or T-lymphocytes cannot be completely excluded.
However, besides these different functional characteristics, archetypes also displayed significant proportions of shared genes. As predicted if archetypes result from optimization in the face of trade-offs, these shared elements were distributed in different combinations and proportions among different archetypes. This pattern is consistent with archetypes having to reconcile various functional constrains by shuffling certain genetic toolkits, turning on and off the expression of genes in a way that preserves certain core functions and functional combinations, at the expense of other less essential and potentially dispensable elements. Whether the archetypes identified by the ParTI approach represent irreversible commitment to certain phenotypic pathways or phenotype through which cells can cycle sequentially remains an open question. The former possibility could thus represent stages in malignant progression, while the latter would include the possibility that different archetype might represent temporary adaptations to transient intra or extra-cellular environmental circumstances. Our analysis also confirmed that the archetypes identified by the Pareto approach are biologically interpretable and can be used to generate hypotheses about possible mechanisms underlying the identified correlation patterns. Thus, taken together, these results broadly confirm the predictions of the Pareto optimality theory as applied to these transcriptomic data.
If trade-offs can be identified, therapeutic approaches could be tailored to exploit them in order to minimize the risk of resistance. Thus, the metabolic and immune evasion trade-offs suggested by the data may represent therapeutic opportunities that deserve further study. The first trade-off supports recent findings suggesting an important role for mitochondria, OXPHOS and ROS in tumorigenesis 71, 72, 73, including in relation with the stress responses induced by increased ROS production and their impact on macromolecule synthesis 74. The second trade-off is in line with recent suggestions that in follicular lymphomas, loss of MHCII may be selectively acquired in cells that have accumulated immunogenic mutations in their idiotype sequences in order to avoid displaying Ig neoantigens to T-cells 75. In that study, MHCII expressing lymphoma cells were associated with a TME rich in a CD4+ T-cell population with a high cytotoxicity expression profile signature (CD4CTL), while the reverse was observed for cells expressing low levels of MHCII 75. The pattern revealed by ParTI is also consistent with earlier findings that MHC loss can occur through a partial plasmablastic phenotypic differentiation which could be associated with high levels of Ig production 51. As mentioned previously, the retention of MHC associated with the expression of CXCL9/10 and CCL18/19 in archetype 3 might represent an alternative immune evasion strategy to avoid NK-cells detection (from the expression of MHC and the action of CXCL9/10 and CCL19) and promote T-cell tolerance from CCL18 55, 56, 76. Uncovering the best way to exploit such trade-offs in energy production, protein synthesis and immune evasion, will require additional detailed studies focused at testing the effect of disrupting the function of specific archetype defining genes on both side of the trade-offs simultaneously.