The super-enhancer (SE) is a concept introduced in recent years, and a growing body of evidence indicates an explicit relationship between SEs and increased tumorigenesis and malignancy of cancer. SEs drive the expression not only of genes but also of non-coding RNAs that regulate biological functions directly and indirectly. Lasso penalized Cox regression has become popular in recent years because it can minimize overfitting(25). Hence, in our article, we use this novel bioinformatic strategy together with Cox proportional hazards regression models to screen and optimize hub genes related to survival.
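For reference, the textbook form of the Lasso penalized Cox estimator can be written as follows. This is a sketch in assumed notation (none of the symbols appear in the original text): x_i is the expression vector of patient i, delta_i the event indicator, R(t_i) the set of patients still at risk at time t_i, and lambda the penalty weight, typically chosen by cross-validation. Ties in event times are ignored.

```latex
\hat{\beta} = \arg\max_{\beta}
  \left[ \ell(\beta) - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right],
\qquad
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ x_i^{\top}\beta - \log \sum_{k \in R(t_i)} \exp\!\left( x_k^{\top}\beta \right) \right]
```

The L1 penalty shrinks some coefficients exactly to zero, so fitting the model and selecting genes happen in one step, which is what limits overfitting when candidate genes outnumber events.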
Although CLL was initially reported as an indolent malignancy, it is considered to have a highly heterogeneous clinical course, with time to first treatment varying from months to years and many patients eventually progressing and requiring chemotherapy. According to the data reviewed so far, disease stratification, IGHV mutation status, 17p- and ZAP70 expression are the validated predictors of overall survival. Beyond that, gene expression analysis has been carried out to identify various surrogate markers for genetic features and prognosis. A prognostic risk model based on six surface antigens (CD62L, CD54, CD49c, CD49d, CD38, CD79b) was established to diagnose CLL and predict its OS(26). In addition, some large-scale gene expression profiling analyses have generated different prognostic factors(16, 27, 28). However, previous studies constructed no prognostic model based on SE-associated genes, which regulate the expression of hub genes related to CLL tumorigenesis.
In our research, Lasso penalized Cox regression analysis is used to filter the potential SE-associated genes and yields a nine-gene prognostic model to predict the OS of CLL patients. All of the individual markers in the nine-gene model are associated with OS of CLL according to Cox regression analysis, with consistent results. K-M survival analysis also indicates that the majority of the nine genes are correlated with OS. Beyond that, the nine-gene prognostic model is highly significant in the multivariate analysis of patients without treatment. The AUCs and C-index show that our model performs well in the prediction of survival. The effectiveness of this prognostic model could be validated in an independent patient cohort. Besides OS, this risk model is also an indicator of time to treatment (TTT). We apply the nine-gene risk score to the GSE22762 and GSE39671 datasets, and the results indicate that the nine-gene model can be applied to predict TTT: the high-risk patients had a shorter time to treatment than the low-risk patients. These data strongly indicate that the nine-gene prognostic model is a significant and valid risk predictor.
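To make the risk-score workflow concrete, the following is a minimal sketch of how such a score is commonly computed and evaluated. It is not the implementation used in this study. The coefficient values are hypothetical placeholders (only their signs follow the protective or harmful roles reported in this study), the median cutoff for the high-risk and low-risk groups is an assumption, and the function names are illustrative.

```python
from statistics import median

# Hypothetical coefficients for illustration only; NOT the fitted values.
# Negative = protective, positive = harmful, as reported in this study.
COEFS = {
    "SLAMF1": -0.30, "TCF7": -0.25, "TNFRSF25": -0.20,
    "MNT": -0.15, "VEGFA": -0.10,
    "GRWD1": 0.20, "SLC6A3": 0.15, "GMIP": 0.25, "LAG3": 0.30,
}


def risk_score(expr, coefs=COEFS):
    """Linear predictor: sum of coefficient * expression over model genes."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk (assumed median cutoff)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def concordance_index(times, events, scores):
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i had an event and a shorter
    follow-up time than patient j. It is concordant when patient i also
    has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk. In practice a survival analysis library would be used for the fitting and evaluation steps; the explicit loops here only show what the statistic measures.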
We not only evaluate the data with a rigorous training and validation design, but also concentrate on the connection between individual genes and selected disease characteristics, such as IGHV mutation status, FISH abnormality, and ZAP70 expression level. For three of the markers (TCF7, SLAMF1, and LAG3), the detected association with IGHV status is as expected. The lack of a public database that includes both survival data and mutation information limits further research on the correlation between the nine-gene model and IGHV status. The same limitation applies to the study of ZAP70 and FISH abnormality. However, in the poor prognosis groups, such as ZAP70-high and 17q- patients, the nine-gene risk score is significantly higher than in the low-risk group, and we find that low expression of SLAMF1 in CLL is associated with ZAP70-high expression. The quantitative relationship between TCF7, LAG3, and SLAMF1 expression and inferior overall survival is a consistent finding and indicates that these genes have a pathogenic role in CLL. Accordingly, the nine-gene prognostic model may also play an important role in CLL etiopathogenesis. The WGCNA of the GSE50006 dataset reveals that TCF7 and LAG3 belong to two separate gene modules; in addition, the expression of GMIP, SLAMF1, and TNFRSF25 also differs significantly between normal individuals and CLL patients. Therefore, these five genes in our model are possibly functionally vital in the pathogenesis of CLL. In the present study, SLAMF1, TCF7, TNFRSF25, MNT, and VEGFA are protective factors, whereas GRWD1, SLC6A3, GMIP, and LAG3 appear to be harmful factors in CLL. We discuss each gene in the prognostic model below.
Transcription factor 7 (TCF7) is a T-cell-specific transcription factor required for T-cell development, and animal models suggest that it probably functions as a tumor suppressor(29). TCF7 over-expression in mice led to a disease resembling CLL, indicating that it was probably directly involved in CLL transformation(30). In CLL, TCF7 expression provided a high rate (74%) of correct assignment of patients at genetic risk (IGHV unmutated, V3-21 usage, 11q- or 17p-)(28). These results are consistent with ours and indicate that TCF7 plays an important role in CLL.
Signaling lymphocytic activation molecule family member 1 (SLAMF1, also known as CD150) regulates hematopoietic stem cell differentiation, leukocyte adhesion and activation, and humoral immune responses. According to a previous meta-analysis of three gene expression profiling studies, SLAMF1 is comparatively overexpressed in normal peripheral blood B cells. Recently, researchers found lower levels of SLAMF1 expression in cases with ZAP70-high (p<0.001), IGHV-unmutated (p<0.001), and 17q- (p=0.003) status. It has been proposed that loss of SLAMF1 expression in CLL modulates genetic pathways that regulate chemotaxis and autophagy and that potentially affect drug responses, suggesting that these effects underlie the unfavorable clinical outcomes experienced by SLAMF1-low patients(31). Together, SLAMF receptors, as vital modulators of the BCR signaling axis, may improve immune control in CLL by interference with NK cells(32). In our research, the univariate and multivariate analyses show that down-regulated SLAMF1 levels have an independent negative prognostic impact on overall survival (P < 0.05). We subsequently find that SLAMF1 is relatively overexpressed in IGHV-mutated and ZAP70-low CLL patients. The strict correlation between low SLAMF1 levels and high-risk genetic features indicates that SLAMF1 probably represents a surrogate marker of genomic complexity; however, the mechanism of this correlation is still unknown.
Lymphocyte activating gene 3 (LAG3), an immune inhibitory checkpoint receptor, is a member of the immunoglobulin superfamily with about 20% amino acid homology with CD4. It is expressed on activated and exhausted T cells, NK cells, B cells, dendritic cells, and regulatory T (Treg) cells. High LAG3 expression in CLL cells correlates with unmutated IGHV (P < 0.0001) and decreased treatment-free survival (P = 0.0087)(33). Increased LAG-3 expression on leukemic cells correlates with shorter time to treatment and poor outcome in CLL. Moreover, treatment with relatlimab, a novel anti-LAG-3 blocking monoclonal antibody currently under clinical trial for different solid and hematological malignancies including CLL, restored, at least in part, NK and T cell-mediated anti-tumor responses(34). CAR T-cell generation in the presence of ibrutinib resulted in enhanced cell viability and expansion of CLL patient-derived CAR T cells. Ibrutinib also enriched these cells for a less-differentiated, naïve-like phenotype and decreased the expression of exhaustion markers (PD-1, TIM-3, and LAG-3)(35).
Vascular endothelial growth factor A (VEGFA) is a member of the PDGF/VEGF growth factor family. Angiogenesis makes a significant contribution to the pathogenesis of B-cell chronic lymphocytic leukemia (B-CLL), with levels of VEGFA and bFGF higher in patients than in healthy individuals(36). In our research, by contrast, VEGFA has a protective role in CLL. High expression of VEGFA indicated a good prognosis in K-M survival analysis, and the level of VEGFA was higher in normal samples, although the difference was not statistically significant.
TNF receptor superfamily member 25 (TNFRSF25) is a receptor expressed preferentially in tissues enriched in lymphocytes and possibly plays a vital role in the regulation of lymphocyte homeostasis. The receptor stimulates NF-kappa B activity and regulates cell apoptosis. TNFRSF25 was differentially expressed in activated CLL cells and predominantly detected in those from patients with early clinical stage disease(37), and it probably alters the balance between cell proliferation and death, influencing CLL physiopathology and clinical outcomes.
Three genes (GRWD1, GMIP, and SLC6A3) have not been described in the context of CLL before, and all of them were upregulated in high-risk CLL patients. The results of the univariate analysis and K-M survival curves were not completely consistent with those of the multivariate analysis. Glutamate rich WD repeat containing 1 (GRWD1) was identified as one of the ribosomal/nucleolar proteins that promote tumorigenesis(38). GRWD1 has also been reported to have histone-binding activity and to regulate chromatin openness at specific chromatin locations(39). Its overexpression in colon carcinoma tissues was related to pathological grading, tumor size, N stage, TNM stage, and poor survival, whereas knockdown of GRWD1 inhibited cell proliferation and colony formation, induced cell cycle arrest, increased drug susceptibility, and suppressed migration and invasion(40). GEM interacting protein (GMIP), a RhoA-specific GAP, was identified in a proteomics screen for proteins interacting with Girdin (Girders of actin), an actin-binding protein critical for neuronal migration to the olfactory bulbs, and is one of the major regulators of neuronal migration in the postnatal brain(41). Solute carrier family 6 member 3 (SLC6A3) is involved in the metabolism of dopamine and catecholamine and is a potential risk gene for Parkinson's disease and alcoholism. The significance of these three genes in CLL remains to be studied further.
In GSE14973, the risk score was significantly reduced after valproic acid (VPA) treatment in vitro. Meanwhile, with the exception of TNFRSF25, the protective factors (VEGFA and MNT) were expressed at higher levels and the pathogenic gene GMIP at a lower level than before treatment, and these results were largely consistent with our previous conclusion. VPA is a well-tolerated anti-epileptic drug with HDAC inhibitory activity. Inhibition or knockdown of HDAC1 and HDAC3 results in HDAC7 downregulation, which is associated with a decrease in histone 3 lysine 27 acetylation (H3K27ac) at transcription start sites (TSS) and super-enhancers (SEs), prominently in stem-like BrCa cells. In GSE112953 and GSE15913, the only upregulated gene was LAG3, which suggests that combining drug treatment with an anti-LAG3 monoclonal antibody might yield a better outcome.
A limited set of gene expression markers was of independent prognostic value and thus increased the accuracy of predicting overall survival and time to treatment. However, because of incomplete data, we could not compare the predictions from the model with individual clinical factors, such as IGHV somatic mutation status, stage, and other clinical data. Future studies should address this limitation, and more large cohort studies are needed to validate our model.