3.1 Survival analysis of stromal, immune and estimate scores
According to the ESTIMATE algorithm, for the TCGA-SKCM cohort, the stromal score ranged between − 1836.9 and 1854.648, the immune score between − 1002.3 and 3833.031 and the estimate score between − 2696.16 and 5144.515, and for the GSE65904 cohort, the stromal score ranged between − 1111.36 and 2282.686, the immune score between − 570.15 and 3381.81 and estimate score between − 1343.02 and 4989.343. To identify potential correlations between overall survival and immune, stromal and estimate scores, patients with metastatic melanoma based on the median score were divided into high and low score groups in the TCGA-SKCM and GSE65904 cohorts. Kaplan-Meier survival curve showed that the overall survival of the high immune, stromal and estimate score group was longer than that in the low score group (immune score, p = 0.001; stromal score, p = 0.014; estimate score, p = 0.003) in the TCGA-SKCM cohort, which was statistically significant (Fig. 1A-C). Similarly, in the GSE65904 cohort, the overall survival of the high immune score group was also longer than that of the low score group (p = 0.012), while there was no significant difference in OS between patients with high and low stromal and estimate scores (stromal score, p = 0.973; estimate score, p = 0.202) (Fig. 1D-F). The above results indicated that the good prognosis of patients with metastatic melanoma may be related to the remodeling of immune components.
3.2 Identification and analysis of DEGs
To reveal the relationship between patients with metastatic melanoma and the tumor microenvironment, we performed the differential analysis of gene expression profiles of samples from the TCGA-SKCM and GSE65904 cohorts based on the immune scores. In the TCGA-SKCM cohort, 3716 differentially expressed genes were identified using the DESeq2 package in R software, of which 744 genes were down-regulated and 2942 genes were up-regulated (Fig. 2A). In the GSE65904 cohort, 347 differentially expressed genes were identified by the limma package, of which 9 genes were down-regulated and 338 genes were up-regulated (Fig. 2B). Venn analysis of the differential genes identified in the TCGA-SKCM and GSE65904 cohorts revealed 272 overlapping genes (Fig. 2C), of which one gene was down-regulated and 271 genes were up-regulated. We performed functional enrichment analysis, including GO: BP, CC, MF, and KEGG pathway analysis on the obtained 272 overlapping genes. GO function was mainly enriched in response to T cell activation, leukocyte cell-cell adhesion, leukocyte proliferation, and regulation of lymphocyte activation (Fig. 2D), while the KEGG pathway was mainly enriched in several signaling pathways in immune diseases, such as Hematopoietic cell lineage, Allograft rejection, Graft-versus-host disease, and Type I diabetes mellitus (Fig. 2E).
3.3 Prognostic modeling
A univariate Cox proportional hazard regression model was used to analyze 272 overlapping differential genes and 177 genes related to survival were identified. In addition, to prevent overfitting of the model, we obtained a gene signature for 17 prognostic genes by employing the least absolute shrinkage selection operator (LASSO) regression method(Fig. 3A-B). We further used multivariate COX regression analysis to establish a prognostic risk regression model containing nine genes ALOX5AP,ARHGAP15,CCL8,FCER1G,GBP4,HCK,MMP9,RARRES2 and TRIM22(Fig. 3C).Risk scores for each sample were calculated based on the regression coefficients and expression levels of the nine genes. Riskscore=-(0.00069*expression level of ALOX5AP)-(0.00056*expression level of ARHGAP15) - (0.00096*expression level of CCL8) - (0.00039*expression level of FCER1G) - (0.00077* expression level of GBP4)+( 0.007688*expression level of HCK)-(0.00033*expression level of MMP9) -(0.00057*expression level of RARRES2) -(0.00178 * expression level of TRIM22).Patients were divided into high-risk and low-risk groups according to the median risk score. We then generated risk curves and scatter plots of risk scores and survival status for each sample in the TCGA-SKCM cohort and found that as risk scores increased, the number of deaths also increased (Fig. 3D-E). Kaplan-Meier analysis showed that patients in the high-risk group had lower OS than those in the low-risk group (P < 0.05) (Fig. 3F). Additionally, the area under the curve (AUC) at 1, 2, and 3 years OS was 0.712, 0.838, and 0.828, respectively (Fig. 3G). Our results suggested that the nine-IRGs prognostic model for survival prediction had acceptable sensitivity and specificity.
3.4 Validation of the prognostic model in the GEO melanoma cohort
To determine the stability of the prognostic model, the performance of the nine-IRGs prognostic model was evaluated in the GSE65904 cohort. Based on the nine-IRGs prognostic model, the risk score and survival status of each sample in the GSE65904 validation cohort were shown in the figure (Fig. 4A-B). The receiver operating characteristic (ROC) curves were used to predict survival rates at 1, 2 and 3 years and the area under the curve was 0.586, 0.639, and 0.640, respectively (Fig. 4C). Kaplan-Meier survival curves showed that the survival rate of high-risk patients is significantly shorter than that of low-risk patients (Fig. 4D). Consistent with the results of the training cohort, in the GSE65904 validation cohort, patients with metastatic melanoma who were assigned to the high-risk group according to the prognostic model had significantly worse overall survival (OS) than those assigned to the low-risk group.
3.5 Independent predictive capability of prognostic models
To further explore the independent predictive ability of the prognostic model, traditional clinical variables such as age, gender, tumor stage, TNM stage, and risk score were again analyzed by univariate and multivariate Cox regression analysis to evaluate the independent predictive ability of the model. In the TCGA-SKCM cohort, age, TN stage, and risk score models were statistically significant in the univariate analysis (Fig. 5A). Multivariate analysis further showed that T N stage and risk score were independent parameters associated with survival (Fig. 5B).In addition, to extend the availability of the nine-IRGs prognostic model and clinical applications, a nomogram was made for clinical variables and risk scores that could be used to visually predict the 1-year, 2-year and 3-year survival times of patients (Fig. 5C), and calibration curves were plotted to validate the accuracy of the prediction model (Fig. 5D-F).The predicted values matched well with the actual values, indicating that our model could be used to predict the prognosis of patients with metastatic melanoma. Notably, the findings suggested that the risk score remained independent in predictive ability and could be used as independent factors in the prognosis of patients with metastatic melanoma.
3.6 Gene set enrichment analysis
To identify potential signaling pathways and biological processes associated with the nine-IRGs prognostic model, gene set enrichment analysis (GSEA) was used between high-low and low-risk groups in the TCGA-SKCM cohort. The GO results showed that the low-risk group was mainly enriched in immune receptor activity (NES = 2.229, FDR < 0.002), cytokine receptor activity (NES = 2.202, FDR < 0.002), regulation of lymphocyte-mediated immunity ( NES = 2.196, FDR < 0.002)and regulation of leukocyte-mediated immunity (NES = 2.186, FDR < 0.002) (Fig. 6A). KEGG results showed that the low-risk group was mainly enriched in cytokine-cytokine receptor interaction (NES = 2.184, FDR < 0.001), natural killer cell-mediated cytotoxicity (NES = 2.034, FDR < 0.002), antigen processing and presentation (NES = 2.01, FDR < 0.002) and chemokine signaling pathway (NES = 1.960, FDR < 0.004) (Fig. 6B). These results indicated that the nine-IRGs prognostic model could be potentially valuable for inferring tumor microenvironment in patients with metastatic melanoma.
3.7 Immune infiltration analysis of high-risk and low-risk groups
To further elucidate the relationship between the prognostic model and tumor-infiltrating immune cells, we performed a series of analyses of cell types on the tumor microenvironment in the TCGA-SKCM cohort. First, 189 patients were divided into high-risk and low-risk groups based on the median risk score. The CIBERSORT algorithm was used to quantify the proportion of immune cells, and the difference of 22 immune cell types in the high-risk and low-risk groups was studied. The violin plot showed that plasma cells, CD8 + T cells, memory-activated CD4 T cells and M1 macrophages had higher infiltration levels in the low-risk group, while M2 macrophages were highly expressed in the high-risk group (Fig. 7A).In addition, to verify the accuracy of the results, we quantified the level of tumor immune infiltration from patients with metastatic melanoma by xCell and ssGSEA. Visual analysis revealed that the infiltration level of most immune cells in the low-risk group was significantly higher than that in the high-risk group (Fig. 7B-C).Therefore, we concluded that metastatic melanoma patients with different risk score subtypes significantly affected the tumor microenvironment.
3.8 Analysis of immune checkpoints in high-risk and low-risk groups
Since immune checkpoint inhibitors(ICIs) were used to treat malignant melanoma in clinical practice, we studied whether the risk model was related to ICIs-related biomarkers and found that the high-risk score group was negatively correlated with the high expression of CTLA4 (p < 0.001), HAVR2 (p < 0.001), LAG3 (p < 0.001), TIGIT (p < 0.001), CD274 (p < 0.001) and PDCD1 (p < 0.001)(Fig. 8). Therefore, patients in the low-risk group could have a better response to treatment with ICIs.
3.9 Construction of weighted gene co-expression network and identification of key modules, especially the yellow-green module that is highly correlated with risk scores
WGCNA is a systems biology method for analyzing the expression patterns of multiple genes in different samples, which can form clusters or modules containing genes with the same expression pattern[23]. If certain genes are located in one module, they may have the same biological function and the association between the module and the sample characteristics (such as clinical characteristics) can be studied. Three outlier samples were excluded from the TCGA-SKCM cohort so that a total of 186 samples with survival data were included in the WGCNA (Fig. 9A).In this study, we chose the power of β = 5 (scale-free R2 = 0.9) as a soft threshold to implement the scale-free network. As a result, six gene co-expression modules were identified after excluding gray modules using merged dynamic tree cut (Fig. 9B). The heat map plotted the TOM of the 4709 selected genes in the analysis, which indicated that each module was validated independently of the other. Subsequently, we found that the yellow-green module, containing ALOX5AP, FCER1G, GBP4, HCK, MMP9, RARRES2, and TRIM22 in the nine-IRGs prognostic model, had the highest correlation with risk scores. (R2 =-0.3,P = 3e-05) (Fig. 9C). We plotted the scatter plot of GSvsMM and risk scores in the yellow-green module (Cor = 0.44, P = 1.8e-37), elucidating that ALOX5AP(GS = 0.33,MM = 0.72),FCER1G(GS = 0.23,MM = 0.24)GBP4(GS = 0.30,MM = 0.64),HCK(GS = 0.07,MM = 0.89),MMP9(GS = 0.17,MM = 0.22),RARRES2(GS = 0.28,MM = 0.39) and TRIM22(GS = 0.34,MM = 0.80) seven genes were the central genes in the yellow-green module (Fig. 9D).
3.10 Analysis of biological functions and immune status of yellow-green module genes
In gene networks that conform to a scale-free distribution, genes with similar expression patterns can be co-regulated, functionally related, or pathway sharing. To further explore the potential biological functions of ALOX5AP, FCER1G, GBP4, HCK, MMP9, RARRES2 and TRIM22 in metastatic melanoma, we performed GO and KEGG pathway analysis of genes associated with the yellow-green module using the "GO" and "KEGG" packages of R software. GO analysis showed that genes were mainly enriched in response to interferon-gamma, antigen processing and presentation, cellular response to interferon-gamma and adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains (Fig. 10A). KEGG pathway analysis was mainly associated with immunological diseases such as allograft rejection, autoimmune thyroid disease and viral myocarditis (Fig. 10B ).
3.11 High expression of nine-IRGs is related to the progression of metastatic melanoma and tumor purity
To explore the prognostic ability of the nine-IRGs in different subgroups of metastatic melanoma, the Kaplan-Meier survival curves of ALOX5AP, ARHGAP15, CCL8, FCER1G, GBP4, HCK, MMP9, RARRES2 and TRIM22 were plotted by the Survival package of R software. The results showed that the high expression (median) of ALOX5AP (p < 0.001), ARHGAP15 (p < 0.001), CCL8 (p < 0.001), FCER1G (p = 0.003), GBP4 (p < 0.001), HCK (p = 0.015), RARRES2 (p = 0.01) and TRIM22 (p < 0.001) has a good prognosis in metastatic melanoma, which was statistically significant (Fig. 11A). We next used gene expression data to assess the role of ALOX5AP, ARHGAP15, CCL8, FCER1G, GBP4, HCK, MMP9, RARRES2 and TRIM223 in the progression of metastatic melanoma. Compared with the high-risk group, the expression level of ALOX5AP, ARHGAP15, CCL8, FCER1G, GBP4, HCK, MMP9, RARRES2 and TRIM22 in the low-risk group was higher (Fig. 11B).
3.12 TIMER analysis
To understand whether ALOX5AP, ARHGAP15, CCL8, FCLER1G, GBP4, HCK, MMP9, RARRES2, and TRIM22 accurately reflected the status of the immune microenvironment in metastatic melanoma, the TIMER database was used to analyze the relationship between ALOX5AP, ARHGAP15, CCL8, FCLER1G GBP4, HCK, MMP9, RARRES2 and TRIM22 with immune cell infiltration. The results showed that the expression of ALOX5AP, ARHGAP15, CCL8, FCLER1G GBP4, HCK, MMP9, RARRES2 and TRIM22 showed a positive correlation with B cells, CD8 + T cells, CD4 + T cells, macrophages, neutrophils and dendritic cells and a negative correlation with tumor purity (Fig. 12).