265 cases soft tissue sarcomas classification: There were 59 cases of dedifferentiated liposarcoma (22.3%), 2 cases of desmoid tumor (0.75%), and 52 cases of undifferentiated pleomorphic sarcoma (UPS)(including pleomorphic, malignant fibrous histiocytoma (MFH) giant cell MFH/undifferentiated pleomorphic sarcoma with giant cells, undifferentiated pleomorphic sarcoma) (19.6%), 107 cases of leiomyosarcoma (LMS) (40.4%), 10 cases of malignant peripheral nerve sheath (MPNST) (3.77%), 25 cases of myxofibrosarcoma (9.43%), and 10 cases of synovial sarcoma (including synovial sarcoma- biphasic, synovial sarcoma- monophasic, synovial sarcoma- poorly differentiated) (3.77%).
Immune and stromal scores
The range of estimate scores based on the ESTIMATE algorithm was from -2445.3107 to 13152.619. Stromal scores ranged from -740.7589 to 6339.22. The range of immune scores was from -1890.07 to 7914.021. The stromal score of dedifferentiated liposarcoma was the highest among the eight subtypes (Among all seven subtypes, only two desmoid tumor samples were included; thus, they were not included for comparison.). This was followed by myxofibrosarcoma, UPS, MPNST, and LMS. The synovial sarcoma subtype had the lowest stromal score (P < 0.0001). In addition, the immune scores were ranked from the highest to lowest as follows: UPS>myxofibrosarcoma > dedifferentiated liposarcoma > MPNST > LMS > synovial sarcoma (P < 0.0001), indicating that immune and stromal scores are important for the classification of different soft tissue sarcoma subtypes.
The gene expression data were divided into two groups—high expression and low expression groups—based on the median of the immune and stromal score. Kaplan-Meier survival curves were drawn, which showed that in the high immune score group, the total survival time was generally higher than that in the low immune score group (see Fig.1 A). The median of overall survival times in the high immune score group and the low immune score group were 2464 days and 1722 days, respectively (log-rank test P<0.05). Similarly, although the result was not statistically significant (the log-rank test P>0.05), the survival curve obviously indicated that the total survival time of the group with high stromal scores was higher than that of the group with low stromal scores, with a median total survival time of 2448 days for the high score group compared with 1722 days for the low score group (see Fig.1 B).
The sarcoma samples downloaded from the TCGA database were divided into seven types—dedifferentiated liposarcoma, desmoid tumor, UPS, LMS, MPNST, myxofibrosarcoma, and synovial sarcoma—and a box plot (Fig.1 C,D) drawn to compare different types of immunity and stromal scores, and there were statistically significant(P<0.0001), indicating that the immune and stromal scores were significantly different for the classification of sarcoma.
Differentially expressed genes results
To compare the difference in gene expression between the high score and low score groups, we used the “limma” algorithm, with the setting, log (fc) 2 = 2 and FDR = 0.05, to calculate the DEGs for the low stromal, immune, and estimate score groups relative to the high group. Volcano plots were drawn to indicate differences in gene expression among cases with different immune, stromal, and estimate scores. In the charts (see Fig.2 A, B, C), green represents downregulated genes, and red represents upregulated genes. In the immune score group, 91 genes were upregulated and 479 genes were downregulated. In the stromal score group, 157 genes were upregulated and 281 genes were downregulated. In the estimate score group, 100 genes were upregulated and 482 were downregulated. A Venn diagram was drawn to obtain 258 genes with common differences in all three groups, among which 66 genes were upregulated and 192 genes were downregulated (see Fig.2 D,E).
Functional enrichment analysis of DEGs
A total of 18 cellular components (CC), 26 molecular functions (MF), and 95 biological processes (BP) were obtained by GO analysis of 258 DEGs. Analysis of the top ten terms resulted in the minimum P values for immune and inflammatory responses, plasma membrane, receptor activity, and chemokine activity. Similarly, the results of KEGG pathway analysis were also mainly related to immunity, among which the highest proportion included Staphylococcus aureus infection, accounting for 29.17%, Leishman disease, accounting for 25%, and viral protein interaction with cytokine and cytokine receptor, accounting for 12.5% (see Fig.3).
Relationship between DEGs and overall survival results
To explore the different roles of 258 DEGs in survival, we set the median as the boundary to draw a Kaplan–Meier survival curve for each of the DEGs. Eighty-six genes were statistically significant predictors of overall survival (log-rank P < 0.05) (see Fig.4).
Construction of a protein co-expression network results
A total of 15 key node genes were selected because they were the most closely related to other genes around. Cluster 1 had 14 nodes and 39 edges; the key nodes were LILRB2, AOAH, CCL5, NCF4(Ryan, B. M. et al.,2014), CYBB. Cluster2 had a total of 8 nodes and 16 edges; the key nodes were ITGAM, CD2, IL2RG, CD163. Cluster 3 had 16 nodes and 33 edges; the key nodes were VAV1, CXCR3, HCK, MS4A6A, LY86, IDO1. Enrichment analysis was performed on these key genes, and the GO analysis results were as follows:
- positive regulation of T-cell apoptotic process
- negative regulation of T-cell apoptotic process
- inflammatory response
The KEGG results included:
- chemokine signaling pathway
- leukocyte transendothelial migration
These results are all associated with the immune system. (Only results of P < 0.05 are included) (see Fig.5).
Multivariate regression analysis of DEGs results (see Tab.1)
Table 1. Univariate regression analysis of DEGs
Group
|
Significant Value
|
HR
|
Confidence Interval 95.0%
|
age_at_initial_pathologic_diagnosis
|
0.000
|
1.082
|
1.035
|
1.130
|
residual_tumor
|
0.015
|
3.536
|
1.279
|
9.777
|
ADAP2
|
0.012
|
5.874
|
1.480
|
23.307
|
CSF1R
|
0.001
|
0.042
|
0.007
|
0.266
|
CYBB
|
0.003
|
10.208
|
2.159
|
48.262
|
MPEG1
|
0.049
|
0.253
|
0.064
|
0.996
|
NCF4
|
0.000
|
69.966
|
8.793
|
556.733
|
TBXAS1
|
0.019
|
4.680
|
1.282
|
17.084
|
C13orf33
|
0.008
|
0.608
|
0.421
|
0.878
|
GPR34
|
0.044
|
3.385
|
1.034
|
11.079
|
ITGAM
|
0.019
|
0.165
|
0.037
|
0.741
|
DOK2
|
0.030
|
0.219
|
0.056
|
0.862
|
CD163
|
0.015
|
0.186
|
0.048
|
0.726
|
TLR8
|
0.041
|
2.663
|
1.043
|
6.799
|
LYVE1
|
0.001
|
2.269
|
1.415
|
3.638
|
BATF
|
0.000
|
0.121
|
0.045
|
0.323
|
P2RY13
|
0.030
|
0.434
|
0.204
|
0.922
|
SLAMF6
|
0.037
|
3.636
|
1.080
|
12.236
|
S100A9
|
0.000
|
0.373
|
0.220
|
0.630
|
CD38
|
0.023
|
0.412
|
0.192
|
0.884
|
FCER2
|
0.033
|
0.669
|
0.463
|
0.968
|
CADM3
|
0.013
|
0.823
|
0.706
|
0.959
|
WISP2
|
0.049
|
1.232
|
1.001
|
1.516
|
KLHL23
|
0.004
|
2.201
|
1.279
|
3.789
|
MYH11
|
0.000
|
0.565
|
0.448
|
0.712
|
Using clinical data from TCGA database, univariate regression analysis was performed to obtain "age at initial pathologic diagnosis “and "residual cancer" as prognostic factors for sarcoma. Multivariate regression analysis of the differential genes was carried out to obtain a gene list that could predict the prognosis of sarcoma, including risk genes (HR>1) and benign genes (HR<1).
Verification of DEGs in the GEO database results
To verify the DEGs obtained are also significant in other datasets, we used the 140 independent liposarcoma samples of GSE30929 in the GEO database to perform survival analysis by calculating the forward relapse-free survival (DFS). We obtained 12 genes that are meaningful both in the TCGA and GEO database: CLEC10A(Kurze, A.-K. et al.,2019), IDO1(Nafia, I. et al.,2020), FCER2(Sharma, V. et al.,2014), CCL13(Mendez-Enriquez, E. and García-Zepeda, E. A.,2013), KLRB1(Zhang, G. et al.,2020; Gentles, A. J. et al.,2015), P2RY13(Fan, T. et al.,2020), PYHIN1(Massa, D. et al.,2020; Tong, Y., Song, Y. and Deng, S.,2019), DOK2(Ohsugi, T.,2017), PLA2G2D(Miki, Y. et al.,2016; Miki, Y. et al.,2013), MARCO(Xiao, Y. et al.,2019), MYH11(Hu, C., Chen, B., et al.,2020), NCF4(Ryan, B. M. et al.,2014). Nine of the genes are important because they have not previously been reported to be associated with the prognosis of sarcoma, including CLEC10A, FCER2, CCL13, KLRB1, P2RY13, PYHIN1, PLA2G2D, MARCO, NCF4.