Pan-sarcoma characterization of lncRNAs in the crosstalk of EMT and tumor immunity identifies distinct clinical outcomes and potential implications for immunotherapy

doi:10.21203/rs.3.rs-1540777/v1


Research Article


https://doi.org/10.21203/rs.3.rs-1540777/v1

This work is licensed under a CC BY 4.0 License


The epithelial-to-mesenchymal transition (EMT) is a reversible process that may interact with tumor immunity through multiple approaches. Increasing evidence has demonstrated the interconnections among EMT-related processes, tumor microenvironment, immune activity, as well as the potential influence on immunotherapy response. Long non-coding RNAs (lncRNAs) are emerging as critical modulators of gene expression. They can play fundamental roles in tumor immunity and act as promising biomarkers of immunotherapy response. However, the potential roles of lncRNA in the crosstalk of EMT and tumor immunity are still unclear in sarcoma. We obtained multi-omics profiling of 1440 pan-sarcoma patients from 19 datasets. Through an unsupervised consensus clustering approach, we categorized EMT molecular subtypes. We subsequently identified 26 EMT molecular subtype and tumor immune-related lncRNAs (EILncRNA) across pan-sarcoma types and developed an EILncRNA signature-based weighted scoring model (EILncSig). The EILncSig presented a favorable performance in predicting the prognosis of sarcoma, and the high-EILncSig was associated with exclusive TME characteristics with desert-like infiltration of immune cells. Multiple altered pathways, somatically mutated genes and recurrent CNV regions associated with EILncSig were identified. Notably, the EILncSig was associated with the efficacy of immune checkpoint inhibition (ICI) therapy. We additionally screened compounds such as Irinotecan that may have the potential to convert the EILncSig phenotype. By integrative analysis on multi-omics profiling, our findings provide a comprehensive resource for understanding the functional role of lncRNA-mediated immune regulation in sarcomas, which may advance the understanding of tumor immune response and the development of lncRNA-based immunotherapeutic strategies for sarcoma.

Sarcoma

Epithelial-to-mesenchymal transition

LncRNA

Tumor immunity

Prognostic risk model

Machine learning

Sarcomas are a heterogeneous group of primary mesenchymal tumors, derived from bone, cartilage, muscle, and other connective tissues[1]. More than 100 different sarcoma subtypes varying in pathology, clinical presentation, molecular characteristics, and response to therapy have been identified, 80% of which are soft tissue sarcomas (STS), 15% as bone sarcomas, and 5% as gastrointestinal stromal tumors[2]. While relatively rare, sarcomas are often fatal and responsible for a significant amount of survival loss as the most aggressive childhood cancers[3]. The clinical management of sarcomas is highly challenging due to misdiagnosis, late diagnosis, as well as their heterogeneity, aggressive nature, and resistance to conventional treatments such as surgery, radiation, and chemotherapy[4, 5]. Therefore, novel therapeutic strategies are highly needed for sarcomas. To date, immunotherapy has gained many successful applications in several cancers. As a promising treatment strategy, several clinical trials on immunotherapy (such as immune checkpoint inhibitor (ICI) therapy) for sarcoma patients have shown profound beneficial effects on patients’ survival[4, 6]. However, some refractory patients still have disproportionate responses to immunotherapy. Thus, it is imperative to explore biomarkers that can function as molecular targets or modulators in the aspect of tumor immunology for sarcomas.

Epithelial-to-mesenchymal transition (EMT) is a reversible process and a critical characteristic of the tumor microenvironment (TME); it is reported to play critical roles in cancer metastasis, drug resistance and immune escape in several carcinomas[7, 8]. In contrast to carcinomas, a variable degree of epithelial/mesenchymal differentiation has been observed across sarcoma histological subtypes, which can be either more epithelial-like (such as Ewing sarcoma and synovial sarcoma) or more mesenchymal-like (such as osteosarcoma and chondrosarcoma), and some sarcoma subtypes present both extreme phenotypes within one tumor[1, 9]. Accumulating evidence indicates that many sarcomas can undergo EMT- and mesenchymal-to-epithelial transition (MET)-related processes to take advantage of both biological features, leading to high aggressiveness and unfavorable clinical outcomes[9, 10]. However, few studies have reported the associations and potential regulators among EMT, TME and cancer immunity in sarcomas. Over the last few years, long non-coding RNAs (lncRNAs) have emerged as critical elements in gene regulatory networks, affecting diverse biological processes such as EMT to modulate the progression of sarcomas[11, 12]. Increasing evidence also shows that lncRNAs can function as communicators and mediators, being directly and/or indirectly involved in the crosstalk between tumor cells and infiltrating immune cells within the tumor immune microenvironment (TIME) to participate in cancer onset and progression. For example, Huang et al. reported that lncRNA NKILA promotes tumor immune evasion by sensitizing T cells to activation-induced cell death[13]. Hu et al. identified oncogenic lncRNA LINK-A that regulates cancer cell antigen presentation and intrinsic tumor suppression[14].

In this study, we integrated large pan-sarcoma datasets with multi-omics profiling. Through a machine learning approach, we derived pan-sarcoma EMT molecular subtypes and identified lncRNAs in the crosstalk of EMT and the immune microenvironment across sarcomas. We also constructed a lncRNA-based computational model and demonstrated it as a predictive biomarker for the prognosis of patients with sarcomas, as well as a comprehensive resource for understanding the functional role of lncRNA-mediated immune regulation in sarcomas.

Pan-sarcoma data collection

Overall, we collected 19 public sarcoma datasets from GDC TCGA, GDC TARGET, GEO and EMBL-EBI databases. Accessions for the datasets used in the present study are as follows: phs000178 (TCGA-SARC Sarcoma), phs000468 (TARGET-OS Osteosarcoma), GSE13433, GSE142162, GSE14827, GSE17618, GSE20196, GSE20559, GSE23980, GSE34620, GSE34800, GSE37371, GSE66533, GSE71118, GSE87437, E-MEXP-1922, E-MEXP-3628, E-MEXP-964, E-TABM-1202.

For the TCGA-SARC dataset, RNA-Seq (raw count and FPKM format) data, masked somatic mutation data (mutect2), masked copy number segment data, and survival follow up data with clinicopathological characteristics were obtained from the TCGA data portal by using the TCGAbiolinks R package[15]. TCGA-SARC molecular subtype data and other characteristics of patients were obtained from Lazar et al.’s study[16]. TCGA-SARC immune subtype data were curated from Thorsson et al.’s study[17]. For the TARGET-OS dataset, RNA-Seq (raw count and TPM format) data and clinical information were obtained from TARGET data matrix. The latest Homo sapiens GRCh38.104 annotation file was downloaded from Ensembl[18] for gene symbol and biotype annotations corresponding to Ensembl ID. DESeq2[19] R package was applied to filter out low-abundance genes, normalize RNA-Seq counts data, and perform variance stabilizing transformation. RNA-Seq data of FPKM format was transformed to TPM format[20].
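As an illustration of the FPKM-to-TPM step, the conversion is commonly done per sample by rescaling FPKM values so that they sum to one million. The following is a minimal Python sketch under the assumption that this standard identity was used; the function name `fpkm_to_tpm` and the toy values are illustrative and not taken from the study.

```python
def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values to TPM.

    Assumes the standard identity TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6,
    applied separately to each sample.
    """
    total = sum(fpkm)
    if total == 0:
        raise ValueError("sample has zero total FPKM")
    return [value / total * 1e6 for value in fpkm]


# Toy sample with three genes (made-up numbers).
tpm = fpkm_to_tpm([10.0, 30.0, 60.0])
```

Because the rescaling is done within each sample, TPM values always sum to one million per sample, which makes relative abundances comparable across samples.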

For microarray datasets, raw or processed data and available clinical information were downloaded from GEO[21] and EMBL-EBI[22]. When possible, available Affymetrix CEL files within each dataset were re-processed and re-normalized individually into an expression matrix through the robust multi-array average (RMA) expression measure by using the affyPLM R package[23]. The arrayQualityMetrics R package[24] was applied to exclude low-quality and outlier samples from the microarray datasets. All microarray data used in this study were based on the Affymetrix Human Genome U133 Plus 2.0 Array. We utilized the ComBat method of the sva R package[25] to correct the batch effects caused by non-biological technical bias across the 17 microarray datasets, and combined them into a pan-sarcoma microarray dataset of 1085 samples. The hgu133plus2.db R package was applied to map probes to gene symbols; when multiple probes mapped to one gene, the probe with the highest mean value was selected. In total, 1440 sarcoma patients were included in this study. Detailed information for all datasets and patients is documented in Supplementary file 1.
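The probe-to-gene collapsing rule described above (keep the probe with the highest mean value per gene) can be sketched in Python as follows. This is only an illustration of the rule, not the authors' R code; the probe IDs, gene symbols and values are hypothetical.

```python
from statistics import mean


def collapse_probes(expression, probe_to_gene):
    """Keep, for each gene, the probe with the highest mean expression.

    expression: dict mapping probe ID -> list of values across samples.
    probe_to_gene: dict mapping probe ID -> gene symbol.
    Returns a dict mapping gene symbol -> values of the selected probe.
    """
    best = {}  # gene symbol -> (mean expression, probe ID)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # skip probes without a gene annotation
        score = mean(values)
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, probe)
    return {gene: expression[probe] for gene, (_, probe) in best.items()}


# Hypothetical probes: p1 and p2 map to the same gene, p4 is unannotated.
toy_expression = {
    "p1": [2.0, 4.0],
    "p2": [5.0, 7.0],
    "p3": [1.0, 1.0],
    "p4": [9.0, 9.0],
}
toy_annotation = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
collapsed = collapse_probes(toy_expression, toy_annotation)
```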

Immunotherapy data collection

RNA-Seq data and clinical information from patients with tumors treated with anti-programmed death-1 (PD-1) or anti-programmed death-ligand 1 (PD-L1) immune checkpoint blockade (ICB) therapy were obtained from Kim et al.’s study (GSE176307)[26], including overall survival, progression-free survival and treatment response of 89 urothelial cancer patients.

Clustering molecular pattern of EMT signature expression

We collected curated EMT-related gene lists reported by 5 pan-cancer studies via EMTome[27–32], and combined them into an EMT signature (Supplementary file 1). To cluster EMT molecular pattern of sarcoma patients, we utilized the ConsensusClusterPlus R package[33] to perform an unsupervised consensus clustering on expression of EMT signature in 1085 pan-sarcoma samples based on K-means algorithm. The resampling was set to be 1000 repetitions to ensure the clustering stability. Distance matrix of consensus clustering was extracted, and a silhouette analysis was applied to assess how similar an individual was matched to its assigned cluster compared to other clusters by using the CancerSubtypes R package[34].
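For readers unfamiliar with silhouette analysis, the silhouette width of a sample compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), giving (b - a) / max(a, b). The sketch below is a generic Python illustration on a precomputed distance matrix; it is not the CancerSubtypes implementation, and the toy distances are made up.

```python
def silhouette_widths(dist, labels):
    """Silhouette width per sample from a symmetric distance matrix.

    dist: list of lists, dist[i][j] is the distance between samples i and j.
    labels: cluster label of each sample (at least two distinct clusters).
    Samples in singleton clusters get width 0.0 by convention.
    """
    n = len(labels)
    clusters = set(labels)
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Toy example: four samples on a line at positions 0, 1, 10 and 11.
positions = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
toy_widths = silhouette_widths(toy_dist, [1, 1, 2, 2])
average_width = sum(toy_widths) / len(toy_widths)
```

Widths close to 1 indicate that samples sit much closer to their own cluster than to any other.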

Computation of the EMT score

EMT gene signatures with annotation of epithelial and mesenchymal markers from Tuan et al.’s and Hollern et al.’s studies were separately used to compute the EMT score[28, 32]. The EMT score for each sample was calculated as \(\frac{1}{n_{1}}\sum_{i=1}^{n_{1}} M_{i} - \frac{1}{n_{2}}\sum_{j=1}^{n_{2}} E_{j}\), in which \(M_{i}\) and \(E_{j}\) respectively represent the normalized expression of the mesenchymal marker genes and epithelial marker genes, and \(n_{1}\) and \(n_{2}\) respectively represent the number of corresponding genes, as described in a previous study[35].
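The EMT score defined above is the mean expression of mesenchymal markers minus the mean expression of epithelial markers. A minimal Python sketch follows; the marker lists and expression values in the toy example are illustrative only and are not the curated signatures used in the study.

```python
def emt_score(expression, mesenchymal_genes, epithelial_genes):
    """Mean mesenchymal marker expression minus mean epithelial marker expression.

    expression: dict mapping gene symbol -> normalized expression in one sample.
    Genes missing from the sample are ignored (an assumption of this sketch).
    """
    m = [expression[g] for g in mesenchymal_genes if g in expression]
    e = [expression[g] for g in epithelial_genes if g in expression]
    if not m or not e:
        raise ValueError("need at least one marker of each type")
    return sum(m) / len(m) - sum(e) / len(e)


# Toy sample (made-up values); a positive score leans mesenchymal.
toy_sample = {"VIM": 8.0, "CDH2": 6.0, "CDH1": 3.0, "EPCAM": 5.0}
toy_emt_score = emt_score(toy_sample, ["VIM", "CDH2"], ["CDH1", "EPCAM"])
```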

Functional Enrichment Analysis

The clusterProfiler R package[36] was used for over-representation analysis and pre-ranked gene set enrichment analysis (GSEA). The non-parametric gene set variation analysis (GSVA) was conducted by using the GSVA R package[37]. For the pre-ranked GSEA, an absolute normalized enrichment score (|NES|) ≥ 1.0 with an adjusted P-value < 0.05 was considered significant. The GSVA enrichment scores were passed to the limma R package to fit a linear model, and an alteration was considered significant when |log₂FoldChange| ≥ 0.2 and adjusted P-value < 0.05. Gene sets of the Gene Ontology (GO)[38] Biological Process section, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways[39], WikiPathways[40] and Reactome[41] pathways were downloaded from the Molecular Signatures Database (MSigDB)[42].

Evaluation of TME cell infiltration abundance

The CIBERSORTx[43] algorithm with the LM22 signature matrix was utilized to quantify the abundance of 22 types of TME infiltrating cells. We set parameters of CIBERSORTx as follows: 100 times for permutation test, batch correction - bulk mode, absolute mode of output scores, and RNA-Seq expression data without quantile normalization, while microarray expression data with quantile normalization. The overall fraction of stromal and immune cells infiltration in the sarcoma samples was calculated by using the xCell via the immunedeconv R package[44, 45].

Weighted gene co-expression network analysis

Weighted gene co-expression network analysis (WGCNA) is commonly used for mining gene co-expression networks and hub genes based on pairwise correlations in genomic applications[46]. In the present study, to identify lncRNAs in gene modules that were most relevant to EMT molecular subtype, we applied the WGCNA R package[47] to construct weighted gene co-expression modules and module-trait relationships from the pan-sarcoma samples. The threshold of the scale-free topology fitting index (R²) was set to 0.90. The minimum module size was set to 30 and the threshold for merging modules was set to 30%. Intramodular analysis was performed by calculating the correlation of module membership and gene significance for EMT molecular subtype.

Identification of immune-related lncRNAs in sarcoma

We downloaded curated human immune gene list with function and Gene Ontology term from the ImmPort project[48], and mapped gene symbols to Ensembl IDs. In total, we obtained 1752 immune genes in 17 immune functional pathways in subsequent analyses (Supplementary file 3). To identify potential immune-related lncRNA modifiers, we proposed a computational method that integrates a gene expression-based immunology framework as follows: 1) All lncRNAs were ranked based on their co-expression relationship with immune marker genes; 2) Infiltrations of immune cells were estimated through CIBERSORTx with absolute score mode, all lncRNAs were ranked based on the correlation between their expression and the abundance of a given infiltrating immune cell component; 3) GSVA enrichment score of the 17 immune functional pathways were computed for each sample, all lncRNAs were ranked based on the correlation between their expression and the GSVA enrichment score of a given immune functional pathway. Pearson's correlation coefficients (PCC) were calculated for each step, where a lncRNA with a PCC ≥ 0.3 and adjusted p-value < 0.05 was considered as candidate immune-related lncRNAs.
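Each of the three steps above reduces to computing a Pearson correlation between a lncRNA's expression and an immune feature (a marker gene, a cell abundance, or a pathway score) and keeping lncRNAs that reach the cutoff. The Python sketch below illustrates only the PCC ≥ 0.3 filter; the adjusted p-value filter is omitted, and the lncRNA names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant input has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def candidate_lncrnas(lncrna_expr, immune_feature, pcc_cutoff=0.3):
    """Return lncRNAs whose PCC with the immune feature reaches the cutoff.

    The adjusted p-value < 0.05 filter described in the text is not applied here.
    """
    return sorted(
        name
        for name, values in lncrna_expr.items()
        if pearson(values, immune_feature) >= pcc_cutoff
    )


# Toy data: one immune feature measured in five samples (made-up numbers).
toy_feature = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_lncrna_expr = {
    "lnc_A": [2.0, 4.1, 5.9, 8.2, 9.8],  # rises with the feature
    "lnc_B": [5.0, 3.0, 4.0, 1.0, 2.0],  # falls as the feature rises
}
selected = candidate_lncrnas(toy_lncrna_expr, toy_feature)
```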

Development of an EMT- and tumor immune-related lncRNA signature scoring model

We identified lncRNAs that concurrently correlate with EMT molecular subtype and tumor immunity in sarcoma. An EMT- and tumor immune-related lncRNA signature scoring model (EILncSig) was constructed by a method similar to a previous study[49]: 1) The prognostic value of each candidate lncRNA was first evaluated by univariate Cox proportional hazards regression analysis; 2) A weighted combination was applied by using the regression coefficients from the multivariate Cox regression analysis. The EILncSig score for each patient was defined as \(\sum_{i=1}^{k} Exp_{i} \times Beta_{i}\), where \(Exp_{i}\) and \(Beta_{i}\) denote the normalized expression and regression coefficient of the i-th candidate lncRNA, and k is the number of lncRNAs in the EILncSig scoring model. We applied time-dependent ROC curve analysis and Kaplan-Meier survival analysis to evaluate the prognostic prediction value of the EILncSig scoring model through the survivalROC and survminer R packages. The optimal cut point for dividing patients into high- and low-EILncSig levels was defined by the surv_cutpoint function of the survminer R package, where the minimal proportion of observations per group was set to 30% to avoid having too few patients in either group. Univariate and multivariate Cox regression analyses were performed on EILncSig and available clinicopathological characteristics.
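The scoring formula above is a weighted sum of normalized expression values. A minimal Python sketch is given below; the coefficients are placeholders invented for illustration (the fitted values are reported in Supplementary file 4), and treating scores strictly above the cutoff as "high" is an assumption of this sketch.

```python
def eilncsig_score(expression, coefficients):
    """Weighted sum of normalized lncRNA expression, sum_i Exp_i * Beta_i.

    expression: dict mapping lncRNA name -> normalized expression in one patient.
    coefficients: dict mapping lncRNA name -> multivariate Cox coefficient.
    """
    missing = [name for name in coefficients if name not in expression]
    if missing:
        raise KeyError(f"missing expression for: {missing}")
    return sum(expression[name] * beta for name, beta in coefficients.items())


def stratify(score, cutoff):
    """Assign a patient to the high or low group (strict '>' is an assumption)."""
    return "high" if score > cutoff else "low"


# Placeholder coefficients and expression values, NOT the fitted model.
toy_coefficients = {"MIR22HG": -0.5, "LINC01140": 0.25, "MIR155HG": -0.25}
toy_patient = {"MIR22HG": 2.0, "LINC01140": 1.0, "MIR155HG": 4.0}
toy_score = eilncsig_score(toy_patient, toy_coefficients)
toy_group = stratify(toy_score, cutoff=-2.0)
```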

Clustering analysis of expression pattern based on pan-cancer TME signatures

The categorizing method for pan-cancer TME patterns and 29 sets of gene expression signatures describing pan-cancer TME characteristics were obtained from Bagaev et al.’s study[50] (Supplementary file 5). After performing GSVA on all the TME signatures for each patient, the GSVA enrichment scores were robustly standardized (median-centered and scaled by median absolute deviation) within each cohort. By using the ConsensusClusterPlus R package, we applied an unsupervised clustering algorithm to analyze the standardized GSVA scores of the TME signatures. The K-means clustering algorithm was used and resampling was set to 1000 repetitions. A t-distributed stochastic neighbor embedding (t-SNE) analysis was further conducted by using the Rtsne R package[51] and visualized on a 3D map with the scatterplot3d R package[52].
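The robust standardization mentioned above (median-centering and scaling by the median absolute deviation, MAD) can be illustrated with a short Python sketch. It assumes the raw MAD is used without a consistency constant (R's `mad` function multiplies by 1.4826 by default); which variant the authors applied is not stated.

```python
from statistics import median


def robust_standardize(scores):
    """Median-center the scores and scale them by the raw MAD.

    Intended to be applied to one signature's scores within one cohort.
    """
    center = median(scores)
    mad = median([abs(s - center) for s in scores])
    if mad == 0:
        raise ValueError("MAD is zero; scores cannot be scaled")
    return [(s - center) / mad for s in scores]


# Toy scores with one outlier (made-up numbers).
standardized = robust_standardize([1.0, 2.0, 3.0, 4.0, 100.0])
```

Unlike z-scores based on the mean and standard deviation, the center and scale here are barely affected by the outlying value, which is why this form is often preferred when pooling cohorts.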

Analysis of somatic mutations and recurrent regions of somatic copy number alteration

Analysis and visualization of somatic mutations in the TCGA-SARC dataset were performed through the maftools R package[53]. To determine significantly amplified or deleted regions of somatic copy number alteration (SCNA), we applied GISTIC 2.0[54] to analyze DNA copy number segmentation profiles. The analytic process of GISTIC 2.0 was completed on the GenePattern platform[55]. Parameters of GISTIC 2.0 were set as follows: noise threshold, 0.3; focal length cutoff, 0.5; confidence level, 90%; q-value threshold, 0.25; copy-ratio cap, 1.5; and arm-level peel-off mode enabled. We applied the GenomicRanges R package to determine genes that overlapped any “wide peak” region identified by GISTIC 2.0 with a residual q-value less than 0.05.
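The gene-to-peak assignment in the last step is an interval overlap query. The Python sketch below illustrates the idea with plain tuples rather than GenomicRanges; all gene names, coordinates and q-values are hypothetical, and closed intervals are assumed.

```python
def genes_in_wide_peaks(genes, peaks, q_cutoff=0.05):
    """Return genes overlapping any wide peak with residual q below the cutoff.

    genes: list of (symbol, chromosome, start, end).
    peaks: list of (chromosome, start, end, residual_q).
    Intervals are treated as closed, so sharing one base counts as overlap.
    """
    significant = [p for p in peaks if p[3] < q_cutoff]
    hits = []
    for symbol, chrom, start, end in genes:
        for p_chrom, p_start, p_end, _ in significant:
            if chrom == p_chrom and start <= p_end and p_start <= end:
                hits.append(symbol)
                break  # one overlapping peak is enough
    return hits


# Hypothetical genes and peaks.
toy_genes = [
    ("GENE_X", "chr1", 100, 200),
    ("GENE_Y", "chr1", 500, 600),
    ("GENE_Z", "chr2", 150, 250),
]
toy_peaks = [
    ("chr1", 150, 300, 0.01),   # significant, overlaps GENE_X
    ("chr1", 550, 700, 0.20),   # overlaps GENE_Y but not significant
    ("chr2", 300, 400, 0.001),  # significant, overlaps nothing
]
peak_genes = genes_in_wide_peaks(toy_genes, toy_peaks)
```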

Analysis of differentially expressed genes

The DESeq2 R package was applied to process RNA-Seq count data and then identify differentially expressed genes (DEGs) between two groups. The differential expression threshold was defined as a fold-change threshold of 1.5 and an adjusted P value < 0.05. The DEG results were presented in volcano plots and heatmaps by using the EnhancedVolcano and pheatmap R packages.
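Since DESeq2 reports log₂ fold changes, a fold-change threshold of 1.5 corresponds to |log₂FC| ≥ log₂(1.5) ≈ 0.585. The sketch below shows such a filter in Python, assuming the threshold is applied symmetrically to up- and downregulated genes; the function name is illustrative.

```python
import math

LOG2_FC_CUTOFF = math.log2(1.5)  # about 0.585


def is_deg(log2_fold_change, adjusted_p, padj_cutoff=0.05):
    """True if a gene passes both the fold-change and adjusted P value filters."""
    return abs(log2_fold_change) >= LOG2_FC_CUTOFF and adjusted_p < padj_cutoff


# Direction of change, under the same symmetric-threshold assumption.
def direction(log2_fold_change):
    return "up" if log2_fold_change > 0 else "down"
```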

Discovery of potential drugs based on CMAP database

The CMAP database[56] provides large-scale pharmacogenomic data including systematic drug-induced perturbation. We ranked and selected the top 500 DEGs to represent the transcriptomic alteration for EILncSig, and utilized PharmacoGx R package[57] to measure the concordance of transcriptomic difference and drug induced cellular molecular alterations. The GSEA method was implemented for connectivity scores calculation, and permutation testing was set as 100 times to detect the significance.

Statistical analysis

Statistical tests in this study were conducted by using the R software (version 4.1.2, https://www.r-project.org). The ggplot2 R package and extensions[58] were used for data analysis and visualization. The Wilcoxon signed-rank test and Kruskal-Wallis test were applied to compare continuous variables for two groups, and three or more groups, respectively. Categorical data was tested by the chi-square test. The Kaplan-Meier method, log-rank test and Cox proportional hazards regression analysis were used in prognostic analysis. Correlation analysis of continuous variables was performed by using the Pearson correlation test, while the Spearman correlation test was performed instead considering the influence of outliers when necessary. A statistical test is considered with statistical significance at two-sided P < 0.05. When necessary, the Benjamini-Hochberg method was applied for P value adjustment.
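For reference, the Benjamini-Hochberg adjustment multiplies the p-value of rank r (in ascending order, out of n tests) by n / r and then enforces monotonicity from the largest p-value downward. A minimal Python sketch, intended to mirror the behavior of R's `p.adjust(method = "BH")` and shown with made-up p-values:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that adjusted values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values from four tests.
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```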

Derivation of de novo pan-sarcoma EMT molecular subtypes from the perspective of EMT signature

First, we collected EMT process-related genes that were curated by Tuan et al., Rokavec et al, Kandimalla et al., Koplev et al., and Hollern et al.’s pan-cancer studies. In total, 630 genes were annotated and combined into a merged EMT signature. Detailed gene symbols and gene types (epithelial/mesenchymal marker) are delineated in Supplementary file 1. As shown in Fig. 1A and 1B, a total of 1440 sarcoma patients of various histological subtypes were enrolled in the present study, in which RNA-seq expression data was contained in TCGA-SARC and TARGET-OS, and microarray expression data based on the same platform was contained in the other datasets. To obtain a comprehensive understanding of the pan-sarcoma EMT molecular subtypes, we combined transcriptomic profiling and available clinical information of 17 datasets that were tested on the same platform (GSE13433, GSE142162, GSE14827, GSE17618, GSE20196, GSE20559, GSE23980, GSE34620, GSE34800, GSE37371, GSE66533, GSE71118, GSE87437, E-MEXP-1922, E-MEXP-3628, E-MEXP-964, E-TABM-1202) (Fig S1A). A large pan-sarcoma expression dataset that contains 1085 samples with over 12 subtypes was involved for further clustering analysis.

Through an unsupervised consensus clustering method on the expression pattern of the merged EMT signature, we classified sarcoma patients into distinct EMT molecular subtypes, where 636 patients were assigned to EMT Cluster_1 (EMT_C1), and 449 patients were assigned to EMT Cluster_2 (EMT_C2) (Supplementary file 2). Consensus matrix and silhouette analysis (average width: 0.93) showed satisfactory clustering results (Fig. 2A and 2B). To reveal the association of EMT molecular subtypes and prognosis of sarcoma patients, we performed Kaplan-Meier survival analysis on patients with matched expression profiling and clinical information. For Chibon et al.’s sarcoma cohort (GSE71118), we obtained a p-value of 0.003596 from the log-rank test, indicating that patients of EMT_C2 had significantly worse metastasis-free survival (MFS) (Fig. 2C). A consistent result was also found in Williamson et al.’s rhabdomyosarcoma cohort (E-TABM-1202, log-rank p = 0.02605), where patients of EMT_C2 had significantly worse overall survival (OS) (Fig. 2D). Furthermore, more patients with metastatic disease were observed in EMT_C2 (Fig. 2E, 49% vs 35%, p = 0.019). To analyze the variation in biological processes and pathways between the distinct EMT molecular subtypes, we implemented gene set variation analysis (GSVA). As shown in Fig. 2F, oxidative damage, TGF-β signaling, and several immune-related pathways including interleukin-10 (IL-10) signaling, type II interferon (IFNG) signaling, T/B cell receptor signaling pathway, and NK cell chemotaxis/cytotoxicity were significantly enriched in the EMT_C1 group, while mRNA capping/processing/splicing, nucleolus organization, and several DNA damage repair-related pathways including mismatch repair and base excision repair were significantly enriched in the EMT_C2 group. Accumulating studies have reported the potential association between TME-infiltrating immune cells and dysregulated EMT/MET in the tumor. Thus, we applied the xCell tool, a gene signature-based ssGSEA method, to estimate the overall TME infiltration status, and found that both stromal and immune scores of EMT_C1 were significantly higher than those of EMT_C2 (Fig. 2G). Moreover, CIBERSORTx, a deconvolution algorithm, was applied to assess the infiltrating abundance of various immune cell types between EMT subgroups (Fig. 2H and Fig S1B, C). Activated memory CD4⁺ T cells, activated NK cells, γδ-T cells and CD8⁺ T cells showed high infiltration in the EMT_C1 group, whereas regulatory T cells (Tregs), resting NK cells and activated dendritic cells were more abundant in the EMT_C2 group. In addition, we found that patients of EMT_C1 possessed higher EMT scores, which indicated a tendency toward the mesenchymal phenotype (Fig S1D).

WGCNA and identification of lncRNAs associated to EMT molecular subtypes

We used variance-stabilizing-transformed expression data from DESeq2 as the input data for WGCNA. The best β value (soft-thresholding power) in the co-expression network was calculated to be 7 (Fig S2A-C). A total of 21 gene modules were finally determined after dynamic tree cutting and module merging processes (Fig. 3A, Fig S2E and Supplementary file 3). As shown in the module-trait relationship, many modules were found to be significantly correlated (P-value < 0.05) with the EMT clusters (Fig. 3B). We screened modules with relatively high correlation coefficients (≥ 0.3). Furthermore, after the intramodular analysis, we finally determined five gene modules that showed a good correlation between module membership and gene significance for the EMT molecular subtype (Fig. 3C and Fig S2F). According to the gene biotype annotation of Ensembl GRCh38.104, 72 lncRNAs in the five gene modules were identified as EMT molecular subtype-associated lncRNAs.

Identification of immune-related lncRNAs across pan-sarcoma types

To identify candidate lncRNA modifiers that are relevant to tumor immunity across pan-sarcoma types, we proposed a three-line parallel computational approach, which involves correlations of lncRNA expression with 1) immune marker gene expression, 2) immune-related pathway activity, and 3) abundance of TME-infiltrating immune cells. Briefly, the Pearson correlation test on normalized lncRNA expression and the corresponding term was performed for each step, as shown in the schematic diagram of Fig. 3D. LncRNAs in the correlation pairs with a Pearson correlation coefficient ≥ 0.3 and an adjusted p-value < 0.05 were selected. A total of 37 lncRNAs were identified as robust candidates involved in tumor immunity across pan-sarcoma types (Supplementary file 3).

Construction and validation of a pan-sarcoma EILncRNA signature scoring model

As shown in Fig. 3E, we finally determined 26 lncRNAs that are concurrently related to EMT molecular subtype and tumor immunity across pan-sarcoma types (EILncRNA). Considering the heterogeneity of sarcoma subtypes and the complexity of interaction between EMT and tumor immunity, we proposed to develop an EILncRNA signature-based scoring model (EILncSig) to quantitatively estimate the cross-talk characteristics of EMT, tumor immune microenvironment (TIME) and tumor immunity for individual sarcoma patients. We selected Chibon et al.'s sarcoma dataset (GSE71118) as the training cohort, which has the largest sample size (n = 311) with clinical information (MFS) in the present study. We performed univariate Cox proportional hazards regression analysis to clarify the prognostic significance of the 26 EILncRNAs. A total of seven EILncRNAs (MIR22HG, LINC01140, LBX2-AS1, WWP1-AS1, AFTPH-DT, MIR155HG, and MCM3AP-AS1) were then selected to construct the EILncRNA signature-based scoring model. EILncSig score was computed as the sum of the normalized expression of the seven EILncRNAs weighted by corresponding multivariate Cox regression coefficients (Supplementary file 4).

As shown in the time-dependent receiver operating characteristic (ROC) curve analysis for MFS prediction, areas under the curve (AUC) were 0.714, 0.684, and 0.680 for 1, 3, and 5 years, respectively. By using the optimal cutoff value of EILncSig score, patients in the training cohort were stratified to high- and low-EILncSig groups. Kaplan-Meier survival analysis showed patients of the high-EILncSig group had significantly worse MFS (log-rank P = 2.708e-9) (Fig. 4A1). The distribution of the EILncSig score and the seven-EILncRNA expression between high- and low- EILncSig groups was shown in Fig. 4A2.

To validate whether the EILncSig scoring model acquires robust effectiveness across pan-sarcoma patients, we enrolled three independent datasets as testing cohorts (Williamson et al.’s rhabdomyosarcoma cohort, E-TABM-1202, n = 101; TCGA-SARC sarcoma, n = 259; TARGET-OS osteosarcoma, n = 95) for further validation. The risk score for each patient was calculated and all patients were stratified into the high- and low-risk groups. As for Williamson et al.’s rhabdomyosarcoma cohort, the time-dependent ROC curve analysis indicates EILncSig as a prognostic predictor for OS. Kaplan-Meier survival analysis showed significantly worse OS of patients in the high-risk group (log-rank P = 0.01509, Fig. 4B). Consistent results of time-dependent ROC curve and Kaplan-Meier survival analyses on both OS and relapse-free survival (RFS) were also successfully validated in the other 2 validation cohorts (TCGA-SARC and TARGET-OS) as shown in Fig. 4C and 4D.

To confirm whether the EILncSig stratification is a prognostic factor independent of other clinical features, patients from TCGA-SARC and TARGET-OS with available clinicopathologic parameters were included in univariate and multivariate Cox regression analyses (Supplementary file 4) to test the performance of the EILncSig after adjustment for clinicopathologic parameters such as age, gender, tumor metastasis, tumor grade, etc. As shown by the multivariate Cox regression analyses in Fig. 4C4 and 4D4, the HRs of high-EILncSig versus low-EILncSig for OS were 2.598 (P = 0.00613; 95% CI: 1.313–5.143) in the TCGA-SARC testing cohort and 3.938 (P = 0.04687; 95% CI: 1.019–15.217) in the TARGET-OS testing cohort. Therefore, the EILncSig was identified as an independent factor for OS and RFS prediction. Taken together, the results of the training and testing cohorts indicated that the EILncSig scoring model could be an excellent model for predicting the prognosis of sarcoma patients, which may aid in formulating precise therapeutic strategies for patients with sarcoma.
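The reported hazard ratios, confidence intervals and P values can be cross-checked against each other. Assuming the intervals are Wald intervals on the log hazard ratio (the usual output of Cox regression software, though not stated explicitly here), the standard error can be recovered from the interval width and a two-sided P value from the resulting z statistic. The Python sketch below does this for the two reported results; the function name is illustrative.

```python
import math


def wald_from_hr_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover (log HR, standard error, z, two-sided p) from an HR and its 95% CI.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# Values reported in the text for the two testing cohorts.
tcga_beta, tcga_se, tcga_z, tcga_p = wald_from_hr_ci(2.598, 1.313, 5.143)
target_beta, target_se, target_z, target_p = wald_from_hr_ci(3.938, 1.019, 15.217)
```

Under this assumption the recovered P values agree closely with the reported 0.00613 and 0.04687, so the three reported quantities are mutually consistent. The much wider interval for TARGET-OS reflects its larger standard error.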

We further examined the associations between EILncSig scores and multiple tumor characteristics across pan-sarcoma patients. Chibon et al. established a prognostic gene expression signature, the Complexity INdex in SARComas (CINSARC), to improve the grading of sarcoma patients. As shown in Fig. 4E, higher EILncSig scores were found in the CINSARC_C2 group (p = 4.6e-8). In the TCGA-SARC cohort, patients with relapse and patients with metastasis had higher EILncSig scores (p = 0.0078 and 0.00043, Fig. 4F1-2). A congruent result was also found in the integrative clustering (iCluster) molecular subtypes of sarcoma identified by Lazar et al.[16]. The iCluster_C1 group, in which patients have the worst prognosis, possesses higher EILncSig scores, whereas the iCluster_C3 group has the lowest EILncSig scores (p < 2e-16, Fig. 4F3). In addition, we found that EILncSig scores were positively correlated with EMT scores, and the EMT_C2 cluster had higher EILncSig scores in the combined pan-sarcoma dataset (Fig S3A, B), which demonstrated the significant association between EILncSig and EMT molecular phenotype across pan-sarcoma patients.

TME and immune patterns associated with EILncSig in sarcoma

Bagaev et al. developed 29 sets of gene expression signatures describing pan-cancer TME characteristics and applied them in exploring TME patterns in pan-cancer patients. Four TME subtypes (immune-enriched, fibrotic (IE/F); immune-enriched, non-fibrotic (IE); fibrotic (F); and depleted (D)) were defined to demonstrate the role of TME in cancer progression and metastasis. We selected four sarcoma datasets (TCGA-SARC, TARGET-OS, GSE71118, E-TABM-1202) to analyze the characteristics of TME across pan-sarcoma patients. After computing EILncSig scores and assigning patients to high- and low-EILncSig levels within each cohort, all patients were included in the clustering analysis of the TME pattern. We utilized an unsupervised clustering method to assign the pan-sarcoma patients into 4 groups by using robustly standardized GSVA enrichment scores of the 29 functional gene expression signature (F^GES) sets (Supplementary file 5 and Fig S3C). As shown in the heatmap (Fig. 5A), sarcoma patients of distinct F^GES characteristics along with high- and low-EILncSig stratifications were distributed among the four TME patterns. We utilized the t-SNE analysis to demonstrate the separation of sarcoma patients by TME pattern (Fig. 5B). Furthermore, high- and low-EILncSig stratifications and the four TME patterns presented significantly concordant relationships among sarcoma patients (Fig. 5C). Consistent with previous results, the TME-Depleted pattern with the worst prognosis covered 50% of the high-EILncSig group, whereas TME-IE and IE/F patterns representing better prognosis were more enriched in the low-EILncSig group.

Thorsson et al. identified immune subtypes (wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, and TGF-β dominant) to define pan-cancer immune response patterns that impact prognosis and tumor-immune interactions. We collected the immune subtype information of TCGA-SARC samples (five immune subtypes in total) (Supplementary file 5). As shown in Fig. 5D, there is a significant difference in EILncSig scores among the five immune subtypes (p = 0.00034), with extremely low EILncSig scores in the TGF-β dominant immune subtype. Additionally, further analysis revealed that expression levels of five EILncRNAs (WWP1-AS1, AFTPH-DT, LBX2-AS1, MCM3AP-AS1 and MIR155HG) were also significantly different among the five immune subtypes (Fig. 5E and Fig S3D). With respect to TME-infiltrating immune cells estimated by CIBERSORTx (Supplementary file 5 and Fig S3E), CD8⁺ T cells, activated memory CD4⁺ T cells, Tregs, γδ-T cells, monocytes and macrophages (M1 and M2) showed high infiltration in the low-EILncSig group with better prognosis, whereas resting NK cells and dendritic cells (resting and activated) were more abundant in the high-EILncSig group (Fig. 5F). Furthermore, the Spearman correlation analysis showed that the EILncSig score was negatively correlated with CD8⁺ T cells, activated memory CD4⁺ T cells, and activated NK cells, while positively correlated with resting NK cells (Fig. 5G).

The transcriptomic alteration, SNV, and sCNA associated with EILncSig in sarcoma

Given that the EILncSig was derived from lncRNA modulation in the pan-sarcoma cross-talk of EMT molecular and tumor immune characteristics, we further assessed the potential value of EILncSig in capturing transcriptomic and genomic alterations in sarcoma. First, we performed DEG analysis on the 259 samples of the TCGA-SARC dataset via DESeq2, and found that 6621 genes (3384 upregulated and 3237 downregulated) were significantly differentially expressed in the high-EILncSig group (Fig. 6A and Fig S4A). As shown in the heatmap of Fig. 6B, 186 EMT-related genes belonged to the DEG set, among which the majority of mesenchymal marker genes were upregulated in the low-EILncSig group. This result is also consistent with the positive correlation between EILncSig scores and EMT scores in the combined pan-sarcoma microarray dataset. We further used the DESeq2 Wald statistic to build a ranked list for pre-ranked gene set enrichment analysis (GSEA). As shown in Fig. 6C, ridge plots of GSEA revealed that several gene sets, including DNA damage repair, TP53 activity regulation, histone methylation and protein acetylation, were enriched in the high-EILncSig group, whereas tumor immune activity-related gene sets, such as immune response regulation, cytokine production, and interferon and interleukin signaling, were enriched in the low-EILncSig group.

We analyzed the somatic mutation data of samples with matched EILncSig scores from TCGA-SARC, with 98 and 137 patients in the high- and low-EILncSig groups, respectively (Fig S4B, C and Fig. 6D). TP53 was the most frequently mutated gene in both the high- and low-EILncSig groups; however, a higher mutation frequency (47% vs. 32%) was observed in the high-EILncSig group. The mutation frequency of RB1, a well-known tumor suppressor gene, was much higher in the high-EILncSig group (where it ranked 2nd) than in the low-EILncSig group. Another widely studied cancer-related gene, TTN, was also found to be mutated at a relatively higher frequency in the low-EILncSig group. We mapped the specific mutation sites of TP53, RB1 and TTN to their amino acid locations in the high- and low-EILncSig groups (Fig. 6E, F and Fig S4D).
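
One way to check whether a difference in mutation frequency such as 47% vs. 32% exceeds what chance would produce is Fisher's exact test on the 2x2 table of mutated and wild-type counts. The sketch below is illustrative only: the text does not state that this test was used, and the counts (46 of 98 and 44 of 137) are back-calculated from the rounded percentages and are therefore assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated samples in the first row,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Assumed counts: high group 46 mutated / 52 wild-type,
# low group 44 mutated / 93 wild-type (back-calculated, not reported).
p_tp53 = fisher_exact_two_sided(46, 52, 44, 93)
```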

As for the somatic copy number alteration (sCNA), we evaluated its divergence associated with EILncSig by using GISTIC 2.0, which involved 258 samples with matched EILncSig scores in TCGA-SARC (Fig S4E). As shown in Fig. 6G, more copy number deletion events were found in the high-EILncSig group, while no significant difference in amplification was observed. Additionally, there was a significant positive correlation between copy number deletion events and EILncSig scores (R = 0.241, P = 8.99e-05) (Fig S4F and Fig. 6H). As the preceding GSEA showed that DDR-related pathways were activated in the high-EILncSig group, these results suggested that the EILncSig might reflect genome instability in sarcoma. Moreover, we used GISTIC 2.0 to identify recurrent focal sCNA regions. As shown in Fig. 6I, there were multiple obvious amplification peaks in the low-EILncSig group, while amplifications on chromosomes 8, 13 and 17 and deletions on chromosomes 1, 13 and 17 were found with higher absolute G-scores in the high-EILncSig group. Several sCNA peaks were distinctly detected in the high-EILncSig group, including focal amplification peaks covering the well-studied cancer driver gene MYC (8q24.21), the oncogenic genes TFDP1, CUL4A and GAS6 (13q34), and the DNA damage response-related genes TOP3A and ALKBH5 (17p11.2), along with focal deletion peaks including the tumor suppressor gene TP73 (1p36.32) (Supplementary file 6).
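
The relationship between a correlation coefficient, the sample size, and its P value can be sketched with Fisher's z-transform. This is an approximation and an assumption on our part: the text does not state which correlation method or test produced the reported P value, and the function name `correlation_p_value` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a correlation via Fisher's z-transform."""
    # Under the null hypothesis, atanh(r) is roughly normal with
    # standard deviation 1 / sqrt(n - 3).
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Reported coefficient and number of samples with matched scores.
p_approx = correlation_p_value(0.241, 258)
```

The resulting `p_approx` can be compared with the reported P value as a rough consistency check; exact agreement is not expected because the approximation differs from an exact t-based or rank-based test.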

EILncSig as a potential predictor of immunotherapy response

An increasing number of studies focus on identifying robust indicators of immunotherapy response in cancer patients. The predictive efficacy of biomarkers such as the expression of certain immune checkpoint inhibitor (ICI) genes, tumor neoantigen burden (TNB), and microsatellite instability (MSI) has been studied in specific cancer types. The clinical development of cancer immunotherapy and advances in genomic analysis have also validated the important role of the TME in the response to ICB therapy. Considering the association of EILncSig with immune-infiltrating cells and the activation of immune processes, we evaluated the potential capacity of EILncSig as a predictor of immunotherapy response. Previous studies have demonstrated complex cross-talk among tumor immune response, immune infiltration, and expression of ICI genes.

Herein, we first compared the expression of several common ICI genes between patients stratified by EILncSig in the TCGA-SARC dataset. As shown in Fig. 7A and Fig S5, the expression of multiple ICI genes including CTLA-4 and PD-1 was significantly higher in the low-EILncSig group. Considering the globally high level of immune infiltration in the low-EILncSig group, ICI genes that are mainly expressed in immune cells would be expected to be abundantly expressed. However, we found that the expression of PD-L1, LAG-3, SIGLEC6 and IDO2 did not differ between EILncSig groups, and the expression of VTCN1 was even significantly higher in the high-EILncSig group. In addition, VTCN1 expression was positively correlated with the EILncSig scores (Fig. 7B).

Next, we examined the capacity of the EILncSig to predict the ICB therapy response in an independent clinical cohort. Kim et al.’s cohort (GSE176307), a publicly accessible PD1/PD-L1 therapy dataset with RNA-Seq and follow-up data, was used in this study. Patients were stratified into high- and low-EILncSig groups using the same method (Supplementary file 7). The time-dependent ROC curve analysis showed that EILncSig scores could be used to predict patients’ PFS and OS. The Kaplan-Meier survival analysis revealed that patients in the high-EILncSig group had worse OS and PFS after ICB therapy (log-rank P = 0.03753 and 0.01187, Fig. 7C and 7D). Moreover, a lower percentage of high-EILncSig patients achieved complete/partial response (CR/PR) while a higher percentage had stable/progressed disease (SD/PD) compared to the low-EILncSig group (p = 0.018, Fig. 7E). Taken together, low-EILncSig patients showed significant clinical benefits, better therapeutic responses, and markedly prolonged survival after ICB therapy.
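
The log-rank comparison behind such Kaplan-Meier analyses can be sketched as follows. This is a minimal two-group implementation written for illustration, not the authors' code; the function name `logrank_test` and the toy follow-up times are hypothetical.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 for an event, 0 for censoring."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk, A
        n_b = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk, B
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d_b = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Toy data: group A has uniformly shorter event times than group B.
chi2_toy, p_toy = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                               [6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
```

In practice a survival analysis library would normally be used; the explicit loop is shown only to make the observed-versus-expected logic of the test visible.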

Discovery of potential drugs that target EILncSig in sarcoma

Exploring the complex molecular interactions and regulatory mechanisms of tumor immunity is a key direction for improving immunotherapeutic efficacy. However, it is noteworthy that combining immunotherapy with classical chemotherapeutic drugs could be an achievable approach to promote the effectiveness of immunotherapy. Herein, we mined the CMAP database and integratively analyzed large-scale pharmacogenetic data with the molecular characteristics of EILncSig, to discover drugs that may have the potential to convert sarcoma from high-EILncSig into low-EILncSig status (Fig. 8A and Supplementary file 7).

As shown in Fig. 8B, several promising drugs with positive connectivity scores were predicted, such as the topoisomerase I inhibitor irinotecan, the retinoid drug isotretinoin, the Ca²⁺ ionophore ionomycin, and the antimetabolite drug tioguanine. Although these drugs have different molecular targets, an increasing number of recent publications have validated their potential in immune modulation. For example, He et al. developed a PD-L1-targeting immune liposome (P-Lipo) for co-delivering irinotecan and JQ1, which can successfully elicit antitumor immunity in colorectal cancer through inducing immunogenic cell death (ICD) by irinotecan and interfering in the immunosuppressive PD-1/PD-L1 pathway by JQ1[59]. The antitumor immunity or immune-enhancing effect of specific compounds still needs to be further validated in sarcoma, but we surmise that these results may support the development of novel combination strategies of classic drugs with immunotherapy for sarcoma patients, and provide a basis for further experiments and clinical trials.

Sarcoma is a highly heterogeneous malignant tumor, with a highly aggressive clinical phenotype and unfavorable clinical outcomes[60]. Due to the complex molecular profiles and varying clinicopathological characteristics across sarcoma types, only a limited number of patients obtain satisfactory clinical benefits from common therapeutic strategies. Immunotherapy has become a hotspot in cancer research and has taken cancer treatment into a new era. Although immunotherapy for sarcoma has had some successful cases[6], its application prospects and effectiveness are still unclear across heterogeneous sarcomas compared to specific well-studied cancers such as leukemia. Notably, emerging evidence has shown enhanced therapeutic efficacy when immunotherapy is combined with modulation of specific functional targets[61, 62]. To explore the potential application of combined immunotherapy strategies for sarcoma, it is worthwhile to identify biomarkers that function as molecular targets or critical regulators in tumor immunity across sarcoma types.

EMT is a reversible process that may interact with tumor immunity through multiple approaches such as affecting the tumor immune microenvironment. Recent studies have demonstrated the interconnections among EMT-related processes, tumor microenvironment, immune activity, as well as the potential influence on immunotherapy response. Notably, increasing evidence shows that certain sarcomas reside in intermediate EMT/MET-related states, such as the metastable phenotype, which allows tumor cells to switch between epithelial and mesenchymal differentiation[9]. The combined presence of epithelial and mesenchymal features likely plays an indispensable role in the aggressiveness of such sarcomas. To precisely define the regulators of EMT/MET-related processes in sarcomas, we defined two distinct EMT-related molecular subtypes based on a combined EMT signature, identified 26 EILncRNAs, and then constructed a 7-lncRNA signature scoring model (EILncSig) that can stratify sarcoma patients with distinct prognoses, immune microenvironment characteristics, as well as genomic and transcriptomic variations. Although lncRNAs lack protein-coding capability, they represent an important layer of immune system regulation and function as pivotal regulators. Specific lncRNAs of the EILncSig have been reported to be dysregulated and to play important roles in several cancers. LINC01140 was shown to be upregulated in various cancers, such as lung cancer[63]. However, Hu et al. found that the expression of LINC01140 was lower in metastatic sarcoma, and that low LINC01140 expression predicted poor OS and DFS of sarcoma patients[64]. Additionally, LINC01140 expression was shown to be negatively correlated with various EMT factors. MIR155HG, also referred to as B-cell integration cluster, was identified as an oncogene in lymphomas, glioma, and colorectal cancer[65]. A recent study reported that MIR155HG acts as an oncogene regulating EMT in laryngeal squamous cell carcinoma (LSCC), and is associated with prognosis and tumor progression[66]. In the current study, the EILncSig was validated as a robust tool for evaluating the prognosis of patients with sarcoma through examination of multiple independent datasets incorporating various sarcoma types.
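
The general form of a weighted signature score of this kind can be sketched as a weighted sum of lncRNA expression values followed by a cohort-level split. Everything specific in the sketch is hypothetical: the lncRNA labels, the weights, the expression values, and the use of the cohort median as the cutoff are assumptions for illustration and do not reproduce the EILncSig coefficients or the stratification rule of the study.

```python
from statistics import median


def signature_score(expression, weights):
    """Weighted sum of lncRNA expression values (weights are hypothetical)."""
    return sum(weights[name] * expression[name] for name in weights)


def stratify(scores):
    """Split patients into high/low groups at the cohort median (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical two-lncRNA signature and three toy patients.
toy_weights = {"lncA": 0.5, "lncB": -0.3}
toy_expression = {
    "P1": {"lncA": 2.0, "lncB": 1.0},
    "P2": {"lncA": 1.0, "lncB": 3.0},
    "P3": {"lncA": 4.0, "lncB": 0.0},
}
toy_scores = {pid: signature_score(expr, toy_weights)
              for pid, expr in toy_expression.items()}
toy_groups = stratify(toy_scores)
```

A positive weight raises the score as expression increases and a negative weight lowers it, so the sign of each coefficient indicates whether the lncRNA is associated with higher or lower risk in such a model.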

Over the past several decades, accumulating studies have revealed the important roles of TME in sarcoma genesis, as well as in predicting the prognosis of sarcoma patients[67, 68]. An increased understanding of TME patterns in sarcoma is essential for improving patient outcomes and quality of life. Bagaev et al. developed 29 sets of gene expression signatures describing pan-cancer TME characteristics, and defined four TME subtypes to uncover the bidirectional interaction between sarcoma cells and TME[50]. Remarkably, the high-EILncSig group was mainly composed of the TME-Depleted pattern whereas TME-IE and IE/F patterns were more enriched in the low-EILncSig group. Consistently, the EILncSig score was negatively correlated with the infiltration of CD8⁺ T cells, activated memory CD4⁺ T cells, and activated NK cells. It is generally accepted that cytotoxic CD8⁺ T cells, following successful priming, recognize tumor-specific antigens (neoantigens) or tumor-associated antigens and exert anti-tumor function primarily via the release of cytotoxic molecules such as perforin and granzymes[69]. Taken together, our findings indicate that EILncSig is closely associated with TIME characteristics across pan-sarcoma patients.

The EILncSig also reflects the expression alterations of genes involved in multiple vital hallmarks in sarcomas. Based on the GSVA, we found that several pathways involved in proliferation and metabolism were enriched in the high-EILncSig (high-risk) group whereas tumor immune activity-related gene sets were enriched in the low-EILncSig (low-risk) group. We also determined that the somatic mutational profile and sCNA landscape were significantly different between the high- and low-risk groups. The high-risk group had a significantly higher mutational frequency, especially for the well-known tumor suppressor genes TP53 and RB1. Consistent with the GSVA results, copy number deletion events were markedly enriched with increased EILncSig scores, indicating potential crosstalk between EILncSig and the genome instability of sarcoma. The sCNA analysis revealed that the high-risk group had multiple recurrent focal amplification peaks covering genomic regions of MYC (8q24.21), TFDP1, CUL4A, and GAS6 (13q34), along with focal deletion peaks including the tumor suppressor gene TP73 (1p36.32). The c-MYC proto-oncogene plays a crucial role in tumorigenesis-related processes such as proliferation, growth, apoptosis, metabolism, DNA replication, and angiogenesis, and can also induce radio- and chemo-resistance of sarcoma cells by suppressing radiation-induced apoptosis and DNA damage, promoting radiation-induced DNA repair, and transcriptionally regulating ABC transporter family genes[70, 71]. The transcription factor p73 is a structural and functional homolog of TP53 that can mimic and/or substitute for p53 onco-suppressive functions, and it has attracted considerable attention for therapeutic cancer management because it is rarely mutated[72]. Galtsidis et al. demonstrated that p73 regulates a miR-3158-containing network involved in EMT, thereby modulating cell migration in osteosarcoma[73].

Immune checkpoint inhibitor therapy has recently brought substantial advances in clinical care for many cancer types including sarcoma. An early assessment of ICI response by predictive biomarkers is crucial for the selection of patients who are most likely to benefit from ICB therapy. Although ICI genes were expected to be highly expressed in the low-EILncSig group with higher immune infiltration, we found that the expression of PD-L1 and LAG-3 did not differ significantly between EILncSig groups, and the expression of VTCN1 was significantly higher in the high-EILncSig group. These findings suggest that high-EILncSig sarcoma patients may potentially benefit from ICB therapy against PD-L1, LAG-3 and VTCN1. Furthermore, Kim et al.’s cohort (GSE176307)[26] was used to compare the survival distributions of patients stratified by the EILncSig. The low-EILncSig patients showed significant clinical benefits, better therapeutic responses, and markedly prolonged survival after ICB therapy, indicating that the complex interplay between immune infiltration and ICI genes in the TME has an impact on sarcoma patients’ survival. In addition, we identified multiple drugs that may have the potential to improve the immunotherapeutic response, which may support the development of novel chemo-immunotherapy strategies for sarcoma patients. Irinotecan is a first-line chemotherapeutic drug in colorectal cancer, pancreatic cancer, and other solid tumors; it functions as a topoisomerase I inhibitor, thereby inducing double-strand DNA breakage and cell death[74]. Accumulating evidence has recently supported that irinotecan can induce ICD and upregulate tumor-specific antigens, thus triggering an anti-tumor immune response[75, 76]. He et al. and Liu et al. have validated the superior anti-tumor effect and improved survival of chemo-immunotherapy that combines delivery of anti-PD-L1 and irinotecan[59, 77]. However, the immune-enhancing effects of specific drugs combined with immunotherapy warrant further validation in sarcoma.

In summary, we identified lncRNAs in the cross-talk of EMT and tumor immunity across pan-sarcoma types and constructed a lncRNA-based computational model. Our findings provide a comprehensive resource for understanding the functional role of lncRNA-mediated immune regulation in sarcomas. The constructed EILncSig in our study may serve as a robust predictor of prognosis for patients with sarcomas, as well as a potential biomarker of ICI therapy response that facilitates a more accurate selection of sarcoma patients who may benefit from immunotherapy. The present study established a groundwork for developing potential clinical applications of lncRNA-based immunotherapeutic strategies in precision medicine.

EMT: epithelial-to-mesenchymal transition

TME: tumor microenvironment

LncRNA: long non-coding RNA

EILncRNA: EMT and tumor Immune-related lncRNAs

EILncSig: EILncRNA signature-based scoring model

CNV: copy number variation

STS: soft tissue sarcomas

ICI: immune checkpoint inhibitor

CRISPR/Cas: Clustered regularly interspaced short palindromic repeats/CRISPR associated nuclease

UCA1: urothelial carcinoma-associated 1

PD1: programmed cell death 1

LIMIT: LncRNA Inducing IFN-γ, MHC-I, and Immunogenicity of Tumor

MFS: metastasis-free survival

OS: overall survival

GSVA: gene set variation analysis

IL-10: interleukin-10

IFNG: interferon gamma

TIME: tumor immune microenvironment

ROC: receiver operating characteristic

AUC: area under the curve

RFS: relapse-free survival

TME subtypes -IE/F: immune-enriched, fibrotic form

-IE: immune-enriched, non-fibrotic form

-F: fibrotic form

-D: depleted form

F^GES: functional gene expression signatures

sCNA: somatic copy number alteration

TNB: tumor neoantigen burden

MSI: microsatellite instability

CR/PR: complete/partial response

SD/PD: stable/progressed disease

P-Lipo: PD-L1-targeting immune liposome

ICD: immunogenic cell death

PDL-1: PD-ligand-1

GSEA: gene set enrichment analysis

NES: normalized enrichment score

GO: Gene Ontology

KEGG: Kyoto Encyclopedia of Genes and Genomes

MSigDB: Molecular Signatures Database

WGCNA: Weighted gene co-expression network analysis

PCC: Pearson's correlation coefficients

DEG: differentially expressed genes

Availability of data and materials

The accession IDs and web links for publicly available datasets analyzed in this study are described in the Methods section. All software and R packages used in our study are publicly available and denoted in the Methods section. Processed data supporting the findings of the present study are available in the Supplementary files. R scripts for data analysis and visualization are available upon request.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

All authors reviewed and approved the final manuscript for publication.

Ethics approval and consent to participate

The patient cohorts we used were publicly available datasets that were collected with patients’ informed consent.

Funding

This work was supported by grants from the National Natural Science Foundation of China (grant No. 82072978 to JL, No. 82072979 to ZZ), and the Natural Science Foundation of Hubei Province (grant No. 2020CFB861 to JL).

Authors’ contributions

DS and SM contributed equally to this work. DS, SM, FP, BZ, and BH collected data. DS, SM, and FP analyzed data and conducted statistical analysis. All authors contributed to data interpretation. DS, SM, FP, ZZ, ZS, and JL drafted and revised the manuscript. ZZ and JL jointly conceived and supervised the study.

Acknowledgments

The results here are based upon data generated by the TCGA Research Network and the Therapeutically Applicable Research to Generate Effective Treatments. The study reported herein fully satisfies the TCGA and TARGET publication requirements (https://www.cancer.gov/tcga, https://ocg.cancer.gov/programs/target). The authors would like to thank the TCGA, TARGET and GEO developed by National Institutes of Health, and the ArrayExpress developed by the European Bioinformatics Institute.

Anderson WJ, Doyle LA (2021) Updates from the 2020 World Health Organization Classification of Soft Tissue and Bone Tumours. Histopathology 78:644–657
Ferrari A, Dirksen U, Bielack S (2016) Sarcomas of Soft Tissue and Bone. Prog Tumor Res 43:128–141
Damerell V, Pepper MS, Prince S (2021) Molecular mechanisms underpinning sarcomas and implications for current and future therapy. Signal Transduct Target Ther 6:246
Grünewald TG, Alonso M, Avnet S, Banito A, Burdach S, Cidre-Aranaz F, Di Pompo G, Distel M, Dorado-Garcia H, Garcia-Castro J et al (2020) Sarcoma treatment in the era of molecular medicine. EMBO Mol Med 12:e11131
Kasper B (2019) The challenge of finding new therapeutic avenues in soft tissue sarcomas. Clin Sarcoma Res 9:5
Groisberg R, Hong DS, Behrang A, Hess K, Janku F, Piha-Paul S, Naing A, Fu S, Benjamin R, Patel S et al (2017) Characteristics and outcomes of patients with advanced sarcoma enrolled in early phase immunotherapy trials. J Immunother Cancer 5:100
Piano MA, Brunello A, Cappellesso R, Del Bianco P, Mattiolo A, Fritegotto C, Montini B, Zamuner C, Del Fiore P, Rastrelli M et al (2020) Periostin and Epithelial-Mesenchymal Transition Score as Novel Prognostic Markers for Leiomyosarcoma, Myxofibrosarcoma, and Undifferentiated Pleomorphic Sarcoma. Clin Cancer Res 26:2921–2931
Terry S, Savagner P, Ortiz-Cuaran S, Mahjoubi L, Saintigny P, Thiery JP, Chouaib S (2017) New insights into the role of EMT in tumor immune escape. Mol Oncol 11:824–846
Sannino G, Marchetto A, Kirchner T, Grünewald TGP (2017) Epithelial-to-Mesenchymal and Mesenchymal-to-Epithelial Transition in Mesenchymal Tumors: A Paradox in Sarcomas? Cancer Res 77:4556–4561
Kahlert UD, Joseph JV, Kruyt FAE (2017) EMT- and MET-related processes in nonepithelial tumors: importance for disease progression, prognosis, and therapeutic opportunities. Mol Oncol 11:860–877
Wang JY, Yang Y, Ma Y, Wang F, Xue A, Zhu J, Yang H, Chen Q, Chen M, Ye L et al (2020) Potential regulatory role of lncRNA-miRNA-mRNA axis in osteosarcoma. Biomed Pharmacother 121:109627
Min L, Garbutt C, Tu C, Hornicek F, Duan Z (2017) Potentials of Long Noncoding RNAs (LncRNAs) in Sarcoma: From Biomarkers to Therapeutic Targets. Int J Mol Sci 18
Huang D, Chen J, Yang L, Ouyang Q, Li J, Lao L, Zhao J, Liu J, Lu Y, Xing Y et al (2018) NKILA lncRNA promotes tumor immune evasion by sensitizing T cells to activation-induced cell death. Nat Immunol 19:1112–1125
Hu Q, Ye Y, Chan LC, Li Y, Liang K, Lin A, Egranov SD, Zhang Y, Xia W, Gong J et al (2019) Oncogenic lncRNA downregulates cancer cell antigen presentation and intrinsic tumor suppression. Nat Immunol 20:835–851
Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, Sabedot TS, Malta TM, Pagnotta SM, Castiglioni I et al (2016) TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res 44:e71
Network CGAR (2017) Comprehensive and Integrated Genomic Characterization of Adult Soft Tissue Sarcomas. Cell 171:950–965.e28
Thorsson V, Gibbs DL, Brown SD, Wolf D, Bortone DS, Ou Yang TH, Porta-Pardo E, Gao GF, Plaisier CL, Eddy JA et al (2018) The Immune Landscape of Cancer. Immunity 48:812–830.e14
Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, Azov AG, Bennett R, Bhai J et al (2021) Ensembl 2021. Nucleic Acids Res 49:D884–D891
Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550
Pachter L (2011) Models for transcript quantification from RNA-Seq. arXiv preprint arXiv:1104.3889
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M et al (2013) NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res 41:D991–995
Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res 47:W636–W641
Heber S, Sick B (2006) Quality assessment of Affymetrix GeneChip data. Omics 10:358–368
Kauffmann A, Gentleman R, Huber W (2009) arrayQualityMetrics–a bioconductor package for quality assessment of microarray data. Bioinformatics 25:415–416
Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882–883
Rose TL, Weir WH, Mayhew GM, Shibata Y, Eulitt P, Uronis JM, Zhou M, Nielsen M, Smith AB, Woods M et al (2021) Fibroblast growth factor receptor 3 alterations and response to immune checkpoint inhibition in metastatic urothelial cancer: a real world experience. Br J Cancer 125:1251–1260
Busuioc C, Ciocan-Cartita CA, Braicu C, Zanoaga O, Raduly L, Trif M, Muresan MS, Ionescu C, Stefan C, Crivii C et al (2021) Epithelial-Mesenchymal Transition Gene Signature Related to Prognostic in Colon Adenocarcinoma. J Pers Med 11
Tan TZ, Miow QH, Miki Y, Noda T, Mori S, Huang RY, Thiery JP (2014) Epithelial-mesenchymal transition spectrum quantification and its efficacy in deciphering survival and drug responses of cancer patients. EMBO Mol Med 6:1279–1293
Rokavec M, Kaller M, Horst D, Hermeking H (2017) Pan-cancer EMT-signature identifies RBM47 down-regulation during colorectal cancer progression. Sci Rep 7:4687
Kandimalla R, Gao F, Li Y, Huang H, Ke J, Deng X, Zhao L, Zhou S, Goel A, Wang X (2019) RNAMethyPro: a biologically conserved signature of N6-methyladenosine regulators for predicting survival at pan-cancer level. NPJ Precis Oncol 3:13
Koplev S, Lin K, Dohlman AB, Ma'ayan A (2018) Integration of pan-cancer transcriptomics with RPPA proteomics reveals mechanisms of epithelial-mesenchymal transition. PLoS Comput Biol 14:e1005911
Hollern DP, Swiatnicki MR, Andrechek ER (2018) Histological subtypes of mouse mammary tumors reveal conserved relationships to human cancers. PLoS Genet 14:e1007135
Wilkerson MD, Hayes DN (2010) ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26:1572–1573
Xu T, Le TD, Liu L, Su N, Wang R, Sun B, Colaprico A, Bontempi G, Li J (2017) CancerSubtypes: an R/Bioconductor package for molecular cancer subtype identification, validation and visualization. Bioinformatics 33:3131–3133
Mak MP, Tong P, Diao L, Cardnell RJ, Gibbons DL, William WN, Skoulidis F, Parra ER, Rodriguez-Canales J, Wistuba II et al (2016) A Patient-Derived, Pan-Cancer EMT Signature Identifies Global Molecular Alterations and Immune Target Enrichment Following Epithelial-to-Mesenchymal Transition. Clin Cancer Res 22:609–620
Yu G, Wang LG, Han Y, He QY (2012) clusterProfiler: an R package for comparing biological themes among gene clusters. Omics 16:284–287
Hänzelmann S, Castelo R, Guinney J (2013) GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14:7
Consortium GO (2019) The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res 47:D330–D338
Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30
Kutmon M, Riutta A, Nunes N, Hanspers K, Willighagen EL, Bohler A, Mélius J, Waagmeester A, Sinha SR, Miller R et al (2016) WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res 44:D488–494
Jassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, Sidiropoulos K, Cook J, Gillespie M, Haw R et al (2020) The reactome pathway knowledgebase. Nucleic Acids Res 48:D498–D503
Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P (2015) The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 1:417–425
Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, Khodadoust MS, Esfahani MS, Luca BA, Steiner D et al (2019) Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 37:773–782
Aran D, Hu Z, Butte AJ (2017) xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol 18:220
Sturm G, Finotello F, Petitprez F, Zhang JD, Baumbach J, Fridman WH, List M, Aneichyk T (2019) Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology. Bioinformatics 35:i436–i445
Zhang B, Horvath S (2005) A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 4:Article17
Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9:559
Bhattacharya S, Dunn P, Thomas CG, Smith B, Schaefer H, Chen J, Hu Z, Zalocusky KA, Shankar RD, Shen-Orr SS et al (2018) ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data 5:180015
Sun J, Zhang Z, Bao S, Yan C, Hou P, Wu N, Su J, Xu L, Zhou M (2020) Identification of tumor immune infiltration-associated lncRNAs for improving prognosis and immunotherapy response of patients with non-small cell lung cancer. J Immunother Cancer 8
Bagaev A, Kotlov N, Nomie K, Svekolkin V, Gafurov A, Isaeva O, Osokin N, Kozlov I, Frenkel F, Gancharova O et al (2021) Conserved pan-cancer microenvironment subtypes predict response to immunotherapy. Cancer Cell
Van Der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15:3221–3245
Ligges U, Maechler M (2003) scatterplot3d - An R Package for Visualizing Multivariate Data. J Stat Softw 8:1–20
Mayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP (2018) Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res 28:1747–1756
Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G (2011) GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 12:R41
Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP (2006) GenePattern 2.0. Nat Genet 38:500–501
Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, Gould J, Davis JF, Tubelli AA, Asiedu JK et al (2017) A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell 171:1437–1452.e17
Smirnov P, Safikhani Z, El-Hachem N, Wang D, She A, Olsen C, Freeman M, Selby H, Gendoo DM, Grossmann P et al (2016) PharmacoGx: an R package for analysis of large pharmacogenomic datasets. Bioinformatics 32:1244–1246
Wickham H (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York
He ZD, Zhang M, Wang YH, He Y, Wang HR, Chen BF, Tu B, Zhu SQ, Huang YZ (2021) Anti-PD-L1 mediating tumor-targeted codelivery of liposomal irinotecan/JQ1 for chemo-immunotherapy. Acta Pharmacol Sin 42:1516–1523
Pingping B, Yuhong Z, Weiqi L, Chunxiao W, Chunfang W, Yuanjue S, Chenping Z, Jianru X, Jiade L, Lin K et al (2019) Incidence and Mortality of Sarcomas in Shanghai, China, During 2002–2014. Front Oncol 9:662
Zhen S, Lu J, Chen W, Zhao L, Li X (2018) Synergistic Antitumor Effect on Bladder Cancer by Rational Combination of Programmed Cell Death 1 Blockade and CRISPR-Cas9-Mediated Long Non-Coding RNA Urothelial Carcinoma Associated 1 Knockout. Hum Gene Ther 29:1352–1363
Li G, Kryczek I, Nam J, Li X, Li S, Li J, Wei S, Grove S, Vatan L, Zhou J et al (2021) LIMIT is an immunogenic lncRNA in cancer immunity and immunotherapy. Nat Cell Biol 23:526–537
Xia R, Geng G, Yu X, Xu Z, Guo J, Liu H, Li N, Li Z, Li Y, Dai X et al (2021) : LINC01140 promotes the progression and tumor immune escape in lung cancer by sponging multiple microRNAs.J Immunother Cancer9
Hu X, Han W, Lou N (2021) High Levels of LINC01140 Expression Predict a Good Prognosis and Improve Radiotherapy in Sarcoma Patients. Crit Rev Eukaryot Gene Expr 31:9–20
Zhou L, Li J, Liao M, Zhang Q, Yang M (2021) : LncRNA MIR155HG induces M2 macrophage polarization and drug resistance of colorectal cancer cells by regulating ANXA2.Cancer Immunol Immunother
Cui W, Meng W, Zhao L, Cao H, Chi W, Wang B (2019) TGF-β-induced long non-coding RNA MIR155HG promotes the progression and EMT of laryngeal squamous cell carcinoma by regulating the miR-155-5p/SOX10 axis. Int J Oncol 54:2005–2018
Lin Z, Fan Z, Zhang X, Wan J, Liu T (2020) Cellular plasticity and drug resistance in sarcoma. Life Sci 263:118589
Ehnman M, Chaabane W, Haglund F, Tsagkozis P (2019) The Tumor Microenvironment of Pediatric Sarcoma: Mesenchymal Mechanisms Regulating Cell Migration and Metastasis. Curr Oncol Rep 21:90
Zhu N, Hou J (2020) Assessing immune infiltration and the tumor microenvironment for the diagnosis and prognosis of sarcoma. Cancer Cell Int 20:577
Gravina GL, Festuccia C, Popov VM, Di Rocco A, Colapietro A, Sanità P, Monache SD, Musio D, De Felice F, Di Cesare E et al (2016) c-Myc Sustains Transformed Phenotype and Promotes Radioresistance of Embryonal Rhabdomyosarcoma Cell Lines. Radiat Res 185:411–422
Xu BS, Chen HY, Que Y, Xiao W, Zeng MS, Zhang X (2020) ALK(ATI) interacts with c-Myc and promotes cancer stem cell-like properties in sarcoma. Oncogene 39:151–163
Logotheti S, Richter C, Murr N, Spitschak A, Marquardt S, Pützer BM (2021) Mechanisms of Functional Pleiotropy of p73 in Cancer and Beyond. Front Cell Dev Biol 9:737735
Galtsidis S, Logotheti S, Pavlopoulou A, Zampetidis CP, Papachristopoulou G, Scorilas A, Vojtesek B, Gorgoulis V, Zoumpourlis V (2017) Unravelling a p73-regulated network: The role of a novel p73-dependent target, MIR3158, in cancer cell migration and invasiveness. Cancer Lett 388:96–106
Del Rio M, Mollevi C, Bibeau F, Vie N, Selves J, Emile JF, Roger P, Gongora C, Robert J, Tubiana-Mathieu N et al (2017) Molecular subtypes of metastatic colorectal cancer are associated with patient response to irinotecan-based therapies. Eur J Cancer 76:68–75
McKenzie JA, Mbofung RM, Malu S, Zhang M, Ashkin E, Devi S, Williams L, Tieu T, Peng W, Pradeep S et al (2018) The Effect of Topoisomerase I Inhibitors on the Efficacy of T-Cell-Based Cancer Immunotherapy. J Natl Cancer Inst 110:777–786
Frey B, Stache C, Rubner Y, Werthmöller N, Schulz K, Sieber R, Semrau S, Rödel F, Fietkau R, Gaipl US (2012) Combined treatment of human colorectal tumor cell lines with chemotherapeutic agents and ionizing irradiation can in vitro induce tumor cell death forms with immunogenic potential. J Immunotoxicol 9:301–313
Liu X, Jiang J, Liao YP, Tang I, Zheng E, Qiu W, Lin M, Wang X, Ji Y, Mei KC et al (2021) Combination Chemo-Immunotherapy for Pancreatic Cancer Using the Immunogenic Effects of an Irinotecan Silicasome Nanocarrier Plus Anti-PD-1. Adv Sci (Weinh) 8:2002147

Editorial decision: Major Revision, 08 May, 2022
Reviews received at journal: 19 Apr, 2022
Reviewers invited by journal: 17 Apr, 2022
Editor assigned by journal: 11 Apr, 2022
First submitted to journal: 09 Apr, 2022
