3.1. WGCNA Identified Clinically Significant Modules
In total, 120 primary glioma samples and 4 non-tumor samples were obtained from the GSE43378 and GSE7696. After data preprocessing and normalization, 4219 DEGs were selected for analysis, including 2340 down-regulated and 1879 up-regulated in glioma samples. Expression profiling was presented in a volcano plot (Figure 1A).
Identified DEGs were submitted to WGCNA, which were assigned to gene co-expression modules after removing outlier samples by cluster dendrogram trees (Figure 1B). The co-expression network clustered to 7 modules of at least 30 genes, corresponding to 7 different colors. Module-containing genes were assigned by their intramodule connectivity and the non-clustering genes were divided into the gray module. TOM reflecting adjacencies or topological overlaps was visualized by heatmap (Figure 1C), the topological overlap depicted the similar commonality between notes interconnected. It can be known from the topological overlap heatmap that more obvious topological overlap appeared in genes within a module than across modules.
As displayed in the relationship between modules and clinical traits (Figure 1D), the yellow module was the most highlighted. It was not only positively correlated with WHO grade (correlation coefficient = 0.4, p = 5e-06) but also negative correlated with survival (correlation coefficient = -0.36, p = 5e-05). It was suggested that the yellow module genes played a role in promoting the development of glioma and leading to a poor prognosis. Subsequently, for further investigating the correlations between the yellow module and clinical traits, the relationships between GS and MM in the module were plotted to verify the significance of yellow module related to WHO grade (Figure 1E) and survival (Figure 1F).
3.2. Screening Collagen Genes
401 genes in the yellow module were submitted to DAVID website to enrich their potential functions. The results showed enriched BP (Supplementary figure 1A) was mainly focused on extracellular matrix (ECM) organization, cell adhesion, collagen fibril organization, angiogenesis, leukocyte migration, and single organismal cell-cell adhesion. MF (Supplementary figure 1B) mainly concentrated on ECM structural constituent, the binding of biomolecules (including protein, platelet−derived growth factor, integrin, and collagen), along with cadherin binding involved in cell-cell adhesion. With regards to CC (Supplementary figure 1C), a variety of enrichments were mainly involved in extracellular exosome, extracellular matrix, focal adhesion, and so on. The results of KEGG enrichment analysis (Supplementary figure 1D) indicated that the tumor-associated pathways in the yellow module were closely associated with ECM-receptor interaction, focal adhesion, protein processing in the endoplasmic reticulum, and PI3K-Akt signaling pathway. Based on these results, we hypothesized that genes correlated with the ECM-receptor interaction signal pathway may play an important role in the glioma progression.
We then imported the weighted co-expression networks of the yellow module to Cytoscape followed by screening the candidate hub genes, which are keynotes in the interactive network. A total of four calculation methods, including Degree, Edge percolated component (EPC), Closeness, and Radiality in cytoHubba application were employed in selecting hub genes, which were the intersection of the top 20 genes determined by these algorithms (Table 1). Eventually, 6 collagen genes (COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, and COL5A2) in a family were identified as the candidate key genes for further verification. And the weighted co-expression network of these genes interacting within the module was visualized (Supplementary figure 1E).
3.3. Higher Expression of 6 Collagen Genes was Negatively Correlated with the Prognosis of Glioma Patients
To test the collagen genes, differential expression analysis among different cancers and glioma grades were executed on a cancer microarray data-mining database: Oncomine[25]. The results showed that the collagen genes were overexpressed in most types of tumors, including central nervous system cancer, breast cancer, head and neck cancer, colorectal cancer, and so on (Supplementary figure 2A). Further, we utilized the Sun Brain dataset[26], a maximal sample size mRNA microarray that could be grouped by glioma grade in histology analysis. The grade plots of collagen genes (Supplementary figure 2B) were completed. The screened collagen genes were almost high-expression in glioma, and the expression level rises with glioma grade increases.
From the UCSC database, level 3 RNA-Seq datasets (TCGA-GBMLGG) and their clinical information were downloaded to investigate the relationship between gene expressions and WHO grades of patients. As showed in Figure 2A, similar results were achieved. Besides, survival analyses were performed on the GEPIA database, which was based on The Cancer Genome Atlas (TCGA) data, to confirm the prognostic value of candidate genes. We found that high expression of collagen genes showed a significantly poor prognosis (Figure 2B). On the CGGA website (mRNAseq_325), collagen genes (COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, and COL5A2) in WHO grades II, III and IV glioma samples were analyzed to explore the expression levels of these genes. Similarly, higher mRNA levels of these collagen genes were also found in higher grade glioma (Supplementary figure 3A-F).
As we all know, IDH wild-type glioma patients present a shorter progression-free survival (PFS) and overall survival than mutant-type patients [27]. Subsequently, we found that glioma patients with IDH mutant-type have lower expression levels of these collagen genes than IDH wild-type patients in TCGA dataset (Figure 3A). Moreover, there werw similar findings in the CGGA dataset (Figure 3B).These results indicated that the expressions of the 6 collagen genes were negatively correlated with the prognosis of glioma patients, and may become an independent prognostic marker.
Moreover, immunohistochemistry (IHC) datasets retrieved from the Human Protein Atlas database were utilized to reveal the protein level of the collagen gene. In total, four collagen genes (COL1A1, COL1A2, COL4A1, and COL4A2) were retrieved (No data found for COL5A2 and COL3A1 was not detected in most of the samples) (Supplementary figure 4A-D).
3.4. The Collagen Gene Expressions were Correlated with Stromal and Immune Cell Infiltration in Glioma
The tumor microenvironment consists of various immune and stromal cells, which has been considered to closely correlate with patient prognosis. The correlations between collagen gene expressions and ESTIMATE scores were examined. Results showed that collagen gene expressions were significantly positively correlated with stromal and immune scores in TCGA dataset (Figure 4A-F). Moreover, collagen genes also showed a significant correlation with stromal and immune scores in GEO and CGGA datasets (Supplementary figure 5A-B).
3.5. The Collagen Gene Expressions were Correlated with Immunosuppressive Properties in Glioma
Stromal and immune cells can secrete many factors that cultivate a chronic inflammatory and immunosuppressive intratumoral atmosphere. We found that the collagen genes were significantly positively correlated with the majority of immunosuppressive and immunosuppressive cell recruitment factors (Figure 5A-D). Moreover, a PPI network of collagen genes and immunosuppressive and immunosuppressive cell recruitment factors showed that they were closely correlated with each other (Figure 5E).
3.6. The Collagen Gene Expressions were Closely Associated with Immune Cell Infiltration in Glioma
We comprehensively explored the correlation between the collagen genes and immune cell infiltration in both LGGs and GBM by using the TIMER database. The results revealed that there was a positive correlation between the expression of COL1A1, COL1A2, COL3A1, COL4A1, and COL4A2, and the infiltration of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils, and dendritic cells in LGGs (p < 0.05) (Figure 6A-E). Except for in CD4+ T cells, the COL5A2 expression was positively correlated with the infiltration of the other five immune cell types (B cells, CD8+ T cells, macrophages, neutrophils, and dendritic cells; all p < 0.05) in LGGs (Figure 6F). Besides, the expression levels of 6 collagen genes were all positively correlated with the infiltration of dendritic cells (all p < 0.05) in GBM (Supplementary figure 6A-F).
3.7. The Collagen Genes were Significantly Involved in the EMT Process of Glioma
FN1, VIM, SNAI2, ACTA2, CTNNB1, TWIST1, and SNAI1 are important EMT biomarkers [22, 28]. As shown in Figure 7A, they were strongly or moderately positively correlated with the collagen genes (COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, and COL5A2) based on the TCGA dataset. Moreover, we conducted a PPI network analysis of collagen genes and EMT related genes to explore the potential interactions among them. As expected, these collagen genes were closely correlated with the EMT related genes (Figure 7B).
3.8. Knockdown of COL5A2 Suppressed Migration and Invasion in Glioma Cells
To further validate the collagen genes were involved in the EMT process, we took COL5A2 for the subsequent analysis. The expression of COL5A2 in SHG44 and A172 cells was higher than normal glial cells by qRT-PCR (Figure 8A). To further confirm the biological function of COL5A2 on glioma progress, COL5A2 was knockdown by using small interfering RNA (siRNA) in SHG44 and A172 cells. After transfected with siCOL5A2, the expression of COL5A2 was low in SHG44 and A172 cells (Figure 8B-C). On account of the EMT program usually accompanied by the changes in cell migration and invasion, then, wound healing and transwell assays were conducted to detect the migratory and invasive ability of glioma cells. The result showed that knockdown of COL5A2 leads to delayed wound healing and decreased invasive ability (Figure 8D-E). These findings indicated that COL5A2 promotes migration and invasion in glioma.