COL5A2 is upregulated in GC tissues and correlates with poor survival in the TCGA and GEO databases.
First, TCGA-STAD data were used to compare the mRNA expression levels of the three major members of the COL5 family between GC and adjacent normal tissues. COL5A1 and COL5A2, but not COL5A3, were up-regulated in GC (P<0.05) (Figure 1A). To evaluate the prognostic value of COL5 family mRNA expression in GC, Kaplan-Meier analysis and the log-rank test were used to assess the relationship between mRNA expression and OS or PFS in GC patients. In patients with high COL5A2 expression, OS and PFS were significantly reduced (P<0.05), whereas COL5A1 showed only a non-significant trend for OS (P=0.12) and PFS (P=0.14) (Figure 1B and C). Analysis by T stage showed that COL5A2 expression was significantly higher in advanced GC than in early GC (Figure 1E). Together, these analyses indicated that high COL5A2 expression is associated with a poor prognosis in GC. Therefore, we chose COL5A2 for further exploration (Figure 1D).
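The log-rank comparison named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name and the toy inputs in the usage note are illustrative assumptions, and the P value relies on the standard chi-square approximation with one degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up times
    events_* : 1 if the event (e.g. death) was observed, 0 if censored
    """
    # Pool the observations, tagging each with its group (0 = a, 1 = b).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group a
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)    # events at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group a.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In use, the two groups would be the high- and low-expression patients, for example `logrank_test(times_high, events_high, times_low, events_low)`, where the four lists are hypothetical inputs holding follow-up times and event indicators.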
To verify the findings in the TCGA database, the GSE62229 and GSE15459 datasets were selected to evaluate the expression and prognostic value of COL5A2. COL5A2 expression was significantly higher in cancer tissues than in adjacent normal tissues (P<0.001) (Figure 2B). Additionally, in both GEO datasets, patients with low COL5A2 expression showed longer OS and PFS (Figure 2A and C).
High COL5A2 expression indicates a poor prognosis in GC tissues
To validate the possible role of COL5A2 in GC progression, its expression pattern was examined in clinical tissue samples from our own patients. A total of 126 paraffin-embedded GC tissues and 60 adjacent normal tissues with complete clinicopathological variables and follow-up information were collected. The COL5A2 protein level, assessed by IHC, was significantly higher in GC tissues than in normal tissues (P<0.001; Figure 3A, 3B). Next, we used RT-qPCR to assess the expression pattern of COL5A2 in 48 pairs of fresh GC specimens and adjacent non-cancerous tissues (Figure 3C); the findings were consistent with the IHC results. Taken together, these results confirmed that COL5A2 is highly expressed in GC tissues.
Next, the prognostic role of COL5A2 was examined in our samples. Based on COL5A2 expression levels, patients with complete follow-up information were divided into a COL5A2 low-expression group (negative or weakly positive expression, n=64) and a COL5A2 high-expression group (moderately or strongly positive expression, n=64). Kaplan-Meier curves confirmed that patients with high COL5A2 expression had a significantly shorter OS than those with low COL5A2 expression (P=0.0085, Figure 3D). Additionally, COL5A2 expression was significantly associated with survival in patients with advanced GC (P=0.018; Figure 3E).
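The grouping and survival estimate described above can be sketched as follows. This is an illustration only: the 0 to 3 staining-score encoding and both function names are assumptions, since the scoring scale is not specified here.

```python
def expression_group(ihc_score):
    """Map a staining score to an expression group.

    Hypothetical encoding (an assumption, not from the source):
    0 = negative, 1 = weakly positive, 2 = moderately positive, 3 = strongly positive.
    """
    return "high" if ihc_score >= 2 else "low"

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: [(time, survival probability)] at each event time.

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / at_risk  # product-limit update
        curve.append((t, survival))
    return curve
```

Computing one curve per group and plotting both as step functions gives the kind of comparison shown in a Kaplan-Meier figure; censored patients leave the risk set without lowering the curve.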
The association between COL5A2 expression and clinicopathological parameters in patients with GC was further evaluated. As shown in Table 1, COL5A2 expression in GC was correlated with Borrmann type (P=0.036), histological type (P=0.013), and T stage (P<0.011). No significant correlation was found between COL5A2 expression and age, sex, tumor location, tumor size, or N stage. These results suggest that COL5A2 expression is associated with the malignant phenotype of GC.
Weighted co-expression network construction and module identification
After quality evaluation and data preprocessing, an expression matrix was built from the 298 GC samples of the GSE62229 dataset. The clinical traits are shown in the heatmap accompanying the clustering dendrogram (Figure 4A). The 5407 genes with variance in the top 25% were selected for subsequent co-expression analysis. To choose the soft threshold, we calculated the network topology for power values from 1 to 20. As shown in Figure 4B, a power of 3 was selected because it was the lowest power at which the scale-free topology fit index reached 0.9. Additionally, the mean connectivity at a power of 3 was consistent with a scale-free network distribution. After merging similar clusters, thirteen modules were identified, each containing genes with similar connection strengths (Figure 4C).
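The two selection rules in this paragraph, keeping the most variable quarter of genes and taking the lowest power whose fit index reaches 0.9, can be sketched as below. The function names, the dictionary-based inputs, and the fit values in the usage note are illustrative assumptions; computing the fit index itself is left to the network analysis software.

```python
from statistics import variance

def top_variance_genes(expression, fraction=0.25):
    """Return the genes whose variance across samples is in the top `fraction`.

    expression : dict mapping gene name -> list of expression values (>= 2 samples)
    """
    ranked = sorted(expression, key=lambda g: variance(expression[g]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

def pick_soft_threshold(fit_by_power, cutoff=0.9):
    """Return the lowest power whose scale-free topology fit index reaches `cutoff`.

    fit_by_power : dict mapping candidate power -> fit index
    Returns None if no candidate power reaches the cutoff.
    """
    for power in sorted(fit_by_power):
        if fit_by_power[power] >= cutoff:
            return power
    return None
```

With made-up fit values such as `{1: 0.20, 2: 0.70, 3: 0.91, 4: 0.95}`, `pick_soft_threshold` returns 3, which mirrors the rule stated in the text rather than reproducing the actual values in Figure 4B.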
Finally, we found that COL5A2 was assigned to the salmon module (Figure 5A), which was significantly correlated with T stage and Lauren classification (Figure 5B; r = 0.32, P=3e-8 and r = 0.31, P=4e-8, respectively). Interestingly, the salmon module was also related to pStage (r = 0.23, P=8e-5) and survival status (r = 0.23, P=9e-5). Additionally, we selected the top 100 genes related to COL5A2 and constructed a network visualization using Cytoscape software (Figure 5C).
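The r values above are correlation coefficients, and a Pearson correlation can be sketched in a few lines. The criterion used to pick the top 100 genes related to COL5A2 is not stated here, so ranking by absolute Pearson correlation in `top_correlated` is an assumption made purely for illustration, as are the function names.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def top_correlated(expression, target, n=100):
    """Rank genes by absolute Pearson correlation with `target` (assumed criterion).

    expression : dict mapping gene name -> list of expression values
    """
    scores = {
        gene: pearson(expression[target], values)
        for gene, values in expression.items()
        if gene != target
    }
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:n]
```

The same `pearson` function applies to a module summary profile and a numerically coded clinical trait; the resulting gene list could then be exported as an edge table for a network viewer.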
Functional annotation and GSEA in the GSE62229 dataset and TCGA database
To explore the biological functions associated with COL5A2, GO enrichment and KEGG pathway analyses were carried out. The top GO terms are shown in Figure 6A. The most enriched terms were, for BP (biological process), extracellular matrix and extracellular structure organization, epithelial cell proliferation, and cell-substrate adhesion; for CC (cellular component), extracellular matrix, endoplasmic reticulum lumen, collagen trimer, and basement membrane; and for MF (molecular function), cell adhesion molecule binding, glycosaminoglycan binding, and growth factor binding. Additionally, these genes were mainly enriched in the PI3K-Akt signaling pathway and focal adhesion, suggesting that the tumor microenvironment plays an important role in metastasis development (Figure 6B).
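Over-representation analyses of this kind are commonly based on a hypergeometric test, sketched below. Which tool or test was used for the GO and KEGG analyses is not stated here, so this is an assumption-laden illustration of the general idea, and the function and parameter names are hypothetical.

```python
from math import comb

def enrichment_p_value(universe, annotated, selected, overlap):
    """One-sided hypergeometric P value, P(X >= overlap).

    universe  : number of background genes
    annotated : background genes annotated to the term or pathway
    selected  : size of the gene list being tested
    overlap   : genes in the list that carry the annotation
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

In practice the P values for all tested terms would also be adjusted for multiple testing before the top terms are reported.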
We performed GSEA of the GSE62229 dataset and the TCGA database, which revealed that COL5A2 was enriched in focal adhesion, ECM-receptor interaction, and regulation of actin cytoskeleton (Supplementary Figure S1). The GSEA results also showed that metastasis samples were significantly enriched in several well-known cancer-related pathways, such as the TGF-β, MAPK, and JAK2 signaling pathways (Figure 7A, B). These results provide clues to the mechanisms underlying metastasis development.