Identification of DEGs
Compared with normal tissue,the analysis of GSE41657 identified 4712 DEGs consisting of 2210 downregulated genes and 2502 upregulated genes, and the analysis of GSE74602 identified 2878 DEGs consisting of 1335 downregulated genes and 1543 upregulated genes in cancer tissue(Fig. 1A-1B).
Construction of the colon cancer coexpression network
For WGCNA analyses, all samples were included in the clusters. The soft-power threshold β was determined by the function “sft$powerEstimate”; β = 8 (GSE41657) and β = 20 (GSE74602) were selected for further analysis of colon cancer and normal conditions in the two GEO datasets, respectively (Fig. 1C; Fig. 1F). We set the shear height to 0.25. Then, gene modules were detected based on the TOM matrix. The analysis detected twenty-one modules in GSE41657 and five modules in GSE74602 (Fig. 1D; Fig. 1G). From the heatmap of modules and characters, it was found that the brown module (3783 genes) in GSE41657 and the black module in GSE74602 (1690 genes) had the highest degree of correlation with colon adenocarcinoma (Fig. 1E; Fig. 1H). The subsequent analyses screened the GSE41657_brown module and GSE74602 black module for seed genes.
Screening for co-altered genes
The overlapping differentially expressed genes of the most significant modules (GSE41657_DIFF, GSE41657_BROWN, GSE74602_DIFF and GSE74602_BLACK) were obtained by analysis using an online Venn graphing tool. A total of 439 genes were identified (Fig. 1I).
GO and KEGG enrichment analysis of DEGs
The R packages "clusterProfiler", "org.Hs.eg.db", "enrichplot", and "ggplot2" were used for the functional enrichment analysis of the DEGs. The results indicated that the DEGs were mainly enriched in ‘mitotic nuclear division’ and ’organelle fission’ among BPs; ‘chromosomal region’ and ’kinetochore’ among CCs; and ‘DNA helicase activity’ and ‘ATPase activity’ among MFs. KEGG pathway analysis revealed strong enrichment in ‘cell cycle’, ’RNA transport’, ’oocyte meiosis’, and ’cellular senescence’, among other pathways (Fig. 1J-1M).
Screening prognostic genes
The correlation of gene expression and survival traits were then analysed. A total of 379 patients had complete gene expression and survival information. The overall survival (OS) was analysed for these genes using Kaplan–Meier survival plots. Briefly, the "survival" and "survminer" R packages were used to plot Kaplan–Meier curves. Only 26 (RUVBL1,RNASEH2A,MMP3,RPP40,DUSP18,CCT6A,MORC2,NCL,IL13RA2,MELK,PPRC1,CHEK1,RRP12,EPHB2,TIMP1,MFSD11,KIF15,NUP85,NAT1,TNFAIP8L3,PDCD11,PMM2, TRAP1,CDC25C,E2F3,NMB)gene from 439 differential genes were found to significantly impact survival (p<0.05) (Fig.2).
Screening for unstudied genes
Through the literature search, we found that RPP40 DUSP18, PPRC1, MFSD11 and PDCD11 gene has not yet been studied in the colonic carcinoma.Using HPA database RPP40, DUSP18, PPRC1, MFSD11 and PDCD11 in COAD protein expression levels in the organization, and comparing with the expression level of normal tissue.The results indicate that RPP40,DUSP18,PPRC1 and PDCD11 were significantly increased in tumor tissue compared with normal samples, while MFSD11 was significantly down-regulated in tumor tissue.This is consistent with the differential expression data visualized through the TCGA database (Fig.3).
PPI network and module analysis
A PPI network of the 439 identified DEGs was constructed. Fifteen significantly enriched modules were obtained using Cytoscape software . There were 32 genes in the most critical module——the red module(Fig. 4A).
R packages were also used for the functional enrichment analysis of these key genes. The results indicated that the DEGs were mainly enriched in ‘mitotic nuclear division’, ’organelle fission’, ’microtubule cytoskeleton organization involved in mitosis’, and other functions. KEGG pathway analysis revealed strong enrichment in ‘cell cycle’, ’oocyte meiosis’, ’RNA transport’ and other pathways (Fig. 4B-4C). The functional results of the genes in the most significant module indicated that these genes were mainly enriched in processes associated with the cell cycle, cell proliferation and division.
Among the 32 genes, only NUP85 was associated with prognosis(Fig. 2).Therefore, NUP85 was selected as a key gene for subsequent analysis.
Further analysis of key genes using TCGA database
The expression of NUP85 in 398 COAD cancer tissues and 39 paracancerous tissues in TCGA database was analysed. The results showed that the expression level of NUP85 in colorectal adenocarcinoma was significantly higher than that in adjacent tissues (P < 0.001, Fig. 5A). The expression level of NUP85 in colorectal adenocarcinoma of the same patient was also significantly higher than that in adjacent tissues (P < 0.001, Fig. 5B).
NUP85 expression was further analysed by GSEA to identify activated functions and signalling pathways in colorectal adenocarcinoma. Figures were drawn for the top 10 most important functions and pathways in GO and KEGG. The GO functional enrichment analysis results showed that the NUP85 high expression group was enriched in gene sets such as "DNA repair", "M phase" and "cell cycle phase", while the low expression group was enriched in gene sets such as "basolateral plasma membrane" and "extracellular matrix". KEGG pathway enrichment showed that the samples with high NUP85 expression were enriched in the pathways 'RNA degradation', 'cell cycle', and 'nucleotide excision repair', while the low-expression group was enriched in the pathways 'Focal adhesion' and 'ECM receptor interaction' (Fig. 5C-D).
Based on the GEPIA database analysis of NUP85 and tumor-related macrophage markers, we found that no correlation was detected between NUP85 expression with marker of M1 macrophages(NOS2), CD8+(CD8A,CD8B), CD20+(MS4A1) lymphocytes(P>0.05;Fig. 5E-H), and a significant positive correlation was observed between NUP85 expression and marker of CD4+(CD4) lymphocytes,M2 macrophages(CD163)(P < 0.05;Fig. 5I-J).
Immunohistochemical verification
Immunohistochemical test results showed that the expression of the target gene NUP85 in colorectal cancer tissues was significantly higher than that in normal mucosal tissues (P < 0.05; Table 1; Fig. 5K-L).
According to the NUP85 immunohistochemistry results, colorectal cancer samples were divided into a NUP85+ group and a NUP85- group. The results showed that the expression of CD4 and CD163 in the NUP85- group was higher than that in the NUP85+ group (p<0.05). Spearman correlation analysis showed that the expression of NUP85 was negatively correlated with CD4 positive lymphocytes and M2-type tumour-associated macrophages (TAMs) (p<0.05; Table 2; Fig. 5M-P).
Building a prognostic model
Kaplan–Meier (K–M) tests and univariate Cox analysis identified 7 prognostic differentially expressed genes from 439 differentially expressed genes with statistically significant association with survival, among which MORC2, TIMP1, PPRC1, and E2F3 were high-risk and PMM2, NAT1, and EPHB2 were low-risk (Fig. 6A-B).
After multivariate Cox analysis of the above 7 genes, the TIMP1, PMM2, E2F3 and MORC2 genes were retained, and survival curves were drawn to establish a prognostic prediction model (Fig. 2). The prognostic scoring formula was as follows: PI = 0.193×E2F3 expression level -0.253×PMM2 expression level +0.099×MORC2 expression level +0.003×TIMP1 expression level (Table 3).
A total of 337 patients with complete clinical data were assigned to the low-risk and high-risk groups according to the median risk score calculated by the prognostic scoring formula. The results showed that the 5-year survival rate was approximately 85% in the low-risk group and 45% in the high-risk group (P < 0.01). The 75% OS of the lower-risk group (5.2 years) was significantly longer than that of the higher-risk group (1.3 years) (Fig. 6C). Higher prognostic score (Fig. 6D) was associated with a greater likelihood of death at any timepoint (Fig. 6E); that is, the higher the risk score, the worse the prognosis. Fig. 6F shows the expression levels of each gene in the prognostic model in the low-risk group and the high-risk group.
Cox regression analysis of survival factors in patients with colorectal cancer and multi-indicator ROC curve were performed. Among the independent variables included in Cox regression analysis, colorectal cancer prognostic associated gene model score was a continuous variable, and the remainder were categorical variables. Univariate Cox regression model analysis showed that age, pathological stage, TNM stage and prognosis-related gene prognostic model score were risk factors affecting the overall survival of colorectal cancer patients (P < 0.05, Fig. 6G). Multivariate Cox regression model analysis showed that age and prognostic model score were independent factors influencing the prognosis of patients with colorectal cancer (P < 0.05, Fig. 6H). One-year, three-year and five-year survival rates were analysed by multi-indicator ROC curves, and the area under the curve (AUC) values of the ROC curve were 0.693, 0.743 and 0.808 (null hypothesis: real area = 0.5, P < 0.05, Fig. 6I), suggesting that the COAD prediction model could be accepted.