Gene co-expression network of GC
A flow chart of this study is shown in Figure 1. The RNA sequencing data of GC from 375 patients downloaded from TCGA database and the expression values of 56831 genes,were applied to construct the co-expression network. Cluster analysis was performed on these samples using the flashClust package. After outliers were removed, the soft threshold power value was sat to four (Figure 2), to define the adjacency matrix between all genes with power function, according to the standard scale-free network distribution.
Forty-six gene co-expression modules, clustered in size from 31 to 2474 genes, were partitioned by hierarchical clustering and by dynamic tree cutting (Figure 3A). The unique colour identifier for each module is listed in Additional Table 1. Among these modules, three were merged because of their similar MEs, and forty-three modules were identified in total. The gray module represented a gene set that was not assigned to any of the modules. In total, the interaction of the forty-two co-expression modules were analyzed (Figure 3B ).
Table 1
The correlation of three genes (FAM83D, ITGAL and ALDH1B1) expression and clinicpathological features of GC patients
Parameters | FAM83D | ITGAL | ADLH1B1 |
Case | Low expression | High expression | P value | Case | Low expression | High expression | P value | Case | Low expression | High expression | P value |
Age (Mean± SD) | 415 | 65.85±10.67 | 65.31±10.60 | 0.61 | 410 | 65.69±10.84 | 65.55±10.57 | 0.89 | 410 | 65.58±10.62 | 65.65±10.79 | 0.95 |
Sex | | | | | | | | | | | | |
Female | 147 | 74 | 73 | 0.38 | 147 | 60 | 87 | 0.09 | 147 | 79 | 68 | 0.57 |
Male | 268 | 147 | 121 | | 263 | 130 | 133 | | 263 | 149 | 114 | |
Pathological T | | | | | | | | | | | | |
I-II | 110 | 61 | 49 | 0.59 | 107 | 61 | 46 | 0.01 | 107 | 62 | 45 | 0.57 |
III-IV | 305 | 160 | 145 | | 303 | 129 | 174 | | 303 | 166 | 137 | |
Pathological N | | | | | | | | | | | | |
N0 | 123 | 64 | 59 | 0.72 | 121 | 60 | 61 | 0.39 | 121 | 58 | 63 | 0.04 |
N1-3 | 292 | 158 | 134 | | 289 | 130 | 159 | | 289 | 170 | 119 | |
Pathological M | | | | | | | | | | | | |
M0 | 367 | 190 | 177 | 0.10 | 362 | 167 | 195 | 0.82 | 362 | 193 | 169 | 0.01 |
M1-3 | 48 | 31 | 17 | | 48 | 23 | 25 | | 48 | 35 | 13 | |
Pathological stages | | | | | | | | | | | | |
I-II | 186 | 103 | 83 | 0.44 | 184 | 94 | 90 | 0.08 | 184 | 96 | 88 | 0.21 |
III-IV | 229 | 118 | 111 | | 226 | 96 | 130 | | 226 | 132 | 94 | |
Histological type | | | | | | | | | | | | |
Intestinal | 176 | 88 | 88 | 0.61 | 173 | 98 | 75 | <0.001 | 174 | 101 | 73 | 0.32 |
Diffuse | 239 | 133 | 106 | | 236 | 91 | 145 | | 236 | 126 | 110 | |
Histological grade | | | | | | | | | | | | |
G1-G2 | 160 | 84 | 76 | 0.76 | 159 | 104 | 55 | <0.001 | 159 | 92 | 67 | 0.62 |
G3-GX | 255 | 137 | 118 | | 251 | 86 | 165 | | 251 | 136 | 115 | |
Anatomical subdivision | | | | 0.02 | | | | 0.48 | | | | 0.005 |
Cardia/GEJ | 97 | 58 | 39 | | 95 | 50 | 45 | | 95 | 66 | 29 | |
Fundus | 143 | 82 | 61 | | 147 | 62 | 85 | | 147 | 80 | 67 | |
Antrum | 156 | 70 | 86 | | 157 | 73 | 84 | | 157 | 75 | 82 | |
others | 19 | 11 | 8 | | 11 | 5 | 6 | | 11 | 7 | 4 | |
Living status | | | | | | | | | | | | |
Yes | 252 | 147 | 105 | 0.008 | 251 | 121 | 130 | 0.34 | 251 | 140 | 111 | 0.93 |
No | 163 | 74 | 89 | | 159 | 69 | 90 | | 159 | 88 | 71 | |
Abbreviation: GEJ: gastroesophageal junction. |
As shown in the network heatmap plot (Additional Figure. 1), each module represented individual validation to one another. To further explore the co-expression similarity of the whole modules, MEs were quantified and were clustered according to modules’ correlation (Figure. 4A). These 42 modules were classified into four main cluster sets. Within these cluster sets, the big clustering branch included 26 modules that could be subdivided into 12 sub-clusters. The remaining three clusters were subdivided into 16 other modules. The left and right branch of the clustering tree were both also divided into five sub-clusters. This result was consistent with the heatmap plot of the adjacencies (Figure. 4B).
Modules correspond to clinical significance
To investigate the corresponding clinical feature of the module, the Pearson correlations were analyzed between MEs and clinical traits, including: sex, age, clinical TNM stage, histological type, anatomical subdivision of neoplasm, histological grade and pathological stage (Figure 5). Eight modules were positively or negatively correlated with the defined clinical traits. The relative highest association with the module-feature relationship, was between the tan module and the neoplasm of anatomical location (at the gastroesophageal junction, GEJ) in histological region. The second significant association was between turquoise, violet modules, and the histopathological differential features of diffuse-type, histological grade G3, and pathological stage Ia.
A scatter plot of gene significance (GS) versus module membership (MM) were produced with select modules that demonstrated high module-feature associations. (Additional Figure 2). GS and MM were that demonstrated high correlation include the tan, turquoise, purple modules. The black modules had a relatively weaker GS-MM correlation and were consequently excluded from the functional enrichment analysis.
Functional enrichment analysis of genes in meaningful modules
Biological significance of selected modules was assessed using GO term function analysis, specifically: biological process, cellular component and molecular function, and KEGG pathway enrichment analysis. All genes in modules of interest were imported to DAVID software and string-db online analysis.
Light-green module genes were from the nuclear component and involved in RNA binding and poly (A) RNA binding, and were correlated to RNA splicing and RNA processing (Additional Table 2). Enrichment analysis showed that the tan module was involved mainly in epidermis tissue development and cell-cell junction during GC progression. Coding proteins from the tan module genes were also mostly correlated to the composition of extracellular exosome and zona adherent junction. Grey60 module genes were involved in cellular transcription, their coding proteins were the component of nucleus and nucleoplasm.
In modules of histological type, the turquoise, orange and salmon modules underwent enrichment analysis. The steel-blue module was excluded from this analysis. The turquoise module was associated with oxidation-reduction process and cell-cell adhesion, which were enriched in metabolic pathways. The orange module genes were correlated to angiogenesis. The salmon module genes were shown to play a role in exerting regulatory roles in inflammatory and immune responses. The coding proteins of these genes were the component of extracellular exosome and lysosomal lumen, the pathway was therefore involved with lysosome and phagosome.
The histological grade and pathological stage were also assessed in the modules. Here, the violet modules were excluded from enrichment analysis due to insufficient enrichment results. The purple module genes were from cytosolic transduction signal molecules, which were involved in the positive regulation of GTPase activity and protein binding, and were mainly enriched in immune response such as leucocyte migration. The pathway of this module was mainly regulated via NF-κB signaling pathway and human T-lymphotropic virus I (HTLV-I) infection.
Identification of hub genes from meaningful modules
The role of hub genes in modules suggest the biological significance of these modules. High MM value determines which genes were in network center and played essential biological function in whole network. To identify central nodes of these modules, intramodular analysis were conducted in seven modules including grey60, light-green, orange, purple, salmon, tan, and turquoise modules. From this, 293 genes with high intramodular connectivity value were identified as hub genes (Figure 6 and Additional Table 3). Another analytical approach, PPI network analysis, was introduced to analyze module genes. Hub gene criteria was set as the threshold value of confidence >0.4 and connectivity degree of ≥10. A total of 140 genes with high connectivity degree were obtained from six modules analyzed (the grey60, light-green, orange, purple, salmon and tan modules) (Additional Table 3). Turquoise modules were not assessed by String software, the number of genes exceeded the threshold setting of gene capacity. Finally, 47 intersectional hub genes from two analytical methods were collated, which were defined as common hub genes.
Common hub gene upstream regulator analysis
To investigate the upstream regulators of common hub genes, the miRNAs and TFs were predicted with the two databases of TarBase and miRTarBase[18]. High connectivity value ≥10 was set as a criterion to screen for the common hub genes. As shown in Additional Table 4, predicted miRNAs of four modular genes were constructed, while grey60 and tan modules were excluded due to low connectivity of modular genes. To further understand the regulatory mechanism between TFs and common hub genes, TF-gene networks of each module were constructed by an online tool, NetworkAnalyst[18], (Additional Table 5). Finally, 16 common hub genes with high connectivity degree with miRNAs and TFs from different modules, were selected as GC biomarkers.
Identification and validation of common hub genes
To validate the 16 common hub genes of modules further, differential expression and survival analyses were performed by GEPIA online tool and KM plotter database separately. Five common hub genes, ITGAL, CD86, ALDH1B1, FAM83D and HSPB6 had significantly different expression (Figure 7A). Survival analysis showed ITGAL, ALDH1B1 and FAM83D were correlated with better prognostic outcomes for GC patients (Figure 7B). The genes CD86 and HSPB6 were not included in the analysis because their abnormal regulation implied a longer survival time. The clinicopathological parameters of three common hub genes were analyzed in Table 1. Compared to ITGAL and ALDH1B1, high FAM83D expression was only associated with the living status of GC patients (p = 0.002).