Identification of DEGs
We searched the GEO database (https://www.ncbi.nlm.nih.gov/geo/) for TNBC, which returned 4,442 results. After careful reading and comparison, we selected three TNBC-related gene expression profiles (GSE64790, GSE62931 and GSE38959). GSE64790 contains 3 TNBC samples and 3 normal samples, GSE62931 contains 47 TNBC samples and 53 normal samples, and GSE38959 contains 30 TNBC samples and 13 normal samples (Table 1). For each dataset, the differential expression of each gene between TNBC and normal tissue samples was assessed with the GEO2R online analysis tool, using P < 0.05 and |logFC| ≥ 1.5 as the cutoff criteria. GSE62931 yielded 600 DEGs (269 up-regulated and 331 down-regulated genes), and GSE38959 yielded 1,550 DEGs (1,010 up-regulated and 540 down-regulated genes). In GSE64790, a total of 660 DEGs were detected, including 186 up-regulated genes and 374 down-regulated genes. To ensure the reliability of the results, we then used a Venn diagram tool to obtain the intersection of the three DEG lists. This yielded 66 common DEGs, of which 33 were up-regulated and 33 were down-regulated.
Table 1 Statistics of the three microarray datasets selected from the GEO database
Dataset ID | TNBC | Normal | Total
GSE64790 | 3 | 3 | 6
GSE62931 | 47 | 53 | 100
GSE38959 | 30 | 13 | 43
Abbreviations: GEO, Gene Expression Omnibus; TNBC, triple-negative breast cancer.
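The DEG screening and intersection steps described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the `GeneResult` layout, the function names and the toy values are hypothetical, and we assume the intersection was taken separately for up- and down-regulated genes, so that a common DEG has the same direction of change in all three datasets.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """One row of a per-dataset differential expression table (hypothetical layout)."""
    symbol: str
    log_fc: float   # log2 fold change, TNBC vs normal
    p_value: float


def call_degs(results, p_cutoff=0.05, lfc_cutoff=1.5):
    """Return (up, down) gene sets passing P < p_cutoff and |logFC| >= lfc_cutoff."""
    up, down = set(), set()
    for r in results:
        if r.p_value < p_cutoff and abs(r.log_fc) >= lfc_cutoff:
            (up if r.log_fc > 0 else down).add(r.symbol)
    return up, down


def common_degs(*datasets):
    """Intersect DEG calls across datasets; a gene is kept only if it changes
    in the same direction in every dataset (assumed Venn strategy)."""
    calls = [call_degs(d) for d in datasets]
    common_up = set.intersection(*(up for up, _ in calls))
    common_down = set.intersection(*(down for _, down in calls))
    return common_up, common_down


# Toy input with placeholder gene names; these are NOT values from the GEO datasets.
ds1 = [GeneResult("GENE_A", 2.1, 1e-4), GeneResult("GENE_B", -1.8, 0.01), GeneResult("GENE_C", 0.4, 1e-6)]
ds2 = [GeneResult("GENE_A", 1.7, 3e-3), GeneResult("GENE_B", -2.0, 0.04), GeneResult("GENE_C", 1.6, 0.2)]
ds3 = [GeneResult("GENE_A", 3.0, 1e-8), GeneResult("GENE_B", -1.5, 0.02), GeneResult("GENE_C", 2.2, 1e-3)]

up, down = common_degs(ds1, ds2, ds3)
print(sorted(up), sorted(down))  # ['GENE_A'] ['GENE_B']
```

GENE_C is dropped because it fails the |logFC| cutoff in the first dataset, which is the point of intersecting: a gene must pass the criteria in every profile to be retained.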
Functional enrichment analyses
The common DEGs were submitted to the DAVID database for GO and KEGG pathway enrichment analysis. The enriched GO terms were divided into three categories: BP, CC and MF. In the BP category, the DEGs were mainly enriched in mitotic nuclear division and cell proliferation; in the CC category, they were mainly enriched in the nucleus and nucleoplasm; and in the MF category, they were mainly enriched in protein binding. KEGG pathway analysis showed that the DEGs were mainly enriched in progesterone-mediated oocyte maturation, oocyte meiosis and the cell cycle (Table 2).
Table 2 Significantly enriched GO terms and KEGG pathways
Category | Term | Description | Count | P-value
BP term | GO:0007067 | Mitotic nuclear division | 10 | 1.4E-7
BP term | GO:0008283 | Cell proliferation | 10 | 3.6E-6
CC term | GO:0005634 | Nucleus | 32 | 4.3E-4
CC term | GO:0005654 | Nucleoplasm | 20 | 1.4E-3
MF term | GO:0005515 | Protein binding | 42 | 4.8E-3
KEGG pathway | hsa04914 | Progesterone-mediated oocyte maturation | 4 | 3.7E-3
KEGG pathway | hsa04114 | Oocyte meiosis | 4 | 7.3E-3
KEGG pathway | hsa04110 | Cell cycle | 4 | 9.8E-3
Abbreviations: BP, biological process; CC, cellular component; MF, molecular function; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.
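The P-values in Table 2 come from an over-representation test. As an illustration of the underlying idea, the sketch below computes the standard one-sided hypergeometric (Fisher's exact) enrichment p-value and the fold enrichment for a single term. This is not DAVID's exact computation: DAVID reports an EASE score, a modified and more conservative version of Fisher's exact test, and that modification is not reproduced here. The function names and the background size in the usage example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): the probability of seeing at least
    k annotated genes in a list of n genes drawn from a background of N genes,
    K of which carry the annotation."""
    if k <= 0:
        return 1.0
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def fold_enrichment(k, n, K, N):
    """Frequency of the term in the gene list divided by its frequency in the background."""
    return (k / n) / (K / N)


# Hypothetical usage: 10 of 66 list genes annotated to a term that covers
# 200 genes in an assumed background of 20,000 genes.
p = hypergeom_upper_tail(10, 66, 200, 20000)
fe = fold_enrichment(10, 66, 200, 20000)
print(f"p = {p:.3e}, fold enrichment = {fe:.2f}")
```

Only the upper tail is summed because the question is whether the term is over-represented in the DEG list, not merely different from the background.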
PPI network construction and DisGeNET analysis
Metascape analysis showed that the DEGs and their neighboring genes were mainly enriched in cell division, mitotic nuclear division and cell cycle phase transition (Figure 2: A and B). Metascape also applied the MCODE algorithm to cluster the PPI network and identify densely connected subnetworks, i.e., potential protein complexes, which contained five core genes (CCNB2, BIRC5, CENPA, CENPF and SKA1) (Figure 2: C and D). In addition, Metascape provided a DisGeNET analysis. After quality control and association analysis, these DEGs were significantly associated with invasive carcinoma of breast, carcinoma of male breast, malignant neoplasm of male breast and other diseases (Figure 2: E).
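MCODE, as originally described, begins by weighting every vertex of the PPI network: the weight is the highest k-core level (k_max) of the vertex's closed neighborhood multiplied by the density of that k-core, so vertices sitting in tightly interconnected regions score highest. The sketch below illustrates only this vertex-weighting stage. It assumes a loop-free density definition, and it omits the later stages (seeding complexes from high-weight vertices, the node score cutoff, haircut and fluff options) as well as whatever parameters Metascape actually uses; all function names are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_core(adj, nodes, k):
    """k-core of the subgraph induced by `nodes`: repeatedly drop vertices
    with fewer than k neighbors inside the remaining set."""
    core = set(nodes)
    changed = True
    while changed:
        changed = False
        for v in list(core):
            if len(adj[v] & core) < k:
                core.discard(v)
                changed = True
    return core


def density(adj, nodes):
    """Loop-free density: observed edges divided by the |V|(|V|-1)/2 possible edges."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for i in range(n) for j in range(i + 1, n) if nodes[j] in adj[nodes[i]])
    return edges / (n * (n - 1) / 2)


def mcode_vertex_weight(adj, v):
    """k_max times the density of the highest k-core of v's closed neighborhood."""
    hood = adj[v] | {v}
    k_max, best = 0, hood
    k = 1
    while True:
        core = k_core(adj, hood, k)
        if not core:
            break
        k_max, best = k, core
        k += 1
    return k_max * density(adj, best)


# Toy network: a 4-clique A-B-C-D with a pendant vertex E attached to D.
adj = build_adjacency([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
                       ("B", "D"), ("C", "D"), ("D", "E")])
for v in sorted(adj):
    print(v, mcode_vertex_weight(adj, v))
```

In the toy network the clique members all receive weight 3.0, while the pendant vertex E receives 1.0, which is why a clustering step seeded from high-weight vertices tends to recover the clique as a candidate complex.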
ONCOMINE Analysis
We compared the transcription levels of the core genes in cancer samples with those in normal samples using the ONCOMINE database (Figure 3 and Figure 4). The graph shows the number of datasets with statistically significant mRNA over-expression (red) or under-expression (blue) of each target gene, using thresholds of p-value = 1E-3 and fold change = 1.5. ONCOMINE analysis showed that the mRNA expression of BIRC5, CCNB2, CENPA, CENPF and SKA1 was up-regulated in patients with breast cancer. In Curtis’s dataset, BIRC5 was up-regulated in medullary breast carcinoma compared with normal samples (fold change = 6.014, p-value = 9.13E-17). In Turashvili’s dataset, CCNB2 was overexpressed in invasive ductal breast carcinoma (fold change = 4.653, p-value = 6.05E-6). In Curtis’s dataset, CENPA was overexpressed in invasive ductal breast carcinoma compared with normal samples (fold change = 2.183, p-value = 1.27E-115). In the TCGA dataset, the transcription level of CENPF was significantly higher in patients with invasive lobular breast carcinoma than in normal specimens (fold change = 6.980, p-value = 1.31E-21). In Turashvili’s dataset, SKA1 mRNA expression in invasive ductal breast carcinoma showed a fold change of 7.501 (p-value = 2.48E-6).
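To make the screening thresholds concrete, the sketch below checks the over-expression results reported above against a fold change of 1.5 and a p-value of 1E-3. It is an illustration only: the `OncomineResult` structure and `passes_threshold` helper are hypothetical, inclusive bounds are assumed, and ONCOMINE's own filtering (including any gene-rank threshold and the handling of under-expression) is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class OncomineResult:
    """One reported comparison (hypothetical structure)."""
    gene: str
    dataset: str
    cancer_type: str
    fold_change: float  # linear fold change, cancer vs normal
    p_value: float


def passes_threshold(r, fc_cutoff=1.5, p_cutoff=1e-3):
    """Over-expression call: fold change >= fc_cutoff and p-value <= p_cutoff (inclusive bounds assumed)."""
    return r.fold_change >= fc_cutoff and r.p_value <= p_cutoff


# Values as reported in the text above.
reported = [
    OncomineResult("BIRC5", "Curtis", "medullary breast carcinoma", 6.014, 9.13e-17),
    OncomineResult("CCNB2", "Turashvili", "invasive ductal breast carcinoma", 4.653, 6.05e-6),
    OncomineResult("CENPA", "Curtis", "invasive ductal breast carcinoma", 2.183, 1.27e-115),
    OncomineResult("CENPF", "TCGA", "invasive lobular breast carcinoma", 6.980, 1.31e-21),
    OncomineResult("SKA1", "Turashvili", "invasive ductal breast carcinoma", 7.501, 2.48e-6),
]

for r in reported:
    print(f"{r.gene:6s} {r.dataset:11s} FC={r.fold_change:.3f} p={r.p_value:.2e} pass={passes_threshold(r)}")
```

Every reported comparison clears both thresholds by a wide margin; CENPA has the smallest fold change (2.183) but by far the smallest p-value.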
The Kaplan–Meier Plotter Analysis
Kaplan–Meier Plotter analysis showed that five genes (CCNB2, CENPF, SKA1, CENPA and BIRC5) were associated with relapse-free survival (RFS) in TNBC: patients with higher expression of these genes had worse RFS than those with lower expression. Among them, CCNB2 overexpression, which had the lowest log-rank P value, was the most unfavorable prognostic factor for RFS in TNBC patients (HR = 1.98; 95% CI: 1.28–3.06; P = 0.0018; n = 255). At present, the database does not contain enough TNBC cases for overall survival analysis (Figure 5).
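The survival comparison can be illustrated with a minimal Kaplan–Meier estimator and a two-group log-rank test. This is a generic sketch, not the KM Plotter implementation: the function names and toy data are hypothetical, and it does not reproduce KM Plotter's choice of expression cutoff for splitting patients or its hazard ratio calculation. The log-rank statistic is compared with a chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(chi2 / 2)).

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events: 1 = relapse observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n_b = sum(1 for tt, _, g in pooled if g == 1 and tt >= t)
        d_a = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d_b = sum(1 for tt, e, g in pooled if g == 1 and tt == t and e)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical toy data: months to relapse for "high" and "low" expression groups.
high_t, high_e = [5, 8, 12, 20, 24], [1, 1, 1, 0, 1]
low_t, low_e = [15, 30, 36, 40, 48], [1, 0, 1, 0, 0]
print(kaplan_meier(high_t, high_e))
print(logrank_test(high_t, high_e, low_t, low_e))
```

The log-rank test compares whole curves rather than survival at a single time point, which is why it is the usual companion to a Kaplan–Meier plot.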
Immunohistochemistry staining
Immunohistochemical staining showed that CCNB2 protein expression was significantly higher in TNBC tissues than in adjacent tissues. In TNBC tissues, the protein was located in the cytoplasm and showed mainly brown-yellow granular staining.