Analysis of differential genes. We downloaded the gene expression profiles of GSE29431 from the GEO database; the dataset comprises two groups (54 breast cancer patients and 12 healthy controls). We identified 1174 differentially expressed genes in this dataset, of which 201 were up-regulated and 973 were down-regulated. The top 50 differentially expressed genes are shown as a heat map (Figure 1), and their differential expression test statistics are listed in Table 1. The five most significantly different genes were HEPN1, GPD1, C14orf180, TUSC5 and PLIN4.
Table 1. Analysis of differential expression of genes (top 50) in breast cancer patients and controls
ID | logFC | t | P.Value | adj.P.Val
HEPN1 | -3.563188222 | -31.20950032 | 1.55E-41 | 3.37E-37
GPD1 | -2.713780204 | -23.08382717 | 1.56E-33 | 1.66E-29
C14orf180 | -3.374748954 | -22.93544532 | 2.29E-33 | 1.66E-29
TUSC5 | -3.454638472 | -22.37055701 | 1.00E-32 | 5.45E-29
PLIN4 | -5.182630648 | -22.28482354 | 1.26E-32 | 5.47E-29
CA4 | -3.299847481 | -22.13471403 | 1.87E-32 | 6.79E-29
ITGA7 | -3.314514472 | -21.83632669 | 4.16E-32 | 1.29E-28
S100B | -2.331809407 | -21.43467393 | 1.23E-31 | 3.35E-28
KANK3 | -3.211710907 | -20.69160424 | 9.58E-31 | 2.32E-27
PPP1R1A | -3.852232407 | -20.36115288 | 2.43E-30 | 5.28E-27
LOC101930114 | -3.580563389 | -19.22917812 | 6.39E-29 | 1.19E-25
LGALS12 | -3.334919046 | -19.21897031 | 6.58E-29 | 1.19E-25
TIMP4 | -5.414165648 | -18.93437471 | 1.53E-28 | 2.56E-25
NPR1 | -2.428254537 | -18.77385175 | 2.47E-28 | 3.84E-25
MRAP | -2.241332148 | -18.228354 | 1.29E-27 | 1.87E-24
GLYAT | -3.212384722 | -18.00599469 | 2.55E-27 | 3.47E-24
LOC101926960 | -3.549222685 | -17.87255737 | 3.85E-27 | 4.93E-24
LVRN | -3.797365787 | -17.74402506 | 5.75E-27 | 6.94E-24
PDE2A | -3.161842944 | -17.43208876 | 1.53E-26 | 1.75E-23
HSPB7 | -1.831890639 | -17.34736685 | 1.99E-26 | 2.17E-23
RBP4 | -4.290525972 | -17.1636739 | 3.57E-26 | 3.70E-23
CCDC69 | -3.058627231 | -16.50641612 | 2.96E-25 | 2.93E-22
BHMT2 | -2.185756444 | -16.36903537 | 4.64E-25 | 4.39E-22
ITIH5 | -3.955621991 | -16.34871543 | 4.96E-25 | 4.50E-22
LOC284825 | -2.863705259 | -16.27938304 | 6.23E-25 | 5.42E-22
TNMD | -4.397284435 | -16.06722762 | 1.25E-24 | 1.05E-21
CRYAB | -3.924310676 | -16.04821999 | 1.34E-24 | 1.06E-21
DGAT2 | -4.714381435 | -16.04294971 | 1.36E-24 | 1.06E-21
ATOH8 | -1.282317713 | -16.01662954 | 1.48E-24 | 1.11E-21
SLC19A3 | -4.344914898 | -15.98273405 | 1.66E-24 | 1.20E-21
PPARG | -4.184964389 | -15.92893918 | 1.99E-24 | 1.39E-21
FHL1 | -3.669314611 | -15.91549098 | 2.08E-24 | 1.41E-21
DEFB132 | -4.285100028 | -15.85294922 | 2.56E-24 | 1.69E-21
SLC7A10 | -2.34257113 | -15.67744214 | 4.61E-24 | 2.95E-21
GPIHBP1 | -3.056934028 | -15.63768802 | 5.26E-24 | 3.27E-21
GPR146 | -3.201143694 | -15.5722904 | 6.56E-24 | 3.97E-21
SGCG | -4.21236887 | -15.42520563 | 1.08E-23 | 6.35E-21
ALDH1L1 | -1.489132056 | -15.37155843 | 1.30E-23 | 7.42E-21
ANGPTL8 | -2.107141019 | -15.30395015 | 1.63E-23 | 9.09E-21
PEAR1 | -2.284923824 | -15.25421788 | 1.93E-23 | 1.05E-20
AIFM2 | -2.762923602 | -15.07341198 | 3.59E-23 | 1.90E-20
COPG2IT1 | -3.603184907 | -15.0071905 | 4.51E-23 | 2.33E-20
CIDEA | -1.822927454 | -14.97614279 | 5.02E-23 | 2.54E-20
SYN2 | -1.466667389 | -14.94006728 | 5.68E-23 | 2.81E-20
ACSM5 | -2.329249593 | -14.8459675 | 7.87E-23 | 3.80E-20
KLB | -4.562398463 | -14.78047133 | 9.88E-23 | 4.67E-20
MYOM1 | -3.097678028 | -14.70083207 | 1.30E-22 | 6.03E-20
LOC102723493 | -1.891032102 | -14.69012763 | 1.35E-22 | 6.13E-20
TMEM37 | -1.741200139 | -14.60747178 | 1.80E-22 | 8.01E-20
PCK1 | -4.59206037 | -14.3630843 | 4.26E-22 | 1.85E-19
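The adj.P.Val column in Table 1 is smaller than a Bonferroni correction would give and is tied for GPD1 and C14orf180, which is what a Benjamini-Hochberg (BH) false discovery rate adjustment would produce. The text does not state which correction was applied, so BH is an assumption here. The following Python sketch shows how such adjusted values are computed; the function name `benjamini_hochberg` and the toy p-values are illustrative and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Visit p-values from largest to smallest so a running minimum
    # can enforce monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = m - offset  # 1-based rank of this p-value in ascending order
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (hypothetical, for illustration only).
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"P.Value={p:.3g}  adj.P.Val={q:.3g}")
```

Under this assumption, the ratio between the adjusted and raw P value of the top-ranked gene (3.37E-37 / 1.55E-41, about 2.2 × 10^4) would correspond to the number of features tested; this is an inference from Table 1 and is not reported in the text.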
GO enrichment analysis of differentially expressed genes. GO analysis showed that the differentially expressed genes were enriched in 56 GO subsets. The most significantly enriched subsets in the three categories were extracellular structure organization (biological process), extracellular matrix (cellular component), and cofactor binding (molecular function) (Figure 2). The top 20 enriched GO subsets are shown in Table 2.
Figure 2. GO enrichment analysis. “Count” is the number of enriched genes, and “p. value.adjust” is the corrected P value. (a) Biological processes; (b) cellular components; (c) molecular functions.
Table 2. GO enrichment analysis (top 20)
Term | Count | P-Value | Category
extracellular matrix | 86 | 3.16E-15 | CC
collagen-containing extracellular matrix | 76 | 1.83E-14 | CC
extracellular structure organization | 70 | 2.66E-11 | BP
multicellular organismal homeostasis | 68 | 9.65E-07 | BP
urogenital system development | 64 | 7.54E-12 | BP
regulation of lipid metabolic process | 63 | 1.44E-07 | BP
regulation of vasculature development | 63 | 1.03E-06 | BP
renal system development | 61 | 3.83E-12 | BP
regulation of angiogenesis | 58 | 1.54E-06 | BP
cell-cell junction | 58 | 1.33E-05 | CC
extracellular matrix organization | 57 | 5.23E-08 | BP
adherens junction | 57 | 0.0016235 | CC
response to acid chemical | 56 | 8.08E-08 | BP
kidney development | 54 | 8.68E-10 | BP
fatty acid metabolic process | 54 | 1.03E-06 | BP
cofactor binding | 54 | 0.002207086 | MF
actin cytoskeleton | 51 | 0.0016235 | CC
cell adhesion molecule binding | 51 | 0.014555902 | MF
apical part of cell | 45 | 0.0016235 | CC
organic acid catabolic process | 42 | 8.24E-06 | BP
“Term” is the name of the GO subset, “Count” is the number of enriched genes, “P-Value” is the P value, and “Category” is the GO category to which the subset belongs (BP, biological process; CC, cellular component; MF, molecular function).
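Over-representation analyses of this kind are commonly based on a one-sided hypergeometric test: given a background of N genes of which K are annotated to a term, it asks how likely it is that a list of n genes contains at least k annotated genes by chance. The text does not name the enrichment tool or test, so the following Python sketch is only an illustration of that general technique; `enrichment_p_value` and all numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: genes in the list annotated to the term ("Count")
    n: size of the gene list
    K: genes in the background annotated to the term
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example (hypothetical numbers): 10 background genes, 4 annotated,
# a list of 3 genes of which 2 are annotated.
p = enrichment_p_value(k=2, n=3, K=4, N=10)
print(f"enrichment P value: {p:.4f}")
```

In practice the raw P values of all tested terms would then be corrected for multiple testing before terms are reported.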
KEGG pathway enrichment analysis of differentially expressed genes. The differentially expressed genes were significantly enriched in pathways such as the PI3K-Akt signaling pathway, focal adhesion, and proteoglycans in cancer (Table 3). The significantly enriched KEGG pathways were partially visualized in R (Figure 3).
Table 3. KEGG pathway enrichment analysis of differentially expressed genes
Term | Count | P-Value
PI3K-Akt signaling pathway | 46 | 0.008276276
Focal adhesion | 31 | 0.006678947
Proteoglycans in cancer | 31 | 0.007120448
ECM-receptor interaction | 25 | 1.70E-06
PPAR signaling pathway | 22 | 4.86E-06
Carbon metabolism | 22 | 0.005118641
AMPK signaling pathway | 21 | 0.008276276
Relaxin signaling pathway | 21 | 0.016820746
Insulin resistance | 19 | 0.012446009
AGE-RAGE signaling pathway in diabetic complications | 18 | 0.012446009
Regulation of lipolysis in adipocytes | 16 | 0.000392322
Glycerolipid metabolism | 14 | 0.006678947
Pyruvate metabolism | 13 | 0.002854637
Malaria | 12 | 0.008276276
Propanoate metabolism | 10 | 0.006678947
Fatty acid degradation | 10 | 0.025282878
“Term” is the name of the KEGG pathway, “Count” is the number of genes, and “P-Value” is the P value
Protein-protein interaction analysis. The 1174 differentially expressed genes were imported into the STRING database to analyze protein-protein interactions (PPIs) and construct a PPI network (Figure 4). The lines between nodes represent interactions between proteins.
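A PPI network of this kind can be represented as a list of protein pairs, one per line in the figure. As a minimal sketch, assuming the interactions have been exported as such an edge list, the Python code below counts the number of distinct interaction partners (the degree) of each node, a common first step for spotting highly connected proteins. The function `node_degrees` and the placeholder protein names are hypothetical; no interaction shown here is taken from the study.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners for every protein in an edge list."""
    # Treat edges as undirected, drop self-loops, and collapse duplicates.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degrees = Counter()
    for edge in unique_edges:
        for protein in edge:
            degrees[protein] += 1
    return degrees


# Placeholder edge list (hypothetical proteins P1..P4).
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P1"), ("P1", "P4"), ("P3", "P4")]
degrees = node_degrees(edges)
top_protein, top_degree = degrees.most_common(1)[0]
print(f"most connected node: {top_protein} ({top_degree} partners)")
```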
Survival analysis. Using the GEPIA online analysis website, we performed a survival analysis of the five most significantly different genes, HEPN1, GPD1, C14orf180, TUSC5 and PLIN4 (Figure 5). In the figure, the abscissa represents survival time and the ordinate represents the overall survival rate; red indicates overall survival when the gene is highly expressed, and blue indicates overall survival when gene expression is low. Low expression of HEPN1 was associated with significantly lower overall survival in breast cancer patients (P<0.05), whereas the expression of the other four genes had no significant effect on overall survival. In other words, the higher the expression of HEPN1, the better the prognosis of the patients. We therefore speculate that low expression of HEPN1 may play an important role in the development and prognosis of breast cancer, and that HEPN1 may be a potential biomarker for predicting the prognosis of breast cancer patients.
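Overall survival curves such as those in Figure 5 are typically Kaplan-Meier estimates, computed separately for the high-expression and low-expression groups. GEPIA performs this calculation itself; the Python sketch below only illustrates how one such curve is estimated from follow-up times and event indicators. The function `kaplan_meier` and the toy data are hypothetical, and the significance test that compares the two groups is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time of each patient
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


# Toy follow-up data for one expression group (hypothetical).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"time={t}  overall survival={s:.2f}")
```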
Validation of differential gene expression levels. We compared the expression levels of the five differentially expressed genes in normal and breast cancer tissues using boxplots (Figure 6). The expression levels of HEPN1, GPD1, C14orf180, TUSC5 and PLIN4 in breast cancer tissues were significantly lower than those in normal tissues. These five genes are therefore candidate biomarkers of breast cancer.