Identification of DEGs
After making differential genes for 134 sets of samples, 2063 differential genes were obtained according to the screening conditions | logFC | > 2.0 and P < 0.01, of which 897 genes were up-regulated and 1166 genes were down-regulated. The differential genes were displayed in a volcano diagram as in Fig. 1.
Co-expression Module Identification
The co-expression network was constructed for the resulting 2063 differential genes, yielding a total of four modules, excluding the gray module. The different colors represent different modules, as shown in Fig. 2A. Then a correlation heat map of the module and sample traits was drawn, as shown in Fig. 2B, with rows representing modules and columns representing traits, and traits including both HER2 + and normal. According to the correlation heat map, it can be seen that the module with the highest correlation with HER2 + breast cancer is the turquoise module with a correlation of 0.94. The turquoise module is a key co-expression module with a gene count of 760.
Functional Enrichment Analysis
The 760 genes in the key co-expression module obtained in the previous step were subjected to GO and KEGG pathway analysis in order to further understand the functions of these genes, as shown in Fig. 3. In the GO analysis, the BP of the key co-expression modules were focused on extracellular matrix organization,extracellular structure organization and regulation of ion transmembrane transport, CC mainly include collagen-containing extracellular matrix, transporter complex༌apical part of celland DNA packaging complex, and MF is mainly related to extracellular matrix structural constituent, glycosaminoglycan binding, and receptor ligand activity. KEGG pathway analysis of key co-expression modules revealed that these key genes primarily function in pathways including neuroactive ligand − receptor interaction, cytokine − cytokine, receptor interaction, tyrosine metabolism and phenylalanine metabolism.
PPI networks and Hub Genes
A PPI network was constructed for 760 key genes of key co-expression modules and 100 key genes were found using the MCC algorithm of the CytoHubba plugin (Fig. 4). Differential gene analysis was also used to obtain differential genes for the other three subtypes of breast cancer, with the same screening method | logFC | > 2.0 and P < 0.01. The differential gene results were as follows, with basal 2093, lumA 1365, and lumB 2084. The 100 key genes obtained by the MCC algorithm were compared two-by-two with the differential genes of the other three subtypes to identify genes present only in the HER2 subtype and not in the other subtypes, DRD4 UTS2 and GLP1R, respectively, as shown in Fig. 5. DRD4 and UTS2 were up-regulated and GLP1R was down-regulated. To verify the accuracy of the key genes, we plotted their expression values in box plots across the four subtypes, as shown in Fig. 6, where UTS2 is expressed higher in the HER2 + subtype than in the other subtypes. DRD4 was most highly expressed in the lumA subtype, but its logFC value was less than 2, which did not meet the screening criteria for differential genes, as shown in Table 1. The logFC value for DRD4 in the HER2 subtype was 2.11, which exceeded the screening criteria by 2. GLP1R is less expressed in the HER2 subtype than in the other subtypes. It is therefore concluded that DRD4 UTS2 and GLP1R were key genes that distinguish the HER2 + subtype from other subtypes.
Table 1
results of differential expression of DRD4 UTS2 and GLP1R
genesymbol
|
subtype
|
logFC
|
t
|
P.Value
|
adj.P.Val
|
DRD4
|
lumA
|
1.9667676
|
13.53841
|
2.60E-34
|
2.68E-33
|
DRD4
|
lumB
|
1.4850377
|
8.286951
|
1.53E-14
|
5.42E-14
|
DRD4
|
basal
|
1.7474764
|
9.56844
|
7.61E-18
|
3.49E-17
|
DRD4
|
HER2+
|
2.1192462
|
9.945784
|
9.10E-18
|
5.89E-17
|
UTS2
|
lumA
|
1.0425104
|
6.531077
|
2.11E-10
|
5.49E-10
|
UTS2
|
lumB
|
1.4346405
|
7.702268
|
5.68E-13
|
1.82E-12
|
UTS2
|
basal
|
1.7886484
|
7.733012
|
6.77E-13
|
2.21E-12
|
UTS2
|
HER2+
|
2.2715777
|
8.116018
|
2.84E-13
|
1.22E-12
|
GLP1R
|
lumA
|
-1.56017
|
-7.32302
|
1.48E-12
|
4.40E-12
|
GLP1R
|
lumB
|
-1.80133
|
-7.09725
|
2.06E-11
|
5.89E-11
|
GLP1R
|
basal
|
-0.7034848
|
-2.191190
|
0.029700
|
0.04073
|
GLP1R
|
HER2+
|
-2.09328
|
-8.33133
|
8.62E-14
|
3.89E-13
|
Survival analysis of key genes and Validation of Protein Expressions
The PROGgeneV2 database was used to view the effect of UTS2 DRD4 and GLP1R expression on the prognosis of HER2 + patients. The data used for the survival analysis was GSE6130 from the GEO database, and the data was divided into high and low expression groups according to median gene expression to see its impact on survival in HER2 + positive patients. The results showed that patients with low expression of DRD4 and GLP1R had a lower survival time than those with high expression, while UTS2 had a lower survival rate with high expression, as shown in Fig. 7. In addition, in the survival analysis of UTS2, the p-value was 0.0025, which was statistically significant. From the above analysis, it is evident that high expression of UTS2 is significantly associated with low survival in HER2 + patients. To further validate UTS2, the HPA database was used to view its expression in normal breast tissues as well as breast cancer tissues, and the results are shown in Fig. 8. The results showed that the protein expression level of UTS2 was significantly higher in breast cancer than in normal tissues, further validating the accuracy of the results.
Cox Proportional Hazards Regression Model
In the Cox regression model, clinical data were downloaded from the TCGA database for HER2 + patients, where age was the actual age of the patient and tumor grade was represented by 1–4 for T1-T4 grade, respectively. In the expression of UTS2, 0 represented low expression and 1 represented high expression. The Cox regression model was constructed using the data as above and combined with the patient's survival time, and the model results were displayed as a nomogram, as shown in Fig. 9A. By adding up the points corresponding to each index, which corresponds to the total points, we can get the survival rate of 2 years and 4 years for different patients. It can be seen that higher expression of UTS2 corresponds to higher points, which corresponds to lower survival rate. In addition, age and tumor grade are also positively correlated with lower survival rates. To assess the accuracy of the nomogram, we plotted the calibration curves, which were in good agreement for both predictions, as shown in Fig. 9B,C.