3.1 Differential gene expression
The GSE45547, GSE49710, and GSE16237 datasets downloaded from the GEO database were normalized (Figures 1A, 1B) and screened for genes differentially expressed by INSS IV and INSS I NBs. Genes with adjusted P-values <0.05 and|log2FC|> 3 were considered differentially expressed. A total of 62 genes were upregulated and 163 downregulated in NBs compared with adjacent normal tissue, as shown by volcano plots (Figure 1C) and heatmaps (Figure 1D).
3.2 Prognostic regression model
The ability of 53 prognosis-related genes to affect survival was determined by Kaplan-Meier analysis, and a Forest plot was drawn (Fig. 2A). Survival-related genes were further analyzed by lasso regression analysis to reduce interactions between variables (Fig. 2B). The λ value corresponding to the smallest error was selected and the corresponding gene was retained (Fig. 2C). Twelve genes were identified (Fig. 2D), and their ROC curves were plotted (Fig. 2E).
Table 2. Areas under the curve (AUC) of prognosis-related genes
Gene |
AUC
|
STK33
|
0.67
|
EIF2B2
|
0.68
|
PHIP
|
0.32
|
PTBP2
|
0.28
|
RUVBL1
|
0.71
|
POLR2H
|
0.73
|
TIA1
|
0.30
|
PAGE5
|
0.68
|
ANKRD60
|
0.65
|
DYNC1I2
|
0.25
|
SCN3A
|
0.26
|
NREP
|
0.27
|
The results of the lasso regression were integrated into the prognostic function to build and validate a prognostic model. A heat map scatter plot shows the distribution of risk scores, the survival status of each subject, and gene expression patterns (Fig. 3A). During follow-up, most of the patients in the high-risk group died and most of the patients in the low-risk group survived (Fig. 3B). The heatmap showed that six metabolism-related genes were highly expressed in the high-risk group and six metabolism-related genes were highly expressed in the low-risk group (Fig. 3C).
Risk score=1.1580*POLR2H+1.1437*ANKRD60+(-0.8321)* DYNC1I2+0.4468* EIF2B2+ 0.3572* RUVBL1+(-0.3006)*NREP+0.2128* STK33+(-0.2081)* PHIP+0.1755* SCN3A+(-0.0673)* PTBP2+0.0589* PAGE5+(-0.0031)* TIA1.
The differences between the two groups were compared by the Kaplan-Meier method curve. The HR for low vs. high risk was found to be 0.31 (95% CI: 0.17-0.55), with a P value <0.001, showing that the grouping was meaningful (Fig. 3D). The ROC curves of the survival model indicated that the 3-year and 5-year AUCs were 78.6% and 81.7%, respectively (Fig. 3E).
Prognostic Model Validation:
The prognostic model was externally validated using the E-MTAB-8248 dataset from the ArrayExpress database. These patients were successfully divided into high-risk and low-risk groups based on their risk scores (Fig. 4A-D). The HR for low vs high risk was 0.23 (95% CI 0.11-0.48, P <0.001), indicate that the model was valid. The 3-year and 5-year AUCs of the survival model were 72.1% and 74.7%, respectively (Fig. 4E).
Nomogram
Multivariate Cox survival analysis compared the effects of gender, age (<12 months vs. ≥12 months), MYCN amplification (non-amplified vs. amplified), ploidy (diploid vs. hyperdiploid), and MKI (low, medium, high) on 92 patients with complete data in the TARGET-NB dataset were retained. The HRs for the predictive model, ploidy, age, MKI, and COG score were 1.664 (95% CI 1.417-1.955, P <0.001), 0.424 (95% CI 0.23-0.784; P = 0.006), 16.014 (95% CI 2.201-116.519; P = 0.006), 1.771 (95% CI 1.205-2.602; P = 0.004) and 8.643 (95% CI 1.437-51.998; P = 0.018), respectively, for OS (Fig. 5A).
The nomogram containing the risk scores of the metabolic gene signature for OS prediction was found to have a C-index of 0.766, indicating high prediction accuracy (Fig. 5B). A calibration curve showed that the 5-year OS predicted by the nomogram was very close to the actual 5-year OS (Fig. 5C).
Risk score and immune correlation analysis
The correlation between risk score and immune cell infiltration into NBs was assessed by Spearman analysis using data from the TIMER database (http://timer.cistrome.org/) to determine if risk scores were associated with tumor immunity (Fig. 6A-F). Risk score was significantly negatively correlated with NK cell (-0.50) and myeloid dendritic cell (-0.41) infiltration, but was significantly positively correlated with infiltration of common lymphoid progenitor cells (0.37), B cells (0.354), and CD4+ Th1 T cells (0.494).
The abscissa in each panel represents the distribution of risk scores, and the ordinate represents the distribution of immune scores. The density curve on the right represents the trend in the distribution of immune scores, and the upper-density curve represents the trend in the distribution of risk scores. The value at the top represents the p-value of the correlation coefficient.
Table 3. risk_score
|
Pearson correlation coefficient
|
Sig.
|
NK cell_QUANTISEQ
|
-0.50**
|
0
|
Myeloid dendritic cell_QUANTISEQ
|
-0.41**
|
0
|
uncharacterized cell_QUANTISEQ
|
0.42**
|
0
|
Common lymphoid progenitor_XCELL
|
0.37**
|
0
|
B cell plasma_XCELL
|
0.35**
|
0
|
T cell CD4+ Th1_XCELL
|
0.49**
|
0
|
Macrophage M0_CIBERSORT
|
0.25**
|
0.003
|
T cell CD4+ Th2_XCELL
|
0.25**
|
0.003
|
T cell regulatory (Tregs)_QUANTISEQ
|
-0.24**
|
0.004
|
T cell CD4+ (non-regulatory)_QUANTISEQ
|
-0.23**
|
0.006
|
Macrophage M0_CIBERSORT-ABS
|
0.22**
|
0.008
|
Neutrophil_QUANTISEQ
|
0.22**
|
0.01
|
Macrophage M2_CIBERSORT
|
-0.20*
|
0.015
|
T cell CD8+_TIMER
|
-0.20*
|
0.017
|
Macrophage M2_CIBERSORT-ABS
|
-0.20*
|
0.017
|
Mast cell_XCELL
|
-0.18*
|
0.032
|
Tumor immunity
The immune cell profile of each sample in TARGET was assessed by CIBERSORT, with the number of sequences set at 100 (6) (Fig. 7A). The correlation between the 12 genes in the predictive model and the infiltration of immune cells was assessed by creating a correlation heatmap showing the Pearson correlation coefficients of the sample immune cells selected by CIBERSORT and the 12 prognosis-related genes (Fig. 7B).
Because the prognosis of NB patients was related to differences in gene expression levels and none of the 12 prognosis-related genes was highly correlated with immune cells, the protooncogene POLR2H and the tumor suppressor gene DYNC1I2, which account for the highest proportion of risk scores, were included in subsequent analyses. The 142 TARGET-NBL samples were uniformly clustered according to the expression matrix and the samples were split into two clusters. Cluster 1 consisted of 64 patients with low POLR2H and high DYNC1I2 expression, whereas cluster 2 consisted of 78 patients with high POLR2H and low DYNC1I2 expression (Fig. 8A). Because these two genes showed opposite trends in the two clusters, their correlation was assessed, with Spearman correlation analysis showing that POLR2H and DYNC1I2 expression were negatively correlated (-: p = 1.6e-4 r = -0.31) (Fig. 8B).
Kaplan-Meier analysis showed that the survival curves of patients in the two clusters differed significantly (Fig. 8C), with the risk of death being significantly lower in cluster 1 than in cluster 2 (HR = 2.68, P <0.001). A Sankey diagram shows the relationship between clinical data and survival of each patient (Fig. 8D).
Differential gene enrichment analysis
To explore the differentially expressed genes and their functions in the two groups of patients, differential analysis of patients in clusters 1 and 2 was performed using as criteria |log2FC|>1 and adjusted P<0.05. This analysis showed that 52 genes were up-regulated and 20 were down-regulated (Fig. 9A). GO enrichment analysis showed that the main enrichments were in categories that included neurotransmitter transport, regulation of neurotransmitter levels, transport vesicles, synaptic vesicle cycle, vesicle-mediated transport in synapses, response to xenobiotic stimuli, exocytic vesicles, monoamine transport, amine transport, synaptic vesicles, organic hydroxy compound transport, exocytosis, cell junction assembly, signal release, axon development, catecholamine transport, regulation of exocytosis, responses to metal ions, axonogenesis, synapse organization, and metal ion transmembrane transporter activity (Fig. 9B).
Tumor purity
To investigate the role of tumor immunity in NB patients, ESTIMATE was performed to evaluate stromal and immune cell fractions. Cluster 1 had higher tumor purity than cluster 2, whereas cluster 2 had a higher interstitial score (matrix content), immune score (degree of immune cell infiltration), and ESTIMATE score (matrix and immune synthetic marker). The differences were not statistically significant, however, possibly because the sample sizes were small (Fig. 10).
Comparison of immune infiltration
CIBERSORT analysis showed that the proportion of M1 macrophages was higher in cluster 2 than in cluster and that dendritic cells in cluster 2 were dormant (Fig. 11A, Fig. 13D). In comparison, ssGSEA showed that the proportions of CD56bright natural killer cells, MDSCs and plasmacytoid dendritic cells were higher, and the proportion of immature dendritic cells lower, in cluster 2 than in cluster 1 (Fig. 11B, 13E).
GSEA enrichment analysis
Functional differences between the two clusters were assessed by GSEA of all differentially expressed genes (Cluster 2 vs. Cluster 1). Many important pathways found to be enriched in the MSigDB collection (h.all.v7.4.symbols.GMT) were validated in cluster 1 of the E-MTAB-8248 dataset (Fig.12A, Fig. 13F), including genes associated with protein secretion, bile acid metabolism, UV response DN, and hedgehog signaling. Genes enriched in cluster 2 (Fig. 12B, Fig. 13G) included those associated with E2F targeting, the MYC target V2, the G2M checkpoint, and MYC target V1.
Table 4. TARGET-NBLGSEA results
Group1
NAME
|
SIZE
|
NES
|
NOM p-val
|
FDR q-val
|
RANK AT MAX
|
LEADING EDGE
|
HALLMARK PROTEIN SECRETION
|
95
|
1.79
|
0.00
|
0.00
|
6596
|
tags=60%, list=34%, signal=90%
|
HALLMARK BILE ACID METABOLISM
|
112
|
1.71
|
0.00
|
0.01
|
3963
|
tags=35%, list=20%, signal=43%
|
HALLMARK UV RESPONSE DN
|
141
|
1.63
|
0.00
|
0.01
|
3511
|
tags=40%, list=18%, signal=49%
|
HALLMARK MITOTIC SPINDLE
|
198
|
1.61
|
0.00
|
0.01
|
5138
|
tags=37%, list=26%, signal=50%
|
HALLMARK HEDGEHOG SIGNALING
|
36
|
1.60
|
0.01
|
0.01
|
4004
|
tags=47%, list=20%, signal=59%
|
Group2
NAME
|
SIZE
|
NES
|
NOM p-val
|
FDR q-val
|
RANK AT MAX
|
LEADING EDGE
|
HALLMARK TNFA SIGNALING VIA NFKB
|
199
|
-3.42
|
0.00
|
0.00
|
2838
|
tags=53%, list=14%, signal=62%
|
HALLMARK MYC TARGETS V2
|
58
|
-3.02
|
0.00
|
0.00
|
2903
|
tags=57%, list=15%, signal=67%
|
HALLMARK EPITHELIAL MESENCHYMAL TRANSITION
|
198
|
-2.83
|
0.00
|
0.00
|
3451
|
tags=55%, list=18%, signal=66%
|
HALLMARK INFLAMMATORY RESPONSE
|
200
|
-2.47
|
0.00
|
0.00
|
3342
|
tags=44%, list=17%, signal=52%
|
HALLMARK INTERFERON GAMMA RESPONSE
|
198
|
-2.39
|
0.00
|
0.00
|
4668
|
tags=50%, list=24%, signal=65%
|
HALLMARK INTERFERON ALPHA RESPONSE
|
95
|
-2.28
|
0.00
|
0.00
|
3260
|
tags=45%, list=17%, signal=54%
|
HALLMARK ALLOGRAFT REJECTION
|
196
|
-2.18
|
0.00
|
0.00
|
5383
|
tags=48%, list=27%, signal=65%
|
HALLMARK MYC TARGETS V1
|
196
|
-2.17
|
0.00
|
0.00
|
6321
|
tags=50%, list=32%, signal=73%
|
HALLMARK P53 PATHWAY
|
195
|
-2.17
|
0.00
|
0.00
|
2971
|
tags=37%, list=15%, signal=43%
|
HALLMARK HYPOXIA
|
197
|
-2.07
|
0.00
|
0.00
|
2917
|
tags=35%, list=15%, signal=40%
|
HALLMARK ANGIOGENESIS
|
36
|
-2.01
|
0.00
|
0.00
|
3232
|
tags=50%, list=16%, signal=60%
|
HALLMARK IL2 STAT5 SIGNALING
|
198
|
-1.90
|
0.00
|
0.00
|
3967
|
tags=41%, list=20%, signal=51%
|
HALLMARK APOPTOSIS
|
160
|
-1.89
|
0.00
|
0.00
|
3919
|
tags=39%, list=20%, signal=48%
|
HALLMARK IL6 JAK STAT3 SIGNALING
|
87
|
-1.85
|
0.00
|
0.00
|
5668
|
tags=60%, list=29%, signal=84%
|
HALLMARK REACTIVE OXYGEN SPECIES PATHWAY
|
49
|
-1.84
|
0.00
|
0.00
|
5260
|
tags=43%, list=27%, signal=58%
|
HALLMARK DNA REPAIR
|
148
|
-1.72
|
0.00
|
0.00
|
4456
|
tags=36%, list=23%, signal=47%
|
HALLMARK UNFOLDED PROTEIN RESPONSE
|
109
|
-1.70
|
0.00
|
0.00
|
2425
|
tags=25%, list=12%, signal=28%
|
HALLMARK KRAS SIGNALING UP
|
198
|
-1.61
|
0.00
|
0.00
|
2553
|
tags=27%, list=13%, signal=31%
|
HALLMARK UV RESPONSE UP
|
156
|
-1.60
|
0.00
|
0.01
|
3770
|
tags=35%, list=19%, signal=43%
|
HALLMARK MYOGENESIS
|
199
|
-1.50
|
0.00
|
0.01
|
3904
|
tags=30%, list=20%, signal=37%
|
HALLMARK TGF BETA SIGNALING
|
54
|
-1.49
|
0.02
|
0.02
|
3020
|
tags=37%, list=15%, signal=44%
|
HALLMARK MTORC1 SIGNALING
|
196
|
-1.46
|
0.00
|
0.02
|
5551
|
tags=40%, list=28%, signal=55%
|
HALLMARK ESTROGEN RESPONSE EARLY
|
198
|
-1.33
|
0.00
|
0.06
|
3399
|
tags=30%, list=17%, signal=36%
|
HALLMARK ESTROGEN RESPONSE LATE
|
197
|
-1.33
|
0.04
|
0.06
|
3834
|
tags=32%, list=20%, signal=40%
|
HALLMARK GLYCOLYSIS
|
198
|
-1.28
|
0.01
|
0.08
|
2552
|
tags=20%, list=13%, signal=22%
|
HALLMARK COAGULATION
|
138
|
-1.22
|
0.06
|
0.12
|
3132
|
tags=26%, list=16%, signal=31%
|
HALLMARK E2F TARGETS
|
198
|
-1.21
|
0.06
|
0.13
|
5925
|
tags=38%, list=30%, signal=54%
|
HALLMARK NOTCH SIGNALING
|
32
|
-1.15
|
0.23
|
0.19
|
3802
|
tags=34%, list=19%, signal=43%
|
HALLMARK COMPLEMENT
|
200
|
-1.14
|
0.10
|
0.20
|
4563
|
tags=36%, list=23%, signal=46%
|
HALLMARK CHOLESTEROL HOMEOSTASIS
|
73
|
-1.06
|
0.30
|
0.34
|
2636
|
tags=25%, list=13%, signal=28%
|
HALLMARK OXIDATIVE PHOSPHORYLATION
|
185
|
-1.06
|
0.26
|
0.35
|
5513
|
tags=38%, list=28%, signal=52%
|
HALLMARK G2M CHECKPOINT
|
194
|
-1.02
|
0.40
|
0.44
|
4183
|
tags=24%, list=21%, signal=30%
|
E-MTAB-8248 Validation of Differences in Immune Characteristics in the Two Clusters
A total of 223 NB samples from E-MTAB-8248 in the ArrayExpress database were divided into two clusters, in the same manner as the division of the TCGA dataset (Fig. 13A). The distribution of POLR2H and DYNC1I2 expression in the two clusters was similar to that observed in the TCGA dataset. The expression of POLR2H and DYNC1I2 showed a significant negative correlation (R = -0.60, P <0.001; Fig. 13B). Kaplan-Meier analysis ,HR=0.23 P<0.001(Fig. 13C), and analysis by CIBERSORT (Fig. 13D), ssGSEA (Figs. 13D, 13E) and GSEA (Figs. 13F, 13G, Table 5) yielded similar results.
Table 5. E-MTAB-8248GSEA results
Group1
NAME
|
SIZE
|
NES
|
NOM p-val
|
FDR q-val
|
RANK AT MAX
|
LEADING EDGE
|
HALLMARK PROTEIN SECRETION
|
91
|
1.88
|
0.00
|
0.00
|
5166
|
tags=45%, list=26%, signal=61%
|
HALLMARK HEDGEHOG SIGNALING
|
35
|
1.88
|
0.00
|
0.00
|
3785
|
tags=43%, list=19%, signal=53%
|
HALLMARK UV RESPONSE DN
|
133
|
1.60
|
0.00
|
0.04
|
4484
|
tags=37%, list=23%, signal=47%
|
HALLMARK BILE ACID METABOLISM
|
112
|
1.49
|
0.00
|
0.09
|
4341
|
tags=39%, list=22%, signal=50%
|
HALLMARK HEME METABOLISM
|
183
|
1.47
|
0.00
|
0.09
|
4784
|
tags=34%, list=24%, signal=44%
|
HALLMARK APOPTOSIS
|
155
|
1.37
|
0.03
|
0.17
|
5363
|
tags=41%, list=27%, signal=55%
|
HALLMARK FATTY ACID METABOLISM
|
151
|
1.35
|
0.03
|
0.16
|
6111
|
tags=45%, list=31%, signal=65%
|
Group2
NAME
|
SIZE
|
NES
|
NOM p-val
|
FDR q-val
|
RANK AT MAX
|
LEADING EDGE
|
HALLMARK E2F TARGETS
|
179
|
-2.19
|
0.00
|
0.00
|
5256
|
tags=46%, list=26%, signal=62%
|
HALLMARK MYC TARGETS V2
|
52
|
-2.16
|
0.00
|
0.00
|
5509
|
tags=60%, list=28%, signal=82%
|
HALLMARK G2M CHECKPOINT
|
178
|
-2.14
|
0.00
|
0.00
|
5039
|
tags=47%, list=25%, signal=62%
|
HALLMARK MYC TARGETS V1
|
180
|
-1.41
|
0.01
|
0.08
|
5905
|
tags=34%, list=30%, signal=48%
|
Verification of clinical samples
Levels of expression of POLR2H and DYNC1I2 in primary NBs
The levels of expression of POLR2H and DYNC1I2 in tumor tissue of 17 patients with NB were analyzed by RT-qPCR. Based on the expression of POLR2H and DYNC1I2 mRNAs, the patients could be divided into two clusters, with POLR2H expression being higher and DYNC1I2 significantly lower (P <0.001) in group 2 than in group (Fig. 15A, 15C). Kaplan-Meier analysis showed that the risk of death was significantly lower in group 1 than in group 2 (HR = 9.37 P <0.05) (Fig. 15B). A Sankey diagram shows the relationship between clinical features and prognosis in the 17 patients with NB (Fig. 15D).