Characterisation of molecular subtypes in MYCN non-amplified neuroblastomas.
A total of 3,799 samples from 18 datasets were identified in GEO (Gene Expression Omnibus) and ArrayExpress, in which 714 cases are with MYCN gene amplification (MYCN-AMP) and 3,085 cases MYCN non-amplified (MYCN-normal) (Fig. 1a; Supplementary Table 1). Following removal of batch effects (Supplementary Fig. 1a), two clear clusters corresponding to MYCN-AMP and MYCN-normal neuroblastomas, respectively, were visualised using principal component analysis (PCA) (Supplementary Fig. 1b). Samples in the MYCN-normal group (n = 3,085) were further randomly divided into a train and a test group with a 7:3 ratio, containing 2,160 and 925 cases, respectively.
In an unbiased attempt to identify subtypes within MYCN non-amplified neuroblastomas, we applied consensus clustering to both train and test groups based on 5,792 variable genes (top 50% median absolute deviation; Supplementary Table 2). As determined by the relative area under the cumulative distribution function and cluster-consensus scores, the optimal number of distinct clusters was 3 (Fig. 1b; Supplementary Fig. 1c), which was further confirmed in another 3 independent datasets (TARGET microarray, TARGET RNA-seq and GSE49711) (Supplementary Fig. 1d). To ensure precise clinical and molecular characterisations of subgroups, samples with silhouette width values less than 0 were labeled as not defined17 and were excluded from the further analysis (Supplementary Fig. 1e; grey dots in Fig. 1c). In total, within the train group, subgroup 1 (green), 2 (blue) and 3 (purple) accounts for 60%, 18% and 22%, respectively (Fig. 1c). Cross-cohort analysis using an unsupervised method subMap (https://www.genepattern.org/modules) confirmed the robustness of this classification (Supplementary Fig. 2a; false discovery rate, FDR < 0.05).
Further clinical characterisation of these subtypes identified key distinguishing features. Patients within subgroup 2 were frequently observed in the advanced neuroblastomas according to the International Neuroblastoma Staging System (INSS) and in those defined as "high risk"7 (Fig. 2a and b; Supplementary Fig. 2b). We then analysed their overall survival together with MYCN-AMP cases. Patients with MYCN amplified had the worst prognosis (Fig. 2c and d; Supplementary Fig. 2c). Importantly, there was a high degree of variability for overall survival among MYCN non-amplified cases, in which subgroup 2 was associated with a poor prognosis, followed by subgroup 3; while patients within subgroup 1 had the most favorable outcomes. These observations were consistent in both train and test cohorts, and further 3 independent datasets (TARGET microarray, TARGET RNA-seq, and GSE49711) (Fig. 2c and d; Supplementary Fig. 2c). In addition, the molecular subtype classification was a strong independent predictor of mortality including in multivariate analysis with the risk classification that uses commonly measured clinical variables to predict mortality in neuroblastomas7. Using subgroup 1 as a reference, the hazard ratio (HR) and 95% confidence interval (CI) for subgroup 2 and 3 were 24.4 (5 ~ 120) and 9.7 (2 ~ 46), respectively (Fig. 2e). Similar results were obtained using univariate or multivariate cox regression analysis with age and INSS stages in MYCN non-amplified neuroblastomas (Supplementary Table 3). Impressively, the molecular subtype classification alone outperformed the risk classification in general (Fig. 2f) as well as INSS stages (Supplementary Fig. 2d).
Overall, subgroup 2 and subgroup 3 (to a lesser extent) were associated with poor survival in MYCN non-amplified neuroblastomas, suggesting fundamentally different mechanisms leading to an advanced disease.
Defining molecular features of 3 subtypes in MYCN non-amplified neuroblastomas.
Using the same 5,792 variable genes described above (Supplementary Table 2), we observed clear distinctions among these 3 subtypes in MYCN non-amplified neuroblastomas (Fig. 3a; Supplementary Tables 4). Intriguingly, subgroup 2 showed a similar signature to MYCN-AMP cases (Fig. 3a). This was in consistence with the Gene Set Enrichment Analysis (GSEA), showing HALLMARK_MYC_TARGETS_V1 and V2 significantly enriched in subgroup 2 (Fig. 3b; Supplementary Table 5; FDR (false discovery rate) < 0.001 and FDR = 0.009, respectively). In contrast, subgroup 3 exhibited an "inflamed" phenotype, with high expression of genes related to IL2_STAT5_SIGNALING, IL6_JAK_STAT3_SIGNALING, INTERFERON_ALPHA_RESPONSE and INTERFERON _GAMMA_RESPONSE (Fig. 3b; Supplementary Table 5; all FDR values less than 0.05). None of these pathways were enriched in subgroup 1 (Fig. 3b).
The above analysis was extended using weighted gene co-expression network analysis (WGCNA)18. Six molecular modules were identified (Supplementary Fig. 3; Supplementary Table 6) and were further used to construct a protein-protein network consisting of 1,699 gene and 4,197 edges (Fig. 3c; confidence score > 0.7). Molecular module MEturquoise, which was significantly correlated with subgroup 2 (Fig. 3d), was enriched for "Mitotic roles of pole-like kinase", "NER (nucleotide excision repair) pathway", "Cell cycle control of chromosomal replication" and "eNOS (endothelial nitric oxide synthase) signaling". In subgroup 3, there were 2 molecular modules, MEblue and MEbrown highly involved (Fig. 3d; Supplementary Table 6). Molecular module MEblue was enriched for signaling, such as "CDK5 (cyclin-dependent-like kinase 5)", "ILK (integrin-linked protein kinase)", "Neuregulin" and "HIF1α (hypoxia inducible factor 1α)"' whereas "PD-1, PD-L1 cancer immunotherapy", "IL-8 signaling", "Natural killer cell signaling pathway", "Neuroinflammation", "Regulation of EMT (epithelial-mesenchymal transition)" and "Dendritic cell maturation" were significantly enriched in molecular module MEbrown.
Subgroup 2 shows a " MYCN " signature, potentially induced by Aurora Kinase A (AURKA) overexpression.
Our above analysis suggested that mechanism other than gene amplification induces N-MYC activity in subgroup 2. Indeed, the mRNA level of MYCN in subgroup 2 was significantly lower than cases within MYCN-AMP group (Fig. 4a; Supplementary Table 4; p < 0.0001). To evaluate N-MYC activity in neuroblastoma samples, a total of 87 genes upregulated by N-MYC were selected to classify its activity19. The MYCN-score for each sample was calculated using single-sample gene set enrichment analysis (ssGSEA) based on this 87-gene expression signature. MYCN scores in subgroup 2 were significantly higher than those in subgroups 1 and 3, and were comparable to those in MYCN-AMP group, although slightly lower (Fig. 4b). Moreover, the MYCN score was an independent predictor of mortality including in multivariate analysis with the risk classification (Fig. 4c; HR: 3.8; p = 0.008).
To investigate the potential mechanism that leads to higher MYCN scores in subgroup 2, correlation analysis coupled with protein protein interactions (PPI) network construction was performed (Fig. 4d; Supplementary Table 7). Among the candidate genes, AURKA (Aurora kinase A) was identified to interact with MYCN. AURKA, a serine/threonine kinase regulating the process of mitosis20, was previously demonstrated to regulate N-MYC protein stability21. AURKA was expressed at significantly higher levels in subgroup 2 when compared to the other 2 subgroups and its levels were even slightly higher than those in the MYCN-AMP group (Fig. 4e). Classifying MYCN non-amplified neuroblastomas into high and low groups, we demonstrated that the AURKA mRNA levels alone could predict the overall survival (Fig. 4f; HR 5.7; p < 0.0001). In addition, the high level of AURKA was an independent predictor (HR 6.6, p = 0.003) of mortality including in multivariate analysis with the risk classification (Supplementary Fig. 4).
These findings were further investigated by immunohistochemistry (IHC) staining of N-MYC or AURKA in a custom neuroblastoma tissue microarray, which contains 94 MYCN non-amplified neuroblastomas. Within them, 22 samples were positive for N-MYC (Fig. 5a), and they had worse survival compared to those with N-MYC negative staining (n = 72) (Fig. 5b; p = 0.03). In parallel, patients with higher levels of AURKA had unfavorable survival outcomes (Fig. 5c and d; p = 0.00014). Moreover, a higher percentage of patients with high AURKA staining was observed in the N-MYC-positive group when compared to it in the N-MYC-negative group (Fig. 5e; 64% vs. 39%; p = 0.041).
Taken together, these results suggested that a "MYCN" signature in subgroup 2 is potentially induced by AURKA overexpression in MYCN non-amplified neuroblastomas.
Subgroup 3 is accompanied by an "inflamed" gene signature.
Considering immune-related pathways were enriched in subgroup 3, the activity of immune cells and pathways were further systematically explored. ssGSEA was performed to calculate enrichment scores of 46 immune gene sets summarised from two previous studies22, 23, and subgroup 3 showed a significantly higher activity of immune cells and pathways compared to the other 2 subtypes as well as MYCN-AMP group (Fig. 6a; Supplementary Table 8). Consistently, MHC-1 (major histocompatibility complex-1) or cytolytic activity (CYT) scores were highest in subgroup 3 (Fig. 6b and c). This was also true when using the ESTIMATE algorithm to evaluate the immune scores, stromal scores and tumour purity scores in neuroblastomas24, showing highest immune and stromal scores, and lowest tumour purity scores in subgroup 3 (Fig. 6d; Supplementary Fig. 5a and b).
For a comprehensive assessment of immune cell infiltration, we used CIBERSORTx deconvolution25 to quantify various immune populations based on a single cell RNA sequencing (scRNA-seq) dataset in MYCN non-amplified neuroblastoma26 (Supplementary Fig. 5c). While similar immune cell types were present in each subtype, the absolute number of several immune cell populations were markedly increased in subgroup 3, including B cells, myeloid cells, T cells and pDC (plasmacytoid dendritic cells) (Fig. 6e). Finally, to investigate whether subgroup 3 would benefit more from immunotherapy than the other subgroups, we compared the expression matrix of 3 subgroups with published melanoma datasets including response information after treating with immunotherapies27, 28.The subMap analysis highlighted that patients within subgroup 3 are predicted to respond to anti-PD1 immunotherapy (Fig. 6f; Supplementary Fig. 5d).
Taken together, these results demonstrated that subgroup 3 is accompanied by an "inflamed" gene signature, and is more likely to benefit from anti-PD1 therapies.
Integrative analysis of subgroup molecular features on drug response.
To investigate subgroup-specific druggable targets, we performed an integrative analysis to assess the associations between molecular features and the response to anticancer drugs in MYCN non-amplified neuroblastomas (see Supplementary Methods for details). These analyses generated 123 potential druggable targets (Supplementary Table 9). Of note, in subgroup 2, we identified 96 genes were that were overexpressed in subgroup 2 and showed high gene-dependency levels. The expression levels of these genes also negatively correlated (drug-sensitive) with the response to anticancer drugs. The top 15 genes in subgroup 2 were shown in Fig. 7a. These drugs target important pathways, including mitosis, metabolism, cell cycle and ERK/mitogen-activated protein kinase (MAPK) signalling (Fig. 7a; Supplementary Table 9). For example, AURKA was upregulated in subgroup 2, and AURKA expression negatively correlated (drug-sensitive) with the response to the cell cycle pathway inhibitor AZD7762 (correlation coefficient = − 0.57, p = 0.0054) and the NEDD8-activating enzyme (NAE) inhibitor Pevonedistat (correlation coefficient = − 0.60, p = 0.0031).
In subgroup 3, 8 genes were overexpressed and showed high gene-dependency levels (Fig. 7b). The expression levels of these genes also negatively correlated with the response to 18 anticancer drugs, which target pathways, such as WNT, protein stability and degradation, PI3K/mTOR signaling, DNA replication and EGFR signalling (Fig. 7b).
Together, these data suggest that each subgroup has unique therapeutic vulnerabilities, which emphasizes the importance of therapy stratification according to the molecular features.
Identification of independent predictors to subgroup patients within MYCN non-amplified neuroblastomas
To identify independent predictors for subgrouping, we applied a multi-cohort analysis pipeline via MetaIntegrator29 (see Supplementary Methods). In total, 43 genes (Table 1) displayed the ability to predict different subgroups accurately (Supplementary Fig. 6a; area under the curve (AUROC)Subgroup1 = 0.92; AUROCSubgroup2 = 0.94 and AUROCSubgroup3 = 0.99).
Table 1
Independent predictors to subgroup patients within MYCN non-amplified neuroblastomas.
Subgroup
|
Up
|
Down
|
Supgroup 1
|
SCN2A, LONRF2, FRS3, CPEB4, SNAP25, PMP22, OLA1, C14orf132
|
TMEM109, LDHA, NDE1, PIM2, MRPL11, TNFRSF10B, COLEC12, TAF10, ELL, SIVA1
|
Supgroup 2
|
KIF4A, COX8A, UHRF1, ODC1, SNRPD1, CDCA4, SKP2, HMGB3, FANCI, CDK2;
|
PLA2G4C, NDEL1, ANXA2, PLSCR3 ;
|
Supgroup 3
|
IFITM2, ANXA11, STAT5A, FYCO1, LMNA, RARRES3, PLSCR4, SLC10A3, TMCO4, AGA
|
KBTBD7
|
Furthermore, the machine learning classifier, support vector machine (SVM), was applied based on these 43 genes. SVM analysis showed good classifications across cohorts, with average AUROC of 0.931, 0.928, 0.759 and 0.954 in the test cohort, TARGET Microarray, TARGET RNA-seq and GSE49711, respectively (Fig. 8a). These independent predictors worked consistently between microarray and RNA-seq within GSE47792 (Fig. 8b).
Evaluation of different patient stratification strategies
Finally, we evaluated our subgrouping method (named Zhou_Subgroup) together with other reports. Recently, Westermann and colleagues reported 4 subgroups in neuroblastoma, including MYCN-amplified (MYCN), MYCN non-amplified high-risk (MNA-HR), MYCN non-amplified low-risk (MNA-LR) and mesenchymal (MES)30. With our method, patients within Westermann_MNA-HR can be further classified into 3 subtypes (Fig. 8c), showing different prognosis (Fig. 8d). This was also true for Westermann_ MNA-LR (Fig. 8c and d). A majority of cases in Westermann_ MES or MYCN belonged to subgroup 3 and 2, respectively (Fig. 8c).
Califano and colleagues classified high-risk neuroblastomas into 3 main subgroups (MYCNAmp, MYCNA), 11q-LOH (loss of heterozygosity) and mesenchymal (MES)31. In comparisions, in the GSE85047 microarray, all cases of Califano_ MYCNA were classified in subgroup 2. Most cases in Califano_ MES or Stage1 belonged to subgroup 3 and 1 respectively (Supplementary Fig. 6b). Interestingly, most cases in Califano_11q-LOH were classified in subgroup 2 (Supplementary Fig. 6b). Similar findings were observed in the TARGET microarray (Supplementary Fig. 6b).
van Groningen and colleagues reported that neuroblastoma is composed of two super-enhancer-associated differentiation states: an ‘ADRN’ subgroup showing up-regulated genes involved in adrenergic differentiation and an ‘MES’ subgroup with higher expressions of mesenchymal markers32. To quantify these characteristics, we calculated "ADRN" or "MES" scores of our subgroups based on a 369-gene "ADRN" signature or a 485-gene MES signature, respectively. We observed that subgroup 3 showed the highest "MES" scores and the lowest "ADRN" scores, consistent with our above findings; while subgroup 1 and 2 had the highest "ADRN" scores with the lowest "MES" scores in subgroup 2 (Supplementary Fig. 6c).
Together with other reports, our findings emphasised the extent of inner heterogeneity within MYCN non-amplified population and the importance of patient stratification.