Somatic genomic alterations in SCLC patients from eastern and western populations
We then extracted mutational signatures of these 171 SCLC patients by decomposing matrix of nucleotide substitutions according to the previous study using pan-cancer analysis18. Among the three signatures enriched in SCLC, signature 4 showed transcriptional strand bias for C>A mutations (Fig. 1c), which is known to be caused by tobacco smoking18. This signature is probably an imprint of the bulky DNA adducts generated by polycyclic hydrocarbons found in tobacco smoke and their removal by transcription-coupled NER. Signature 13 has been proved to be attributed to the APOBEC family of cytidine deaminases. As APOBEC activation constitutes part of the innate immune response to viruses and retrotransposons, this signature may represent collateral damage on the human genome from a response originally directed at retrotransposing DNA elements or exogenous viruses18.
In general, mean copy number of each gene in three datasets showed similar patterns (Fig. 1d), indicating availability of integration of CNV data. We next combined 158 samples with both mutational data and CNV data to calculate gene alteration ratios in the four pathways (Fig. 1e). Alteration ratios of each pathway in three datasets were similar based on at least one gene change in mutation or copy number. Besides the high mutation in tumors suppressor genes, such as TP53, RB1 and NOTCH family members, many oncogenic genes also showed dramatic amplification in SCLC (Fig. 1e).
Gene co-expression network modules revealed four SCLC subtypes
Although much attention has been put on the heterogeneity of SCLC, most of these studies employed cell lines and mouse models. To take advantage of transcriptomic data of clinical specimens, we collected another two SCLC cohorts, one with 81 specimens6 and the other with 31 specimens4, and integrated them with 19 Chinese SCLC specimens after batch effect removal (see Methods), with the exclusion of two outliers. We carried out weighted gene co-expression network analysis (WGCNA) to reconstruct a topological similarity network, from which 12 coherent gene modules were extracted. Based on module eigenvectors, the 129 samples formed an unsupervised clustering tree with four major clusters (Fig. 2a). Samples from different batches were evenly distributed in four clusters, suggesting the remaining batch differences are minor after correction for the batch effect.
We then carried out Kaplan-Meier survival analysis between clusters, and found patients of cluster 4 had a significantly poorer prognosis than other three subgroups, with no significant difference among cluster 1, cluster 2 and cluster 3, indicative of cluster 4 as a more malignant subtype (Fig. 2b). The previous classification of SCLC was based on molecular makers, including ASCL1, NEUROD1, POU2F3 and YAP112. Borromeo et al. claimed that neuroendocrine genes ASCL1 and NEUROD1 can reveal tumor heterogeneity and determine the distinct subtypes of SCLC11. The transition from ASCL1-positive SCLC to NEUROD1-positive SCLC was considered as a transition from the classic subtype to a variant subtype. To align our subtyping system with previous work, we evaluated the expression of ASCL1, NEUROD1 and their inhibitor NOTCH. Samples in cluster 2 had a higher level of ASCL1 expression, whereas cluster 3 had a higher expression of NEUROD1, suggesting that cluster 2 and cluster 3 correspond to the classic and variant subtypes of SCLC, respectively (Fig. 2g). Cluster 1 showed a lower expression of both ASCL1 and NEUROD1, and higher expression of NOTCH genes19, representative of a novel non-classic, or variant, SCLC subtype. The remaining cluster 4 had significantly higher expression of CCSP than any other cluster. CCSP is a main product of club cells, which are especially abundant in the peripheral airways where they contribute to maintenance of airway integrity and repair 20. It is suggested that cluster 4 may have different cells of origin. Moreover, the cluster 4 had a poorer survival.
To further explore the characteristics of the subgroups, we correlated the WGCNA modules with subgroups by calculating the Pearson correlation coefficient (Supplementary Fig. 2a). Then, we selected the most relevant module for each subgroup and analyzed the subgroup-specific gene functions (Supplementary Fig. 2b). The cluster 1 specific module (ME5) was enriched in the LPS/IL-1-mediated inhibition of RXR function, IL-10 signaling, and T cell receptor signaling pathways (IPA canonical pathways, p<0.01), suggestive of a strong connection of cluster 1 with immune responses. Gene set enrichment analysis (GSEA) showed the unique up/down-regulated pathways in the four SCLC subtypes (Fig. 2c). Multiple neuronal-related pathways were enriched in the neuroendocrine SCLC subtypes (cluster 2 and cluster 3) (Fig. 2d, e). Interestingly, the pathway of error-prone translesional synthesis (TLS) was specifically up-regulated in cluster 2 (Fig. 2d), which is known to underlie the mutagenic effects of numerous anticancer agents and relevant to drug resistance 21. In contrast to cluster 1, down-regulated pathways of cluster 4 were mainly related to immunity (Fig. 2c, f). We named cluster 2 as the classical subtype, cluster 3 as variant 1 subtype, and cluster 4 as variant 2 subtype
Application of the Immune Checkpoint Blockade in the Immune subtype (cluster 1)
Cluster 1 was characterized by elevated immune cell signaling based on the analysis of the gene expression data. We then named cluster 1 as “Immune subtype”. GSEA demonstrated several immune cell receptor signaling pathways (Fig. 2c), up-regulated genes in comparison of Th1 cells versus Th17 cells (GSE11924 Gene set), and IL-12 signaling in Immune subtype versus other subtypes (Fig. 3b). Differential expression profiling revealed that the Immune subtype was enriched for both activated immune cells and immuno-stimulators (Fig. 3c). Genes mediating immunosuppressive functions (PD1, IL10RA, LAG3) were up-regulated (Fig. 3a). Notably, PD-L1 is potential biomarkers for sensitivity to immune-checkpoint-based therapies. We also assessed the helper T cell composition by comparing gene products of Th1 and Th2 (Supplementary Fig. 3). The ratios of IFNGR/IL4R and IFNGR/IL6R were lower in cluster 1, suggestive of a potential deviation from Th1 to Th2, which may further confer immune escape22. FOXP3, which is the marker gene of regulatory T cells (Tregs)23, was also overexpressed in cluster 1 (Fig. 3a). Furthermore, estimation of abundance of immune cells using ImmuCellAI24 showed that B cell, iTreg, dendritic cells, gamma delta T cells, natural killer T cells and Tfh cells were significantly higher in the Immune subtype, whereas Th1/Th2 ratio was significantly reduced (Fig. 3d). Tumor-infiltrating immune cells are considered to be primary immune signatures and strongly associated with the clinical outcomes of immunotherapies25. Thus, the mechanisms by which these tumors achieved immune escape likely involve the recruitment of immune suppressive cells or the activation of immune checkpoint molecules. In addition, we found that POU2F3, ANXA1, MYC and ASCL2 were expressed extremely high in the immune subtype versus other samples (Fig. 6a, b), and thus could be used as molecular biomarkers to distinguish this immune subtype.
Significant alterations of four subtypes on WES level
To further explore specific alterations between four SCLC subtypes on WES level, we collected 115 WES mutational samples and 98 CNA samples corresponding to the four clusters at the RNA-seq level4, 6. Significantly mutated genes based on prop.test with certain functions in each subtype were shown in waterfall plot (Fig. 4a). Lawson et al. identified a core set of 182 genes across mouse cancer models related to cancer-intrinsic cytotoxic T lymphocytes (CTLs) evasion26, but these genes were not on our SCLC mutation list (Supplementary Data 2). On average, samples in classic subtype had more mutated genes (Fig. 4b). In contrast, variant 2 subtype had the least average mutated genes and lowest TMB (Fig. 4b, c). Interestingly, NEUROD1 was only mutated in Immune subtype, whereas NOTCH1 had higher mutation ratio in other subtypes than Immune subtype (Fig. 4a). The RASGRF1/DENND1B/ARHGEF5/ITSN2/SYTL5 as a gene set, participated in several pathways correlated with guanyl-nucleotide exchange factor activity (Fig. 4a). CD163 had 18.5% mutation frequency in Immune subtype, which was considered as the most specific marker of tumor-associated macrophages (TAMs) in tumor tissues27, 28. Classical and Variant 1 subpopulations had more mutations in neuroendocrine related genes. TIE1 was observed mutated in Variant 2 subtype (Fig. 4a), which is an essential component of the angiopoietin signaling system that has potential for therapeutic targeting in disease29. The endothelial angiopoietin/Tie (ANG/Tie) system regulates angiogenesis during development and tumor growth, contributes to capillary-to-venous remodeling in inflammation, and maintains vascular integrity30.
In general, we found differential functional features of four subtypes at CNA level. Circos heatmap depicted median gene copy number in each subtype. Immune, Classic, Variant 1 and Variant 2 subtype were inwards apart (Fig. 4d). Significantly amplified genes with high alteration ratio (>20%) in each subtype were used for pathway enrichment analysis (Supplementary Data 3) (Fig. 4e). It is showed that response to oxidative stress was observed in Classic subtype. Besides the well-recognized effect of ROS in mutation and cancer cell growth, growing evidence suggests its involvement in drug resistance 31. Of note, significant amplified genes in Immune subtype were enriched in several immune-related pathways (Fig. 4e). The PI3K events in ERBB4 signaling were observed in Variant 2 subgroup (Fig. 4e). Moreover, in addition to the transcriptomic signatures, specific genomic alterations were also significantly enriched in each subtype, particular in the immune subtype (Fig. 4).
Identification of four subtypes of SCLC using a machine learning model
We next build a random forest model to classify four subtypes based on transcriptomic data. Random forest algorithm is a kind of classical machine learning algorithm based on statistical learning theory. It combines bootstrap resampling method with decision tree algorithm, and can evaluate the importance of features while building models. We used 460 differential expressed genes in 80% samples to construct Pearson correlation coefficient network which were clustered into 20 modules (Fig. 5b) (Supplementary Data 4). Mean gene expression value of each module was used as input features for the training (Fig. 5a, Methods). Fourteen features were chosen as the final classifier since the error of cross validation was relatively stable (Fig. 5c). High positive margin indicates correct classification of four SCLC subtypes (Fig. 5d). The prediction results with cross validation showed that this model achieved 99% accuracy to distinguish testing samples.
We further tested our model on an independent data set consisted of 79 SCLC patients with their gene expression data32. We found that 6 patients were predicted as Immune subtype, 15 patients were predicted as classic subtype, 57 patients were predicted as variant 1 subtype, and 1 patient was predicted as variant 2 subtype. The expressions of the marker genes were consistent with each predicted subtype (Fig. 5e), indicative of the overall effectiveness of our classification model.
Validation of Immune subtype using POU2F3 antibody in the immunotherapy patient cohort
In order to find the unique marker for Immune subtype, we performed the analysis of the differential expression genes between Immune subtype and other groups. POU2F3 stood out as the most significantly up-regulated gene in the Immune subtype (Fig. 6a, b). To identify the Immune subtype of SCLC clinically, we collected 28 SCLC specimens and performed immunohistochemical staining of the candidate. All the 28 SCLC patients were resistant to chemotherapy and given immunotherapy or chemo-immunotherapy for more than two cycles (every 21 days for one cycle). Therefore, we can evaluate properly the effectiveness of immunotherapy.
The immunohistochemical staining of POU2F3 showed that the patients with high expression level (score >= 100) are sensitive to immunotherapy (Fig. 6c; Supplementary Data 5). According to Marina et al.33, ASCL1, NEUROD1, YAP1, POU2F3 expression have different distribution in 174 SCLC patient samples by immunohistochemistry, and POU2F3 was expressed in 7% of SCLC. It appeared that POU2F3 was expressed in 43% SCLC chemotherapy-resistant patients. Interestingly, 100% (12/12) of the patients with POU2F3-high tumors had an significantly improved response following immunotherapy (Fig. 6d), and p value was 0.013 when compared with POU2F3-low tumors. High F1 score (75%) and AUC (0.872) were achieved (Fig. 6e), indicating the good ability of this classification. In addition, 2 out of the 12 cases with POU2F3-high level showed dramatical regression of the intrapulmonary tumors (Fig. 6f and Supplementary Fig. 4a. These results support that the Immune subtype identified here is indeed sensitive to immunotherapy and the POU2F3 expression could serve as an biomarker for effective immunotherapy.