Resources and Data sources
Software. Endnote X9 (Thomson Reuters) was used to manage references, R (version 4.2.2) was used for data analysis, Excel for Office 365 (Microsoft) was used for data storage, and CorelDRAW X8 (Corel) was used for Figure preparation.
Computational Resources. Analyses were run on a desktop computer with an Intel Core i9-10900L CPU (3.70 GHz, 10 cores, 20 threads) with 120 GB RAM running Windows 10 Pro (v21H2).
Public data sources. scRNA-seq data (Table S10) from Yu et al. were obtained from Gene Expression Omnibus (GSE117891)38; Neftel et al. from GSE13192839; Abdelfattah et al. from GSE18210937; Qazi et al. from GSE196583108. Richards et al. data was obtained from the Broad Institute Single-Cell Portal (SCP503)63. Spatial transcriptomics data of coronal section from 8-week male C57Bl/6 mice was from 10x Genomics (Adult Mouse Brain FFPE dataset). Bulk RNA-seq data from the Cancer Genome Atlas (TCGA) RNA-seq V2 data from the TCGA PanCancer Atlas were retrieved from the National Cancer Institute (NCI) Genomic Data Commons using TCGAbiolinks R package v.2.16.0; Chinese Glioma Genome Atlas (CCGA) from http://www.cgga.org.cn/download.jsp109; Glioma Longitudinal Analysis (GLASS) consortium from Synapse (http://synapse.org/glass)110; Zhao et al. PDL1 responder vs. non-responder data from SRA (PRJNA482620)48; and Cloughesy et al. from GEO (GSE121810)111.
Animal studies
Animal Studies. Animal use and studies were performed in accordance with guidelines outlined by Animal Use Protocols within the Division of Comparative Medicine at the University of Toronto and the McMaster University Central Animal Facility. Intracranial injections were conducted in 6-8-week-old male C57BL/6 mice as previously described using 106 cells per mouse112. Briefly, a small burr hole was drilled 2 mm behind the coronal suture and 3 mm to the right of the sagittal suture. Cells suspended in 10 µL PBS were injected intracranially using a Hamilton syringe (Hamilton, #7635-01) into the right frontal lobes. Animals were sacrificed at humane endpoint, defined loss of 15-20% of bodyweight from date of tumor engraftment, and a combination of hunched posture, decreased response when picked up, lack of grooming and loss of skin elasticity. Brains were removed, and right frontal pieces were snap frozen for sci-RNA-seq3 processing.
Cell lines
CT2A and GL261 glioma cells. CT2A cells were purchased from Millipore/Sigma-Aldrich (#SCC194) and GL261 cells were purchased from DSMZ (#ACC802). These lines were maintained in Dulbecco’s modified eagle medium (DMEM; Wisent, #319-005-en) supplemented with 10% fetal bovine serum (FBS; Gibco, #12483-020) and 0.1% penicillin-streptomycin (Gibco, #15140122). Cells were cultured at 37 °C, 5% CO2. Mycoplasma testing was routinely performed. To generate Cas9 knock-in CT2A and GL261 cell lines, cells were transduced with Lenti-Cas9-2A-Blast (Addgene, #73310) and selected with Blasticidin S HCl (Gibco, #A1113903) as previously described25. CT2A-Ova cell lines were generated via transduction with lenti-Ova, and sorted for high expression.
NK and CTL primary cells. Primary NK and CTL cells were isolated from OT-1 (C57BL/6-Tg(TcraTcrb)1100Mjb/J mice (Jackson Laboratory). Spleens were harvested, minced, and strained to obtain splenocytes. NK cells were then isolated through negative selection using the NK Cell Isolation Kit (Miltenyi Biotec, #130-115-818). NK cells were cultured in 10% RPMI with 1% penicillin-streptomycin, 100 ng/mL IL-2 and 55 mM 2-mercaptoethanol. Primary CTL cells were isolated, cultured, and activated as described previously25. In short, OT-1 CD8+ T cells were isolated using an antibody-based magnetic separation kit (Miltenyi, #130-096-543), and activated and expanded with CD3/CD28 beads (Myltenyi, #130-093-627).
Myeloid cell lines. RAW264.7 cells were purchased from MilliporeSigma, BV-2 cells from AcceGen Biotech, and J774A.1 cells from ATCC. RAW264.1 and J774A.1 were cultured in DMEM (Wisent, #319-005-CL) supplemented with 10% FBS (Gibco, #12483-020). BV-2 cells were cultured in RPMI-1640 (Wisent, #350-000-CL) supplemented with 10% FBS. Cell lines were maintained in humidified incubators at 37 °C and 5% CO2 and were routinely tested for mycoplasma contamination.
CRISPR-Cas9-edited cell lines. CRISPR-mediated gene knockouts in CT2A cell lines were generated by electroporation using the Neon Transfection System (Invitrogen, MPK10096) following the manufacturer’s instructions. In brief, Cas9 ribonucleoproteins (RNP) were prepared by combining 20 pmol single-guide (sg) RNA (Synthego) consisting of 3 sgRNA’s targeting the same gene with 20 pmol s.p. Cas9 nuclease (IDT #1081059) in 5 μL buffer R (Invitrogen) for 15 min at room temperature. sgRNA sequences are listed in Table S11. CT2A cells were lifted with trypsin (Gibco) and washed twice with Dulbecco’s phosphate buffered saline (DPBS, Wisent) and resuspended with buffer R. 200,000 cells were combined with 20 pmol Cas9 RNP in 12 μL buffer R and electroporated at 1200 V, for 2 pulses of 30 ms with 10 μL tips. 72 h after electroporation, cells were subjected to limiting dilution and single-cell clonal expansion. Genomic DNA from selected clones was extracted using the DNA Fast Extract kit (Wisent, #801-200-DR) and sgRNA target regions were PCR amplified with DNA 2X HS-Red Taq PCR mastermix (Wisent, #801-200-DM). Primer sequences are provided in Table S12. The PCR product was sequenced by Sanger sequencing. Confirmation of gene knockout was performed using ICE (https://ice.synthego.com/#/) to identify out-of-frame insertion-and-deletion mutations. CRISPR-mediated gene knockouts were also verified by western blot. Antibodies used are listed in Table S13.
Immunoblot analysis. To confirm gene knockouts in CT2A cells, cells were cultured to 90% confluency in 10 cm dishes. The cells were washed with DPBS and lysed with RIPA buffer (Thermo Scientific, #89901) supplemented with 1X protease inhibitor (Thermo Scientific, #78420) at 4°C. Protein quantification was done by Pierce BCA Assay (Life Technologies, #23225). Lysates were loaded on precast SDS-PAGE gels (Invitrogen, NP0321PK2) and subsequently transferred onto nitrocellulose membrane for detection. All primary antibodies were probed overnight at 4°C, and membranes were washed with Tris-buffered saline with Tween-20 (TBST; Cell Signaling Technology, #9997) and incubated with appropriate HRP-conjugated secondary antibodies for 1 h. Subsequently membranes were washed with TBST and the signal was detected with chemiluminescent substrate (Thermo Scientific, #34579) on an iBright Imaging system (Thermo Fisher Scientific).
Cell Proliferation Assay. Single cells were plated in a 96-well plate at a density of 1,000 or 200 cells/200 μL per well and incubated at 37°C and 5% CO2 for 5 days. 20 μL of Presto Blue (ThermoFisher, Cat.A13262), a fluorescent cell metabolism indicator, was added to each well four hours prior to reading out the assy. Fluorescence was measured using a FLUOstar Omega Fluorescence 556 Microplate reader (BMG LABTECH) with excitation and emission wavelengths of 544 nm and 590 nm, respectively. Readings were then analyzed using Omega analysis software. Proliferation was calculated for each well by subtracting the average RFI of blank wells from the RFI of individual wells. Normalized RFI was plotted for each well being tested as a side-by-side comparison of proliferation.
Immune cell killing assays. The killing efficiency of immune cells cocultured with CT2A cells were assessed across a range of effector-to-tumor ratios (E:T) and quantified as a percentages killing relative to untreated conditions. 24 h prior to killing assay, CT2A cells were incubated in 1 mM carboxyfluorescein succinimidyl ester (CFSE) dye (37 ºC × 10-20 min) and then cultured in complete medium overnight. On day of killing assay, CT2A cells were re-plated with immune cells at various E:T ratios. At 24 h endpoint, immune cells and dead CT2A cells were removed by gentle PBS wash. The remaining viable and adherent CT2A cells were assessed visually and by counting using a Coulter counter. For ADCP-mediated killing assays, killing efficiency was assessed by flow cytometry as before25. CFSE and anti-CD11a were used as CT2A and J774.1 cell markers, and double-positive cells were interpreted as phagocytosed CT2A cells. E:T ratios that achieved between 20-50% killing efficiency were selected for screening conditions.
Single cell transcriptomic analysis using sci-RNA-seq3
Sample processing, sci-RNA-seq3 library generation, and sequencing. Cells were harvested with 0.25% typsin-EDTA and neuron dissociation solution113, respectively. Cell pellets were immediately snap-frozen in liquid nitrogen and then stored at -80°C for sci-RNA-seq3 based single-nucleus RNA-Seq processing. Samples from all genotypes were processed together to minimize batch effects. Nuclei extraction and fixation were performed as previously described113, except for the use of a modified CST lysis buffer114 plus 1% of SUPERase In RNase Inhibitor (AM2696). Nuclei quality was checked with DAPI and Wheat Germ Agglutinin staining. Sci-RNA-seq3 libraries were generated as previously described using three-level combinatorial indexing113. The final libraries were sequenced on Illumina NovaSeq as follows: read 1: 34bp, read 2: 69bp, index 1: 10bp, index 2: 10bp.
Demultiplexing and read alignments. Raw sequencing reads were first demultiplexed based on i5/i7 PCR barcodes. FASTQ files were then processed using the sci-RNA-Seq3 pipeline113. After barcodes and unique molecular identifiers (UMIs) were extracted from the read1 of FASTQ files, read alignment was performed using STAR short-read aligner (v2.5.2b) with the mouse genome (mm10) and Gencode vM12 gene annotations. After removing duplicate reads based on UMI, barcode, chromosome and alignment position, reads were summarized into a count matrix of M genes × N nuclei.
Filtering. Raw count matrices were loaded into a Seurat object (version 4.0.1) and filtered to retain cells with (i) 200 – 9000 recovered genes per cell, (ii) less than 60% mitochondrial content, and (iii) unmatched rate between 0.11 to 0.27 (median ± 3 median absolute deviations).
Normalization. To normalize expression values, we adopted the modeling framework previously described and implemented in sctransform (R Package, version 0.3.2)115. In brief, count data were modelled by regularized negative binomial regression, using sequencing depth as a model covariate to regress out the influence of technical effects, and Pearson residuals were used as the normalized and variance stabilized biological signal for downstream analysis.
Integration. Data from each timepoint and replicate were integrated with the reciprocal PCA method (Seurat) using the top 2000 variable features.
Dimensional reduction. PCA was performed on the integrated dataset, and the top N components that accounted for 90% of the observed variance were used for UMAP embedding, RunUMAP(max_components = 2, n_neighbours = 50, min_dist = 01, metric = cosine).
Clustering and annotation. To identify cellular sub-populations, we performed clustering using the Louvain algorithm implemented in Seurat (resolution = 1.5). Cluster were assigned numerical identifiers that were ranked in descending order according to cluster size, such that the smallest identifier (i.e., 0) corresponding to the largest cluster size, and vice versa. Clustered populations were then annotated using spatial and scRNA-seq reference atlases. Using Seurat’s label transfer pipeline, we identified transfer anchors for each query-reference pair using FindTransferAnchors(normalization.method = ‘‘SCT’’, reference.reudction = ‘‘pca’’, dims = 1:50) and then mapped the query samples onto the reference atlas using MapQuery(reference.reduction = ‘‘pca’’, reduction.model = ‘‘umap’’). The resulting prediction scores were cross-validated with cluster-specific markers obtained from differential-expression analyses to inform cluster annotation.
Differential-expression analysis. Differential expression analyses were performed using the Wilcoxon [wilcoxauc(…) function, (presto R package, v 1.0.0)] and co-dependency index (CDI) methods [FindCDIMarkers(…) function, (scMiko R package, v. 1.0.0)]. Differentially expressed genes (thresholded at 5% FDR) were ranked by area under receiver operating characteristic curve (AUROC) or normalized CDI (nCDI) scores and the top 2-4 markers for each cell type cluster were used for visualization.
NMF gene program discovery. For each sample, NMF was performed across multiple rank parameters and gene programs that were consistently resolved within and between tumors across multiple ranks were retained as robust NMF programs. NMF workflow parameters are summarized in Table S11. The workflow described here was modified from work by Gavish and colleagues:55
- Expression matrix preparation: For each sample, genes expressed in 0.5-5% of cells were used for NMF analysis. Expression matrices were normalized and scaled using the sctransform workflow (see above), and negative values were truncated at zero to obtain a non-negative scaled expression matrix (NSM).
- Run NMF analysis: NMF analyses was performed for each NSM using nnmf(…, k = c(2,...,15), loss = “mse”, rel.tol = 1e-4, max.iter = 50) (NNLM R package, v 0.4.4). NMF was performed for rank parameters between k = 2-15 (Table S14). For each NMF run, we defined the resulting gene programs as the top 70-150 genes, ordered by factor loading.
- Identify component programs: We reasoned that certain NMF runs will yield non-informative decompositions, and that only a subset of NMF programs will be reproducible within and between samples. To ensure within-sample robustness, the Jaccard similarity between NMF programs was computed across all intra-sample ranks, and programs with similarities exceeding 0.2-0.7 across more than 2 ranks were retained. Next, to ensure between-sample robustness, the Jaccard similarities between the remaining NMF programs were computed between samples, and programs with similarities exceeding 0.15-0.5 in at least 2 samples were retained. The resulting NMF programs that passed the intra- and inter-sample filtering steps were termed component programs.
- Identify consensus programs: To obtain a non-redundant set of gene programs from the component programs, a Jaccard similarity matrix was computed for the remaining component programs, and hierarchically-clustered using Pearson correlation as the distance metric. Clustering parameters that separated the component programs were determined by visual inspection of the hierarchically-clustered Jaccard similarity heatmap. For each cluster, the consensus program was defined as the set of genes that were represented in at least 25-50% of component programs (Table S14).
Gene set enrichment analysis. To functionally-annotated gene sets, we performed hypergeometric overrepresentation analysis using the fora(…) (fgsea R package, v 1.14.0). Annotated gene sets used for enrichment analyses included GO ontology (biological processes, cellular components, molecular function) and those curated by the Bader Lab (http://baderlab.org/GeneSets).
Enrichment Network. Jaccard similarity between enriched pathways was used to account for gene set redundancies. Functional enrichment co-similarities were visualized using emapplot(…, edge.params = list(min = 0.2)) (enrichplot R package, v1.18.1).
Gene program activity. For each gene set, cell-level gene program activities were calculated using AddModuleScore(…) in Seurat. Where indicated, the programs were classified as “on” or “off” states by performing gaussian decomposition of gene program activity values. This was implemented using MClust(…, G = 1:5) (mclust R package, v 6.0.0) where parameterised finite gaussian mixture models were fit to a numeric vector of activity values and models were estimated by expectation-maximization algorithm initialized by hierarchical model-based agglomerative clustering. For each program, the optimal model was selected according to Bayesian information criterion (BIC).
Glioma-associated transcriptomic regulators. For each GBM scRNAseq dataset (overview of datasets in Table S10), cell-level gene program activities were calculated using i) a curated list of GBM- and tumor-associated gene panels and ii) TF-specific target gene sets consolidated from TRRUST v2116 and AnimalTFDB 3.0117. Machine-learning-based random forest regression models were then trained to identify which transcription regulators were associated with each GBM subtype. This was repeated across each GBM dataset, and the top transcription factors that were consistently associated with a GBM subtype were nominated as GBM glioma-associated transcription regulators (GTRs, 5% FDR). For each GBM transcriptomic phenotype, the top regulatory transcription factors were visualized in a bipartite network, where one layer of nodes represented GBM-specific gene programs and the other layer of nodes represented associated GTRs.
Relative abundance analysis. To compare the relative abundance of each cell type across conditions, counts for each cell type were tallied within each condition, and divided by the total number of cells profiled.
Differential abundance analysis. To evaluate regional differential abundances of cells in UMAP space across genotypes, we adopted the Milo method118. In brief, for each comparison between the CT2A and GL261-engrafted mouse samples, cells were first resampled to normalize cell counts, and then a K-nearest neighbor (KNN) graph representing higher-dimensional relationships between single cells was constructed. The KNN graph was then used to define neighborhoods of cells using the refined sampling scheme118. Finally, the number of cells belonging to each condition within each neighborhood was counted and the differential abundance was computed using a negative binomial generalized linear model. The differential abundance estimates for each neighborhood were visualized in UMAP space, with each node representing a given neighborhood (comprised of 20-80 cells each), and the color representing the differential abundance expressed as log fold-change between GL261 and CT2A samples. Non-significant differentials (FDR > 0.1) were truncated at zero.
Transcriptomic similarity. To evaluate the relative transcriptomic similarity of various human gliomas to murine gliomas, we adopted the transfer learning approach implemented in Seurat119. Specifically, murine glioma cells were mapped onto human GBM transcriptomic space by projecting the query (murine glioma cells) PCA structure onto the reference (human GBM) PCA structure. This enabled identification of corresponding cells (i.e., anchors), thereby allowing the mapping of murine gliomas cells onto the human GBM space and inference of the corresponding human glioma label in murine cells. The transfer scores computed using the FindTransferAnchor(…, normalization.method = “SCT”, k.anchor = 5, n.trees = 100, npcs = 40) and TransferData(…) functions in Seurat were used as surrogate measures of transcriptomic similarity.
Survival analyses
Murine glioma programs survival associations. The prognostic value of NMF gene programs identified in murine glioma cells was assessed by performing survival analyses using TCGA RNA-seq profiles of GBM patient samples (N=162). For each sample, NMF program activity was calculated by ssGSEA, and survival associations were assessed constructing Kaplan-Meier curves (survminer R package, v 0.4.9) in which samples were stratified into low (score <0) and High (score ≥0) groups, split based on median score. Corresponding hazard ratios were estimated by fitting proportional hazards regression models using the coxph(…) function (survival R package, v 3.4-0).
Meta-analysis of E1 and E2 survival associations. Using RNA-seq data from each GBM cohort, E1 and E2 program activities were scored using ssGSEA. Then, E1 and E2 hazard ratios (HR) were calculated using proportional hazards regression. Meta-analytic HR estimates were estimated by fitting a random effects model using the rma(…, method = “REML”) function (metafor R package, v 4.0) and forest plots were generated using forest.rma(…). I2, a measure of heterogeneity, was quantified to estimate the percentage of total variance that is attributed to interstudy variance.
Genome-wide CRISPR screens
Pooled genome-wide CRISPR screens in CT2A and GL261 cells. CRISPR screens in sTable CT2A-Cas9 and GL261-Cas9 cells were performed as previously described. In brief, 150×106 cells were infected with the mTKO lentiviral library (Addgene #159393) at an MOI of around 0.3. 24 h after infection, culture medium was changed to puromycin-supplemented medium (5 µg/mL). 72 h after infection, 1×108 cells were cryo-banked whereas 9×107 cells were split into three replicates of 3×107 cells and further passaged every 3–4 d while maintaining 200-fold library coverage. 3×107 cells were collected for genomic DNA extraction at day 0 (T0) post-selection and at every subsequent passage until days 19 (T19, CT2A) or 15 (T15, GL261) post-selection. Genomic DNA extraction, library preparation, and sequencing were performed as described below.
CD29-sort screen in CT2A cells. Infection, selection and passaging of CT2A-Cas9 cells for the CD29 sort screen was performed as described above. At T19, cells were stained with anti-CD29-FITC antibody (20 mg/1000 mL) and FACS sorting was performed as before120. The unsorted T19 sample was used as a reference. Genomic DNA extraction, library preparation, sequencing, and data processing were performed as described below.
Genome-wide CRISPR loss-of-function immune killing screens. Cas9-expressing CT2A-Ova cells were infected with the mTKO lentiviral library at a multiplicity of infection of 0.3 and maintained at 200-fold-coverage for each gRNA in the library. Infected cells were selected with 5 µg/mL of puromycin for 48 h. After selection cells were split in technical triplicates and maintained in culture for 72 h. For myeloid killing screens, CT2A cells (3×106) were harvested and co-cultured with LPS-polarized (100 ng/mL LPS × 24 h) RAW264.7 (1.5×105; 0.05 E:T), J774A.1 (3×105; 0.1 E:T), and BV-2 (6×105; 0.2 E:T) myeloid cell lines for 24 h to achieve killing efficiencies ranging between 22.9% to 44.1% (Fig S12A). For the J774A.1 coculture screen, an additional treatment arm was included in which CT2A cells were opsonized with anti-CD29 antibody (37ºC × 20 min) followed by anti-Armenian Hamster IgG2b antibody (37ºC × 20 min, then room temperature × 40 min) prior to coculturing with J774A.1 cells to facilitate ADCP. Anti-CD29 antibody was confirmed to not affect CT2A viability (Fig S12B). ADCP-dependent killing efficiency was 33.7% across replicates (Fig S12C). After 24 h of selective pressure, myeloid cells were eliminated using puromycin (8 µg/mL × 48 h) and confirmed by CD11b staining. For CTL killing screens, CT2A cells were co-cultured with preactivated CD8+ T cells at a 0.2 E:T ratio to achieve 46.2% killing efficiency (Fig S12D, see Lawson et al. for further details25). For NK killing screens, a 0.5 E:T ratio was used to achieve 43.8% killing efficiency (Fig S12E). Untreated CT-2A cells were kept in parallel as a control. Across all conditions, killing efficiencies was calculated as the number of CT-2A cells in co-culture relative to the number of unchallenged cells. Cell pellets at 200-fold library coverage were collected for genomic DNA extraction at the end of screen. Genomic DNA extraction, library preparation, sequencing, and data processing were performed as described below.
Genomic DNA extraction, library preparation, sequencing, and data processing. For all screens, genomic DNA was extracted using the Wizard Genomic DNA Purification Kit (Promega, #A1120)121. Sequencing libraries were prepared, sequenced using Illumina HiSeq2500, and processed as before25,122. Bayes factor (BF) scores were calculated using BAGEL for CT2A and GL261 dropout screens, such that genes with a BF > 5 were classified as essential genes123. Scaled BF was calculated as scaled BF = BF - 5. For immune coculture screens, fold changes between immune cell-treated and untreated CT2A cells were calculated using limma124.
Human fitness scores. Processed CRIPSR-Cas9 genome screen data was retrieved from Project Score database for 41 human GBM cell-lines and 1031 human non-CNS cell lines41,42. For each cell-line, gene-level fitness was represented as a scaled Bayes factor (BF), which corresponded to BAGEL2-derived BF subtracted by the BF at the 5% FDR threshold31. Scaled BFs were then pooled as averages for GBM and non-CNS cell lines and genes with BF > 0 were classified as essential within each respective group. If genes were essential in both groups, they were additionally termed common essentials, and if genes were essential in one group but not the other, they were termed either GBM-specific or Non-CNS-specific. All other genes were non-essentials.
Data visualization and statistics
Data visualization. Unless otherwise specified, the ggplot2 R package (v 3.3.5) was used for data visualization. scRNA-seq gene expression was visualized using FeaturePlot function (Seurat) or DotPlot function (Seurat). Venn diagrams were generated using either ssvFeatureEuler (seqsetvis R package, v 1.8.0) or ggVennDiagram (ggVennDiagram R package, v 1.1.4).
Statistics and reproducibility. All pairwise comparisons were performed using the signed Wilcoxon rank sum test, and p values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure, as indicated. In cases where methods were compared across a common set of data, paired Wilcoxon tests were performed.