DATA AVAILABILITY
The data/analyses presented in the current publication include the use of protected study data downloaded from the dbGaP web site, under phs000915.v2.p211, phs000673.v47, phs000909.v172, phs000424.v818, phs000178.v11 26. Access to dbGaP was granted for Project #24196, entitled “Unravelling determinants of acquired resistance to hormone therapy in cancers”. Publicly available data were retrieved from GEO: GSE12079523, GSE12074119, GSE11843522, GSE12607821 and SRA: PRJNA47744973, PRJEB2109274. Datasets generated in this study were made available at EMBL-EBI. Bulk RNA-Seq data used in the study were deposited with the accession number E-MTAB-9930. Single-cell RNA-Seq data for LuCaP PDX models and LNCaP cells were deposited with the accession number E-MTAB-9903. Processed data in form of h5 files can also be downloaded directly from https://www.pcaprofiler.com . All the software used for the analyses is described and referenced in the respective Method Details subsections.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Plasmids
The pHAGE-puro (Plasmid #118692) and the pHAGE-EZH2 (Plasmid #116738) were purchased from Addgene.
Cell Lines
PC3, DU-145, 22rV1, MDA-PCa-2b, LAPC-4, LNCaP, VCaP, and HEK 293T cell lines were purchased from ATCC (American Tissue Culture Collection) (Manassas, USA). The LAPC-4 cell line was a gift from Prof. Helmut Klocker while the LNCaP-abl cell line was a gift from Prof. Myles Brown (DFCI, Boston).
Immunohistochemically staining
EZH2 protein expression was analyzed on a previously described tissue microarray including, matched primary and CRPC samples 36. All prostate cancer samples were obtained under approval by the Ethics Committee of Northwestern and Central Switzerland (EKNZ, No EK/1311 and 2015/228). Tumor‐free prostate core needle biopsies were used to analyze benign prostate (n = 3 patients). Prostate cancer biopsies included in the TMA were taken during routine clinical treatment. Samples were selected based on the following inclusion criteria: (a) histologically‐diagnosed PCa, (b) tumor‐containing biopsies available at HN and CR state, and (c) sufficient quality and amount of material, as evaluated by experienced pathologists (LB and KM). Castration‐resistance was defined as either biochemical progression (ie, serum PSA progression according to Prostate Cancer Clinical Trials Working Group criteria or clinical progression. A TMA comprising 112 matched HN/CR tissue specimens, and including 107 transurethral resections and five distant metastases derived from 55 PCa patients was constructed as previously described (Federer-Gsponer JR et al. Prostate 2020). Due to tissue loss, a common problem associated with TMA technology, 33 high-quality matched tissue samples of primary and CRPC remained after sectioning.
For EZH2 IHC, slides were analyzed with the Bond-III automated staining system (Leica) using manufactured reagents for the entire procedure. For antigen retrieval, slides were incubated for 60 min in Citrate buffer at pH6 at 98°C. Thereafter, slides were incubated with a rabbit monoclonal antibody against EZH2 (D2C9, CST5246 from Cell Signaling) at the dilution of 1:500. Detections were performed using the detection refine DAB kit (Leica). Immunohistochemical staining was evaluated as a percentage of tumor cells with nuclear positivity for EZH2 using Aperio ImageScope (Leica).
For the assessment of CD11c protein expression in primary prostate cancer and the correlation with PSA-recurrence patients were selected from two previously characterized tissue microarrays cohorts constructed in Zurich and Bern75-77. Due to tissue loss, a common problem associated with TMA technology, a total of 482 high-quality tissue samples of primary tumor remained after sectioning (n = 272 from Zurich and n = 210 from Bern). In each case, the local scientific ethics committees approved (StV-Nr. 25/2007 and StV-Nr. 25–2008) and informed consent was obtained from all patients. Recurrence-free survival curves were calculated using the Kaplan-Meier method. Patients were censored at the time of their last tumor-free clinical follow-up visit. Time to PSA recurrence was selected as the clinical endpoint. Only patients undergoing radical prostatectomy were used for survival analysis.
For CD11c IHC, slides were analyzed with the Bond-III automated staining system (Leica) using manufactured reagents for the entire procedure. For antigen retrieval, slides were incubated for 20 min in Citrate buffer at pH6 at 98°C. Thereafter, slides were incubated with a rabbit anti-CD11c antibody targeting the C-terminus (ab52632) at the dilution of 1:1000 for one hour at room temperature. Detections were performed using the detection refine DAB kit (Leica). Immunohistochemical staining was evaluated with the automated Aperio ImageScope (Leica) image quantification system using a two-tiered score, i.e. tumor spots with at least three percent of CD11c-positive cells were classified as CD11c high, while the remaining cases were classified as CD11c low. In the case of the Bern cohort, multiple spots per tumor were available and the percentage of CD11c-positive tumor cells was established based on the average of two spots displaying the highest Gleason pattern.
Cell Culture
PC3, DU-145, 22rV1, LAPC-4, and LNCaP cell lines were cultured in RPMI 1640 (21875-034 Life Technologies) supplemented with 10% Fetal Bovine Serum (FBS-11A Capricorn Scientific) and 1% Penicillin/Streptomycin (15140-122 Life Technologies) with 5% CO2 at 37°C. LAPC4 was also supplemented with 1 nM DHT.
The LNCaP-abl cell line was cultured in Phenol-Red Free RPMI 1640 (11835063 Life Technologies) containing 10% charcoal-stripped serum (CSS Fetal Bovine Serum, Charcoal Stripped A3382101 Life Technologies) and 1% Penicillin/Streptomycin with 5% CO2 at 37°C.
VCaP and 293THEK cell lines were cultured in DMEM (61965059 Life Technologies) supplemented with 10% FBS and 1% µg/ml Penicillin/Streptomycin with 5% CO2 at 37°C.
MDA-PCa-2b cell line was cultured in ATCC-formulated F-12K medium (30-2004) supplemented with 20% FBS, 25 ng/ml Cholera Toxin (C8252 Sigma), 10 ng/ml Epidermal Growth Factor (AF-100-15 Peprotech), 0.005 mM Phosphoethanolamine (P1348 Sigma), 100 pg/ml Hydrocortisone (H0135 Sigma), 45 nM Selenium Acid (211176 Sigma), 0.005 mg/ml Human Recombinant Insulin (I1884 Sigma) and 1% Penicillin/Streptomycin with 5% CO2 at 37°C.
Transfection and Infection
The HEK 293T cells were transfected with pHAGE (Empty) and PHAGE EZH2 vectors as previously described78. Forty-eight hours after the transfection, the viral supernatants were collected and filtered through a 0.45 m filter. The LNCaP cell line was incubated with viral supernatant and 8μg/ml Polybrene (H9268 Sigma) for 72 h and then selected with 2 μg/ml puromycin (P8833 Sigma) for two weeks. Western blotting was used to verify the protein overexpression of EZH2.
Ex vivo culture of PDX
PDX tumor tissue was cut into small pieces (1–0.5 mm) with a scalpel blade and then digested in Collagenase Type I media solution (200U/ml Cat#SCR103, Millipore) at 37 °C for 45-60 min. After enzymatic dissociation, the cell suspension was passed through a 100 µM cell strainer (11814389001 Roche) to eliminate macroscopic tissue pieces and centrifuged. The cell pellet was then resuspended in 2-volume pf RBC lysis buffer (11814389001 Roche), incubated for 3 min at RT, and, after centrifugation, resuspended in media. Cell suspension of PDX cells was then propagated in 3D vitro culture (44 and 79). The cells were embedded in 50% phenol red-free Matrigel (356231 Corning) and plated as a drop in a 96-well-plate (10000 cells/ well) and maintained in the medium 7-10 days.
DHT Dose Response Assay
3D culture of PDXs (see session Ex vivo culture of PDX) or 2D culture of LNCaP cell line (5000 cells/well in 10% CSS medium) were seeded in triplicate in a 96-well plate and subsequently treated with serial dilutions of DHT (concentration range of 0.01nM-30µM). Proliferation was assessed after 7-10 days by Cell-Titre-Glo assay (G9241 Promega) for 3D culture or MTT (Methylthiazolyldiphenyl-tetrazolium bromide) assay (M5655 Sigma) for the 2D culture. For each time point, absorbance (OD, 590 nm) was measured in a microplate reader (Cytation 3 Imaging Reader Biotek).
Colony Formation Assay in DHT-free medium
VCAP (5x 10^5 cell/well), LAPC4 (2.5x 10^5 cell/wells), LNCaP (2.5x 10^5 cell/wells), or LNCaP over-expressing EZH2 were seeded in triplicate in 6-well plates in a standard medium. After 24-48h, when the cells attached to the plate and formed a confluent layer, the medium was replaced with 10% CSS medium (DHT free medium) with/without 1 µM GSK126 and keep in culture until the formation of the colonies (4-6 weeks). The medium/treatment was weekly replaced. At the end time point, the cells were gently washed with PBS, fixed with 0.01% crystal violet and 20% of EtOH for 30 min, and then wash out with water. The imagines’ colonies were acquired using the Fusion Solo IV LBR system and the quantification of colonies was performed by ImageJ software.
Animal Experiments
All animal experiments were carried out according to the Swiss Veterinary Authority (TI-42-2018 and TI-10-2010). All in vivo studies were used 6-8 weeks old male NRG (NOD-Rag1null IL2rgnull, NOD rag gamma) mice. Patient-derived xenografts (PDX) LuCaP-147, -145.2,-78, -35 -23.1 were provided by Dr. Eva Corey 41 .Dr. Marianna Kruithof-de Julio provided PNPCa. PDXs tumors were maintained by subcutaneous implantation of matrigel-embedded tumor fragment (1-2-mmm average diameter tumor or take rate varied from 1 to 6 months. For the experiment in castration LNCaP or LuCaP-147 cells (obtained from tumor dissociation see details in a session of ex vivo PDX culture) were suspended in PBS and 50% Matrigel and subcutaneously injected into the dorsal flanks of the mice (2 10^6 cells/mouse). Tumor growth was recorded using a digital caliper, and tumor volumes were calculated using the formula (L x W 2) /2, where L=length and W=width of the tumor. Tumor volume was measured 2 times per week. When the tumor reached the dimension of 50-100 mm2, mice were surgically castrated. For the GSK126 treatment, the mice were treatment one week after castration by daily i.p. at the dose of 100mg/Kg for 3 weeks. At the end of the experiment, mice were euthanized, tumors explanted, and used for the molecular assessment.
Antibodies and Western Blot Analysis
Primary antibodies used: anti-GAPDH (sc-47724 Santa Cruz), anti-AR (sc-7305 Santa Cruz), anti-DNMT3A (sc-365769 Santa Cruz), anti- EZH2 (612667 BD Transduction Laboratory), anti-DNMT(15032S Cell Signaling Technologies), anti-EED (85322S Cell Signaling Technologies), anti-SUZ12 (3737S Cell Signaling Technologies), anti-Aurora A (14475T Cell Signaling Technologies), antiH3K27me3 (9733S Cell Signaling Technologies), anti- PLK1 B290751 Biolegend, anti-H3K9me2 (ab1220 Abcam), antiH3K27ac (ab4729 Abcam) and H3K4me3 (ab6000 Abcam).
Tumor Tissues (25-30 mg) or cellular pellet were lysates with RIPA Buffer supplemented with cocktail phosphatase inhibitors (4906845001 Roche) and proteases inhibitors (5892953001 Roche). Protein concentration was determined by BCA reagent (A52255 Thermo Fisher Scientific), 30-50 μg of whole protein lysate were separated on 8-12% SDS–polyacrylamide gels and transferred onto PVDF membrane (88518 Thermo Fisher Scientific). The membranes were blocked with 5% milk in Tris Buffered Saline with Tween 20 (TBST) for 30 min at RT, incubated overnight at 4°C with primary antibodies, and incubated for 1h at RT with secondary antibodies (anti-rabbit IgG HRP W401B and anti-mouse IgG HRP W402B Promega). The protein bands were visualized using the western bright quantum reagent (K-12042-D20 Advansta) and quantified using the Fusion Solo IV LBR system.
RNA Extraction for RNA-seq analysis
According to the manufacturer’s guidelines, the RNA extraction was performed from PDXs frozen fragment (25-30 mg) of cellular pellet using RNeasy kit (74106 Qiagen). The RNAs were processed using the NEB Next Ultra II Directional Library prep Kit for Illumina (E7765 NEB) and sequenced on the Illumina NextSeq500 with single-end, 75 base pair long reads.
Single-cell isolation for scRNA sequencing
To perform scRNA-seq PDX tumor tissue, they were dissociated into single cells as described above (see session Ex vivo culture of PDX). After resuspension in PBS, single-cell suspensions were loaded into a 10x Chromium Controller (10x Genomics, Pleasanton, CA, USA), aiming for 10000-5000 cells, with the Chromiun Next GEM Single Cell 3' v3.1 reagent kit (PN-1000121 10x Genomics), according to the manufactured instructions.
RNA-Seq Data processing
Sequencing of Xenografts, 2D and 3D cultures
We retrieved bulk RNA-Seq data for cellular models of prostate cancer from various available datasets and extended these by performing bulk RNA-Seq of several prostate-cancer Xenografts models (i.e. PNPCa; LuCaP-78; LuCaP-23; LuCaP-35; LuCaP-145; LNCAP), and their derived 3D cultures. Additional sequencing was performed for 2D cultures of LNCaP, LNCaP-all, LAPC4, and VCaP cells. (See data availability section)
Prostate Cancer Transcriptome Atlas
To build an integrated resource of transcriptional features representing all stages of prostate cancer progression, we collected raw sequencing data from a large panel of independent datasets. We gathered raw data for 1223 clinical samples (1104 excluding technical replicates, 1044 excluding multiple metastatic sites derived from the same individual). The resulting integrated cohort is representative of various stages of disease progression, namely, normal prostate specimens (n=174), primary tumors (n=714), castration-resistant prostate cancers (n=316), and castration-resistant prostate cancers showing features of neuroendocrine trans-differentiation (n=19). Raw sequencing files were retrieved from following sources: 1) Gene Tissue Expression Database (GTEX); 2) The Cancer Genome Atlas (TCGA); 3) Atlas of RNA sequencing profiles of normal human tissues (GSE120795); 4) Integrative epigenetic taxonomy of primary prostate cancer (GSE120741); 5) Prognostic markers in locally advanced lymph node-negative prostate cancer (PRJNA477449); 6) The Long Noncoding RNA Landscape of Neuroendocrine Prostate Cancer and its Clinical Implications (PRJEB21092); 7) Integrative Clinical Sequencing Analysis of Metastatic Castration Resistant Prostate Cancer Reveals a High Frequency of Clinical Actionability (PRJNA283922; dbGaP: phs000915); 8) CSER - Exploring Precision Cancer Medicine for Sarcoma and Rare Cancers (PRJNA223419; dbGaP: phs000673); 9) Molecular Basis of Neuroendocrine Prostate Cancer (PRJNA282856; dbGaP: phs000909); 10) Heterogeneity of Androgen Receptor Splice Variant-7 (AR-V7) Protein Expression and Response to Therapy in Castration Resistant Prostate Cancer (CRPC) (GSE118435); 11) Molecular profiling stratifies diverse phenotypes of treatment-refractory metastatic castration-resistant prostate cancer (PRJNA520923; GEO: GSE126078). Depending on the specific dataset considered, fastq files were downloaded either by using gdc-client (TCGA) or sra-toolkit (SRA, dbGaP). Detailed information along with all available clinical annotations are provided in Supp.Table1.
RNA-seq data processing of clinical samples
The overall quality of sequencing reads was evaluated using FastQC (Andrews S., 2010). Sequence alignments to the reference human genome (GRCh38) were performed using STAR (v.2.6.1c) in two-pass mode. Gene-expression was quantified at the gene level by using the comprehensive annotations made available by Gencode (v29 GTF-File). Strand-specific information was not maintained to avoid technical differences between stranded and unstranded libraries. Samples were adjusted for library size and normalized with the variance stabilizing transformation (vst) in the R statistical environment using DESeq2 (v1.28.1) pipeline. When performing differential expression analysis between groups we applied the embedded IndependentFiltering procedure to exclude genes that were not expressed at appreciable levels in most of the samples considered. If not otherwise specified, all gene set enrichment analyses were performed using the limma package (Camera, use. ranks set to TRUE) 70. Gene-Sets collections were retrieved either from the Molecular Signature Database (MSigDB, or from previous publications (AR/NE-Score) 80.
Batch effects correction and Principal Component Analysis
In the processes of integrating different datasets from a variety of sources, we verified that batch effects did not overwhelm the biological signal. Batch effects may derive not only from differences across datasets, but also may be consequent of a different sequencing technique (PolyA+; TotalRNA; Hybrid Capture Sequencing) or originate from other unknown sources. Principal component analysis (PCA), by identifying the transcriptional features endowed with the highest variance across samples, is a very useful tool to detect relevant batch effects. When the latter are overwhelming, they are likely to appear among the top principal components and cluster together samples sharing the same batch effect-related features. A PCA analysis performed on the complete set of 1223 samples (Figure S1B) showed that the largest source of batch effects was associated with the Hybrid Capture Sequencing technique (HCS), while no relevant differences could be clearly associated with the dataset of origin. Only two of the CRPC datasets (phs000915, phs000673) contained samples sequenced using HCS, and for several of these, matched technical replicates sequenced using PolyA+ technology were also available. This allowed us to assess and remove technology-associated bias in gene expression (ComBat, PolyA+ samples set as reference batch). We further reduced the possibility of confounding biological with technical variation by generating a training-subset of our data, consisting of 883 PolyA+ samples (52 Normal prostate, 620 Primary tumors, 193 CRPCs, 19 NEPCs) and determined the top 2000 genes showing the highest amount of variation within the PolyA+ training set only. This way, for PCA representation we avoid the selection of genes that are possibly affected by the sequencing technique, despite the correction we had already performed on the data.
Hence, we used the same 2000 genes to generate a PCA plot computed on the extended set of samples. The results depicted in the PCA plot shown in Figure1A clearly show that the positioning of tumors at the same stages of cancer progression overlap with each other irrespectively of the dataset of origin and the sequencing technology. This indicates that the different positioning of normal prostate, primary tumors, CRPCs, and NEPCs is due to a real biological signal and not consequent to an unwanted dataset-specific batch effect.
Integration and validation of additional bulk RNA-Seq samples and pseudo-time inference
We developed a method to include new prostate tumor samples in our current analysis by starting from raw counts, which allows the computation of pseudo-time and Principal components without modifying the original data and plots. Ideally, RNA-Seq should be quantified using the sample genome (hg38) and references used for the current study (Gencode V29). Predictions can be performed sequentially, one sample at a time. For each new sample of interest, raw counts will be merged with the ones composing our full set. The obtained numeric matrix (the original matrix + 1 extra sample of interest) undergoes the same normalization and processing steps up to the computation of the PCA. Here, coordinates may slightly differ from the original ones, due to the adding of a new sample which might exert a small effect on the global re-normalization of all samples. To address this behavior, we apply a machine learning-based approach that generates at runtime three elastic net models, one for each of the top 3 principal components, and train them to predict the error between the original coordinates and ones that are recomputed following the addition of the extra sample of interest. Hence, we apply these models to adjust the computed PC1, PC2 and PC3 coordinates of the extra sample which can now be added to the PCA plot and pseudo-time can be determined using slingshot.
Trajectory analysis
Trajectory and pseudo-time inference are frequently used in single-cell RNA sequencing data analysis to model developmental trajectories through smooth curves following dimensionality reduction and clustering. Here we applied one of these tools, slingshot (v1.6.0), to infer progression-associated trajectory and pseudo-time from our integrated set of bulk-RNA sequencing samples. We selected slingshot because of its capability to also determine branches along the trajectory if any. PCA positioning (PC1-PC2) of the individual samples was used as input for slingshot, along with the information that the computed trajectory had to start from the Normal tissue cluster. The analysis was performed using 1106 samples, discarding all technical replicates, in order not to overweight some samples and influence the computation of the trajectory. Metastatic lesions from the same individual but localized in different organs were admitted for this analysis. Subsequently, we could associate a pseudo-time for each sample, ranging from 0 to 250 (Figure 1B).
Correlation of genes and pathways to pseudo-time
Having defined a unique pseudo-time value for each sample, we computed the correlation between pseudo-time and mRNA expression for each gene. For this purpose, we used Pearson’s correlation over Spearman’s because we aimed at identifying the strength of the linear relationship between gene expression and pseudo-time. However, to be more robust to outliers, we opted for 10 times repeated leave one third out procedure. Precisely, we randomly selected 10 subsets composed of 66% of the samples and computed correlation coefficients between pseudo-time and expression of each gene in all subsets. Finally, we averaged these values and ranked them according to their correlation coefficient to pseudo-time. Subsequently, using this ranking we applied Camera to perform gene-set enrichment analysis procedure (use.ranks = TRUE) and determined which gene-set were mostly directly or inversely associated with pseudo-time (Figure S1F).
Correlation of mRNA expression and protein abundances
Proteomics data were retrieved from the Proteomics Identifier Database (PRIDE: projects PXD009868, PXD003430, PXD003452, PXD003515, PXD004132, PXD003615, PXD003636). The dataset includes 28 gland confined prostate tumors and 8 adjacent non-malignant prostate tissue obtained from radical prostatectomy procedures, plus 22 bone metastatic prostate tumors obtained from patients operated to relieve spinal cord compression. To compute the correlation between mRNA expression and protein abundance we first computed, for each gene, the average Fold-change (log2) between CRPC and PRIMARY tumors based on mRNA expression. Then the same was applied to the proteomics data to obtain for each protein a log fold change representing differential abundance between CRPCs and primary tumors. For protein/mRNA correlation purposes, we discarded all genes that had not been evaluated in the proteomic data. Finally, we used Pearson’s method to evaluate the strength of correlation and the associated statistical significance.
Retrieval of genetic information and correlation with progression
Matched genetic information respective to mutations and copy number status could be retrieved for 763 samples through cBioportal. Samples for which this information was available are indicated in Supp.Table1. To determine associations between mutations and tumor progression, for each gene we compared the pseudo-time of mutant vs wild-type samples, by performing statistical testing using the Wilcoxon-sum rank test. Mutations were ordered according to their False Discovery Rate adjusted P-values and analyses were performed separately in PRIMARY and CRPC+NEPC tumors, to determine the relative contribution of mutations at various stages of disease progression. We only screened for genes being mutated in more than 5 individuals (Figure S1L). To determine associations between copy-number alterations and tumor progression, we associated for each gene a value of either -2 (homozygous deletion), -1 (heterozygous deletion), 0 (Wild-Type), 1 (Gain), 2(Amplification), and subsequently computed Pearson’s correlation between these values and pseudo-time. We restricted this last analysis to genes being frequently deleted or amplified in prostate tumors, namely, MYC, AR, RB1, PTEN, and TP53 (Figure 1E). The above-described analyses were performed discarding technical replicates. Metastatic lesions from the same individual but localized in different organs were admitted for this analysis.
Quantification of immune infiltrates and correlation with progression
Quantification of immune infiltrates for all samples in our cohort was inferred from transcriptomic data using CibersortX 81 by using the default signature matrix "LM22" to deconvolve 22 immune cell subsets from bulk RNA-Seq (Absolute quantification mode). The abundance of inferred immune populations was correlated to pseudo-time using the same strategy applied to correlate gene-expression and pseudo-time. We opted for 10 times repeated leave one-third out procedure. Precisely, we randomly selected 10 subsets composed of 66% of the samples and computed correlation coefficients between pseudo-time and each immune population in all subsets. Finally, we averaged these values and ranked them according to their correlation coefficient to pseudo-time. Pearson’s correlation-associated P-Values were corrected for multiple testing using the False Discovery Rate (FDR).
Macrophage Polarization Index
The Macrophage Polarization Index, indicating polarization towards M1 or M2 phenotypes was computed for all bulk-RNA samples in our cohort using MacSpectrum 71.
SINGLE-CELL RNA-SEQ DATA PROCESSING
Quantification of gene expression
Fastq files were generated by demultiplexing raw data using cellranger mkfastq (v3.1.0) To make single-cell gene-expression quantification more comparable to those of bulk RNA-Seq, we generated a custom genome with cellranger more, using the very same reference (GRCh38.p12) and annotations (encode v29).used for STAR when performing bulk RNA-Sequencing analysis. To discriminate between human and murine cells that may infiltrate the tumors in the in vivo setting, we created a Mouse-Human reference, by creating a hybrid genome (GRCh38.p12+GRCm38.p6) and hybrid gene-annotations (gencode v29 and M25, for human and mouse genes respectively). To avoid conflicts, mouse genomic coordinates were preceded by a prefix (i.e. mm_chr1, mm_chr2, etc.). Subsequently, cellranger count was used to quantifying gene-expression in form of an h5 filtered matrix where Ensembl gene IDs are used as identifiers.
Data filtering and clustering
Expression quantification files were imported in R statistical environment using Seurat (v3.1.5) package. We discarded individual cells from our data matrix by using two filtering procedures: first, we aimed at detecting transcriptional outliers, second, we looked for putative doublets, which we also discarded. Briefly, we computed per-cell quality control metrics using scatter (v1.16.1). The total amount of mitochondrial and ribosomal gene expression was quantified for both human and mouse cells. The number of genes being detected per cell, the total amount of reads per cell, and the mitochondrial and ribosomal fraction of the transcriptome were used to determine the skewness-adjusted multivariate outlyingness for each cell (robustbase v0.93-6). Outliers were detected by median absolute deviation (MAD) and removed at both tails. Counts were then normalized (Seurat::NormalizeData, method = LogNormalize, scale.factor = 1000) and the top 2000 most variable features were selected (Seurat::FindVariableFeatures, method = vst). Data were then scaled (Seurat::ScaleData) and principal component analysis was performed up to the top 50 components (Seurat::RunPCA). Subsequently, we identified and eliminated putative doublets using DoubletFinder (v2.0.3). Having identified outliers and doublets, we removed them from the original count data and went through the pre-processing step again (i.e. normalization, scaling, and pca-reduction). We proceeded to the determination of the k-nearest neighbors of each cell and the construction of a Shared Nearest Neighbor (SNN) Graph (Suerat::FindNeighbors), then we identified clusters using the shared nearest neighbor (SNN) modularity optimization based clustering algorithm (Seurat:: FindClusters, resolution = 0.5). Finally, we performed Umap dimensionality reduction on the first 10 Principal Components, annotated the previously identified clusters, and generated plots accordingly.
Identification of Cell-Cycle Phase and Cell-Type
We retrieved the list of cell cycle markers 82 and subdivided it into markers of G2/M phase or S phase, according to Seurat’s annotations. We then used this information to infer the cell cycle phase in our samples (Seurat::CellCycleScoring). Murine cells could be clearly distinguished from human cancer cells, because of the intrinsic differences that could be easily spotted thanks to the alignment and quantification performed using a hybrid human-mouse genome. Murine cell types were identified using SingleR (v1.2.4) 66, using ImmGen repository 67.
Dealing with Drop-out events
Drop-out events are very frequent in the single-cell experiment performed using chromium 10x technology. To address these issues, we applied Markov Affinity-based Graph Imputation of Cells (RMagic v2.0.3) 68.
Differential expression analysis and gene-set enrichment
Differential expression was performed between different cell-clusters and between clusters subjected to different treatment conditions (Seurat::Findmarkers) using a hurdle model tailored to scRNA-seq data (MAST method). Genes were subsequently ranked for log2 Fold-Change and the Camera algorithm (pre-ranked) was used to determine gene-set enrichments for each comparison. Cell-specific gene-set enrichments were determined using single-sample GSEA, computed using gene-expression values of each cell following RMagic imputation.
Macrophage Polarization Index of macrophages
The Macrophage Polarization Index, indicating polarization towards M1 or M2 phenotypes was computed for all cells being identified as macrophages from SingleR analysis (https://macspectrum.uconn.edu).
Macrophage Reclustering
We could identify a sustained number of murine macrophages infiltrating all xenograft models, except for PNPCa cells. We isolated them and performed a cell-type-specific analysis by repeating all previously described processing steps (i.e. normalization, scaling, and pca-reduction). Dropout events were addressed using RMagic, and cell-specific enrichments were computed using a single sample GSEA.
Integration of scRNA-Seq with bulk-RNA samples, PCA, and pseudo-time inference
Single-cell experiments can be easily integrated with bulk-RNA experiments by simply summing up together gene-counts for all individual cells into one meta-element. This has proven to be extremely comparable in terms of pseudo-time inference and PCA positioning, as scRNA-Seq and bulk RNA-Seq experiments performed on the same cells are overimposable to each other. The same applies for the integration of single-cell derived clusters, provided that the number of cells composing each cluster is not so critically low that the number of drop-out events results in a matrix composed of too many missing genes. If this is the case, or if just a single cell is to be integrated into the analysis, we suggest running RMagic to deal with the drop-out events, and then simply proceed as previously described.
QUANTIFICATION AND STATISTICAL ANALYSIS
Quantification methods and statistical analysis methods for ** were mainly described and referenced in the respective Method Details subsection. If not otherwise specified, all statistical tests were corrected for multiple comparisons using the false discovery rate (FDR) correction method.
ADDITIONAL RESOURCES
PCAProfiler
We provide a resource for the research community endowed with a web-based interface to facilitate the mining of the prostate cancer transcriptome atlas, called the PCaProfiler (https://www.pcaprofiler.com). Using this resource, scientists can easily interrogate the atlas, recapitulates the findings shown in this study, and extend these by exploiting correlations between genes of interest and prostate cancer progression. PCaProfiler will allow integration and pseudo-time inference of new cancer transcriptomes that the user can directly upload, compute and visualize on the server. All results can be downloaded and re-uploaded to PCA-Profiler when needed. Pre-loaded are PCA-positioning and Pseudo-time inferences of cell-line, xenografts, and organoid models, as well as single-cell clusters and additional transcriptional datasets not included in the current study (i.e. PRJEB25542, ESCAPE Trial). PCAProfiler will be updated frequently with new data as new samples are being released or under specific requests.