CGR signatures based on event topology
The definition of chromothripsis has been imprecise and has evolved over time7,27. A previous review27 proposed a set of criteria to define chromothripsis, including clustering of breakpoints, randomness of DNA fragment order and joins, oscillating copy-number states, involvement of a single haplotype and the ability to walk through the derived chromosome. In our recent study8, we followed these criteria to detect chromothripsis with ShatterSeek in 2,428 whole-genome-sequenced tumors from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. However, some studies use the term chromothripsis loosely, without requiring oscillating copy-number states23,28. Here, we broadly define CGRs as complex genomic rearrangements formed through a one-time event rather than through the accumulation of multiple individual events over time. To detect CGRs more broadly, we removed the requirement of oscillating copy-number states from ShatterSeek (see Online Methods) and re-analyzed the same 2,428 samples8. In total, 2,014 CGRs were detected in 1,289 (53.1%) samples (Supplementary Table S1). The 1,226 newly detected CGRs displayed interleaved SVs (Supplementary Figure S1), suggesting that they likely formed through one-time catastrophic events. Of the 285,791 somatic SVs detected in the 2,428 tumors, 106,759 (37%) are involved in CGRs. CGRs are therefore a major source of genomic instability.
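The interleaving pattern mentioned above can be illustrated with a short sketch. It is not ShatterSeek's implementation. It assumes that two intrachromosomal SVs are "interleaved" when their intervals cross, meaning they overlap but neither contains the other. Connected components of the resulting graph then give candidate clusters. The crossing rule and the function names are our own assumptions.

```python
from collections import defaultdict


def interleaved(a, b):
    """True if intervals a = (start, end) and b cross: they overlap, but
    neither contains the other (our assumed definition of 'interleaved')."""
    (s1, e1), (s2, e2) = sorted([a, b])
    return s1 < s2 < e1 < e2


def interleaved_clusters(svs):
    """Group intrachromosomal SVs into connected components of the
    interleaving graph.

    svs: list of (chrom, start, end) tuples with start < end.
    Returns a sorted list of clusters, each a list of SV indices.
    """
    parent = list(range(len(svs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    by_chrom = defaultdict(list)
    for idx, (chrom, _, _) in enumerate(svs):
        by_chrom[chrom].append(idx)

    # Quadratic pairwise scan per chromosome; acceptable for a sketch.
    for idxs in by_chrom.values():
        for pos, i in enumerate(idxs):
            for j in idxs[pos + 1:]:
                if interleaved(svs[i][1:], svs[j][1:]):
                    parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(svs)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

Nested SVs (one interval fully inside another) are deliberately not linked under this rule. A different interleaving definition would change only `interleaved`.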
We hypothesize that CGR topology (the pattern of copy-number states and the distribution of breakpoint junctions) can be used to infer the mechanisms by which CGRs form. To this end, we developed a computational algorithm, Starfish, to deconvolute CGR signatures based on event topology (see Online Methods). Briefly, we first selected 12 features to comprehensively describe the topology of each CGR event (Figure 1a). These features are:

- breakpoint dispersion score
- copy loss percentage
- copy loss density
- copy gain percentage
- copy gain density
- number of copy states
- median copy number change
- maximum copy number
- highest telomere loss percentage
- ratio of telomere loss size to CGR loss size
- median breakpoint microhomology
- median breakpoint insertion size

Features related to the magnitude of CGR events (i.e., number of chromosomes involved, number of rearrangements and size of CGR regions) were not used. We removed highly correlated features (Supplementary Figure S2a) and features with small variance (Supplementary Figure S2b). Five features remained:

- CGR breakpoint dispersion score, which measures the randomness of breakpoint distribution along chromosomes
- copy loss percentage
- copy gain percentage
- telomere loss percentage
- maximum copy number

We performed unsupervised consensus clustering of all 2,014 CGRs (Figure 1a) using these five features and identified six clusters (Figure 1b, Supplementary Figure S2c). Clusters obtained with different clustering approaches were highly similar (Supplementary Figure S2d). For the remainder of this study, we use the six clusters produced by the partitioning around medoids (PAM) algorithm and refer to them as CGR signatures (Supplementary Table S1).
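The actual feature filtering and clustering are described in Online Methods. As a self-contained illustration of the steps named above, the sketch below does four things:

1. Drops low-variance features.
2. Greedily drops features that are highly correlated with one already kept.
3. Runs a basic PAM (greedy BUILD plus SWAP) on Euclidean distances.
4. Builds a consensus matrix by repeatedly clustering random subsamples.

The thresholds (`min_var`, `max_abs_r`), the subsampling fraction and all function names are hypothetical, not Starfish's settings.

```python
import math
import random
from itertools import combinations
from statistics import pvariance


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_features(rows, names, min_var=1e-3, max_abs_r=0.8):
    """Drop near-constant features, then greedily drop any feature whose |r|
    with an already-kept feature exceeds max_abs_r (hypothetical thresholds)."""
    cols = [[r[j] for r in rows] for j in range(len(names))]
    kept = []
    for j in range(len(names)):
        if pvariance(cols[j]) < min_var:
            continue
        if all(abs(pearson(cols[j], cols[k])) <= max_abs_r for k in kept):
            kept.append(j)
    return [names[j] for j in kept], [[r[j] for j in kept] for r in rows]


def pam(points, k):
    """Basic partitioning around medoids: greedy BUILD, then SWAP until no
    single medoid/non-medoid exchange lowers the total distance."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    def cost(meds):
        return sum(min(d[i][m] for m in meds) for i in range(n))

    medoids = []
    for _ in range(k):  # BUILD
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))
    current, improved = cost(medoids), True
    while improved:  # SWAP
        improved = False
        for mi in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:mi] + [c] + medoids[mi + 1:]
                trial_cost = cost(trial)
                if trial_cost < current - 1e-12:
                    medoids, current, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda m: d[i][medoids[m]]) for i in range(n)]
    return medoids, labels


def consensus_matrix(points, k, n_iter=50, frac=0.8, seed=0):
    """For each pair of points, the fraction of subsamples containing both
    in which they landed in the same PAM cluster."""
    rng = random.Random(seed)
    n = len(points)
    same = [[0] * n for _ in range(n)]
    drawn = [[0] * n for _ in range(n)]
    for _ in range(n_iter):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        _, labels = pam([points[i] for i in idx], k)
        for a, b in combinations(range(len(idx)), 2):
            i, j = idx[a], idx[b]
            drawn[i][j] += 1
            drawn[j][i] += 1
            if labels[a] == labels[b]:
                same[i][j] += 1
                same[j][i] += 1
    return [[1.0 if i == j else (same[i][j] / drawn[i][j] if drawn[i][j] else 0.0)
             for j in range(n)] for i in range(n)]
```

In practice, the consensus matrix would itself be clustered, and several values of `k` compared, before settling on a number of clusters.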
Signature 1 features the highest maximum copy number with moderate percentages of copy gain (Figure 1b). This pattern resembles ecDNA, in which small DNA fragments are highly amplified (Figure 1c). Homogeneously staining regions (HSRs) are another form of genomic amplification in cancer, in which the amplified DNA fragments reside on linear chromosomes29. HSRs can excise from the host chromosomes and circularize into double minutes (DMs), and DMs can re-integrate into linear chromosomes23,30. Because Signature 1 does not differentiate between circular and linear forms, it captures both DMs and HSRs. Signature 2 has the highest amount of telomere loss (Figure 1b) and reflects chromothripsis formed through chromatin bridges, possibly involving breakage-fusion-bridge (BFB) cycles (Figure 1c). Signatures 3 and 5 present the largest amounts of genomic copy loss and copy gain, respectively (Figure 1b and 1c), and neither matches any known mechanism. Signature 4 shows the lowest breakpoint dispersion scores (breakpoints evenly distributed) with modest copy gains and losses (Figure 1b), which is consistent with micronuclei-induced chromothripsis (Figure 1c). In sharp contrast, Signature 6 shows the highest breakpoint dispersion scores (breakpoints unevenly distributed) with a very small amount of copy loss and no copy gain (Figure 1b). This pattern fits the definition of chromothripsis27 but has a distinctive copy-number profile in which small fractions of content leak to the lower copy-number level. We therefore named this signature “hourglass chromothripsis” after the shape of this profile (Figure 1c). Among the six signatures, Signature 3 is the least common (240 events) and Signature 6 the most abundant (431 events) (Figure 1b). In total, 971 (48%) CGRs can be related to known mechanisms (Signatures 2 and 4) or known cellular structures (Signature 1). 
The remaining 1,043 (52%) CGRs belong to three signatures (Signatures 3, 5 and 6) that cannot be attributed to any known mechanism, which highlights the benefit of our signature decomposition strategy.
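The breakpoint dispersion score that separates Signature 4 (even spacing) from Signature 6 (uneven spacing) is defined in Online Methods and is not reproduced here. Purely to build intuition, the sketch below uses a hypothetical stand-in: the coefficient of variation of the gaps between consecutive breakpoints. Evenly spaced breakpoints score 0, and clustered breakpoints score high. This is not Starfish's formula.

```python
from statistics import mean, pstdev


def dispersion_cv(breakpoints):
    """Hypothetical dispersion score (not Starfish's): coefficient of variation
    of the gaps between consecutive breakpoints. 0 means perfectly even
    spacing; larger values mean more clustered breakpoints."""
    pos = sorted(breakpoints)
    if len(pos) < 3:
        raise ValueError("need at least three breakpoints")
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    m = mean(gaps)
    return pstdev(gaps) / m if m > 0 else 0.0
```

For example, breakpoints at 0, 100, 200, 300 and 400 give 0.0. Breakpoints at 0, 1, 2, 3 and 1000 give a value well above 1.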
Benchmarking CGR Signatures
To benchmark our algorithm, we used three studies that induced chromothripsis experimentally18,20,22. To make it easy to classify additional CGR events, we trained a neural network classifier, the Starfish classifier (Supplementary Figure S3a and S3b), using the five features and the CGR signature labels from the PCAWG samples (Figure 1b). We then predicted CGRs and their signatures in the three benchmarking datasets using the modified ShatterSeek and the Starfish classifier. In 62% of cases (26 of 42; Supplementary Table S2), the predicted signatures matched the experimentally induced CGR-forming mechanisms (Figure 1d), suggesting that the CGR signatures deconvoluted by Starfish are biologically meaningful. In the study by Umbreit et al., micronuclei were formed via broken chromatin bridges22 and produced chromothripsis through DNA shattering. Consistent with this, most CGRs in this category (8 of 13) displayed the pattern of micronuclei-induced chromothripsis (Figure 1d).
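The Starfish classifier is a neural network, and its architecture is not described here. As a deliberately simpler stand-in, not the study's model, the sketch below uses a nearest-centroid classifier. It assigns a new CGR's feature vector to the signature whose training centroid is closest after z-scoring each feature, which illustrates the "train on PCAWG labels, predict on new data" workflow. The class name and the `concordance` helper are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


class NearestCentroid:
    """Simplified stand-in for a signature classifier (not a neural network):
    z-score each feature with training statistics, then predict the label of
    the closest class centroid."""

    def fit(self, rows, labels):
        cols = list(zip(*rows))
        self.mu = [mean(c) for c in cols]
        self.sd = [pstdev(c) or 1.0 for c in cols]  # guard zero-variance features
        groups = defaultdict(list)
        for row, lab in zip(rows, labels):
            groups[lab].append(self._scale(row))
        self.centroids = {lab: [mean(c) for c in zip(*pts)] for lab, pts in groups.items()}
        return self

    def _scale(self, row):
        return [(x - m) / s for x, m, s in zip(row, self.mu, self.sd)]

    def predict(self, rows):
        out = []
        for row in rows:
            z = self._scale(row)
            out.append(min(self.centroids, key=lambda lab: math.dist(z, self.centroids[lab])))
        return out


def concordance(predicted, expected):
    """Number of matches, total, and fraction of predictions matching expectations."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits, len(expected), hits / len(expected)
```

With 26 matches out of 42 benchmark events, `concordance` returns a fraction of about 0.619, which rounds to the 62% reported above.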
As an indirect validation of Signature 1, we compared Signature 1 CGRs with ecDNA (circular events) predicted by AmpliconArchitect25. We used these predictions as ground truth because 85% of the AmpliconArchitect-predicted circular events in cell lines were confirmed to be DMs by fluorescence in situ hybridization (FISH)25. The 849 tumors shared between this study and the AmpliconArchitect study contained 289 AmpliconArchitect-predicted circular events. Most of these were relatively simple (Supplementary Figure S3c), because ecDNA/DMs can form from a single DNA fragment ligated head-to-tail3,23. Of the 100 complex circular events, 64 (64%) were classified as Signature 1 by Starfish. There were also 309 Signature 1 CGRs not classified as circular events by AmpliconArchitect, which could be the linear form, HSRs. To test this, we compared them with tyfonas events predicted by JaBbA26, which were experimentally validated as HSRs. In the 887 tumors shared by this study and the JaBbA study26, 38 of 46 (83%) JaBbA-predicted tyfonas events were classified as Signature 1 by Starfish (Supplementary Figure S3d). Together, these results suggest that Signature 1 faithfully captures both DMs and HSRs.
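The matching rule used to compare Starfish calls with AmpliconArchitect and JaBbA events is not spelled out in this section. The sketch below assumes one plausible rule: a reference event counts as captured if it overlaps at least one CGR with the target signature in the same tumor. The tuple layouts and function names are hypothetical.

```python
def overlaps(a, b, min_bp=1):
    """a, b: (chrom, start, end) in half-open coordinates. True if they share
    at least min_bp bases on the same chromosome."""
    return a[0] == b[0] and min(a[2], b[2]) - max(a[1], b[1]) >= min_bp


def recall_by_overlap(reference_events, cgrs, target_signature):
    """Count reference events (e.g. predicted circular amplicons or tyfonas
    events) overlapping >= 1 CGR of the target signature in the same sample.

    reference_events: list of (sample, chrom, start, end)
    cgrs: list of (sample, chrom, start, end, signature)
    Returns (captured, total)."""
    captured = 0
    for sample, chrom, start, end in reference_events:
        if any(s == sample and sig == target_signature
               and overlaps((chrom, start, end), (c, st, en))
               for s, c, st, en, sig in cgrs):
            captured += 1
    return captured, len(reference_events)
```

Under such a rule, the reported figures correspond to (64, 100) for complex circular events and (38, 46) for tyfonas events. A stricter rule, for example requiring reciprocal overlap, would generally lower these counts.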
In summary, our CGR signatures, deconvoluted based on event topology, agree well with experimental and independent computational evidence and are biologically relevant.
We then investigated differences in event magnitude among the six CGR signatures. Signatures 1 and 2 involve more chromosomes and SVs and affect larger genomic regions than the other signatures (Figure 1e). The number of foldback inversions has been used to classify BFB-cycle events25,26. However, we found that a hard cutoff on the number of foldback inversions cannot effectively separate Signature 2 from the other signatures (Supplementary Figure S3e).
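The sketch below shows what "a hard cutoff on the number of foldback inversions" means in practice and how its separation of Signature 2 can be scored. The foldback definition is a hypothetical simplification: an intrachromosomal junction whose two ends share the same orientation and lie within `max_span` bp. The 20 kb value is an assumption, not a threshold used in the cited studies.

```python
def is_foldback(sv, max_span=20_000):
    """sv: dict with chrom1, pos1, strand1, chrom2, pos2, strand2.
    Hypothetical rule: intrachromosomal, same-orientation (inversion-type)
    junction whose two ends lie within max_span bp."""
    return (sv["chrom1"] == sv["chrom2"]
            and sv["strand1"] == sv["strand2"]
            and abs(sv["pos2"] - sv["pos1"]) <= max_span)


def cutoff_performance(events, cutoff, positive="Signature 2"):
    """events: list of (n_foldbacks, signature). Calls an event BFB-like when
    n_foldbacks >= cutoff; returns (sensitivity, specificity) for `positive`."""
    tp = sum(n >= cutoff and s == positive for n, s in events)
    fn = sum(n < cutoff and s == positive for n, s in events)
    tn = sum(n < cutoff and s != positive for n, s in events)
    fp = sum(n >= cutoff and s != positive for n, s in events)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

Sweeping `cutoff` over a range of values and finding no setting with both high sensitivity and high specificity is one way to express the observation above.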
CGR events as major drivers of cancer
The frequencies of CGRs vary significantly across tumor types (Figure 2a and Supplementary Figure S4a). CGRs are most abundant in glioblastoma, osteosarcoma and esophageal cancer, whereas pilocytic astrocytoma and chronic lymphocytic leukemia have almost none. Signature 1 is frequently observed in many tumor types, including soft tissue sarcoma, esophageal cancer, glioblastoma and lung squamous cell carcinoma (Figure 2a and Supplementary Figure S4b), consistent with previous studies24,25. In contrast, Signature 2 is most abundant in clear cell renal cell carcinoma (Figure 2a and Supplementary Figure S4b), in which chromosomes 3 and 5 are known to be prone to chromothripsis31. This signature is also abundant in osteosarcoma, melanoma, breast cancer and ovarian cancer (Figure 2a). The agreement of the Signature 1 and 2 enrichments with previous studies further supports the accuracy of our signature deconvolution. Strikingly, Signature 6 is found in almost all tumor types (Figure 2a) and is particularly common in prostate cancer (107/187 [57%] prostate cancers in the PCAWG cohort). We also observed biases in CGR occurrence among tumor subtypes; for example, Signature 5 is enriched in basal breast cancers (Supplementary Figure S4c).
CGRs are also unevenly distributed across the genome (Supplementary Figure S4b). Regions with frequent CGRs have been reported to carry major cancer-driving genes, such as ERBB2 in breast cancer32, EGFR in glioblastoma3 and TERT in chromophobe renal cell carcinoma33. Indeed, we found cancer-driving genes within 3 Mb of the majority of CGR hotspots, including CCND1, ERBB2, PTEN, TMPRSS2, MYCL, MYC, CCNE1, GATA6, TERT, CDK4, MDM2, TP53, EGFR, MYCN and CDKN2A (Figure 2b). GATA6 is known to be the most frequently amplified gene in pancreatic cancer34, and we found that most GATA6 amplifications (10.8%) are due to CGRs (Figure 2c) of different signatures (Figure 2d). Prostate cancers also have two hotspots on chromosome 21 corresponding to TMPRSS2-ERG fusions (Supplementary Figure S5), even though these fusions are usually caused by simple deletions35. Given the abundance of CGRs in cancers and their frequent involvement of major cancer-driving genes, we conclude that CGRs are major players in tumorigenesis.
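How hotspots were called is described elsewhere. The sketch below assumes a simple scheme: count breakpoints in fixed 1 Mb windows, call windows with at least `min_count` breakpoints hotspots, and then list annotated genes within 3 Mb of any hotspot, matching the distance used above. The window size, count threshold and gene coordinates in the test are hypothetical.

```python
from collections import Counter


def hotspot_bins(breakpoints, bin_size=1_000_000, min_count=10):
    """breakpoints: iterable of (chrom, pos). Returns sorted (chrom, start, end)
    windows whose breakpoint count reaches min_count (hypothetical threshold)."""
    counts = Counter((c, p // bin_size) for c, p in breakpoints)
    return sorted((c, b * bin_size, (b + 1) * bin_size)
                  for (c, b), n in counts.items() if n >= min_count)


def genes_near(hotspots, genes, flank=3_000_000):
    """genes: dict name -> (chrom, start, end). Returns sorted names of genes
    lying within `flank` bp of any hotspot window."""
    near = set()
    for chrom, hs, he in hotspots:
        for name, (gc, gs, ge) in genes.items():
            if gc == chrom and gs <= he + flank and ge >= hs - flank:
                near.add(name)
    return sorted(near)
```

The same 3 Mb flank can be reused to remove breakpoints near hotspots, as in the selection control described later.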
Genetic and clinical associations of CGRs
To better understand the mechanisms of CGR formation, we sought to identify genetic alterations associated with CGR signatures. TP53 mutations have been shown to be associated with chromothripsis in tumors8,36–38. In cell line models, TP53 had to be inactivated for cells to tolerate chromothripsis without undergoing apoptosis18–20. For each CGR signature except Signature 6, we tested somatic mutations in all protein-coding genes mutated in at least 10 PCAWG tumors. Signatures 1, 2, 4 and 5 are significantly associated with TP53 mutations (Figure 3a), with FDRs of 2.5e-10, 1.9e-3, 2.0e-4 and 4.54e-10, respectively. Interestingly, Signatures 3 and 6 (the latter in an extended prostate cancer cohort) are significantly associated (FDRs 1.4e-2 and 5.9e-2, respectively) with mutations in Speckle Type BTB/POZ Protein (SPOP) (Figure 3a). SPOP is a subunit of an E3 ubiquitin ligase complex involved in protein ubiquitination and degradation. We study Signature 6 in greater detail in a later section.
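The FDRs above come from testing many genes per signature. The correction procedure is not named in this section. The Benjamini-Hochberg step-up adjustment sketched below is a common choice and is shown here as an assumption, not as the study's confirmed method.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR q-values), in input order.
    q_(i) = min over j >= i of p_(j) * m / j, for p-values sorted ascending."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a per-signature gene scan, `pvalues` would hold one test result per gene. Genes with adjusted values below the chosen FDR threshold are reported.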
Kataegis (clusters of somatic SNVs) is known to be associated with chromothripsis4,39. We found that kataegis co-occurs with CGRs (Figure 3b) in 1,004 of 2,014 cases (50%). In particular, 83% of Signature 1 CGRs are accompanied by kataegis, compared with around 40% in the other signatures (Figure 3c). This enrichment is present in almost all tumor types (Figure 3d). Interestingly, in melanoma, kataegis co-occurs with the vast majority of CGRs in all six signatures (Figure 3d, Supplementary Figure S6).
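The kataegis caller used here is not specified in this section. A widely used operational definition is at least six consecutive SNVs with a mean inter-mutation distance of at most 1 kb. The sketch below applies that definition in a simplified sliding-window form: it flags every qualifying run of six consecutive SNVs and merges overlapping runs into foci. The parameters and merging rule are assumptions, not the study's settings.

```python
from collections import defaultdict


def kataegis_foci(snvs, min_muts=6, max_mean_imd=1000):
    """snvs: iterable of (chrom, pos). Flags every run of `min_muts` consecutive
    SNVs whose mean inter-mutation distance is <= max_mean_imd and merges runs
    that share SNVs. Returns sorted (chrom, start, end, n_snvs) foci."""
    by_chrom = defaultdict(list)
    for chrom, pos in snvs:
        by_chrom[chrom].append(pos)
    foci = []
    for chrom, positions in by_chrom.items():
        pos = sorted(positions)
        runs = []  # [first_index, last_index] of merged runs
        for i in range(len(pos) - min_muts + 1):
            j = i + min_muts - 1
            if (pos[j] - pos[i]) / (min_muts - 1) <= max_mean_imd:
                if runs and i <= runs[-1][1]:
                    runs[-1][1] = j  # window overlaps current focus: extend it
                else:
                    runs.append([i, j])
        foci.extend((chrom, pos[i], pos[j], j - i + 1) for i, j in runs)
    return sorted(foci)
```

To measure co-occurrence, a CGR could then be counted as accompanied by kataegis if any focus falls within its genomic region. That region-overlap rule is also an assumption.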
Aneuploidy is known to promote genome instability and chromothripsis8,40–42. We therefore asked whether any CGR signatures are associated with whole-genome duplication (WGD). When all tumors are considered, most CGR signatures are associated with WGD (Figure 3e). However, when controlling for TP53 mutation status, only tumors with Signature 1 or 5 CGRs, as well as tumors with more than one CGR signature, are significantly associated with WGD (Figure 3e). Among tumors with multiple CGR signatures, those harboring Signature 1 or 5 CGRs are more likely to harbor WGD (Figure 3f). We further investigated tumor-type-specific effects. Although sample sizes are limited, Signatures 1 and 5 remain significantly associated with WGD in several tumor types, such as ovarian, pancreatic and stomach cancers and melanoma (Supplementary Figure S7). In summary, two CGR signatures, Signatures 1 and 5, are associated with WGD, whereas the other signatures are not.
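How the analysis controlled for TP53 status is described in Online Methods and is not stated here. One standard way to test an association while stratifying by a binary covariate is the Cochran-Mantel-Haenszel (CMH) test over one 2x2 table per stratum. The sketch below implements it as an illustrative assumption, not as the study's confirmed procedure.

```python
import math


def cmh_test(tables, correction=True):
    """Cochran-Mantel-Haenszel test across K stratified 2x2 tables.

    tables: list of (a, b, c, d), one per stratum (e.g. TP53-mutant and
    TP53-wild-type tumors), laid out as
        a = signature present & WGD,  b = signature present & no WGD,
        c = signature absent & WGD,   d = signature absent & no WGD.
    Returns (Mantel-Haenszel pooled odds ratio, chi-square statistic, p-value).
    """
    obs = exp = var = num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # strata with fewer than two tumors carry no information
        obs += a
        exp += (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    if var == 0:
        raise ValueError("no informative strata")
    diff = abs(obs - exp)
    if correction:
        diff = max(diff - 0.5, 0.0)  # continuity correction
    stat = diff ** 2 / var
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    odds_ratio = num / den if den else math.inf
    return odds_ratio, stat, p
```

A pooled odds ratio near 1 with a large p-value after stratification would indicate that an apparent crude association was driven by the stratifying variable.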
Chromothripsis has been linked to poor patient survival8,26,43. When all patients in the PCAWG cohort are tested together, CGRs are associated with poorer survival (Figure 3g, left panel). However, the poor outcome of these patients may instead reflect impaired cell-cycle checkpoints in their tumors (e.g., caused by TP53 mutations). When controlling for TP53 mutation status, patients with or without CGRs in their tumors have comparable survival (Figure 3g, right panel). Similarly, CGR status did not predict survival when both tumor type and TP53 mutation status were controlled for (Figure 3h and Supplementary Figure S8). Note that our results do not conflict with previous reports showing that ecDNA is associated with poor patient outcome25, because ecDNA can arise from both simple genomic rearrangements and CGRs.
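This section does not state which survival model was used. A stratified log-rank test is one standard way to compare survival between CGR-positive and CGR-negative patients while controlling for a covariate such as TP53 status. It sums observed-minus-expected deaths and their variances within each stratum. The sketch below is an illustrative assumption, not necessarily the study's method.

```python
import math
from collections import defaultdict


def _logrank_components(times, events, groups):
    """Observed-minus-expected deaths in group 1, and its variance, in one stratum."""
    o_minus_e = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        # (group, died at time t) for every patient still at risk at t.
        at_risk = [(g, e and tt == t) for tt, e, g in zip(times, events, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(g == 1 for g, _ in at_risk)
        d = sum(dead for _, dead in at_risk)
        d1 = sum(dead for g, dead in at_risk if g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, var


def stratified_logrank(times, events, groups, strata):
    """Log-rank test of group 1 (e.g. CGR present) vs group 0, summed over strata
    (e.g. TP53 status). events: 1 = death observed, 0 = censored.
    Returns (chi-square statistic with 1 df, p-value)."""
    by_stratum = defaultdict(lambda: ([], [], []))
    for t, e, g, s in zip(times, events, groups, strata):
        for lst, v in zip(by_stratum[s], (t, e, g)):
            lst.append(v)
    total_oe = total_var = 0.0
    for ts, es, gs in by_stratum.values():
        oe, v = _logrank_components(ts, es, gs)
        total_oe += oe
        total_var += v
    if total_var == 0:
        raise ValueError("no informative events")
    stat = total_oe ** 2 / total_var
    return stat, math.erfc(math.sqrt(stat / 2))
```

Controlling for tumor type as well, as in Figure 3h, would amount to using the combined (tumor type, TP53 status) label as the stratum.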
CGR breakpoint biases
The uneven distribution of somatic rearrangements across the genome may be caused by biases in their formation, and breakpoint locations may provide clues about their mechanisms of formation. We studied the genomic properties of CGR breakpoints by comparing observed breakpoint locations with randomly shuffled locations, as in a previous study of simple SVs17. Breakpoints of CGRs in all signatures are enriched in regions with high GC content, high gene density and early replication (Figure 4a), similar to most simple SVs17, suggesting that CGRs may be more likely to form in open chromatin regions. CGR breakpoints are also closer to repetitive elements than expected (Figure 4a), indicating that repetitive elements may play a role in DNA fragmentation and/or ligation during CGR formation. Interestingly, CGRs of Signatures 1 and 2 tend to occur far from telomeres, whereas CGRs of Signatures 3, 4, 5 and 6 preferentially occur near telomeres (Figure 4a). One possibility is that acentric DNA fragments resulting from breaks near telomeres are more likely to produce micronuclei and chromothripsis.
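The observed-versus-shuffled comparison can be sketched as a permutation test. The sketch assumes that each breakpoint is moved to a uniformly random position on the same chromosome and that a user-supplied `annotate(chrom, pos)` function returns the genomic property of interest (GC content, gene density, distance to a repeat, and so on). Real analyses usually also exclude assembly gaps and unmappable regions; that step is omitted here. All names are hypothetical.

```python
import random


def shuffle_test(breakpoints, chrom_sizes, annotate, n_perm=1000, seed=1):
    """Compare the mean of annotate(chrom, pos) at observed breakpoints with the
    same statistic at breakpoints placed uniformly at random on the same
    chromosomes (gaps and mappability ignored in this sketch).

    Returns (observed mean, mean of the null means, two-sided empirical p)."""
    rng = random.Random(seed)

    def stat(bps):
        return sum(annotate(c, p) for c, p in bps) / len(bps)

    observed = stat(breakpoints)
    null = []
    for _ in range(n_perm):
        shuffled = [(c, rng.randrange(chrom_sizes[c])) for c, _ in breakpoints]
        null.append(stat(shuffled))
    center = sum(null) / n_perm
    extreme = sum(abs(x - center) >= abs(observed - center) for x in null)
    return observed, center, (extreme + 1) / (n_perm + 1)
```

Adding 1 to both the numerator and the denominator keeps the empirical p-value above zero, so it is bounded below by 1/(n_perm + 1).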
Role of transcription-replication collision
DNA replication stress is a major source of genome instability44,45. Because transcription and DNA replication use the same DNA template, collisions between their machineries are unavoidable and can result in replication fork collapse and genome instability46. Some very large genes, which coincide with common fragile sites, are hotspots for deletions due to transcription-replication collision47. Recently, deletions, insertions and point mutations were reported to form frequently when such collisions are induced in a bacterial system48. Here, we sought to evaluate whether transcription-replication conflict contributes to CGRs in cancer. First, we defined replication orientation based on RepliSeq data from the cell line Bg02es (derived from human embryonic stem cells). Left- and right-replicated regions were defined as previously described49 (Supplementary Figure S9a) and are robust to the choice of cell type (Supplementary Figure S9b). Head-on and co-directional collision regions were then defined based on replication and transcription orientations (Figure 4b). Signature 1 breakpoints are significantly enriched in head-on collision regions compared with randomly shuffled breakpoints (Figure 4c; chi-square tests with Bonferroni correction). If these rearrangements are caused by transcription-replication conflict, the enrichment should depend on gene expression. When controlling for gene expression level, we indeed found that the enrichment is significant only in the top 50% of genes ranked by expression in tumors (highly or moderately expressed). It is not significant in the bottom 50% (lowly expressed or not expressed) (Figure 4d). To rule out the possibility that high gene expression is a consequence of CGRs, we performed the same test using gene expression in normal tissues and observed a similar bias towards expressed genes (Supplementary Figure S9c). 
To further rule out the effect of selection, we removed breakpoints within 3 Mb of hotspots (Figure 2b), and the bias was still observed (Supplementary Figure S9d). Previous in vitro studies in cell lines reported that ecDNA can form via chromothripsis18,23. Our results suggest that conflicts between DNA replication and transcription may contribute to ecDNA formation in tumor tissue. When a replication fork collapses, DNA polymerase can switch to a new template, and different types of genomic rearrangements can form depending on the template the polymerase switches to50,51. Template switching upon transcription-replication collision (Figure 4e) is a plausible mechanism for producing a circular molecule. Further studies are needed to elucidate the role of transcription-replication collision in ecDNA formation.
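The head-on versus co-directional assignment and the enrichment test described above can be sketched as follows. The direction encoding is our own hypothetical convention, not the left/right labels of the cited method49: `'+'` means a fork or RNA polymerase moving toward increasing coordinates. The 2x2 table would compare observed and shuffled breakpoints falling in head-on versus co-directional regions, with a Bonferroni correction applied across the signatures tested.

```python
import math


def collision_type(fork_direction, gene_strand):
    """Hypothetical encoding: fork_direction is '+' if the replication fork
    moves toward increasing coordinates at this locus, '-' otherwise;
    gene_strand '+' means transcription also runs toward increasing
    coordinates. Same direction -> co-directional, opposite -> head-on."""
    return "co-directional" if fork_direction == gene_strand else "head-on"


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]],
    e.g. rows = observed / shuffled, columns = head-on / co-directional."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

Splitting the breakpoints by the expression level of the overlapping gene and rerunning `chi_square_2x2` on each half mirrors the expression-stratified comparison in Figure 4d.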
Hourglass chromothripsis in prostate cancer
Signature 6, hourglass chromothripsis, is dominant in prostate cancer (Figure 2a). Chromoplexy is another form of CGR enriched in prostate cancer, lymphoid malignancies and thyroid cancer4,11. It is thought to result from the ligation of simultaneously broken DNA ends from several chromosomes11, producing a complex form of reciprocal translocations4. We asked whether hourglass chromothripsis is equivalent to chromoplexy. Using two strategies to detect chromoplexy, ChainFinder11 and junction patterns17, we found that hourglass chromothripsis events have little overlap with chromoplexy (Supplementary Figure S10a). In addition, most hourglass chromothripsis events involve only one or two chromosomes (Figure 1e), whereas chromoplexy usually involves multiple chromosomes11. Apart from prostate cancer, hourglass chromothripsis is commonly seen in glioblastoma and bladder cancer (Figure 2a), whereas chromoplexy is enriched in thyroid cancer and lymphoid malignancies4. We therefore conclude that hourglass chromothripsis is a unique type of CGR, distinct from chromoplexy.
To test whether hourglass chromothripsis is a one-time catastrophic event, we used linked-read sequencing data from 23 prostate cancers52. We identified 10 hourglass chromothripsis events in 15 tumors, including two events in the SPOP-mutant tumor 01115468-TA3. In this tumor, one of the hourglass chromothripsis events occurred on the long arm of chromosome 8 (Figure 5a). After the rearranged tumor chromosome was reconstructed by following individual somatic SVs (Figure 5b), all rearranged DNA fragments in Figure 5b could be phased into a single haplotype using linked-read barcodes. The same tumor harbored another, more complex hourglass chromothripsis event involving five chromosomes (Figure 5c). Using the same procedure, we identified seven phased blocks containing more than one somatic SV, with a total of 155 somatic SVs. If this event had resulted from simple SVs accumulating over time, we would expect the somatic SVs to be evenly represented on the two haplotypes. However, 144 of them could be phased to one haplotype, which is extremely unlikely to occur by chance (p = 1.3e-28; binomial tests with p-values combined by Fisher's method). These results suggest that hourglass chromothripsis events are indeed one-time catastrophic events.
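The phasing argument can be sketched as follows. Within each phased block, a two-sided exact binomial test asks whether the SVs are split between haplotypes as evenly as expected under independent accumulation (p = 0.5). The per-block p-values are then combined with Fisher's method. The per-block counts are not listed in this section, so the helpers below must be supplied with data. Any counts used to try them out are placeholders, not the study's data.

```python
import math


def binom_two_sided(k, n):
    """Exact two-sided binomial test of k SVs on one haplotype out of n,
    against p = 0.5 (symmetric null, so double the smaller tail)."""
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under
    the null. For even df the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With per-block counts, one would compute `fisher_combine([binom_two_sided(k, n) for k, n in blocks])`, where `blocks` is a hypothetical list of (SVs on the majority haplotype, SVs in block) pairs.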
We then used an additional 329 publicly available WGS prostate cancers from the International Cancer Genome Consortium53,54 and identified another 359 CGRs (Supplementary Table S3). In the combined cohort of 516 prostate cancers, mutations in SPOP are significantly associated with hourglass chromothripsis (p = 3.4e-3, Fisher's exact test; Figure 6a). SPOP is known to be recurrently mutated in prostate cancer, and its mutations are mutually exclusive with ETS fusions (TMPRSS2-ERG, -ETV1, -ETV4 and -ETV5)55. All SPOP mutations are missense mutations in the meprin and TRAF homology (MATH) domain (Supplementary Figure S10b) and potentially disrupt SPOP's substrate binding56. Mutant SPOP is associated with micronucleus formation57, suggesting that SPOP is likely to be directly involved in hourglass chromothripsis formation. Signatures 3 and 4 are also associated with SPOP mutations (p = 1.4e-10 and 1.5e-3, respectively, Fisher's exact test; Figure 6a). In addition, SPOP mutations are associated with simple genomic rearrangements as well (Supplementary Figure S10c). Based on these results, we conclude that SPOP is likely to be a gatekeeper of genome stability, similar to p53, and that mutant SPOP may allow cells to tolerate various types of genomic rearrangements.
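The associations above use Fisher's exact test on 2x2 tables, with rows for SPOP-mutant and SPOP-wild-type tumors and columns for presence and absence of the signature. The underlying counts are not given here. The sketch below is a self-contained two-sided implementation that sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. The function name is ours.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. rows = SPOP mutant / wild type, columns = signature present / absent."""
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(n - r1, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # Small relative tolerance so tables tied with the observed one are included.
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)))
```

For the classic table [[3, 1], [1, 3]], this gives 34/70 ≈ 0.486.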
Next, we investigated the functional consequences of hourglass chromothripsis in prostate cancer. Using GISTIC58, we identified recurrently deleted regions resulting from Signature 3, 4 and 6 CGRs, as well as from simple deletions. CGRs, especially hourglass chromothripsis events, mostly delete the same regions as simple deletions, leading to loss of cancer driver genes such as PTEN (Figure 6b). This suggests that CGRs are under positive selection, similar to recurrent simple deletions. A few GISTIC peaks of simple deletions are not found in CGRs, including the most frequent one, on chromosome 21q22 (Figure 6b), which causes TMPRSS2-ERG fusions35. TMPRSS2-ERG fusions should be less likely to result from CGRs, because TMPRSS2 and ERG both reside on chromosome 21, about 3 Mb apart. The simplest way to form this fusion gene is to join the two genes through a simple deletion. The chance of forming a fusion gene through a CGR is expected to be much lower, because DNA fragments in CGRs are randomly ligated. Nonetheless, we observed 11 TMPRSS2-ERG fusions resulting from hourglass chromothripsis events in the 187 prostate cancers of the PCAWG cohort (Figure 2b and Supplementary Figure S5). This further suggests that hourglass chromothripsis is a major player in prostate cancer tumorigenesis and that the impact of CGRs is similar to that of simple SVs.