NCCN high-risk characterisation for ancestrally assigned PCa patients
Clinically and technically matched whole-genome sequenced germline data (mean coverage 45.9X; range 30.2-97.6X) were derived from 170 PCa patients, previously classified by ancestry using 7,472,833 genome-wide SNVs and population substructure analysis18. In brief, the 113 Black South African patients presented with an African ancestral genetic fraction of > 85%, while the 57 White patients presented with European ancestral genetic fractions of > 90% (4 South African, 52 Australian) or with 73.7% European and 26.3% Asian substructure (1 Australian) (Supplementary Table 1). Importantly, although European patients were on average 5 years younger at presentation or surgery, a greater proportion of European (86%; 49/57) than African patients (72%; 81/113) met current NCCN guidelines for germline testing based on International Society of Urological Pathology (ISUP) Group Grading, defined as high-risk localized PCa (ISUP 4/5 or Gleason score \(\ge\) 8). Notably, we have previously provided evidence for extending these criteria for Black South African men to include ISUP 3, which would expand our cohort of high-risk Black men to 82% (93/113)20. While Black South Africans present with significantly elevated median and range of prostate specific antigen (PSA) levels (median 244 ng/mL versus 9.4 ng/mL), as previously presented18, 23, the study was nonetheless biased towards over-representation of NCCN-defined PSA-inclusive high-risk PCa among European (70.2%; 40/57) compared with African patients (57.5%; 65/113).
Genome-wide gene-disrupting SV discovery
In this study, we identified and genotyped 42,966 high-quality germline SVs. We found a median of 9,206 SVs (range: 8,891 to 9,708) per African genome, significantly higher than the median of 7,490 (range: 7,309 to 8,050) per European genome (p-value = 1.1e-26, Wilcoxon test). In total, we identified 38,668 African-derived SVs (18,674 private) and 24,292 European-derived SVs (4,298 private) (Supplementary Table 2). Restricting allele frequency (AF) estimation to high-quality genotype calls left a total of 33,243 high-confidence SVs. After excluding common SVs, defined as minor allele frequency (MAF) > 5%, a total of 20,982 rare (MAF < 1%) and low-frequency (MAF = 1 to 5%) SVs remained across the ancestries for further annotation (Fig. 1).
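The frequency bins used above can be sketched in a few lines. This is a minimal illustration of the stated cut-offs only; the function names and the example MAF values are hypothetical, and how per-ancestry frequencies were combined is not specified in this section.

```python
def frequency_class(maf: float) -> str:
    """Bin a minor allele frequency (MAF) using the cut-offs given in the text."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf < 0.01:
        return "rare"            # MAF < 1%
    if maf <= 0.05:
        return "low-frequency"   # MAF = 1 to 5%
    return "common"              # MAF > 5%


def retained_for_annotation(maf: float) -> bool:
    """Common SVs are excluded; rare and low-frequency SVs are kept."""
    return frequency_class(maf) != "common"


# Hypothetical MAF values, for illustration only.
for maf in (0.004, 0.03, 0.2):
    print(maf, frequency_class(maf), retained_for_annotation(maf))
```

Under these cut-offs a MAF of exactly 0.01 falls in the low-frequency bin rather than the rare bin.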
On further interrogation of overlapping gene regions, we identified 1,857 gene-disruptive SVs, including 1,752 potential Loss-of-Function (pLoF), 52 Copy Gain (CG) and 53 Intragenic Exon DUP (IED) SVs (detailed in Methods). Notably, pLoF, CG and IED SVs can have a functional impact on genes through either gene inactivation or an increased dosage effect32. Conversely, SVs with other gene impact types have no clear or direct coding effect; in our study these included 109 partial gene DUPs, 22 partial exon DUPs, 48 whole-gene INVs, 343 promoter SVs, 9,431 intronic SVs and 258 enhancer SVs. As such, the latter SVs are not discussed further. Of the 1,857 gene-disruptive SVs (MAF \(\le\)5%), 1,407 are African-relevant, including 93% (1,314) African-private, and 543 are European-relevant, including 83% (450) European-private (Supplementary Table 2). There were 93 SVs (5%) shared by both African and European PCa patients. The 1,857 gene-disruptive SVs (1,050 rare in both African and European) underwent further downstream interrogation for potential clinical relevance. Of these, 1,167 were previously reported in the dbVar database of SVs, while 690 were absent and as such regarded as novel, of which 513 (74%) are uniquely African (Fig. 1).
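The private and shared counts quoted above partition the 1,857 gene-disruptive SVs, which can be checked with a short sketch. The `ancestry_class` helper and its carrier-count inputs are hypothetical illustrations, not the study's pipeline; only the totals are taken from the text.

```python
def ancestry_class(african_carriers: int, european_carriers: int) -> str:
    """Label an SV by which ancestral group(s) carry it (illustrative rule)."""
    if african_carriers > 0 and european_carriers > 0:
        return "shared"
    if african_carriers > 0:
        return "African-private"
    if european_carriers > 0:
        return "European-private"
    return "absent"


# Counts quoted in the text.
african_private, european_private, shared = 1314, 450, 93

african_relevant = african_private + shared                  # African-relevant SVs
european_relevant = european_private + shared                # European-relevant SVs
total = african_private + european_private + shared          # all gene-disruptive SVs

print(african_relevant, european_relevant, total)
print(round(100 * african_private / african_relevant),       # % African-private
      round(100 * european_private / european_relevant),     # % European-private
      round(100 * shared / total))                           # % shared
```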
Characterising ClinVar verified candidate potentially pathogenic SVs
Of the 1,167 dbVar-reported gene-disruptive SVs, 14 (1.2%) were recorded in ClinVar, with three reported as ‘pathogenic’ or ‘likely pathogenic’ based on functional prediction consensus. A 2,958 bp likely pathogenic DEL results in loss of exon 7 in OCA2 (Supplementary Fig. 1), a 5,064 bp pathogenic DEL leads to loss of exons 5–7 in PIGN (Supplementary Fig. 2), while a 235 bp likely pathogenic DUP duplicates exon 3 of SLC3A1 (Supplementary Fig. 3). The OCA2 and PIGN DELs were each identified in a single African patient, while the SLC3A1 DUP presented in two African patients (Table 1).
Although pathogenic in ClinVar, none have been associated with cancer phenotypes; the associated phenotypes are instead oculocutaneous albinism, multiple congenital anomalies-hypotonia-seizures syndrome and cystinuria, respectively. As such, we searched the literature for plausibility, with further ascertainment derived from normal prostate and tumour tissue data sets using GENT233. The pigmentation gene OCA2 is reported to be downregulated in numerous cancer types (all-type P < 0.001, GENT2 T-test), although not significantly for PCa, and its pLoF deletion has been linked not only to Prader-Willi syndrome, but also to Prader-Willi associated malignancies34 and melanoma35, with recent studies linking melanoma with increased PCa risk36. PIGN, which is highly expressed in normal prostate tissue with significant upregulation in tumour tissue (P < 0.001, GENT2 T-test), functions as a cancer chromosomal instability suppressor gene37, 38. Although expressed at lower levels, SLC3A1 is also upregulated in PCa (P < 0.001, GENT2 T-test), with overexpression in breast cancer associated with tumourigenesis39. Taken together, these observations provide the rationale for characterising the pLoF OCA2 and PIGN DELs and the SLC3A1 IED as potentially pathogenic SVs (PP-SVs). Notably, all three SVs are reported as rare (irrespective of ancestry) in multiple population-wide studies, including gnomAD SV32, the 1000 Genomes Project (1KGP)40, 41 and TOPMed SV42 (Supplementary Data 1).
Table 1
Candidate potentially pathogenic (PP) SVs identified in 170 PCa patients.
Genes | Gene impact type1 | chrom1 | pos1 | chrom2 | pos2 | SV type | ClinVar / dbVar concordance | MAF African (this study) | MAF European (this study) | MAF African (dbVar)2 | MAF European (dbVar)2 |
Potentially Pathogenic SV (PP-SV) |
SLC3A1 | IED | chr2 | 44281377 | chr2 | 44281612 | DUP | Likely pathogenic | 0.01 3 | 0 | 0.0075 | 1.3e-04 |
OCA2 | pLoF | chr15 | 28017719 | chr15 | 28020677 | DEL | Likely pathogenic | 0.004 | 0 | 0.0015 | 0.001 |
PIGN | pLoF | chr18 | 62152637 | chr18 | 62157701 | DEL | Pathogenic | 0.004 | 0 | 0.0013 | 1.3e-04 |
SLC7A2 | pLoF | chr8 | 17418976 | chr8 | 17544122 | DEL | In dbVar | 0.009 | 0 | 0.003 | 0 |
DNAJC15 | pLoF | chr13 | 43078470 | chr13 | 43079390 | DEL | In dbVar | 0 | 0.009 | 0 | 1.0e-04 |
BCL2L11 | pLoF | chr2 | 111122626 | chr2 | 111125901 | DEL | novel | 0.005 | 0 | NA | NA |
BARD1 | pLoF | chr2 | 214768022 | chr2 | 214772899 | DEL | novel | 0.005 | 0 | NA | NA |
COL4A2/COL4A1 | CG | chr13 | 110294204 | chr13 | 110633815 | DUP | In dbVar | 0.005 | 0 | 1.3e-04 | 6.3e-06 |
SLC2A5 | IED | chr1 | 9045605 | chr1 | 9049441 | DUP | In dbVar | 0 | 0.009 | 7.3e-04 | 0.002 |
FOXP1 | pLoF | chr3 | 71097066 | chr3 | 74525618 | INV | novel | 0.009 | 0 | NA | NA |
WASF1 | pLoF | chr6 | 108167886 | chr6 | 110172775 | INV | In dbVar | 0.004 | 0 | 9.6e-05 | 0 |
MLH1 | pLoF | chr3 | 37000362 | chr3 | 39352689 | INV | In dbVar | 0.004 | 0 | 4e-04 | 6.4e-06 |
RB1 | pLoF | chr13 | 48466588 | chr13 | 48473911 | INV | In dbVar | 0.004 | 0 | 1.8e-04 | 1.3e-05 |
CTNNA1 | pLoF | chr5 | 138903881 | chr19 | 21614900 | TRA | novel | 0 | 0.009 | NA | NA |
AK8-DST | pLoF | chr9 | 132876361 | chr6 | 56896165 | TRA | novel | 0 | 0.009 | NA | NA |
PP-SV candidates classified as ‘cautionary’ |
LTBP1/BIRC6 | CG | chr2 | 32403832 | chr2 | 33107415 | DUP | In dbVar | 0 | 0.009 | 1.0e-04 | 0.0018 |
PHC3-PRKACA | pLoF | chr3 | 170090742 | chr19 | 14110142 | TRA | novel | 0.004 | 0 | NA | NA |
KCTD3-DST | pLoF | chr1 | 215567414 | chr6 | 56652607 | TRA | novel | 0.009 | 0 | NA | NA |
PKHD1 | pLoF | chr6 | 51981375 | chr15 | 30874073 | TRA | novel | 0.009 | 0 | NA | NA |
1 Gene impact type based on gene annotation. pLoF: Potential loss-of-function. CG: Copy gain. IED: Intragenic Exon Duplication.
2 The ancestry-related MAFs in dbVar are based on the gnomAD32 or TOPMed42 SV studies. Details of all dbVar studies (dbVar study name and ID) and reported allele frequencies are shown in Supplementary Data 1.
3 Presenting as a low-frequency rather than a rare variant within the ancestrally-defined patient cohort.
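As a consistency check, the SV sizes quoted in the text can be recovered from the Table 1 coordinates. The sketch below assumes that size is simply pos2 minus pos1 for intrachromosomal events; that convention is an assumption that happens to match the reported sizes, and `sv_span` is a hypothetical helper name.

```python
from typing import Optional


def sv_span(chrom1: str, pos1: int, chrom2: str, pos2: int) -> Optional[int]:
    """Span in bp for an intrachromosomal SV; None for interchromosomal events."""
    if chrom1 != chrom2:
        return None  # e.g. TRA: a single span is not defined
    return pos2 - pos1  # assumed convention, chosen because it matches the text


# Coordinates copied from Table 1.
table1 = {
    "SLC3A1": ("chr2", 44281377, "chr2", 44281612),
    "OCA2": ("chr15", 28017719, "chr15", 28020677),
    "PIGN": ("chr18", 62152637, "chr18", 62157701),
    "SLC7A2": ("chr8", 17418976, "chr8", 17544122),
    "DNAJC15": ("chr13", 43078470, "chr13", 43079390),
    "BCL2L11": ("chr2", 111122626, "chr2", 111125901),
    "BARD1": ("chr2", 214768022, "chr2", 214772899),
    "CTNNA1": ("chr5", 138903881, "chr19", 21614900),
}

# Sizes (bp) as quoted in the running text.
reported = {
    "SLC3A1": 235, "OCA2": 2958, "PIGN": 5064, "SLC7A2": 125146,
    "DNAJC15": 920, "BCL2L11": 3275, "BARD1": 4877,
}

spans = {gene: sv_span(*coords) for gene, coords in table1.items()}
for gene, size in reported.items():
    print(gene, spans[gene], spans[gene] == size)
```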
Characterising candidate potentially pathogenic SVs absent from ClinVar
For the 1,843 SVs with unknown classification in ClinVar or absent from dbVar, we predicted potential pathogenicity using four SV impact prediction tools: StrVCTVRE43, CADD-SV44, POSTRE45 and PhenoSV46. The number and types of SVs scored by each of the four tools are shown in Supplementary Fig. 4 and Supplementary Table 3. Candidate SVs were required to meet two of the following criteria: StrVCTVRE score \(\ge\)0.37, CADD-SV score \(\ge\) 10, POSTRE score \(\ge\)0.8 and/or PhenoSV score \(\ge\)0.5 (Supplementary Table 4 and Methods). Based on these criteria, all three ClinVar pathogenic or likely pathogenic SVs and the single SV of uncertain significance were successfully annotated as pathogenic candidates, while conversely our workflow excluded all 10 ClinVar-characterised benign SVs (Supplementary Table 5). Using our criteria, 291 SVs were defined as PP-SV candidates (107 DELs, 16 DUPs, 11 INVs and 157 TRAs) disrupting 419 genes. In total, 190 candidate SVs were private to African and 88 to European patients, with 13 shared between the ancestries (Supplementary Table 4).
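The two-of-four rule above can be expressed as a small filter. This is a sketch under stated assumptions: "two of the following criteria" is read as at least two, a tool that returns no score for an SV is treated as not meeting its cut-off, and the function names and example scores are hypothetical. It does not call any of the four tools.

```python
from typing import List, Mapping, Optional

# Score cut-offs quoted in the text.
THRESHOLDS = {
    "StrVCTVRE": 0.37,
    "CADD-SV": 10.0,
    "POSTRE": 0.8,
    "PhenoSV": 0.5,
}


def tools_passed(scores: Mapping[str, Optional[float]]) -> List[str]:
    """Return the tools whose score meets or exceeds its cut-off."""
    passed = []
    for tool, cutoff in THRESHOLDS.items():
        score = scores.get(tool)
        # Assumption: an unscored SV (None or missing) does not pass that tool.
        if score is not None and score >= cutoff:
            passed.append(tool)
    return passed


def is_pp_sv_candidate(scores: Mapping[str, Optional[float]], min_tools: int = 2) -> bool:
    """Candidate if at least `min_tools` tools meet their cut-off."""
    return len(tools_passed(scores)) >= min_tools


# Hypothetical scores for one SV, for illustration only.
example = {"StrVCTVRE": 0.52, "CADD-SV": 12.4, "POSTRE": None, "PhenoSV": 0.1}
print(tools_passed(example), is_pp_sv_candidate(example))
```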
To further define cancer-related pathogenic potential, we assessed whether genes disrupted by PP-SV candidates were present in gene sets derived from the Human Molecular Signatures Database (MSigDB) oncogenic signature and hallmark gene sets47 and the COSMIC Cancer Gene Census (COSMIC CGC) cancer driver genes48. Requiring disrupted genes to be present in two of the three cancer gene sets, 58 SVs were defined as cancer-related PP-SV candidates, including 20 DELs, 3 DUPs, 6 INVs and 29 TRAs, disrupting 56 genes. Of the 58 candidates, 23 had a MAF of 1–5% in either African or European patients, leaving 35 rare PP-SV candidates for further consideration, of which 16 have been reported in dbVar. Two dbVar SVs, a TRA disrupting NBEA and a POLR2C DEL, were reported at low frequencies (AF = 0.03 and 0.01, respectively) (Supplementary Data 1) and were therefore excluded from further analysis. Using our criteria, 33 rare cancer-related PP-SV candidates were identified (Supplementary Data 2 and Fig. 1), including 15 DELs, 3 DUPs (1 IED and 2 CGs), 5 INVs and 10 TRAs.
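The gene-set requirement and the subsequent filtering steps can be sketched as follows. The membership rule is one possible reading (an SV qualifies when any single disrupted gene appears in at least two of the three sets); the gene names in `toy_sets` are placeholders rather than real MSigDB or COSMIC CGC content, and the helper names are hypothetical. The counts in the second half are those quoted in the text.

```python
from typing import Dict, Iterable, Set


def gene_set_hits(gene: str, gene_sets: Dict[str, Set[str]]) -> int:
    """Number of gene sets that contain the gene."""
    return sum(gene in members for members in gene_sets.values())


def is_cancer_related(disrupted_genes: Iterable[str],
                      gene_sets: Dict[str, Set[str]],
                      min_sets: int = 2) -> bool:
    """Assumed rule: any one disrupted gene is in at least `min_sets` sets."""
    return any(gene_set_hits(g, gene_sets) >= min_sets for g in disrupted_genes)


# Placeholder gene sets, for illustration only.
toy_sets = {
    "msigdb_oncogenic": {"GENE_A", "GENE_B"},
    "msigdb_hallmark": {"GENE_A", "GENE_C"},
    "cosmic_cgc": {"GENE_B", "GENE_D"},
}
print(is_cancer_related(["GENE_A"], toy_sets))            # gene in two sets
print(is_cancer_related(["GENE_C", "GENE_D"], toy_sets))  # each gene in one set

# Filtering steps with the counts quoted in the text.
cancer_related = 58
low_frequency_in_cohort = 23   # MAF 1-5% in either ancestry
low_frequency_in_dbvar = 2     # NBEA TRA and POLR2C DEL
rare_candidates = cancer_related - low_frequency_in_cohort
final_candidates = rare_candidates - low_frequency_in_dbvar
by_type = {"DEL": 15, "DUP": 3, "INV": 5, "TRA": 10}
print(rare_candidates, final_candidates, sum(by_type.values()))
```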
Of the 15 pLoF DELs, 11 were excluded as PP-SVs because the impacted genes show oncogenic behaviour in multiple cancer types or lack strong evidence for a tumour suppressor effect (Supplementary Table 7). Conversely, four pLoF DELs were defined as PP-SVs, as they impact known tumour suppressors or an established DNA damage repair gene (Supplementary Table 7). Two are known to dbVar: a 125,146 bp SLC7A2 DEL identified in two African patients (Supplementary Fig. 5) and a 920 bp DNAJC15 DEL in a European patient (Supplementary Fig. 6). The other two PP-SVs are novel pLoF DELs, each identified in a single African patient: a 3,275 bp BCL2L11 DEL (Supplementary Fig. 7) and a 4,877 bp DEL in the DNA damage repair gene BARD1 (Fig. 2A, Supplementary Fig. 8).
Of the two dbVar whole-gene DUPs, the 339,611 bp COL4A2 CG, with breakpoints disrupting COL4A1 and NAXD and observed in a single African patient, is defined as a PP-SV (Supplementary Fig. 9), as COL4A2 shows oncogenic behaviour in gastric and breast cancers (Supplementary Table 7). In contrast, the 703,583 bp TTC27 DUP observed in a single European patient is afforded ‘cautionary’ PP-SV status (Supplementary Fig. 10). Although TTC27 is absent from the three cancer gene sets, the breakpoints disrupt the MSigDB and COSMIC CGC genes BIRC6 and LTBP1, resulting in an LTBP1-BIRC6 gene fusion of unclear effect. Observed in a single European patient, a 3,836 bp DUP directly impacts exon 4 of SLC2A5 (Supplementary Fig. 11), which is downregulated in PCa (P < 0.001, GENT2 T-test) and has been reported to show oncogenic behaviour (Supplementary Table 7); this DUP was therefore allocated PP-SV status.
Of the five pLoF INVs, those impacting MLH1, RB1 and WASF1 are in dbVar, while the FOXP1 and NSD3 INVs are novel. As NSD3 has been identified as oncogenic in multiple cancers, the associated INV is classified here as unlikely pathogenic. All remaining pLoF INVs are classified as PP-SVs, as they disrupt MLH1, a DNA mismatch repair gene known to predispose to PCa and Lynch Syndrome, and the PCa tumour suppressor genes RB1, WASF1 and FOXP1 (Supplementary Table 7). Each identified in a single African patient (Supplementary Fig. 12–14), the three dbVar INVs were reported as rare by the recent TOPMed SV study42, in which the WASF1 INV was also identified as African-specific (Table 1 and Supplementary Data 1). The novel INV impacting FOXP1 was identified in two African patients (Fig. 2E, Supplementary Fig. 15).
Of the 10 pLoF TRAs, five impact genes with oncogenic properties (GRM8, WDR43, NPM1, NUSAP1 and MECOM; Supplementary Table 7) and are therefore classified as unlikely pathogenic. The PKHD1 TRA identified in two African patients received a ‘cautionary’ PP-SV classification, as PKHD1 has been identified as potentially oncogenic in colon cancer but as a potential tumour suppressor in colorectal cancer (Supplementary Table 7). As CTNNA1 is known to show tumour suppressor behaviour across multiple tumour types (Supplementary Table 7), we classify the European-specific pLoF CTNNA1 TRA as a PP-SV (Supplementary Fig. 16). The remaining pLoF TRAs result in the novel gene fusions PHC3-PRKACA (1 African patient), KCTD3-DST (2 African patients) and AK8-DST (1 European patient, Supplementary Fig. 17). PHC3-PRKACA was classified as a ‘cautionary’ PP-SV, as PHC3 shows a potential cancer suppressor effect in PCa, while PRKACA appears to show oncogenic behaviour (Supplementary Table 7). Although not previously linked to PCa, both DST and AK8 have demonstrated tumour suppressor behaviour, whereas KCTD3 has an unclear role in cancer (Supplementary Table 7). We therefore classify AK8-DST as a PP-SV, while KCTD3-DST is assigned ‘cautionary’ PP-SV status.
Correlating PP-SVs and ‘cautionary’ PP-SVs with clinical features
The clinicopathological features of the study cohort have been previously described18, 28. In brief, African patients show a 5-year greater mean age and 25-fold greater PSA level at diagnosis compared to European patients (Supplementary Table 1). Based on our previous observations20, high-risk or aggressive PCa was defined as ISUP GG\(\ge\)3 and, conversely, low-risk disease presentation as ISUP GG\(<\)3. Although the cohort is biased towards aggressive disease presentation (82% African, 86.0% European), it was notable that all four patients with a pathogenic or likely pathogenic SV presented with aggressive disease at diagnosis, as did 92.9% (13/14) of PP-SV and 83.3% (5/6) of cautionary PP-SV presenting patients (Table 2).
Table 2
Clinicopathological features, by ethnicity, of patients presenting with potentially pathogenic (PP) SVs and cautionary PP-SVs, as defined by the criteria of this study.
Gene name | Pathogenicity | SV type | Patient ID | Ethnicity | Age (years) | PSA (ng/mL) | ISUP GG | Family history |
SLC3A1 | PP-SV Likely Pathogenic | DUP | N0001 | African | 75 | 22.9 | 4 | |
SLC3A1 | PP-SV Likely Pathogenic | DUP | SMU094 | African | 64 | 15 | 4 | |
OCA2 | PP-SV Likely Pathogenic | DEL | N0059 | African | 79 | 153 | 5 | |
PIGN | PP-SV Pathogenic | DEL | SMU083 | African | 86 | 40.5 | 3 | |
SLC7A2 | PP-SV | DEL | UP2035 | African | 70 | 680 | 5 | |
SLC7A2 | PP-SV | DEL | KAL0054 | African | 64 | 42.9 | 5 | |
DNAJC15 | PP-SV | DEL | 17135 | European | 63 | 7.8 | 5 | |
BCL2L11 | PP-SV | DEL | KAL0101 | African | 71 | 32.3 | 5 | |
BARD1 | PP-SV | DEL | N0073 | African | 62 | unknown | unknown | |
COL4A2/COL4A1 | PP-SV | DUP | UP2039 | African | 71 | 319 | 4 | |
SLC2A5 | PP-SV | DUP | 11099 | European | 70 | 9.9 | 5 | |
FOXP1 | PP-SV | INV | UP2101 | African | 57 | 75 | 5 | |
FOXP1 | PP-SV | INV | N0084 | African | 65 | 591 | 4 | |
WASF1 | PP-SV | INV | N0048 | African | 70 | 83.3 | 5 | |
MLH1 | PP-SV | INV | SMU080 | African | 64 | 23.3 | 4 | Sister with cervical cancer |
RB1 | PP-SV | INV | SMU064 | African | 70 | 13.7 | 3 | |
CTNNA1 | PP-SV | TRA | 13179 | European | 59 | 8.4 | 5 | |
AK8-DST | PP-SV | TRA | 11452 | European | 67 | 11 | 1 | |
LTBP1/BIRC6 | Cautionary PP-SV | DUP | 5287 | European | 54 | 4.3 | 5 | |
PHC3-PRKACA | Cautionary PP-SV | TRA | SMU061 | African | 65 | 12.1 | 3 | Mother with stomach cancer |
KCTD3-DST | Cautionary PP-SV | TRA | UP2039 | African | 71 | 319 | 4 | |
KCTD3-DST | Cautionary PP-SV | TRA | SMU101 | African | 70 | 4.3 | 3 | |
PKHD1 | Cautionary PP-SV | TRA | N0056 | African | 70 | 153 | 5 | |
PKHD1 | Cautionary PP-SV | TRA | SMU196 | African | 47 | 9.5 | 1 | |
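The risk grouping used in this section (aggressive as ISUP GG \(\ge\) 3, low-risk as ISUP GG < 3) can be applied directly to the Table 2 rows. The sketch below is illustrative only: it covers the ClinVar pathogenic or likely pathogenic carriers and the cautionary PP-SV carriers, with ISUP grade groups copied from Table 2, and the helper names are hypothetical.

```python
from typing import Dict, Tuple


def risk_group(isup_gg: int) -> str:
    """Aggressive disease is defined here as ISUP grade group 3 or higher."""
    return "aggressive" if isup_gg >= 3 else "low-risk"


def aggressive_fraction(grades: Dict[str, int]) -> Tuple[int, int]:
    """Return (number of aggressive presentations, number of patients)."""
    aggressive = sum(1 for gg in grades.values() if risk_group(gg) == "aggressive")
    return aggressive, len(grades)


# Patient ID -> ISUP GG, copied from Table 2.
clinvar_patients = {"N0001": 4, "SMU094": 4, "N0059": 5, "SMU083": 3}
cautionary_patients = {
    "5287": 5, "SMU061": 3, "UP2039": 4, "SMU101": 3, "N0056": 5, "SMU196": 1,
}

for label, grades in (("ClinVar", clinvar_patients), ("cautionary", cautionary_patients)):
    n_aggressive, n_total = aggressive_fraction(grades)
    print(label, f"{n_aggressive}/{n_total}", round(100 * n_aggressive / n_total, 1))
```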