Patient selection:
In the present study, we selected 30 unrelated patients with breast cancer and/or ovarian cancer who had tested negative on a gene panel previously designed for the West Indian population. The distribution of patients by disease and age of onset is shown in Supplementary Figure 2. Whole-exome analysis of the 30 patients was performed, and the data were analysed using various in silico tools. The variants were then annotated for functional consequences with Exomiser, based on Human Phenotype Ontology (HPO) terms for BC and OC. Initially, we found a total of 456,741 unique variants in 18,594 genes among the 30 patients, with an average of 40,000 variants per patient. We examined the variants using two major criteria: (i) the prevalence of variants among the patients and (ii) disease association based on the Exomiser phenotype score (Figure 1).
Variant selection based on prevalence among West Indian patients:
To analyse the variants by their prevalence in West Indian patients, we examined all entries obtained from Exomiser. In total, we obtained more than 1.2 million variants from the 30 patients. We then selected the variants present in ≥90% of patients (27 or more of the 30 patients). We found that 2,481 variants in 2,150 genes met this criterion. We therefore designated these 2,481 variants as highly prevalent variants among West Indian patients with HBOC.
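The prevalence criterion can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the input layout (a list of patient and variant pairs) and the function name prevalent_variants are assumptions made for illustration only. The sketch counts the distinct patients carrying each variant and keeps variants found in at least 90% of the cohort, which for 30 patients means 27 or more.

```python
from collections import defaultdict


def prevalent_variants(calls, n_patients, min_percent=90):
    """Return {variant: carrier_count} for variants seen in >= min_percent of patients.

    calls: iterable of (patient_id, variant) pairs (hypothetical input layout).
    """
    carriers = defaultdict(set)
    for patient_id, variant in calls:
        # A set counts each patient once, even if the variant is listed twice.
        carriers[variant].add(patient_id)

    # Integer ceiling of min_percent * n_patients / 100 (27 for 90% of 30).
    min_patients = -(-min_percent * n_patients // 100)

    return {
        variant: len(patients)
        for variant, patients in carriers.items()
        if len(patients) >= min_patients
    }
```

Counting distinct patients rather than rows keeps the result robust to a variant being reported more than once for the same sample.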
Analysis of prevalent variants and their pathogenicity:
We assessed the highly prevalent variants for pathogenicity as predicted by the SIFT tool and identified 17 variants in 8 genes: HYDIN, AVIL, IWS1, PLA2G6, PRDM4, ST3GAL2, TROAP and ZNF717. HYDIN and ZNF717 had 5 variants each, IWS1 had 2 variants, and the remaining genes had one variant each. Of the 17 variants, 8 (HYDIN:c.12478G>C, HYDIN:c.11224G>A, HYDIN:c.2119A>G, TROAP:c.1676G>A, ZNF717:c.1994A>G, ZNF717:c.1980A>T, ZNF717:c.1021T>G and ZNF717:c.526T>C) were likely benign and 9 (HYDIN:c.6892C>G, HYDIN:c.6259C>T, AVIL:c.1364C>T, IWS1:c.713A>G, IWS1:c.712G>A, PLA2G6:c.2283C>A, PRDM4:c.82C>G, ST3GAL2:c.611G>C and ZNF717:c.2083C>T) were variants of uncertain significance (VUS). None of them was reported in the ClinVar database. Variants of HYDIN, AVIL, IWS1, PLA2G6, PRDM4, ST3GAL2 and ZNF717 were predicted to be highly oncogenic, whereas the TROAP variant and 2 of the ZNF717 variants were predicted to be benign. None of the genes showed a significant association with cancer, breast cancer or ovarian cancer (Table 1).
Analysis of prevalent variants with a reported South Asian frequency:
We examined the allele frequency of these variants in the South Asian population and found 9 variants in 9 genes: TCF20, SOST, MALT1, LRIT2, MAN2C1, SLC4A3, ZFR2, ZNF717 and FAM104B. We analysed the variants in VarSome and found that 2 variants (SOST:c.122del and MALT1:c.2406del) were pathogenic, 1 variant (SLC4A3:c.470del) was likely pathogenic, 3 variants (TCF20:c.5853C>T, MAN2C1:c.2246+5A>G and ZNF717:c.959T>C) were likely benign and 3 variants (LRIT2:c.726del, ZFR2:c.943del and FAM104B:c.331C>T) were VUS. None of the variants is reported to be associated with BC or OC in the ClinVar database. MAN2C1:c.2246+5A>G was predicted to be highly oncogenic by the cScape tool. None of the genes is significantly associated with cancer, breast cancer or ovarian cancer (Table 2).
Analysis of prevalent variants in high- to moderate-risk HBOC genes:
We next examined whether the highly prevalent variants, or the genes carrying them, had previously been associated with a high or moderate risk of HBOC. We first considered the set of genes reported in the NCCN guidelines and then the set of genes included in our previously customized gene panel. From the NCCN guidelines we included 22 genes associated with a high or moderate risk of HBOC, and from the gene panel we included 18 genes. We found 3 variants in 3 genes from the NCCN gene set and 3 variants in 3 genes from the customized panel gene set. BRCA1:c.3214del and BRIP1:c.356del were common to both sets, whereas NF1:c.3093_3094del came from the NCCN gene set and ERBB2:c.2694del from the customized panel gene set. BRCA1:c.3214del was classified as pathogenic in VarSome and shows a significant association with HBOC in ClinVar. BRIP1:c.356del and ERBB2:c.2694del were classified as likely pathogenic and NF1:c.3093_3094del as pathogenic; none of these three variants was reported in ClinVar. All four genes, BRCA1, BRIP1, NF1 and ERBB2, showed a strong functional relation with breast and ovarian cancer (Table 3).
Variant analysis using the Exomiser score:
We analysed the variants in Exomiser and selected the top 50 variants (ranked by phenotype score) from each patient sample. From these, we identified a total of 687 variants in 81 genes across the 30 patients. Of these, 223 variants in 22 genes are reported in the NCCN guidelines and/or included in the previously designed customized gene panel for HBOC. The remaining 464 variants in 59 genes are not included in the NCCN guidelines for HBOC.
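The selection described here can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: the record keys (patient, gene, variant, phenotype_score) and both function names are hypothetical and do not correspond to actual Exomiser output columns. The sketch keeps the top-ranked variants of each patient, pools them into a set of unique variants, and then separates variants in a known gene set (for example, NCCN or panel genes) from the rest.

```python
from collections import defaultdict


def top_variants_per_patient(records, top_n=50):
    """Pool the top_n highest-scoring variants of every patient into one set.

    records: iterable of dicts with the hypothetical keys
    'patient', 'gene', 'variant' and 'phenotype_score'.
    """
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient"]].append(rec)

    selected = set()
    for recs in by_patient.values():
        # Highest phenotype score first.
        ranked = sorted(recs, key=lambda r: r["phenotype_score"], reverse=True)
        for rec in ranked[:top_n]:
            # The set keeps a variant only once even if several patients carry it.
            selected.add((rec["gene"], rec["variant"]))
    return selected


def split_by_gene_set(variants, known_genes):
    """Split (gene, variant) pairs into those in known_genes and all others."""
    known = {v for v in variants if v[0] in known_genes}
    return known, variants - known
```

Applied to a cohort, the first function would yield the pooled variant set and the second would give the two subsets that are analysed separately in the following subsections.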
Analysis of disease-associated variants identified by Exomiser for their pathogenicity:
To assess the pathogenic effects of the variants in the exomes of HBOC patients, we analysed the 464 variants in 59 genes by their SIFT and PolyPhen scores and identified 15 variants in 12 genes: COL14A1, FAN1, GNAS, OPCML, PHB, PIK3CA, POLE, PPM1D, RAD54L, RNF43, TERT and TWIST1. We looked up these variants in VarSome and classified them according to the ACMG guidelines. Of the 15 variants, 14 (COL14A1:c.529G>T, GNAS:c.478A>G, OPCML:c.458T>C, PHB:c.505T>C, PIK3CA:c.31T>G, PIK3CA:c.32G>T, POLE:c.6302C>A, POLE:c.6344A>G, PPM1D:c.1579G>A, RAD54L:c.345A>C, RAD54L:c.579C>G, RNF43:c.379C>T, TERT:c.2711T>A and TWIST1:c.332T>C) were VUS and 1 (FAN1:c.1589T>C) was likely benign. We then analysed the oncogenic properties of the variants and found that the variants of COL14A1, OPCML, PHB, PIK3CA, POLE, PPM1D, RAD54L, RNF43, TERT and TWIST1 were oncogenic, whereas those of FAN1 and GNAS were benign. We also confirmed our results by visualizing the chromosomal position of each variant in IGV (Table 4). Protein-protein interaction analysis using STRING suggested that 53 of the 59 genes showed functional association. The 59 genes were also analysed for their association with the high- and moderate-risk genes reported for HBOC; 54 genes, all except COL14A1, AAGAB, OPCML, SEC23B and DMPK, were functionally associated (Supplementary Figure 3). Analysis of disease association revealed that the majority of the genes are strongly associated with BC and OC. Moreover, functional annotation analysis suggested that the majority of the genes are involved in biological processes associated with DNA integrity maintenance, transcriptional regulation, cell cycle and apoptosis.
Analysis of disease-associated variants identified by Exomiser for South Asian prevalence:
We analysed the 464 variants in 59 genes for their frequency in the South Asian population and found 5 variants in 5 genes with a reported South Asian population frequency. The genes are KRAS, MRE11, PPM1D, RAD54L and RNF43. In VarSome, KRAS:c.547A>G was benign, MRE11:c.1441del and RAD54L:c.2209C>T were pathogenic, and PPM1D:c.1579G>A and RNF43:c.379C>T were VUS. We then analysed the oncogenic properties of the variants and found that the variants of KRAS, PPM1D, RAD54L and RNF43 were oncogenic (Table 5).
Analysis of prevalent disease-associated variants identified by Exomiser among patients:
We analysed the 687 variants in 81 genes for their frequency among patients and selected variants present in ≥25% of patients. We found 33 variants in 30 genes that met this criterion. In VarSome, 25 of these variants (CTNNB1:c.718del, WNT10A:c.307del, NBN:c.2027del, SMAD4:c.130_131insA, PALLD:c.88del, PRLR:c.1251del, HMMR:c.470del, MITF:c.598_599del, GNAS:c.106del, POLD1:c.262del, KEAP1:c.1652del, STK11:c.326dup, ESR1:c.677dup, TERF2IP:c.1116del, RAD51:c.60dup, PALB2:c.3199del, POT1:c.1789dup, SEC23B:c.82_83del, ATM:c.1102del, CDKN2A:c.238del, MLH1:c.718del, RAD50:c.2863del, FGFR2:c.2096del, CDH1:c.1955del and PALB2:c.3351del) were pathogenic, 5 variants (ACD:c.1217del, AKT1:c.722del, BRCA2:c.1597dup, POLD1:c.66_67insG and TERT:c.3327del) were likely pathogenic and 3 variants (CDKN2B:c.173del, CDKN2B:c.310del and IDH1:c.994del) were VUS. Three genes had 2 variants each: CDKN2B (2 VUS), PALB2 (2 pathogenic) and POLD1 (1 pathogenic and 1 likely pathogenic) (Table 6). The genes were further analysed with STRING, which showed that 29 genes had a prominent association with the high- to moderate-risk HBOC genes (Supplementary Figure 4). The identified variants may therefore have diagnostic potential for the early detection of HBOC. Moreover, functional annotation analysis suggested that the majority of the genes are involved in biological processes associated with DNA integrity maintenance, transcriptional regulation and cell cycle (Supplementary Figure 5).
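The gene-level tally in this subsection can be reproduced from the variant list itself. The short Python sketch below is illustrative only and was not part of the study workflow; the function name genes_with_multiple_variants is hypothetical, and the variant strings are copied from the text above, grouped by their VarSome classification.

```python
from collections import Counter

# Variants as listed in the text, grouped by VarSome classification.
VARIANTS_BY_CLASS = {
    "pathogenic": [
        "CTNNB1:c.718del", "WNT10A:c.307del", "NBN:c.2027del",
        "SMAD4:c.130_131insA", "PALLD:c.88del", "PRLR:c.1251del",
        "HMMR:c.470del", "MITF:c.598_599del", "GNAS:c.106del",
        "POLD1:c.262del", "KEAP1:c.1652del", "STK11:c.326dup",
        "ESR1:c.677dup", "TERF2IP:c.1116del", "RAD51:c.60dup",
        "PALB2:c.3199del", "POT1:c.1789dup", "SEC23B:c.82_83del",
        "ATM:c.1102del", "CDKN2A:c.238del", "MLH1:c.718del",
        "RAD50:c.2863del", "FGFR2:c.2096del", "CDH1:c.1955del",
        "PALB2:c.3351del",
    ],
    "likely_pathogenic": [
        "ACD:c.1217del", "AKT1:c.722del", "BRCA2:c.1597dup",
        "POLD1:c.66_67insG", "TERT:c.3327del",
    ],
    "vus": ["CDKN2B:c.173del", "CDKN2B:c.310del", "IDH1:c.994del"],
}


def genes_with_multiple_variants(variants):
    """Return (per-gene variant counts, sorted genes with more than one variant)."""
    # The gene symbol is the part before the first colon of 'GENE:c.change'.
    counts = Counter(v.split(":", 1)[0] for v in variants)
    multi = sorted(gene for gene, n in counts.items() if n > 1)
    return counts, multi


all_variants = [v for group in VARIANTS_BY_CLASS.values() for v in group]
gene_counts, multi_variant_genes = genes_with_multiple_variants(all_variants)
```

Here len(all_variants) gives the number of variants, len(gene_counts) the number of distinct genes, and multi_variant_genes the genes that carry more than one variant.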
Analysis of variants identified by Exomiser in high/moderate-risk genes of HBOC:
From the 687 variants in 81 genes, we selected the variants in genes associated with a high to moderate risk of HBOC and found 223 variants in 22 genes. The genes are BRCA1, BRCA2, CDH1, BRIP1, NBN, PALB2, TP53, ATM, STK11, BARD1, CHEK2, RAD51C, PTEN, PMS2, MSH2, MSH6, CDKN2A, MLH1, EPCAM, RAD51D, RAD50 and CASP8. On the basis of the pathogenicity score, we identified 4 variants in 4 genes: TP53, BRCA1, STK11 and CASP8. Of these, TP53:c.787A>G was predicted to be benign, BRCA1:c.4621T>C was pathogenic, STK11:c.191A>G was likely pathogenic and CASP8:c.811T>C was a VUS. TP53:c.787A>G has been reported in the South Asian population (Table 7).