Preparation of three breast cancer cohorts with HRD scores
A Japanese cohort consisting of 46 AYA breast carcinomas was prepared in a previous study by our group [37]. These cases were sporadic primary breast cancers diagnosed in 2003–2015 in patients aged 27–39 years. All members of the cohort underwent surgery at the National Cancer Center Hospital, Tokyo, Japan. Clinicopathological information, such as histological subtype and tumor grade, was obtained retrospectively. Tumor grade was classified into three categories (I–III) according to the criteria for the Nottingham histologic score [38]: percentage of tubule formation, degree of nuclear pleomorphism, and accurate mitotic count. Two other cohorts of breast cancer cases, from the US [35] and Europe [36], were also used; each included 70 AYA cases (Table 1). Information regarding clinicopathological characteristics, germline mutations, somatic mutations, methylation status, and HRD score for the US and European cohorts was obtained from the National Cancer Institute Genomic Data Commons, the cBioPortal database, and published supplementary data [34-36, 39-42], as shown in Table S1 in Additional file 1.
Calculation of HRD score
The HRD scores of the 46 Japanese AYA breast cancer cases were calculated based on genome-wide SNP profiles obtained by whole-exome sequencing [37]. The allelic status of each tumor was assessed using the ASCAT (v2.5.2) algorithm [43]. A total HRD score, i.e., the sum of loss of heterozygosity (number of LOH regions longer than 15 Mb), the telomeric allelic imbalance (number of regions of allelic imbalance that extend to one of the subtelomeres but do not cross the centromere), and large-scale state transitions (number of break points between adjacent regions longer than 10 Mb after filtering out regions shorter than 3 Mb), was calculated for each case according to a previously described method [17]. HRD scores of the US and European cohorts were calculated based on SNP array data [34, 36]. HRD scores obtained by whole-exome sequencing and SNP array analysis were consistent with each other (Pearson r = 0.87) [24]; therefore, cases with HRD score ≥42 were judged as HRD-high for subjects from all three cohorts, according to a threshold commonly used in recent clinical trials [17, 18].
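As a minimal sketch of the scoring rule described above, the total HRD score can be written as the sum of the three component counts, followed by the ≥42 cutoff. The function and parameter names are hypothetical; the component counts are assumed to have been derived already from the allele-specific copy number segments, and this is not the code used in the study.

```python
# Hypothetical helper names; the three component counts are assumed to be
# precomputed from allele-specific copy number segments (e.g. ASCAT output).
HRD_HIGH_THRESHOLD = 42  # cutoff stated in the text


def total_hrd_score(loh: int, tai: int, lst: int) -> int:
    """Sum the LOH, telomeric allelic imbalance, and LST counts."""
    return loh + tai + lst


def is_hrd_high(loh: int, tai: int, lst: int,
                threshold: int = HRD_HIGH_THRESHOLD) -> bool:
    """A case is HRD-high when the total score is at or above the threshold."""
    return total_hrd_score(loh, tai, lst) >= threshold


if __name__ == "__main__":
    # Illustrative counts only, not data from any cohort.
    print(total_hrd_score(15, 14, 13), is_hrd_high(15, 14, 13))
```

Note that the comparison is inclusive, so a case scoring exactly 42 is classified as HRD-high.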
Germline and somatic mutations in 28 cancer-related genes
In the Japanese cohort, 28 cancer-related genes were searched for germline and somatic mutations using existing whole-exome sequencing data [37]. This set of 28 genes consisted of 25 cancer susceptibility genes [44], APC, ATM, BARD1, BMPR1A, BRCA1, BRCA2, BRIP1, CDH1, CDK4, CDKN2A, CHEK2, EPCAM, MLH1, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, STK11, RAD51C, RAD51D, SMAD4, and TP53, and three other genes related to HR [45], CHEK1, FAM175A, and MRE11A. For germline mutations, null variants (nonsense, frameshift indel, and splice-site variants) and missense variants classified as “pathogenic” or “likely pathogenic” in the ClinVar database (as of October 19, 2018; https://www.ncbi.nlm.nih.gov/clinvar/) were selected. Somatic mutations of the TP53 gene were annotated according to the criteria of the IARC TP53 database (version R19) [46], and pathogenic mutations including null variants were counted as positive.
DNA hypermethylation of the BRCA1 and RAD51C genes
DNA methylation assays using the Infinium MethylationEPIC BeadChip Kit (Illumina, San Diego, CA, USA) were performed to obtain genome-wide DNA methylation profiles of 37 tumor tissues (including all 13 HRD-high cases) and 12 adjacent non-tumor tissues (including four from HRD-high cases) from the Japanese cohort. After probes with standard deviations greater than 0.05 were removed, 24 probes, including cg04658354 for BRCA1 and cg14837411 for RAD51C, were defined as unmethylated in the 12 non-tumor breast tissues (mean beta value <0.2). A probe was judged as an outlier in a tumor sample when its beta value in the tumor tissue was greater than 0.3. A tumor sample was defined as hypermethylated when it had more than four outlier probes for a specific gene promoter [36]. Based on a previous study [34], hypermethylation of the BRCA1 and RAD51C genes in the US cohort was evaluated based on methylation status at the cg04658354 and cg14837411 loci, respectively. For the European cohort, only the hypermethylation status of BRCA1 was available [39].
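The calling rule for the Japanese cohort can be illustrated with a short sketch. It assumes one reading of the criteria above: a probe counts as an outlier when its tumor beta value exceeds 0.3, and a gene promoter is called hypermethylated when more than four of its probes are outliers. The function names and the example beta values are hypothetical.

```python
# A sketch under the stated assumptions; names are illustrative only.
from typing import Sequence


def count_outlier_probes(tumor_betas: Sequence[float],
                         beta_cutoff: float = 0.3) -> int:
    """Count promoter probes whose tumor beta value exceeds the cutoff."""
    return sum(1 for beta in tumor_betas if beta > beta_cutoff)


def is_hypermethylated(tumor_betas: Sequence[float],
                       beta_cutoff: float = 0.3,
                       min_outliers: int = 4) -> bool:
    """Call hypermethylation when MORE than `min_outliers` probes are outliers."""
    return count_outlier_probes(tumor_betas, beta_cutoff) > min_outliers


if __name__ == "__main__":
    # Made-up beta values for the probes of one gene promoter in one tumor.
    betas = [0.05, 0.45, 0.52, 0.61, 0.38, 0.70, 0.10]
    print(count_outlier_probes(betas), is_hypermethylated(betas))
```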
Mutational signature analysis
Mutational signatures of the breast cancer genomes in the Japanese cohort were obtained by decomposing the somatic mutations of each sample into four major mutational signatures (Catalogue of Somatic Mutations in Cancer (COSMIC) signatures 1, 2, 3, and 6), defined previously [37], so as to minimize the Kullback–Leibler divergence. A heat map based on signature profiles was generated using the R command regHeatmap. Pearson’s correlation coefficient between the percentage of the BRCA signature (COSMIC signature 3) and the HRD score was calculated.
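One common way to fit exposures to a fixed set of signatures under a Kullback–Leibler objective is the multiplicative update used in non-negative matrix factorization. The sketch below applies it to a single sample; it is an illustration of the general technique, not the procedure of [37], and the function names, iteration count, and toy three-category signatures are assumptions (real signatures are defined over 96 mutation categories).

```python
# Pure-Python sketch: fit non-negative exposures to FIXED signatures by
# minimizing the generalized KL divergence with multiplicative updates.
from typing import List, Sequence


def fit_exposures(counts: Sequence[float],
                  signatures: Sequence[Sequence[float]],
                  n_iter: int = 500) -> List[float]:
    """Return one exposure (mutation count) per signature for one sample.

    counts: observed mutations per category.
    signatures: one probability vector per signature (each sums to 1).
    """
    n_sig, n_cat = len(signatures), len(counts)
    exposures = [sum(counts) / n_sig] * n_sig  # uniform starting point
    for _ in range(n_iter):
        # Reconstructed profile under the current exposures.
        recon = [sum(exposures[s] * signatures[s][k] for s in range(n_sig))
                 for k in range(n_cat)]
        updated = []
        for s in range(n_sig):
            num = sum(signatures[s][k] * counts[k] / recon[k]
                      for k in range(n_cat) if recon[k] > 0)
            updated.append(exposures[s] * num / sum(signatures[s]))
        exposures = updated
    return exposures


def signature_percentages(exposures: Sequence[float]) -> List[float]:
    """Convert exposures to percentages of the total mutation burden."""
    total = sum(exposures)
    return [100.0 * e / total for e in exposures] if total > 0 else [0.0] * len(exposures)


if __name__ == "__main__":
    # Toy signatures over three categories; counts = 30 * sig_a + 70 * sig_b.
    sig_a, sig_b = [0.7, 0.2, 0.1], [0.1, 0.2, 0.7]
    print(signature_percentages(fit_exposures([28, 20, 52], [sig_a, sig_b])))
```

The percentage assigned to signature 3 by such a fit is the quantity that would then be correlated with the HRD score.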
Model for predicting HRD status using factors assessed in clinical setting
A prediction model for judging the HRD-high phenotype (i.e., HRD score ≥42) was constructed based on logistic regression of four or five variables: i) presence or absence of pathogenic germline or somatic BRCA1/2 mutations, ii) presence or absence of non-functional somatic TP53 mutations, iii) TNBC subtype or not, iv) high tumor grade (grade III) or not, and v) hypermethylation (BRCA1 and RAD51C) or not (Table 1). To construct and validate the model using these four or five factors, only cases with information for all four or five variables were included in the analysis. First, an HRD prediction model was constructed using data from all cases in the US cohort, irrespective of age, as a development cohort (n = 744). Then, the constructed model was applied to the AYA cases of the Japanese (n = 37 or 46) and European (n = 58) cohorts as validation cohorts. When all cases of the European cohort (n = 477) were used as the development cohort, AYA cases of the Japanese (n = 37 or 46) and US (n = 54) cohorts were used as validation cohorts. The area under the receiver operating characteristic curve (AUC) was calculated to evaluate the predictive power of the model in each cohort. Cutoff values of the prediction model were defined using Youden’s index, i.e., the point at which the sum of sensitivity and specificity was maximal. Based on the cutoff value, positive and negative predictive values were calculated in the validation cohorts.
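The evaluation steps described above (AUC, a Youden-index cutoff, and predictive values) can be sketched as follows. The sketch assumes that the model output is a predicted probability per case and that labels are 1 for HRD-high and 0 otherwise; the function names and example values are hypothetical and do not reproduce the study's analysis.

```python
# Sketch of the model evaluation steps; scores are assumed to be predicted
# probabilities from a fitted logistic regression, labels are 1 (HRD-high) or 0.
from typing import Optional, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = min(scores), -1.0
    for c in sorted(set(scores)):  # a case is predicted positive if score >= c
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # ties keep the lower cutoff
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


def predictive_values(scores: Sequence[float], labels: Sequence[int],
                      cutoff: float) -> Tuple[Optional[float], Optional[float]]:
    """Return (PPV, NPV) at the cutoff; None when a denominator is zero."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


if __name__ == "__main__":
    # Made-up probabilities and labels for illustration only.
    scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
    labels = [0, 0, 1, 0, 1, 1]
    cutoff, j = youden_cutoff(scores, labels)
    print(roc_auc(scores, labels), cutoff, predictive_values(scores, labels, cutoff))
```

In a development and validation design such as the one described here, the cutoff would be chosen on the development cohort and then applied unchanged when computing predictive values in the validation cohorts.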
Statistical analyses
Associations among clinicopathological and genetic factors were examined by Fisher’s exact test, Chi-squared test, Pearson’s correlation, Kruskal–Wallis test, and Mann–Whitney U-test.