SNV distribution among the two breast tumors and metastatic lymph node
One unanswered question for multifocal breast cancer is the clonal relationship of its multiple foci. For the patient enrolled in this study, the locations of the two separated breast tumors and one metastatic axillary lymph node were shown in Fig. 1a. Both bulk exome and single cell whole genome sequencing were performed. By bulk exome sequencing (with a DNA mixture from 18 non-tumor single cells as control), we detected 451, 525 and 441 somatic SNVs from the upper, outer breast tumors and lymph node respectively (Fig. 1b). Among them were 39, 38 and 40 nonsynonymous SNVs. Mutation types of these SNVs showed no significant differences among the two breast tumors and lymph node (Fig. 1c), indicating similar tumorigenesis mechanisms.
Then we focused on the nonsynonymous SNVs and drew a heatmap showing the distribution of 62 nonsynonymous SNVs (Fig. 1d). To find out the origin of the two breast tumors, we first compared nonsynonymous SNVs between the upper and outer breast tumors. For 54 nonsynonymous SNVs, 42.6% (23/54) were shared, and for 22 putative driver SNVs (also included in COSMIC database or KEGG: pathways in cancer), 86.4% (19/22) were shared. Such high percentages of shared SNVs indicated that the two breast tumors had a common origin. To study the clonal relationship of the metastatic lymph node with the two breast tumors, we compared nonsynonymous SNVs between the upper or outer breast tumors and lymph node separately. For the upper breast tumor and lymph node, among 58 nonsynonymous SNVs, 36.2% (21/58) were shared, and among 24 putative driver SNVs, 70.8% (17/24) were shared. For the outer breast tumor and lymph node, the percentages were much higher. Among 46 nonsynonymous SNVs, 69.6% (32/46) were shared, and among 21 putative driver SNVs, 81.0% (17/21) were shared. What’s more, all of the nonsynonymous SNVs shared by the upper breast tumor and the lymph node also existed in the outer breast tumor. These results suggested that the lymph node was much more likely to come from the outer breast tumor, following the lymphatic drainage pathway.
To further understand how these nonsynonymous SNVs may affect the functions of protein, we performed the functional impact prediction by three methods: Mutation Assessor, SIFT and Polyphen-2 (Fig. 2a). Among all nonsynonymous SNVs, EPPK1 D2378H, which was shared by the two breast tumors and lymph node, were predicted highly deleterious by all of the three methods. Besides, PKDREJ C1134Y, LAGE3 L73F, VEPH1 A296D, ENG C182F, DIP2B N1321S, ZKSCAN2 G567D, etc. were also predicted deleterious to varying degrees.
For the 22 putative driver SNVs, NOTCH2 and TERT genes were also included in KEGG: pathways in cancer. Figure 2b showed their positions and upstream or downstream proteins in pathways in cancer. NOTCH2 gene encodes a member of Notch family, which plays a role in angiogenesis sustainment [29]. TERT gene encodes the protein component with reverse transcriptase activity of telomerase, which maintains telomere ends by addition of the telomere repeat [30]. Mutation of NOTCH2 were shared by the two breast tumors and lymph node, while mutation of TERT was specific to the upper breast tumor. 21/22 putative drivers SNVs (except for TERT gene) were also included in COSMIC database, and Fig. 2c showed protein annotations for 8 SNVs from 5 genes. For EPPK1 gene, R2239H, R2364Q and D2378H mutations affected its Plectin repeat domains, which are associated with intermediate filaments and cell adhesion. And for RFPL1 gene, I167V mutation affected its SPRY-associated domain, which is associated with innate immunity and inflammation. Mutations of EPPK1, NLRC5, NOTCH2 and RFPL1 genes were shared by the two breast tumors and lymph node, while mutation of PRPF38B gene was specific to the lymph node.
Phylogenic tree on SNV level confirms common origin and single focus metastasis
To clearly demonstrate the evolutionary relationships among the two breast tumors and lymph node, we constructed phylogenic tree based on nonsynonymous SNVs (Fig. 3a). In the phylogenic tree, the two breast tumors and lymph node shared a common ancestor, which confirmed common origin of the two breast tumors, just as indicated above. Besides, the outer breast tumor and lymph node shared a more recent ancestor, confirming single focus metastasis in this case. In other words, the lymph node metastasis came from the outer breast tumor instead of co-metastasis by both of the two breast tumors, the same as shown above. Majority (17/24) of the putative driver SNVs were trunk mutations, such as mutations of EPPK1, NLRC5, NOTCH2 and RFPL1 genes. While mutations of LRRC37A, TERT and PSG4 genes were branch mutations specific to the upper breast tumor, and mutations of PRB2 and PRPF38B genes were branch mutations specific to the lymph node.
The mutation frequency distribution in Fig. 3b further supported these conclusions. For the upper and outer breast tumors, more than half of the shared SNVs had close mutation frequencies, indicating their similarity. And compared with the upper breast tumor and lymph node, more of the shared SNVs between the outer breast tumor and lymph node had close mutation frequencies, meaning that they were more similar at SNV level.
To further investigate the subclone architecture of these nonsynonymous SNVs, we used PyClone to compare between the outer and upper breast tumors (Fig. 3c), and between the outer breast tumor and lymph node (Fig. 3d). For the two breast tumors, SNVs were divided into four clusters and the cellular prevalence of the four clusters kept stable from the outer to upper breast tumor. While for the outer breast tumor and lymph node, the subclone architecture of SNVs became more complex, with a total of five SNV clusters divided. And this complex subclone architecture (or the cellular prevalence of the five clusters) still kept stable from the outer breast tumor to the lymph node. In general, no matter during separation of the two breast tumors or lymph node metastasis, the SNV subclone architecture still maintained unchanged, without presence of any advantageous or disadvantageous SNV subclones.
CNV patterns of single cells from the two breast tumors and lymph node
Besides bulk exome sequencing, in the meanwhile we performed low depth single cell whole genome sequencing of the two breast tumors and lymph node for CNV analysis. A total of 147 single cells were sequenced. Among them 4 cells didn’t pass the quality control and 18 cells were identified as non-tumor cells. Sequencing data from 40, 53 and 28 single cells (a total of 125 cells) in the upper, outer breast tumors and lymph node were used in the following analysis. CNVs of the 125 single cells were shown in the heatmap in Fig. 4a. Clustering based on whole genome CNVs divided all single cells (without regard to origins) into three clones. Clone 1 and Clone 2 were major clones, accounting for 44% (55/125) and 51.2% (64/125) of all cells, and Clone 3 was a minor clone, accounting for only 4.8% (6/125). Majority of cells in Clone 1 were from the outer breast tumor, and the left few cells were from the lymph node. For Clone 2 cells were from the upper breast tumor and the lymph node. And Clone 3 included cells from the outer breast tumor and the lymph node. In this respect, compared with the upper breast tumor, the outer breast tumor may be more similar to the lymph node.
Then to find the characteristics of each clone, we draw the CNV patterns for the three clones in Fig. 4b. Each clone had its specific CNV characteristics. Except for Clone 3, which showed only a few CNV regions, Clone 1 and Clone 2 shared some CNV regions, for example, copy number gains at chr8q, 17q, 20q and 21q, and copy number losses at chr6q, 8p, 11q, 14q and 17p. Clone 1 and Clone 2 also had their specific CNV regions. Specific CNV regions of Clone 1 included copy number gains at chr1q and 16p, and copy number losses at chr10q, 16q, 17p and X. Specific CNV regions of Clone 2 included copy number losses at chr1p, 2q, 3p, 4p, 4q, 5q, 7p, 7q, 9p and 18q.
Clonal evolution on CNV level identifies a subclone with metastatic advantages
To reveal the CNV level clonal evolution from the breast tumors to lymph node, we used TimeScape to demonstrate this process. As shown in Fig. 4c, from the breast tumors to lymph node, the frequency of Clone 3 kept stable, the frequency of Clone 1 decreased, and the frequency of Clone 2 increased significantly. This indicated that compared with Clone 1 and Clone 3, Clone 2 had more metastatic advantages.
Then we analyzed the specific CNV regions of Clone 2 and putative driver genes located in these regions, as marked in Fig. 4b. In chr1p, with copy number losses in Clone 2, there were many famous cancer related genes, such as ARID1A, MTOR and JAK1. ARID1A encodes a member of SWI/SNF family, which is required for transcriptional activation of genes normally repressed by chromatin [31]. MTOR encoded protein belongs to a family of phosphatidylinositol kinase-related kinases, which mediates cellular responses to stresses such as DNA damage and nutrient deprivation [32]. JAK1 encoded protein is a member of protein-tyrosine kinases, playing a crucial role in effecting the expression of genes that mediate inflammation, epithelial remodeling and metastatic cancer progression [33]. Chr5q, with copy number losses in Clone 2, also contained many important genes associated with cancer, such as APC, MAP3K1 and PIK3R1. APC encodes a tumor suppressor protein that acts as an antagonist of the Wnt signaling pathway [34]. MAP3K1 encoded protein is a serine/threonine kinase and is part of some signal transduction cascades, including the ERK and JNK kinase pathways as well as the NF-kappa-B pathway [35]. PIK3R1 encodes the regulatory subunit of phosphatidylinositol 3-kinse, which plays an important role in the metabolic actions of insulin [36]. Besides there were many other putative driver genes in Clone 2 specific CNV regions, for example ACVR2A, CXCR4 and LRP1B genes in chr2q with copy number losses, PHOX2B, RHOH and SLC34A2 genes in chr4p with copy number losses, CASP3, FAT1 and FBXW7 genes in chr4q with copy number losses, and BCL2, SETBP1, SMAD2 and SMAD4 genes in chr18q with copy number losses. The copy number gains or losses of these regions or genes had a high probability to be associated with lymph node metastasis.