Single nuclei chromatin accessibility and transcriptome mapping of breast tissues from women of diverse genetic ancestry. Figure S1 provides a brief overview of the experimental design and Fig. 1a provides genetic ancestry estimates of the donor samples used in the study. Details of genetic ancestry estimation analysis are described in our recent publication 13. Table S1 provides details of donors including menopausal status, age, BMI, self-reported race, days since ovulation etc. Average age, BMI and number of childbirths in each subgroup are also provided in this table. The majority of self-reported White donors are enriched for European ancestry, whereas the majority of self-reported Black donors are enriched for African ancestry. Sixty percent of self-reported Hispanic women are enriched for European ancestry. Fifty percent of Asians are enriched for East Asian while other 50% are enriched for Southeast Asian ancestry markers. Indigenous Americans are enriched for “Americana” ancestry markers and Americana ancestry proportion in these donors is similar to Indigenous American ancestry proportions described in Indigenous American breast cancer patients in Peru 4. Note that this is the only group to carry significant “Americana” ancestry markers. Ashkenazi Jewish Americans are a mixture of European, Middle Eastern, and African ancestry. We also included six samples from BRCA1 mutation donors and five BRCA2 donors and information on these donors are provided in another study that utilized these samples for single cell RNA-seq 14. Table S2 provides details of number of nuclei sequenced in each group and number of genes per nuclei per group. Although multiome assay that combined snATAC-seq and snRNA-seq was attempted with tissues from donors of all genetic ancestry, for unknown reasons, combined multiome assay did not work with tissues from donors of African ancestry. Therefore, only snRNA-seq was performed with the tissues of donors of African ancestry. For better comparison of data from breast tissues of African ancestry, a set of samples from parous and nulliparous women of European ancestry was similarly processed to obtain snRNA-seq.
Integrated analysis of snATAC-seq and snRNA-seq data from all ancestries except African ancestry revealed 10 distinct clusters: three of them are epithelial cell types, two are endothelial cell types, two adipocyte subtypes, fibroblasts, T cells and macrophages (Fig. 1b). Epithelial cell types were annotated using CD49f/EpCAM markers as done previously into mature luminal (ML), luminal progenitors (LP) and basal cells 11. These cell clusters have been described with various names in the literature 7,8,12. In a recent breast cell annotation event organized by Chan-Zuckerberg Initiative that included several research groups involved in developing single cell atlas of the breast, the following terminologies were suggested: luminal hormone sensing (LHS), luminal adaptive secretory precursor cells (LASP) and basal-myoepithelial (BM) cells for mature luminal, luminal progenitor, and basal cells, respectively. Another recent study that combined single cell RNA-seq with single cell proteome assay has subclassified hormone sensing cells into HS-alpha (HSa) and HS-beta (HSb), LASP cell into alveolar progenitors (AP) and basal-luminal alveolar (BL) cells and BM cells into Basal-alpha (BAa) and Basal-beta (BAb) cells 8. We used the markers described in that study to subcluster the epithelial cells and found the presence of all six epithelial cell states (Fig. 1c). Further examination of these data revealed LASP cells to be in four cell states, fibroblasts in two states, endothelial cells in four states and macrophages in two states (Fig. 1d). Genes expressed in each of the cell types are listed in Table S3. Genes differentially expressed in HSa compared to HSb cells, AP versus BL cells and BAa versus BAb cells are listed in supplementary Tables S4, S5, and S6, respectively. Genes differentially expressed in different cell states of endothelial cells, fibroblasts and adipocytes are listed in Tables S7, S8, and S9, respectively. The top ten transcription regulators of each of the major cell types are listed in Table S10. Average expression of genes in each cluster is listed in Tables S11.
AP cells are enriched in breast tissues of women of Indigenous American ancestry. We next analyzed data based on self-reported racial/ethnicity groups. Samples from BRCA1 and BRCA2 mutation carriers were also included. Cell types in each group are shown in Fig. 2a and cell proportions are listed in Table S12. Data from women of African ancestry are only from snRNA-seq and additional details for this group are described in subsequent sections. AP cells were disproportionately higher in breast tissues of women of Indigenous American ancestry. While 19% of cells in Indigenous Americans are AP cells, the percentage of AP cells in the other groups ranged from 5 to 9%. Using the previously described markers of alveolar cells (EHF and ELF5) and luminal progenitor cells (KIT) 15,16, we confirmed that AP cells express alveolar markers and KIT (Fig. 2b-d). None of these differences between the groups is due to differences in age, BMI or number of childbirths as the average age of European, Indigenous American, Asian, and Hispanic-White donors were 38, 41, 42, and 41, respectively (Table S1). Average age of Ashkenazi-Jewish-European and African ancestry donors was higher (average 53 and 54, respectively). None of these differences can be attributed to differences in proliferation rate of cells in the breast during tissue collection as MKI67, a marker of cell proliferation, is expressed in very few cells across samples (Fig. 2e).
AP cells enriched in Indigenous Americans express ESR1 and are enriched for transcripts downstream of ER-growth factor signaling
To determine whether AP cells enriched in Indigenous Americans express a unique set of genes compared to other cell clusters within LASP cells, we compared gene expression between cluster 9 (Indigenous American enriched cluster) and clusters 1, 12 and 18 (Table S5). ESR1 is among the top genes highly expressed (~ 230 fold higher) in this cluster compared to other clusters. ESR1 expression pattern in various clusters, as shown in Fig. 3a, further confirmed these results. A set of pioneer factors control ERa genome wide binding including FOXA1 and GATA3 and ERa-FOXA1-GATA3 constitute a cell type specific transcription factor network of hormone responsive cells 17,18. FOXA1 expression was mostly restricted to LHS cells, even in Indigenous Americans, whereas GATA3 expression was higher in LHS cells with lower level expression in LASP and BM cells (Fig. 3b and Fig. 3c).
Lack of FOXA1 expression but the presence of ESR1 transcripts in AP cells of Indigenous Americans suggested that ERa activity in these cells is regulated by a mechanism distinct from that in LHS cells. To evaluate this possibility, we subjected genes differentially expressed in cluster 9 to Ingenuity Pathway Analysis. Among the various pathways enriched in this cluster, intersection of EGF signaling with ER signaling emerged as one of the top signaling networks (Fig. 3d). EGF has previously been shown to alter the ER cistrome and induce gene expression patterns found in antiestrogen resistant cells 19. Thus, in cluster 9, ER and EGF signaling cross talk may be dominant. To further confirm this possibility, we next determined how many of the genes differentially expressed in cluster 9 are known ER-regulated genes in the breast cancer cell line MCF-7 20. We determined overlap between cluster 9 enriched genes and our previously described ERa:E2 regulated genes in MCF-7 cells 20. Among 671 genes enriched in Cluster 9, 264 genes are ERa:E2 regulated genes (Table S13). Therefore, it is possible that E2 and growth factors control ERa activity in LHS and LASP cells, respectively.
Chromatin accessibility of ESR1 gene extends to LASP cells: To determine whether cell type specific expression of ESR1 correlates with chromatin accessibility, we mapped chromatin accessible regions of ESR1 in various epithelial cell types and among donors of different genetic ancestry. Accessibility map included 10 KB regions upstream of transcription start site to cover promoter regions and 10 KB downstream of transcription end site to cover potential 3’ enhancer regions. Although ESR1 expression was limited to LHS cells among most donors except Indigenous Americans, the ESR1 gene displayed similar chromatin accessible regions in both LHS and LASP cells (Fig. 4a) except for one peak (peak 1) being more prominent in LHS cells. Since ESR1 is transcribed from multiple promoters 21, it is difficult to assign which among the open chromatin peaks correspond to promoter and enhancers. Since coding region of one of the ESR1 isoforms (NM_000125) starts at chr. 6:151,807,682, peaks 2 and 3 can be considered open chromatin regions of enhancer-promoters. Peak 4 may correspond to promoter region for isoforms such as ERaΔ36, which is transcribed from an intronic promoter 22. There was genetic ancestry and BRCA2 mutation dependent variability in chromatin accessible peaks 1 and 4 (Fig. 4b). For example, peak 4 is more prominent in BRCA2 mutation carrier but close to being absent in Indigenous Americans. ESR1 binding sites are present in chromatin accessible regions of LHS and LASP cells but at lower levels in BM cells, although differences are less dramatic (Fig. 4c). With respect to FOXA1, promoter region was more accessible in LHS cells compared to other cell types (Fig. 4d). Furthermore, chromatin accessible regions of LHS cells were enriched for binding sites for FOXA1 compared to other cell types (Fig. 4e).
We also examined the chromatin accessibility status of LASP/alveolar markers EHF, ELF5, and KIT. Distinct chromatin accessibility patterns for EHF and ELF5 were observed in three major epithelial cell types of the breast with prominent chromatin accessible promoter region in LASP cells followed by LHS cells (Fig. 4f). However, promoter regions of KIT was similarly accessible in all three cell types. BM cells had least open chromatin regions for these two genes. Therefore, EHF and ELF5 are more reliable markers of LASP cells. Chromatin accessibility map of BM cell markers TP63 and KRT14 showed cell type-dependent variability with BM cells showing distinct chromatin accessible promoter region (Fig. 4g). Multiple isoforms of TP63 are expressed using different promoters 23 and it is difficult to ascertain which among the chromatin accessible regions contain cell-type specific active promoter/enhancer regions of TP63. Nonetheless, these results clearly show that TP63 and KRT14 are bona fide markers of BM cells and changes in their expression in disease conditions require chromatin reorganization.
Expression patterns and binding site enrichment analysis showed that binding sites for EHF and ELF5 are enriched in the AP subpopulation of LASP cells (Fig. 4h). TP63 expression and binding site enrichment were restricted to BM cells, whereas GATA3 expression and binding site enrichment did not show cell type specificity (Fig. 4h). Collectively, only select genes expressed in epithelial cell types of the breast show cell type-specific chromatin accessibility patterns and their cell type-specific expression is influenced likely by extracellular signals rather than at the level of cell type specific chromatin accessibility.
Incompatibility in gene expression and chromatin accessibility extends to other genes associated with cell identity. Several in vitro studies have shown that the transcription regulator ZEB1 mediates epithelial to mesenchymal transition of breast epithelial and breast cancer cells and that its promoter region remains in poised state to confer plasticity to breast cancer cells 24–26. However, our previous two studies examining ZEB1 protein levels in normal breast tissues demonstrated the presence of ZEB1+ cells in stroma that surrounds duct with no expression in epithelial cells 13,27. We examined our multiome dataset for ZEB1 expression and chromatin accessibility. Interestingly, ZEB1 mRNA expression was observed in fibroblast-like cells, endothelial cells, adipocytes, and T cells but not in epithelial cells (Fig. 5a). However, ZEB1 regulatory regions showed similar patterns of chromatin accessibility in fibroblasts and all three epithelial cell types (Fig. 5b).
We extended the above observations to other genes that are used as markers to define various endothelial and T cell subtypes. For example, CD4+ T cell enriched IL7R and CD8+ T cell enriched IFNg showed T cell-specific expression and chromatin accessibility (Fig. 5c). Similarly, expression and chromatin accessibility of CD8 + T cell enriched GZMK were restricted to T cells (Fig. 5d). Chromatin accessibility of macrophage enriched FCGR3A was restricted to macrophages (Fig. 5e).
Among the endothelial cells, lymphatic endothelial cell marker LYVE1 is expressed in Endo-2 subcluster and a macrophage subpopulation but the chromatin accessibility patterns in these two cell types were not similar (Fig. 5f). Similarly, Endothelial Stalk-like subtype marker ACKR1 is expressed in a subset of Endo-1 subcluster but the chromatin in the regulatory regions of this gene is accessible in all cell types (Fig. 5g). CXCL12, which is expressed in Endo-1 and in fibroblasts, showed similar chromatin accessible patterns in all cell types except T cells (Fig. 5h). These results collectively demonstrate lack of compatibility in chromatin accessibility and expression of corresponding genes in the majority of cases.
Breast tissues of women of African ancestry show fibroblast and epithelial cell states distinct from breast tissues of women of European ancestry
As noted above, our multiple attempts to perform integrated snRNA-seq and snATAC-seq of breast tissues of women of African ancestry were not successful. Since breast cancer outcome disparity is known among African American women and our prior studies have suggested a biologic basis of disparity, particularly related to composition of stromal cells 3,13,27, we made multiple attempts through different protocols but were finally successful in obtaining reliable data only when nuclei were subjected to snRNA-seq alone. To allow appropriate comparison, we subjected new breast tissues of women of European ancestry to a similar protocol and performed snRNA-seq. Cell clustering showed similarity to clustering patterns obtained with integrated snATAC-seq and snRNA-seq (Fig. 6a). However, there were major differences in epithelial and fibroblast cell states between breast tissues of women of African and European ancestry. For example, LASP cell clusters in African ancestry are dominantly populated by BL cell state, whereas in case of European ancestry, there were similar numbers of BL and AP cells. Similar to data presented in Fig. 3a, ESR1 and FOXA1 expression were restricted to LSH cells in both groups (Fig. 6b). Unlike in the breast tissues of Indigenous Americans, AP cells did not express higher levels of ESR1 compared to other LASP cells. Average age of donors of African ancestry and European ancestry in this snRNA-seq was 54 and 34, respectively. Therefore, we compared cell type status of women of African ancestry with that of the Ashkenazi-Jewish European group with similar age (average 53). Differences between African and European ancestry persisted when cell state of African Ancestry was compared to cell state of Ashkenazi-Jewish European group as shown in Fig. 2a. The differences between cell clusters of women of African and European ancestry are not due to differences in proliferation rate as there was only a minor difference in MKI67 + cells between two groups (Fig. 6c) and MKI67 positivity was similar between African and Ashkenazi Jewish-European groups.
We have recently reported a unique type of stromal cells with multi-potent activity enriched in breast tissues of women of African ancestry 13. These cells are PROCR+/ZEB1+/PDGFRa+ (hence called PZP cells) and display mesenchymal stem and fibroblast-like features. Breast epithelial cells co-cultured with PZP cells transdifferentiated into BL state 13, which may explain the elevation of BL-like LASP cells in the breast tissues of women of African ancestry. Because fibroblasts in breast tissues of women of African ancestry clustered differently from that of European ancestry, we analyzed the expression patterns of PROCR, ZEB1 and PDGFRa. A subpopulation of cells in the fibroblast clusters were positive for all three markers and, number of PROCR+/ZEB1+/PDGFRa + cells were higher in fibroblast cluster of women of African ancestry compared to women of European ancestry (Fig. 6d, indicated by a circle in Fig. 6a), consistent with our prior study showing protein level differences 13.
Fibroblasts in breast have recently been shown to exist in four different states based on expression pattern of specific genes- Fibro-prematrix, Fibro-SFRP4, Fibro-major and Fibro-matrix 7. Using the gene expression signatures that specify these cell states, we analyzed fibroblasts of breast tissues from different genetic ancestry donors. As shown in Fig. 6e, there were genetic ancestry-dependent variabilities in fibroblast state with women of African ancestry showing specific enrichment of genes that specify Fibro-prematrix state at the expense of Fibro-matrix state (Fig. 6e and 6f). Figure S2 provides additional differences in fibroblasts between African and European ancestry and gene expression differences between fibroblasts of African ancestry and European ancestry are shown in Table S14. It is interesting that the expression of several of the ATP binding cassette subfamily members is elevated in fibroblasts of African ancestry compared to European ancestry (Figure S2). We also note that none of the markers corresponding to other fibroblast clusters are expressed at high levels. (Figure S2). When fibroblast populations of other ancestry groups and BRCA1/2 mutation carriers were compared, there was considerable variation in fibroblast cell state across genetic ancestry and BRCA1/2 mutation status suggesting important contribution of fibroblast diversity in human health (Fig. 6g). For example, consistent with the recent report on the effects of BRCA1 mutation on stromal cells 28, fibroblasts in BRCA1/2 mutation carriers are enriched in cluster 6.
Previous studies have demonstrated that breast tumors in African Americans are enriched for exhausted T cells 29. To determine whether healthy breast tissues of women of African ancestry intrinsically contain higher levels of exhausted T cells, we determined the expression patterns of CD274 (PD-L1), CTLA-4, LAG3, and PDCD1, all markers of exhausted T cells, in breast tissues from women of African and European ancestry 29. No significant differences were noted (Figure S3) between tissues of African and European ancestry. Therefore, accumulation of exhausted T cells in tumors of African American women is likely induced by the tumor microenvironment.
Murrow et al recently reported hormone-regulated cell-cell interaction network in the human breast at single cell resolution and identified various markers that can distinguish hormone responsive cells into two states; LRRC26 marks cells with both ER and progesterone receptor activity (HR + State 1), whereas P4HA1+ cells to represent hormone responsive cells with elevated hypoxia/Pro-angiogenic activity (HR + State 2) 9. Although LRRC26 expression was seen in a subpopulation of LHS cells, a subset of LASP cells also expressed this gene (Figure S4). Every cell type of the breast expressed P4HA1, suggesting lack of specificity of this gene as a marker of hormone responsive cells. We extended this analysis to include other genes suggested to be enriched in HR + State 1 (PNMT, CXCL13, MYBPC1, CADP52, WNT4, PPP1R1B, and IL20RA) and HR + State 2 (NDRG1, HILPDA, ANGPTL4, EGLN3, ERO1A, PLOD2, and ENO2). Only a few of these genes (CXCL13, MYBPC1, PNMT and WNT4) are expressed at higher levels in ESR1-positive cells (Figure S4). Additional studies are needed to further characterize distinct hormone responsive cell states.
Individual level characterization of age-dependent changes in breast epithelial cell gene expression. Several donors to our tissue bank have donated breast tissues twice, 10 years apart (described as timepoint 1 and timepoint 2) allowing us to determine gene expression differences with age. Prior studies focused on identifying age-dependent differences in gene expression utilized tissues from independent donors of different ages 8. Because of limited tissue samples, we performed spatial transcriptomics with samples from five donors who donated breast tissues twice. However, consistent results covering all regions of interest (ROI) were obtained with three pairs of samples. Age and BMI status of three donors at both times of tissue collection, UMAP of transcriptome data, and immunostaining of breast tissues to differentiate epithelial cells and adipocytes are shown in Figure S5a and b). Micro-dissected ducts and lobules as well as adipocytes are shown in Fig. 7a. We first evaluated resulting spatial transcriptomics data for adipocyte, endothelial and epithelial cell subtype gene signatures derived from multiome data (Tables S4-S9). These initial analyses showed that data from timepoint 2 cluster into two groups, but are generally more abundant in adipocyte-2 (Adi-2), macrophages and Endo-2 cell types compared to timepoint 1 (Figure S5c). Adi-1 and Adi-2 differ with respect to adiponectin expression with Adi-1 expressing 30-fold higher adiponectin than Adi-2 (Table S9). Endo-2 is enriched for lymphatic endothelial cell markers. Spatial transcriptomics allowed us to extend the studies to find gene expression differences between lobular and ductal epithelial cells. Volcano plot in Fig. 7b shows differences in gene expression between ductal and lobular epithelial cells. Pathway analysis of genes differentially expressed in ductal and lobular epithelial cells and at two different time points are shown in Figure S6. While metabolic pathways are enriched in epithelial cells of the lobules, ductal epithelial cells showed enrichment of extracellular signal activated signaling networks such as cytokine and cancer-associated signaling networks. Age-dependent changes were also restricted to metabolic pathways in lobular epithelial cells, whereas changes in splicing machinery were observed in ductal epithelial cells. Genes differentially expressed in lobular epithelial cells compared to ductal epithelial cells (Table S15), differences at timepoint 1 (Table S16), at timepoint 2 (Table S17); within ducts between timepoint 1 versus 2 (Table S18), within lobules between timepoint 1 versus 2 (Table S19) are provided as supplementary information. Volcano plots shown in Fig. 7c-7f further highlight key genes differentially expressed in ductal and lobular epithelial cells under different conditions. Four-hundred twenty six genes are differentially expressed in ductal versus lobular epithelial cells. A recent study using a different spatial transcriptomic technique listed 30 genes that are expressed differentially in ductal and lobular epithelial cells 7. Among those 30 genes, 10 genes showed similar expression patterns (MGP, ANXA1, TACSTD2, KRT14, KRT17, WFDC2, STAC2 and ALDH1A3 are elevated in ductal epithelial cells whereas APOD and SNORC are elevated in lobular epithelial cells). Expression pattern of these genes assessed through multiome data is shown in Figure S7. Since 10 genes showed similar expression patterns in lobular and ductal epithelial cells in two independent studies, these genes can be considered as markers of ductal and lobular epithelial subtypes. However, expression of only few of these genes is enriched in epithelial cells. We observed upregulation of KRT14 and KRT17, BM cell enriched genes, in ductal compared to lobular epithelial cells, consistent with the previous report 7. Immunoglobulin heavy constant alpha 1 (IGHA1) and Immunoglobulin Kappa Constant (IGKC) gene, both of which are expressed in the breast as per GTEx database, are expressed at higher levels in lobular compared to ductal epithelial cells. Several genes showed differences in expression in two different timepoints but only PTBP1 showed consistent decline (~ six folds) in expression in timepoint 2 compared to timepoint 1 in both ductal and lobular epithelial cells. PTBP1 is associated with mRNA processing and alternative splicing 30 and, based on UALCAN database 31, PTPB1 is overexpressed in all subtypes of breast cancer (Figure S8). We determined genes commonly upregulated or down-regulated at timepoint-2 compared to timepoint-1 in both lobular and ductal epithelial cells. IPA analyses of the genes showed downregulation of genes associated with PKA signaling pathway but upregulation of EIF2 and oxidative phosphorylation pathways in timepoint 2 compared to timepoint 1 (Fig. 8 and Table S20).