Identification of super-enhancers
To analyze the characteristics of mammalian super-enhancers (SEs), we identify enhancers in six cells/tissues, including ESCs/iPS cells, liver, stomach, colon, ileum and small intestine, across human, mouse and pig using the H3K27ac histone marker (Figure 1). When pooling all cells/tissues within each species, we identify a median of 848 SEs across five human cells/tissues (without ileum), a median of 888 SEs across five pig cells/tissues (without small intestine), and a median of 503 SEs across six mouse cells/tissues (Figure 2a). In general, hundreds of SEs and thousands of typical enhancers (TEs) are identified in individual cells/tissues across mammals (Figure 2f, Figure S1-S4). For example, in liver, we identify 912 SEs and 17852 TEs for human, 1261 SEs and 17220 TEs for pig, and 503 SEs and 11472 TEs for mouse (Figure 2f).
In addition, the average H3K27ac signal is higher for SEs than for TEs. When pooling all cells/tissues within each species, we detect a median of 13013 rpm/bp for human SEs, 8596 rpm/bp for pig SEs and 5492 rpm/bp for mouse SEs, compared with a median of 1117 rpm/bp for human TEs, 1124 rpm/bp for pig TEs and 570 rpm/bp for mouse TEs (Figure 2b). For individual cells/tissues across mammals, saturation curves of H3K27ac density also show higher signal at SEs than at TEs (Figure 2d, Figure S1-S4). Likewise, the average H3K27ac signal density around SE centers is higher than that around TE centers (Figure 2f, Figure S1-S4).
Furthermore, SEs span larger genomic regions than TEs. When pooling all cells/tissues within each species, we detect a median size of 33289 bp for human SEs, 27558 bp for pig SEs and 19523 bp for mouse SEs, compared with a median size of 983 bp for human TEs, 2332 bp for pig TEs and 1117 bp for mouse TEs (Figure 2c). For individual cells/tissues across mammals, the median length of SEs is also greater than that of TEs. For example, in liver, we detect a median size of 19090 bp for human SEs, 15654 bp for pig SEs and 8532 bp for mouse SEs, compared with a median size of 1285 bp for human TEs, 1490 bp for pig TEs and 953 bp for mouse TEs (Figure 2f). SE identification results for the remaining cells/tissues are shown in Figure S1-S4; they show SE and TE characteristics similar to those of liver across species.
Overall, in each cell/tissue of the three species, SEs differ from TEs in their larger genomic size, higher H3K27ac signal and lower numbers, which is consistent with the characterization of SEs in previous studies [7, 8].
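The separation of SEs from TEs described above can be illustrated with a short sketch. The text does not state which procedure was used, so the rank-based cutoff below is an assumption: it follows the widely used idea of ranking enhancer regions by H3K27ac signal and taking as SEs the regions above the point where the scaled rank-signal curve has a tangent of slope 1. The function name `split_super_enhancers` and the input values are hypothetical.

```python
def split_super_enhancers(signals):
    """Return the indices (in input order) of regions called as SEs.

    Assumption: regions are ranked by H3K27ac signal, rank and signal are
    both scaled to [0, 1], and the cutoff is the ranked point where a line
    of slope 1 is tangent to the curve, i.e. where (scaled signal - scaled
    rank) is smallest. Regions ranked above that point are called SEs.
    """
    n = len(signals)
    if n < 2 or max(signals) <= 0:
        return []
    # Indices of regions sorted from lowest to highest signal.
    order = sorted(range(n), key=lambda i: signals[i])
    top = signals[order[-1]]
    # Vertical gap between the scaled curve and the diagonal at each rank.
    gaps = [signals[idx] / top - rank / (n - 1) for rank, idx in enumerate(order)]
    tangent_rank = min(range(n), key=lambda r: gaps[r])
    return sorted(order[tangent_rank + 1:])


# Toy signals (arbitrary units, not data from this study).
toy_signals = [5, 100, 1, 50, 3, 8, 2, 7, 4, 6]
se_indices = split_super_enhancers(toy_signals)
```

With such a cutoff, a small number of regions with very high signal are called SEs and the remaining regions are treated as TEs, which matches the pattern of hundreds of SEs against thousands of TEs reported here.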
Genomic distribution of super-enhancers
To reveal the genomic features of SEs across cells/tissues of the three mammals, we plot the distribution of genomic annotations of SEs. A majority of SEs are located in promoter regions in human and mouse, whereas a majority of pig SEs are located in distal intergenic regions. When pooling all cells/tissues within each species, we detect a median of 91.9% of human SEs, 63.4% of mouse SEs and 29.8% of pig SEs located in promoter regions (± 3 kb of the TSS), while a median of 1.3% of human SEs, 11.8% of mouse SEs and 66.1% of pig SEs overlap distal intergenic regions (Figure 3a). For TEs, we detect a median of 33.6% for human, 25% for mouse and 3.1% for pig located in promoter regions, while 20.1% for human, 31.2% for mouse and 91.1% for pig overlap distal intergenic regions (Figure 3a). Thus, for each species, a higher percentage of SEs than TEs are located in promoters, while a higher percentage of TEs than SEs overlap distal intergenic regions (Figure 3a).
Furthermore, we plot the density of SEs as a function of their distance to TSSs for individual cells/tissues across species. In general, a large fraction of pig SEs are located far from TSSs, at 20-30 kb, while most human and mouse SEs are located within 10 kb of TSSs (Figure 3b, Figure S5-S8). In addition, the distribution of genomic annotations for individual cells/tissues shows that pig SEs are mostly located in distal intergenic regions, in contrast to human and mouse SEs. For example, in liver, we detect 91.9% of human SEs, 31.6% of mouse SEs and 31.2% of pig SEs located in promoters, while 1.1% of human SEs, 17.5% of mouse SEs and 64.7% of pig SEs are located in distal intergenic regions. SEs in the remaining cells/tissues show a similar pattern (Figure 3c and Figure S4-S8), except for mouse colon SEs.
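The promoter assignment and distance-to-TSS density can be sketched as follows. This is a simplified illustration, not the annotation pipeline used in this study: it assumes a single chromosome, a non-empty list of TSS positions, and that a region counts as promoter-located when its center lies within 3 kb of the nearest TSS. The helper names and coordinates are hypothetical.

```python
from bisect import bisect_left

PROMOTER_WINDOW = 3000  # +/- 3 kb of a TSS, as defined in the text


def distance_to_nearest_tss(center, sorted_tss):
    """Absolute distance (bp) from a region center to the nearest TSS.

    Assumes `sorted_tss` is a non-empty, ascending list of TSS positions
    on the same chromosome as the region.
    """
    i = bisect_left(sorted_tss, center)
    # Only the TSS just before and just after the center can be nearest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(center - tss) for tss in candidates)


def promoter_fraction(regions, tss_positions):
    """Fraction of (start, end) regions whose center is within 3 kb of a TSS."""
    sorted_tss = sorted(tss_positions)
    hits = sum(
        distance_to_nearest_tss((start + end) // 2, sorted_tss) <= PROMOTER_WINDOW
        for start, end in regions
    )
    return hits / len(regions)


# Toy coordinates (not data from this study).
toy_tss = [10000, 50000]
toy_regions = [(9000, 12000), (30000, 34000), (52000, 53000), (60000, 70000)]
fraction = promoter_fraction(toy_regions, toy_tss)
```

Collecting the per-region distances instead of the fraction gives the values needed for a density plot of the kind shown in Figure 3b.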
Super-enhancer specificity at different stages of tissues within species
To reveal SE specificity at different stages of tissues within species, we investigate the intersection of SEs in three human stomach stages, three human colon stages, three mouse stomach stages and three mouse small intestine stages. When pooling all these samples within each species, we detect a median of 50.7% of SEs and 35.7% of TEs unique for human, and a median of 50.4% of SEs and 43.7% of TEs unique for mouse (Figure 4a, Figure 12S). To further compare the overlap, we also analyze the proportions of SEs and TEs shared among different stages within species. In general, the proportion of shared SEs is lower than that of shared TEs. For example, across three human stomach stages, 6.4% of SEs and 11.76% of TEs are shared (Figure 4b). This pattern is also found in three mouse stomach stages (Figure 2c), three human colon stages and three mouse small intestine stages (Figure S9). To better illustrate the overlap of SEs among different stages of a given cell/tissue within species, we show several examples of shared SEs in IGV (Figure 4d, Figure S9).
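The unique and shared percentages can be computed from interval overlaps, as in the minimal sketch below. It rests on assumptions not stated in the text: two regions are considered overlapping if they share at least 1 bp on the same chromosome, a region is unique if it overlaps nothing in any other stage, it is shared if it overlaps a region in every other stage, and the denominator is the number of regions in the reference stage. All names and coordinates are hypothetical.

```python
def overlaps(region, others):
    """True if `region` shares at least 1 bp with any region in `others`.

    Regions are (chromosome, start, end) tuples with half-open coordinates.
    """
    chrom, start, end = region
    return any(c == chrom and s < end and start < e for c, s, e in others)


def unique_and_shared(reference, *other_sets):
    """Percentages of reference regions that are unique or shared.

    unique: overlaps no region in any other set
    shared: overlaps at least one region in every other set
    """
    unique = sum(not any(overlaps(r, o) for o in other_sets) for r in reference)
    shared = sum(all(overlaps(r, o) for o in other_sets) for r in reference)
    n = len(reference)
    return 100 * unique / n, 100 * shared / n


# Toy regions for three stages (not data from this study).
stage1 = [("chr1", 100, 200), ("chr1", 500, 900), ("chr2", 100, 300), ("chr3", 0, 50)]
stage2 = [("chr1", 150, 250), ("chr2", 250, 400)]
stage3 = [("chr1", 180, 220), ("chr1", 800, 1000)]
unique_pct, shared_pct = unique_and_shared(stage1, stage2, stage3)
```

The pairwise scan is quadratic in the number of regions, which is acceptable for hundreds of SEs; for thousands of TEs a sorted or interval-tree based lookup would be a more practical choice.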
Similar to the overlap analysis of SEs, we also assess the intersection of SE-associated genes. When pooling all tissue samples within each species, we detect a median of 19.6% of SE-associated and 8.7% of TE-associated genes unique for human, and a median of 26.6% of SE-associated and 19.7% of TE-associated genes unique for mouse (Figure 4a, Figure 11). For different stages within individual cells/tissues, SE-associated genes show lower overlap rates than TE-associated genes (Figure 4c, S9). For example, across three human stomach stages, 17% of SE-associated genes and 23.7% of TE-associated genes are shared (Figure 4c). Similar patterns of SE- and TE-associated genes are detected for three human colon stages, three mouse stomach stages and three mouse small intestine stages (Figure 4c and Figure S9).
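For gene-level comparisons, the same unique and shared percentages reduce to set operations on gene symbols. The sketch below is illustrative only and assumes definitions the text does not spell out: the shared percentage is the number of genes present in every stage divided by the number of genes in the union, and the unique percentage for a stage is the fraction of its genes found in no other stage. The gene names are placeholders.

```python
def gene_sharing(gene_sets):
    """Summarize sharing across named gene sets.

    gene_sets: dict mapping a stage/tissue name to a set of gene symbols.
    Returns (shared_pct, unique_pct) where shared_pct is the percentage of
    the union present in every set and unique_pct maps each name to the
    percentage of its genes absent from all other sets.
    """
    union = set().union(*gene_sets.values())
    common = set.intersection(*gene_sets.values())
    shared_pct = 100 * len(common) / len(union)
    unique_pct = {}
    for name, genes in gene_sets.items():
        others = set().union(*(g for k, g in gene_sets.items() if k != name))
        unique_pct[name] = 100 * len(genes - others) / len(genes)
    return shared_pct, unique_pct


# Placeholder gene symbols (not results from this study).
toy_stages = {
    "stage_a": {"G1", "G2", "G3", "G4"},
    "stage_b": {"G1", "G2"},
    "stage_c": {"G1", "G5"},
}
shared, unique = gene_sharing(toy_stages)
```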
Super-enhancer specificity across different cells/tissues within species
Next, to reveal SE specificity across different cells/tissues within species, we estimate the overlap of SEs among three selected tissues (liver, stomach and colon) in each of the three species. When pooling all tissues within each species, we detect a median of 73.5% of SEs and 64% of TEs unique for human, 79.5% of SEs and 45.3% of TEs unique for pig, and 88.4% of SEs and 30.8% of TEs unique for mouse (Figure 5a, Figure S13). In addition, we determine the proportions of SEs and TEs shared across different cells/tissues within species. In general, the proportion of shared SEs is lower than that of shared TEs. For example, in liver, 1.6% of SEs and 3.5% of TEs are shared for human, 1.8% of SEs and 33.3% of TEs for pig, and 0.2% of SEs and 1.4% of TEs for mouse (Figure 5b,c,d). Examples of overlap among different cells/tissues of human, pig and mouse are given in Figure 5e.
Similarly, we investigate the intersection of SE-associated genes across different cells/tissues within species. We detect a median of 43.7% of SE-associated genes and 21.4% of TE-associated genes unique for human, 32.8% of SE-associated genes and 3.7% of TE-associated genes unique for pig, and 62.1% of SE-associated genes and 30.8% of TE-associated genes unique for mouse (Figure 5a, Figure 13). For the shared fraction, we detect 16.8% of SE-associated genes and 16.0% of TE-associated genes shared for human, 10.8% of SE-associated genes and 28.4% of TE-associated genes shared for pig, and 2.9% of SE-associated genes and 5.2% of TE-associated genes shared for mouse (Figure 5b,c,d). Details for individual tissues are given in Figure 5e.
Super-enhancer specificity across species
Furthermore, to reveal SE specificity across species, we investigate the overlap of orthologous SEs for the same cells/tissues across species. The selected cells/tissues include liver, stomach, colon and iPS cells across the three species. When pooling the cells/tissues across species, we detect a median of 91.6% of orthologous SEs and 85.9% of orthologous TEs unique for human, 93.2% of orthologous SEs and 87.2% of orthologous TEs unique for pig, and 92.1% of orthologous SEs and 81.3% of orthologous TEs unique for mouse (Figure 6b). The overlap of orthologous SEs across species is low. For example, in liver, 0.1% of orthologous SEs and 1.3% of orthologous TEs are shared across the three species. A similar pattern is found in the other individual cells/tissues (Figure S10-S13). Examples of overlapping orthologous SEs are given in Figure 6d and Figure S10-S13.
The overlap analysis is also performed for orthologous SE-associated genes across species. When pooling all cells/tissues together, we detect a median of 79.1% of orthologous SE-associated genes and 64.8% of orthologous TE-associated genes unique for human, 83.9% of SE-associated genes and 49.8% of TE-associated genes unique for pig, and 63.3% of SE-associated genes and 34.4% of TE-associated genes unique for mouse (Figure 6a). For liver, 1.2% of orthologous SE-associated genes and 3.5% of orthologous TE-associated genes are shared across species (Figure 6b). Similar patterns are found in the other individual cells/tissues (Figure S10-S13).
Function comparison across species
To reveal the function of SEs, we perform GO analysis for six cells/tissues across the three species. First, we show five selected terms from the top 15 enriched terms for liver across species (Figure 7). Among these, terms such as response to hormone/peptide hormone are shared among the three species, terms such as response to insulin and actin filament organization are shared between human and mouse, and terms such as response to oxidative stress and oxoacid metabolic process are shared between human and pig. The shared biological terms are mostly consistent with liver function. Previous studies have shown that the liver plays a central role in glucose and lipid metabolism, oxygen-rich blood supply and oxidative metabolism [19, 20]. In addition, scRNA-seq of human liver cells has revealed remarkably conserved features of liver zonation between mouse and human, i.e. the spatial separation of the immense spectrum of different metabolic pathways along the liver sinusoids [21, 22]. Furthermore, we use CRC mapper to identify the core transcription factors in each species (Figure 8). For liver, six common core transcription factors are detected across the three species: SREBF1, FOXO3, BCL6, NR5A2, IRF1 and KLF15 (Figure 8). SREBF1 encodes a transcription factor that binds sterol regulatory element-1 (SRE1), a decamer flanking the low density lipoprotein receptor gene and some genes involved in sterol biosynthesis. Krüppel-like factor 15 (KLF15) is a transcription factor involved in various biological processes, including cellular proliferation, differentiation and death.
Second, we show five selected terms from the top 15 enriched terms for iPS cells/ESCs across species (Figure 7). Using the same parameters as for mouse and human SEs, no significantly enriched terms are found for pig iPS cell SEs. Thus, we compare SE function between human and mouse only. Terms such as stem cell population maintenance, maintenance of cell number, and regulation of cell fate specification/cell fate commitment are shared between the two species (Figure 7). These biological processes are related to the identity of iPS cells/ESCs, which are pluripotent and capable of self-renewal [7, 8]. Furthermore, one common transcription factor, SOX2, is shared across the three species for iPS cells (Figure 8). SOX2 is a key transcription factor that is essential for maintaining the pluripotent embryonic stem cell phenotype [7, 8].
Third, we compare SE function in tissues of the digestive system, namely stomach, colon, small intestine and ileum. For stomach, terms such as muscle cell differentiation are shared between human and mouse, while the remaining terms are unique to each species. For example, terms such as cellular response to peptide hormone stimulus, cellular response to peptide and cellular response to insulin stimulus are detected in human. The unique terms in mouse include striated muscle tissue development, muscle tissue development and epithelial cell proliferation. The unique terms in pig include response to oxygen-containing compound, positive regulation of immune system process and cellular protein metabolic process. These terms are related to stomach function. A previous study has indicated that the stomach is a muscular sac that provides a conducive environment for breaking down and chemically modifying food and passing it to the next stage of digestion [23]. For stomach, eight common core transcription factors are detected: FOS, SREBF1, FOSL2, TGIF1, IRF1, NR2F2, JUN and TEAD1. FOS proteins have been implicated as regulators of cell proliferation, differentiation and transformation. In some cases, expression of the FOS gene has also been associated with apoptotic cell death. The protein encoded by TGIF1 is a member of the three-amino acid loop extension (TALE) superclass of atypical homeodomains.
The other digestive tissues, colon, small intestine and ileum, show a pattern similar to that of stomach. For example, for colon, terms such as response to steroid hormone and myeloid cell differentiation are shared between human and mouse, as are terms such as actin filament organization and actin filament bundle organization. Unshared terms include response to oxygen-containing compound, muscle adaptation and cell junction assembly. For small intestine, actin filament organization, actin filament bundle organization, actin filament bundle assembly and cell junction assembly are shared between human and mouse. For mouse ileum, enriched terms include regulation of epithelial cell differentiation, cellular response to organic cyclic compound and negative regulation of focal adhesion assembly. For pig ileum, enriched terms include response to oxygen-containing compound, response to oxidative stress, response to hormone and positive regulation of immune system process. These tissues are complex mixtures of cell types (Figure S14). For the key transcription factors, we show colon across the three species. For colon, nine transcription factors are detected: FOS, SREBF1, FOS2, TGIF1, IRF1, JUN, TEAD1, HES2 and NR2F2 (Figure 8). NR2F2 is a ligand-activated transcription factor that is activated by high concentrations of 9-cis-retinoic acid and all-trans-retinoic acid, but not by dexamethasone, cortisol or progesterone (in vitro).
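The GO term enrichment underlying these comparisons is commonly assessed with a hypergeometric (one-sided Fisher) test. The text does not specify the tool or statistic used, so the sketch below is an assumption offered only to make the idea concrete: given a background of annotated genes, it computes the probability of seeing at least the observed number of SE-associated genes in a term by chance. The function name and the toy counts are hypothetical, and multiple-testing correction is not included.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_hits):
    """One-sided hypergeometric p-value, P(X >= n_hits).

    n_background: number of genes in the background set
    n_term:       background genes annotated with the GO term
    n_selected:   SE-associated genes tested
    n_hits:       SE-associated genes annotated with the GO term
    """
    total = comb(n_background, n_selected)
    upper = min(n_term, n_selected)
    # Sum the probability mass for n_hits, n_hits + 1, ..., upper.
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


# Toy counts (not results from this study).
p_value = hypergeom_enrichment_p(n_background=10, n_term=3, n_selected=2, n_hits=1)
```

In practice the p-values for all tested terms would be adjusted for multiple testing before ranking the top terms per species and comparing them across species.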