Genome-wide DNA methylation profiling of breast IPs and normal mammary gland tissues of tree shrews
To investigate methylation patterns during tree shrew IP development, we analysed genome-wide DNA methylation (DNAm) levels in the IP group and normal mammary gland tissue by whole-genome bisulphate sequencing (WGBS) with > 99% conversion efficiency. The genome of the normal mammary gland (Control) presented ~ 4.39% methylated cytosines (mCs), and the IP sample presented ~ 4.41% mCs among the total sequenced C sites, reflecting the degree of genome methylation. The methylation levels of CG, CHH, and CHG (where H is A, C, or T) sites were distinct. We detected genome-wide mC levels of 88.08 ± 1.76% for CG, 2.52 ± 0.32% for CHG, and 9.40 ± 1.45% for CHH in the Control group and 90.19 ± 0.54% for CG, 2.07 ± 0.41% for CHG, and 7.74 ± 0.51% for CHH in the IP group. Thus, the proportions of these contexts were analogous between groups (Fig. 2A).
Epigenetic variation among DNA sequences, particularly CpG DNA methylation, is an important type of variation that modulates gene expression under physiological and pathological conditions. Our results showed that CG methylation levels were higher while CHG and CHH methylation levels were lower and that the number of mCG sites was increased in the IP group compared with the Control mammary gland tissues, while other types of sites did not differ significantly. To further compare the coding gene distribution as well as the methylation levels of diverse functional genomic elements between the two groups, we analysed the DNA methylation pattern in three distinct regions of transcriptional elements: the upstream 2 k region (2000 bp before the transcription start site (TSS) of the gene), the gene body region, and the downstream 2 k region (2000 bp after the gene termination site). The distribution characteristics of DNA methylation levels in distinct functional regions help to understand the characteristics of DNA methylation modifications in different regions at the whole-genome level.
The obtained DNA methylation profiles demonstrated that the methylation level in the CG context was higher than those in the CHG and CHH contexts. The DNA methylation level in the CG context was highest in the gene body region. DNA methylation was moderately high in the upstream 2K start site, decreased dramatically from the upstream 2 k region to the TSS, increased sharply from the TSS to the gene body region, maintained the highest level in the gene body region and then decreased slightly in the downstream 2K region (Fig. 2B).
We identified 3,486 differentially methylated CG regions, 82 CHG regions, and 361 CHH regions. In the CG context, 3,486 differentially methylated regions (DMRs), located in 701 genes, were identified between the IP and Control groups (Q < 0.05), among which 705 showed increased methylation and 2,781 showed decreased methylation in IP tissues compared with the levels in Control tissues (Additional file 1: Table S1). All DMRs were used for the comparison of differences, as shown in Fig. 3A. We analysed CG methylation sites in tree shrews in the IP and Control groups using hierarchical clustering. The results revealed separation between the two groups.
CG site methylation levels for the total methylated sites and DMGs in IP tumours (69.47%) were decreased compared with those in mammary glands (74.18%). Among the genes containing DMRs, 607 genes were located within the gene body, including 259 upregulated genes and 357 downregulated genes. A total of 58 genes were located in the upstream 2K region, whereas 55 were located in the downstream 2K region (Fig. 3B) (Additional file 1: Table S1).
To further elucidate the biological functions of DMGs, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. Based on GO term analysis, the putative target genes of KLF5 were associated with terms such as developmental process (P adjust = 0.0056) and single-organism developmental process (P adjust = 0.0103) in the biological process (BP) category; binding (P = 0.0063) in the molecular function (MF) category; and membrane (P = 0.0045) in the cellular component (CC) category (Fig. 3C) (Additional file 1: Table S1). KEGG pathway analysis indicated that 15 DMGs were associated with the oxytocin signalling pathway (Q = 0.0027), while 10 DMGs were associated with the oestrogen signalling pathway (Q = 0.0113) (Fig. 3D) (Additional file 1: Table S1). Taken together, the gene annotation results revealed that the majority of differentially genes showing increased methylation and decreased methylation were mapped to the gene body region, suggesting that DMGs play crucial roles in IP.
Differentially expressed genes (DEGs) between the IP group and normal mammary gland tissue
To systematically describe the transcriptome landscape of the IP group and normal mammary gland tissues, whole-transcriptome sequencing was performed. After removing low-quality reads from each library, the clean reads were combined and aligned with the Tupaia chinensis genome (TupChi_1.0), resulting in the identification of 10,051 known mRNAs, 25,481 new mRNAs, 1,022 known lncRNAs, 1,617 new lncRNAs, and 10,602 new circRNAs in IPs (IP group) and normal mammary glands (Control group) in total. Furthermore, we compared the transcriptomic landscapes of the normal mammary gland and IP tumour tissues of tree shrews. We identified 39 upregulated DEGs and 23 downregulated DEGs in IP tumours compared with normal mammary glands (FDR < 0.05 & |log2FC|>1) (Fig. 4A). Among these DEGs, the expression of ST14, PSMF1, and TNFSF1 was more than 15-fold higher in the IP group. In normal mammary gland tissue, the levels of FAM192A and Psmc5 were 15- and 7-fold higher, respectively. More detailed information is listed in Additional file 1: Table S2.
Among these DEGs, GO analysis indicated that TNFSF11 participates in the tumour necrosis factor-mediated signalling pathway and the cellular response to tumour necrosis factor (P = 0.0042); GATA3, EPHA2, and PTPRG are involved in the regulation of epithelial cell migration (P = 0.0042); STAG2, TEX14, PPP1R9B, and TPR are involved in the regulation of cell cycle processes (P = 0.0087); and TNFSF11, and EPHA2 are involved in epithelial cell proliferation (P = 0.0091) (Additional file 1:Table S2). ACSL1 and APOA5 participate in the PPAR signalling pathway (P = 0.0367) according to KEGG analysis (Additional file 1: Table S2). Furthermore, we utilized gene set enrichment analysis (GSEA) to analyse groups of functionally relevant genes in IPs compared to normal mammary glands. A total of 107 GO terms and 66 pathways (FDR < 0.05) were indicated to be significantly enriched in dysregulated genes in IPs (Additional file 1: Table S2). The upregulation of cell cycle phase- and S phase-related terms in the biological phase category and mitotic sister chromatid segregation and sister chromatid segregation in the cellular component organization or biogenesis category was observed under the BP category in the IP group (Additional file 1: Table S2). Among the DEGs identified in the IP group, the upregulated gene sets were involved not only in replication and repair, including DNA replication, mismatch repair and nucleotide excision repair, but also in cell growth and death, including the cell cycle; the downregulated gene sets were involved in the endocrine system, including the renin secretion and PPAR signalling pathways (Additional file 1: Table S2). Overall, many of the DEGs identified in IPs were found to be involved in tumorigenesis, as demonstrated by the enrichment analysis of GO terms and pathways.
The lncRNA expression profile is distinct between IPs and normal mammary gland tissue in tree shrew
Compared with the results in normal mammary gland tissue samples, 50 differentially expressed lncRNAs (DElncRNAs), including 39 upregulated and 11 downregulated lncRNAs, were identified in the IP tissue samples (Fig. 5A). The most significantly upregulated lncRNAs were TCONS_00002926, TCONS_00077528, XR_001369927.1, TCONS_00037489, and XR_333956.1, whereas TCONS_00016941, XR_334171.1, TCONS_00135842, TCONS_00077456 and TCONS_00094673 were downregulated lncRNAs identified in the IP group (Additional file 1: Table S3). To reveal the function of the lncRNAs, we predicted the complementary correlation of antisense lncRNAs and mRNAs. All antisense target genes of the DElncRNAs were subjected to GO and KEGG pathway analysis to determine their functions. A variety of relevant GO terms in the BP category were observed (Additional file 1: Table S3), such as blood vessel endothelial cell migration (P = 3.99E-05), epithelial cell migration (P = 0.0012), epithelium migration (P = 0.0012), and negative regulation of apoptotic process (P = 0.0016). KEGG pathway analysis revealed that the Jak-STAT signalling pathway (P = 0.0087), PPAR signalling pathway (P = 0.0233), and oestrogen signalling pathway (P = 0.0292) were associated with DElncRNAs (Additional file 1: Table S3).
The second function of lncRNAs, when located less than 10 kb upstream/downstream from a gene, is to act as cis-regulators of their neighbouring genes on the same strand. A large number of enriched GO terms were observed in the BP category, including branching morphogenesis of an epithelial tube (P = 0.0001), serine phosphorylation of STAT protein (P = 0.0002), negative regulation of cell growth (P = 0.0017), and execution phase of apoptosis (P = 0.0019) (Additional file 1: Table S3). KEGG pathway analysis revealed that DElncRNAs were associated with the pathways of protein processing in the endoplasmic reticulum (P = 0.0017), glyoxylate and dicarboxylate metabolism (P = 0.0018), antigen processing and presentation (P = 0.0250), and the NOD-like receptor signalling pathway (P = 0.0483) (Additional file 1: Table S3).
The third function of lncRNAs is the trans-regulation of non-adjoining co-expressed genes. We analysed the correlation of expression between lncRNAs and mRNAs to reveal the target genes of lncRNAs and performed GO function and KEGG pathway enrichment analyses of protein-coding genes with an absolute correlation greater than 0.9 (Additional file 1: Table S3). The enriched GO functions of the DElncRNA trans-regulated co-expressed genes are listed in Table S3 and included mesenchymal stem cell differentiation (P = 0.0032), extrinsic apoptotic signalling pathway (P = 0.0070), recombinational repair (P = 0.0086), and the execution phase of apoptosis (P = 0.0125) in the BP category. The DElncRNA trans-regulated co-expressed genes were significantly enriched in KEGG pathways including homologous recombination (P = 0.0004), histidine metabolism (P = 0.0022), and the renin-angiotensin system (P = 0.0244) (Additional file 1: Table S3). These dysregulated lncRNAs might be involved in IP tumorigenesis in tree shrews by regulating genes and signalling pathways implicated in tumorigenesis.
The circRNA expression profile is distinct between IP and normal mammary gland tissues of tree shrews
As shown in Fig. 6A, we identified 32 differentially expressed circRNAs (DEcircRNAs), including 25 upregulated circRNAs and 7 downregulated circRNAs, between IPs and normal mammary glands (P < 0.05 & |log2FC| > 1). In the heat map of DEcircRNAs, IP tissues could be separated from normal mammary gland tissues by the DEcircRNAs. The most significantly upregulated circRNAs were novel_circ_004184, novel_circ_001608, novel_circ_007270, novel_circ_004893, and novel_circ_004886, while novel_circ_007552, novel_circ_007844, novel_circ_002826, novel_circ_005946 and novel_circ_009457 were identified as downregulated circRNAs in the IP group (Additional file 1: Table S4).
The gene of origin of a circRNA is its parental gene. We carried out a functional enrichment analysis of parental genes to investigate the putative functions of differentially expressed circRNAs. GO analyses revealed that the parental genes of the dysregulated circRNAs were enriched in the terms branch elongation of an epithelium (P = 0.0161), negative regulation of cell proliferation (P = 0.0173), positive regulation of epithelial cell proliferation (P = 0.0233), vasculogenesis (P = 0.0276), and gland morphogenesis (P = 0.0290) in the BP category. The PI3K-Akt signalling pathway (P = 0.0139) and Ras signalling pathway (P = 0.0509) were identified in the KEGG enrichment analysis (Fig. 6B. 6C, Additional file 1: Table S4). Taken together, these findings suggest that these DEcircRNAs that influence gene expression may affect IP tumour development.
Integration analysis of the methylome and transcriptome
To examine whether differences in DNA methylation between the IP and healthy mammary gland tissues of tree shrews could be the basis of the observed gene expression differences, we analysed the correlation between the gene expression and DNA methylation data in the IP group against that in the normal mammary gland group, and the results indicated a considerable regulatory effect of DNA methylation on the modulation of gene expression. The methylation levels in the upstream 2K (Pearson’s R = -0.0383 for IP group; and Pearson’s R = -0.0192 for Control group), gene body (Pearson’s R = -0.0922 for IP group; and Pearson’s R = -0.0737 for Control group) and downstream 2K regions in the Control group (Pearson’s R = -0.0013 for Control group) were negatively correlated with expression levels, but those in the downstream 2K region were positively correlated in the IP group (Pearson’s R = 0.0030 for IP group) (Fig. 7A. 7B).
Additionally, DMGs and DEGs were compared through integrated methylomic and transcriptomic analysis, which showed 25 differentially methylated and expressed genes according to both RNA-seq (P < 0.05) and WGBS (Q < 0.05) (Fig. 7B and Additional file 1: Table S5). Among these genes, the gene numbers exhibiting DMRs in the upstream 2K, gene body and downstream 2K regions were 1, 23 and 1, respectively. The GO analysis of DEGs and DMGs showed that 15 genes could show enrichment in 213 BP terms (P < 0.05) (Additional file 1: Table S5). Among these genes, ATP2B4 was involved in the positive regulation of transmembrane transport (P = 0.0008), calcium-mediated signalling (P = 0.0022), the negative regulation of catabolic processes (P = 0.0031), and the negative regulation of calcium-mediated signalling (P = 0.0061); PDZK1 was involved in the positive regulation of transmembrane transport (P = 0.0014), positive regulation of transport (P = 0.0031) and regulation of transmembrane transport (P = 0.0126); LCP1 was involved in the regulation of intracellular transport (P = 0.0034) and the regulation of cellular localization (P = 0.0151); and PDZK1 and LCP1 are involved in the regulation of transport (P = 0.0135). The KEGG analysis of DEGs and DMGs showed that 9 genes were enriched in 19 signalling pathways (P < 0.05) (Additional file 1: Table S5). Subsequent analysis identified 9 genes with an inverse relationship between their degree of DNA methylation and gene expression in gene body regions (Table 2), which were related to signal transduction and the endocrine system. Three differentially upregulated and downregulated genes (ADCY5, ATP2B4, and CREB5) were associated with signal transduction pathways, including the cAMP signalling pathway (P = 3.13E-06), cGMP-PKG signalling pathway (P = 6.76E-05), TNF signalling pathway (P = 0.0118), and AMPK signalling pathway (P = 0.0131). Furthermore, ADCY5 and CREB5 were involved in insulin secretion (P = 0.0001) and the oestrogen signalling pathway (P = 0.0002) in the endocrine system. Thus, we integrated the gene expression and DNA methylation maps and identified protein-coding genes with underlying changes related to DNA methylation in IPs; the resulting alterations probably induced the development of mammary tumours.
Table 2
Significant DEGs showing a negative correlation with DNA methylation involved in the development of intraductal papilloma.
Gene ID | start | end | DNA methylation | mRNA | Symbol | |
P value | Q value | Methylation difference | Control fpkm | IP fpkm | log2 (FC) | P value | Description |
XM_006146258.2 | 69740001 | 69740200 | 1.09E-20 | 4.56E-18 | -35.605 | 0.001 | 1.370 | 10.420 | 0.008 | ADGRL3 | latrophilin-3 isoform X6 |
XM_006146258.2 | 69467601 | 69467800 | 2.00E-16 | 4.18E-14 | -33.942 | 0.001 | 1.370 | 10.420 | 0.008 | ADGRL3 | latrophilin-3 isoform X6 |
XM_006146258.2 | 69467501 | 69467700 | 1.16E-17 | 3.02E-15 | -33.442 | 0.001 | 1.370 | 10.420 | 0.008 | ADGRL3 | latrophilin-3 isoform X6 |
XM_014582091.1 | 139578801 | 139579000 | 1.21E-09 | 5.35E-08 | -27.965 | 20.140 | 95.463 | 2.245 | 0.021 | LCP1 | plastin-2 |
XM_014582091.1 | 139578901 | 139579100 | 1.21E-09 | 5.35E-08 | -27.965 | 20.140 | 95.463 | 2.245 | 0.021 | LCP1 | plastin-2 |
XM_006170445.2 | 23207301 | 23207500 | 4.60E-13 | 4.82E-11 | -26.329 | 0.001 | 8.303 | 13.019 | 0.044 | UBE2V2 | Ubiquitin-conjugating enzyme E2 variant 2 |
XM_006169173.2 | 65590701 | 65590900 | 1.74E-14 | 2.49E-12 | 27.634 | 179.237 | 50.200 | -1.836 | 0.035 | DNASE1L3 | deoxyribonuclease gamma |
XM_014583774.1 | 35152901 | 35153100 | 5.78E-06 | 7.69E-05 | 28.147 | 8.087 | 0.223 | -5.178 | 0.037 | CREB5 | cyclic AMP-responsive element-binding protein 5 isoform X3 |
XM_006157144.2 | 118399201 | 118399400 | 2.95E-09 | 1.16E-07 | 29.807 | 11.083 | 0.883 | -3.649 | 0.045 | PDZK1 | Na(+)/H(+) exchange regulatory cofactor NHE-RF3 |
XM_006154597.1 | 40832101 | 40832300 | 2.93E-18 | 8.43E-16 | 33.982 | 4.490 | 0.170 | -4.723 | 0.007 | ADCY5 | adenylate cyclase type 5 |
XM_006164776.2 | 106988901 | 106989100 | 4.92E-14 | 6.38E-12 | 39.173 | 7.483 | 0.993 | -2.913 | 0.029 | ADAMTSL1 | ADAMTS-like protein 1 |
XM_006159177.2 | 28289501 | 28289700 | 1.30E-24 | 8.60E-22 | 45.829 | 4.847 | 0.030 | -7.336 | 0.002 | ATP2B4 | plasma membrane calcium-transporting ATPase 4 |
Start/end: the gene position from start to end; Q value: adjusted p value; Control fpkm/IP fpkm: gene expression levels in the Control/IP group; log2 (FC): log of the fold change in gene expression. A positive value indicates upregulation, and a negative value indicates downregulation. |