Evaluation of RNA-seq data
For analyzing phenotypic variation in seed oil accumulation between wild tetraploid and cultivated peanuts, we measured oil content in three accessions of A. monticola and five accessions of A. hypogaea. The oil content of mature seed in three wild accessions, namely 171 (WH 4335), 172 (WH 4334), and 245 (WH 10025), ranged from 58.1% to 59.8%, and that of five cultivated peanuts, namely 003 (Zhh 0225), 145 (Zhh 0888), 492 (Zhh 0003), 502 (Zhh 0602), and Z16 (Zhh 7720), ranged from 48.3% to 52.0% (Figure 1A). Multiple comparison analyses indicated that wild tetraploid species has significantly higher oil content than cultivated peanuts. To explore the reason for oil accumulation difference during peanut domestication, developing seeds at two stages (R5 and R8) were collected to generate RNA-seq datasets for both the species (Figure 1B). Sixteen samples with three biological replications were used to construct RNA-seq libraries followed by generation of 44.1 to 45.4 million clean reads per library (Table S1). The union set of genomes of diploid ancestors A. ipaensis and A. duranensis (Bertioli et al. 2016) were used as a reference in this study. An average of 83.72% and 83.59% of clean reads were mapped on the reference genome for wild and cultivated peanuts, respectively (Table S1). This similar mapping rate indicated that the reference can be used to quantify the gene expression level in both wild and cultivated species. The number of genes among different libraries ranged from 32148 to 45226 genes with expression of total 56236 genes among 16 samples (Table S1 and S2). Hierarchical clustering analysis was performed to cluster the samples according to developmental stages (Figure 1C). In stage R8 group, the samples could be further divided into wild (BW) or cultivated (BC) subgroups. However, the samples in R5 group could not be clearly distinguished between wild and cultivated subgroups. The first two PCA components (Figure 1D) of transcriptional profiles explained 37.8% (PC1) and 13.2% (PC2) of sample-to-sample variance, respectively. Similar to hierarchical clustering analysis, the PCA analysis also grouped samples according to developmental stages (PC1) prior to species (PC2).
Differentiation in gene expression between wild and cultivated peanuts
To explore divergence of gene expression pattern between two species, transcriptional profiles of 16 samples were used to perform K-mean clustering analysis. The expressed genes (36850) were grouped into 9 clusters designated as C1-C9 (Figure 2A). Five clusters (C1-C5) showed a decreasing tendency from R5 to R8 stage in both peanut species. Conversely, C6 and C8 clusters showed an increasing trend across accessions from R5 to R8 stage. For C7 cluster, wild accessions (171, 172, and 245) showed a decreasing trend, while cultivated accessions showed an increasing pattern or no obvious change during seed developmental stages. In C2 cluster, gene expression level was higher in wild accessions at R5 stage. In C8 cluster, gene expression level was in general higher in cultivated accessions at both stages. GO category analysis was performed to identify over-represented GO terms of the 9 clusters (Figure 2B and Table S3). A total of 30, 14, and 22 GO terms were enriched in biological process, cellular component, and molecular function, respectively. Each cluster had differently enriched GO terms, and the number of over-represented GO terms ranged from 3 (C9) to 35 (C2). The GO term of lipid metabolism process was found over-represented in clusters C1, C2, and C7 (P-value < 0.05). Interestingly, gene expression level was different between wild and cultivated peanuts in C2 and C7 clusters, indicating that transcriptional profile of lipid metabolism may be divergent between the two species.
To further dissect the expression difference of lipid metabolism, a comparative transcriptomic analysis was performed to identify differentially expressed genes (DEGs) between wild and cultivated peanuts. There were 5647 and 3184 DEGs at R5 (SW vs SC) and R8 (BW vs BC) stages, respectively (Figure S1). A total of 1578 DEGs were detected at both R5 and R8 stages. The number of up-regulated and down-regulated DEGs was overall equivalent at both R5 and R8 stages (Figure S1). Heatmaps of DEGs involved in fatty acid synthesis, fatty acid elongation, TAG synthesis, phospholipid synthesis, sphingolipid synthesis, galactolipid and sulfolipid synthesis, wax synthesis, and β-oxidation were profiled to represent a transcriptional change in lipid metabolism (Figure 3). According to expression pattern of DEGs involved in lipid metabolism, samples in both heatmaps (R5 and R8 stages) could be divided into two groups (wild and cultivated). At R5 stage, most lipid metabolism-related pathways, such as fatty acid synthesis, fatty acid elongation, TAG synthesis, phospholipid synthesis, sphingolipid synthesis, galactolipid, and sulfolipid synthesis were dramatically up-regulated in wild species (Figure 3A). Most down-regulated DEGs at R5 stage were mainly distributed on β-oxidation and wax synthesis pathways. Compared with R5 stage, the portion of up-regulated DEGs in wild species was generally lower at R8 stage (Figure 3B). However, DEGs in fatty acid synthesis, TAG synthesis, phospholipid synthesis, sphingolipid synthesis were mainly up-regulated in wild peanut accessions at R8 stage. Conversely, all the DEGs in β-oxidation were down-regulated at R8 stage. There were 16 lipid metabolic DEGs simultaneously identified at both R5 and R8 stages belonging to fatty acid synthesis (5), fatty acid elongation (3), TAG synthesis (2), sphingolipid synthesis (4), and β-oxidation (3) pathways.
Construction of co-expression network
To further explore genes with high connectivity to lipid metabolic-related DEGs, weighted gene co-expression network analysis (Langfelder and Horvath 2008) was performed to construct a topological overlap matrix of expression similarity between genes. Genes co-expressed with lipid metabolic-related DEGs (expression similarity ≥ 0.1) were selected to perform GO enrichment analysis. They were enriched in 14 GO terms for molecular function category, including transcription regulator activity, DNA-binding transcription factor activity, and DNA binding (Figure 4A). It was suggested that transcription factors (TFs) may play a role in the co-expression network. Many TFs were differentially expressed between wild cultivated peanuts (Figure S3). In specific TF families, such as AP2, B3, and bZIP, the family members were predominately upregulated in wild cultivated peanuts at both R5 and R8 stages. 391 transcription factors were belonging to 47 families that were co-expressed with DEGs involved in lipid synthesis pathways, such as fatty acid synthesis, fatty acid elongation, TAG synthesis, Phospholipid synthesis, Sphingolipid synthesis, Galactolipid and Sulfolipid synthesis (Figure 4B). According to interaction number, the top10 notes in the co-expressed transcription factors (Table S6) included AP2 (Araip.E0UEG), B3 (Aradu.07I6M), B3 (Araip.S9XVH), bZIP (Aradu.898PR), bZIP (Araip.0GM4I), bZIP (Araip.R3LNH), C2H2 (Araip.X9IXZ), G2-like (Araip.8YC75), MYB-related (Araip.L6TM2), NF-YC (Aradu.3GN04), and Trihelix (Aradu.RW5KN). Expression ratio (wild/cultivated) of the transcription factors ranged from 1.4 to 2.5 at R5 stage with mean value of 1.8 while the expression ratio ranged from 1.1 to 2.4 at R8 stage with mean value of 1.4 (Figure 4C). In general, the abundance of the co-expressed transcription factors was higher in wild peanuts than cultivated peanuts, especially at R5 stage.
Influence of DNA methylation on gene expression
The available literature in multiple crops proved that DNA methylation could help to regulate gene expression (Huang et al., 2019; Rajkumar et al., 2020; Wang et al., 2015; Wang et al., 2016; Xing et al., 2015). For revealing the relationship between DNA methylation and gene expression in peanut, we performed bisulfite sequencing of the genomic DNA isolated from R5 and R8 stages of seeds in wild peanut (245) and cultivated peanut (Z16). Four samples (S245, SZ16, B245, BZ16) representing seeds at R5 (S) and R8 (B) stages for 245 and Z16, were sequenced with three biological replicates. About 480 M clean reads were generated for each sample (Table S7) and approximately 76% of the clean reads were uniquely mapped to the reference genomes covering >87% of the genomic cytosine positions. Each methylome had >16-fold average depth per strand. Methyltcytosines were identified in CG, CHG, and CHH contexts across samples. Compared with average methylation of CHH context (11.2-22.3%), the level was much higher in CG (81.7%-86.0%) and CHG (73.2%-78.2%) contexts (Figure S3A). The fraction of methylcytosine (mC) was 20.1%-25.2% in CG, 25.6-31.8% in CHG, and 43.0-54.4% in CHH. Overall methylation levels in four samples were similar (29.3-34.7%) (Figure 5A and Figure S3A). The distribution of mC showed much lower methylation in the terminal chromosomes in contrast to much higher gene expression in the terminal chromosomes (Figure 5A). Genome-wide correlation coefficient between overall DNA methylation and gene expression in the four samples was ~-0.56 (p < 2.2 e-16), indicating significantly antagonistical correlation (Figure 5B).
To further characterize epigenetic variations between 245 (wild) and Z16 (cultivated), differentially methylation regions of CG, CHG, and CHH contexts were detected at R5 and R8 stages (Table S8). Chromosome-wide view of DMR distribution indicated CG-DMR being most likely to be enriched in gene-rich regions (Figure 6A and 6B). Approximately 40% of CG-DMRs were located in the genic region (gene body+2k upstream), the portions were deduced to ~15% and ~10% for CHG-DMRs and CHH-DMRs, respectively (Figure 6C). Conjoint analysis of the expression profile and DMRs was indicated that CG-DMRs and CHG-DMRs were significantly enriched in the genic region (gene body+2k-upstream) of DEGs between 245 and Z16 (Figure 6D). It is hinted that CG- and CHG-DMRs, not CHH-DMR, may play a role in the change of gene expression. Among DEG-overlapped CG-DMRs, the portion of hyper-DMRs in S245-upregulated DEGs was 50.3% and 51.9% at R5 and R8 stages, respectively, and that in S245-downregulated DEGs could increase to 70.5% and 71.1% at R5 and R8 stages, respectively (Figure 6E). Similarly, the portion of hyper-DMRs was much higher in S245-downregulated DEGs (83.7% and 82.4% at R5 and R8, respectively) than in S245-upregulated DEGs (42.7% and 12.5% at R5 and R8, respectively) for DEG-overlapped CHG-DMRs. The results suggested that dynamic change of methylated CG (mCG) and methylated CHG (mCHG) on a genic scale correlated negatively with the difference in gene abundance between 245 and Z16.
Role of DNA methylation in divergence of lipid metabolism
The CG-DMRs between 245 and Z16 were overlapped with 8639 and 2147 DEGs at R5 and R8 stages, respectively (Figure 6F). For CHG-DMRs, there were 4024 and 1357 CHG-DMR associated DEGs at R5 and R8 stages, respectively (Figure 6G). In general, more than 50% and 25% of DEGs between 245 and Z16 may be associated with CG-DMRs and CHG-DMRs, respectively. GO enrichment analysis showed that CG-DMR associated DEGs were enriched in carbohydrate metabolic process and response to endogenous stimulus at both stages (Figure 6H). The CHG-DMR associated DEGs were enriched in response to chemical and response to endogenous stimulus at both stages (Figure 6I). Interestingly, lipid metabolism was one of enriched terms at R8 stage for CG-DMR associated DEGs.
There were 41 and 19 lipid metabolic DEGs that showed a negative correlation with DMRs in CG and CHG context between 245 and Z16 at R5 stage, respectively (Figure 7A). Most genes involved in fatty acid synthesis, fatty acid elongation, phospholipid synthesis, sphingolipid synthesis, galactolipid and sulfolipid synthesis, and TAG synthesis pathways, exhibited hypomethylation with higher expression in 245 at R5 stage. In addition, 33 and 20 TFs known to involve in regulation of lipid metabolism, were found to be negatively correlated with CG- and CHG- DMRs. Among the CG- or CHG- DMR associated TFs, 15 members had been identified to co-express with lipid metabolic DEGs (Figure 7A and Table S6). Most TFs involved in regulation of lipid metabolism, especially the co-expressed TFs, showed hypomethylation with higher abundance in 245 at R5 stages. At R8 stage, 6 and 6 lipid metabolic DEGs were found to be negatively correlated with CG- and CHG- DMRs between 245 and Z16 at R8 stage, respectively (Figure 7B). Genes involved in β-Oxidation showed hypermethylation with lower expression in 245 at R8 stage. Reversely, genes involved in fatty acid elongation, phospholipid synthesis, and TAG synthesis pathways showed hypermethylation with lower abundance in 245. Meanwhile, 17 TFs known to regulate lipid metabolism, exhibited a negative correlation with DMR in CG or CHG context at R8 stage. Among them, three members were identified to co-express with lipid metabolic DEGs (Figure 7B and Table S6). Two DEGs showed hypermethylation with lower abundance, while another one showed hypomethylation with higher expression in 245 at R8 stage. Two examples exhibited a relationship between DMRs and lipid metabolic DEGs. The one gene (Araip.H6S1B) encoding acyl transporter protein was down-regulated in 245 at both stages. Hyper-DMRs were observed at 5’ UTR of the gene (Figure 7B). Another one (Aradu.IPH1G) encoding B3 transcript factor, was up-regulated in 245 at R5. Hypo-DMRs were located at upstream of the gene (Figure 7C).