On average, 12,78,52,977 bisulphite sequencing reads were generated per sample (Supplementary Table 1). Up to 99.96% of the reads mapped to the genome of cultivated peanut. Only VG 9514 had a relatively low proportion of mapped reads, indicating its divergence from cultivated peanut, probably due to the contribution from A. cardenasii. The number of mapped reads at each methylated site ranged from 1 to 1,658 (Supplementary Table 2). Among the 11 genotypes, 7,59,73,928 sites belonged to the category in which all the mapped reads (100%) showed cytosine methylation (Supplementary Table 3). Similarly, 10,11,37,805 sites belonged to the category in which at least 50% of the mapped reads showed cytosine methylation. The number of sites at which less than 50% of the mapped reads showed cytosine methylation was 12,64,87,183.
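The read-support categories above can be illustrated with a minimal Python sketch. It assumes each site is summarised as a pair (methylated reads, total mapped reads), and it assumes that the "at least 50%" category also contains the fully methylated sites; the text does not state whether the categories nest this way. The function name `tally_sites` and the input layout are hypothetical and are not taken from the pipeline used in the study.

```python
from collections import Counter


def tally_sites(sites):
    """Count sites per read-support category.

    `sites` is an iterable of (methylated_reads, total_reads) pairs.
    Assumption: "at_least_half" includes sites where all reads are methylated.
    """
    counts = Counter()
    for methylated, total in sites:
        if total < 1 or not 0 <= methylated <= total:
            raise ValueError(f"invalid read counts: {methylated}/{total}")
        if methylated == total:
            counts["all_reads"] += 1
        # Integer comparison avoids floating-point rounding at exactly 50%.
        if 2 * methylated >= total:
            counts["at_least_half"] += 1
        else:
            counts["less_than_half"] += 1
    return counts


# Toy input: five sites with made-up read counts.
example = [(5, 5), (3, 5), (1, 5), (1, 1), (1, 2)]
print(dict(tally_sites(example)))
```

Integer arithmetic is used for the 50% threshold so that a site with, say, 1 of 2 reads methylated is assigned consistently.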
On average, 25,53,19,879 plausible DNA methylation sites were found among the 11 genotypes, of which 7,68,86,803 were methylated in 100% of the mapped reads (Table 2). The B sub-genome had more DNA methylation sites (4,62,94,063) than the A sub-genome (3,04,15,166). A total of 1,77,574 sites were found on the scaffolds. The CHG context (where H is A, C or T) had the most methylation sites (3,05,37,376), followed by the CpG (3,03,56,066) and CHH (1,59,93,361) contexts. This observation is in line with previous reports [29, 30] that DNA methylation in plants occurs in both CpG and non-CpG (CHG and CHH) contexts, in contrast to mammals, where it occurs predominantly at CpG dinucleotides.
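The three sequence contexts named above can be told apart from the two bases that follow a cytosine. The following Python sketch shows one way to do this for a forward-strand cytosine; the function name `cytosine_context` is hypothetical, and cytosines on the reverse strand would first need the reverse complement of the reference, which is not handled here.

```python
def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at 0-based `pos` as CpG, CHG or CHH.

    H stands for A, C or T. Returns "unknown" when the downstream bases
    are missing (end of sequence) or ambiguous (for example N).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError(f"base at position {pos} is {seq[pos]!r}, not C")
    downstream = seq[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CpG"
    if len(downstream) < 2 or any(base not in "ACGT" for base in downstream):
        return "unknown"
    # The first downstream base is A, C or T here, so it matches H.
    return "CHG" if downstream[1] == "G" else "CHH"


print(cytosine_context("ACGT", 1))  # C followed by G
print(cytosine_context("CAG", 0))   # C, then H, then G
print(cytosine_context("CTA", 0))   # C, then H, then H
```

Note that a single short sequence can carry more than one context: in `CCG`, the first cytosine is in a CHG context and the second is in a CpG context.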
Among the 11 genotypes, JL 24 and TMV 2 had the highest (8,21,37,767) and the lowest (6,90,44,110) numbers of methylation sites, respectively (Table 3). Such natural epigenetic variation has also been observed among different ecotypes of Arabidopsis [31]. Many of the methylated sites were conserved across the peanut genotypes: a total of 53,79,101 sites showed DNA methylation in all 11 genotypes. The number of sites showing genotype-specific DNA methylation ranged from 65,75,363 (TMV 2) to 91,90,780 (JL 24) (Table 3).
On average, inter-genic regions (7,04,64,637 sites) were more prone to DNA methylation than the genic regions, including the 2 kb upstream and 2 kb downstream regions (64,22,166 sites) (Table 3). Within the genic regions, the introns (15,90,263) had more DNA methylation sites than the exons (9,71,274). The 2 kb upstream and 2 kb downstream regions had 38,60,629 DNA methylation sites, indicating a higher proportion of DNA methylation in the flanking regions than in the gene body. The distribution of DNA methylation within the genome, especially in the promoter and gene body regions, is important because it influences gene expression [32].
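The region categories used above (exon, intron, 2 kb flank, inter-genic) can be sketched as a simple position lookup against one gene model. This is only an illustration under stated assumptions: coordinates are 1-based and inclusive, the function name `classify_site` and the toy gene are hypothetical, and the sketch ignores strand, so it does not distinguish upstream from downstream flanks.

```python
def classify_site(pos, gene_start, gene_end, exons, flank=2000):
    """Assign a genomic position to a region category for a single gene.

    `exons` is a list of (start, end) pairs inside the gene. All
    coordinates are assumed to be 1-based and inclusive.
    """
    if gene_start <= pos <= gene_end:
        if any(start <= pos <= end for start, end in exons):
            return "exon"
        return "intron"
    in_left_flank = gene_start - flank <= pos < gene_start
    in_right_flank = gene_end < pos <= gene_end + flank
    if in_left_flank or in_right_flank:
        return "flank_2kb"
    return "intergenic"


# Toy gene spanning 5000-6000 with two exons (made-up coordinates).
toy_exons = [(5000, 5200), (5800, 6000)]
for position in (5100, 5500, 4000, 2999):
    print(position, classify_site(position, 5000, 6000, toy_exons))
```

A genome-wide annotation would repeat this lookup over all gene models, typically with an interval index rather than a linear scan.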
Of the 67,124 genes (31,359 in the A genome, 35,110 in the B genome and 655 on scaffolds) in peanut, the number of genes with at least one methylated site ranged from 51,179 (ICGV 86855) to 55,497 (ICGV 99005) (Table 3). Among them, Arahy.0DU9MH, a 3,42,359 bp long gene on chromosome 11, had the highest number of methylated sites, ranging from 11,488 (ICGV 86855) to 14,026 (JL 24). Within Arahy.0DU9MH, the promoter region had 131 methylated sites, while the gene body had 12,715 sites (142 in exons and 12,573 in introns). The expression (FPKM) of the 53,740 genes varied widely among the 11 genotypes (Supplementary Table 4). Arahy.0DU9MH, which had the highest number of DNA methylation sites, showed no expression in the leaves of any of the 11 genotypes at 21 DAS. Fifty genes with a wide range of FPKM across the genotypes were selected and checked for DNA methylation. Arahy.FHUH7B on chromosome 10, which showed the highest FPKM (54,951), had a maximum of 102 DNA methylation sites (Supplementary Table 5). Many of these genes showed a negative association between the number of DNA methylation sites and FPKM among the genotypes.
Fourteen cytosine-5 DNA methyltransferase (C5-MTase) coding genes and ten DNA demethylase coding genes identified earlier in diploid peanut [21] were analysed for DNA methylation and expression. Methylation varied considerably across the genes but little across the genotypes (Supplementary Table 6). A DME-like A gene, Arahy.R549UJ (Aradu.4D5YM), 15,531 bp long and located on chromosome 8, had the highest number of methylated sites (as many as 586). Forty-four sites were in the promoter region, while 542 were in the gene body (48 in exons and 494 in introns). This gene showed no expression in the leaves of the 11 genotypes at 21 DAS.
An attempt was made to enumerate the differentially methylated sites between a parent (TMV 2) and its EMS-derived mutant (TMV 2-NLM). The two genotypes differed significantly at 650 methylation sites, of which 240 and 401 were in the A and B genomes, respectively (the remaining nine were on the scaffolds). Again, the inter-genic regions had more differentially methylated sites (605) than the genic regions (45; 23 in exons and 22 in introns). Thirty-seven genes exhibited differential methylation, of which eight showed differential expression (Supplementary Table 7a), indicating the influence of EMS mutagenesis on DNA methylation.
To identify differentially methylated sites associated with foliar disease resistance, the genotypes were grouped into foliar disease resistant (GPBD 4, VG 9514, ICGV 86855, ICGV 99005 and ICGV 86699) and susceptible (TAG 24, TMV 2 and JL 24) groups. The sites common within the susceptible group were compared with those common within the resistant group. In total, 766 sites showed significantly differential DNA methylation. Of these, 331 sites were in the A genome and 433 were in the B genome. In total, 731 sites were in the inter-genic regions and 35 were in the genic regions (19 in exons and 16 in introns). Interestingly, four differentially methylated sites (positions 10,01,785, 10,01,813, 10,21,671 and 13,05,680) mapped to the QTL region for late leaf spot (LLS) on A02, and one (position 13,43,50,159) mapped to the QTL region for rust on A03 [33]. Of these sites, only one (13,05,680) was in a genic region (Arahy.42YDET). However, this gene has not been regarded as a candidate gene for foliar disease resistance [33]. Based on the genomic positions of the DNA methylation sites, 25 genes were found to be differentially methylated (q≤0.01) between the resistant and susceptible genotypes. Of these, two genes coding for senescence-associated protein (Arahy.1XYC2X on chromosome 01 and Arahy.00Z2SH on chromosome 17) showed differential expression, with the resistant genotypes recording higher FPKM values (Supplementary Table 7b). Notably, the methylation pattern within Arahy.1XYC2X differed between the resistant and susceptible groups, indicating epialleles at this locus. The candidate genes identified for late leaf spot (four genes) and rust (six genes) resistance in the previous study [33] did not show any DNA methylation, suggesting that breeding for foliar disease resistance at these loci can rely on genetic variation alone. FPKM values for the transcripts at these loci were on par between the resistant and susceptible genotypes. This was also confirmed by qRT-PCR, in which some of these genes showed non-significant fold changes between the two groups (Supplementary Table 8).
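The group comparison described above, in which sites common within one group are compared with sites common within the other, can be sketched with set operations. This is a simplified illustration only: it treats methylation as present or absent per site and omits the significance testing (q≤0.01) reported in the text, whose method is not specified here. The function names, genotype labels and site coordinates below are hypothetical.

```python
def common_sites(site_sets):
    """Return the sites methylated in every genotype of a group."""
    sets = [set(sites) for sites in site_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


def group_specific_sites(resistant, susceptible):
    """Compare the common sites of two groups.

    Each argument maps a genotype label to its set of methylated sites,
    where a site is a (chromosome, position) tuple. Returns the sites
    common to the resistant group only and to the susceptible group only.
    """
    res_common = common_sites(resistant.values())
    sus_common = common_sites(susceptible.values())
    return res_common - sus_common, sus_common - res_common


# Toy data with made-up labels and coordinates.
resistant = {
    "R1": {("chrA", 100), ("chrA", 200), ("chrB", 300)},
    "R2": {("chrA", 100), ("chrB", 300)},
}
susceptible = {
    "S1": {("chrB", 300), ("chrC", 50)},
    "S2": {("chrB", 300), ("chrC", 50), ("chrA", 200)},
}
res_only, sus_only = group_specific_sites(resistant, susceptible)
print(sorted(res_only), sorted(sus_only))
```

In this sketch a site counts as group-specific when it is common to one group and not common to the other; a stricter definition would also require the site to be absent from every genotype of the other group.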