3.1. Cross-genome transferability of SNP and SSR markers
Both SSR and SNP markers were tested on species related to Brassica juncea from the Brassicaceae family to assess the genetic relationships among the genomes of different taxa, based on the ability of the markers to detect corresponding loci in related species. A total of 88 SSR and 58 SNP markers were found to be functional across 38 genotypes belonging to ten taxon groups (Supplementary Table 1). The SNPs used in the current study showed high cross-genome transferability of more than 59% across the species. Interestingly, the SNPs developed from the amphidiploid AABB genome were highly transferable to distantly related genomes such as EE, CC, RR and SS. The highest transferability (93.17%) was observed for the close relatives B. napus (AACC) and B. oleracea (CC), and the lowest (63.8%) for R. sativus. For B. tournefortii, cross-transferability was 70.68%, although Prakash and Narain (1971) designated B. tournefortii as the D genome because of low cross-compatibility, hybrid sterility and very little gene flow. When the SNP markers from the individual genomes of B. juncea (A and B) are considered separately, the B-genome SNP loci show greater transferability than the A-genome-derived markers to wild Brassica species in which neither of the two genomes is present (Fig. 1). A-genome-derived markers from B. juncea show a high rate of transferability in all species in which the A genome is present, suggesting that this genome has been highly conserved during evolution.
The SSR markers from the AA genome were highly transferable to the related genomes of the different taxa included in the current study. The extent of transferability ranged from 78% to 100% (Supplementary Table 1). The high rate of SSR transferability reveals highly syntenic genomes in the most widely used taxa of the Brassicaceae family. The percent cross-transferability obtained in the present investigation is in accordance with some recent studies (Thakur et al. 2018, Singh et al. 2018). However, when the transferability potential of the two marker types was compared, the SSRs appeared to be far better suited to studying population genetics in orphan crop species, because they exploit multi-allelism among the genomic resources of cultivated species. Moreover, as the SNP loci were selected from genic regions, some of these regions might be highly conserved for a particular species.
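The transferability percentages above can be read as the share of tested markers that yield a scorable call in a given taxon. The following is a minimal sketch of that calculation, not the authors' pipeline: the marker names, taxon labels and calls are hypothetical, and the source does not state how failed amplifications were recorded (here they are assumed to be `None`). Purely as arithmetic, 37 of 58 markers would correspond to about 63.8%; the underlying counts are not given in the text.

```python
from typing import Dict, Optional

# Hypothetical calls: marker -> taxon -> genotype call (None = no product / no call).
CALLS: Dict[str, Dict[str, Optional[str]]] = {
    "marker_1": {"taxon_A": "A/G", "taxon_B": "G/G"},
    "marker_2": {"taxon_A": "C/C", "taxon_B": None},
    "marker_3": {"taxon_A": "T/T", "taxon_B": "T/C"},
    "marker_4": {"taxon_A": "A/A", "taxon_B": None},
}


def transferability(calls: Dict[str, Dict[str, Optional[str]]], taxon: str) -> float:
    """Percentage of markers that produced a call in the given taxon."""
    if not calls:
        raise ValueError("no markers supplied")
    # A marker counts as transferable if the taxon has a non-missing call for it.
    working = sum(
        1 for per_taxon in calls.values() if per_taxon.get(taxon) is not None
    )
    return 100.0 * working / len(calls)


for taxon in ("taxon_A", "taxon_B"):
    print(f"{taxon}: {transferability(CALLS, taxon):.1f}%")
```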
3.2. Genome-wise allelic patterns
The number of alleles detected by SSR markers ranged from 89 (DD genome; B. tournefortii) to 152 (SS genome; S. alba). The SSR markers used in the present study were developed using sequence information from the diploid progenitor (B. rapa) that contributed the AA genome to B. juncea. Out of 88 SSR markers, thirty-one (31) detected private alleles among six taxon groups, with the most private alleles among genotypes from the SS (Sinapis alba) and EE (Eruca species) taxa. Taxa with AA, CC, AACC and DD genomes did not carry any private alleles. Detection of the highest number of private alleles (16) in the SS taxon indicates either selection pressure experienced by this taxon or its descent from the ‘nigra’ lineage.
The number of alleles detected by 41 SNPs among the 10 taxon groups ranged from 35 (DD genome; B. tournefortii) to 75 (SS genome; Sinapis alba). As with the SSRs, the SNP markers detected the fewest alleles in the DD genome of B. tournefortii and the most in the SS genome of S. alba. In contrast to the high number of private alleles detected by SSRs, however, only three SNPs detected private alleles, among SS (Sinapis alba) and RR (Raphanus sativus).
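A private allele, as used above, is an allele observed in exactly one taxon group. The sketch below illustrates one way to tabulate such alleles. The data structure, marker names and allele sizes are hypothetical, and the source does not describe the software or scoring rules actually used.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical input: taxon group -> marker -> set of alleles observed in that group.
ALLELES: Dict[str, Dict[str, Set[str]]] = {
    "SS": {"ssr_01": {"150", "180"}, "ssr_02": {"210"}},
    "AA": {"ssr_01": {"150"}, "ssr_02": {"210"}},
    "EE": {"ssr_01": {"150", "200"}, "ssr_02": {"210"}},
}


def private_alleles(
    alleles_by_taxon: Dict[str, Dict[str, Set[str]]]
) -> Dict[str, Set[Tuple[str, str]]]:
    """Return, per taxon, the (marker, allele) pairs seen in that taxon only."""
    carriers = defaultdict(set)  # (marker, allele) -> taxa carrying it
    for taxon, markers in alleles_by_taxon.items():
        for marker, alleles in markers.items():
            for allele in alleles:
                carriers[(marker, allele)].add(taxon)

    result: Dict[str, Set[Tuple[str, str]]] = {t: set() for t in alleles_by_taxon}
    for key, taxa in carriers.items():
        if len(taxa) == 1:  # carried by a single taxon group -> private
            (only,) = taxa
            result[only].add(key)
    return result


PRIVATE = private_alleles(ALLELES)
for taxon, found in PRIVATE.items():
    print(taxon, sorted(found))
```

Counting the entries per taxon gives the number of private alleles, and counting distinct markers among them gives the number of markers that detect private alleles.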
3.3. Molecular genetic diversity
Of the functional SNP markers, only the 41 with average missing data below 40% were considered for diversity analysis, owing to software requirements. Because SNPs are biallelic, a total of 82 alleles were detected. The minor allele frequency ranged from 0.013 to 0.48 with an average of 0.235. Gene diversity and heterozygosity values also captured the variability among the genotypes: gene diversity ranged from 0.026 to 0.496 and marker heterozygosity from 0.029 to 0.810 (Supplementary Table 2). The PIC (Polymorphism Information Content) value ranged from 0.025 to 0.375 with an average of 0.244 (Figure 1).
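The diversity statistics reported here can be illustrated with the commonly used definitions. This is an assumption, since the source names neither the formulas nor the software. Gene diversity is taken as 1 - Σp_i², and PIC as gene diversity minus Σ_{i<j} 2p_i²p_j², where p_i are allele frequencies at a locus. Under these definitions a biallelic marker cannot exceed a PIC of 0.375, which is consistent with the upper bound reported for the SNPs. The genotype data below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks a missing call


def allele_frequencies(genotypes: Sequence[Genotype]) -> Dict[str, float]:
    """Allele frequencies at one locus, ignoring missing calls."""
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """PIC: gene diversity minus sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = 0.0
    for i in range(len(freqs) - 1):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return gene_diversity(freqs) - cross


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Fraction of called genotypes carrying two different alleles."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for a, b in called if a != b) / len(called)


# Hypothetical calls for one SNP across five genotypes (one missing).
LOCUS: List[Genotype] = [("A", "A"), ("A", "G"), ("G", "G"), None, ("A", "A")]
FREQS = allele_frequencies(LOCUS)
P = list(FREQS.values())
print("MAF:", min(P))
print("gene diversity:", gene_diversity(P))
print("PIC:", pic(P))
print("observed heterozygosity:", observed_heterozygosity(LOCUS))
```

The same functions accept more than two allele frequencies, so they apply equally to multi-allelic SSR loci.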
The 88 polymorphic SSR markers amplified 252 alleles among the 38 genotypes, and in many cases the amplified PCR product differed from the expected size obtained in species carrying the A progenitor genome. The AA-genome SSR markers were used to amplify corresponding loci from the AA-genome-containing amphidiploid species B. juncea (AABB) and B. napus (AACC). The SSR analysis revealed that the alleles for nearly 74% of the SSRs were amplified at different sizes (80-400 bp range) in the two amphidiploid genotypes. These results indicate that the present-day A genomes in these two amphidiploids have diverged from each other at both non-coding (Thakur et al. 2018) and coding sites. This might be because the AA genome in each amphidiploid evolved under different selection pressure after originating from the progenitor species (B. rapa). The number of alleles detected at each locus ranged from one to five with an average of 2.97 alleles per locus, with a size range of 80-400 bp reflecting wide variation among the repeat regions of different alleles. The gene diversity ranged from 0.027 to 0.694 and marker heterozygosity from 0.026 to 0.833 (Supplementary Table 3). The PIC (Polymorphism Information Content) value ranged from 0.027 to 0.782 with an average of 0.478 (Figure 2). The average PIC value of the SSRs is higher than that of the SNPs, which implies that SSRs are more informative than SNP markers. Moreover, the high PIC values are partly attributable to SSRs being multi-allelic, whereas SNPs are almost always biallelic. The diversity within a collection of germplasm depends upon the degree of relatedness, the origin of individual genotypes and the types of markers used to estimate allelic information at different loci. SSR markers obtained from non-coding DNA tend to uncover higher genetic diversity than SNP markers obtained from highly conserved coding regions of a genome. However, SNPs from conserved regions are more likely to be involved in causal relationships with phenotype.
3.4. Phylogenetic relationship and population structure
To assess the efficacy of these two marker types in determining the genetic distance between various species of Brassicaceae, a dendrogram was constructed using the unweighted neighbor-joining method. Both marker types grouped the 38 genotypes into three major clusters {SS, (A, B, C, E, T) and (R, AR)} according to differences in their genome composition. Clusters I, II and III represent the SS genome group (S. alba); the A-, B-, C-, E- and T-genome group; and the R-genome group, respectively. The clustering indicates the ability of these molecular markers to group related genotypes by genome with a high level of accuracy. With SNP markers, cluster I consists of genotypes from S. alba (SS) and cluster III contains genotypes from R. sativus (RR) and Brassicoraphanus (AARR) (Figure 1c). These three species are thus genetically distant from the core species. S. alba appears closest to B. napus (0.472) and quite distinct from B. juncea (0.563) and B. rapa (0.548) (Supplementary Table 4). Brassicoraphanus combines two genomes (AA and RR), but it is genetically closer to R. sativus (RR) (0.243) than to species carrying the AA genome, i.e. B. rapa (0.423) and B. juncea (0.510). Cluster II consists of genotypes from E. sativa/vesicaria, B. juncea, B. tournefortii, B. oleracea, B. napus and B. rapa. E. sativa/vesicaria was found to be closer to B. juncea (0.41) than to B. rapa (0.424), B. napus (0.432) and B. oleracea (0.514). Interestingly, B. tournefortii (TT genome), which showed low cross-transferability of markers, tended to be closer to B. juncea (0.432) than to B. oleracea (0.543) and other Brassica spp. SSR markers showed nearly the same clustering pattern (Figure 1c); however, the genetic distances between species were larger than those obtained with SNP markers (Supplementary Table 5).
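A dendrogram of this kind is built from a pairwise distance matrix. The source does not state which dissimilarity index was used, so the sketch below assumes a simple-matching dissimilarity (the fraction of jointly scored markers at which two genotype profiles differ) and uses hypothetical sample names and calls. The tree-building step itself is not shown; the resulting matrix would be passed to a neighbor-joining routine.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Profile = List[Optional[str]]  # one call per marker; None = missing


def simple_matching_distance(a: Profile, b: Profile) -> float:
    """Fraction of markers called in both samples at which the calls differ."""
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        raise ValueError("no markers called in both samples")
    return sum(1 for x, y in compared if x != y) / len(compared)


def distance_matrix(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise distances keyed by (sample, sample)."""
    names = list(profiles)
    dist = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        d = simple_matching_distance(profiles[p], profiles[q])
        dist[(p, q)] = dist[(q, p)] = d
    return dist


# Hypothetical genotype profiles over four markers.
PROFILES: Dict[str, Profile] = {
    "sample_1": ["AA", "AG", "GG", "CC"],
    "sample_2": ["AA", "AG", "GG", "CT"],
    "sample_3": ["GG", None, "AA", "TT"],
}

DIST = distance_matrix(PROFILES)
for (p, q), d in sorted(DIST.items()):
    if p < q:
        print(f"{p} vs {q}: {d:.3f}")
```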
As the SNP markers in the present study were derived from genic regions and the SSRs mostly from non-genic regions, the SNP loci are more conserved, owing to the lower mutation rate in genic than in non-genic regions. Population structure estimated using STRUCTURE V2.3.4 software under the assumption of Hardy-Weinberg equilibrium also clustered the 38 genotypes into three (SSR markers) and seven (SNP markers) groups based on the maximum likelihood and delta K (ΔK) (Figure 1d and 2d).
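The ΔK criterion mentioned above is usually computed from the log-likelihoods of replicate STRUCTURE runs at each K. The sketch below assumes the common formulation ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd(L(K)), with L taken as the mean Ln P(D) over replicates; the source does not give the exact procedure or the number of replicates. All log-likelihood values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Return K -> ΔK for every K that has runs at both K-1 and K+1.

    lnp maps each K to the Ln P(D) values of its replicate runs.
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result: Dict[int, float] = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # ΔK is undefined at the ends of the tested range
        sd = stdev(lnp[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical Ln P(D) values, three replicate runs per K.
RUNS: Dict[int, List[float]] = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4504.0, -4496.0],
    3: [-4100.0, -4102.0, -4098.0],
    4: [-4080.0, -4090.0, -4070.0],
    5: [-4075.0, -4095.0, -4055.0],
}

DK = delta_k(RUNS)
BEST_K = max(DK, key=DK.get)
print("ΔK by K:", DK)
print("K with the highest ΔK:", BEST_K)
```

The K with the largest ΔK is taken as the most likely number of groups. Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested.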