Sapota is one of the important fruit crops cultivated in India. It was introduced to Asia by the Spaniards and is believed to have reached India through Sri Lanka. Although it is cultivated throughout the world, very few studies have examined its genetic diversity using molecular markers. So far, molecular characterization in sapota has been done using dominant RAPD markers (Meghala et al. 2005; Jalawadi et al. 2014; Kumar et al. 2015).
In this study, we developed microsatellite markers for sapota to characterize it at the molecular level and to gain a clear understanding of its genetic diversity. SSR markers are widely used in species identification, genome mapping in crop breeding programs, forensics, phylogeography and population genetics because of their abundance in the genome, high polymorphism, easy reproducibility and cost-effectiveness (Ravishankar et al. 2011, 2015c). In the absence of sequenced genomes for non-model species like sapota, enrichment of genomic libraries for microsatellites is advantageous for developing molecular markers for genetic studies. However, so far there has been no report on the development of SSR markers in sapota. We used the Illumina HiSeq 2500 NGS platform to develop these informative and versatile DNA-based microsatellite markers. NGS has made the development of microsatellite markers quick, simple and cost-effective, with high-throughput data that identify a large number of loci in the genome (Ravishankar et al. 2015b, c; Unamba et al. 2015; Hodel et al. 2016).
Genomic DNA of the sapota cultivar Cricket Ball was sequenced, generating 3,326,257,143 bases from 22,028,193 reads; after assembly, 6,396,224 contigs were obtained. The GC content of the contigs was 40.91%, which is within the range commonly observed for plant genomes (Smarda and Bures 2012). A total of 2591 sequences containing 3591 SSRs were identified. Mononucleotide repeats were predominant, accounting for 59.1% of all observed repeats, followed by dinucleotide repeats (28.6%). Tri-, tetra-, penta- and hexanucleotide repeats each accounted for less than 10% (Fig. 1). This finding is in accordance with other crops such as mango (Ravishankar et al. 2015c), Garcinia gummi-gutta (Ravishankar et al. 2017), rice, sorghum, Brachypodium, Arabidopsis and Populus (Sonah et al. 2011), in which mononucleotide repeats predominate. Dinucleotide repeats were also common in other crops such as Pouteria sapota (Arias et al. 2015), pomegranate (Ravishankar et al. 2015b), sour passion fruit (Araya et al. 2017), Manchurian walnut (Hu et al. 2016) and American cranberry (Zhu et al. 2012).
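The classification of repeats by motif length described above can be illustrated with a minimal sketch. The SSR-mining tool and thresholds used in the study are not stated in this section, so the minimum repeat counts below and the function names (`find_ssrs`, `class_percentages`) are assumptions chosen only for illustration. The sketch detects perfect repeats only.

```python
import re
from collections import Counter

# Minimum number of repeat units per motif length.
# These thresholds are assumed for illustration, not taken from the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' x 6, which is really a mononucleotide run.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def class_percentages(hits):
    """Percentage of SSRs in each motif-length class (1 = mono, 2 = di, ...)."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {k: 100.0 * c / total for k, c in sorted(counts.items())}
```

Applied to every contig, `class_percentages` would yield a breakdown of the kind reported in Fig. 1. Because the regular expression scans from left to right, a flanking base can shift the reported phase of a motif (for example, TA instead of AT), so motifs would normally be normalized to a canonical form before counting motif types.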
Among the different motif types, the AT and AG dinucleotide motifs, the TCT and AAT trinucleotide motifs and the AAAT tetranucleotide motif were the most frequent (Fig. 3). A similar pattern was observed in many crops, including Pouteria sapota (Arias et al. 2015). The higher frequency of a particular repeat motif and its length in the plant genome might be due to selection pressure on that motif during selection and evolution. The evolution of microsatellites in plant genomes is neither well studied nor well understood. The most common explanation is mutation through replication slippage. Other likely causes are unequal crossing over, nucleotide substitution and duplication events. However, these may not explain the specific patterns of motif repeats in different species (Buschiazzo and Gemmell 2006; Sonah et al. 2011; Ravishankar et al. 2015c).
In this study, we report the successful development of thirty polymorphic microsatellite markers with high PIC values, all above 0.8. The mean PIC value was 0.912. The high number of polymorphic SSRs isolated may be due to Illumina paired-end sequencing, which provides an effective alternative to the expensive and time-consuming conventional method of genome-wide SSR isolation based on microsatellite-enriched libraries. According to Botstein et al. (1980), any locus with a PIC above 0.5 is highly polymorphic. The mean observed heterozygosity was 0.291 and the mean expected heterozygosity was 0.927. The number of alleles per locus ranged from 11 to 38, with a mean of 23. The PI values ranged from 0.0026 to 0.0370, with a mean of 0.0141. The high mean PIC value (0.912) and the high mean number of alleles per locus (23) might be due to high heterozygosity in the species. We observed 17 (32%) SSR markers with more than 20 alleles per locus, indicating high heterozygosity and diversity in the genotypes used. The markers with low PI can be used as universal primers for sapota DNA fingerprinting.
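For readers who want to see how the per-locus statistics above are related, the following is a minimal sketch that computes the number of alleles, observed heterozygosity, expected heterozygosity and PIC (using the Botstein et al. 1980 formula) from diploid genotypes at one locus. The function name `locus_stats` and the input layout are assumptions for illustration, and no small-sample correction is applied to the expected heterozygosity; the software actually used in the study may differ.

```python
from collections import Counter


def locus_stats(genotypes):
    """Summary statistics for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    This input layout is an assumption made for the sketch.
    """
    alleles = [a for pair in genotypes for a in pair]
    n = len(alleles)
    freqs = [count / n for count in Counter(alleles).values()]

    sum_p2 = sum(p ** 2 for p in freqs)
    sum_p4 = sum(p ** 4 for p in freqs)

    # Expected heterozygosity (gene diversity), without sample-size correction.
    he = 1.0 - sum_p2

    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    # The double sum equals (sum p_i^2)^2 - sum(p_i^4).
    pic = 1.0 - sum_p2 - (sum_p2 ** 2 - sum_p4)

    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(a != b for a, b in genotypes) / len(genotypes)

    return {"n_alleles": len(freqs), "Ho": ho, "He": he, "PIC": pic}
```

A locus with many alleles at similar frequencies gives He and PIC close to 1 regardless of how many individuals are heterozygous, which is how a high expected heterozygosity can coexist with a much lower observed value.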
The N-J analysis showed three clusters without clear separation, which deviates from earlier studies of sapota germplasm in India using RAPD markers; Jalawadi (2014), Jalawadi et al. (2014) and Kumar et al. (2014) stated that clustering in sapota was based on the size and shape of the fruit. Further analysis of the clustering pattern with the STRUCTURE program identified an ideal value of k = 3 and also showed an admixed population, with ancestry shared among 56.6% of the population. In Pouteria sapota, a study by Arias et al. (2015) using microsatellite markers showed a clustering pattern based on geographical location, and the STRUCTURE analyses showed an admixed population (Fig. 7). The lack of clear separation in the N-J analysis is probably due to the admixed population shown by the STRUCTURE analyses.
Analysis of molecular variance revealed a significant Fst value of 0.69659, indicating high genetic differentiation among the 53 genotypes and 3 populations studied. Differentiation was high among populations within groups and low among groups. The observed diversity among populations within groups indicates the likely coexistence of different genotypes in the same region (Ravishankar et al. 2015a). The genetic differentiation within populations shown by the STRUCTURE analysis is consistent with the AMOVA results.
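The relationship between the hierarchical levels in an AMOVA can be made concrete with a short sketch that derives the standard fixation indices from three variance components. The variance components of this study are not given in this section, so the function name `fixation_indices` and any input values are purely hypothetical; the sketch only shows how a high Fst can arise from variation among populations within groups while differentiation among groups stays low.

```python
def fixation_indices(var_among_groups, var_among_pops_within_groups, var_within_pops):
    """Fixation indices from hierarchical AMOVA variance components.

    All three inputs are hypothetical here; a real AMOVA estimates them from
    the genotype data, and estimates can occasionally be slightly negative.
    """
    va = var_among_groups
    vb = var_among_pops_within_groups
    vc = var_within_pops
    total = va + vb + vc

    return {
        # Differentiation among groups.
        "Fct": va / total,
        # Differentiation among populations within groups.
        "Fsc": vb / (vb + vc),
        # Overall differentiation among populations.
        "Fst": (va + vb) / total,
    }
```

With hypothetical components such as 0.5, 6.5 and 3.0, most of the differentiation comes from the within-group level, and the indices satisfy (1 - Fst) = (1 - Fsc)(1 - Fct).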
Sapota is an introduced crop in India. In such crops, genetic differentiation generally depends on the number of cultivars introduced, the degree of heterozygosity or the origin of the cultivars, which are unclear here. Hence, the diversity is expected to be narrow and the genetic variation low. However, the results of this study show high genetic differentiation and diversity in the sapota population. This is in accordance with earlier studies by Jalawadi et al. (2014) and Kumar et al. (2014). Initially, sapota might have been propagated from seedlings, and because of their high heterozygosity there was variation among the offspring. Later, plants were selected and vegetatively propagated according to regional preferences for fruit characteristics and yield. Therefore, the wide diversity and high genetic variability in sapota might have originated from seedling segregation; it is also possible that a large number of seedlings or grafts were introduced to India from the place of origin. This is the first study in sapota in which SSR markers were developed and the genetic diversity of Indian collections was examined. The SSR markers developed will be helpful in developing linkage maps, assessing genetic diversity and characterizing genotypes at the molecular level.