In this study, 11.01 Gb of M. casturi sequence data were generated by high-throughput sequencing on an Illumina HiSeq 4000 with 2 × 150 bp paired-end reads. The raw data were deposited in DDBJ under accession number DRA011022. After filtering, 10.95 Gb of clean reads were obtained and assembled de novo using the Ray assembler. The assembly comprised 259,872 scaffolds with an N50 of 1,445 bp and a maximum scaffold length of 144,601 bp (Table 1). The assembly was assessed using BUSCO; complete and single-copy BUSCOs (S) accounted for 42.2% of the benchmark genes (Table 2).
Table 1
Statistics of the de novo assembly of M. casturi produced with the Ray assembler
Features | Number |
Raw reads (Bases) | 73 438 066 M (11 015 710 G) |
Clean reads (bases) | 73 102 518 M (10 954 176 G) |
Number of scaffolds | 259 872 |
N50 (bp) | 1 445 |
Mean (bp) | 947.68 |
Longest (bp) | 144 601 |
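The N50 reported in Table 1 is the scaffold length at which the scaffolds of that length or longer together hold at least half of the assembled bases. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the M. casturi assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths, not real assembly data.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # prints 400
```

A low N50 relative to the longest scaffold, as in Table 1, indicates that most of the assembly sits in short scaffolds.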
Table 2
Summary of BUSCO benchmarks for the Ray assembly of M. casturi
No | Categories | Number | Ratio (%) |
1 | Complete and single-copy BUSCOs (S) | 608 | 42.2 |
2 | Complete and duplicated BUSCOs (D) | 36 | 2.5 |
3 | Fragmented BUSCOs (F) | 241 | 16.7 |
4 | Missing BUSCOs (M) | 555 | 38.5 |
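The ratios in Table 2 can be recomputed from the counts. The sketch below assumes that the four categories are mutually exclusive and together cover the whole benchmark set, so the total used as the denominator is derived here from the table and is not stated in the source.

```python
# Counts copied from Table 2.
busco_counts = {"S": 608, "D": 36, "F": 241, "M": 555}

# Assumption: the four categories sum to the full benchmark set.
total_buscos = sum(busco_counts.values())

busco_ratios = {
    category: round(100 * count / total_buscos, 1)
    for category, count in busco_counts.items()
}
complete_ratio = round(
    100 * (busco_counts["S"] + busco_counts["D"]) / total_buscos, 1
)

print(total_buscos)    # prints 1440
print(busco_ratios)    # prints {'S': 42.2, 'D': 2.5, 'F': 16.7, 'M': 38.5}
print(complete_ratio)  # prints 44.7
```

Under this assumption, complete BUSCOs (S + D) make up about 44.7% of the set.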
Microsatellites were identified using the MISA program. In total, 11,040 microsatellites (SSRs) were detected in 10,160 sequences, of which 770 contained more than one microsatellite site (Table 3). Trinucleotide motifs were the most abundant (52.77%), followed by dinucleotide motifs (33.33%). Fourteen candidate loci were selected and characterized (Table 4). All confirmed primer pairs were amplified, and the loci were then registered in DDBJ under the accession numbers shown in Table 4.
Table 3
Number and motif types of microsatellite regions in the Ray-assembled M. casturi scaffolds
Characteristics | Number |
- Total number of identified SSRs | 11 040 |
- Number of SSR-containing contigs | 10 160 |
- Contigs containing more than one SSR | 770 |
- SSRs present in compound formation | 272 |
Motif | |
- Dinucleotide | 3 680 |
- Trinucleotide | 5 826 |
- Tetranucleotide | 1 194 |
- Pentanucleotide | 213 |
- Hexanucleotide | 102 |
- Heptanucleotide | 25 |
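To illustrate the kind of search that produces the counts in Table 3, the sketch below finds perfect microsatellites with a regular expression. It is not the MISA program, and it does not handle compound SSRs. The minimum repeat thresholds in `MIN_REPEATS` are hypothetical, since the source does not state the settings used, and the function name `find_ssrs` and the test sequence are likewise illustrative.

```python
import re

# Hypothetical minimum repeat counts per motif length (di- to heptanucleotide).
# The source does not report the thresholds that were actually used.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for size, minimum in sorted(min_repeats.items()):
        # One motif copy followed by at least (minimum - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, minimum - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs built from a shorter repeating unit (e.g. "AA", "ATAT"),
            # so that each SSR is reported only at its true unit length.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Toy sequence with a (GA)8 and an (ATG)6 repeat, not real scaffold data.
toy_sequence = "CCC" + "GA" * 8 + "TTCA" + "ATG" * 6 + "C"
print(find_ssrs(toy_sequence))  # prints [(3, 'GA', 8), (23, 'ATG', 6)]
```

Start positions are 0-based. Tallying the motif lengths of such hits over all scaffolds gives a motif breakdown of the type shown in Table 3.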
Table 4
Characteristics of 14 microsatellite loci of M. casturi (F, forward primer; R, reverse primer; NA, number of alleles)
No | Locus | Primer sequences | Motif | Size range (bp) | NA | DDBJ accession |
1 | mc-122955 | F : TGTTGATGGTAAGGATTTGGTGT | (GGATG)6 | 168–178 | 2 | LC594546 |
| | R : TCAGGTGAGTATGTATTGTGCA | | | | |
2 | mc-148231 | F : TCCCTCCCCTAAACCCTTCT | (ACCCTAA)5 | 188–209 | 4 | LC594549 |
| | R : GCTTCTCCTTGCCTCTAAATCCT | | | | |
3 | mc-151578 | F : GAGCCTTGTACTCGTTCAATGA | (CAAGCT)8 | 273–279 | 5 | LC594547 |
| | R : ACGAGCTTAAAATGAGTTTGACT | | | | |
4 | mc-167596 | F : AGCTGAACCTTGTTGCCCTT | (GA)27 | 192–224 | 3 | LC594539 |
| | R : TCTGCTTGTTGGAACTGAACA | | | | |
5 | mc-176197 | F : TGTATGCCCGAATTGTTCCAAC | (AC)19 | 237–250 | 3 | LC594537 |
| | R : GCTGGCTTTAATGGAAGTTGCA | | | | |
6 | mc-211123 | F : GGATGGTGGATGTCAGATTTTCG | (TGAAGT)6 | 323–339 | 5 | LC594548 |
| | R : CGAAGAGAACGGGTCCCTTG | | | | |
7 | mc-21672 | F : TGGTTGGTAAGAAGTAGGATTC | (ATAC)11 | 263–264 | 4 | LC594543 |
| | R : CACAATGCAAATCACTCCTC | | | | |
8 | mc-230178 | F : AGACAGCCATAATTTGCCCCA | (ATG)12 | 162–188 | 6 | LC594541 |
| | R : GCTGGAGGTTGATCAGGGTC | | | | |
9 | mc-28107 | F : GGTGTGCGTTCTGTTTTGACA | (TG)28 | 211–250 | 5 | LC594540 |
| | R : CAGCAGCATCAACACAAGCA | | | | |
10 | mc-4673 | F : TTTCCAAAGCCAAGACTCTC | (TAAACCC)5 | 231–245 | 3 | LC594550 |
| | R : AAAATTGTATTCATTAAGCCCCT | | | | |
11 | mc-58089 | F : TCTTGTCGTCGAATCAAACTCA | (AT)22 | 264–287 | 6 | LC594538 |
| | R : CTCGGTCTATCAATGGTGTAGGT | | | | |
12 | mc-8693 | F : CGAAGGGTTGAGGTTTGGGT | (CTTTT)7 | 159–183 | 4 | LC594545 |
| | R : AAAGAGTGAGAGGGTTGCGT | | | | |
13 | mc-88075 | F : CTCCAATCGAACAACCCAGC | (TTA)15 | 278–286 | 3 | LC594542 |
| | R : AGGGGTGCATATGGAGGATT | | | | |
14 | mc-88387 | F : CCATTTCGACGATGTTGGAAGT | (TATG)10 | 251–252 | 2 | LC594544 |
| | R : GCAACCCTTACCAACAAGCA | | | | |
Eight samples were used to validate the loci and determine allele sizes using QIAxcel®. The 14 primer pairs produced 55 alleles in total, with a mean of 3.93 alleles per locus (Table 4). All loci were polymorphic (Table 5). The mc-230178 and mc-58089 loci produced six alleles each, whereas mc-122955 and mc-88387 produced two alleles each. Kasturi and Mawar shared the same alleles at some loci, namely mc-176197, mc-21672, and mc-88075. At the mc-88387 locus, Kasturi was the only sample that was not amplified, which suggests a null allele in Kasturi at this locus. Therefore, mc-88387 can be used to identify M. casturi in the population, as it is otherwise similar to other Mangifera, such as Mawar.
A UPGMA tree was constructed from the 14 loci (Fig. 2). M. quadrifida and Rawa-rawa were placed in the same clade. All accessions of M. casturi fell in the same clade as M. indica, as did even the out-group for this analysis (Hambawang, M. foetida). Mawar accessions were most closely related to M. indica. Kasturi and Pelipisan were placed in the same clade. Some markers showed shared alleles between Kasturi and Pelipisan; these accessions thus had a closer genetic relationship to each other than to Mawar. Pinari was also genetically distinct from the other accessions of M. casturi, and Mawar was likewise quite distant from the other M. casturi accessions.
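The UPGMA clustering mentioned above can be sketched in a few lines. The source does not state which distance measure or software was used, so the example below takes a ready-made distance matrix as input. The labels A to D and the distances are hypothetical and do not correspond to any accession in this study.

```python
def upgma(labels, dist):
    """Cluster with UPGMA and return the tree as nested tuples.

    labels: list of names; dist: symmetric matrix (list of lists) of
    pairwise distances given in the same order as labels.
    """
    n = len(labels)
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of active clusters
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # The size-weighted mean equals the average over all leaf pairs.
            d[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical toy distances between four placeholder samples.
toy_labels = ["A", "B", "C", "D"]
toy_dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 8],
    [10, 10, 8, 0],
]
print(upgma(toy_labels, toy_dist))  # prints ('D', ('C', ('A', 'B')))
```

In the nested-tuple output, samples joined first (here A and B) sit deepest in the tree, which is how close pairs such as those described above would appear.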
Table 5
Allele size information per microsatellite locus
No. | Locus | Allele size (bp) |
1 | mc-122955 | 168, 178 |
2 | mc-148231 | 188, 196, 203, 209 |
3 | mc-151578 | 256, 270, 273, 279, 284 |
4 | mc-167596 | 193, 224, 226 |
5 | mc-176197 | 235, 237, 249 |
6 | mc-211123 | 318, 323, 333, 335, 340 |
7 | mc-21672 | 255, 256, 257, 263 |
8 | mc-230178 | 162, 164, 170, 173, 178, 185 |
9 | mc-28107 | 204, 211, 214, 225, 250 |
10 | mc-4673 | 231, 238, 246 |
11 | mc-58089 | 261, 264, 273, 279, 282, 287 |
12 | mc-8693 | 157, 160, 167, 182 |
13 | mc-88075 | 267, 278, 286 |
14 | mc-88387 | 251, 253 |
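The allele totals quoted in the text can be recomputed from Table 5. The sketch below copies the allele sizes from the table; the variable names are illustrative.

```python
# Allele sizes (bp) per locus, copied from Table 5.
allele_sizes = {
    "mc-122955": [168, 178],
    "mc-148231": [188, 196, 203, 209],
    "mc-151578": [256, 270, 273, 279, 284],
    "mc-167596": [193, 224, 226],
    "mc-176197": [235, 237, 249],
    "mc-211123": [318, 323, 333, 335, 340],
    "mc-21672": [255, 256, 257, 263],
    "mc-230178": [162, 164, 170, 173, 178, 185],
    "mc-28107": [204, 211, 214, 225, 250],
    "mc-4673": [231, 238, 246],
    "mc-58089": [261, 264, 273, 279, 282, 287],
    "mc-8693": [157, 160, 167, 182],
    "mc-88075": [267, 278, 286],
    "mc-88387": [251, 253],
}

allele_counts = {locus: len(sizes) for locus, sizes in allele_sizes.items()}
total_alleles = sum(allele_counts.values())
mean_alleles = total_alleles / len(allele_counts)

# Loci with the most and the fewest alleles.
most = sorted(l for l, n in allele_counts.items() if n == max(allele_counts.values()))
fewest = sorted(l for l, n in allele_counts.items() if n == min(allele_counts.values()))

print(total_alleles)           # prints 55
print(round(mean_alleles, 2))  # prints 3.93
print(most)                    # prints ['mc-230178', 'mc-58089']
print(fewest)                  # prints ['mc-122955', 'mc-88387']
```

The per-locus counts obtained this way match the NA column of Table 4.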
Phylogenetic analysis was performed using three widely used chloroplast markers: matK, rbcL, and trnH-psbA (Fig. 3). The matK tree grouped Kasturi, Mawar, Pelipisan, Pinari, and Hambawang with M. indica and M. sylvatica. In comparison, the rbcL tree placed Mawar and Pelipisan in the same clade as almost all M. indica accessions, whereas Pinari and Hambawang were separated from this clade and grouped with M. laurina, M. flava, M. cochinchinensis, M. odorata, and M. duperreana. In contrast, the trnH-psbA tree grouped Kasturi, Pelipisan, and Hambawang with M. indica, while Pinari was close to M. odorata, M. griffithii, M. pajang, M. andamanica, and M. indica.
The ITS phylogenetic tree produced three large groups, namely Indica 1, Indica 2, and a group containing Kasturi, Mawar, Pelipisan, and Pinari. Hambawang was included in the Indica 2 group. Pinari was placed in a sub-group with M. oblongifolia, M. camptosperma, M. gedebe, and M. flava. Kasturi, Mawar, and Pelipisan were included in the other sub-group, together with M. casturi (MF678493.1), M. griffithii, M. quadrifida, M. kemanga, M. torquenda, and M. sumatrana.