3.1. Mitogenome organization
We obtained a 16,354 bp mitogenome of Hangul and submitted it to NCBI GenBank (accession number MW430050). It consisted of 22 transfer RNA (tRNA) genes, 13 protein-coding genes (PCGs), two ribosomal RNA (rRNA) genes, and a non-coding control region (D-loop region) (Fig. 1 and Table 1). The arrangement and distribution of the mtDNA genes were similar to those of other Tarim and western red deer [26,27]. The overall nucleotide composition of the Hangul mtDNA was A (33.26%), T (28.75%), C (24.51%), and G (13.49%) (Table 2). Most genes were encoded on the H-strand; the exceptions were the ND6 gene (13557–14084) and eight tRNA genes (tRNAGln, tRNAAla, tRNAAsn, tRNACys, tRNATyr, tRNASer, tRNAGlu, and tRNAPro). The control region lay between tRNAPro and tRNAPhe (Table 1). We observed eight pairs of overlapping genes: tRNAVal/16S rRNA, tRNAIle/tRNAGln, COI/tRNASer, ATP8/ATP6, ATP6/COIII, ND4L/ND4, ND5/ND6, and tRNAThr/tRNAPro. The smallest overlap (1 bp) occurred between tRNAVal/16S rRNA, ATP6/COIII, and tRNAThr/tRNAPro, whereas the largest (40 bp) occurred between ATP8 and ATP6 (Table 1). Such overlaps are common in other mammalian mitogenomes [28,29]. We also found 15 intergenic spacers ranging from 1 to 32 bp in length; the longest lay between tRNAAsn and tRNACys (32 bp) (Table 1). The whole Hangul mitogenome was AT-biased, with AT and GC contents of 62% and 38%, respectively. To characterize nucleotide composition, we estimated AT-skew, GC-skew, AT%, and GC% for the complete mitogenome. AT-skew was positive (0.072) and GC-skew was negative (−0.289) in Hangul, and the same pattern held for all examined red deer subspecies (Supplementary Table ST2).
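The composition metrics above follow the standard skew formulas, AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C). As a minimal sketch (not the pipeline used in this study), the Python below computes them from the whole-mitogenome percentages in Table 2. A second helper classifies the junction between consecutive genes as an overlap or an intergenic spacer; the gene coordinates passed to it are hypothetical toy values chosen to reproduce a 40 bp ATP8/ATP6 overlap and a 1 bp ATP6/COIII overlap, not the Table 1 coordinates.

```python
def composition_stats(a: float, t: float, c: float, g: float) -> dict:
    """Return AT%, GC%, AT-skew and GC-skew from base counts or percentages."""
    total = a + t + c + g
    return {
        "AT%": 100 * (a + t) / total,
        "GC%": 100 * (g + c) / total,
        "AT_skew": (a - t) / (a + t),  # > 0 means A is more frequent than T
        "GC_skew": (g - c) / (g + c),  # < 0 means C is more frequent than G
    }


def adjacent_gaps(genes):
    """Yield (left, right, gap) for consecutive genes sorted by start.

    Coordinates are 1-based and inclusive. gap > 0 is an intergenic spacer
    of that many bp; gap < 0 is an overlap of abs(gap) bp.
    """
    for (left, _, left_end), (right, right_start, _) in zip(genes, genes[1:]):
        yield left, right, right_start - left_end - 1


# Whole-mitogenome base composition of Hangul (Table 2).
stats = composition_stats(a=33.26, t=28.75, c=24.51, g=13.49)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")

# Hypothetical toy coordinates (not Table 1) for three adjacent PCGs.
toy_genes = [("ATP8", 1, 201), ("ATP6", 162, 842), ("COIII", 842, 1625)]
for left, right, gap in adjacent_gaps(toy_genes):
    kind = "overlap" if gap < 0 else ("spacer" if gap > 0 else "abutting")
    print(f"{left}/{right}: {gap:+d} bp ({kind})")
```

With the Table 2 percentages, the sketch gives an AT-skew of about 0.073, a GC-skew of about −0.290, and a GC content of about 38%; the small differences from the reported 0.072 and −0.289 presumably reflect rounding of the input percentages.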
3.2. Protein-coding genes (PCGs)
The 13 PCGs of the Hangul mitogenome had a total length of 11,403 bp, including 64 bp of overlapping fragments, and accounted for 69.72% of the complete mitogenome. In some red deer subspecies (C. c. songaricus, C. canadensis, and C. c. nannodes), the total PCG length was 11,404 bp (Supplementary Table ST1). The base composition of the PCGs was A (31.38%), T (30.54%), C (24.87%), and G (13.22%) (Table 2). The Hangul PCGs comprised 12 majority-strand (H-strand) genes (six NADH dehydrogenases: ND1, ND2, ND3, ND4, ND5, and ND4L; three cytochrome c oxidases: COI, COII, and COIII; two ATPases: ATP6 and ATP8; and one cytochrome b: Cyt b) and one minority-strand (L-strand) gene (the NADH dehydrogenase ND6) (Fig. 1 and Table 1), as is common in other vertebrates [30,31]. The PCGs were AT-rich, with an AT content of 61.9% versus a GC content of 38.1%. To understand the nucleotide distribution in the PCGs, we compared base skews among red deer subspecies. The AT- and GC-skews of the Hangul PCGs were 0.014 and −0.306, respectively. The other red deer likewise showed positive AT-skews, indicating that adenine occurs more frequently than thymine, and negative GC-skews, indicating a bias toward cytosine over guanine (Supplementary Table ST1). Of the 13 PCGs, ND5 (1,821 bp) was the longest and ATP8 (201 bp) the shortest. All 13 PCGs started with ATG or ATA, as in other red deer [27]. Seven PCGs ended with the complete stop codon TAA and Cyt b ended with AGA, whereas ND1, ND2, ND3, ND4, and COIII had the incomplete stop codons TA- or T- (Table 1). These incomplete stop codons are completed by post-transcriptional polyadenylation during mRNA maturation. The 13 PCGs of Hangul comprised 3,597 codons (excluding stop codons), for which we calculated relative synonymous codon usage (RSCU) (Fig. 2).
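The incomplete stop codons described above can be illustrated with a short sketch. Assuming a coding sequence that runs from its start codon to the last annotated base, the Python below reports whether it ends in a complete stop codon of the vertebrate mitochondrial code (NCBI translation table 2: TAA, TAG, AGA, AGG) or in a truncated T- or TA-, and shows the terminal codon obtained once a poly(A) tail is appended. The toy sequences are hypothetical, not Hangul genes.

```python
# Stop codons of the vertebrate mitochondrial genetic code (NCBI table 2).
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def classify_stop(cds: str) -> str:
    """Return the complete stop codon, 'T-'/'TA-' for an incomplete one, or 'none'."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in MITO_STOPS else "none"
    tail = cds[-remainder:]
    # A lone T or a TA after the last full codon is a truncated TAA.
    return tail + "-" if tail in ("T", "TA") else "none"


def codon_after_polyadenylation(cds: str) -> str:
    """Return the terminal codon once a poly(A) tail is added to the transcript."""
    cds = cds.upper()
    # Start of the last (possibly partial) codon; a full codon starts 3 from the end.
    start = len(cds) - (len(cds) % 3 or 3)
    return (cds + "AA")[start:start + 3]


for toy in ("ATGAAACCCTAA", "ATGAAACCCAGA", "ATGAAACCCTA", "ATGAAACCCT"):
    print(toy, classify_stop(toy), codon_after_polyadenylation(toy))
```

Two appended adenines are always enough to complete a T- or TA- into TAA, which is why the sketch pads with "AA" rather than modelling a full-length tail.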
Leucine was the most frequent amino acid (11.75%) and tryptophan the least frequent (1.12%) in the PCGs of Hangul, a pattern shared by the other red deer (Fig. 3).
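The RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, RSCU = x_ij / ((1/n_i) Σ_j x_ij), where n_i is the number of codons encoding amino acid i; values above 1 indicate preferred codons. The minimal Python sketch below assumes the vertebrate mitochondrial code (NCBI table 2, in which ATA codes for Met, TGA for Trp, and AGA/AGG are stops) and computes RSCU and amino-acid percentages from a list of coding sequences, skipping stop codons and any trailing incomplete codon. It is not the software used for Figs. 2 and 3, and the input is a hypothetical toy sequence.

```python
from collections import Counter
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial); codons in TCAG order.
TABLE2_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), TABLE2_AA)
}


def codon_counts(cds_list):
    """Count in-frame sense codons, ignoring stops, ambiguous codons and partial tails."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if CODON_TO_AA.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count over the amino acid's synonymous codons."""
    synonyms = {}
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            synonyms.setdefault(aa, []).append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # RSCU is undefined for an amino acid that never occurs
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def amino_acid_percentages(counts):
    """Percentage of each amino acid among all counted sense codons."""
    aa_counts = Counter()
    for codon, n in counts.items():
        aa_counts[CODON_TO_AA[codon]] += n
    total = sum(aa_counts.values())
    return {aa: 100 * n / total for aa, n in aa_counts.items()}


toy_counts = codon_counts(["ATGCTTCTTCTCCTATGATAA"])  # hypothetical toy CDS
print(sum(toy_counts.values()), rscu(toy_counts)["CTT"], amino_acid_percentages(toy_counts))
```

Because table 2 is used, the toy TGA is counted as tryptophan rather than as a stop, and the terminal TAA is excluded, matching the "excluding stop codons" convention above.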
3.3. Ribosomal RNA and transfer RNA genes
We identified two rRNA and 22 tRNA genes in the complete mitogenome of Hangul, as is typical of other mammals [32,33]. The 12S and 16S rRNA genes were 957 and 1,572 bp long and were located between tRNAPhe and tRNAVal and between tRNAVal and tRNALeu, respectively (Table 1 and Fig. 1). The overall nucleotide composition of the two rRNA genes was A (37.72%), T (24.24%), C (20.88%), and G (17.16%) (Table 2). Together, the two rRNA genes of Hangul spanned 2,529 bp, or 15.46% of the complete mitogenome; in other red deer subspecies, their combined length ranged from 2,516 to 2,529 bp (Supplementary Table ST2). The AT content of the two rRNA genes was 61.96%, similar to that of other red deer subspecies. The AT- and GC-skews of the Hangul rRNA genes were 0.217 and −0.097, respectively (Supplementary Table ST1). The 22 tRNA genes were distributed throughout the mitogenome and ranged in size from 60 bp (tRNASer) to 75 bp (tRNALeu). Of these, 14 were located on the H-strand and eight on the L-strand (Fig. 1 and Table 1). The 22 tRNA genes had a combined length of 1,514 bp and a nucleotide composition of A (35.6%), T (28.2%), C (20.61%), and G (15.59%). The tRNA genes were AT-biased, with AT and GC contents of 63.8% and 36.2%, respectively. AT-skew was positive (0.116) and GC-skew was negative (−0.138) (Table 2). The anticodons of the 22 Hangul tRNAs are provided in Table 1. Twenty-one of the 22 tRNA genes folded into the typical cloverleaf secondary structure; the exception was tRNASer, whose dihydrouridine (DHU) arm did not form a stable structure (Fig. 4).
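Whether a tRNA arm such as the DHU arm forms a stem depends first on complementarity between its 5' and 3' strands. As a crude, hypothetical illustration only (dedicated annotation tools model the whole molecule rather than a single stem, and this section does not state which tool produced Fig. 4), the Python below counts Watson-Crick and G-U wobble pairs when two stem strands are aligned antiparallel. The strands and the three-pair threshold are assumptions for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_pairs(five_prime: str, three_prime: str, allow_wobble: bool = True) -> int:
    """Count base pairs when the 5' strand pairs antiparallel with the 3' strand.

    Both strands are given 5'->3'; DNA input (T) is converted to RNA (U).
    zip() stops at the shorter strand, so unequal strands are truncated.
    """
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    five = five_prime.upper().replace("T", "U")
    three = three_prime.upper().replace("T", "U")
    return sum((x, y) in allowed for x, y in zip(five, reversed(three)))


def looks_paired(five_prime: str, three_prime: str, min_pairs: int = 3) -> bool:
    """Hypothetical threshold: call the stem 'formed' if it has >= min_pairs pairs."""
    return stem_pairs(five_prime, three_prime) >= min_pairs


# Hypothetical DHU-arm strands (not Hangul sequences).
print(stem_pairs("GCTC", "GAGC"), looks_paired("GCTC", "GAGC"))
print(stem_pairs("AAAA", "CCCC"), looks_paired("AAAA", "CCCC"))
```

A pair count ignores stacking energies and loop constraints, so it can only flag an obviously unpaired arm; it cannot confirm that a stem is thermodynamically stable.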
3.4. Mitochondrial D-loop
The mitochondrial D-loop/control region (mtCR) is a non-coding, hypervariable region that plays an essential role in regulating replication and transcription of the mitochondrial genome [34]. In C. h. hanglu, the mtCR was 917 bp long and was positioned between tRNAPro and tRNAPhe (Table 1 and Fig. 1). Among the subspecies compared, the mtCR was shortest (916 bp) in C. h. yarkandensis and longest (994 bp) in C. c. xanthopygus and C. c. songaricus, so the two Tarim red deer (C. h. hanglu and C. h. yarkandensis) have nearly identical mtCR lengths. This variation in CR length may be due to insertions and deletions (indels), as also reported in previous studies [35,36]. The nucleotide composition of the CR was A (29.23%), T (31.73%), C (23.56%), and G (15.49%), with a higher AT (60.95%) than GC (39.05%) content. Both the AT-skew (−0.041) and the GC-skew (−0.206) were negative (Table 2).
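The mtCR length differences noted above can be traced to individual indels once the control regions are aligned. Assuming a pairwise alignment has already been produced by some aligner and is supplied as two equal-length gapped strings, the Python sketch below lists each indel event (a run of gap characters) and the ungapped length of each sequence. The toy alignment is hypothetical and not taken from this study.

```python
import re


def indel_events(aligned_a: str, aligned_b: str):
    """Return (gapped_sequence, start, length) for each run of '-' in a pairwise alignment.

    A gap run in one sequence corresponds to an insertion in the other.
    Columns that are gaps in both sequences are dropped first.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    kept = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    seq_a = "".join(x for x, _ in kept)
    seq_b = "".join(y for _, y in kept)
    events = []
    for label, seq in (("a", seq_a), ("b", seq_b)):
        for match in re.finditer(r"-+", seq):
            events.append((label, match.start(), match.end() - match.start()))
    return sorted(events, key=lambda event: event[1])


def ungapped_length(aligned: str) -> int:
    """Length of the original sequence before gaps were inserted."""
    return len(aligned.replace("-", ""))


# Hypothetical toy alignment of two control-region fragments.
a = "ACGT--ACGTACGT"
b = "ACGTTTACG-ACGT"
print(indel_events(a, b), ungapped_length(a), ungapped_length(b))
```

The net length difference between two control regions equals the total gap length in one sequence minus that in the other (here 2 − 1 = 1 bp), so a small net difference, as between the two Tarim red deer, can still hide several compensating indels.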
3.5. Phylogenetic analysis and genetic distance
We inferred the phylogenetic position of Hangul from the 13 PCGs, together with those of other red deer, sika deer (C. nippon), eight species of Cervini, two species of Muntiacini, one species of Alceini, six species of Caprini, two species of Bovini, and one species of Boselaphini. In the Bayesian inference tree, Hangul was sister to C. h. yarkandensis and close to C. e. hippelaphus, and together they formed a Western clade with high support (posterior probability, PP ≈ 1) (Fig. 5). Interestingly, the remaining red deer subspecies of C. canadensis clustered with sika deer (C. nippon) to form an Eastern clade. The Bayesian result supports the assignment of the Tarim red deer (C. h. hanglu and C. h. yarkandensis) to the Western clade, as suggested by Lorenzini and Garofalo (2015) on the basis of the complete mitochondrial Cyt b gene and the CR [16]. Moreover, both sika and red deer clustered within the Cervini (Fig. 5). In contrast, previous studies placed Tarim deer incongruently: one showed a closer relationship with Eastern red deer [14], whereas Kumar et al. (2017) recovered it as a clade separate from both the Eastern and Western red deer clades [15]. Longer mtDNA sequences provide better phylogenetic resolution and taxonomic placement than short fragments [37]. The clustering pattern of the other deer species was consistent with that described by Gilbert et al. (2006) [38]. We estimated pairwise genetic distances between the red deer subspecies based on the 13 PCGs and the complete mitogenome (Table 3 and Supplementary Table ST2). Hangul was closest to the Tarim red deer C. h. yarkandensis, with a low genetic distance (0.028), and was closer to the western red deer C. e. hippelaphus (0.038) than to the eastern red deer (0.057 to 0.072). The highest genetic distance was between Hangul and the Manchurian wapiti (C. c. xanthopygus), and the complete mitogenome gave similar results (Supplementary Table ST2). In the gene-wise comparison of the red deer subspecies, Hangul was closest to Tarim red deer for 12S, 16S, ND1, ND2, COI, ATP8, ATP6, ND4, ND5, ND6, Cyt b, and CR. However, four genes (COII, COIII, ND3, and ND4L) placed Hangul closer to C. e. hippelaphus, which belongs to the Western red deer clade (Fig. 6).
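This section does not state which substitution model underlies the pairwise distances in Table 3, so, as an assumption, the sketch below shows two common choices: the uncorrected p-distance and the Kimura 2-parameter (K2P) distance, which corrects for multiple substitutions while separating the proportions of transitions (P) and transversions (Q): d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). Gaps and ambiguous bases are skipped site by site, and the aligned toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def compare_sites(seq1: str, seq2: str):
    """Count compared sites, transitions and transversions between aligned sequences.

    Sites with gaps or ambiguous bases in either sequence are skipped
    (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in VALID or y not in VALID:
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    return sites, transitions, transversions


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected proportion of differing sites."""
    sites, ts, tv = compare_sites(seq1, seq2)
    return (ts + tv) / sites


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    sites, ts, tv = compare_sites(seq1, seq2)
    p, q = ts / sites, tv / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Hypothetical toy alignment (not Hangul data): one transition, one transversion.
s1 = "ACGTACGTAC"
s2 = "GCGTACGTAA"
print(round(p_distance(s1, s2), 3), round(k2p_distance(s1, s2), 4))
```

Applied to each pair of aligned, concatenated PCG sequences, functions like these would fill a pairwise matrix comparable to Table 3, although the choice of model and gap handling can shift the values; K2P distances are always at least as large as p-distances.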