Mitogenome structure and organization: The Arunachali yak mitogenome had the distinctive characteristics of a mammalian mitochondrial genome. The mitogenome was a closed-circular double-stranded DNA molecule of 16,324 base pairs (bp) comprising 37 genes (13 protein coding genes (PCGs), 22 tRNAs, 2 rRNAs) and a non-coding control region (D-loop) (Table 1). The size of whole mitochondrial genome typically ranges from16,321 bp as in Meiren yak[37] to 16,325bp in Bazhou yak[38]. The gene order and arrangement were comparable to those of previously documented yak mitogenomes[39,40] and related bovid species[21,22].
Table 1: Gene content of Arunachali yak mitogenome
Group of genes
|
Name of genes
|
Protein coding genes (PCGs)
|
ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COX1, COX2, COX3, Cytb, ATP6, ATP8
|
Ribosomal RNAs (rRNAs)
|
16s-rRNA (l-rRNA), 12s-rRNA (s-rRNA)
|
Transfer RNAs (tRNAs)
|
tRNA-Phe, tRNA-Val, tRNA-Leu, tRNA-Ile, tRNA-Gln, tRNA-Met, tRNA-Trp, tRNA-Ala, tRNA-Asn, tRNA-Cys, tRNA-Tyr, tRNA-Ser, tRNA-Asp, tRNA-Lys, tRNA-Gly, tRNA-Arg, tRNA-His, tRNA-Ser, tRNA-Leu, tRNA-Glu, tRNA-Thr, tRNA-Pro
|
Non-coding gene
|
Control region or D-loop
|
A total of 29 genes including 12 PCGs, 15 tRNAs, 2 rRNAs were encoded on the heavy (H) strand while the remaining PCG, ND6 and 7 tRNAs on the light (L) strand. Gene overlaps and intergenic or non-coding regions were also observed in the mitogenome1-40 bp (Table 2, Figure 1). There were 12 gene overlaps and the longest one involved 40bp between ATP8 and ATP6. The overlap between these two genes has been reported in other mitochondrial studies and appears to be a common occurrence[41]. Overall, 15 intergenic regions ranging from1-31 bp were identified. Aside from the D-loop region, the longest intergenic region occurred between trnN and trnC. This region of 31bp has been annotated in some studies as the origin for the replication of light (L) strand[42].
Nucleotide composition: The nucleotide composition of the whole mitogenome was found to be overall biased towards adenine and thymine with AT content at 60.98% and GC at 39.02% with each nucleotide content at A=33.70%, T=27.28%, G=13.21% and C=25.80%. The AT bianess is a typical characteristic of the mitochondrial genome among all the vertebrates [21-22,25,37-40]. The AT and GC content did not vary much among the different regions and greatest amount of AT was observed in the tRNA region while GC content was most significant in the PCGs. Additionally, the mitogenome had positive AT skew (0.11) and negative GC skew (0.32) which suggests higher content of adenosine and cytosine than their complementary nucleotides (Table 3).
Table 2: Mitochondrial genome annotation of Arunachali yak. H-heavy strand and L- light strand.
Gene Name
|
Start
|
End
|
Length (bp)
|
Strand
|
Anti-codon
|
Start codon
|
Stop codon
|
Space(+) / Overlap(-)
|
trnF
|
1
|
67
|
67
|
H
|
GAA
|
|
|
0
|
12s rRNA
|
68
|
1024
|
957
|
H
|
|
|
|
0
|
trnV
|
1025
|
1091
|
67
|
H
|
TAC
|
|
|
0
|
16s-rRNA
|
1090
|
2661
|
1572
|
H
|
|
|
|
-2
|
trnL
|
2663
|
2737
|
75
|
H
|
TAA
|
|
|
1
|
ND1
|
2740
|
3696
|
957
|
H
|
|
ATG
|
TAA
|
2
|
trnI
|
3696
|
3764
|
69
|
H
|
GAU
|
|
|
-1
|
trnQ
|
3762
|
3833
|
72
|
L
|
TTG
|
|
|
-3
|
trnM
|
3836
|
3904
|
69
|
H
|
CAT
|
|
|
2
|
ND2
|
3905
|
4948
|
1044
|
H
|
|
ATA
|
TAG
|
-1
|
trnW
|
4947
|
5013
|
67
|
H
|
TCA
|
|
|
-2
|
trnA
|
5015
|
5083
|
69
|
L
|
TGC
|
|
|
1
|
trnN
|
5085
|
5158
|
74
|
L
|
GTT
|
|
|
1
|
trnC
|
5191
|
5257
|
67
|
L
|
GCA
|
|
|
31
|
trnY
|
5258
|
5325
|
68
|
L
|
GTA
|
|
|
0
|
COX1
|
5327
|
6871
|
1545
|
H
|
|
ATG
|
TAA
|
1
|
trnS
|
6869
|
6937
|
69
|
L
|
TGA
|
|
|
-3
|
trnD
|
6945
|
7012
|
68
|
H
|
GTC
|
|
|
7
|
COX2
|
7014
|
7697
|
684
|
H
|
|
ATG
|
TAA
|
1
|
trnK
|
7701
|
7767
|
67
|
H
|
TTT
|
|
|
3
|
ATP8
|
7769
|
7969
|
201
|
H
|
|
ATG
|
TAA
|
1
|
ATP6
|
7930
|
8610
|
681
|
H
|
|
ATG
|
TAA
|
-40
|
COX3
|
8610
|
9390
|
781
|
H
|
|
ATG
|
T--
|
-1
|
trnG
|
9394
|
9462
|
69
|
H
|
TCC
|
|
|
3
|
ND3
|
9463
|
9819
|
357
|
H
|
|
ATA
|
TAG
|
0
|
trnR
|
9810
|
9878
|
69
|
H
|
TCG
|
|
|
-10
|
ND4L
|
9879
|
10175
|
297
|
H
|
|
ATG
|
TAA
|
0
|
ND4
|
10169
|
11546
|
1378
|
H
|
|
ATG
|
T--
|
-7
|
trnH
|
11547
|
11616
|
70
|
H
|
GTG
|
|
|
0
|
trnS
|
11617
|
11676
|
60
|
H
|
GCT
|
|
|
0
|
trnL
|
11678
|
11747
|
70
|
H
|
TAG
|
|
|
1
|
ND5
|
11748
|
13568
|
1821
|
H
|
|
ATA
|
TAA
|
0
|
ND6
|
13552
|
14079
|
528
|
L
|
|
ATG
|
TAA
|
-17
|
trnE
|
14080
|
14148
|
69
|
L
|
TTC
|
|
|
0
|
Cytb
|
14153
|
15292
|
1140
|
H
|
|
ATG
|
AGA
|
4
|
trnT
|
15296
|
15365
|
70
|
H
|
TGT
|
|
|
3
|
trnP
|
15365
|
15430
|
66
|
L
|
TGG
|
|
|
-1
|
D-loop
|
15431
|
16324
|
894
|
H
|
|
|
|
0
|
Table 3: Nucleotide composition of Arunachali yak mitogenome
Region
|
T%
|
C%
|
A%
|
G%
|
A+T
|
G+C
|
AT Skew
|
GC Skew
|
Whole genome
|
27.28
|
25.80
|
33.70
|
13.21
|
60.98
|
39.02
|
0.11
|
-0.32
|
PCGs
|
28.90
|
26.30
|
31.80
|
12.90
|
60.70
|
39.20
|
0.05
|
-0.34
|
tRNAs
|
30.29
|
19.34
|
32.13
|
18.23
|
62.42
|
37.58
|
0.03
|
-0.03
|
rRNAs
|
23.61
|
21.43
|
37.49
|
17.48
|
61.09
|
38.91
|
0.23
|
-0.10
|
D-loop
|
28.52
|
25.28
|
32.33
|
13.87
|
60.85
|
39.15
|
0.06
|
-0.29
|
PCGs and Codon Usage: The 13 PCGs of Arunachali yak mitogenome had a total length of 11379 bp coding 3793 amino acids which accounted for 69.71% of the total sequence. The initiation codon for all the PCGs were the typical ATN start codon. The start codon of most of the genes viz. ND1, COX1, COX2, COX3, ATP8, ATP6, ND4 ND4L, ND6, Cytb were ATG while three genes ND2, ND3 and ND5 used ATA codon for initiation of transcription. With respect to stop codon, eight out of the thirteen PCGs had TAA as their stop codon, COX3 and ND4 terminated with a single T residue which is also observed in other vertebrates [23,39] . The stop codon for ND2 and ND3 in Arunachali yak mitogenome was TAG while other mammalian mitogenome studies have reported the stop codon for ND3 gene as T-residue[23,37,42-43]. However, TAG as stop codon for mitochondrial genome was observed for an ancient cattle[44]. The Cytb gene used a unique stop codon AGA to terminate transcription[39-40, 43] (Table 2).
Table 4: Codon count and relative synonymous codon usage
Codon
|
Count
|
RSCU
|
Codon
|
Count
|
RSCU
|
Codon
|
Count
|
RSCU
|
Codon
|
Count
|
RSCU
|
UUU(F)
|
100
|
1.06
|
UCU(S)
|
106
|
1.2
|
UAU(Y)
|
177
|
1.27
|
UGU(C)
|
45
|
1
|
UUC(F)
|
88
|
0.94
|
UCC(S)
|
97
|
1.1
|
UAC(Y)
|
102
|
0.73
|
UGC(C)
|
45
|
1
|
UUA(L)
|
128
|
1.29
|
UCA(S)
|
106
|
1.2
|
UAA(*)
|
171
|
1.65
|
UGA(W)
|
54
|
1.15
|
UUG(L)
|
41
|
0.41
|
UCG(S)
|
37
|
0.42
|
UAG(*)
|
85
|
0.82
|
UGG(W)
|
40
|
0.85
|
CUU(L)
|
118
|
1.19
|
CCU(P)
|
145
|
1.39
|
CAU(H)
|
136
|
1.18
|
CGU(R)
|
31
|
0.89
|
CUC(L)
|
108
|
1.09
|
CCC(P)
|
130
|
1.25
|
CAC(H)
|
95
|
0.82
|
CGC(R)
|
38
|
1.09
|
CUA(L)
|
144
|
1.45
|
CCA(P)
|
104
|
1
|
CAA(Q)
|
156
|
1.43
|
CGA(R)
|
37
|
1.06
|
CUG(L)
|
58
|
0.58
|
CCG(P)
|
37
|
0.36
|
CAG(Q)
|
62
|
0.57
|
CGG(R)
|
33
|
0.95
|
AUU(I)
|
157
|
1.14
|
ACU(T)
|
162
|
1.43
|
AAU(N)
|
217
|
1.15
|
AGU(S)
|
80
|
0.9
|
AUC(I)
|
118
|
0.86
|
ACC(T)
|
113
|
1
|
AAC(N)
|
161
|
0.85
|
AGC(S)
|
105
|
1.19
|
AUA(M)
|
144
|
1.4
|
ACA(T)
|
135
|
1.19
|
AAA(K)
|
218
|
1.48
|
AGA(*)
|
74
|
0.71
|
AUG(M)
|
61
|
0.6
|
ACG(T)
|
44
|
0.39
|
AAG(K)
|
77
|
0.52
|
AGG(*)
|
85
|
0.82
|
GUU(V)
|
26
|
0.79
|
GCU(A)
|
61
|
1.23
|
GAU(D)
|
42
|
0.97
|
GGU(G)
|
31
|
1.16
|
GUC(V)
|
34
|
1.03
|
GCC(A)
|
63
|
1.27
|
GAC(D)
|
45
|
1.03
|
GGC(G)
|
23
|
0.86
|
GUA(V)
|
54
|
1.64
|
GCA(A)
|
64
|
1.29
|
GAA(E)
|
75
|
1.34
|
GGA(G)
|
31
|
1.16
|
GUG(V)
|
18
|
0.55
|
GCG(A)
|
10
|
0.2
|
GAG(E)
|
37
|
0.66
|
GGG(G)
|
22
|
0.82
|
The relative synonymous codon usage (RSCU) calculated for each amino acid (except for the stop codons) in the mitogenome indicated the most frequently used codon as Lys (AAA) followed by Asn (AAU), Tyr (UAU) and Thr (ACU) as the top five. The least commonly used codon was Ala (GCG) with only 10 counts (Table 4, Figure 2). The most abundant amino acids were Leucine, Serine, Threonine, Proline and Asparagine. In general, it was observed that the codons ending with A or T were more preferred than those with G or C in the third position when coding the same amino acid. This may be the cause of AT biasness in the whole mitogenome. Furthermore, the distribution pattern of the codons in each of the amino acid can act as marker for species level identification of animals[45].
The Ka, Ks and Ka/Ks ratio were calculated for each of the 13 PCGs (Figure 3) to estimate the selection pattern of these PCGs. It was observed that the Ka/Ks ratio was very low (less than 1.0) for all the genes and the ratio was 0 in four of the genes (Supplementary Table S1). There was absence of non-synonymous substitution in COX1, COX2 and ND1 genes while ND6 genes in the mitogenomes under study lacked synonymous substitution. Overall, it may be assumed that there is presence of purifying selection in these breeds and as such evolutionary stability is ensured[46-47]. Purifying selection is vital to prevent deleterious mutations from taking over a population and to ensure that any better structures are kept for as long as they can be maintained[48].
Transfer RNAs and ribosomal RNAs: The mitogenome of Arunachali yak consisted of 22 tRNAs with a total length of 1779 bp ranging from 60bp in trnS1 to 75bp in trnL (Table 2). The total AT content in tRNAs was 62.42% and that of GC was 37.58% (Table 3) suggesting AT biasness. All the tRNAs exhibited clover-leaf secondary structures except for Serine1 and Lysine which lacked D or dihydrouridine arm and loop respectively (Figure 4). Such structures have been defined in other domestic animals like Indian gaur[22], Indian wild pig[23], Camels[24]and closely related species Indian Mithun[21].The 12s and 16s rRNA genes of Arunachali yak were of 957bp and 1572bp respectively in lengths. 12s rRNA was located between trnF and trnV genes and 16s rRNA was annotated between trnV and trnL. The two rRNA genes had an AT content of 61.09 % making these an AT-rich region.
D-loop or control region: D-loop or the control region in the mitogenome was located between trnP and trnF and had an A+T content of 60.85%. The region was found to be 894 bp in length which is akin to the length of D-loop reported in other yak breeds [49-50]. As inferred from other studies, the length of D-loop usually ranges from 891 bp[36] to 895 bp[43]. The mitogenome of Arunachali yak also showed the presence multiple copies of palindromic motifs ‘ATGTA’ and ‘TACAT’, however not in pairs as documented for Sichuan takin or Tibetan takin [25]. These short sequences are hypothesised to be sites for termination of H-strand elongation during replication process[22]. The control region terminates with the characteristic homo-polymeric C stretch of 13bp as also observed in other bovids[21-22,43,51-52] (Figure S1).
Phylogenetic analysis: Maximum likelihood method with different best fit substitution models were used for establishing the phylogenetic relationship among the Bos grunniens sequences included in this study. A total of three phylogenetic trees (WMT, 13PCGs and DLOOP) were constructed using three datasets as shown in Figure 5. In all the three trees, our sequence of Arunachali yak clusters consistently with ten of the Chinese yak breeds viz. Sibu, Bazhou, Seron, Jinchuan, Zhongdian, Tongde, Pali, Meiren, Leiwuqi and Gaoshan. The phylogenetic tree of concatenated mitochondrial PCGs (13PCGs) is comparable to the percent sequence identity of the sequences[53] (Supplementary Table S6) with more similar sequences clustering in the same clade. This tree is also in consensus with other studies on yak mitochondrial genome[43,51]. Therefore, concatenated PCGs may be a better dataset for construction of phylogenetic tree in case of mitogenome. The DLOOP tree shows Arunachali yak to be closely related to Jinchuan yak. The D-loop or control region has the highest variability in the whole of mitogenome and has been proven to be powerful marker for study of maternal phylogenetic and evolutionary relationships among breeds[23]. It may therefore be hypothesised that Arunachali yak and Jinchuan yak have the closest relation and shared the same ancestor before domestication.