Basic characteristics of T. foenum-graecum mt genome
The T. foenum-graecum mt genome was circular, with a total length of 345,604 bp and a GC content of 45.28%. The GC content of the PCGs (42.72%) was lower than that of the tRNA genes (52.41%) and rRNA genes (51.2%). The mt genome structure is shown in Fig. 1. The genome contained 59 genes: 33 protein-coding genes (PCGs), 21 tRNA genes, 4 rRNA genes and 1 pseudogene. The classification of the genes in the T. foenum-graecum mt genome is shown in Table 1. Eleven genes contained introns (ccmFC, nad1, nad2, nad4, nad5, nad7, rps3, rps7, rps10, trnP-CGG, trnT-TGT), for a total of 25 introns. The NADH dehydrogenase genes contained the largest number of introns, 19 in total. In addition, two copies each of rrn26, trnF-GAA and trnG-GCC and four copies of trnM-CAT were found in the T. foenum-graecum mt genome. rps1 is a pseudogene.
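As an illustration of how a GC content such as the values above is obtained, the following is a minimal sketch. The function name `gc_content` and the toy sequence are assumptions made for this example; the source does not state which tool was used for the calculation.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage (0-100)."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not distort the ratio.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence (hypothetical, not from the T. foenum-graecum genome).
toy = "ATGGCGTTAACCGGN"
print(round(gc_content(toy), 2))  # 57.14
```

Applied to the full genome sequence, or separately to the concatenated PCG, tRNA and rRNA sequences, the same ratio would give the per-category values reported above.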
Table 1 Gene classification table
Group of genes | Gene name | Length (bp) | Start codon | Stop codon | Amino acids |
ATP synthase | atp1 | 1518 | ATG | TAA | 506 |
| atp4 | 588 | ATG | TAA | 196 |
| atp6 | 663 | ATG | CAA (TAA) | 221 |
| atp8 | 483 | ATG | TAA | 161 |
| atp9 | 225 | ATG | TAA | 75 |
Cytochrome c biogenesis | ccmB | 621 | ATG | TGA | 207 |
| ccmC | 747 | ATG | TGA | 249 |
| ccmFC* | 1431 | ATG | TAG | 477 |
| ccmFN | 1728 | ATG | TGA | 576 |
Ubiquinol cytochrome c reductase | cob | 1179 | ATG | TGA | 393 |
Cytochrome c oxidase | cox1 | 1584 | ATG | TAA | 528 |
| cox2 | 906 | ATG | TAA | 302 |
| cox3 | 798 | ATG | TGA | 266 |
Maturases | matR | 1986 | ATG | TAG | 662 |
Transport membrane protein | mttB | 312 | ATG | TGA | 104 |
NADH dehydrogenase | nad1**** | 978 | ACG (ATG) | TAA | 326 |
| nad2**** | 1467 | ATG | TAA | 489 |
| nad3 | 357 | ATG | TAA | 119 |
| nad4*** | 1488 | ATG | TGA | 496 |
| nad4L | 303 | ACG (ATG) | TAA | 101 |
| nad5**** | 2010 | ATG | TAA | 670 |
| nad6 | 618 | ATG | TAA | 206 |
| nad7**** | 1185 | ATG | TAG | 395 |
| nad9 | 591 | ATG | TAA | 197 |
Ribosomal proteins (LSU) | rpl16 | 558 | ATG | TAA | 186 |
| rpl5 | 564 | ATG | TAA | 188 |
Ribosomal proteins (SSU) | rps10* | 411 | ATG | TAA | 137 |
| rps12 | 372 | ATG | TGA | 124 |
| rps14 | 303 | ATG | TAG | 101 |
| rps19 | 138 | ATG | TGA | 46 |
| rps3* | 1677 | ATG | TAA | 559 |
| rps4 | 1050 | ATG | TAA | 350 |
| rps7* | 276 | ATG | TAA | 92 |
Ribosomal RNAs | rrn18 | 2008 | | | |
| rrn26(2) | (3137,3137) | | | |
| rrn5 | 115 | | | |
Transfer RNAs | trnC-GCA | 71 | | | |
| trnD-GTC | 74 | | | |
| trnE-TTC | 72 | | | |
| trnF-GAA(2) | (64,74) | | | |
| trnG-GCC(2) | (72,72) | | | |
| trnH-GTG | 74 | | | |
| trnK-TTT | 73 | | | |
| trnL-CAA | 82 | | | |
| trnM-CAT(4) | (73,74,74,74) | | | |
| trnN-GTT | 72 | | | |
| trnP-CGG* | 83 | | | |
| trnP-TGG | 75 | | | |
| trnQ-TTG | 72 | | | |
| trnT-TGT* | 75 | | | |
| trnW-CCA | 74 | | | |
| trnY-GTA | 83 | | | |
Gene*: gene with one intron; Gene**: gene with two introns; Gene***: gene with three introns; Gene****: gene with four introns; Gene (2): copy number of multi-copy gene.
Among the protein-coding genes, the most frequently used start codon is ATG and the most frequently used stop codon is TAA. The 21 tRNAs carry 15 amino acids: methionine (Met), lysine (Lys), glutamate (Glu), phenylalanine (Phe), proline (Pro), tryptophan (Trp), glutamine (Gln), glycine (Gly), aspartate (Asp), threonine (Thr), tyrosine (Tyr), asparagine (Asn), cysteine (Cys), histidine (His), and leucine (Leu). Because there are more tRNAs than amino acids, some amino acids are carried by more than one tRNA.
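The tRNA and intron counts quoted in this section can be cross-checked against Table 1 with a short tally. The sketch below copies the gene names, copy numbers and asterisk counts from Table 1; the variable names are chosen only for this illustration.

```python
# tRNA genes and their copy numbers as listed in Table 1.
trna_copies = {
    "trnC-GCA": 1, "trnD-GTC": 1, "trnE-TTC": 1, "trnF-GAA": 2,
    "trnG-GCC": 2, "trnH-GTG": 1, "trnK-TTT": 1, "trnL-CAA": 1,
    "trnM-CAT": 4, "trnN-GTT": 1, "trnP-CGG": 1, "trnP-TGG": 1,
    "trnQ-TTG": 1, "trnT-TGT": 1, "trnW-CCA": 1, "trnY-GTA": 1,
}

# Intron counts read from the number of asterisks after each gene name in Table 1.
introns = {
    "ccmFC": 1, "nad1": 4, "nad2": 4, "nad4": 3, "nad5": 4, "nad7": 4,
    "rps3": 1, "rps7": 1, "rps10": 1, "trnP-CGG": 1, "trnT-TGT": 1,
}

total_trnas = sum(trna_copies.values())
# The fourth character of a tRNA gene name is the one-letter amino acid code.
amino_acids = {name[3] for name in trna_copies}
nad_introns = sum(n for gene, n in introns.items() if gene.startswith("nad"))

print(total_trnas, len(amino_acids))       # 21 15
print(sum(introns.values()), nad_introns)  # 25 19
```

The tally reproduces the 21 tRNA genes for 15 amino acids and the 25 introns, 19 of them in NADH dehydrogenase genes.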
Prediction Of RNA Editing Sites
RNA editing affects gene expression and RNA stability through base substitution, insertion or deletion, and plays an important role in promoting transcriptional diversity and enriching the variety of proteins[14, 15]. A total of 465 RNA editing sites were predicted in the 33 PCGs of the T. foenum-graecum mt genome, all of the C-to-T type. The number of editing sites per gene is shown in Fig. 2. The genes for ATP synthase (except atp4), transport membrane protein, maturases, ribosomal proteins (LSU) (except rpl5) and ribosomal proteins (SSU) (except rps3 and rps4) had relatively few editing sites (1–10 per gene), whereas the genes for cytochrome c biogenesis, ubiquinol cytochrome c reductase, cytochrome c oxidase and NADH dehydrogenase (except nad9) were more heavily edited (10–41 sites per gene). Among them, nad4 had the highest number of RNA editing sites.
The RNA editing sites were classified according to the hydrophilicity of the encoded amino acids, as shown in Table 2. Five types of edits were found: hydrophilic-hydrophilic, hydrophobic-hydrophobic, hydrophilic-hydrophobic, hydrophobic-hydrophilic and hydrophilic-stop. At 13.12% of the sites the amino acid remained hydrophilic; at 31.83% it remained hydrophobic; at 47.53% it changed from hydrophilic to hydrophobic; at 6.45% it changed from hydrophobic to hydrophilic; and at 1.08% a codon for a hydrophilic amino acid was converted into a stop codon, terminating translation prematurely. Premature termination occurred in atp6, ccmFC, and cox1 in the T. foenum-graecum mt genome. In addition, a total of 32 types of codon transition were involved, with TCA (S) => TTA (L) being the most common, at 68 editing sites.
Table 2 Classification table of RNA editing sites
Type | RNA editing | Number | Percentage |
hydrophilic-hydrophilic | CAC (H)=>TAC (Y) | 9 | 13.12% |
| CAT (H)=>TAT (Y) | 14 | |
| CGC (R)=>TGC (C) | 11 | |
| CGT (R)=>TGT (C) | 27 | |
| total | 61 | |
hydrophobic-hydrophobic | CCA (P)=>CTA (L) | 39 | 31.83% |
| CCC (P)=>CTC (L) | 12 | |
| CCC (P)=>TTC (F) | 6 | |
| CCG (P)=>CTG (L) | 28 | |
| CCT (P)=>CTT (L) | 23 | |
| CCT (P)=>TTT (F) | 10 | |
| CTC (L)=>TTC (F) | 7 | |
| CTT (L)=>TTT (F) | 14 | |
| GCA (A)=>GTA (V) | 1 | |
| GCC (A)=>GTC (V) | 1 | |
| GCG (A)=>GTG (V) | 4 | |
| GCT (A)=>GTT (V) | 3 | |
| total | 148 | |
hydrophilic-hydrophobic | ACA (T)=>ATA (I) | 5 | 47.53% |
| ACC (T)=>ATC (I) | 1 | |
| ACG (T)=>ATG (M) | 7 | |
| ACT (T)=>ATT (I) | 3 | |
| CGG (R)=>TGG (W) | 34 | |
| TCA (S)=>TTA (L) | 68 | |
| TCC (S)=>TTC (F) | 25 | |
| TCG (S)=>TTG (L) | 40 | |
| TCT (S)=>TTT (F) | 38 | |
| total | 221 | |
hydrophobic-hydrophilic | CCA (P)=>TCA (S) | 4 | 6.45% |
| CCC (P)=>TCC (S) | 7 | |
| CCG (P)=>TCG (S) | 3 | |
| CCT (P)=>TCT (S) | 16 | |
| total | 30 | |
hydrophilic-stop | CAA (Q)=>TAA (X) | 1 | 1.08% |
| CAG (Q)=>TAG (X) | 2 | |
| CGA (R)=>TGA (X) | 2 | |
| total | 5 | |
Analysis of the edited codon positions showed that 151 (32.47%) of the editing sites were located at the first base of the codon and 298 (64.09%) at the second base. At the remaining 16 sites (3.44%), both the first and second bases of the codon were edited, changing proline (CCC or CCT) to phenylalanine (TTC or TTT). Leucine was the amino acid most frequently produced by RNA editing: 108 sites were converted from serine to leucine and 102 sites from proline to leucine.
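The class percentages and codon-position counts reported here follow directly from the counts in Table 2. The sketch below copies those counts and recomputes the figures; the variable names are chosen for this illustration only.

```python
# (codon before editing, codon after editing, number of sites), copied from Table 2.
TABLE2 = {
    "hydrophilic-hydrophilic": [
        ("CAC", "TAC", 9), ("CAT", "TAT", 14), ("CGC", "TGC", 11), ("CGT", "TGT", 27),
    ],
    "hydrophobic-hydrophobic": [
        ("CCA", "CTA", 39), ("CCC", "CTC", 12), ("CCC", "TTC", 6), ("CCG", "CTG", 28),
        ("CCT", "CTT", 23), ("CCT", "TTT", 10), ("CTC", "TTC", 7), ("CTT", "TTT", 14),
        ("GCA", "GTA", 1), ("GCC", "GTC", 1), ("GCG", "GTG", 4), ("GCT", "GTT", 3),
    ],
    "hydrophilic-hydrophobic": [
        ("ACA", "ATA", 5), ("ACC", "ATC", 1), ("ACG", "ATG", 7), ("ACT", "ATT", 3),
        ("CGG", "TGG", 34), ("TCA", "TTA", 68), ("TCC", "TTC", 25), ("TCG", "TTG", 40),
        ("TCT", "TTT", 38),
    ],
    "hydrophobic-hydrophilic": [
        ("CCA", "TCA", 4), ("CCC", "TCC", 7), ("CCG", "TCG", 3), ("CCT", "TCT", 16),
    ],
    "hydrophilic-stop": [("CAA", "TAA", 1), ("CAG", "TAG", 2), ("CGA", "TGA", 2)],
}

total = sum(n for rows in TABLE2.values() for _, _, n in rows)

# Share of each hydrophilicity class among all editing sites.
class_pct = {
    name: round(100 * sum(n for _, _, n in rows) / total, 2)
    for name, rows in TABLE2.items()
}

# Number of sites by edited codon position(s), 1-based.
position_counts = {}
for rows in TABLE2.values():
    for before, after, n in rows:
        edited = tuple(i + 1 for i in range(3) if before[i] != after[i])
        position_counts[edited] = position_counts.get(edited, 0) + n

print(total)            # 465
print(class_pct)
print(position_counts)  # {(1,): 151, (2,): 298, (1, 2): 16}
```

The key `(1, 2)` collects the codons in which both the first and second bases were edited, which in Table 2 are the two proline-to-phenylalanine transitions.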
Codon Preference
Codon preference in the T. foenum-graecum mt genome was assessed using the relative synonymous codon usage (RSCU). An RSCU > 1 indicates that a codon is used more frequently than its synonymous alternatives, i.e. that it is preferred[16]. A total of 32 codons had RSCU > 1, and 29 of them (90.63%) ended with A or T. In addition, the 96 bases that make up these 32 codons include 30 A bases and 32 T bases, indicating that preferred codons are rich in A/T. Thus, the T. foenum-graecum mt genome has a marked AT preference. An RSCU = 1 indicates that there is no preference for that codon[16]; in the T. foenum-graecum mt genome, tyrosine shows no preference. A schematic diagram of codon preference is shown in Fig. 3.
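For reference, RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal sketch follows, assuming the standard genetic code and using a toy coding sequence; the function name `rscu` is hypothetical, and the source does not state which software was used.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for the codons of every amino acid that occurs."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        # Read complete codons only; codons with ambiguous bases are ignored.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    result = {}
    for codons in families.values():
        family_total = sum(counts[c] for c in codons)
        if family_total == 0:
            continue
        for c in codons:
            # Observed count divided by the count expected under equal usage.
            result[c] = counts[c] * len(codons) / family_total
    return result


# Toy CDS (hypothetical): ATG TAT TAT TAC TAA
values = rscu(["ATGTATTATTACTAA"])
print(round(values["TAT"], 3), round(values["TAC"], 3))  # 1.333 0.667
```

In this toy example TAT is preferred over TAC (RSCU > 1), while ATG has RSCU = 1 because methionine has a single codon.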
Repeated Sequences
Dispersed repeats are repetitive units scattered throughout the genome[17]. A total of 202 dispersed repeats were detected in the T. foenum-graecum mt genome, comprising 108 forward repeats (F) and 94 palindromic repeats (P). Repeat lengths were concentrated mainly in the 30–59 bp range (83 repeats). The total length of the dispersed repeats was 47,506 bp, accounting for 13.75% of the total length of the mt genome. The number of repeats of each type in each length class is detailed in Table 3.
Table 3 Distribution of dispersed repeat sequences
Length (bp) | Dispersed type | Number |
20–29 | P | 2 |
| F | 2 |
30–39 | P | 16 |
| F | 19 |
40–49 | P | 14 |
| F | 9 |
50–59 | P | 11 |
| F | 14 |
60–69 | P | 2 |
| F | 3 |
70–79 | P | 5 |
| F | 11 |
80–89 | P | 3 |
| F | 1 |
90–99 | P | 9 |
| F | 9 |
100–199 | P | 18 |
| F | 31 |
≥ 200 | P | 14 |
| F | 9 |
Simple sequence repeats (SSRs) are tandem repeats of 1–6 bp motifs. Their high variability, codominance and reproducibility make them a resource for establishing polymorphic DNA markers that can be widely used in plant genetic breeding[18–21]. A total of 96 SSRs were detected in the T. foenum-graecum mt genome, including 11 monomers, 21 dimers, 10 trimers, 34 tetramers, 16 pentamers and 4 hexamers. Tetramers were the most abundant, accounting for 35.42% of all SSRs, and hexamers the least abundant, accounting for only 4.17%. The SSRs are listed in Table 4.
Table 4
SSR type | Repeat motif | Number |
monomer | A/T | 10 |
| C/G | 1 |
dimer | AC/GT | 1 |
| AG/CT | 15 |
| AT/AT | 5 |
trimer | AAC/GTT | 1 |
| AAG/CTT | 4 |
| AAT/ATT | 4 |
| ATC/ATG | 1 |
tetramer | AAAC/GTTT | 2 |
| AAAG/CTTT | 8 |
| AAAT/ATTT | 3 |
| AACC/GGTT | 1 |
| AAGC/CTTG | 3 |
| AAGT/ACTT | 3 |
| AATG/ATTC | 3 |
| AATT/AATT | 1 |
| ACAG/CTGT | 1 |
| ACAT/ATGT | 1 |
| ACCG/CGGT | 1 |
| ACGG/CCGT | 2 |
| ACTG/AGTC | 1 |
| AGCC/CTGG | 1 |
| AGCT/AGCT | 1 |
| AGGC/CCTG | 1 |
| CCCG/CGGG | 1 |
pentamer | AAAAG/CTTTT | 1 |
| AAAAT/ATTTT | 1 |
| AAACC/GGTTT | 4 |
| AAACT/AGTTT | 1 |
| AAATT/AATTT | 1 |
| AACTG/AGTTC | 1 |
| AACTT/AAGTT | 2 |
| AAGAT/ATCTT | 1 |
| AAGCT/AGCTT | 1 |
| AATTC/AATTG | 1 |
| ACACC/GGTGT | 1 |
| ACGGC/CCGTG | 1 |
hexamer | AAACTT/AAGTTT | 2 |
| AAATGG/ATTTCC | 1 |
| AGATAT/ATATCT | 1 |
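To make the SSR classes in Table 4 concrete, the sketch below scans a sequence for perfect repeats of 1–6 bp motifs with a regular expression. The minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen for this illustration; the detection tool and thresholds actually used are not given in this section, so this is not a reconstruction of the authors' procedure.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumption, not from the source).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
SSR_NAMES = {1: "monomer", 2: "dimer", 3: "trimer",
             4: "tetramer", 5: "pentamer", 6: "hexamer"}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. not 'AGAG')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return (start, type, motif, repeats) tuples sorted by 0-based start position."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as 'AA' or 'AGAG', which belong to a shorter class.
            if is_primitive(motif):
                hits.append((match.start(), SSR_NAMES[k], motif,
                             len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence (hypothetical) containing one monomer, one dimer and one tetramer SSR.
toy = "GG" + "A" * 12 + "CT" + "AG" * 6 + "TTT" + "AAAG" * 3 + "C"
for hit in find_ssrs(toy):
    print(hit)
```

This simple scan only finds perfect repeats and does not merge a motif with its reverse complement (for example AG with CT) as Table 4 does; a dedicated SSR tool would handle those cases.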
Tandem repeats are formed by the tandem arrangement of repetitive DNA units of 1–200 bp and are widely found in eukaryotes and some prokaryotes[22]. A total of 19 tandem repeats were detected in the T. foenum-graecum mt genome, with repeat-unit sizes ranging from 5 to 57 bp; 13 of them had a match rate above 97%, as shown in Table 5. The distribution of repetitive sequences in the genome is shown in Fig. 4.
Table 5
Distribution of tandem repeat sequences
No. | Size (bp) | Repeat sequence | Percent matches (%) |
1 | 36 | TAACATAGACCCTCTTTACTTACAGTCGAGCTCTAT | 98 |
2 | 57 | ATATGAAGTTCTAATATTATCTGCACTAAGAAGTGATTACGACTTGTTGTAGATGA | 89 |
3 | 32 | GAGAGGTATGAAAGCGATACTCGACTGATAAG | 82 |
4 | 22 | TTCGATGTAATTGATTTCGCCA | 100 |
5 | 36 | AGGGTCTATGTTAATAGAGCTCGACTGTAAGTAAAG | 100 |
6 | 30 | CGGAGGTTGAGGAGGAGTTTCGGGCTGCTG | 64 |
7 | 16 | CTTGTTATTAGTAAAG | 100 |
8 | 27 | TCTGTATCACTTCTTTACTTGGCTTAT | 100 |
9 | 27 | ATTCTCAATCCACGACGACTATTAACG | 100 |
10 | 25 | TTGATGAACAAGAAGGAACGAAGTG | 100 |
11 | 12 | ATTTATAGCAGC | 100 |
12 | 15 | TCTGACGTCCTTCCT | 100 |
13 | 19 | AATTATCTTATCTAAAATA | 70 |
14 | 19 | CACCTGCAGTTTGGTGCAG | 88 |
15 | 28 | TGCAGGCGAATAGAAAGAGCCCGGCACC | 100 |
16 | 25 | GGGTGAGGGATTAATAAACTAGCTC | 100 |
17 | 5 | ATTCA | 100 |
18 | 9 | GAGACTTTTG | 90 |
19 | 36 | CTTTACTTACAGTCGAGCTCTATTAACATAGACCCT | 100 |
Nucleotide Polymorphism
Variation within genes or intergenic spacers gives rise to DNA sequence polymorphism. Nucleotide polymorphism values in the T. foenum-graecum mt genome ranged from 0 to 0.03891. The values for rps12, rps3, rpl5, cox2, and atp6 were 0.01174, 0.01288, 0.01314, 0.02692, and 0.03891, respectively. These comparatively high values indicate that these genes have undergone more variation than the others. The nucleotide polymorphism value of each gene is shown in Fig. 5.
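Nucleotide polymorphism of this kind is commonly expressed as nucleotide diversity, the average number of differences per site between all pairs of aligned sequences. The sketch below shows that calculation on a toy alignment; it assumes this common definition, since the section does not state the exact statistic or software used, and the function name `nucleotide_diversity` is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise proportion of differing sites in an alignment.

    Positions where either sequence has a gap or ambiguous base are skipped
    for that pair.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")

    total = 0.0
    pairs = 0
    for a, b in combinations(aligned, 2):
        compared = 0
        diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        if compared:
            total += diffs / compared
            pairs += 1
    if pairs == 0:
        raise ValueError("no comparable sites in any pair")
    return total / pairs


# Toy alignment of one gene from four species (hypothetical sequences).
alignment = ["ATGCATGCAT", "ATGCATGCAT", "ATGCTTGCAT", "ATGCATGCAA"]
print(round(nucleotide_diversity(alignment), 5))  # 0.1
```

A value of 0 means the gene is identical in all compared sequences, which corresponds to the lower end of the range reported above.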
Synteny And Phylogenetic Analysis
T. foenum-graecum and five other leguminous species (Trifolium pratense, Trifolium meduseum, Trifolium grandiflorum, Trifolium aureum, Medicago truncatula) were subjected to synteny analysis to tentatively determine their affinities. The results showed that T. foenum-graecum was most similar to Medicago truncatula. Schematic diagrams of the collinearity and mt genome structures of these six plants are shown in Fig. 6 and Fig. 7. Among them, Trifolium meduseum had the longest mt genome (348,724 bp) and Medicago truncatula the shortest (271,618 bp). All six had a GC content of about 45%, further indicating that plant mt genomes are relatively conserved in this respect.
T. foenum-graecum and 25 other Leguminosae species were subjected to phylogenetic analysis. Among the Papilionoideae, T. foenum-graecum (Trigonella) first grouped with Medicago truncatula (Medicago), with a support value of 100%. This group was then joined to Trifolium pratense, Trifolium meduseum, Trifolium grandiflorum, and Trifolium aureum (Trifolium), with a high support value of 93%. Caesalpinioideae, Cercidoideae and Detarioideae were used as outgroups of the phylogenetic tree. The phylogenetic tree is shown in Fig. 8. The tree has 24 nodes, 18 of which have 100% support and 22 of which have more than 80% support.
Substitution Rates Of PCGs
The six Leguminosae plants (T. foenum-graecum, Trifolium pratense, Trifolium meduseum, Trifolium grandiflorum, Trifolium aureum, Medicago truncatula) were compared pairwise to analyze Ka/Ks values between species, as shown in Fig. 9. Among the 28 PCGs examined, 23 genes (atp1, atp4, atp6, atp8, ccmB, ccmC, ccmFC, ccmFn, ccmFn2, cob, cox1, cox2, cox3, mttB, nad1, nad2, nad4, nad4L, nad5, nad6, rpl16, rps10, rps14) had Ka/Ks < 1. Ka/Ks < 1 indicates that a gene has evolved under purifying selection; Ka/Ks > 1 indicates positive selection, in which changes to the protein have been favored; and Ka/Ks = 1 indicates neutral evolution[23].
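The interpretation rule for Ka/Ks can be written as a small classifier. The sketch below is illustrative only: the function name `classify_selection`, the tolerance used to treat a ratio as equal to 1, and the example Ka and Ks values are all assumptions, not values from this study.

```python
import math


def classify_selection(ka: float, ks: float, tol: float = 1e-6) -> str:
    """Classify the selection regime from nonsynonymous (Ka) and synonymous (Ks) rates."""
    if ks <= 0:
        # The ratio is undefined when no synonymous substitutions are observed.
        return "undefined"
    ratio = ka / ks
    if math.isclose(ratio, 1.0, abs_tol=tol):
        return "neutral"
    return "purifying" if ratio < 1 else "positive"


# Hypothetical (Ka, Ks) pairs for illustration.
examples = {
    "geneA": (0.02, 0.10),
    "geneB": (0.15, 0.10),
    "geneC": (0.10, 0.10),
    "geneD": (0.01, 0.00),
}
for gene, (ka, ks) in examples.items():
    print(gene, classify_selection(ka, ks))
```

In practice a ratio exactly equal to 1 is rare, so real analyses usually rely on a statistical test rather than a fixed tolerance to decide whether a ratio differs from 1.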
Chloroplast And Mitochondrial Homologous Sequences
The chloroplast (cp) genome of T. foenum-graecum was annotated using the same leaf sample. Homology analysis of the mt and cp genomes of T. foenum-graecum revealed that DNA sequences had been transferred from the cp genome to the mt genome. The mt genome contains 23 cp insertions, ranging from 35 to 2427 bp in length, with a total length of 10,023 bp, or 2.9% of the total mt genome length, as shown in Table 6. Annotation of these homologous sequences revealed that PCGs were only partially retained after migration from chloroplasts to mitochondria, so that only fragments of them could be found in the mt genome. In contrast, some tRNA genes remained intact after transfer to mitochondria, such as trnW-CCA, trnN-GUU, trnD-GUC, trnH-GUG and trnM-CAU. It is therefore inferred that tRNA genes are more conserved than PCGs during migration. The analysis of homologous fragments of cp and mt sequences is shown in Fig. 10.
Table 6
Cp insertions in the mt genome of T. foenum-graecum
No. | Identity (%) | Length (bp) | Mismatches | Gap openings | Gene |
1 | 100 | 2427 | 0 | 0 | rrn4.5(partial:6.73%) rrn23(partial:82.89%) |
2 | 99.916 | 1188 | 1 | 0 | psbC(partial:83.54%) |
3 | 100 | 1140 | 0 | 0 | psaB(partial:51.70%) |
4 | 100 | 1016 | 0 | 0 | psbC(partial:10.34%) psbD(partial:86.82%) |
5 | 99.741 | 772 | 1 | 1 | rrn23(partial:11.68%) trnA-UGC(partial:30.86%) |
6 | 100 | 426 | 0 | 0 | trnI-GAU(partial:55.72%) |
7 | 92.708 | 384 | 4 | 5 | rrn23(partial:13.68%) |
8 | 99.288 | 281 | 2 | 0 | trnI-GAU(partial:34.17%) |
9 | 98.252 | 286 | 4 | 1 | trnW-CCA; petG(partial:10.53%) |
10 | 74.972 | 887 | 172 | 38 | rrn16(partial:57.98%) |
11 | 87.547 | 265 | 30 | 3 | rrn16(partial:17.69%) |
12 | 99.167 | 120 | 1 | 0 | psaA(partial:5.27%) |
13 | 91.27 | 126 | 11 | 0 | psaA(partial:5.53%) |
14 | 98.824 | 85 | 0 | 1 | trnN-GUU |
15 | 96.429 | 84 | 3 | 0 | trnD-GUC |
16 | 96.154 | 78 | 3 | 0 | trnH-GUG |
17 | 93.59 | 78 | 5 | 0 | trnM-CAU |
18 | 98.077 | 52 | 1 | 0 | rrn16(partial:3.49%) |
19 | 96.296 | 54 | 0 | 2 | ycf2(partial:0.85%) |
20 | 80.412 | 97 | 19 | 0 | rrn23(partial:3.47%) |
21 | 80.412 | 97 | 19 | 0 | rrn23(partial:3.47%) |
22 | 95.556 | 45 | 2 | 0 | rrn23(partial:1.61%) |
23 | 97.143 | 35 | 1 | 0 | rrn23(partial:1.25%) |
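The total length of the cp insertions and their share of the mt genome can be recomputed from the Length column of Table 6 and the genome length of 345,604 bp reported earlier. The variable names below are chosen for this illustration.

```python
# Lengths (bp) of the 23 cp insertions, copied from Table 6 in row order.
insert_lengths = [
    2427, 1188, 1140, 1016, 772, 426, 384, 281, 286, 887, 265, 120,
    126, 85, 84, 78, 78, 52, 54, 97, 97, 45, 35,
]
MT_GENOME_LENGTH = 345_604  # bp, total length of the T. foenum-graecum mt genome

total_length = sum(insert_lengths)
share = 100 * total_length / MT_GENOME_LENGTH

print(len(insert_lengths), min(insert_lengths), max(insert_lengths))  # 23 35 2427
print(total_length, f"{share:.2f}%")                                  # 10023 2.90%
```

The result matches the 10,023 bp and 2.9% quoted in the text.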