Analysis of AAP proteins
To perform a phylogenetic analysis of AAP proteins in plants, we identified putative AAP proteins using the plant sequences listed below as a reference. Combining the sequence data from Tegeder and Ward [13] and Romani et al. [25], 17 plant species were selected, including Chlorophyta (Trebouxiophyceae: Coccomyxa subellipsoidea; Chlorophyceae: Dunaliella salina, Volvox carteri, Micromonas pusilla, Micromonas sp., Ostreococcus lucimarinus, Chlamydomonas reinhardtii), Bryophyta (M. polymorpha, Sphagnum fallax, Physcomitrella patens), lycophytes (S. moellendorffii), Gymnospermae (Picea abies), and angiosperms (Amborella trichopoda, A. thaliana, S. tuberosum, Zea mays, O. sativa; Table 1).
Table 1 The number of AAPs, their clades, and genetic characteristics of AAP genes in 17 plant species at different evolutionary stages.
| Species | Number of AAPs | Group I: Clade 1A | Group I: Clade 1B | Group I: Clade 2 | Group I: Clade 3 | Group I: Clade 4 | Group I: Clade 5 | Number of Group II | Tandem duplication (pairs) | Segmental duplication (pairs) |
|---|---|---|---|---|---|---|---|---|---|---|
| Chlorophyta | | | | | | | | | | |
| Volvox carteri | 0 | | | | | | | | | |
| Chlamydomonas reinhardtii | 0 | | | | | | | | | |
| Dunaliella salina | 0 | | | | | | | | | |
| Micromonas pusilla | 0 | | | | | | | | | |
| Micromonas sp. | 0 | | | | | | | | | |
| Ostreococcus lucimarinus | 0 | | | | | | | | | |
| Coccomyxa subellipsoidea | 5 | 1 | | | | | | 4 | 0 | 0 |
| Bryophyta | | | | | | | | | | |
| Sphagnum fallax | 30 | | 4 | | | | | 26 | 6 | 0 |
| Physcomitrella patens [13] | 12 | | 2 | | | | | 10 | 0 | 0 |
| Marchantia polymorpha | 10 | 4 | 1 | | | | | 5 | 1 | 0 |
| Lycophyte | | | | | | | | | | |
| Selaginella moellendorffii [13] | 15 | 2 | 2 | | | | | 11 | 4 | 1 |
| Gymnospermae | | | | | | | | | | |
| Picea abies | 9 | | 2 | | 4 | 2 | 1 | | 0 | 0 |
| Angiosperm | | | | | | | | | | |
| Amborella | | | | | | | | | | |
| Amborella trichopoda | 16 | | | 2 | 1 | 1 | 3 | 9 | 3 | 0 |
| Eudicots | | | | | | | | | | |
| Arabidopsis thaliana [13] | 8 | | | 3 | 1 | 4 | | | 0 | 5 |
| Solanum tuberosum [26] | 8 | | | 2 | 2 | 4 | | | 1 | 1 |
| Monocots | | | | | | | | | | |
| Zea mays [27] | 22 | | | 4 | 3 | 10 | 5 | | 0 | 6 |
| Oryza sativa [28] | 19 | | | 4 | 3 | 8 | 4 | | 5 | 2 |
In total, 210 proteins were retrieved by BLAST; where a gene had more than one transcript, only the primary transcript was retained. Analysis of the predicted proteins showed that 154 contained an Aa_trans or SLC5-6-like_sbd superfamily domain, the main sequence signatures used to recognize AAP proteins (Additional file 13). Of the 7 chlorophyte species searched, only C. subellipsoidea had predicted AAP-like proteins (5 in total), and S. fallax had more AAP proteins than any other species. AAP proteins were also predicted in every tracheophyte species. To visualize the groups of AAP proteins in plants at various evolutionary stages, we used 7 different colors to distinguish the plant species and indicated the species (Fig 1) and the number of AAP proteins (Table 1) in each group.
We compiled basic information on the AAP proteins, including protein length, domain location, and the numbers of transmembrane domains and exons (Additional file 1). Most genes had 6-8 exons, although in some species only 1 exon was identified and in C. subellipsoidea more than 10 exons were identified. In general, the number of exons was relatively stable across all plants. Genes with more exons were assembled from a larger number of short sequences, and sequence length was not correlated with the number of exons.
The AAP protein family, as a family of amino acid transporters, has specific repetitive sequences. We predicted the location of the main motif, the Aa_trans domain, and the number of transmembrane domains in each protein. The e-value threshold was set to 1e-5 so that both kinds of motifs could be confirmed in all of the proteins. Most proteins had one main Aa_trans domain; the exceptions were Pp3c21_14080V3.1, 413158, pa_MA_889393g0010, ZmAAAP17, ZmAAAP64, and OsAAP19, each of which had 2 incomplete domains, and pa_MA_101691g0010, which had 3 segments. Six to twelve transmembrane domains were predicted in each protein: SmAAP9A contained 12, whereas 413158, 426884, and ZmAAAP17 each contained 6 (Additional files 1 and 10). All transmembrane domains are shown in Fig 2.
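The domain screening described above can be sketched as a simple filter over domain-search hits. This is only an illustration under stated assumptions, not the pipeline used in the study: the `DomainHit` record, the `filter_aap_candidates` function, and the example hits are hypothetical, and the cutoff of 1e-5 is an assumed reading of the e-value setting.

```python
from collections import defaultdict
from dataclasses import dataclass

# Domain names are taken from the text; everything else is hypothetical.
AAP_DOMAINS = {"Aa_trans", "SLC5-6-like_sbd"}
E_VALUE_CUTOFF = 1e-5  # assumed interpretation of the "-5" e-value setting


@dataclass(frozen=True)
class DomainHit:
    protein_id: str
    domain: str
    start: int
    end: int
    e_value: float


def filter_aap_candidates(hits, cutoff=E_VALUE_CUTOFF):
    """Group significant AAP-type domain hits by protein ID."""
    kept = defaultdict(list)
    for hit in hits:
        if hit.domain in AAP_DOMAINS and hit.e_value <= cutoff:
            kept[hit.protein_id].append(hit)
    # Sort each protein's hits by start position so segments read left to right.
    return {pid: sorted(hs, key=lambda h: h.start) for pid, hs in kept.items()}


# Made-up hits, for illustration only.
example_hits = [
    DomainHit("protA", "Aa_trans", 30, 460, 1e-80),
    DomainHit("protB", "Aa_trans", 200, 380, 2e-12),
    DomainHit("protB", "Aa_trans", 20, 150, 3e-9),
    DomainHit("protC", "Aa_trans", 50, 120, 0.02),      # fails the cutoff
    DomainHit("protD", "other_domain", 10, 90, 1e-30),  # not an AAP-type domain
]

candidates = filter_aap_candidates(example_hits)
# Number of domain segments per retained protein (1 = single main domain).
domain_counts = {pid: len(hs) for pid, hs in candidates.items()}
```

In this sketch, a protein with more than one entry in `domain_counts` corresponds to the cases in which the Aa_trans domain was split into 2 or 3 segments.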
Phylogenetic analysis of AAP
To perform a comprehensive phylogenetic analysis of AAP proteins in plants, we selected representative plant sequences at different evolutionary stages. In total, 154 proteins from 5 plant stages, ranging from chlorophytes to angiosperms, were used to construct a phylogenetic tree with the Neighbor-Joining method. We chose this method because it is especially well suited to datasets comprising lineages with widely varying rates of evolution, and it can be combined with methods that correct for superimposed substitutions [29]. The unrooted tree could be readily divided into 2 main groups (Fig 1). Group I had more branching events, and group II could be clearly divided into 2 parts on the basis of the bootstrap values. We then used the group I proteins to construct a second phylogenetic tree, in which the bootstrap values separated group I into 5 clades (Fig 3). Clade 1 contained non-seed plants and Gymnospermae and was separated into 2 clusters based on the bootstrap values. The other 4 clades comprised seed plants, with Gymnospermae located in clades 3, 4, and 5. We followed part of the grouping scheme of Tegeder and Ward [13] to classify these proteins. In group I, the P. patens and S. moellendorffii AAP proteins were identical to those identified by Tegeder and Ward [13]. Group II mainly included early plant species from Chlorophyta, Bryophyta, and lycophytes. A. trichopoda, the sister group of the remaining flowering plants, also appeared in this group. The other early plant AAP proteins mainly appeared in clade 1, and many of them belonged to clade 1B. However, by the emergence of angiosperms, no proteins appeared in clade 1 (Table 1).
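The two ingredients named above, a distance that corrects for superimposed substitutions and Neighbor-Joining clustering, can be illustrated with a bare-bones sketch. It is not the software used in the study: `poisson_distance` applies the Poisson correction as one assumed choice of correction, `neighbor_joining` is a plain textbook implementation without bootstrapping, and the 5-taxon distance matrix is a toy example.

```python
import math


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)  # proportion of differing sites
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


def neighbor_joining(names, dist):
    """Return (node, neighbor, branch_length) edges of an unrooted NJ tree."""
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimizes the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to the new internal node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"node{next_id}"
        next_id += 1
        edges.append((a, new, la))
        edges.append((b, new, lb))
        # Distances from the new node to every remaining node.
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                dc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[new][c] = dc
                d[c][new] = dc
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Connect the last two nodes with a single edge.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Toy additive distance matrix for five hypothetical taxa.
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
edges = neighbor_joining(taxa, matrix)
```

Because Neighbor-Joining does not assume a constant rate across lineages, sister taxa such as `a` and `b` in the toy matrix can receive unequal branch lengths, which is the property that makes the method suitable for lineages evolving at different rates.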
Investigation of gene duplication events and annotations
Gene duplication is potentially advantageous as a primary source of genes with new or modified functions [30]. We analyzed all predicted proteins from each species and found that C. subellipsoidea, P. patens, and P. abies exhibited no duplication events. The highest number of tandem duplication events occurred in S. fallax, and the highest number of segmental duplication events occurred in Z. mays. Oryza sativa had the highest total number of duplication events (Additional file 1). Combined with the phylogenetic information, this shows that the duplication events of non-seed plants occurred in both main groups. Only M. polymorpha had a tandem duplication event in group II. All angiosperm duplication events belonged to group I except for those in A. trichopoda. S. fallax also had a duplication event in group I (Fig 4). Analysis with the Plant Genome Duplication Database (PGDD) [31] and MCScanX [32] additionally identified 8 collinear gene pairs, that is, homologous gene pairs in different plants. One of these involved SmAAP9C in S. moellendorffii, which had homologous genes in early plants; the others all occurred in angiosperms (Additional file 3).
To better understand the evolution of these genes, we calculated the ratios of non-synonymous to synonymous nucleotide substitutions (Ka/Ks). We took the coding sequences (CDSs) of all duplicated genes, removed the termination codons, and analyzed the Ka/Ks ratios using DnaSP6 [33] and the PGDD website. First, the target genes were aligned using the ClustalX2 ‘align codons’ function; Ka and Ks values were then calculated in DnaSP6. In total, 48 gene pairs were analyzed, and Ks values could not be determined for 3 collinear gene pairs. Ka/Ks ratios were slightly above 1.0 in only 2 gene pairs (Sphfalx0007s0031.1/Sphfalx0007s0033.1 and Sphfalx0362s0005.1/Sphfalx0362s0007.1), and none was much greater than 1.0. Collinear genes between Z. mays and O. sativa showed Ka/Ks ratios below 1.0, whereas Ks values could not be determined between A. trichopoda and O. sativa or between S. moellendorffii and A. thaliana (Additional file 3).
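The bookkeeping around this analysis can be sketched as follows. The Ka and Ks values themselves came from DnaSP6 and PGDD; this sketch only shows removal of the termination codon and the conventional reading of the ratio (below 1 for purifying selection, about 1 for neutral evolution, above 1 for positive selection). The function names and the example values are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def strip_stop_codon(cds):
    """Remove a trailing termination codon from an in-frame CDS, if present."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is missing or zero (ratio undefined)."""
    if ka is None or ks is None or ks == 0:
        return None
    return ka / ks


def selection_regime(ratio):
    """Map a Ka/Ks ratio to its conventional interpretation."""
    if ratio is None:
        return "undetermined"
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# Made-up (Ka, Ks) values for three hypothetical gene pairs.
example_pairs = {
    "pair1": (0.02, 0.10),
    "pair2": (0.12, 0.10),
    "pair3": (0.0, None),  # Ks could not be determined
}
regimes = {name: selection_regime(ka_ks_ratio(ka, ks))
           for name, (ka, ks) in example_pairs.items()}
```

Returning `None` rather than a number for an undefined ratio keeps pairs whose Ks could not be determined separate from pairs with a genuinely low Ka/Ks.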
We also used the same method to calculate Ka/Ks ratios among the AAPs within each plant species (Additional file 4). The highest Ka/Ks value was again found for Sphfalx0007s0031.1/Sphfalx0007s0033.1, and in the OsAAP15/OsAAP16 and 174/1275 gene pairs the Ka value was 0 while the Ks value could not be calculated (Additional file 4). Overall, the Ka/Ks values of 16 gene pairs were greater than 1; the majority occurred in monocots and 2 occurred in S. fallax, the latter being duplication pairs (Additional file 6).
A total of 154 proteins were annotated through Gene Ontology with specific reference to biological process (BP), molecular function (MF), and cellular component (CC). Four aspects of CC were annotated among the 154 genes, and 46 proteins were predicted to be related to CC, the majority belonging to non-seed plants. Seven proteins, all group II members, were located in plastids, and only AtAAP3 was located in the nuclear envelope. Most proteins were located in the plasma membrane. Four aspects of MF were annotated to 103 proteins that were linked to transmembrane transporter activity. In addition, OsAAP13, ZmAAAP09, and ZmAAAP69 were associated with ion binding, ATPase activity, and helicase activity. Four aspects of BP were annotated to 7 genes. Five proteins in Bryophyta participated in transport processes, two S. moellendorffii AAPs were related to transmembrane transport, and OsAAP13, ZmAAAP09, and ZmAAAP69 were associated with DNA metabolic processes and stress response (Fig 5, Additional file 5).