Identification of MAPKKK gene family in C. sinensis
In order to identify the MAPKKK gene family in tea (C. sinensis), 415 known MAPKKK peptide sequences from Arabidopsis thaliana (80), Oryza sativa (75), Solanum lycopersicum (71), Solanum tuberosum (81), Capsicum annum (60) and Coffee canephora (48) were retrieved from their respective databases. MAPKKK gene family is divided into three other sub-families, which include MEKK-like, Raf-like and ZIK-like genes. To classify and categorize the MAPKKK genes in tea, BLASTp searches were conducted against the tea protein database, using the retrieved peptide sequences from Arabidopsis and rice as query sequences. For all BLASTp searches, e value and identity percentage were set to 1e-5 and 50% as threshold, respectively (Additional File 1: Supplementary Table S1, Supplementary Table S2 and Supplementary Table S3). The identified tea peptides were again screened with a Hidden Markov Model (HMM) search to confirm the presence of serine/threonine-protein kinase-like domain (PF00069). The screened peptides were again subjected to self-BLAST to remove any chances of redundant data. The final screened tea MAPKKK genes yielded 59 total potential genes, which included 17 MEKK-like, 31 Raf-like and 11 ZIK-like genes and were incorporated into the final dataset.
The physicochemical properties of the identified tea MAPKKK protein sequences were evaluated using ExPASy ProtParam tool (Table 1, Table 2 and Table 3). The length and molecular weight of the 17 MEKK proteins ranged from 311 to 1191 amino acid residues and 34828.88 to 130956.46 kDa respectively (Table 1). For the Raf proteins, it ranged from 305 to 1436 amino acid residues and 35012.57 to 159263.21 kDa (Table 2), and for the ZIK proteins, it ranged from 300 to 831 amino acid residues and 34181.96 to 94422.51 kDa (Table 3). The theoretical pI values ranged from 4.58 to 9.50 for MEKK, 4.88 to 9.61 for Raf and 5.14 to 6.33 for ZIK proteins, indicating that most of the MEKK and Raf proteins have a basic nature while the ZIK proteins being acidic. The grand average of hydropathy (GRAVY index) in all the extracted MEKK, Raf and ZIK were negative values, ranging from -0.605 to -0.060, -0.661 to -0.182 and -0.582 to -0.350 respectively. This indicates that all the identified 59 tea MAPKKKs are hydrophilic in nature. 52 of the 59 putative tea MAPKKKs had instability index values above 40, while 6 Raf genes (TEA000933.1, TEA022171.1, TEA011280.1, TEA031223.1, TEA007232.1 and TEA013875.1) and 1 ZIK gene (TEA020112.1) had instability index values less than 40 (Table 1, Table 2 and Table 3). This signifies the unstable nature of most of the identified tea MAPKKKs [14]. Subcellular localization of the peptides was predicted using the BaCelLo online server with 49 genes being localized in the nucleus, 9 genes in chloroplast and 2 genes in cytoplasm (Table 1, Table 2, and Table 3). TMHMM server v2.0 was employed to predict the presence of trans-membrane helices in the putative peptide sequences and one of the ZIK genes (TEA027328.1) had one trans-membrane helix (Additional File 3: Supplementary Fig. S1,
Gene ID
|
Locus position
|
Gene length
(bp)
|
Protein length (aa)
|
Mol. Wt. (kDa)
|
pI value
|
No. of negative residues
|
No. of positive residues
|
GRAVY index
|
Instability index
|
Aliphatic index
|
Subcellular localization
|
TEA028357.1
|
Scaffold856:196999-204246-
|
7247
|
628
|
68667.76
|
5.60
|
77
|
67
|
-0.380
|
58.36
|
76.85
|
Nucleus
|
TEA025870.1
|
Scaffold790:521648-539960+
|
18312
|
776
|
85271.15
|
6.76
|
94
|
92
|
-0.379
|
45.58
|
81.08
|
Nucleus
|
TEA016319.1
|
Scaffold3144:371539-383072-
|
11533
|
627
|
68238.67
|
9.50
|
53
|
71
|
-0.535
|
50.61
|
68.23
|
Nucleus
|
TEA008165.1
|
Scaffold3102:729210-737275+
|
8065
|
1032
|
112285.36
|
9.04
|
84
|
102
|
-0.423
|
53.62
|
72.95
|
Nucleus
|
TEA027265.1
|
Scaffold1289:966535-975893+
|
9358
|
939
|
101539.85
|
9.35
|
80
|
104
|
-0.605
|
63.34
|
65.88
|
Nucleus
|
TEA006319.1
|
Scaffold2905:735285-744378+
|
9093
|
683
|
75479.57
|
9.32
|
62
|
78
|
-0.505
|
67.84
|
72.55
|
Chloroplast
|
TEA006473.1
|
Scaffold1374:1527992-1535696-
|
7704
|
710
|
78857.59
|
9.09
|
65
|
79
|
-0.516
|
69.53
|
71.59
|
Nucleus
|
TEA014429.1
|
Scaffold41:2381991-2415462+
|
33471
|
1191
|
130956.46
|
6.13
|
145
|
128
|
-0.350
|
45.47
|
89.93
|
Chloroplast
|
TEA031711.1
|
Scaffold5399:986467-998883-
|
12416
|
562
|
62129.83
|
6.31
|
72
|
69
|
-0.484
|
48.47
|
75.62
|
Nucleus
|
TEA001470.1
|
Scaffold558:920549-933450+
|
12901
|
789
|
87423.22
|
8.34
|
90
|
95
|
-0.313
|
49.68
|
84.13
|
Nucleus
|
TEA017119.1
|
Scaffold5354:234291-239017-
|
4726
|
506
|
56190.19
|
4.66
|
80
|
49
|
-0.481
|
47.12
|
69.53
|
Nucleus
|
TEA005306.1
|
Scaffold2184:2097399-2125258+
|
27859
|
1097
|
121164.56
|
5.40
|
162
|
127
|
-0.540
|
49.78
|
72.63
|
Nucleus
|
TEA009902.1
|
Scaffold438:521469-522821-
|
1352
|
450
|
49874.37
|
4.58
|
65
|
34
|
-0.060
|
44.64
|
91.60
|
Chloroplast
|
TEA029598.1
|
Scaffold944:301732-304329+
|
2597
|
423
|
46235.51
|
4.94
|
62
|
43
|
-0.433
|
51.54
|
74.18
|
Nucleus
|
TEA005122.1
|
Scaffold1857:297670-298674-
|
1004
|
334
|
36588.08
|
6.01
|
40
|
34
|
-0.381
|
46.55
|
78.23
|
Chloroplast
|
TEA028214.1
|
Scaffold613:628014-629048+
|
1034
|
344
|
38088.50
|
6.33
|
44
|
41
|
-0.322
|
45.18
|
79.36
|
Nucleus
|
TEA031689.1
|
Scaffold1549:309791-310726-
|
935
|
311
|
34828.88
|
6.04
|
44
|
40
|
-0.340
|
48.20
|
90.64
|
Nucleus
|
Supplementary Fig. S2, Supplementary Fig. S3).
Table 1. Sequence characteristics and physicochemical properties of MAPKKKs belonging to MEKK subfamily in C. sinensis. Locus position, gene length, protein length, molecular weight and pI value, no. of negative and positive residues, GRAVY index, instability index, aliphatic index and subcellular localizations were analysed.
Table 2. Sequence characteristics and physicochemical properties of MAPKKKs belonging to Raf subfamily in C. sinensis. Locus position, gene length, protein length, molecular weight and pI value, no. of negative and positive residues, GRAVY index, instability index, aliphatic index and subcellular localizations were analysed.
Gene ID
|
Locus position
|
Gene length
(bp)
|
Protein length (aa)
|
Mol. Wt. (kDa)
|
pI value
|
No. of negative residues
|
No. of positive residues
|
GRAVY index
|
Instability index
|
Aliphatic index
|
Subcellular localization
|
TEA001765.1
|
Scaffold1670:382409-407933-
|
25524
|
842
|
93193.15
|
5.86
|
107
|
92
|
-0.248
|
46.33
|
89.69
|
Nucleus
|
TEA002020.1
|
Scaffold3595:726640-735244+
|
8604
|
896
|
99135.31
|
6.37
|
111
|
103
|
-0.382
|
42.96
|
81.80
|
Nucleus
|
TEA000256.1
|
Scaffold3876:193108-215389+
|
22281
|
1086
|
119081.90
|
6.63
|
118
|
112
|
-0.441
|
44.80
|
78.36
|
Nucleus
|
TEA029086.1
|
Scaffold106:745738-778269+
|
32531
|
919
|
101696.51
|
5.17
|
119
|
85
|
-0.182
|
41.82
|
91.44
|
Chloroplast
|
TEA022129.1
|
Scaffold3036:784237-806418+
|
22181
|
940
|
104852.31
|
6.01
|
114
|
102
|
-0.217
|
48.52
|
89.81
|
Nucleus
|
TEA019143.1
|
Scaffold1695:623368-630213+
|
6845
|
724
|
79987.28
|
7.68
|
86
|
87
|
-0.609
|
41.33
|
70.98
|
Nucleus
|
TEA028452.1
|
Scaffold433:2415340-2426547+
|
11207
|
846
|
93141.05
|
6.10
|
107
|
93
|
-0.523
|
46.31
|
70.89
|
Nucleus
|
TEA016969.1
|
Scaffold4925:453439-477111+
|
23672
|
1107
|
124661.25
|
8.46
|
145
|
153
|
-0.506
|
46.58
|
77.06
|
Nucleus
|
TEA013270.1
|
Scaffold344:585774-608400+
|
22626
|
755
|
85320.07
|
5.83
|
104
|
84
|
-0.374
|
53.97
|
80.97
|
Nucleus
|
TEA026716.1
|
Scaffold1930:511463-522712-
|
11249
|
368
|
41783.40
|
5.63
|
52
|
44
|
-0.487
|
46.26
|
73.89
|
Nucleus
|
TEA028758.1
|
Scaffold9739:380569-387825-
|
7256
|
1213
|
135047.30
|
5.63
|
159
|
123
|
-0.661
|
51.62
|
66.13
|
Nucleus
|
TEA010804.1
|
Scaffold35:1009695-1024064+
|
14369
|
305
|
35012.57
|
6.50
|
44
|
41
|
-0.644
|
44.62
|
74.16
|
Nucleus
|
TEA009451.1
|
Scaffold1786:773656-783180-
|
9524
|
1333
|
148469.64
|
4.88
|
193
|
127
|
-0.547
|
45.20
|
72.24
|
Nucleus
|
TEA021421.1
|
Scaffold1504:1005366-1017261-
|
11895
|
1331
|
147137.87
|
5.50
|
181
|
135
|
-0.618
|
44.79
|
71.96
|
Nucleus
|
TEA017670.1
|
Scaffold1965:485241-501001+
|
15760
|
561
|
62970.57
|
5.67
|
78
|
62
|
-0.385
|
49.24
|
89.63
|
Nucleus
|
TEA019184.1
|
Scaffold4191:163416-174412+
|
10996
|
601
|
67890.36
|
5.90
|
76
|
65
|
-0.328
|
49.47
|
89.02
|
Nucleus
|
TEA000933.1
|
Scaffold397:63694-76812+
|
13118
|
407
|
45660.50
|
7.66
|
45
|
46
|
-0.291
|
38.27
|
81.89
|
Nucleus
|
TEA031230.1
|
Scaffold2248:916505-925818+
|
9313
|
489
|
54655.73
|
9.20
|
56
|
67
|
-0.311
|
43.48
|
87.32
|
Chloroplast
|
TEA022171.1
|
Scaffold382:2039496-2046332+
|
6836
|
404
|
44640.23
|
8.60
|
48
|
54
|
-0.379
|
26.52
|
78.89
|
Nucleus
|
TEA011280.1
|
Scaffold3804:571784-578194+
|
6410
|
368
|
41126.17
|
7.52
|
48
|
49
|
-0.312
|
25.58
|
82.91
|
Nucleus
|
TEA031223.1
|
Scaffold2248:107085-111327-
|
4242
|
434
|
48234.42
|
6.42
|
57
|
55
|
-0.280
|
26.84
|
84.22
|
Nucleus
|
TEA007232.1
|
Scaffold3038:2387807-2395630-
|
7823
|
368
|
41118.03
|
7.02
|
48
|
48
|
-0.424
|
35.65
|
77.34
|
Nucleus
|
TEA016553.1
|
Scaffold1761:1968012-1984037-
|
16025
|
432
|
49062.23
|
8.45
|
65
|
69
|
-0.513
|
43.90
|
83.06
|
Nucleus
|
TEA033032.1
|
Scaffold858:331961-346154-
|
14193
|
415
|
46688.59
|
6.05
|
59
|
52
|
-0.355
|
43.89
|
86.53
|
Nucleus
|
TEA001764.1
|
Scaffold619:1545624-1550286+
|
4662
|
351
|
39474.64
|
6.47
|
42
|
39
|
-0.191
|
43.90
|
86.72
|
Cytoplasm
|
TEA026000.1
|
Scaffold3457:1062923-1073461-
|
10538
|
1296
|
144765.39
|
5.40
|
177
|
122
|
-0.507
|
42.22
|
73.72
|
Nucleus
|
TEA033556.1
|
Scaffold192:400250-403690-
|
3440
|
541
|
61612.16
|
9.27
|
57
|
72
|
-0.371
|
46.58
|
86.19
|
Chloroplast
|
TEA013875.1
|
Scaffold5449:126808-131150+
|
4342
|
341
|
39047.30
|
6.76
|
44
|
42
|
-0.256
|
36.12
|
91.52
|
Cytoplasm
|
TEA002722.1
|
Scaffold1369:145416-156759+
|
11343
|
1436
|
159263.21
|
5.41
|
203
|
153
|
-0.566
|
45.20
|
72.12
|
Nucleus
|
TEA030052.1
|
Scaffold319:1148438-1156900+
|
8462
|
1357
|
148194.52
|
5.00
|
166
|
110
|
-0.444
|
50.17
|
73.96
|
Nucleus
|
TEA008343.1
|
Scaffold142:344598-352450+
|
7852
|
334
|
37992.05
|
9.61
|
35
|
46
|
-0.257
|
46.78
|
86.17
|
Nucleus
|
Table 3. Sequence characteristics and physicochemical properties of MAPKKKs belonging to ZIK subfamily in C. sinensis. Locus position, gene length, protein length, molecular weight and pI value, no. of negative and positive residues, GRAVY index, instability index, aliphatic index and subcellular localizations were analysed.
Gene ID
|
Locus position
|
Gene length
(bp)
|
Protein length (aa)
|
Mol. Wt. (kDa)
|
pI value
|
No. of negative residues
|
No. of positive residues
|
GRAVY index
|
Instability index
|
Aliphatic index
|
Subcellular localization
|
TEA010125.1
|
Scaffold52:664232-675917-
|
11685
|
675
|
76706.84
|
5.67
|
91
|
68
|
-0.546
|
47.51
|
72.79
|
Nucleus
|
TEA022762.1
|
Scaffold9600:223976-236388+
|
12412
|
732
|
83655.02
|
5.87
|
103
|
83
|
-0.419
|
50.69
|
89.07
|
Nucleus
|
TEA024720.1
|
Scaffold1050:31289-41391+
|
10102
|
655
|
74571.17
|
6.33
|
89
|
81
|
-0.518
|
51.00
|
77.68
|
Nucleus
|
TEA002087.1
|
Scaffold754:205321-211058-
|
5737
|
719
|
81783.51
|
5.54
|
108
|
80
|
-0.582
|
42.80
|
75.41
|
Nucleus
|
TEA013346.1
|
Scaffold5883:262251-272009+
|
9758
|
831
|
94422.51
|
5.53
|
124
|
93
|
-0.504
|
43.60
|
79.06
|
Nucleus
|
TEA013344.1
|
Scaffold5883:191507-194485+
|
2978
|
481
|
55933.38
|
5.65
|
72
|
58
|
-0.472
|
40.70
|
80.04
|
Nucleus
|
TEA031068.1
|
Scaffold1571:837990-857219+
|
19229
|
762
|
86343.00
|
5.92
|
98
|
76
|
-0.375
|
40.49
|
85.07
|
Chloroplast
|
TEA020698.1
|
Scaffold2762:535605-540535-
|
4930
|
664
|
75716.56
|
5.63
|
95
|
74
|
0.439
|
46.00
|
81.91
|
Nucleus
|
TEA027328.1
|
Scaffold688:688353-693133+
|
4780
|
748
|
84782.46
|
5.48
|
100
|
81
|
-0.350
|
42.68
|
80.16
|
Chloroplast
|
TEA020112.1
|
Scaffold1093:624579-626760+
|
2181
|
300
|
34181.96
|
5.60
|
47
|
40
|
-0.428
|
33.33
|
85.47
|
Nucleus
|
TEA033250.1
|
Scaffold4160:2129637-2136050+
|
6415
|
622
|
69665.68
|
5.14
|
99
|
74
|
-0.449
|
43.80
|
84.00
|
Nucleus
|
Phylogenetic analysis of tea MAPKKKs
A phylogenetic analysis of the putative tea MAPKKK genes was carried out to evaluate the evolutionary relationships. MEGA 7.0.14 was used to generate the phylogenetic trees, using the Neighbor-Joining (NJ) algorithm, at default parameters and 1000 bootstrap replicates. Three different phylogenetic trees were constructed for MEKK, Raf and ZIK proteins, comprising of the identified tea sequences and already known 415 MAPKKK sequences from Arabidopsis, rice, tomato, potato, capsicum and coffee. For MEKK, the NJ tree was generated using 17 sequences from tea, 21 sequences from Arabidopsis, 22 sequences from rice, 17 sequences from tomato, 22 sequences from potato, 17 sequences from capsicum and 12 sequences from coffee (Fig. 1A). The NJ tree was divided into 4 distinct clades, with an uniform distribution of genes in Clade A. Clade B consisted of only 6 capsicum genes while clade D had only 2 genes of potato. Clade C however, had a share of tomato and potato gene clusters. Clade A also featured an orthologous pair (AtMEKK15 and TEA005306.1) depending on their phylogenetic relationship. For Raf, the NJ tree was generated using 31 sequences from tea, 48 sequences from Arabidopsis, 43 sequences from rice, 44 sequences from tomato, 43 sequences from potato, 37 sequences from capsicum and 28 sequences from coffee (Fig. 1B). Unlike the MEKK tree, the Raf tree was divided into 11 different clades, with an uniform clustering of genes in all the clades. However, the Raf tree did not feature any orthologous gene pair. The NJ tree for ZIK was generated using 11 sequences from tea, 11 sequences from Arabidopsis, 10 sequences from rice, 10 sequences from tomato, 16 sequences from potato, 6 sequences from capsicum and 8 sequences from coffee (Fig. 1C). The ZIK tree was divided into 7 clades and had a uniform clustering of genes in all the clades with only clade E consisting of 2 genes each of Arabidopsis and rice. Similar to the Raf tree, the ZIK tree also did not feature any orthologous gene pair.
Domain analysis of tea MAPKKKs
The MAPKKK domain architecture as reported in known literature reveals that proteins belonging to Raf family in Arabidopsis and other plant species, possess a C-terminal kinase domain and a long N-terminal regulatory domain. The proteins belonging to ZIK family on the contrary possess a N-terminal kinase domain. The proteins belonging to the MEKK family however has less conserved domain architecture and is either located either at the N- or C- terminal or at a much central region of the protein [6].
Out of the 3 subgroups of plant MAPKKKs, the MEKK subfamily is fairly well known and characterized. Most MEKKs are known to be a part of the recognized MAP Kinase cascades, which activates the downstream MKKs. MEKK1 and MEKK2 from Arabidopsis, have been proven to play a significant role in plant innate immunity [15, 16, 17]. Similar to other plant MAPKKKs, 16 out of 17 members of MEKK subfamily in tea structured a characteristic conserved signature G(T/S)Px(W/Y/F)MAPEV, except TEA014429.1 (Fig. 2A). Two of the most widely studied Arabidopsis Raf subfamily MAPKKKs, namely CTR1 and EDR1 are known to actively participate in ethylene mediated signalling and defense response mechanisms. All 31 members of the Raf subfamily in tea featured a conserved GTxx(W/Y) MAPE signature in its kinase domain with no exceptions (Fig. 2B). The ZIK-like MAPKKKs are also known by the name WNK or with no lysine (K). They are not proven to be involved with the phosphorylation of the MKKs in plants however, have specific functions. Arabidopsis ZIK1 is known to phosphorylate APRR3 in-vitro, which is a putative component of the circadian clock in plants and is believed to be involved in signal transduction pathway, regulating its biological activity [18]. Another ZIK cascade, involving ZIK2, ZIK5 and ZIK8 in Arabidopsis is known to regulate the flowering time by modulating the photoperiod [19]. The ZIK subfamily featured a characteristic GTPEFMAPE(L/V/M)(Y/F/L) conserved signature across all its 11 members in tea (Fig. 2C) [5, 6]. The presence of these distinctive conserved signatures across the tea MAPKKKs further confirms identity and the subfamily they belong. The largest subfamily was found to be the Raf subfamily with 31 members, while the smallest was found to be the ZIK subfamily with only 11 members. This result showed consistency when compared with known literature on other plant MAPKKKs.
Motif composition of tea MAPKKKs
To understand the evolution and comprehend sequential characteristics of the MAPKKK proteins in tea, a conserved motif search was carried out using the MEME suite (Fig. 3). Ten conserved motifs were identified in each of the three subfamilies. Almost all the tea MAPKKK proteins featured the protein kinase domain of motif 1, motif 2 and motif 3. Motif 4 was conserved across all the proteins with only one exception of TEA031230.1. Motif 5, motif 7 and motif 8 were only obtained for the ZIK subfamily with one exception of a MEKK-like TEA014429.1, which also featured motif 8. Motif 6 and motif 9 were harboured by almost all the protein sequences. However, motif 10 was only specific to the MEKK and Raf subfamilies. Motif annotation revealed that motif 2 harboured a protein kinase ATP-binding site. Motif 6 contained a tyrosine kinase phosphorylation site. Motif 9 featured a serine/threonine protein kinase activation site (Additional File 3: Supplementary Fig. S4). The results suggested that proteins belonging to a same group harboured similar conserved motifs, further indicating that the classification of the tea MAPKKK subfamilies was backed by motif analyses.
Gene structure analysis of tea MAPKKKs
The intron-exon distribution pattern for tea MAPKKKs were analysed and visualised using the Gene Structure Display Server v2.0. Study of gene structure revealed differences in number of introns and exons, which contributes to variation in gene length. Introns or non-coding sequences are found abundantly within a genome and are regarded as an indicator of genome complexity [20, 21]. Analysis of the intron patterns could help to comprehend and provide insights into the evolution, function and regulation of the genes [20, 22, 23, 24, 25]. The analysis of the intron-exon architecture in tea revealed significant variation in the number of introns and exons among the three subfamilies of MAPKKKs (Fig. 4). However, genes belonging to the same clades had similar intron-exon distribution. The MEKK subfamily had 10 out of 17 genes (59% of the MEKK genes) possessing 6 to 16 exons (Fig. 4A). TEA025870.1 had 19 exons and 18 introns in its gene. Two genes possessed 2 exons and 1 intron and the remaining 4 genes had no introns. Only 9 out of 17 genes featured UTR segments and 5 out of these 9 genes featured both 5’ and 3’ UTRs. 3 genes contained only the 5’ UTR segments and 1 gene only had the 3’ UTR segment. The genes belonging to the Raf subfamily had exons ranging from 6 to 18 and was featured by 27 out of 31 genes (87% of the Raf genes) (Fig. 4B). TEA016969.1 featured a staggering 28 exons and was the highest among all the Raf genes. Three genes namely TEA000933.1, TEA013875.1 and TEA033556.1 had 2, 3 and 4 exons respectively and were the lowest among the all Raf genes. 29 out of 31 genes possessed UTR segments. However, only 17 of the 29 genes had both 5’ and 3’ UTRs. 7 genes featured only the 5’ UTR segment and remaining 5 genes only had the 3’ UTR. Unlike the MEKK and Raf subfamilies, ZIK subfamily displayed a certain level of conservancy with respect to the number of exons and introns. 10 out of 11 ZIK genes (91% of the ZIK genes) had exons ranging from 7 to 10 (Fig. 4C). TEA020112.1 however featured only 2 exons. 9 out of 11 genes possessed UTR segments and 5 of them had both 5’ and 3’ UTRs. 4 genes featured only the 5’ UTR segment. However, no ZIK subfamily gene in tea featured only the 3’ UTR segment like the MEKK and Raf subfamilies.
Genomic distribution map and evolutionary pressure of tea MAPKKKs
The tea MAPKKKs was mapped onto the genomic scaffolds to understand their distribution pattern. Due to the lack of chromosome-level assembly data in the TPIA database, the genes were mapped onto their respective scaffolds instead of the chromosomes. All 59 tea MAPKKKs were extensively distributed across 58 different genomic scaffolds. 17 MEKK genes were distributed across 17 different scaffolds (Fig. 5A). Similarly, 31 Raf genes were distributed across 31 genomic scaffolds (Fig. 5B). 11 ZIK genes were mapped onto 10 genomic scaffolds (Fig. 5C). Two ZIK genes namely, TEA013344.1 and TEA013346.1 were mapped on the same genomic scaffold 5883 and thus featured a duplication event. Additionally, both these genes possessed similar intron-exon architecture. This result is conclusive evidence that duplication events were of significant importance and played a crucial role in the expansion of the MAPKKK genes in C. sinensis genome. Further, the ratio of non-synonymous substitution rates (Ka) and synonymous substitution rates (Ks) was evaluated to illuminate the mechanism of gene divergence and evolutionary pressure of the tea MAPKKKs (Additional File 2). The ratio determines the selective pressure acting on the respective proteins. If the Ka/Ks ratio is <1, it determines negative or purifying selection. If the Ka/Ks ratio is =1, it indicates neutral selection and if the Ka/Ks ratio is >1, it signifies positive selection [26]. For the MEKK subfamily, pair wise comparisons revealed that 72 gene pairs had Ka/Ks ratios above 1, indicating that they are under positive selection, 24 gene pairs had values less than 1, indicating a negative selection and remaining 40 were not a number (Nan) (Additional File 2: Supplementary Table S4). Similarly, Ka/Ks ratios of the Raf subfamily revealed 341 gene pairs in positive selection, 96 in negative selection and 28 pairs as Nan (Additional File 2: Supplementary Table S5). Ka/Ks ratios of ZIK subfamily uncovered 30 pairs in positive selection, 21 in negative selection and the remaining 4 as Nan (Additional File 2: Supplementary Table S6). The Ka/Ks cumulative graphs of tea MAPKKKs were also generated (Additional File 3: Supplementary Fig. S5, Supplementary Fig. S6 and Supplementary Fig. S7). The results suggest strong positive selection pressures would have occurred, enabling different factors to regulate the MAPKKKs in C. sinensis.
Functional interaction network of tea MAPKKKs
For better understanding of the interactions of tea MAPKKKs in C. sinensis, an interaction network was constructed based on the orthologous genes in Arabidopsis, using the STRING server (Fig. 6). The functional interaction network of the genes has been built using that of Arabidopsis because tea database is not included in the STRING online server. TEA005306.1 in tea was found to be orthologous to AT5G55100 in Arabidopsis. This orthologous gene was identified using the TPIA database and AT5G55100 was used to build the interaction network. Additionally, tea proteins, homologous to the Arabidopsis proteins participating in the network were incorporated in the figure. These homologous proteins were designated as STRING proteins and were selected on the basis of high bit scores. Similarity searching programs such as BLAST produce accurate statistical estimates that help determine that protein sequences sharing substantial degree of similarities tend to have similar structures [27]. Proteins that have high sequence and structural similarity generally tend to possess similar functions [28]. AT5G55100 is involved in RNA processing and is expressed during 15 growth stages in 24 different plant structures. It shows interactions with AT4G33690 which is involved in biological process of protein binding. AT2G29210 is involved with RNA splicing, mRNA processing and is expressed during 13 different growth stages in 23 plant structures. ATO (AT5G06160) encodes for a protein similar to pre-mRNA splicing factor SF3a60 and is involved in gametic cell fate determination. Loss of function results in the ectopic expression of egg cell makers, thereby suggesting a role in restriction of gametic cell fate. TK1 (AT2G36960) is a TSL-kinase interacting protein and is involved in protein binding. It is expressed in 14 developmental stages in 25 different plant structures. GPT (AT2G41490) is an integral component of membrane and has a UDP-N-acetylglucosamine-dolichyl-phosphate N-acetylglucosamine phosphotransferase activity. It is expressed during 15 developmental stages in 23 plant structures. AT3G57220 is located in the endoplasmic reticulum and has a UDP-N-acetylglucosamine-dolichyl-phosphate N-acetylglucosamine phosphotransferase activity. It is also linked with polysaccharide biosynthesis and is expressed during 10 growth stages in 16 different plant structures.
Tissue specific developmental gene expression of tea MAPKKKs
The tissue specific expression pattern of the tea MAPKKK genes in various plant tissues were retrieved from the TPIA database where levels of expression were expressed using transcripts per million (TPM). The TPIA database houses tissue specific expression data for 7 different plant tissues which includes apical bud, flower, fruit, young leaf, mature leaf, old leaf, root and stem (Additional File 4: Supplementary Table S7). Among the 59 tea MAPKKK genes, expression data for 58 genes were retrieved with an exception of 1 MEKK gene, TEA031689.1. All 58 genes displayed varied levels of expression, with few of the transcripts barely readable (Fig. 7). For the MEKK genes, the maximum level of expression in apical bud was shown by TEA006319.1. This gene also marked the highest level of expression in young leaf. TEA017119.1 showed highest level of expression in flower. TEA016319.1 displayed highest expression levels in fruit, mature leaf, old leaf and stem. TEA005122.1 was expressed maximum in root. TEA028357.1 and TEA009902.1 had negligible levels of expression in all of the 7 plant tissues (Fig. 7A). For the Raf genes, TEA000933.1 showed highest levels of expression in apical bud, fruit, young leaf, mature leaf, old leaf, root and stem. TEA007232.1 was expressed maximum in flower. However, TEA001765.1, TEA013270.1, TEA028758.1 and TEA031230.1 had negligible levels of expression (Fig. 7B). Finally, for the ZIK genes, TEA002087.1 displayed highest levels of expression in apical bud, flower, young leaf and stem. TEA022762.1 had highest levels of expression in fruit, mature leaf and old leaf. TEA020112.1 showed maximum expression in root. However, TEA013344.1, TEA031068.1, TEA020698.1 and TEA027328.1 showed minor levels of expression (Fig. 7C). Heat maps for all the 58 genes, representing the tissue specific expression levels were also being generated (Additional File 3: Supplementary Fig. S8).
Abiotic stress induced differential expression levels of tea MAPKKKs
To check the effect of various abiotic stress tolerance levels of tea MAPKKKs, the expression data was retrieved from the TPIA database (Additional File 5: Supplementary Table S8, Additional File 6: Supplementary Table S9, Additional File 7: Supplementary Table S10 and Additional File 8: Supplementary Table S11) and expression graphs were generated (Fig. 8, Fig. 9, Fig. 10 and Fig. 11). The Tea Plant Information Archive database houses stress tolerance data for cold stress, drought stress, salt stress and methyl jasmonate (MeJA) treatment. The cold acclimated (CA) data (unpublished), present in the TPIA database consists of 5 stages of expression. These are: 1. 25~20 °C (CK), 2. Fully acclimated at 10 °C for 6 h (CA1-6h) 3. 10~4 °C for 7 days (CA 1-7d), 4. Cold response at 4~0 °C for 7 days (CA 2-7d) and 5. Recovering under 25~20 °C for 7 days (DA-7d), where CK is the control [29]. Expression of MEKK genes revealed that 15 out of 17 genes were upregulated under CA 1-6h. TEA006473.1 was downregulated while TEA031689.1 displayed no expression levels. Expression levels under the CA 1-7d condition showed that 12 genes were upregulated, 4 genes were downregulated and remaining 1 gene showed no data. Under the CA 2-7d condition, expression levels revealed that 10 genes were upregulated, 6 genes were downregulated and remaining 1 gene displayed no expression data. Lastly, under the DA-7d condition, data revealed that 13 genes showed upregulation, 3 genes showed downregulation and 1 gene had no data (Fig. 8A). Expression of the Raf and ZIK genes were also carried based on the same 5 conditions. For the Raf genes, under CA 1-6h condition, 22 genes out of 31 were upregulated and 9 genes were downregulated. Under CA 1-7d condition, 16 genes were upregulated and 15 genes were downregulated. Expression levels under CA 2-7d revealed that 17 genes showed upregulation and remaining 14 genes showed downregulation. Under DA-7d condition, 21 genes were upregulated and 10 genes were downregulated (Fig. 8B). Expression data of the ZIK genes revealed that under CA 1-6h, 7 out of 11 genes were upregulated and 4 genes were downregulated. CA 1-7d condition revealed that 5 genes were upregulated, 5 genes were downregulated and remaining 1 gene displayed no expression. Under CA 2-7d condition, 4 genes were upregulated, 6 genes were downregulated and 1 gene had no expression. Finally, under DA-7d, 8 genes showed upregulation and remaining 3 showed downregulation (Fig. 8C). Heat maps for the retrieved expression data were also generated (Additional File 3: Supplementary Fig. S9).
Further, expression levels of all tea MAPKKKs were checked under the effects of drought stress conditions. Drought stress levels are recorded in the TPIA database with respect to 25% polyethylene glycol (PEG) treatment and it includes 4 different stages: 1. 0h; 2. 24h; 3. 48h; and 4. 72h [30], where 0h was taken as the control. The expression levels of MEKK genes revealed that under PEG-N-24h condition, 12 genes were upregulated, 4 were downregulated and 1 gene did not show any expression. Under PEG-N-48h, 12 genes were upregulated, 4 were downregulated and 1 gene showed no expression. PEG-N-72h revealed 11 genes showing upregulation, 5 genes showing downregulation and 1 gene with no expression (Fig. 9A). Expression of Raf genes showed that under the PEG-N-24h condition, 11 genes were upregulated, 20 genes were downregulated. Under PEG-N-48h, 16 genes showed upregulation while the remaining 15 genes were downregulated. PEG-N-72h revealed that 15 genes were upregulated and 16 genes were downregulated (Fig. 9B). Finally, the expression data of ZIK genes revealed 10 out of 11 genes had different expression levels under the given conditions while 1 gene (TEA013344.1) had no data. Under the PEG-N-24h condition, expression data showed that only 1 gene was upregulated while the rest of the genes were downregulated. PEG-N-48h condition too revealed the same result with only 1 gene being upregulated. However, PEG-N-72h showed that 2 genes were upregulated and the rest of the genes were downregulated (Fig. 9C). Heat maps for the afore-mentioned data were also generated (Additional File 3: Supplementary Fig. S10).
The expression levels of the tea MAPKKKs under salt stress condition were studied. Similar to the drought stress parameters, the salt stress data in TPIA database is recorded based on treatment with 200 mM NaCl under 4 stages: 1. 0h; 2. 24h; 3. 48h; and 4. 72h where 0h was taken as the control. Analysis of the MEKK genes revealed that under NaCl-N-24h, 9 genes were upregulated and 8 genes were downregulated. For NaCl-N-48h condition, 9 genes showed upregulation and remaining 8 genes were downregulated. Expression levels under NaCl-N-72h revealed 5 genes being upregulated and the rest being downregulated (Fig. 10A). For the Raf genes, expression data concluded that under NaCl-N-24h condition, 15 genes were upregulated and 16 genes were downregulated. Under the NaCl-N-48h condition, 16 genes showed upregulation and 15 genes were downregulated. Expression levels under NaCl-N-72h showed that 8 genes were upregulated and remaining 23 were downregulated (Fig. 10B). For ZIK genes, 10 out of 11 genes had expression levels with 1 gene (TEA013344.1) showing no effect under the given conditions. Expression levels determine that under NaCl-N-24h condition, only 2 genes showed upregulation and the rest of the genes were downregulated. For NaCl-N-48h condition, only 1 gene was upregulated while the remaining 9 were downregulated. NaCl-N-72h condition too revealed a similar result with 2 genes being upregulated and remaining 8 being downregulated (Fig. 10C). Heat maps were generated for the above-mentioned data as well (Additional File 3: Supplementary Fig. S11).
Finally, the expression levels of the tea MAPKKKs under MeJA treatment were studied and analysed. The hormonal treatment data is recorded based on the results of exposing the plant parts to water solution of MeJA, under 4 stages: 1. 0h: 2. 12h: 3. 24h and 4. 48h where 0h was used as the control. For the MEKK genes, under the 12h_MeJA condition, 3 genes showed upregulation, 13 genes were downregulated and remaining 1 gene had no expression at all. Under the 24h_MeJA condition, 4 genes were upregulated, 12 were downregulated and 1 gene was not expressed at all. Under 48h_MeJA condition, 8 genes were upregulated and 9 genes were downregulated (Fig. 11A). Similarly, for the Raf genes, treatment under 12h_MeJA condition revealed that 10 out of 31 genes were upregulated and remaining 21 genes were downregulated. Under 24h_MeJA condition, 8 genes showed upregulation while 23 genes were downregulated. 48h_MeJA revealed that only 4 genes were upregulated and rest of the genes were downregulated (Fig. 11B). Treatment of the ZIK genes under the 12h_MeJA condition revealed that 4 genes were upregulated and 7 genes were downregulated. 24h_MeJA condition showed 5 genes being upregulated and remaining 6 being downregulated. 48h_MeJA condition concluded that 3 genes were upregulated and remaining 8 being downregulated (Fig. 11C). Heat maps for these data were also generated (Additional File 3: Supplementary Fig. S12).