Genome-wide identification and characterization of Glutathione S-transferase gene family in Cajanus cajan and their expression profiling under different developmental stages in anatomical tissues

doi:10.21203/rs.3.rs-2130802/v1


Research Article



This work is licensed under a CC BY 4.0 License

Version 1



Plant glutathione S-transferases (GSTs) form a multifunctional, conserved protein superfamily involved in diverse biological processes, including growth and development, cellular detoxification, stress responses, and signaling. In the current study, a comprehensive genome-wide identification and characterization of the GST gene family was performed in the agriculturally important legume crop Cajanus cajan. A total of 68 GST genes were identified and assigned to eight GST classes based on their conserved domains and motifs. Of the 68 CcGST genes, 37 were located on nine Cajanus chromosomes and the remaining 31 on scaffolds. Both segmental and tandem duplication drove the expansion of the CcGST gene family. Exon-intron structure was conserved within the different GST classes. Secondary structure prediction showed that α-helices predominate, and serine (Ser) was the most frequently phosphorylated residue in CcGSTs. Subcellular localization prediction indicated that most CcGSTs reside in the cytoplasm, and physicochemical analysis showed that most CcGST proteins are acidic. Expression profiling revealed high expression of CcGSTU38, CcGSTU40, CcGSTU44, CcGSTL3, CcGSTL4, CcEF1G1, CcEF1G2, CcDHAR2, and CcGSTF6 at most developmental stages across different anatomical tissues. Molecular docking of the highly expressed CcGSTU38 with eight herbicide safeners revealed a high binding affinity for Fenclorim (-5.44 kcal/mol), making this gene a potential candidate for future molecular characterization under herbicide stress. The results of the current study provide a basis for further functional analysis of Cajanus GSTs.

Cajanus cajan

Glutathione S-transferase

Expression Profiles

Legume Information System (LIS)

Pigeon pea (Cajanus cajan) is a diploid legume crop species (2n = 2x = 22) of the family Fabaceae, native to the Old World. The genus name Cajanus and the species epithet cajan derive from the Malay word katjang, meaning legume, in reference to the plant's beans. The English name pigeon pea derives from the historical use of the pulse as pigeon fodder in Barbados. Pigeon pea is widely grown in tropical and subtropical regions worldwide and is commonly consumed in South Asia, Southeast Asia, Africa, and Latin America (https://en.wikipedia.org/wiki/Pigeon_pea). It is a protein-rich staple food and a valuable grain legume for sustainable agriculture and human nutrition, eaten both as a green vegetable and as a dry pulse. Its seeds contain around 22% protein (21.7 g per 100 g), substantially more than cereal grains, together with the essential amino acids methionine, lysine, and tryptophan (Pandey et al. 2013). Red gram (pigeon pea) supplies a significant share of the protein requirement of the vegetarian population of India. The plant also has considerable medicinal value. Its leaves are used to treat respiratory conditions such as coughs and bronchitis, as well as diarrhoea, haemorrhages, sores, and wounds, and they are infused with Dactyloctenium aegyptium to accelerate childbirth. Infusions of pigeon pea flowers and leaves are traditionally used against diabetes and sore throats. The root bark contains flavonoids such as cajaflavanone and cajanone (an antimicrobial agent), as well as triterpenes. C. cajan has also shown anticancer activity against MCF-7 human breast cancer cells: cajanol, an isoflavanone phytoalexin found in C. cajan roots, arrests the cell cycle at the G2/M phase and induces apoptosis (Pal et al. 2011).
Globally, pigeon pea ranks sixth among grain legumes, after peas, broad beans, lentils, chickpeas, and common bean (Emefiene et al. 2014). It is cultivated on 5.4 million hectares in about 82 countries, with an annual production of 4.49 million tons. India accounts for 72% of global pigeon pea production. Pigeon pea plays a major role in the economies of India and Malawi, where it is a chief cash crop (FAO Statistics 2017).

Plant glutathione S-transferases (GSTs; EC 2.5.1.18) constitute a large and complex supergene family involved in diverse key metabolic processes. They catalyze the nucleophilic conjugation of reduced glutathione (GSH) to a wide range of electrophilic and hydrophobic substrates. GSTs are well documented in both eukaryotes and prokaryotes, and their functional and structural characterization has been accomplished in diverse species. GSTs are major phase II detoxification enzymes that act downstream of cytochrome P450s in cellular metabolism and help plants detoxify harmful compounds such as reactive oxygen species, reactive carbonyl species, and reactive nitrogen species. They participate in multiple cellular pathways, including secondary metabolism and anthocyanin accumulation (Shao et al. 2021), signal transduction (Nianiou-Obeidat et al. 2017), tetrapyrrole metabolism and retrograde signaling (Sylvestre-Gonon et al. 2020), apoptosis, plant growth and development (Basantani et al. 2007), glucosinolate biosynthesis and metabolism (Liu et al. 2014), and responses to biotic and abiotic stresses such as oxidative stress, drought, herbicides, and antibiotics (Cao et al. 2022; Basantani et al. 2011). These glutathione conjugation reactions are important in herbicide selectivity (Chronopoulou et al. 2017). Under extreme stress, plant cells generate reactive oxygen and nitrogen species (ROS and RNS), which react with organic molecules to produce toxic compounds; GSTs play a significant role in the detoxification, excretion, or sequestration of these compounds (Vaish et al. 2020).
Based on sequence similarity, plant GSTs have been classified into 14 classes: tau, phi, theta, zeta, lambda, dehydroascorbate reductase (DHAR), the γ-subunit of eukaryotic translation elongation factor 1B (EF1Bγ), tetrachlorohydroquinone dehalogenase (TCHQD), Ure2p, microsomal prostaglandin E synthase type 2 (mPGES-2), glutathionyl-hydroquinone reductase (GHR), metaxin, iota, and hemerythrin. GST proteins are characterized by a conserved N-terminal domain with a thioredoxin (Trx) fold and a C-terminal domain. The N-terminal domain contains the G-site, which binds glutathione, and the C-terminal domain contains the H-site, which binds hydrophobic substrates. Interestingly, the catalytic residue involved in GSH conjugation varies among classes (Sylvestre-Gonon et al. 2019): tau, phi, theta, and zeta GSTs use serine (Ser), whereas lambda, DHAR, and GHR use cysteine (Cys) (Lallement et al. 2014a). α-Helices are the major secondary structural elements in GSTs, followed by β-strands. Based on their subcellular localization, GSTs are also grouped as cytosolic, microsomal, or mitochondrial (Vaish et al. 2020). Genome-wide identification and characterization of the GST gene family have been performed in many plant species, including melon (Song et al. 2020), tea (Cao et al. 2022), banana (Vaish et al. 2022), pumpkin (Kayum et al. 2018), and radish (Gao et al. 2020). Wheat is the only plant species reported to have more than 330 GST genes (Wang et al. 2019b).

The GST gene family has already been identified and characterized in several legumes: 126 GST genes in Glycine max (Hasan et al. 2020), 33 in Vigna radiata (Vaish et al. 2018), and 51 in Cicer arietinum (Ghangal et al. 2020). In G. max, GmGSTU63, GmGSTU73, GmGSTF2, and GmGSTT5 were highly upregulated in response to four abiotic stresses (dehydration, drought, temperature, and ozone) and two biotic stress conditions, whereas other genes were involved in various developmental stages. In chickpea, GST proteins were found to be involved in seed development.

The draft genome of Cajanus cajan was sequenced by Varshney et al. (2012). In this study, the GST gene family of pigeon pea was comprehensively identified and characterized using the C. cajan genome sequence available in the Legume Information System (LIS). Using various computational strategies, 68 CcGST genes were identified, and their physicochemical features, subcellular localization, gene architecture, conserved motifs, active-site residues, secondary structure, and post-translational modifications were analyzed. Chromosomal localization, gene duplication, and phylogenetic analyses were also performed. The transcript profiles of all 68 CcGST genes were analyzed across different developmental stages and anatomical tissues, and the binding affinity of the highly expressed CcGSTU38 for eight herbicide safeners was assessed by molecular docking. This genome-wide identification and expression profiling provide a platform for further characterization of these genes and for exploring the roles of CcGST genes in crop improvement.

2.1 Genome-wide identification of GST gene family in Cajanus cajan led to the identification of 68 CcGST genes

A pBLAST search against the C. cajan genome in the LIS database identified a total of 68 GST genes. The presence of the N-terminal thioredoxin (Trx) fold was validated with NCBI Batch CD-Search, the SMART database, and the Pfam database, and all 68 genes were found to contain an N-terminal domain with the Trx fold and a C-terminal domain. The 68 GST genes were grouped into eight canonical GST classes: tau, phi, theta, zeta, lambda, DHAR, GHR, and EF1G. Their protein, genomic DNA, and mRNA sequences were downloaded from the LIS database. Genes were named by adding the prefix Cc (for Cajanus cajan) to the identifier of the respective GST class: CcGSTU, CcGSTF, CcGSTT, CcGSTZ, CcGSTL, CcDHAR, CcGHR, and CcEF1G. The tau and phi classes were the largest, with 44 (CcGSTU) and 9 (CcGSTF) members, followed by lambda with 5 (CcGSTL); the theta, zeta, DHAR, GHR, and EF1G classes had two members each (CcGSTT, CcGSTZ, CcDHAR, CcGHR, and CcEF1G). Genes were numbered by their chromosomal position from top to bottom (Table 1).
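The naming rule (class prefix plus a number that follows chromosomal order) can be sketched in Python as below. This is a minimal illustration, not the authors' procedure. It assumes that each gene record carries a class label, a numeric chromosome index, and a start coordinate. The gene IDs and coordinates in the demo are hypothetical, and the class totals only restate the counts given above.

```python
from collections import Counter

# Class-to-prefix mapping following the nomenclature described above.
PREFIX = {
    "tau": "CcGSTU", "phi": "CcGSTF", "theta": "CcGSTT", "zeta": "CcGSTZ",
    "lambda": "CcGSTL", "DHAR": "CcDHAR", "GHR": "CcGHR", "EF1G": "CcEF1G",
}

# Class sizes reported in the text; they should add up to 68.
CLASS_COUNTS = {"tau": 44, "phi": 9, "lambda": 5, "theta": 2,
                "zeta": 2, "DHAR": 2, "GHR": 2, "EF1G": 2}


def name_genes(records):
    """Assign class-prefixed names, numbered per class in genome order.

    records: iterable of (gene_id, gst_class, chrom_index, start_bp).
    chrom_index is a hypothetical numeric key; scaffold-only genes would
    need an index placed after the last chromosome.
    """
    names = {}
    per_class = Counter()
    # Sort by chromosome, then start coordinate, so numbering runs top to bottom.
    for gene_id, gst_class, _chrom, _start in sorted(records, key=lambda r: (r[2], r[3])):
        per_class[gst_class] += 1
        names[gene_id] = f"{PREFIX[gst_class]}{per_class[gst_class]}"
    return names


# Hypothetical records for illustration only.
demo = [("g3", "tau", 2, 500), ("g1", "tau", 1, 900),
        ("g2", "phi", 1, 100), ("g4", "tau", 1, 50)]
print(name_genes(demo))
print(sum(CLASS_COUNTS.values()))
```

Numbering runs separately for each class. A tau gene and a phi gene that sit next to each other can therefore both be number 1.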

2.2 CcGST proteins are highly stable and dominantly localized in the cytoplasm

Among the 68 CcGST genes, the largest protein was encoded by CcEF1G1 (385 amino acids; 44.13 kDa) and the smallest by CcGSTU41 (78 amino acids; 9 kDa). The isoelectric point (pI) ranged from 4.65 (CcGSTU26) to 9.69 (CcGSTT2); 16 CcGST proteins were basic and 52 were acidic. The aliphatic index (AI) ranged from 77.48 (CcGSTF8) to 132.31 (CcGSTU41). CcGSTs with an AI greater than 100 were CcGSTU2 (102.34), CcGSTU5 (103.91), CcGSTU21 (104.8), CcGSTU27 (100.59), CcGSTU36 (101.33), CcGSTU37 (103.52), CcGSTU41 (132.31), and CcGSTT2 (106.95). These proteins contain a higher proportion of amino acids with aliphatic side chains (alanine, valine, isoleucine, and leucine) than other GST members, which is associated with greater hydrophobicity and thermostability. The grand average of hydropathicity (GRAVY) was negative for all CcGSTs, indicating that these proteins are hydrophilic and interact well with water (Table 1). Subcellular localization was predicted with three independent online tools. Most CcGSTs were predicted to localize to the cytoplasm, followed by mitochondria, chloroplast, endoplasmic reticulum, plasma membrane, and nucleus (Table S1; Fig. 1).
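The indices discussed above follow standard definitions. The aliphatic index of Ikai (1980) is computed from the mole percents of Ala, Val, Ile, and Leu. GRAVY is the mean Kyte-Doolittle hydropathy over all residues. The functions below are a minimal sketch of those definitions, assuming sequences written with the 20 standard one-letter codes. They are not the code of the online tools used in the study, and the `charge_class` helper simply applies the pI < 7 convention used in the text.

```python
# Kyte-Doolittle hydropathy scale, the scale conventionally used for GRAVY.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def mole_percent(seq: str, aa: str) -> float:
    """Percentage of residues in seq that are amino acid aa."""
    return 100.0 * seq.count(aa) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai (1980): AI = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    s = seq.upper()
    return (mole_percent(s, "A") + 2.9 * mole_percent(s, "V")
            + 3.9 * (mole_percent(s, "I") + mole_percent(s, "L")))


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues.

    Raises KeyError for non-standard letters (e.g. X), which a real
    pipeline would need to handle.
    """
    s = seq.upper()
    return sum(KD[aa] for aa in s) / len(s)


def charge_class(pi: float) -> str:
    """Label a protein acidic (pI < 7) or basic (pI > 7), as in the text."""
    if pi < 7:
        return "acidic"
    return "basic" if pi > 7 else "neutral"
```

The AI weights (2.9 and 3.9) reflect the larger side-chain volumes of Val and of Ile/Leu relative to Ala. This is why an Ile/Leu-rich protein such as CcGSTU41 can exceed 100.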

2.3 Thirty-seven CcGST genes were localized on nine Cajanus chromosomes, and tandem and segmental duplication contributed equally to CcGST gene family expansion

Of the 68 CcGSTs, only 37 were annotated on nine Cajanus chromosomes; the remaining 31 were found on scaffolds with unknown chromosomal locations. Chr 7 carried the most CcGST genes (ten) and Chr 8 the fewest (one). Chr 2 and Chr 9 each contained five CcGST genes, Chr 1 and Chr 11 four each, and Chr 3 and Chr 6 three each, whereas Chr 2 possessed only two CcGST genes (Fig. 2). Tandem duplication, segmental duplication, and transposition play important roles in gene family expansion, so duplication events were also analyzed in C. cajan. A total of 19 gene pairs shared more than 80% identity and were considered duplicated. Ten pairs shared a common chromosomal or scaffold location and arose by tandem duplication: CcGSTU8/9, CcGSTU9/12, CcGSTU28/29, CcGSTU36/37, CcGSTU38/40, CcGSTF5/6, CcGSTF6/7, CcGSTL3/4, CcGSTL3/5, and CcGSTL4/5. Nine pairs were located on different chromosomes or scaffolds and arose by segmental duplication: CcGSTU14/41, CcGSTU20/41, CcGSTU34/43, CcGSTU36/38, CcGSTU36/40, CcGSTU37/38, CcGSTU37/40, CcGSTL2/3, and CcEF1G1/CcEF1G2. Tau CcGSTs account for most of the duplicated pairs, so this class played a major role in CcGST gene family expansion (Table 2; Fig. 2).

2.4 Phylogenetic tree showed clustering of classes into separate clades

To further understand the relationships among GSTs of different plant species, the GST protein sequences of C. cajan were combined with those of A. thaliana, G. max, and O. sativa (angiosperms), Physcomitrella patens (a bryophyte), and Larix kaempferi (a gymnosperm). All sequences were aligned with Clustal Omega, and a combined phylogenetic tree was constructed with MEGA X. The GST genes of these species fell into eleven classes: tau, phi, theta, zeta, lambda, DHAR, EF1G, GHR, hemerythrin, iota, and Ure2p. The hemerythrin, iota, and Ure2p classes were found only in P. patens, whereas tau, phi, theta, zeta, lambda, DHAR, EF1G, and GHR were common to all species analyzed. The GST classes formed eleven clades, and the two superclades comprised the plant-specific tau and phi GSTs. Gene pairs arising from tandem and segmental duplication clustered closely in the tree, reflecting their close relatedness. The results also suggest that the GST gene family underwent divergent evolution between dicotyledonous and monocotyledonous plants from a common ancestor. Plant GSTs likely evolved before the divergence of bryophytes, pteridophytes, gymnosperms, and angiosperms (Fig. 3).

2.5 Fifteen conserved motifs were identified and canonical gene architecture was observed in CcGSTs

Conserved motifs in CcGSTs were identified with the MEME Suite. Fifteen highly conserved protein motifs, 6 to 50 amino acids long, were recognized in Cajanus GSTs. CcGSTU contained the largest number of motifs (motifs 1, 2, 3, 4, 5, 6, 7, 8, 12, 13, and 14). Some motifs were class-specific, whereas others were shared across CcGST classes. Motif 1 was found in all CcGST classes except CcGHR, motif 3 in all classes except CcEF1G, and motif 5 in all CcGST classes. Motif 9 was found only in CcGSTF, motif 11 in CcGSTF, CcGSTT, and CcGSTZ, motif 10 in CcGSTF and CcEF1G, and motif 15 only in CcGSTL. Motifs 1, 3, 4, 11, and 12 were located at the N-terminus, and motifs 6 and 10 at the C-terminus. Motif 3 contains a highly conserved serine residue and was predicted to contain the active-site residue (Fig. 4).

The gene structures of the 68 CcGST genes were analyzed from their genomic and CDS sequences with the Gene Structure Display Server. Exon number differed markedly across CcGST classes, ranging from one to ten. All CcGSTU genes had two exons, except CcGSTU14, CcGSTU31, and CcGSTU32, which had three, and CcGSTU41, which had only one. All CcGSTF genes contained three exons except CcGSTF2, which had two. Both CcGSTT genes had seven exons, CcGSTZ1 had ten, and CcGSTZ2 had nine. Both CcDHAR genes had six exons, whereas CcGSTL1 and CcGSTL3 had eight and CcGSTL2 and CcGSTL4 had nine. Both CcEF1G genes contained six exons, and CcGHR1 and CcGHR2 contained three and six exons, respectively (Fig. 5).

2.6 Ser and Cys are conserved catalytic residues

To confirm the presence of catalytic residues in the predicted Cajanus GST protein sequences, the amino acid sequences of each CcGST class were aligned with the corresponding sequences of Arabidopsis, G. max, and O. sativa (Fig. 6). Ser (S) was the catalytic residue in the N-terminal G-site of the tau, phi, theta, and zeta classes, whereas Cys (C) was observed in the DHAR, lambda, and GHR classes (Fig. 6). However, the positions of the active-site residues varied greatly among the CcGST classes. The Ser of tau and theta CcGSTs was found at positions 10-20 (Fig. 6a and 6c), that of phi CcGSTs at positions 60-70 (Fig. 6b), and that of zeta CcGSTs at positions 30-40 (Fig. 6d). In the lambda and DHAR classes, the catalytic Cys was found at positions 100-110 (Fig. 6e) and around position 20 (Fig. 6f), respectively, and in GHR at positions 40-50 (Fig). In the EF1G class, the catalytic residue is tyrosine (Tyr), but its position could not be confirmed.
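Positions read from a multiple sequence alignment are column numbers. Because of gaps, a column number can differ from a residue number in a given protein. The sketch below shows one way to map an alignment column back to each sequence and check for the expected catalytic residue. The function names and the toy alignment are hypothetical. It assumes gaps are written as '-' and uses 0-based column indices.

```python
def residue_at_column(aligned_seq: str, col: int):
    """Return (residue, 1-based position in the ungapped sequence) at
    0-based alignment column col, or None if the sequence has a gap there."""
    residue = aligned_seq[col]
    if residue == "-":
        return None
    # Count non-gap characters up to and including this column.
    position = len(aligned_seq[: col + 1].replace("-", ""))
    return residue, position


def check_catalytic_column(alignment: dict, col: int, expected: str):
    """For each aligned sequence, report the residue at col and whether it
    matches the expected catalytic residue ('S' for Ser, 'C' for Cys)."""
    report = {}
    for name, seq in alignment.items():
        hit = residue_at_column(seq, col)
        report[name] = (hit, hit is not None and hit[0] == expected)
    return report


# Hypothetical toy alignment, not real GST sequences.
toy = {"ref": "MTSK-L", "query": "M-SKAL"}
print(check_catalytic_column(toy, 2, "S"))
```

In the toy example, the same column holds Ser at residue 3 of one sequence and residue 2 of the other. Column positions and residue numbers should therefore not be mixed when comparing classes.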

2.7 Secondary structure prediction

The proportions of secondary structural elements (α-helix, β-strand, random coil, and turn) in Cajanus GSTs were estimated with the SOPMA tool. α-Helix was the most abundant element, followed by random coil and β-strand. All members of the tau, phi, theta, and zeta classes had α-helix as their predominant element. CcGSTU42 had the highest α-helix content (61.29%) and CcDHAR1 the lowest (36.74%). Some proteins had a higher percentage of random coil than α-helix. These were a few Cys-containing GSTs (CcGSTL2, CcGSTL4, CcDHAR1, CcGHR1, and CcGHR2) and the Tyr-containing EF1G proteins (CcEF1G1 and CcEF1G2). These structural differences may be related to differences in protein stability (Table S3; Fig. 7).
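Percentages such as those above come from a per-residue state assignment. The sketch below computes them from a secondary structure string. It assumes a simplified, SOPMA-like alphabet in which H is helix, E is extended strand, T is turn, and C is random coil. SOPMA's actual output format is not reproduced here, and the helper names are hypothetical.

```python
from collections import Counter


def ss_composition(ss: str) -> dict:
    """Percent of residues in each state of a per-residue secondary structure
    string (assumed alphabet: H helix, E strand, T turn, C random coil)."""
    counts = Counter(ss.upper())
    return {state: round(100.0 * counts[state] / len(ss), 2) for state in "HETC"}


def coil_exceeds_helix(ss: str) -> bool:
    """True when random coil outnumbers alpha helix, the pattern noted above
    for some Cys- and Tyr-type CcGSTs."""
    comp = ss_composition(ss)
    return comp["C"] > comp["H"]


print(ss_composition("HHHHEECCTT"))
```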

2.8 Phosphorylation is the major post-translational modification in CcGSTs

Phosphorylation and glycosylation sites were predicted in the 68 CcGST amino acid sequences. Serine (Ser) was the major predicted phosphorylation site, followed by threonine (Thr) and tyrosine (Tyr), accounting for approximately 45%, 31%, and 25% of sites, respectively (Table S3; Fig. 8). Potential glycosylation sites were predicted in 35 of the 68 CcGSTs, with CcGHR1 having the most (seven sites) (Table S). A score of 0.70 or above indicates the most probable glycosylation sites. CcGSTU8, CcGSTU13, CcGSTU44, CcGSTF3, and CcGSTT1 had sites scoring ≥ 0.70 and can be considered likely glycosylation targets (Table S4).
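The text does not name the glycosylation predictor. The sketch below assumes N-linked glycosylation and shows only the N-X-S/T sequon scan (X not Pro) that such predictors typically start from. It also applies the 0.70 score cut-off to hypothetical (position, score) pairs and computes the Ser/Thr/Tyr share of predicted phosphosites. None of this reproduces a specific tool's scoring model, and all example inputs are invented.

```python
import re
from collections import Counter

# N-X-S/T sequon with X != P; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sequons(seq: str) -> list:
    """1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def likely_sites(scored_sites, threshold=0.70):
    """Keep positions from (position, score) pairs whose score is >= threshold."""
    return [pos for pos, score in scored_sites if score >= threshold]


def phospho_share(site_residues) -> dict:
    """Percent of predicted phosphosites on Ser, Thr, and Tyr."""
    counts = Counter(site_residues)
    total = sum(counts[r] for r in "STY")
    return {r: round(100.0 * counts[r] / total, 1) for r in "STY"}


print(n_glyc_sequons("MNGSANPT"))  # the N-P-T motif is excluded
```

Because the shares are rounded independently, they need not sum to exactly 100%. The values reported above (45%, 31%, and 25%) are an example.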

2.9 CcGSTU38 was found to be highly expressed in all developmental stages

To explore the functions of CcGST genes, their expression levels were analyzed in seventeen anatomical tissues at different developmental stages, from germination to senescence. The tissues were seed, pod, shoot apical meristem, sepal, petal, root, leaf, petiole, stem, nodule, pistil, stamen, bud, embryo, hypocotyl, radicle, and cotyledon. Based on their expression patterns, the CcGST genes could be classified into three groups. Expression was analyzed by developmental stage: reproductive, seedling, germination (Fig. 9b), vegetative, and senescence (Fig. 9c). In the reproductive stage, many CcGST genes were expressed in most tissues, including CcGSTU5, CcGSTU22, CcGSTU27, CcGSTU28, CcGSTU32, CcGSTU34, CcGSTU35, CcGSTU38, CcGSTU39, CcGSTU40, CcGSTU44, CcGSTL2, CcGSTL3, CcGSTL4, CcGSTL5, CcGSTF2, CcGSTF5, CcGSTF6, CcGSTF9, CcGSTZ2, CcGHR1, CcGHR2, CcDHAR1, CcDHAR2, CcEF1G1, and CcEF1G2. Among them, CcGSTU38, CcGSTU40, CcEF1G1, CcEF1G2, CcGSTL3, CcGSTL4, CcDHAR2, and CcGSTF6 showed the highest expression in all tissues at all developmental stages, except in petals at the reproductive stage (Fig. 9a, b, c). In the radicle at the germination stage, CcGSTU16, CcGSTU17, CcGSTU18, CcGSTU19, and CcGSTF4 were highly expressed. In the senescence and vegetative stages, many CcGSTs showed very low expression in most tissues. In particular, CcGSTU2, CcGSTU8, CcGSTU12, CcGSTU13, CcGSTU24, CcGSTU25, CcGSTU41, CcGSTU42, CcGSTU43, CcGSTF1, CcGSTF3, CcGSTF7, CcGSTF8, and CcGSTL1 had very low transcript abundance (Fig. 9c). Overall, expression differed markedly across developmental stages. Most genes were expressed in the seedling, germination, and reproductive stages in different anatomical tissues, whereas in the vegetative and senescence stages most genes showed extremely low expression in nearly all tissues.
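The study does not state how genes were assigned to its three expression groups. One common way to do this is to average log2-transformed expression values across tissues and apply cut-offs. The sketch below illustrates that approach. It assumes FPKM input, and the `high` and `low` thresholds, the function name, and the example data are all hypothetical.

```python
import math
from statistics import mean


def classify_expression(fpkm_by_gene: dict, high=5.0, low=1.0) -> dict:
    """Label genes 'high', 'moderate', or 'low' by mean log2(FPKM + 1) across tissues.

    The thresholds are hypothetical; the study does not state its cut-offs.
    """
    labels = {}
    for gene, values in fpkm_by_gene.items():
        # log2(x + 1) keeps zero-expression tissues finite.
        score = mean(math.log2(v + 1) for v in values)
        if score >= high:
            labels[gene] = "high"
        elif score >= low:
            labels[gene] = "moderate"
        else:
            labels[gene] = "low"
    return labels


# Hypothetical FPKM values in two tissues per gene.
example = {"geneA": [63, 63], "geneB": [0, 0], "geneC": [3, 3]}
print(classify_expression(example))
```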

2.10 Molecular docking analyses showed the highest binding affinity of CcGSTU38 with Triapenthenol

CcGSTU38 showed the highest expression in all anatomical tissues at all developmental stages, so it was selected for a molecular docking study with eight commonly used herbicide safeners. The three-dimensional structure of CcGSTU38 was modeled with the SWISS-MODEL workspace. The resulting PDB structure was docked against the safener molecules Fenclorim, Benoxacor, Flurazole, Dichlormid, Oxabetrinil, Fluxofenim, Cyometrinil, and Triapenthenol. CcGSTU38 showed different binding energies with the ligands: -3.41 kcal/mol with Benoxacor, -5.03 kcal/mol with Dichlormid, -4.73 kcal/mol with Dietholate, -5.44 kcal/mol with Fenclorim, -5.33 kcal/mol with Flurazole, -5.17 kcal/mol with Fluxofenim, -5.02 kcal/mol with Oxabetrinil, and -5.48 kcal/mol with Triapenthenol. The lowest binding energies, and hence the highest predicted affinities, were obtained with Triapenthenol (-5.48 kcal/mol) and Fenclorim (-5.44 kcal/mol). These safeners could be potential candidates to enhance CcGSTU38 expression under herbicide treatment (Table 3; Fig. 11).
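The ranking implied by the listed docking scores can be checked with a few lines of Python. The energies are copied from the text above, with Flurazole spelled in full. The sketch only sorts the reported values and does not rescore any docked pose.

```python
# Binding energies (kcal/mol) of CcGSTU38 as reported in this section.
BINDING_ENERGY = {
    "Benoxacor": -3.41, "Dichlormid": -5.03, "Dietholate": -4.73,
    "Fenclorim": -5.44, "Flurazole": -5.33, "Fluxofenim": -5.17,
    "Oxabetrinil": -5.02, "Triapenthenol": -5.48,
}


def rank_by_affinity(energies: dict) -> list:
    """Sort ligands from most to least negative binding energy.

    A more negative docking score means a stronger predicted binding.
    """
    return sorted(energies, key=energies.get)


for ligand in rank_by_affinity(BINDING_ENERGY):
    print(f"{ligand:14s} {BINDING_ENERGY[ligand]:6.2f}")
```

The top two ligands differ by only 0.04 kcal/mol. A difference this small is well within the typical uncertainty of docking scores, so the two are best treated as comparable.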

The advent of high-throughput next-generation sequencing has made the genome sequences of diverse plant species publicly available, enabling comprehensive genome-wide identification studies in recent years. Glutathione S-transferases form a versatile, inducible protein family that plays important roles in stress resistance and xenobiotic detoxification. In legumes, the GST gene family has previously been identified and characterized, with 74 GST genes reported in soybean (Ahmad et al. 2020), 51 in chickpea (Ghangal et al. 2020), 31 in Vigna radiata (Vaish et al. 2018), and 92 in Medicago (Hasan et al. 2021). However, the GST gene family had not been identified or characterized in C. cajan. The current study therefore used the C. cajan genome sequence available at LIS to perform a complete genome-wide identification and characterization. This identified 68 CcGST genes in pigeon pea belonging to eight GST classes: tau, phi, theta, zeta, lambda, DHAR, EF1G, and GHR. As reported in other plant species, the CcGSTs contained the conserved N- and C-terminal domains. The plant-specific tau and phi classes were the largest, with 44 and 9 genes, respectively, as reported in tea (Cao et al. 2022), banana (Vaish et al. 2022), apple (Fang et al. 2020), and melon (Song et al. 2021). Their abundance may reflect their major functional contribution to plant metabolism. Analysis of physicochemical features such as molecular weight, hydropathicity, aliphatic index, and pI showed that most CcGSTs are acidic. The physical features of a protein play an important role in its biochemical function, so they merit detailed study (Mohanta et al. 2019). The GRAVY values of all CcGSTs were negative, which is consistent with their hydrophilic nature and high stability (González-Faune et al. 2021). A few CcGSTs had an aliphatic index above 100. These proteins can be considered thermostable because they contain more aliphatic amino acids in their primary structure (Hasan et al. 2021).

The subcellular location of a protein is directly related to its biological function. Most CcGSTs were predicted to localize to the cytoplasm, indicating that they are soluble proteins. In P. patens, C-terminal GFP fusions and confocal microscopy were used to determine the subcellular localization of 21 GSTs, 16 of which were cytoplasmic (Liu et al. 2013). Lallement et al. (2014) also reported plant GSTs in mitochondria, chloroplasts, peroxisomes, and the nucleus. Under oxidative stress, mitochondrial GSTs help maintain the GSH:GSSG ratio, and a high concentration of GSH has been observed in this compartment (Zechmann et al. 2008). CcGSTs predicted to localize to mitochondria may therefore perform this function.

The 37 chromosomally anchored CcGST genes were distributed across nine Cajanus chromosomes, often in clusters at the proximal or distal ends, as reported for wheat (Wang et al. 2019b) and banana GSTs (Vaish et al. 2022). Gene duplication is reported to be the main source of new members in a gene family and a key factor in plant genome evolution [53]. Duplicated genes can diverge in amino acid composition, domain organization, and gene architecture, leading to non-functionalization, sub-functionalization, or neo-functionalization [54]. In C. cajan, tandem and segmental duplication were the two major driving forces of GST gene family expansion. Nineteen duplicated CcGST gene pairs were found on three chromosomes (Chr 3, 7, and 8) and six scaffolds; ten pairs arose by tandem duplication and nine by segmental duplication. Both duplication modes have also contributed to GST gene family expansion in Gossypium raimondii (Dong et al. 2016). Tau-class CcGSTs played the major role in these duplication events. The ratio of non-synonymous to synonymous substitutions (dN/dS) was less than 1 for all duplicated CcGST gene pairs. This indicates that they have evolved mainly under purifying selection rather than being favored by non-synonymous substitutions. In banana, apple, melon, and Medicago, dN/dS ratios below one likewise suggested strong purifying selection.

Phylogenetic analysis of C. cajan together with A. thaliana, G. max, O. sativa, P. patens, and L. kaempferi showed class-specific clustering of GST genes, with each clade representing a different GST class. Gene architecture was also conserved within classes, typically with two exons in tau, three in phi, six in DHAR, GHR, and EF1G, seven in theta, ten in zeta, and eight in lambda. Multiple sequence alignment revealed conserved signature motifs in the CcGST classes, such as W(A/V)S(P/M) in tau, (E/Q)SR(A/K/G)I in phi, SQPS/C in theta, SSCS/A in zeta, CPF/YA in lambda, CPFC/S in DHAR, and CPWA in GHR (Vaish et al. 2020; Vaish et al. 2022; Wang et al. 2019b). Phosphorylation is the most common post-translational modification of proteins. It plays a major role in plant functions and can positively modulate enzyme activity. Accordingly, phosphorylation and glycosylation sites were predicted in CcGSTs.
Serine was the most frequently predicted phosphorylation site, as reported for banana GSTs. Kinases (casein kinase II, protein kinase C, and tyrosine kinase) are involved in the regulation of tau, phi, and zeta GSTs. GSTs also undergo post-translational regulation such as phosphorylation, which has been correlated with GST gene expression. The secondary structure of plant GSTs is characteristically rich in α-helix, followed by random coil and β-sheet. The N-terminal domain contains α-helices and β-strands arranged in a thioredoxin-like fold, whereas the C-terminal domain is dominated by α-helices (Labrou et al. 2015). A similar secondary structure pattern was observed in CcGST proteins. The structural flexibility and function of a protein are correlated with its secondary structure. Proteins in which α-helix is the dominant secondary structural element are generally considered more stable and structurally elastic. Proteins with a higher percentage of α-helix and random coil have been associated with true enzymatic activity and good turnover (Yu et al. 2017).

Expression profiling based on RNA-seq data has been performed in many plant species, including rice, soybean, banana, apple, and chickpea. The expression pattern of Cajanus GSTs was tissue- and developmental stage-specific. Most CcGSTs showed high transcript abundance in the seedling and germination stages, whereas expression was remarkably low in the senescence and vegetative stages. CcGSTU38, CcGSTU40, CcEF1G1, CcEF1G2, CcGSTL3, CcGSTL4, CcDHAR2, and CcGSTF6 were highly expressed in most anatomical tissues at all developmental stages, whereas the remaining GST genes showed moderate or low expression. These results are consistent with GST expression data from Arabidopsis (Dixon and Edwards 2010), G. max, and rice (Jain et al. 2010).

Safeners are chemical compounds that protect crops from the damaging effects of diverse herbicides without reducing herbicide efficacy (Davies 2001). Molecular docking of CcGSTU38 with herbicide safeners demonstrated its high binding affinity for fenclorim. This safener could therefore be a potential candidate for enhancing GST activity to detoxify herbicides. In Arabidopsis, maize, and wheat, GST transcript levels were increased by safener treatment (Edwards et al. 2005).
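The class signature motifs listed above use slash notation for alternative residues. This notation translates directly into regular-expression character classes. The sketch below scans a protein sequence for each signature and reports 1-based start positions. The dictionary simply transcribes the motifs from the text; the function name and test sequences are hypothetical. A motif hit is only a hint about class membership, not a classification by itself.

```python
import re

# Signature motifs from the text; "X/Y" alternatives become [XY] classes.
SIGNATURES = {
    "tau": r"W[AV]S[PM]",
    "phi": r"[EQ]SR[AKG]I",
    "theta": r"SQP[SC]",
    "zeta": r"SSC[SA]",
    "lambda": r"CP[FY]A",
    "DHAR": r"CPF[CS]",
    "GHR": r"CPWA",
}


def find_signatures(seq: str) -> dict:
    """Return {class: [1-based start positions]} for every signature found in seq."""
    hits = {}
    s = seq.upper()
    for gst_class, pattern in SIGNATURES.items():
        # A lookahead wrapper also catches overlapping occurrences.
        positions = [m.start() + 1 for m in re.finditer(f"(?={pattern})", s)]
        if positions:
            hits[gst_class] = positions
    return hits


print(find_signatures("MAGEVKLLGAWASPFSMR"))
```

The lambda (CPF/YA) and DHAR (CPFC/S) signatures share the CPF prefix. Only the fourth residue separates them, which is why both are kept as distinct patterns.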

3.1 Genome-wide identification and nomenclature of Cajanus GST genes

For comprehensive genome-wide identification of GST genes in the Cajanus genome, well-characterized GST protein sequences of Arabidopsis thaliana, Glycine max, and Oryza sativa were retrieved by accession number. The sources were The Arabidopsis Information Resource (TAIR) (http://www.arabidopsis.org/) and the Rice Genome Annotation Project (RGAP). These sequences were used as queries in a pBLAST search against the Cajanus cajan genome on the Legume Information System (LIS) (http://legumeinfo.org/) with an e-value cutoff of 0.001. The identified sequences were confirmed to contain a conserved N-terminal domain with the thioredoxin fold and a C-terminal domain. Three tools were used: NCBI Batch CD-Search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) (Marchler-Bauer et al. 2017), the SMART (Simple Modular Architecture Research Tool) database (http://smart.embl-heidelberg.de/) (Letunic et al. 2020), and Pfam search (http://pfam.xfam.org/search). The amino acid, genomic, and coding sequences (CDS) of the identified C. cajan GSTs were downloaded from the LIS database and used for further computational analysis.
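The e-value filtering step can be sketched for BLAST results saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`). Whether the LIS web interface exports exactly this format is an assumption. The function name and example rows are hypothetical. Hits from all query sequences are pooled, and each subject is kept once.

```python
import csv
import io

# Column order of standard BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_hits(tabular_text: str, max_evalue: float = 1e-3) -> list:
    """Return sorted unique subject IDs with at least one hit at e-value <= max_evalue."""
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(COLUMNS, row))
        if float(record["evalue"]) <= max_evalue:
            hits.add(record["sseqid"])
    return sorted(hits)


# Hypothetical rows: two queries hit Cc_01; Cc_02 fails the cutoff.
rows = ["AtGSTU1\tCc_01\t60\t200\t80\t2\t1\t200\t5\t204\t1e-50\t300",
        "OsGSTF1\tCc_01\t55\t190\t85\t3\t1\t190\t5\t194\t1e-40\t280",
        "AtGSTU1\tCc_02\t30\t50\t35\t1\t1\t50\t1\t50\t0.5\t20"]
print(candidate_hits("\n".join(rows)))
```

Candidates that pass this filter would still need the domain checks (CD-Search, SMART, Pfam) described above before being accepted as GSTs.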

3.2 Chromosomal localization, Phylogeny and gene duplication analyses of CcGST genes

The chromosomal locations of the identified CcGST genes were retrieved from the Cajanus genome browser and mapped onto their respective chromosomes using TBtools v0.667 (https://github.com/CJ-Chen/TBtools). For evolutionary analysis, the GST amino acid sequences of A. thaliana, O. sativa, G. max, P. patens (a bryophyte), and Larix kaempferi (a gymnosperm) were analyzed together with the C. cajan GST sequences. All sequences were aligned using Clustal Omega, and a neighbor-joining (NJ) tree was constructed in MEGA X with 1000 bootstrap replicates (Kumar et al. 2018).
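
To illustrate what the NJ step computes, the following minimal sketch builds an unrooted neighbor-joining tree from a precomputed pairwise distance matrix. It is not MEGA X's implementation: distance estimation from the Clustal Omega alignment and the 1000 bootstrap replicates are omitted, and the internal node labels (`node1`, `node2`, ...) are hypothetical.

```python
"""Minimal neighbor-joining sketch on a precomputed distance matrix."""


def neighbor_joining(names, dist):
    """Return (joins, final_edge) for an unrooted NJ tree.

    joins: list of (node_a, node_b, branch_a, branch_b, new_node) tuples.
    final_edge: (node_a, node_b, length) linking the last two nodes.
    """
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a, b] for b in nodes) for a in nodes}
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - S_i - S_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a, b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        len_a = 0.5 * d[a, b] + (totals[a] - totals[b]) / (2 * (n - 2))
        len_b = d[a, b] - len_a
        u = f"node{len(joins) + 1}"
        d[u, u] = 0.0
        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[u, c] = d[c, u] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((a, b, len_a, len_b, u))
    a, b = nodes
    return joins, (a, b, d[a, b])


# Small additive toy matrix (not GST data).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final = neighbor_joining(names, dist)
print(joins[0])  # ('a', 'b', 2.0, 3.0, 'node1')
```

Because the toy matrix is additive, the branch lengths recovered by NJ sum to the total tree length; with real alignments the distances are estimates and bootstrapping is used to assess support for each split.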

Duplicated CcGST genes were identified by an all-against-all NCBI BLASTP search (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) of the CcGST proteins. Gene pairs sharing more than 80% identity were considered duplicated genes (Kong et al. 2013). Tandem duplicated genes were defined as homologous gene pairs located within a 100 kb region on the same chromosome of C. cajan, while segmental duplicated (SD) genes were those separated by more than 100 kb or located on different chromosomes (Holub 2001). Protein sequence alignments generated with Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/), together with the corresponding mRNA sequences, were used to estimate the synonymous substitution rate (dS), the non-synonymous substitution rate (dN), and the evolutionary constraint (dN/dS) of each duplicated CcGST gene pair with the PAL2NAL online tool (https://bio.tools/pal2nal) (Suyama et al. 2006). The mode of selection was inferred from the dN/dS ratio: values > 1, = 1, and < 1 were taken to indicate positive, neutral, and purifying selection, respectively. The divergence time of each duplicated gene pair was calculated as T = dS/(2λ), where T is the divergence time (expressed in million years, Mya), dS is the number of synonymous substitutions per site, and λ is the fixed rate of 1.5 × 10⁻⁸ synonymous substitutions per site per year expected for dicotyledonous plants (Koch et al. 2000).
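
The classification rules and the divergence-time formula above can be expressed directly in code. This is a minimal sketch, assuming that the distance between two genes is measured between their start coordinates (a simplification). The example values are the CcGSTU8/CcGSTU9 pair from Tables 1 and 2.

```python
"""Sketch: classify duplicated gene pairs and estimate divergence times."""
import math

LAMBDA = 1.5e-8          # synonymous substitutions/site/year for dicots (Koch et al. 2000)
TANDEM_WINDOW = 100_000  # 100 kb window for tandem duplicates


def duplication_type(chrom1, start1, chrom2, start2, window=TANDEM_WINDOW):
    """Tandem if both genes are on the same sequence and within `window` bp."""
    if chrom1 == chrom2 and abs(start1 - start2) <= window:
        return "Tandem"
    return "Segmental"


def selection_type(dn, ds):
    """Interpret dN/dS: > 1 positive, = 1 neutral, < 1 purifying."""
    ratio = dn / ds
    if math.isclose(ratio, 1.0):
        return "Neutral"
    return "Positive" if ratio > 1 else "Purifying"


def divergence_mya(ds, rate=LAMBDA):
    """T = dS / (2 * lambda), converted from years to million years."""
    return ds / (2 * rate) / 1e6


# CcGSTU8 / CcGSTU9 (coordinates from Table 1, dN and dS from Table 2).
print(duplication_type("CcLG07", 19510179, "CcLG07", 19523663))  # Tandem
print(selection_type(0.1161, 0.3893))                            # Purifying
print(round(divergence_mya(0.3893), 2))                          # 12.98
```

The computed 12.98 Mya agrees with the value reported in Table 2 for this pair, which is a quick consistency check on the formula and the chosen λ.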

3.3 Physicochemical properties of CcGST protein sequences and their subcellular localization prediction

The amino acid length, molecular weight, isoelectric point (pI), aliphatic index, and grand average of hydropathy (GRAVY) of the CcGST proteins were calculated with the Expasy ProtParam tool (http://web.expasy.org/protparam/) (Gasteiger et al. 2005) using default parameters. Subcellular localization was predicted with three tools: CELLO v2.5 (http://cello.life.nctu.edu.tw/) (Yu et al. 2006), DeepLoc (http://www.cbs.dtu.dk/services/DeepLoc/) (Armenteros et al. 2017), and WoLF PSORT (www.genscript.com/wolf-psort.html) (Horton et al. 2007).
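
For reference, the GRAVY score and aliphatic index reported by ProtParam follow standard definitions. GRAVY is the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index is X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent. The sketch below reimplements these two formulas for illustration only. It is not ProtParam's code, assumes a sequence of the 20 standard one-letter residues, and does not compute pI or molecular weight.

```python
"""Sketch of two ProtParam-style indices: GRAVY and aliphatic index."""

# Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent."""
    seq = seq.upper()

    def mole(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return mole("A") + 2.9 * mole("V") + 3.9 * (mole("I") + mole("L"))


print(gravy("AVIL"))            # positive, i.e. hydrophobic
print(aliphatic_index("AVIL"))  # approximately 292.5
```

Negative GRAVY values, as in most entries of Table 1, indicate hydrophilic proteins, while a high aliphatic index reflects a large share of Ala, Val, Ile, and Leu side chains.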

3.4 Conserved motifs and gene architecture analyses

The 68 CcGST protein sequences were used for conserved motif analysis with the Multiple Em for Motif Elicitation (MEME) program (http://meme-suite.org/) (Bailey et al. 2009), with the number of motifs set to 15 and motif widths of 6–50 residues. The results were visualized with TBtools. The CDS and genomic DNA sequences of the CcGSTs were used for gene architecture analysis with the Gene Structure Display Server 2.0 (GSDS, https://gsds.cbi.pku.edu.cn/) (Hu et al. 2015).
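
GSDS draws gene structures from the CDS and genomic sequences; the underlying exon/intron bookkeeping can be sketched from exon coordinates. This is a hypothetical helper, not GSDS code, assuming 1-based inclusive exon coordinates for a single gene on one strand.

```python
"""Sketch: derive exon/intron structure from exon coordinates."""


def gene_structure(exons):
    """exons: (start, end) pairs, 1-based inclusive, for one gene on one strand."""
    ordered = sorted(exons)
    # Each intron spans the gap between consecutive exons.
    introns = [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return {
        "exon_count": len(ordered),
        "intron_count": len(introns),
        "intron_lengths": [end - start + 1 for start, end in introns],
        "exonic_length": sum(end - start + 1 for start, end in ordered),
    }


# Toy coordinates (not CcGST data); input order does not matter.
print(gene_structure([(401, 450), (1, 100), (201, 300)]))
```

Comparing intron counts and lengths across genes of the same class is what reveals the conserved exon/intron patterns described for the CcGST classes.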

3.5 Catalytic residue depiction through multiple sequence alignment

The GST amino acid sequences of C. cajan were aligned with the A. thaliana, O. sativa, and G. max GST protein sequences using Clustal Omega (Sievers et al. 2011). The signature sequences and the conserved catalytic residues were visualized with ESPript 3.0 (http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi) (Robert and Gouet 2014).
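
Conserved catalytic residues (for example, an active-site serine or cysteine) are typically spotted as alignment columns where the same residue occurs in all or nearly all sequences. A minimal sketch of that check on aligned sequences, assuming gaps are written as `-`; the `min_fraction` parameter is a hypothetical strictness setting, not an ESPript option.

```python
"""Sketch: find conserved columns in a multiple sequence alignment."""
from collections import Counter


def conserved_columns(alignment, min_fraction=1.0):
    """Return (1-based column, residue) pairs where one residue occupies at
    least `min_fraction` of all sequences; gaps never count as conserved."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for col in range(length):
        residues = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not residues:
            continue  # all-gap column
        residue, count = residues.most_common(1)[0]
        if count / len(alignment) >= min_fraction:
            conserved.append((col + 1, residue))
    return conserved


# Toy alignment (not GST data).
aln = ["MSLKVY", "MTLKAY", "M-LRVY"]
print(conserved_columns(aln))  # [(1, 'M'), (3, 'L'), (6, 'Y')]
```

Lowering `min_fraction` admits columns that are conserved in most but not all sequences, which is closer to how similarity-coloured ESPript figures are read.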

3.6 Secondary structure prediction of CcGSTs

To identify the secondary structure elements (α-helix, β-strand, and random coil) of the CcGSTs, the online SOPMA tool (Self-Optimized Prediction Method with Alignment) was used (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html) (Combet et al. 2000).
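
SOPMA assigns a secondary structure state to each residue, and the percentages discussed for the CcGSTs are the fraction of residues in each state. A small sketch, assuming the per-residue prediction is available as a string of one-letter state codes (for example h for α-helix, e for extended strand, t for β-turn, and c for random coil).

```python
"""Sketch: secondary structure composition from per-residue state codes."""
from collections import Counter


def composition(states):
    """Percentage of residues in each predicted state."""
    counts = Counter(states.lower())
    total = len(states)
    return {state: 100.0 * n / total for state, n in sorted(counts.items())}


def dominant_state(states):
    """State code covering the largest share of residues."""
    comp = composition(states)
    return max(comp, key=comp.get)


# Toy prediction string (not CcGST data).
print(composition("hhhheecc"))     # {'c': 25.0, 'e': 25.0, 'h': 50.0}
print(dominant_state("hhhheecc"))  # h
```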

3.7 Post-translational modifications prediction in CcGSTs

Post-translational modification sites in the CcGSTs were predicted as follows: phosphorylation sites with the GPS 5.0 prediction software (http://gps.biocuckoo.cn/online.php) (Wang et al. 2019a), with the threshold set to high, and N-glycosylation sites with the NetNGlyc 1.0 server (http://www.cbs.dtu.dk/services/NetNGlyc/) (Gupta et al. 2004) using default parameters.
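
NetNGlyc scores candidate N-glycosylation sites with neural networks, but only at Asn-X-Ser/Thr sequons where X is not Pro. The sequon scan alone can be sketched with a regular expression. This lists candidate sites only and does not reproduce the NetNGlyc prediction or the GPS 5.0 phosphorylation scores.

```python
"""Sketch: list N-X-S/T sequons (X != P) as candidate N-glycosylation sites."""
import re

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq):
    """1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Toy sequence (not a CcGST): N at 2 starts NGS, N at 6 is blocked by Pro.
print(sequon_positions("MNGSANPTQNVT"))  # [2, 10]
```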

3.8 Expression profiles of Cajanus GST genes in different tissues

To analyze the expression of Cajanus GST genes in different tissues (root, shoot, leaf, flower, cotyledons, etc.) and at different developmental stages (vegetative, senescence, reproductive, and germination), RNA-seq data were obtained from the LIS database using the accession numbers of the 68 CcGST genes. Expression profiling was based on transcripts per million (TPM) values, and a heatmap of the log-transformed values of each gene in each tissue was drawn with TBtools (Chen et al. 2018).
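
The TPM values were taken from LIS rather than computed in this study, but the quantity itself is straightforward: read counts are normalized by transcript length in kilobases, then scaled so that each sample sums to one million. A sketch of that definition, plus a log2(TPM + 1) transform; the base-2 logarithm and the pseudocount of 1 are assumptions, since the text does not state which log transform was used for the heatmap.

```python
"""Sketch: TPM from read counts, and a log transform for heatmaps."""
import math


def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and transcript lengths (bp)."""
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    per_million = sum(rpk) / 1e6
    return [r / per_million for r in rpk]


def log2_plus_one(values):
    """log2(TPM + 1); the pseudocount keeps zero TPM values finite."""
    return [math.log2(v + 1.0) for v in values]


# Toy counts for three transcripts in one sample (not CcGST data).
values = tpm([100, 200, 0], [1000, 2000, 500])
print(values)                 # two equal values near 500000, then 0.0
print(log2_plus_one([0.0, 1.0, 3.0]))  # [0.0, 1.0, 2.0]
```

Length normalization is why the second transcript, with twice the reads but twice the length, ends up with the same TPM as the first.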

3.9 Molecular docking analysis with eight herbicide safeners

Because CcGSTU38 was highly expressed in most developmental stages, it was selected for molecular docking with eight herbicide safeners: Fenclorim, Benoxacor, Flurazole, Dichlormid, Oxabetrinil, Fluxofenim, Cyometrinil, and Triapenthenol. A homology model of CcGSTU38 was built with the SWISS-MODEL server (https://swissmodel.expasy.org) (Arnold et al. 2006); the best template was selected on the basis of model-template sequence identity (Fiser 2004) and maximum coverage. The 3D structures of the safeners were downloaded from the PubChem database (http://www.pubchem.ncbi.nlm.nih.gov) in SDF format, converted to PDB format with PyMOL, and docked with CcGSTU38 using AutoDock 4 (Morris et al. 2009).
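
AutoDock reports binding energies in kcal/mol, and these relate to an estimated dissociation or inhibition constant through the standard relation ΔG = RT ln K. The sketch below applies that relation at 298.15 K as an illustration only. It is not output from the authors' docking runs, and docking scores are only approximate estimates of binding free energy.

```python
"""Sketch: convert a docking binding energy to an estimated constant K."""
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)


def estimated_k(delta_g_kcal, temperature_k=298.15):
    """K (mol/L) from delta G = R T ln K; more negative delta G gives smaller K."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


# Fenclorim binding energy quoted in the text (-5.44 kcal/mol).
print(estimated_k(-5.44))  # roughly 1e-4 M, i.e. about 0.1 mM
```

Because K depends exponentially on ΔG, differences of a few tenths of a kcal/mol between safeners correspond to only modest differences in estimated affinity.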

In conclusion, this comprehensive genome-wide identification and characterization of the GST gene family in pigeon pea identified 68 GST genes. The candidate genes were further analyzed for their physicochemical properties, gene architecture, conserved motifs, gene duplication, and expression profiles. The identified CcGST genes are potential targets for molecular and functional characterization in this agriculturally important crop: individual CcGST genes can be cloned and characterized, and their expression studied in different tissues under normal developmental and diverse stress conditions. These results may also support pigeon pea breeding programs aimed at developing high-yielding and/or stress-tolerant varieties.

Funding

No funding was received for conducting this study.

Competing Interest

The authors have no relevant financial or non-financial interests to disclose.

Authors Contribution

Material preparation, data collection, and experiments were performed by S. V., N. S., M. J., R. S., A. P., A. A., and M. V. The results were analyzed by S. V. and M. B. The first draft of the manuscript was written by S. V., M. J., M. B., D. G., and G. K. All authors read and approved the final manuscript.

Ahmad MZ, Nasir JA, Ahmed S et al (2020) Genome-wide analysis of glutathione S-transferase gene family in G. max. Biologia 75:1691–1705. https://doi.org/10.2478/s11756-020-00463-5
Armenteros JJA, Sønderby CK, Sønderby SK, Nielsen H, Winther O (2017) DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 33:3387–3395. https://doi.org/10.1093/bioinformatics/btx431
Arnold K, Bordoli L, Kopp J, Schwede T (2006) The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22:195–201. https://doi.org/10.1093/bioinformatics/bti770
Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L et al (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37:W202–W208. https://doi.org/10.1093/nar/gkp335
Basantani M, Srivastava A (2007) Plant glutathione transferases: a decade falls short. Can J Bot 85:443–456. https://doi.org/10.1139/B07-033
Basantani M, Srivastava A, Sen S (2011) Elevated antioxidant response and induction of tau-class glutathione S-transferase after glyphosate treatment in Vigna radiata (L.) Wilczek. Pestic Biochem Physiol 99:111–117. https://doi.org/10.1016/j.pestbp.2010.11.007
Cao Q, Lv W, Jiang H, Chen X, Wang X, Wang Y (2022) Genome-wide identification of glutathione S-transferase gene family members in tea plant (Camellia sinensis) and their response to environmental stress. Int J Biol Macromol 205:749–760. https://doi.org/10.1016/j.ijbiomac.2022.03.109
Chen CJ, Chen H, He YH, Xia R (2018) TBtools, a toolkit for biologists integrating various biological data handling tools with a user-friendly interface. bioRxiv. https://doi.org/10.1101/289660
Chronopoulou E, Georgakis N, Nianiou-Obeidat I, Madesis P, Perperopoulou F, Pouliou F et al (2017) Plant glutathione transferases in abiotic stress response and herbicide resistance. Glutathione in Plant Growth, Development, and Stress Tolerance. Springer, London, pp 215–233
Dixon DP, Edwards R (2010) Glutathione transferases. Arabidopsis Book 8:e0131. https://doi.org/10.1199/tab.0131
Dong Y, Li C, Zhang Y, He Q, Daud MK, Chen J et al (2016) Glutathione S-transferase gene family in Gossypium raimondii and G. arboreum: comparative genomic study and their expression under salt stress. Front Plant Sci 7:139. https://doi.org/10.3389/fpls.2016.00139
Edwards R, Del Buono D, Fordham M, Skipsey M, Brazier M, Dixon DP, Cummings I (2005) Differential induction of glutathione transferases and glucosyltransferases in wheat, maize and Arabidopsis thaliana by herbicide safeners. Z Naturforsch C J Biosci 60:307–316. https://doi.org/10.1515/znc-2005-3-416
Emefiene ME, Joshua VI, Nwadike C, Yaroson AY, Zwalnan NDE (2014) Profitability analysis of pigeon pea (Cajanus cajan) production in Riyom LGA of Plateau State. Int Lett Nat Sci 13(2)
FAO Statistics: Pigeon Producing Countries. Production and Area Harvested. Food and Agriculture Organization of the United Nations, Rome (2017).
Fang X, An Y, Zheng J, Shangguan L, Wang L (2020) Genome-wide identification and comparative analysis of GST gene family in apple (Malus domestica) and their expressions under ALA treatment. 3 Biotech 10:307. https://doi.org/10.1007/s13205-020-02299-x
Fiser A (2004) Template-based protein structure modeling. Methods Mol Biol 673:73–94. https://doi.org/10.1007/978-1-60761-842-3_6
Gao J, Chen B, Lin H, Liu Y, Wei Y, Chen F, Li W (2020) Identification and characterization of the glutathione S-transferase (GST) family in radish reveals a likely role in anthocyanin biosynthesis and heavy metal stress tolerance. Gene 743:144484. https://doi.org/10.1016/j.gene.2020.144484
Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins M, Appel RD et al (2005) Protein identification and analysis tools on the ExPASy server. The proteomics protocols handbook, pp 571–607. https://doi.org/10.1385/1-59259-890-0:571
Ghangal R, Rajkumar MS, Garg R et al (2020) Genome-wide analysis of glutathione S-transferase gene family in chickpea suggests its role during seed development and abiotic stress. Mol Biol Rep 47:2749–2761. https://doi.org/10.1007/s11033-020-05377-8
González-Faune P, Sánchez-Arévalo I, Sarkar S, Majhi K, Bandopadhyay R, Cabrera-Barjas G, Gómez A, Banerjee A (2021) Computational study on temperature driven structure–function relationship of polysaccharide producing bacterial glycosyl transferase enzyme. Polymers 13:1771. https://doi.org/10.3390/polym13111771
Gupta R, Jung E, Brunak S (2004) Prediction of N-glycosylation sites in human proteins.
Hasan MS, Islam S, Hasan MN, Sajib SD, Ahmed S, Islam T et al (2020) Genome-wide analysis and transcript profiling identify several abiotic and biotic stress-responsive glutathione S-transferase genes in soybean. Plant Gene 23:100239. https://doi.org/10.1016/j.plgene.2020.100239
Hasan MS, Singh V, Islam S, Islam MS, Ahsan R et al (2021) Genome-wide identification and expression profiling of glutathione S-transferase family under multiple abiotic and biotic stresses in Medicago truncatula L. PLoS ONE 16:e0247170. https://doi.org/10.1371/journal.pone.0247170
Holub EB (2001) The arms race is ancient history in Arabidopsis, the wildflower. Nat Rev Genet 2:516–527. https://doi.org/10.1038/35080508
Horton P, Park K, Obayashi T, Fujita N, Harada H, Adams-Collier CJ et al (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Res 35:W585–W587. https://doi.org/10.1093/nar/gkm259
Hu B, Jin J, Guo AY, Zhang H, Luo J, Gao G (2015) GSDS 2.0: an upgraded gene feature visualization server. Bioinformatics 31:1296–1297. https://doi.org/10.1093/bioinformatics/btu817
Jain M, Ghanashyam C, Bhattacharjee A (2010) Comprehensive expression analysis suggests overlapping and specific roles of glutathione S-transferases during development and stress responses in rice. BMC Genomics 11:73. https://doi.org/10.1186/1471-2164-11-73
Kayum MA, Nath UK, Park JI, Biswas MK, Choi EK, Song JY, Kim HT, Nou IS (2018) Genome-wide identification, characterization, and expression profiling of glutathione S-transferase (GST) family in pumpkin reveals likely role in cold-stress tolerance. Genes 9:84. https://doi.org/10.3390/genes9020084
Koch MA, Haubold B, Mitchell-Olds T (2000) Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol Biol Evol 17:1483–1498. https://doi.org/10.1093/oxfordjournals.molbev.a026248
Kong X, Lv W, Jiang S, Zhang D, Cai G, Pan J, Li D (2013) Genome-wide identification and expression analysis of calcium-dependent protein kinase in maize. BMC Genomics 14:433. https://doi.org/10.1186/1471-2164-14-433
Kumar S, Stecher G, Li M, Knyaz C, Tamura K (2018) MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Mol Biol Evol 35:1547–1549. https://doi.org/10.1093/molbev/msy096
Labrou NE, Papageorgiou AC, Pavli O, Flemetakis E (2015) Plant GSTome: structure and functional role in xenome network and plant stress response. Curr Opin Biotechnol 32:186–194. https://doi.org/10.1016/j.copbio.2014.12.024
Lallement PA, Brouwer B, Keech O, Hecker A, Rouhier N (2014) The still mysterious roles of cysteine-containing glutathione transferases in plants. Front Pharmacol 5:192. https://doi.org/10.3389/fphar.2014.00192
Letunic I, Bork P (2018) 20 years of the SMART protein domain annotation resource. Nucleic Acids Res 46(D1):D493–D496. https://doi.org/10.1093/nar/gkaa937
Liu YJ, Han XM, Ren LL, Yang HL, Zeng QY (2013) Functional divergence of the glutathione S-transferase supergene family in Physcomitrella patens reveals complex patterns of large gene family evolution in land plants. Plant Physiol 161:773–786. https://doi.org/10.1104/pp.112.205815
Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S et al (2017) CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res 45:D200–D203. https://doi.org/10.1093/nar/gkw1129
Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS et al (2009) AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem 30:2785–2791. https://doi.org/10.1002/jcc.21256
Mohanta TK, Khan A, Hashem A et al (2019) The molecular mass and isoelectric point of plant proteomes. BMC Genomics 20:631. https://doi.org/10.1186/s12864-019-5983-8
Nianiou-Obeidat I, Madesis P, Kissoudis C, Voulgari G, Chronopoulou E, Tsaftaris A, Labrou NE (2017) Plant glutathione transferase-mediated stress tolerance: functions and biotechnological applications. Plant Cell Rep 36:791–805
Pal D, Mishra P, Sachan N, Ghosh AK (2011) Biological activities and medicinal properties of Cajanus cajan (L) Millsp. J Adv Pharm Technol Res 2:207–214. https://doi.org/10.4103/2231-4040.90874
Pandey P, Pandey V, Kumar R, Tiwari D (2013) Imperative quality factors which influence nutritional value of pigeonpea [Cajanus cajan (L.) Millsp.]. Agri-environment: perspectives on sustainable development, p 169. http://www.editura.bioflux.com.ro/docs/belgianromanian-book.doc.pdf#page=169
Robert X, Gouet P (2014) Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res 42:W320–W324. https://doi.org/10.1093/nar/gku316
Shao D, Li Y, Zhu Q, Zhang X, Liu F, Xue F, Sun J (2021) GhGSTF12, a glutathione S-transferase gene, is essential for anthocyanin accumulation in cotton (Gossypium hirsutum L.). Plant Sci 305:110827. https://doi.org/10.1016/j.plantsci.2021.110827
Sievers F, Wilm A, Dineen DG, Gibson TJ, Karplus K, Li W et al (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539. https://doi.org/10.1038/msb.2011.75
Song W, Zhou F, Shan C, Zhang Q, Ning M, Liu X, Zhao X, Cai W, Yang X, Hao G, Tang F (2021) Identification of glutathione S-transferase genes in Hami melon (Cucumis melo var. saccharinus) and their expression analysis under cold stress. Front Plant Sci 12:672017. https://doi.org/10.3389/fpls.2021.672017
Suyama M, Torrents D, Bork P (2006) PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 34:W609–W612. https://doi.org/10.1093/nar/gkl315
Sylvestre-Gonon E, Law SR, Schwartz M, Robe K, Keech O, Didierjean C, Dubos C, Rouhier N, Hecker A (2019) Functional, structural and biochemical features of plant serinyl-glutathione transferases. Front Plant Sci 10:608. https://doi.org/10.3389/fpls.2019.00608
Sylvestre-Gonon E, Schwartz M, Girardet JM, Hecker A, Rouhier N (2020) Is there a role for tau glutathione transferases in tetrapyrrole metabolism and retrograde signalling in plants? Phil Trans R Soc B 375:20190404. https://doi.org/10.1098/rstb.2019.0404
Vaish S, Awasthi P, Tiwari S, Tiwari SK, Basantani MK (2018) In silico genome-wide identification and characterization of glutathione S-transferase gene family in Vigna radiata (L.) Wilczek. Genome 61:311–322. https://doi.org/10.1139/gen-2017-0192
Vaish S, Gupta D, Basantani MK (2020) Glutathione S-transferase: a versatile protein family. 3 Biotech 10:321. https://doi.org/10.1007/s13205-020-02312-3
Vaish S, Parveen R, Gupta D, Basantani MK (2022) Genome-wide identification and characterization of glutathione S-transferase gene family in Musa acuminata L. AAA group and gaining an insight to their role in banana fruit development. J Appl Genet. https://doi.org/10.1007/s13353-022-00707-x
Varshney R, Chen W, Li Y et al (2012) Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 30:83–89. https://doi.org/10.1038/nbt.2022
Wang C, Xu H, Lin S, Deng W, Zhou J, Zhang Y, Shi Y, Di P, Xue Y (2019a) GPS 5.0: an update on the prediction of kinase-specific phosphorylation sites in proteins. Genomics Proteomics Bioinformatics 18:72–80. https://doi.org/10.1016/j.gpb.2020.01.001
Wang R, Ma J, Zhang Q et al (2019b) Genome-wide identification and expression profiling of glutathione transferase gene family under multiple stresses and hormone treatments in wheat (Triticum aestivum L.). BMC Genomics 20:986. https://doi.org/10.1186/s12864-019-6374-x
Yang Q, Liu YJ, Zeng QY (2014) Biochemical functions of the glutathione transferase supergene family of Larix kaempferi. Plant Physiol Biochem 7:99–107. https://doi.org/10.1016/j.plaphy.2014.02.003
Yu CS, Chen YC, Lu CH, Hwang JK (2006) Prediction of protein subcellular localization. Proteins 64:643–651. https://doi.org/10.1002/prot.21018
Zechmann B, Mauch F, Sticher L, Müller M (2008) Subcellular immunocytochemical analysis detects the highest concentrations of glutathione in mitochondria and not in plastids. J Exp Bot 59:4017–4027. https://doi.org/10.1093/jxb/ern243

Table 1

S. No.	Accession No.	Gene name	Chr. No.	Start	End	Gene length (bp)	Protein length (aa)	pI	Mol. wt. (kDa)	GRAVY	Aliphatic index	Subcellular localization
1	C.cajan_19719	CcGSTU1	CcG01	5548927	5550582	1655	226	7.18	26.23	-0.266	94.14	Cy^a,b,c
2	C.cajan_06128	CcGSTU2	CcLG02	17239883	17242083	2200	184	4.95	20.95	-0.149	102.34	Cy^a,b,c
3	C.cajan_06167	CcGSTU3	CcLG02	17661646	17663969	2323	225	5.08	25.69	-0.184	96.67	Cy^a,b,c
4	C.cajan_07967	CcGSTU4	CcLG02	36338463	36339269	806	221	7.11	25.91	-0.376	94.39	Cy^a,c, Cp^b
5	C.cajan_11866	CcGSTU5	CcLG06	8958362	8959376	1014	230	5.46	25.96	-0.098	103.91	Cy^a,c, Cp^b
6	C.cajan_17901	CcGSTU6	CcLG07	6315205	6316370	1165	240	5.81	27.43	-0.161	95.83	Cy^a,b,c
7	C.cajan_19169	CcGSTU7	CcLG07	19460956	19462278	1322	220	8.06	26.05	-0.329	86.82	Cy^a,c, Nu^b
8	C.cajan_19173	CcGSTU8	CcLG07	19510179	19512107	1928	221	5.51	25.86	-0.388	82.04	Cy^a,b,c
9	C.cajan_19174	CcGSTU9	CcLG07	19523663	19525551	1888	219	8.26	25.69	-0.38	78.77	Cy^a,c, Cp^b
10	C.cajan_19175	CcGSTU10	CcLG07	19526756	19527797	1041	221	5.91	25.62	-0.261	84.66	Cy^a,b,c
11	C.cajan_19176	CcGSTU11	CcLG07	19528572	19529591	1019	219	5.44	25.60	-0.458	89.91	Cy^a,b,c
12	C.cajan_19177	CcGSTU12	CcLG07	19531933	19533287	1354	220	6.61	25.79	-0.423	82.41	Cy^a,b,c
13	C.cajan_19178	CcGSTU13	CcLG07	19533741	19535046	1305	218	5.34	25.46	-0.43	95.18	Cy^a,b,c
14	C.cajan_19179	CcGSTU14	CcLG07	19548305	19551176	2871	318	6.10	36.92	-0.481	84.31	Cy^a,b,c
15	C.cajan_22474	CcGSTU15	CcLG09	3981850	3984942	3092	239	7.39	27.85	-0.218	91.76	Cy^a,b, Per^c
16	C.cajan_22479	CcGSTU16	CcLG09	4017629	4018492	863	234	5.34	27.64	-0.158	89.49	Cy^a,c, Per^b
17	C.cajan_22480	CcGSTU17	CcLG09	4019865	4020667	802	224	5.60	26.56	-0.23	95.27	Cy^a,b,c
18	C.cajan_22481	CcGSTU18	CcLG09	4026458	4027524	1066	212	5.73	24.85	-0.257	90.99	Cy^a,b,c
19	C.cajan_22482	CcGSTU19	CcLG09	4033868	4034677	809	237	5.85	27.86	-0.266	94.14	Cy^a,b,c
20	C.cajan_14010	CcGSTU20	CcLG10	7142293	7143371	1078	220	5.63	25.63	-0.305	95.27	Cy^a,b,c
21	C.cajan_02149	CcGSTU21	CcLG11	23623461	23626440	2979	225	5.50	26.02	-0.214	104.8	Cy^a,b,c
22	C.cajan_02150	CcGSTU22	CcLG11	23631853	23633450	1597	226	8.41	25.76	-0.259	94.12	Cy^a,b,c
23	C.cajan_02151	CcGSTU23	CcLG11	23650129	23651208	1079	220	6.14	25.12	-0.265	97.55	Cy^a,b,c
24	C.cajan_03944	CcGSTU24	CcLG11	43332427	43333200	773	218	5.76	25.40	-0.304	93.81	Cy^a,b,c
25	C.cajan_29423	CcGSTU25	Scaffold128736	198746	200317	1571	221	6.10	25.23	-0.268	90.36	Cy^a,b,c
26	C.cajan_33225	CcGSTU26	Scaffold137510	100812	101276	464	131	4.65	14.93	0.107	94.35	Cy^a,b,c
27	C.cajan_34134	CcGSTU27	Scaffold126240	75866	77514	1648	221	8.48	25.28	-0.224	100.59	Cy^a,c, Cp^b
28	C.cajan_34135	CcGSTU28	Scaffold126240	80947	82151	1204	231	5.72	26.50	-0.164	94.16	Cy^a,c, Mt^b
29	C.cajan_34137	CcGSTU29	Scaffold126240	89735	90963	1228	232	5.94	26.61	-0.181	92.93	Cy^a,c, Cp^b
30	C.cajan_37029	CcGSTU30	Scaffold134540	58635	60264	1629	224	8.41	25.94	-0.332	95.67	Cy^a,b,c
31	C.cajan_37031	CcGSTU31	Scaffold134540	64323	72848	8525	240	7.83	27.97	-0.361	92.54	Cy^a,c, Cp^b
32	C.cajan_37033	CcGSTU32	Scaffold134540	74605	76621	2016	231	6.25	27.13	-0.227	92.38	Cy^a,b,c
33	C.cajan_39470	CcGSTU33	Scaffold129475	213526	215012	1486	231	7.07	26.36	-0.189	96.62	Cy^a,b,c
34	C.cajan_41658	CcGSTU34	Scaffold127338	93880	95187	1307	217	6.33	25.19	-0.421	87.14	Cy^a,b,c
35	C.cajan_41659	CcGSTU35	Scaffold127338	100593	102341	1748	219	5.53	25.63	-0.342	93.97	Cy^a,b,c
36	C.cajan_42053	CcGSTU36	Scaffold132540	50366	51360	994	225	5.69	25.74	-0.14	101.33	Cy^a,b,c
37	C.cajan_42056	CcGSTU37	Scaffold132540	71385	72144	759	142	5.96	16.44	-0.244	103.52	Cy^a,b,c
38	C.cajan_43561	CcGSTU38	Scaffold132561	16706	17908	1202	225	5.42	25.70	-0.143	98.27	Cy^a,b,c
39	C.cajan_43562	CcGSTU39	Scaffold132561	25593	26443	850	223	6.14	25.80	-0.184	99.24	Cy^a,c, Cp^b
40	C.cajan_43564	CcGSTU40	Scaffold132561	71356	72476	1120	225	5.20	25.97	-0.218	96.58	Cy^a,b,c
41	C.cajan_46982	CcGSTU41	Scaffold111076	2	238	236	78	6.91	9.00	0.3	132.31	Ex^a, ER^c
42	C.cajan_47302	CcGSTU42	Scaffold136978	2648	3471	823	195	6.53	22.43	-0.312	94	Cy^a,b,c
43	C.cajan_47784	CcGSTU43	Scaffold120162	669	1212	543	124	6.93	14.47	-0.51	78.71	Cy^a,b,c
44	C.cajan_48663	CcGSTU44	Scaffold117397	716	973	257	85	5.04	9.71	-0.087	91.65	Cy^a,b,c
45	C.cajan_19620	CcGSTF1	Cc01	4459169	4461250	2081	200	5.28	22.69	-0.167	98.95	Cy^a,b,c
46	C.cajan_20350	CcGSTF2	Cc01	12289496	12290617	1121	268	8.81	31.55	-0.412	95.97	Cy^a,b,c
47	C.cajan_10442	CcGSTF3	Cc03	24189048	24189959	911	219	5.52	24.77	-0.229	83.65	Cy^a,b,c
48	C.cajan_30612	CcGSTF4	Scaffold126590	303094	303880	786	200	8.42	22.65	-0.408	81	Nu^a,b, Mt^c
49	C.cajan_35918	CcGSTF5	Scaffold000080	61507	63060	1553	215	5.62	24.77	-0.31	98.28	Cy^a,b,c
50	C.cajan_35920	CcGSTF6	Scaffold000080	91820	92835	1015	215	6.01	25.41	-0.468	92.37	Cy^a,b,c
51	C.cajan_35921	CcGSTF7	Scaffold000080	110824	111768	944	214	6.01	24.93	-0.381	95.14	Cy^a,c,Nu^b
52	C.cajan_38864	CcGSTF8	Scaffold000297	11450	12787	1337	214	6.07	24.38	-0.365	77.48	Cy^a,c,Cp^b
53	C.cajan_40757	CcGSTF9	Scaffold126759	40157	41984	1827	215	5.73	24.76	-0.225	94.6	Cy^a,b,c
54	C.cajan_11526	CcGSTT1	Cc06	5431051	5433841	2790	251	7.77	28.62	-0.359	89.36	Cy^a, Cp^b,Nu^c
55	C.cajan_11527	CcGSTT2	Cc06	5435759	5442606	6847	249	9.69	28.14	-0.155	106.95	Mt^a,b,c
56	C.cajan_18537	CcGSTZ1	Cc07	13321598	13325264	3666	312	6.43	35.08	0.13	93.37	PM^a, Cp^b, Cy^c
57	C.cajan_24593	CcGSTZ2	Scaffold133855	502947	507373	4426	224	6.17	25.52	-0.193	98.84	Mt^a, Cy^b,c
58	C.cajan_09785	CcGSTL1	Cc03	18108770	18110656	1886	229	5.97	26.16	-0.301	86.86	Cy^a,b,c
59	C.cajan_09786	CcGSTL2	Cc03	18116299	18121790	5491	259	5.46	29.42	-0.402	81.97	Cy^a,b,c
60	C.cajan_39542	CcGSTL3	Scaffold135072	57076	59912	2836	249	5.13	28.60	-0.196	97.83	Cy^a,c
61	C.cajan_39543	CcGSTL4	Scaffold135072	65458	68078	2620	264	5.24	30.05	-0.456	83.03	Cy^a,b,c
62	C.cajan_39544	CcGSTL5	Scaffold135072	71710	74934	3224	264	4.83	30.68	-0.222	94.51	Cy^a,b,c
63	C.cajan_19585	CcDHAR1	Cc01	4039548	4044850	5302	264	8.80	29.53	-0.285	87.46	Cp^a,b,c
64	C.cajan_07926	CcDHAR2	Cc02	36047272	36049735	2463	212	6.16	23.19	0.001	99.29	ER^a, Cy^b,c
65	C.cajan_17280	CcEF1G1	Cc08	20131871	20133776	1905	385	6.50	44.13	-0.431	79.01	Cy^a,b,c
66	C.cajan_24568	CcEF1G2	Scaffold133855	169436	171929	2493	384	5.82	43.85	-0.445	80	Cy^a,b,c
67	C.cajan_06611	CcGHR1	Cc02	22735452	22736730	1278	372	7.94	41.31	-0.187	86.72	Cp^a,b, Mt^c
68	C.cajan_13909	CcGHR2	Cc10	6130850	6133468	2618	356	6.67	40.91	-0.419	78.9	Mt^a, Cp^b,c

Table 2

S. No.	Gene name 1	Gene name 2	dN	dS	dN/dS	Duplication time (Mya)	Duplication type	Selection type
1	CcGSTU8	CcGSTU9	0.1161	0.3893	0.2983	12.98	Tandem	Purifying
2	CcGSTU9	CcGSTU12	0.0801	0.2527	0.317	8.42	Tandem	Purifying
3	CcGSTU28	CcGSTU29	0.0768	0.2731	0.2811	9.10	Tandem	Purifying
4	CcGSTU36	CcGSTU37	0.0269	0.0465	0.5788	1.55	Tandem	Purifying
5	CcGSTU38	CcGSTU40	0.0728	0.2424	0.3003	8.08	Tandem	Purifying
6	CcGSTF5	CcGSTF6	0.1106	0.9951	0.1112	33.17	Tandem	Purifying
7	CcGSTF6	CcGSTF7	0.0602	0.2415	0.2493	8.05	Tandem	Purifying
8	CcGSTL3	CcGSTL4	0.0514	0.1415	0.3631	4.72	Tandem	Purifying
9	CcGSTL3	CcGSTL5	0.1305	0.282	0.4627	9.40	Tandem	Purifying
10	CcGSTL4	CcGSTL5	0.0907	0.2131	0.4258	7.10	Tandem	Purifying
11	CcGSTU14	CcGSTU41	0.2114	0.3569	0.5923	11.90	Segmental	Purifying
12	CcGSTU20	CcGSTU41	0.1671	0.1193	1.4004	3.98	Segmental	Positive
13	CcGSTU34	CcGSTU43	0.0632	0.0739	0.8546	2.46	Segmental	Purifying
14	CcGSTU36	CcGSTU38	0.0741	0.2869	0.2581	9.56	Segmental	Purifying
15	CcGSTU36	CcGSTU40	0.0971	0.3835	0.2532	12.78	Segmental	Purifying
16	CcGSTU38	CcGSTU37	0.0623	0.2949	0.2114	9.83	Segmental	Purifying
17	CcGSTU40	CcGSTU37	0.0853	0.3675	0.2321	12.25	Segmental	Purifying
18	CcGSTL2	CcGSTL3	0.1275	0.4873	0.2617	16.24	Segmental	Purifying
19	CcEF1G1	CcEF1G2	0.0596	0.7272	0.0819	24.24	Segmental	Purifying

Table 3

S. No.	Safener	Lowest binding energy (GSH), kcal/mol
1	Benoxacor	-3.41
2	Dichlormid	-5.03
3	Dietholate	-4.73
4	Fenclorim	-5.44
5	Flurazole	-5.33
6	Fluxofenim	-5.17
7	Oxabetrinil	-5.02
8	Triapenthenol	-5.48

No competing interests reported.

CajanusSupplementarydata.docx
