Gymnema sylvestre R. Br. (family Asclepiadaceae) is a medicinally important, perennial, woody climber that grows in tropical regions [1, 2]. Commonly known as madhunashini in India, G. sylvestre has long been used to treat diabetes [2, 3]. Besides maintaining blood glucose homeostasis, the leaves of the plant enhance uptake and activities of glucose-utilizing enzymes by insulin-dependent pathways [3]. The plant is also used to treat obesity and other ailments of animals and humans [1]. The medicinal properties of G. sylvestre can mainly be attributed to the presence of dammarane and oleanane classes of triterpene saponins and polyphenols [1, 3].
In recent years, several studies have generated transcriptome data of G. sylvestre to gain insights into the genes regulating various biosynthetic pathways [1, 3, 4]. Meanwhile, viral taxonomy is shifting from the traditional approach to a sequence-based one, which advocates including viral sequences known only from metagenomic data in the International Committee on Taxonomy of Viruses (ICTV) taxonomy scheme in order to characterize the global virome comprehensively [5]. The acceptance of sequence-based viral taxonomy paved the way for data-driven virus discovery studies that identified several putative novel plant viruses in various plant transcriptomes [6, 7, 8, 9, 10, 11, 12]. Thus, in the present study, we explored the transcriptome datasets of G. sylvestre available in the public domain for novel plant viral sequences and identified a putative novel cholivirus.
A total of ten transcriptome datasets of G. sylvestre (belonging to four Bioprojects: PRJNA325591, PRJNA399247, PRJNA491644, PRJNA543558) available in the Sequence Read Archive (SRA) database of the National Center for Biotechnology Information (NCBI) were searched individually for the presence of reads of novel plant viruses using the RNA-dependent RNA polymerase (RdRp) search tool available in Serratus [13] (alignment identity: ≥45%; score: ≥10). Raw reads of all the putative virus-positive libraries were imported directly into the Galaxy Australia server (https://usegalaxy.org.au/) [14] and trimmed using Trimmomatic v 0.36 [15]. Trimmed reads were assembled using rnaviralSPAdes v 3.15.4 [16] and subjected to BLASTn analysis against the genome sequences of related viruses using the blastn tool of NCBI BLAST+ v 2.10.1 [17] in the Galaxy server. The viral contigs obtained were examined for the presence of intact open reading frames (ORFs) using NCBI ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/). Molecular weight estimation and motif prediction were performed using the tools mentioned in [18], while BLASTp analysis of the encoded proteins was performed against the ‘non-redundant’ database of NCBI to obtain the maximum percent identity values at maximum query coverage (qcov) and minimum expectation value (evalue). Putative cleavage sites in the viral polyproteins were predicted by comparison with the known polyprotein sequences of related viruses. Trimmed reads of virus-positive libraries were mapped onto the recovered viral genomes and mean coverage values were obtained as described in [12]. Neighbour-joining (NJ) trees were generated using the Poisson model with 1000 bootstrap replicates after MUSCLE alignment of protein sequences in MEGA7 v 7.0.26 [19], and the sequence identity matrices were generated using the Sequence Demarcation Tool v 1.2 [20].
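As an illustration of the read-mapping step, mean coverage can be approximated as the total number of mapped bases divided by the length of the genome segment. The sketch below is not the procedure of [12], which is not detailed here; the function name `mean_coverage` and the read counts in the example are hypothetical.

```python
def mean_coverage(mapped_read_lengths, genome_length):
    """Approximate mean depth: total mapped bases / genome length.

    Assumption: every read in mapped_read_lengths aligns over its full
    length; soft-clipping and partial alignments are ignored.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(mapped_read_lengths) / genome_length


# Hypothetical example: 1,000 mapped reads of 100 nt on a 6,350 nt segment.
example_coverage = mean_coverage([100] * 1000, 6350)
print(f"mean coverage: {example_coverage:.2f}x")
```

In practice, per-base depth reported by a read mapper would give a more accurate estimate than this read-length approximation.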
Reads of a putative novel virus, tentatively named Gymnema sylvestre virus 1 (GysV1), that shared sequence similarities with known choliviruses were identified in all the libraries of the Bioproject PRJNA399247 through the Serratus RdRp search. The details of each cholivirus-positive library are provided in Table 1. After assembly and BLASTn analysis, the two longest non-redundant viral contigs, of lengths 6.35 kb and 3.98 kb, that shared sequence identities with RNA1 and RNA2 of known choliviruses were regarded as RNA1 and RNA2 of GysV1, respectively.
GysV1 RNA1 (BK062888) encoded a 1929 aa (219.42 kDa) polyprotein with RNA helicase (PF00910) and viral RdRp (PF00680) motifs that shared a maximum sequence identity of 44.82% (at a maximum qcov of 94%) with polyprotein 1 of Dioscorea mosaic associated virus (DMaV), a member of the subgenus Cholivirus. The putative cleavage sites predicted in GysV1 polyprotein 1 are Q(427)/T, Q(957)/G and Q(1186)/G. GysV1 RNA2 (BK062889) encoded a 1132 aa (125.57 kDa) polyprotein with no predicted motif that shared 34.22% sequence identity (at a maximum qcov of 93%) with polyprotein 2 of DMaV. The putative cleavage site predicted in GysV1 polyprotein 2 is K(329)/G (Fig. 1).
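The predicted cleavage sites imply a set of processed products for each polyprotein. A minimal sketch of the coordinate arithmetic follows, assuming that the number in parentheses is the 1-based position of the residue immediately upstream of the cleaved bond (so that Q(427)/T denotes cleavage between residues 427 and 428). The function name `split_polyprotein` is hypothetical, and the resulting products are predictions only.

```python
def split_polyprotein(length, cleavage_after):
    """Return 1-based inclusive (start, end) coordinates of the products.

    cleavage_after lists the position of the residue immediately
    upstream of each predicted cleavage site (an assumed convention).
    """
    sites = sorted(cleavage_after)
    if any(site < 1 or site >= length for site in sites):
        raise ValueError("cleavage sites must lie within the polyprotein")
    starts = [1] + [site + 1 for site in sites]
    ends = sites + [length]
    return list(zip(starts, ends))


# Lengths and sites as reported for GysV1 polyproteins 1 and 2.
polyprotein1_products = split_polyprotein(1929, [427, 957, 1186])
polyprotein2_products = split_polyprotein(1132, [329])

for start, end in polyprotein1_products:
    print(f"polyprotein 1 product: {start}-{end} ({end - start + 1} aa)")
```

Under this convention, polyprotein 1 would yield four products and polyprotein 2 would yield two.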
Phylogenetic analysis based on the conserved protease-polymerase (Pro-Pol) region placed GysV1 in a distinct sub-clade that was related to Ananas comosus secovirus (AcSV) and DMaV (Fig. 2a), whereas GysV1 was placed in a sister clade to DMaV in the polyprotein 2-based phylogenetic tree (Fig. 2b). The sequence identity matrix revealed the maximum identities of GysV1 Pro-Pol (60.80%) and polyprotein 2 (36.10%) sequences with AcSV Pro-Pol and DMaV polyprotein 2 sequences, respectively (Fig. 2c, d).
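To make the identity values concrete, percent identity between two aligned protein sequences can be computed as the proportion of matching residues over the alignment columns in which neither sequence has a gap. The sketch below illustrates that calculation on toy sequences; it is not the Sequence Demarcation Tool's own code, whose handling of alignment and gaps may differ in detail, and the function name `pairwise_identity` is hypothetical.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (not real viral sequences): 3 matches in 4 ungapped columns.
toy_identity = pairwise_identity("MK-LV", "MKALI")
print(f"{toy_identity:.2f}%")
```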
Recently, the genus Sadwavirus under the family Secoviridae was reorganized into three subgenera: Satsumavirus, Stramovirus and Cholivirus [21]. Currently, the subgenus Cholivirus contains three recognized members (chocolate lily virus A, DMaV and pineapple secovirus A) and two putative members (AcSV and pineapple secovirus B) [11, 22]. Based on the species demarcation criteria for the family Secoviridae, namely <80% and <75% amino acid sequence identity for the conserved Pro-Pol and polyprotein 2 sequences, respectively [23], GysV1 can be designated as a new member of the subgenus Cholivirus (genus Sadwavirus). Further studies are needed to understand the biological properties and prevalence of GysV1.
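The demarcation argument reduces to a comparison of two identity values against two thresholds, which can be sketched as follows. The thresholds and identity values are those stated in the text; the function name `meets_new_species_criteria` is hypothetical, and requiring both criteria to be met at once is an assumption made for this illustration.

```python
PRO_POL_THRESHOLD = 80.0       # percent amino acid identity, Pro-Pol region
POLYPROTEIN2_THRESHOLD = 75.0  # percent amino acid identity, polyprotein 2


def meets_new_species_criteria(pro_pol_identity, polyprotein2_identity):
    """True if both maximum identities fall below their thresholds.

    Assumption: both criteria must hold simultaneously.
    """
    return (pro_pol_identity < PRO_POL_THRESHOLD
            and polyprotein2_identity < POLYPROTEIN2_THRESHOLD)


# Maximum identities reported for GysV1 (Fig. 2c, d).
gysv1_is_new = meets_new_species_criteria(60.80, 36.10)
print(gysv1_is_new)
```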