Sequencing of the T. coccineum genome
Paired-end (PE) libraries and mate-pair (MP) libraries (with 3-, 5- and 8-kb insert sizes; MP-3kb, MP-5kb, and MP-8kb, respectively) of the T. coccineum genome were generated and sequenced using Illumina NovaSeq 6000 instruments. MiSeq (MS) libraries were generated and sequenced using the Illumina MiSeq system. For long reads, a PacBio (PB) library was generated and sequenced using the PacBio Sequel II system. The total base counts of the sequence reads for PE, MP-3kb, MP-5kb, MP-8kb, MS, and PB amounted to 854 Gb, 99 Gb, 108 Gb, 109 Gb, 27 Gb, and 93 Gb, respectively (Table 1).
Table 1
Statistics of sequence reads
Library | Insert size (bp) | Read length (bases) | Number of reads | Total read length (bases) |
PE | 350 | 151 | 5,732,398,372 | 854,270,829,961 |
MP-3kb | 3000 | 151 | 698,859,570 | 99,096,491,543 |
MP-5kb | 5000 | 151 | 750,513,382 | 107,931,325,410 |
MP-8kb | 8000 | 151 | 762,709,822 | 109,359,355,641 |
MS | 550 | 301 | 97,731,712 | 26,503,089,921 |
PB | | Ave. 10,738 | 8,670,092 | 93,100,193,428 |
Read lengths for Illumina NGS reads (PE, MP, MS) are given as maximum lengths; the read length for PacBio (PB) is given as the average. PE: paired-end; MP: mate-pair; MS: MiSeq; PB: PacBio; NGS: next-generation sequencing.
Size estimation of the T. coccineum genome
Prior to genomic DNA assembly, we estimated the 1C DNA content of T. coccineum by flow cytometry, using the Chrysanthemum seticuspe genome (cultivar Gojo-0, 3 pg/1C)5 as an internal standard. The estimated T. coccineum size was 9.4 pg/1C (Supplemental Fig. 1A), which corresponds to 9.4 Gb. This size is approximately 1.6 times larger than the 5.8 pg/1C reported previously on the basis of Feulgen densitometry6. To further validate the 9.4 Gb genome size, we performed a k-mer spectrogram analysis of the Illumina short reads of the T. coccineum genome (Supplemental Fig. 1B) using Jellyfish7. The k-mer spectrogram showed two major distributions: one with its maximum at a coverage of 1, and a multimodal distribution with its maximum at a coverage of 44. Because the distribution peaking at coverage = 1 was attributed to sequencing errors, we treated k-mers with coverage ≥ 11, the minimum between the two distributions, as derived from the genome itself. In accordance with previous studies8, the genome size estimated from the k-mers with coverage ≥ 11 (peak coverage = 44) was 9.8 Gb. These analyses led to the conclusion that the T. coccineum used in the present study has a genome size of approximately 9 Gb.
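The k-mer-based estimate above can be illustrated with a minimal sketch. It assumes one common formulation, in which the genome size is the total number of k-mer occurrences at or above the error cutoff divided by the peak coverage; the exact procedure of the cited studies may differ. The function names and the toy histogram are hypothetical, and the input is assumed to follow the two-column (coverage, number of distinct k-mers) layout written by `jellyfish histo`.

```python
def parse_histo(text):
    """Parse two-column 'coverage count' lines into (coverage, count) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            rows.append((int(parts[0]), int(parts[1])))
    return rows


def estimate_genome_size(histogram, min_coverage):
    """Estimate genome size as (total k-mer occurrences) / (peak coverage).

    Only k-mers with coverage >= min_coverage are used, so that the
    low-coverage error peak is excluded (an assumed, simplified procedure).
    """
    kept = [(cov, n) for cov, n in histogram if cov >= min_coverage]
    if not kept:
        raise ValueError("no histogram rows at or above min_coverage")
    # Peak coverage: the coverage value with the most distinct k-mers.
    peak_coverage = max(kept, key=lambda row: row[1])[0]
    # Each distinct k-mer seen `cov` times contributes `cov` occurrences.
    total_kmers = sum(cov * n for cov, n in kept)
    return total_kmers / peak_coverage, peak_coverage


# Toy histogram (hypothetical numbers): an error peak at coverage 1
# and a genomic peak at coverage 5.
toy_histo = """\
1 900
2 50
3 10
4 40
5 100
6 40
"""

size, peak = estimate_genome_size(parse_histo(toy_histo), min_coverage=3)
print(f"peak coverage = {peak}, estimated size = {size:.0f} k-mers")
```

For the data reported here, the corresponding inputs would be a cutoff of 11 and an observed peak coverage of 44.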
Sequence assembly and annotation of the T. coccineum genome
The reads obtained from next-generation sequencing (NGS) were subjected to contig assembly and scaffolding to reconstruct the genome sequence (Fig. 2). A collection of 6,500,576 contigs with a total length of 8.57 Gb (Table 2) was constructed by assembling PE reads and MS reads using SOAPdenovo9. The assembled contigs were then scaffolded with PB and MP reads using SSPACE10,11, which concatenates contig sequences, and the Gapfiller12 and TGS-Gapcloser programs13, which fill the inter-contig unknown bases (‘N’s) in scaffolds with ‘A/T/G/C’ (Fig. 2), as described in our previous report. Since the accuracy of gap-filled sequences depends on PB reads, which exhibit lower sequence accuracy than PE reads, the scaffold sequences were polished using POLCA14. The total length of the resultant scaffolds was 9.46 Gb, which corresponded well with the flow cytometry-estimated genome size for T. coccineum (Supplemental Fig. 1). The N50 value of the scaffolds was 27.8 kb, and the maximum length of the scaffolds was 331 kb (Table 2). Subsequently, the draft genome was subjected to analysis by AUGUSTUS, resulting in the prediction of 1,582,136 putative genes.
Table 2
Statistics of genome assembly.
| Contigs | Scaffolds |
Total number of sequence fragments | 6,500,576 | 2,836,647 |
Total length (bp) | 8,565,698,618 | 9,463,677,832 |
N50 (bp) | 8,465 | 27,784 |
Length of longest sequence (bp) | 149,916 | 331,286 |
GC content (%) | 34.9 | 35.1 |
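For readers unfamiliar with the N50 statistic reported in Table 2, the following minimal sketch shows the standard definition: the length of the sequence at which the cumulative length, counted from the longest sequence downward, first reaches half of the total assembly length. The function name and the toy lengths are hypothetical and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to at least
        # half of the total length defines the N50.
        if running >= half_total:
            return length


# Toy example (hypothetical contig lengths in bp).
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))
```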
Completeness of the draft genome sequences was evaluated using BUSCO16, which counts complete (C), fragmented (F), and missing (M) conserved genes in genome sequences. Sequence analysis of 1614 conserved core plant genes confirmed that 97.8% of conserved genes (92.7% as complete and 5.1% as fragmented) were present in the T. coccineum genome assembly (Table 3). These scores indicated quality as high as that obtained for T. cinerariifolium2, and we therefore used the T. coccineum draft genome for subsequent analysis.
Table 3
Annotation statistics for draft genome.
Number of predicted genes | 1,582,136 |
BUSCO v5 | C: 92.7% (Single: 70.8%, Duplicated: 21.9%) F: 5.1% M: 2.2% |
Number of predicted TEs | 772,794 |
Number of predicted genes encoding products with known protein signatures | 103,680 |
*C: percentage of full-length conserved genes in BUSCO notation; F: percentage of fragmented conserved genes in BUSCO notation; M: percentage of missing genes in BUSCO notation; TE: transposable element. |
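The BUSCO notation in Table 3 is internally consistent: the single-copy and duplicated fractions sum to the complete fraction, and the complete, fragmented, and missing fractions sum to 100%. A minimal sketch of this check follows; the parser and its name are hypothetical and handle only the summary string as written in Table 3.

```python
import re


def parse_busco(summary):
    """Extract percentage values from a BUSCO-style summary string."""
    pattern = re.compile(r"(C|Single|Duplicated|F|M): ([\d.]+)%")
    return {key: float(value) for key, value in pattern.findall(summary)}


# Summary string copied from Table 3.
table3_summary = "C: 92.7% (Single: 70.8%, Duplicated: 21.9%) F: 5.1% M: 2.2%"
busco = parse_busco(table3_summary)

present = busco["C"] + busco["F"]  # complete + fragmented
print(f"present (C + F) = {present:.1f}%")
```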
Because transposable elements (TEs) typically outnumber functional genes in plant genomes, including that of T. cinerariifolium2, we first analyzed the TE component of the assembled T. coccineum genome. TEs were detected and annotated using hmmpfam against the Gypsy database (GyDB)17, revealing the presence of 772,794 TEs. The value of 82,212 TEs/Gb in the T. coccineum genome was slightly larger than the 73,957 TEs/Gb observed in the T. cinerariifolium genome2. Furthermore, the predicted genes were subjected to InterProScan to provide high-confidence annotation, revealing the presence of 103,680 putative genes encoding products that exhibited known protein signatures. Thus, a high-quality 9.4-Gb T. coccineum draft genome was assembled from 854 Gb of PE reads, 316 Gb of MP reads, 26.5 Gb of MS reads, and 93.1 Gb of PB reads, and was shown to include a total of 772,794 TEs and 103,680 plausible genes.
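The TE densities compared above are simple ratios. The sketch below reproduces the T. coccineum value under the assumption, inferred from the reported figure rather than stated in the text, that the divisor is the 9.4-Gb genome size; the function name is hypothetical.

```python
def tes_per_gb(n_tes, genome_size_gb):
    """Return TE density as the number of TEs per gigabase."""
    return n_tes / genome_size_gb


# 772,794 TEs and an assumed divisor of 9.4 Gb for T. coccineum.
tco_density = tes_per_gb(772_794, 9.4)

# Density reported in the text for T. cinerariifolium.
tci_density = 73_957

ratio = tco_density / tci_density
print(f"Tco: {tco_density:.0f} TEs/Gb, Tco/Tci ratio = {ratio:.2f}")
```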
Inter-genus comparative analysis of TE classification
We analyzed the ratio of each TE clade against all TE regions in the genomes of T. coccineum, T. cinerariifolium, C. seticuspe, Artemisia annua, Helianthus annuus, Nicotiana tabacum, Oryza sativa, and Arabidopsis thaliana. T. cinerariifolium, C. seticuspe, A. annua, and H. annuus (which belong to the Asteraceae family), and N. tabacum, O. sativa, and A. thaliana (which are model organisms) were used as described in the previous study2. The top-5 clade ratios of each plant are shown in Table 4. In T. coccineum, sire was the most abundant TE clade, as was also observed in three of the other Asteraceae plants (T. cinerariifolium, C. seticuspe, and A. annua). In the T. cinerariifolium genome, the second most abundant clade was athila, followed (in order) by del, oryco, and lentiviridae; in the T. coccineum genome, the second most abundant clade was del, followed (in order) by athila, oryco, and tork. These results suggested that the del- and tork-clade TEs multiplied after evolutionary divergence of T. cinerariifolium and T. coccineum from a common ancestor.
Table 4
The top-5 most-abundant TE clades in each species.
Rank | Tco | Tci | Cs | Aa | Ha | Nt | Os | At |
1 | sire (25.7) | sire (33.0) | sire (32.0) | sire (21.8) | del (37.7) | del (40.4) | tat (11.4) | athila (9.54) |
2 | del (15.3) | athila (17.0) | athila (10.9) | athila (19.6) | sire (9.85) | tat (20.5) | retroviridae (8.97) | retroviridae (4.89) |
3 | athila (12.5) | del (12.0) | oryco (5.11) | del (6.57) | lentiviridae (8.72) | athila (9.87) | del (8.39) | caulimovirus (4.15) |
4 | oryco (7.25) | oryco (6.34) | lentiviridae (5.06) | oryco (4.59) | tat (6.76) | sire (3.02) | tork (4.73) | badnavirus (4.05) |
5 | tork (4.70) | lentiviridae (4.92) | del (5.03) | tork (4.01) | athila (5.17) | tork (2.80) | alpharetroviridae (4.66) | tork (3.08) |
*Parenthesized numbers indicate the ratio (%) of each clade against total TE regions in that species. Tco: T. coccineum; Tci: T. cinerariifolium; Cs: C. seticuspe; Aa: A. annua; Ha: H. annuus; Nt: N. tabacum; Os: O. sativa; At: A. thaliana. |
To examine whether del- and tork-clade TEs had multiplied in a common ancestor of the Asteraceae or independently in individual genera, we estimated molecular phylogenetic trees of the reverse transcriptase (RT) domains encoded by the del and tork sequences and evaluated the number of co-clustered genes in single-genus clusters, as described in the previous study2. In these molecular phylogenetic analyses, TEs that multiplied in a common ancestor are positioned in orthologous clusters, whereas TEs that multiplied after divergence from a common ancestor are positioned in clusters containing TEs from a single plant species (multiplied clusters). The phylogenetic analysis revealed that 67%, 62%, 73%, 68%, and 86% of del TEs constituted multiplied clusters for T. coccineum, T. cinerariifolium, C. seticuspe, A. annua, and H. annuus, respectively (Fig. 3A). Likewise, 57%, 37%, 54%, 38%, and 71% of tork TEs were shown to be multiplied in the respective organisms (Fig. 3B). These results indicated that more than half of the del TEs and more than one-third of the tork TEs were multiplied in the individual genera, whereas the remaining TEs were conserved as common-ancestor TEs within Asteraceae. Collectively, these results suggested that most of the del and tork TEs were multiplied in the individual lineages of the respective Asteraceae genera, giving rise to the major TEs in T. coccineum.
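The percentages above are, in essence, the fraction of each species' TEs that fall into single-species clusters. The following sketch illustrates that bookkeeping under simplifying assumptions: clusters are taken as already defined (here as plain lists, whereas the study derived them from phylogenetic trees), and a cluster counts as multiplied when it contains at least two members, all from one species. The function name, the `min_size` parameter, and the toy clusters are hypothetical.

```python
from collections import defaultdict


def multiplied_fraction(clusters, min_size=2):
    """Return, per species, the fraction of TEs in single-species clusters.

    clusters: list of clusters, each a list of (species, te_id) tuples.
    """
    total = defaultdict(int)
    multiplied = defaultdict(int)
    for cluster in clusters:
        species_in_cluster = {species for species, _ in cluster}
        is_multiplied = len(species_in_cluster) == 1 and len(cluster) >= min_size
        for species, _ in cluster:
            total[species] += 1
            if is_multiplied:
                multiplied[species] += 1
    return {species: multiplied[species] / total[species] for species in total}


# Toy clusters (hypothetical identifiers).
toy_clusters = [
    [("Tco", "te1"), ("Tco", "te2"), ("Tco", "te3")],  # single-species cluster
    [("Tco", "te4"), ("Tci", "te5")],                  # orthologous cluster
    [("Tci", "te6"), ("Tci", "te7")],                  # single-species cluster
    [("Tci", "te8"), ("Cs", "te9")],                   # orthologous cluster
]

fractions = multiplied_fraction(toy_clusters)
for species, fraction in sorted(fractions.items()):
    print(f"{species}: {fraction:.0%} in multiplied clusters")
```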
Pyrethrin-related enzymes encoded in the T. coccineum genome
BLASTP [25] searches detected T. coccineum predicted proteins with high homology to nine known T. cinerariifolium pyrethrin biosynthesis-related proteins (TciLOX1 [18], TciADH2 [19], TciALDH1 [19], TciCDS [20], TciGLIP [21], TciJMH [22], TciPYS [23], TciCCH [24], and TciCCMT [24]) (Table 5, Supplemental Fig. 2), indicating that a complete set of known pyrethrin-related enzymes is conserved in the T. coccineum genome.
Table 5
T. coccineum genome-encoded proteins corresponding to known pyrethrin biosynthesis-related proteins.
Known pyrethrin related enzymes | Corresponding proteins of T. coccineum | Protein sequence similarity |
TciLOX1 | Tco_0863779 | Identities = 847/861 (98%), Positives = 853/861 (99%), Gaps = 0/861 (0%) |
TciADH2 | Tco_0487905 | Identities = 340/378 (90%), Positives = 359/378 (95%), Gaps = 2/378 (1%) |
TciALDH1 | Tco_0682217 | Identities = 448/499 (90%), Positives = 471/499 (94%), Gaps = 1/499 (0%) |
TciCDS | Tco_1315810 | Identities = 358/395 (91%), Positives = 374/395 (95%), Gaps = 0/395 (0%) |
TciGLIP | Tco_1108878 | Identities = 337/365 (92%), Positives = 348/365 (95%), Gaps = 0/365 (0%) |
TciJMH | Tco_0572988 | Identities = 450/512 (88%), Positives = 479/512 (94%), Gaps = 2/512 (0%) |
TciPYS | Tco_1240348 | Identities = 465/488 (95%), Positives = 475/488 (97%), Gaps = 0/488 (0%) |
TciCCH | Tco_0360514 | Identities = 470/498 (94%), Positives = 484/498 (97%), Gaps = 1/498 (0%) |
TciCCMT | Tco_1190813 | Identities = 358/374 (96%), Positives = 361/374 (97%), Gaps = 5/374 (1%) |
The “Protein sequence similarity” column reports the BLASTP output obtained with each known pyrethrin-related enzyme as the query. Tci: T. cinerariifolium; Tco: T. coccineum; ADH2: alcohol dehydrogenase 2; ALDH1: aldehyde dehydrogenase 1; CCMT: 10-carboxychrysanthemic acid 10-methyltransferase; CDS: chrysanthemyl diphosphate synthase; CCH: chrysanthemol 10-hydroxylase; GLIP: GDSL (Gly-Asp-Ser-Leu motif) lipase; JMH: jasmone hydroxylase; LOX1: 13-lipoxygenase; PYS: pyrethrolone synthase.
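The similarity strings in Table 5 follow the BLASTP pairwise-output pattern of "matches/alignment length". A minimal sketch for recovering unrounded percentages from such a string is given below; the function names are hypothetical, and the example row is the TciLOX1 entry from Table 5.

```python
import re

FIELD = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+)")


def parse_similarity(text):
    """Return {'identities': (matches, length), ...} from a BLASTP summary."""
    return {
        name.lower(): (int(matches), int(length))
        for name, matches, length in FIELD.findall(text)
    }


def percent(pair):
    """Convert a (matches, length) pair into a percentage."""
    matches, length = pair
    return 100.0 * matches / length


# TciLOX1 row of Table 5.
row = "Identities = 847/861 (98%), Positives = 853/861 (99%), Gaps = 0/861 (0%)"
parsed = parse_similarity(row)
identity_pct = percent(parsed["identities"])
print(f"identity = {identity_pct:.2f}%")
```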
Synteny analysis of genes encoding pyrethrin-related enzymes
The distribution of genes within the scaffolds that included loci encoding proteins corresponding to TciLOX1, TciADH2, TciALDH1, TciCDS, TciGLIP, TciJMH, TciPYS, TciCCH, and TciCCMT was analyzed using the GenomeJack software program. All genes were located on separate scaffolds, and TEs were located in the flanking regions of the genes encoding all pyrethrin-related enzymes, with the exception of the genes encoding TcoCCMT and TcoGLIP. The gene encoding the Tco_1190812 protein, which contains a Jacalin-like lectin domain, was located upstream of the gene encoding the TcoCCMT protein (Fig. 4A). A BLASTP search using the Tco_1190812 sequence as a query detected a predicted protein with sequence similarity (E-value of 3 × 10^−93; 92.72% identity) to a segment of an Artemisia annua mannose-binding lectin (accession No. PWA73033.1). In the T. cinerariifolium genome, a gene encoding a corresponding Jacalin-like lectin (accession No. GEW32189.1) was also found upstream of the locus encoding TciCCMT, suggesting that this synteny is conserved.
TciGLIP (i.e., the T. cinerariifolium GDSL (Gly-Asp-Ser-Leu motif) lipase) is the key enzyme in the final esterification of pyrethrin biosynthesis21. Syntenic analysis showed that open reading frames (ORFs) encoding putative GLIPs are present in the regions downstream of both the TciGLIP- and TcoGLIP-encoding genes. However, the T. cinerariifolium glutathione S-transferase-encoding gene (accession No. GEU71427.1) positioned upstream of TciGLIP and the hypothetical protein-coding gene positioned downstream of tandem GLIP-encoding genes in T. cinerariifolium are replaced by TEs in the T. coccineum genome (Fig. 4B), suggesting that this tandem GLIP-encoding locus translocated after the divergence of T. coccineum and T. cinerariifolium. Although the transcriptional regulatory mechanism of the TciGLIP gene is yet to be determined, this difference in the flanking region of these GLIP-encoding genes provides a clue to investigating the mechanisms regulating the possible differential expression of the genes encoding TciGLIP and TcoGLIP.
Functional annotation of the T. coccineum genes and inter-genus comparative analysis
Next, we investigated the multiplication ratios of protein superfamilies in T. coccineum compared with those in other species, including T. cinerariifolium, as described in our previous study. In brief, the predicted protein data sets of T. coccineum, T. cinerariifolium, C. seticuspe, A. annua, H. annuus, N. tabacum, O. sativa, and A. thaliana were subjected to analysis using InterProScan, and multiplication odds scores were calculated for each superfamily. A positive value for the multiplication odds score indicates that a genus possesses a higher number of multiplied genes in a given superfamily than other genera.
The highest and lowest multiplication odds scores for the biodefense-, signaling-, and metabolism-related superfamilies were compared with those of other plants, including T. cinerariifolium (Tables 6 and 7, respectively). To further compare T. coccineum with T. cinerariifolium, the multiplication odds scores for the superfamilies that were identified in the previous study2 but are not listed in Table 6 or Table 7 are shown in Table 8.
Table 6
Superfamilies with highest multiplication odds scores in T. coccineum.
Category | IPR ID | Superfamily name | Tco | Tci | Cs | Aa | Ha | Nt | Os | At |
Biodefense | IPR036041 | Ribosome-inactivating protein | 1.96 (159) | 1.29 (98) | -1.81 (7) | -1.00 (16) | -3.07 (0) | -3.07 (0) | -0.94 (17) | -3.07 (0) |
Metabolism | IPR005848 | Urease, alpha subunit | 2.36 (108) | -0.14 (15) | -1.87 (1) | -1.46 (3) | -1.87 (1) | -0.87 (7) | -2.14 (0) | -1.87 (1) |
Metabolism | IPR036226 | Lipoxygenase, C-terminal domain | 1.86 (232) | 0.48 (86) | -0.22 (51) | -0.82 (32) | -1.12 (25) | -0.67 (36) | -1.86 (13) | -2.44 (7) |
Metabolism | IPR033966 | RuBisCO | 1.60 (42) | -0.25 (8) | 0.05 (11) | -0.15 (9) | -1.37 (1) | -0.95 (3) | -0.25 (8) | -1.15 (2) |
Metabolism | IPR032466 | Metal-dependent hydrolase | 1.48 (166) | 0.69 (94) | -1.13 (23) | -0.89 (28) | -0.32 (44) | -0.05 (54) | -1.61 (15) | -0.98 (26) |
Metabolism | IPR036849 | Enolase-like, C-terminal domain | 1.38 (71) | 0.91 (50) | -0.62 (14) | -0.55 (15) | -0.78 (12) | -0.29 (19) | -1.41 (6) | -1.29 (7) |
Metabolism | IPR036396 | Cytochrome P450 | 0.90 (1220) | 0.19 (745) | 0.16 (732) | 0.07 (688) | -0.20 (568) | -0.12 (600) | -1.05 (314) | -0.85 (361) |
Signaling | IPR035983 | HECT, E3 ligase catalytic domain | 1.22 (95) | 0.84 (72) | 0.10 (41) | -0.52 (25) | -0.90 (18) | -0.21 (32) | -1.84 (7) | -1.25 (13) |
Parenthesized numbers indicate the number of genes categorized in each superfamily. |
Tco: T. coccineum; Tci: T. cinerariifolium; Cs: C. seticuspe; Aa: A. annua; Ha: H. annuus; Nt: N. tabacum; Os: O. sativa; At: A. thaliana. |
Table 7
Superfamilies with lowest multiplication odds scores in T. coccineum.
Category | IPR ID | Superfamily name | Tco | Tci | Cs | Aa | Ha | Nt | Os | At |
Signaling | IPR039512 | RCHY1, zinc-ribbon | -1.18 (3) | -0.48 (8) | -0.09 (12) | -0.09 (12) | -0.01 (13) | 1.28 (39) | -0.72 (6) | -0.09 (12) |
*Parenthesized numbers indicate the number of genes categorized in each superfamily. |
Tco: T. coccineum; Tci: T. cinerariifolium; Cs: C. seticuspe; Aa: A. annua; Ha: H. annuus; Nt: N. tabacum; Os: O. sativa; At: A. thaliana. |
Table 8
Superfamilies with characteristic odds scores in T. cinerariifolium genome
Category | IPR ID | Superfamily name | Tco | Tci | Cs | Aa | Ha | Nt | Os | At |
Biodefense | IPR035992 | Ricin B-like lectins | 0.81 (44) | 1.41 (69) | -0.34 (17) | -0.05 (22) | -1.22 (7) | -0.80 (11) | -1.10 (8) | -1.48 (5) |
Biodefense | IPR036861 | Endochitinase-like | -0.13 (7) | -1.13 (1) | -0.39 (5) | -0.25 (6) | 0.53 (14) | 0.29 (11) | 0.09 (9) | 0.37 (12) |
Signaling | IPR036097 | Signal transduction histidine kinase, dimerization/phosphoacceptor domain | -0.11 (32) | 1.41 (101) | -0.62 (21) | -0.37 (26) | -0.28 (28) | 0.35 (46) | -1.74 (7) | -0.74 (19) |
Signaling | IPR024792 | Rho GDP-dissociation inhibitor domain | 0.48 (18) | 1.24 (34) | -0.14 (10) | -0.58 (6) | -0.34 (8) | -0.14 (10) | -1.04 (3) | -1.04 (3) |
Metabolism | IPR012347 | Ferritin-like | 0.72 (22) | 1.29 (35) | -0.03 (11) | -0.71 (5) | -0.57 (6) | -0.86 (4) | -1.23 (2) | -0.57 (6) |
Metabolism | IPR036909 | Cytochrome c-like domain | 0.40 (21) | 1.16 (39) | -0.50 (9) | -0.84 (6) | -0.22 (12) | -0.16 (17) | -0.82 (7) | -0.82 (7) |
Metabolism | IPR037069 | Acyl-CoA dehydrogenase/oxidase, N-terminal domain | 0.45 (22) | 1.05 (36) | -0.60 (8) | -0.30 (11) | -0.22 (12) | 0.22 (18) | -0.98 (5) | -0.84 (6) |
*Parenthesized numbers indicate the number of genes categorized in each superfamily. |
Tco: T. coccineum; Tci: T. cinerariifolium; Cs: C. seticuspe; Aa: A. annua; Ha: H. annuus; Nt: N. tabacum; Os: O. sativa; At: A. thaliana. |
Among biodefense-related superfamilies, genes encoding proteins with the “Ribosome-inactivating protein (RIP)” (IPR036041) domain showed multiplication in the T. coccineum genome, exhibiting a multiplication score of 1.96 (Table 6). In the previous study, this superfamily was also multiplied in T. cinerariifolium. Although the odds score of “Ribosome-inactivating protein (RIP)” (IPR036041) in the T. coccineum genome is 1.5 times higher than that in T. cinerariifolium (Table 6), the odds score of “Ricin B-like lectins” (IPR035992) in the T. coccineum genome is less than that of T. cinerariifolium (Table 8). RIPs, including ricin, show high toxicity to a wide range of species, including insects, bacteria, and viruses, serving as biodefense molecules for the producing plant26. RIPs are categorized into type I and type II due to the absence or presence (respectively) of the ricin B lectin domain27. The ricin B lectin domain is involved in internalization via binding of target cell glycans; therefore, type-II RIPs have higher toxicity than type-I RIPs. Taken together, these results demonstrated that genes encoding higher-toxicity type-II RIPs are multiplied (i.e., more abundant) in the T. cinerariifolium genome compared with the T. coccineum genome, suggesting that T. coccineum may have been subjected to, or may be more sensitive to, natural enemies under wild conditions, compared with T. cinerariifolium. In the previous study, a gene encoding a putative insecticidal type-II RIP Tci_399175 (accession No. GEY27201.1) that showed sequence similarity to the Sambucus nigra insecticidal RIP SNA-I (S. nigra agglutinin-I, accession No. O22415.1)28, was found in the T. cinerariifolium genome2. A BLASTP search of the T. coccineum genome with the SNA-I sequence returned Tco_1336120. An alignment of the RICIN domain, which is important for identifying target cells, confirmed that this putative insecticidal RIP is also encoded in the T. coccineum genome (Supplemental Fig. 3). 
These results indicated that a SNA-I-like insecticidal RIP is conserved in both Tanacetum species. In combination, these comparative analyses of the RIP genes verified that type-I and type-II RIPs are abundant in the T. coccineum and T. cinerariifolium genomes, respectively, suggesting distinct RIP-associated defense strategies between these two plants.
“Endochitinase-like superfamily” (IPR036861), which plays a pivotal role in defense against fungal pathogens, is present in the T. coccineum genome at levels similar to those seen in other genera, and more abundant than that seen in T. cinerariifolium (Table 8). While T. cinerariifolium is native to a region with a dry environment, T. coccineum is native to a region with a humid environment, indicating that these plants have been exposed to distinct natural enemies. These observations suggested that distinct defense strategies may have evolved in the different lineages leading to T. cinerariifolium and T. coccineum, reflecting the differing areas of origin of the two species.
The metabolism-related superfamilies showed the highest gene multiplication values in T. coccineum, with genes encoding “Urease, alpha subunit” (IPR005848), “Lipoxygenase, C-terminal domain” (IPR036226), “RuBisCO” (IPR033966), “Metal-dependent hydrolase” (IPR032466), “Enolase-like, C-terminal domain” (IPR036849), and “Cytochrome P450” (IPR036396) exhibiting multiplication scores of 2.36, 1.86, 1.60, 1.48, 1.38, and 0.90, respectively (Table 6). In particular, genes encoding metalloproteins such as lipoxygenases, metal-dependent hydrolases, and cytochrome P450s were multiplied, as was observed for the T. cinerariifolium genome2. Since some pyrethrin-related proteins belong to the cytochrome P450 or lipoxygenase superfamilies, the corresponding genes may have multiplied in the common lineage shared by T. coccineum and T. cinerariifolium, given that both species have the ability to synthesize pyrethrins. The cytochrome P450 superfamily-encoding genes were 1.6 times more numerous in the T. coccineum genome than in the T. cinerariifolium genome. Molecular phylogenetic analysis showed that 57% of the T. coccineum cytochrome P450s were not included in orthologous gene clusters but constituted single-genus clusters (Fig. 5), suggesting that some of the orthologous cytochrome P450s were multiplied in the T. coccineum-specific lineage. These results support the view that the cytochrome P450s of T. cinerariifolium and T. coccineum might have multiplied during the respective evolutionary processes and then acquired an ability to produce species-specific plant specialized metabolites, including pyrethrins. Moreover, these results suggested that species-specific secondary metabolites may be more abundant in T. coccineum than in T. cinerariifolium. Further investigation of T. coccineum secondary metabolites is needed.
While high multiplication of proteins harboring the “HECT, E3 ligase catalytic domain” (IPR035983) was observed in T. coccineum (Table 6), as is the case in T. cinerariifolium, low multiplication was seen for proteins containing “RCHY1, zinc-ribbon” (IPR039512), a domain that is contained in the RING-finger-type E3 ubiquitin ligases (Table 7). These results suggested that genes encoding the HECT-type E3 ubiquitin ligases are multiplied in the T. coccineum genome, while genes encoding the RING-finger-type E3 ubiquitin ligases are not. Investigation of the biological significance of this apparent imbalance in the amplification of E3 ubiquitin ligase-encoding genes awaits further study.
Similarly, genes encoding proteins containing the “Signal transduction histidine kinase, dimerization/phosphoacceptor domain” (IPR036097) are multiplied in the T. cinerariifolium genome, but not in the T. coccineum genome (Table 8). In planta, histidine kinases are involved in responding to environmental stimuli, including sunlight, plant hormones, and ethylene. A. thaliana ethylene receptor 1 (AtETR1) is a typical gas-induced histidine kinase that possesses both a HATPase (histidine kinase-like ATPase) domain and a REC (phosphoacceptor receiver) domain29. We surveyed the histidine kinase-encoding genes of T. coccineum and T. cinerariifolium for the presence of HATPase and REC domains; the numbers of genes containing each domain are presented as Venn diagrams in Fig. 6A. The numbers of genes encoding both HATPase and REC domains were 13 and 38 for T. coccineum and T. cinerariifolium, respectively. A molecular phylogenetic tree of the predicted histidine kinase proteins is presented in Fig. 6B; the AtETR1 protein is indicated, as are clusters for the 5 and 4 paralogs found in T. cinerariifolium and T. coccineum, respectively. Comparative analysis detected not only orthologous clusters but also the apparent multiplication of a T. cinerariifolium-specific single-genus cluster (Fig. 6B, green).
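The Venn-diagram counts described above amount to classifying each gene by which of the two domains it carries. A minimal sketch of that classification follows, assuming the domain annotations are available as a mapping from gene identifier to a set of domain labels; the function name, gene identifiers, and the placeholder label "other" are hypothetical.

```python
def count_domain_combinations(gene_domains, first="HATPase", second="REC"):
    """Count genes carrying both domains, or only one of the two."""
    counts = {"both": 0, f"{first}_only": 0, f"{second}_only": 0}
    for domains in gene_domains.values():
        has_first = first in domains
        has_second = second in domains
        if has_first and has_second:
            counts["both"] += 1
        elif has_first:
            counts[f"{first}_only"] += 1
        elif has_second:
            counts[f"{second}_only"] += 1
    return counts


# Toy annotation table (hypothetical genes and domain sets).
toy_annotations = {
    "gene1": {"HATPase", "REC", "other"},
    "gene2": {"HATPase"},
    "gene3": {"REC"},
    "gene4": {"other"},
    "gene5": {"HATPase", "REC"},
}

venn_counts = count_domain_combinations(toy_annotations)
print(venn_counts)
```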
Our previous study implicated a gas-induced histidine kinase in the volatile organic compound (VOC)-mediated regulation of pyrethrin production in T. cinerariifolium2. The present study also suggested a correlation between the extent of pyrethrin production and the number of histidine kinase proteins in T. cinerariifolium and T. coccineum. These results suggested that T. cinerariifolium has acquired a gas (i.e., VOC)-induced pyrethrin production system via species-specific multiplication of histidine kinase-encoding genes. Investigation of the functional relationship between histidine kinases and VOC-induced pyrethrin production is underway.