Balanced gene expression of tetraploidy-produced genes
To determine whether tetraploidy-produced genes have similar or divergent expression levels, we inferred 4713 pairs of tetraploidy-produced poplar genes that have grape orthologs. These gene pairs update those inferred in our previous work. The poplar gene pairs and their grape orthologs are all collinear, supporting their origin in the tetraploidy event. We distinguished the two subgenomes produced by the poplar tetraploidy event according to grape chromosome numbers and marked them with purple and green rectangles (Supplementary Figure 1).
We downloaded expression data for five tissues (xylem, phloem, shoot, leaf, and root) and two wood-forming cell types (fiber and vessel) of poplar [17]. Each tissue and cell type has three replicates. In total, we have 24 datasets. For each dataset, we used a paired-sample t-test to compare the expression of duplicated copies, which fall naturally into 19 groups according to the grape chromosome on which their orthologs are located (Table 1).
In each dataset and for each group, the duplicated genes showed largely the same expression levels. Of the total 456 comparisons (24 datasets × 19 groups), only 14 (3.1%) show significantly different expression (P-value < 0.05). Moreover, at a significance level of 0.01, no comparison shows divergent expression.
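The paired comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for the analysis: the function name `paired_t_statistic` and the expression values are hypothetical, and only the t statistic and its degrees of freedom are computed (a statistics package would be needed to convert them into a P-value). The final lines simply reproduce the comparison counts reported in the text.

```python
import math
import statistics


def paired_t_statistic(copy_a, copy_b):
    """Return (t, degrees of freedom) for a paired-sample t-test.

    copy_a[i] and copy_b[i] are the expression values of the two
    duplicated copies of pair i within one dataset and one group.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(copy_a, copy_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical expression values for four duplicated pairs (illustration only).
copy_a = [10, 12, 9, 11]
copy_b = [8, 11, 9, 8]
t, df = paired_t_statistic(copy_a, copy_b)

# Counts taken from the text: 24 datasets x 19 groups, 14 significant tests.
n_comparisons = 24 * 19
fraction_significant = 14 / n_comparisons
print(f"t = {t:.3f}, df = {df}")
print(f"{n_comparisons} comparisons, {100 * fraction_significant:.1f}% significant")
```

Working on the per-pair differences, rather than on the two copies separately, is what makes the test paired: each gene pair serves as its own control.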
Gene expression and gene retention
We checked whether gene expression is related to gene retention. For each of the 19 grape chromosomes, we counted the retained poplar collinear genes and calculated their average expression. A Pearson correlation test showed only a weak, statistically non-significant positive correlation between the retention of collinear genes and the expression level (Figure 1; Pearson coefficient = 0.269, P-value = 0.102).
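As an illustration of the correlation step, the Pearson coefficient between retained gene counts and mean expression can be computed as follows. This is a sketch only: the function name `pearson_r` and the five data points are hypothetical and do not come from Figure 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, one entry per grape chromosome (illustration only).
retained_counts = [310, 255, 198, 402, 276]
mean_expression = [12.4, 9.8, 11.1, 13.0, 10.2]
print(f"r = {pearson_r(retained_counts, mean_expression):.3f}")
```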
KEGG and GO analysis of duplicates
We performed KEGG annotation of all poplar genes (Supplementary Table 1). Of the 4713 tetraploidy-produced duplicate pairs, only 29 have both copies annotated but with different annotations, accounting for about 0.6% of all pairs.
Among the 4713 pairs of poplar genes, we selected the 500 pairs with the greatest difference in expression according to the t-test and performed KEGG enrichment analysis on them (Figure 2). The most significant enrichment was for carbon fixation in photosynthesis, followed by carbon metabolism and fructose and mannose metabolism. All of these pathways are central to basic life-support functions.
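Pathway enrichment of this kind is commonly assessed with an upper-tail hypergeometric test. The sketch below assumes that test; the text does not state which statistic or software was used, and the function name `enrichment_p_value` and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_selected, n_hits):
    """Upper-tail hypergeometric P(X >= n_hits).

    n_background: annotated genes in the background set
    n_in_pathway: background genes annotated to the pathway
    n_selected:   genes in the selected (e.g. most divergent) set
    n_hits:       selected genes annotated to the pathway
    """
    upper = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_pathway, i) * comb(n_background - n_in_pathway, n_selected - i)
        for i in range(n_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts (illustration only, not taken from Figure 2).
p = enrichment_p_value(n_background=1000, n_in_pathway=50, n_selected=100, n_hits=15)
print(f"enrichment P-value = {p:.3g}")
```

Because many pathways are tested at once, such P-values would normally be corrected for multiple testing before being interpreted.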
In the pathway annotation (Figure 3), gene pairs were concentrated in five categories: metabolism, genetic information processing, environmental information processing, cellular processes, and organismal systems. Metabolism contained the most genes: 403 genes distributed across 10 sub-functions, such as carbohydrate metabolism, energy metabolism, and amino acid metabolism. The other categories involve fewer genes: genetic information processing, 99 genes; cellular processes, 39 genes; environmental information processing, 19 genes; and organismal systems, 14 genes.
In the GO analysis, duplicated regions did not show any divergence in gene ontology enrichment. For example, in the poplar duplicated regions orthologous to grape chromosome 2, the preserved collinear genes show no difference in GO term enrichment (Figure 4).
Tetraploidization and gene evolution
A gene family example can help illustrate gene copy number variation, gene loss, and divergent evolutionary rates. Calcium-dependent protein kinases (CDPKs) play crucial roles in regulating plant development and tolerance to various environmental stresses. Gene expression profiling has shown that a number of Populus CDPK genes are differentially expressed across tissues and developmental stages. We therefore downloaded the CDPK gene family sequences of Arabidopsis thaliana and retrieved their homologs in the poplar and grape genomes. These poplar and grape genes were used to construct phylogenetic trees with MEGA (Figure 5). There are 9 CDPK genes in grape and 17 in poplar, a nearly doubled number in poplar relative to grape. All of these genes are collinear within or between genomes, suggesting that the copy number increase in poplar is a direct outcome of its specific tetraploidization. No recent tandem duplication was found.
There is clear evidence of genome fractionation by gene loss. At least three poplar paralogs (those of pt16G00564, pt05G01136, and pt14G01035) were lost after the tetraploidization, and one grape gene orthologous to pt16G01172 and pt06G01013 was lost. There are six subgroups, each containing one grape gene and its two poplar orthologs, which were duplicated in the poplar tetraploidization.
In five of the six subtrees, the grape gene is, as expected, the outgroup to the poplar duplicates. In one subtree, however, a poplar duplicate is the outgroup to the grape gene and the other poplar duplicate, an aberrant topology. This can be explained by an elevated evolutionary rate in the poplar duplicate that became the outgroup.