Library sequencing and assembly
A total of 66.13 Gb of clean data was obtained from nine transcriptome libraries of the true root, lateral roots and “bridge” of A. carmichaelii. There were 77,202,226 reads in the true root, 70,043,540 reads in the “bridge” and 73,419,580 reads in the lateral roots. The Q30 of each sample was 93.21% or higher, and the GC content was 45.22%. A total of 28,185 Unigenes were obtained after assembly, with a total length of 42,159,509 bp. The N50 of the Unigenes was 1,627 bp, and 15,651 Unigenes were longer than 1 kb.
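For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is usually computed: the sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy lengths (bp), not Unigene lengths from this study.
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
print(n50(toy_lengths))
```

With the real assembly, the input would be the list of all 28,185 Unigene lengths.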
Based on the data of the nine samples, genes with an expression value greater than or equal to 0.1 were screened using the union method, and replicates of the same tissue were pooled for comparison. The results showed that 16,478 genes were expressed in the true root, 16,743 in the “bridge” and 17,552 in the lateral roots (Fig. 2). Among them, 14,915 genes were expressed in all tissues, and 1,197 genes were expressed only in the lateral roots. The lateral roots therefore express many genes that are not expressed in the true root or the “bridge”, suggesting that the lateral roots have different growth and development mechanisms.
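The screening and comparison described above can be sketched with plain set operations. This is only an illustration under stated assumptions: "union method" is taken to mean that a gene counts as expressed in a tissue if at least one replicate reaches the 0.1 threshold, and the gene names, tissue keys and expression values below are hypothetical.

```python
THRESHOLD = 0.1  # expression cutoff stated in the text


def expressed_genes(replicates, threshold=THRESHOLD):
    """Union over replicates: keep a gene if any replicate reaches the threshold.

    `replicates` is a list of {gene_id: expression_value} dicts (assumed layout).
    """
    kept = set()
    for rep in replicates:
        kept |= {gene for gene, value in rep.items() if value >= threshold}
    return kept


# Hypothetical toy data: two replicates per tissue, three genes.
toy = {
    "true_root": [{"g1": 5.0, "g2": 0.0, "g3": 0.2},
                  {"g1": 4.1, "g2": 0.05, "g3": 0.0}],
    "bridge": [{"g1": 3.0, "g2": 0.0, "g3": 0.0},
               {"g1": 2.5, "g2": 0.0, "g3": 0.0}],
    "lateral_root": [{"g1": 6.0, "g2": 0.3, "g3": 0.0},
                     {"g1": 5.5, "g2": 0.4, "g3": 0.0}],
}

sets = {tissue: expressed_genes(reps) for tissue, reps in toy.items()}

# Genes expressed in all three tissues (the centre of the Venn diagram).
shared = set.intersection(*sets.values())
# Genes expressed only in lateral roots.
lateral_only = sets["lateral_root"] - sets["true_root"] - sets["bridge"]

print(sorted(shared), sorted(lateral_only))
```

The same intersections and differences yield the counts that a Venn diagram of the three tissues reports.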
Functional annotation and enrichment analysis of expressed genes
To obtain a comprehensive annotation of the A. carmichaelii transcriptome, 36,203 full-length transcripts were searched against seven protein databases, and a total of 28,185 transcripts were annotated. In addition, the 8,018 unannotated unigenes might represent novel A. carmichaelii genes.
BLASTx similarity analysis against the Nr database demonstrated that the A. carmichaelii full-length transcripts were similar to sequences from several plant species (Table 2). Among them, 12,477 (45.96%) transcripts showed significant homology with sequences of Aquilegia coerulea, and 1,215 (4.48%) and 724 (2.67%) transcripts had high similarity with sequences of Macleaya cordata and Nelumbo nucifera, respectively.
Based on the results of Nr annotation, 13,932 unigenes were assigned to 44 functional groups in the GO database, and most unigenes were assigned to more than one functional group. In total, 107,745 hits were detected, with 6,986, 6,977 and 8,093 unigenes in the three main functional categories, i.e., ‘cellular component’, ‘biological process’ and ‘molecular function’, respectively (Fig. 3). In the biological process group, the dominant subgroups were ‘oxidation-reduction process’, ‘translation’ and ‘transmembrane transport’, with 2,770, 1,465 and 1,103 annotated genes, respectively. Among cellular component functions, 6,474, 2,494 and 2,347 annotated genes were classified into ‘integral component of membrane’, ‘nucleus’ and ‘cytosol’, respectively. In the molecular function group, ‘ATP binding’, ‘metal ion binding’ and ‘structural constituent of ribosome’ were the principal GO terms, comprising 3,296, 1,725 and 1,629 annotated genes, respectively. These functional categories represent important activities in plants and are involved in the biosynthesis of metabolites.
A total of 10,735 unigenes annotated by the COG database were functionally classified into 25 molecular families. The five largest categories were “Translation, ribosomal structure and biogenesis” (1,650, 15.54%), “Posttranslational modification, protein turnover, chaperones” (1,183, 11.14%), “General function prediction only” (1,173, 11.05%), “Carbohydrate transport and metabolism” (1,078, 10.15%) and “Energy production and conversion” (861, 8.11%).
KEGG pathway analysis helps to identify functional genes and to understand the functions of genes in biosynthetic pathways. A total of 12,104 unigenes were annotated and assigned to five main categories and 131 biological pathways. The largest pathway was “Ribosome”, containing 1,207 transcripts. Moreover, a number of transcripts were assigned to other significant pathways, such as biosynthesis of amino acids and carbon metabolism.
Enrichment analysis of metabolic pathways of differentially expressed genes
To investigate the variation in transcript abundance and gene expression patterns, we carried out a comparative analysis of the differentially expressed genes (DEGs) of the true root (A), “bridge” (B) and lateral root (C) of A. carmichaelii (A vs. B, A vs. C and B vs. C); the results are displayed in Table 3. The true root vs. lateral root comparison had the most specifically expressed DEGs (1,468), the lateral root vs. “bridge” comparison had fewer (1,248), and the true root vs. “bridge” comparison had fewer still. In addition, 81 genes were differentially expressed in all comparison groups. These results suggest that the biological differences are largest between the true root and the lateral root of A. carmichaelii, and that these genes may play an important role in the metabolism of the different root parts.
To obtain insight into the functional categories of the DEGs between the true root and the lateral root, GO enrichment analysis was performed using Goatools (Fisher's exact test, P-value ≤ 0.05). A total of 646 DEGs were annotated in the GO database. The most enriched GO category among these DEGs was ‘catalytic activity’ (GO:0003824, 399 DEGs), followed by ‘metabolic process’ (GO:0008152, 370 DEGs), ‘cellular process’ (GO:0009987, 334 DEGs), ‘binding’ (GO:0005488, 287 DEGs), ‘cell’ (GO:0005623, 257 DEGs) and ‘cell part’ (GO:0044464, 254 DEGs). Thus, the differences in growth and development between the true root and the lateral root of A. carmichaelii involve complex and diverse processes.
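To make the enrichment test concrete, the sketch below computes a one-sided (over-representation) Fisher's exact test P-value for a single GO term from the hypergeometric distribution, using only the Python standard library. It is not the Goatools implementation, whose options may differ; the function name `enrichment_p_value` and the toy counts are assumptions for illustration.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, pop_hits, pop_size):
    """One-sided Fisher's exact test for over-representation of one term.

    Returns P(X >= study_hits), where X is the number of term-annotated genes
    in a random draw of `study_size` genes from a background of `pop_size`
    genes, `pop_hits` of which carry the term (hypergeometric tail).
    """
    upper = min(study_size, pop_hits)
    total = comb(pop_size, study_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 3 of 5 DEGs carry the term, versus 5 of 20
# genes in the background.
p = enrichment_p_value(study_hits=3, study_size=5, pop_hits=5, pop_size=20)
print(p, p <= 0.05)
```

In a full analysis this test is repeated for every GO term, so the resulting P-values are normally corrected for multiple testing before terms are called enriched.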
KEGG enrichment analysis identified 31 DEGs between the true root and the “bridge”, distributed across 23 metabolic pathways. These pathways fall into three categories, the largest being metabolism, followed by environmental information processing. There were 498 DEGs between the true root and the lateral roots, distributed across 100 metabolic pathways. These pathways fall into five categories, the largest being metabolism, followed by genetic information processing. There were 331 DEGs between the “bridge” and the lateral roots, mainly distributed across 77 metabolic pathways. These pathways fall into four categories, the largest being metabolism.
The pathways that displayed significant changes between the true root and the lateral root were identified using the KEGG database. A total of 12 KEGG pathways were significantly enriched (Table 4), among which the ‘Starch and sucrose metabolism’, ‘Ribosome’, ‘Carbon metabolism’, ‘Phenylpropanoid biosynthesis’ and ‘Plant hormone signal transduction’ pathways were the most highly represented. The largest number of DEGs was in the ‘Starch and sucrose metabolism’ pathway (ko00500), suggesting that starch and sucrose play an important role in the growth and development of lateral roots. The ‘Plant hormone signal transduction’ pathway (ko04075) contained 15 DEGs, suggesting that plant hormones play important roles in the growth and development of roots in A. carmichaelii.
Statistics of SNPs and SSRs
Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) are important marker types for screening transcriptome sequence differences among the true root, lateral root and “bridge” of A. carmichaelii. In this study, 3,838 SSR markers were obtained from sequence structure analysis of the unigenes from the nine transcriptome libraries. Most of them were mononucleotide repeats (2,038), followed by trinucleotide repeats (1,080) and dinucleotide repeats (517). In addition, 676,573 SNPs were identified in the transcriptome libraries of the three true root samples, 661,848 SNPs in the three “bridge” samples and 673,537 SNPs in the three lateral root samples.
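As an illustration of how mono-, di- and trinucleotide SSRs can be located in a unigene sequence, the sketch below uses regular expressions with back-references. The text does not name the SSR tool or the thresholds used in this study, so the minimum repeat numbers (10, 6 and 5) and the function name `find_ssrs` are assumptions chosen only for the example.

```python
import re

# Assumed minimum number of repeats per motif length (not from this study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR in `seq`."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of `unit_len` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip homopolymer motifs such as "AA" or "AAA"; they are
            # mononucleotide runs and are counted in the unit_len == 1 pass.
            if unit_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Hypothetical toy sequence containing one SSR of each type.
toy_seq = "GG" + "A" * 12 + "CC" + "AT" * 7 + "TT" + "CAG" * 5 + "TT"
for start, motif, count in find_ssrs(toy_seq):
    print(start, motif, count)
```

Counting the hits by motif length over all unigenes would give a per-type tally comparable in form to the one reported above.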
Transcription factor (TF) prediction
The unigenes annotated in this study were compared with the PlantTFDB (plant transcription factor database) and AnimalTFDB (animal transcription factor database) databases to predict transcription factors and their family information. A total of 1,910 expressed TFs belonging to 211 TF families were identified from the transcriptome dataset (Fig. 5). Among them, the most abundant TF family was the Cys2His2 (C2H2)-type zinc-finger protein (ZFP) family, which is one of the largest classes of plant TFs, has been extensively studied, and has been shown to play important roles in plant development and environmental stress responses through transcriptional regulation. The expression pattern analysis of the nine transcripts indicated that TFs regulating the growth and development of roots in A. carmichaelii displayed high expression in the true root and lateral roots.