Plant materials and mitochondrial DNA extraction
The fruits of A. lappa and A. tomentosum were collected from the herb garden of Liaoning University of Traditional Chinese Medicine (E 121° 52′, N 39° 03′) and from Urumqi, Xinjiang (E 84° 33′, N 44° 07′), respectively; permission for biological experiments was obtained for both fruit samples. Professor Tingguo Kang of Liaoning University of Traditional Chinese Medicine identified the voucher specimens (A. lappa number: 10162190625306LY; A. tomentosum number: 10162190625307LY). The plant samples were deposited in the herbarium of Liaoning University of Traditional Chinese Medicine, and the genomic DNA was stored in the Key Laboratory of Traditional Chinese Medicine in the University (Dalian, China, 116600). The fruits were cultured in the dark at 23 ℃ and 60% humidity. When the seedlings weighed 3–5 g, whole plants were collected, washed, frozen in liquid nitrogen, and stored at -80 ℃ until use. Mitochondrial DNA was isolated using an improved extraction method [26].
Mitochondrial DNA Sequencing And Genome Assembly
After DNA isolation, 1 µg of purified DNA was fragmented to construct short-insert libraries (insert size 430 bp) according to the manufacturer’s instructions (Illumina), and the libraries were sequenced on the Illumina HiSeq 4000 [27] (Shanghai BIOZERON Co., Ltd). High-molecular-weight DNA was purified and used for PacBio library preparation with BluePippin size selection, and then sequenced on the Sequel sequencer.
Prior to assembly, the Illumina raw reads were filtered to remove reads containing adaptors, reads with a quality score below 20 (Q < 20), reads in which uncalled bases (“N” characters) made up 10% or more of the sequence, and duplicated sequences. The mitochondrial genome was reconstructed from a combination of the PacBio Sequel data and the Illumina HiSeq data in three steps. First, the genome framework was assembled from both the Illumina and PacBio data using SPAdes v3.10.1 [28]. Second, the assembly was verified, the circular or linear structure of the mitochondrial genome was determined, and any gaps were filled. Third, clean reads were mapped to the assembled mitochondrial genome to correct erroneous bases and to check for insertions and deletions.
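The read-filtering criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes that "quality score below 20" means the mean Phred score of a read, that qualities are Phred+33 encoded, and that adaptor contamination can be detected by a plain substring match. The function names and the adaptor string passed in are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (read name, sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(reads: Iterable[Read], adapter: str,
                 min_q: float = 20.0, max_n_frac: float = 0.10) -> Iterator[Read]:
    """Yield reads that pass the four filters described in the text."""
    seen = set()  # sequences already emitted, for duplicate removal
    for name, seq, qual in reads:
        if not seq or len(seq) != len(qual):
            continue  # malformed record
        if adapter and adapter in seq:
            continue  # read contains the adaptor (assumption: exact match)
        if mean_phred(qual) < min_q:
            continue  # assumption: Q < 20 refers to the mean read quality
        if seq.upper().count("N") / len(seq) >= max_n_frac:
            continue  # 10% or more uncalled bases
        if seq in seen:
            continue  # duplicated sequence
        seen.add(seq)
        yield name, seq, qual
```

Production tools typically also trim partial adaptors and low-quality tails rather than discarding whole reads; the sketch only mirrors the four criteria listed.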
Genome Annotation
The mitochondrial genes were annotated using homology alignments and de novo prediction, and EVidenceModeler v1.1.1 [29] was used to integrate the gene set. tRNA genes and rRNA genes were predicted by tRNAscan-SE [30] and RNAmmer 1.2 [31], respectively. A whole-mitochondrial-genome BLAST [32] search (E-value ≤ 1e-5, minimal alignment length percentage ≥ 40%) was performed against five databases: KEGG [33–35] (Kyoto Encyclopedia of Genes and Genomes), COG [36–37] (Clusters of Orthologous Groups), NR (Non-Redundant Protein Database), Swiss-Prot [38], and GO [39] (Gene Ontology). The circular genome map was drawn using OrganellarGenomeDRAW v1.2 [40].
Comparative Analysis Of The Mitochondrial Genomes
Large repeat sequence analysis
Large repeats were identified with REPuter (http://bibiserv.techpak.uni-bielefeld.de/computer/), with a minimum repeat length of 30 bp and an edit distance of 3. Four types of long repeats were searched for: forward (F), reverse (R), complement (C) and palindromic (P).
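To make the four repeat types concrete, the following Python sketch classifies a pair of equal-length sequence segments as a forward, reverse, complement or palindromic (reverse-complement) repeat. It is only an illustration, not REPuter's algorithm: it uses the Hamming distance (mismatches only) as a simpler stand-in for the edit distance named above, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return sum(x != y for x, y in zip(a, b))


def repeat_types(a: str, b: str, max_dist: int = 3, min_len: int = 30) -> list:
    """Return which of F, R, C, P relate segment b to segment a.

    Assumption: mismatches only (Hamming distance), no insertions/deletions.
    """
    a, b = a.upper(), b.upper()
    if len(a) < min_len:
        return []
    candidates = {
        "F": b,                               # forward: same orientation
        "R": b[::-1],                         # reverse: read backwards
        "C": b.translate(COMPLEMENT),         # complement: bases complemented
        "P": b.translate(COMPLEMENT)[::-1],   # palindromic: reverse complement
    }
    return [kind for kind, s in candidates.items() if hamming(a, s) <= max_dist]
```

For example, a segment and its reverse complement are reported as a palindromic pair, whereas two identical copies in the same orientation are reported as a forward pair.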
Simple Sequence Repeats (SSR) Analysis
MicroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/misa/) was used to identify SSRs; tandem repeats of 1–6 nucleotide units were considered microsatellites. The minimum numbers of repeats were set to 10, 6, 5, 5, 5, and 5 for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotides, respectively, and the minimum distance between two SSRs was set to 100 bp. Finally, Primer3 (http://www.simgene.com/primer3) was used to design primers for the SSRs identified by MISA.
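The repeat-number thresholds above can be sketched with regular expressions in Python. This is an illustration of the criteria only, not MISA itself: it does not implement the 100 bp distance setting or compound SSRs, and the function name `find_ssrs` is hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Return (start, end, motif, repeat_count) tuples, 0-based half-open."""
    seq = seq.upper()
    hits = []
    for unit_len in sorted(min_repeats):
        n = min_repeats[unit_len]
        # A motif of unit_len bases followed by at least n-1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # since those belong to the shorter motif class.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // unit_len))
    return sorted(hits)
```

In the example sequence used for checking, a run of ten A bases and six AT units are reported, while four ACG units fall below the tri-nucleotide threshold of five and are ignored.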
SNP And InDel Detection
SNPs are DNA sequence polymorphisms caused by single-nucleotide variation at the genome level, and InDels are insertions or deletions of small segments in the genome. To identify sequence variation in the known genes and ORFs between the A. lappa and A. tomentosum mitochondrial sequences, SNPs and InDels were detected. With A. tomentosum as the reference sequence, the SNP annotation of A. lappa (Additional file 10: Table S10) showed no synonymous or nonsynonymous mutations at start or stop codons and no nonsense mutations in the coding regions. There were 6 synonymous and 13 nonsynonymous mutations within genes, and 32 intergenic mutations. The InDel annotation (Additional file 11: Table S11), also with A. tomentosum as the reference sequence, showed only 3 InDels, all located between genes; accordingly, no InDel caused a gene mutation.
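The distinction between synonymous, nonsynonymous and nonsense SNPs can be illustrated with a short Python sketch. It assumes the standard genetic code and an in-frame coding sequence, and it ignores RNA editing and the separate start-codon and stop-codon categories reported above; `classify_snp` is a hypothetical helper, not the annotation tool used in this study.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position pos of an in-frame CDS."""
    ref_cds = ref_cds.upper()
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = ref_cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"       # same amino acid
    if alt_aa == "*":
        return "nonsense"         # premature stop codon
    return "nonsynonymous"        # amino acid change
```

For the toy sequence ATG GCT TGG TAA, changing the third base of GCT to C leaves alanine unchanged, changing its first base to A gives threonine, and changing TGG to TGA introduces a stop codon.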
Syntenic sequence analysis and SV analysis between mitochondrial genomes of A. lappa and A. tomentosum
Syntenic sequences between genomes reveal insertions and deletions in the genome of the target species relative to the reference genome. The coverage statistics of the collinearity comparison of five Asteraceae plants (Table 3) show that the aligned regions of the A. lappa and A. tomentosum mitochondrial genomes accounted for 100% of the whole genome, and Fig. 4 illustrates that the two were completely collinear. We also compared the mitochondrial genomes of A. lappa and A. tomentosum with those of three other Asteraceae plants, and the results for the two Arctium species were very similar to each other. The Arctium species differed greatly from the other three Asteraceae plants in their mitochondrial genomes; for example, the regions aligned with the Helianthus annuus mitochondrial genome sequence accounted for only 51.7% of the whole genome, in 96 comparison blocks (Additional file 14–15).
Table 3
Coverage statistics of collinearity comparison of 5 Asteraceae plants
Species | Total length of sequence alignment region (bp) | Total Sequence Length (bp) | Sequence alignment regions as a percentage of the entire genome (%) | Total length of sequence alignment region (bp) | Total Sequence Length (bp) | Sequence alignment regions as a percentage of the entire genome (%) | Number of comparison blocks
A. lappa VS A. tomentosum | 312609 | 312609 | 100 | 312598 | 312598 | 100 | 1
A. lappa VS Helianthus annuus | 156224 | 312598 | 49.98 | 155596 | 300945 | 51.7 | 96
A. lappa VS Diplostephium hartwegii | 168718 | 312598 | 53.97 | 168492 | 277718 | 60.67 | 81
A. lappa VS Chrysanthemum boreale | 157655 | 312598 | 50.43 | 157319 | 211002 | 74.56 | 70
A. tomentosum VS Helianthus annuus | 156244 | 312609 | 49.98 | 155596 | 300945 | 51.7 | 96
A. tomentosum VS Diplostephium hartwegii | 168723 | 312609 | 53.97 | 168492 | 277718 | 60.67 | 81
A. tomentosum VS Chrysanthemum boreale | 157599 | 312609 | 50.41 | 157254 | 211002 | 74.53 | 71
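The percentage columns of Table 3 follow directly from the two length columns next to them: aligned length divided by total sequence length, times 100. A minimal Python sketch of that arithmetic, with a hypothetical helper name, is given below as a reading aid.

```python
def coverage_percent(aligned_bp: int, total_bp: int) -> float:
    """Aligned regions as a percentage of the whole genome, to two decimals."""
    return round(100.0 * aligned_bp / total_bp, 2)


# Values taken from the A. lappa VS Helianthus annuus row of Table 3.
arctium_side = coverage_percent(156224, 312598)      # 49.98
helianthus_side = coverage_percent(155596, 300945)   # 51.7
```

The same calculation applied to the Chrysanthemum boreale row (157319 of 211002 bp) gives 74.56%.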
Figure S1-2 showed that the mitochondrial genomes of the Arctium plants had lower collinearity with that of Helianthus annuus and slightly higher collinearity with that of Chrysanthemum boreale (74.56% and 74.53%). This indicates that the mitochondrial genomes of plants in different genera vary greatly even within the single family Asteraceae. In microbial genomes, which are densely packed with functional genes, structural variations (SVs) can cause the loss or alteration of multiple gene functions, resulting in changes in phenotype, functional differences and pathogenicity. To further clarify the differences between the two mitochondrial genomes, SVs between A. lappa and A. tomentosum were investigated using A. tomentosum as the reference (Fig. 5). No translocations, inversions or deletions were found in the mitochondrial genomes of A. lappa and A. tomentosum, indicating that the two mitochondrial genomes are very similar.
Gene transfer between chloroplast genome and mitochondrial genome in two Arctium plants
Material (information) exchange occurs between subcellular units or organelles in eukaryotic cells to coordinate the various life activities of the cell [48]. Recent studies have shown that information exchange and transfer between chloroplasts and mitochondria occurs in plants and induces PCD (programmed cell death), but the mechanism has not been fully elucidated [49]. Gene exchange and transfer between the chloroplast and mitochondrial genomes was observed in both A. lappa and A. tomentosum (Fig. 6–7). We identified 48 transferred segments with a similarity of at least 80%, with a total length of 8229 bp; the shortest segment was 45 bp and the longest was 2532 bp (Additional file 12–13: Table S12-13). This phenomenon has also been found in other plants such as Ginkgo biloba and Salvia miltiorrhiza.
Core, Specific and Pan Gene analysis
Homologous genes present in all samples are regarded as common genes (Core genes); the non-common genes that remain after the Core genes are removed are Dispensable genes, and Specific genes are those found in only one sample. The Core and Dispensable genes together make up the Pan genes. Core and Specific genes are likely to correspond to the commonality and the distinguishing characteristics of the samples, respectively, and can serve as a basis for studying functional differences between samples. Core and Specific genes were analyzed for A. lappa, A. tomentosum and three other Asteraceae plants (Fig. 8). The five Asteraceae plants had 354 genes in total and 22 Core genes; A. lappa, Helianthus annuus, Chrysanthemum boreale, Diplostephium hartwegii and A. tomentosum had 1, 2, 0, 1 and 0 Specific genes, respectively. There were 95 Dispensable genes. The Specific gene of A. lappa was orf115a, those of Helianthus annuus were orf873 and rps12, and that of Diplostephium hartwegii was rps19. The number of Pan genes was 117 (22 Core plus 95 Dispensable).
Syntenic Sequence And Structural Variation Analysis
MUMmer was used to compare the five genomes, namely the A. lappa and A. tomentosum genomes and three Asteraceae genomes downloaded from NCBI (Chrysanthemum boreale, GenBank accession number: NC039757; Diplostephium hartwegii, GenBank accession number: NC034354; Helianthus annuus, GenBank accession number: NC023337), and to determine the large-scale collinearity between genomes. LASTZ was then used to align the regions, confirm their local arrangement, and find regions of translocation (Translocation/Trans), inversion (Inversion/Inv), and translocation + inversion (Trans + Inv).
For SV detection, the mitochondrial genomes of A. lappa and A. tomentosum were compared using MUMmer, the regions were then aligned using LASTZ, and SVs were identified from the regional alignment results.
Gene transfer between chloroplast genome and mitochondrial genome
The chloroplast genomes of A. lappa and A. tomentosum were each compared with the mitochondrial genome of the same species using BLASTN, with an E-value threshold of less than 1e-10.
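As an illustration of how such hits might be screened and summarized, the Python sketch below filters BLASTN tabular output by E-value and identity and reports the number of segments, their total length, and the shortest and longest segment. It assumes the default 12-column tabular format of BLAST+ (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the 80% identity cut-off is taken from the similarity threshold reported for the transferred segments, and the function name is hypothetical.

```python
from typing import Iterable


def summarize_transfers(lines: Iterable[str],
                        max_evalue: float = 1e-10,
                        min_identity: float = 80.0) -> dict:
    """Summarize BLASTN hits that pass the E-value and identity filters."""
    lengths = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])    # percent identity
        length = int(fields[3])      # alignment length (bp)
        evalue = float(fields[10])   # E-value
        if evalue < max_evalue and pident >= min_identity:
            lengths.append(length)
    return {
        "segments": len(lengths),
        "total_bp": sum(lengths),
        "shortest_bp": min(lengths) if lengths else None,
        "longest_bp": max(lengths) if lengths else None,
    }
```

Overlapping hits are counted separately here; merging overlapping intervals before summing would be needed for a non-redundant total.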
Core, Specific and Pan Gene analysis
Using CD-HIT (v4.6.1, http://cd-hit.org), the protein sequences of the samples to be analyzed were clustered with thresholds on identity and alignment length (identity > 50% and coverage > 50%). Clusters of all protein sequences were obtained from the software output. Core and Pan gene sets were constructed by comparing the protein sequences of the A. lappa and A. tomentosum genomes and the three other Asteraceae genomes.
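Once the clusters are available, the Core, Dispensable, Specific and Pan gene sets follow from simple set operations. The Python sketch below assumes the clustering output has already been parsed into a mapping from cluster identifier to the set of samples with at least one member protein; the cluster names and the function name are hypothetical.

```python
from typing import Dict, Iterable, Set


def classify_clusters(clusters: Dict[str, Set[str]], samples: Iterable[str]):
    """Split clusters into Core, Dispensable, Specific (per sample) and Pan."""
    samples = list(samples)
    all_samples = set(samples)
    # Core: clusters with a member in every sample.
    core = {c for c, members in clusters.items() if set(members) >= all_samples}
    # Dispensable: everything that is not Core.
    dispensable = set(clusters) - core
    # Specific: Dispensable clusters found in exactly one sample.
    specific = {
        s: {c for c in dispensable if set(clusters[c]) == {s}} for s in samples
    }
    # Pan: Core and Dispensable together.
    pan = core | dispensable
    return core, dispensable, specific, pan
```

By construction the Pan set size is the sum of the Core and Dispensable set sizes.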
Phylogenetic Analysis
To determine the phylogenetic position of Arctium relative to other genera in Asteraceae, a phylogenetic tree was constructed from the whole mitochondrial genomes of 28 species: A. lappa and A. tomentosum, 25 species from other genera of Asteraceae and other Asterids, and 1 outgroup species (Ginkgo biloba). All GenBank accession numbers are listed in Additional file 1: Table S1. PhyML v3.0 was used to construct the tree by the maximum likelihood (ML) method with Bayes correction, and bootstrap values were calculated from 1000 bootstrap replicates [41].