Primer design and in silico evaluation of primer coverage
To design PCR primers that specifically amplify the bacterial 16S, sequence alignment was performed using plant Mt18S and Ct16S and bacterial 16S. Because plant bacterial microbiota is mainly composed of Proteobacteria, Actinobacteria, Bacteroidetes and Firmicutes [19, 29, 39, 40], we selected 33 species from these four phyla (Fig. S1A) and aligned the corresponding 16S with Mt18S and Ct16S of several plants, including O. sativa, A. thaliana, and Zea mays. The primer selection criteria included the following: the 3′ end should be an exact match to bacterial 16S but mismatch to Mt18S or Ct16S, with fewer than two mismatches between a primer and its bacterial 16S template. Four primers with a mismatch(es) at the 3′ end to either Mt18S or Ct16S were selected, including 322F with one mismatch to Mt18S, 796R with two mismatches to Ct16S, 799F [30] with two mismatches to Ct16S and 1107R with three mismatches to Mt18S (Fig. 1A and S1B).
According to the up to 500-bp read-length of the amplicons by the Illumina Miseq PE300 platform, the designed primers could be combined into two sets of PCR primers, which include 322F/796R which amplifies the bacterial 16S V3–V4 regions generating an amplicon length of about 454 bp and 799F/1107R which amplifies the V5–V6 regions of about 308 bp (Fig. 1B). We then evaluated the overall coverage of the primer pairs and their specificity to domain Bacteria. Probe sequences were blasted against the nonredundant SSU r132 using TestProbe 3.0 with the default settings, while sequences of the primer pairs were blasted against the SILVA Database SSU r132 using TestPrimer 1.0 with the default settings. As shown in Table 1, all the four primer pairs are specific to domain Bacteria. At zero or one-mismatch stringency, primer pair 322F/796R showed overall coverage higher than 799F/1107R.
Experimental evaluation of designed primer pairs using soil samples
Soil microbiota was used as the material to compare the overall coverage and accuracy of each primer pair in deciphering a microbiota structure. Soil represents the most diverse ecosystem on earth, with exceptionally high bacterial species diversity. Planting soils are colonized by bacterial communities with more abundant bacteria and wider species diversity than those of bacterial communities on and inside plants. It is generally believed that the bacteria inside plant roots and leaves are mainly derived from the soil through root absorption [12, 32, 41, 42]. Meanwhile, planting soil does not contain any plant DNA that would affect bacterial-16S amplification. Using planting soil as the material, the designed primer sets were compared experimentally with the widely used primer pairs 338F/806R and 799F/1193R.
We extracted DNA from soil samples and conducted PCR using the protocol described above. The amplicon libraries were sequenced using the Illumina Miseq PE300 platform. In total, the four primer sets revealed 7,656 OTUs. Because the four primer sets amplified different 16S rDNA regions, the four 16S amplicon libraries did not share any identity at the OTU level (Fig. 2A). At the genus and phylum levels, the four 16S amplicon libraries shared 48.06% and 75.96% identities, respectively (Fig. 2B and C). When the same or similar 16S regions were amplified, the amplicon libraries shared higher identities. At the OTU, genus, and phylum levels, the amplicon libraries obtained using 322F/796R and 338F/806R shared 70.27%, 82.28%, and 90.48% identity, respectively, and those obtained using 799F/1107R and 799F/1193R shared 63.87%, 77.85%, and 90.48% identity, respectively. These results indicated a better comparability when the same 16S variable regions were amplified.
Community β-diversity was analyzed by principal coordinate analysis (PCoA) based on Bray–Curtis distances. At the OTU level, the first principal coordinate accounted for 70.44% of the variability, and the samples with different 16S variable-region amplified were separated along that axis (Fig. 2D). Samples primed with the V3–V4 primer sets, including 338F/806R (Fig. 2D, red circle) and 322F/796R (Fig. 2D, blue triangle), clustered together. Those samples were separated from those primed with the V5–V7 and V5–V6 primer sets. The clustering pattern was similar at the species and genus levels (Fig. 2D). Because host compartment, environmental factors, and host genotype determine the diversity of microbiota composition at lower taxonomic ranks [12, 43, 44], that is, at the genus and species levels, it is better to use primers targeting the same 16S variable regions to produce more comparable sequencing results.
At various taxonomic levels, the four primer sets revealed similar community composition, especially the predominant taxa. However, some taxa exhibited differences in relative abundance among the different amplicon libraries, and a few low-abundant taxa were undetectable using some of the primer sets (Fig. 2E–F). At the phylum level, amplification with each primer set revealed the same four predominant phyla: Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes (Fig. 2E). These four phyla constituted 85.68% (322F/796R), 81.61% (799F/1107R), 80.30% (799F/1193R), and 71.80% (338/806R) of the 16S amplicons. Among the four main phyla, Proteobacteria had the highest relative abundance (>30%), followed by Firmicutes (17%–20%) (Fig. 2E). The four primer sets revealed six other phyla with relative abundance of >1%: Acidobacteria, Chloroflexi, Nitrospirae, Verrucomicrobia, Gemmatimonadetes, and Patescibacteria. Among them, Chloroflexi exhibited significantly higher relative abundance in the 338F/806R 16S amplicon library (11.61%) than in the amplicon libraries obtained using the other three primer sets (less than 5%). A few phyla were amplified by only some primer sets, for example, Verrucomicrobia was not detected by 322F/796R and constituted 1.62%–3.53% relative abundance in the other three amplicon libraries (Fig. 2E).
At the class level, amplification using each primer set revealed seven highly abundant classes: Gammaproteobacteria, Deltaproteobacteria, Actinobacteria, Clostridia, Bacteroidia, Alphaproteobacteria, and Bacilli (Fig. 2F). The relative abundance of each class in the soil microbiota resolved by different primer sets is close to each other and the total abundances of the seven classes constituted 82.91% (322F/796R), 79.12% (799F/1107R), 76.51% (799F/1193R), and 69.77% (338F/806R) of the 16S amplicons. There were eight other classes whose relative abundance is over 1%. Amplification preference was observed among these classes, and it corresponded to that observed at the phylum level. For example, the class Anaerolineae (phylum Chloroflexi) exhibited significantly higher relative abundance in the 338F/806R 16S amplicon library (9.998%) than in the other three amplicon libraries (<2%). Class Subgroup_6 (phylum Gemmatimonadetes) was only detected using the primer pair 338F/806R. Class Verrucomicrobiae (phylum Verrucomicrobia) was not detected using the primer pair 322F/796R (Fig. 2F). These differences are summarized in Supplementary file 1: Fig. S2.
Because Chloroflexi exhibited high relative abundance in the 16S amplicon library obtained using 338F/806R (Fig. 2E and Fig. S2A), we analyzed this phylum at various lower taxonomic levels to explore its composition (Fig. S2B–D). Anaerolineae was the main detected class in Chloroflexi (Fig. S2B). At the order level, SBR1031, which belongs to Anaerolineae, was amplified only by the primer pair 338F/806R, while Anaerolineales was amplified by 338F/806R (40.65%) and 799F/1193R (1.10%) at very different relative abundance. The high abundance of SBR1031 and Anaerolineales was the reason for the high abundance of Chloroflexi detected by 338F/806R. Dehalococcoidales, which belongs to Dehalococcoidia, was much more abundant in the libraries obtained using 799F/1193R (14.63%) and 322F/796R (6.05%) than in those obtained using 338F/806R (<1%) and 799F/1107R (<1%) (Fig. S2C).
Definition of rice root and leaf endophytic bacteria
We observed root and leaf surfaces by SEM to determine whether any surface bacteria remained after surface sterilization. The results showed that it was easy to obtain bacteria-free leaf surfaces using a standard surface sterilization method. In the SEM analyses, there were no bacteria attached to the leaf surface or in and around the stomata (Fig. 3A–C). In contrast, many bacteria remained tightly attached to the root surface after surface sterilization and were mainly distributed between adjacent cells or in grooves on the cell surface (Fig. 3D–F). This tight attachment of bacteria suggests that this may be one invasion route for endophytes. On the basis of these results, in this study, the leaf endophytic bacteria refer to all bacteria colonizing the internal tissues, while the root endophytes include both the bacteria colonizing the internal tissues and those unremovable by surface sterilization.
Experimental evaluation of designed primer sets using plant materials
The designed primer sets were applied to the rice genomic DNA template to evaluate the efficiency in excluding plant DNA contamination. Leaf and root samples were collected from 5-week-old greenhouse-grown rice plants (cultivar Nipponbare), and DNA was extracted from the surface-sterilized plant materials. PCR was performed in accordance with the protocols described above, and the amplicon libraries were sequenced using the Illumina Miseq PE300 platform. The sequencing results indicated that from all the plant materials, including leaf and root samples, there was no Ct16S sequence co-amplification (Fig. S3A and 3B). Except for the leaf sample amplified using 322F/796R, all the other three amplicon libraries had little (<1.8%) or no MtDNA contamination. Rice leaf samples amplified using 322F/796R had 87.8% MtDNA contamination on average (Fig. 4A&B), indicating that primer 322F was not very efficient in avoiding Mt18S contamination. Considering the decreased efficiency of bacterial-DNA amplification when endophytes are much less abundant than plant mitochondria [31, 32], the surface-sterilized rice leaf might contain bacterial amounts significantly fewer than the root. The large proportion of Mt18S in the plant genomic DNA samples might be the reason for the Mt18S contamination.
After removing the Mt18S reads from the sequencing results, we compared the obtained plant endophytic bacterial structures. At the genus level, the two 16S amplicon libraries of rice leaf endophytes shared 47.9% identity, and those of rice root symbionts shared 52.4% identity (Fig. 4C&D). At each taxonomic level, the two primer sets revealed similar bacterial composition; however, various taxa exhibited differences in relative abundance (Fig. S4). A few taxa were amplified by only one primer set. At the family level, both Verrucomicrobia and Pedosphaeraceae (phylum Verrucomicrobia), were only detected by 799F/1107R at a low relative abundance (<0.4% in leaf samples and <2% in root samples). Both Amoebophilaceae and Flavobacteriaceae were only detected by 322F/796R. All these primer-specific taxa had low relative abundance (Fig. S4C).
Optimization of primer pair 322F/796R
To specifically amplify bacterial 16S from the plant materials, especially those containing very few endophytes, we modified the primer 322F to reduce co-amplification of Mt18S, thus increasing the efficiency of 322F/796R in producing plant-DNA free bacterial amplicon library from the plant materials that contain low-abundant endobacteria. The penultimate base was changed from G (guanine) to A (adenosine), C (cytosine), T (thymine), or H (a proportional mixture of A, C, and T); these modified primers were named 322F-A, 322F-C, 322F-T, and 322F-H, respectively. These 322F-derivatives are hereafter referred to as “322F-Dr” (Table 1). All of these primers contained two adjacent mismatches to Mt18S at the 3′ end. The primer sets 322F-A/796R, 322F-C/796R, 322F-T/796R and 322F-H/796R (collectively, “322F-Dr/796Rs”) were then used to amplify bacterial 16S from rice leaf genomic DNA. As expected, amplification with each of the 322F-Dr/796Rs resulted in very little MtDNA contamination (<1.8% at the genus level; Fig. 4E).
After removing the Mt18S sequences, we compared the taxonomic classifications and the α-diversity and β-diversity revealed by 322F/796R and each of the 322F-Dr/796Rs. At the OTU level, both Ace and Shannon’s index analyses showed no significant difference among the samples primed with different primer sets (Fig. 4F&G). These results indicated that all the 322F-Dr/796Rs, when compared with 322F/796R, did not change the ability in decipher the richness and diversity of the bacterial communities. We then compared the β-diversity using principal component analysis (PCA) analysis at the OTU level. As shown in Fig. 4H, the bacterial communities deciphered using 322F/796R and each of the 322F-Dr/796Rs were clustered together, indicating the high similarity of the four primer pairs in revealing the bacterial community structure of the same rice leaf sample.
We then compared the primer sets 322F/796R and 322F-Dr/796Rs in deciphering the community structure of bacteria in an environmental material that is free of plant DNA interference. Total DNA extracted from planting soil was used as template. At the OTU level, the five 16S amplicon libraries shared 60.08% identity, with only 4.09% of the 16S amplicons belonging to a single amplification group (Fig. S5A). At the genus and phylum levels, the proportions of shared amplicons were 77.21% and 86.84%, respectively, while the proportions of single-group specific amplicons were 3.24% and 5.27%, respectively (Fig. S5B&C). All the 322F-Dr/796Rs, as compared with 322F/796R, had similar abilities to reveal the richness and diversity of the bacterial communities (Fig. S5D&E). In a PCA of β-diversity, samples with the same template rather than those amplified with the same primer pair grouped closer together (Fig. S5F), indicating a weak influence, if any, of primer pairs on the definition of soil bacterial structure.
Rice leaf and root contain different endo-bacteriomes
Using the newly designed primer set 322F-A/796R, we performed 16S NGS to compare the composition of the rice leaf and root endo-bacteriomes. The coverage index of all samples was >0.97. Sequencing results indicated that rice leaf and root shared 66.36% identity at the OTU level; however, the structures of bacterial communities were very different (Fig. 5A and B). At the phylum level, the root and leaf exhibited similar bacterial community composition but differences in the relative abundance of phyla (Fig. 5C). Proteobacteria had the highest relative abundance in all leaf and root samples. In leaf samples, more than 70% 16S amplicons belonged to Proteobacteria. The phyla Bacteroidetes, Actinobacteria, Fibrobacteres, and Firmicutes had relative abundance ranging between 2% and 5%. There were low-abundant (<2%) Chloroflexi and Deinococcus-Thermus. In root samples, there were three predominant phyla: Proteobacteria (44.6%), Fibrobacteres (35.7%), and Bacteroidetes (16.9%). The phyla Actinobacteria (1.9%) and Chloroflexi (0.5%) were at low abundance. Compared with the root, leaf enriched Proteobacteria, Firmicutes, and Deinococcus-Thermus (Fig. 5D). At the family level, the relative abundance of many taxa differed significantly between the leaf and root endo-bacteriomes (Fig. 5E).
Compared with the root endo-bacteriome, the leaf endo-bacteriome had significantly lower species richness but similar species diversity (Fig. 5F&G), suggesting that there might be large differences among individual leaf samples. As illustrated in a Venn diagram (Fig. 5H), the six leaf 16S amplicon libraries shared 20% identity at the OTU level, with more than 10% of the 16S amplicons belonging to a single amplification group. In contrast, the six root samples shared 50% identity at the OTU level and less than 5% of the OTUs belonged to a single amplification group (Fig. 5I). PCoA indicated that the leaf samples were more separated than the root samples, providing further evidence that the leaf endo-bacteriome was more variable among samples (Fig. 5B).
Using PICRUSt, we explored the function of the endo-bacteriomes in the leaf and root. The rice leaf and root endo-bacteriomes enriched very different KEGG pathways, indicative of different microbial functional features (Fig. 5J). The level 3 KEGG pathway data indicated that the secretion system, bacterial secretion system, two-component system, pentose phosphate pathway, bacterial motility proteins, transporters, and flagellar assembly were more enriched in leaf samples, while the DNA repair and recombination proteins, transcription machinery, pyrimidine metabolism, peptidases, alanine, aspartate and glutamate metabolism, cysteine and methionine metabolism, and amino acid related enzymes were more enriched in root samples. The level 1 KEGG pathway data indicated that environmental information processing and cellular processes were more enriched in the leaf endo-bacteriome, while genetic information processing and metabolism were more enriched in the root endo-bacteriome.
Together, these results suggest that the interactions between the leaf endo-bacteriome and the plant would be more readily affected by environmental factors, whereas root endophytes have stronger interactions with the plant and would participate more in plants’ physiological functions.
Quantification of rice endophytic bacteria
Using the designed primer sets, qPCR was performed to quantify the endobacteria in the rice leaf and root. Leaf and root samples were collected from 5-week-old greenhouse-grown rice plants (Nipponbare) and then surface-sterilized before extracting total genomic DNA. The primer 322F-A/796R was used to amplify the endophyte DNA. Within the rice leaf, the number of bacteria was 1/100 that of plant cells. In contrast, the number of bacteria within root and tightly attaching onto the root surface was 10-fold that of plant cells (Fig. 6A). It remained unclear what proportion of these root bacteria were localized inside the root.
Next, we used the standard curve method to quantitate the endobacteria. A qPCR standard curve was created by using gradient dilutions of E. coli genomic DNA in 100 ng rice callus DNA as the PCR templates. Rice callus was considered to contain no or very few bacteria. The primer pair 322F-A/796R was used to amplify the bacterial 16S rDNA gene to create a standard curve for the bacterial number (Fig. 6B). The PCR efficiency of 322F-A/796R was 90.55%. The DNA loss rate during extraction was 66.7%–90%, as estimated by adding dilutions of E. coli cells to 50–100 mg plant materials before DNA extraction and PCR amplification. We calculated that the rice leaf contained endophytic bacteria at a concentration of 106–107 cells per gram plant fresh weight (Fig. 6C). The surface-sterilized rice root contained much higher bacterial densities (109–1010 cells per gram plant fresh weight; Fig. 6C). The low bacterial abundance in the rice leaf might be the reason for high Mt18S DNA contamination in the amplicon library obtained using 322F/796R.