Background: Because microbial rare taxa are inherently scarce, it is difficult to distinguish bona fide rare taxa from potential false positives in metabarcoding amplicon sequencing studies. Although recently developed quality control and clustering or denoising algorithms can remove sequencing errors and non-biological artifact reads, no algorithm can remove high-quality reads that arise from cross-sample contamination introduced by index misassignment.
Results: We thoroughly evaluated the rate of index misassignment on the widely used NovaSeq 6000 and DNBSEQ-G400 sequencing platforms using both commercial and customized mock communities, and observed a significantly higher fraction of potential false positive reads for NovaSeq 6000 than for DNBSEQ-G400 (5.68% vs 0.08%). Randomly detected false positive or false negative rare taxa could cause significant batch effects. These false detections introduced by index misassignment could also inflate alpha diversity in relatively simple samples while causing alpha diversity to be underestimated in complex samples. A further test using a set of cow rumen samples showed that the different sequencing platforms reported differing rare taxa. Correlation tests of the rare taxa detected by each sequencing platform demonstrated that rare taxa detected by the NovaSeq 6000 platform were much less likely to be correlated with the physicochemical properties of rumen fluid. Community assembly mechanism and microbial network association analyses indicated that false positive or false negative detection of rare taxa could distort the inferred community assembly process and even lead to the identification of spurious keystone species, one of which was confirmed negative by PCR cloning and subsequent Sanger sequencing.
Conclusions: Metabarcoding amplicon sequencing may introduce scarce but non-negligible false positives. We strongly recommend proper positive and negative controls, technical replicates, and careful sequencing platform selection in future amplicon studies, especially when the microbial rare biosphere is the focus.
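
As an illustration of the kind of quantity reported in the Results, the fraction of potential false positive reads in a mock community sample can be sketched as the share of reads assigned to taxa that are not part of the known mock composition. This is a minimal sketch under stated assumptions, not the pipeline used in this study: the function name `false_positive_read_fraction`, the taxon names, and the read counts are all hypothetical and chosen only for illustration.

```python
def false_positive_read_fraction(counts, expected_taxa):
    """Return the fraction of reads assigned to taxa absent from the mock community.

    counts: dict mapping taxon name -> read count for one sample (hypothetical input format).
    expected_taxa: set of taxon names known to be in the mock community.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    # Reads from taxa outside the known composition are treated as potential false positives.
    unexpected = sum(n for taxon, n in counts.items() if taxon not in expected_taxa)
    return unexpected / total


# Hypothetical read counts for a single mock community sample (illustrative only).
mock_counts = {
    "Escherichia": 4800,
    "Bacillus": 5100,
    "Prevotella": 60,      # not in the mock community
    "Ruminococcus": 40,    # not in the mock community
}
expected = {"Escherichia", "Bacillus"}

fraction = false_positive_read_fraction(mock_counts, expected)
print(f"{fraction:.2%}")  # prints 1.00%
```

In practice, unexpected reads in a mock sample may also stem from reagent contamination or misclassification, so a negative control would be needed to separate those sources from index misassignment.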