Background: Single-cell RNA (scRNA) profiling, or scRNA sequencing (scRNA-seq), is a rapidly developing technology and an important frontier of molecular biology. scRNA profiling makes it possible to investigate, in parallel, diverse molecular features of multiple cell types in a given plant tissue, which helps elucidate cellular heterogeneity and uncover the developmental processes that underpin cell differentiation. Although the power of scRNA profiling to uncover cellular heterogeneity is assumed to depend largely on the depth of scRNA-seq, no study of how the number of sequenced cells affects the power of plant scRNA-seq has yet been reported.
Results: In this study, we analyzed the sample coverage of 1,244 available scRNA-seq studies (including 30 in plants) and the effect of sample coverage on cell clustering and cell-type identification. On this basis, we evaluated the effect of sample size (i.e., cell number) on the outcome of single-cell transcriptome analysis by sampling different numbers of cells from a pool of ~57,000 Arabidopsis thaliana root cells integrated from five published studies. Our results indicated that the most significant principal components could be obtained when 20,000-30,000 cells were sampled; that relatively high reliability of cell clustering could be achieved with ~20,000 cells, with little further improvement from using more cells; that 96% of the differentially expressed genes could be identified with no more than 20,000 cells; and that a relatively stable pseudotime could be estimated from a subsample of 5,000 cells.
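The subsampling design described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: it draws random cell subsets of several sizes from a pool of ~57,000 cells and provides an adjusted Rand index (ARI) for comparing a subsample's cluster labels with reference labels for the same cells. The function names, the fixed random seed, and the use of ARI as the agreement measure are assumptions made for illustration; the clustering step itself is not shown.

```python
import random
from collections import Counter
from math import comb


def subsample_cells(n_total, sizes, seed=0):
    """Hypothetical helper: for each requested size, draw a random subset of
    cell indices (without replacement) from a pool of n_total cells."""
    rng = random.Random(seed)  # fixed seed so subsamples are reproducible
    subsets = {}
    for size in sizes:
        if size > n_total:
            raise ValueError(f"cannot sample {size} cells from {n_total}")
        subsets[size] = sorted(rng.sample(range(n_total), size))
    return subsets


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same cells.
    Returns 1.0 for identical partitions (label names may differ) and
    values near 0 for chance-level agreement."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have equal length")
    total = comb(len(labels_a), 2)  # number of cell pairs
    if total == 0:
        return 1.0
    # Pairs of cells placed together in both clusterings
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each clustering separately
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Usage: subsample sizes mirroring those discussed in the text
subsets = subsample_cells(57_000, [5_000, 10_000, 20_000, 30_000], seed=42)
print({size: len(idx) for size, idx in subsets.items()})
# → {5000: 5000, 10000: 10000, 20000: 20000, 30000: 30000}
```

In a full analysis, each subset would be clustered and its labels compared with reference labels restricted to the same cells; ARI is used here only because it is a common, label-permutation-invariant agreement score.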
Conclusions: Our results imply that ~20,000 (or 10,000-30,000) cells are sufficient for profiling Arabidopsis root cells using scRNA-seq, although whether this number applies to other Arabidopsis tissues and other plants remains to be determined by analyzing scRNA-seq data generated from diverse tissues of different plant species. Nevertheless, our results provide a general guide for optimizing the sample size used in plant scRNA-seq studies.