Roots are underground organs that developed through long-term adaptation to terrestrial life and play crucial roles in various physiological functions. Primarily, they absorb water and inorganic salts, which are subsequently transported to the stem and leaves[1]. In addition to nutrient uptake, roots serve as anchors, using their strong branching capacity to stabilize plants firmly in the soil[2]. Moreover, roots assimilate diverse inorganic salts into organic substances via complex biochemical reactions[3, 4]. They are also vital sites for the synthesis of plant hormones, including cytokinins, auxins, abscisic acid, gibberellins, and ethylene, and thereby exert considerable influence on overall plant development[5]. Furthermore, certain types of roots can expand and function as storage organs, thus supporting both storage and reproduction.
In recent years, remarkable advances have been made in molecular biology research on root systems, much of it focused on the model plant Arabidopsis thaliana. The primary root of Arabidopsis comprises distinct layers, including the epidermis, cortex, and vascular cylinder. As the primary root undergoes growth and development[6–8], it generates lateral roots, thereby continually expanding the complexity and extent of the root system[4, 9]. The intricate development of the Arabidopsis root system is governed by multifaceted molecular processes, which necessitates investigation at a finer level, particularly within specific cell subpopulations.
Cells are the smallest functional units in plants, and physiological processes are typically carried out collaboratively by multiple cells. However, gene expression varies significantly among cells, so more precise transcriptome technologies that resolve expression in individual cells have been applied[10, 11]. Lopez-Anido et al. constructed different models of cell state differentiation in Arabidopsis leaf tissue and analysed the potential heterogeneity of epidermal stomatal lineage cells[12]. Liu et al. used a single-cell transcriptome atlas of early-stage Arabidopsis seedlings to identify new marker genes for leaf vein cells[13]. Kim et al. determined the roles of various tissues in Arabidopsis true leaves, starting from metabolic pathways, thus providing knowledge of the leaf vascular system and the relationships among leaf cell types[14]. Previous research on the underground part of Arabidopsis constructed a high-resolution genetic map of the developmental process from stem cells to nucleus-free sieve tubes in the original endodermis of the Arabidopsis root system across seven developmental stages[15].
With the availability of single-cell RNA sequencing (scRNA-seq) data, cell type identification has become an essential step for a multitude of downstream analyses[16–22]. In certain scenarios, however, the absence of reliable markers for essential cell populations makes it difficult to define cell types accurately. Applying an efficient machine learning prediction model to extract molecular markers and discern cell subpopulations from existing single-cell datasets has proven to be a time-efficient and resource-saving strategy[23–27].
To address the limitations mentioned above, we propose an ensemble computing framework named AtML, which captures cell subpopulation biomarkers of Arabidopsis root tips and predicts cell stages (Fig. 1). The AtML model combines the maximal information coefficient (MIC) and XGBoost to assess the importance of genes in predicting Arabidopsis root tip cell subpopulations. Furthermore, we successfully applied the AtML model to data not included in the training dataset and demonstrated its superior predictive performance. Through a biological analysis of the optimal genes, we identified potential lineage-specific genes, which could help biologists better understand the heterogeneity of Arabidopsis root tips. Our work uses a machine learning approach to aid the development of markers for single-cell sequencing in Arabidopsis, thereby providing new insights and more accurate markers for cell type identification.
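As a rough illustration of how a dependence measure can rank genes by their association with cell subpopulation labels, the following minimal sketch scores each gene with a simplified, MIC-like statistic: binned mutual information, normalized and maximized over a few fixed equal-width grids. This is not the AtML implementation. True MIC also optimizes the grid boundaries, and the XGBoost step is omitted here to keep the example free of dependencies. The function names, the `max_bins` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def equal_width_bins(values, n_bins):
    """Discretize values into n_bins equal-width bins (tied values share a bin)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mic_like_score(expression, labels, max_bins=6):
    """Simplified stand-in for MIC (assumption: fixed grids, no grid optimization).

    Returns the largest normalized mutual information between the binned
    expression values and the class labels over 2..max_bins bins.
    """
    n_classes = len(set(labels))
    best = 0.0
    for n_bins in range(2, max_bins + 1):
        norm = math.log(min(n_bins, n_classes))
        if norm > 0:
            binned = equal_width_bins(expression, n_bins)
            best = max(best, mutual_information(binned, labels) / norm)
    return best


def rank_genes(expression_by_gene, labels, max_bins=6):
    """Return (gene, score) pairs sorted from most to least label-associated."""
    scores = {
        gene: mic_like_score(values, labels, max_bins)
        for gene, values in expression_by_gene.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data (hypothetical): 50 cells of type "A" followed by 50 of type "B".
    labels = ["A"] * 50 + ["B"] * 50
    genes = {
        # High in type A, low in type B.
        "marker": [5.0 + (i % 5) * 0.1 for i in range(50)]
        + [(i % 5) * 0.1 for i in range(50)],
        # Cycles independently of the cell type.
        "noise": [float((i * 37) % 11) for i in range(100)],
    }
    for gene, score in rank_genes(genes, labels):
        print(f"{gene}\t{score:.3f}")
```

In a pipeline of this kind, scores like these could be used to shortlist candidate genes before a tree-based classifier assigns its own feature importances. How AtML actually combines the two signals is not specified in this section.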