Standardization of data and clustering of nucleus pulposus single cell data
As the sequencing depth increases, the number of genes (count) and gene types (feature) detected in a cell by high-throughput sequencing also increase with the sequencing depth, and both show a positive correlation within a certain range. If the number of counts and the number of features measured in a cell deviate significantly from the position of the majority of cells, this cell is likely to be a bicellular or multicellular capture state, i.e., two or more cells are captured in a single microwell. The permeability of the cell membrane of dead cells is significantly altered, which results in a large loss of RNA from the cytoplasm, but not so much from mitochondrial RNA, so if a dead cell is captured in one microwell, the proportion of mitochondrial RNA to gene COUNT will rise and the number of gene COUNT will fall.
We obtained the number of genes in a single cell, the gene type in a single cell, and the proportion of mitochondrial genes in a single cell (Fig. 1A) based on the single-cell gene expression matrix obtained from the upstream analysis to plot the violin plot of feature. The gene number-gene type relationship, mitochondrial gene ratio-gene number relationship, and mitochondrial gene ratio-gene type relationship were also plotted (Fig. 1B).
We set the screening conditions as the number of features in a single cell is more than 200 and less than 6000, and the ratio of mitochondrial genes is less than 0.1. 4774 cells were finally screened for subsequent analysis. We used the function "NormalizeData" with the parameter "LogNormalize" and the scale of 10000 to normalize the data of single cell expression matrix.
After selecting the top 2000 genes highly variable genes that were simultaneously expressed in at least 20 cells (Fig. 1C) and performing PCA downscaling analysis to obtain principal components, the top 20 principal components were finally selected for UMAP downscaling clustering of 4774 cells to obtain 4 cell subgroups: cluster0, cluster1, cluster2, cluster3, respectively including 2239, 1910, 561, and 64 cells. Each cell cluster can be considered as a cell cluster with statistically significant differences in gene expression from other cell clusters in the tissue. All cells in the two-dimensional UMAP distribution are shown in Fig. 1D. The two-dimensional UMAP map can show the relationship between thousands of cells with different complex phenotypes in the form of a two-dimensional planar dot plot, where cells clustered into groups have a high degree of similarity and cell groups separated from each other have a large difference in gene expression. the number of four cell groups is shown in Fig. 1E. Cluster0, cluster1 and cluster 2, which are more closely related in the UMAP plot, have subclusters that occupy most of them.
Identification of subclusters marker genes and subcluster type identification
A total of 1159, 1240, 937 and 1915 (P < 0.05) marker genes were identified in the four sub-clusters and sorted according to log2FC using differential screening to obtain up-regulated genes in different cell clusters. The top 20 marker genes were selected for each subcluster and were as follows: Cluster0: MGP, GPX3, DCN, MT1X, MT1G, CLU, MT2A, MT1E, ECRG4, CDO1, CHAD, SLPI, MT1F, RBP4, CHI3L1, DEFB1, OMD, CILP, HMOX1 and PLA2G2A. Cluster1: COL15A1, LTBP1, UNC5B, TNC, TGFBI, COL5A1, MXRA5, COL1A2, COL6A2, COL6A1, COL5A2, DKK3, VEGFA, COL6A3, COL11A1, FN1, TPM1, IGFBP3, COL1A1, COL3A1. Cluster 2: CD55, TIMP1, CRLF1, NT5E, PCOLCE2, ANGPTL4, P3H2, CD59, S100A2, SERPINE2, FGFBP2, TMEM158, FGF1, S100A4, KLK10, CLMP TMSB10, PRG4, IL11, IGFBP5. Cluster 3: PLEK, LCP1, FCGR3B, CSF3R, SELL, PTPRC, TYROBP, S100A9, CTSS, S100A8, BCL2A1, IL1B, FCER1G, C5AR1, BASP1, CXCL8, LYZ, SRGN, CD74, HLA-DRA. A gene expression heat map was generated using these genes on a cell-cell cluster basis for display (Fig. 2A).
Using the singleR package, the single cell data were compared with the Blue print Encode Data reference set to obtain a type score for each individual cell, and finally the scores for each cell corresponding to the different cell types were presented as a heat map (Fig. 2B), and the cell types were marked in the UMAP map Fig. 2C). The large cell clusters containing cluster0, cluster1 and cluster2 were identified as chondrocytes, while the small cluster3 cells were identified as neutrophils.
Construction of a PPI map of chondrocyte subpopulations in the nucleus pulposus and screening of key genes
The upregulated top100 marker genes of the 3 nucleus pulposus chondrocyte subclusters were entered into the STRING database to construct PPI maps, and then the core 10 key genes were obtained using the cytoHubb tool in cytoscape software. As shown in Fig. 3A, Fig. 3B and Fig. 3C: Key genes of Cluster0: EEF1A1, UBA52, EEF1B2, FAU, TPT1, HSPA8, JUNB, HSPB1, UBC, FOS. key genes of Cluster1: COL1A2, COL6A2, COL5A1, COL5A2, COL6A1, COL6A3, ITGB1, P4HA1, COL11A1, FN1, etc. Key genes for Cluster2: PKM, GAPDH, ENO2, TPI1, PGK1, LDHA, PLAUR, SERPINE1. these key genes may play a critical role in the differentiation and function of the subcluster.
Functional enrichment analysis of chondrocyte Subclusters
To deeply explore the functions of different subtypes of chondrocyte clusters, functional enrichment analysis of upregulated differential genes of each class of subcluster was performed and the top 20 BP entries annotated by GO analysis of three chondrocyte subclusters are shown at P < 0.01. as shown in Fig. 4A, Fig. 4B, and Fig. 4C. Among them, calcification-inhibited medullary nucleus chondrocytes were mainly enriched for biological processes such as intrinsic apoptotic signaling pathway, oxidative phosphorylation, respiratory electron transport chain, mitochondrial ATP synthesis-coupled electron transport, and transition metal ion homeostasis. Steady-state nucleus pulposus chondrocytes are enriched in biological processes related to ossification, osteoblast differentiation, osteochondrogenesis and other biological processes such as cell matrix adhesion, metaplastic cell migration, and collagen fibril organization. The effector nucleus pulposus chondrocytes are enriched in biological processes such as phosphoribose metabolism, negative regulation of hydrolase activity, oxidative phosphorylation, homotypic cell-cell adhesion, and negative regulation of peptidase activity.
We also performed the gene set variation analysis (GSVA) analysis method using the KEGG dataset for three different chondrocyte subclusters, and the results are shown in Fig. 4D. Compared with the other two groups of cells cluster0 medullary chondrocyte cluster had higher activity in TNFα-activated NF-κB signaling pathway, p53 apoptotic signaling pathway, etc., indicating that this group of cells had higher activity in inflammatory response and apoptosis. Cluster1 medullary chondrocytes produced significant enrichment in angiogenesis, wnt/β-catenin signaling pathway, NOTCH signaling pathway, etc. The Cluster2 nucleus pulposus chondrocyte cluster was significantly enriched in PI3K-AKT signaling pathway, IL-2-STAT5 signaling pathway, TGFβ signaling pathway, glycolysis, hypoxia, and oxidative phosphorylation pathways.
According to the literature in recent years[18, 19], based on the functional enrichment results of the above cell subclusters and previous studies on cartilage and intervertebral discs, we named the cluster0 subcluster of chondrocytes as calcium-inhibited nucleus pulposus chondrocytes, with the main expressed signature marker genes MGP, MT1G, GPX3, CDO1, MT2A. the cluster1 subcluster of chondrocytes as steady-state nucleus pulposus chondrocytes, with the signature marker genes MXRA5, DKK3. and the cluster2 subcluster of chondrocytes as effector nucleus pulposus chondrocytes, with the marker genes MSMO1, HMGCS1.
Pseudo-Time Analysis of the Chondrocyte cluster
To further investigate the process of interconversion and genetic changes between chondrocytes in the nucleus pulposus, we performed a pseudo-time analysis of the chondrocyte cluster using the monocle software package. The pseudo-time analysis diagram is shown in Fig. Each point in the diagram represents a cell, cells with similar states are clustered together, and each branch point represents a possible decision point for a cell biological process. The function arranges the cells along the pseudo-time trajectory according to the successively similar genetic changes among different cells. The results are shown in Fig. 5A, where we define the Cluster1 steady-state nucleus pulposus chondrocyte cluster to be located at the temporal start position, and the Cluster0 calcification-inhibited nucleus pulposus chondrocytes, and the Cluster2 effector nucleus pulposus chondrocyte cluster to be located at the two temporal end points, respectively. The continuation distribution of the pseudo time analysis plots for most of the cells indicates that the three cells are not completely separated from each other, but change progressively, which is consistent with the UMAP plot in which the three cells are clustered together into one large cluster. We thus hypothesize that some cells in the steady-state nucleus pulposus will develop into calcification-inhibited nucleus pulposus chondrocytes and effector nucleus pulposus chondrocytes during the process of nucleus pulposus degeneration.
To explore genes possessing similar change trends during differentiation, we observed the changes of each co-expressed gene module during the continuation distribution of the pseudo time analysis for most of the cells indicates that the three cells are not completely separated from each other, but change progressively, which is consistent with the UMAP plot in which the three cells are clustered together into one large cluster. We thus hypothesize that some cells in the steady-state nucleus pulposus will develop into calcification-inhibited nucleus pulposus chondrocytes and effector nucleus pulposus chondrocytes during the process of nucleus pulposus degeneration. course by classifying and visualizing genes with similar change trends. We visualized the change trends of these genes, and as shown in Fig. 5B, there are about 2085 genes with significant temporal-proportional change trends along the temporal alignment. To better study these genes, the most significant 100 of them are shown separately (Fig. 5C). The results showed that TPM1, NIBAN1, COL11A1, HSPA5, UNC5B, PAPPA, PABPC1, COL6A2, CD109, APLP2, PLOD1, QSOX1, HSP90B1, CCPG1, COL11A2, EDIL3, P4HA2, CALU, COL2A1 CANX, P4HA1, TTC3, VEGFA, VIM and other genes are highly expressed in the early part of the trajectory. FN1, COL1A2, LTBP1, MXRA5, TNC, TGFBI, COL5A2, COL15A1, COL5A1, COL1A1, COL3A1, IGFBP3, COL6A3, SPARC, MELTF, LOXL2, COL6A1, CCND1, ANGPTL4, FGF1, CRLF1, NT5E, ISLR, PRSS23, MFGE8, SERPINE1, TIMP3, GAPDH with enhanced expression at mid-term and CHAD, OMD, GPX3, GADD45B, SOD2, DEFB1, RGCC, MGP, ECRG4, HMOX1, PLA2G2A, MT1E, IFITM3, RBP4, CHI3L1, NNMT, MT1G, IER3, MT2A, CDO1, IFITM2, C11orf96, EEF1B2, UBB, SLPI, CHI3L2, MT1M, TPT1, MT1X, MT1F and other genes with enhanced expression in the later part of the trajectory.
In the results of the single cell pseudo-time analysis, there are several branching nodes where cells undergo programmed changes in gene expression, such as cell differentiation and apoptosis. We were interested in cluster1 steady-state chondrocytes differentiating into cluster0 calcification-inhibited chondrocytes and cluster2 effector chondrocytes and further analyzed the gene expression changes at node 1 using the BEAM analysis method (Fig. 5D) and selected the top 100 genes to be presented in a pseudo-time heat map. The results are shown in Fig. 5E, in the direction of differentiation of node 1 towards state2 (i.e. calcification-inhibited nucleus pulposus chondrocytes), the genes DCN, CLU, PLA2G2A, HMOX1, C11orf96, CHAD, CLEC3A, IFITM2, CDO1, CHI3L1, MT2A, MT1G, MT1E, MGP, ECRG4, MT1X MT1F, GPX3, FRZB, SLPI, HSPB1, BTG2, ZFP36, JUNB, CHI3L2, SOD2, CCN2, HSPA1B, HSPA1A, EGR1, HSP90AA1, JUN, HSPH1, DNAJB1, FOS, KLF4, FOSB, CCN1, ANXA1, The expression levels of ATF3, KLF2, JUND, HSPA8, were significantly upregulated, while these genes were decreased in the direction of differentiation towards state3 (i.e. effector nucleus pulposus chondrocytes). Conversely, genes IGFBP3, PRSS23, TNFSF11, TREM1, GAPDH, IGFBP5, CRLF1, LDHA, CLMP, PTGES, NT5E, ISLR, TIMP3, MFGE8, ITGB5, TIMP2, PRG4, FGFBP2, PCOLCE2, S100A4 LGALS1, CD55, SERPINE2, TMSB4X, TMEM158, KLK10, TIMP1, S100A2, CD59, IL11, P3H2, FGF1, ANGPTL4, OGN, CRTAC1, SPARC, COL3A1, COL1A1, FN1, COL5A2, COL1A2, TNC, MXRA5, LTBP1, COL15A1, COL5A1, COL2A1, VEGFA, PLOD1, CCND1, MELTF, COL6A3, COL6A1, LOXL2, QSOX1, TGFBI, COL6A2 expression levels in in the direction of differentiation towards state3 (i.e. effector nucleus pulposus chondrocytes) The expression levels of these genes were significantly upregulated in the direction of state3 (effector nucleus pulposus chondrocytes) and significantly decreased in the direction of state2 (calcification-inhibiting nucleus pulposus chondrocytes). These genes showed a significant trend in the differentiation of nucleus pulposus chondrocytes.
Neutrophil Enrichment Analysis
Methods as before, functional enrichment analysis was performed for upregulated differential genes in the neutrophil cluster. Figure 6A illustrates the top 20 BP entries annotated by GO analysis of the neutrophil subcluster. The results show that the neutrophil subcluster is enriched for biological processes such as: leukocyte migration, cytokine production, cytokine-mediated signaling pathways, leukocyte-mediated immunity, interleukocyte adhesion, positive regulation of cell adhesion, leukocyte chemotaxis, leukocyte activation, and leukocyte value addition.
Similarly, the upregulated top100 marker genes of neutrophils were entered into STRING database to construct PPI plots (Fig. 6B), and the results showed that FCGR3A, SYK, HCK, PTPRC, ITGAM, FCER1G, FCGR3B, and FGR were the key genes of neutrophils in degenerating nucleus pulposus tissue.