Study participants
Samples were collected from study participants who provided informed consent, in accordance with the Declaration of Helsinki and with approval from the ethics review board of the Graduate School of Medicine, Osaka University, Japan (No. 855). Consent was obtained to publish information such as age, sex, the name of medical center, and the diagnosis. Study participants received no compensation. Patients were diagnosed with systemic sclerosis according to the 2013 ACR/EULAR classification criteria11. The diagnosis was confirmed by at least two rheumatologists.
Patient profiles
Twenty-one patients diagnosed with SSc and six healthy donors were recruited for this study. None of the patients had received any immunosuppressive therapy. All patients were either admitted to or visited Osaka University Hospital, where they underwent a comprehensive assessment to rule out infectious diseases, neoplastic lesions, and any overlap of other systemic autoimmune diseases, before the 2013 ACR/EULAR classification criteria was applied. The presence of ILD was diagnosed based on clinical symptoms, computed tomography (CT) scan and pulmonary function tests. CT images were reviewed by at least two rheumatologists and one radiologist. SRC was diagnosed according to the UK Scleroderma Study Group guideline48. Samples were included in the SRC group if they were collected from SSc patients within three months of SRC onset or at any time after the onset. Blood pressure and mRSS were measured on the day of blood collection. Four patients with LN were recruited for some analyses. Systemic lupus erythematosus was diagnosed according to the 2019 EULAR/ACR classification criteria49, and LN was confirmed by renal biopsy50.
PBMCs preparation
Twenty milliliters of whole blood was collected into Na-heparin blood collection tubes (Terumo, Cat. No. VP-H070K). PBMCs were isolated using Leucosep (Greiner, Cat. No. 22788-013), then washed and resuspended in Cellbanker 1plus (ZENOAQ, Cat. No. CB023) to a concentration of 1.0 × 107 cells/mL before storage at − 150°C.
Single-cell library construction (CITE-seq)
CITE-seq was performed using the same antibodies as in our previous report15. Thawed PBMCs were treated with DNA-barcoded antibodies, and single-cell suspensions were processed using the 10x Genomics Chromium Controller (10x Genomics). The libraries were constructed according to the protocol outlined in the user guide of Chromium Single Cell 5’ Reagent Kits v2 (Dual Index, Cat. No. PN-1000263) (10x Genomics). Briefly, up to 10,000 labeled live cells per sample were individually loaded into the 10x Genomics platform without sample mixing to generate a barcoded cDNA library for individual cells. Data quality control was conducted using a Bioanalyzer system (Agilent). Individual libraries were pooled and sequenced on the HiSeq 2500 or Novaseq 6000 platform (Illumina) to analyze gene and surface protein expression. Sequence information of CITE-seq is summarized in Supplementary Table 8.
Reference-based cell annotation and analysis of CITE-seq data
Raw FASTQ files were aligned to the GRCh38 reference genome using CellRanger (version 6.0.6). Filtered HDF5 feature-barcode matrix files were generated using the CellRanger count to create a Seurat object. Data quality control, scaling, transformation, clustering, dimensionality reduction, differential expression analysis, and visualization were performed using the Seurat R package (V4.3.0). Cells with nFeature_RNA values less than 200 or greater than 5,000, or mitochondrial read percentages exceeding 20%, were removed. Data normalization and scaling were performed using the SCTransform function. Except for the time course analysis, only the initial samples collected during the clinical course were integrated. For the comparison between SRC, LN, and healthy donors, samples from patients diagnosed with SRC or LN were integrated with those from healthy donors. For the time course analysis of the SRC patient, samples collected at three different time points from the same individual were included. Each cell subpopulation was identified through two rounds of clustering. Initially, reference-based integration was applied to the query dataset using the public CITE-seq dataset of 211,000 human PBMCs as the reference12. The FindTransferAnchors function was employed to find anchors between the reference and the query datasets, using precomputed supervised PCA transformation for SCT-normalized data. The MapQuery function was used to transfer cell-type labels and protein data from the reference to the query datasets. Platelets and erythrocytes were excluded from the analysis. To identify subpopulations within each cell type, a second round of clustering was conducted on specific populations: monocytes (CD14 Mono, CD16 Mono, cDC1, and cDC2), CD8+ T cells (CD8 Naive, CD8 TCM, and CD8 TEM), CD4+ T cells (CD4 Naive, CD4 TCM, CD4 TEM, Treg, and CD4 CTL), B cells (B naive, B intermediate, and B memory), NK cells (NK and NK_CD56bright), and pDCs (pDC). The RunUMAP function was used to perform UMAP dimensional reduction with 30 precomputed spca dimensions. A nearest-neighbor graph using the 30 dimensions of the supervised PCA reduction was computed using the FindNeighbors function followed by the FindClusters function for cell clustering. The resultant UMAP was visualized using the DimPlot function. Each cluster was manually annotated using gene expression and surface protein data. Doublets were manually removed using surface protein data.
PCA using cell composition of PBMCs
PCA was performed using the PCA function in the FactoMineR package, with data scaled to unit variance. First, the proportion of each cell subpopulation defined in the subset analysis was calculated, along with other minor cell populations, all relative to the total PBMCs. The proportion of each cell subset was included as a variable in the PCA, except for populations with fewer than 100 total cells. Vectors representing each subpopulation were shown on a plane defined by the first two principal components (PC1 and PC2) using the fviz_pca_var function. Each study participant was represented as a single plot on the same plane according to their respective PC1 and PC2 values.
Differential abundance analysis
Differential abundance analysis was performed on PBMCs, monocytes, CD4+ T cells, and CD8+ T cells collected in this study, as well as lung CD8+ T cells derived from public data34, using the miloR (version 3.15) package. These analyses were performed to detect groups of cells that are differentially abundant in different conditions by modeling the number of cells within the neighborhoods of a k-nearest neighbor (KNN) graph17. To adjust for cell number, up to 1,000 cells were randomly selected from each sample for differential abundance analysis. The buildGraph function was first used to construct a KNN graph on the basis of precomputed supervised PCA with k = 5, using 30 principal components. Using the makeNhoods function, cells were then grouped into neighborhoods according to their connectivity over the KNN graph. To test for differential abundance, Milo fitted a negative binomial generalized linear model to the counts for each neighborhood using TMM normalization. Age was used as a covariate in the testNhoods function. For visualization, the log2-fold change in cell numbers between two conditions in each neighborhood was calculated. Nodes representing neighborhoods that were considered statistically significant (α < 0.2) were colored in red or blue.
Differential gene expression analysis and gene set enrichment analysis
In specific cell subsets, DEGs were identified using the FindMarkers function. For the characterization of DEGs, gene set enrichment analysis was conducted on genes exhibiting a log2-fold change greater than 0.5, using the Enrichr, a web-based tool for analyzing gene sets51. The MSigDB Hallmark 202035 or BioPlanet 201952 were employed as the dataset and adjusted P-values for each pathway were calculated by the Benjamini–Hochberg method. TRRUST Transcription Factors 201928 was used to determine the adjusted P-values for each transcription factor–related pathway. Pathways not related to humans were excluded from the analysis.
Preparation of kidney cells and whole-blood leukocytes for Abseq
Renal tissue was obtained by needle biopsy from one patient at the onset of SRC (patient ID: SSc-19). The tissue was immediately washed with phosphate-buffered saline and processed into single cell suspension using the Tumor Dissociation Kit (human, Miltenyi Biotec). Enzyme R was excluded to preserve the cell-surface epitope. The sample was cut into small pieces with a maximal dimension of approximately 2 mm, combined with the enzyme mixture, and shaken at 37°C in a water bath at 160 rpm for 45 minutes. The sample was treated with DNase for 5 minutes at room temperature. No steps were taken to lyse red blood cells or remove dead cells.
Two milliliters of Whole blood was collected from the same patient in an EDTA-2Na blood collection tube (Terumo, Cat. No. VP-Na052K), and the leukocytes were isolated using Polymorphprep (Serumwerk, Cat. No. 1895). The sample was washed and resuspended with Cellbanker 1plus (ZENOAQ, Cat. No. CB023) to a concentration of 2.0 × 106 cell/mL prior to scRNA-seq experiments.
Single-cell library construction (Abseq)
Renal cells and total peripheral leukocytes were processed separately through the BD Rhapsody Express System (BD Biosciences, Catalog No. 665915). Thirty-five DNA-barcoded antibodies (detailed in Supplementary Table 10) were added to each cell suspension. Abseq libraries were constructed using WTA Reagent Kits (BD Biosciences, Cat. No. 665915). Approximately 10,000 peripheral leukocytes were loaded into the BD Rhapsody platform. All obtained renal cells were processed in an identical manner. Data quality control was performed using the Bioanalyzer (Agilent). The libraries were pooled for sequencing on the HiSeq 2500 or Novaseq 6000 platform (Illumina) to analyze gene and surface protein expression. Sequence information of Abseq is summarized in Supplementary Table 9.
Manual cell annotation and analysis of Abseq data
Raw FASTQ files were aligned to the GRCh38 reference genome using the BD Rhapsody Sequence Analysis Pipeline (version 1.12). Distribution-based error correction-adjusted molecule counts were presented as gene expression data tables to establish a Seurat object. The Seurat R package (V4.3.0) was used for data quality control, scaling, transformation, clustering, dimensionality reduction, differential expression analysis, and visualization. A total of 16,081 cells were selected for further analysis on the basis of unique molecular identifiers for each cell and percentages of mitochondrial reads. Peripheral leukocytes with nFeature_RNA values less than 200 or greater than 5,000, or mitochondrial read percentages exceeding 20%, were excluded from the analysis. Similarly, kidney cells were removed if they had nFeature_RNA values less than 200 or greater than 5,000, or mitochondrial read percentages exceeding 40%. The data were normalized and scaled using the SCTransform function. The data from peripheral blood and kidney samples were integrated using reciprocal PCA. The RunUMAP function was used for UMAP dimensional reduction with 30 precomputed PCA dimensions. A nearest-neighbor graph was then constructed using the 30 PCA dimensions with the FindNeighbors function, followed by clustering using the FindClusters function. The UMAP was visualized using the DimPlot function. Each cluster was manually annotated using gene expression and surface protein data. Platelets and erythrocytes were removed from the analysis. Doublets were also removed using cell-surface protein data.
Pseudotime analysis
The pseudotime of each cell and the cell trajectory were calculated using the Monocle 3 package26. The previously annotated Seurat object was imported into Monocle 3. Using the learn_graph function, we fitted a principal graph and plot the graph through the UMAP coordinates. The order_cells function was employed to calculate the pseudotime for each cell, with the root node set to the immature monocyte population characterized by high expression of S100A12 and low expression of HLA-DRB127 (Extended Data Fig. 14). The cell trajectory was visualized using the plot_cells function.
Public data analysis
The scRNA-seq data of lung cells derived from SSc patients with advanced ILD and healthy donors were obtained from the gene expression omnibus under accession numbers GSE128169 and GSE128033. We used only lung tissue samples, excluding those labeled with hashtags designated for sample multiplexing. The Seurat object was subsequently created using the CreateSeuratObject function. The quality control of scRNA-seq data was performed as previously reported34. Data were normalized and scaled using the SCTransform function, followed by data integration using reciprocal PCA. The RunUMAP function was used for UMAP dimensional reduction with 30 precomputed PCA dimensions. A nearest-neighbor graph using the 30 dimensions of the PCA reduction was computed using the FindNeighbors function, followed by clustering using the FindClusters function. The UMAP was visualized using the DimPlot function. Each cluster was manually annotated using gene expression data.
Module scoring
Module scores for each cell were calculated using the AddModuleScore function. To assess the expression of type II ISGs in lung CD8+ T cells, module scores were calculated using the “Interferon Gamma Response” gene set from MSigDB Hallmark 2020. To evaluate the similarity of each lung CD8+ T cell subset to the peripheral CD8+ T cell subpopulation, CD8_TEM_T2ISG, module scores were calculated using DEGs (fold change >0.5) of CD8_TEM_T2ISG. Gene scores for each subset were visualized using the Dotplot function according to cell-based scores.