Data collection
RNA sequencing data (GSE54236) and single-cell sequencing data (GSE212046) were of obtained from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). GSE54236 dataset contained 80 HCC tumor tissues and 81 cirrhotic non-malignant tissue samples. GSE212046 dataset included 2 cases of on-tumor tissues with background of liver cirrhosis with steatohepatitis and 2 cases of tumor tissues from HCC patients. Eighty HCC tumor tissue samples in GSE54236 dataset were randomly divided into 3 and 1 portions. Similarly, 81 cirrhotic non-malignant tissue samples in GSE54236 dataset were randomly divided into 3 and 1 portions. Three portions of HCC tumor tissue samples and 3 portions of cirrhotic non-malignant tissue samples were served as training dataset, one portions of HCC tumor tissue samples and one portions of cirrhotic non-malignant tissue samples were served as test dataset.
mRNA sequencing data
Nine tumor tissue samples from HCC patients and 5 cases of liver cirrhosis were collected for mRNA sequencing. The total RNA was extracted from tissues by Trizol method. The mRNA with polyA tail was enriched by Oligo (dT) magnetic beads to construct the sequencing library. Based on the Illumina NovaSeq6000 sequencing platform, these libraries were sequenced by double terminal (Paired-end, PE).
scRNA data analysis
Applying R packages "Seurat" and "NormalizeData" function, the scRNA data was analyzed. In order to ensure the data quality, the batch effects of the non-biotechnological bias were corrected. The number of highly variable genes was set to 3000 and identified using "FindVariableFeatures". Principal component analysis (PCA) was used to reduce the order of 3000 high variable genes by Seurat RunPCA function. The principal components were displayed by Seurat ELbowPlot function, and the first 15 principal components (PC) were selected for follow-up analysis. All samples were integrated by seurat's rpca method correction. Then, the "DIMS" parameter was set as 30, and the " k-nearest neighbor (KNN)" method was used to cluster the cells. Cell clusters were annotated with known cell types based on the cell type-specific markers obtained from the cellmarker 2.0 database [11].
Disulfidptosis‑related genes (DRGs) scoring
Twenty-three DRGs, SLC7A11, GYS1, NDUFS1, NDUFA11, NUBPL, NCKAP1, LRPPRC, SLC3A2, RPN1, ACTN4, ACTB, CD2AP, CAPZB, DSTN, FLNA, FLNB, INF2, IQGAP1, MYH10, MYL6, MYH9, PDLIM1 and TLN1, were colletced for DRGs scoring [7]. Applying “seurat” with addmoudlescore function, DRGs scoring was calculated. Cells were categorized into two distinct groups following the DRG-AUC (DRGs Area Under the Curve) values: those with high DRG-AUC and those with low DRG-AUC, with the median value being utilized as the threshold for classification.
Single-cell inferred chromosomal copy number variation (CNV)
To distinguish between benign and malignant cells in the tumor microenvironment, the copykat program (version 1.1.0) was applied to determine the genomic copy number distribution of individual cells. By integrating Bayesian techniques and hierarchical clustering, cells were classified as diploid normal cells or aneuploid tumor cells applying copykat program. The copykat program defines a model using a Gaussian mixture model (GMM), which assumes that a cell's gene expression is a mixture of three Gaussian models: amplification, deletion, and neutral states. Cells in which neutral genes account for at least 99% of expressed genes are categorized as high-confidence diploid cells. Therefore, we classify cells as diploid (benign) and aneuploid (tumor).
HdWGCNA analysis
The malignant cells and non-malignant cells in GSE212046 dataset were analyzed by R packages hdWgcna. The hdWGCNA package with KNN algorithm was employed to identify similar cells that could be aggregated. The average or sum expression of these cells was then calculated, resulting in a low sparse metacell gene expression matrix. The SetDatExpr function was utilized to specify Treg cells for constructing the expression matrix. Next, we performed parameter scans using the TestSoftPowers function to determine the optimal soft power threshold for constructing the co-expression network. The ConstructNetwork function was employed to establish the co-expression network based on the optimal soft threshold. Subsequently, the ModuleEigengenes (ME) function was utilized to calculate the module feature genes by performing principal component analysis (PCA) on a subset of the gene expression matrix specific to each module. Additionally, the ModuleExprScore function with either Seurat or UCell algorithm was used to compute the central gene feature score for each module. To visualize the correlation between modules, the ModuleCorrelogram function was applied, considering the hME, ME, or hub gene scores. Important modules were filtered based on their correlation coefficients and p-values with specific traits (e.g., tissue type, malignant cells, disulfide apoptosis score, and disulfide apoptosis group) using the GetModuleTraitCorrelation function.
Transcriptome data analysis
The R package ssGSEA was used to calculate the disulfidptosis gene score in the transcriptome data. The immune infiltration score for 28 immune cells was calculated based on the immune cell gene set [12]. The GSE54236 dataset was analyzed using the R package WGCNA, and the Pearson correlation coefficient was used to construct a Weight co-expression network. The cutreeDynamic function was used to divide the different modules with the "minClusterSize" parameter was set to 30. The correlation of each module with tissue type and disulfidptosis score was calculated to select significant modules. KEGG and GO analyses were performed using the R package ClusterProfiler based on the Entrez IDs of the genes in the important modules.
Construction of machine learning models
Based on the results of hdWgcna and Wgcna analyses with both the single-cell level and the transcriptome level, clusters of genes that correlate with both tissue type and disulfidptosis scores were selected for machine learning to select features and train models. Feature selection was performed using SelectFromModel in python scikit_learn library on RidgeCV, LassoCV, LDA, RandomForest, LinearSVC, LogisticRegression respectively. LogisticRegression, KNN, RandomForest, GradientBoosting, Linear Discriminant Analysis (LDA) were utilized to train the model. GridSearchCV function was used to optimize the model parameters. Models with fewer number of features, higher AUC values without overfitting and underfitting were selected to diagnose liver cancer cirrhosis. The models were validated on nine samples of liver cancer and five samples of cirrhosis.
Cell culture
Human hepatic stellate cells (LX-2) and liver cancer cells (HepG2 and Huh7) were obtained from ATCC (Manassas, VA, USA). All cells were maintained in RPMI-1640 medium (Gibco, Grand Island, NY, USA) in the presence of 10% fetal bovine serum (FBS; Gibco) and 1% penicillin-streptomycin (Sangon Biotech, Shanghai, China) at 37°C and 5% CO2 atmosphere. Cells were treated with different cell death inhibitor, ferroptosis inhibitor ferrostatin-1 (Ferr-1; 3 µg/mL; MedChemExpress, Monmouth Junction, NJ, USA), Caspase inhibitor Z-VAD-FMK (Z-VAD; 50 µM; MedChemExpress), disulfide bond reductant tris − (2-carboxyethyl) - phosphine (TCEP 2 mM; MedChemExpress). Cells were treated with 1% DMSO (MedChemExpress) as control.
Cell transfection
The SLC7A11 overexpression vector (SLC7A11) and small interfering RNA specially targeting RRAGD (si-RRAGD-1/2/3) were obtained from Sangon Biotech (Shanghai). Empty vector (EV) and scrambled siRNA (si-NC) were served as control. Cells were transfected with SLC7A11/EV or si-NC/si-SLC7A11 applying lipofectamine 2000 reagent (Thermo Fisher Scientific, San Jose, CA, USA).
Quantitative real-time PCR (qRT-PCR)
RNA extraction was carried out applying TRIzol regent (Thermo Fisher Scientific). PrimeScript™ RT reagent Kit (Takara, Dalian, China) and TB Green® Premix Ex Taq™ (Takara) were applied to synthesize cDNA and PCR reaction. GAPDH served as loading control. The relative expression of mRNA was analyzed by 2−ΔΔCT method.
Western blotting
Utilizing RIPA Lysis Buffer (Thermo Fisher Scientific), total proteins were extracted from cells. The proteins were separated by 10% SDS-PAGE, and then transferred on a nitrocellulose membrane. The membranes were incubated with 5% skimmed milk, and then incubated with primary antibodies, SLC7A11 (1:1000 dilution; Cat#ab307601; Abcam, Cambridge, MA, USA), RRAGD (1:1000 dilution; Cat# ab187679; Abcam) or anti-GAPDH (1:5000 dilution; Cat#10494-1-AP; Proteintech, Wuhan, China). Goat anti-rabbit HRP-IgG (1:2000 dilution; Cat#SA00001-2; Proteintech) was utilized to visualize protein bands. Finally, WB bands were developed by Enhanced chemiluminescence reagent (Beyotime, Shanghai, China) and analyzed by Image J software.
Detection of ATP
ATP Assay Kit (Beyotime) was used to detect the levels of ATP in cells following the protocol of manufacturer. Chemiluminescence of each sample was detected on a Luminoskan Ascent (Labsystems, Franklin, MA, USA).
Cell death
Annexin V-FITC Apoptosis Detection Kit (Beyotime, Shanghai, China) was applied to assess cell death. Cell suspension (104 cells) was incubated with 5 µL Annexin V-FITC and 10 µL PI in darkness at 25°C for 20 min. Finally, cell death was analyzed on a FACSCalibur flow cytometer (BD Biosciences, San Diego, CA, USA) in 1 h.
Statistical methods
All statistical analyses and plots were realized through R language or python package. The spearman correlation coefficient was used to assess the correlation between variables. Two-tailed Student’s t test and one-way ANOVA were used to analyze the statistical difference. P values less than 0.05 were considered statistically significant.