2.1 Participants
All participants, including 115 patients with HCC, 103 patients with cirrhosis and 15 healthy individuals, were enrolled from January 2018 to December 2019 at the Second Xiangya Hospital of Central South University and Hunan People's Hospital. Complete clinical information was available for all participants. The eligibility criteria were as follows: 1) age of at least 18 years; 2) treatment-naïve status; and 3) Barcelona Clinic Liver Cancer (BCLC) stage 0-A. Patients with intrahepatic cholangiocarcinoma (including combined hepatocellular-cholangiocarcinoma) or other malignancies were excluded. HCC and cirrhotic liver tissues were obtained mainly at initial surgical resection or biopsy. Blood samples were obtained at the time of initial diagnosis. Samples from healthy individuals were mainly blood samples; healthy individuals were defined as having neither liver disease nor a history of cancer at the time of enrollment. Baseline clinicopathologic data, including age, gender, personal history, HBV infection status and serum tumor markers, were collected at the time of initial diagnosis. CpG methylation analysis was performed before therapy. Serum tumor markers were classified according to clinical reference values. The cutoff for CpG methylation level was set at 50%: values above 50% were defined as elevated, and all others as normal.
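The inclusion criteria and the 50% methylation cutoff can be expressed as two small predicates. This is an illustrative sketch only: the `Participant` fields and the histology labels are hypothetical, and it assumes that the BCLC 0-A criterion applies only to HCC patients, since cirrhotic and healthy participants have no BCLC stage.

```python
from dataclasses import dataclass
from typing import Optional

METHYLATION_CUTOFF = 50.0  # percent; the cutoff stated in Section 2.1

# Hypothetical histology labels for the excluded diagnoses.
EXCLUDED_HISTOLOGY = {"ICC", "cHCC-CC", "other_malignancy"}


@dataclass
class Participant:
    """Hypothetical participant record; field names are illustrative."""
    age: int
    treatment_naive: bool
    histology: str                     # e.g. "HCC", "cirrhosis", "healthy"
    bclc_stage: Optional[str] = None   # only meaningful for HCC patients (assumption)


def is_eligible(p: Participant) -> bool:
    """Apply the stated criteria: age >= 18, treatment-naive, BCLC 0-A, exclusions."""
    if p.age < 18 or not p.treatment_naive:
        return False
    if p.histology in EXCLUDED_HISTOLOGY:
        return False
    # Assumption: the BCLC criterion is checked for HCC patients only.
    if p.histology == "HCC" and p.bclc_stage not in {"0", "A"}:
        return False
    return True


def classify_methylation(level_percent: float) -> str:
    """Dichotomize a CpG methylation level: above 50% is 'elevated', otherwise 'normal'."""
    return "elevated" if level_percent > METHYLATION_CUTOFF else "normal"
```

Note that a value of exactly 50% is classified as normal, because the text defines only values *above* 50% as elevated.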
2.2 DNA extraction from tissues and plasma
DNA from tumor tissue and cirrhotic liver tissue was extracted using the QIAamp DNA FFPE Tissue Kit (Qiagen, Valencia, CA, USA). The absence of tumor cells in cirrhotic liver tissue was confirmed by histopathological assessment. Circulating cell-free DNA (cfDNA) was recovered from 4-5 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Valencia, CA, USA). DNA was quantified with a Qubit 2.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA).
2.3 Targeted bisulfite sequencing
Fragmented tissue DNA (~200 bp) and cfDNA were subjected to bisulfite conversion using the EZ-96 DNA Methylation-Lightning MagPrep kit (Zymo Research, Irvine, CA, USA). Briefly, purified DNA was treated with sodium bisulfite, which converts unmethylated cytosines to uracil while leaving methylated cytosines unchanged. The converted single-stranded DNA molecules were then ligated to a splinted adapter and amplified with a uracil-tolerant DNA polymerase to generate whole-genome bisulfite sequencing (BS-seq) libraries. Custom-designed RNA baits for methylation profiling were used for target enrichment. The target libraries were quantified by real-time PCR (Kapa Biosystems, Wilmington, MA, USA) and sequenced on a NovaSeq 6000 (Illumina, San Diego, CA, USA), with an average sequencing depth of 500× for tissue samples and 1,000× for plasma samples.
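To make the chemistry concrete, the following sketch simulates bisulfite conversion of one strand in silico: unmethylated cytosines are deaminated to uracil and read as thymine after amplification, while methylated cytosines remain cytosine. The function names and the representation of methylation as a set of 0-based positions are assumptions for illustration; the sketch ignores the complementary strand and incomplete conversion.

```python
from typing import List, Set


def cpg_positions(seq: str) -> List[int]:
    """Return 0-based positions of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def bisulfite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Simulate the post-PCR sequence of one bisulfite-treated strand.

    Unmethylated C -> U (deamination), which is read as T after amplification;
    methylated C (5mC) is protected and stays C. Other bases are unchanged.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)
```

For example, `ACGTTCGACCA` has CpGs at positions 1 and 5; converting it with only position 1 methylated yields `ACGTTTGATTA`. The methylated CpG keeps its C, whereas the unmethylated CpG and the non-CpG cytosines become T, which is why aligners such as BWA-meth map reads against a C-to-T converted reference.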
2.4 Methylation data processing
Raw sequencing data (FASTQ) were first trimmed with Trimmomatic (v0.36) and then aligned with BWA-meth to the C-to-T and G-to-A converted hg19 reference genome. PCR duplicate reads were identified and removed with Picard tools (v1.138). Paired reads were stitched together to represent the originating DNA fragments, and fragments with discordant pairing or low mapping quality (MAPQ < 60) were removed from further analyses.
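After filtering, a per-CpG methylation level can be summarized as the percentage of retained fragments that read C (methylated) rather than T (unmethylated) at that site. The sketch below assumes this common definition, since the text does not state how methylation levels were computed. The `Fragment` record is hypothetical (not a BWA-meth or Picard output format), and calls are assumed to be already strand-normalized so that reverse-strand G/A observations appear as C/T.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

MIN_MAPQ = 60  # fragments with MAPQ < 60 are discarded, as in Section 2.4


@dataclass
class Fragment:
    """Hypothetical stitched read pair; fields are illustrative only."""
    mapq: int
    concordant: bool   # False for discordantly paired reads
    duplicate: bool    # True if flagged as a PCR duplicate
    # CpG reference coordinate -> base observed at that C ("C" or "T")
    cpg_calls: Dict[int, str] = field(default_factory=dict)


def passes_filters(frag: Fragment) -> bool:
    """Keep non-duplicate, concordantly paired fragments with MAPQ >= 60."""
    return (not frag.duplicate) and frag.concordant and frag.mapq >= MIN_MAPQ


def methylation_levels(fragments: List[Fragment]) -> Dict[int, float]:
    """Percent methylation per CpG: 100 * C / (C + T) over retained fragments."""
    meth: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for frag in fragments:
        if not passes_filters(frag):
            continue
        for pos, base in frag.cpg_calls.items():
            if base in ("C", "T"):  # other bases (SNVs, errors) are uninformative
                total[pos] += 1
                if base == "C":
                    meth[pos] += 1
    return {pos: 100.0 * meth[pos] / total[pos] for pos in total}
```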
2.5 Independence of the prediction model from clinical characteristics
To determine whether the predictive power of the prediction model was independent of other clinical variables (age, gender, personal history, HBV infection status, serum tumor markers and CpG methylation level) in patients with HCC, univariable and multivariable logistic regression analyses were performed, with the clinical characteristics as independent variables and pathological type as the dependent variable. All reported P values were two-sided. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated.
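The logistic regression step can be sketched as maximum-likelihood fitting by Newton-Raphson, with odds ratios obtained as exp(β), Wald 95% confidence intervals as exp(β ± 1.96·SE) and two-sided P values from the normal distribution. This is a minimal NumPy sketch, not the software used in the study; `fit_logistic` and `univariable` are hypothetical helpers, and the fit assumes the data show no complete separation.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X: (n,) or (n, k) predictors WITHOUT an intercept column; y: 0/1 outcomes.
    Returns one (OR, CI_low, CI_high, two-sided p) tuple per predictor.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information X'WX
        step = np.linalg.solve(info, X.T @ (y - p))    # Newton update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    se = np.sqrt(np.diag(cov))
    results = []
    for b, s in zip(beta[1:], se[1:]):  # skip the intercept
        z = b / s
        p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
        results.append((math.exp(b), math.exp(b - 1.96 * s),
                        math.exp(b + 1.96 * s), p_two_sided))
    return results


def univariable(X, y):
    """Fit one single-predictor model per column of X."""
    X = np.asarray(X, dtype=float)
    return [fit_logistic(X[:, j], y)[0] for j in range(X.shape[1])]
```

Passing all candidate predictors to `fit_logistic` at once gives the multivariable model, while `univariable` fits each predictor separately. In a toy 2×2 table with 30 cases and 10 controls when a binary predictor is present and 10 cases and 30 controls when it is absent, the fitted odds ratio equals the cross-product ratio, 9.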
2.6 Construction and validation of a prediction nomogram
To construct a CpG-based model for HCC diagnosis, the diagnostic prediction model was built and validated in R (version 4.1.2; R Foundation for Statistical Computing, Vienna, Austria). A nomogram was used to predict the diagnosis of HCC in high-risk populations. All participants were randomly divided into a training cohort (n = 133) and a validation cohort (n = 100). A combined model based on all independent predictors selected by multivariable logistic regression analysis was used to construct a nomogram estimating the probability of HCC for early screening in high-risk populations. The nomogram was then validated in terms of discrimination and calibration. The calibration curve was evaluated graphically by plotting the nomogram-predicted probabilities against the observed rates; a curve lying on the reference line indicates perfect agreement between predicted and observed probabilities. Receiver operating characteristic (ROC) analysis and decision curve analysis (DCA) were used to compare predictive accuracy. A P value < 0.05 was considered statistically significant.
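The validation steps can be illustrated with a few plain-Python helpers: a seeded random split into cohorts of 133 and 100, the ROC AUC computed as the Mann-Whitney probability that a case scores higher than a control, the decision-curve net benefit at a threshold probability p_t, NB = TP/n - FP/n · p_t/(1 - p_t), and a binned calibration table. This is a sketch under stated assumptions: the helper names and the seed are hypothetical, the study's own analysis was done in R, and equal-width binning is a simple stand-in for whatever calibration procedure was actually used.

```python
import random
from typing import List, Sequence, Tuple


def random_split(n: int, n_train: int, seed: int = 2022) -> Tuple[List[int], List[int]]:
    """Randomly partition indices 0..n-1 into training and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is a hypothetical choice
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def roc_auc(y: Sequence[int], score: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as 1/2."""
    pos = [s for s, t in zip(score, y) if t == 1]
    neg = [s for s, t in zip(score, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y: Sequence[int], prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of treating everyone with prob >= threshold (< 1)."""
    n = len(y)
    tp = sum(1 for t, p in zip(y, prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y, prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def calibration_table(y: Sequence[int], prob: Sequence[float],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """(mean predicted probability, observed event rate, count) per non-empty bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for t, p in zip(y, prob):
        k = min(int(p * n_bins), n_bins - 1)  # equal-width bins on [0, 1]
        bins[k].append((p, t))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(t for _, t in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table
```

To draw a decision curve, net benefit is evaluated over a range of thresholds and compared with the treat-all strategy, prevalence - (1 - prevalence) · p_t/(1 - p_t), and the treat-none strategy, whose net benefit is zero. In the calibration table, rows whose mean predicted probability matches the observed rate lie on the reference line.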