A CpG-based prediction model for the diagnosis of hepatocellular carcinoma patients

doi:10.21203/rs.3.rs-2463318/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Abstract

Objective: Hepatocellular carcinoma (HCC), the most prevalent form of liver cancer, carries high morbidity and mortality. Early diagnosis is critical for treatment and for improving prognosis.

Methods: Clinical characteristics were collected from 233 participants at multiple centers: 115 HCC patients, 103 patients with cirrhosis, and 15 healthy individuals. Logistic regression analysis identified indicators significantly associated with HCC, and these were used to develop the prediction model. Further analysis assessed the independent predictive capacity of the model. A nomogram was built from the model, with data from 133 participants used for development and data from 100 participants used for validation. In addition, dozens of patients with tumors smaller than 2 cm were collected for further validation. Receiver operating characteristic (ROC) curve analysis was used to assess the performance of the model.

Results: Univariate and multivariate analyses identified five indicators significantly associated with HCC. The predictive model consisted of age, drinking status and blood indicators, including alpha-fetoprotein (AFP), hepatitis B virus (HBV) infection status and a differentially methylated CpG site. All of these factors were incorporated into the nomogram, which showed good discrimination and good calibration. Calibration curves showed favorable consistency between predicted and observed probabilities. In ROC curve analysis, the nomogram achieved AUCs of 0.852 and 0.857 in the training and validation groups, respectively. Decision curve analysis was also used to evaluate the nomogram and compare it with single indicators.

Conclusion: The study provides a novel model for the early diagnosis of HCC that performs better than traditional screening and diagnostic indicators.

Keywords: hepatocellular carcinoma; diagnosis model; DNA methylation

1. Introduction

Primary liver cancer is the sixth most common malignant tumor and the third leading cause of cancer-related death worldwide [1, 2]. Hepatocellular carcinoma (HCC) is the most prevalent form of liver cancer. Most cases develop from chronic hepatitis, including hepatitis B and C virus infection, and usually progress through hepatic cirrhosis to a tumor [2]. Early symptoms are insidious and present as atypical gastrointestinal complaints such as abdominal pain and anorexia. Consequently, a considerable number of patients are already at an advanced stage at initial diagnosis. The poor prognosis of HCC is in large part related to late-stage diagnosis, as symptoms do not appear until advanced stages, when fewer effective treatment options remain [3]. In addition, invasiveness and resistance to therapeutic drugs in advanced HCC are responsible for treatment failure and make HCC the most recurrent cancer worldwide [4]. If HCC is identified at an early stage, surgical resection offers a favorable prognosis, with 5-year survival rates of more than 70% [5]. Accordingly, early and accurate diagnosis of HCC is of great significance for clinical decision-making and treatment.

High-risk individuals, namely patients with cirrhosis and/or selected hepatitis B carriers, are recommended to undergo surveillance by ultrasound (U/S), with or without serum α-fetoprotein (AFP) measurement, every 6 months to monitor changes in suspicious liver nodules [6]. Still, the performance of currently recommended surveillance strategies is suboptimal, particularly for early-stage detection. Because these strategies lack adequate sensitivity and specificity, some early-stage patients are underdiagnosed [7]. Magnetic resonance imaging (MRI) and computed tomography (CT) can exceed a sensitivity of 50% in early-stage subjects, but these procedures are typically reserved for those at risk because they are expensive and uncomfortable [8]. Currently, most HCC patients are detected on the basis of clinical symptoms at an advanced stage rather than by high-quality screening techniques. The development of efficient and accurate screening biomarkers remains an urgent unmet clinical need.

One emerging strategy for HCC detection is the evaluation of circulating tumor biomarkers, which are convenient to measure and highly repeatable. DNA methylation, one of the most widely studied and best understood epigenetic modifications, mainly refers to methylation of the fifth carbon atom of cytosine in CpG dinucleotides [9, 10]. Aberrant DNA methylation, consisting of losses and gains of 5-methylcytosine within CpG dinucleotides, is prevalent in cancer. During tumorigenesis, DNA methylation acts as a biological feature through upregulation of DNA methyltransferase genes [11]. The methylation pattern is irreversible and highly stable in the early stage of tumorigenesis, so its detection can sensitively screen out suspicious cases in high-risk groups [12]. Numerous studies have shown that the prevalence of specific methylation abnormalities is significantly correlated with HBV-mediated HCC progression [13, 14].

Building on the convenience of non-invasive liquid biopsy, this study aimed to develop a simple blood-based model for predicting HCC. We previously developed a sensitive, blood-based, non-invasive HCC screening model that can effectively distinguish early-stage HCC patients from the high-risk population [15]. For better clinical application, we here combine general clinical characteristics and blood indicators, including a methylated CpG site, into a diagnostic model and evaluate whether it detects HCC sensitively.

2. Participants and Methods

2.1 Participants

A total of 233 participants (115 HCC patients, 103 patients with cirrhosis and 15 healthy individuals) were enrolled from January 2018 to December 2019 at the Second Xiangya Hospital of Central South University and Hunan People’s Hospital. Complete clinical information was available for all participants. The eligibility criteria were as follows: 1) at least 18 years of age; 2) treatment naïve; 3) BCLC stage 0-A. Patients with intrahepatic cholangiocarcinoma, including combined hepatocellular-cholangiocarcinoma, or other malignancies were excluded. HCC and cirrhotic liver tissues were obtained mainly at initial surgical resection or biopsy. Blood samples were obtained at the time of initial diagnosis. Samples from healthy individuals, defined as having no liver disease and no history of cancer at the time of enrollment, were mainly blood samples. Baseline clinicopathologic data were collected at the time of initial diagnosis, including age, gender, personal history, HBV infection status and serum tumor markers. Laboratory analysis of CpG methylation was performed before therapy. Serum tumor markers were interpreted against clinical reference values. The cutoff for the CpG locus was set at 50%: values above 50% were defined as elevated, and all others as normal.

2.2 DNA extraction from tissues and plasma

DNA from tumor tissue and cirrhotic liver tissue was extracted using the QIAamp DNA FFPE Tissue Kit (Qiagen, Valencia, CA, USA). The absence of tumor cells in cirrhotic liver tissue was confirmed by histopathological assessment. Circulating cell-free DNA (cfDNA) was recovered from 4 to 5 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Valencia, CA, USA). DNA was quantified with the Qubit 2.0 Fluorometer (ThermoFisher Scientific, Waltham, MA, USA).

2.3 Targeted bisulfite sequencing

Fragmented tissue DNA (~200 bp) and cfDNA were subjected to bisulfite conversion using the EZ-96 DNA Methylation-Lightning MagPrep kit (Zymo Research, CA, USA). Briefly, purified DNA was treated with sodium bisulfite. The converted single-stranded DNA molecules were then ligated to a splinted adapter and amplified with a uracil-tolerating DNA polymerase to generate whole-genome BS-seq libraries. Custom-designed methylation profiling RNA baits were used for target enrichment. The target libraries were quantified by real-time PCR (Kapa Biosystems, Wilmington, MA, USA) and sequenced on a NovaSeq 6000 (Illumina, San Diego, CA, USA) at an average sequencing depth of 500X for tissue samples and 1,000X for plasma samples.

2.4 Methylation data processing

Raw sequencing data (.fastq) were first trimmed with Trimmomatic (v.0.36) and then aligned with BWA-meth to the C-to-T- and G-to-A-transformed hg19 reference genome. PCR duplicate reads were identified and removed with Picard tools (v.1.138). Paired reads were stitched together to represent the originating DNA fragments, and those with discordant pairing or low mapping quality (MAPQ < 60) were removed from further analyses.
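The text does not state how a per-site methylation level was derived from the filtered reads. A common convention, assumed here, is the fraction of reads covering the CpG that still show cytosine after bisulfite conversion (unmethylated cytosines are read as thymine). The following minimal Python sketch computes that fraction and applies the 50% cutoff given in Section 2.1. The function names and read counts are hypothetical and are not taken from the study.

```python
def methylation_level(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of reads at one CpG that retain cytosine after bisulfite conversion.

    Assumption: this ratio is used as the methylation level; the paper does not
    specify its exact calculation.
    """
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("no reads cover this CpG site")
    return methylated_reads / total


def classify_cpg(level: float, cutoff: float = 0.50) -> str:
    """Apply the cutoff from Section 2.1: above 50% is elevated, otherwise normal."""
    return "elevated" if level > cutoff else "normal"


# Hypothetical read counts for one plasma sample at one CpG site.
level = methylation_level(methylated_reads=620, unmethylated_reads=380)
print(f"methylation level = {level:.2f} ({classify_cpg(level)})")
```

A value of exactly 50% is classified as normal, because the text defines only values above 50% as elevated.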

2.5 Independence of the prediction model from clinical characteristics

To determine whether the predictive power of the prediction model was independent of the clinical variables (age, gender, personal history, HBV infection status, serum tumor markers and CpG methylation level) in patients with HCC, univariate and multivariate logistic regression analyses were conducted, with the clinical characteristics as independent variables and the pathological type as the dependent variable. All reported P values were two-sided. Odds ratios (ORs) and 95% confidence intervals were calculated.
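For a single binary indicator, the exponentiated coefficient of a univariate logistic regression equals the cross-product odds ratio of the corresponding 2x2 table, and the Wald confidence interval can be computed from the four cell counts. The Python sketch below illustrates that calculation only. The counts are hypothetical, and the study's own estimates came from logistic regression models fitted in R, not from this code.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald 95% confidence interval for a 2x2 table.

    a: HCC with the factor        b: HCC without the factor
    c: non-HCC with the factor    d: non-HCC without the factor
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for one binary indicator; not data from the study.
or_value, ci_low, ci_high = odds_ratio_ci(a=40, b=20, c=25, d=48)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 corresponds to a two-sided P value below 0.05 for that indicator under the Wald test.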

2.6 Construction and validation of a prediction nomogram

The CpG-based model for HCC diagnosis was built and validated in R (version 4.1.2). A nomogram was used to predict the diagnosis in high-risk populations. All participants were randomly divided into a training cohort (n=133) and a validation cohort (n=100). The combined model, based on all independent predictive factors selected by multivariable logistic regression analysis, was used to construct a nomogram that estimates the probability of HCC in high-risk populations. Validation, including discrimination and calibration, was then performed. Calibration was evaluated graphically by plotting the probabilities predicted by the nomogram against the observed rates; a curve overlapping the reference line indicates perfect agreement. ROC analysis and decision curve analysis (DCA) were used to compare predictive accuracy. A P value of less than 0.05 was considered statistically significant.
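A nomogram built on a logistic regression model is a graphical form of the logistic equation: each predictor contributes to a linear score on the log-odds scale, and the total is mapped to a probability. The Python sketch below shows that mapping for the five predictors named in the text. The coefficients, the intercept and the variable coding (age in years, the other four as 0/1) are hypothetical placeholders, because the fitted values are not reported in the text.

```python
import math

# Hypothetical log-odds coefficients, for illustration only.
# The study's fitted coefficients are not given in the text.
COEFFICIENTS = {
    "intercept": -3.0,
    "age_years": 0.03,
    "drinking_history": 0.4,   # 1 = yes, 0 = no (assumed coding)
    "hbv_positive": 0.9,       # 1 = infected, 0 = not infected (assumed coding)
    "afp_elevated": 1.2,       # 1 = above clinical reference value (assumed coding)
    "cpg_elevated": 1.5,       # 1 = cg14826425 methylation above 50%
}


def predicted_probability(patient: dict) -> float:
    """Map a patient's predictor values to a probability via the logistic function."""
    linear_score = COEFFICIENTS["intercept"]
    for name, beta in COEFFICIENTS.items():
        if name != "intercept":
            linear_score += beta * patient[name]
    return 1.0 / (1.0 + math.exp(-linear_score))


# Hypothetical patient, not a study participant.
patient = {
    "age_years": 60,
    "drinking_history": 0,
    "hbv_positive": 1,
    "afp_elevated": 1,
    "cpg_elevated": 1,
}
print(f"predicted probability of HCC = {predicted_probability(patient):.3f}")
```

In a printed nomogram the same coefficients are rescaled to points so that the score can be added up by hand; the probability read from the bottom axis is the value this function returns.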

3. Results

3.1 Differential DNA methylation between HCC and non-malignant participants

A differentially methylated CpG site (cg14826425) was identified in samples from 115 HCC patients, 103 patients with cirrhosis and 15 healthy individuals, all with complete clinical information suitable for further analysis. The methylation level at this site was significantly higher in the HCC population than in the liver cirrhosis and healthy populations (Supplement Figure 1 and Supplement Figure 2). A flowchart of the analysis procedure is shown in Figure 1. The baseline clinical and pathological characteristics of the 233 participants enrolled in this study are summarized in Table 1.

Table 1. Clinical and pathological characteristics of the participants.

3.2 Logistic regression analysis and risk stratification indicate good performance

We conducted univariate and multivariate logistic regression analyses to investigate the association of clinical factors with the diagnosis of HCC. Five indicators were related to HCC at P < 0.05: age, HBV infection status, AFP, Child-Pugh classification (CPC) and cg14826425 methylation level. Sex, smoking history and other tumor markers did not correlate with HCC (Figure 2A, 2B). The Child-Pugh classification is one of the most common bedside tools for estimating prognosis in patients with cirrhosis [16], but its use as a risk prediction tool for HCC remains to be re-examined. The system is recognized as limited in its ability to assess patients with good hepatic reserve, which makes it unsuitable for patients with recently diagnosed HCC and good remnant hepatic function [17]. For this reason, CPC was not included as a predictor. On the other hand, alcoholic liver disease is the most prevalent type of chronic liver disease worldwide and can progress from alcoholic fatty liver to alcoholic steatohepatitis and, in some cases, to hepatocellular cancer [18, 19]. Excessive alcohol consumption remains an intractable risk factor for HCC. We therefore incorporated drinking history into the multivariate logistic regression analysis, although it was not statistically significant in the univariate analysis. As shown in the forest plot, age, HBV infection status, AFP and the cg14826425 site were identified as independent predictive factors in the training and testing sets (Figure 2A, 2B). Combining the predictive indicators could further separate HCC patients from patients with cirrhosis and healthy individuals. Multivariate regression analysis indicates that a combination of the five indicators could provide a robust prediction method for HCC diagnosis.

3.3 Building and validating a predictive nomogram

To establish a clinically applicable method for the early diagnosis of HCC in the high-risk population, we developed a nomogram to predict the probability of HCC in the recruited cohort. The nomogram included five predictive factors (age, drinking history, HBV infection status, AFP and the cg14826425 site; Figure 3A). Calibration plots were used to visualize the performance of the nomogram, with the 45° line representing the best prediction. The calibration plots showed that the nomogram performed well (Figure 3B, 3C). The AUC of the nomogram (0.852 in the training cohort and 0.857 in the validation cohort) was significantly higher than that of age, AFP, HBV infection status or the cg14826425 site alone (Figure 4A, 4B). The sensitivity and specificity of the nomogram were 0.812 and 0.781 in the training cohort and 0.978 and 0.593 in the validation cohort, compared with 0.638 and 0.750 (training cohort) and 0.717 and 0.870 (validation cohort) for AFP (Supplement Figure 3). The nomogram was therefore more sensitive than AFP in both cohorts, although its specificity in the validation cohort was lower. The assessment was performed both internally and externally, using the C-index and calibration plots. The C-index of the nomogram was 0.852. Clinical usefulness was assessed using DCA, in which the nomogram showed the greatest net benefit (Figure 3D, 3E). These findings demonstrate that, compared with nomograms built with a single factor, the nomogram built with the combined model is the best for predicting the early diagnosis of HCC in the high-risk population, which might facilitate patient counselling, decision-making and follow-up scheduling. Finally, we included for validation a group of earlier-stage HCC patients whose tumors were smaller than 2 cm (Supplement Table 1). In this cohort, the AUC of the nomogram was also higher than that of AFP (Figure 4C). The sensitivity and specificity were 0.815 and 0.833 for the nomogram and 0.593 and 0.833 for AFP (Supplement Figure 3).
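The metrics reported in this section can be stated compactly. The AUC is the probability that a randomly chosen HCC case receives a higher predicted score than a randomly chosen control, sensitivity and specificity depend on a chosen threshold, and the net benefit used in DCA weighs true positives against false positives at a given threshold probability. The Python sketch below implements these definitions on a small hypothetical data set. It does not reproduce the study's analysis, which was carried out in R.

```python
def auc(scores, labels):
    """AUC as P(case score > control score), counting ties as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in cases for n in controls)
    return wins / (len(cases) * len(controls))


def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp, fp, tn, fn


def sensitivity_specificity(scores, labels, threshold):
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(scores, labels, threshold):
    """Decision-curve net benefit at a threshold probability in (0, 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(scores, labels, threshold)
    n = len(labels)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical predicted probabilities and true labels (1 = HCC, 0 = non-HCC).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(auc(scores, labels))                           # 0.9375
print(sensitivity_specificity(scores, labels, 0.5))  # (0.75, 0.75)
print(net_benefit(scores, labels, 0.5))              # 0.25
```

A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities and comparing the result with the "treat all" and "treat none" strategies.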

4. Discussion

Hepatocellular carcinoma is a systemic disease and needs to be evaluated from an integrated viewpoint. Recognizing HCC at an early stage is urgently needed so that patients can receive more effective therapies and achieve better outcomes [20]. In the early diagnosis of cancer, the simultaneous use of multiple predictive markers is more efficient than the use of single markers [21]. Moreover, because of large individual differences and complicated influencing factors, traditional analytic strategies often fail to identify HCC patients in the high-risk population. In our study, we used logistic regression to establish a multiple-marker prediction model based on plasma samples and clinical characteristics and to study the association between the five-marker model and HCC. The results showed that the model had better diagnostic value for HCC than AFP in both the training and validation cohorts.

DNA methylation is a covalent chemical modification and a stable (replication-coupled) epigenetic marker. It can be detected by methylome profiling in biological fluids and in fresh-frozen and paraffin-embedded tissue samples in the clinical setting [22]. High-throughput detection of genetic alterations has been widely used in the early diagnosis, individualized treatment and prognosis prediction of various cancers [23]. CpG island methylation, a common molecular tumor marker, has been confirmed as a diagnostic and prognostic biomarker for the most common cancers [24]. However, measuring dozens of CpGs is laborious and tedious. We therefore applied targeted bisulfite sequencing, which is based on next-generation sequencing (NGS), to assess the methylation status of the targeted CpGs. In our previous study, a total of 2321 differentially methylated markers were identified with a highly sensitive NGS-based DNA methylation profiling technique by comparing the methylation profiles of HCC, normal and liver cirrhosis (LC) tissue samples. That model performed significantly better than serum AFP testing in distinguishing early-stage HCC from non-HCC controls. It was also our first attempt to explore the significance of DNA methylation in the early diagnosis of liver cancer, and it showed good results. In this study, a CpG site whose methylation differed significantly between HCC and non-HCC controls was combined with clinical indicators to determine whether the probability of HCC in high-risk populations can be estimated quickly. The five-marker model performs well and lays a foundation for the future use of multi-indicator combinations in the early diagnosis of cancer.

A nomogram is a statistical tool that provides an individual patient with the overall probability of a particular outcome. In this study, we constructed a nomogram from a combined model to predict the likelihood of HCC in high-risk patients. The calibration plots indicated that the actual diagnosis corresponded closely with the predicted diagnosis, suggesting good predictive performance. We also showed by AUC that the combined model outperformed models built with a single risk factor. The indicators included in a nomogram are usually those that are statistically significant in multivariate logistic regression analysis. We found that CPC was a statistically significant indicator but not a useful diagnostic factor: liver function can be worse in patients with liver cirrhosis than in HCC patients, whose liver function may remain at the normal level seen in healthy individuals. Apart from drinking status, the other four indicators were statistically significant in both the univariate and multivariate logistic regression analyses.
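The comparison behind a calibration plot can be sketched as follows: predictions are grouped, and within each group the mean predicted probability is set against the observed proportion of HCC cases. The Python example below uses simple equal-width bins, which is an assumption made for illustration. The text does not say which calibration method was used, and R packages commonly apply smoothed or bootstrap-corrected curves instead. The data are hypothetical.

```python
def calibration_table(probabilities, outcomes, n_bins=5):
    """Group predictions into equal-width bins.

    Returns one (mean predicted probability, observed event rate, count)
    tuple per non-empty bin, ordered from lowest to highest bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed_rate = sum(y for _, y in members) / len(members)
            table.append((mean_predicted, observed_rate, len(members)))
    return table


# Hypothetical predictions and outcomes (1 = HCC, 0 = non-HCC).
probabilities = [0.10, 0.15, 0.30, 0.35, 0.70, 0.75, 0.90, 0.95]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]

for mean_predicted, observed_rate, count in calibration_table(probabilities, outcomes, n_bins=2):
    print(f"predicted {mean_predicted:.3f}  observed {observed_rate:.3f}  n={count}")
```

Points that fall near the 45° line, where the mean predicted probability equals the observed rate, indicate good calibration.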

Just over 200 participants were included in this study because complete clinical information was required for each participant. Participants were randomly assigned to the training group (n=133) and the validation group (n=100). Although the sample is small, the clinical data are complete, which supports the reliability of the analysis. We randomly selected one CpG site with obvious differential methylation from among the 2321 CpG sites, and the resulting model showed good results. This is our first attempt to model clinical indicators in combination with DNA methylation, and this result gives us confidence that a better model can be built.

5. Conclusion

We performed a comprehensive analysis of DNA methylation and clinical characteristics obtained from clinical patients and identified an important predictive signature for HCC. This signature should be helpful for the diagnosis of HCC and for the personalized treatment of the population at high risk of HCC.

Declarations

Ethical Approval

The ethics committee of The Second Xiangya Hospital of Central South University waived the need for informed consent. All experimental protocols were approved by The Second Xiangya Hospital of Central South University.

Competing interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Authors' contributions

Luo Biyuan drafted the work and substantively revised it. Chen Zui offered constructive comments on the article, Zhou Ning collected the participants' tissue samples, and Liu Xianling provided important input on the content and ideas of the whole article. All authors agree to be personally accountable for their own contributions.

Funding

This study was supported by the Natural Science Foundation of Hunan Province, China (2020JJ4796), and the Fundamental Research Funds for the Central Universities of Central South University (No. 2021zzts0398).

Availability of data and materials

The datasets analysed during the current study are available from the corresponding author on reasonable request.

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F: Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021, 71(3):209–249.
2. Forner A, Reig M, Bruix J: Hepatocellular carcinoma. Lancet 2018, 391(10127):1301–1314.
3. Llovet JM, Bru C, Bruix J: Prognosis of hepatocellular carcinoma: the BCLC staging classification. Semin Liver Dis 1999, 19(3):329–338.
4. Altekruse SF, McGlynn KA, Reichman ME: Hepatocellular carcinoma incidence, mortality, and survival trends in the United States from 1975 to 2005. J Clin Oncol 2009, 27(9):1485–1491.
5. Takayama T, Makuuchi M, Kojiro M, Lauwers GY, Adams RB, Wilson SR, Jang HJ, Charnsangavej C, Taouli B: Early hepatocellular carcinoma: pathology, imaging, and therapy. Ann Surg Oncol 2008, 15(4):972–978.
6. Heimbach JK, Kulik LM, Finn RS, Sirlin CB, Abecassis MM, Roberts LR, Zhu AX, Murad MH, Marrero JA: AASLD guidelines for the treatment of hepatocellular carcinoma. Hepatology 2018, 67(1):358–380.
7. Atiq O, Tiro J, Yopp AC, Muffler A, Marrero JA, Parikh ND, Murphy C, McCallister K, Singal AG: An assessment of benefits and harms of hepatocellular carcinoma surveillance in patients with cirrhosis. Hepatology 2017, 65(4):1196–1205.
8. Yu NC, Chaudhari V, Raman SS, Lassman C, Tong MJ, Busuttil RW, Lu DS: CT and MRI improve detection of hepatocellular carcinoma, compared with ultrasound alone, in patients with cirrhosis. Clin Gastroenterol Hepatol 2011, 9(2):161–167.
9. Baylin SB, Jones PA: A decade of exploring the cancer epigenome - biological and translational implications. Nat Rev Cancer 2011, 11(10):726–734.
10. Wu H, Zhang Y: Mechanisms and functions of Tet protein-mediated 5-methylcytosine oxidation. Genes Dev 2011, 25(23):2436–2452.
11. Ozen C, Yildiz G, Dagcan AT, Cevik D, Ors A, Keles U, Topel H, Ozturk M: Genetics and epigenetics of liver cancer. N Biotechnol 2013, 30(4):381–384.
12. Bergman Y, Cedar H: DNA methylation dynamics in health and disease. Nat Struct Mol Biol 2013, 20(3):274–281.
13. Park IY, Sohn BH, Yu E, Suh DJ, Chung YH, Lee JH, Surzycki SJ, Lee YI: Aberrant epigenetic modifications in hepatocarcinogenesis induced by hepatitis B virus X protein. Gastroenterology 2007, 132(4):1476–1494.
14. Nagaraju GP, Dariya B, Kasa P, Peela S, El-Rayes BF: Epigenetics in hepatocellular carcinoma. Semin Cancer Biol 2021.
15. Luo B, Ma F, Liu H, Hu J, Rao L, Liu C, Jiang Y, Kuangzeng S, Lin X, Wang C et al: Cell-free DNA methylation markers for differential diagnosis of hepatocellular carcinoma. BMC Med 2022, 20(1):8.
16. Kok B, Abraldes JG: Child-Pugh Classification: Time to Abandon? Semin Liver Dis 2019, 39(1):96–103.
17. Kumada T, Toyoda H, Tada T, Yasuda S, Tanaka J: Changes in Background Liver Function in Patients with Hepatocellular Carcinoma over 30 Years: Comparison of Child-Pugh Classification and Albumin Bilirubin Grade. Liver Cancer 2020, 9(5):518–528.
18. Seitz HK, Bataller R, Cortez-Pinto H, Gao B, Gual A, Lackner C, Mathurin P, Mueller S, Szabo G, Tsukamoto H: Alcoholic liver disease. Nat Rev Dis Primers 2018, 4(1):16.
19. McGlynn KA, Petrick JL, El-Serag HB: Epidemiology of Hepatocellular Carcinoma. Hepatology 2021, 73 Suppl 1:4–13.
20. Chaiteerakij R, Addissie BD, Roberts LR: Update on biomarkers of hepatocellular carcinoma. Clin Gastroenterol Hepatol 2015, 13(2):237–245.
21. Shariat SF, Karakiewicz PI, Ashfaq R, Lerner SP, Palapattu GS, Cote RJ, Sagalowsky AI, Lotan Y: Multiple biomarkers improve prediction of bladder cancer recurrence and mortality in patients undergoing cystectomy. Cancer 2008, 112(2):315–325.
22. How Kit A, Nielsen HM, Tost J: DNA methylation based biomarkers: practical considerations and applications. Biochimie 2012, 94(11):2314–2337.
23. Soto J, Rodriguez-Antolin C, Vallespin E, de Castro Carpeno J, Ibanez de Caceres I: The impact of next-generation sequencing on the DNA methylation-based translational cancer research. Transl Res 2016, 169:1–18 e11.
24. Costa-Pinheiro P, Montezuma D, Henrique R, Jeronimo C: Diagnostic and prognostic epigenetic biomarkers in cancer. Epigenomics 2015, 7(6):1003–1015.

Supplementary Files

SupplementFigureandtable.docx
