Patients and Samples
This study included 37 patients who underwent curative hepatic surgery between November 2014 and August 2021, with specimens collected immediately after resection. Of these, 33 patients were diagnosed with primary HCC, and paired samples were collected from cancerous and adjacent noncancerous liver tissues. The remaining four patients (three with liver metastases from colorectal, gastric, or lung cancer and one with hilar cholangiocarcinoma) contributed noncancerous liver tissue only. Samples were obtained from patients who did not develop HCC before HBV infection. In total, 70 unique specimens were included in this study. Each patient provided informed consent, and the study was approved by the Human Ethics Review Committee of Dokkyo Medical University (approval no. R-30-8J) and conducted in accordance with the ethical standards of the Declaration of Helsinki.
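As a check on the specimen total: each of the 33 HCC patients contributed one tumor sample and one adjacent noncancerous sample, and each of the four remaining patients contributed a single noncancerous sample.

```latex
N_{\text{specimens}} = \underbrace{33 \times 2}_{\text{HCC: tumor + noncancerous}} + \underbrace{4 \times 1}_{\text{noncancerous only}} = 66 + 4 = 70
```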
DNA Preparation, Library Construction, Quality Control, and Sequencing
DNA was extracted using a QIAamp DNA Mini Kit (Qiagen, Hilden, Germany). The genomic DNA was fragmented to a target size of 180–280 base pairs, and the fragments were end-repaired, A-tailed, and ligated to Illumina adapters. The ligated fragments were then amplified by polymerase chain reaction (PCR), size-selected, and purified. To isolate the exons of interest, the libraries were hybridized to biotinylated probes and captured with streptavidin-coated magnetic beads. After non-hybridized fragments were washed away and the probes were digested, the captured libraries were enriched by additional PCR amplification. Library quality was assessed using a Qubit fluorometer, real-time PCR for quantification, and a bioanalyzer for size distribution. Whole exome sequencing (WES) was performed on the Illumina NovaSeq 6000 platform using a paired-end 150 bp sequencing strategy.
Data Quality Control
Stringent quality control measures were applied to ensure data integrity. A read pair was discarded if either read (1) contained adapter contamination, defined as more than 10 nucleotides aligned to the adapter sequence with a mismatch rate of 10% or less; (2) contained more than 10% undetermined bases (N); or (3) had low-quality bases (Phred quality score < 5) making up more than half of its bases.
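The three read-pair filters can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the QC software actually used: quality strings are assumed to be Phred+33 encoded, adapter contamination is detected by a simple ungapped comparison of each read suffix against the 5' end of the adapter, and all function names are hypothetical.

```python
# Minimal sketch of the read-pair QC filters. Thresholds follow the text;
# the ungapped adapter check and Phred+33 encoding are assumptions.

MIN_ADAPTER_OVERLAP = 10       # adapter hit requires MORE than 10 aligned nt
MAX_ADAPTER_MISMATCH = 0.10    # ...with a mismatch rate of 10% or less
MAX_N_FRACTION = 0.10          # discard if N bases exceed 10% of a read
LOW_QUAL_PHRED = 5             # bases with Phred < 5 count as low quality
MAX_LOW_QUAL_FRACTION = 0.50   # discard if low-quality bases exceed 50%


def has_adapter(read: str, adapter: str) -> bool:
    """Ungapped check: does any read suffix match the adapter's 5' end?"""
    for start in range(len(read)):
        overlap = min(len(read) - start, len(adapter))
        if overlap <= MIN_ADAPTER_OVERLAP:
            break  # overlaps only shrink from here on
        segment = read[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(segment, adapter))
        if mismatches <= MAX_ADAPTER_MISMATCH * overlap:
            return True
    return False


def n_fraction(read: str) -> float:
    """Fraction of undetermined (N) bases in a read."""
    return read.upper().count("N") / len(read)


def low_quality_fraction(qual: str, offset: int = 33) -> float:
    """Fraction of bases whose Phred score (ASCII code - offset) is below 5."""
    return sum(ord(c) - offset < LOW_QUAL_PHRED for c in qual) / len(qual)


def keep_pair(r1: str, q1: str, r2: str, q2: str,
              adapter1: str, adapter2: str) -> bool:
    """Keep a read pair only if BOTH reads pass all three filters."""
    for read, qual, adapter in ((r1, q1, adapter1), (r2, q2, adapter2)):
        if has_adapter(read, adapter):
            return False
        if n_fraction(read) > MAX_N_FRACTION:
            return False
        if low_quality_fraction(qual) > MAX_LOW_QUAL_FRACTION:
            return False
    return True
```

For example, with the common Illumina adapter prefix `AGATCGGAAGAGC` passed as both adapters, a read ending in that sequence causes its whole pair to be dropped. Real trimmers also allow gapped alignments and use separate R1/R2 adapters, which this sketch only approximates through the two adapter arguments.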
Alignment, Variant Calling, and Target Gene Sorting
After quality control, the clean reads were aligned to the human reference genome (hg38) using the Burrows-Wheeler Aligner (BWA). The resulting BAM files were sorted using SAMtools, and duplicate reads were marked using Picard. Initial calls of single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels) were made using the Genome Analysis Toolkit (GATK) with a minimum call confidence threshold (stand-call-conf) of 30. Calls were then hard-filtered and removed if they met any of the following criteria: QualByDepth < 2.0, FisherStrand > 60.0, RMSMappingQuality < 40.0, HaplotypeScore > 13.0, MappingQualityRankSumTest < -12.5, or ReadPosRankSumTest < -8.0. The retained variants were annotated using ANNOVAR, which provided information on protein-coding changes, affected genomic regions, allele frequencies, and predicted deleterious effects. The 1000 Genomes Project served as the reference for identifying genetic polymorphisms. After excluding known SNPs with a population allele frequency > 0.01 in the East Asian population, which includes the Japanese population, we focused on 25 genes previously reported to be associated with HBV and HCC [7–11]. Copy number variants (CNVs) and regions of loss of heterozygosity were detected using Control-FREEC with default filtering parameters. As with the SNPs and InDels, CNVs were evaluated in the same set of 25 genes.
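The calling and filtering criteria above can be expressed as a per-variant predicate. The Python sketch below is illustrative only: it assumes each variant has already been parsed into a dictionary holding its QUAL value, the GATK INFO annotations under their standard abbreviations (QD, FS, MQ, HaplotypeScore, MQRankSum, ReadPosRankSum), an ANNOVAR gene symbol, and a 1000 Genomes East Asian allele frequency. The dictionary layout, the `EAS_AF` key, and the choice to let missing annotations pass are assumptions, not the study's pipeline.

```python
# Minimal sketch of the post-calling variant filters. Field names ("QUAL",
# "info", "gene", "EAS_AF") are hypothetical; missing annotations pass.
from typing import Callable, Dict, Iterable, Optional

MIN_CALL_CONF = 30.0   # stand-call-conf: minimum phred-scaled call confidence
MAX_EAS_AF = 0.01      # exclude known SNPs above this East Asian frequency

# GATK-style hard filters: a variant FAILS if any expression is true.
HARD_FILTERS: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,               # QualByDepth
    "FS": lambda v: v > 60.0,              # FisherStrand
    "MQ": lambda v: v < 40.0,              # RMSMappingQuality
    "HaplotypeScore": lambda v: v > 13.0,
    "MQRankSum": lambda v: v < -12.5,      # MappingQualityRankSumTest
    "ReadPosRankSum": lambda v: v < -8.0,  # ReadPosRankSumTest
}


def passes_hard_filters(info: Dict[str, Optional[float]]) -> bool:
    """True unless some present annotation trips its filter expression."""
    return not any(
        info.get(key) is not None and fails(info[key])
        for key, fails in HARD_FILTERS.items()
    )


def is_candidate(variant: dict, genes_of_interest: Iterable[str]) -> bool:
    """Apply call confidence, hard filters, population AF, and gene restriction."""
    if variant["QUAL"] < MIN_CALL_CONF:
        return False
    if not passes_hard_filters(variant["info"]):
        return False
    af = variant.get("EAS_AF")
    if af is not None and af > MAX_EAS_AF:
        return False  # common polymorphism in East Asians
    return variant["gene"] in set(genes_of_interest)
```

Passing the gene set as an argument keeps the filter logic separate from the choice of the 25 target genes.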
The 25 genes of interest were TERT, CTNNB1, AXIN1, APC, GLUL, LGR5, TP53, CDKN2A, RB1, CCND1, FGF19, CCNE1, NFE2L2, ARID1A, ARID2, MLL, MLL2, MLL3, MLL4, RPS6KA3, PTEN, PIK3CA, TSC1, TSC2, and IRF2. These genes have been reported to contribute to HCC carcinogenesis through various pathways [7–11]. TERT is involved in telomere maintenance, and TERT mutations can allow cells to escape cellular senescence, favoring cancer cell survival. CTNNB1, APC, and AXIN1 belong to the Wnt/β-catenin pathway; mutations in these genes cause accumulation and nuclear translocation of β-catenin, which in turn promotes the expression of target genes such as GLUL and LGR5. CCND1, CDKN2A, RB1, and CCNE1 regulate cell cycle progression through the G1/S checkpoint. TP53 is a well-known tumor suppressor gene associated with HCC. IRF2, a tumor suppressor gene that controls p53 protein activation, has also been implicated in HBV-HCC. ARID1A and ARID2 encode subunits of the SWI/SNF chromatin-remodeling complex. MLL (KMT2A), MLL2 (KMT2B), MLL3 (KMT2C), and MLL4 (KMT2D) methylate lysine 4 of histone H3 (H3K4), an epigenetic modification also reported to be involved in HCC carcinogenesis. FGF19, a member of the FGF family, has been reported to promote HCC cell proliferation via the MAPK pathway. RPS6KA3, PTEN, PIK3CA, TSC1, and TSC2 have been reported to contribute to HCC carcinogenesis via the PI3K-Akt-mTOR pathway.
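To summarize which pathways are hit in a given specimen, the gene groupings described above can be encoded as a lookup table. This Python sketch is a hypothetical illustration: the pathway labels paraphrase the text, GLUL and LGR5 are grouped with the Wnt/β-catenin pathway as its target genes, TP53 and IRF2 share a p53 label, and NFE2L2, which the text does not assign to a pathway, is placed in an "other" group.

```python
# Hypothetical gene-to-pathway table paraphrasing the text; groupings are
# illustrative, and NFE2L2 is left as "other" because no pathway is given.
from collections import defaultdict
from typing import Dict, Iterable, List

PATHWAY_GENES: Dict[str, List[str]] = {
    "telomere maintenance": ["TERT"],
    "Wnt/beta-catenin": ["CTNNB1", "APC", "AXIN1", "GLUL", "LGR5"],
    "cell cycle (G1/S)": ["CCND1", "CDKN2A", "RB1", "CCNE1"],
    "p53": ["TP53", "IRF2"],
    "SWI/SNF chromatin remodeling": ["ARID1A", "ARID2"],
    "H3K4 methylation": ["MLL", "MLL2", "MLL3", "MLL4"],
    "MAPK (FGF19)": ["FGF19"],
    "PI3K-Akt-mTOR": ["RPS6KA3", "PTEN", "PIK3CA", "TSC1", "TSC2"],
    "other": ["NFE2L2"],
}

# Invert to gene -> pathway; every gene appears exactly once.
GENE_TO_PATHWAY = {g: p for p, genes in PATHWAY_GENES.items() for g in genes}


def mutated_pathways(mutated_genes: Iterable[str]) -> Dict[str, List[str]]:
    """Group a specimen's mutated genes by pathway, ignoring off-panel genes."""
    hits: Dict[str, List[str]] = defaultdict(list)
    for gene in mutated_genes:
        pathway = GENE_TO_PATHWAY.get(gene)
        if pathway is not None:
            hits[pathway].append(gene)
    return {p: sorted(genes) for p, genes in hits.items()}
```

For instance, a specimen with TP53, CTNNB1, and AXIN1 mutations maps to the p53 and Wnt/beta-catenin groups, while any gene outside the 25-gene panel is silently ignored.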
Clinical Information
All patient information was compiled from the medical records of Dokkyo Medical University Hospital. Clinical terms were defined as follows. Prior HBV infection was defined as negativity for hepatitis B surface antigen (HBsAg) and positivity for hepatitis B core antibody (HBcAb), and active HBV infection was defined as positivity for serum HBsAg. HBcAb levels were quantified using a Cobas electrochemiluminescence immunoassay (Roche, Mannheim, Germany), and HBV DNA levels were measured using a TaqMan PCR assay (Roche, Basel, Switzerland) with a detection threshold of 2.1 log copies/mL. Patients were classified as "alcoholic" if they consumed 60 g or more of ethanol daily over a prolonged period, whereas "MASH" (metabolic dysfunction-associated steatohepatitis) was defined by pathological findings in the resected noncancerous liver tissue of patients with a daily ethanol intake below 30 g.
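The clinical definitions above amount to simple decision rules. The sketch below encodes them in Python under stated assumptions: the function names are hypothetical, serology is reduced to positive/negative flags, "prolonged" drinking is assumed to be established separately, values at exactly 2.1 log copies/mL are assumed to count as detectable, and intakes of 30–59 g/day (covered by neither definition) return `None`.

```python
# Hypothetical encoding of the clinical definitions; labels for cases the
# text does not define (e.g. HBsAg-/HBcAb-) are placeholders.
import math
from typing import Optional

HBV_DNA_LOG_THRESHOLD = 2.1  # detection threshold, log10 copies/mL


def hbv_status(hbsag_positive: bool, hbcab_positive: bool) -> str:
    if hbsag_positive:
        return "active HBV infection"
    if hbcab_positive:
        return "prior HBV infection"      # HBsAg-negative, HBcAb-positive
    return "no serological evidence"      # hypothetical label; not defined in text


def hbv_dna_detectable(copies_per_ml: float) -> bool:
    """Assumes values at or above 2.1 log10 copies/mL are detectable."""
    return copies_per_ml > 0 and math.log10(copies_per_ml) >= HBV_DNA_LOG_THRESHOLD


def etiology_category(daily_ethanol_g: float, mash_histology: bool) -> Optional[str]:
    if daily_ethanol_g >= 60:
        return "alcoholic"                # assumes the intake was sustained
    if daily_ethanol_g < 30 and mash_histology:
        return "MASH"
    return None                           # outside both definitions
```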
Patient Follow-Up
Postoperative surveillance consisted of dynamic computed tomography (CT) and/or magnetic resonance imaging (MRI) every 3–6 months, and serum biomarkers, including alanine aminotransferase, alpha-fetoprotein (AFP), and des-gamma-carboxyprothrombin (DCP), were monitored every 1–6 months, in accordance with standard surveillance protocols in Japan. When screening examinations suggested HCC recurrence, confirmatory imaging, such as dynamic CT and MRI, was performed. Recurrence was confirmed by the histology of resected specimens and by characteristic radiological features. The follow-up period was measured from the date of hepatectomy to the date of death, recurrence, or the last available medical record; the mean follow-up was 36 months (range, 2–105 months).
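The follow-up rule above (from hepatectomy to death, recurrence, or the last available record, whichever applies first) can be sketched as a small Python function. The function name is hypothetical, and converting days to months with an average month length of 365.25/12 days is an assumption, since the text does not state how months were computed.

```python
# Hypothetical helper for the follow-up definition; the month conversion
# factor is an assumption.
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 365.25 / 12  # assumed average month length


def follow_up(hepatectomy: date, last_record: date,
              recurrence: Optional[date] = None,
              death: Optional[date] = None) -> Tuple[float, str]:
    """Return (follow-up in months, end reason) for one patient.

    Follow-up ends at the earlier of recurrence or death if either occurred;
    otherwise the patient is censored at the last available medical record.
    """
    events = [(d, label) for d, label in ((recurrence, "recurrence"),
                                          (death, "death")) if d is not None]
    if events:
        end, reason = min(events, key=lambda item: item[0])
    else:
        end, reason = last_record, "censored"
    if end < hepatectomy:
        raise ValueError("end of follow-up precedes hepatectomy")
    return (end - hepatectomy).days / DAYS_PER_MONTH, reason
```

Returning the end reason rather than a fixed event flag leaves open whether death is treated as an event or as censoring in a given survival analysis; a cohort mean can then be taken with `statistics.mean` over the first element of each result.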
Statistical Analyses
We used R and SPSS for statistical analyses. The chi-squared test and Fisher's exact test were used to assess associations between the presence or absence of gene mutations and liver status. Liver status was classified into three categories: HBV-HCC, HCC from prior HBV infection, and no HCC with prior HBV infection (Fig. 1). In patients with HCC and prior HBV infection, associations between mutations and subsequent recurrence were also examined. Survival was analyzed using the Kaplan–Meier method with log-rank tests, and univariate survival analyses were performed using a Cox proportional hazards model.
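To make the tests above concrete, the following dependency-free Python sketch implements a two-sided Fisher's exact test for a 2×2 table (for example, mutation present/absent versus two liver-status groups), the Kaplan–Meier estimator, and a two-group log-rank test. It illustrates the standard methods and is not the R or SPSS code used in the study; the Cox proportional hazards model is not reproduced, and a three-category comparison would need an r×c extension or the chi-squared test.

```python
# Illustrative implementations of standard tests; not the study's code.
from math import comb, erfc, sqrt
from typing import List, Sequence, Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value for [[a, b], [c, d]]: sum the hypergeometric
    probabilities of all same-margin tables no more likely than observed."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1)]
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # pool ties at time t
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censorings leave the risk set
    return curve


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[str]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square with 1 df, p-value)."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    g1 = labels[0]
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(ti >= t for ti in times)
        n1 = sum(ti >= t and g == g1 for ti, g in zip(times, groups))
        d = sum(ti == t and bool(e) for ti, e in zip(times, events))
        d1 = sum(ti == t and bool(e) and g == g1
                 for ti, e, g in zip(times, events, groups))
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
```

For the classic table [[3, 1], [1, 3]], `fisher_exact_2x2` sums the probabilities 1/70, 16/70, 16/70, and 1/70, giving p = 34/70.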
Confirmation of Variants by Sanger Sequencing
Key variants were confirmed by Sanger sequencing of frozen specimens like those used for WES. The target regions were amplified by PCR with specific primers using the Expand High Fidelity PCR System dNTPack (Roche, Mannheim, Germany), and the PCR products were purified using the FastGene Gel/PCR Extraction Kit (FastGene, Tokyo, Japan). Sequencing was performed using a BigDye Terminator v1.1 Sequencing Kit (Thermo Fisher Scientific, Waltham, MA, USA) on a Hitachi 3500 sequencer. Blood samples were also sequenced to determine whether the variants were somatic or germline.
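The somatic/germline decision described above reduces to a two-input rule. This hypothetical Python helper assumes each variant has already been scored as confirmed or not by Sanger sequencing in liver tissue and as present or absent in blood; it does not model allele fractions or mosaicism.

```python
# Hypothetical helper: label WES variants after Sanger confirmation.
from typing import Dict, List


def classify_origin(confirmed_in_tissue: bool, present_in_blood: bool) -> str:
    if not confirmed_in_tissue:
        return "not confirmed"  # WES call not reproduced by Sanger sequencing
    return "germline" if present_in_blood else "somatic"


def summarize(calls: List[Dict[str, bool]]) -> Dict[str, int]:
    """Count labels over calls given as {"tissue": bool, "blood": bool}."""
    counts: Dict[str, int] = {}
    for call in calls:
        label = classify_origin(call["tissue"], call["blood"])
        counts[label] = counts.get(label, 0) + 1
    return counts
```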
Functional Estimation of Variants
We performed in silico functional predictions of candidate variants using SIFT, PolyPhen-2, and CADD; SIFT and PolyPhen-2 scores were retrieved together with the CADD annotations. The thresholds for each tool were as follows. SIFT scores close to 0 indicate a greater predicted impact, and variants with scores < 0.05 were classified as "deleterious." PolyPhen-2 scores close to 1 indicate a more substantial effect, and variants with scores > 0.908 were classified as "probably damaging." CADD reports PHRED-like scaled scores; a score of 20 or higher places a variant among the top 1% most deleterious possible substitutions, suggesting a high impact on protein function.
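The thresholds above can be applied mechanically once scores are in hand. In this Python sketch (function names hypothetical, scores assumed to be already retrieved, missing scores treated as not meeting a threshold), the PHRED-like scaling of CADD is made explicit: a scaled score S corresponds to the top 10^(-S/10) fraction of possible substitutions, so S = 20 corresponds to the top 1%.

```python
# Hypothetical helpers applying the in silico thresholds from the text.
from typing import Dict, Optional

SIFT_DELETERIOUS = 0.05          # SIFT < 0.05 -> "deleterious"
POLYPHEN_PROB_DAMAGING = 0.908   # PolyPhen-2 > 0.908 -> "probably damaging"
CADD_HIGH_IMPACT = 20.0          # CADD PHRED >= 20 -> top 1% most deleterious


def cadd_phred_to_top_fraction(phred: float) -> float:
    """Fraction of possible substitutions scoring at least this high."""
    return 10 ** (-phred / 10)


def in_silico_calls(sift: Optional[float], polyphen: Optional[float],
                    cadd: Optional[float]) -> Dict[str, bool]:
    """Per-tool damaging calls; a missing score never counts as damaging."""
    return {
        "SIFT deleterious": sift is not None and sift < SIFT_DELETERIOUS,
        "PolyPhen-2 probably damaging":
            polyphen is not None and polyphen > POLYPHEN_PROB_DAMAGING,
        "CADD high impact": cadd is not None and cadd >= CADD_HIGH_IMPACT,
    }
```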