Data preparation
Datasets used in this study were retrieved from Gene Expression Omnibus (GEO) database and The Cancer Genome Atlas (TCGA) (Table 1). For datasets from GEO, raw data were download and then normalized by the robust multi-array (RMA) method. For TCGA HCC (http://www.cbioportal.org/study?id=lihc_tcga), level 3 data, including RNA-Seq, copy number variation, somatic mutation, and the corresponding clinical features were obtained using TCGAbiolinks [19]. Differentially expressed genes (DEGs) between tumor and normal tissues were identified using the two datasets of GSE39791 [20] and GSE57957 [21], as both of them were generated by affymatrix platform and consisted of matched tumor and normal samples. The expression profile of TCGA HCC was used to verify the filtered DEGs and to identify co-expression genes. GSE130012 [22] and GSE146806 [23] are gene expression profiles of colorectal cancer and PAAD cell lines with METTL3 knocked down. We selected these two datasets to validate the relationship between the identified hub gene and RNA modification-related genes. GSE14520 [24] is consisted of 488 tumor and paired non-tumor tissue of HCC samples, and we only selected 221 samples that were classified as high-risk and low-risk metastasis. GSE56545 contains 12 primary and 9 recurrent tumors.
Table 1 Datasets used in this study.
Data set
|
Normal
|
Tumor
|
Platform
|
Detail
|
GSE39791
|
72
|
72
|
GPL10558
|
Hepatocellular carcinoma
|
GSE57957
|
39
|
39
|
GPL10558
|
Hepatocellular carcinoma
|
TCGA
|
50
|
365
|
Illumina HiSeq
|
Hepatocellular carcinoma
|
GSE130012
|
3
|
3
|
Illumina HiSeq
|
colorectal cancer derived cell line
|
GSE146806
|
3
|
3
|
Illumina HiSeq
|
pancreas cancer derived cell line
|
GSE14520
|
|
221
|
GPL571
|
Hepatocellular carcinoma
|
GSE56545
|
|
21
|
GPL15433
|
Hepatocellular carcinoma
|
Collection of RNA modification-related genes
m6A modification is the most prominent type of RNA modification and many involved genes have been identified. However, other types of chemically modified DNA including m5C, m1a, etc., which also belong to RNA modifications are rarely investigated. Therefore, we first collected RNA modification-related genes (RMRGs) by searching public literature. Seventy-six genes that were reported involving in the modifications of mRNA, tRNA, miRNA, and rRNA by experimental data, were finally retrained (Table 2) from four references [5, 18, 25, 26].
Differential expression analysis
Low expression genes were defined as the expression values of more than 90% samples in each dataset equaled zero, and were then excluded before performing differential expression analysis. The normalized gene expression profiles of GSE39791 and GSE57957 were used to identify differentially expressed RMRGs (DEGs). Paired student t-test was used and the corresponding P-value was calculated to evaluate the differential expressions of RMRGs between tumor samples and matched normal samples. Adjusted P values were then calculated by Benjamini and Hochberg’s approach to controlling the false discovery rate (FDR). RMRGs with FDR less than 0.05 were recognized as DEGs.
Co-expression analysis
RNA-seq data of TCGA HCC was selected to preform co-expression analysis. The expression profiles of identified DEGs and other non-RMRGs were extracted and their correlation was assessed by Pearson correlation coefficient (Pearson's r). T-test P-value and the corresponding FDR was then calculated to evaluate the significance level of correlation. Non-RMRG was recognized as co-expression gene if its correlation coefficient was >= 0.6 and FDR was <0.05.
Gene Ontology (GO) and KEGG pathway analyses
Functional enrichment analysis including GO and KEGG pathway was performed in the web-based tool of GeneCodis 3.0 [27]. A new algorithm that summarizes significantly enriched terms was implemented in GeneCodis 3.0 to conduct enrichment analysis. We used all the human genes as background and adjusted P-value of the hypergeometric test to determine significantly enriched biological process terms and KEGG pathways.
PPI network and Hub Gene identification
The protein-protein interaction (PPI) network of co-expression genes were constructed using the Search Tool for the Retrieval of Interacting Genes/Proteins database (STRINGdb) [28]. We set 0.8 as the minimum cut-off of an interaction score. PPI network was then imported to Cytoscape v3.7.2 [29]. Network topological parameters were analyzed using the plugin of NetworkAnalyzer [30]. The node that had a maximum degree and closeness centrality was recognized as the hub gene of the PPI network.
Cell lines culture and transfection
Three HCC cell lines, SNU398, SNU449, and SNU475, were purchased from Shanghai Biochip Company Ltd. (Shanghai, China). Cells were propagated in Dulbecco's Modified Eagle's medium, then 100ul of the cell suspension was transferred to a six-well plate and incubated at 37°C with 5% CO2 for one day. Next, the small interfering-METTL3 (si-METTL3) and si-NC (Empty vector) were added to 200ul of Opti-MEM (serum- and antibiotics-reduced) mixed with 3 ul lipofectamine 2000 to a final concentration of 20 nM. Then the above mixture was transferred to the six-well plate containing cell suspension, followed by adding 700 ul of pre-warmed DMEM (no serum and double antibodies). After incubation in a 37°C cell incubator for 4-6h, the medium was replaced by the complete medium with 10% fetal bovine serum and 1% double antibodies. Cells were collected for subsequent experiments after 48h.
Quantitative PCR
Total RNA was extracted from the transfected cells after 48h using the TRIzol™ Reagent (Thermo Fisher Scientific) under the instructions. The HiScript Q Select RT SuperMix for qPCR kit (vazyme, Nanjing, China) and SYBR Green Master Mix kit (vazyme, Nanjing, China) were utilized for cDNA synthesis and quantitative PCR, respectively. Real-time qPCR was performed in the BioRad CFX96 PCR system. The three genes primers used in the study were as follows, GGACTCATGACCACAGTCCA (GAPDH forward), TCAGCTCAGGGATGACCTTG (GAPDH reverse), AACCAGGGTCTGGATTGTGA (METTL3 forward), TCCAGTTGGGTTGCACATTG (METTL3 reverse), ACTTCGTGGTGGTGCTAAGA (RPS27A forward), and CCCAGCACCACATTCATCAG (RPS27A reverse). GAPDH was selected as an internal reference, and 2-ΔΔCt was calculated to quantify the relative expressions of METTL3 and RPS27A.
Statistical analysis
Statistical analysis and figures plotting were conducted in R programming software (v3.6.1). Student t-test and Kruskal-Wallis rank test was used for the comparisons of paired groups and multiple groups respectively. For survival analysis, TCGA HCC patients were first divided into high group and low group according to the median expression value of hub gene. Then cox proportional hazards model was built to evaluate the association of gene expression with patient overall survival (OS) and disease-free survival (DFS) [31]. Difference between the survival curves of high group and low group was tested using the ‘survdiff’ method in the package ‘survival’ [32]. In this study, we assigned ‘****’, ‘***’, ‘**’, and ‘*’ indicating P-value was < 1.0e-5, 0.001, 0.01, and 0.05 respectively.