Sample collection and preparation.
Control and mCRC samples from four patients were used to prepare for circRNA-Seq. On 1.5% agarose gels, we monitored RNA degradation and contamination. Using the NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, Wilmington, DE) and the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 System (Agilent Technologies, CA, USA), we measured RNA concentration as well as purity and assessed RNA integrity. We used the Ribo-Zero rRNA Removal Kit (Epicentre, Madison, WI, USA) to removal rRNA from total 1.5 μg RNA each sample. According to the manufacturer, NEBNextR UltraTM Directional RNA Library Prep Kit for IlluminaR (NEB, USA) was used to prepare libraries.
Under elevated temperature, we used divalent cations to carry out fragmentation in NEBNext First Strand Synthesis Reaction Buffer(5X). Random hexamer primer and Reverse Transcriptase were used to synthesize first strand cDNA. Subsequently, DNA Polymerase I and RNase H was used to synthesize second-strand cDNA. Using exonuclease/polymerase activities, we converted remaining overhangs into blunt ends. 3’ ends of DNA fragments were adenylate and then NEBNext Adaptor were ligated which have hairpin loop structure. All that were prepare for hybridization. For selecting insert fragments, we purified the library fragments with AMPure XP Beads (Beckman Coulter, Beverly, USA). The length of insert fragments were about 150~200 bp preferentially. Before PCR, 3 μl USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37°C for 15 min. Followed by which, we conducted PCR with Universal PCR primers, Phusion High-Fidelity DNA polymerase and Index(X) Primer. Finally, we purified PCR products (AMPure XP system) and assessed library quality on the qPCR and the Agilent Bioanalyzer 2100. According to the manufacturer, we used TruSeq PE Cluster Kitv3-cBot-HS(Illumia) and performed the index-coded samples clustering on acBot Cluster Generation System. Cluster was generated. Followed by which, we can sequence the library preparations on an Illumina platform and generate the reads.
Data analysis of CircRNAs .
We processed the raw data of fastq format via in-house perl scripts and removed reads containing ploy-N, reads containing adapter and low quality reads to obtain clean data. In the meantime, we calculated GC-content, Q20 and Q30 of clean data. According to clean data with high quality, we conducted the downstream analyses.
Find_circ software and CIRI (CircRNA Identifier) tools were used to identify circRNA. SAM files were scan twice by CIRI so that sufficient information was collected for circRNAs identification and characterizing. Briefly, On the first SAM alignment scanning, junction reads were detected by CIRI which were with PCC signals reflecting a circRNA candidate. We used paired-end mapping (PEM) and GT-AG splicing signals to implement preliminary filtering. Each circRNA candidate was recorded and junction reads were clustered. After that, the SAM alignment was scan by CIRI again and additional junction reads were detected. Meanwhile, further filtering was performed to eliminate false positive candidates. Finally, we can output identified circRNAs with annotation information.
The find_circ software will first be able to take 20 bp from both ends of the reads on the genomic alignment as anchor points, and then the anchor points as independent reads mapped to the reference genome and find the unique matching site. If the alignment position of the two anchor points is reversed in the linear direction, the reading of the anchor point is extended until the joint position of the circular RNA is found. When the signal is spliced for GT/AG, it is judged to be a circular RNA. The intersect of the results of the two methods will be the final prediction result.
Quantification of circRNAs expression levels
Using CIRI tools and find_circ software, we determined the expression of circRNA by the number of junction reads identify. Using the DESeq R package, we performed the differential expression analysis of two groups. Statistical routines, provided by DESeq, can determine differential expression in digital gene expression data. Statistical routines used the model on account of the negative binomial distribution. To control the false discovery rate, we used the Benjamini and Hochberg’s approach to adjust the resulting P-values. Differentially expressed genes were considered as one with an adjusted P-value <0.01 and log2(Fold change) >1 or <-1. We annotated gene function based on the these databases: non-redundant protein sequence database (NR); Clusters of Protein homology database (KOG); the Swiss-Prot non-redundant protein sequence database; Gene Ontology database (GO); Kyoto Encyclopedia of Genes and Genomes database (KEGG); Clusters of Orthologous Groups of proteins database (COG);the Pfam database.
CircRNAs target prediction
Potential miRNAs that may bind to the differentially expressed circRNA were predicted by searching the CircBank database (http://www.circbank.cn/index.html) and the CircInteractome database (https://circinteractome.nia.nih.gov/index.html). The candidate miRNAs obtained from the two databases were intersected.
MiRNAs target prediction
The target genes of miRNAs were predicted by retrieving the miRDB database (http://www.mirdb.org/) and the TargetScan database (http://www.targetscan.org). From each database, the top 15 target genes with the highest scores were selected. After selection, the candidate genes were intersected.
Whole-exome sequencing (WES)
Whole-exome sequencing for all patients were performed on Illumina Novaseq 6000 using 2 × 150 bp pair-end sequencing method, according to the manufacturer’s instructions.Tumor sample was obtained via core biopsy during initial diagnostic procedure. Genomic DNA was extracted from paraffin embedded slides by using an E.Z.N.A tissue DNA Kit (Omega Biotek, Doraville, USA) according to the manufacturer’s protocol.Cell-free DNA (cfDNA) was extracted as described. In brief, 10 ml of whole blood was collected and cfDNA was extracted from plasma samples using the QIAamp Circulating Nucleic Acid kit (Qiagen) according to the manufacturer’s instructions.Library preparation was performed as described. Fragments of size 200–400bp were selected and followed by hybridization with capture probes baits, hybrid selection with magnetic beads and PCR amplification. Indexed samples were sequenced on Hiseq Xten VS NovaSeq 6000 (Illumina, Inc., USA) with 2 × 150 bp pair-end reads. The sequencing data in the FASTQ format were mapped to the human genome (hg19) using BWA aligner 0.7.10. Local alignment optimization, variant calling and annotation were performed using GATK 3.2, MuTect 2, and VarScan, respectively.
Quantitative reverse-transcriptase real-time polymerase chain reaction (qRT-PCR)
Total RNAs were extracted from different groups of cells using TRIzol Reagent Kit (Invitrogen, USA), and then were reverse-transcribed into cDNA. We fastened the products of qRT-PCR to agarose gel electrophoresis (Fermentas, USA), and normalized the expressions of circRNAs, miRNAs and mRNAs by glyceraldehyde 3-phosphate dehydrogenase (β-actin). Using the 2−ΔΔCt method, all target RNA transcripts expressions were calculated. All experiments were repeated for three times. The mean value was used as a result of the experiment. The primers were designed for qRT-PCR analysis as follow: The sequences of primers were forward 5′-GCTCCTTCTGGAACTTTGAC-3′ and reverse: 5′-TTGCTCACCCAGTAGGTCTT-3′ for circFARSA; Forward 5′- tgccctagca gcgggaacag ttctgcagtg -3′ and reverse 5′-agcgatcggt gctctggggt attgtttccg -3′ for hsa-miR-503-5p; Forward: 5′- AGATCTATGGAGCTGCTGTGCCACGA-3′ and reverse 5′- AGATCTTCACAGGTCGATATCCCGCAC-3′ for CCND2.
Patient and Public Involvement
We selected 3 cases of typical metastatic colorectal cancer patients as the research object, and another case of colorectal cancer patients without liver metastasis for research. We had obtained the consent of patients before operation and took the tissue for sequencing after operation. Patients did not participate in the process of research and paper design, only provided cancer tissue, and did not carry out any intervention on patients. In addition, we will fully inform the patients of the sequencing results through postoperative follow-up, carry out health education, and provide patients with appropriate treatment and prevention methods.
Statistical analysis
We performed all statistical analyses using SPSS version 17.0 statistical software (SPSS, Chicago, USA). The measurement data were presented as the mean ± standard error of mean (S.E.M), and statistical significance was determined by t-tests. The frequency of gene overexpression between two paired patients was compared by using the Pearson χ2 test. The relationships between gene levels and variables were analyzed by conducting the Spearman’s, Pearson’s, and linear regression analyses. Statistical significance was calculated using Origin 8.0 software programs (OriginLab, USA) and was considered at p-value < 0.05. GraphPad Prism 8 software (GraphPad Software, Inc., San Diego, CA) was performed for all analyses.