- LADC data sets and preprocessing
The LADC data sets, including RNA expression quantification data and the relevant clinical data, were downloaded from TCGA database (https://portal.gdc.cancer.gov/). The expression quantification data of mRNAs and LncRNAs were obtained from HTSeq-counts files on Illumina platform. A total of 533 LADC samples and 59 adjacent normal samples were collected. The IDs of mRNAs and LncRNAs were converted using GENCODE Human Release 31 (GRCh38.p12; https://www.gencodegenes.org/human/).
- Differential expression analysis
DELncRNAs and DEmRNAs between LADC and adjacent normal samples were analyzed using the R package DESeq2 (version 1.22.2; http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2 .html) [30]. The specific steps are given as follows.
(1) The samples were divided into two groups: LADC and adjacent normal samples.
(2) Genes whose sum of expression is less than 20 were removed since too small gene expression can lead to large calculation errors.
(3) The DESeq function in the R package DESeq2 was used to analyze the differential expression of the remaining genes and obtain the result dataset.
(4) The LncRNAs in the result data set were screened to obtain DELncRNAs according to the criteria of |log2 fold change (LFC)| > 1.5 and false discovery rate (FDR, or adjusted P-value) < 0.05. As for DEmRNAs, the criteria were |LFC| > 2 and FDR < 0.01.
(5) The R packages ggplot2 (version 3.1.1) and pheatmap (version 1.0.12) were used to plot the volcano plots and heatmaps of DELncRNAs and DEmRNAs.
- Survival analysis
The survival analysis was performed according to the following steps.
(1) The samples with missing survival status information were removed from the expression data of DELncRNAs.
(2) The R package DESeq2 was used to normalize the expression data of DELncRNAs.
(3) The LADC samples were divided into two groups: high expression (normalized expression > median expression) and low expression (normalized expression < median expression) according to the expression level of the DELncRNAs in these samples.
(4) The R package survival (version 2.44-1.1) was used to build the survival information dataset and to perform the log-rank test to calculate the correlation P-value of overall survival and DELncRNA expression level.
(5) The top-3 significant DELncRNAs related to overall survival were selected for further investigation. The survival curves of the three DELncRNAs were calculated with Kaplan-Meier (KM) method and visualized using the R package survminer (version 0.4.4).
(6) The significance was calculated again by performing Fisher's accurate test and Pearson’s chi-squared test to validate the correlation between the overall survival and gene expression level.
- PCR experiments
LADC and adjacent normal tissues were obtained from 22 patients who suffered from LADC. These patients were treated in Xiangya Hospital, Central South University in 2015. All subjects had written consents, and the protocol was approved by the ethics committee of Xiangya Hospital (No. 201503303). All procedures conducted in the study were in accordance with the Ethics Standards Institutions Research Committee and the 1964 Helsinki Declaration and its subsequent amendments or similar ethical standards. All patients had no history of other malignancies and never received radiotherapy or chemotherapy. All tissue samples were immediately frozen in liquid nitrogen and stored at -80 °C. All tumors and matched normal tissues were confirmed by two pathologists. Total RNA was extracted using the Trizol Reagent kit (Invitrogen) and was reversely transcribed into cDNAs by using HiScript II Q Select RT SuperMix for qPCR (Vazyme). Real-time PCR was performed on the cDNA templates using specific primers (TSINGKE) and the GeneCopoeia BlazeTaq™ SYBR® Green qPCR mix (GeneCopoeia). The LncRNAs relative expression levels were calculated as a ratio normalized to U6 expression. Comparative quantification was calculated with the 2−ΔΔCt method. The primers sequences used in this study are listed. lnc-YARS2-5: F, 5'- ggtaccagaagcagcacct-3'; and R, 5'- aaaagaactcggccaagctc-3'. lnc-NPR3-2: F, 5'- aagcaagcatactcgtccct-3'; and R, 5'- gagccaagacgtagaggagg-3'. LINC02310: F, 5'-gaggaggtgctttgcttctc-3'; and R, 5'-atgaaccgagtcctggagtc-3'.
- Clinical feature analysis
The TNM-staging system has become the standard method for clinicians and medical scientists to stage malignant tumors. In this paper, we adopted its 8th edition, which is the latest revision proposed by International Association for the Study of Lung Cancer (IASLC) [31, 32]. T-stage represented the size and location of tumor, N-stage reflected the spread of lung cancer in lymph nodes, and M-stage indicated whether the cancer cell had metastasized to a distant site. They were explained detailly in Table 1. TNM-stage gave an overall evaluation of the tumor progression via integrating the information given by T-stage, N-stage and M-stage. Its stage I, II/III and IV represented the early, middle and advanced stages of tumor, respectively.
The clinical feature analyses of the top-3 significant DELncRNAs were conducted based on the expression levels identified by PCR experiments and the clinical features of the 22 LADC patients. The specific analysis process is as follows.
(1) The CT values of the experimental genes (DELncRNAs) and reference genes in LADC and adjacent normal samples were obtained from the PCR experimental results.
(2) The ΔCT values were calculated as follows.
ΔCTLADC sample = CTLADC sample, experimental gene - CTLADC sample, reference gene
ΔCTadjacent normal sample = CTadjacent normal sample, experimental gene - CTadjacent normal sample, reference gene
(3) The ΔΔCT values were calculated by
ΔΔCT = ΔCTLADC sample - ΔCTadjacent normal sample
(4) LFC = -ΔΔCT
(5) The LADC patients with |LFC| ≤ 0.5 were removed. The remaining patients were separated into two groups: high expression (LFC > 0.5) and low expression (LFC < -0.5) according to the expression level of DELncRNAs in their tissues.
(6) The significance was calculated by performing Fisher's accurate test and Pearson’s chi-squared test to evaluate the correlation between the expression level of each of the three DELncRNAs and each clinical feature.
From the above analyses, LINC02310 can be screened out. Therefore, further validation of the clinical relevance of LINC02310 were performed based on the expression data and clinical data from TCGA. The specific steps are as follows.
(1) LINC02310 expression data were extracted from TCGA.
(2) The LADC samples were divided into two groups: high expression (LINC02310 expression > median expression) and low expression (LINC02310 expression < median expression).
(3) The correlation significance between LINC02310 expression level and each clinical feature was evaluated by performing Fisher's accurate test and Pearson’s chi-squared test.
- CCK-8 assay and colony formation assay
Two human lung cancer cell lines (H1299, PC-9) were used in the CCK-8 assay and colony formation assay since they can be efficiently infected according to our previous studies. They were purchased from Chinese Academy of Sciences Cell Bank (Shanghai, China).
(1) OE: the primers G0157014-1F (AGGATCGCTAGCGCTACCGGACTCAGATCTCGAGAGTCTCCTCTTG CAGATCAGATACCACC) and G0157014-1R (TACCCGGTAGAATTATCTAGAGTCGCGGGATCCAGATAC CTAAAAGGCACAAATCCTTCTG) were synthesized. They were then cloned into pLVX-Puro at XhoI-BamHI site to obtain the LINC02310 overexpression vector, pLVX-Puro-LINC02310, which was transfected into the cells.
(2) NC: a random sequence was cloned into pLVX-Puro at XhoI-BamHI site to obtain the control vector, which was then transfected into the cells.
(3) Si-02310: three small interfering RNAs (siRNAs) of LINC02310 were synthesized and transfected into the cells to down-regulate LINC02310 expression.
(4) Si-NC: A control sequence was transfected into the cells.
CCK-8 assay:
100 μL of OE, NC, si-02310 and si-NC of H1299 and PC-9 (1×103/mL) were transplanted in 96-well plate. Next, 10 μL of CCK-8 solution (Dojindo Laboratories, Kumamoto, Japan) was added to each well. Then, 40 min of incubation at 37 °C was involved. Finally, the absorbance was measured at 450 nm (OD450) by automatic microplate reader (Tecan, NANOQUANT, Swizerland) on days 0, 1, 2, 3 and 4, respectively. The experiment was repeated three times.
Colony formation assay:
OE, NC, si-02310 and si-NC of H1299 (1×103) were transplanted into 6-well plate and then incubated at 37°C for 10 days. Colonies were dyed with dyeing solution containing 0.1% crystal violet and 20% methanol. Cell colonies were then counted and analyzed.
- CeRNA regulatory network construction
(1) The lncRNASNP2 website (http://bioinfo.life.hust.edu.cn/lncRNASNP#!/) [33, 34] was used to predict the targeted miRNAs of LINC02310. The targeted miRNAs whose reads of exon model per million mapped reads (RPM) ≥ 1 were screened out.
(2) The starBase database (version 3.0; http://starbase.sysu.edu.cn/) [35] was used to predict the downstream mRNAs of the selected targeted miRNAs. The prediction programs TargetScan, microT, miRanda, miRmap, PITA, RNA22 and PicTar were integrated into starBase.
(3) These downstream mRNAs were intersected with the DEmRNAs to get the downstream DEmRNAs.
(4) The ceRNA (LINC02310-targeted miRNAs-downstream DEmRNAs) regulatory network was constructed and visualized by the software Cytoscape (version 3.7.1).
- Enrichment analysis
The enrichment analysis consisted of Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. It was performed using the R packages clusterProfiler (version 3.10.1) [36] and org.Hs.eg.db (version 3.7.0), which provided analysis functions and human gene annotation information, respectively. They were used to analyze and visualize the functional profiles of the predicted downstream DEmRNAs. Firstly, the GO functional annotation was performed, including molecular function (MF), biological process (BP) and cellular component (CC) [37]. The steps are as follows.
(1) The gene IDs of downstream DEmRNAs were listed.
(2) The enrichGO function was used to generate the enrichment result data set of MF, BP and CC, respectively.
(3) The barplot function was used to visualize the result.
Secondly, the KEGG pathway analysis was conducted through the following steps.
(1) The gene IDs of downstream DEmRNAs were converted into a format suitable for KEGG pathway analysis according to HGNC database (https://www.genenames.org/).
(2) The enrichKEGG function was used to generate KEGG pathway analysis result data set.
(3) The result was visualized via barplot function.
- PPI network
The PPI network was constructed using the STRING database (version 11.0; https://string-db.org/) and the Cytoscape software according to the following steps.
(1) The gene names of downstream DEmRNAs were uploaded to the STRING database to build a PPI dataset. In the dataset, the relevance between every two proteins (nodes) were evaluated by a combined score. The score ranges from 0 to 1, and a higher score represents a closer interaction.
(2) The interactions whose combined score > 0.7 were screened out and downloaded.
(3) The downloaded interactions were divided into different modules using MCODE (version 1.5, a plugin for Cytoscape, http://apps.cytoscape.org/apps/mcode). The module with the highest average combined score was selected to construct the PPI network, which was visualized by Cytoscape.