Data sources
The human PC related microarray datasets GSE86436 (expression data of mRNA and lncRNA) and GSE85589 (expression data of miRNA) were downloaded from the Gene Expression Omnibus (GEO (http://www.ncbi.nlm.nih.gov/geo/) database. There were six primary PC tissue samples and six adjacent non-tumor tissue samples in the GSE86436 dataset, and all the samples were detected on the Arraystar Human LncRNA microarray V2.0 platform (Agilent-033010 Feature Number version). In addition, blood serum samples from 88 PC patients and 19 healthy individuals in the GSE85589 dataset were used, with the [miRNA-4] Affymetrix Multispecies miRNA-4 Array platform.
In addition, related data for pancreatic adenocarcinoma (PAAD) in The Cancer Genome Atlas (TCGA) was downloaded from the University of California Santa Cruz Genome Browser database (http://xena.ucsc.edu/) [20], including clinical data, gene expression RNAseq log2(count+1) data, and miRNA expression log2(RPM+1) data. Pancreatic head, pancreatic body, and pancreatic tail samples were selected based on sample clinical phenotype information. A total of 137 pancreatic head samples, 14 pancreatic body samples, 14 pancreatic tail samples as well as their corresponding gene expression RNAseq log2(count+1) and miRNA expression log2(RPM+1) data were obtained.
Data preprocessing and lncRNA re-annotation
For the GSE86436 dataset, the standardized expression profile of mRNA and lncRNA, as well as probe sequences, were downloaded. The sequences of the probes were mapped to the GRCh38 human reference genome and “unique map” probes were selected. Then, the mapped gene for each probe was obtained based on the corresponding position and positive/negative strand information on the chromosome as well as the “Release 25” annotation file. Probes with the annotation of “protein_coding” were the corresponding mRNA corresponding, while probes with the annotations of “antisense”, “sense_intronic”, “lincRNA”, “sense_overlapping”, and “processed_transcript” were the corresponding lncRNA probes.
For the GSE85589 dataset, the standardized miRNA expression profile and annotation files were downloaded, followed by annotation of the probes. The probes with no mapped miRNA were removed, and when multiple probes mapped to one miRNA, the mean expression value was considered as the expression value of this miRNA.
For the data downloaded from the TCGA database, the gene expression RNAseq log2(count+1) data were converted to count values. Genes with a count value of 0 in more than half of the samples were filtered, followed by gene annotation based on the “Release 25” annotation file. Similarly, for the miRNA expression log2(RPM+1) data, miRNAs with a log2(RPM+1) of 0 in more than half of the samples were filtered, followed by their annotation based on miRbase database.
Differential analysis
The expression profiles of mRNA, lncRNA, and miRNA downloaded from the GEO database were analyzed to determine differential expression between the PC and normal groups. The corresponding P-value and log fold change (FC) were obtained using the classical Bayes method in the Limma package (Version 3.10.3, http://www.bioconductor.org/packages/2.9/bioc/html/limma.html). For the expression profiles of lncRNA and mRNA downloaded from the TCGA database, the raw count was standardized and converted into logCPM value using the TMM method in the edgeR package [21, 22] (Version: 3.4, http://www.bioconductor.org/packages/release/bioc/html/edgeR.html) to perform the differential expression analysis between pancreatic tail and vs. pancreatic body (T vs. B), pancreatic tail vs. pancreatic head (T vs. H), and pancreatic body vs. pancreatic head (B vs. H). The classical Bayes method in the Limma package was used to analyze the miRNA log2(RPM+1) between the three groups.
Differentially expressed mRNA (DE-mRNA) and DE-lncRNA were selected with the threshold of P < 0.05 and |logFC| > 1, while P < 0.05 and |logFC| > 0.263 were considered as the threshold for DE-miRNA. Overlapped mRNAs, lncRNAs, and miRNAs between PC vs. Normal and (union of genes among T vs. B, T vs. H, and B vs. H) were considered as the DE-mRNAs, DE-lncRNAs, and DE-miRNAs, respectively, related to PC and the locations of PC occurrence, and they were used in the subsequent analysis.
Co-expression analysis
The DE-mRNAs and DE-lncRNAs were used to calculate the Pearson correlation coefficient (r) of each mRNA and each lncRNA by one-to-one correspondence of the samples. The lncRNA-mRNA interaction pairs were selected with cut-offs of r > 0 and P < 0.05. Similarly, the r-value of each mRNA and each miRNA was also calculated by one-to-one correspondence of the samples, and the miRNA-mRNA interaction pairs were selected with cut-offs of r < 0 and P < 0.05.
ceRNA network construction
The miRNA-lncRNA interaction binding sites were predicted using miRanda software (version 3.3a), and the miRNA-lncRNA pairs were selected with a threshold score > 140 and threshold Energy < -20. In addition, the target genes of miRNAs were also predicted using miRWalk 2.0 (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/). The miRNA-mRNA pairs predicted to appear in at least three of nine databases (including miRWalk, Microt4, miRanda, miRMap, miRNAMap, PITA, RNA22, RNAhybrid, and Targetscan) were selected. The overlapped miRNA-mRNA pairs between co-expressed miRNA-mRNA pairs and predicted miRNA-mRNA pairs were considered the final miRNA-mRNA pairs. Finally, the lncRNA-miRNA-mRNA interactions were obtained based on the lncRNA-miRNA pairs, mRNA-miRNA pairs, and the co-expressed mRNA-lncRNA pairs, followed by the ceRNA network construction using Cytoscape software (version 3.4.0, http://chianti.ucsd.edu/cytoscape-3.4.0/).
Functional enrichment analysis
To explore the involved function of the DE-mRNAs, lncRNAs, and miRNAs, the clusterProfiler [23] (version 3.8.1, http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) in the R package was used to enrich the biological processes in Gene Ontology (GO) annotation and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The significantly enriched function terms were selected with the threshold of P.adjust < 0.05 and enriched gene count > 2. Notably, the functions of lncRNAs and miRNA were obtained based on the functional enrichment of the mRNAs in co-expressed mRNA-lncRNA pairs and in the final obtained mRNA-miRNA pairs, respectively.
Construction of mRNA prognosis risk model
The mRNAs in the ceRNA network were considered as the candidate mRNAs. Cox univariate regression analysis was used to calculate the regression coefficient and P-value between each candidate mRNA and the survival time and state. The prognosis-related mRNAs were selected with the P-value threshold < 0.05, combined with hazard ratio (HR) risk (theoretically, up-regulated genes between PC vs. Normal should be risk factors, corresponding to an HR > 1; otherwise HR < 1). The Risk Score was calculated as βgene1*exprgene1 + βgene2*exprgene2 + ... +βgenen*exprgenen, in which β is the prognostic correlation coefficient and exprgene is the expression value of the corresponding gene. The mRNAs were added to the model according to the P-value from small to large one-by-one. The high and low risk samples classified by the average value of the model constructed after adding an mRNA had the greatest significant correlation with survival (log-rank test). The area under the curve (AUC) of the high and low risk samples was maximized according to the expression value of the selected mRNA. In this case, the mRNA model was considered to be associated with prognosis.
Screening of prognostic factors and construction of Nomogram model
The clinical information of PC samples in the surveillance, epidemiology, and end results (SEER) and TCGA databases were downloaded. Clinical factors, including age, gender, location of cancer, clinical stage, and tumor histological grade, and the prognostic risk model score calculated as described above were used as independent variables. OS was used as the dependent variable to perform Cox univariate regression analysis for clinical factors having P < 0.05 to screen for prognostic factors. The Nomogram model was constructed based on the results of multivariate regression analysis, including conversion and assignment of regression coefficients, Nomogram plotting, and calibration curve plotting.
Construction of drug-gene interaction network
The targeted drugs for DE-mRNA were predicted using the DGIdb online database (http://www.dgidb.org/search_interactions) using the parameters settings of FDA Approved and Antineoplastic. The drug-gene interaction network was constructed based on the obtained drug-gene interactions using Cytoscape.