Background: Gene signatures are useful for representing molecular alterations in disease genomes under specified conditions, and they are often used to separate samples into groups, both for research purposes and for better clinical treatment. Efficient techniques that can account for complex gene expression profiles and identify the most relevant signatures are lacking.
Methods: In this article, we present a new framework that identifies a Dense Module based gene Signature (DeMoS) and its targeting miRNAs through a quasi-clique detection algorithm, and we apply it in a prognostic survival study. We used a cervical cancer dataset with accompanying prognostic clinical data for our experiment. We first performed an empirical Bayes test using the Limma method to identify dysregulated genes (or miRNAs). The miRNA-mediated dysregulated target genes of those dysregulated miRNAs were then extracted. Thereafter, we detected dense co-expressed modules using a quasi-clique identification technique. The average correlation coefficient was then computed for each resulting module, and the module with the highest average correlation was taken as the gene signature. Next, we applied three well-known classifiers (SVM, PAM and Random Forest (RF)) with 10-fold cross-validation and obtained the AUC for each. Finally, we performed a survival prognosis study for the resulting gene signature.
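The module-selection step can be illustrated with a minimal sketch. It assumes that the "average correlation coefficient" of a module is the mean signed Pearson correlation over all gene pairs in that module; the abstract does not state whether signed or absolute values are used. The gene names, expression values and function names below are hypothetical toy inputs, not data or code from the study.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mean_module_correlation(module, expr):
    """Average pairwise correlation over all gene pairs in one module
    (assumption: signed Pearson, unweighted mean)."""
    pairs = list(combinations(module, 2))
    return sum(pearson(expr[g], expr[h]) for g, h in pairs) / len(pairs)

# Hypothetical toy expression profiles (gene -> values across four samples).
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
    "g5": [2, 3, 5, 4],
}

# Hypothetical candidate modules, standing in for quasi-clique output.
modules = {"A": ["g1", "g2", "g5"], "B": ["g1", "g3", "g4"]}

scores = {name: mean_module_correlation(genes, expr)
          for name, genes in modules.items()}
best_module = max(scores, key=scores.get)  # module kept as the signature
```

In this toy case module A is selected because its genes rise and fall together, whereas module B mixes positively and negatively correlated genes and therefore has a lower signed average.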
Results: The resulting signature consisted of ten genes: FGF9, FGF18, PPP1R9A, ERBB4, DCDC2, TOX3, ARMC3, DNALI1, RGL3 and ENPP3. In addition, we identified a total of eight dysregulated miRNAs that targeted this gene signature. Hsa-mir-34c was found to be the most strongly associated with the gene signature, since its out-degree centrality score was the highest. The p-value of the Cox regression in the prognosis study for the resulting gene signature was significant (p = 4.2e-02). Finally, DeMoS achieved the highest AUC values (0.95 for SVM, 0.955 for RF and 0.955 for PAM) for our gene signature compared with the other state-of-the-art techniques.
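As a rough illustration of the out-degree ranking mentioned above, the sketch below counts, for each miRNA in a directed miRNA-to-gene network, the number of distinct signature genes it targets. It assumes the unnormalized out-degree is used; the abstract does not say whether the score was normalized. The edge list uses placeholder names and is not the network from the study.

```python
# Hypothetical directed edges (miRNA -> target gene); placeholder names only.
edges = [
    ("mirA", "gene1"), ("mirA", "gene2"), ("mirA", "gene3"),
    ("mirB", "gene1"),
    ("mirC", "gene2"), ("mirC", "gene3"),
]

def out_degree(edge_list):
    """Number of distinct targets per source node (unnormalized out-degree)."""
    targets = {}
    for source, target in edge_list:
        targets.setdefault(source, set()).add(target)
    return {source: len(genes) for source, genes in targets.items()}

degree = out_degree(edges)
top_mirna = max(degree, key=degree.get)  # miRNA with the most targets
```

Here `mirA` ranks first because it targets three genes, which mirrors the reasoning used to single out one miRNA as most strongly associated with the signature.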
Conclusions: Our framework identified a promising gene signature that could classify multiple groups/subtypes of samples with a higher AUC, and that showed a statistically significant p-value in the regression-based prognosis analysis. Our method can be used to determine a signature for any RNA-seq profile. Code is available at https://github.com/sahasuparna/DeMoS.