Background: With the continual growth of large-scale sequencing data and the steady improvement of cancer genomics resources such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), detecting the sets of functional, frequently mutated genes that cause cancer has become increasingly important in the field of medicine.
Methods: To address the heterogeneity of mutated genes and improve the accuracy of driver module identification, we propose ECSWalk, a new method for recognizing driver modules based on human protein interaction networks and pan-cancer somatic mutation data. First, the method uses the high mutual exclusivity and high coverage of mutated genes, together with the topological similarity of nodes in the network, to calculate interaction weights between genes. Second, random walk with restart is used to construct a weighted directed network, and the strong connectivity of this directed graph is used to create initial candidate modules containing a certain number of genes. Finally, large candidate modules are split using induced subgraphs, and small modules are expanded with a greedy strategy to obtain the optimal driver modules.
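The second step can be illustrated with a minimal sketch. It is not the ECSWalk implementation: it assumes the gene-gene weight matrix has already been computed, and the function names, the restart probability of 0.4, and the influence threshold are hypothetical choices made only for illustration. It runs a random walk with restart from each gene, keeps a directed edge wherever the stationary influence exceeds the threshold, and returns the strongly connected components of the resulting directed graph as candidate modules.

```python
from typing import Dict, List, Set


def normalize_columns(w: List[List[float]]) -> List[List[float]]:
    """Scale each column to sum to 1 (all-zero columns stay zero)."""
    n = len(w)
    sums = [sum(w[i][j] for i in range(n)) for j in range(n)]
    return [[w[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def rwr(w: List[List[float]], seed: int, restart: float = 0.4,
        tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Random walk with restart from `seed` on a column-normalized matrix."""
    n = len(w)
    p = [1.0 if i == seed else 0.0 for i in range(n)]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + (restart if i == seed else 0.0) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def influence_edges(w: List[List[float]], threshold: float,
                    restart: float = 0.4) -> Dict[int, Set[int]]:
    """Keep a directed edge s -> t when the walk from s reaches t strongly."""
    wn = normalize_columns(w)
    edges: Dict[int, Set[int]] = {i: set() for i in range(len(w))}
    for s in range(len(w)):
        p = rwr(wn, s, restart)
        edges[s] = {t for t in range(len(w)) if t != s and p[t] >= threshold}
    return edges


def strongly_connected_components(edges: Dict[int, Set[int]]) -> List[Set[int]]:
    """Kosaraju's algorithm, written iteratively to avoid deep recursion."""
    order: List[int] = []
    seen: Set[int] = set()
    for start in edges:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(edges[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nb in neighbours:
                if nb not in seen:
                    seen.add(nb)
                    stack.append((nb, iter(edges[nb])))
                    break
            else:
                order.append(node)  # all neighbours done: record finish
                stack.pop()
    # Second pass on the reversed graph, in decreasing finish order.
    rev: Dict[int, Set[int]] = {u: set() for u in edges}
    for u, vs in edges.items():
        for v in vs:
            rev[v].add(u)
    comps: List[Set[int]] = []
    assigned: Set[int] = set()
    for start in reversed(order):
        if start in assigned:
            continue
        comp, todo = set(), [start]
        assigned.add(start)
        while todo:
            u = todo.pop()
            comp.add(u)
            for v in rev[u] - assigned:
                assigned.add(v)
                todo.append(v)
        comps.append(comp)
    return comps
```

For example, calling `strongly_connected_components(influence_edges(w, threshold=0.1))` on a small symmetric weight matrix `w` returns one set of gene indices per candidate module. In the method described above these candidates would then be split or expanded; that step is not shown here.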
Results: The method was applied to TCGA pan-cancer data. The experimental results show that ECSWalk detects driver modules more effectively and accurately, and identifies new candidate gene sets with higher biological relevance and statistical significance, than MEXCOWalk and HotNet2.
Conclusions: ECSWalk offers theoretical guidance and practical value for cancer diagnosis, treatment, and the identification of drug targets.