Motivation: Variable selection is a common statistical approach to detecting genes associated with clinical outcomes of scientific interest. Genomic studies typically measure thousands of genes, while only a limited number of individual samples are available. It is therefore important to develop a method for identifying outcome-associated genes that controls the false discovery rate (FDR) in finite samples in high-dimensional data settings.
Results: This article proposes Grace-AKO, a novel method for graph-constrained estimation (Grace) that combines aggregation of multiple knockoffs (AKO) with the network-constrained penalty. Grace-AKO controls FDR in finite-sample settings while also improving model stability. Simulation studies show that Grace-AKO controls FDR in finite-sample settings better than the original Grace model. We apply Grace-AKO to the prostate cancer data from The Cancer Genome Atlas (TCGA) program, incorporating prostate-specific antigen (PSA) pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) as prior information. Grace-AKO identifies 47 candidate genes associated with PSA level, and more than 75% of the detected genes can be validated.