Background: Microarray prognostic studies aim to identify genes that are associated with disease-free survival or overall survival. However, because the disease is rare and sample collection is costly, the sample size is often limited, which may prevent accurate risk assessment. This calls for methods that can use information from similar studies or data sets for gene selection and risk assessment in the target task.
Results: We model the time-to-event data using the accelerated failure time (AFT) model. We propose a transfer learning method for the AFT model that improves the fit on the target cohort by adaptively borrowing information from source cohorts. The Lasso penalty is used for gene selection and regularized estimation. We use leave-one-out cross-validation based methods to evaluate the relative stability of individual genes and the overall prediction significance.
Conclusion: We demonstrate through simulation studies that the transfer learning method for the AFT model can correctly identify a small number of genes, and that its estimation error is smaller than the error obtained without using the source cohorts. The proposed method also remains satisfactorily robust and accurate under cohort heterogeneity when compared with the method that directly combines the target and source cohorts in the AFT model. We analyze the GSE88770 and GSE25055 data sets using the proposed method. The selected genes are relatively stable, and the proposed method has satisfactory overall prediction power.
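
To make the idea of borrowing information from source cohorts concrete, the following is a minimal sketch of a generic two-step transfer scheme with a Lasso penalty. It is not the estimator proposed here. It assumes that all event times are observed (no censoring), so the AFT model reduces to a linear model for the log survival time. It further assumes a single source cohort, simulated data, and fixed tuning values. The function names `soft_threshold`, `lasso_cd`, and `transfer_lasso`, and the parameters `lam_pool` and `lam_delta`, are hypothetical and introduced only for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator used in the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b[j] excluded).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new)
            b[j] = new
    return b


def transfer_lasso(X_t, y_t, X_s, y_s, lam_pool, lam_delta):
    """Generic two-step transfer (an illustrative assumption, not the paper's method).

    Step 1: pooled Lasso fit on source and target data.
    Step 2: sparse correction fitted on the target residuals only.
    """
    X_pool = np.vstack([X_s, X_t])
    y_pool = np.concatenate([y_s, y_t])
    w = lasso_cd(X_pool, y_pool, lam_pool)
    delta = lasso_cd(X_t, y_t - X_t @ w, lam_delta)
    return w + delta


# Simulated, uncensored log survival times: log T = x' beta + noise.
rng = np.random.default_rng(0)
p, n_t, n_s = 30, 40, 200
beta_t = np.zeros(p)
beta_t[:3] = [1.0, -1.0, 0.5]
beta_s = beta_t.copy()
beta_s[3] = 0.2  # mild heterogeneity between source and target cohorts

X_t = rng.normal(size=(n_t, p))
X_s = rng.normal(size=(n_s, p))
y_t = X_t @ beta_t + 0.3 * rng.normal(size=n_t)
y_s = X_s @ beta_s + 0.3 * rng.normal(size=n_s)

# Tuning values are arbitrary here; in practice they would be chosen by cross-validation.
beta_target_only = lasso_cd(X_t, y_t, lam=0.1)
beta_transfer = transfer_lasso(X_t, y_t, X_s, y_s, lam_pool=0.05, lam_delta=0.1)

print("target-only L2 error:", np.linalg.norm(beta_target_only - beta_t))
print("transfer    L2 error:", np.linalg.norm(beta_transfer - beta_t))
```

The second step is what distinguishes this scheme from directly combining the cohorts: the pooled estimate is corrected using the target data alone, so a source cohort that differs from the target in a few coefficients can still contribute. Handling censored observations would require replacing the plain least-squares loss with a censoring-aware one, which this sketch does not attempt.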