Background: Understanding the clonal evolution of a tumor is an important step toward effective cancer treatment. Clones are cell populations with different genotypes, which arise from differences in the somatic mutations that occur and accumulate during cancer development. One way to characterize a tumor population is to determine the variant allele frequency of each mutation, that is, how frequently the mutation occurs across the entire population. Bulk sequencing data provide this information, but the frequencies alone are not informative enough to identify the different clones and their evolutionary relationships. Single-cell sequencing data, on the other hand, provide valuable information about branching events in the evolution of a tumor. However, the number of cells sequenced in a single-cell experiment is much smaller than the number sampled in bulk sequencing, so single-cell data are not precise enough for estimating cellular prevalence.
Results: In this study, a new method called Conifer (ClONal tree Inference For hEterogeneity of tumoR) is proposed. It combines aggregated variant allele frequencies from bulk sequencing data with branching evolution information from single-cell sequencing data in order to better resolve clones and their evolutionary relationships. On both real and simulated data, Conifer is shown to identify clones more accurately than existing methods. It is also shown that using single-cell sequencing data together with bulk sequencing data reduces the chance of assigning to the same clone mutations that have similar frequencies but belong to different clones.
Conclusions: In this study, we present an accurate and robust method that combines single-cell and bulk sequencing data to identify the clones underlying tumor heterogeneity and to reconstruct their evolutionary history.