Study overview and sequencing depth
We retrospectively reviewed 1261 plasma cfDNA sequencing data sets with matched white blood cell (WBC) genomic DNA (gDNA) from 1046 lung cancer patients, all sequenced at a single center; 181 of these patients were sampled more than once. Plasma cfDNA was sequenced to a target raw (uncollapsed) depth of 20,000X and WBC gDNA to 10,000X (Fig. 1a). Patient characteristics are summarized in Fig. 1b. To the best of our knowledge, this is the first systematic CH study in a large Chinese lung cancer population. Because library preparation, targeted enrichment and sequencing inevitably introduce duplicate and other unwanted reads, we plotted both the uncollapsed and the actual collapsed coverage for cfDNA and WBC gDNA (Fig. 1c and 1d). The median uncollapsed coverage was 17742X for cfDNA and 9173X for gDNA, and the median collapsed coverage was 2757X and 2349X, respectively. Because the amount of cfDNA varied widely among samples and all extracted cfDNA was used for library preparation, we used Pearson's correlation to assess how cfDNA input affected mean coverage (Figure S1). Input cfDNA was positively correlated with collapsed mean coverage.
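The two coverage summaries above can be illustrated with a minimal Python sketch: the fraction of raw depth removed by read collapsing, computed here from the reported medians, and Pearson's r between cfDNA input and collapsed mean coverage. The function names `collapse_loss` and `pearson_r` are hypothetical, and per-sample input and coverage values are not given in the text, so the correlation function is shown without data.

```python
from math import sqrt


def collapse_loss(uncollapsed: float, collapsed: float) -> float:
    """Fraction of raw (uncollapsed) depth removed by read collapsing."""
    return 1.0 - collapsed / uncollapsed


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's correlation coefficient for two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples with at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(ss_x * ss_y)


# Median depths reported in the text: (uncollapsed, collapsed).
for label, raw, collapsed in [("cfDNA", 17742, 2757), ("WBC gDNA", 9173, 2349)]:
    print(f"{label}: {collapse_loss(raw, collapsed):.1%} of raw depth removed")
# cfDNA: 84.5% of raw depth removed
# WBC gDNA: 74.4% of raw depth removed
```

A ratio of medians is only a rough cohort-level summary, not a per-sample duplication rate. With per-sample data, `pearson_r(input_ng, collapsed_mean_cov)` (both lists hypothetical) would reproduce the kind of analysis shown in Figure S1; Python 3.10+ also offers `statistics.correlation` for the same coefficient.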
Mutation landscape of featured CH genes in lung cancer patients and healthy controls
Twenty-three CH-associated genes were selected from previous publications18,26–29: ASXL1, ATM, CBL, CHEK2, CREBBP, DNMT3A, FGFR3, GNAS, IDH1, IDH2, JAK2, KMT2D, MED12, NF1, NOTCH1, RUNX1, SETD2, SF3B1, SRSF2, TET2, TNFAIP3, TP53 and U2AF1. The mutational landscape of all cfDNA samples from lung cancer patients and from 54 healthy individuals is shown in Fig. 2. To determine an appropriate cutoff for CH mutation calling in liquid biopsy, we applied two VAF cutoffs: a threshold of 1% and a subthreshold of 0.2%. CH variants were detected in 349 (27.68%) of the 1261 lung cancer samples at the threshold and in 782 (62.01%) at the subthreshold. The most frequently mutated CH-related gene was DNMT3A, followed by TET2, ASXL1 and TP53. By contrast, only 7 (12.96%) and 21 (38.89%) of the 54 healthy controls carried variants in CH-related genes at the threshold and subthreshold, respectively. In addition to these 23 reported CH genes, we used our pipeline (see Methods) to identify the top 30 genes carrying potentially blood cell-derived mutations (Figure S2), to check whether any CH genes had been overlooked. The most common CH genes (DNMT3A, TET2, ASXL1, TP53, etc.) were identified by both approaches. Of the 452 CH mutations detected at the threshold, DNMT3A accounted for nearly 60%, followed by TET2, ASXL1, KMT2D and TP53. Of the 1513 CH mutations detected at the subthreshold, DNMT3A accounted for 51.55%, followed by TET2, U2AF1, CREBBP and ASXL1 (Figure S3).
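The two-cutoff comparison can be sketched as follows: a sample counts as CH-positive at a given cutoff if any variant in one of the 23 listed genes reaches that VAF. This is a simplified illustration, not the pipeline described in Methods; the per-sample data structure and the use of `>=` at the cutoff are assumptions, and other filters (for example, matched WBC support) are not modeled.

```python
# The 23 CH-associated genes listed in the text.
CH_GENES = frozenset({
    "ASXL1", "ATM", "CBL", "CHEK2", "CREBBP", "DNMT3A", "FGFR3", "GNAS",
    "IDH1", "IDH2", "JAK2", "KMT2D", "MED12", "NF1", "NOTCH1", "RUNX1",
    "SETD2", "SF3B1", "SRSF2", "TET2", "TNFAIP3", "TP53", "U2AF1",
})
THRESHOLD = 0.01      # 1% VAF
SUBTHRESHOLD = 0.002  # 0.2% VAF


def is_ch_positive(variants: list[tuple[str, float]], cutoff: float) -> bool:
    """True if any variant in a CH gene reaches the VAF cutoff (>= is assumed)."""
    return any(gene in CH_GENES and vaf >= cutoff for gene, vaf in variants)


def ch_positive_count(samples: dict[str, list[tuple[str, float]]],
                      cutoff: float) -> tuple[int, float]:
    """Number and fraction of samples that are CH-positive at the cutoff."""
    positives = sum(is_ch_positive(v, cutoff) for v in samples.values())
    return positives, positives / len(samples)


# Cohort-level rates recomputed from the counts reported in the text.
print(f"{349 / 1261:.2%}, {782 / 1261:.2%}")  # 27.68%, 62.01%
```

Because every variant that passes 1% also passes 0.2%, the subthreshold positive set is always a superset of the threshold set, which is consistent with the higher detection rate reported at 0.2%.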
Prevalence of clonal hematopoiesis
In cancer patients, the detection rate of CH-related mutations increased with age (Fig. 3), consistent with previous studies.11,35,36 No CH mutations were detected in patients aged 20–29 at either cutoff. Detection frequency was highest in the 80–89 age group: 47.06% at the threshold (Fig. 3a) and 83.32% at the subthreshold (Fig. 3b). At the threshold, however, detection dropped in the 90–99 group (33.33%), whereas at the subthreshold the 90–99 group (83.33%) had a detection level similar to that of the 80–89 group. This suggests that the 1% VAF cutoff missed some mutations, so that the trend of increasing detection with age broke down in the oldest group (red dashed square). The proportion of patients with multiple (≥2) CH mutations also increased with age (Fig. 3c and 3d), and this trend was clearer at the subthreshold. No such trend was observed in healthy individuals (Figure S4).
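The age analysis amounts to binning patients into 10-year groups and computing, per bin, the fraction with at least one CH mutation and the fraction with two or more. A minimal sketch, assuming a hypothetical input of `(age, number of CH mutations called at one VAF cutoff)` per patient; running it once per cutoff gives Fig. 3a/3b-style detection rates and Fig. 3c/3d-style multiple-mutation fractions.

```python
from collections import defaultdict


def detection_by_age(records: list[tuple[int, int]]) -> dict[str, tuple[int, float, float]]:
    """Summarize CH detection per 10-year age bin.

    records: (age, number of CH mutations at one VAF cutoff) per patient.
    Returns {bin label: (n patients, fraction with >=1, fraction with >=2)}.
    """
    tallies = defaultdict(lambda: [0, 0, 0])  # [total, with >=1, with >=2]
    for age, n_mut in records:
        t = tallies[(age // 10) * 10]  # key on the bin's lower bound
        t[0] += 1
        t[1] += int(n_mut >= 1)
        t[2] += int(n_mut >= 2)
    # Sort numerically by lower bound, then label bins as "80-89" etc.
    return {
        f"{lo}-{lo + 9}": (total, any_ / total, multi / total)
        for lo, (total, any_, multi) in sorted(tallies.items())
    }
```

Returning the bin size alongside each fraction makes it easier to judge how stable each per-bin estimate is before interpreting a change between adjacent age groups.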
Source of mutations in cell-free DNA
To determine the source of mutations detected in cfDNA and the sequencing depth required, we first analyzed the VAFs of all mutations (Fig. 4a) and of CH-related mutations detected by our pipeline (Fig. 4b) across depth ranges at the subthreshold setting. Reliable detection of mutations with VAF as low as 0.2% (red dashed line) required a collapsed coverage above 2500X. This requirement is also imposed by the variant-calling algorithm, which requires at least 5 mutant reads covering the mutation site: at 0.2% VAF, 5 mutant reads correspond to a depth of 5 / 0.002 = 2500X. We plotted the VAF of each somatic and CH-related mutation in cfDNA against its VAF in gDNA (Fig. 4c); most CH mutations fell near the diagonal (Pearson's R = 0.92, p < 2.2 × 10⁻¹⁶), suggesting that they originated from blood cells. Similar results were obtained with the threshold setting (Figure S5). Fig. 4d and 4e show an example patient (P53198) harboring TP53 mutations. When mutations below 1% VAF were called, an additional mutation, TP53_p.C275Y, appeared. Because it had a similar VAF in gDNA sequencing, it was classified as a blood cell-derived CH mutation.
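The depth requirement and the source assignment can be sketched together. `min_depth` reproduces the 5 / 0.002 = 2500X arithmetic; `detection_probability` models mutant-read counts as Binomial(depth, VAF), a simplification that ignores sequencing errors and collapsing artifacts; and `classify_source` is a hypothetical rule (the `min_gdna_vaf` and `ratio_floor` parameters are illustrative assumptions, not the criteria in Methods) that attributes a cfDNA variant to blood cells when matched WBC gDNA shows a comparable VAF.

```python
import math


def min_depth(min_alt_reads: int, vaf: float) -> int:
    """Smallest depth at which the expected mutant-read count reaches min_alt_reads."""
    # round() removes floating-point noise such as 2499.9999999 before ceil().
    return math.ceil(round(min_alt_reads / vaf, 6))


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 5) -> float:
    """P(at least min_alt_reads mutant reads) with mutant reads ~ Binomial(depth, vaf)."""
    below = sum(
        math.comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - below


def classify_source(cf_vaf: float, gdna_vaf: float,
                    min_gdna_vaf: float = 0.001, ratio_floor: float = 0.5) -> str:
    """Hypothetical rule: a cfDNA variant also present in matched WBC gDNA at a
    comparable VAF is attributed to blood cells (CH); otherwise it is kept as a
    putative tumor-derived somatic mutation."""
    if gdna_vaf >= min_gdna_vaf and gdna_vaf >= ratio_floor * cf_vaf:
        return "CH"
    return "somatic"


print(min_depth(5, 0.002))  # 2500
```

Under this simplified binomial model, a 0.2% variant at exactly 2500X yields at least 5 mutant reads only about 56% of the time, because the expected count equals the required count at that depth. This illustrates why depth comfortably above the arithmetic minimum helps detection.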
Correlation of CH mutations and clinical characteristics
To investigate CH mutations across clinical settings, we compared the proportion of CH-positive patients by smoking status (Fig. 5a), bTMB status (Fig. 5b), cancer subtype (Fig. 5c) and MSI status (Fig. 5d). MSI status was determined only for samples with matched tumor sequencing data (n = 461). No significant differences were observed among these subgroups, except between smokers and non-smokers at the subthreshold: smokers had a significantly higher CH detection rate than non-smokers (p < 0.001), which may be explained by the mutagenic chemicals in cigarette smoke.37
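The text does not state which test produced the smoker versus non-smoker p-value, so the sketch below swaps in a pooled two-proportion z-test, which is equivalent to a Pearson chi-square test on the 2x2 table without continuity correction. The function name and argument order are hypothetical, and no cohort counts are assumed.

```python
from math import erfc, sqrt


def two_proportion_ztest(pos_a: int, n_a: int, pos_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both groups are all-positive or all-negative")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return z, erfc(abs(z) / sqrt(2))
```

With real data this would be called as `two_proportion_ztest(ch_pos_smokers, n_smokers, ch_pos_nonsmokers, n_nonsmokers)` (hypothetical names). For small subgroups, Fisher's exact test (for example `scipy.stats.fisher_exact`) is the usual alternative, because the normal approximation becomes unreliable when expected cell counts are low.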