Altered cfDNA fragmentation profile upon 5mC-immunoprecipitation (IP)
As cancer-derived cfDNA fragments have been reported to show altered methylation and smaller size [6, 11], we focused on cfDNA fragments ranging from 100 to 220 base pairs (bp) to investigate whether the release of cancer-derived cfDNA is related to DNA methylation. In a preliminary analysis of the discovery cohort, cfDNA extracted from the plasma of 3 healthy individuals (H1, H2 and H3) and 3 patients with breast carcinoma (P1, P2 and P3) in the recovery period with low tumor burden was used to construct cfMeDIP-seq libraries with some modifications (fig. S1, A-E, and table S1). Both Input and IP libraries were sequenced as paired-end reads at approximately 0.5× and 5× coverage, respectively (table S2). Interestingly, in healthy individuals, both the density of short cfDNA fragments (100-150 bp) and the short fragments ratio (defined as the ratio of short cfDNA fragments to long cfDNA fragments (151-220 bp)) decreased in IP libraries compared with the corresponding Input libraries (Fig. 2, A-C and G), whereas no such decrease was seen in patients with breast cancer (Fig. 2, D-F and H). Furthermore, the mean cfDNA fragment size increased from 170.06 bp (Input libraries) to 173.04 bp (IP libraries) in healthy individuals, but not in cancer patients (170.51 to 170.71 bp) (fig. S2, A and B). To examine differences between healthy individuals and cancer patients, we calculated the percentage change of the short fragments ratio between each IP library and its corresponding Input library, and found that patients with breast cancer had significantly smaller changes than healthy individuals (fig. S2, C-E and table S3).
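The fragment-size metrics used above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes fragment lengths are already available as integers (for example, from deduplicated paired-end alignments), treats both size bins as inclusive, restricts the mean size to the 100-220 bp range, and computes the percentage change as (IP - Input) / Input × 100. All function names are hypothetical.

```python
from statistics import mean

SHORT_RANGE = (100, 150)  # short fragments, bp (inclusive bounds assumed)
LONG_RANGE = (151, 220)   # long fragments, bp (inclusive bounds assumed)


def short_fragment_ratio(lengths):
    """Ratio of short (100-150 bp) to long (151-220 bp) fragments."""
    short = sum(1 for n in lengths if SHORT_RANGE[0] <= n <= SHORT_RANGE[1])
    long_ = sum(1 for n in lengths if LONG_RANGE[0] <= n <= LONG_RANGE[1])
    if long_ == 0:
        raise ValueError("no long fragments in the 151-220 bp range")
    return short / long_


def mean_size(lengths):
    """Mean size of fragments within the 100-220 bp window (assumed restriction)."""
    kept = [n for n in lengths if SHORT_RANGE[0] <= n <= LONG_RANGE[1]]
    return mean(kept)


def percent_change(ip_ratio, input_ratio):
    """Percentage change of the short fragments ratio in IP relative to Input (assumed direction)."""
    return (ip_ratio - input_ratio) / input_ratio * 100.0
```

In a toy case where the Input library has a ratio of 0.5 and the IP library a ratio of 0.2, the percentage change is -60%; under this sketch, a healthy-like sample shows a large negative change and a cancer-like sample a change closer to zero.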
To examine how the short fragments ratio varies across the human genome, genome-wide cfDNA fragmentation profiles of both Input (Fig. 2I, upper panel) and IP (Fig. 2I, middle panel) libraries are shown in 5-Mb windows for participants in the discovery cohort, following a previously described method [3]. Changes in the cfDNA fragmentation profile due to 5mC-IP (IP - Input) were calculated by subtracting the short fragments ratio of the Input library from that of the IP library in each 5-Mb genomic window (Fig. 2I, lower panel). For patients with breast cancer, smaller changes in the short fragments ratio between IP and Input libraries were observed in almost all genomic windows across the human genome.
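A simplified sketch of the windowed profile and its IP - Input difference is shown below. It assumes each fragment is given as a hypothetical (chromosome, start, length) tuple and assigns fragments to 5-Mb windows by start position; any bias correction applied by the previously described method [3] is not reproduced here.

```python
from collections import defaultdict

WINDOW_BP = 5_000_000  # 5-Mb genomic windows


def windowed_short_ratio(fragments, window_bp=WINDOW_BP):
    """fragments: iterable of (chrom, start, length).

    Returns {(chrom, window_index): short/long ratio} for windows that
    contain at least one long (151-220 bp) fragment.
    """
    short = defaultdict(int)
    long_ = defaultdict(int)
    for chrom, start, length in fragments:
        key = (chrom, start // window_bp)
        if 100 <= length <= 150:
            short[key] += 1
        elif 151 <= length <= 220:
            long_[key] += 1
    # Only windows with long fragments are reported, avoiding division by zero.
    return {k: short[k] / long_[k] for k in long_}


def ip_minus_input(ip_profile, input_profile):
    """Per-window change (IP - Input) for windows present in both profiles."""
    shared = ip_profile.keys() & input_profile.keys()
    return {k: ip_profile[k] - input_profile[k] for k in sorted(shared)}
```

Negative values indicate windows where the IP library is depleted of short fragments relative to Input, which is the pattern described above for healthy individuals.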
Overall, these results suggested that cancer-derived short cfDNA fragments were enriched during the 5mC-IP reaction to a greater extent than noncancer-derived short cfDNA fragments. We therefore hypothesized that the enrichment of short cfDNA fragments in cancer patients might be due to differences in methylation profiles.
Relationship between methylation and fragment size in cfDNA
To examine the origins of the enriched short cfDNA fragments in patients with breast cancer, we first identified 2,211 differentially methylated regions (DMRs) between the cfDNA of patients and that of healthy individuals (1,241 hypermethylated and 970 hypomethylated in patients at padj < 0.05 and |log2FoldChange| > 1, with each region representing a 10-kb genomic window) (Fig. 3, A and B, and table S4). We then evaluated the DMRs-dependent cfDNA fragmentation pattern in IP libraries and found that cfDNA released from hypomethylated regions had a higher short fragments ratio than that from hypermethylated regions in both patients and healthy individuals (Fig. 3C). Analysis of the percentage change of the short fragments ratio in hypomethylated regions relative to hypermethylated regions showed that patients with breast cancer had a greater increase in the short fragments ratio in hypomethylated regions than healthy individuals (Fig. 3D), indicating that the enriched cancer-derived short cfDNA fragments might be released mainly from hypomethylated regions.
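The DMR split and the region-stratified comparison can be sketched as below. The differential-methylation statistics are taken as precomputed inputs rather than recomputed; the sign convention (positive log2 fold change meaning hypermethylated in patients), the pooling of short and long counts across windows, and all names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class WindowStat:
    window_id: str
    log2_fold_change: float  # patients vs healthy; > 0 assumed to mean hypermethylated in patients
    padj: float


def classify_dmrs(stats, padj_max=0.05, min_abs_lfc=1.0):
    """Split 10-kb windows into hyper- and hypomethylated DMRs using the stated thresholds."""
    hyper, hypo = [], []
    for s in stats:
        if s.padj < padj_max and abs(s.log2_fold_change) > min_abs_lfc:
            (hyper if s.log2_fold_change > 0 else hypo).append(s.window_id)
    return hyper, hypo


def pooled_short_ratio(counts, window_ids):
    """counts: {window_id: (n_short, n_long)}; short/long ratio pooled over the given windows."""
    n_short = sum(counts[w][0] for w in window_ids if w in counts)
    n_long = sum(counts[w][1] for w in window_ids if w in counts)
    return n_short / n_long


def hypo_vs_hyper_percent_change(counts, hyper, hypo):
    """Percentage change of the short ratio in hypo- relative to hypermethylated DMRs."""
    r_hyper = pooled_short_ratio(counts, hyper)
    r_hypo = pooled_short_ratio(counts, hypo)
    return (r_hypo - r_hyper) / r_hyper * 100.0
```

Pooling counts weights each window by its fragment count; averaging per-window ratios instead would give every window equal weight, and the text does not specify which was used.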
Consistent with the increased short fragments ratio in hypomethylated regions, the size distribution of cfDNA fragments mapped to hypomethylated regions was shifted toward smaller sizes compared with that of fragments mapped to hypermethylated regions, and this shift was more pronounced in patients with breast cancer (Fig. 4, A-F). Moreover, the mean size of cfDNA fragments decreased more from hypermethylated to hypomethylated regions in patients with breast cancer (by 4.60 bp, from 172.33 to 167.73 bp) than in healthy individuals (by 2.87 bp, from 174.54 to 171.67 bp). Collectively, these findings again indicate that, in contrast to healthy individuals, patients with breast cancer had short cfDNA fragments enriched during the 5mC-IP reaction, which might originate mainly from hypomethylated genomic regions.
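The mean-size comparison reduces to a difference of means. The sketch below applies it to lists of fragment lengths and, as an arithmetic check, reproduces the reported differences from the group means quoted above. The helper name is hypothetical.

```python
from statistics import mean


def mean_size_drop(hyper_lengths, hypo_lengths):
    """Mean fragment size in hypermethylated minus hypomethylated regions (bp)."""
    return mean(hyper_lengths) - mean(hypo_lengths)


# Group means (bp) quoted in the text: (hypermethylated, hypomethylated)
REPORTED = {
    "breast cancer": (172.33, 167.73),
    "healthy": (174.54, 171.67),
}

for group, (hyper_mean, hypo_mean) in REPORTED.items():
    # Prints the per-group drop in mean size, formatted to two decimals.
    print(f"{group}: {hyper_mean - hypo_mean:.2f} bp")
```

The loop prints 4.60 bp for patients with breast cancer and 2.87 bp for healthy individuals, matching the values given in the text.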
To further confirm the origin of short cfDNA fragments, we also investigated the size of cfDNA fragments from patients with lung cancer in another study [25]. As expected, patients with lung cancer had a higher percentage change of the short fragments ratio in hypomethylated regions than in hypermethylated regions (fig. S3, A and B).
DMRs-dependent cfDNA fragmentation profiles for breast cancer
To test whether simultaneously detecting cfDNA methylation and fragment size could be used to diagnose breast cancer, we investigated the feasibility of using the DMRs-dependent cfDNA fragmentation profile to distinguish cancer patients from healthy individuals. To account for potential biases from short fragments already present before 5mC-IP, the short fragments ratio of IP libraries in each differentially methylated 10-kb window was corrected by the short fragments ratio of the corresponding Input libraries. This input-adjusted short fragments ratio was generated for 93 hypermethylated and 691 hypomethylated genomic windows, identified as windows containing at least 20 unduplicated cfDNA fragments in all samples and having an input-adjusted short fragments ratio of no more than 10 in every sample. As expected, the input-adjusted short fragments ratio in hypomethylated genomic windows differentiated cancer patients from healthy individuals, whereas this was rarely the case in hypermethylated genomic windows (Fig. 5A, and fig. S4A). Similar differences were consistently observed as the thresholds for defining DMRs were gradually relaxed (from padj < 0.05 and |log2FoldChange| > 0.9 to padj < 0.05 and |log2FoldChange| > 0.5) (fig. S5). In addition, the genome-wide distribution of hypomethylated windows showed that the difference occurred on almost all chromosomes (Fig. 5B, and fig. S4B). These findings suggest that variation in the DMRs-dependent cfDNA fragmentation profile could differentiate patients with breast cancer from healthy individuals.
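A minimal sketch of the input adjustment and window filter follows, under two assumptions not stated in the text: "corrected by" is taken to mean dividing the IP short fragments ratio by the Input short fragments ratio, and the fragment-count criterion is applied to a hypothetical per-window `n_unique` field holding unduplicated fragment counts. Data structures and names are illustrative.

```python
def input_adjusted_ratio(ip_counts, input_counts):
    """(short/long in IP) divided by (short/long in Input) for one window.

    ip_counts, input_counts: (n_short, n_long). Returns None when undefined.
    """
    ip_short, ip_long = ip_counts
    in_short, in_long = input_counts
    if ip_long == 0 or in_long == 0 or in_short == 0:
        return None
    return (ip_short / ip_long) / (in_short / in_long)


def select_windows(samples, min_fragments=20, max_adjusted=10.0):
    """samples: {sample_id: {window_id: {"ip": (s, l), "input": (s, l), "n_unique": int}}}.

    Keeps windows with >= min_fragments unduplicated fragments in every sample
    and an input-adjusted ratio <= max_adjusted in every sample.
    Returns {window_id: {sample_id: adjusted_ratio}}.
    """
    window_sets = [set(w) for w in samples.values()]
    shared = set.intersection(*window_sets) if window_sets else set()
    kept = {}
    for window in sorted(shared):
        ratios = {}
        for sample_id, windows in samples.items():
            rec = windows[window]
            adj = input_adjusted_ratio(rec["ip"], rec["input"])
            if rec["n_unique"] < min_fragments or adj is None or adj > max_adjusted:
                break  # one failing sample excludes the window
            ratios[sample_id] = adj
        else:
            kept[window] = ratios
    return kept
```

Requiring every sample to pass both filters yields a complete samples-by-windows matrix, which the per-window comparisons between groups rely on.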
Breast cancer diagnostic accuracy in validation cohort
To verify whether the findings from the discovery cohort could be applied to the diagnosis of breast cancer, we performed cfMeDIP-seq on cfDNA extracted from 11 patients with breast cancer (P4-P14) and 8 healthy individuals (H4-H11) in the validation cohort (table S1). None of the patients with breast cancer had received previous treatment, and all diagnoses were confirmed by biopsy. Similarly, an increased density of short cfDNA fragments was observed in the IP libraries of patients with breast cancer (fig. S6, A and B, and fig. S7). Among the 731 DMRs identified, patients with breast cancer also showed a higher percentage change of the short fragments ratio, as well as a greater shift in the size distribution of cfDNA fragments, in hypomethylated regions compared with hypermethylated regions (fig. S8, A-D, fig. S9, and table S5).
Subsequently, we assessed whether the DMRs-dependent cfDNA fragmentation profile could differentiate cancer patients from healthy individuals in the validation cohort. Abnormal input-adjusted short fragments ratios in specific hypomethylated genomic windows were present in most patients with breast cancer, whereas the ratios remained consistent across healthy individuals (fig. S10 and fig. S11).
We then developed an approach called ‘correlation assessment of DMRs-dependent cfDNA fragmentation profile’ to evaluate the abnormality of the short fragments ratio in 72 frequently altered hypomethylated genomic windows, identified as windows containing at least 20 unduplicated cfDNA fragments in all samples and having an input-adjusted short fragments ratio of no more than 10 in every sample. For each participant, we calculated the correlation between their input-adjusted short fragments ratios and the median input-adjusted short fragments ratios of healthy individuals across the 72 hypomethylated windows. Healthy individuals showed higher correlation (average, 0.83), whereas patients with breast cancer showed lower correlation (average, 0.68) (Fig. 6A). Using the correlation value as a classifier with a threshold of 0.72, we detected 7 of 11 patients as having breast cancer (63.6% sensitivity), with no healthy individuals misclassified (100% specificity) (Table 1). Receiver operating characteristic (ROC) analysis for the detection of patients with cancer gave an area under the curve (AUC) of 0.909 (95% confidence interval, 0.771-1.000) (Fig. 6B). Taken together, DMRs-dependent cfDNA fragmentation profiling could distinguish patients with breast cancer from healthy individuals.