Setting up the benchmarking
Methods to deconvolute DNA methylome profiles of heterogeneous samples into their constituent cell or tissue types can be broadly categorized into linear methods and more complex machine learning models. For our benchmarking, we selected 13 commonly used or recently developed methods (Table 1). Since many of these make different underlying assumptions about data structure and distribution, we also tested seven data normalization approaches (Table S1), resulting in a total of 91 algorithm-normalization combinations. Deconvolution is typically performed on a limited subset of marker CpGs. For most analyses, we therefore used a fixed number of markers per cell type (n = 100 for each source), based on pre-defined criteria (see methods), such that each cell or tissue type was equally represented in the reference. To deconvolute mixtures of four tissue types, we thus identified 400 marker CpGs from reference datasets, and 600 marker CpGs for mixtures of six leukocyte types (Table S2). The same set of CpGs was used for every comparison. As a ground truth, 200 in silico mixtures were generated by combining DNA methylation profiles of defined tissues or cell types (Table S3) in specified but randomly generated proportions. We provide a detailed description of all in silico mixtures in Table S4. Next, we tested the performance of each algorithm-normalization combination by computing measures of accuracy between deconvoluted and actual proportions. We assessed deconvolution performance by quantifying the root mean square error (RMSE), reflecting the absolute error between true and predicted values, as well as the R2, which quantifies the correlation between true and predicted values but is less sensitive to systematic biases. Evaluation of both metrics is required, as some deconvolutions show high linearity (R2) but also a large RMSE, suggesting systematic deviations (see below).
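The mixture-generation and evaluation scheme can be sketched as follows. The reference matrix and proportions below are synthetic stand-ins, not the actual datasets used in this study; the point is that a constant bias leaves R2 untouched while inflating RMSE, which is why both metrics are reported.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic reference: beta values for 400 marker CpGs x 4 tissue types.
n_markers, n_types, n_mixtures = 400, 4, 200
reference = rng.uniform(0, 1, size=(n_markers, n_types))

# Ground-truth proportions: random but recorded, one column per mixture.
true_props = rng.dirichlet(np.ones(n_types), size=n_mixtures).T

# An in silico mixture is the proportion-weighted sum of the pure profiles.
mixtures = reference @ true_props  # shape: (markers, mixtures)

def rmse(true, pred):
    """Root mean square error: absolute deviation between true and predicted."""
    return np.sqrt(np.mean((true - pred) ** 2))

def r2(true, pred):
    """Squared Pearson correlation: captures linearity, but is blind to a
    constant systematic offset."""
    return np.corrcoef(true.ravel(), pred.ravel())[0, 1] ** 2

# A prediction with a constant +0.05 bias keeps R2 at 1 but inflates RMSE.
biased = true_props + 0.05
print(round(rmse(true_props, biased), 3), round(r2(true_props, biased), 2))
# prints: 0.05 1.0
```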
R2 and RMSE values for the various analyses are compiled in Tables S5 and S6, respectively. When we applied deconvolution to in silico mixtures that were generated from the same dataset that served as the deconvolution reference, most methods produced near-perfect results, as expected (Figure S1). This differs from a real-world scenario, where reference profiles are generated from different samples, often also profiled in different laboratories. To evaluate real-world usage, we therefore selected reference datasets from the same cell or tissue sources, but which were independently generated (Fig. 1). These datasets differed markedly from those used to generate our in silico mixtures, as was evident in a head-to-head comparison of DNA methylation levels at 100 marker CpGs (Fig. 2a, Figure S2a).
Tissue fraction deconvolution
As a first means of benchmarking all algorithm-normalization combinations, we focused on a relatively straightforward deconvolution problem, assessing deconvolution performance for mixtures of four distinct tissues: small intestine (smallest fraction = 0.35%, largest fraction = 69.12%), blood (smallest fraction = 0.12%, largest fraction = 66.31%), kidney (smallest fraction = 1.2%, largest fraction = 74.04%) and liver (smallest fraction = 0.4%, largest fraction = 56.53%) (Figure S2), profiled using 450K microarrays (HumanMethylation450K BeadChips; Illumina) (Fig. 2a). The tissue types profiled vary in the specificity of the marker CpGs identified from the reference samples. This can be quantified by computing the mean difference in methylation between one cell type and all others at their respective marker CpGs. These values ranged from 57% for blood to 42% for small intestine (Fig. 2b). Specificity of marker CpGs was also evident in the dataset used for in silico mixture generation (Fig. 2a-b). Indeed, although tissue type fractions were in general accurately predicted (RMSE = 0.11, R2 = 0.65), we observed larger deviations for the small intestine, with a significantly higher RMSE (RMSE = 0.17; P < 10⁻¹⁶) (Fig. 2c-d). These deviations are likely in part driven by the lack of concordance between the small intestine datasets used for generating the in silico mixtures and for generating the DNA methylation reference (Figure S2a). This emphasizes the need to use concordant reference datasets to achieve the best deconvolution performance.
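The specificity metric above can be sketched as follows; the beta values and the assumption that markers are stored in contiguous 100-row blocks per tissue are illustrative simplifications, not the actual reference data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic reference betas: 400 marker CpGs x 4 tissues (100 markers each).
tissues = ["blood", "kidney", "liver", "small_intestine"]
betas = rng.uniform(0, 1, size=(400, 4))
marker_of = np.repeat(np.arange(4), 100)  # tissue each marker belongs to

def marker_specificity(betas, marker_of, t):
    """Mean absolute methylation difference between tissue t and the mean of
    all other tissues, computed over tissue t's own marker CpGs."""
    rows = marker_of == t
    others = np.delete(betas[rows], t, axis=1).mean(axis=1)
    return float(np.abs(betas[rows, t] - others).mean())

for t, name in enumerate(tissues):
    print(name, round(marker_specificity(betas, marker_of, t), 2))
```

On real marker CpGs this statistic is high by construction (42-57% in the four-tissue setting); on the random data above it merely demonstrates the computation.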
Disregarding the effect of normalization, MethylResolver and EMeth-Laplace were the poorest- and best-performing deconvolution methods, respectively, though differences in performance were not significant. Most normalization methods also performed comparably; only log transformation performed significantly worse than all other normalization methods (Fig. 2d-e and S2b). The best RMSE value was achieved when combining ridge regression with quantile normalization (RMSE = 0.08; R2 = 0.71), while the worst RMSE value was produced by combining log normalization with EMeth-Normal deconvolution (RMSE = 0.19; R2 = 0.35) (Fig. 2d-e).
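A minimal sketch of a regression-based deconvolution step of the kind benchmarked here, using closed-form ridge regression with non-negativity and sum-to-one constraints imposed post hoc. The reference, noise level, and regularization strength are illustrative assumptions; the benchmarked implementations (and the quantile-normalization step) may differ in detail:

```python
import numpy as np

def ridge_deconvolve(reference, bulk, lam=0.1):
    """Closed-form ridge regression of the bulk profile on the reference,
    followed by clipping negative coefficients and renormalizing the
    estimated fractions to sum to one."""
    A = reference
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ bulk)
    coef = np.clip(coef, 0, None)
    return coef / coef.sum()

rng = np.random.default_rng(2)
reference = rng.uniform(0, 1, size=(400, 4))        # marker CpGs x tissues
true = np.array([0.5, 0.3, 0.15, 0.05])             # known mixing fractions
bulk = reference @ true + rng.normal(0, 0.02, 400)  # noisy in silico mixture
print(np.round(ridge_deconvolve(reference, bulk), 2))
```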
Impact of cell type similarity
Next, we investigated how these algorithms perform on a more difficult and more relevant deconvolution task, namely deconvolution of relatively homogeneous cell type fractions that share a common developmental origin. Specifically, we investigated how reliably blood cell types can be deconvolved, by constructing pseudo-bulks from 450K microarray profiles of fluorescence-activated cell sorting (FACS)-purified neutrophils (smallest fraction = 0.07%, largest fraction = 51.74%), monocytes (smallest fraction = 0.07%, largest fraction = 44.25%), CD4+ (smallest fraction = 0.07%, largest fraction = 47.89%) and CD8+ T cells (smallest fraction = 0.18%, largest fraction = 45.88%), natural killer cells (smallest fraction = 0.28%, largest fraction = 51.59%) and B cells (smallest fraction = 0.3%, largest fraction = 41.60%) (Figure S3), and applying the same strategy described above. We identified 100 marker CpGs for each cell type, resulting in a total of 600 loci (Fig. 3a and S3a). Overall, markers showed a similar specificity across cell types (Fig. 3b). Even though EpiDISH was slightly outperformed by other algorithms in some settings, it was the only algorithm that consistently predicted proportions at an RMSE below 0.07. Notably, deconvolution was more accurate for some cell types than for others: quantification of natural killer and CD8+ T cell abundance performed poorly (P < 0.001), perhaps because both are cytotoxic effector cells sharing functional activities, despite originating from divergent lineages. Furthermore, DNA methylation at natural killer cell loci differed significantly between reference and validation datasets (P < 10⁻¹⁶; Figure S3b-c). To further validate these rankings using an in vivo dataset, we evaluated 85 DNA methylome profiles from blood samples26, comparing the cell type fractions predicted by deconvolution to the cell type fractions measured by flow cytometry.
Overall deconvolution performance was lower for in vivo data, perhaps reflecting the additional measurement uncertainty introduced by flow cytometry (Fig. 3c-d). Notably, EpiDISH performed consistently well on both in silico and in vivo data (in silico: RMSE = 0.06; in vivo: RMSE = 0.04; Fig. 3e-f). None of the normalization methods significantly improved deconvolution over non-normalized data, but log normalization performed significantly worse (P < 10⁻¹⁶), similar to what we described above.
Larger array size improves deconvolution
We next assessed the impact of array size on deconvolution efficiency, by analyzing DNA methylome profiles generated for the same cell types using EPIC microarrays (Infinium MethylationEPIC v1.0 BeadChip; Illumina), which encompass over 850,000 probes. These include most probes represented on 450K microarrays, as well as about 350,000 additional probes, targeting more enhancer CpGs and fewer CpG island CpGs than the 450K microarray27. Of note, all 600 marker CpGs we identified from 450K arrays above were also present on the EPIC arrays. Pseudo-bulks were again produced for neutrophils (smallest fraction = 0.5%, largest fraction = 37.11%), monocytes (smallest fraction = 0.5%, largest fraction = 54.70%), CD4+ (smallest fraction = 0.07%, largest fraction = 42.37%) and CD8+ T cells (smallest fraction = 0.04%, largest fraction = 49.18%), natural killer cells (smallest fraction = 0.09%, largest fraction = 43.98%) and B cells (smallest fraction = 0.24%, largest fraction = 48.73%) (Figure S5). When applying the same marker CpGs identified from 450K data to EPIC array data, the performance of all 91 algorithm-normalization combinations improved significantly (RMSE of 0.03 versus 0.06; P < 10⁻⁶), suggesting a higher concordance between the EPIC array data used for generating in silico mixtures and for deconvolution (Figure S4).
We next identified 600 new marker CpGs from the EPIC array data, 484 of which do not overlap with those represented on the 450K array (Fig. 4a and S5a). Of these 484 additional marker CpGs, 246 overlap with known enhancer regions. As before, natural killer cells and CD8+ T cells were not separated as accurately as other cell types (Fig. 4b), likely due to the significant discordance in methylation ratios between reference and validation data at natural killer cell marker loci (P < 10⁻¹²; Figure S5b-c). The relative performance rankings of algorithms and normalization methods were nevertheless comparable to the analyses using 450K array probes described above (Fig. 4c). However, when comparing deconvolution CpGs selected from the EPIC probes to those selected from 450K array data, fraction estimates improved slightly for all cell types, except for CD8+ T cells (Fig. 4d).
To further validate these in silico analyses, we next assessed performance on DNA extracted from different cell types and mixed in vitro at prespecified ratios14. Here, deconvolution performance was comparable to that on our in silico generated mixtures, validating the relative rankings of deconvolution and normalization methods, as well as our strategy for generating pseudo-bulks (Fig. 4d-f).
Impact of the number of marker CpGs on deconvolution
All analyses described above rely on 100 marker CpGs per cell or tissue type. Depending on the method used to analyze DNA methylation, a lower number of marker CpGs may be preferable (e.g., when cost is to be minimized). To assess the impact of the number of marker CpGs on deconvolution, we repeated our performance assessment while varying the number of marker CpGs included. Specifically, we selected the 2 to 500 top-ranked marker CpGs for each cell type and assessed performance for each algorithm on unnormalized data (normalization did not improve performance; Fig. 5a-b). For many algorithms, performance increased consistently when 5 or more marker CpGs were included. Other algorithms, such as EMeth-Binomial and EMeth-Normal, reached an optimum around 50 CpGs and performed worse as more CpGs were included. The EMeth algorithms appeared to perform relatively well for low numbers of marker CpGs (n = 2 to 10). Finally, the EMeth-Laplace method was top-performing, both when using only a few marker CpGs (n = 2 to 10) and when many marker CpGs were included (n = 300–500).
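The marker-count sweep can be sketched as follows, using a synthetic pre-ranked reference and a plain least-squares solver as a stand-in for the benchmarked algorithms; the block layout of markers and the noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n_types, per_type = 6, 500
reference = rng.uniform(0, 1, size=(per_type * n_types, n_types))
# Assume rows [t*500, (t+1)*500) hold cell type t's markers, best-ranked
# first (a stand-in for the ranking criteria described in the methods).
true = rng.dirichlet(np.ones(n_types))
bulk = reference @ true + rng.normal(0, 0.02, reference.shape[0])

def deconvolve_top_n(n):
    """Deconvolve using only the top-n markers of each cell type."""
    rows = np.concatenate([np.arange(t * per_type, t * per_type + n)
                           for t in range(n_types)])
    est, *_ = np.linalg.lstsq(reference[rows], bulk[rows], rcond=None)
    est = np.clip(est, 0, None)
    return est / est.sum()

for n in (2, 10, 100, 500):
    err = np.sqrt(np.mean((deconvolve_top_n(n) - true) ** 2))
    print(n, round(float(err), 3))
```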
Deconvoluting small fractions
For generating in silico mixtures, we generated randomly specified fractions between 0 and 75%. However, in many instances, rare cell types are present or of particular interest. As these cell types are less abundant in the mixture, their profile contributes less to the bulk signal and deconvolution becomes more difficult. To test how predictions differ for these smaller fractions, we reassessed performance exclusively for cell type contributions below 3%. Accuracy at this threshold was noticeably lower than for larger fractions (Fig. 6a-b). Interestingly, adding reference CpGs improved the average R2 from 0.88 to 0.97 (P < 10⁻¹⁶) for all fractions, while for small fractions an average increase from 0.07 to 0.40 (P < 10⁻¹⁶) was observed, indicating that addition of reference CpGs is particularly beneficial for predicting small fractions, but also that small fractions are difficult to predict accurately, irrespective of the algorithm used (Fig. 6c and 5a). In conclusion, deconvolution of small fractions performs inadequately for all methods tested, but this can be mitigated to some extent by enlarging the reference marker panel.
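The small-fraction evaluation amounts to restricting the error metric to entries whose true fraction falls below the 3% threshold; a minimal sketch on made-up numbers:

```python
import numpy as np

def small_fraction_rmse(true, pred, threshold=0.03):
    """RMSE computed only over entries whose true fraction is below the
    threshold, isolating performance on rare cell types."""
    mask = true < threshold
    return float(np.sqrt(np.mean((true[mask] - pred[mask]) ** 2)))

true = np.array([0.01, 0.02, 0.50, 0.47])
pred = np.array([0.03, 0.00, 0.49, 0.48])
# Only the two fractions below 3% enter the calculation.
print(round(small_fraction_rmse(true, pred), 3))  # -> 0.02
```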
Impact of incomplete or over-extensive reference
In many cases, the reference used for deconvolution can be inaccurate, including more or fewer cell types than are present in a bulk sample. This can introduce noise into the deconvolution experiment. We therefore compared deconvolution performance between algorithms when one cell type was missing from the reference, or when a cell type was included in the reference but absent from the in silico mixture. Removing cell types from the reference generally tended to improve deconvolution accuracy, except when the cell type left out was similar to another one included in the reference (e.g., CD4+ and CD8+ cell types; Figure S6a). Furthermore, regularization-based deconvolution methods performed very well in this experiment, significantly outperforming MethAtlas (P < 0.01), which is otherwise a generally well-performing algorithm (Figure S6b). Including more cell types in the reference had a similar but less pronounced effect, with slightly worse deconvolution accuracy for cell types that are highly similar (Figure S6c). In this setting, EMeth-Laplace, MethAtlas and ordinary least squares produced the best deconvolution results, significantly outperforming EMeth-Binomial (P < 0.01) and ridge deconvolution (P < 10⁻¹⁶; Figure S6d). In general, an over-inclusive reference performs better than an incomplete one. Cell types absent from the in silico mixture were erroneously assigned fractions of up to 5.6%, with the largest such fraction found for natural killer cells by EMeth-Binomial, and the lowest for CD8+ T cells by MethylResolver (Figure S6e). Finally, we noted that MethylResolver produces multiple predictions below 0, which is biologically impossible.
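The incomplete-reference experiment can be sketched as follows, with a synthetic reference and a simple least-squares solver standing in for the benchmarked algorithms; the mixture is noiseless so the complete reference recovers the truth exactly, while dropping one column forces its signal onto the remaining cell types:

```python
import numpy as np

rng = np.random.default_rng(5)
reference = rng.uniform(0, 1, size=(600, 6))   # marker CpGs x 6 cell types
true = np.array([0.25, 0.20, 0.20, 0.15, 0.10, 0.10])
bulk = reference @ true                        # noiseless in silico mixture

def deconvolve(A, b):
    """Least squares with clipping and sum-to-one renormalization."""
    est, *_ = np.linalg.lstsq(A, b, rcond=None)
    est = np.clip(est, 0, None)
    return est / est.sum()

# Complete reference: all six cell types recovered (near-)exactly.
full = deconvolve(reference, bulk)
# Incomplete reference: the last cell type is dropped, so its 10% of the
# signal must be absorbed by the remaining columns, biasing their estimates.
reduced = deconvolve(reference[:, :5], bulk)
print(np.round(full, 2), np.round(reduced, 2))
```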
Deconvolution of DNA methylation sequencing data
DNA methylation is increasingly being profiled by sequencing-based methods such as whole-genome, reduced representation, targeted or amplicon bisulfite sequencing (BS-seq), or third-generation nanopore-based sequencers. Here, DNA methylation levels are quantified by calculating the fraction of reads with a methylated CpG over all reads at a given locus, yielding data comparable to array-based measurements. These profiles nevertheless differ from array-based profiles in that they are count-based, quantifying the exact number of sequencing reads showing CpG methylation, rather than providing a percentage-based estimate of the fraction of methylated CpGs. Selection of marker CpGs also differs, with a much larger search space: all 28 million CpGs in the human genome can putatively serve as marker CpGs in whole-genome bisulfite sequencing (WGBS) data, versus only ~ 450,000 or ~ 850,000 CpGs available on microarrays. In addition, instead of parsing individual CpGs, average DNA methylation over entire genomic regions can be leveraged for deconvolution, as DNA methylation at flanking CpGs is often highly correlated28.
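The count-based, region-level quantification described above can be sketched as follows; the positions and read counts are toy values for a single hypothetical marker region:

```python
import numpy as np

def region_methylation(positions, meth_reads, total_reads, start, end):
    """Pool read counts over all CpGs in [start, end) and return the
    fraction of methylated reads: a count-based, region-level estimate."""
    mask = (positions >= start) & (positions < end)
    return meth_reads[mask].sum() / total_reads[mask].sum()

# Toy CpG-level counts within a single 100-bp marker region.
pos   = np.array([10, 35, 60, 80])   # CpG positions within the region
meth  = np.array([ 8, 10,  9,  3])   # methylated reads per CpG
total = np.array([10, 12, 10,  8])   # total reads per CpG
print(region_methylation(pos, meth, total, 0, 100))  # 30/40 -> 0.75
```

Pooling counts over a region in this way exploits the correlation between flanking CpGs and damps per-CpG sampling noise, which is the rationale for region-based markers.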
Here, we tested the performance of the same deconvolution methods described above (Table 1) on sequencing-based DNA methylation data. We performed deconvolution using non-overlapping 100-bp genomic regions as markers. Including multiple flanking CpGs reduced measurement error and improved overall deconvolution performance (RMSE = 0.041 vs 0.052, P < 10⁻¹⁶; Fig. 7 and S7). When selecting differentially methylated regions (DMRs), larger DNA methylation differences were observed between cell types than for the marker CpGs identified in array data (64% vs 56%; Figure S8a-c). Deconvolution was performed on 100 in silico mixtures comprising six immune cell types, with proportions ranging between 0.86% and 36.06% for neutrophils, 1.8% and 54.7% for monocytes, 0.1% and 36.3% for CD4+ T cells, 0.04% and 35.9% for CD8+ T cells, 0.1% and 39.4% for natural killer cells, and 0.3% and 46.2% for B cells (Figure S8d). CD8+ T-cell and B-cell fractions were predicted less accurately than other cell types (P < 0.001; Fig. 7b). Furthermore, we noted a lower overall deconvolution performance for WGBS at 50× sequencing depth than for EPIC array data (RMSE = 0.04 vs 0.02, P < 10⁻¹⁶; Fig. 7c). The underlying reason is unclear. It may be due to experimental differences, as WGBS protocols are often far less standardized between research groups, but an alternative explanation may be the difference in approach taken to generate in silico mixtures: reads were mixed for WGBS, whereas DNA methylation levels were calculated by proportional weight-summing for array-derived data. Similar to the array-based observations, the most consistently accurate predictions were produced by the EpiDISH algorithm (RMSE = 0.05; Fig. 7b). Furthermore, normalization did not positively impact overall deconvolution performance (Fig. 7c-d), with unnormalized data producing top-performing results (RMSE = 0.04).
Finally, deconvolution experiments are often based on targeted BS-seq29,30, where both the number of regions included and the sequencing depth can have a significant impact on the analysis cost. We set out to test the impact of these variables, first by varying the number of marker regions for deconvoluting six immune cell types on the same 100 in silico mixtures. This revealed that a local optimum was reached when 100 to 200 regions were included for deconvolution, irrespective of the deconvolution method used (Fig. 8a). We next assessed the impact of sequencing depth. We simulated depths between 60× and 0.5× by downsampling, assessing the performance of the EpiDISH deconvolution algorithm on 20 unnormalized in silico mixtures while varying the number of marker regions between 2 and 500 (Fig. 8b). Interestingly, all simulated sequencing depths exceeding 18-fold yielded a similar performance, suggesting that a depth of ~ 24× suffices for deconvolution and that higher sequencing depths are unlikely to boost accuracy for a similar deconvolution task.
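Depth downsampling of this kind is commonly implemented by binomially thinning read counts; a minimal sketch under the assumption that each read is retained independently (the actual read-level downsampling used in the study may differ):

```python
import numpy as np

def downsample_counts(meth, total, keep, rng):
    """Simulate a lower sequencing depth by retaining each read
    independently with probability `keep` (e.g. going from 60x to 24x
    corresponds to keep = 0.4)."""
    new_meth = rng.binomial(meth, keep)
    new_unmeth = rng.binomial(total - meth, keep)
    return new_meth, new_meth + new_unmeth

rng = np.random.default_rng(6)
total = np.full(1000, 50)            # 50x coverage at 1000 CpGs
meth = rng.binomial(total, 0.7)      # true methylation level of 70%
m, t = downsample_counts(meth, total, 0.4, rng)
# Thinning preserves the pooled methylation estimate at the reduced depth.
print(round(float(t.mean()), 1), round(float(m.sum() / t.sum()), 2))
```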
Comparing robustness to technical variables
To confidently apply deconvolution, a method should be robust to technical variables such as varying wet-lab protocols, data sources and data formats. As the datasets included here differ in each of these three variables, we compared deconvolution accuracy across all datasets for every deconvolution method. Both EMeth-Normal and EMeth-Binomial performed significantly worse than the other algorithms (P < 10⁻⁴; Fig. 9a). Though not significantly better, EpiDISH and EMeth-Laplace were top-performing (column Z-score-normalized RMSE = −0.91 and −0.78, respectively). For deconvolution of small fractions, EMeth-Laplace slightly outperformed EpiDISH (Fig. 9b). Relative to the other algorithms, MethylResolver performed remarkably better on small fractions than on all fractions, indicating that MethylResolver might be preferable for more fine-grained deconvolution. Finally, variance across datasets differed greatly between deconvolution methods. For example, EMeth-Laplace performed very well in terms of mean normalized RMSE, but its variance between datasets was large. Minfi, bounded-variable least squares and non-negative least squares showed lower variance and might therefore be more reliable.
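The column Z-score normalization used for this cross-dataset comparison can be sketched as follows; the RMSE matrix below is hypothetical, not the study's actual results:

```python
import numpy as np

def column_zscore(rmse):
    """Z-score each dataset (column) so methods are compared by their
    relative performance within a dataset, not by dataset difficulty."""
    return (rmse - rmse.mean(axis=0)) / rmse.std(axis=0)

# Hypothetical RMSE matrix: rows = methods, columns = datasets.
rmse = np.array([[0.04, 0.10, 0.03],
                 [0.06, 0.12, 0.05],
                 [0.09, 0.20, 0.08]])
z = column_zscore(rmse)
# The mean z-score across datasets summarizes robustness; lower is better.
print(np.round(z.mean(axis=1), 2))
```

Because each column is centered and scaled separately, a method that is consistently second-best on easy and hard datasets alike scores better than one that excels on some datasets but fails on others.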