All studies described here used prespecified standard operating procedures, statistical analysis plans, and acceptance criteria, together with qualified critical reagents, instruments, and software, and traceable reagent lots. Where relevant, study designs followed established Clinical and Laboratory Standards Institute (CLSI) guidelines [23-26].
Sample selection
Clinical samples were obtained from clinical collaborators and commercial vendors, for a total of 115 patients diagnosed with MM, ALL, or CLL [samples were derived from bone marrow aspirate (BMA) and peripheral blood]. All clinical disease samples had been previously characterized by mpFC and/or immunohistochemistry to independently quantify disease burden. In addition, cell lines for each lymphoid malignancy were purchased; these comprised MM lines IM-9 (ATCC; Manassas, VA), L-363 (Leibniz Institute DSMZ; Germany), NCI-H929 (Sigma; St. Louis, MO), and U-266 (ATCC); ALL lines GM14952 (Coriell; Camden, NJ), GM20390 (Coriell), and SUP-B15 (ATCC); and CLL lines MEC-1 (DSMZ), HG-3 (DSMZ), and PGA-1 (DSMZ). Genomic DNA (gDNA) was extracted using an automated QIAsymphony SP® instrument (QIAGEN; Hilden, Germany) and the gDNA concentration was measured by the Quant-iT™ PicoGreen® assay (Thermo Fisher Scientific; Waltham, MA). A subset of 66 clinical samples (21 ALL, 22 CLL, and 23 MM samples) was chosen for use in these analytical validation studies; samples were preferentially selected to have high disease burdens and high mass of gDNA since the contrived samples generated for these studies required higher volumes and tumor burdens than samples submitted for routine clinical assessment. Samples were also selected to provide representative proportions of non-unique clonotype sequences (relative to previously assayed clinical samples) while ensuring that no two samples carried an identical clonal sequence. Contrived samples were prepared by mixing gDNA from these 66 clinical samples and 9 cancer cell lines with gDNA from the bone marrow of 7 healthy subjects (Additional file 1: Table S1).
MRD detection and tracking by the clonoSEQ Assay
Cancer clonotype sequences are identified in diagnostic ‘ID’ samples and then measured in follow-up MRD samples using the clonoSEQ Assay. Genomic DNA is amplified using locus-specific multiplex PCR with a master mix of primers targeting V, D, and J genes of the IgH, IgK, IgL, BCL1/IgH and BCL2/IgH loci; a second PCR is used to add reaction-specific barcodes for sample identification. The assay also amplifies genomic regions present as diploid copies in normal gDNA to quantify the total nucleated cell content of a sample. Barcoded amplicons are then pooled into sequencing libraries, checked for adequate DNA amplification by quantitative PCR (qPCR), and sequenced using the Illumina NextSeq™ 500 System (Illumina; San Diego, CA). The target mass of input DNA for ID samples is 500 ng and for MRD samples, 20 µg. Positive and negative amplification and sequencing controls are included in each reaction batch to ensure that all steps meet predefined quality thresholds.
Sequencing data are processed using a custom bioinformatics pipeline, with data quality checked at the flowcell, PCR well, and sample levels. Reads are assigned to rearranged B-cell receptors (BCRs) for each sample and clustered into clonal receptor sequences; these sequences are assessed for their likelihood to be disease associated and their suitability for subsequent tracking. A sequence is considered acceptable for tracking if it comprises at least 3% of all BCR sequences at a given locus and at least 0.2% of all nucleated cells in the sample, is well separated from the background repertoire (no more than 5 other less-abundant sequences from the same locus with repertoire frequencies within a factor of 10 of the frequency of the sequence selected for tracking), is represented by at least 40 gDNA templates, and is sufficiently unique for tracking. Sequence uniqueness is assessed by comparison with a large database of previously observed Ig rearrangements; depending on its incidence in the database, each sequence is assigned a uniqueness score that reflects its likelihood of being detected in a healthy repertoire. Sequences with poor uniqueness scores are excluded from MRD tracking; this prevents false MRD signals from being generated by healthy clones with Ig rearrangements that coincidentally match sequences from a malignant clone.
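The tracking criteria listed above can be read as a simple conjunction of thresholds. The following is a minimal sketch under that reading, not the pipeline's implementation; the `Candidate` fields, the function names, and the treatment of the uniqueness check as a precomputed boolean are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one clonal sequence in an ID sample."""
    locus_fraction: float    # share of all BCR sequences at its locus (0-1)
    cell_fraction: float     # share of all nucleated cells in the sample (0-1)
    templates: float         # estimated number of gDNA templates
    close_background: int    # less-abundant same-locus sequences within 10x
    passes_uniqueness: bool  # outcome of the uniqueness score (scoring not shown)


def count_close_background(freq, other_freqs):
    """Count less-abundant sequences whose frequency is within a factor of 10."""
    return sum(1 for f in other_freqs if f < freq and f * 10 >= freq)


def is_trackable(c: Candidate) -> bool:
    """Apply the thresholds stated in the text; all must hold."""
    return (
        c.locus_fraction >= 0.03      # at least 3% of BCR sequences at the locus
        and c.cell_fraction >= 0.002  # at least 0.2% of nucleated cells
        and c.close_background <= 5   # well separated from the background
        and c.templates >= 40         # at least 40 gDNA templates
        and c.passes_uniqueness       # sufficiently unique for tracking
    )
```

For example, a sequence with 39 templates would be rejected under this sketch even if it met every other threshold.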
Once suitable disease-associated sequences have been identified, these ID sequences are compared with those found in successive MRD sample(s) for tracking. Imperfect matching between ID and MRD sample sequences is permitted to account for potential somatic mutations in a disease-associated sequence; sequences with higher complexity (hence lower probability of independently forming in a non-malignant clonal population) are permitted to include a higher proportion of mismatched nucleotides. Finally, the abundance of each of the tracked sequences in an MRD sample is measured, and used to compute a consensus sample-level malignant cell count and a total nucleated cell count. The ratio of these values provides an estimate of the MRD frequency in a sample.
Sensitivity and specificity
The goal of this analysis was to determine the sensitivity and specificity of the clonoSEQ Assay by assessing the limit of detection (LoD), the limit of quantitation (LoQ) and the limit of blank (LoB). These parameters were required in order to make sample-level MRD estimates for the subsequent evaluation studies.
The LoD was defined as the malignant-cell count at which the assay would detect MRD in 95% of samples. The LoQ was defined as the lowest clonoSEQ sample MRD frequency that could be quantitatively determined within 70% relative total error, where relative total error is the root-mean-square error (RMSE) divided by the number of input malignant cells, and RMSE is the square root of the sum of the squared bias and the variance. A total error of up to 70% near the LoD is acceptable for the intended clinical use of the assay: at this level of total error, if two malignant cells were truly present in a sample (which is near the expected LoD), 95% of MRD measurements would report between 1 and 5 malignant cells, which would not significantly change the interpretation of the MRD result.
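As an illustration of the relative total error defined above, the sketch below computes it from replicate measurements at one expected input level. The function name and the use of the population variance (dividing by n) are assumptions; the source does not state which variance estimator was used.

```python
import math


def relative_total_error(measured_cells, expected_cells):
    """RMSE divided by the expected number of input malignant cells."""
    n = len(measured_cells)
    mean = sum(measured_cells) / n
    bias = mean - expected_cells
    # Population variance (assumption); with it, sqrt(bias^2 + variance)
    # equals the root-mean-square deviation from the expected value.
    variance = sum((m - mean) ** 2 for m in measured_cells) / n
    rmse = math.sqrt(bias ** 2 + variance)
    return rmse / expected_cells
```

Under this definition, an input level would meet the LoQ criterion when the returned value is at most 0.70.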
The LoD and LoQ of the clonoSEQ Assay were estimated and confirmed in 2 sequential experiments. gDNA from 66 clinical disease samples and 3 cell lines (1 for each lymphoid malignancy: GM14952, IM-9, MEC-1) was pooled at specific ratios according to the sample disease loads, such that each sample contained the same expected number of malignant cell equivalents. gDNA from 7 healthy donors was also pooled. The healthy gDNA pool was then used as a diluent for the disease gDNA pool to generate contrived samples with specific DNA masses and malignant cell frequencies.
The first experiment estimated the LoD and LoQ using DNA input amounts of 500 ng and 20 μg, each using 5 MRD frequencies ranging from 1–23 malignant cells per disease sample. This experiment generated LoD and LoQ estimates based on the combined data from all 3 disease indications and both DNA input amounts. The second experiment was designed to confirm the estimated LoD and LoQ using 8 input DNA concentrations across the entire targeted range from 500 ng to 20 μg. DNA input levels above and below the range (40 µg and 200 ng, respectively) were also included. For each input DNA concentration, the MRD frequencies estimated in the first experiment (in units of ‘malignant cell equivalents,’ which are independent of DNA input amount) were tested. In both the first and second experiments, each of the contrived samples was tested in duplicate with the clonoSEQ Assay using 1 operator set, 1 instrument set, and 4 reagent lots.
The LoB was determined by assessing the presence and abundance of a patient’s trackable malignant Ig sequences, as defined by the corresponding MRD frequencies, in healthy bone marrow. The MRD frequency that would be observed by chance in up to 5% of healthy repertoires, assuming a given amount of available gDNA, was then identified. This metric reflects the probability that a non-malignant clone would independently rearrange the same Ig receptor sequence as a malignant clone and not be excluded by the tracking algorithm, which could lead to an inflated MRD abundance estimate or false detection of MRD. While the LoB was defined in this study to control for a type I error rate of 5%, it was expected that the true false detection rate of the assay would be much less than 5% since the majority of sequences selected for MRD tracking are highly specific to the malignant clone from a given patient. During sample preparation, the calibrated clonotype sequences had all been identified as independent, and therefore none were excluded from this analysis.
Trackable malignant Ig sequences identified in the 66 patient samples were searched for in bone marrow-derived gDNA from 7 healthy donors at 3 DNA input amounts (500 ng, 20 µg, and 40 µg), which correspond to the minimum, target, and maximum of the clonoSEQ Assay input range for MRD samples, respectively. Each of these 21 samples was tested with the clonoSEQ Assay using 1 operator set and 1 instrument set. At least 2 reagent lots were used for all test samples (4 reagent lots were used for the 500 ng and 20 µg samples, and 2 were used for the 40 µg sample). For each DNA input, 28 samples (7 × 4) were used to assess LoB.
Statistical analysis
To determine the LoD, the proportion of MRD positive results obtained from the clonoSEQ Assay was modeled as a function of expected clonal frequency (based on disease load estimates of the undiluted samples and subsequent dilution factors) using a probit model. The LoD was calculated as the expected number of malignant input cells at which the fitted probit curve reached a detection probability of 95%.
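Once a probit curve has been fitted, reading off the LoD amounts to inverting the curve at a detection probability of 95%. The sketch below shows only that inversion step. It assumes a probit model that is linear in log10 of the expected malignant cell count, with hypothetical `intercept` and `slope` coefficients; the source does not state the predictor scale, and the model fitting itself is not shown.

```python
import math
from statistics import NormalDist


def detection_probability(cells, intercept, slope):
    """Probit curve: P(detect) = Phi(intercept + slope * log10(cells))."""
    return NormalDist().cdf(intercept + slope * math.log10(cells))


def lod_from_probit(intercept, slope, target=0.95):
    """Expected input cell count at which the curve reaches `target`."""
    z = NormalDist().inv_cdf(target)  # about 1.645 for 95%
    return 10 ** ((z - intercept) / slope)
```

Evaluating `detection_probability` at the returned cell count should give back the target probability, which is a convenient sanity check on any fitted coefficients.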
The LoQ was estimated using Sadler’s precision profile model to relate expected clonal frequencies to relative total error estimates [27]. The LoQ was calculated as the expected number of malignant input cells at which the fitted precision profile curve reached a relative total error of 70%.
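The final step, locating the input level at which a fitted precision profile reaches 70% relative total error, can be sketched as a one-dimensional root search. The example below uses bisection on an arbitrary decreasing curve. The function name, the search bounds, and the placeholder curve in the usage note are assumptions for illustration and do not represent the fitted profile from this study.

```python
def loq_from_profile(profile, target=70.0, lo=0.1, hi=1000.0, tol=1e-9):
    """Find the input cell count where a decreasing profile equals `target`.

    `profile` maps expected malignant input cells to relative total error (%).
    It is assumed to decrease monotonically on [lo, hi] and to cross `target`.
    """
    if not (profile(lo) >= target >= profile(hi)):
        raise ValueError("profile does not cross the target on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if profile(mid) > target:
            lo = mid   # error still too high: the LoQ lies at a larger input
        else:
            hi = mid
    return (lo + hi) / 2
```

With a placeholder curve such as `lambda x: 100 / x ** 0.5 + 10`, the search returns the input level at which the curve equals 70, here (100/60)², or roughly 2.8 cells.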
The LoB was estimated in the 20 µg samples (which are most likely to contain sequences from non-malignant clones which match a tracked sequence) and confirmed in the 500 ng samples. Non-parametric statistics were used to find the 95th percentile of MRD measurements among all tracked sequences in all blank samples at each DNA input level. These analyses were independently repeated in the 40 μg samples to confirm LoB.
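A non-parametric 95th percentile can be obtained directly from the ranked blank measurements. The sketch below uses the rank position 0.5 + p·n with linear interpolation between neighboring ranks, which is one common convention; the source does not specify which rank convention was applied, so this choice and the function name are assumptions.

```python
import math


def nonparametric_percentile(values, p=0.95):
    """Rank-based percentile with linear interpolation between ranks."""
    ordered = sorted(values)
    n = len(ordered)
    rank = 0.5 + p * n                    # 1-based rank position (assumed convention)
    rank = min(max(rank, 1.0), float(n))  # clamp to the observed range
    lower = int(math.floor(rank))
    upper = min(lower + 1, n)
    frac = rank - lower
    return ordered[lower - 1] + frac * (ordered[upper - 1] - ordered[lower - 1])
```

If every tracked sequence is absent from every blank sample, all inputs are zero and the function returns zero.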
Precision
Study design
The primary goal of this study was to analytically validate the precision of the clonoSEQ Assay using clinical samples from 3 indications (MM, CLL, and ALL). Contrived disease samples were generated by diluting gDNA combined from 66 patient clinical samples with gDNA pooled from BMA from 7 healthy donors, to achieve 6 malignant cell frequencies in total DNA input amounts of 500 ng, 2 µg, and 20 µg (Fig. ).
The precision, repeatability and reproducibility study used a main effects screening design over 21 calendar days and 10 assay runs to measure the effects of day, run within day, operator set (3 sets), instrument set (2 sets of thermal cycler/liquid handler matrixed with 2 sequencers), and reagent lot (4 lots) for each disease indication and sample MRD frequency under study (Additional file 2: Figure S1). The disease-associated sequences from each clinical sample which were identified during ID testing were searched for in all contrived samples, generating a sample MRD frequency measurement for each of the 66 clinical samples in each contrived sample. These sample MRD measurements were then used to determine the precision of the clonoSEQ Assay.
Statistical analysis
For each DNA input level and sample MRD frequency measurement, mixed models and analysis of variance (ANOVA) were used to model MRD measurements as a function of different operator sets, instrument sets, reagent lots, days, and runs within day, while treating each variable as a random effect. This information was used to decompose the total variability in MRD measurements for each input DNA level into components of variance attributable to each variable and to random error. All data points with expected MRD levels below the LoD of a sample were excluded from analysis.
Estimates of repeatability were obtained from the component of variance associated with random error, which included the variability associated with duplicate measurements under the same experimental conditions. Estimates of reproducibility were obtained from the sum of the components of variance due to operator set, instrument set, reagent lot, day, run within day, and random error; estimates of lot-to-lot variability were obtained from the component of variance associated with reagent lot. The percentage coefficient of variation (%CV) due to repeatability, reproducibility, and lot-to-lot variability in replicated MRD measurements was then calculated for each input DNA level and targeted sample MRD frequency.
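Given variance components from a fitted random-effects model, the three precision summaries described above follow by simple arithmetic. The sketch below assumes the components are already estimated and expressed on the same scale as the mean MRD measurement; the dictionary keys and function names are hypothetical, and the model fitting itself is not shown.

```python
import math


def percent_cv(variance, mean):
    """Percentage coefficient of variation for one variance term."""
    return 100.0 * math.sqrt(variance) / mean


def precision_summary(components, mean_mrd):
    """Summarize variance components as %CV values.

    `components` maps each source of variation (operator, instrument, lot,
    day, run, error) to its estimated variance component.
    """
    return {
        # Repeatability: random error only (same experimental conditions).
        "repeatability": percent_cv(components["error"], mean_mrd),
        # Reproducibility: sum of all components, including random error.
        "reproducibility": percent_cv(sum(components.values()), mean_mrd),
        # Lot-to-lot: reagent lot component alone.
        "lot_to_lot": percent_cv(components["lot"], mean_mrd),
    }
```

Because reproducibility includes the random-error component, its %CV can never be smaller than the repeatability %CV at the same input level.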
Linearity
Study design
The primary goal of this analysis was to analytically validate the linear range of the clonoSEQ Assay. Contrived disease samples across a range of malignant cell frequencies were created by spiking gDNA from the 9 cell lines (3 for each of MM, CLL, and ALL, as detailed above; only MM and ALL for the 40 µg DNA input) into background gDNA pooled from the whole blood of 3 healthy donors. Four DNA input amounts (200 ng, 2 µg, 20 µg, and 40 µg) were tested, which cover the acceptable range of inputs for MRD testing (500 ng–40 µg). While the minimum input for MRD testing via the clonoSEQ Assay is 500 ng (to ensure sensitivity at an MRD frequency of 1 × 10⁻⁴), we included a 200 ng input level to assess whether linearity extends beyond the range of the currently acceptable MRD testing input, as well as a 40 µg input level to measure linearity beyond the targeted MRD input of 20 µg. Genomic DNA from cancer cells was spiked into the background gDNA at frequencies ranging from just below the expected LoQ of 2.5 cancer cells to hundreds of thousands of cancer cells comprising up to 100% of nucleated cells in a sample (Table 1). The frequencies estimated by the assay were then checked for linearity across clinically relevant ranges for MRD testing.
Assay linearity was confirmed using data from the precision study, in which clinical sample gDNA was diluted with gDNA from pooled healthy individuals. Three representative clinical samples from each disease indication (totaling 9 samples) from the precision study were selected. Linearity assessment was conducted across 6 MRD frequencies at each DNA input: 500 ng, 2 µg, and 20 µg. The range of MRD frequencies tested for each DNA input amount is shown in Fig. 2.
Statistical analysis
Linearity was assessed by comparing the proportionality of individual MRD measurements to expected clone frequencies using the polynomial method [28]. First, the data in the verification range were fitted to regression models with first-order (linear), second-order (quadratic), and third-order (cubic) polynomials. If none of the non-linear terms in the second- and third-order polynomials were significant at P<0.05, linearity was established across the verification range. Otherwise, the higher-order polynomial model with the best fit was compared to the linear model at each clonal frequency. If the fitted polynomial was within ± 5% of the linear fit at every frequency, the results were considered acceptably linear; otherwise, the range of clonal frequencies was reduced and this procedure repeated until linearity was achieved.
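The ± 5% comparison between the best-fitting higher-order polynomial and the linear fit can be sketched as follows. The coefficient lists are assumed to come from regression fits that are not shown here, and the function names, the coefficient ordering, and the scale on which the comparison is made are assumptions for illustration.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial; coeffs are in increasing order of power."""
    result = 0.0
    for c in reversed(coeffs):  # Horner's scheme
        result = result * x + c
    return result


def max_relative_deviation(linear_coeffs, poly_coeffs, levels):
    """Largest relative gap between the polynomial and linear fits."""
    return max(
        abs(poly_eval(poly_coeffs, x) - poly_eval(linear_coeffs, x))
        / abs(poly_eval(linear_coeffs, x))
        for x in levels
    )


def acceptably_linear(linear_coeffs, poly_coeffs, levels, tolerance=0.05):
    """True if the polynomial stays within the tolerance at every level."""
    return max_relative_deviation(linear_coeffs, poly_coeffs, levels) <= tolerance
```

When the check fails, the procedure in the text narrows the range of clonal frequencies and repeats the fits, which in this sketch corresponds to calling `acceptably_linear` again with refitted coefficients and fewer `levels`.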
Quantitation accuracy
Study design
The primary aim of these studies was to assess the analytical quantitation accuracy (or bias) of the clonoSEQ Assay relative to mpFC. Two types of experiment were conducted for this purpose: first, 2 ALL cell lines (SUP-B15, GM20390) and 2 MM cell lines (NCI-H929, U-266) selected by the mpFC lab were diluted into healthy background mononuclear cells at 5 dilution levels from 5 × 10⁻⁷ to 1 × 10⁻², with 2 replicates per sample. Second, the data generated in the precision study were re-analyzed for quantitation bias between clonoSEQ MRD measurements in diluted gDNA samples and expected MRD levels based on mpFC measurements of the original gDNA samples and subsequent dilution factors. The Pearson R² coefficient was calculated to assess correlation.
Statistical analysis
For the study of cell lines blended with background mononuclear cells, MRD frequencies between mpFC and the clonoSEQ Assay were compared to demonstrate concordance.
For the re-analysis of data from the precision study, which provided a much larger number of data points, a nested bootstrap procedure incorporating random sampling with replacement from hierarchical correlated data was used to account for dependencies among samples and replicate measurements; bootstrap sampling was done separately for each disease indication and number of input cancer cells. Estimated clonoSEQ Assay bias was presented as relative bias (i.e., the difference between observed and expected, divided by expected), along with non-parametric 95% confidence intervals (CI) determined from 10,000 bootstrap replicates. We anticipated a relative mean bias within ± 35%, which is small relative to clinically meaningful changes in MRD level, and that this bias would remain within ± 35% across the tested range of disease burden.
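A two-level bootstrap of this kind can be sketched by resampling clinical samples first and then resampling replicate measurements within each selected sample. The version below is a simplified illustration for a single disease indication and input level; the grouping structure, the percentile-based interval, and all names are assumptions rather than the exact procedure used in the study.

```python
import random


def relative_bias(observed, expected):
    """(observed - expected) / expected."""
    return (observed - expected) / expected


def nested_bootstrap_ci(groups, expected, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile CI for relative bias from hierarchical data.

    `groups` is a list of lists: one inner list of replicate MRD
    measurements per clinical sample.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Level 1: resample clinical samples with replacement.
        sampled = rng.choices(groups, k=len(groups))
        values = []
        for g in sampled:
            # Level 2: resample replicates within each chosen sample.
            values.extend(rng.choices(g, k=len(g)))
        mean_observed = sum(values) / len(values)
        estimates.append(relative_bias(mean_observed, expected))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Resampling at the sample level before the replicate level keeps replicates from the same clinical sample together, which is how the dependence among replicate measurements is carried into the interval.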
Sequence accuracy
Study design
This study assessed the observed rate of agreement between the nucleotide sequences identified in ID samples for tracking during sample selection and the nucleotide sequences identified in the contrived samples used in the precision study, both as described above.
Statistical analysis
For each clonotype sequence designated for tracking, all sequences in an MRD sample within Hamming distance ≤ N bp were included for assessment of overall percent agreement (OPA), where N was defined for each tracked sequence as the number of allowable mutations based on the complexity (or uniqueness) of the clonotype rearrangement. N was chosen to capture somatic genetic variation among B cells from the same clonal lineage without incorrectly grouping sequences from different clonal lineages. Once this population was established, the OPA between the original clonotype sequence and the sequences identified in the MRD assessment was calculated. All OPA values were also restated as a Phred quality score [i.e., -log10 (disagreement rate)].
The following algorithm was used to assess OPA:
Given:
- Length (of alignment between MRD sequence and tracked ID clonotype)
- Mismatches (number of mismatched bases in alignment)
- Allowed (allowed mutations for the tracked ID clonotype)
- Abundance (estimated number of templates for MRD sequence)
If (Mismatches ≤ Allowed):
- Positive Agreement = (Length - Mismatches)*Abundance
- Negative Agreement = Mismatches*Abundance
Across all sequences with (Mismatches ≤ Allowed):
- OPA = 100*sum(Positive Agreement)/[sum(Positive Agreement) + sum(Negative Agreement)]
This algorithm measures the degree of nucleotide agreement for each malignant clonotype in complex mixed samples, conditional on certainty (through the number of allowed mutations) that the sequence is genuinely a derivative of the malignant clonotype sequence and not a chance rearrangement within a separate clonal population.
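The algorithm above translates directly into a short function. The sketch below follows the stated steps; the tuple layout of the input records and the function name are assumptions for illustration.

```python
def overall_percent_agreement(sequences):
    """Compute OPA across MRD sequences matched to tracked ID clonotypes.

    `sequences` is an iterable of (length, mismatches, allowed, abundance)
    tuples, using the quantities defined in the text.
    """
    positive = 0.0
    negative = 0.0
    for length, mismatches, allowed, abundance in sequences:
        if mismatches <= allowed:  # keep only sequences within the allowed mutations
            positive += (length - mismatches) * abundance
            negative += mismatches * abundance
    total = positive + negative
    if total == 0:
        raise ValueError("no sequences within the allowed number of mismatches")
    return 100.0 * positive / total
```

For example, with 90 templates matching a 100-bp alignment exactly, 10 templates carrying one mismatch, and a third sequence excluded for exceeding its allowed mismatches, the agreeing bases total 9,990 of 10,000 and the OPA is 99.9%.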