QASeq development and 2-plex demonstration. Unique molecular identifiers (UMIs)21,22 are attached to individual input DNA strand via two cycles of PCR with long annealing time for high and uniform barcoding efficiency. After further amplification, NGS reads originating from the same input DNA strand carry the same UMI sequence and thus belong to the same UMI family. Therefore, the unique UMI family count represents the number of input DNA strands (Fig. 1a).
We demonstrated QASeq for absolute quantitation and copy number calculation using a 2-plex panel containing 2 quantitation modules in gene ERBB2 (target) and EIF2C1 (reference) respectively (Supplementary Sect. 2 and Sect. 6), and compared with ddPCR for the same 2 genes side-by-side using five replicated experiments for both methods (Fig. 1b). QASeq exhibited higher and more consistent conversion yield than ddPCR, where conversion yield is the fraction of input molecules that are observed in the experiment. 10 ng Human PBMC gDNA from the same healthy donor was used per experiment, corresponding to 2,790 haploid copies. QASeq showed higher conversion yield (86% on average) than ddPCR (53% on average). The coefficient of variation (CV) of molecule count was lower for QASeq (5.0% for ERBB2, 2.5% for EIF2C1) than for ddPCR (12.8% for ERBB2, 13.3% for EIF2C1) in 5 replicates.
High dynamic range of DNA input was observed for absolute quantitation using 2-plex QASeq (Fig. 1c). Observed ERBB2 molecule counts by QASeq were close to expected value calculated from DNA input amount because of high conversion yield. Lower conversion yield at 1 ng input DNA was possibly a result of material loss at low concentration.
ERBB2 ploidy calculated from QASeq was accurate and highly reproducible (Fig. 1d). ERBB2 ploidy was calculated as 2 × ERBB2 molecule counts / EIF2C1 molecule counts. The mean of 5 replicates was 1.98, which is close to normal ploidy of 2.00. The CV for calculated ERBB2 ploidy was 4.8% in five replicates. Spike-in cell-line DNA samples with different expected ERBB2 ploidy were assayed by 2-plex QASeq and ddPCR, and high correlation in calculated ploidy was observed between the methods (Supplementary Fig. S2).
The histogram for observed UMI family size distribution followed log-normal distribution after removal of small families (Fig. 1e), because in theory random yield difference in PCR efficiency was amplified exponentially in PCR cycles post UMI attachment. Large number of UMI families with UMI family size < 3 were observed. These small UMI families were not used for molecule counting as they are likely results of polymerase and sequencing errors.
Highly multiplexed QASeq for CNV detection. Combination of multiple absolute quantitation modules for CNV detection was demonstrated to overcome the stochasticity in sampling. Experimentally, we did observe that the stochastic error in copy number quantitation reduced as a function of number of quantitation modules in the gene (Fig. 2a). A QASeq panel with 175 modules was designed for ERBB2 CNV detection, with 49 quantitation modules in ERBB2 and 123 modules in other regions of human genome serving as the reference (Supplementary Sect. 3 and Sect. 6). The rest of 3 modules are in Chromosome X thus are not used in CNV analysis. Five technical replicates were conducted with 8.3 ng healthy PBMC gDNA per library. 1 to 49 modules were used for ERBB2 ploidy quantitation. The CV of ERBB2 ploidy in 5 replicated experiments was reduced from 3.25–0.69% as the number of modules increases, consistent with theory based on Poisson distribution.
Combination of multiple quantitation modules in QASeq allowed confident discrimination between 2.05 and 2.00 ploidy ERBB2 samples using QASEq. 2.05 ploidy sample was prepared by mixing a normal PBMC DNA sample and ERBB2-positive cell line (SK-BR-3) DNA. Normal sample was tested in quadruplicates, and 2.05 ploidy sample was tested in duplicates.
Multiplexed QASeq for CNV in tumor tissue samples. QASeq was applied to 18 fresh/frozen (FF) tumor samples from 16 breast cancer patients and was compared with both ddPCR and immunohistochemistry (IHC) results on ERBB2. QASeq ERBB2 ploidy results were concordant with IHC and ddPCR from the tumor tissue (Fig. 2cd) with potentially fewer false positives. ERBB2 ploidy from QASeq showed high correlation with ddPCR (Supplementary Fig. S3). For the single sample with discordance between QASeq and IHC, ddPCR agreed with QASEq. For all the 3 samples with discordance between QASeq and ddPCR, ddPCR ploidy were between 2.5 and 3 and IHC agreed with QASeq for negative call. There was no case where IHC and ddPCR agreed on a call that conflicted with QASeq results.
175-plex QASeq is theoretically equivalent to C(175, 2) = 15225 different ddPCR CNV assays (Supplementary Fig. S4). To utilize quantitation modules beyond just calculating ERBB2 ploidy, modules in ‘reference’ were further grouped based on gene. Ploidy for 10 genes and 2 chromosomal regions were calculated from QASEq. To reduce the false positives of CNV calls in clinical samples and account for potential poor sample quality, sequential Mann-Whitney U tests on each gene of interest was performed (Supplementary Fig. S5, see Supplementary Sect. 3 on data analysis for multiplexed QASeq with > 2 quantitation modules). As an example, ploidy of each of the 175 quantitation modules for a normal PBMC DNA sample and for an FF DNA sample from breast cancer tumor section was shown in Fig. 2e. In clinical sample analysis, the ploidy values for a gene will be reported as 2.00 if there is no statistical difference between gene of interest and reference by Mann-Whitney U test. We summarized the CNV results in all of 18 FF samples (Fig. 2f). The LoD for ERBB2 CNV was calculated from the five technical replicates in healthy gDNA to be 1.97 ploidy for copy number loss and 2.04 for gain (Supplementary Table S2 and Supplementary Sect. 3). Additionally, ERBB2 ploidy in PBMC DNA from 10 different healthy donors were assessed with 175-plex QASeq and ddPCR respectively for biological variability (Supplementary Fig. S6). There is no sample with ploidy deviating from 2.00 for over 10%, but ddPCR showed wider ploidy range (1.8 to 2.1) than QASeq (1.9 to 2.0) in the 10 normal samples.
QASeq could improve clinical sensitivity in CNV assessment. The ploidy values of all observed gene ploidies in 18 tumor DNA samples are plotted as a histogram (Supplementary Fig. S7). Using methods with LoD at 1.6 and 2.4 ploidy, such as ddPCR, 44% of the CNVs may be missed. Additionally, QASeq allows quantitation of mutations down to 0.1% variant allele frequency (VAF) with UMI error correction (Supplementary Fig. S8). As an NGS-based quantitation method, sequence mutation calling is performed in addition to the copy number calculation. We designed QASeq panel amplicons to include hot spot mutations commonly observed in breast cancer (Supplementary Sect. 4 and Sect. 6).
Study of QASeq liquid biopsy results with disease progression in ERBB2+ (HER2+) metastatic breast cancer patients. QASeq breast cancer liquid biopsy panel (Supplementary Sect. 6) was applied to serial longitudinal plasma cell-free DNA (cfDNA) samples and was compared with disease progression. 57 plasma cfDNA samples from 15 patients with ERBB2 + metastatic breast cancer were tested by QASeq, with 2–8 samples per patient, where all patients had a baseline sample obtained at the moment of diagnosis of metastatic breast cancer, and follow-up samples were obtained at different timepoints for each patient. All patients had a diagnosis of ERBB2+ (IHC II + or III + or FISH+) metastatic breast cancer and the ERBB2 status was confirmed from a biopsy obtained from tumor tissue.
We summarized ERBB2 ploidy change in cfDNA and disease progression dates for each patient using a swimmer plot (Fig. 3a). There were 8 patients who developed disease progression who had a plasma sample collected within 6 months before or after the disease progression. ERBB2 amplification or increase of ERBB2 ploidy relative to the previous time point was observed in 6 out of the 8 patients who developed disease progression. In the other two patients (de-identified patient ID 2697 and 2366), disease progression was reported 4 times for each patient and abnormal ERBB2 CNV can only explain part of the disease progression. Significant allele frequency changes in PIK3CA G1049R mutation in patient 2697 and in SNP rs1309838194 in patient 2366 were correlated with disease progression respectively (Fig. 3b). PIK3CA G1049R mutation (COSV55874453) is considered to be structural damaging alteration as disease-causing drivers23,24 in breast cancer. The VAF for PIK3CA G1049R in circulating cfDNA was increased by over 10-fold during follow-up, serving as evidence for increased tumor fraction. The PIK3CA mutation may have contributed to disease progression. In patient 2366, VAF for the SNP rs1309838194 in ERBB2 changed from around 50% (heterozygous at baseline time point) to 80% during follow-up, which indicated increased tumor-derived DNA in plasma and may be associated to disease progression. Since the overall ERBB2 ploidy in plasma was still normal, we think that the copy-neutral loss of heterozygosity (LOH)25,26 may be present in the tumor, leading to SNP allelic imbalance. Taken the CNV and mutation results together, abnormal change was reported by QASeq in all the 8 patients who developed disease progression.
The 15 cases were classified according to whether or not there was abnormal molecular findings in QASeq liquid biopsy and whether or not disease progression was clinically observed (Fig. 3c). Chi-square test suggested that the QASeq result and progression are not statistically independent (p = 0.038). Although all cases with disease progression were featured with QASeq abnormal findings showing good sensitivity, increase of ERBB2 ploidy was also observed in 4 patients who did not develop disease progression within this time frame. Because all patients presented here were treated with ERBB2-targeted therapy trastuzumab and pertuzumab until disease progression, molecular change may not translate to clinically observed disease progression in all cases.
Furthermore, tumor ERBB2 ploidy was inferred from plasma QASeq results and was compared with FISH. Because both CNV and mutation information are available from QASeq, tumor fraction in plasma cfDNA can be estimated based on VAF of tumor mutation; tumor ploidy can be calculated with plasma ploidy and tumor fraction. We demonstrated this normalization in two cases (Fig. 3d), where pathogenic mutation was observed in baseline cfDNA with VAF between 1% and 30%. Tumor FISH results were collected at three time points in case 1834. QASeq detected ERBB2 amplification in plasma cfDNA 5 months earlier than FISH from tumor tissue. In addition, the inferred tumor ERBB2 ploidy by QASeq was consistent with the available tumor ploidy from FISH. The inferred tumor ERBB2 ploidy was generally stable in both of the two patients, so ERBB2 ploidy change in cfDNA was influenced by tumor fraction. Based on the correlation of QASeq results with progression and FISH results, we envision non-invasive and sensitive longitudinal study of CNV and mutation change in plasma by QASeq can help with understanding disease progression and resistance mechanism.
RNA QASeq for gene expression level quantitation. Next, we demonstrated QASeq technology for RNA quantitation in a variety of samples including tumor tissue FFPE RNA, total blood RNA and total liver RNA. RNA sample was reverse transcribed to cDNA as input for QASEq. Random hexamer was chosen as reverse transcription primer to be compatible with low-quality fragmented FFPE RNA. A targeted multigene breast cancer panel covering 78 amplicons in 15 cancer-related and 5 reference genes similar to Oncotype DX27 panel was built (Supplementary Sect. 5 and Sect. 6). Expression of each gene is calculated from the molecule count of each amplicon, based on UMI count and conversion yield, and is further normalized relative to the expression level of the 5 reference genes in log2 scale (Supplementary Fig. S9).
The RNA quantitation accuracy was firstly validated using ERCC RNA spike-in mix. 16 ERCC sequences were targeted with 16 amplicons. The ERCC RNA sample was diluted and mixed with commercial human total liver RNA for a final expected molecule count between 3 and 100,000. The observed molecule count showed good correlation with the expected (Fig. 4a). QASeq quantitation for RNA was across five orders of magnitude and as few as 3 expected molecules were detected.
Reproducibility for expression level relative to reference genes was evaluated. Total liver RNA was assayed with breast cancer panel in replicates and consistent expression level was observed (Fig. 4b). We observed that multiple quantitation modules (amplicons) per gene reduced quantitation variability in expression level. The standard deviation for relative expression level in triplicate experiments became lower as the number of amplicons per gene increased from 1 to 5 (Fig. 4c), with median standard deviation reduced from 0.44 to 0.21. Outlier was only observed when only 1 amplicon is considered.
RNA Expression level from QASeq was extensively compared with other technologies including RNAseq28, NanoString nCounter29, Microarray30 and RT-qPCR using FFPE RNA from breast cancer and lung cancer tissue. The expression level was normalized in the same way relative to the 5 reference genes for all the methods, and was summarized in Fig. 5d for a breast cancer FFPE RNA. RNA QASeq is consistent with RNAseq and NanoString nCounter. Microarrary, however, showed poor correlation with any of the other methods. RNA QASeq was further compared with these technologies in a couple other samples (Supplementary Fig. S10-13). Nanostring showed high correlation with QASeq in all samples, but required much higher input amount than RNA QASEq. Low expression level species are dropped out at 10 ng as compared to the typical 150 ng input (Supplementary Fig. S14). Microarray showed poor concordance with both QASeq and nanostring in all samples (Supplementary Fig. S13 and S15). QASeq was consistent with RNAseq in most samples. However, because RNAseq was a non-targeted approach, most reads were wasted on genes of no interest and coverage uniformity issue led to poor robustness for the quantitation of lowly expressed genes as it was observed in the two FFPE samples (Supplementary Fig. S11). RT-qPCR is consistent with UMI-based QASeq quantitation, but is limited by low multiplexing ability.
We summarized the relative expression level in four clinical FFPE and 3 normal placenta FFPE samples (Fig. 4d). Hierarchical clustering indicated the expression patterns were the most similar between normal placenta samples.