Mass spectrometry-based single cell proteomics (SCP-MS) has recently seen significant developments1–3. To overcome analytical barriers such as insufficient peptide ion signals for MS identification and quantification, a multiplexing strategy based on labeling tryptic peptides from single cells with isobaric tandem mass tags (TMT) alongside a labeled carrier proteome to boost MS signal has been developed4. Several studies have recently emphasized the importance of increasing the number of ions sampled from the single-cell channels (SCCs) with a carrier proteome channel (CPC), concluding that the depth of peptide identification needs to be balanced against accuracy of quantification5–7. In this study, we highlight additional crucial factors for performing SCP-MS experiments: 1) proper selection of the carrier proteome; 2) non-negligible isotope impurities caused by the carrier channel; 3) the balance between signal-to-noise ratio (SNR), collisional energy and resolution; 4) the suitability of SNR and intensity for different data interpretation strategies.
We modeled an SCP-MS experiment using TMTpro8 16plex labeling reagents, where channel 126 served as the carrier proteome channel (CPC), 127C was left empty, and the last 14 channels represented SCCs at different ratios to the CPC (Fig. 1a). To achieve this, we constructed a mixed species sample from Homo sapiens (HeLa cells), Saccharomyces cerevisiae (yeast) and Escherichia coli (E. coli), which were pooled at different known ratios in the SCCs, in order to elucidate the bidirectional effect on identification and quantification in SCP (Fig. 1a). We investigated the effects of CPC quantities and proteome types by designing different CPC constructs with one of three different carrier proteomes: human only (H), E. coli and yeast mixed (EY), and all three species mixed (HEY), across a large range (14x to 434x) of CPC to SCC ratios (hereafter referred to as carrier levels). Together with samples without any CPC (no carrier), these samples were analyzed by liquid chromatography tandem mass spectrometry (LC-MS) with different MS parameters (Fig. 1a, Supplementary Note 1). Loading amounts (50 pg to 200 pg) per SCC were equivalent to single cell proteomes5.
We first tested how different carrier proteomes affect protein identifications across the mixed species channels. We compared the numbers of non-human and human proteins identified with H, EY and HEY as the carrier proteome as well as without any carrier proteome (Fig. 1b). The carrier proteomes primarily dictated which proteins were identified in different SCCs. Moreover, this pronounced bias correlated directly with the carrier levels. The results suggest that carrier proteomes need to be properly weighed to act as impartial carriers for all proteins in SCCs. We next examined total numbers of proteins identified and quantified at different carrier levels. As expected, including a CPC increased the numbers of identified proteins consistently with rising carrier levels (Fig. 1c, Supplementary Fig. 1, Supplementary Table 1). However, the number of human proteins with precise quantification across the 14 SCCs (CV ≤ 20%) peaked at a relatively low carrier level (42x). In the sample containing the highest carrier level (434x), the majority of identified proteins could not be reproducibly quantified (Fig. 1c).
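The precision criterion above (CV ≤ 20% across the SCCs) can be illustrated with a minimal Python sketch. The function names, the data layout and the example values are assumptions made for illustration; the sketch further assumes that reporter values were already normalized so that equal values are expected in every channel, and it uses the sample standard deviation, which the text does not specify. It is not the processing pipeline used in this study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (sample standard deviation / mean) in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean reporter signal is zero; CV is undefined")
    return 100.0 * stdev(values) / m


def count_precise(reporter_table, cv_cutoff=20.0):
    """Count entries whose CV across channels is at or below the cutoff."""
    return sum(
        1
        for values in reporter_table.values()
        if coefficient_of_variation(values) <= cv_cutoff
    )


# Hypothetical, normalized reporter SNR values for two proteins in 14 SCCs.
example = {
    "protein_A": [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2,
                  9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 10.0],
    "protein_B": [2.0, 9.0, 1.0, 12.0, 4.0, 15.0, 3.0,
                  8.0, 1.5, 11.0, 6.0, 2.5, 14.0, 5.0],
}

# Only protein_A is tightly distributed enough to pass the 20% cutoff.
print(count_precise(example))
```

The same function can be applied to a subset of channels, for example after dropping channels affected by carrier impurities.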
We explored the relationship between quantitative precision and number of fragment ions, and found that very high carrier levels led to worse correlations and inferior quantification performance despite a higher number of total ions accumulated for MS/MS scans (Supplementary Fig. 2). We compared the averaged SNR of the 14 single cell reporters (Av14) or a subset of 12 reporters with the least isotope impurities (Av12) with the respective CV values in the 14 or 12 SCCs for human peptide spectral matches (PSMs) and proteins (Fig. 1d, Supplementary Fig. 2, Supplementary Fig. 3). The CV values displayed negative correlations with the average SNR at both PSM and protein levels, in agreement with the findings in other studies5. Higher carrier ratios limited the maximum single cell SNR, as the signal of both reporter ions and peptide fragment ions primarily derived from the CPC (Fig. 1d). It should be noted that the reporter ion intensities, which are normalized by injection time, correlated worse with the CV values than SNR (Supplementary Fig. 2, Supplementary Fig. 3).
Furthermore, we observed a significant contribution of isotopic impurities, particularly from carrier channel 126. We calculated protein CVs in all 14 SCCs with either raw or impurity-corrected SNR of reporter ions and found that impurity correction substantially increased the number of proteins with CV ≤ 20% in samples with carrier levels higher than 98x (Fig. 1e). At increased ratios, we observed that channel 126 produced noteworthy isotopic impurities in addition to those in the empty 127C channel (Supplementary Note 2), which affect channel 128C (126 + 2x13C), and importantly, 127N (126 + 15N). Of note, Cheung et al. also noticed this negative impact on channel 127N but ascribed it to ion coalescence5. If ignored, the impurities explain the worse quantification at high booster ratios. In fact, TMT can accurately quantify ratios higher than 400 even for channel 127N after impurity correction (Fig. 1f). Unfortunately, impurity correction also led to higher variations of quantified ratios, and the correction for 15N is not available in most data processing tools (Supplementary Note 3). Due to the negative impact on 127N and 128C, we calculated the CV of all PSMs and proteins without these two channels, resulting in much more accurate and reproducible quantifications on the 12 unaffected channels (Fig. 1d, 1e, Supplementary Fig. 3). Next, we aimed to evaluate the overall quantitative accuracy across all 14 SCCs in yeast peptides by comparing their relative intensities against the expected values (Supplementary Fig. 4). Similar to quantitative precision, accuracy was highly dependent on SNR. Distributions of relative intensities in SCCs were highly dispersed at low SNR and converged to expected values with increased SNR. As the Av14 values in samples with high carrier levels were limited, the abundance ratios were distributed almost randomly, leading to poor quantification accuracy.
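To make the scale of the impurity problem concrete, the following Python sketch applies a simplified, first-order correction that subtracts only the carrier's spillover from each affected channel. It is not the full impurity-matrix correction implemented in data processing tools, and the function name, channel subset and spillover fractions are hypothetical placeholders; real fractions would have to come from the reagent lot documentation or be measured.

```python
def correct_carrier_spillover(observed, spill_fractions, carrier="126"):
    """Subtract the carrier's estimated isotopic spillover from each channel.

    observed        : dict mapping channel name -> reporter signal
    spill_fractions : dict mapping channel name -> spillover expressed as a
                      fraction of the observed carrier signal (hypothetical)
    """
    carrier_signal = observed[carrier]
    corrected = {}
    for channel, signal in observed.items():
        spill = spill_fractions.get(channel, 0.0) * carrier_signal
        # Clip at zero: noise can push the corrected value slightly negative.
        corrected[channel] = max(signal - spill, 0.0)
    return corrected


# Hypothetical example: a 400x carrier and single-cell channels whose true
# signal is 1.0. The fractions are placeholders, not reagent-lot values.
spill = {"127N": 0.004, "128C": 0.001}
observed = {
    "126": 400.0,
    "127N": 1.0 + 0.004 * 400.0,  # true signal plus carrier spillover
    "128C": 1.0 + 0.001 * 400.0,
    "128N": 1.0,                  # unaffected channel
}
corrected = correct_carrier_spillover(observed, spill)
print(corrected)
```

With these placeholder numbers, a spillover of only 0.4% of a 400x carrier exceeds the single-cell signal itself in 127N, which illustrates why uncorrected impurities can dominate quantification at high carrier levels.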
To assess the impact of the CPC on detecting significantly regulated proteins, we took advantage of the known protein ratios in our mixed species samples and examined the sensitivity and specificity of the CPC approach. We utilized the predefined relative species abundances between channels of a ratio of 2 for yeast peptides, 0.5 for E. coli peptides, and 1 for human peptides (Supplementary Note 4). We used the four ratio estimates to perform a t-test analysis (represented by a volcano plot) to identify significantly regulated PSMs with log2-fold change higher than 0.5 at p < 0.05. In all cases, fewer than one percent of human PSMs were wrongly assigned as significantly regulated, suggesting a high specificity (Fig. 1g, 1h). Despite the lower number of PSMs identified, samples without any carrier assigned the highest percentage of identified yeast and E. coli peptides as correctly regulated; this sensitivity decreased as carrier levels increased. Ultimately, samples with the 98x carrier yielded the highest number of regulated peptides. Conversely, only a small percentage of yeast and E. coli peptides were accurately quantified in samples with very high carrier levels (210x and 434x) despite the highest numbers of identified peptides.
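The sensitivity and specificity assessment described above can be sketched in Python as follows. The text does not specify the t-test variant, so this sketch assumes a one-sample, two-sided t-test of the four log2 ratio estimates against zero, using the tabulated critical value for 3 degrees of freedom; it also applies the fold-change cutoff to the absolute mean log2 ratio. The function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided critical t value for p < 0.05 with 3 degrees of freedom
# (four ratio estimates per PSM).
T_CRIT_DF3 = 3.182


def is_regulated(log2_ratios, min_abs_log2fc=0.5, t_crit=T_CRIT_DF3):
    """Flag a PSM as regulated via a one-sample t-test of log2 ratios vs 0."""
    if len(log2_ratios) != 4:
        raise ValueError("the hard-coded critical value assumes four estimates")
    m = mean(log2_ratios)
    sd = stdev(log2_ratios)
    if sd == 0:
        # No variance: any non-zero mean is treated as significant.
        significant = m != 0
    else:
        t_stat = m / (sd / sqrt(len(log2_ratios)))
        significant = abs(t_stat) > t_crit
    return significant and abs(m) > min_abs_log2fc


def sensitivity_specificity(psms):
    """psms: list of (log2_ratios, truly_regulated) pairs."""
    tp = fp = tn = fn = 0
    for ratios, truth in psms:
        called = is_regulated(ratios)
        if truth and called:
            tp += 1
        elif truth:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical PSMs: yeast (expected log2 ratio 1), E. coli (expected -1)
# and human (expected 0, i.e. not regulated).
example_psms = [
    ([1.0, 0.9, 1.1, 1.0], True),      # yeast, precise
    ([2.0, -0.5, 1.5, 0.2], True),     # yeast, too noisy to reach significance
    ([-1.0, -1.2, -0.8, -1.0], True),  # E. coli, precise
    ([0.05, -0.1, 0.0, 0.1], False),   # human
    ([0.1, -0.05, 0.02, -0.03], False),  # human
]
sens, spec = sensitivity_specificity(example_psms)
print(sens, spec)
```

In this toy input, the noisy yeast PSM is missed, which mirrors how low reporter SNR at high carrier levels reduces sensitivity while specificity remains high.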
We tested the most direct MS parameters, the normalized collisional energy (NCE) and MS/MS resolution, to enhance SNR for quantification accuracy (Supplementary Note 1). We found that elevated NCE levels (35%-38%) at lower MS/MS resolution are the best compromise between quantification accuracy and identification. In accordance with a previous study8, NCE levels between 32% and 35% gave the most PSMs (Fig. 1i). However, the numbers of PSMs with CV ≤ 20% generally increased as NCE was increased, particularly at high carrier levels. This was due to a consistent increase of reporter SNR with higher NCE (Fig. 1j), despite the Sequest HT score function XCorr and MaxQuant Andromeda9 scores peaking at lower NCE (Supplementary Fig. 5). We simultaneously observed a steady decrease of single cell SNR values as carrier levels increased (Fig. 1j), indicating that even an NCE level of 38% could not overcome the limits caused by the CPC. Higher MS/MS resolution resulted in a higher fraction of identifications with CV ≤ 20%; however, this came at the cost of a significantly reduced number of PSMs (Fig. 1k, Supplementary Fig. 6) due to the slower scan speed.
For profiling cellular heterogeneity based on global protein expression10 with the isobaric carrier approach, protein abundances from reporter ions are first extracted and then subjected to dimensionality reduction methods, such as principal component analysis (PCA). To estimate relative protein copy numbers in proteomes, intensity-based absolute quantification (iBAQ)11 is the method of choice. Therefore, it is essential to evaluate the accuracy of protein abundances derived from reporter ions. We compared four different reporter ion abundance values (SNR and intensities, both as raw and impurity-corrected, Supplementary Note 5) and demonstrated that they resulted in different protein abundance estimates, especially in samples with low AGC target (Supplementary Fig. 7). Since the protein copy number estimates in our SCP model should match between the CPC and SCCs for each species, we tested the correlations between iBAQ values computed from full scans (MS1) with the CPC and SCCs (Fig. 1l). Protein abundances at MS1 were calculated as summed abundances of identified peptides, where the Minora algorithm in Proteome Discoverer was used to perform untargeted feature detection for the peptides. Protein abundances on reporter ions were calculated as summed quantities of identified peptides from reporter ion abundances. Unlike the quantification of a single protein across TMT channels, the intensity values correlated better with MS1 abundances than SNR values, especially with low AGC settings (Supplementary Note 6). This is likely due to the fact that both reporter ion intensity values and MS1 abundances are scaled based on injection times. Furthermore, carrier levels of 42x and 98x showed the best correlations in almost all settings.
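The comparison between MS1-based and reporter-based protein abundances can be sketched in Python as shown below. This is a minimal illustration under stated assumptions: peptide abundances are summed per protein as described above, the Pearson correlation is computed on log10 values (the correlation measure is an assumption), and the iBAQ normalization by the number of theoretically observable peptides is omitted for brevity. All names and values are hypothetical.

```python
from collections import defaultdict
from math import log10, sqrt


def sum_by_protein(peptide_rows):
    """Sum peptide abundances per protein from (protein_id, abundance) pairs."""
    totals = defaultdict(float)
    for protein, abundance in peptide_rows:
        totals[protein] += abundance
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def log_abundance_correlation(ms1_rows, reporter_rows):
    """Correlate log10 protein abundances shared by both quantification levels."""
    ms1 = sum_by_protein(ms1_rows)
    rep = sum_by_protein(reporter_rows)
    shared = sorted(p for p in ms1 if p in rep and ms1[p] > 0 and rep[p] > 0)
    xs = [log10(ms1[p]) for p in shared]
    ys = [log10(rep[p]) for p in shared]
    return pearson(xs, ys)


# Hypothetical peptide-level input: (protein, abundance).
ms1_rows = [("P1", 1e6), ("P1", 1e6), ("P2", 2e5), ("P3", 2e4)]
reporter_rows = [("P1", 120.0), ("P1", 80.0), ("P2", 20.0), ("P3", 2.0)]
r = log_abundance_correlation(ms1_rows, reporter_rows)
print(round(r, 3))
```

Running the same function once with reporter intensities and once with reporter SNR values as input would give the two correlations that are compared in the text.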
In conclusion, our study systematically explored the effects of the isobaric carrier approach using a defined mixed species model and provides a guideline for future SCP experiments (Table 1). Our finding that the carrier proteome specifically boosts the identification of the proteins contained within it opens the door for a variety of “targeted” SCP experiments. As we studied the tradeoff between identifications and quantitation at large carrier levels (> 100x), we observed that the underlying reasons were a compression of the dynamic range of single cell SNR at high carrier levels and, for specific channels, the impurities from the carrier channel. Therefore, we suggest excluding channels 127N and 128C for SCP experiments with extreme carrier levels. We tested the sensitivity and specificity of identifying significantly regulated proteins, and our model suggests an optimal carrier level of ~ 100x when analyzing 14 SCCs. We recommend using reporter ion SNR for fold-change-based quantifications across channels and reporter intensities for protein copy number estimation within each channel. A higher NCE of up to 35% achieves better quantification performance by enhancing reporter abundances while maintaining peptide identifications. In the future, the performance of SCP will benefit from the development of isobaric tags with higher multiplexing capacity (18-plex 12), more sensitive instrumentation, and higher dynamic range of mass analyzers. This study provides a roadmap for benchmarking such new developments.
Table 1
Recommended settings in Orbitrap instruments for isobaric labeling based single-cell proteomics
Parameter | Recommended setting | Rationale |
AGC | >= 300% | A higher AGC target allows more ions in MS2 scans |
NCE | 35% | Slightly higher NCE leads to higher reporter ion SNR without reducing peptide fragment quality |
Resolution in MS2 | >= 60K | To resolve isobaric reporter ions; to make use of the long fill time needed to reach the high AGC target |
Carrier levels | 127N and 128C included: <= 100x; 127N and 128C excluded: > 100x | Impurities from TMTpro126 are substantial at very high carrier levels |