Figure 1 shows the BALF processing workflow, which is compatible with LC-MS/MS and quantitative proteomic analysis. Details of each step are provided in the Methods section. A key step is the use of a molecular weight (MW) cutoff spin filter, which concentrates the higher MW proteins and simultaneously enables collection of endogenous peptides for analysis by LC-MS/MS, if desired. Concentrated proteins are subjected to immunoaffinity depletion of high-abundance plasma proteins, followed by concentration of the non-retained proteins, then cleanup and trypsin digestion using protein trapping. Peptide mixtures are then ready for isobaric labeling, if desired, and/or direct analysis using nanoscale LC-MS/MS.
Initially, our protocol included a protein precipitation step, similar to earlier protocols(28, 30), to concentrate the BALF proteins and, ostensibly, to eliminate contaminants. Methanol:chloroform precipitation was chosen as a means to phase-separate potential lipid or surfactant contaminants from soluble proteins(30). However, we found that the precipitation step removed contaminants inconsistently: in some cases, downstream LC-MS/MS analysis showed contaminants that obscured detection of BALF-derived peptides (see Supplementary Figure 1 for representative contamination results). Precipitation also had the downside of expanding sample volumes, which required extra handling and concentration steps.
As a solution, we introduced a concentration step using a MW cutoff spin filter, and downstream protein trapping via the S-Trap technology(31, 32). In our hands, protein trapping provided a means to efficiently capture proteins, remove contaminants, concentrate samples, and conduct tryptic digestion all within the same disposable filter, while providing an easy means to collect the peptides for further processing and/or analysis. After implementing the protein trapping step, we eliminated problems with contaminants.
Our workflow also provides a means to enrich endogenous peptides from BALF. These are most likely products of proteolytic processing, and such peptides have been shown to have potential diagnostic value in BALF samples(31, 32). We have found that the flow-through collected from the 3 kDa MW cutoff filters, using starting volumes of 1-5 mL of BALF, contains microgram amounts of endogenous peptides, which can be concentrated and desalted using STAGE Tips and detected directly by LC-MS/MS (data not shown).
The yield of high-quality proteins, and of the resulting tryptic peptides, is another critical parameter for ensuring deep and accurate results in MS-based proteomics from challenging clinical samples such as BALF. Once we had established that our workflow reliably eliminates contaminants from BALF samples, we assessed the yield of proteins and tryptic peptides from starting sample amounts representative of those generated from patients in the clinic. Table 1 shows results from nineteen processed BALF samples collected from healthy control patients, each with a starting volume of 5 mL or less. This is an amount of clinical sample commonly collected from infants and children, for whom volumes are proportional to body weight(33, 34). Here, we show the average amounts of protein, and of peptides after digestion with trypsin, quantified at key points across the steps shown in Figure 1. Important values shown in Table 1 include the amount of protein available after depletion of high-abundance proteins (including a calculation of % depletion) and the final amount of peptides recovered from the S-Trap after digestion. Supplementary Table 1 shows the values for the individual samples used for this assessment. Although these numbers vary somewhat from sample to sample, the processing workflow consistently yields microgram amounts of high-quality tryptic peptides, leaving an excess of sample for analysis on contemporary LC-MS/MS instrumentation platforms.
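As an illustration of the yield bookkeeping described above, the following minimal Python sketch computes a percent depletion and a percent recovery between two workflow steps. The exact formula used for the % depletion values in Table 1 is not stated in this section, so the definition below (fraction of protein mass removed, relative to the pre-depletion amount) is an assumption, and all function names and microgram values are hypothetical.

```python
def percent_depleted(protein_before_ug: float, protein_after_ug: float) -> float:
    """Percentage of protein mass removed by a depletion step.

    Assumed definition: 100 * (before - after) / before.
    """
    if protein_before_ug <= 0:
        raise ValueError("protein_before_ug must be positive")
    return 100.0 * (protein_before_ug - protein_after_ug) / protein_before_ug


def percent_recovered(input_ug: float, output_ug: float) -> float:
    """Percentage of material carried through a step (output relative to input)."""
    if input_ug <= 0:
        raise ValueError("input_ug must be positive")
    return 100.0 * output_ug / input_ug


# Hypothetical amounts for one sample, in micrograms (not taken from Table 1).
before_depletion_ug = 200.0   # protein after spin-filter concentration
after_depletion_ug = 50.0     # non-retained protein after immunoaffinity depletion
peptides_ug = 20.0            # tryptic peptides recovered after protein trapping

depletion = percent_depleted(before_depletion_ug, after_depletion_ug)
recovery = percent_recovered(after_depletion_ug, peptides_ug)
print(f"Depletion: {depletion:.1f}%  Peptide recovery: {recovery:.1f}%")
```

Applying the same two functions to each row of per-sample measurements would give the sample-to-sample spread around the averages.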
Table 1
Yield of proteins and peptides from 5 mL or less of BALF across the processing workflow
Using the samples that generated the results in Table 1, we also assessed the depth of direct LC-MS/MS analysis on an Orbitrap Eclipse system. Figure 2 shows a Venn diagram of the proteins identified from six of these samples, which yielded an average of 988 identified proteins per sample. Protein identification and grouping required a 1% FDR and two or more unique peptides (see Methods for details).
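The identification criteria and the Venn-style comparison can be sketched as follows. This is only an illustration under stated assumptions: the actual filtering is performed by the search software described in the Methods, the tuple layout (accession, unique peptide count, q-value) is hypothetical, and the accessions and values are invented toy data rather than results from this study.

```python
from statistics import mean

MIN_UNIQUE_PEPTIDES = 2   # two or more unique peptides per protein
MAX_Q_VALUE = 0.01        # 1% FDR, assumed here to be expressed as a q-value


def confident_proteins(rows):
    """Return the set of accessions that pass both identification criteria.

    rows: iterable of (accession, unique_peptide_count, q_value) tuples
    (a hypothetical export format).
    """
    return {
        accession
        for accession, n_unique, q_value in rows
        if n_unique >= MIN_UNIQUE_PEPTIDES and q_value <= MAX_Q_VALUE
    }


# Toy identification lists for two samples (hypothetical accessions and scores).
samples = {
    "sample_A": [("P1", 5, 0.001), ("P2", 1, 0.001), ("P3", 3, 0.02), ("P4", 2, 0.005)],
    "sample_B": [("P1", 4, 0.002), ("P4", 2, 0.009), ("P5", 6, 0.001)],
}

identified = {name: confident_proteins(rows) for name, rows in samples.items()}

# Venn-style summaries: proteins seen in every sample, and in at least one.
shared = set.intersection(*identified.values())
union = set.union(*identified.values())
average_per_sample = mean(len(proteins) for proteins in identified.values())

print(sorted(shared), len(union), average_per_sample)
```

Set intersection and union extend directly to six samples, which is the comparison a Venn diagram such as Figure 2 summarizes.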
The amenability of any sample processing workflow to quantitative analysis is critically important, as most researchers will seek to compare changes in BALF protein abundance between different conditions. We therefore carried out a demonstration study using some of the samples from Table 1 to test the repeatability of the workflow. We used the isobaric Tandem Mass Tag (TMT) reagents for multiplexed quantitative analysis(35). Proteins isolated from two different BALF samples (called here Test 1 and Test 2) were each divided into two equal parts and taken through the protein depletion and cleanup steps of the workflow. The peptides from these test samples (5 micrograms per sample) were labeled with TMT reagents as part of a larger testing experiment using the 16-plex TMTpro labeling kit, and then fractionated using offline high-pH HPLC.
Table 2 shows results from the replicate TMT-labeled (Test 1 and Test 2) BALF samples. We found that samples divided into equal amounts prior to protein depletion had average protein ratios close to 1, with CVs acceptable for untargeted isobaric tagging experiments, demonstrating the reproducibility of our workflow. Across the entire 16-plex TMT experiment, we identified with high confidence 1844 protein groups from 24306 peptides. Supplementary Table 2 shows the complete set of protein identification results from this TMT-based experiment, along with the quantitative analysis of the Test 1 and Test 2 replicate samples.
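The repeatability metrics reported for the split replicates can be illustrated with a short Python sketch. How the ratios and CVs in Table 2 were actually computed is not specified in this section, so the approach below (per-protein ratio between the two halves, then the mean and coefficient of variation of those ratios) is an assumption, and the reporter-ion abundances are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical TMT reporter-ion abundances for the two halves of one test sample.
half_1 = {"P1": 1000.0, "P2": 500.0, "P3": 250.0}
half_2 = {"P1": 1100.0, "P2": 450.0, "P3": 250.0}

# Per-protein ratio between halves; only proteins quantified in both are used.
ratios = {
    protein: half_1[protein] / half_2[protein]
    for protein in half_1
    if protein in half_2
}

mean_ratio = mean(ratios.values())
ratio_cv = percent_cv(list(ratios.values()))
print(f"Mean ratio: {mean_ratio:.3f}  CV of ratios: {ratio_cv:.1f}%")
```

For a perfectly repeatable workflow, every ratio would equal 1 and the CV would be 0; departures from those values quantify the variability introduced by processing.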
Table 2
Results from a demonstration TMT-based quantitative repeatability experiment in BALF samples using our sample preparation workflow.
Finally, we tested how effectively the workflow handles samples that yield lower amounts of protein prior to LC-MS/MS analysis. Here, individual BALF samples with very small starting sample sizes (1-10 mL) yielded very low amounts of recovered digested peptides, in some cases only about 1 microgram. To demonstrate the amenability of our workflow to in-depth, quantitative analysis of such samples using TMT labeling, we adapted a microscale labeling method(36) and a fractionation method(37, 38). To minimize sample handling steps, the peptides are labeled while bound to C18 stationary phase. In this way, each of 16 separate samples, with a starting peptide amount of 1 microgram each, was subjected to TMT labeling, followed by microscale high-pH fractionation (referred to here as “microfractionation”), resulting in 9 fractions for LC-MS/MS analysis. Table 3 shows the results from this analysis, comparing direct analysis with the addition of microfractionation in terms of the numbers of peptides and proteins identified. Results from three different 16-plex, TMT-labeled groups of samples were compared using either no fractionation or microfractionation, and the numbers of peptides and proteins identified were averaged across these groups. Microfractionation increased the depth of protein identification approximately 2-fold. Supplementary Table 3 shows the number of proteins identified within one of these 16-plex TMT-labeled groups of samples.
Table 3
Amenability of the workflow to material-limited samples and the effects of microfractionation on increasing depth of peptide and protein identification.
Our results demonstrate the effectiveness of our workflow for preparing samples for a wide variety of proteomic studies in BALF. BALF is a commonly collected clinical specimen, rich in biomolecules and valuable for molecular characterization of lung disease and health. Proteins are a main component of BALF(9, 10, 21), making this sample type a prime target for MS-based proteomic analysis. However, the small sample volume, the potential for contaminants, and suppression by high-abundance proteins are known to limit many researchers in their attempts to analyze BALF(23). Here, we present a robust workflow for processing BALF that overcomes these limitations.