Human samples.
A retrospective study was performed on 481 human plasma samples: 192 control samples and 289 samples from patients suffering from breast, kidney, or prostate cancer. The inclusion criteria for healthy controls were no history of any type of cancer and age over 18 years. For cancer patients, the disease was histologically confirmed by needle biopsy or by examination of the surgical resection specimen. Samples from cancer patients and healthy controls came from subjects of the same ethnicity (Caucasian), were collected at the same place (University Hospital in Olomouc), and were processed in the same way. No other exclusion criteria were applied. The clinical information for all patients and controls is summarized in Fig. 1 and Supplementary Tables S1 and S2. The sample set was divided into a training set (used to build OPLS-DA models) and a validation set (used to indicate the applicability to samples with unknown classification). Every 4th sample was assigned to the validation set, giving a distribution of 75% of samples in the training set and 25% in the validation set. Patients had received no treatment before blood collection. Human plasma was collected in 9 mL lithium-heparin collection tubes and then centrifuged. The supernatant was transferred, aliquoted, and stored at -80 °C until further processing for lipidomic analysis.
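The every-4th-sample assignment can be illustrated with a minimal sketch. The sample identifiers are hypothetical, and the sketch assumes the counting runs over one ordered list of all samples; the text does not state whether the assignment was done over the whole set or within each sample group.

```python
def split_every_fourth(sample_ids):
    """Assign every 4th sample (positions 4, 8, 12, ...) to the validation set."""
    training, validation = [], []
    for position, sample_id in enumerate(sample_ids, start=1):
        if position % 4 == 0:
            validation.append(sample_id)
        else:
            training.append(sample_id)
    return training, validation


# Hypothetical identifiers standing in for the 481 plasma samples.
sample_ids = [f"S{i:03d}" for i in range(1, 482)]
training, validation = split_every_fourth(sample_ids)
print(len(training), len(validation))  # prints "361 120"
```

With 481 samples this gives 361 training and 120 validation samples, i.e., close to the stated 75%/25% distribution.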
Ethics declaration.
The study was approved by the ethical committee at the University Hospital Olomouc. All subjects signed informed consent. All methods were carried out in line with the Ethical Principles for Medical Research Involving Human Subjects (Declaration of Helsinki).
Study phases.
The lipidome of 481 plasma samples was measured by UHPSFC/MS in the discovery phase. To verify that the UHPSFC/MS results are reproducible, the same extracts were measured again several months later in the qualification phase. The sequence of sample measurements was randomized to exclude any measurement bias. The data set was processed independently, and the results were compared to those of the discovery phase. Furthermore, the extracts were also measured with shotgun MS to exclude any bias caused by the employed method; these data were independently processed and compared with the UHPSFC/MS results.
Chemicals.
Solvents for analysis, such as acetonitrile, 2-propanol, methanol (HPLC/MS grade), water (UHPLC/MS grade), and hexane, were purchased from Honeywell (Riedel-de Haën, CHROMASOLV™ LC-MS Ultra, Hamburg, Germany), distributed by Fisher Scientific (Waltham, Massachusetts, USA). Chloroform stabilized with 0.5-1% ethanol was purchased from either Sigma-Aldrich (St. Louis, MO, USA) or Merck (Darmstadt, Germany). Ammonium acetate was purchased from Fisher Scientific. Deionized water for liquid-liquid extraction was obtained from a Milli-Q Reference Water Purification System (Molsheim, France). Carbon dioxide of 4.5 grade (99.995%) was purchased from Messer Group (Bad Soden, Germany). Non-endogenous lipids were used as internal standards (IS) for quantitative analysis, i.e., MG 19:1/0:0/0:0, DG 12:1/0:0/12:1, and TG 19:1/19:1/19:1 from Nu-ChekPrep (Elysian, MN, USA); CE 16:0 D7, Cer d18:1/12:0, cholesterol D7, LPC 17:0/0:0, LPE 14:0/0:0, PC 14:0/14:0, PC 22:1/22:1, PE 14:0/14:0, PI 15:0/18:1 D7, SM d18:1/12:0, PS 14:0/14:0, PA 14:0/14:0, PG 14:0/14:0, LPG 14:0/0:0, HexCer d18:1/12:0, Hex2Cer d18:1/12:0, and SHexCer d18:1/12:0 from Avanti Polar Lipids (Alabaster, AL, USA). The concentrations of stock solutions of individual IS and the volumes needed to prepare the IS mixture are summarized in Supplementary Table S15.
Lipidomic analysis.
For the lipid extraction, a modified Folch procedure was employed, which was previously validated36. The same sample extracts were analyzed with UHPSFC/MS and shotgun MS. Human serum (25 µL) and the mixture of IS (17.5 µL) were homogenized in 3 mL of chloroform/methanol (2:1, v/v) for 10 min in an ultrasonic bath (40 °C). When the samples reached ambient temperature, 600 µL of water was added, and the mixture was vortexed for 1 min. After 3 min of centrifugation (3000 rpm), the aqueous layer was removed, and the organic layer was evaporated under a gentle stream of nitrogen. The residue was dissolved in a mixture of 500 µL of chloroform/2-propanol (1:1, v/v), carefully vortexed and filtered (0.2 µm syringe filter). The extract was diluted 1:20 with the mixture of hexane/2-propanol/chloroform (7:1.5:1.5, v/v/v) for UHPSFC/MS analysis and 1:8 with chloroform/methanol/2-propanol (1:2:4, v/v/v) mixture containing 7.5 mM of ammonium acetate and 1% of acetic acid for shotgun MS analysis.
UHPSFC/MS measurements were carried out on an Acquity Ultra Performance Convergence Chromatography (UPC2) system hyphenated to the hybrid quadrupole traveling wave ion mobility time-of-flight mass spectrometer Synapt G2-Si from Waters by using the commercial interface kit (Waters, Milford, MA, USA). The chromatographic settings of the previously published method37 were used with minor improvements. The main difference is that the data were recorded in continuum and sensitivity mode. The peptide leucine enkephalin was used as the lock mass with a scan time of 0.1 s and an interval of 30 s. The lock mass was scanned, but the mass correction was not automatically applied. All samples were measured in duplicate. Noise reduction was performed on the raw files using the Waters compression tool. Data files were lock mass corrected and converted into centroid data using the exact mass measure tool from Waters. The MarkerLynx software from Waters was used for data preprocessing. Further data processing was done with the LipidQuant 1.0 software38.
Shotgun experiments were performed on a quadrupole linear ion trap mass spectrometer 6500 QTRAP (Sciex, Concord, ON, Canada) equipped with an ESI probe, using characteristic precursor ion (PIS) and neutral loss (NL) scan events39. Raw data files were processed with the LipidView software from Sciex to obtain a summary table of m/z vs. intensity for each scan mode (NL and PIS) of all samples. The raw data were prefiltered by applying the following settings in the positive ion mode: a mass tolerance window of 0.5 Da, a minimum intensity threshold of 0.1%, and a minimum signal-to-noise ratio of 3 after smoothing. The summary tables of m/z vs. intensity for all samples were exported as txt files and further processed by the LipidQuant 1.0 software.
Data processing.
LipidQuant 1.0 is a Microsoft Excel based script used for the automated processing of txt files38 containing m/z values vs. intensities for all samples. For lipid identification, the experimental m/z values were compared to the theoretical m/z values from the embedded database, with the retention time window or scan type defining the lipid class. Lipid quantitation was performed by calculating the ratio of the intensities of the target lipid and the internal standard and multiplying it by the known concentration of the internal standard. Isotopic correction type II40 was automatically applied, and a summary table containing lipid concentrations in all samples was generated. Missing values were filled in with 80% of the minimum concentration measured for the given lipid species across all samples. If the concentration was not determined in more than 25% of the samples, the lipid species was excluded from the data set. The data set was divided into training and validation sets by assigning every 4th sample to the validation set. The clinical information for the samples, such as gender and pathological state, was then revealed, and the samples were assigned accordingly. The final tables containing the lipid concentrations for all samples and fulfilling all defined criteria were used for MDA and other statistical tools.
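The quantitation and missing-value rules described above can be sketched in a few lines. This is not the LipidQuant implementation (which is an Excel script); the function names, the lipid names, and all numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def quantify(target_intensity, is_intensity, is_concentration):
    """Single-point quantitation: intensity ratio times the IS concentration."""
    return target_intensity / is_intensity * is_concentration


def filter_and_impute(table, max_missing_fraction=0.25, fill_factor=0.8):
    """table maps lipid name -> list of concentrations (None = not determined).

    Lipids not determined in more than max_missing_fraction of the samples
    are dropped; remaining gaps are filled with fill_factor times the
    minimum concentration measured for that lipid.
    """
    cleaned = {}
    for lipid, values in table.items():
        measured = [v for v in values if v is not None]
        missing_fraction = 1 - len(measured) / len(values)
        if not measured or missing_fraction > max_missing_fraction:
            continue  # excluded from the data set
        fill = fill_factor * min(measured)
        cleaned[lipid] = [fill if v is None else v for v in values]
    return cleaned


# Hypothetical intensities and IS concentration (arbitrary units).
concentration = quantify(2000.0, 1000.0, 5.0)

# Hypothetical concentration table for four samples.
table = {
    "PC 34:1": [10.0, None, 12.0, 8.0],  # 25% missing: kept and filled
    "SM 36:2": [None, None, 5.0, 6.0],   # 50% missing: excluded
}
cleaned = filter_and_impute(table)
```

In this example the missing PC 34:1 value is replaced by 80% of the lowest measured value (8.0), while SM 36:2 is removed because more than 25% of its values are missing.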
Statistical analysis.
MDA was performed with the SIMCA software, version 13.0 (Umetrics, Sweden). The lipid species were defined as variables and the samples as observations. The data set was preprocessed by applying logarithmic transformation, Pareto scaling, and centering. This preprocessing is intended to bring the lipid concentrations closer to a normal distribution and to ensure that low-abundance lipid species contribute to the MDA similarly to high-abundance lipid species. PCA was performed to check for outliers, to estimate the measurement quality by checking the clustering of QC samples, and to evaluate the clustering of sample groups depending on the pathological state. OPLS-DA is a statistical tool to visualize differences between sample groups of known classification. OPLS-DA models were built using the training set and then used to predict the samples of the validation set. For both PCA and OPLS-DA, the score scatter plots for the first two components are visualized, even though more components may contribute to the model. The number of components for PCA and OPLS-DA models was determined by selecting the autofit option in the SIMCA software, which retains only components that are significant according to cross-validation rules. The cross-validation is automatically applied following Eastment et al. for PCA41 and Martens et al. for OPLS-DA42. The data set is divided into 7 groups; one group is omitted, the model is built on the remaining groups, and the excluded group is predicted. This is repeated for each group, and the prediction results reveal the number of significant components, which is provided in Supplementary Table S4. OPLS-DA with gender as the classifier revealed differences in the lipidome. Consequently, the data sets for females and males were treated separately when investigating the prediction performance.
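The preprocessing of a single variable (one lipid species across all samples) can be sketched as follows. The sketch assumes a base-10 logarithm and the sample standard deviation, neither of which is stated in the text; Pareto scaling is taken in its usual sense of dividing the centered values by the square root of the standard deviation. The input values are hypothetical.

```python
import math
import statistics


def log_pareto(values):
    """Log10-transform one variable, mean-center it, and Pareto-scale it."""
    logged = [math.log10(v) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)  # sample standard deviation (n - 1)
    if sd == 0:
        # A constant variable carries no information; return centered zeros.
        return [0.0 for _ in logged]
    scale = math.sqrt(sd)  # Pareto scaling divides by sqrt(SD), not SD
    return [(x - mean) / scale for x in logged]


# Hypothetical concentrations of one lipid species in three samples.
concentrations = [1.0, 10.0, 100.0]
scaled = log_pareto(concentrations)
```

Dividing by the square root of the standard deviation rather than by the standard deviation itself reduces the dominance of high-abundance species without inflating noise in low-abundance ones as strongly as unit-variance scaling would.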
Microsoft Excel was used to calculate the average lipid concentrations for all sample groups, the fold change, the T-value, and the p-value. The p-values were calculated with a two-sided two-sample T-test assuming unequal variances (Welch test) between the samples obtained from healthy controls and those from kidney, breast, or prostate cancer patients. P-values < 0.05 were considered significant, but the p-values were further evaluated according to the Bonferroni correction. All statistical parameters for all lipids are summarized in Supplementary Tables S3 and S6, below the lipid concentrations measured in individual samples, and in Supplementary Table S10. Another parameter indicating relevance for differentiating samples from healthy controls and cancer patients is the variable importance in the projection (VIP) value obtained for each OPLS-DA model. The most regulated and statistically significant lipid species, with a fold change of ± 20%, a p-value < 0.05, and a VIP value > 1 for all methods and phases, are summarized in Supplementary Table S10. Box plots were used to better visualize lipid species concentrations depending on the health state. The box plots were constructed in the R free software environment (https://www.r-project.org) using the readxl and ggplot2 packages. In each box plot, the median was represented by a horizontal line, the box represented the 1st and 3rd quartile values, the whiskers stood for 1.5*IQR from the median, and each measurement was plotted as a jittered point. The receiver operating characteristic curves were generated by using the packages readxl and AUC in R. The dendrograms were also constructed in R43. For the circular dendrograms, the Euclidean distances were calculated, and then the upgma function from the phangorn library was used for clustering (the Ward agglomeration method was selected). Circular dendrograms were generated and surrounded by the heatmap (ggtree and gheatmap functions from the ggtree library).
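The univariate statistics named at the start of this paragraph (fold change, Welch T-value, and Bonferroni-corrected significance level) can be sketched as follows. The calculations in the study were done in Microsoft Excel; the function names and concentrations here are hypothetical. Converting the T-value into a p-value needs the Student t distribution, which is not part of the Python standard library, so that step is left out.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (T-value, degrees of freedom) for the Welch unequal-variance test."""
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a) / na  # squared standard error, group A
    vb = statistics.variance(group_b) / nb  # squared standard error, group B
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def fold_change(cases, controls):
    """Ratio of the mean concentration in cases to that in controls."""
    return statistics.fmean(cases) / statistics.fmean(controls)


def bonferroni_threshold(alpha, n_tests):
    """Significance level per test after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical concentrations of one lipid species.
controls = [10.0, 12.0, 11.0, 13.0]
cancer = [7.0, 8.0, 9.0, 8.0]

t_value, dof = welch_t(cancer, controls)
fc = fold_change(cancer, controls)
threshold = bonferroni_threshold(0.05, 100)  # e.g. 100 lipid species tested
```

For these hypothetical values the fold change is about 0.70, i.e., a decrease of roughly 30% in the cancer group.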
For the heatmap presentation, all concentrations were min-max scaled.
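Min-max scaling maps each set of values onto the range from 0 to 1. A minimal sketch follows, assuming that the scaling is applied separately to each lipid species (the text does not state whether it was done per species or over all values); the concentrations are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly so that the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to scale, return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical concentrations of one lipid species in three samples.
scaled = min_max_scale([2.0, 4.0, 6.0])
print(scaled)  # prints [0.0, 0.5, 1.0]
```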