Society is becoming more conscious of the foods we consume and more interested in understanding the composition of specific dietary components. This interest is driven by the increasing awareness that diet influences our general health, including our microbiome, immune homeostasis, and even cognitive function. At the same time, our capacity to leverage untargeted metabolomics data, where many unidentified mass spectrometry signals are observed, has dramatically increased with the development of computational ecosystems such as GNPS/MassIVE1. There are now ~150-200 reported biomarkers of food intake2, but it has remained impossible to determine how unique these molecules are across different foods and food groups.
The Mass Spectrometry Search Tool (MASST)3, combined with a reference database of food metabolite data, can be a powerful tool for understanding the molecular landscape of foods. MASST is a mass spectrometry search engine that identifies all data files in the GNPS/MassIVE untargeted metabolomics repository that contain a spectral match to a query fragmentation (MS/MS) spectrum. We created a domain-specific MASST, called foodMASST (https://masst.ucsd.edu/foodmasst), to enable reporting of the search results in the context of foods and beverages (Figure 1a). As of Feb 2021, ~3,500 untargeted metabolomics files from different foods/beverages collected as part of the Global FoodOmics Project4 (GFOP) have been deposited in MassIVE, a public mass spectrometry repository. Each food sample includes a classification according to a customized food ontology with additional metadata. FoodMASST uses this reference dataset to determine the food/beverage items containing a query spectrum and to contextualize the molecule’s presence across foods. To make the tool usable by others in the community, we created a web interface to launch searches of known and unknown molecules with user-defined parameters and to report the food information associated with the MS/MS matches.
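To make the idea of a "spectral match" concrete, the following is a minimal sketch of a cosine similarity between two MS/MS peak lists. It is an illustration under stated assumptions, not the scoring code used by MASST: the function names, the square-root intensity scaling, the 0.02 fragment tolerance, and the two thresholds in `is_match` are all hypothetical choices.

```python
from math import sqrt


def _unit(peaks):
    """Square-root scale intensities and normalise the spectrum to unit length."""
    scaled = [(mz, sqrt(intensity)) for mz, intensity in peaks]
    norm = sqrt(sum(v * v for _, v in scaled))
    return [(mz, v / norm) for mz, v in scaled] if norm > 0 else []


def cosine_score(query, reference, tol=0.02):
    """Greedy cosine similarity between two lists of (m/z, intensity) peaks.

    Returns (score, number_of_matched_peaks). Each peak is used at most once.
    """
    q, r = _unit(query), _unit(reference)
    # All peak pairs whose m/z values agree within the tolerance
    candidates = [
        (q_int * r_int, a, b)
        for a, (q_mz, q_int) in enumerate(q)
        for b, (r_mz, r_int) in enumerate(r)
        if abs(q_mz - r_mz) <= tol
    ]
    candidates.sort(reverse=True)  # take the strongest pairs first
    used_q, used_r = set(), set()
    score, matched = 0.0, 0
    for product, a, b in candidates:
        if a in used_q or b in used_r:
            continue
        used_q.add(a)
        used_r.add(b)
        score += product
        matched += 1
    return score, matched


def is_match(query, reference, min_score=0.7, min_peaks=4, tol=0.02):
    """Hypothetical acceptance rule: enough shared peaks and a high enough score."""
    score, matched = cosine_score(query, reference, tol)
    return score >= min_score and matched >= min_peaks
```

In a search, a rule like `is_match` would be applied between the query spectrum and every MS/MS spectrum in each reference file, and a file would be reported if at least one of its spectra passes.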
To start a foodMASST job, the parent mass and the MS/MS spectrum details are entered into the web interface. Because GNPS is a cloud-based platform, all results are sharable with provenance and tied to user accounts. Once a job is completed, the results can be navigated through the landing page, which displays several links to reports (Figure 1b). To provide additional context, an automatic spectral library search is performed against more than 30 public spectral libraries in the GNPS/MassIVE ecosystem, including GNPS contributed libraries, Human Metabolome Database5, all three Massbanks6–8, and many others (for a list see https://gnps.ucsd.edu/ProteoSAFe/libraries.jsp), to determine whether the molecule is already known. “Dataset matches” navigates to all datasets (and files within those datasets) containing matches to the query spectrum. Reports specific to foodMASST can be found under “Foodomics Specific Analysis”. For each category in the food ontology, the proportion of matches is reported in “View Foodomics Specific Molecules” and visualized in “View Interactive Tree”. Metadata associated with the matching foods are reported in “View Matched Files”. For example, when the MS/MS spectrum for domoic acid, a potent neurotoxin from algal blooms in the ocean9, was searched (see data availability for job link), the only two matches obtained were associated with seafood, specifically freshly caught mackerel (Figure 1c).
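The per-category proportions described above can be sketched as follows. This is a simplified illustration, not the foodMASST implementation: the function name, the file names, and the category labels are hypothetical and do not reproduce the actual GFOP ontology. Each reference file carries a path through the ontology, and a file counts toward every level of its path.

```python
from collections import Counter


def category_prevalence(file_paths, matched_files):
    """Fraction of reference files under each ontology node that matched the query.

    file_paths    : dict mapping file name -> tuple of ontology levels, broadest first
    matched_files : set of file names that contain a match to the query spectrum
    """
    totals, hits = Counter(), Counter()
    for name, path in file_paths.items():
        # A file counts once for every node on its path, e.g. plant, plant/fruit, ...
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            totals[node] += 1
            if name in matched_files:
                hits[node] += 1
    return {node: hits[node] / totals[node] for node in totals}


# Toy reference set with made-up file names and categories
reference = {
    "orange_01.mzML": ("plant", "fruit", "citrus"),
    "lemon_01.mzML": ("plant", "fruit", "citrus"),
    "raspberry_01.mzML": ("plant", "fruit", "berry"),
    "goat_cheese_01.mzML": ("animal", "dairy"),
}
prevalence = category_prevalence(reference, {"orange_01.mzML"})
```

Reporting a proportion rather than a raw count matters because categories differ in how many samples they contain; two matches among four citrus files mean something different from two matches among four hundred.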
We performed additional representative MS/MS searches for six known molecules and one unknown. These include the biocides fenamidone, spirotetramat, and enilconazole; the plant pigment cyanidin; vitamin B5; the antibiotic tetracycline; and an unknown molecule with a precursor m/z of 457.257. Fenamidone is a fungicide with low use in the US10 and accordingly was detected in few samples (mushroom, spinach, and lettuce; Figure 2a). For an interactive example of the results landing page see https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=16d14b8efd134fcabe227dd6377db1b9. The “View Interactive Tree” link displays a visual representation of the food matches organized according to the GFOP ontology. Spirotetramat, an insecticide, also has low use in the US10 and on the foods sampled. It is nevertheless used mainly on citrus fruits and grapes, and it was detected in grapes, oranges, and cherries (Figure 2a). Enilconazole is a fungicide mainly used on citrus fruits10 and was detected in 31% of samples classified as citrus (Figure 2a). Interestingly, enilconazole is also used as an antifungal in veterinary medicine and was the only one of the three biocides searched that was detected in a non-plant sample (goat cheese). A search for cyanidin (Figure 2b), a plant pigment with a reddish-purple color11, returned teas (47% prevalence) and fruits; the highest prevalence was observed in strawberries (100%), raspberries (86%), and blackberries (75%). Vitamin B5, a ubiquitous metabolite, was detected in many samples but had the highest prevalence in animal-based foods and fungi (Figure 2c). We also searched for an antibiotic known to be used in farmed animals. Tetracycline12 (Figure 2d) was detected in beef (5%) and poultry (22%). Finally, we searched for an unknown compound (Figure 2e) that was detected in an Alzheimer’s clinical cohort. The unknown had the highest detection rate in rice (27%) and oat (17%) samples, which supports the hypothesis that it may be associated with dietary habits.
Links to the foodMASST jobs described above can be found in the data availability statement.
Some precautions are needed to prevent over-interpretation of the results. In addition, the foodMASST approach has limitations that are not specific to foodMASST but are general to MS/MS spectral matching based on untargeted metabolomics. For example, mass spectrometry data can be collected in positive or negative ion mode. The reference data are currently limited to positive ion mode, so molecules that ionize only in negative mode cannot be searched; however, the infrastructure can easily accommodate negative ion mode if the community chooses to provide such reference data. Another caveat is that two different molecules, especially structurally related isomers, can have nearly identical MS/MS spectra. Another common feature of mass spectrometry is that molecules may be ionized as different adducts (e.g., H+, Na+, K+, NH4+), and an untargeted metabolomics experiment commonly contains multiple adducts of each molecule. We encourage searching all adducts that have MS/MS information, because a spectrum containing only one, two, or three fragment ions cannot give an informative MS/MS match. Such spectra provide too little structural information to be reliable13, and therefore the use of low-information MS/MS spectral entries is discouraged. In general, the more ions in the query spectrum and the tighter the mass tolerances used for the search, the less likely it is that spurious matches are obtained.
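As a worked illustration of the adduct point, the sketch below computes the expected precursor m/z of the four singly charged positive-mode adducts named above from a neutral monoisotopic mass, and flags spectra with too few fragment ions. The function names are hypothetical, the mass shifts are rounded approximations (adduct mass minus one electron), and the threshold of four fragments is simply read off the "1, 2, or 3 fragment ions" statement rather than taken from foodMASST itself.

```python
# Approximate mass added to a neutral molecule M by each singly charged adduct (Da)
ADDUCT_SHIFTS = {
    "[M+H]+": 1.00728,
    "[M+NH4]+": 18.03383,
    "[M+Na]+": 22.98922,
    "[M+K]+": 38.96316,
}


def adduct_mzs(neutral_monoisotopic_mass):
    """Expected m/z of each singly charged adduct of a neutral molecule."""
    return {
        adduct: neutral_monoisotopic_mass + shift
        for adduct, shift in ADDUCT_SHIFTS.items()
    }


def is_informative(peaks, min_fragments=4):
    """True if a list of (m/z, intensity) peaks has enough fragment ions to search."""
    return len(peaks) >= min_fragments


# Domoic acid, C15H21NO6, has a neutral monoisotopic mass of approximately 311.1369 Da
domoic_acid_mzs = adduct_mzs(311.1369)
```

In practice one would look for each of these precursor m/z values in the experimental data, keep the adducts whose MS/MS spectra pass `is_informative`, and submit each of them as a separate query.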
Users may also be interested in structural analogs of a molecule in different foods, as such analogs are likely to have similar biological activities. Distributions of analogs can be discovered by searching in analog mode and reporting the neighbors of the searched spectrum in a molecular network. Analog searches also improve the discovery of MS/MS matches to data collected on different instruments or with different instrument settings.
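One common way to score analogs in molecular networking is a "modified" cosine, in which a fragment peak may match either at the same m/z or at an m/z shifted by the difference between the two precursor masses. The sketch below illustrates that idea only; it is not presented as the analog mode used by foodMASST, and the function names, square-root scaling, and tolerance are assumptions.

```python
from math import sqrt


def _unit(peaks):
    """Square-root scale intensities and normalise the spectrum to unit length."""
    scaled = [(mz, sqrt(intensity)) for mz, intensity in peaks]
    norm = sqrt(sum(v * v for _, v in scaled))
    return [(mz, v / norm) for mz, v in scaled] if norm > 0 else []


def modified_cosine(query, query_precursor, reference, reference_precursor, tol=0.02):
    """Greedy cosine score that also accepts peaks offset by the precursor mass shift.

    Returns (score, number_of_matched_peaks).
    """
    shift = query_precursor - reference_precursor
    q, r = _unit(query), _unit(reference)
    candidates = []
    for a, (q_mz, q_int) in enumerate(q):
        for b, (r_mz, r_int) in enumerate(r):
            direct = abs(q_mz - r_mz) <= tol            # fragment without the modification
            shifted = abs(q_mz - shift - r_mz) <= tol   # fragment carrying the modification
            if direct or shifted:
                candidates.append((q_int * r_int, a, b))
    candidates.sort(reverse=True)  # strongest pairs first, each peak used once
    used_q, used_r = set(), set()
    score, matched = 0.0, 0
    for product, a, b in candidates:
        if a in used_q or b in used_r:
            continue
        used_q.add(a)
        used_r.add(b)
        score += product
        matched += 1
    return score, matched
```

When the two precursor masses are equal the shift is zero and the function reduces to an ordinary cosine score, so an analog search is a superset of an exact search.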
The GFOP reference dataset will continue to grow. The community can contribute to the database that foodMASST uses by depositing LC-MS/MS-based metabolomics data, together with the food-specific metadata, into GNPS/MassIVE and then contacting the authors, who will inspect the contributed data and add them to the existing database. We anticipate that foodMASST will provide valuable insight into unknown MS/MS signals relevant to clinical studies and into known signals being considered as dietary biomarkers. More broadly, the enhancement of MASST for domain-specific reporting using well-curated reference datasets will undoubtedly prove useful for many research areas.