2.1. Draft reconstruction and lipid metabolism
For the first draft of the genome-scale metabolic reconstruction of L. starkeyi, denominated lista-GEM, we used two well-curated GEMs as templates: Y. lipolytica iYali 4.1.2 (Kerkhoven et al., 2016)d toruloides rhto-GEM 1.3.0 (Tiukova et al., 2019). First, we identified the reactions from orthologs between the L. starkeyi genome NRRL Y-11557 (NCBI ID: 10576) and the Y. lipolytica or R. toruloides using bidirectional BLASTp (Madden, 2013). We considered as orthologs the genes with e-value < 1 x 10− 20, identity > 35%, and alignment length > 150 bp. We excluded the reactions in iYali that were already present in rhto-GEM or that simplified lipid metabolism. Then, we retrieved the pseudo-reactions (e.g., biomass formation and exchange reactions) from rhto-GEM. We performed the reconstruction steps using the RAVEN Toolbox 2.7.9 (Wang et al., 2018) in MATLAB (The MathWorks Inc., Natick, Massachusetts).
To represent the lipid metabolism in lista-GEM, we used the Split Lipids Into Measurable Entities (SLIMEr) formalism (Sánchez et al., 2019), which describes lipids by splitting them into their basic components, such as pseudo-reactions that describe both the lipid classes and the acyl chain distributions. Here, we incorporated the following acyl chains of biotechnological importance: 16:0, 16:1, 18:0, 18:1, 18:2, 18.3.
2.2. Biomass composition
From the total content of lipids, proteins, carbohydrates, RNA, and DNA retrieved from experimental measurements (Anschau et al., 2014; Matsuzawa et al., 2018; Probst and Vadlani, 2015), we updated the biomass equation of the rhto-GEM template and used it for lista-GEM. We also updated the biomass composition using data from glucose continuous cultures at a dilution rate of 0.06 h− 1 (Anschau et al., 2014). We calculated the distribution of deoxyribonucleotides based on the GC content (47%) of L. starkeyi genome, as well as the sum of mRNAs and ncRNAs. For the amino acid distribution, we calculated it from the amino acid composition of translated coding sequences. We collected the contribution of triacylglycerols (TAGs), sterols, free FAs, phosphatidylcholine (PC), phosphatidylethanolamine (PE), phosphatidylinositol (PI), phosphatidylglycerol (PG), phosphatidylserine (PS), cardiolipin, and diacylglycerols (DAGs) from Probst and Vadlani (2015) and Uzuka et al. (1974). Considering data from Calvey et al. (2016), Matsuzawa et al. (2018), and Takaku et al. (2020), we adjusted the FA profile for the chains 16:0, 16:1, 18:0, 18:1, 18:2, and 18.3. The calculation procedures used to define the stoichiometric coefficients are provided in the lista-GEM documentation biomassCalculations.xlsx file available in the GitHub repository and Zenodo archive (See Data availability).
2.3. Gap-filling, manual curation, and quality assessment
The gap-filling of lista-GEM was conducted in two steps. In the first step, we used Meneco (Prigent et al., 2017) to identify the reactions required for the biosynthesis of biomass components (target compounds) based on a list of available metabolites (seeds). The reactions identified by Meneco were retrieved from rhto-GEM. However, after Meneco was applied, we noticed that the model could still not sustain growth (i.e. produce biomass). Thus, in the second step, we used the “fillGaps” function from the RAVEN Toolbox. We considered growth on glucose (1 mmol/gDW h) at a biomass production rate of 0.01 h− 1. The reactions required to sustain biomass formation were then retrieved from rhto-GEM and iYali templates. Finally, we noted that three reactions included from iYali ('y300065', 'y300066', 'y200008') were not required for growth and led to water and H+ overproduction in rich media simulations and removed them.
After the gap-filling step, we included the specific reactions required by L. starkeyi to sustain growth on the specified carbon sources and to meet cofactor requirements. In contrast to other oleaginous yeasts, the malic enzyme of L. starkeyi preferably uses NAD+ instead of NADP+ as a cofactor (Tang et al., 2010). Thus, we removed the malic enzyme reaction that used NADP+ and maintained only the one that uses NAD+. Additionally, we manually included the reactions necessary for L-rhamnose, lactose, cellobiose, and levoglucosan utilization. Finally, we updated the gene-reaction rules (grRules field), replacing the genes in the model that still contained the identification from the template (R. toruloides) with L. starkeyi homologs. The non-growth associated maintenance reaction remained the same as in rhto-GEM due to the lack of available data for L. starkeyi. We assessed the quality of the final reconstruction using MEMOTE (Lieven et al., 2020) (Fig. 1A).
2.4. Simulations and validation
To quantitatively assess the growth of L. starkeyi on different carbon sources (glucose, acetate, arabinose, cellobiose, citrate, ethanol, galactose, lactose, levoglucosan, xylose, rhamnose, R-lactate, S-lactate, mannose, trehalose; see Fig. 1B) in minimal medium, we first constrained the lower bound of exchange reactions to zero and left only the oxygen, ammonium, H+, iron, phosphate, potassium, and sulfate exchange reactions unconstrained. Then, we allowed the uptake of each carbon source at -3 mmol/[g dry weight (DW) h] and optimized the formation of biomass via Flux Balance Analysis (FBA). To simulate growth on rich media, we also set the uptake of amino acids to -1 mmol/(g DW h).
To quantitatively assess model performance, we compared the experimental growth rate gathered from the literature with our predictions. Data were available for glucose, xylose and glycerol. When available, we set the carbon uptake rate as described in the manuscript. If not, we assumed a value of -3 mmol/(g DW h). The media (minimal or rich) was also adjusted based on the source manuscript description and the biomass formation was optimized using FBA. The correlation between in vivo and in silico growth data was determined by the Pearson’s correlation coefficient (Fig. 3).
Furthermore, we conducted phase plane analysis in two different scenarios to determine conditions that would favor growth and lipid production in three carbon sources found in agro-industrial wastes (glucose and xylose from lignocellulosic biomasses and glycerol from biodiesel production). In the first scenario, we varied the carbon source uptake [from 0 to -10 mmol/(g DW h)] and oxygen uptake [from 0 to -50 mmol/(g DW h)] rates, while maintaining the other components of the minimal media as described above unconstrained. In the second scenario, instead of constraining carbon and oxygen uptake, we constrained nitrogen [from 0 to -9 mmol/(g DW h)] and oxygen [from 0 to -27 mmol/(g DW h)] uptake rates. We optimized the growth considering the biomass formation equation as described above, and simulated lipid production by optimizing a pseudoreaction representing TAG (1–16:0, 2–18:1, 3–18:1) exchange via FBA.
Moreover, we evaluated the main reactions related to lipid accumulation in nitrogen-limiting conditions using the environmental version of minimization of metabolic adjustment (eMOMA) (Kim et al., 2019). First, the same pseudoreaction described above to represent TAG exchange was added to the model, and the lower bound of the non-growth associated maintenance reaction (NGAM) was set to a low value [0.5 mmol/(g DW h)] to represent stationary growth. Then, we blocked the exchange reactions for ethanol, trehalose, butanediol, pyruvate, fumarate, 2-oxoglutarate, malate, oxaloacetate, glyoxylate, and acetate since we did not find evidence regarding the excretion of these metabolites for L. starkeyi under nitrogen-limiting conditions. We also blocked the exchange of decanoate, palmitate, palmitoleate, oleate, 14-demethyllanosterol, episterol, ergosterol, fecosterol, lanosterol, zymosterol, and ergosta-5,7,22,24(28)-tetraen-3beta-ol to promote TAG accumulation. Then, we set the growth as objective and performed FBA to obtain the flux distribution under non-restricted conditions (minimal media). Next, we blocked nitrogen exchange to simulate nitrogen restriction and confirmed that the model could not predict the growth and conducted the traditional MOMA between the model with and without nitrogen restriction. To test the reactions that affect lipid accumulation via knockout or overexpression, we removed reactions with zero flux in both conditions. Thereafter, we performed the eMOMA by knocking out or overexpressing (2x higher flux) the remaining reactions. We kept reactions where at least 2% increase in TAG exchange compared to the nitrogen-restricted reference and at least 90% growth remained compared to the nitrogen-abundant reference condition. We conducted eMOMA simulations for glucose, xylose, and glycerol at a fixed carbon uptake of -3 mmol/(g DW h).
Finally, we predicted overexpression targets to improve lipid production using glucose, xylose, and glycerol as carbon sources via flux scanning based on enforced objective flux (FSEOF) analysis(Choi et al., 2010). We performed the simulations considering minimal media, set the NGAM to 0 mmol/(g DW h), the TAG exchange pseudoreaction as the target, and the carbon uptake to -3 mmol/(g DW h). We conducted all simulations using the RAVEN Toolbox (v. 2.7.9) and/or the COBRA Toolbox (v. 3.4) (Heirendt et al., 2019) in MATLAB (The MathWorks Inc., Natick, Massachusetts) using Gurobi® (v. 10.0) as the solver.