Materials
CaCl2 and paraformaldehyde were obtained from Sigma Aldrich. NaOH was obtained from Fisher Scientific. 1,3-Dihydroxyacetone dimer was obtained from Fluorochem EU. Formaldehyde solutions were prepared by depolymerization of paraformaldehyde at 60 °C; the final formaldehyde concentration was determined by titration with sodium sulfite and phenolphthalein43. Ultrapure water obtained from an Elga Purelab Chorus 1 was used for all aqueous solutions. Prior to use, water was degassed by stirring under vacuum for 10-15 minutes. Ion mobility-mass spectrometry experiments were performed with a timsToF instrument (Bruker Daltonics, Bremen, Germany) equipped with an electrospray ionization (ESI) source operating in positive mode.
Flow reactions
A CSTR (volume 435 μL) with five inlets and one outlet was fabricated from poly(methyl methacrylate) by the Radboud TechnoCentre. LabM8 syringe pumps with BD plastik syringes were used to control input flow rates (a schematic is given in fig. S1 and an image of the setup is provided in fig. S2). Syringes were loaded with the specified solutions and connected to the reactor with filled tubing. When the flow of water was divided between two syringes, the flows were merged using a Y connector upstream of the reactor. The reactor and a short length of outlet tubing were filled according to the initial conditions of the experiment. Once filled, the outlet tubing was capped with a one-way check valve and the system was allowed to build up pressure to overcome the cracking pressure of the valve. Subsequently, the reactor output was diluted with a water flow (0.8 mL/min) controlled by a Bruker Elute HPG 1300 HPLC. The dilution flow was merged with the reactor outlet using a Y connector. With a subsequent Y connector, the flow was split between the instrument and a Restek RT-25020 back pressure regulator connected to a waste line. The back pressure regulator provided a constant pressure of 2 bar in the reservoir.
Flow profile generation
Experimental conditions were selected based on previously published research6, to create high compositional diversity over the concentration range used. The general workflow consists of first generating a desired function and then scaling the generated function to a suitable input profile. To do so, the function was first mean-centered and then scaled with a manually chosen factor that maximized the amplitude without generating negative flows. After scaling, the baseline value of the corresponding syringe was added, so that the profile fluctuates around the initial input concentration. The flow rate of water was adjusted to counterbalance changes in the other flow rates, ensuring a constant total flow rate of 217.5 μL/min, corresponding to a residence time of two minutes. The reaction was allowed to equilibrate at steady state for at least 30 minutes prior to starting flow profiles, which included another initial 30 minutes of steady state. Details of the parameters used for generating the nonlinear classification data are given in SI section 3.1, details of the generation of the dynamic input in SI section 4.1, and details of the Lorenz attractor inputs in SI section 5.1.
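The scaling workflow described above can be sketched as follows. This is a minimal illustration rather than the code used in this work: the function names, the example signal, and the constant flow rates assigned to the other syringes are hypothetical, and only the reactor volume and total flow rate are taken from the text.

```python
REACTOR_VOLUME_UL = 435.0   # CSTR volume stated in the text
TOTAL_FLOW_UL_MIN = 217.5   # constant total flow rate stated in the text


def scale_profile(signal, baseline, factor):
    """Mean-center `signal`, scale it by `factor`, and add the syringe baseline."""
    mean = sum(signal) / len(signal)
    profile = [baseline + factor * (value - mean) for value in signal]
    if min(profile) < 0:
        raise ValueError("factor too large: profile contains negative flow rates")
    return profile


def water_compensation(profiles, total_flow=TOTAL_FLOW_UL_MIN):
    """Water flow at each time step so that all flows sum to `total_flow`."""
    water = [total_flow - sum(step) for step in zip(*profiles)]
    if min(water) < 0:
        raise ValueError("reagent flows exceed the total flow rate")
    return water


# Hypothetical example: one fluctuating input and two constant inputs (uL/min).
dha = scale_profile([0.0, 1.0, 2.0, 1.0, 0.0], baseline=20.0, factor=10.0)
naoh = [30.0] * len(dha)
formaldehyde = [50.0] * len(dha)
water = water_compensation([dha, naoh, formaldehyde])

# Residence time = reactor volume / total flow rate = 435 / 217.5 = 2 minutes.
residence_time_min = REACTOR_VOLUME_UL / TOTAL_FLOW_UL_MIN
```

The check for negative values mirrors the constraint that the scaling factor must not generate negative flows; in practice the factor was chosen manually.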
Mass spectrometry
Trapped ion mobility spectrometry (TIMS) experiments were performed using an N2 carrier gas by scanning inverse ion mobilities from 0.4 V·s·cm⁻² to 0.84 V·s·cm⁻². The ramp time was set to 500 ms and the accumulation time to 20 ms to minimize ion activation in the TIMS region. The mass range scanned by the time of flight (ToF) analyser was set to m/z 50-650. A complete description of the instrumental parameters is given in SI section 1.1.
Ion intensity extraction
A list of ions with reference m/z and inverse mobilities was established based on the most intense signals observed (SI section 1.2). Ion chromatograms were then extracted for mass- and mobility-selected ions based on the reference list of ions using the TimsPy library43. Ion chromatograms were extracted with a mass width of 0.02 Da and a mobility width of 0.006 V·s·cm⁻².
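The windowed extraction can be illustrated with a small sketch. It deliberately does not use the TimsPy API; instead it assumes that each frame is already available as a list of (m/z, inverse mobility, intensity) tuples. It also assumes that the stated widths are full window widths centered on the reference values, which the text does not specify. The function name and example data are hypothetical.

```python
def extract_chromatogram(frames, ref_mz, ref_mobility,
                         mz_width=0.02, mobility_width=0.006):
    """Sum, per frame, the intensity of peaks inside the m/z and mobility window.

    `frames` is a list of frames; each frame is a list of
    (mz, inverse_mobility, intensity) tuples. Widths are treated as full
    window widths centered on the reference values (an assumption).
    """
    half_mz = mz_width / 2
    half_mobility = mobility_width / 2
    chromatogram = []
    for frame in frames:
        total = sum(
            intensity
            for mz, mobility, intensity in frame
            if abs(mz - ref_mz) <= half_mz
            and abs(mobility - ref_mobility) <= half_mobility
        )
        chromatogram.append(total)
    return chromatogram


# Hypothetical data: two frames and a reference ion at m/z 200.000, 1/K0 = 0.600.
frames = [
    [(200.002, 0.601, 100.0), (200.050, 0.600, 40.0), (200.000, 0.700, 7.0)],
    [(199.999, 0.599, 80.0), (200.005, 0.602, 20.0)],
]
trace = extract_chromatogram(frames, ref_mz=200.000, ref_mobility=0.600)
```

In the first frame only the first peak falls inside both windows; the second is outside the mass window and the third is outside the mobility window.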
Nonlinear classification
For every input in the nonlinear classification dataset, ion signals were collected for 30 minutes (fig. S7-S8). The output in the last 10 minutes of this period was used as steady-state data, and signal intensities were averaged over this 10-minute period to reduce noise, resulting in 106-dimensional vectors for all 132 inputs. These vectors were subsequently normalized to remove the mean and scaled to unit variance across features. For the selected nonlinear classification tasks, a linear support vector classifier was trained to obtain classifications of the inputs (code available in the analysis/classification.ipynb notebook). For every task, a leave-5-out cross validation was performed with 100 repeats, and the Φ score was calculated for every repeat as
$$\Phi = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP denotes the number of true positives, TN denotes true negatives, FP denotes false positives, and FN denotes false negatives.
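The scoring and resampling steps can be sketched in plain Python. This is an illustrative sketch and not the contents of the classification notebook. It assumes that each repeat holds out five randomly chosen inputs, and it returns 0 when the denominator of the Φ coefficient vanishes; both choices are assumptions. The function names are hypothetical.

```python
import math
import random


def phi_score(y_true, y_pred):
    """Phi coefficient for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


def leave_k_out_splits(n_samples, k=5, repeats=100, seed=0):
    """Yield (train, test) index lists with k randomly held-out samples per repeat."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(repeats):
        test = sorted(rng.sample(indices, k))
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# 132 inputs, 5 held out per repeat, 100 repeats (values from the text).
splits = list(leave_k_out_splits(132))
```

For each split, a classifier would be fitted on the training indices and `phi_score` evaluated on the five held-out predictions. With so few held-out samples, a test set containing a single class is common, so the zero-denominator convention matters.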
Complex dynamics prediction
Next, the same DHA input flow was used as input into the formose reservoir, keeping the total flow rate constant by compensating the fluctuating DHA input with an inversely fluctuating water input. The response of the formose reservoir was collected every 500 ms for a duration of 3 hours, after which the output was averaged over bins of 10 seconds to reduce noise. A ridge regression model with the regularization strength set to α = 5×10⁻⁵ was trained on 60 minutes of data to map the recorded formose reservoir response onto the individual substrate time series of the model. The trained weights were then used to predict the substrate time series directly from the reservoir output for the remainder of the measurement time (code available in the analysis/dynamics.ipynb notebook).
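The binning and readout training can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the contents of the dynamics notebook: it uses a closed-form ridge solution with an unpenalized intercept, and the synthetic four-feature "reservoir" and linear target are hypothetical stand-ins for the measured ion intensities and the model substrate time series.

```python
import numpy as np


def bin_average(samples, samples_per_bin=20):
    """Average consecutive samples; 20 samples of 500 ms make one 10 s bin."""
    samples = np.asarray(samples, dtype=float)
    n_bins = samples.shape[0] // samples_per_bin
    trimmed = samples[: n_bins * samples_per_bin]
    return trimmed.reshape(n_bins, samples_per_bin, -1).mean(axis=1)


def ridge_fit(X, y, alpha=5e-5):
    """Closed-form ridge regression with an unpenalized intercept (assumption)."""
    x_mean, y_mean = X.mean(axis=0), y.mean(axis=0)
    Xc, yc = X - x_mean, y - y_mean
    weights = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return weights, y_mean - x_mean @ weights


def ridge_predict(X, weights, intercept):
    """Apply the trained linear readout to new reservoir states."""
    return X @ weights + intercept


# Synthetic stand-in: 3 h sampled every 500 ms, four hypothetical features.
rng = np.random.default_rng(0)
raw = rng.normal(size=(3 * 60 * 60 * 2, 4))
X = bin_average(raw)                       # 1080 bins of 10 s
target = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.5

n_train = 60 * 60 // 10                    # first 60 minutes = 360 bins
weights, intercept = ridge_fit(X[:n_train], target[:n_train])
prediction = ridge_predict(X[n_train:], weights, intercept)
```

The weights are fitted once on the first 60 minutes and then applied unchanged to the remaining two hours, which is the train-then-predict split described above.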
Forecasting
DHA, NaOH, and formaldehyde inputs were simultaneously varied according to the dynamics of a Lorenz attractor over a duration of 8 hours. The reservoir response was measured every 500 ms, after which the output was averaged over bins of 10 seconds to reduce noise. Next, ridge regression with the regularization strength set to α = 5×10⁻⁵ was used to train a mapping from the formose response to the input flows 2 minutes (120 seconds) into the future, using 30 minutes of data. The trained weights were then used to forecast the input flows 2 minutes into the future directly from the reservoir output for the remainder of the measurement time (code available in the analysis/forecast.ipynb notebook).
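The key step in the forecasting setup is aligning each reservoir output with the input flow 120 seconds later, which corresponds to a shift of 12 bins at 10 seconds per bin. The sketch below shows only this alignment and the 30-minute training split. The function names are hypothetical and integer indices stand in for the measured data.

```python
def forecast_pairs(features, targets, horizon_s=120, bin_s=10):
    """Pair the reservoir output at bin t with the input flow at bin t + horizon."""
    shift = horizon_s // bin_s              # 12 bins for 120 s at 10 s per bin
    if shift <= 0:
        raise ValueError("horizon must span at least one bin")
    return features[:-shift], targets[shift:]


def split_train_rest(X, y, train_minutes=30, bin_s=10):
    """Use the first `train_minutes` of aligned pairs for training."""
    n_train = train_minutes * 60 // bin_s   # 180 bins for 30 minutes
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


# Hypothetical stand-in data: 8 h of 10 s bins, with bin indices as values.
n_bins = 8 * 60 * 60 // 10
features = list(range(n_bins))
targets = list(range(n_bins))

X, y = forecast_pairs(features, targets)
(train_X, train_y), (rest_X, rest_y) = split_train_rest(X, y)
```

Because the last 12 reservoir outputs have no matching future input, the aligned series is 12 bins shorter than the recording. A ridge readout fitted on `train_X` and `train_y` would then be evaluated on the remaining pairs.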
Mutual information
Mutual information is defined for a pair of random variables X and Y as