Oceans are changing rapidly under the effect of climate change. Both biophysical and socio-ecological impacts are already being felt 1 and are expected to increase considerably in the coming decades 2 as a result of the increasing frequency of sea surface temperature extremes 3,4 and of sea level rise 5,6, which are likely to compound the effects of deoxygenation 7 and acidification 8,9, leading to an increased risk of ‘perfect storm’ or compound-event conditions 10. The ability to forecast future oceanographic conditions is thus crucial for addressing both long-term trends and short-lived extreme events, and for protecting marine and coastal environments and the ecosystems and people that depend on them 11. This work contributes to that effort with a specific focus on forecasting sea surface temperature (SST), which in many regions exhibits both a long-term trend and increased variability in the form of extreme events known as marine heatwaves 3. We focus on the seasonal time scale, as this is the horizon most useful to many marine decision makers 12,13.
There is considerable interest in SST forecasts, as temperature is readily observed from space and has been shown to influence the distribution and abundance of many marine species 11,12. Such forecasts have been attempted with several methods, including mechanistic modelling, statistical modelling, expert judgement and coupled dynamical models. In the last few years, machine learning (ML) techniques have been added to the forecasting toolkit 14–16. In a significant contribution to the literature 17, a convolutional neural network (CNN) trained on an ensemble of CMIP numerical models (see Methods) and on Simple Ocean Data Assimilation (SODA) 18 ocean reanalysis data was shown to outperform individual numerical models in forecasting the ENSO3.4 index at lead times of up to 18 months. This demonstrates that the output of numerical models can be used effectively, in addition to observations, to train an ML model. Following 19, we refer to this ML approach as H19.
One weakness of the H19 method is that a different ML model is needed for each combination of target forecast date and lead time, and each ML model is trained individually. As a result, when the ML models are used in forecasting mode, forecasts (or hindcasts) of consecutive months are produced by independent models and can thus display undesirable high-frequency fluctuations. To address this limitation, in 19 the authors train a single ML model on longer time series. Again following 19, we refer to this latter ML approach as H21.
Here we propose an alternative approach (summarised in Fig. 1) in which hindcasts produced by ML models with different lead times, as well as information about persistence and seasonal periodicity, are combined. We use the same ML architecture as in H19. However, unlike H19, we train our CNN with CMIP models only (see below for which sets of CMIP models have been used in different experiments) and do not use SODA reanalysis data for training. As in H19, this training provides a set of ML models, one for each target forecast date and lead time. In addition, from the output of the CMIP models, we extract information about persistence (system inertia) and seasonal periodicity (see step 1 ‘Skill Assessment’ in Fig. 1). Next, for each target forecast date, we assess the prediction skills of i) the ML models with different leads, ii) the persistence for different leads and iii) the contribution of periodicity (see step 2 ‘Skill Assessment’ in Fig. 1). These values are then combined (see step 3 ‘Skill-based Weights’) to generate weights proportional to the skill of each method. These weights reflect how well the different forecast methods performed on the training data set. We then use the same architecture in forecasting mode. First, we feed the data from the GODAS reanalysis 20 to the trained ML models and generate a set of SST forecasts for different target dates and leads. We also compute the persistence for different leads and the contribution of periodicity (‘Forecasts’ in Fig. 1). Next, we use the skill-based weights previously computed on the CMIP models at step 3 to combine the forecasts and generate the final forecast for each target forecast date (‘Skill-Weighed Forecast’ in Fig. 1). Hereafter, we refer to this approach as Skill-Weighed Forecast (SWF).
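The forecasting-mode combination described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the argument layout and the assumption that the pre-computed weights sum to one are ours, and how periodicity enters the combination is simplified to a single weighted term.

```python
import numpy as np

def swf_forecast(ml_forecasts, persistence, periodicity, weights):
    """Combine per-lead ML and persistence forecasts with the seasonal
    contribution, using pre-computed skill-based weights (step 3).

    ml_forecasts : ML forecasts for one target date, one value per lead
    persistence  : persistence forecasts for the same date, one per lead
    periodicity  : contribution of the seasonal cycle (scalar)
    weights      : dict with per-lead arrays 'ml' and 'p' and scalar
                   'seas'; assumed (here) to sum to 1
    """
    # Weighted sum over all leads plus the seasonal term: each component
    # contributes in proportion to its skill on the CMIP training output.
    return (np.dot(weights["ml"], ml_forecasts)
            + np.dot(weights["p"], persistence)
            + weights["seas"] * periodicity)
```

With identical component forecasts and weights summing to one, the combined forecast reproduces that common value, which is a quick sanity check on any weighting scheme of this form.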
The skill assessment and generation of the skill-based weights are described in detail in the Methods (Section 4.3) and summarised in Fig. 2. For a given target date, and for different lead times, the green and the red lines show the mean squared error (mse) of the ML forecasts (EML) and of the persistence (Ep), respectively. The dashed line shows the mse of the ‘null’ forecast of zero anomaly at each time step (EN, see Section 4.3). We define the skill of an ML forecast as EN-EML, shown as SML in Fig. 2. Similarly, Sp shows the skill of the persistence forecast. To generate a final forecast for the target date, the ML and persistence forecasts are weighted proportionally to their respective skills (see Section 4.3, Eq. 2). Finally, the difference Sp-SML gives the predictability gain of forecasting via ML compared to persistence.
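One plausible reading of this skill-proportional weighting (the exact form is given by Eq. 2 in the Methods, which we do not reproduce here) is sketched below. The clipping of negative skills to zero and the equal-weight fallback are our assumptions, not taken from the source.

```python
def skill_weights(e_ml, e_p, e_n):
    """Skill-proportional weights for the ML and persistence forecasts.

    e_ml, e_p : mse of the ML and persistence forecasts at a given lead
    e_n       : mse of the 'null' (zero-anomaly) forecast
    """
    # Skill = improvement over the null forecast; clipped at zero so a
    # method that is worse than the null forecast receives no weight
    # (our assumption).
    s_ml = max(e_n - e_ml, 0.0)
    s_p = max(e_n - e_p, 0.0)
    total = s_ml + s_p
    if total == 0.0:
        # Neither method beats the null forecast: fall back to equal
        # weights (also our assumption).
        return 0.5, 0.5
    return s_ml / total, s_p / total

def combine(f_ml, f_p, e_ml, e_p, e_n):
    """Skill-weighted combination of an ML and a persistence forecast."""
    w_ml, w_p = skill_weights(e_ml, e_p, e_n)
    return w_ml * f_ml + w_p * f_p
```

For example, with EN = 1.0, EML = 0.2 and Ep = 0.4, the skills are 0.8 and 0.6, so the ML forecast receives weight 4/7 and persistence 3/7.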
We define the cumulative prediction skill CPS as the sum of a method’s predictive skill over all lead times. The cumulative prediction skill gives an indication of the method’s overall prediction ability. For a completely unpredictable signal, we could do no better than using the null forecast at each lead time, and the cumulative prediction skill would then be zero. For a deterministic, perfectly predictable signal, we would have mse = 0 for each lead time (see Methods). The red and green areas in Fig. 3a and b show the cumulative prediction skill of persistence and ML, respectively. The difference between these two areas represents the cumulative gain of predicting via ML vs persistence (Fig. 3c).
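In code, the cumulative prediction skill and the cumulative gain reduce to simple sums over lead times; the sketch below assumes the per-lead mse values are supplied as equal-length sequences (function names are ours).

```python
def cumulative_prediction_skill(e_method, e_null):
    """CPS = sum over leads of (EN(lead) - E_method(lead)).

    e_method : per-lead mse of the forecast method
    e_null   : per-lead mse of the null (zero-anomaly) forecast
    """
    return sum(en - em for em, en in zip(e_method, e_null))

def cumulative_gain(e_ml, e_p, e_null):
    """Cumulative gain of ML over persistence (the area difference
    illustrated in Fig. 3c): CPS_ML - CPS_p."""
    return (cumulative_prediction_skill(e_ml, e_null)
            - cumulative_prediction_skill(e_p, e_null))
```

For a completely unpredictable signal, e_method equals e_null at every lead and the CPS is zero; for a perfectly predictable one, e_method is zero at every lead and the CPS reaches its maximum, the sum of the null-forecast errors.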