Data Preprocessing
Before the data analysis can begin, all data needs to be converted to the same dimensions so that the datasets are comparable. To achieve this, all climate datasets are first converted to the same filetype, for which we chose the Network Common Data Form (NetCDF). This filetype is a convenient choice for geographic climate analysis that varies over time. Both the ERA5 Reanalysis (Copernicus Climate Institute, 2019) and the GEFS reforecast (NOAA, 2019) data can be converted to this filetype without any data loss. To display the reformatted climate data, the NetCDF files are loaded into the Python programming language and plotted on a global map with continental outlines, using the World Geodetic System 1984 (WGS84) coordinate system, as shown in Figs. 1 and 2. All Python code used for the analysis and the creation of these maps can be found in the appendix of this paper.
As seen in Figs. 1 and 2, the reanalysis and reforecast datasets show some differences. While this can partially be attributed to modeling inaccuracies, the datasets also use different spatial resolutions and temporal units. To make the datasets comparable, both need to be brought to the same spatial and temporal resolution. For the spatial resolution, this is relatively straightforward: since the reanalysis grid is exactly four times finer than the reforecast grid in each dimension, it is sufficient to subsample the reanalysis by selecting every fourth cell along both axes. This yields a grid of 360 longitude by 180 latitude cells, as opposed to the reanalysis’ initial 1440 x 720 resolution.
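The spatial subsampling described above can be sketched as follows; the array shapes mirror the grids described in the text, but the field itself is synthetic:

```python
import numpy as np

def coarsen_to_reforecast_grid(reanalysis_field):
    """Keep every fourth cell along latitude and longitude.

    The reanalysis grid (720 x 1440 cells) is four times finer than
    the reforecast grid (180 x 360 cells) in each dimension, so
    simple subsampling aligns the two grids.
    """
    return reanalysis_field[::4, ::4]

# Synthetic stand-in for one reanalysis temperature map.
field = np.random.rand(720, 1440)
coarse = coarsen_to_reforecast_grid(field)
print(coarse.shape)  # (180, 360)
```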
The GEFS reforecast dataset starts on the 1st of June 1985 and ends on the 16th of September 2020, while the ERA5 reanalysis starts on the 1st of January 1980 and ends on the 1st of August 2020. To resolve these mismatching timeframes, the temporal extent of both datasets was clipped to their common overlap, from the 1st of June 1985 until the 1st of August 2020. Turning to the temporal resolution, the reforecast contains data for every day of the month, whereas the reanalysis only contains data for the first day of every month. The reforecast data is therefore reduced to match the timesteps of the reanalysis, meaning the final data for our analysis refers to the first day of each month between June 1985 and August 2020.
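Assuming the timestamps are held as a pandas DatetimeIndex, clipping to the overlap and keeping only the first day of each month might look like this sketch:

```python
import pandas as pd

# Daily reforecast timestamps spanning the full GEFS period.
reforecast_times = pd.date_range("1985-06-01", "2020-09-16", freq="D")

# Keep only first-of-month dates within the common overlap,
# which ends with the reanalysis on 2020-08-01.
monthly = reforecast_times[
    (reforecast_times.day == 1) & (reforecast_times <= "2020-08-01")
]
print(monthly[0], monthly[-1])
```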
Reducing the temporal and spatial resolutions of the datasets discards a significant amount of data. However, the initial climate data was abundant enough that this poses no problem: the retained data consists of 180 x 360 cells per map across 422 moments in time, resulting in more than 27 million datapoints for our analysis. This wealth of datapoints has proven to be more than enough to train and validate any of our machine learning models. The data is now ready to be compared, spanning the same spatial and temporal dimensions, while still holding a vast amount of information.
Creating a training and test set
When training our model, it is important that the data on which it is trained and the data on which it is tested are kept separate, so that we do not assess the model on the same data we train it on. This prevents overfitting to the training data and yields a more realistic accuracy assessment, since the final version of the model is validated on data it has never seen before. Research has been conducted into the ideal Train/Test ratio (Pawluszek-Filipiak & Borkowski, 2020; Rácz et al., 2021), but the outcome has proven to be rather dependent on the type of data used (Pawluszek-Filipiak & Borkowski, 2020). Regardless, Train/Test ratios of either 70%/30% or 80%/20% are generally accepted and have been shown to perform well, specifically on large datasets (Rácz et al., 2021) like ours. For our split, we allocated 75% of the data to the training set and the remaining 25% to the test set used for validating the model after training. Accordingly, we shuffle all the data and randomly separate it into a training and a test set.
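A minimal sketch of this shuffle-and-split step, using a synthetic feature matrix in place of the real dataset:

```python
import numpy as np

def shuffle_split(data, test_fraction=0.25, seed=42):
    """Shuffle all rows, then split into training and test sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))
    n_test = int(len(data) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return data[train_idx], data[test_idx]

# Synthetic dataset of 1000 datapoints with 8 feature columns.
data = np.random.rand(1000, 8)
train, test = shuffle_split(data)
print(train.shape, test.shape)  # (750, 8) (250, 8)
```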
Data Validation
Before beginning to tune the reforecast model output to match the reanalysis more closely, it is important to ascertain that the ERA5 reanalysis data is in fact more accurate than the GEFS climate model output. It is known that reanalysis datasets still contain some errors (Moalafhi et al., 2017), so it is crucial to confirm that the reanalysis dataset is an accurate representation of real-world climate variables before using it as test data. To validate the accuracy and reliability of both datasets, we use a validation dataset containing station measurements directly sourced from the European National Meteorological and Hydrological Services (NMHSs): the E-OBS meteorological data for Europe derived from in situ observations dataset (Copernicus Climate Change Service, 2021). While possibly still prone to some observational and interpolation bias, the temperature data in this dataset is derived directly from in situ climate observations made by NMHSs stations and can therefore be considered very close to the real climate variables (Augustine et al., 2005; Sinisalo et al., 2013). We test the accuracy of both the reanalysis and the reforecast datasets against this ground-truth baseline.
As seen in Fig. 3, the validation set does not exactly match the extent of the other datasets, as the observations on which it is based only span land areas within Europe. Therefore, we clip the extent of the reforecast and reanalysis datasets to match this extent over Europe as well. Additionally, the 25,933 different timestamps present in the validation set are reduced to exactly the 422 moments chosen for our analysis. The validation set still contains some missing values in water areas across this European extent, but these regions were ignored in the error estimation. After bringing the observational, reanalysis, and reforecast datasets into the same shape, the mean temperature for each map was calculated over this extent, as shown in Table 1.
Table 1
Mean European Land Temperature Comparison

Model | Mean Temperature (K)
E-OBS Observational data | 281.84
ERA5 Reanalysis data | 282.09
GEFS Reforecast data | 279.22
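The per-map means in Table 1 were taken over the clipped European extent while ignoring the missing water cells; a minimal numpy sketch with illustrative values:

```python
import numpy as np

def mean_land_temperature(field):
    """Spatial mean over one map, ignoring missing (NaN) water cells."""
    return np.nanmean(field)

# Illustrative 2 x 2 clipped field with one missing water cell.
field = np.array([[281.0, np.nan],
                  [282.0, 283.0]])
print(mean_land_temperature(field))  # 282.0
```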
While all three mean temperatures are relatively close, the observational mean is substantially closer to the reanalysis mean than to the reforecast mean. While this is an indication that the reanalysis reflects real climate variables more accurately, it is not yet a robust measurement of model error. To quantify these discrepancies more precisely, we use the Root Mean Square Error (RMSE) to represent model inaccuracy. The RMSE yields a single error score representing the difference between two climate maps, while weighing large outliers more heavily. The resulting error measure is expressed in the units of the variable itself (here Kelvin) and has previously been used for similar climate model tuning applications (Chang & Guillas, 2019; Grönquist et al., 2019). If we have a set of N model-predicted climate variables Ŷ (in this case temperature), and a set of N reference climate variables Y, the RMSE formula is defined as:
Formula 1
RMSE
$$RMSE = \sqrt{\sum_{i=1}^{N}\frac{(\hat{Y}_{i} - Y_{i})^{2}}{N}}$$
Using the observational data as reference temperature, the RMSE of the reforecast data is 3.22. When testing the reanalysis’ accuracy on the observations, the reanalysis RMSE is 0.38, showing that the reanalysis data is substantially closer to the observed data than the reforecast data. This confirms that the reanalysis dataset is a relatively accurate and authoritative dataset suitable for climate model tuning. While it does not perform perfectly on the validation data and still has an error score of 0.38, this is a large improvement over the reforecast’s error score of 3.22. This indicates that if we can tune our climate model to reflect the reanalysis more accurately, it will be a more realistic estimation of real-world climate variables as well.
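Formula 1 translates directly into numpy; the values below are toy numbers, not the actual maps:

```python
import numpy as np

def rmse(predicted, reference):
    """Root Mean Square Error between a predicted and a reference map."""
    return np.sqrt(np.mean((predicted - reference) ** 2))

# Toy temperature values in Kelvin.
y_hat = np.array([280.0, 282.0, 284.0])
y = np.array([281.0, 282.0, 283.0])
print(round(rmse(y_hat, y), 3))  # 0.816
```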
To define a baseline RMSE against which to compare our correction models, we measure how accurately the reforecast data reflects the reanalysis. Comparing the reforecast maps against the reanalysis yields an RMSE of 2.90 with the reanalysis data as reference. Any of our correction models needs to score below this baseline RMSE to constitute an accuracy improvement. This RMSE of 2.90 is lower than the RMSE of 3.22 which the reforecast attained against the observational data, showing that the reforecast is closer to the reanalysis than it is to the observations. This is not entirely surprising, as the reanalysis dataset also employs short-range weather forecasting models (Hersbach et al., 2020), which may contain incorrect parameterisations similar to the ones present in the GEFS. This phenomenon, and how our correction model could potentially help reduce this bias, is further examined in the discussion section of this paper.
Despite the reanalysis’ slight inaccuracy, it is still deemed a valid test set, since it is much more accurate than the reforecasted climate data. Therefore, we quantify climate model error by subtracting the reforecast temperature from the reanalysis temperature. This yields global maps of climate model inaccuracy, such as those seen in Fig. 4. Under this definition, a positive model error indicates that the prediction was too low and that the model temperature (reforecast) should have been higher, while a negative error value indicates that the prediction should have been lower. This model error is calculated globally across all 422 timestamps, creating 422 unique inaccuracy maps. The inaccuracy patterns found in these maps differ considerably across years, seasons, and regions. To give an indication of what these varying patterns look like, a sample of six inaccuracy maps, equally distributed across the chosen timeframe, is displayed in Fig. 4.
While some common patterns can be recognised in these maps, it is difficult to estimate the prevalence of a specific pattern across the years by simply inspecting the individual maps. To quantify the climate inaccuracy in a more generalized way, we take the temperature data from all 422 maps and average their error. This yields a single, all-encompassing climate inaccuracy map, which shows the average inaccuracy at each grid cell of the GEFS from 1985 until 2020 (Fig. 5). The global patterns of inaccuracy displayed in Fig. 5 are further analyzed in the results section of this paper.
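Computing the per-timestamp error maps and their temporal average can be sketched as below; the stacks are synthetic and smaller than the real 422-map dataset:

```python
import numpy as np

# Synthetic (time, lat, lon) stacks; the real analysis uses 422 maps.
reanalysis = 281 + np.random.rand(6, 180, 360)
reforecast = 280 + np.random.rand(6, 180, 360)

# Positive error: the reforecast prediction was too low.
model_error = reanalysis - reforecast       # one inaccuracy map per timestamp
mean_error_map = model_error.mean(axis=0)   # single averaged inaccuracy map
print(mean_error_map.shape)  # (180, 360)
```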
Adding more training data
The data currently in our dataset is still relatively meagre for making the more complex inaccuracy predictions needed to achieve a stronger RMSE reduction. Currently, one datapoint consists of its longitude, latitude, and time values, the reforecasted (model) temperature, the reanalysis (considered accurate) temperature, and the difference between these two temperatures (the error we are trying to predict). While this already offers a large amount of information about climate model inaccuracy, any machine learning model we train is likely to perform better if given more variables to train on. A condition for any data added to the training and test datasets is that it needs to be consistently available over time, since we want our error correction model to be applicable to future climate model predictions as well. Therefore, we cannot include variables such as measured humidity or solar radiation, since this data is not available for future model predictions. What we can add, however, are other climate variables forecasted by the GEFS, as these predictions can also be made for the future. The GEFS contains predictions of many different physical processes and climate variables, which may prove helpful in our analysis and prediction of temperature inaccuracy. As such, we add the GEFS reforecasted Precipitation and Cloud Cover variables to the training and test sets and convert them to the spatial and temporal resolution previously described.
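Appending the extra forecast variables to a flattened table of datapoints might look like the following; all column names and values are illustrative, not the actual dataset schema:

```python
import numpy as np
import pandas as pd

# Illustrative flattened datapoints: one row per grid cell per timestamp.
n = 5
df = pd.DataFrame({
    "lon": np.linspace(-180, 179, n),
    "lat": np.linspace(-90, 90, n),
    "reforecast_temp": 280.0 + np.arange(n),
    "reanalysis_temp": 281.0 + np.arange(n),
})
df["model_error"] = df["reanalysis_temp"] - df["reforecast_temp"]

# Extra GEFS-forecast predictors, regridded to the same resolution.
df["precipitation"] = np.random.rand(n)
df["cloud_cover"] = np.random.rand(n)
print(list(df.columns))
```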
Land use classification
A less straightforward addition to our dataset relates to the idea that land use may influence temperature inaccuracy, which is explored further in the results section. To test and quantify this hypothesis, it would be ideal to add a global land use type dataset to the analysis. However, most publicly available land use data does not suit our purpose, either due to impractical filetypes and/or shapes, or because of an unnecessarily high number of distinct land use classes. To obtain the most suitable land use data for our specific purpose, we therefore developed our own custom land use dataset on a global scale.
To start this land use classification, we georeference a high-resolution file of global satellite images (NASA, 2008) to the WGS84 coordinate system. After appending the appropriate coordinates to the global image, we use supervised image classification in the GIS software ArcMap to begin the land use classification process. For this classification, we define five classes of global land use: water, ice, land, shallow water, and desert. For each of these classes, several training samples are created on NASA’s georeferenced satellite map, as shown in Fig. 6.
After defining the training sets for the land use types, the classification model is trained to recognize the visual aspects of each different class, based on the training sample squares shown in Fig. 6. Maximum likelihood image classification is then used to assign each global coordinate to one of the five land use classes, resulting in a global land use map as shown in Fig. 7:
A confusion matrix accuracy assessment (Yi & Zhang, 2012) was conducted, yielding a kappa accuracy coefficient of 0.91 for this classification, which is above the generally accepted 0.81 threshold of “almost perfect agreement” (McHugh, 2012). Therefore, our custom land use classification dataset is deemed sufficiently accurate for use in our analysis.
To implement the different land use classes in our training and test sets, five dummy variables are defined to represent the five land use classes. Each dummy variable evaluates to 1 if the datapoint matches its corresponding land use class, and to 0 otherwise. For example, if a coordinate’s land use belongs to the ‘Ice’ class, the ‘Ice’ dummy variable equals 1 and all other dummy variables equal 0. These variables, together with the newly created Precipitation and Cloud Cover variables, are appended to the existing datasets. All this data is then written to a large Comma Separated Values (CSV) file used to train our models, which is once again split into separate training and test sets. We use a CSV file specifically because it is a convenient filetype for modifying and exporting data between the ArcMap image classification and the Python analysis.
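The dummy-variable encoding and CSV export can be sketched with pandas; the labels, column names, and filename are placeholders:

```python
import pandas as pd

# Illustrative land use label per datapoint.
df = pd.DataFrame(
    {"LandUse": ["Ice", "Land", "Water", "Desert", "ShallowWater"]}
)

# One dummy column per class: 1 where the class matches, 0 otherwise.
dummies = pd.get_dummies(df["LandUse"]).astype(int)
df = pd.concat([df, dummies], axis=1)

# Export for training; the filename is a placeholder.
df.to_csv("training_data.csv", index=False)
print(df.loc[0, "Ice"], df.loc[0, "Land"])  # 1 0
```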
Table 2: An Example of the Training Set CSV. ‘ModelError’ is the Variable we are Training to Predict.