2.1. Data Collection
Multiple high-quality databases covering a range of macroeconomic indicators and policy uncertainties were used in this study. Data sources included the World Development Indicators (WDI) from the World Bank, International Financial Statistics (IFS) from the International Monetary Fund, and several Economic Policy Uncertainty (EPU) indexes from Federal Reserve Economic Data (FRED). These indexes cover monetary, fiscal, trade, and financial regulation policies in addition to general economic uncertainty. The WDI dataset provides extensive coverage of macroeconomic variables (Azolibe, 2022), including GDP, inflation rates, trade balances, interest rates, and unemployment statistics. The data was accessed via the World Bank API using the Python wbdata library (Reddy & NR). The IFS dataset was downloaded using the pandasdmx Python library (Araujo, 2023); IFS provides a broader international perspective on monetary policy variables such as interest rates, exchange rates, and money supply. Economic Policy Uncertainty (EPU) indexes are critical for quantifying uncertainty across various policy domains (Yu et al., 2021). Table 1 shows the specific EPU indexes selected and their significance. These indexes were chosen to represent sector-specific uncertainties that directly affect policy predictions. Data was fetched via the FRED API using Python and integrated into the study's data pipeline. The final datasets contain multiple dimensions of macroeconomic and policy data ranging from 1 January 1985 to 8 October 2024.
Table 1
Selected EPU indexes and their role in policy prediction and uncertainty.
Index Name | Significance |
Monetary Policy Uncertainty Index | Measures uncertainty associated with interest rates, money supply, and inflation control by central banks. Essential for monetary policy analysis. |
Fiscal Policy Uncertainty Index | Captures uncertainty arising from government spending and tax policies. Important for understanding the fiscal policy landscape and its volatility. |
Trade Policy Uncertainty Index | Captures the unpredictability in government trade agreements, tariffs, and import/export regulations. Vital for analyzing trade policy impacts. |
Financial Regulation Uncertainty Index | Quantifies uncertainty related to financial market regulations. Helps assess the stability of financial systems and the impact of regulatory changes. |
Government Spending Uncertainty Index | Measures uncertainty in government expenditure plans, investments in infrastructure, defense, and social programs. Useful for evaluating fiscal sustainability. |
Equity Market-Related Economic Uncertainty Index | Measures uncertainty in the equity markets driven by macroeconomic news and expectations. Helps in assessing investment risk. |
Macroeconomic Inflation: News and Outlook | Focuses on inflation-related uncertainty, highlighting market participants' reactions to inflation expectations. |
Global Economic Policy Uncertainty Index | A comprehensive measure of economic policy uncertainty across the globe. Useful for understanding cross-country policy impacts. |
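The retrieval step described above can be sketched as follows. This is a minimal illustration, not the exact script used in the study: the WDI indicator codes, FRED series IDs, and API key are placeholders, and the IFS retrieval via pandasdmx follows the same pattern and is omitted here.

```python
# Minimal sketch of the data retrieval, assuming the wbdata and fredapi packages;
# indicator codes and FRED series IDs are illustrative placeholders.
import wbdata
from fredapi import Fred

COUNTRIES = ["DEU", "FRA", "GBR", "ITA", "JPN", "USA"]

# World Bank WDI: map indicator codes to readable column names
wdi_indicators = {
    "NY.GDP.MKTP.CD": "gdp_current_usd",    # GDP (current US$)
    "FP.CPI.TOTL.ZG": "inflation_cpi_pct",  # Inflation, consumer prices (annual %)
}
wdi = wbdata.get_dataframe(wdi_indicators, country=COUNTRIES)

# FRED: Economic Policy Uncertainty indexes (series IDs below are placeholders)
fred = Fred(api_key="YOUR_FRED_API_KEY")
epu_monetary = fred.get_series("EPU_MONETARY_PLACEHOLDER")
epu_trade = fred.get_series("EPU_TRADE_PLACEHOLDER")
```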
2.2. Data Preprocessing and Feature Scaling
The IFS dataset was filtered to retain only the relevant countries, i.e., Germany, France, the United Kingdom, Italy, Japan, and the United States. Columns with little to no relevance to the study's focus were excluded based on their indices; only essential macroeconomic indicators were retained, and a total of 139 columns were dropped from the original dataset. To avoid possible issues caused by mixed data types in the IFS file, the dataset was loaded with `low_memory = False` so that all columns could be handled without losing data integrity (Zhou, 2023). After filtering and removing columns, the cleaned IFS dataset was merged with the WDI dataset, creating a temporal alignment of observations from different sources. Missing values (NaN) were filled using the mean of the respective columns, and the combined data was saved in CSV format for further analysis. The variables of the combined dataset were then standardized to prepare it for statistical modeling and analysis. Standardization was carried out using `StandardScaler` from the Scikit-learn Python library, scaling the data so that every feature has zero mean and unit variance (Raju et al., 2020). Standardization ensures that features with differing scales (e.g., GDP in trillions vs. inflation rates in percentages) do not disproportionately influence model outcomes. Only numeric variables were scaled; the 'Date' column was excluded from this transformation. The scaled dataset was visualized through histograms to verify the standardization process.
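A minimal sketch of the cleaning, merging, imputation, and standardization steps described above; file names and the 'Date'/'Country' merge keys are assumptions for illustration.

```python
# Sketch of filtering, merging, mean imputation, and standardization (assumed file names).
import pandas as pd
from sklearn.preprocessing import StandardScaler

ifs = pd.read_csv("ifs_raw.csv", low_memory=False)  # avoid mixed-dtype issues
ifs = ifs[ifs["Country"].isin(["Germany", "France", "United Kingdom",
                               "Italy", "Japan", "United States"])]

wdi = pd.read_csv("wdi_raw.csv")
combined = pd.merge(ifs, wdi, on=["Date", "Country"], how="inner")

# Fill missing values with the mean of each numeric column
numeric_cols = combined.select_dtypes(include="number").columns
combined[numeric_cols] = combined[numeric_cols].fillna(combined[numeric_cols].mean())

# Standardize numeric features to zero mean and unit variance; 'Date' is excluded
scaler = StandardScaler()
combined[numeric_cols] = scaler.fit_transform(combined[numeric_cols])
combined.to_csv("combined_scaled.csv", index=False)
```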
2.3. Variables Grouping and Multicollinearity Check
Hierarchical models often rely on grouping variables to define levels of variation, so 'Country' was defined as the grouping variable. The country variable was converted to a categorical type and assigned integer codes to facilitate the hierarchical analysis. In this step, the dataset was also checked for missing values using `isnull().sum()` in Python to ensure completeness of the data before proceeding to model fitting. Multicollinearity among the predictor variables was assessed to ensure that the model would not be biased by correlated predictors (Shrestha, 2020). A heatmap of the correlation matrix was generated in Python to visualize the correlations between numerical variables. The Variance Inflation Factor (VIF) was also computed for each predictor variable to check whether any variables exhibit high multicollinearity (VIF values exceeding 10) (Folli et al., 2020). Together, the correlation matrix and VIF values ensured that the independent variables would not distort the results of the Bayesian hierarchical model.
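A sketch of the multicollinearity check, assuming the scaled dataset from the previous step; the file name is illustrative.

```python
# Correlation heatmap and VIF computation for the numeric predictors.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("combined_scaled.csv")
X = df.select_dtypes(include="number").dropna()

# Correlation matrix visualized as a heatmap
sns.heatmap(X.corr(), cmap="coolwarm", center=0)
plt.show()

# VIF per predictor; values above 10 flag problematic multicollinearity
vif = pd.DataFrame({
    "variable": X.columns,
    "VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
})
print(vif.sort_values("VIF", ascending=False))
```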
2.4. Stationarity Testing
To analyze the stationarity of the standardized dataset, the Augmented Dickey-Fuller (ADF) test was employed (Ajewole et al., 2020; Sarker & Khan, 2020). Stationarity testing is a fundamental step in time series analysis, as it ensures that the statistical properties of the series (mean, variance, etc.) do not change over time (Silva et al., 2021). The ADF test is a widely used statistical test for checking stationarity, particularly in the presence of potential autocorrelation in the data. The standardized dataset, containing both numeric and non-numeric columns, was loaded in CSV format. Only numeric columns were considered for stationarity testing; non-numeric columns, such as date columns, were excluded to prevent errors. Any missing values in the numeric columns were removed to avoid interfering with the ADF test's computations. The null hypothesis (H₀) of the ADF test states that the time series is non-stationary, while the alternative hypothesis (H₁) states that the series is stationary. The Akaike Information Criterion (AIC) was used to automatically select the optimal lag length for the test (Sarfaraz et al., 2021). For each column, the test statistic and p-value were recorded, and a p-value below the 0.05 significance level was taken as evidence of stationarity.
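A minimal sketch of this test using `adfuller` from statsmodels, assuming the standardized CSV produced in Section 2.2:

```python
# ADF stationarity test per numeric column, with AIC-based lag selection.
import pandas as pd
from statsmodels.tsa.stattools import adfuller

df = pd.read_csv("combined_scaled.csv")
for col in df.select_dtypes(include="number").columns:
    series = df[col].dropna()              # drop missing values before testing
    stat, pvalue, *_ = adfuller(series, autolag="AIC")
    verdict = "stationary" if pvalue < 0.05 else "non-stationary"
    print(f"{col}: ADF statistic={stat:.3f}, p-value={pvalue:.3f} -> {verdict}")
```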
2.5. Min-Max Normalization
In this study, Min-Max normalization was also employed to transform the features of the dataset into a specified range, i.e., between 0 and 1. This step ensures that all features are on the same scale, which is particularly important for models that rely on distance metrics (Henderi et al., 2021). The dataset was loaded from a CSV file, and only the numeric columns (i.e., `float64` and `int64`) were selected for normalization. Min-Max scaling was applied to each numeric column to transform the values to a range between 0 and 1, using the `MinMaxScaler` from the `sklearn.preprocessing` module in Python (Zollanvari, 2023). After transformation, the column names were updated to match those of the original dataset, and the normalized dataset was saved to a new CSV file for further analysis. Min-Max normalization preserves the original relationships between values, which supports better model performance.
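A short sketch of this normalization step, with assumed input and output file names:

```python
# Min-Max normalization of numeric columns to the [0, 1] range.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("combined_scaled.csv")
numeric_cols = df.select_dtypes(include=["float64", "int64"]).columns

scaler = MinMaxScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])  # column names are preserved
df.to_csv("combined_normalized.csv", index=False)
```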
2.6. Bayesian Hierarchical Model (BHM)
A Bayesian Hierarchical Model (BHM) was applied using PyMC3 to assess the impact of macroeconomic uncertainties and key economic indicators on GDP (Wang, 2021). The BHM approach accounts for both individual and group-level variations and allows for uncertainty quantification and probabilistic interpretation of model parameters. The model is composed of multiple levels of probabilistic dependencies, including the prior distributions, the likelihood, the posterior distribution, and posterior predictive sampling. The general form of the model is:
$$y_i = \alpha + \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}\left(0, \sigma_y^2\right)$$
where \(y_i\) is the dependent variable (GDP), \(\alpha\) is the intercept (global-level effect), \(\beta_j\) are the regression coefficients for the independent variables, \(x_{ij}\) are the independent variables, and \(\epsilon_i\) is the observation error (van de Schoot et al., 2021).
In Bayesian analysis, prior distributions are specified for each parameter to express initial beliefs about their possible values. Weakly informative priors were assigned to the regression coefficients (betas) and the intercept term (alpha): a Normal distribution centered at zero with a standard deviation of 10 was chosen for these parameters to reflect initial uncertainty (Angelopoulos & Bates, 2021). The likelihood function was modeled as a Normal distribution whose mean (mu) is the linear combination of the predictors weighted by their respective regression coefficients (P. Zhu et al., 2021), and the model included a standard deviation parameter (sigma_y) for the observation error. The posterior distribution was sampled using Markov Chain Monte Carlo (MCMC) methods, specifically the No-U-Turn Sampler (NUTS) (Devlin et al., 2024). The normalized, scaled dataset was used for the modelling, with GDP (current US$) as the dependent variable. Following the Bayesian model fit, posterior predictive checks were performed to assess the model's ability to replicate the observed data, allowing us to validate the model's fit and check whether the predicted values matched the observed data patterns. Separate plots were generated for the intercept (`alpha`) and each of the regression coefficients (`betas`) to capture the uncertainty and distribution of these estimates.
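A minimal PyMC3 sketch of this model specification, assuming the normalized dataset from Section 2.5; the target and predictor column names, and the scale of the half-normal prior on sigma_y, are assumptions for illustration.

```python
# Sketch of the Bayesian hierarchical regression specification in PyMC3.
import pandas as pd
import pymc3 as pm

df = pd.read_csv("combined_normalized.csv")
y = df["GDP (current US$)"].values                           # assumed target column name
X = df.drop(columns=["Date", "Country", "GDP (current US$)"]).values

with pm.Model() as bhm:
    # Weakly informative Normal(0, 10) priors on intercept and coefficients
    alpha = pm.Normal("alpha", mu=0, sd=10)
    betas = pm.Normal("betas", mu=0, sd=10, shape=X.shape[1])
    sigma_y = pm.HalfNormal("sigma_y", sd=10)                # assumed prior scale

    # Likelihood: GDP modeled as Normal around the linear predictor
    mu = alpha + pm.math.dot(X, betas)
    y_obs = pm.Normal("y_obs", mu=mu, sd=sigma_y, observed=y)
```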
2.7. MCMC Simulation in Bayesian Hierarchical Modeling
Markov Chain Monte Carlo (MCMC) simulation was employed to estimate the posterior distributions of the model parameters within the Bayesian Hierarchical Model (BHM) framework. MCMC is a powerful computational tool for Bayesian inference, especially when dealing with complex, high-dimensional models (Vlachou et al., 2023). The MCMC sampling was carried out using the No-U-Turn Sampler (NUTS), an adaptive variant of the Hamiltonian Monte Carlo (HMC) algorithm (Hoffman et al., 2021). NUTS was chosen for its efficiency in exploring high-dimensional posterior distributions and its ability to automatically adapt step sizes. The model priors were specified as normal distributions for both the intercept (alpha) and the regression coefficients (betas), while the standard deviation of the dependent variable was modeled using a half-normal distribution (Bakouch et al., 2021). Specifically, the MCMC algorithm generated samples from the posterior distribution of the intercept (alpha), regression coefficients (betas), and the error term (sigma_y) based on the likelihood function defined in the model, allowing us to estimate the relationships between GDP (dependent variable) and economic policy uncertainties (independent variables). Four independent chains were run in parallel, with 1,000 warm-up (tuning) iterations followed by 2,000 sampling iterations per chain, yielding a total of 8,000 draws from the posterior distribution. The tuning steps allowed the sampler to adjust key hyperparameters (such as step size) before the actual posterior sampling began, and a target acceptance rate of 0.95 was specified to balance exploration and stability in sampling. Convergence of the MCMC chains was assessed using the R-hat statistic, which compares between-chain and within-chain variances (Lambert & Vehtari, 2022); an R-hat value close to 1.0 indicates that the chains have mixed well and are likely sampling from the target posterior distribution.
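The sampling configuration and convergence check described above can be sketched as follows, continuing from the model context `bhm` in the previous sketch (exact keyword arguments may vary across PyMC3 versions):

```python
# NUTS sampling: 4 chains, 1,000 tuning + 2,000 draws each, target acceptance 0.95.
import arviz as az
import pymc3 as pm

with bhm:
    trace = pm.sample(
        draws=2000,            # 2,000 posterior draws per chain (4 x 2,000 = 8,000 total)
        tune=1000,             # 1,000 warm-up iterations for step-size adaptation
        chains=4,
        target_accept=0.95,    # balance exploration and stability
        return_inferencedata=True,
    )

# R-hat close to 1.0 indicates well-mixed chains
print(az.summary(trace, var_names=["alpha", "betas", "sigma_y"])[["mean", "r_hat"]])
```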
2.8. Uncertainty Quantification
To quantify uncertainty in the predictions from the Bayesian Hierarchical Model (BHM), we conducted posterior predictive checks using the model's saved trace file. The normalized dataset used for the BHM was loaded along with the saved trace, and a posterior predictive analysis was performed to simulate predictions based on the model's posterior distribution (Mulvey et al., 2024). This involved generating a set of predictive values for the dependent variable (GDP) from the posterior distributions of the model parameters. The 95% credible intervals for the predicted values were calculated to assess the uncertainty surrounding the predictions (Mehrtash et al., 2020), and the predicted values were visualized alongside the observed data to illustrate the model's predictive performance and the associated uncertainty.
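A sketch of the posterior predictive sampling and 95% credible intervals, assuming the fitted model `bhm`, the trace, and the observed GDP vector `y` from the previous sketches:

```python
# Posterior predictive draws and 95% credible intervals for GDP.
import numpy as np
import pymc3 as pm

with bhm:
    ppc = pm.sample_posterior_predictive(trace, var_names=["y_obs"])

y_pred = ppc["y_obs"]                                        # shape: (draws, n_observations)
pred_mean = y_pred.mean(axis=0)
lower, upper = np.percentile(y_pred, [2.5, 97.5], axis=0)    # 95% credible interval

coverage = np.mean((y >= lower) & (y <= upper))              # share of observed points inside the interval
print(f"Observed GDP values inside the 95% interval: {coverage:.1%}")
```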
2.9. Policy Prediction
The policy prediction step evaluated the impact of key macroeconomic variables based on the posterior distribution results obtained in the prior model fitting steps (Dharma et al., 2020). These results were used to predict policy outcomes under two different scenarios. Scenario 1 assumes higher levels of uncertainty in economic policies; for example, features such as monetary policy uncertainty and trade policy uncertainty are set to higher levels, reflecting a situation of significant economic turbulence. Scenario 2 assumes more moderate or stabilized economic conditions with lower uncertainty across key policy variables; for instance, monetary policy uncertainty and trade policy uncertainty are set at lower levels, representing a more stable economic environment. The data for this analysis included the posterior distributions of the model parameters obtained from the BHM, such as the mean values of the intercept (alpha) and the coefficients (betas), which were stored in a CSV file. These values represent the underlying economic relationships modeled in the study. The policy prediction was based on a linkage function using the linear combination of the predictors and their corresponding posterior means from the BHM. The general form of the linkage function used for policy prediction is:
$$y_{\text{predicted}} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$
where \(\alpha\) is the intercept, \(\beta_i\) are the coefficients from the posterior results, and \(X_i\) represent the input features/policy variables (Z. Zhu et al., 2021).
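A sketch of this scenario-based prediction using posterior means from the BHM; the CSV layout, file name, and scenario values are illustrative assumptions rather than the study's actual inputs.

```python
# Scenario prediction via the linear link: y = alpha + sum(beta_i * x_i).
import numpy as np
import pandas as pd

posterior = pd.read_csv("posterior_means.csv", index_col=0)  # hypothetical file of posterior means
alpha = posterior.loc["alpha", "mean"]
betas = posterior.loc[posterior.index.str.startswith("betas"), "mean"].values

# Scenario 1: elevated policy uncertainty; Scenario 2: stabilized conditions
# (illustrative values on the normalized 0-1 scale of the predictors)
scenario_high = np.full(betas.shape, 0.9)
scenario_low = np.full(betas.shape, 0.2)

for name, x in [("High uncertainty", scenario_high), ("Stabilized", scenario_low)]:
    y_pred = alpha + np.dot(betas, x)
    print(f"{name}: predicted (normalized) GDP = {y_pred:.3f}")
```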