Nine out of ten people today breathe air that exceeds the pollutant thresholds set by the World Health Organization (WHO). This makes air pollution an international issue, responsible for roughly seven million deaths worldwide each year through stroke, heart disease, lung cancer, diabetes, asthma, and other chronic respiratory diseases (Goshua et al. 2021; Martins et al. 2021). Air pollution is commonly quantified through an air quality index (AQI) built from six standard pollutants: carbon monoxide (CO), ozone (O3), particulate matter (PM10 and PM2.5), sulfur dioxide (SO2), and nitrogen dioxide (NO2). These pollutants exist as solid particles, liquid droplets, or gas molecules (Jacobson, 2005; Szczurek & Maciejewska 2015). They also contribute to haze and smog, unpleasant odors, eutrophication, and global climate change (Bartoletti & Loperfido 2010; Mishra & Goyal 2015; De Nevers 2010).
The effects of air pollution on the environment and human health have been a matter of serious concern in Malaysia (Al-Dhurafi et al. 2018; Alyousifi et al. 2018; Fauziah et al. 2021; Koo et al. 2020; Masseran & Safari 2020; Raffee et al. 2022; Tajudin et al. 2019; Usmani et al. 2020). To address the problem, the Malaysian government, through the Department of Environment (DOE), increased the number of air quality monitoring stations to sixty-five across rural, suburban, urban, and industrial locations (EQR, 2018).
Particulate matter (PM10 and PM2.5), the microscopic particles suspended in the atmosphere, has since been identified as the most prevalent of the criteria pollutants in Malaysia. Its adverse effects on human health, animals, ecosystems, and the environment are well established (Fong et al. 2018; Manga & Awang 2018; Masseran & Safari 2020; Tajudin et al. 2019; Usmani et al. 2020). These particles come from both natural and human sources. Natural sources include radon, fog and mist, volcanic eruptions, forest fires, soot, and salt spray. Anthropogenic sources, on the other hand, include motor vehicle emissions, heat and power generation, industrial activities, waste incineration, and residential cooking.
Particulate matter causes numerous disorders in the human body. These include tuberculosis and other cardiopulmonary diseases, lung cancer, asthma, cardiovascular diseases, and bronchitis, and exposure leads to premature death and reduced life expectancy (Bo et al., 2022a; Gao et al., 2022; Lala et al., 2023; Liu et al., 2022; Lu et al., 2023; Silva et al., 2013; Wang et al., 2022). Moreover, particulate matter deposited on vegetation surfaces reduces the rate of photosynthesis and alters decomposition cycles, affecting both animal communities and the plants themselves (Correa-Ochoa et al., 2022; Grantz et al., 2003; Vincenti et al., 2022). In Malaysia, air pollution has been identified as one of the major contributors to the rise in hospital admissions for cardiovascular and other pulmonary diseases, which mainly affect children and the elderly. PM10 in particular, generated by haze episodes and other natural and human causes, plays a central role. The country's annual inpatient cost due to haze episodes has been estimated at roughly USD 91,000, and haze episodes are estimated to have increased the death rate by 19 percent (Latif et al., 2018).
Sansuddin et al. (2011) analyzed PM10 concentrations at four Malaysian air monitoring stations in the industrial and residential categories from 2000 to 2003. They used the gamma and log-normal distributions together with the autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) models. High concentrations of the pollutant were observed at all four stations during 2002. The gamma distribution gave the best fit at the industrial locations (Nilai and Johor), while the log-normal distribution gave the best fit at the residential stations (Kota Kinabalu and Kuantan). PM10 concentrations for 2003 were also predicted, and the simple AR(1) model outperformed the other time series models.
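A gamma model and a log-normal model can be compared on the same sample, and the minimal Python sketch below shows one way to do it using only the standard library. The gamma fit uses the method of moments, which is a simplification since maximum likelihood is more common. The log-normal fit uses maximum likelihood. The sketch then compares the two log-likelihoods, which is equivalent to comparing AIC because both models have two parameters. All function names are illustrative, the data are synthetic, and this is not the procedure of Sansuddin et al. (2011).

```python
import math
import random


def fit_gamma_moments(x):
    """Method-of-moments gamma fit: shape k = mean^2/var, scale theta = var/mean."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return mean * mean / var, var / mean


def gamma_loglik(x, k, theta):
    """Log-likelihood of a gamma(shape=k, scale=theta) model."""
    return sum((k - 1) * math.log(v) - v / theta - math.lgamma(k) - k * math.log(theta)
               for v in x)


def fit_lognormal_mle(x):
    """Maximum-likelihood log-normal fit: mean and standard deviation of log(x)."""
    logs = [math.log(v) for v in x]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return mu, sigma


def lognormal_loglik(x, mu, sigma):
    """Log-likelihood of a log-normal(mu, sigma) model."""
    return sum(-math.log(v * sigma * math.sqrt(2 * math.pi))
               - (math.log(v) - mu) ** 2 / (2 * sigma ** 2) for v in x)


def better_fit(x):
    """Name of the candidate with the higher log-likelihood (same as lower AIC here)."""
    ll_gamma = gamma_loglik(x, *fit_gamma_moments(x))
    ll_lognormal = lognormal_loglik(x, *fit_lognormal_mle(x))
    return "gamma" if ll_gamma > ll_lognormal else "lognormal"


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic, strictly positive "concentrations"; not real PM10 data.
    sample = [rng.lognormvariate(3.5, 0.6) for _ in range(2000)]
    print(better_fit(sample))
```

With real station data, the same comparison would be run separately for each station, mirroring the industrial and residential split described above.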
Ramli et al. (2023) investigated the performance of Bayesian Model Averaging (BMA) in predicting next-day PM10 concentrations in Peninsular Malaysia. They used seventeen years of air quality monitoring data from nine stations. Forecast accuracy was assessed with five measures: the Index of Agreement (IA), Kling-Gupta efficiency (KGE), root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Across the nine stations, the model achieved \({R}^{2}\) values from 0.64 to 0.75, IA values from 0.84 to 0.91, and KGE values from 0.61 to 0.74. On this basis, the authors recommended BMA as a tool for forecasting air pollutants, especially PM10.
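The five accuracy measures named above can be written out in a few lines each. The minimal sketch below assumes Willmott's form of the index of agreement and the original KGE formulation, which combines the correlation, the ratio of standard deviations, and the ratio of means. The cited study may use variants of these formulas, and the function names are illustrative.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error, in percent; assumes no zero observations."""
    return 100.0 * sum(abs((p - o) / o) for o, p in zip(obs, pred)) / len(obs)


def index_of_agreement(obs, pred):
    """Willmott's index of agreement in [0, 1]; 1 means perfect agreement."""
    o_bar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


def kge(obs, pred):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * so * sp)
    alpha, beta = sp / so, mp / mo  # variability ratio and bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the cited study.
    observed = [42.0, 55.0, 61.0, 48.0, 70.0]
    forecast = [45.0, 50.0, 63.0, 52.0, 66.0]
    for f in (rmse, mae, mape, index_of_agreement, kge):
        print(f.__name__, round(f(observed, forecast), 3))
```

RMSE, MAE, and MAPE are errors, so lower is better. IA and KGE are skill scores that reach 1 for a perfect forecast.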
Bahiyah et al. (2017) investigated PM10 concentrations at three air monitoring stations, Taiping, Tanjung Malim, and Pegoh, in the state of Perak using 2015 air quality data. The highest concentration was observed at the urban station, followed by the industrial and then the suburban station. Concentrations at all stations peaked in the latter part of the year, specifically in September and October, and decreased sharply in November. The peak period is associated with winds of the dry southwest monsoon, which usually occurs between June and October. The decrease may be due to heavy rainfall washing most of the pollutants out of the atmosphere. The wind flow during the transboundary haze event was traced using the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model. The trajectories suggested that anthropogenic sources contribute significantly to the high PM10 concentrations.
Sentian et al. (2019) investigated the long-term temporal behavior of air pollutants, including PM10, at twenty air quality monitoring stations across Malaysia from 1997 to 2015. They used the Mann-Kendall test to examine trends and the HYSPLIT backward trajectory model to trace the sources of PM10. The annual average PM10 concentration varied significantly between stations. The annual averages of the other pollutants varied less, with lower coefficients of variation. Over the period, PM10 concentrations increased at five stations and decreased at the remaining fifteen. The HYSPLIT results showed that high seasonal PM10 events at most sites were due to transboundary pollution from neighboring Indonesia during the southwest monsoon.
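The Mann-Kendall test used above counts concordant and discordant pairs in a series. The minimal sketch below implements the basic version, without the tie correction to the variance and without any adjustment for serial correlation, both of which applied studies often add. The function name is illustrative.

```python
import math


def mann_kendall(x):
    """Basic Mann-Kendall trend test (no tie correction).

    Returns (S, Z, two-sided p-value). S > 0 suggests an upward trend,
    S < 0 a downward trend.
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Standard normal CDF via erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p


if __name__ == "__main__":
    # Hypothetical annual mean PM10 values for one station (illustrative only).
    annual = [52.1, 49.8, 55.3, 47.2, 46.9, 50.4, 44.8, 43.1, 45.6, 41.9]
    print(mann_kendall(annual))
```

Applied to each station's annual series, the sign of S and its p-value give the kind of increasing or decreasing classification reported for the twenty stations.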
Alyousifi et al. (2020) applied a Spatial Markov Chain (SMC) model to the daily maximum air pollution index (API) from 37 Malaysian air monitoring stations over three years (1 January 2012 to 31 December 2014). A station in a healthy (good) state remains healthy with transition probability 0.814 when its neighboring stations are in the same state. The probability falls to 0.708 when its neighbors are in a moderate state. The proportion of time a station in a healthy state remains healthy is 0.6 when its neighbors are healthy. This proportion decreases to 0.4, 0.01, and 0 when the neighbors are in moderate, unhealthy, and very unhealthy states, respectively. Stations or regions with unhealthy air quality therefore have a greater chance of pulling nearby stations or regions into the same state. The authors concluded that Malaysian air pollution problems are concentrated in the southwestern and central western parts of Peninsular Malaysia.
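The conditional transition probabilities quoted above can be estimated by simple counting once each station-day has a state and a neighbour state. The minimal sketch below makes three assumptions. It uses the usual Malaysian API bands (0-50 good, 51-100 moderate, 101-200 unhealthy, 201-300 very unhealthy, above 300 hazardous). It defines the neighbour state as the category of the neighbours' average API that day, which is a hypothetical choice. It supplies the neighbour lists as input, since the original definition of neighbours may differ. All function names are illustrative.

```python
from collections import defaultdict


def api_category(api):
    """Map an API value to a category (assumed Malaysian API bands)."""
    if api <= 50:
        return "good"
    if api <= 100:
        return "moderate"
    if api <= 200:
        return "unhealthy"
    if api <= 300:
        return "very unhealthy"
    return "hazardous"


def build_records(api, neighbours):
    """Turn daily API series into (neighbour_state, state_t, state_t+1) triples.

    api: station -> list of daily maximum API values (equal lengths).
    neighbours: station -> list of neighbouring stations (hypothetical input).
    """
    records = []
    for station, series in api.items():
        nbs = neighbours[station]
        for t in range(len(series) - 1):
            lag = sum(api[nb][t] for nb in nbs) / len(nbs)  # spatial lag of API
            records.append((api_category(lag), api_category(series[t]),
                            api_category(series[t + 1])))
    return records


def spatial_transition_probs(records):
    """Estimate P(next | current, neighbour) by relative frequencies.

    Keys of the result are (neighbour_state, current_state) pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for neighbour, current, nxt in records:
        counts[(neighbour, current)][nxt] += 1
    return {key: {s: c / sum(row.values()) for s, c in row.items()}
            for key, row in counts.items()}
```

For example, `spatial_transition_probs(records)[("good", "good")]["good"]` estimates the probability of a good station staying good when its neighbours are good. This corresponds to the 0.814 figure reported above, although that value came from the authors' own data and model.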
Mohtar et al. (2018) studied the variation of major air pollutants in Malaysia under different seasonal conditions. They used hourly data from four continuous air monitoring stations, Petaling Jaya, Shah Alam, Klang, and Cheras, from 1 January 2005 to 31 December 2015. Nitrogen dioxide was the only gaseous pollutant that varied strongly between sites, while the other gaseous pollutants varied little. The variation of O3 and PM10 was seasonal, and their concentrations exceeded the thresholds of the Malaysia Ambient Air Quality Standard (MAAQS) and the National Ambient Air Quality Standard (NAAQS) most of the time. At all sites, high O3 concentrations were observed during the northeast monsoon (January to March), while peak PM10 concentrations occurred during the southwest monsoon (June to September).
Manga and Awang (2018) used a Bayesian autoregressive method to model and predict spatiotemporal PM10 concentrations at thirty-four air monitoring stations from 1 January to 31 December 2011. They observed that PM10 concentrations decreased as altitude increased. Their findings confirm those of other researchers that temperature and location affect PM10 concentrations.
Al-Dhurafi et al. (2018) treated Malaysian API data as compositional data and examined the share of each pollutant in determining the API value in order to identify the dominant pollutant. They then produced a 12-month forecast using the best of several vector autoregressive (VAR) models. The study used API data for Klang, Malaysia, from January 2005 to December 2014. Each pollutant in the API was treated as a separate component, and its contribution to the API was measured. PM10 was found to be the dominant contributor in the study area. According to the Akaike information criterion, the final prediction error, and the Schwarz and Hannan-Quinn criteria, the simple VAR(1) model performed best among the candidates for the twelve-month forecasts.
The primary purpose of this paper is to investigate the true status of long memory and volatility persistence in the level of particulate matter with a diameter of 10 microns or less (PM10). The analysis uses both the original series and the subseries delimited by structural breaks. For about two decades, much research has focused on the confusion between long memory and occasional structural breaks in the mean. A stationary short memory \(I\left(0\right)\) process with a structural break in the mean can show a slow rate of decay in the autocorrelation function (ACF) and other properties of a long memory process (Cappelli & Angela 2006; Charfeddine et al. 2012; Diebold & Inoue 2001; Granger & Hyung 2004; Jensen & Liu 2006). Long memory can therefore be detected spuriously when a structural break is neglected, because the break produces the slowly decaying ACF that is characteristic of long memory. Our systematic approach to long memory analysis is further extended to the level of variation in PM10 events. For this purpose, we apply Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models to both the original series and the subseries. The paper also compares the probability distributions of the original series and subseries with the normal distribution, and characterizes the behavior of the pollutant using the coefficients of skewness and kurtosis. Time series and statistical plots are provided, along with descriptive statistics for pollutant concentrations at the selected monitoring sites.
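The spurious long memory argument can be seen in a small simulation. The minimal sketch below generates short memory \(I(0)\) white noise and the same noise with a single mean shift halfway through, then compares sample autocorrelations at increasing lags. The shift size, sample length, and seed are arbitrary assumptions, and the series are synthetic.

```python
import random


def sample_acf(x, lag):
    """Sample autocorrelation at a given lag (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


rng = random.Random(42)
n = 2000
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The same I(0) noise, plus a single mean shift of +2 halfway through.
shifted = [e + (2.0 if t >= n // 2 else 0.0) for t, e in enumerate(noise)]

for lag in (1, 10, 50, 200):
    print(lag, round(sample_acf(noise, lag), 3), round(sample_acf(shifted, lag), 3))
```

The ACF of the pure noise drops to near zero after lag 0. The ACF of the shifted series stays well above zero even at lag 200, although the process has no long memory on either side of the break. This is why the break dates need to be located before the degree of persistence is estimated.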
Persistence, as a statistical issue, is the presence of significant dependence between observations that are far apart in time. It may be identified through the autocorrelation function (ACF), whose plot decays hyperbolically as the lag increases. Short memory processes such as the autoregressive moving average (ARMA) process instead show exponential decay. Persistence can also be identified by a spectral density function that increases without limit as the frequency tends to zero. Lastly, it appears in a rescaled adjusted range that grows as \({n}^{h}\) with \(h>\frac{1}{2}\), where \(n\) is the sample size, instead of the \({n}^{1/2}\) growth of a short memory process. This last feature is known as the Hurst phenomenon (Hosking, 1984).
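The rescaled adjusted range criterion can be turned into a rough estimate of \(h\). The method computes R/S over blocks of several sizes and takes the slope of \(\log(R/S)\) against \(\log n\). The minimal sketch below uses classical, unadjusted R/S, so it is biased upward in small blocks; for white noise it typically gives values somewhat above 0.5. The block sizes are arbitrary assumptions and the function names are illustrative.

```python
import math
import random


def rescaled_range(x):
    """Rescaled adjusted range R/S of a single block."""
    n = len(x)
    mean = sum(x) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for v in x:
        cum += v - mean  # partial sums of deviations from the block mean
        lo, hi = min(lo, cum), max(hi, cum)
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return (hi - lo) / s


def hurst_rs(x, sizes=(16, 32, 64, 128, 256, 512)):
    """Estimate h as the OLS slope of log(mean R/S) on log(block size)."""
    pts = []
    for size in sizes:
        blocks = [x[i:i + size] for i in range(0, len(x) - size + 1, size)]
        mean_rs = sum(rescaled_range(b) for b in blocks) / len(blocks)
        pts.append((math.log(size), math.log(mean_rs)))
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    return (sum((a - mx) * (b - my) for a, b in pts)
            / sum((a - mx) ** 2 for a, _ in pts))


if __name__ == "__main__":
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic short memory series
    print(round(hurst_rs(white), 2))
```

Bias-corrected variants such as the Anis-Lloyd correction or Lo's modified R/S are usually preferred in applied work. This sketch only shows the mechanics of the \({n}^{h}\) scaling.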
Numerous researchers have investigated the degree of long memory in air pollutants around the world. For example, Windsor and Toumi (2001) found evidence of long memory, extending up to 400 days, in ozone, PM10, and PM2.5 levels in the United Kingdom. Their kurtosis statistics indicated non-Gaussian behavior and high intermittency for PM10 and PM2.5, but nearly Gaussian behavior and low intermittency for ozone. Pan and Chen (2008) proposed a control chart for monitoring autocorrelated, long memory particulate matter (PM10) data in Taiwan. In their results, the chart based on the autoregressive fractionally integrated moving average (ARFIMA) model outperformed the chart based on the autoregressive integrated moving average (ARIMA) model. Similarly, Chelani (2012) used ten years (2000–2009) of data and detrended fluctuation analysis (DFA) to confirm long-term memory in three pollutants in a traffic area of Delhi, India: carbon monoxide (CO), nitrogen dioxide (NO2), and ozone (O3). Rescaled range (R/S) analysis was used to investigate the degree of persistence of ozone, particulate matter, nitrogen dioxide, and sulfur dioxide in Mexico City from 1999 to 2014. Persistence was higher during the rainy season and lower during the dry season (Meraz et al., 2015).
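Detrended fluctuation analysis, mentioned above, estimates a scaling exponent \(\alpha\) from the integrated series. The profile is split into windows, a straight line is removed from each window, and the method measures how the root-mean-square residual grows with window size. The minimal sketch below implements first-order DFA. Values near 0.5 indicate no long-range correlation, values between 0.5 and 1 indicate persistence, and values near 1.5 correspond to a random walk. The window sizes are arbitrary assumptions and the function names are illustrative.

```python
import math
import random


def _linear_fit_residual_ss(y):
    """Residual sum of squares after an OLS straight-line fit against t = 0..n-1."""
    n = len(y)
    tbar = (n - 1) / 2.0
    ybar = sum(y) / n
    stt = sum((t - tbar) ** 2 for t in range(n))
    sty = sum((t - tbar) * (v - ybar) for t, v in enumerate(y))
    slope = sty / stt
    return sum((v - ybar - slope * (t - tbar)) ** 2 for t, v in enumerate(y))


def dfa_alpha(x, sizes=(16, 32, 64, 128, 256)):
    """First-order DFA scaling exponent (about 0.5 for white noise)."""
    mean = sum(x) / len(x)
    profile, cum = [], 0.0
    for v in x:
        cum += v - mean  # integrated (profile) series
        profile.append(cum)
    pts = []
    for size in sizes:
        n_win = len(profile) // size  # non-overlapping windows; remainder dropped
        ss = sum(_linear_fit_residual_ss(profile[k * size:(k + 1) * size])
                 for k in range(n_win))
        fluct = math.sqrt(ss / (n_win * size))
        pts.append((math.log(size), math.log(fluct)))
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    return (sum((a - mx) * (b - my) for a, b in pts)
            / sum((a - mx) ** 2 for a, _ in pts))


if __name__ == "__main__":
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic series
    print(round(dfa_alpha(white), 2))
```

Unlike R/S, DFA removes local linear trends inside each window, which makes it somewhat less sensitive to slow drifts in pollutant levels.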
In another development, Barros et al. (2016) used fourteen years (2000–2013) of monthly energy production data to investigate persistence and seasonality in Brazil. The results indicated a mean-reverting form of persistence, with a single structural break in two of the series. Seasonality was found to be essential in modeling long memory in energy production. In a separate study, Barros et al. (2016) used long-range dependence techniques to examine five components of global carbon dioxide emissions, and their per capita values, from 1751 to 2009. Their results revealed a permanent form of persistence in the series. The cement production component showed the highest degree of persistence compared with the gas, liquid, solid, and gas flaring components. The series were found to contain structural breaks, with the highest degree of persistence occurring after the Second World War. Chen et al. (2016) confirmed persistence in the levels of five criteria pollutants in four major industrial cities in China, Guangzhou, Shanghai, Shenzhen, and Beijing, from 28 September 2013 to 12 December 2015. Persistence was more severe in Shenzhen and Guangzhou than in the other cities. Moreover, Gil-Alana and Solarin (2018) studied the impact of United States (US) environmental policies on nitrogen oxides (NOx) and volatile organic compounds (VOC) over the period 1940–2014 and investigated the degree of persistence of these pollutants. Their results revealed that shocks to the pollutants are permanent, especially for nitrogen oxides, whose orders of integration are significantly greater than 1. 
The permanent shocks suggest that US environmental policies on these pollutants are yet to be fully effective. Even so, negative trends were observed over the period covered by the research: emissions of these pollutants, both in total and per capita, were lower in 2014 than in 1970.
Yaya et al. (2020) investigated air quality in the ten most polluted cities of California, USA, focusing on seasonality and persistence. All pollutants were seasonal in all ten cities, and persistence was found in CO and O3 but not in PM2.5. Gil-Alana and Trani (2019) estimated the time trend and order of integration of CO2 emissions across member states of the European Union (EU) from 1960 to 2013. They also examined the average CO2 emissions of the EU as a whole alongside those of China and the US. Only the United Kingdom (UK) showed a decreasing trend and a transitory form of persistence. All other EU member states showed an increasing trend with permanent shocks. China also showed a transitory form of persistence, whereas shocks were permanent for the EU as a whole and for the US. Gil-Alana et al. (2020) further confirmed a decrease in the emissions of some criteria pollutants in London, but with strong evidence of persistence across the pollutants. Caporale et al. (2021) used daily particulate matter (PM10) data within a fractional integration framework to investigate trends and long-range dependence over seven years (2014–2020). The study covered eight European capitals: Amsterdam, Berlin, Brussels, Helsinki, London, Luxembourg, Madrid, and Paris. Shocks to the pollutant were found to be temporary, with a decrease in the fluctuation of the contaminant in Berlin, Brussels, and Paris.
The fractional integration method has also proved useful for other environmental data, such as temperature and climatology (Bloomfield, 1992; Bruneau et al. 2020; Franzke, 2012; Gil-Alana, 2005, 2009, 2012; Pelletier & Turcotte 1997; Rybski et al. 2006; Vyushin & Kushner 2009; Yuan et al. 2019; Yusof et al. 2013). It has also been applied to economic and financial time series (Abbritti et al. 2016; Backus & Zin 1993; Connolly et al. 2005; Murialdo et al. 2020). Detecting long memory allows researchers to study the long memory properties of the time series under review. It also indicates whether the persistent effect of a particular contaminant on the environment, if present, is temporary, permanent, or explosive.
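The link between the order of integration \(d\) and whether shocks are temporary, permanent, or explosive can be made concrete. The approach is to expand \((1-L)^{-d}\), whose coefficients \(\psi_j\) give the effect of a unit shock after \(j\) periods. The minimal sketch below computes these coefficients with the standard recursion. It also computes the coefficients \(\pi_j\) of \((1-L)^{d}\) and applies them to a series with the expansion truncated at the start of the sample. The function names are illustrative, and this is not the estimation procedure of any cited study.

```python
def frac_diff_weights(d, n):
    """Coefficients pi_j of (1 - L)^d = sum_j pi_j L^j, for j = 0..n-1."""
    w = [1.0]
    for j in range(1, n):
        w.append(w[-1] * (j - 1 - d) / j)
    return w


def impulse_response(d, n):
    """Coefficients psi_j of (1 - L)^(-d): effect of a unit shock after j periods."""
    psi = [1.0]
    for j in range(1, n):
        psi.append(psi[-1] * (j - 1 + d) / j)
    return psi


def frac_diff(x, d):
    """Apply (1 - L)^d to a series, truncating the expansion at the sample start."""
    w = frac_diff_weights(d, len(x))
    return [sum(w[j] * x[t - j] for j in range(t + 1)) for t in range(len(x))]


if __name__ == "__main__":
    for d in (0.4, 1.0, 1.3):
        psi = impulse_response(d, 101)
        print(d, [round(psi[j], 3) for j in (1, 10, 100)])
```

For \(d<1\) the response decays toward zero, so shocks are temporary (mean reversion). At \(d=1\) every \(\psi_j\) equals 1, so shocks are permanent. For \(d>1\) the response keeps growing, which is the explosive case.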
A structural break in a time series occurs when the method adopted for observing and defining the variable changes over time. Breaks can also arise from one or more changes in the classification, definition, coverage, or measuring instrument used while the variable is being recorded. A time series with one or more structural breaks has one or more discontinuities in its data-generating process (DGP), and the original series is divided into subseries according to the number of breaks. The statistical properties of the subseries within each regime should be investigated, so that each subseries can be understood in terms of its respective break dates. The break dates for each regime can be estimated within a least squares regression framework (Ploberger and Krämer 1992).
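The least squares idea for locating a break can be illustrated in its simplest case: one break in the mean. Every admissible break date is tried, and the one that minimizes the combined sum of squared residuals of the two regimes is kept. The minimal sketch below uses prefix sums so each candidate costs constant time. The 15 percent trimming at each end is an assumption (a common choice), and handling several breaks, trends, or formal break tests requires more than this sketch. The function name is illustrative.

```python
import random


def ls_break_date(x, trim=0.15):
    """Least squares estimate of a single break in the mean.

    Returns k such that x[:k] and x[k:] are the two regimes.
    """
    n = len(x)
    cs, cs2 = [0.0], [0.0]  # prefix sums of x and x^2
    for v in x:
        cs.append(cs[-1] + v)
        cs2.append(cs2[-1] + v * v)

    def ssr(a, b):
        """Sum of squared deviations of x[a:b] around its own mean."""
        s = cs[b] - cs[a]
        return (cs2[b] - cs2[a]) - s * s / (b - a)

    h = max(1, int(trim * n))  # minimum regime length
    return min(range(h, n - h + 1), key=lambda k: ssr(0, k) + ssr(k, n))


if __name__ == "__main__":
    rng = random.Random(3)
    # Synthetic series: unit-variance noise with a mean shift of +3 at t = 250.
    series = [rng.gauss(0.0, 1.0) + (3.0 if t >= 250 else 0.0) for t in range(400)]
    print(ls_break_date(series))
```

Once a break date is found, each subseries can be analyzed separately for long memory and volatility, as described above.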
Among other things, this paper aims to narrow the wide gap between air pollution research on human health and the environment in Malaysia and that in neighboring and advanced countries (Usmani et al. 2020). To the best of our knowledge, little or no work has investigated long memory and volatility in PM10 pollution in Malaysia. This article is therefore the first to investigate long memory and volatility persistence of air pollutants through change point detection. The remainder of this article is structured as follows. Section 2 describes the study area, data, and methodologies employed. Section 3 defines some relevant terms used in the study. Section 4 discusses the results, and Section 5 provides concluding remarks.