A Hybrid Approach to Forecasting Water Quality in Urban Drainage Systems

doi:10.21203/rs.3.rs-2118063/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Pollutant monitoring in urban sewer systems is currently carried out with sensors based on UV-Vis spectrometry, whose purpose is to determine the dynamics of water quality indicator concentrations. Analyzing time series of UV-Vis absorbance spectra is necessary to develop forecasting methodologies that support online and real-time control. This work presents a hybrid approach to UV-Vis absorbance time series forecasting based on seven methodologies, six of which are combined with Principal Component Analysis (PCA): PCA combined with Discrete Fourier Transform (PCA/DFT), PCA combined with Chebyshev polynomials (PCA/Ch-Poly), PCA combined with Legendre polynomials (PCA/L-Poly), PCA combined with Feed-forward Artificial Neural Networks (PCA/ANN), PCA combined with Polynomial regression (PCA/PolyReg), PCA combined with Support Vector Machines (PCA/SVM), and a clustering process combined with Markov chains (kmMC). Three UV-Vis absorbance time series collected online at different experimental sites in Colombia were used, two in Bogotá and one in Medellín. The Mean Absolute Percentage Error (MAPE) values obtained were between 0% and 57% for all the study sites. Results show that it is not possible to single out one best forecasting methodology among the proposed ones, because they complement each other for different forecasting time steps and spectral ranges according to the target water quality indicator.

Artificial Neural Networks

Discrete Fourier Transform

Polynomial transforms

Principal Component Analysis

UV-Vis time series forecasting

Water Quality

Over the last two decades, UV-Vis spectrometry has been developed and applied to the study of water in urban drainage, allowing researchers to record in situ absorbances in the UV and visible ranges for different types of water matrices (Langergraber et al. 2004; van den Broeke 2007; Zhu et al. 2021; Xue et al. 2022). The probes take quasi-continuous measurements, at discrete time steps, of parameters commonly studied in wastewater, for example, Total Suspended Solids (TSS) or Chemical Oxygen Demand (COD). UV-Vis probes are online sensors that record the attenuation of light (absorbance) and provide a single absorbance value for each wavelength measured in a predefined spectrum (Gruber et al. 2006). These probes are mainly used to monitor pollutant load dynamics, for example, Nitrite-Nitrogen (NO₂-N), Nitrate-Nitrogen (NO₃-N), Biochemical Oxygen Demand (BOD₅), COD or TSS, in urban drainage systems, including sewers, wastewater treatment plants (WWTPs) and receiving waters (Rieger et al. 2004; Lepot et al. 2016; Hu and Wang 2017; Lin et al. 2022). This opens the possibility of real-time control (RTC) of urban water systems that takes into account not only water quantity but also the short-term evolution of water quality (US-EPA 2006; Campisano et al. 2013; Sun et al. 2017; Happel et al. 2022; Zhu et al. 2022). Several tools must be developed to implement RTC systems, including water quality forecasting methods (Loc et al. 2020). Although there are some experiences in the literature related to the forecasting of water quality time series (e.g., Faruk 2010; Ramin et al. 2012; Riesco et al. 2014; Loc et al. 2020; Kim et al. 2022; Islam et al. 2022), the authors are aware of only a few experiences (e.g., Plazas-Nossa and Torres 2014, 2015; Hernández et al. 2017; Plazas-Nossa et al. 2017) reported for forecasting UV-Vis time series with short sampling times (acquisition time in the order of 1 min). These works are commonly based on a single forecasting method, and the forecast quality differs from one time series to another. However, based on the above experiences, various methods are available and could be combined to improve the accuracy of the forecast results. The forecast values of the UV-Vis absorbance time series can be transformed into water quality indicators (e.g., NO₃-N, dissolved COD, TSS) and could be used for decision making and for real-time control applications (US-EPA 2006; Sanguanduan and Nititvattananon 2011; Poch et al. 2012; Campisano et al. 2013; García et al. 2015; Garcia et al. 2016; Hornsby et al. 2016; Tzimas 2017; Jin et al. 2019; Islam et al. 2022; Suárez-Almiñana et al. 2022).

Some previous experiences use different methodologies for multivariate water quality forecasting (e.g., Plazas-Nossa and Torres 2015; Plazas-Nossa et al. 2015; Plazas-Nossa et al. 2017; Park et al. 2020; Thai-Nghe and Thanh-Hai 2020). However, it has not been possible to identify a single best forecast tool for all wavelengths or forecast time horizons. Nevertheless, it seems possible to include them in an integrated forecasting system that selects the best forecast value at different time steps and wavelengths, depending on the target water quality indicator. Therefore, the best option seems to be a system capable of adapting to the behavior of the signals received in the UV-Vis spectrum.

This work proposes a hybrid approach based on different methodologies for analyzing and forecasting time series. Forecasting methodologies based on discrete signals were applied to capture the dynamic behavior of the time series: the Discrete Fourier Transform (DFT) (Proakis and Manolakis 2007); polynomial approaches such as Chebyshev (Boyd 2000; Canuto et al. 2006) and Legendre (Kopriva 2009) polynomials; and machine learning techniques such as Feed-forward Artificial Neural Networks (ANN) (Solomatine 2002) and Support Vector Machines (SVM) (Vapnik et al. 1997; Kandananond 2013).

The spectro::lyser™ (s::can company) submersible UV-Vis probes measure approximately 65 cm in length, with a diameter of 44 mm. They are designed to record the attenuation of light (absorbance) almost continuously (one signal per minute). The light source is a xenon lamp that generates wavelengths from 200 nm to 750 nm at 2.5 nm intervals (Langergraber et al. 2004; s::can 2006). Three UV-Vis time series, each with 5705 absorbance spectra, were recorded at the following locations: (i) El-Salitre WWTP influent from June 29, 2011, at 9:03 a.m. to July 3, 2011, at 5 p.m.: 33 h (readings every minute) in Bogotá; (ii) Gibraltar Pumping Station (GPS) from October 18, 2011, at 4:17 p.m. to October 22, 2011, at 3:21 p.m. (readings every minute) in Bogotá; and (iii) San Fernando WWTP influent from September 24, 2011, at 6:04 a.m. to October 2, 2011, at 9:16 a.m. (readings every 2 min) in Itagüí, part of the Medellín metropolitan area, as shown in Fig. 1. For all UV-Vis absorbance time series, 4320 values were used for calibration, and 1385 values were used for testing.

Each UV-Vis absorbance time series comprises 219 wavelengths and therefore requires a dimensionality reduction procedure. High dimensionality is a serious problem for machine learning, data mining, and pattern recognition tasks (Zhang et al. 2016; Ayesha et al. 2020), and reducing it is an important strategy to address this problem; various methods have been introduced for this purpose in recent decades (Krawczak and Szkatula 2014). After dimensionality reduction, the new representation of the data is much smaller in volume than the original data set (Zhang et al. 2016; Zhu et al. 2016; Sengodan 2021). Dimensionality reduction algorithms rely on mathematical transformations that convert the original high-dimensional data space into a lower-dimensional feature space (Zhu et al. 2016; Ayesha et al. 2020; Sengodan 2021).
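As a quick consistency check on the data layout described above, the following minimal Python sketch rebuilds the wavelength grid and the calibration/test split. It assumes the analyzed range is 200 nm to 745 nm in 2.5 nm steps (the range given for the PCA step); the function names and the placeholder `records` list are illustrative only and do not come from the original study.

```python
# Assumed analysis range: 200 nm to 745 nm in 2.5 nm steps.
START_NM, STOP_NM, STEP_NM = 200.0, 745.0, 2.5
# Split sizes reported in the text.
N_CALIBRATION, N_TEST = 4320, 1385


def wavelength_grid(start, stop, step):
    """Return the wavelengths from start to stop (inclusive)."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


def split_series(records, n_calibration):
    """Split chronologically ordered spectra into calibration and test parts."""
    return records[:n_calibration], records[n_calibration:]


wavelengths = wavelength_grid(START_NM, STOP_NM, STEP_NM)
# Placeholder indices standing in for the recorded spectra (hypothetical).
records = list(range(N_CALIBRATION + N_TEST))
calibration, test = split_series(records, N_CALIBRATION)

print(len(wavelengths), len(calibration), len(test))  # prints: 219 4320 1385
```

Under these assumptions the grid has 219 wavelengths, and the two parts add up to the 5705 records of each series. The split is chronological, so the test block always follows the calibration block in time.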

The color scale in Fig. 1 is used as a visual indicator of absorbance amplitude and represents the presence of determinants commonly monitored in wastewater (Plazas-Nossa et al. 2017). This color scale is proposed based on van den Broeke (2007): (i) in the UV range, dark purple represents determinants such as nitrites (NO₂), nitrates (NO₃) and COD; (ii) in the visible (Vis) range, the violet, blue, green, yellow, orange and red colors represent determinants such as turbidity and total suspended solids. The "Time" axis indexes each spectrum captured by the probe (Plazas-Nossa et al. 2017).

Reducing dimensionality implies less processing time than processing the time series of each wavelength separately. In the present work, PCA was used to reduce dimensionality and was combined with each of the proposed forecasting methodologies. In addition, previous experiences have shown that the forecast can improve if PCA is applied before the forecasting methodology: Plazas-Nossa and Torres (2014) showed that the PCA/DFT forecasting methodology systematically presented lower forecast errors and variability than the DFT procedure alone. PCA, proposed by Pearson (1901), performs a linear transformation of the original data set into a new coordinate system. In this new coordinate system, the first axis, called the first principal component (PC), captures the highest variance of the data set; the second axis captures the second highest variance, and so on. The covariance matrix must be obtained to construct this linear transformation. The objective is to transform a given data set X, with dimensions n × m, into another data set of lower dimension n × l, with a minimum loss of useful information (Juhos et al. 2008; Shlens 2009; Chowdhury and Husain 2020). For more information, see Plazas-Nossa and Torres (2014). The number of principal components is determined from the eigenvalues (the variance captured by each PC), keeping only those PCs whose eigenvalue is greater than or equal to one (eigenvalue ≥ 1), following the Kaiser cutoff rule (Kaiser 1960; Jolliffe 2002; Chowdhury and Al-Zahrami 2014). The above procedure is applied to the full range of UV-Vis absorbance spectra (200 nm to 745 nm) for the three UV-Vis absorbance time series, all of the same length (5705 records).
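The PCA step with the Kaiser cutoff can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `pca_kaiser`, the choice to standardize each wavelength before computing the covariance matrix (which makes the "eigenvalue ≥ 1" threshold scale-independent), and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def pca_kaiser(X, standardize=True):
    """PCA via eigendecomposition, keeping components with eigenvalue >= 1.

    X has shape (n_records, n_wavelengths). Standardizing the columns is an
    assumption of this sketch, not something stated in the source.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    if standardize:
        scale = Xc.std(axis=0, ddof=1)
        scale[scale == 0] = 1.0  # avoid dividing constant columns by zero
    else:
        scale = np.ones(X.shape[1])
    Xc = Xc / scale

    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    keep = eigvals >= 1.0                   # Kaiser cutoff rule
    if not keep.any():
        keep[0] = True                      # always keep at least one PC
    W = eigvecs[:, keep]                    # (n_wavelengths, l)
    scores = Xc @ W                         # (n_records, l)
    return scores, W, eigvals, mean, scale


# Synthetic stand-in for an absorbance matrix: 20 "wavelengths" driven by
# two independent latent signals plus small noise (hypothetical data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = np.zeros((2, 20))
loadings[0, :10] = 1.0
loadings[1, 10:] = 1.0
X = latent @ loadings + 0.05 * rng.normal(size=(500, 20))

scores, W, eigvals, mean, scale = pca_kaiser(X)
explained = eigvals[: scores.shape[1]].sum() / eigvals.sum()
print(scores.shape, round(float(explained), 4))
```

Each forecasting methodology then works on the few columns of `scores` instead of on every wavelength. A forecast in PC space can be mapped back to absorbances with `scores_hat @ W.T * scale + mean`.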

Various machine learning techniques that were able to capture the behavior of the time series in the calibration stage were tested: Artificial Neural Networks (ANN) (Solomatine 2002; Russell and Norvig 2010; Zhu et al. 2022), Support Vector Machines (SVM) (Vapnik et al. 1997; Kandananond 2013; Priyadarshini et al. 2022) and a clustering process (k-means) (Saha and Manickavasagan 2021) combined with Markov chains (Vrugt et al. 2013; Ginting et al. 2014; Okwuashi and Ndehedehe 2021), called kmMC. Machine learning, a subfield of artificial intelligence within computer science, is based on the analysis and creation of algorithms that can be trained and constructed from time series values (data). In recent years, ANNs have been used successfully for forecasting purposes to obtain one-step-ahead forecast values. They are accepted in different disciplines because of their information processing characteristics, for example, non-linearity, parallelism, noise tolerance, and learning and generalization abilities (Yang et al. 2008; Young et al. 2015), and especially because of their ability to discover non-linear relationships (Faruk 2010; Ohana-Levi et al. 2022). Some experiences have shown that ANNs can be a promising technique for water quality forecasting (West and Dellana 2011; Martin et al. 2011; Riesco et al. 2014; Elbisy et al. 2014; Ouma et al. 2020; Uddin et al. 2022), due in particular to their ability to cope with a high number of inputs (multivariate data or training time steps) and to account for non-linearities in noisy data sets, characteristics exhibited by water quality time series captured online (Solomatine 2002).

Another forecasting methodology is SVM, originally developed by Vapnik et al. in 1995. SVMs are supervised learning models associated with learning algorithms that analyze data and recognize patterns. They are learning machines that apply the inductive principle of structural risk minimization to obtain good generalization from a limited number of learning patterns. SVM is a method based on the construction of hyperplanes in a multidimensional space, and it is used for classification and regression tasks, handling multiple continuous and categorical variables (Kandananond 2013; Imani et al. 2014; Uddin et al. 2022). SVMs are used for many machine learning tasks, such as pattern recognition, object classification, and, in time series forecasting, regression analysis (Sapankevych and Sankar 2009; López-Kleine and Torres 2014; Dilmi and Ladjal 2021). Different methodologies such as Fuzzy Logic, SVM, and Data Assimilation (DA) have been used and reported by Kim et al. (2014b), Tan et al. (2012), and Kim et al. (2014a), respectively. Finally, a Markov chain is a mathematical system based on transitions from one state to another. It is a random process generally characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific type of "forgetfulness" is called the Markov property (Vrugt et al. 2013; Ginting et al. 2014; Okwuashi and Ndehedehe 2021). Many researchers have applied the k-means clustering technique as a complementary tool for forecasting purposes (Zhang and Zhu 2012; Venkatesh et al. 2014; Cheng et al. 2015; Dong et al. 2020). Therefore, cluster analysis represents another viable option, as it addresses the underlying multivariate data structure, natural classification, and compression (Jain 2010; Martin et al. 2011; Farrou et al. 2012; Riesco et al. 2014; Saha and Manickavasagan 2021).
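To make the combination of clustering and Markov chains more concrete, the sketch below discretizes a univariate series into k-means states, estimates the state transition matrix by counting, and forecasts the expected value a few steps ahead. It is only one plausible reading of a "k-means plus Markov chain" forecaster and is not claimed to reproduce the kmMC formulation of Plazas-Nossa et al. (2015); all function names, the quantile-based initialization, and the toy series are assumptions.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd's k-means on a 1-D series, initialized at quantiles."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = values[labels == j].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return labels, centers


def transition_matrix(labels, k):
    """Row-stochastic matrix of observed state-to-state transitions."""
    counts = np.zeros((k, k))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    safe = np.where(row_sums == 0, 1.0, row_sums)
    # States that were never left get a uniform row.
    return np.where(row_sums > 0, counts / safe, 1.0 / k)


def forecast_states(last_state, P, centers, steps):
    """Expected cluster center after 1..steps transitions (Markov property)."""
    dist = np.zeros(len(centers))
    dist[last_state] = 1.0
    out = []
    for _ in range(steps):
        dist = dist @ P
        out.append(float(dist @ centers))
    return out


# Toy series cycling through low, medium and high values (hypothetical).
series = np.array([0.1, 0.5, 0.9] * 40)
labels, centers = kmeans_1d(series, k=3)
P = transition_matrix(labels, k=3)
forecast = forecast_states(int(labels[-1]), P, centers, steps=3)
print(np.round(P, 2))
print(np.round(forecast, 2))
```

Because each forecast step uses only the current state distribution and the matrix `P`, the sketch relies directly on the memoryless property described above.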

In summary, the forecasting techniques used in this work are: (i) Principal Component Analysis (PCA) combined with Discrete Fourier Transform (DFT) (PCA/DFT), proposed by Plazas-Nossa and Torres (2014); (ii) PCA combined with Chebyshev polynomials (Kopriva 2009) (PCA/Ch-Poly); (iii) PCA combined with Legendre polynomials (Kopriva 2009) (PCA/L-Poly); (iv) PCA combined with Feed-forward Artificial Neural Networks (PCA/ANN), proposed by Plazas-Nossa et al. (2017); (v) PCA combined with Polynomial regression (Barca et al. 2015; Han et al. 2016) (PCA/PolyReg); (vi) PCA combined with SVM (Vapnik et al. 1997; Kandananond 2013) (PCA/SVM); and (vii) a clustering process combined with Markov chains (kmMC), proposed by Plazas-Nossa et al. (2015).
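The text does not spell out how each technique extrapolates a principal-component score series. As an illustration of the general idea behind a DFT-based forecast, the sketch below keeps the mean and the dominant harmonics of the calibration series and evaluates that periodic model at future time indices. This is a common textbook approach and an assumption here, not necessarily the procedure of Plazas-Nossa and Torres (2014); `dft_forecast`, `n_harmonics`, and the toy signal are hypothetical.

```python
import numpy as np


def dft_forecast(series, horizon, n_harmonics=5):
    """Extrapolate a series using its mean and strongest DFT harmonics."""
    series = np.asarray(series, dtype=float)
    n = series.size
    coeffs = np.fft.rfft(series)
    freqs = np.fft.rfftfreq(n)  # cycles per sample

    # Indices of the n_harmonics largest non-constant terms, plus the mean.
    strongest = np.argsort(np.abs(coeffs[1:]))[::-1][:n_harmonics] + 1
    keep = np.concatenate(([0], strongest))

    t = np.arange(n, n + horizon)  # future time indices
    forecast = np.zeros(horizon)
    for k in keep:
        amplitude = np.abs(coeffs[k]) / n
        phase = np.angle(coeffs[k])
        # Interior bins stand for a positive and a negative frequency,
        # so they count twice; the mean and Nyquist bins count once.
        is_single = k == 0 or (n % 2 == 0 and k == n // 2)
        weight = 1.0 if is_single else 2.0
        forecast += weight * amplitude * np.cos(2 * np.pi * freqs[k] * t + phase)
    return forecast


# Toy "PC score" series: constant level plus one 24-step cycle (hypothetical).
t_obs = np.arange(240)
pc_score = 2.0 + np.sin(2 * np.pi * t_obs / 24)
forecast = dft_forecast(pc_score, horizon=24, n_harmonics=1)
print(np.round(forecast[:6], 3))
```

In a PCA/DFT setting, a function like this would be applied to each retained score series, and the forecast scores would then be projected back to the wavelength space through the PCA loadings.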

This work proposes a methodology for forecasting UV-Vis absorbance time series that automatically analyzes the forecasts and chooses the best water quality prediction method among those previously described. For each absorbance time series, the proposed procedure takes 4320 values for calibration and 1385 values for testing, and the forecast is made for 360 values. Given the sampling intervals, this 360-value horizon corresponds to (i) 6 hours for El-Salitre WWTP and GPS; and (ii) 12 hours for the San Fernando WWTP. Subsequently, the Absolute Percentage Error (APE) (Bowerman et al. 2005; Kim et al. 2022; Said et al. 2022), used as a performance indicator to evaluate and compare the seven approaches, is calculated for each wavelength and every 30 time-steps. Based on the MAPE values (average APE value), the best-performing forecast methodology (lowest MAPE value) is then established. The same process is repeated every 30 time-steps and is performed over seven iterations and a 6-hour forecast time horizon. Figure 2 shows a summary of the proposed hybrid forecasting system.
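The selection step described above can be sketched as follows: compute the APE of every candidate forecast, average it over blocks of 30 time-steps to get a MAPE per method, block and wavelength, and pick the method with the lowest MAPE in each cell. The function names, the array layout, and the two toy "methods" A and B are assumptions for illustration; the sketch also assumes that observed absorbances are non-zero, since APE divides by the observation.

```python
import numpy as np


def ape(observed, forecast):
    """Absolute Percentage Error, element by element (observed must be non-zero)."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.abs((observed - forecast) / observed)


def select_best_methods(observed, forecasts, block=30):
    """Pick the lowest-MAPE method per block of time-steps and per wavelength.

    observed:  array of shape (T, W)
    forecasts: dict mapping a method name to an array of shape (T, W)
    Returns (best, mape): best has shape (n_blocks, W) and holds method
    names; mape has shape (n_methods, n_blocks, W).
    """
    names = list(forecasts)
    T, W = observed.shape
    n_blocks = T // block
    mape = np.empty((len(names), n_blocks, W))
    for i, name in enumerate(names):
        errors = ape(observed, forecasts[name])[: n_blocks * block]
        mape[i] = errors.reshape(n_blocks, block, W).mean(axis=1)
    best = np.array(names)[mape.argmin(axis=0)]
    return best, mape


# Toy example: 60 time-steps, 2 wavelengths, two hypothetical methods.
observed = np.full((60, 2), 10.0)
forecast_a = observed.copy()
forecast_a[:30] *= 1.10   # method A: 10% error in the first block
forecast_a[30:] *= 1.01   # method A: 1% error in the second block
forecast_b = observed * 1.05  # method B: 5% error everywhere

best, mape = select_best_methods(observed, {"A": forecast_a, "B": forecast_b})
hybrid_mape = mape.min(axis=0)  # error of the hybrid selection
print(best)
print(np.round(hybrid_mape, 2))
```

By construction, the hybrid MAPE in every block and wavelength is no larger than that of any single method, which is the property that the hybrid system exploits.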

For all study sites, the number of PCs varies between 2 and 4, capturing more than 98% of the variability of the data. Figure 3 shows the forecasting results for the GPS absorbance time series. As an example, Figures 3(a) and 3(b) show forecasted absorbance values for 200 nm (first wavelength of the UV spectrum) and 382.5 nm (first wavelength of the Vis spectrum). Figures 3(c) and 3(d) show, at each iteration (1-22), the best and worst forecasting methodology, respectively, in a heat-map representation using the corresponding color scale. The color code tags the forecasting methodology in each iteration and each wavelength: PCA/DFT as red, PCA/Ch-Poly as green, PCA/L-Poly as blue, PCA/ANN as cyan, PCA/PolyReg as violet, PCA/SVM as yellow and kmMC as grey. Figures 3(e) and 3(f) show the histograms of the best and worst forecasting methodology, respectively. Similar results were obtained for El-Salitre WWTP and San Fernando WWTP.

Results are hardly generalizable, but it is possible to highlight that every forecasting methodology is necessary for the proposed hybrid system because each UV-Vis absorbance time series at each study site has its own behavior. No single forecasting methodology was the best at all iterations: PCA/DFT and PCA/Ch-Poly were each the best forecasting methodology at least once, kmMC was in third place, followed by PCA/L-Poly, PCA/SVM and PCA/PolyReg. PCA/ANN, on the other hand, was the best forecasting methodology several times.

The errors obtained with the proposed hybrid forecasting system were systematically lower than those obtained for each single forecasting methodology. For example, for the PCA/DFT forecasting methodology, the APE range is between 0% and 80% for all the study sites and wavelength ranges. In contrast, with the proposed hybrid forecasting system, the APE range is between 0% and 57% for all the study sites and wavelength ranges. In addition, results show that the proposed forecasting system is driven toward more accurate results, because it selects the methodology with the lowest error at each step. Finally, the general behavior of the forecasted UV-Vis absorbance time series is closer to the observed one. Figure 4 shows the forecast results for the El-Salitre WWTP absorbance time series. As an example, Figures 4(a) and 4(b) show MAPE values for 200 nm (first wavelength of the UV spectrum) and 382.5 nm (first wavelength of the Vis spectrum). Figures 4(c) and 4(d) show, in each iteration (1-22), the best and worst forecasting methodology, respectively, in a heat-map representation using the color scale. Figures 4(e) and 4(f) show the histograms obtained for the best and worst forecasting methodology, respectively. Similar results were obtained for GPS and San Fernando WWTP.

Figure 4 shows the following results obtained for the El-Salitre WWTP: (i) at 200 nm (4(a)), PCA/DFT and PCA/PolyReg have similar behavior, with MAPE values between 2% and 3%; the MAPE values for PCA/L-Poly, PCA/ANN and PCA/SVM were between 1% and 4%; and those for PCA/Ch-Poly and kmMC were between 2% and 8%; (ii) at 382.5 nm (4(b)), PCA/DFT and PCA/PolyReg have MAPE values between 8% and 10%; the MAPE values for PCA/L-Poly, PCA/ANN and PCA/SVM were between 6% and 15%; those for PCA/Ch-Poly were between 4% and 25%; and those for kmMC were between 10% and 30%; (iii) the best forecasting methodologies (4(c) and 4(e)) were, in descending order, PCA/Ch-Poly, PCA/L-Poly, PCA/DFT, PCA/ANN, PCA/PolyReg and PCA/SVM; (iv) the worst forecasting methodologies (4(d) and 4(f)) were, in descending order, PCA/Ch-Poly and kmMC.

Table 1 summarizes the results obtained for the three UV-Vis absorbance time series, based on the MAPE values in each iteration for the proposed forecasting methodologies, and identifies the three best forecasting methodologies for each study site and each spectral range. The results in Table 1 are not generalizable. Still, it is possible to highlight that all forecasting methodologies are necessary for the proposed hybrid system because each UV-Vis absorbance time series at each study site has its own behavior. No single forecasting methodology could be found as the best for all iterations: PCA/DFT, PCA/L-Poly, PCA/ANN, and PCA/SVM were each found to be the best forecasting methodology at least once, while PCA/L-Poly, PCA/PolyReg, and kmMC were found in second place. However, PCA/Ch-Poly was never chosen as the best forecasting methodology.

For UV-Vis absorbance time series, dimensionality reduction is necessary for practical reasons, such as reducing processing times, which is a key aspect for real-time control (RTC) or real-time decision-making (RTDM). In addition, it was observed that, for the UV-Vis absorbance time series, this reduction implies a very low loss of relevant information, which could be explained by the behavior of the UV-Vis absorbance spectra: absorbances at neighboring wavelengths are highly correlated. The technique used in this work to reduce the problem's dimensionality was PCA as a linear transformation, following the results obtained by Plazas-Nossa and Torres (2014).

This article proposed a hybrid approach to analyze and forecast UV-Vis absorbance time series. The forecast was performed using Principal Component Analysis (PCA) combined with the Discrete Fourier Transform (DFT), Chebyshev and Legendre polynomials, Artificial Neural Networks (ANN), Polynomial Regression (PolyReg), and Support Vector Machines (SVM), together with k-means clustering (km) combined with Markov chains (MC). After applying the proposed hybrid forecasting system to three data sets of UV-Vis absorbance time series from three study sites, the results show that it is possible to obtain valuable forecasts by considering different techniques and approaches for each moment in the series and each time step. The interest of the proposed hybrid method lies in the automatic selection of a method for each time step, according to a performance evaluation that depends on the behavior of each time series. The results show the need for a dynamic approach to forecasting UV-Vis absorbance time series in water quality applied to RTC tasks in urban sewage systems (online decision-making, online operation, etc.).

The comparisons between the forecasting methodologies highlight that it is not possible to determine a single best forecasting methodology among the proposed ones, because they all provide forecast values that complement each other for different forecast time steps and spectral ranges (UV or Vis). Therefore, a hybrid methodology can be applied, in which each forecasting methodology is used where it provides the best value for a specific wavelength or forecast horizon.

Specific processing time trials are required to assess the potential benefits of the proposed hybrid methodology for RTC and RTDM purposes in urban drainage systems. In addition, it is recommended to extend the PCA step to include other variables as covariates (variables used to guide the forecast of a primary variable), such as rainfall, pH, conductivity, and temperature, as is done in a wide range of time series forecasting problems. It is also possible to apply the proposed hybrid forecasting system to univariate time series, such as discharge (water flow) time series.

Finally, the proposed hybrid forecasting system could be applied in different urban water systems such as pumping stations, constructed wetlands, a wide range of WWTP locations (influents, effluents), urban streams, urban rivers, and natural receiving bodies.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Leonardo Plazas-Nossa and Andres Torres. The first draft of the manuscript was written by Leonardo Plazas-Nossa and Andres Torres and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Availability of data and materials

The majority of the data are presented in this manuscript. Additional raw data may be made available upon request.

Ethical Approval

Not applicable.

Consent to Participate

Not applicable.

Consent to Publish

Not applicable.

Competing Interests

Financial interests: Authors Plazas-Nossa and Torres declare they have no financial interests.

ACKNOWLEDGEMENT

The authors acknowledge the Bogotá Water and Sewage Company (Empresa de Acueducto y Alcantarillado de Bogotá – EAAB, under Administrative Contract No. 9-07-25100-0763-2010) and the Medellín Water and Sewage Company (Empresas Públicas de Medellín – EPM) for providing the information used in this research.

Ayesha, S., Hanif, M. K., & Talib, R. (2020). Overview and comparative study of dimensionality reduction techniques for high dimensional data. Information Fusion, 59, 44-58. https://doi.org/10.1016/j.inffus.2020.01.005
Barca, E., DelMoro, G., Mascolo, G. and DiIaconi, C. (2015). Gross parameters prediction of a granular attached biomass reactor through evolutionary polynomial regression. Biochemical Engineering Journal, 94, 74–84. https://doi.org/10.1016/j.bej.2014.11.016
Bowerman, B., O'Connell, R. and Koehler, A. (2005). Forecasting, time series, and regression: an applied approach. 4th ed. Thomson Brooks/Cole. Belmont, California. ISBN: 978-0534409777.
Boyd, J.P. (2000). Chebyshev and Fourier Spectral Methods. Second Edition. Dover Publications, Mineola-New York-USA. 2000. ISBN: 978-0486411835
Campisano, A., Cabot Ple, J., Muschalla, D., Pleau, M. and Vanrolleghem, P. A. (2013). Potential and limitations of modern equipment for real time control of urban wastewater systems. Urban Water Journal 10 (5), 300–311. https://doi.org/10.1080/1573062X.2013.763996
Canuto, C., Hussaini, M., Quarteroni, A. and Zang, T. (2006). Spectral Methods, Fundamentals in Single Domains. Springer-Verlag Berlin Heidelberg 2006. https://doi.org/10.1007/978-3-540-30726-6
Cheng, J., Gouchol, P., Yongmi, L., Hyun-Woo, P., Kwnag, K., Unil, Y. and Keun, H. (2015). A SOM clustering pattern sequence-based next symbol prediction method for day-ahead direct electricity load and price forecasting. Energy Conversion and Management 90, 84–92. https://doi.org/10.1016/j.enconman.2014.11.010
Chowdhury, Sh. and Al-Zahrami, M. (2014). Water quality change in dam reservoir and shallow aquifer: analysis on trend, seasonal variability and data reduction. Environ Monit Assess 186, 6127–6143. https://doi.org/10.1007/s10661-014-3844-0
Chowdhury, Sh. and Husain, T. (2020). Reducing the dimension of water quality parameters in source water: An assessment through multivariate analysis on the data from 441 supply systems. Journal of Environmental Management 274, 1–12. https://doi.org/10.1016/j.jenvman.2020.111202
Dilmi, S. and Ladjal, M. (2021). A novel approach for water quality classification based on the integration of deep learning and feature extraction techniques. Chemometrics and Intelligent Laboratory Systems 214, 1–18. https://doi.org/10.1016/j.chemolab.2021.104329
Dong, W., Sun, H., Li, Z., Zhang, J., & Yang, H. (2020). Short-term wind-speed forecasting based on multiscale mathematical morphological decomposition, K-means clustering, and stacked denoising autoencoders. IEEE Access, 8, 146901–146914. https://doi.org/10.1109/ACCESS.2020.3015336
Elbisy, M., Ali, H., Abd-Elall, M.A. and Alaboud, T. (2014). The Use of Feed Forward Back Propagation and Cascade Correlation for the Neural Network Prediction of Surface Water Quality Parameters. Water Resources 41(6), 709–718. https://doi.org/10.1134/S0097807814060153
Farrou, I., Kolokotroni, M and Santamouris, M. (2012). A method for energy classification of hotels: A case-study of Greece. Energy and Buildings 55, 553–562. https://doi.org/10.1016/j.enbuild.2012.08.010
Faruk, D. (2010). A hybrid neural network and ARIMA model for water quality time series prediction. Engineering Applications of Artificial Intelligence 23 (4), 586–594. https://doi.org/10.1016/j.engappai.2009.09.015
García, L., Barreiro-Gomez, J., Escobar, E., Téllez, D., Quijano, N. and Ocampo-Martinez, C. (2015). Modeling and real-time control of urban drainage systems: a review. Advances in Water Resources 85, 120–132. https://doi.org/10.1016/j.advwatres.2015.08.007
Garcia, X., Barceló, D., Comas, J., Corominas, L., Hadjimichael, A., Page, T. and Anuña, V. (2016). Placing ecosystem services at the heart of urban water systems management. Science of the Total Environment 563–564, 1078–1085. https://doi.org/10.1016/j.scitotenv.2016.05.010
Ginting, V., Pereira, F. and Rahunantham, A. (2014). Multi-physics Markov chain Monte Carlo methods for subsurface flows. Mathematics and Computers in Simulation 107, 1–15. https://doi.org/10.1016/j.matcom.2014.11.023
Gruber, G., Bertrand-Krajewski, J.-L., De Beneditis, J., Hochedlinger, M. and Lettl, W. (2006). Practical aspects, experiences and strategies by using UV/VIS sensors for longterm sewer monitoring. Water Practice and Technology 1(1): wpt2006020. https://doi.org/10.2166/wpt.2006.020
Han, Y., Liu, W., Bretz, F., Wan, F. and Yang, P. (2016). Statistical calibration and exact one-sided simultaneous tolerance intervals for polynomial regression. Journal of Statistical Planning and Inference 168, 90–96. https://doi.org/10.1016/j.jspi.2015.07.005
Happel, A. and Gallagher, D. (2022). Decreases in wastewater pollutants increased fish diversity of Chicago's waterways. Science of the Total Environment 824, 1–13. https://doi.org/10.1016/j.scitotenv.2022.153776
Hernández, N., Camargo, J., Moreno, F., Torres, A., & Nossa, L. P. (2017). Arima as a forecasting tool for water quality time series measured with UV-Vis spectrometers in a constructed wetland. Tecnología y ciencias del agua, 8(5), 127–139. https://doi.org/10.24850/j-tyca-2017-05-09
Hornsby, C., Ripa, M., Vassillo, C. and Ulgiati, S. (2016). A roadmap towards integrated assessment and participatory strategies in support of decision-making processes. The case of urban waste management. Journal of Cleaner Production 142, 157–172. https://doi.org/10.1016/j.jclepro.2016.06.189
Hu, Y., & Wang, X. (2017). Application of surrogate parameters in characteristic UV–vis absorption bands for rapid analysis of water contaminants. Sensors and Actuators B: Chemical, 239, 718-726. https://doi.org/10.1016/j.snb.2016.08.072
Imani, M., You, R. and Kuo, Ch. (2014). Forecasting Caspian Sea level changes using satellite altimetry data (June 1992–December 2013) based on evolutionary support vector regression algorithms and gene expression programming. Global and Planetary Change 121, 53–63. https://doi.org/10.1016/j.gloplacha.2014.07.002
Islam, K., Newton, H., Rahman, J. and Trevathan, J. (2022). Long range multi-step water quality forecasting using iterative ensembling. Engineering Applications of Artificial Intelligence 114, 1-13. https://doi.org/10.1016/j.engappai.2022.105166
Jain, A. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31, 651–666. https://doi.org/10.1016/j.patrec.2009.09.011
Jin, T., Cai, S., Jiang, D., & Liu, J. (2019). A data-driven model for real-time water quality prediction and early warning by an integration method. Environmental Science and Pollution Research, 26(29), 30374-30385. https://doi.org/10.1007/s11356-019-06049-2
Jolliffe, I.-T. (2002). Principal component analysis. Springer Series in Statistics (SSS) Second Ed. Springer-Verlag. New York 2002. ISSN: 0172-7397
Kaiser, H.-F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement 20(1), 141–151. https://doi.org/10.1177/001316446002000116
Kandananond, K. (2013). Applying 2k Factorial Design to assess the performance of ANN and SVM Methods for Forecasting Stationary and Non-stationary Time Series. Procedia Computer Science 22, 60–69. https://doi.org/10.1016/j.procs.2013.09.081
Kim, J., Yu, J., Kang, Ch., Ryang, G., Wei, Y. and Wang, X. (2022). A novel hybrid water quality forecast model based on real-time data decomposition and error correction. Process Safety and Environmental Protection 162, 553–565. https://doi.org/10.1016/j.psep.2022.04.020
Kim, S., Seo, D.-J., Riazi, H. and Shin, Ch. (2014a). Improving water quality forecasting via data assimilation – Application of maximum likelihood ensemble filter to HSPF. Journal of Hydrology 519, 2797–2809. https://doi.org/10.1016/j.jhydrol.2014.09.051
Kim, Y., Suk, H. and Plummer, J. (2014b). A wavelet-based autoregressive fuzzy model for forecasting algal blooms. Environmental Modelling and Software 62, 1–10. https://doi.org/10.1016/j.envsoft.2014.08.014
Kopriva, D. (2009). Implementing Spectral Methods for Partial Differential Equations, Algorithms for Scientists and Engineers. Springer Science + Business Media B.V. 2009. ISBN: 978-90-481-2261-5
Krawczak, M. and Szkatula, G. (2014). An approach to dimensionality reduction in time series. Information Sciences 260, 15–36. https://doi.org/10.1016/j.ins.2013.10.037
Langergraber, G., Fleischmann, N., Hofstaedter, F. and Weingartner A. (2004). Monitoring of a paper mill waste water treatment plant using UV/VIS spectroscopy. Water Science and Technology 49(1), 9–14. https://doi.org/10.2166/wst.2004.0004
Lepot, M., Torres, A., Hofer, T., Caradot, N., Gruber, G., Aubin, J. B., & Bertrand-Krajewski, J. L. (2016). Calibration of UV/Vis spectrophotometers: A review and comparison of different methods to estimate TSS and total and dissolved COD concentrations in sewers, WWTPs and rivers. Water Research, 101, 519-534. https://doi.org/10.1016/j.watres.2016.05.070
Lin, Z., Cheng, Sh., Sun, Y., Li, H. and Jin, B. (2022). Realizing BOD detection of real wastewater by considering the bioelectrochemical degradability of organic pollutants in a bioelectrochemical system. Chemical Engineering Journal 444, 1–9. https://doi.org/10.1016/j.cej.2022.136520
Loc, H. H., Do, Q. H., Cokro, A. A., & Irvine, K. N. (2020). Deep neural network analyses of water quality time series associated with water sensitive urban design (WSUD) features. Journal of Applied Water Engineering and Research, 8(4), 313–332. https://doi.org/10.1080/23249676.2020.1831976
López-Kleine, L. and Torres, A. (2014). UV-vis in situ spectrometry data mining through linear and non linear analysis methods. DYNA 81(185), 182–188. https://doi.org/10.15446/dyna.v81n185.37718
Martin, C., Allan, J., Crosier, J., Choularton, T., Coe, H. and Gallagher, M. (2011). Seasonal variation of fine particulate composition in the centre of a UK city. Atmospheric Environment 45, 4379–4389. https://doi.org/10.1016/j.atmosenv.2011.05.050
Ohana-Levi, N., Ben-Gal, A., Munitz, S. and Netzer, Y. (2022). Grapevine crop evapotranspiration and crop coefficient forecasting using linear and non-linear multiple regression models. Agricultural Water Management 262, 1–11. https://doi.org/10.1016/j.agwat.2021.107317
Okwuashi, O. and Ndehedehe, Ch. (2021). Integrating machine learning with Markov chain and cellular automata models for modelling urban land use change. Remote Sensing Applications: Society and Environment 21, 1–16. https://doi.org/10.1016/j.rsase.2020.100461
Ouma, Y. O., Okuku, C. O., & Njau, E. N. (2020). Use of artificial neural networks and multiple linear regression model for the prediction of dissolved oxygen in rivers: case study of hydrographic basin of River Nyando, Kenya. Complexity 8, 1–23. https://doi.org/10.1155/2020/9570789
Park, S., Kim, K., Shin, C., Min, J. H., Na, E. H., & Park, L. J. (2020). Variable update strategy to improve water quality forecast accuracy in multivariate data assimilation using the ensemble Kalman filter. Water research, 176, 115711. https://doi.org/10.1016/j.watres.2020.115711
Plazas-Nossa, L. and Torres, A. (2014). Comparison of DFT and PCA/DFT as forecasting tools of absorbances time series received by UV-Visible probes installed in urban sewer systems. Water Science and Technology, 69(5), 1101–1107. https://doi.org/10.2166/wst.2014.011
Plazas-Nossa, L. and Torres, A. (2015). PCA/DFT as forecasting tools for absorbance time series received by UV-Vis probes in urban sewer systems. Revista Tecnura 19(44), 47-57. https://doi.org/10.14483/udistrital.jour.tecnura.2015.2.a03
Plazas-Nossa, L., Flórez-Valencia, L. and Torres, A. (2015). Clustering and Bayesian inference as forecasting tools of UV-Vis absorbance time series. 10th IWA/IAHR International Urban Drainage Modelling Conference UDM-2015, 20-23 September 2015, Québec-Canada.
Plazas-Nossa, L., Hofer, T., Gruber, G. and Torres, A. (2017). Forecasting of UV-Vis absorbance time series using Artificial Neural Networks combined with Principal Component Analysis. Water Science and Technology 75(4), 765-774. https://doi.org/10.2166/wst.2016.524
Poch, M., Cortés, U., Comas, J., Rodriguez-Roda, I. and Sànchez-Marrè, M. (2012). Decisions on Urban Water Systems: Some Support. Universitat de Girona, Girona, Spain 2012. ISBN: 978-84-8458-401-8
Priyadarshini, I., Alkhayyat, A., Obaid, A. and Sharma, R. (2022). Water pollution reduction for sustainable urban development using machine learning techniques. Cities 130, 1–15. https://doi.org/10.1016/j.cities.2022.103970
Proakis, J., and Manolakis, D. (2007). Digital signal processing principles, algorithms, and applications. Fourth Edition. Pearson Prentice Hall. New Jersey-USA. ISBN: 978-0131873742
Ramin, M., Labencki, T., Boyd, D., Trolle, D. and Arhonditsis, G. (2012). A Bayesian synthesis of predictions from different models for setting water quality criteria. Ecological Modelling 242, 127–145. https://doi.org/10.1016/j.ecolmodel.2012.05.023
Riesco, J., Mora, M., Dávila, F. and Rivas, L. (2014). Regimes of intense precipitation in the Spanish Mediterranean area. Atmospheric Research 137, 66–79. https://doi.org/10.1016/j.atmosres.2013.09.010
Rieger, L., Langergraber, G., Thomann, M., Fleischmann, N. and Siegrist, H. (2004). Spectral in-situ analysis of NO₂, NO₃, COD, DOC and TSS in the effluent of a WWTP. Water Science and Technology 50(11), 143–152. https://doi.org/10.2166/wst.2004.0682
Russell, S. and Norvig, P. (2010). Artificial Intelligence. A modern approach. Third Edition. Prentice Hall Series. USA. 2010. ISBN: 978-01-3604-259-4
Saha, Dh. and Manickavasagan, A. (2021). Machine learning techniques for analysis of hyperspectral images to determine quality of food products: A review. Current Research in Food Science 4, 28–44. https://doi.org/10.1016/j.crfs.2021.01.002
Said, Z., Sharma, P., Elavarasan, R., Tiwara, A.-K. and Rathod, M. (2022). Exploring the specific heat capacity of water-based hybrid nanofluids for solar energy applications: A comparative evaluation of modern ensemble machine learning techniques. Journal of Energy Storage 54, 1–15. https://doi.org/10.1016/j.est.2022.105230
Sanguanduan, N. and Nititvattananon, V. (2011). Strategic decision making for urban water reuse application: a case from Thailand. Desalination 268, 141–149. https://doi.org/10.1016/j.desal.2010.10.010
Sapankevych, N. and Sankar, R. (2009). Time series prediction using support vector machines: A survey. IEEE Computational Intelligence Magazine 4(2), 24–38. https://doi.org/10.1109/MCI.2009.932254
Sengodan, G. (2021). Prediction of two-phase composite microstructure properties through deep learning of reduced dimensional structure-response data. Composites Part B 225, 1-13. https://doi.org/10.1016/j.compositesb.2021.109282
Solomatine, D. (2002). Data-driven modelling: machine learning and data mining in water related problems. Tutorial handouts. Proceedings of V International Conference on Hydroinformatics. 1–5 July, 2002. Cardiff- UK.
Suárez-Almiñana, S., Andreu, J., Solera, A. and Madrigal, J. (2022). Integrating seasonal forecasts into real-time drought management: Júcar River Basin case study. International Journal of Disaster Risk Reduction 70, 1–16. https://doi.org/10.1016/j.ijdrr.2021.102777
Sun, C., Joseph-Duran, B., Maruejouls, T., Cembrano, G., Meseguer, J., Puig, V., & Litrico, X. (2017). Real-time control-oriented quality modelling in combined urban drainage networks. IFAC-PapersOnLine, 50(1), 3941–3946. https://doi.org/10.1016/j.ifacol.2017.08.142
s::can (2006). Manual ana::pro Version 5.3 September 2006 Release, Messtechnik GmbH, Vienna, Austria 2006.
Tan, G., Yan, J., Gao, Ch. and Yang, S. (2012). Prediction of water quality time series data based on least squares support vector machine. Procedia Engineering 31, 1194–1199. https://doi.org/10.1016/j.proeng.2012.01.1162
Thai-Nghe, N., & Thanh-Hai, N. (2020). Forecasting Sensor Data Using Multivariate Time Series Deep Learning. In 7th International Conference on Future Data and Security Engineering. Quy Nhon, Vietnam (pp. 215–229). Springer, Singapore. https://doi.org/10.1007/978-981-33-4370-2
Tzimas, A. (2017). Space assisted water quality forecasting platform for optimized decision making in water supply services. In 15th International Conference on Environmental Science and Technology, Rhodes, Greece. https://doi.org/10.3030/730005
Uddin, G., Nash, S., Mahammad, M., Rahman, A. and Olbert, A. (2022). Robust machine learning algorithms for predicting coastal water quality index. Journal of Environmental Management 321, 1–16. https://doi.org/10.1016/j.jenvman.2022.115923
US-EPA (2006). Real time control of urban drainage networks. Report EPA/600/R-06/120, US Environmental Protection Agency, US Environmental Protection Agency OoRaD, Washington, DC, USA.
van den Broeke, J. (2007). On-line and In-situ UV/Vis Spectroscopy: Real time multi parameter measurements with a single instrument, AWE International, Issue 10, page 54–59, informative magazine website http://www.aweimagazine.com/article.php?article_id=477, March 2007, visited 10 June 2012.
Vapnik, V., Golowich, S. and Smola, A. (1997). Support vector method for function approximation, regression estimation, and signal processing. Proceedings of the 9th International Conference on Neural Information Processing Systems, 281–287. https://dl.acm.org/doi/10.5555/2998981.2999021
Venkatesh, K., Ravi, V., Prinzie, A. and Van den Poel, D. (2014). Cash demand forecasting in ATMs by clustering and neural networks. European Journal of Operational Research 232, 383–392. https://doi.org/10.1016/j.ejor.2013.07.027
Vrugt, J., ter Braak, C., Diks, C. and Schoups, G. (2013). Hydrologic data assimilation using particle Markov Chain Monte Carlo simulation: Theory, concepts and applications. Advances in Water Resources 51, 457–478. https://doi.org/10.1016/j.advwatres.2012.04.002
West, D. and Dellana, S. (2011). An empirical analysis of neural network memory structures for basin water quality forecasting. International Journal of Forecasting 27, 777–803. https://doi.org/10.1016/j.ijforecast.2010.09.003
Xue, Z., Lv, Z., Liu, Ch., Yang, X., Yu, Sh. and Li, L. (2022). Chromatographic and spectroscopic comparison of dissolved organic matter variation in anaerobic-anoxic-oxic process with tertiary filtration and membrane bioreactor. Journal of Water Process Engineering 47, 1–15. https://doi.org/10.1016/j.jwpe.2022.102693
Yang, W., Nan, J. and Sun, D. (2008). An online water quality monitoring and management system developed for the Liming River basin in Daqing, China. Journal of Environmental Management 88, 318–325. https://doi.org/10.1016/j.jenvman.2007.03.010
Young, Ch-Ch., Liu, W-Ch. and Hsieh, W-L. (2015). Predicting the water level fluctuation in an Alpine lake using physically based, Artificial Neural Network, and time series forecasting models. Mathematical Problems in Engineering 2015, 1–11. https://doi.org/10.1155/2015/708204
Zhang, Z. and Zhu, Q. (2012). Fuzzy Time Series Forecasting Based On K-Means Clustering. Proceedings of Congress on Engineering and Technology CET-2012, 26-28 Oct. 2012, Beijing-China.
Zhang, Y., Xiang, M. and Yang, B. (2016). Linear dimensionality reduction based on Hybrid structure preserving projections. Neurocomputing 173, 518–529. https://doi.org/10.1016/j.neucom.2015.07.011
Zhu, M., Wang, J., Yang, X., Zhang, Y., Zhang, L., Ren, H., Wu, B. and Ye, L. (2022). A review of the application of machine learning in water quality evaluation. Eco-Environment & Health 1, 107–116. https://doi.org/10.1016/j.eehl.2022.06.001
Zhu, T., Xu, Y., Shen, F. and Zhao, J. (2016). Orthogonal component analysis: A fast dimensionality reduction algorithm. Neurocomputing 177, 136–146. https://doi.org/10.1016/j.neucom.2015.11.012
Zhu, Q., Gu, A., Li, D., Zhang, T., Xiang, L., & He, M. (2021). Online recognition of drainage type based on UV-vis spectra and derivative neural network algorithm. Frontiers of Environmental Science & Engineering 15(6):136. 1–9. https://doi.org/10.1007/s11783-021-1430-6
Zhu, W., Duan, C. and Chen, B. (2022). Energy-pollutant nexus for wastewater treatment in China based on multi-regional input-output analysis. Journal of Cleaner Production 363, 1–11. https://doi.org/10.1016/j.jclepro.2022.132490

Table 1. The three best forecasting methodologies in descending order for each study site and each range of spectra

Time series (Study site) | VIS
El-Salitre WWTP | PCA/ANN, PCA/L-Poly, kmMC | PCA/ANN, kmMC, PCA/L-Poly
GPS | PCA/DFT, PCA/ANN, kmMC | PCA/DFT, PCA/L-Poly, PCA/PolyReg
San Fernando WWTP | PCA/SVM, PCA/PolyReg, kmMC | PCA/SVM, kmMC, PCA/PolyReg
