4.1 Results from Databases
Table 1 shows the number of articles analyzed and considered in the review, while Fig. 2 shows the PRISMA 2020 flow diagram for the systematic literature review (SLR), which summarizes the results of the database searches and their registries. In both databases, the search term “application of machine learning in water quality prediction” returned more articles than “sources of contamination in the aquatic environment in Nigeria”. The Scopus database also returned more publications than the Web of Science database for both terms, probably because Scopus indexes a greater number of journals.
Table 1
Number of articles analyzed and considered for the SLR
Search terms | Returned articles | Databases | Selected articles in the search |
---|
Sources of contamination in the aquatic environment in Nigeria | 729 | Scopus | 35 |
Application of machine learning in water quality prediction | 2,234 | Scopus | 20 |
Sources of contamination in the aquatic environment in Nigeria | 521 | Web of Science | 25 |
Application of machine learning in water quality prediction | 1,672 | Web of Science | 18 |
The sum of articles considered from abstracts and titles | | | 98 |
The sum of articles considered after full-text analysis | | | 40 |
In the Scopus database, the search terms “sources of contamination in the aquatic environment in Nigeria” and “application of machine learning in water quality prediction” returned 729 and 2,234 articles, respectively. In the Web of Science database, the same terms returned 521 and 1,672 articles, respectively (Table 1, Fig. 2).
From the Scopus results, 35 and 20 articles were selected for review for the terms “sources of contamination in the aquatic environment in Nigeria” and “application of machine learning in water quality prediction”, respectively, while 25 and 18 were selected from the Web of Science results (Table 1). It is worth mentioning that, because of the large number of articles returned by the Scopus database, only the titles and abstracts of the first 600 articles were read.
As part of the inclusion and exclusion criteria, duplicates were removed, since some articles appeared in both the Scopus and Web of Science databases. In total, 98 articles were selected after reading titles and abstracts, and 40 of these were retained after full-text analysis (Table 1). The discarded articles did not meet the established criteria.
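The screening totals reported in Table 1 can be checked with simple arithmetic. The sketch below is illustrative only: the dictionary layout, the shortened term labels, and the variable names are assumptions, while the counts are those given in Table 1.

```python
# Counts taken from Table 1; the data structure and labels are illustrative.
table1 = {
    ("Scopus", "contamination sources in Nigeria"): {"returned": 729, "selected": 35},
    ("Scopus", "ML in water quality prediction"): {"returned": 2234, "selected": 20},
    ("Web of Science", "contamination sources in Nigeria"): {"returned": 521, "selected": 25},
    ("Web of Science", "ML in water quality prediction"): {"returned": 1672, "selected": 18},
}

# Articles retained after reading titles and abstracts.
screened_total = sum(row["selected"] for row in table1.values())

# Articles returned per database, summed over both search terms.
returned_by_db = {}
for (database, _term), row in table1.items():
    returned_by_db[database] = returned_by_db.get(database, 0) + row["returned"]

full_text_total = 40  # reported in Table 1 after full-text analysis

print("Selected from titles and abstracts:", screened_total)
print("Returned per database:", returned_by_db)
print("Share retained after full-text analysis:",
      round(full_text_total / screened_total, 2))
```

The per-database totals make the difference between the two databases explicit without relying on the individual rows.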
Based on the search term “application of machine learning in water quality prediction”, out of the 257 publications that were considered, China occurred in 61; the United States in 57; India in 32; Malaysia in 15; South Korea in 14; Australia in 13; the United Kingdom and Canada in 10 each; Vietnam in 9; Iran in 8; Brazil and Iraq in 7 each; Poland and Taiwan in 6 each; the Russian Federation, France, Germany, Italy, and Saudi Arabia in 5 each; Morocco, Sweden, Bangladesh, the Netherlands, and Hong Kong in 4 each; Singapore, Spain, the United Arab Emirates, Norway, Palestine, Pakistan, and Greece in 3 each; Croatia, Egypt, Pietro Florentine, Serbia, Belgium, Algeria, Nigeria, South Africa, Turkey, and Columbia in 2 each; and Dakota, Philippines, Tunisia, Austria, Uganda, Romania, Chile, Denmark, Czech Republic, Luxembourg, Switzerland, Finland, Georgia, Peru, Sri Lanka, Portugal, State of Libya, Israel, and Mexico in 1 each (Table 2). The frequent application of ML in the leading countries may be due to the availability of experts in the use of neural networks in engineering and environmental studies. Although this may be speculative, the author hopes that this observation will spur the interest of researchers in the field for more discussion and enlightenment.
Table 2
Number of articles per country occurring from the systematic literature review
Country | No. of publications |
---|
China | 61 |
United States | 57 |
India | 32 |
Malaysia | 15 |
South Korea | 14 |
Australia | 13 |
United Kingdom | 10 |
Canada | 10 |
Vietnam | 9 |
Iran | 8 |
Brazil, Iraq | 7 |
Poland, Taiwan | 6 |
Russian Federation, France, Germany, Italy, Saudi Arabia | 5 |
Morocco, Sweden, Bangladesh, Netherlands, Hong Kong | 4 |
Singapore, Spain, United Arab Emirates, Norway, Palestine, Pakistan, Greece | 3 |
Croatia, Egypt, Pietro Florentine, Serbia, Belgium, Algeria, Nigeria, South Africa, Turkey, Columbia | 2 |
Dakota, Philippines, Tunisia, Austria, Uganda, Romania, Chile, Denmark, Czech Republic, Luxembourg, Switzerland, Finland, Georgia, Peru, Sri Lanka, Portugal, State of Libya, Israel, Mexico | 1 |
4.2 Bibliographic mapping using VOSviewer
The bibliographic map was created using the VOSviewer software. Co-occurrence was chosen as the type of analysis and all keywords as the unit of analysis, with fractional counting as the counting method. For the “application of machine learning in water quality prediction” search file, the software returned 377 keywords, of which 186 met the threshold of occurring at least six times across all analyzed articles. For each of the 186 keywords, the total strength of its co-occurrence links with other keywords was calculated, and the keywords with the greatest total link strength were selected for generating the bibliographic map. A link between two keywords indicates a relationship between them, and its strength is represented by a numerical value: the higher the link strength, the more often the two terms occur together (van Eck and Waltman 2020). Terms were then grouped into clusters, with each cluster of closely associated terms shown in a different colour.
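The quantities described above (keyword occurrences, co-occurrence links, and total link strength) can be illustrated with a short sketch. This is not VOSviewer's implementation: the function name, the toy keyword lists, and the fractional weight of 1/(n − 1) per link for a document with n keywords are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(docs, min_occurrences=1, fractional=False):
    """Return (occurrences, total_link_strength) for the keywords in docs.

    docs is a list of keyword lists, one list per article. With
    fractional=True each link from an article with n keywords is weighted
    1 / (n - 1); this weighting is an assumption, not a documented
    VOSviewer formula.
    """
    occurrences = defaultdict(int)
    links = defaultdict(float)
    for keywords in docs:
        unique = sorted(set(keywords))
        for keyword in unique:
            occurrences[keyword] += 1
        if len(unique) < 2:
            continue
        weight = 1.0 / (len(unique) - 1) if fractional else 1.0
        for a, b in combinations(unique, 2):
            links[(a, b)] += weight

    # Keep only keywords that meet the occurrence threshold.
    kept = {k for k, count in occurrences.items() if count >= min_occurrences}
    strength = {k: 0.0 for k in kept}
    for (a, b), weight in links.items():
        if a in kept and b in kept:
            strength[a] += weight
            strength[b] += weight
    return dict(occurrences), strength

# Hypothetical keyword lists for three articles (not taken from the review).
toy_docs = [["ann", "ph", "do"], ["ann", "ph"], ["svm", "ph"]]
occ, strength = cooccurrence(toy_docs)
print(occ)
print(strength)
```

Raising `min_occurrences` mirrors the threshold step: keywords below the threshold are dropped before the total link strength is summed.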
For the “application of machine learning in water quality prediction” term, the analysis was based on both the countries of study and the types of machine learning algorithm most frequently used. The bibliographic map for the types of machine learning model, based on the overlay visualization (with the year of publication) and the density visualization, is shown in Fig. 3. The graphical representation of the variation in the number of publications and the occurrence of the ML algorithms in water quality prediction is shown in Fig. 3. Artificial neural networks had the highest occurrence and strongest link strength in all analyzed publications, followed by regression analysis and random forest, while Bayes theorem and the convolutional neural network had the lowest occurrence and weakest link strength (Fig. 3–4). In the bibliographic map, high density and link strength occur mostly with artificial neural networks, support vector machines, regression analysis, decision trees, and random forests. In contrast, fuzzy systems, long short-term memory, convolutional neural networks, and hybrid and ensemble ML models had the weakest link strengths, although the number of publications on these models appears to have increased since 2021. This result implies that the use of hybrid ML models in water quality prediction has not been well explored globally; the majority of predictions have been based on artificial neural networks.
Based on the countries of study, China, the United States, and India had the highest number of publications and link strength for the term “application of machine learning in water quality prediction”, with 61, 57, and 32 publications, respectively (Table 2, Fig. 5). This implies that countries like Croatia, Egypt, Pietro Fiorentini, Serbia, Belgium, Algeria, Nigeria, South Africa, Turkey, Columbia, Dakota, Philippines, Tunisia, Austria, Uganda, Romania, Chile, Denmark, Czech Republic, Luxembourg, Switzerland, Finland, Georgia, Peru, Sri Lanka, Portugal, State of Libya, Israel, and Mexico are lagging far behind in the use of ML in water quality monitoring and assessment.
For the “sources of contamination in the aquatic environment in Nigeria” search term, the analysis was based on the relationship between anthropogenic and natural/geogenic sources of contamination and the number of publications over roughly the last two decades (2003–2024). In terms of pollutant type, heavy metals, lead, cadmium, copper, chromium, and zinc were reported in the most publications and had the greatest link strength. Meanwhile, antibiotic agents, aromatic compounds, bitumen, Cl, combustion, dumpsite, EC, mineralization, municipal solid wastes, organic carbon, phosphates, rainwater, saline intrusion, urbanization, and suspended particulate matter showed the lowest link strength and were reported in the fewest publications (Fig. 6, Table 3). From the extracted literature for 2003 to 2024, a relationship was established between anthropogenic and natural sources of contamination across all the reported articles on water contamination in Nigeria. Most studies attributed the occurrence of contamination in water to mining, followed by lithogenic and other anthropogenic processes (Fig. 7).
Table 3
Number of publications based on sources of aquatic contamination in Nigeria
Pollutant | Occurrences per publication | Total link strength |
---|
Heavy metal | 67 | 62 |
Heavy metals | 59 | 52 |
Lead | 48 | 47 |
Cadmium | 40 | 39 |
Copper | 36 | 36 |
Chromium | 35 | 35 |
Zinc | 34 | 34 |
Escherichia coli | 30 | 26 |
Geologic sediments | 25 | 24 |
Polycyclic aromatic hydrocarbons | 25 | 21 |
Manganese, Nickel | 23 | 23 |
Arsenic, Iron | 22 | 22 |
Nitrate | 21 | 20 |
PAHs, pH | 20 | 20 |
Electrical conductivity | 19 | 16 |
Hydrocarbons | 18 | 17 |
Coliform bacterium, sewage | 14 | 13 |
Bacterium contamination, seasonal variation | 13 | 13 |
Microbial contamination | 12 | 10 |
Effluents, leachates, mercury, mining, petroleum, trace elements, trace metals, turbidity, weathering | 11 | 11 |
Agriculture, Calcium, total dissolved solids | 10 | 10 |
Aluminum, crude oil, feces, irrigation, sulfate, wastewater | 9 | 9 |
Anti-infective agents, chlorine, fecal coliform, industrial waste, land use, NO3, Na, population density, wastewater | 8 | 6 |
Chlorine compounds, DO, geology, industries, Mg, oil spill, K | 7 | 7 |
Co, F, landfill, leachates, leaching, organic matter, runoff, sulfur compounds, hardness, domestic wastes | 6 | 6 |
Antibiotic agents, aromatic compounds, bitumen, Cl, combustion, dumpsite, EC, mineralization, municipal solid wastes, organic carbon, phosphates, rainwater, saline intrusion, urbanization, suspended particulate matter | 5 | 4 |
These results agree with the report by Omeka & Egbueri (2022) that, owing to an upsurge in anthropogenic activities such as mining, industrialization, and socio-economic activities, high concentrations of heavy metal(loid)s and potentially toxic elements have been reported in drinking water in the country. Obasi & Akudinobi (2020) reported increasing concentrations of Cd, Pb, Cr, Zn, and Co in waters from the solid-mineral-rich southern Benue Trough in southeastern Nigeria. This has been attributed to the open-cast mining method that is prevalent among most mining operations in the country (Okolo et al. 2018; Omeka & Igwe 2021). In southeastern Nigeria, Ajala et al. (2022) provided a critical review of potential sources of heavy metals in water and fish, as well as the human health risks from their consumption. High concentrations of Cd, As, Cu, and Zn were reported in both the drinking water and the shell of fishes. In Nsukka, southeastern Nigeria, high concentrations of As, Cd, Hg, and Cr have been reported in water and attributed to anthropogenic influxes and run-off from domestic solid and sewage wastes (Nnaji et al. 2023).
Many studies in southwestern Nigeria have attributed the concentration of heavy metals in drinking water sources to industrialization. According to an extensive literature review of the region by Balogun et al. (2022), long-standing cases of groundwater pollution have been attributed mainly to industrial effluents. In some parts of the Ibadan metropolis, southwest Nigeria, the concentration of heavy metals in shallow hand-dug wells was observed to occur in the order Zn > Fe > Pb > Cd > Mn, which was attributed to poor sanitation and industrialization (Ganiyu et al. 2021a). In the same region, the water quality of shallow aquifers was investigated for metal content and bacterial load. The water was reportedly polluted by pathogens and heavy metals, with the heavy metals occurring in the order Cd > Pb > Zn > Fe > Mn, attributed to varying anthropogenic activities in the area (Ganiyu et al. 2021b). In the densely populated parts of the southwestern region, such as Lagos, water pollution has been associated with poor waste management practices, lowering of the water table (due to over-abstraction), and influx from industrial effluents (Ogundiran & Afolabi 2008). Some studies have also attributed the presence of heavy metals to both geogenic processes (such as weathering and leaching of subsurface geology) and anthropogenic sources (industrial effluents). A study on the water quality assessment of Asa River in Ilorin, southwestern Nigeria, revealed high concentrations of chromite (FeCr2O4) and pyrite (FeS2) in river sediments, attributed to weathering of subsurface geology. Heavy metals such as Zn, Fe, Cr, and Mn were also reportedly found in elevated concentrations in sediments due to the influx of industrial effluents (Adekola & Elleta 2007). Although some studies in the region have reported the presence of heavy metals in water due to vehicular emissions and related repair products, domestic sewage, and effluents (Tijani et al. 2004), most have attributed it to industrial and agricultural effluents (Olaojo et al. 2016; Emenike et al. 2020).
In the northern and central parts of Nigeria, very few studies have reported water pollution due to industrial effluents. Most reports on heavy metal concentration in water have attributed it to mining activities. In the Anka gold mining area of northwestern Nigeria, there have been reported cases of Pb poisoning, with Pb found in the hair and nails of children within the vicinity of the mine site; this was attributed to the ongoing mining activities within the region (Adebwumi 2020). Although a study by Lar et al. (2015) reported heavy metals (such as Zn, V, Pb, Cu, Co, Be, Cr, As, Cd, Sb, and Se) in drinking water wells due to volcanic eruption in the Panyam volcanic province, the majority of the reports on metal concentration in water have attributed it to mining and poor disposal of its effluents. Pb, Ni, and Hg poisoning have been reported in analyzed surface and groundwater samples from the Bagega gold mine province in Zamfara State, northwestern Nigeria. The elevated concentration of carcinogenic elements was attributed to artisanal gold mining (Nuhu et al. 2014).
In southeastern Nigeria, the sources of water pollution are diverse, ranging from anthropogenic sources (e.g., mining, poor waste disposal, poor hygiene, industries, population increase, and land use) to geogenic sources (precipitation, chemical weathering, and rock-water interaction) (Omeka et al. 2023; Aghamelu et al. 2022). Although many reports have attributed contamination to both anthropogenic and geogenic activities (Nnorom et al. 2019; Edet et al. 2003), the majority have attributed it to mining and poor waste disposal. Reports on mining have come mainly from the solid-mineral-rich zones of the lower Benue Trough, with high concentrations of ore minerals occurring in association with Pb-Zn (Omeka & Igwe 2021; Adamu et al. 2015). This has been reported mainly from Ebonyi, Enugu, and Cross River states. A review by Umeoguaju et al. (2022) covering two decades (2000–2020) revealed that the concentration of heavy metals in most water sources in southeastern Nigeria is attributed mainly to anthropogenic influences from mining and oil exploration. This agrees with a study by Opuene & Agbozu (2008) on heavy metal assessment in fish from Taylor Creek, southern Nigeria.
4.3 Application of neural networks in water quality monitoring: implications for prospects
This section addresses the efficacy of artificial neural networks (ANNs) in water quality assessment, the most frequently used ANN models, the input (predictor) variables, and the model accuracy based on the coefficient of determination (R2). To this end, a literature search was conducted on the application of neural networks in water quality modeling and prediction. The search covered papers indexed in the Scopus and Web of Science databases from 2003 to 2024, using the keywords “water quality and artificial intelligence”, with emphasis on both surface water and groundwater. The results of the search are compiled in Table 4, and plots were generated from them.
Table 4
Dataset of selected articles from the Web of Science and Scopus databases used for SLR
Year | Location | ANN model | Input parameters | Highest R2 (during the testing stage) | References |
---|
2023 | 1. Surface and groundwater, Ojoto suburb, southeastern Nigeria 2. Groundwater samples from Osisioma, southeastern Nigeria 3. Groundwater samples from Egbema, southeastern Nigeria 4. surface and groundwater from Okurumutet-Iyamitet mine province, southeastern Nigeria | MLP-ANN MLP-ANN MLP-ANN MLP-ANN | TH, T, pH, TDS EC, Cl, Ca, SO4, Pb, HCO3, Zn Fe. Cu, Pb, Fe As, Cr, Benzene, Ethylbenzene m-Xylene, Toluene, and o-Xylene Fe, Zn, Ni, Cd, Cu and Pb HCO3−, SO42−, NO3−, Cl−, Mg2+, Ca2+, K+, and Na+ | 0.878 0.896 0.966 0.861 | (Egbueri 2023) (Akakuru et al. 2023) (Akakuru et al. 2023b) (Omeka et al. 2023) |
2022 | 1. Bouregreg watershed, Morocco 2. Groundwater from Agartala municipality, India 3. Groundwater from El Kharga Oasis, Western Desert of Egypt | BPNN BPNN ANFIS | pH and EC Na+, Mg2+, Ca2+, EC, HCO3−, and B EC, pH, T°, TDS, Na+, Mg2+, K+, Ca2+, Cl−, CO32−, SO42−, NO3−, and HCO3− | 0.87 0.990 0.997 | (Bilali et al. 2022) (Mallik et al. 2022) (Ibrahim 2022) |
2021 | 1. Groundwater wells from Illizi County, Algeria 2. Groundwater from the El Merk oil field, southeastern Algeria | BP-NN MLP-ANN | pH, TH, EC, TDS, Ca, Mg, Na, K, HCO3, Cl, SO4, and NO3 TH, NO3, and NO2 | 0.8957 0.9967 | (Kouadri et al. 2021) (Kouadri & Samir 2021) |
2020 | 1. The Elbe River, Germany 2. Groundwater from the Gaza Strip Palestine 3. Yipin River, China | Wavelet and BPNN (W-BPNN) MLP-NN MLP-NN | Flow, pH, Fe, and DO Abstraction average rate (AVR), relative humidity (RH), depth from the surface to well screen (DSWS), aquifer thickness (AT), recharge rate (RR), Initial chloride concentration (ICC), and groundwater level (GWL) pH, TP, Temperature, EC, PI, NH3-N, COD and TN | 0.780 0.9770 0.717 | (Li et al. 2020) (Kassem 2020) (Zhu and Heddam 2020) |
2019 | Water samples from the Shivganga River basin, India lakes, Tezpur University, India | BP-NN MLP-ANN | EC, pH, TH, TDS, Mg, Ca, K, Na, HCO3, PO4 NO3, Cl, and SO4 BOD and TSS | 0.932 0.783 | (Kadam et al. 2019) (Ahamad et al. 2019) |
2018 | 1. Gorganrood Basin, Iran 2. Gorganrood Basin, Iran | ANFIS ANFIS | SAR EC | 0.99 0.99 | (Azad et al. 2018) (Azad et al. 2018) |
2017 | 1. Hooghly River, India 2. Hooghly River, India 3. Saint John River, Canada 4. Langat River and Klang River, Malaysia | NN-CS NN-GA BPNN BPNN | Chlorides, turbidity, pH, TA, and residual chlorine TH Chlorides, pH, residual chlorine TA TH, and turbidity TSS BOD, DO, COD, pH, NH3-N, and TSS | - - 0.976 0.7267 | (Chatterjee et al. 2017) (Chatterjee et al. 2017) (El Din and Zhang 2017) (Hameed et al. 2017) |
2016 | 1. Aji-Chay River, Iran 2. Aji-Chay River, Iran | W-ANN W-ANFIS | Salinity (EC) Salinity (EC) | 0.9960 0.9958 | (Barzegar et al. 2016) (Barzegar et al. 2016) |
2015 | 1. Hilo Bay, Hawaii, USA 2. Hilo Bay, Hawaii, USA 3. Dahan River, Taiwan 4. Nadong River, South Korea | MLP W-ANN BPNN W-ANN | Salinity, DO, and Temperature Temperature, DO, and Salinity NH3–N Water level | 0.80 0.967 0.979 | (Alizadeh and Kavianpour 2015) (Alizadeh and Kavianpour 2015) (Chang et al. 2015) (Seo et al. 2015) |
2014 | Karoon River, Iran Karoon River, Iran | MLP-NN MLP-NN | DO COD | 0.85 0.74 | (Emamgholizadeh et al. 2014) (Emamgholizadeh et al. 2014) |
2013 | Johor River, Malaysia Jishan Lake, China Jishan Lake, China | MLP-NN W-ANN ENN | EC, TDS, and turbidity DO, Temperature, and pH DO, Temperature, and pH | 0.799 - - | (Najah et al. 2013) (Xu and Liu 2013) (Xu and Liu 2013) |
2012 | Kinta River, Malaysia | MLP-NN | 36 parameters | 0.765 | (Gazzaz et al. 2012) |
2011 | Nile River, Egypt | MLP-NN | 33 parameters | - | (Khalil et al. 2011) |
As observed earlier from the bibliographic map in Fig. 3, artificial neural networks appear to be the most frequently used machine learning model in water quality monitoring and assessment globally, followed by support vector machines and decision trees. The wide usage of ANNs in water quality studies has been attributed to their efficiency and versatility in prediction, even on systems with limited computational power (Egbueri et al. 2023; Omeka et al. 2023). According to Ghavidel and Montazeri (2014), the ability of the ANN model to accurately fit a broad range of nonlinear variables makes it a widely accepted ML model in most water quality studies. The ANN architecture is designed to mimic the human neural system, learning from and passing signals about linear and nonlinear datasets through its interconnected units, known as “neurons” (Ozel et al. 2020). This makes it stand out among other water quality modeling techniques. Water quality data are usually nonstationary, random, nonlinear, and unpredictable (Beven 2016). This means that the relationship between water quality and its controlling factors tends to vary over time (as a result of varying anthropogenic inputs and seasonal fluctuations), so predictive models that rely on historical data become less effective over time (Khan et al. 2021). More focus has therefore been placed on ANNs because of their ability to process complex datasets.
4.3.1 Model utilization
It can be observed from Fig. 8 that the multilayer perceptron neural network (MLP-NN) and the back-propagation neural network (BP-NN) were the most frequently used neural network algorithms in water quality monitoring and prediction in the period 2003–2024. These were closely followed by the adaptive neuro-fuzzy inference system (ANFIS) and then, in decreasing order, by other neural networks such as the Wavelet and BPNN (W-NN) hybrid model, Neural Network trained by Cuckoo Search (NN-CS), Neural Network trained by Genetic Algorithm (NN-GA), Elman Neural Network (ENN), and Wavelet-Adaptive Neuro-Fuzzy Inference System (W-ANFIS). The high frequency of the MLP-NN and BP-NN in water quality monitoring and prediction is in line with a study by Rajaee et al. (2020). The wide acceptance of these algorithms has also been attributed to their ability to attain high modeling accuracy with fewer input parameters than white-box models (Yetilmezsoy et al. 2011; Ighalo et al. 2020; Omeka 2023).
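For readers less familiar with these two model types, the sketch below shows a minimal multilayer perceptron trained with back-propagation. It is an illustrative toy under stated assumptions, not the configuration of any reviewed study: the function name, the network size, the learning rate, and the synthetic data (two inputs scaled to [0, 1] with target y = x1 · x2) are all hypothetical.

```python
import math
import random

def train_mlp(samples, hidden=3, lr=0.1, epochs=500, seed=0):
    """Fit a one-hidden-layer MLP (tanh hidden units, linear output) by
    full-batch gradient descent; gradients come from back-propagation.
    Returns the mean squared error recorded at the start of each epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    n = len(samples)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []

    for _ in range(epochs):
        gw1 = [[0.0] * n_in for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        sq_err = 0.0
        for x, y in samples:
            # Forward pass: hidden activations, then the linear output.
            h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
                 for j in range(hidden)]
            err = sum(w * hj for w, hj in zip(w2, h)) + b2 - y
            sq_err += err * err
            # Backward pass: propagate the output error to every weight.
            gb2 += err
            for j in range(hidden):
                gw2[j] += err * h[j]
                delta = err * w2[j] * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                gb1[j] += delta
                for i in range(n_in):
                    gw1[j][i] += delta * x[i]
        history.append(sq_err / n)
        # Gradient-descent update (gradient of half the mean squared error).
        b2 -= lr * gb2 / n
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                w1[j][i] -= lr * gw1[j][i] / n
    return history

# Synthetic toy data (not from any cited study).
grid = [0.0, 0.5, 1.0]
samples = [((a, b), a * b) for a in grid for b in grid]
history = train_mlp(samples)
print(f"MSE at first epoch: {history[0]:.4f}, at last epoch: {history[-1]:.4f}")
```

In practice, studies such as those in Table 4 would typically rely on an established library and a held-out testing set rather than a hand-written loop.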
4.3.2 Model validation accuracy
In this assessment, only studies that used the coefficient of determination (R2) as a validation metric were considered. The root mean square error (RMSE) was excluded because it is expressed in the units of the predicted parameter, which makes direct comparison across different parameters difficult. Its magnitude depends on the size (value) of the parameter being considered, so a parameter with large numerical values will tend to show a high RMSE regardless of how well the model performs (Ighalo et al. 2020). Unlike the RMSE, R2 is dimensionless and expresses how much of the variation in a dataset is accounted for by the model, and therefore gives a more comparable measure of accuracy across parameters (Adeniyi et al. 2019).
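The scale dependence of the RMSE can be demonstrated with a short numerical sketch. The observed and predicted values below are synthetic and hypothetical, chosen only for illustration; the same series is expressed in two units (a factor of 1000 apart) to show that the RMSE changes with the unit while R2 does not.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the units of the observed variable."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (dimensionless)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Synthetic concentrations in mg/L (hypothetical values).
obs_mg = [0.10, 0.20, 0.30, 0.40, 0.50]
pred_mg = [0.12, 0.18, 0.33, 0.37, 0.52]

# The same data expressed in micrograms per litre.
obs_ug = [1000 * v for v in obs_mg]
pred_ug = [1000 * v for v in pred_mg]

print("RMSE (mg/L):", rmse(obs_mg, pred_mg))
print("RMSE (ug/L):", rmse(obs_ug, pred_ug))
print("R2 (mg/L):", r_squared(obs_mg, pred_mg))
print("R2 (ug/L):", r_squared(obs_ug, pred_ug))
```

The model fit is identical in both cases; only the reporting unit changes, which is why RMSE values for different parameters cannot be ranked against one another directly.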
Table 4 shows that the adaptive neuro-fuzzy inference system (ANFIS), the Wavelet-Adaptive Neuro-Fuzzy Inference System (W-ANFIS), and the Wavelet and BPNN (W-NN) hybrid models are the most accurate neural networks for both surface water and groundwater monitoring and prediction. These findings agree with the observations of Ighalo et al. (2020) on the neural networks frequently used for surface water prediction from 2011 to 2020. This study has revealed that, although the MLP-NN and BP-NN appear to be the most widely used algorithms in water quality prediction, they have lower modeling accuracy than the hybrid models. The high accuracy of the W-ANFIS and W-NN hybrid models has been attributed to the incorporation of wavelet decomposition, which splits the covariates (predictor variables) into detail and approximation components (Barzegar et al. 2016). The high accuracy of the ANFIS, on the other hand, is attributed to an architecture that integrates the learning ability of both the fuzzy inference system and the ANN (Emamgholizadeh et al. 2014).
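The decomposition into approximation and detail components mentioned above can be sketched with a single-level Haar transform. The Haar wavelet is used here only because it is the simplest case; it is an assumption for illustration and not necessarily the wavelet used in the cited studies. The function names and the input series are hypothetical.

```python
import math

def haar_decompose(signal):
    """One level of the Haar wavelet transform.

    Returns (approximation, detail): scaled pairwise sums capture the
    smooth trend, scaled pairwise differences capture rapid fluctuations.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Invert haar_decompose, recovering the original series."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal

# Hypothetical predictor series (for example, eight consecutive readings).
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_decompose(series)
restored = haar_reconstruct(approx, detail)
print("approximation:", approx)
print("detail:", detail)
print("restored:", restored)
```

In a wavelet hybrid model, the approximation and detail series would be supplied to the neural network as inputs in place of, or alongside, the raw predictor.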
4.3.3 Predicted water quality parameters
This section discusses the most frequently analyzed water quality parameters for both surface water and groundwater. Table 4 shows that temperature, dissolved oxygen (DO), pH, electrical conductivity (EC; as a function of salinity), and total dissolved solids (TDS) are the most frequently investigated parameters in water quality prediction. A possible reason for their prevalence is their significance in determining overall water quality in both surface and groundwater sources. Ighalo et al. (2020) attributed the prevalence of DO, TDS, and pH to the high availability and low cost of measuring equipment. Moreover, these parameters are readily measured in situ before laboratory analysis because of their high susceptibility to changes in the surface environment (Igwe and Omeka 2021).
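The kind of tally behind this observation can be sketched as follows. The snippet counts input parameters for a subset of Table 4 only (the 2013–2015 rows), so its output illustrates the counting method and does not reproduce the frequencies for the full table; the variable names are assumptions.

```python
from collections import Counter

# Input parameters per model, transcribed from the 2013-2015 rows of Table 4.
inputs_by_study = [
    ["Salinity", "DO", "Temperature"],   # Hilo Bay, MLP (2015)
    ["Temperature", "DO", "Salinity"],   # Hilo Bay, W-ANN (2015)
    ["NH3-N"],                           # Dahan River, BPNN (2015)
    ["Water level"],                     # Nadong River, W-ANN (2015)
    ["DO"],                              # Karoon River, MLP-NN (2014)
    ["COD"],                             # Karoon River, MLP-NN (2014)
    ["EC", "TDS", "turbidity"],          # Johor River, MLP-NN (2013)
    ["DO", "Temperature", "pH"],         # Jishan Lake, W-ANN (2013)
    ["DO", "Temperature", "pH"],         # Jishan Lake, ENN (2013)
]

# Count how many models used each parameter as an input.
parameter_counts = Counter(p for study in inputs_by_study for p in study)

for parameter, count in parameter_counts.most_common():
    print(parameter, count)
```

Extending `inputs_by_study` to every row of Table 4 would give the full ranking discussed in the text.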