Study area
The Taoyuan Tableland, located in northwestern Taiwan, covers an area of 757 km2, comprising 2,898 ha of irrigation ponds, and the area has largely been converted to urban land uses due to urbanization and commercialization (Fang et al., 2009). The Taoyuan Tableland is located between the northern boundary of Linkou Tableland (23°05′N, 121°17′E) and the southern boundary of Hukou Tableland (22°55′N, 121°05′E), with the town of Yingge to the east (22°56′ N, 121°20′ E) and the Taiwan Strait to the west (22°75′ N, 120°99′ E) (Department of Land Administration, 2002) (Fig. 1).
In the past, there were no obvious or stable flows of rivers in the Taoyuan area. For residential uses and agricultural irrigation, many water storage facilities, such as irrigation ponds, have been established. In 1913, there were 10,000 ponds, the largest number recorded in the area. With the completion of the Shimen Reservoir and the Taoyuan and Shimen Canals, which provide new sources of irrigation water, many of these ponds no longer provide irrigation functions, but they still retain some other functions, such as decreasing the temperature. There are 2,851 still serving as irrigation ponds in Taoyuan now, according to a recent survey (Chou et al., 2017). Therefore, an effective method for monitoring the irrigation water quality change in ponds to evaluate the effects of urbanization on local water resources will be beneficial.
The Taoyuan Irrigation Association (TIA) and the Shihmen Irrigation Association (SIA) have a number of water quality monitoring stations in this area, as shown in Fig. 1, but only three of them (Shetzu No. 1, Jao No. 32B, and Jao No. 37B) monitored the water quality constituent of interest of this study, namely, the TP, within the time frame of the available satellite images. Additionally, these three stations were relatively new, and recently, floating photovoltaic power plants were placed on some portion of these three ponds, which cover some surfaces of the ponds (Song et al., 2018). The water quality data were based on seasonal field measurements from February 2018 to August 2019, and the data are available from the SIA (https://www.smia.gov.tw/water1.asp). Among these three stations, Shetzu No. 1 and Jao No. 37B collected monitoring data on seven days, while Jao No. 32B collected monitoring data only on three days during this period. Jao No. 37B could not be detected by Landsat 8 imagery and was later excluded from this study. Consequently, only Shetzu No. 1 (24°54'25.2"N 121°14'45.0"E) and Jao No. 32B (24°56'17.6"N 121°04'11.1"E) were studied, and their locations are shown in Fig. 1.
Data analysis
Only images with low cloud coverage in the Taoyuan Tableland area were chosen and needed, and this criterion significantly reduced the size of the image pool suitable for analysis. For example, the water quality sampling data collected on February 18, 2019, were excluded from subsequent analysis because the corresponding satellite images within one month were obscured by clouds, as shown in Fig. S1 (Electronic Supplementary Material). The target point was the location of the Taoyuan area in Fig. S1(a)-(d). From the pool of suitable images, five Landsat 8 images collected within one month of water quality measurements obtained by the SIA were selected for this study, as shown in Table 1. All Landsat 8 images were collected from the United States Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center (https://www.usgs.gov/centers/eros). The image specifications of the Operational Land Imager (OLI) spectral bands from Landsat 8 are listed in Table 2. However, this study only used information from band 1 to band 7. Among these bands, band 6 and band 7, so-called shortwave infrared (SWIR) bands, were used for atmospheric correction (Markham et al., 2014), and the others (band 1 to band 5) were processed at a 30-m resolution with ACOLITE (Quinten, 2017). The details from ACOLITE are shown in Fig. S2 and Fig. S3 (Electronic Supplementary Material). A distinct advantage of using the SWIR bands is that it does not require a clear water pixel to determine the type of aerosol. Therefore, imagery is not affected by stray light (Quinten and Ruddick, 2015).
Atmospheric corrections using the SWIR bands can be used to obtain reflectance values without being affected by atmospheric path radiation. (Quinten and Ruddick, 2015). Images from digital number (DN) conversion for spectral radiant (L) first format and then by atmospheric corrections using ACOLITE SWIR bands for processing to determine the surface reflection as
where ML is a multiplicative factor and AL is an additive factor provided in the metadata file. L is normalized to the band average of solar irradiance, and the spectral radiation reflectance (ρ) is calculated as follows:
where d is the distance between the earth and the sun, expressed in astronomical units; θ0 is the zenith angle of the sun; and F0 is the band average of solar irradiance. The reflectance of foam and whitecaps can be estimated from wind speed by empirical relationship. Notably, the reflectance of foam and whitecaps is assumed to be substantially corrected by following aerosol correction (Gordon and Wang, 1994). The atmospheric correction in this study can be simplified as
where pr and pa are the reflectances resulting from Rayleigh and aerosol scattering, respectively. pr represents the interaction between the two, and multiple scattering (pa) can also be included in the estimation. to is the Sun-sea diffuse transmittance. tv is the sea-sensor diffuse transmittance. pw is the parameter of interest, the water-leaving radiance reflectance, which is defined as the multiplication product of π and the water-leaving radiance divided by the above-water downwelling irradiance (Quinten and Ruddick, 2015).
The reflection values of the water quality stations were extracted from atmospherically corrected satellite imagery and used for analysis. Chosen pixel coordinates based on water quality sampling stations; however, it is unlikely to be ideal for the extraction of reflection values for several possible reasons, including errors between the coordinates and the actual sampling sites, noise caused by random debris on the surface of the reflected light from a nearby object pond area due to atmospheric scattering, and/or near the water sampling stations, which may be shallow, making reflections at the bottom of the problem. To obtain the reflection of the minimum error value, the search range was expanded to the entire pond area around the pixel designated to ensure that the position of the selected pixel in each image had the corresponding reflectance values. If two pixels have the same reflectance values, the pixel closest to the coordinates of the water quality sampling location, namely, the pixel in the center of the pond, was selected.
Multiple regression analysis
Multiple regression analysis was used in this study because it can generate results such as equations without using specialized software. Additionally, the relatively small sample size in this study does not warrant the use of ANNs. Correlation equations are derived to determine a quantitative relationship between the reflectance from each band of a given cell as the dependent variable and the concentration of the water component of interest, such as TP, in the cell as the independent variable. The SWIR band was used for the atmospheric correction, and consequently, band 6 and band 7 were not included in the regression. The other bands (band 1 to band 5) were used in subsequent calculations. In the regression analysis, the band ratios were used as independent variables because they are less likely to be affected by light conditions (Jensen, 2007). In the subsequent variable selection process, the initial independent variables considered are tabulated in Table 3, where all band and band ratios are based on the reflectance.
The selection of predictor variables for TP was based on multiple regression analysis with forward selection. Variables are added to the regression one at a time in the forward selection, then starting with no predictor variables. The p-value threshold includes a predictor in the multiple regression equation if the p-value is lower than a "probability to enter". In addition, the predictor that will most improve the fit first is used, namely, in a "forward" approach. A default value of 0.05 in SPSS (IBM, 2019) was used for the "probability to enter". In the forward selection method, the variation inflation factor (VIF) should be further considered in the multiple regression analysis to avoid multicollinearity in the model. When a predicting variable is a linear combination of other prediction variables in the model, multicollinearity occurs. The direct consequence of multicollinearity is that the error variance is exaggerated. If the overfitted model is used with a new set of data, it may result in a low prediction power. The VIF is calculated as follows:
where Rj is the multifactor coefficient of determination between the predictor variable of interest and others. A good rule of thumb to avoid severe multicollinearity is that all selected predictors should have VIF values of less than 10. (Chatterjee and Simonoff, 2013; Hocking, 2013). To include the VIF in the forward selection, the following procedure was proposed. First, make a forward selection. Second, we ensured that all the selected variables had a p value less than 0.05 and started to delete variables starting from the top of the p-value until all the variable p-values were as far as possible to below 0.05. Third, check the VIF of the remaining predictor variables, and if one of them has a value higher than 10, delete the variable with the highest VIF. Last, repeat step two; if all the variables of p-values are less than 0.05 and VIF values are less than 10, is to stop the process. Figure 2 illustrates the proposed procedure. As mentioned, when a variable was deleted from the model, the coefficients of variables, p-values, and VIFs were dynamically recalculated.