To develop both metrics, we used the historic NYSDEC macroinvertebrate data set to (1) identify indicator macroinvertebrates, (2) generate distribution frequencies for each indicator organism / water quality condition combination, and (3) use these indicators and frequencies to calculate and assess the two metrics. To prevent bias, we separated the historic data set into a training data set and a test data set. Steps 1 and 2 were accomplished using the training data set, and step 3 was performed on the test data set.
Working Data Set
The NYSDEC collects water quality data and information in streams and rivers on a statewide, 5-year cycle. This includes benthic macroinvertebrate samples to estimate water quality impacts on aquatic life. For this project, the focus was on wadeable streams and rivers sampled using a traveling kick method, which consists of a one-time, 5-m diagonal transect sample through a riffle area over 5 minutes. Samplers kick the bottom substrate and collect the dislodged organisms in a 0.25 m × 0.5 m, 900 µm mesh net held downstream. Sampling occurs during the July to September index period. From each sample, a random 100-organism subsample is removed and each individual specimen is identified to the lowest possible taxonomic resolution, typically genus or species (Bode and Novak, 1995; NYSDEC, 2021; Riva-Murray et al., 2002; Smith et al., 2007, 2013).
We compiled all kick samples from 1990 to 2018. To prevent bias introduced by frequently resampled sites, we restricted the data set to the most recent sample from each location. This resulted in 2,842 samples collected from 1,990 unique streams and rivers. We divided this data set randomly into a training data set and a test data set, each with 1,421 samples.
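The filtering and splitting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used: the record fields `site_id` and `year`, the function names, and the fixed random seed are all assumptions introduced here.

```python
import random

def latest_per_site(samples):
    """Keep only the most recent sample for each site.

    `samples` is a list of dicts with hypothetical keys
    'site_id' and 'year' (assumed field names).
    """
    latest = {}
    for s in samples:
        site = s["site_id"]
        if site not in latest or s["year"] > latest[site]["year"]:
            latest[site] = s
    return list(latest.values())

def split_in_half(samples, seed=0):
    """Randomly split samples into two halves (training, test).

    The seed is an assumption added for reproducibility.
    """
    shuffled = samples[:]                 # leave the input list untouched
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```

With an even number of retained samples, as in the 2,842-sample data set, both halves have the same size.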
Condition Categories
NYSDEC uses a multimetric index of biological integrity, called the Biological Assessment Profile (BAP) score, to summarize benthic macroinvertebrate data and report water quality impacts on aquatic life. For the traveling kick method, individual component metrics of the BAP include species richness, Hilsenhoff's biotic index (Hilsenhoff, 1988), Ephemeroptera–Plecoptera–Trichoptera richness (Lenat, 1988), percent model affinity (Novak and Bode, 1992), and the Nutrient Biotic Index-Phosphorus (Smith et al., 2007). BAP scores are calculated by normalizing each component metric to a 10-point scale and averaging the results. Each BAP score is assigned to one of four impact categories: non (7.5–10), slight (5.0–7.5), moderate (2.5–5.0), or severe (0–2.5) (NYSDEC, 2021). A final BAP score below 5 is associated with significant loss of biodiversity, functional organization, and ability to support a balanced community compared to natural conditions (Karr, 1991; Davis and Simon, 1995) and suggests that the sampled stream is biologically impaired. A BAP score above 5 indicates that aquatic life in the sampled stream is unimpaired and reflects natural conditions or conditions only slightly altered from natural.
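As a rough illustration of the scoring described above, the following Python sketch averages already-normalized component metrics and assigns an impact category. The function names are hypothetical, and the handling of scores that fall exactly on a category boundary (7.5, 5.0, 2.5) is an assumption, since the published ranges share their endpoints.

```python
def bap_score(normalized_metrics):
    """Average of component metrics already normalized to a 0-10 scale."""
    return sum(normalized_metrics) / len(normalized_metrics)

def impact_category(bap):
    """Map a BAP score to an impact category.

    Boundary handling is an assumption: a score exactly on a
    boundary is assigned to the lower (more impacted) category,
    except 5.0, which is treated here as 'slight'.
    """
    if bap > 7.5:
        return "non"
    if bap >= 5.0:
        return "slight"
    if bap > 2.5:
        return "moderate"
    return "severe"
```

For example, five normalized component values of 8, 9, 7, 10, and 6 average to 8.0, which this sketch places in the non-impacted category.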
Identifying Indicator Taxa
We used the training data set to identify indicators of impaired and unimpaired biological conditions. Taxonomic resolution was reduced to family level (with five exceptions), reflecting the extent to which a volunteer would likely be able to visually distinguish taxa in the field. This decision was based on our experience identifying organisms in the field without a microscope as well as our interactions with a broad range of volunteers. The exceptions were Pelecypoda, Hirudinea, and Turbellaria, which were reduced to class, and Amphipoda, which was reduced to order, because of the difficulty of distinguishing these organisms in the field. We kept Chironomus spp. (Diptera: Chironomidae) at genus because they are generally identified by their red coloration.
We selected the indicator taxa from the training data set by comparing taxa present in each condition category. To improve our resolution between impaired and unimpaired categories, we only used non-impacted samples (BAP > 7.5, n = 406) to identify indicators of unimpaired biological condition. All moderately and severely impacted macroinvertebrate community samples (BAP < 5.0, n = 287) were used to represent the impaired condition because of the limited number of severely impacted samples (BAP ≤ 2.5). We calculated the Sørensen index to estimate community similarity within each category and calculated the relative contribution of each taxon. The Sørensen index is the most commonly used index in community ecology to compare communities using presence/absence data (Chao et al., 2006). Selected indicator taxa were more abundant within their respective condition categories and contributed less than 2% to the Sørensen index of the opposing category.
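For reference, the Sørensen index for two presence/absence samples is twice the number of shared taxa divided by the sum of the two taxon counts. The Python sketch below computes it for a pair of samples and averages it over all pairs within a category. It is only an illustration under stated assumptions: the function names are hypothetical, the value returned for two empty samples is an arbitrary choice, and the per-taxon contribution calculation used for indicator selection is not reproduced here.

```python
from itertools import combinations

def sorensen(a, b):
    """Sorensen similarity 2|A∩B| / (|A| + |B|) for two taxon sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumption: two empty samples are scored as 0
    return 2 * len(a & b) / (len(a) + len(b))

def mean_within_group_similarity(samples):
    """Average Sorensen similarity over all pairs of samples in a group.

    Requires at least two samples.
    """
    pairs = list(combinations(samples, 2))
    return sum(sorensen(a, b) for a, b in pairs) / len(pairs)
```

Two samples sharing two of their three taxa each would score 2 × 2 / (3 + 3), or about 0.67.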
Calculating Distribution Frequencies
We also used the training data set to calculate the frequency of each indicator taxon in each impact category, a necessary step to calculate the probability of impairment. We calculated these frequencies for the non and slight impact categories separately but combined the moderate and severe impact categories into one impaired category because their sample sizes were small. Specifically, frequency was the number of samples containing the indicator taxon divided by the total number of samples in the impaired, slightly impacted, or non-impacted category. We also calculated the overall frequency of impaired, slightly impacted, and non-impacted samples.
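The frequency definition above translates directly into counting. The following Python sketch is a minimal illustration; the function name, the `(category, taxa)` input layout, and the category labels "IM", "SL", and "NI" used as dictionary keys are assumptions introduced for this example.

```python
def taxon_frequencies(samples, indicator_taxa):
    """Per-category taxon frequencies and overall category frequencies.

    `samples` is a list of (category, set_of_taxa) pairs, where
    category is one of 'IM', 'SL', 'NI' (assumed labels).
    Returns (freq, prior):
      freq[(taxon, category)] = samples in category containing taxon
                                / total samples in category
      prior[category]         = samples in category / all samples
    """
    totals, hits = {}, {}
    for category, taxa in samples:
        totals[category] = totals.get(category, 0) + 1
        for t in indicator_taxa:
            if t in taxa:
                hits[(t, category)] = hits.get((t, category), 0) + 1
    freq = {(t, c): hits.get((t, c), 0) / n
            for c, n in totals.items() for t in indicator_taxa}
    n_all = sum(totals.values())
    prior = {c: n / n_all for c, n in totals.items()}
    return freq, prior
```

A taxon found in one of two non-impacted samples, for instance, has a frequency of 0.5 in that category.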
Calculating and Assessing Metrics
We assessed five possible metrics using the presence of threshold indicator taxa (PTIT) with the test data set. We calculated the frequency at which minimum thresholds of 3, 4, 5, 6, and 7 indicator taxa correctly or incorrectly (type 1 errors) identified samples collected from unimpaired streams (BAP > 5) or samples from impaired streams (BAP ≤ 5).
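One way to read the threshold assessment is sketched below in Python. This is an interpretation, not the documented procedure: it assumes that a sample is flagged for a condition when it contains at least k indicator taxa for that condition, and that accuracy is the share of flagged samples whose BAP score agrees. The function name, argument names, and the `(bap, taxa)` input layout are hypothetical.

```python
def ptit_assessment(samples, indicators, k, is_target):
    """Assess one PTIT threshold.

    samples    : list of (bap_score, set_of_taxa) pairs
    indicators : set of indicator taxa for one condition
    k          : minimum number of indicator taxa required
    is_target  : function bap -> bool, True when the BAP score
                 matches the condition the indicators represent
    Returns (number flagged, fraction of flagged samples that are
    correct), or (0, None) when no sample meets the threshold.
    """
    flagged = [bap for bap, taxa in samples
               if len(indicators & set(taxa)) >= k]
    if not flagged:
        return 0, None
    correct = sum(1 for bap in flagged if is_target(bap))
    return len(flagged), correct / len(flagged)
```

Under this reading, the type 1 error rate for a threshold is one minus the returned fraction, and the assessment would be repeated for k = 3 through 7.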
We also assessed five possible metrics using the TPI with the test data set. For each sample in the test data set, we calculated the probability of biological impairment and the probability that the biological condition is unimpaired using a modified Naïve Bayes equation (equations 1 and 2, respectively) (Russell and Norvig, 1995). Using the frequencies calculated from the training data set, we calculated the probability that a sample in the test data set indicated biological impairment as the frequency of impaired samples multiplied by the frequency at which each indicator taxon found in the sample occurs in impaired samples, divided by the frequency of impaired, slightly impacted, or non-impacted samples overall (Eq. 1). We calculated the probability that a sample in the test data set indicated unimpaired conditions using the same equation but with non-impacted frequencies in the numerator (Eq. 2). Finally, we calculated the frequency at which samples with minimum probabilities of 50%, 70%, 90%, 95%, and 98% matched the correct or incorrect (type 1 errors) condition category.
Where: NI = non-impacted; SL = slightly impacted; IM = impaired; IT = indicator taxa
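The probability calculation described for Eq. 1 and Eq. 2 can be sketched in Python as follows. This is an interpretation under a stated assumption, not a reproduction of the published equations: the denominator is taken to be the sum, over the IM, SL, and NI categories, of each category's overall frequency multiplied by the frequencies of the indicator taxa found in the sample. The function name and the `freq` and `prior` dictionary layouts are hypothetical.

```python
from math import prod

def category_posteriors(present_taxa, freq, prior):
    """Naive Bayes style probability of each condition category.

    present_taxa : indicator taxa (IT) found in the sample
    freq         : freq[(taxon, category)], taxon frequency per category
    prior        : prior[category], overall category frequency
    Assumption: the denominator is the sum of the numerators over
    all categories (IM, SL, NI), so the results sum to 1.
    """
    scores = {c: prior[c] * prod(freq[(t, c)] for t in present_taxa)
              for c in prior}
    total = sum(scores.values())
    if total == 0:
        return {c: 0.0 for c in prior}  # no category is supported
    return {c: s / total for c, s in scores.items()}
```

In this sketch the value for "IM" corresponds to the probability of impairment (Eq. 1) and the value for "NI" to the probability of unimpaired condition (Eq. 2); comparing either against a minimum probability such as 90% gives the classification that is then checked against the BAP score.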