Study sites. The study comprises 4093 stream sites randomly distributed throughout nine ecoregions of the conterminous United States38, encompassing a subcontinental area of approximately 1,231,000 km² and including lotic systems ranging from the Mississippi River to headwater streams (Supplementary Fig. S1). The nine ecoregions are areas of similar environmental characteristics, such as climate, vegetation, and geology (Supplementary Fig. S1). Thus, the sites are distributed across an area of distinct climatic conditions and are subject to different human activities operating at different intensities (Fig. 1a). The mean annual temperature ranges between − 1.07°C and 23.76°C, and mean annual precipitation between 128 mm yr− 1 and 4160 mm yr− 1. We used data from the EPA (Environmental Protection Agency; https://www.epa.gov/) collected during the summers of 2008, 2009, 2013, and 2014 from U.S. streams and rivers that have flowing or standing water in at least 50% of the reach length when sampled. Most sites were sampled in only one year (that is, not sampled over the four years), so the samples are temporally independent. Field sampling followed strict standardized protocols to provide a comprehensive assessment of aquatic assemblages and their environments39,40.
Fish and insect collection. Fish were sampled as described in USEPA39,40, nearly all by backpack or boat electrofishing. In wadeable sites < 13 m wide, sites were 40 wetted channel widths long, or a minimum of 150 m. For boatable sites and wadeable sites > 13 m wide, sites were 500 m or 20 wetted channel widths long and they were sampled until 500 individuals were collected or until 40 wetted channel widths were reached. Crews adjusted the settings on their electrofishing units so as to minimize stress and mortality, did not hold individual fish in the electrical field, exchanged live-well water continuously, and individual fish were released alive after they had been identified (except for voucher specimens that are anesthetized in clove oil or MS-222 prior to preservation in formalin). All biological sampling was compliant with federal and state scientific collecting permit requirements (which differ from state to state and agency to agency), and by following Standard Methods for Sampling North American Freshwater Fishes (American Fisheries Society, Bethesda, MD).
Insects were sampled by D-frame kick nets (0.09 m2 area, 500-µm mesh) at 11 systematic stations per site41. They were collected in a zig-zag pattern at wadeable sites. At non-wadeable sites, they were collected in the nearshore zone (alternating between banks). The 11 station samples were combined into a site composite sample, preserved in ethanol, and identified to genus. For consistency, 300 individuals were randomly selected from each sample and used in the data analyses. Insects were killed and preserved in ethanol prior to identification.
Assessing assemblage richness and density. We quantified the total density (individual number) of fish and insect in each stream site. We also quantified the observed richness of fish (i.e., number of fish species) and insects (i.e., number of insect genera) in each stream site. Moreover, because sampled species richness is a negatively biased estimator of the “true” species richness, we corrected sampled species richness with the coverage-based rarefaction and extrapolation methodology42. We fixed the coverage of all samples at 95% via rarefaction and extrapolation using the R package iNEXT to make species (fish) and genera (insect) richnesses comparable across sites43. Therefore, in addition to the observed richness, we also calculated the rarefied richness (Chao's richness: rare and abundant species are weighted equally) of fish and insects. For simplicity, we present only the rarefied richness in the main manuscript and the observed richness in the supplementary material.
Local contribution to beta-diversity. We used Legendre & De Cáceres19 to quantify compositional uniqueness from total beta diversity for fish and insect assemblages. To do so, we assembled the presence-absence data of fish and insects for each site. Next, we measured the total β-diversity (BDTotal) as the variance of assemblage composition for the conterminous U.S19. Then, we partitioned the BDTotal into local contributions to beta diversity (LCBD; Eq. 1):
LCBD=SS / SS
LCBD is the result of the squared distance from the sampling site i (SSi) divided by the total sum of squares of the data matrix (SSTotal). Specifically, the LCBD represents the degree of uniqueness in taxonomic composition in each site; i.e., it quantifies how much a specific site (a row in the matrix) contributes to the overall variance in the assemblage19. High LCBD values indicate a unique assemblage composition, whereas low LCBD values indicate a more common species/taxa set. We further partitioned LCBD variation into two components, species turnover and nestedness. High turnover values indicate that the changes in LCBD result from species replacement, whereas high values of nestedness indicate that the changes in LCBD result from species gains and losses from a nested assemblage20. LCBD(beta−total) as well as its components (turnover and nestedness) were calculated using the Jaccard dissimilarity index. To assess ecoregional and national changes in fish and insect assemblage composition, we calculated the LCBD and its components of each stream site to the ecoregion in which it occurred (i.e., LCBDecoregional, turnoverecoregional, and nestednessecoregional for each of the 9 ecoregions separately). Next, we calculated the LCBD and its components of each stream site to the entire U.S. (i.e., LCBDnational, turnovernational, and nestednessnational including all 9 ecoregions together). Lastly, because the sites were surveyed in different years, we calculated LCBD and its components separately for 2008, 2009, 2013, and 2014 to avoid comparison between distinct years. The LCBD and its components were computed using the function “LCBD.comp” in the R package ‘adespatial’44.
Assessing habitat complexity. At each site, we estimated instream physical habitat complexity according to (18, 45). Instream habitat complexity represents the mean of the summed cover (XFC_NAT) of six types of instream habitat complexity features: (i) undercut banks, (ii) rocks and boulders, (iii) large wood, (iv) brush, (v) instream live trees, and (vi) overhanging vegetation. At 11 evenly distributed plots along each site, field crews visually estimated the presence and proportional cover class of each of those 6 types according to field procedures described by USEPA39,40. For each fish concealment type, field crews estimated areal cover in four classes: absent (0), sparse (0 to 10%), moderate (10 to 40%), heavy (40 to 75%) and very heavy (> 75%). The site cover for each fish cover type was calculated by assigning cover class midpoint values (i.e., 0%, 5%, 25%, 57.5%, and 87.5%) to each plot’s observations and then averaging those cover values across all 11 plots45. The instream habitat complexity index ranged from 0 to 2.206, where 0 means that the stream has very low/absent habitat complexity, and 2.206 indicates that the stream harbors high habitat complexity. We then calculated the median (0.259) of the instream habitat complexity index and, using the median, we classified the sites into two habitat complexity categories: (i) high-habitat complexity (sites with habitat complexity values equal or above the median, i.e., ≥ 0.259) and (ii) low-habitat complexity (sites with habitat complexity values below the median, i.e., < 0.259). The median is a threshold that clearly separates two site classes, according to habitat complexity (Supplementary Fig. S14). Stream sites with high- and low-habitat complexity differ greatly in their structure and capacity to provide refuge and microhabitat for aquatic organisms (Supplementary Fig. S15). Consequently, this site separation allowed us to compare two distinct classes of habitat complexity (high and low) and then assess the impacts of human land-use on biodiversity for both classes. There were 2049 sites classified as “high-habitat complexity” and 2044 classified as “low-habitat complexity” (Supplementary Fig. S14). The instream habitat complexity index responds strongly to increasing human pressure18 and may act as an important driver of fish and insect diversity35. Therefore, this index is a potential mediator of the impacts of human land uses (i.e., urban and cropland intensification) on the diversity and composition of fish and insect assemblages.
Assessing urban and cropland intensification. For each site, we quantified the catchment intensities of the two most common human land uses worldwide1,23,38: (1) urbanization and (2) cropland. We quantified urbanization by examining four geospatial indicators taken from the EPA’s StreamCat dataset39,40,46 that represent direct proxies of increasing urban areas: (i) human population in the catchment (PopDen_Cat) based on TIGER line files from the US Census (TIGER Lines)47, (ii) percentage of the catchment that has impervious surface (PctImp_Cat) based on the National Land Cover Database (NLCD)47, (iii) road density in the catchment (RdDens_Cat) from the US Census data (TIGER Lines48 and (iv) percentage of the catchment composed of high-intensity human land use (PctUrbHi_Cat) (NLCD)47. These urban proxies were then used to create a metric of urban intensification, using Principal Component Analysis (PCA). The first PCA axis synthesized the major source of variation in the original four proxies (80%-86%). Therefore, we used the PCA-axis one scores (Supplementary Table S57) for each site to summarize these proxies into a single metric of urban intensification. This axis quantified a gradient of urban activities, where increasing scores represent increased urban intensification (Supplementary Table S57). Cropland intensification was quantified by the percentage of the watershed composed of agricultural activities derived by summing the NLCD crop and hay land-use classes48.
Statistical analyses
Linking habitat complexity to fish and insect biodiversity. We used mixed-effects models to determine the responses of species richness, total density, LCBDecoregional and LCBDnational to habitat complexity classes (categorical, two levels: high and low). Random effects considered were ecoregions and time, which accounted for spatial and temporal differences in the datasets (models 1–4).
(1) richness ~ habitat complexity class + (1|ecoregion) + (1|time)
(2) log-total density ~ habitat complexity class + (1|ecoregion) + (1|time)
(3) LCBDecoregional ~ habitat complexity class + (1|ecoregion) + (1|time)
(4) LCBDnational ~ habitat complexity class + (1|ecoregion) + (1|time)
We evaluated taxa richness using a generalized linear mixed-effects model assuming Poisson-distributed errors. Moreover, log-transformed total density and LCBD were analyzed assuming normally distributed errors (Gaussian) following previous studies25,49. We recognize that other error structures might be more appropriate to model response variables between 0 and 1 (e.g., LCBD). For instance, using a beta distribution might be more appropriate for modeling LCBD responses50. However, we used normal distribution because this allowed us to obtain easily interpretable coefficients on all assemblage metrics. Also, Blowes et al.5 found a strong positive correlation (Spearman = 0.90) between coefficients estimated with the Gaussian and Beta error, which supports the use of the Gaussian error in our study. We calculated percent change in the diversity metrics by scaling the percentage of change in low-habitat complexity based on the changes in high-habitat complexity (baseline). This approach allowed us to compare the results in low-habitat complexity with those in high-habitat complexity. The assumptions of normality, linearity and homoscedasticity were verified using graphical diagnostics. The models were run using the lme4 R package51.
Effect of human land uses on fish and insect biodiversity mediated by habitat complexity. Using mixed-effects models we also determined the effect of human land uses (continuous: urban and cropland intensification) on the richness, total density, and composition (LCBD) of fish and insects and how those effects interacted/covaried across habitat complexity classes (high and low). We evaluated log-transformed taxa richness, log-transformed total density, and LCBD assuming normally distributed errors in linear-mixed effect models (models 5–8). We then calculated the percent change in richness, total density, LCBD, and taxonomic dissimilarity with increasing urban and cropland intensification in each habitat complexity class. We compared the percent changes in the low-habitat-complexity class with those in the high-habitat complexity class.
(5) log-richness ~ human land-uses × habitat complexity class + (1|ecoregion) + (1|time)
(6) log-density ~ human land-uses × habitat complexity class + (1|ecoregion) + (1|time)
(7) LCBDecoregional ~ human land-uses ×habitat complexity class + (1|ecoregion) + (1|time)
(8) LCBDnational ~ human land-uses × habitat complexity class + (1|ecoregion) + (1|time)
Assessing the consistency of human land use effects on diversity and composition of fish and insect assemblages. To test whether urban and cropland intensification consistently affect richness, density, and LCBD of fish and insects we re-ran the mixed-effects models, but this time accounting for other drivers, such as climate (mean annual temperature [MAT] and precipitation [MAP]), natural land covers (forest/wetland, grassland/shrub, and riparian vegetation) and stream size (stream depth and catchment area). The sampling methodology for these drivers are explained in detail in the Supplementary Methods. Importantly, water temperature was not available in our dataset and we only have information on air temperature. We recognize that air temperature cannot fully represent the thermal regime of the stream/river, but air- and water temperature are correlated52 and many studies reported strong impacts of air temperature on aquatic biodiversity3,22,23,26. We modeled log-richness and log-density using a normal distribution, and LCBDecoregional and LCBDnational using beta distribution in the R package ‘glmmTMB’53. We then applied a model averaging procedure that calculated all possible subset models and chose those subset models with the lowest values (ΔAICc ≤ 2) of the Akaike Information Criterion corrected for small sample size (AICc). This analysis was conducted using the R package ‘MuMIn’54.