All analyses were carried out using R v4.1.0 with RStudio v2021.09.0 36,37 and ArcGIS Pro v3.1.0 38.
Threatened status and population trends
To assess and contextualise the threatened status of marine small cetaceans, we used data from the IUCN Red List 16. The IUCN Red List is the most widely used method for assessing the extinction risk of species at the global scale 39. Efforts to assess and periodically re-assess species are led and co-ordinated by the specialist groups of the IUCN Species Survival Commission, alongside expert contributors from around the world. Full details of the assessment methodology can be found on the IUCN’s website (www.iucnredlist.org) and in associated peer-reviewed publications 40. We extracted current and historical IUCN Red List categorisations and associated population trends for all small cetaceans (n = 72), great whales (n = 16), pinnipeds (n = 36), chondrichthyans (n = 1,234), and ray-finned bony fishes (gigaclass Actinopterygii, n = 10,164) with a marine or partly marine distribution. Obligate freshwater species were not considered because they face a different landscape of threats. IUCN Red List data were extracted for the years 1996–2023; assessments made between 1996 and 2002 used IUCN Categories and Criteria Version 2.3, and those made after 2002 used Version 3.1 40.
IUCN Red List Categories divide species into extinct (Extinct, Extinct in the Wild) and extant. Extant species are further subdivided into threatened (Vulnerable, Endangered, and Critically Endangered), currently non-threatened (Least Concern, Near Threatened), and those for which there are too few data for an assessment to be made (Data Deficient). Extinction risk increases progressively through the threatened categories, from Vulnerable to Critically Endangered. Species are also assigned a population trend of Increasing, Stable, Decreasing, or Unknown. To estimate the percentage of small cetaceans, great whales, pinnipeds, chondrichthyans, and ray-finned fishes that are threatened, we assumed that the relative proportions of threatened and non-threatened species among Data Deficient species are equal to those among data-sufficient species.
$$\text{Threatened (estimated \%)} = \frac{\text{Critically Endangered} + \text{Endangered} + \text{Vulnerable}}{\text{Total assessed} - \text{Data Deficient}} \times 100$$
Species attributes
The extinction risk of vertebrates is closely linked to biological and geographical attributes across a wide range of taxa 10,12,41–43. In the marine realm, much of the recent research in this area has focused on bony fishes and chondrichthyans. From a biological perspective, maximum size (often used as a proxy for growth rate and time to maturation) and reproductive or population growth rates are key predictors of extinction risk 10,12,44. Maximum size varies widely even among small cetaceans, from 1.5 m in the Vaquita (Phocoena sinus) to 12.8 m in Baird’s Beaked Whale (Berardius bairdii). However, unlike those of bony fishes and chondrichthyans, the population growth rates of small cetaceans are broadly similar across species, at 3–8% per year 26,27, and are therefore unlikely to be a major predictor of variation in threatened status. From a geographical perspective, distribution range and distribution depth may be linked with extinction risk. For example, among transboundary species such as many chondrichthyans, those spread across many countries appear to be at elevated extinction risk, likely as a result of disjointed management approaches 10,31. Species restricted to shallower depths and closer to shore are also at heightened extinction risk because they are more exposed to anthropogenic impacts, such as fisheries, and have fewer natural refugia 10–12. Similarly, small cetaceans are largely transboundary, and their distributions range from shallow-water coastal obligates to offshore oceanic species. For small cetaceans, we extracted maximum size from SeaLifeBase (www.sealifebase.se) and the number of countries across which each species is distributed from the IUCN Red List (www.iucnredlist.org). The maximum water depth of each species’ distribution was also extracted from the IUCN Red List, which provides the only standardised global classification of this type 16.
Where no specific depth value was given, species were assigned a maximum depth based on their IUCN Red List Habitat Classification Scheme designation: Marine Coastal/Supratidal/Intertidal (50 m), Marine Neritic (100 m), Epipelagic (200 m), Mesopelagic (1,000 m), or Bathypelagic (4,000 m).
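The depth fallback rule can be sketched as a simple lookup. This is an illustrative Python translation, not the authors’ R code; the habitat labels are the simplified names used in the text, and taking the deepest designation when a species has several is an assumption.

```python
from typing import Iterable, Optional

# Fallback maximum depths (m) for the habitat designations listed above.
# Labels are simplified forms of the IUCN Habitat Classification Scheme names.
HABITAT_DEPTH_M = {
    "Marine Coastal/Supratidal/Intertidal": 50,
    "Marine Neritic": 100,
    "Epipelagic": 200,
    "Mesopelagic": 1_000,
    "Bathypelagic": 4_000,
}


def max_distribution_depth(
    reported_m: Optional[float], habitats: Iterable[str]
) -> Optional[float]:
    """Use the reported maximum depth if present, else a habitat-based fallback."""
    if reported_m is not None:
        return reported_m
    depths = [HABITAT_DEPTH_M[h] for h in habitats if h in HABITAT_DEPTH_M]
    # Assumption: with several habitat designations, take the deepest one.
    return max(depths) if depths else None


print(max_distribution_depth(None, ["Marine Neritic", "Epipelagic"]))  # 200
print(max_distribution_depth(350.0, ["Marine Neritic"]))  # 350.0
```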
Identifying key threats and defining proxies
Threats to small cetacean species were also extracted from the IUCN Red List. Each threat is assigned to one of six impact categories: High Impact, Medium Impact, Low Impact, No/Negligible Impact, Past Impact, and Unknown. To identify the key threats faced by small cetaceans, we extracted only those threats considered to have High or Medium Impact (Fig. 2). Once the key threats were identified, we compiled a series of proxies for them for analysis. Threat proxies were compiled for 163 countries with marine waters.
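The key-threat filter amounts to keeping only High and Medium Impact entries. The Python sketch below assumes a hypothetical flat table of (species, threat, impact) rows; the species and threat names are placeholders, not IUCN data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

KEY_IMPACTS = {"High Impact", "Medium Impact"}


def key_threats(records: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each species to the threats scored High or Medium Impact.

    `records` is a hypothetical flat export of (species, threat, impact) rows.
    """
    out: Dict[str, Set[str]] = defaultdict(set)
    for species, threat, impact in records:
        if impact in KEY_IMPACTS:
            out[species].add(threat)
    return dict(out)


# Placeholder rows, for illustration only.
rows = [
    ("species_A", "bycatch", "High Impact"),
    ("species_A", "pollution", "Low Impact"),
    ("species_B", "bycatch", "Medium Impact"),
    ("species_B", "shipping", "Medium Impact"),
]
per_species = key_threats(rows)
# Number of species for which each threat is a key threat.
print(Counter(t for threats in per_species.values() for t in threats))
```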
Re-estimated marine fisheries catches from large-scale (industrial) and small-scale (artisanal and subsistence) fisheries for 2019 were collated from the Sea Around Us database 45 and used as proxies for fishing pressure. Because fisheries governance strength might be expected to mediate the impact of fisheries on extinction risk, the Fisheries Management Index from the Ocean Health Index (www.oceanhealthindex.org), which is extrapolated from research into the predictors of relative fisheries management efficacy among countries 29, was used as a proxy for fisheries management strength. Commercial, passenger, and recreational vessel traffic was extracted from World Bank vessel density maps for 2015–21 46 and used as a proxy for ship-strike risk and noise pollution from these sectors. Additionally, the mean oil and gas rig count per country for 2017–21 (https://rigcount.bakerhughes.com/intl-rig-count) was taken as a measure of the intensity of oil and gas exploration and drilling activity.
FAO crop, livestock, and forestry production data for 2017–21 were compiled as a measure of the relative levels of effluents and pollutants associated with agricultural and forestry activity 47,48. Estimates of untreated wastewater discharge were also compiled 49; untreated wastewater includes discharge originating from agriculture and forestry as well as from other sources, such as urban, industrial, and military uses. Annual plastic waste flow into the marine environment 50 was used as a proxy for human and industrial garbage and solid waste inputs.
Assessing the predictive importance of key threats and species attributes
We used key threat proxies and species attributes (Supplementary Information) to model the IUCN Red List categories of small cetaceans, in order to analyse the relative importance of threats and attributes in predicting extinction risk. IUCN Red List categories were encoded in line with the IUCN Red List Index 16, whereby Extinct species are assigned a value of 0.0, Critically Endangered 0.2, Endangered 0.4, Vulnerable 0.6, Near Threatened 0.8, and Least Concern 1.0. The nine Data Deficient small cetacean species were excluded from the analysis. The Fisheries Management Index score for each species was calculated as the mean Fisheries Management Index of all countries across which the species is distributed. All other threat proxies were summed across all countries in which a species is distributed and standardised by the total coastline length. All independent variables were log-transformed to improve their distributions.
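The encoding and per-species aggregation steps can be sketched as follows. This Python version is an illustration under stated assumptions: the country-level table layout and the proxy names are hypothetical, "total coastline length" is taken to mean the summed coastline of the range countries, and all values are assumed strictly positive so that the natural log is defined.

```python
import math
from statistics import fmean
from typing import Dict, Mapping, Optional, Sequence

# IUCN Red List Index values as described above; DD is deliberately absent.
RLI_VALUE = {"EX": 0.0, "CR": 0.2, "EN": 0.4, "VU": 0.6, "NT": 0.8, "LC": 1.0}


def encode_category(code: str) -> Optional[float]:
    """Return the Red List Index value, or None for Data Deficient (excluded)."""
    return RLI_VALUE.get(code)


def species_threat_predictors(
    range_countries: Sequence[str],
    country_proxies: Mapping[str, Mapping[str, float]],
    coastline_km: Mapping[str, float],
) -> Dict[str, float]:
    """Aggregate hypothetical country-level proxies over one species' range."""
    # Assumption: standardise by the summed coastline of the range countries.
    total_coast = sum(coastline_km[c] for c in range_countries)
    # Fisheries Management Index: mean across the range countries.
    features = {"fmi": fmean(country_proxies[c]["fmi"] for c in range_countries)}
    # Every other proxy: summed across the range, divided by coastline length.
    proxy_names = set(country_proxies[range_countries[0]]) - {"fmi"}
    for name in sorted(proxy_names):
        total = sum(country_proxies[c][name] for c in range_countries)
        features[name] = total / total_coast
    # Natural-log transform; assumes every value is strictly positive.
    return {k: math.log(v) for k, v in features.items()}


# Toy two-country example with hypothetical values.
countries = {"A": {"fmi": 0.4, "catch": 100.0}, "B": {"fmi": 0.6, "catch": 300.0}}
coast = {"A": 10.0, "B": 30.0}
feats = species_threat_predictors(["A", "B"], countries, coast)
print(feats)
```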
The model was built using eXtreme Gradient Boosting (XGBoost) 51, a form of boosted regression tree. XGBoost is a powerful predictive model that can handle non-linear relationships between the dependent and independent variables and complex interactions among variables, and is robust to co-linearity among independent variables 52. However, to avoid redundancy among the independent variables submitted to modelling, we tested for high co-linearity (r > 0.8) using Pearson’s correlations. Untreated wastewater was highly co-linear with crop (r = 0.87), livestock (r = 0.91), and forestry (r = 0.88) production, and crop and forestry production were also highly co-linear with each other (r = 0.87). Untreated wastewater was retained for analysis in preference to these variables because it is considered a more holistic proxy for effluent release into coastal environments.
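The co-linearity screen can be sketched with NumPy’s correlation matrix. In this illustration the threshold is applied to the absolute value of r, which is an assumption (it would also flag strong negative correlations), and the toy data are made up.

```python
from typing import List, Sequence, Tuple

import numpy as np


def collinear_pairs(
    data: np.ndarray, names: Sequence[str], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return variable pairs whose |Pearson r| exceeds `threshold`.

    `data` holds one column per variable and one row per observation.
    """
    r = np.corrcoef(data, rowvar=False)  # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs


# Toy data: "b" is nearly proportional to "a"; "c" is weakly related to both.
toy = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 2.0],
    [5.0, 10.5, 3.0],
])
print(collinear_pairs(toy, ["a", "b", "c"]))
```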
First, the model hyperparameters eta (the learning rate), maximum tree depth (which controls the complexity of variable interactions), minimum child weight (the minimum sum of instance weights required in a child node), subsample (the subsample ratio of the training instances), and gamma (the minimum loss reduction required to make a further split) were tuned using 3-fold cross-validation. Both subsample and gamma help to reduce the likelihood of overfitting. Early stopping was used to tune the number of trees in the model. Monotonic constraints were added to the model to reflect existing understanding of extinction risk: species living in deeper, offshore waters are less exposed to human impacts and therefore at lower extinction risk 11,12,19, and extinction risk should increase as threat proxies increase 16. Monotonic constraints make the model more stable and generalisable. Because IUCN Red List Index values were highly imbalanced (e.g., 2 species with an Index value of 0.2 and 42 species with an Index value of 1.0), we weighted observations inversely to the proportion of species with each Index value. Root mean squared error was used as the measure of model fit. A total of 1,800 unique combinations of hyperparameters were considered. The hyperparameters that produced the lowest root mean squared error during tuning were eta = 0.1, maximum tree depth = 2, minimum child weight = 1, subsample = 0.9, and gamma = 0.1, and the number of trees selected was 87.
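The class weighting and monotonic constraints can be sketched in Python as below. The feature names are hypothetical, the `reg:squarederror` objective is an assumption (the text does not name the objective), and the remaining values are the tuned hyperparameters reported above. Because the response is the Red List Index (higher values mean lower risk), a variable that lowers risk, such as depth, is constrained as increasing (+1), and a threat proxy is constrained as decreasing (−1).

```python
from collections import Counter
from typing import Collection, List, Sequence


def inverse_frequency_weights(labels: Sequence[float]) -> List[float]:
    """Weight each observation by 1 / (share of observations with its label)."""
    counts = Counter(labels)
    n = len(labels)
    return [n / counts[v] for v in labels]


def monotone_constraint_string(
    features: Sequence[str], increasing: Collection[str], decreasing: Collection[str]
) -> str:
    """Build an XGBoost-style monotone_constraints string such as "(1,-1,0)".

    +1: prediction may only rise with the feature; -1: only fall; 0: free.
    """
    signs = [1 if f in increasing else -1 if f in decreasing else 0 for f in features]
    return "(" + ",".join(str(s) for s in signs) + ")"


# Hypothetical feature names; hyperparameter values are those reported above.
features = ["max_distribution_depth", "small_scale_catch", "max_size"]
params = {
    "objective": "reg:squarederror",  # assumed objective for a continuous Index
    "eval_metric": "rmse",
    "eta": 0.1,
    "max_depth": 2,
    "min_child_weight": 1,
    "subsample": 0.9,
    "gamma": 0.1,
    "monotone_constraints": monotone_constraint_string(
        features,
        increasing={"max_distribution_depth"},
        decreasing={"small_scale_catch"},
    ),
}
print(params["monotone_constraints"])  # (1,-1,0)
print(inverse_frequency_weights([0.2, 0.2, 1.0, 1.0, 1.0, 1.0]))  # [3.0, 3.0, 1.5, 1.5, 1.5, 1.5]
```

With the Python xgboost package, such weights and parameters would typically be passed through `xgboost.DMatrix(X, label=y, weight=w)` and `xgboost.train(params, dtrain, num_boost_round=87)`. The original analysis was run in R, so this is only a translation sketch.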
The final model was fitted over 10,000 bootstrap iterations, in each of which the data were randomly split into 80% training and 20% test sets. Monotonic constraints and data weights were retained in the final model. For each iteration, we extracted the bias (the average difference between observed and predicted IUCN Red List Index values), the relative importance of the independent variables, and the marginal effect of each variable, and we made predictions for Data Deficient species. There was little evidence of bias across the 10,000 bootstrapped iterations (3.27 × 10⁻² [95% CI 3.18 × 10⁻² to 3.36 × 10⁻²]). The root mean squared error of the final model was 6.54 × 10⁻² [95% CI 6.52 × 10⁻² to 6.55 × 10⁻²].
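The repeated 80%/20% evaluation loop can be sketched independently of the model. In this Python illustration, `fit` is any function that returns a predictor; the mean-only `fit_mean` is a hypothetical stand-in for the boosted model. Summarising each metric as its mean with a normal-approximation 95% CI of that mean is an assumption, because the text does not state how the intervals were computed.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# fit(X_train, y_train) must return a predict(x) -> float function.
Fit = Callable[[List[Sequence[float]], List[float]], Callable[[Sequence[float]], float]]


def mean_with_ci(values: List[float]) -> Tuple[float, float, float]:
    """Mean and normal-approximation 95% CI of the mean (assumed summary)."""
    m = statistics.fmean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


def repeated_split_evaluation(X, y, fit: Fit, n_iter=1000, train_frac=0.8, seed=42):
    """Repeat random train/test splits; collect test-set bias and RMSE."""
    rng = random.Random(seed)
    n = len(y)
    n_train = round(train_frac * n)
    biases, rmses = [], []
    for _ in range(n_iter):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        residuals = [y[i] - predict(X[i]) for i in test]  # observed - predicted
        biases.append(statistics.fmean(residuals))
        rmses.append(math.sqrt(statistics.fmean(r * r for r in residuals)))
    return mean_with_ci(biases), mean_with_ci(rmses)


def fit_mean(X_train, y_train):
    """Hypothetical stand-in model: always predict the training mean."""
    mu = statistics.fmean(y_train)
    return lambda x: mu


X = [[float(i)] for i in range(20)]
y = [0.6] * 20
bias, rmse = repeated_split_evaluation(X, y, fit_mean, n_iter=50)
print(bias, rmse)
```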
Do small- or large-scale fisheries have the larger causal impact on extinction risk?
To explore the relationship between fisheries and extinction risk, we assessed and compared the causal effects of small- and large-scale fisheries. To examine causality, rather than just predictive power, we applied a structural causal model framework using directed acyclic graphs (DAGs) and the backdoor criterion. We proposed a causal structure for the effect of fisheries on extinction risk in which fisheries management controls fishing pressure but not bycatch risk, because few bycatch mitigation programs are implemented globally 33,34, and in which traits that influence fisheries exposure (maximum depth and the number of countries across which a species is distributed) moderate the causal impact of fisheries (Fig. 5). Causal effects on extinction risk (IUCN Red List Index) were then estimated using two separate generalised linear models, one for small-scale and one for large-scale fisheries. The effects of the two fisheries types could not be modelled together because they are highly co-linear (r = 0.885). Each model included fishing pressure (Sea Around Us catches); the interactions between fishing pressure and maximum depth and between fishing pressure and number of countries (representing the moderating effects of these variables); and fisheries management (Fisheries Management Index) as a control variable.
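The model specification can be sketched in Python. The text does not state the GLM family or link, so this sketch assumes a Gaussian family with identity link, for which the fit reduces to ordinary least squares. The design matrix contains exactly the terms listed above. Under this specification, the effect of catch at given moderator values is β₁ + β₂·depth + β₃·countries. The data below are synthetic and noise-free, so the known coefficients should be recovered.

```python
import numpy as np


def design_matrix(catch, max_depth, n_countries, fmi):
    """Columns: intercept, catch, catch x depth, catch x countries, FMI."""
    catch = np.asarray(catch, dtype=float)
    return np.column_stack([
        np.ones_like(catch),
        catch,
        catch * np.asarray(max_depth, dtype=float),
        catch * np.asarray(n_countries, dtype=float),
        np.asarray(fmi, dtype=float),
    ])


def fit_gaussian_glm(X, y):
    """Gaussian GLM with identity link, i.e. ordinary least squares (assumed)."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


def conditional_catch_effect(beta, max_depth, n_countries):
    """Slope of the Index on catch at given moderator values."""
    return beta[1] + beta[2] * max_depth + beta[3] * n_countries


# Synthetic, noise-free data with known coefficients (illustration only).
rng = np.random.default_rng(0)
n = 60
catch, depth, countries, fmi = (rng.uniform(0.1, 5.0, n) for _ in range(4))
true_beta = np.array([0.9, -0.05, 0.01, -0.002, 0.1])
X = design_matrix(catch, depth, countries, fmi)
beta = fit_gaussian_glm(X, X @ true_beta)
print(np.round(beta, 6))
```

Because small- and large-scale catches are highly co-linear, the same function would be fitted twice, once with each catch series.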
Does research effort reflect the relative importance of threats?
The relative level of research effort on each threat was derived from the peer-reviewed literature. Searches were carried out in Web of Science (www.webofscience.com) on 5 April 2023. Results were restricted to peer-reviewed articles written in English and published between 1901 and the search date, and searches were constrained to title and abstract content. A series of initial search strings using Boolean logic was explored, combining terms relating to small cetaceans, with the aim of maximising the number of relevant returns while minimising extraneous results. The chosen search string combined the terms “odontocet*”, “toothed whale”, “dolphin”, “porpoise”, and the species names of all 72 species explored in this study. Subsequent searches were then run to identify literature relevant to each specific threat by appending additional parameters to the initial search. Full details of the Boolean search strings used are available in the Supplementary Information.
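The structure of the searches (a base block of small-cetacean terms, optionally combined with a threat-specific block) can be sketched as a string builder. This Python example is a generic Boolean sketch: the threat term is hypothetical, and the exact Web of Science field tags and strings used in the study are those given in the Supplementary Information, not reproduced here.

```python
from typing import Iterable, Optional


def or_block(terms: Iterable[str]) -> str:
    """Join terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(
    base_terms: Iterable[str],
    species_names: Iterable[str],
    threat_terms: Optional[Iterable[str]] = None,
) -> str:
    """Base small-cetacean block, optionally ANDed with a threat-specific block."""
    query = or_block(list(base_terms) + list(species_names))
    if threat_terms:
        query += " AND " + or_block(threat_terms)
    return query


base = ["odontocet*", "toothed whale", "dolphin", "porpoise"]
species = ["Phocoena sinus", "Berardius bairdii"]  # in practice, all 72 species
print(build_query(base, species, ["bycatch"]))  # "bycatch" is a hypothetical threat term
```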