In the present study, we examined the navigation routes used by various researchers during whale monitoring in the Gulf of California. Our findings highlight that only a tiny proportion of spatial units can be considered well-surveyed, and this biased pattern persists across different resolutions. We observed an imbalanced distribution of survey effort, where fewer than ten navigation routes are sufficient for a cell to be among the top 5% with the highest number of routes. This result aligns with common patterns seen in the distribution of collection effort across various biological groups (e.g., Boakes et al. 2010; Lobo et al. 2018; García-Roselló et al. 2023), emphasizing the necessity for systematic monitoring to accurately estimate the abundance and distribution of both terrestrial and marine organisms (Tyne et al. 2016; Mannocci et al. 2018). Frequently, data collection occurs opportunistically and relies on prior knowledge of areas with high animal occurrence (Kot et al. 2010; Embling et al. 2015; Tyne et al. 2016). The prevalent research vessels in the Gulf of California are often small (length < 24 m) and restricted to navigating close to the shore. Sampling in open waters, far from the coast, is feasible only with a few large vessels or by air routes, albeit at a significantly higher cost. Utilizing data on whale distribution from online opportunistic platforms substantially reduces research costs, particularly for animals with a high capacity for movement that spend a significant portion of their time diving (Evans and Hammond 2004). Through our analysis of navigation routes at various resolutions, we identified evidence of historical selectivity in recording whales, with only 3–10% of cells considered well-surveyed.
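As an illustration of how well-surveyed cells might be identified, the following minimal sketch counts the routes that touch each grid cell and keeps the cells in the top 5% by route count. It is not the procedure used in this study: the function names, the representation of a route as a list of (longitude, latitude) vertices, and the square grid in degrees are all assumptions made for the example. A route is counted only in cells that contain one of its vertices, which is a simplification.

```python
import math
from collections import Counter


def cell_of(lon, lat, resolution):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(lon / resolution), math.floor(lat / resolution))


def routes_per_cell(routes, resolution):
    """Count how many distinct routes have at least one vertex in each cell.

    `routes` is a hypothetical input: a list of routes, each a list of
    (lon, lat) vertices. Cells crossed between two vertices are ignored.
    """
    counts = Counter()
    for route in routes:
        visited = {cell_of(lon, lat, resolution) for lon, lat in route}
        counts.update(visited)  # each route adds at most 1 to a cell
    return counts


def well_surveyed_cells(counts, top_fraction=0.05):
    """Return the cells in the top fraction by route count, and the threshold.

    Cells tied with the threshold count are all included, so the returned
    set can be slightly larger than the requested fraction.
    """
    ranked = sorted(counts.values(), reverse=True)
    n_top = max(1, math.ceil(len(ranked) * top_fraction))
    threshold = ranked[n_top - 1]
    cells = {cell for cell, n in counts.items() if n >= threshold}
    return cells, threshold
```

Repeating the calculation with several values of `resolution` would show whether the share of well-surveyed cells stays small across grid sizes.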
Nevertheless, a crucial question arises: Does the substantial spatial heterogeneity in survey effort stem from environmental and socio-economic factors, or is it merely a result of directing more effort towards areas with a higher likelihood of whale sightings? The first scenario suggests that improving survey coverage would facilitate more accurate estimation of the density and distribution of various whale species. Conversely, the latter possibility implies that the distribution of survey effort is associated with the distribution of whales, reflecting the knowledge and prolonged interaction between data collectors and the whale population. Gaps in biodiversity information may arise through a poorly considered mechanism. The existence of previously published or unpublished knowledge might suggest that a particular locality lacks biological interest, thus discouraging additional collection efforts (Dennis and Thomas 2000). We propose referring to these seeming data gaps as “knowledge gaps”. Such gaps are likely to emerge once a certain threshold is surpassed in the inventory process of a region, especially in regions that undergo a relatively extensive collection effort for their biodiversity. This happens when collectors avoid sampling areas that are expected to be of little interest in the future, and collection efforts increasingly concentrate on locations recognized for their quality and ease in obtaining data on the target organisms. In this study, we suspect that these “knowledge gaps” are likely the primary cause of the relationship between the distribution of survey effort and whale spatial preferences. The spatial distribution of model residuals, along with their significantly autocorrelated values, indicated that cells with the most navigation routes exhibited a much greater survey effort than expected based on environmental variables.
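To make the idea of spatially autocorrelated residuals concrete, the sketch below computes global Moran's I for residuals stored on a regular grid. The choice of Moran's I, the rook (edge-sharing) neighbourhood with binary weights, and the function name `morans_i` are assumptions made for illustration; the text above does not state which autocorrelation statistic was used.

```python
def morans_i(values):
    """Global Moran's I for values keyed by (column, row) grid cell.

    Assumes rook adjacency (cells sharing an edge) with binary weights.
    Values near +1 indicate clustering of similar residuals, values near
    -1 indicate alternation, and values near 0 indicate no spatial pattern.
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {cell: v - mean for cell, v in values.items()}
    denom = sum(d * d for d in dev.values())

    cross_sum = 0.0   # sum of w_ij * dev_i * dev_j over neighbouring pairs
    weight_sum = 0    # sum of all weights w_ij
    for (col, row), d in dev.items():
        for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbour = (col + dc, row + dr)
            if neighbour in dev:
                cross_sum += d * dev[neighbour]
                weight_sum += 1

    if weight_sum == 0 or denom == 0:
        raise ValueError("Moran's I is undefined: no neighbours or no variance")
    return (n / weight_sum) * (cross_sum / denom)
```

Positive values for the residuals of an effort model would be consistent with over-surveyed cells clustering in space. Judging significance would additionally require a permutation test or an analytical variance, which this sketch omits.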
These over-surveyed areas have recently been identified through satellite data as regions with a high occurrence of fin whales (Balaenoptera physalus) (Jiménes López et al. 2019). These areas predominantly include the route between Loreto and La Paz in the southwestern part of the Gulf and the waters between Bahía de Los Angeles and Angel de la Guarda Island in the north (Jiménes López et al. 2019). Moreover, the northern part of the Gulf of California has been recognized as a specific area where fin whales forage on daytime surface swarms of euphausiids (Ladrón de Guevara et al. 2008). Bahía de La Paz was another over-surveyed area where many cetacean species were also observed (Salvadeo et al. 2011; Pardo et al. 2013; Antichi et al. 2022).
Our results indicate that the distribution of survey effort is only minimally determined by environmental conditions, as these variables explain it only partially. This partial influence may be attributed, in part, to surface whale records being influenced by factors not considered in this study, such as those affecting the dive behavior of individuals (Higby et al. 2012) or intermediate-depth conditions (Pardo et al. 2013; Dransfield et al. 2014). Nevertheless, as spatial resolution increases, the number of statistically significant variables appears to rise, albeit with an overall decrease in explanatory power. Regardless of resolution, our findings suggest that areas with the greatest effort are characterized by shallow and cold waters with high levels of particulate organic carbon and phytoplankton. These environmental conditions align with the abundance and distribution patterns of whales in the Gulf of California (Pardo et al. 2013; García-Morales et al. 2017) and other regions worldwide (Scales et al. 2017; Meynecke et al. 2021), results that further support the existence of “knowledge gaps”. However, it is important to note that these environmental variables often exhibit limited predictive capacity (Higby et al. 2012; Chavez-Rosales et al. 2022).
Previous studies have underscored the limitations of correlative statistical models when extrapolating suitability or probability values to areas with limited or no sampling effort, especially when using opportunistically collected data (Elith and Leathwick 2009; Chavez-Rosales et al. 2022). Nevertheless, even with opportunistic data, critical areas of occurrence can still be identified when sampling effort is considered and key predictive variables are incorporated (Higby et al. 2012; García-Roselló et al. 2015; Tang et al. 2021). In our study, the alignment of whale distribution with survey effort, coupled with the limited explanatory capacity of environmental variables, suggests that the distribution of whales may be the primary driver of the observed heterogeneous distribution of survey effort. Consequently, to avoid potential misinterpretations, we propose that future analyses aiming to predict the distribution of any marine or terrestrial vertebrate should consider the possible effect of the organism’s distribution on the distribution of collection effort, and incorporate survey effort as an explanatory variable when necessary.
Collecting systematic data on marine or terrestrial vertebrates can be challenging, particularly for migrant animals that are difficult to detect. However, it is crucial to consider survey effort when analyzing opportunistic biological data to determine whether data collection was concentrated at predetermined sites. On some occasions, the distribution and abundance of the organisms themselves can be the primary criterion explaining the preferential selection of certain localities. This preference may be influenced by the existence of previously published or unpublished knowledge and the prolonged interaction between data collectors and animals. We recommend exploring survey effort data before attempting to estimate suitability or favorable conditions for the species of interest. Doing so minimizes potential errors of interpretation and enhances the accuracy of the findings.