The premise of this work was the creation of an up-to-date database including all the available archaeological data on ivory finds from the Central-Western Mediterranean basin. The collected information includes the typology of finds, the type of ivory used to create the object, its location, a detailed description of the object, its current storage location, and the available literature. However, based on this database, simple spatial distribution maps of archaeological features are rather descriptive if not connected to explanatory variables from socio-cultural or environmental background. Hence, we integrate topographic and hydrologic variables to the analytical workflow in order to estimate the impact of terrain roughness and stream-flow character on potential trade network development. In this section, we first introduce into the site database, following a description of the geographical settings and variables used and a detailed methodical outline of the environmental data used as well as the model set-up.
2.1 Archaeological background and site database
The study of ivory production, among other exotic raw materials, in Chalcolithic and Early Bronze Age contexts has been a focal point since the beginning of Iberian Prehistoric studies (Siret 1913; Guilaine 2018). Chapman (1990) suggests an African origin for the majority of the Iberian ivories, with a relevant part of fossil tusks sourced from the sandy terraces of Iberian rivers. However, since 1999, an interdisciplinary team led by T. X. Schuhmacher, conducted a study on Iberian ivory with the surprising result of a significant amount of Asian elephant (Elephas maximus) ivory, a species that during this time frame was inhabiting some ecological niches in the Near East (Pfälzner 2016). This discovery reignited the long-standing debate concerning East-West relations during the third and second millennia BC, and the potential existence of extensive contacts and exchange networks, challenging the paradigm of a Mediterranean divided in two geo-cultural regions (Renfrew 1967).
Building on this, several researchers focused on ivory as a socioculturally relevant commodity used to display the status of Iberian elites during Chalcolithic and Bronze Age (Garcia Sanjuan et al 2013). Ultimately, the objective was to trace the supply routes and understand their significance in the context of cultural interaction processes across the Mediterranean. To achieve this goal, several attempts were made to identify the actual provenance of the ivories, using diverse approaches based on archaeometry or typology (Lafrenz 2004; Pascual Benito 2012; Morillo León et al 2018). By now, it is accepted that ivory played a central role in the social discourse of Prehistoric elites, becoming a strategic element to reinforce their social prominence in different societies and groups across the Mediterranean Basin, with higher or lower relevance depending on the area (Luciañez Triviño 2018; Garcia Sanjuán et al 2013).
Due to its intrinsic physical characteristics, ivory is especially suitable for carving, producing smooth, shiny surfaces that were appreciated by different social groups across the globe in different periods. Ivory furthermore encompasses affordances beyond mere material properties, partly due to its distant origin and cost and network-intensive acquisition (Van de Noort 2012; Needham 2009). The connection between transport of the material and its individual value is pivotal for comprehending the role played by exotica in traditional societies, and a very relevant factor in the cultural interaction developed at least from the third millennium BC onwards across the Mediterranean Basin, the Atlantic Façade and continental Europe. Particularly, in the Iberian Peninsula the appearance of ivory in archaeological context is linked to the rise of social complexity during the Chalcolithic (García Sanjuán et al 2013). Different types of items have been identified, following local typologies and even workshops where the raw material was processed, supporting the idea of long distance exchange routes (Braciela et al 2022; Nocete et al 2013; Pascual Benito 2012; Liesau and Schumacher 2012). Surprisingly, there are no clear precedents for this working technology in the bone industry, suggesting again rather a possible influence of external elements within the local society - either in the form of foreign individuals or local people imitating the techniques (Morillo León et al 2018).
To acknowledge these archaeologically and socio-culturally complex relationships, we developed a site database that covers 64 sites across the Mediterranean Basin, covering a chronological frame spanning from approximately 3000 to 800 BC, roughly in coincidence with the Chalcolithic and the Early Iron Age in most Mediterranean areas. A typological study has been intentionally avoided, focusing instead on the provenance of raw materials, as during this chronology the vast majority of items correspond to local types that are also known in other raw materials, as it is the case of the italian ossi a globuli (small bone plaques decorated with a row of bosses or globules), the V-perforated buttons or the Early Minoan seals (Morillo León et al 2018; Morillo León 2019). During the acquisition of the archaeological data, the Iberian Peninsula became a central focus region due to intensive research over the past years. On the other hand, and despite the abundance of ivory production in the Levant during the Bronze Age, a more traditional approach that doesn’t particularly focus on raw material supply led us to the choice of limiting our study area to the western Mediterranean Basin.
Published research was screened to compile a database containing information about ivory finds from sites within our spatial and temporal window. The literature review included scientific papers as well as several museum collections (MET, Metropolitan Museum, New York; MAN, Museo Arqueológico Nacional, Madrid, MAS, Archaeological Museum of Seville). The final database contains 411 ivory artefacts from 64 sites and geographically spans from Spain (n = 20 sites), Italy (n = 9), Portugal (n = 8) to Greece (n = 9) and North Africa (n = 24). The following site distribution characteristics were included in the data collection: id, coordinates (WGS84, EPSG:4326, lat/lon), faunal species, site location, and chronology (Table 1).
Table 1
Site location and samples used in this paper. Coordinates are in WGS84, EPSG:4326. Coordinates are derived from the literature and refer to the archaeological site. A list of references for each site mentioned is provided as Supplementary Information.
ID | Latitude | Longitude | Genus | Site name | Chronology |
1 | 37.409 | -6.057 | Elephant | Valencina de la Concepcion-Montelirio | Chalcolithic |
2 | 37.416 | -6.077 | N/A | Valencina de la Concepcion-Matarrubilla | Chalcolithic |
3 | 37.409 | -6.059 | Elephant | Valencina de la Concepcion-Sector Urbanizacion Senorio de Guzman-Divina pastora | Chalcolithic |
4 | 40.076 | -3.855 | N/A | Numacia de la Sagra-Las Mayores | Chalcolithic |
5 | 38.238 | -6.015 | N/A | Huerta de Dios-Llerena | Chalcolithic |
6 | 37.683 | -1.700 | N/A | Los Cipreses-Lorca | Early Bronze Age-El Argar |
7 | 37.835 | -3.904 | N/A | Torre del Campo-Jaen | Late Chalcolitic-Early Bronze |
8 | 37.770 | -3.789 | N/A | Marroquies Altos | Chalcolithic |
9 | 37.723 | -3.966 | N/A | Las Penuelas | Chalcolithic |
10 | 39.043 | -3.498 | N/A | Motilla del Azuer | Early Bronze Age-Las Motillas |
11 | 37.137 | -3.548 | N/A | Cerro de la Encina-Monachil | Early Bronze Age-El Argar |
12 | 37.321 | -4.011 | N/A | Montefrio | Late Chalcolithic-Early Bronze Age |
13 | 37.727 | -2.514 | Elephant | Cerro de la Virgen | Late Chalcolithic-Early Bronze Age |
14 | 37.803 | -3.745 | N/A | Puente Tablas | Late Bronze Age |
15 | 39.024 | -2.028 | Elephant | El Acequion | Chalcolithic-Early Bronze Age |
16 | 36.963 | -2.522 | N/A | Los Millares | Chalcolithic |
17 | 37.297 | -1.880 | N/A | La Encantada | N/A |
18 | 37.265 | -1.788 | N/A | Almizaraque | Chalcolithic |
19 | 37.994 | -1.144 | Elephant | Murcia | N/A |
20 | 37.558 | -2.423 | N/A | El Malagon | Late Chalcolithic |
21 | 39.505 | -8.615 | Sperm whale | Galeria da Cisterna | Late Chalcolithic |
22 | 38.728 | -9.282 | N/A | Leceia | Chalcolithic |
23 | 37.157 | -7.546 | N/A | Nora | Late Chalcolithic |
24 | 38.889 | -9.074 | Sperm whale | Verdelha dos Ruivos | Late Chalcolithic |
25 | 38.585 | -8.873 | N/A | Palmela | Late Chalcolithic |
26 | 38.881 | -9.169 | N/A | Dolmen de Casainhos | Late Chalcolithic |
27 | 38.899 | -9.435 | N/A | Samarra cave | Chalcolithic |
28 | 40.791 | 8.449 | Sperm whale | Monte dAcodi | Late Neolithic |
29 | 40.633 | 8.327 | Elephant | Anghelu Ruju | Late Chalcolithic |
30 | 37.680 | 12.756 | N/A | Marcita | Late Chalcolithic |
31 | 37.143 | 15.030 | Elephant | Pantalica | Late Bronze Age |
32 | 37.217 | 14.633 | Elephant | Madonna del Piano | Late-Final Bronze Age |
33 | 40.288 | 18.426 | Hippopotamus | Roca Vecchia | Final Bronze Age |
34 | 45.033 | 11.650 | Elephant | Necropoli delle Narde | Final Bronze Age |
35 | 45.157 | 11.354 | Elephant | Lovara di Villa Bartolomea | Late-Final Bronze Age |
36 | 39.561 | 8.902 | Elephant | Padru Jossu | Chalcolithic |
37 | 37.528 | 22.874 | Hippopotamus | Asine | Middle-Late Bronze Age |
38 | 37.731 | 22.756 | Hippopotamus | Mycenae | Middle-Late Bronze Age |
39 | 35.299 | 25.161 | Hippopotamus | Knossos | Middle Bronze Age |
40 | 35.063 | 24.795 | Hippopotamus | Ayia Triada | Early Bronze Age |
41 | 35.024 | 24.924 | Hippopotamus | Platanos | Early Bronze Age |
42 | 35.235 | 25.160 | Hippopotamus | Archanes | Early Bronze Age |
43 | 37.599 | 22.800 | Elephant | Tyrins | Late Bronze Age |
44 | 37.059 | 10.063 | Hippopotamus | Utica | Early Iron Age |
45 | 33.497 | -7.104 | Elephant | Khef el Baroud | Late Chalcolithic-Early Bronze Age |
46 | 31.410 | -9.839 | Elephant | Sidi Harraz Cap Sim | Neolithic-Early Chalcolithic |
47 | 31.496 | -9.786 | Elephant | Morgador | Iron Age |
48 | 36.353 | 6.637 | Elephant | Khanghet Si Mohammed Tahar | Neolithic |
49 | 33.857 | -7.061 | Elephant | Rouazi Skhirat | Chalcolithic |
50 | 36.038 | 6.570 | N/A | Bou Zabaouine | Chalcolithic |
51 | 32.738 | 0.936 | N/A | Brezina | Chalcolithic |
52 | 35.368 | 1.320 | N/A | Columnata Tiaret | Chalcolithic |
53 | 36.353 | 6.638 | N/A | Damous el Ahmar | Chalcolithic |
54 | 36.468 | 4.598 | N/A | Djebel Gueldaman | Neolithic |
55 | 36.191 | 5.409 | N/A | Mezloug | Neolithic |
56 | 34.165 | 3.497 | N/A | Messaad | Neolithic |
57 | 36.770 | 5.096 | N/A | Pic des Singes | Neolithic |
58 | 33.979 | -6.900 | N/A | Dar es Soltan | Chalcolithic |
59 | 33.522 | -7.826 | Elephant | El Kiffen | Chalcolithic |
60 | 31.176 | -8.333 | N/A | Grotte de Toulkine | Neolithic |
61 | 35.764 | -5.937 | N/A | Grotte des Idoles | Neolithic |
62 | 34.660 | -3.418 | N/A | Hassi Ouenzga | Neolithic |
63 | 35.567 | -5.367 | N/A | Kahf Taht el Ghar | Chalcolithic |
64 | 35.359 | 8.397 | N/A | Abri Clariond | Neolithic |
2.2 Environmental datasets
The study area stretches from Portugal and Spain to Greece and from Morocco to the Alps, covering the Central and Western Mediterranean Basin (Fig. 1). Such a broad and supraregional environmental coverage contains multiple climatic, topographic, and hydrologic systems with a strong variation in terrain roughness, precipitation and temperature gradients, as well as streamflow characters of the multiple drainage systems. River run-off behaviour depends strongly on the seasonal cycle and varies according to the snow-melt regime during early summer and winter precipitation patterns.
Fundamental basis to model movement corridors and landscape permeability estimates is a digital elevation model (DEM). Here, we employed the SRTM 90 v.4.1 tiles (https://cgiarcsi.community/data/srtm-90m-digital-elevation-database-v4-1/, last accessed on 8th of May 2023) obtained from the CGIAR Consortium for Spatial Information (CGIAR-CSI) website. Developed by NASA, this dataset offers comprehensive and up-to-date open data coverage on a nearly global scale. Compared to the original version provided by the USGS, the CGIAR-SRTM data have been processed to fill undesirable voids in the model (Jarvis et al 2008). The dataset can be downloaded in tiles measuring 5 by 5 degrees in size (lon-lat) and has a resolution of 3 arc of a second, which translates to 90 m nominal resolution at the equator. A further possibility for download is using the elevation function from the geodata package in R (Hijmans et al 2023).
While higher resolution data, such as the 25 m resolution EU-DEM (Copernicus: Earth Observation component of the European Union’s space programme), are available for Europe, we decided to use the SRTM data for several reasons. Firstly, the vast size of the study area, coupled with the exponential increase in computational requirements, made the SRTM data more practical and efficient. Additionally, sourcing comparable high-resolution data for North Africa would have been time-consuming, offering marginal gains for our specific model. Moreover, to address hardware limitations, we rescaled the SRTM raster to a factor of 2, resulting in an approximate 200 m resolution elevation map, which would have made the choice of a 25 m resolution DEM irrelevant. As already mentioned, the 90 m resolution refers to measurements at the equator. Because of the projection from geographical coordinates (EPSG:3857), it derives that the resolution decreases at increasing distance from the equator. For the study area, the average resolution is slightly less than 100 m, resulting in the aforementioned approximate 200 m resolution. The scripts, including the functions and the workflow adopted to mosaic, crop and re-scale the elevation model are included in the repository. To facilitate movement modelling across areas covered by water, these were filled with values = 0, making them "operable" for the LCP model. However, uniformly assigning 0 values would result in a slope value of 0 across all the Mediterranean. Thus, a friction map was required to assign specific movement costs for crossing different terrain types. A friction map is a raster map in which each cell represents the friction cost required to cross it, irrespective of its slope value (Novaline Jaga et al 1993; Howey 2007). These maps are employed to represent variations in the ease or difficulty of movement across different areas, such as land-cover types, rivers, or other features in the landscape. Nevertheless, assigning friction costs is often a subjective process, and their use can potentially introduce bias, leading to results that reflect the initial parameter choice. For this reason we decided to proceed with caution, keeping the friction value changes to a minimum. A uniform friction value of 1 was given to all land areas, while applying friction values of 0, 2, and 5 to sea areas at distances of less than 10 km, 10–50 km, and beyond 50 km from the coastline, respectively. Rivers were given a friction value of 1, with an additional buffer of higher friction (3) around them to represent ease of travel along the riverbank but increased difficulty in crossing the river. Hydrologic streamflow data derives from the Copernicus European Union’s Space programme (https://land.copernicus.eu/imagery-in-situ/eu-hydro/eu-hydro-river-network-database, last accessed 21st of May 2023). The data comes in spatial lines format, which were extracted and filtered according to their stream flow hierarchy (Strahler order number). All river sectors with Strahler order number > 7 were selected from the catchments of the rivers (in alphabetical order) Danube, Duero, Ebro, Garonne, Guadalquivir, Mesima, Pinios, Po, Rhone, Tajo and Tevere. In some cases, secondary or minor river basins are provided aggregated to larger ones (for example the Po river basin includes the Adige too) and therefore not listed here. For North Africa, rivers were downloaded using the rnaturalearth R package (Massicotte and South 2023), and in particular the ne_download function which allows to download rivers and lakes data at the desired scale (using the argument “scale = 10”, downloading the highest resolution available). The terra (Hijmans 2023), sf (Pebesma 2018; Pebesma and Bivand 2023) and dplyr (Wickham et al 2023) R packages are used to process the river data. The script used to download the North African data and to process them with the European one is provided in the repository (01_rivers.R script).
The friction map has a resolution corresponding to a rescaling of 4 times of the original SRTM (i.e. ca. 400 m). The reason why it was not kept to an aggregation of 2 is that it is largely repetitive and uniform and therefore it is not necessary to achieve a greater resolution.
The choice of our friction parameters is based on the assumption that sea movement was faster and easier than land movement and, if within sight from the coastline (< 10 km) preferred over land movement. Rivers were not assigned lower friction values compared to land, as large alluvial basins already exhibit flatter terrain, making them more efficient to travel on. It derives that introducing a different friction for rivers would have resulted in redundancy or even a duplication of this pattern. However, we acknowledge that crossing major rivers may have presented challenges or increased effort, hence the higher friction buffer around rivers. The size of this buffer is of ca. 800 m (2 raster cells) for the only reason of avoiding the 0-cost crossing at the edge of a rasterised line. To acknowledge the fact that this choice is arbitrary and may not be shared by everyone, we provided a fully reproducible workflow, that includes the scripts necessary to create the friction map, allowing readers to modify friction values based on different theoretical assumptions and compare their results with ours. The friction map is created using map algebra in R, using the terra (Hijmans 2023) and sf (Pebesma 2018; Pebesma and Bivand 2023) packages. The script used to build the friction map is the 02_friction400.R in the repository.
2.3 Model set-up
We distinguish two different potential movement patterns: a land-based and a sea-based circuit or route network. As of today, it is more than clear that a very important component of movement in Prehistory was water-based, as testified by many archaeological finds and studies (Broodbank 2013; 2016; Blake 2014; Iacono 2019). The literature is rich in papers about computational approaches to mobility and LCP analysis in archaeological research, which resulted in a very well developed theoretical and methodological background. However, this is generally limited to land-based movement and tends to exclude seafaring due to increased complexity in the resulting model (e.g. Howey 2011; Llobera et al 2011; Verhagen and Jeneson 2012; White and Barber 2012; Whitley and Hicks 2003). The reasons for this are well-known and mostly related to the difficulties of combining two very different methods of transportation-movement that has been often pointed out in the literature. However, there are some studies trying to combine or to consider in some ways both types of movements simultaneously (Ducke and Suchowska 2022; Leidwanger and Knappett 2018). The leitmotif is to keep the models very simple, based on traditional movement land-based costs based on isotropic movement, coupled with friction costs that include water areas (rivers and sea). The more complexity is added to a model, the more variables need to be integrated and the more issues arise from the certainly significant differences in land and water movement costs. The advantages of conceptually simple models are multiple. Universality (read: reusability) is more at hand and computational work becomes less intense and more accessible.
Least Cost Paths (LCPs) are models that calculate the optimal or most efficient route between two (or more) locations in the geographical space. They find application in many fields, from urban planning to ecological models and archaeology (Fabrício Machado and Miranda 2022; Fjellström et al 2022; Balbi et al 2021). These models need two source data in order to be computed: spatial location (coordinates) of the points from and to which we want to calculate the travel costs and a surface with resistance or cost characteristics on which the route is calculated. This surface can depend on several factors such as elevation, slope or land cover. The output of a LCP is a single line connecting the input points. The most evident caveat of this method in archaeology is that it relies on the assumption that the connected points were directly linked during the period under study. This can be straightforward in documented historical contexts, as demonstrated by several studies (e.g. van Lanen et al 2015; Lewis 2021). Unfortunately, this is not the case for prehistoric case studies. Moreover, even if we are confident that two points were in contact, the presence of additional sites may have influenced the choice of an optimal path. The fact that the archaeological record is by definition incomplete and that we may not know some of the sites belonging to this network make the reconstructions even less reliable. Site size, catchment and relative importance of a node in the network add further layers of uncertainty. Although these issues are widely discussed, sometimes assuming contemporaneity and completeness of the dataset (or at leasts that new inputs would not substantially change the picture) seems to be the only possibility in order to reconstruct past mobility networks (Ducke & Suchowska 2022).
As already pointed out (Herzog 2014, 2022; Lewis 2021) there are also a number of issues concerning the computational side of LCP analysis. In this paper, it is not possible to address a discussion over these topics but it is important to consider them and to elaborate our models accordingly. The most important issue concerns the choice of the cost models and of the LCP algorithm (Herzog 2022). Another relevant issue is the choice of an appropriate DEM. It has been demonstrated that LCPs models based on high resolution DEMs (e.g. 5 m) are highly affected by modern infrastructures such as roads or highways (Verhagen and Jeneson 2012). In addition, it has been proved that the performance of LCPs does not improve considerably using higher resolution DEMs (Herzog 2022). However, when smoothing an elevation model or using a coarser one it is important to remember that also natural features such as small creek valleys are smoothed as the slope values generally used to compute the travel costs.
Part of these problems involve computational and storage requirements. In fact, increased complexity in the model has a strong impact on the size of the output and time required (two times higher resolution results in a quadratic increase of these parameters; Herzog 2022). Nonetheless, the increase in accessibility to computational power allowed the exploration of many different alternatives, improving the methodological and theoretical background of cost-based analysis. One of these approaches is the so-called “From Everywhere to Everywhere” (FETE) approach (White and Barber 2012). Compared to traditional LCPs or many other workaround solutions, such as the “flow” of movement using hydrological modelling (Llobera et al 2011), this method does not require the existence of an origin or a destination. What it does is to sample a regular grid of points within the study area and to compute every possible connection between these points, resulting in a network of paths that indicate the presence of high traffic corridors. Here, we propose three additional implementations to the original model: (i) the inclusion of water-based movements using a friction map that takes water into account, following in part what was suggested by Ducke and Suchowska (2022), (ii) instead of a visual assessment of the structure and location of the corridors compared to the archaeological sites, we study them in a formal way using statistical analysis in order to find out whether sites are located on high traffic routes more than expected by a random distribution or whether sites are located closer to the corridors than a random distribution. In this case a distance surface from the densest corridors (calculated using a KDE) is used as covariate; (iii) we noticed that this approach is affected by edge effects, due to the fact that points along or closer to the edges of the study area are less connected (i.e., they miss the connection to further points extended out of the study area). This would result in a higher density in the central areas, regardless of the topographical characteristics of the landscape. For this reason, we created a sensibly large area (500 km in every direction) that will then be cropped out and not discussed in the results.
These improvements greatly increase the confidence of our conclusions and also increase the formality of the analyses, not only compared to the “traditional” FETE but also when compared to a more classic LCP approach. This in turn may help the LCPs to regain centrality in the archaeological debate. The main downturn of this approach is the high computational costs required, especially because the area we want to analyse is large and encompass ca. half of the Mediterranean Basin. Only considering the land mass within the study area our region is about 50 times larger than the Oaxaca region analysed by White and Barber (2012). Therefore, we opted to downsample the SRTM-DEM by aggregating the original cells with a factor of 4, resulting in an approximate resolution of 200 meters.
The model used in this paper has been implemented in R programming language (R Core Team 2022), calling functions from GRASS GIS via the package rgrass (GRASS Development Team 2022; Bivand 2023). The script used to calculate the LCP density works as follow:
-
a grid of points is sampled within the study area at regular space intervals within landmass. Ideally, the spacing between each point in the grid is set to 125 pixels (ca. 25 km). For computational reasons we decided to break up this grid of points in 500 smaller groups and run the analysis on each subset separately and summing it at the end.
-
For each iteration, the script calculates walking paths between a source point and all other points using the r.walk function. This function computes the anisotropic cumulative cost of moving from the origin point to the input set of destinations. The costs are based on the elevation and friction maps provided to the function. The function returns two outputs: the cumulative cost of moving between each raster unit (200 m grid) and the movement direction taken. This function is a great improvement compared to the r.cost function also available in GRASS because travel times are anisotropic rather than isotropic (walking downhill or uphill matters).
-
Once the previous step is completed, the r.path function reads the results and computes the least cost paths between the origin and the target points. The input maps are used in a way that the path stops when the direction is 0 or negative.
-
The resulting paths are then used to calculate the kernel density, calculated using the density function from the spatstat package, summed together and scaled to the number of layers and iterations.
Evidently, our model is merely based on environmental and topographical constraints in order to compute the optimal path. This is because we are not interested in the route taken by prehistoric people but to study the relations of the characteristics of the landscape in terms of mobility (a sort of affordance) and the distribution of archaeological sites. Statistical correlation or lack of it is important to get a better grasp on human-environmental relations and the possible impact of socio-cultural factors.
2.4 Network and prediction model of Ivory sites
Using the recently developed network model algorithm v.net.models by Ducke and Suchowska (2022), we deploy the GRASS GIS interface in QGIS. Input data is a DEM with 200 m resolution and a projection in EPSG:3857. NA values were replaced with the value 1, which essentially fills the sea surface to prevent a cost value of 0. All negative values were equally set to 1 to remove artefacts and sinks from the DEM. A cost surface was produced using the r.cost algorithm in GRASS GIS as outlined above. For the v.net.models function, the input variables were set to default, the starting points were selected to be the 64 sites and the input raster is the cost surface. This set-up produces a “complete connectivity model” where all sites are interconnected (for more details, see: https://github.com/benducke/v.net.models, last accessed 21st of June 2023). A second cost surface was produced using a composite raster of the land mass and an additionally generated sea-surface raster that represents heterogeneous altitude from random point distribution. For this reason, a sample of 500 '000 points was generated across the buffered spatial extent (100 km) of the previous network model. The sample was assigned a consecutive number of values, buffered with 10 km and rasterized to the same spatial extent and resolution as the initial DEM. The raster was divided by 100 to range from 0 to 6082 m with peak values outside the land mass. From the extent of the DEM and the study area rectangle, a difference mask was created and the produced raster was cropped to the difference polygon. Finally, both rasters were merged, which resulted in a continuous raster representing the original DEM as well as heterogeneous raster values in areas covered by sea. From this raster, a cost surface was produced that includes fictive terrain roughness in all areas covered by water and hence do not automatically favour water-based transport but weighs more heavily land-based transportation systems.
Using the prediction function from the spatstat package, we leverage the calculated least cost path density as a covariate to make predictions about high probability locations (Baddeley et al 2016). These predictions are based on the point process and its relationship with the covariate. In essence, one is using the information derived from the least cost path density as a factor to estimate where high probability points are likely to be within the context of the point process. This approach permits to identify areas that are more favourable or significant according to the relationship between the point process and the least cost path density covariate. This involves several steps. Firstly, the data must be converted to a metre-based projection to be incorporated into the spatstat workflow. Next, we convert our raster data into a format recognized by spatstat, a real-valued pixel image using the as.im function. Simultaneously, we format our point pattern into an object of class "ppp" which stands for a planar point pattern, using the ppp function (Baddeley et al 2016).
With these preparations in place, we can employ the rhohat function from the spatstat package (Badelley et al 2012). This function provides a nonparametric estimate of intensity based on the covariate. In practical terms, let's imagine we have a collection of points labelled as "y", which represents spatial locations. Alongside these points, we have associated values, denoted as "X(u)", representing the spatial covariate at each respective location "u". We treat these points "y" as realisations of a spatial point process, denoted as "Y", where the intensity function "λ(u)" depends on the covariate through the equation λ(u) = ρ(X(u)) (Badelley et al 2012).
Our goal is to determine the function "ρ", also known as the resource selection function in the context of ecological site preference modelling (Badelley et al 2012, 2016). To achieve this, we create a plot where we compare the ratio estimator (referred to as "sites'') against the actual sites, incorporating a 95% confidence interval. We make an assumption that our point process follows a Poisson process, a straightforward model for events occurring randomly in space or time (Grandell 1976, 1; Orton 1982) (see code to this article for additional and supplementary plots of the point process).
On this plot, we observe "tickmarks" along the horizontal axis. Each tickmark corresponds to the parameter value associated with observed sites. Clustering of these tickmarks indicates areas where there is a higher intensity of sites, suggesting a preference for specific covariate value ranges. Peaks in the curve highlight locations with a denser concentration of points compared to their immediate surroundings. However, the significance of this intensity is determined by the confidence interval. A wider confidence interval implies lower confidence and might indicate the presence of isolated values or outlier clustering behaviour. Conversely, when we observe high site intensity along with small confidence intervals, this indicates high significance and, in such cases, a strong preference for a particular range of covariate values (Kempf 2021; Kempf & Günther 2023). In summary, this approach involves preparing and analysing spatial data to understand how the point process relates to a covariate, making use of nonparametric estimation and statistical visualisation techniques.
The subsequent step involves using the obtained results and the covariate as input variables for a prediction model. This model aims to generate a probability map, which indicates areas where potential sites could be distributed based on the raster data and the empirically observed point process. The predict.ppm function available within the spatstat package is employed to create this probability map (Badeley & Turner 2000). This map is rescaled to a range from 0 to 1, where a value of 1 signifies a high probability of finding sites. In essence, it provides a spatial representation of the likelihood of site presence based on the relationship between the covariate (least cost path density) and the point process.
Using the point process model (ppm) to represent our site data in ppp format and the LCP density in im format obtained from a stars object (sites_ppp ~ im_stars), we conducted 1000 iterations to simulate site locations based on the provided input variables. Subsequently, we assigned geographical coordinates to the ppm using the EPSG:3857 projection and calculated the intersection between the simulation results and the landmass window. To optimise computational efficiency and smooth the vector, we simplified the mask's outline.
This intersection process generated 33,458 points located on land, from which we performed a Kernel Density Estimation (KDE) with a sigma of 40 km. After trying different sigma in the range of 10–50 km we opted for 40 km due to a better representation of the local vs the regional density. In order to visualise areas with a high probability of containing sites, we established a statistically derived threshold for the KDE values. This threshold was defined as the mean plus the standard deviation (mean + SD), and all KDE values below this threshold were designated as NA. Conversely, all values surpassing the threshold were set to 1, creating a high-probability vector representing the potential distribution of sites. This vector was then superimposed onto the rescaled LCP density corridors (see Fig. 2). The scripts used to compute the rhohats and the predictive models are available in the “05_analyses.R” script, alongside with the code necessary to generate plots to visualise intermediate steps not shown in the paper.