IV.1 Material and data acquisition
Institutional abbreviations: MNHN, Muséum National d’Histoire Naturelle, Paris, France; NHMUK, Natural History Museum of United Kingdom, London, United Kingdom; MNCN, Museo Nacional de Ciencas Naturales, Madrid, Spain; MHNBe Naturhistorisches Museum Bern, Bern, Switzerland; NMB, Naturhistorisches Museum Basel, Basel, Switzerland; MGL, Musée cantonal de géologie de Lausanne, Lausanne, Switzerland; NHMMZ, Naturhistorisches Museum Mainz, Mainz, Germany; SMNS, Staatliches Museum für Naturkunde Stuttgart, Stuttgart, Germany; ZFMK, Museum Koenig Bonn, Bonn, Germany.
The dataset comprises 202 astragali belonging to 60 extant species and 56 extinct species of ruminants (Fig. 6). We selected specimens with the aim of covering as much of the group's diversity as possible in terms of phylogeny, size and habitat (Fig. 6). The nodes of the main clades are calibrated in time by the fossil record according to Mennecart et al. (2022). In order to quantify intraspecific variation, some species are represented by several individuals (Alces alces, Antilocapra americana, Axis axis , Bachitherium curtum, Bedenomeryx milloquensis, Bedenomeryx paulhiacensis, Bohlinia attica, Bos primigenius, Bos taurus, Capra aegagrus, Capra ibex, Capra sibirica, Capreolus capreolus, Cervus elaphus, Croizetoceros ramosus, Dicroceros elegans, Dorcatherium crassum, Elaphodus cephalophus, Eucladoceros ctenoides, Gazella dorcas, Helladotherium duvernoyi, Hispanomeryx daamsi, Hyemoschus aquaticus, Iberomeryx minor, Moschidae nov gen nov sp, Megaloceros giganteus, Metacervoceros rhenanus, Micromeryx azanzae, Micromeryx eiselei Micromeryx flourensianus, Miotragoceros pannoniae, Moschiola meminna, Moschus moschiferus, Muntiacus muntjak, Okapia johnstoni, Oriomeryx major, Ovis dalli, Ovis orientalis, Palaeoryx cordieri, Palaeomeryx eminens, Palaeomeryx kaupi, Rangifer tarandus, Saiga tatarica, Samotherium major, Sivatherium giganteum, Tetracerus quadricornis, Tyrrhenotragus gracilimus).
The 3D models were performed using different technologies (see Supplementary Data 2 for details). 140 individuals were surface-scanned with an Artec Space Spider surface scanner and reconstructed on Artec Studio 17. Two specimens from MNHN were digitized by Manuela Aiglstorfer with an Aicon Smart Scan at Plateau de Morphométrie du Muséum MNHN, CNRS UMS 2700 OMSI. Four specimens were surface-scanned using a FARO Laser ScanArm and digitized on Faro RevEng®2022.3 software thanks to Neil Adams and Tom Ranson. 33 individuals (ZFMK, SMNS) were scanned with x-ray computed tomography at the Staatliches Museum für Naturkunde Stuttgart using a Bruker Skyscan 1272 equipped with a 20-100kV / 10W x-ray source. 23 individuals were digitized using a nanoCTt system nanotom (phoenix x-ray, GE Sensing and Inspection Technologies GmbH, Wunstorf, Germany) hosted at the Department of Biomedical Engineering, University of Basel. Details of the origin and digitization method of each individual can be found in the supplementary data. 3D models from CT-data were reconstructed using AVIZO 9.0 software (Visualization Sci-ences Group). 3D data that were too large (up to 25 GB for some models derived from CT scans) were reduced with MeshLab® 1.3.3 software (Cignoni et al. 2008) to 20 MB or less. Left astragali were preferentially selected. Right astragali were reversed using Landmark Editor® 3.0.0.6 (Wiley 2006). Scale normalization in mm was performed using MeshLab® 1.3.3 software.
IV.2 Nomenclature
The nomenclature used for astragalus morphology (Fig. 7A) is derived from the combined work of Alexander and Bennett (1987), Barone (1999 ; 2010), Haruda et al. (2019), Barr (2014), Solounias and Danowitz (2016), and Pöllath et al. (2019).
IV.3 Ecological variables
Two ecological variables were taken into account, in the selection of modern species: mass and habitat (Fig. 6). Data were taken from Kingdon (1997), Nowak (1999) and Huffman (2023). Where a range of masses was provided, the mean value for the species was used, according to other studies (DeGusta and Vrba 2003; Curran 2012; Etienne et al. 2021). In our sample of modern ruminants, the mass range extends from 1.85 kg (Tragulus kanchil) to 1240 kg (Giraffa camelopardalis). The modern species were classified according to their habitat preferences into 4 categories (Fig. 6), based on those of Köhler (1993): closed environments (C: dense forests, high humidity), ecotones (E: transition zones with high variability of vegetation cover and humidity), open environments (O: grasslands, hot and cold deserts), mountainous environments (M: steep and/or rocky areas, with variable vegetation cover).
IV.4 Landmarking
To analyze the morphology of astragalus, we used 3D geometric morphometrics to quantify the shape of each individual. The regions of interest used to establish the protocol were defined on the basis of characters highlighted or used in previous studies (Barr 2014; Haruda et al. 2019; Pöllath et al. 2019). Additional characters were identified (e.g. medial extension; Supplementary Data 2). To capture the astragalus shape, 6 fixed landmarks were also placed (Fig. 7B) using Landmark Editor ® 3.0.0.6 (Wiley 2006). Due to the complex, curved and rounded shape of the astragalus, we also used a 3D sliding-semilandmarks procedure (Gunz et al. 2013; Dziomber et al. 2020) using R package geomorph (version 4.05). Semilandmarks slide along predefined curves and surfaces while minimizing binding energy (Gunz et al. 2013; Fabre et al. 2015). In total, the landmarking protocol is composed of 42 semilandmarks curves, of 10 semilandmarks each. The coordinates were then extracted in NTsyslandmark points (.dta) format.
IV.5 Statistical analysis using 3D geometric morphometrics
All the following analyses were carried out using MorphoJ® 1.06b (Klingenberg 2011) and R (version 4.2.2; Development Core Team. 2022), based on the code used by Mennecart et al. (2022). Statistical tests were used with a significance level of 5%. To compare the different morphologies of the astragalus, a procrustes analysis was performed using the "gpagen" function in the geomorph package. This is based on three operations: coordinate scaling, translation and rotation relative to a reference shape determined by the analysis (Klingenberg 2016). A centroid size for each individual is obtained at the end of this analysis.
Depending on the analysis and the question behind it, different subsets were used (see Fig. 8A).
To quantify intra-observer variability on landmark positioning, individual NMB 1122 of the species Micromeryx flourensianus from the Steinheim am Albuch locality, was replicated 10 times. The procrustean coordinates of the replicas were then compared with those obtained on other individuals belonging to the species Micromeryx flourensianus using a hierarchical analysis performed with the "Cluster analysis" tool and the "Euclidean similarity" option on the PAST® 2.17c software (Fig. 8A; Hammer et al. 2001). The results show that intra-observer variation is lower than intra-specific variation (Supplementary Data 1).
We performed principal component analysis (PCA; geomorph function "gm.prcomp") on the DataPhy and DataEco datasets (without allometry correction) to represent overall variation and assess the impact of allometry (Supplementary Data 1).
We carried out a partitioning analysis of variation in astragalus morphology on the DataEco dataset (Fig. 8B; Viacava et al. 2021). The factors included in the model are size (centroid size), phylogeny (clade) and habitat (habitat categories). To partition shape variation according to these variables, we used the "varpart" function from the package vegan (Oksanen et al. 2018).
To test the correlation between astragalus morphology and size we used a regression of procrustean coordinates (DataPhy and DataEco) on the logarithm of centroid size (Fig. 8C), as defined by Klingenberg (2016). To test how much phylogenetic information is contained in size-controlled variation, we performed a permutation test between the logarithm of centroid size and phylogeny on the DataPhy dataset (Fig. 8C). The result was significant (p = 0.001). This implies that working on the regression residuals of this analysis would potentially cause us to lose phylogenetic information that may be confused with allometry. So, we carried out a second regression on the DataPhy dataset, pooled by clade in order to retain the phylogenetic information contained in the size of the astragalus. Pooled within-group regression can be explained as a two-step procedure where the differences among group averages are first removed by centering the shape and size data by group and then an ordinary regression is carried out on these centered data (Mennecart et al. 2020). Working on residual values allows an exploration of the dataset with non-allometric variation (Klingenberg 2016; Fig. 8B). To test independence between size (Log of centroid size) and ecological variables with analyses of variance (ANOVA), we used the "aov" function in the package stats (version 3.6.2; Fig. 8C). For each, the prerequisites were checked by Shapiro-Wilk, Breusch-Pagan and Durbin-Watson tests, respectively for normality, homogeneity, and independence of residuals (Supplementary Data 2). After procrustes analysis, the ANOVA indicates a correlation between habitat and size (p = 0.00559). So, as with the phylogeny analysis, working on the residuals of the regression of data not pooled by habitat could lead to a loss of environmental information. For this reason, we carried out a regression on data pooled by habitat (Supplementary Data 1). To test for the presence of a phylogenetic signal on the morphology of the astragalus, we carried out a permutation test on the DataPhy2 dataset (Klingenberg and Gidaszewski 2010), based on a phylogeny whose branch tips are superimposed on the PC scores of each associated species (Fig. 8C; p <0.0001). To test for the presence of a signal linked to ecological variables on shape, procrustean ANOVAs were run on the DataEco and DataEco2 datasets using the "procD.lm" function in geomorph (version 4.0.5; Fig. 8C; p < 0.001).
To characterize morphologies within different clades and habitat categories, we performed between-group PCAs (bg-PCA) and canonical variation analyses (CVA) on the DataPhy2 and DataEco2 datasets (Fig. 8A, Supplementary Data 1). A bg-PCA and CVA provide additional information (Mitteroecker and Bookstein 2011; Mennecart et al. 2022). A bg-PCA observes variance between groups (clades or habitats) without standardizing intra-group variance. We calculated the bg-PCAs using the "groupPCA" function in the Morpho package (version 2.11; Schlager 2017). CVA maximizes the separation of means between groups relative to the variation in the ratio of groups according to a specified grouping variable. We performed CVAs with the "cva" function, from the Morpho package. To test the performance of the classification model, the analysis was cross-validated using the Jackknife method. We also used the "classify" function in the Morpho package, based on the results of CVAs performed on the DataPhy2 and DataEco2 datasets (Supplementary Data 2). A Kappa statistic is obtained, enabling us to estimate the certainty of assignment to a group (Cohen 1960).