Human faces are one of the richest sources of social information in our environment (see e.g., Jack & Schyns, 2015; Jack & Schyns, 2017; Zebrowitz & Montepare, 2008 for reviews). From facial appearance alone, observers spontaneously (e.g., Klapper et al., 2016), implicitly (e.g., Swe et al., 2020), and readily (e.g., Willis & Todorov, 2006) infer important personal characteristics of others, such as how trustworthy or dominant they are. Though fleeting, such judgments can have significant downstream consequences, ranging from dating preferences (e.g., South Palomares & Young, 2018) to professional success (e.g., Menegatti et al., 2021) and voting choices (e.g., Joo et al., 2015). Given the central relevance of these judgments to human social life, a longstanding goal in the human behavioral sciences has been to understand which types of faces drive these perceptions.
Current influential models posit that fundamental social trait judgments, such as those of trustworthiness and dominance (Oosterhof & Todorov, 2008), are driven by specific facial features (e.g., Freeman & Ambady, 2011; Todorov & Oosterhof, 2011; Zebrowitz, 2017). For instance, smaller faces with upturned mouth corners, arched eyebrows, and a lighter skin tone are judged as more trustworthy (Jaeger et al., 2020; Said et al., 2009; Todorov & Oosterhof, 2011; Vernon et al., 2014; Zebrowitz & McDonald, 1991), while larger faces with a more prominent brow ridge and jaw, and a darker skin tone are judged as more dominant (Albert et al., 2021; Mileva et al., 2014; Todorov & Oosterhof, 2011; Zebrowitz et al., 2003). However, though human faces vary remarkably in both shape and skin tone (e.g., Farkas et al., 2005; Maddox, 2004), existing models of the facial features that drive social trait perception are based almost exclusively on White European faces (Cook & Over, 2021), which fundamentally limits their generalizability. For example, face ethnicity impacts social trait perception by biasing judgments of other-ethnicity faces towards ethnic stereotypes (e.g., Blair et al., 2002; Eberhardt et al., 2004; Eberhardt et al., 2006; Hutchings et al., 2024; Kleider-Offutt et al., 2018; Xie et al., 2021). However, because current models (e.g., Oosterhof & Todorov, 2008) do not represent ethnically diverse facial features, they cannot provide a causal explanation of which facial features drive these judgments. With mounting evidence now revealing the inherent limitations of WEIRD psychological science (see Cook & Over, 2021 for further discussion; see also e.g., Henrich et al., 2010; Jones et al., 2021; Rad et al., 2018 for related discussion), representing ethnic diversity is increasingly important both for theoretical accounts of human social perception and for developing interventions that aim to address inequality and bias in cross-ethnicity interactions.
Here, we address this critical knowledge gap by modelling the specific facial features that drive the perception of two key social traits—trustworthiness and dominance—from three face ethnicities—Black African, East Asian, and White European. We include these three broad ethnic groups because they are anthropometrically distinct in terms of skin tone and/or facial structure (e.g., Farkas et al., 2005) and implicated in cross-ethnicity social trait perception differences (e.g., Xie et al., 2021). To model the facial features, we used a high-fidelity 3D generative model of the human face (Yu et al., 2012; Zhan, Garrod, et al., 2019) combined with the classic psychophysical method of reverse correlation used in ethology (e.g., Tinbergen, 1948), vision science (e.g., Mangini & Biederman, 2004), neuroscience (e.g., Hubel & Wiesel, 1959; Nestor et al., 2016; Zhan, Ince, et al., 2019), engineering (e.g., Thompson et al., 1999; Volterra, 1930), and human social trait perception (Jack & Schyns, 2017). Figure 1 illustrates the approach.
On each experimental trial, we generated a novel 3D face identity using a high-fidelity generative model of the human face that is based on high-resolution real-world 3D captures (see Yu et al., 2012; Zhan, Garrod, et al., 2019 for more details). Specifically, the generative model randomly samples, for 3D face shape and 2D complexion separately, weights for 402 principal components that capture and control the natural facial feature variance associated with individual identities (henceforth called ‘identity components’). For example, in Figure 1A, red color-coding shows the 3D face shape features that deviate outward from the generative model average (e.g., a more prominent brow) and blue color-coding shows features that deviate inward from the average (e.g., a smaller nose). In the generative model, these randomly weighted identity components are then added to the average face for a given ethnicity, sex, and age—in Figure 1A, the three faces below show the results of adding the same identity components to the average face for a Black African (BA), East Asian (EA), or White European (WE) male aged 25 years (see Methods—Stimulus generation). Thus, the randomly weighted identity components precisely control how the facial features of the stimulus change on each trial.
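The core operation of such a generative model can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the function names, array shapes, and standard-normal sampling are assumptions, and the real model handles shape and complexion components separately with their own scalings.

```python
import numpy as np

N_COMPONENTS = 402  # identity components (per the generative model described above)

def sample_identity(rng, n_components=N_COMPONENTS):
    """Randomly sample weights for the identity components.

    Standard-normal sampling is an assumption for illustration; the actual
    model samples within the natural variance captured by each component.
    """
    return rng.standard_normal(n_components)

def render_face(base_face, components, weights):
    """Add randomly weighted identity components to an average base face.

    base_face:  (n_vertices, 3) average 3D shape for a given ethnicity/sex/age
    components: (n_components, n_vertices, 3) principal components of shape
    weights:    (n_components,) sampled identity-component weights
    """
    # Weighted sum of components, contracted over the component axis
    return base_face + np.tensordot(weights, components, axes=1)
```

Because the identity components are added to an ethnicity-specific average face, applying the same weights to two different base faces yields stimuli that differ only by the difference between the base faces—the property the design below exploits.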
In a between-subjects design, observers (N = 60, White Western European, sex-balanced; see Methods—Observers) rated each resulting face stimulus according to perceived trustworthiness or dominance using a 7-point bipolar scale (e.g., ‘very untrustworthy’ to ‘very trustworthy’ with ‘neutral’ as the mid-point) in separate blocks randomly ordered across the experiment (see Methods—Task procedure)—in Figure 1B, the observer rated this face as ‘somewhat trustworthy’ (see red box). Each observer completed 2,400 trials per social trait rating task (n = 20 observers per face ethnicity), with stimulus sex blocked and randomized across the experiment for each observer. Importantly, to directly compare whether and how face ethnicity changes the facial features that drive social trait perception, we used identical facial feature variations across all experimental conditions. That is, we used the exact same 2,400 randomly generated identity components (1,200 per stimulus sex) in each of the three face ethnicity conditions, for each observer and each rating task. Thus, across all face ethnicity conditions, the faces had the same age, sex, and random identity components, and differed only according to the ethnicity of the average base face (see also SM—Expressivity of generative face model).
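The key design constraint—identical identity components across all three face ethnicity conditions—amounts to sampling the trial weights once and reusing them in every condition. A minimal sketch, assuming a fixed seed and standard-normal weights (both hypothetical):

```python
import numpy as np

N_TRIALS = 2400       # per observer, per rating task
N_PER_SEX = 1200      # trials per stimulus sex
N_COMPONENTS = 402    # identity components

# Sample the identity-component weights ONCE (seed is a hypothetical choice)
rng = np.random.default_rng(seed=123)
trial_weights = rng.standard_normal((N_TRIALS, N_COMPONENTS))

# The SAME weight matrix is reused in every face ethnicity condition;
# only the average base face (BA, EA, or WE) changes between conditions.
conditions = {ethnicity: trial_weights for ethnicity in ("BA", "EA", "WE")}
```

This guarantees that any difference in ratings between conditions is attributable to the base face (ethnicity), not to the random feature variations.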
Following the experiment, we modelled the specific facial features that drive the perception of trustworthiness and dominance in each individual observer in each face ethnicity condition. Specifically, we measured the statistical relationship between the randomly generated identity components used on each trial and the observer’s corresponding social trait ratings, using linear regression (see Methods—Modelling procedure). Figure 1C illustrates this procedure with example trustworthiness ratings from one observer viewing Black African male faces. This analysis produced a quantitative 3D face model for each individual observer, face ethnicity, social trait, and stimulus sex—in Figure 1C, the color-coded faces show the results for 3D shape (i.e., deviations in Cartesian space) and 2D complexion (i.e., differences in each of the three L*a*b color channels; e.g., see Weatherall & Coombs, 1992) separately (see SM—Model visualization). We thus produced a total of 240 face models (20 observers × 3 face ethnicities × 2 social traits × 2 stimulus sexes), which we validated using leave-one-out cross-validation prior to further analyses (see Methods—Model validation). Figure 1D shows examples of the resulting validated 3D face models for trustworthy, untrustworthy, dominant, and submissive male faces, from one representative observer in each face ethnicity condition.
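The per-observer modelling step—reverse correlation via linear regression—can be sketched as an ordinary least-squares fit of trial ratings on trial identity-component weights. This is a simplified illustration of the general technique, not the authors' exact pipeline (which fits shape and complexion separately and validates each model):

```python
import numpy as np

def fit_face_model(weights, ratings):
    """Reverse correlation as linear regression.

    weights: (n_trials, n_components) identity-component weights shown per trial
    ratings: (n_trials,) one observer's trait ratings (e.g., on the 7-point scale)

    Returns (n_components,) regression coefficients: how strongly each identity
    component pushed this observer's judgment, i.e., the observer's 'face model'.
    """
    # Prepend an intercept column, then solve by least squares
    X = np.column_stack([np.ones(len(ratings)), weights])
    coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return coef[1:]  # drop the intercept; one coefficient per identity component
```

Projecting the fitted coefficients back through the generative model then renders the trait-diagnostic face (e.g., the 'trustworthy' end of the scale) for that observer and condition.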
Our data-driven approach provides several advantages. First, by agnostically generating facial features from a high-fidelity model of the human face, we can model those that drive social trait perception in individual observers without constraints or biases imposed by prior assumptions (see Jack & Schyns, 2017 for further discussion). Second, by using the exact same randomly generated identity components in each face ethnicity condition, we can isolate how face ethnicity influences the specific facial features that observers use to make each social trait judgment. Third, our per-observer analyses preserve individual variation rather than erasing it, as traditional averaging approaches can. This in turn enables effects to be replicated across the N observers in the tested sample (Ince et al., 2022) and thus provides an estimate of the prevalence of these effects in the sampled population (Donhauser et al., 2018; Ince et al., 2021; see also Methods—Population prevalence).
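The prevalence logic can be illustrated with the maximum-likelihood point estimate used in this literature: if k of n observers show a significant within-observer effect at false-positive rate alpha, the estimated proportion of the population with a true effect corrects k/n for the expected false positives. A minimal sketch (the cited work also derives Bayesian credible intervals, omitted here):

```python
def prevalence_mle(k, n, alpha=0.05):
    """Maximum-likelihood estimate of population prevalence.

    k:     number of observers showing a significant within-observer effect
    n:     total number of observers tested
    alpha: within-observer false-positive rate of the significance test

    Corrects the observed proportion k/n for the alpha*100% of observers
    expected to be significant by chance even with no true effect.
    """
    theta = k / n
    return min(1.0, max(0.0, (theta - alpha) / (1.0 - alpha)))
```

For example, 20 of 20 significant observers gives an estimated prevalence of 1.0, whereas 1 of 20 at alpha = 0.05 is exactly the chance rate and gives 0.0.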