This systematic review demonstrates an increase in the use of IVR methods to induce body ownership illusions and investigate wider social identities. Although the review highlights the potential of IVR methods for understanding intergroup attitudes, our results show considerable homogeneity in the population groups studied (both sampled populations and embodied avatars) and inconsistencies in the outcome measures used. These results are unpacked in the discussion below.
Demographic biases were demonstrated consistently in nearly all studies included in the review. All studies except one were conducted in the global North and thus comprised participants from predominantly western, educated, industrialised, rich and democratic (WEIRD) settings and populations (Henrich et al. 2010). WEIRD populations often lack representation from a wide range of racial or ethnic backgrounds. As Durrheim (2023) identified, progressive labels such as ‘WEIRD’ exclude conversations around race, prompting the suggestion that the term be extended to white and western populations. As such, various measurement and sampling-related biases emerged in the review, underscoring the current limitations in external validity within the existing studies. Considering the complexity and diversity of human behaviour and attitudes within and between different racial groups, the results from the reviewed studies may portray an incomplete perspective. A lack of consideration of such cultural differences and social context may lead to a superficial understanding of the effects of embodiment phenomena in IVR, reducing the depth of insight that this research is able to offer. Additionally, researchers originating from WEIRD locations may hold implicit assumptions regarding the cultural, socio-political and racial norms of their own societies, which may inadvertently shape the research process and the interpretation of results.
By a similar token, the existing literature has investigated either an ingroup perspective (i.e., embodiment in a same-race avatar) or an outgroup perspective (i.e., embodiment in a different-race avatar). Across both types of design, the ingroup perspective typically involved White participants embodying White avatars, while the outgroup perspective entailed White participants embodying Black avatars. Only one study included ingroup-outgroup embodiment conditions beyond Black and White ethnic groups; in this study, Singaporean Chinese (SC) participants (ingroup) embodied both SC avatars and People’s Republic of China (PRC) Chinese avatars (outgroup). Of the studies that recruited participants from different ethnic groups (i.e., Asian, Hispanic or self-identified Other participants), most emulated the patterns described above: participants embodied either a combination of Black and White avatars (Groom et al. 2009; Tassinari et al. 2022) or Black avatars only (Thériault et al. 2021). Hence, while these studies can be considered more inclusive for recruiting a more diverse sample, the embodiment conditions remain limited to Black and White social groups. Moreover, even with a more varied participant pool, the predominant majority remains White in most cases, except for a single study in which the characteristics of the embodied avatar are unclear (Alvidrez & Peña 2020).
Taken together, our findings reveal that, in addition to the overrepresentation of WEIRD participants, the majority of articles either exclusively or predominantly recruited White participants who embodied same-race (White) avatars and/or Black avatars, the latter of which is consistently used to represent the social outgroup. While it has been argued that this choice frequently reflects demographic attributes (i.e., White individuals constituting the dominant majority in the study region), the studies in the current review do not provide a theoretically driven justification or rationale for their choice of study sample and avatar race. Additionally, race is often considered a sensitive or challenging area of empirical inquiry (Silverio et al. 2022), especially given that this research area is particularly controversial in that it involves one racial group embodying another. As a result, researchers might be hesitant to incorporate ethnically varied participants and avatars representing marginalised social groups, which could, in turn, contribute to the patterns observed in the included studies. Nevertheless, the tendency to draw solely or predominantly on a White sample perpetuates and reifies the notion that this particular population sets the standard against which others are to be measured. Thus, future research should be directed at diversifying not only the avatars embodied but also the research sample to better capture heterogeneity within diverse populations.
Although the studies included in this review had limited diversity in their samples and avatar embodiment, they demonstrated that embodiment can be induced for outgroup (e.g., Black or PRC Chinese) avatars. In particular, 10 of the 12 studies included some form of embodiment questionnaire, assessing either immersion and feelings of presence in the virtual world or body ownership, thereby controlling for the success of the IVR experience. There remains a need for further standardisation of embodiment measures, such as the Participant Experience of Embodiment Questionnaire developed by Peck and Gonzalez-Franco (2021). It is also worth noting that an exclusively psychometric approach to assessing embodiment omits a dimension of depth that could be enriched through the incorporation of qualitative methods (Hassard 2023; Lewis & Lloyd 2010).
In addition to assessments of embodiment experiences, it is necessary to re-evaluate the appropriateness of both explicit and implicit measures of racial bias. In particular, the IAT, commonly used as a standard measure of implicit racial bias, requires substantial refinement. While the IAT may enable researchers to study and predict individual behaviour over and above self-report measures, it has engendered some controversy in the literature. Major psychometric critiques include its modest test-retest reliability (Bar-Anan & Nosek 2014; Gawronski et al. 2017) and its unsatisfactorily low implicit-criterion correlation (ICC) and incremental predictive validity (Greenwald et al. 2009; Kurdi et al. 2019; Oswald et al. 2013). Moreover, the test is criticised for the lack of clear cut-off points between biased and unbiased scores (Mitchell & Tetlock 2017) and for its susceptibility to extraneous influences on IAT response times, such as general processing speed (Blanton et al. 2006) or executive functions, namely task-switching ability (Ito et al. 2015). Different approaches to IAT administration have also proved problematic: repeated administrations diminish the magnitude of the effect for a particular individual, and a single administration yields more polarised results than second or subsequent administrations (Greenwald et al. 2022), the latter of which may account for the results obtained by Groom et al. (2009). These administration issues may be alleviated, respectively, by using an improved scoring algorithm (Greenwald et al. 2009) or by administering the IAT in a pretest-posttest design (Greenwald et al. 2022). Nevertheless, the results of our meta-analysis show that the IAT remains sensitive to experimental manipulations of embodiment in IVR settings.
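To make the scoring issue concrete, the sketch below computes a D-score-style index (the difference between mean latencies in incompatible and compatible blocks, divided by a pooled standard deviation of retained latencies). It is a minimal illustration using hypothetical trial data and illustrative cut-offs, not the full published algorithm, which additionally handles error trials, practice versus test blocks and fast-responder exclusions.

```python
import statistics

def iat_d_score(compatible_ms, incompatible_ms, min_ms=400, max_ms=10000):
    """Minimal D-score-style sketch (illustrative, not the published algorithm).

    compatible_ms / incompatible_ms: response latencies in milliseconds from
    the two critical block types; min_ms / max_ms are illustrative cut-offs.
    """
    # Discard implausibly fast or slow responses.
    compat = [t for t in compatible_ms if min_ms <= t <= max_ms]
    incompat = [t for t in incompatible_ms if min_ms <= t <= max_ms]

    # Pooled SD across all retained latencies from both block types.
    pooled_sd = statistics.stdev(compat + incompat)

    # Positive values indicate slower responding in the incompatible block.
    return (statistics.mean(incompat) - statistics.mean(compat)) / pooled_sd

# Hypothetical latencies (ms) for a single participant.
print(iat_d_score([612, 580, 655, 701], [792, 845, 760, 810]))
```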
Our meta-analysis results also point to general heterogeneity and discordance regarding best practices for IVR studies, particularly in relation to the variability of measures of explicit bias. Moreover, there are different scoring methods for measures such as the IRI, some of which have yet to be verified (Wang et al. 2020), including the use of standalone scores for empathy subscales (Patané et al. 2020; Thériault et al. 2021) or the sum of all scores, the latter implying that empathy is a single general construct. The use of a variety of different tests and scoring methods brings into question the convergent validity of these measures, that is, the degree to which a test is related to other measures of the same construct and, consequently, the degree to which results are comparable across studies (Westen & Rosenthal 2003). Thus, throughout the examined studies, a wide range of prejudice-related measures, questionnaires and tasks are administered without any apparent emerging standards in the field. Although there is a need to replicate studies using existing measures, future research should also consider using alternative implicit and explicit measures to demonstrate a consistent change in racial bias or prejudice across a diversity of measures.
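As a simple illustration of how the two scoring approaches diverge, the sketch below derives both standalone subscale scores and a single summed total from hypothetical item responses. The item-to-subscale mapping and responses are illustrative assumptions only, not the published IRI scoring key.

```python
# Illustrative only: the item-to-subscale mapping and responses are
# hypothetical, not the published IRI scoring key.
subscale_items = {
    "perspective_taking": [1, 2, 3],
    "empathic_concern":   [4, 5, 6],
    "personal_distress":  [7, 8, 9],
    "fantasy":            [10, 11, 12],
}
# Hypothetical 0-4 Likert responses for items 1-12.
responses = dict(zip(range(1, 13), [3, 4, 2, 4, 4, 3, 1, 2, 2, 3, 4, 3]))

# Approach 1: standalone subscale scores (empathy treated as distinct facets).
subscale_scores = {name: sum(responses[i] for i in items)
                   for name, items in subscale_items.items()}

# Approach 2: a single summed total (empathy treated as one general construct).
total_score = sum(responses.values())

print(subscale_scores, total_score)
```

Two participants with identical totals can show quite different subscale profiles, which is why results scored one way are not directly comparable with results scored the other.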
Finally, the current systematic review has notable strengths, namely the inclusion of a quality appraisal procedure, the use of PICO and PRISMA guidelines, and the use of two independent reviewers together with a third reviewer to resolve conflicts. These factors helped to minimise errors and strengthen the rigour of the review. Nevertheless, notable limitations persist, including the scope of the literature search, which was conducted in only four major electronic databases (Embase, Global Health, MEDLINE and PsycINFO). Additionally, articles were included only if they were available in English. The use of a limited set of databases and the decision to incorporate only English-language articles may therefore have led to relevant studies being overlooked. The language restriction is a particularly salient limitation given that a drawback we identified is the prevalence of studies conducted in WEIRD settings such as the USA, where English is the predominant language; making inclusion contingent on English-language publication may thus itself constitute a selection bias. Another potential limitation is the small number of articles included in the review. While this may reflect stringent inclusion criteria, it is also likely to depend on the research topic of the review and the amount of available supporting evidence. Lastly, there is some heterogeneity across the studies in terms of control groups, interventions and measures, which has the potential to affect study results (Bartolucci & Hillegass 2010). Nevertheless, this heterogeneity may be attributed to the scope of the review, as the scope determines the extent to which the included articles are diverse.