Here we report the general methods for Studies 1 and 2. Information specific to each study (e.g., participants, exclusion criteria) is reported separately for Study 1 (section 4.2) and Study 2 (section 5.2).
2.1 Open Science Statement
We conducted both studies in accordance with open-science practices. Confirmatory analyses and exclusion criteria were pre-registered on the Open Science Framework, with the full pre-registrations viewable at https://osf.io/ek8hj (Study 1) and https://osf.io/urvg3 (Study 2).
2.2 Participant recruitment
All participants were recruited online via Prolific (www.prolific.ac) and self-reported as neurotypical, with English as their primary language and the US as both their country of birth and current place of residence. Participants reported no significant visual impairments, mild cognitive impairments, or dementia. Finally, participants had to have joined Prolific before 2020 and have a current approval rate of over 90% to ensure data quality. All participants were tested online using Gorilla (gorilla.sc, Anwyl-Irvine et al., 2019). None had taken part in any pilot studies associated with this project and, upon completion of the study, all were paid for their participation. Ethical approval was obtained from the local Health Faculties Research Ethics Subcommittee (removed for peer review). All research was performed in accordance with the regulations of the Declaration of Helsinki and, as such, informed consent was obtained from all participants prior to testing.
2.3 Stimuli Development
A novel stimulus set was created based on the Survey of Beliefs and Opinions (Saucier, 2018), and responses to this survey constituted the ground truth against which accuracy was assessed. The original survey included statements such as “We must ensure that women have access to legal abortions”, “I feel that people get what they deserve” and “Only adults who know how to read and write should be allowed to vote”. Thus, a person’s previous responses to these statements constitute their mental states, specifically their propositional attitudes (Butterfill & Apperly, 2013; Leslie, 1987). In the current study, a subset of 168 statements was selected based on the correlations between previous responses to these statements in a sample of 703 American respondents (Saucier, 2018). Specifically, we selected 24 subsets of 7 statements (see Supplemental Materials for the full list of statements), where each subset contained an initial “starting” statement, a correlated “target” statement (rM=.34, rSD=.09), and 5 additional statements that were correlated to varying degrees with the target statement. For each subset, we selected a “target-mind”: a previous respondent whose responses to these statements were set as the ground truth against which the accuracy of new participants’ inferences would be assessed. To ensure this ground truth was not idiosyncratic, we selected target-minds whose responses to all statements in their subset were the modal responses among all previous respondents.
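The selection of non-idiosyncratic target-minds described above can be illustrated with a minimal sketch. This is not the procedure actually used to build the stimulus set: the function names, the toy data, and the tie-breaking rule (the rating encountered first wins) are all assumptions introduced here for illustration, and the text does not state how ties or multiple eligible respondents were handled.

```python
from collections import Counter


def modal_responses(responses):
    """Return the most common rating for each statement (column).

    `responses` is a list of respondents, each a list of 1-5 ratings
    for the statements in one subset. Ties are resolved in favour of
    the rating encountered first (an assumption, not from the paper).
    """
    n_statements = len(responses[0])
    modes = []
    for j in range(n_statements):
        column = [row[j] for row in responses]
        modes.append(Counter(column).most_common(1)[0][0])
    return modes


def find_target_minds(responses):
    """Indices of respondents who gave the modal rating on every statement."""
    modes = modal_responses(responses)
    return [i for i, row in enumerate(responses) if list(row) == modes]


# Hypothetical toy data: 4 respondents x 3 statements
# (the real subsets contain 7 statements and 703 respondents).
toy = [
    [1, 4, 5],
    [1, 4, 2],
    [2, 4, 5],
    [1, 3, 5],
]
candidates = find_target_minds(toy)
```

In this toy example only the first respondent matches the modal rating on every statement, so only that respondent would be eligible as a target-mind.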
2.4 Procedure
2.4.1 Task 1: Assessing Group Status
The first task determined participants’ group status in relation to each of the target-minds they would be asked to make mental state inferences about. Participants were presented with 24 “starting” statements and were asked to state their agreement/disagreement with each statement on a 5-point Likert scale. Across all trials and tasks, the scale was as follows: 1=“strongly disagree”; 2=“slightly disagree”; 3=“neither agree or disagree”; 4=“slightly agree”; and 5=“strongly agree”. This scale was chosen to broadly align with responses in the original sample (Saucier, 2018) from which these statements were selected. The participant’s response was compared to the target-mind’s response, and the difference in agreement between the two was used to index group status (i.e., whether the target-mind was in-group or out-group with respect to the current participant). When the participant and target-mind had a difference in agreement score of 0 or 1 (i.e., the participant gave the same response as the target-mind, or matched the target-mind on either agreeing or disagreeing with the statement but with a different strength of conviction, e.g., slightly agree vs strongly agree), they were defined as in-group. When the participant and target-mind had a difference in agreement score of 2, 3, or 4, they were defined as out-group. Trials in which participants gave a neutral response (i.e., that they neither agreed nor disagreed) were not coded (see exclusion criteria).
Participants were never explicitly made aware of this in- or out-grouping; thus, any effect of group status is a consequence of the participant’s own perception of differences between themselves and the target-mind.
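The group-status coding rule can be summarised in a short sketch. The function name and return labels are hypothetical and introduced only for illustration. The sketch assumes that a trial is left uncoded whenever the participant's own response is neutral, as stated above; the text does not address the case in which the target-mind's response is neutral, so the sketch simply applies the difference rule there.

```python
NEUTRAL = 3  # "neither agree or disagree" on the 5-point scale


def code_group_status(participant, target):
    """Classify one trial from two 1-5 ratings of the starting statement.

    Returns "in-group", "out-group", or None when the trial is not coded
    because the participant gave a neutral response.
    """
    if participant == NEUTRAL:
        return None
    difference = abs(participant - target)
    # A difference of 0 or 1 counts as in-group; 2, 3, or 4 as out-group.
    return "in-group" if difference <= 1 else "out-group"


# Example: slightly agree (4) vs strongly agree (5) is coded as in-group,
# whereas strongly disagree (1) vs strongly agree (5) is coded as out-group.
example_in = code_group_status(4, 5)
example_out = code_group_status(1, 5)
```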
2.4.2 Task 2: Predicting the views of in-group and out-group members
This task measured participants’ propensity to consider the target-mind, their accuracy in inferring the mental states of that mind, and how aware participants were of their own ability to make accurate inferences. This latter measure was operationalised as the relationship between actual inference accuracy and confidence in the accuracy of that inference. On each of 24 trials, participants were presented with a target-mind’s response to an initial “starting” statement. For instance: “Participant 437 [the target-mind] said that they strongly disagree that they believe in the superiority of their own gender.” Based on this information about one of the target-mind’s mental states, participants were asked to predict the target-mind’s response to a second, “target” statement on a scale of 1–5, where 1 denoted “strongly disagree” and 5 denoted “strongly agree”. For instance, participants were asked how far they thought Participant 437 agreed that “We should ensure that no one is denied a job due to prejudice”. No feedback was given. Participants were then asked how confident they were in their answer and could respond on a scale of 0–100, where 0=“Not confident at all” and 100=“Extremely confident”.
Next, participants were told that they could buy up to 5 further pieces of information about the target-mind, where each piece of information was an additional statement and the target-mind’s response to it. Participants were advised to seek as much information as they felt they required to make an accurate prediction. This opportunity was used as a measure of participants’ propensity to seek further information about a person’s mental states. Once participants had selected how many pieces of information they wanted to buy (0–5 pieces), the additional information was revealed in a randomised order that was fixed across participants.
Thereafter, participants were asked to predict the target-mind’s response to the target statement for a second time, enabling them to update their original answer based on the new information. As before, participants were also asked to state how confident they felt in their updated answer. Again, no feedback was given. Participants who opted to seek no additional information were still given the opportunity to update both their prediction and their confidence in the prediction. However, they were additionally asked to state whether they had changed their prediction despite having no further information and, if so, why. Participants answered via a free-form text box; this measure was included as an attention check and informed the exclusion criteria below.
2.4.3 Task 3: AQ-28 & TAS-20
It has previously been demonstrated that autistic and/or alexithymic participants may perform differently on mentalising tasks relative to neurotypical individuals (Oakley et al., 2016; Pisani et al., 2021). Therefore, all participants completed the Autism-Spectrum Quotient Test (AQ-28; Hoekstra et al., 2001), a 28-item questionnaire designed to measure the expression of autistic traits in an individual. Lastly, all participants completed the Toronto Alexithymia Scale (TAS-20; Bagby, Parker & Taylor, 1994), a 20-item questionnaire designed to measure difficulty in identifying and describing emotions. These questionnaires were administered, despite recruiting only self-reported neurotypical participants, to further characterise the sample.
The study lasted approximately 35 minutes on average and participants were debriefed at the end.
2.5 Design
Both studies used a within-subjects design with 2 factors: group status, with two levels (in-group vs out-group); and timepoint, with two levels (before [timepoint 1] and after [timepoint 2] the opportunity to buy further information about the target-mind).
2.6 Measured variables
We measured three variables. First, we examined participants’ propensity to consider the minds of others and whether this was affected by their perceived group status. This propensity was operationalised as the percentage of available information about the target-mind (i.e., the additional statements and the target-mind’s responses to those statements) that was bought. Second, we measured the accuracy of participants’ mental state inferences both before (timepoint 1) and after (timepoint 2) any further information was bought. Accuracy was calculated as the percentage of correct responses, such that only cases where participants selected the target-mind’s exact response were coded as correct, while all other responses were coded as incorrect. Third, we measured participants’ awareness of their own ability to make accurate inferences by measuring their confidence in their answer both before (timepoint 1) and after (timepoint 2) any further information was bought, in order to relate this to their accuracy.
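As an illustration of how the first two measures could be computed for one participant, a minimal sketch follows. The function names and example values are hypothetical. The sketch assumes that the percentage of information bought is aggregated across trials (the text does not specify whether it is computed per trial or overall) and that 5 pieces were available on every trial. In practice each measure would be computed separately for in-group and out-group trials and, for accuracy, for each timepoint.

```python
def percent_information_bought(pieces_bought, pieces_available=5):
    """Percentage of available information bought across a set of trials.

    `pieces_bought` holds one count (0-5) per trial.
    """
    total_available = pieces_available * len(pieces_bought)
    return 100.0 * sum(pieces_bought) / total_available


def percent_correct(predictions, targets):
    """Percentage of trials on which the prediction exactly matched the
    target-mind's response; any other response counts as incorrect."""
    n_correct = sum(p == t for p, t in zip(predictions, targets))
    return 100.0 * n_correct / len(targets)


# Hypothetical data for four trials.
propensity = percent_information_bought([0, 5, 2, 3])   # 10 of 20 pieces
accuracy = percent_correct([1, 2, 3, 4], [1, 2, 4, 5])  # 2 of 4 exact matches
```

Note that a prediction one scale point away from the target-mind's response (e.g., 3 vs 4) is scored as incorrect, in line with the exact-match criterion above.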