Participants
We collected data from 42 participants; 2 participants were unable to complete all experimental sessions and were therefore dropped from analysis, leaving 40 complete datasets (mean age = 17.1 years; std = 0.9; 22 female). Participants were recruited through online advertisements and flyers. The study description indicated that we were interested in studying how different forms of spending time alone (with or without access to virtual interactions) affect adolescent cognition. Interested individuals completed a screening questionnaire to assess eligibility for the study (questionnaire data were collected using Qualtrics and REDCap56 software). Participants were asked to indicate their gender (how they describe themselves) and sex (assigned at birth). In our sample, gender labels did not differ from sex labels, and we use the term gender throughout the main text and supplement.
Participants were eligible to take part in the study if they were between 16 and 19 years of age and reported no permanently implanted metal in their body, no history of brain damage and no currently diagnosed mental health disorder. Because data for this study were collected during the COVID-19 pandemic, we also applied exclusion criteria based on a departmental COVID-19 risk assessment. These criteria excluded participants with chronic underlying health conditions (such as asthma), participants who currently felt ill (or had tested positive for COVID-19), and participants who smoked. As we aimed to study effects of isolation in a sample of adolescents who had frequent and regular social interactions, we also excluded people who: (1) lived alone; (2) reported high feelings of chronic loneliness on the UCLA loneliness scale26 (we excluded adolescents with scores > 50, which is 2 standard deviations above the mean for an adolescent sample57); and/or (3) reported substantially smaller social network sizes than previously reported for an adolescent sample58, assessed via two measures: (a) number of close friends (the original questionnaire asks for the number of people who give social support59, which we adapted to the number of close friends to simplify the question for our adolescent sample); and (b) number of social interactions in the past month, counting face-to-face and virtual interactions that were primarily social in nature (i.e., excluding professional interactions, such as talking to a doctor, teacher or hairdresser). We excluded participants who reported fewer than two close friends and fewer than 10 social interactions in one month (~7 standard deviations below the previously reported adolescent means for both measures58). The exclusion thresholds for the social network size measures were lower than those used previously24 because data for this study were collected during the COVID-19 pandemic, when social distancing requirements meant that lower levels of social connectedness were expected throughout the population.
All experimental procedures were approved by the Psychology Research Ethics Committee at the University of Cambridge, UK. Participants signed a consent form describing all experimental procedures before participating in the study. Each participant was compensated with up to £127 (minimum payment £107) for participating in three sessions.
Experimental procedures
Each participant underwent three experimental sessions, separated by at least 24 hours. Figure 1 shows an overview of the experimental procedures. Each participant first completed a baseline session which included MRI scanning and the behavioural tasks (see Behavioural: Reward tasks for details). Following the baseline session, participants were invited to two isolation sessions (order counterbalanced). One session included up to 4 hours of total social isolation (iso total) during which participants had no access to any social interactions (real-life or virtual); another session included up to 4 hours of social isolation with access to virtual social interactions (iso media).
Following the methods of Tomova et al. 202024, we sought to experimentally induce the subjective experience of loneliness in adolescent participants. Re-analysing loneliness ratings from a subset of 18–24-year-old participants from Tomova et al. 2020, together with pilot measures in two 16-year-olds (Ntotal = 21), showed that two hours of isolation resulted in increased self-reported loneliness in this age group. We therefore reduced the duration of isolation in this study compared with Tomova et al. 2020. The minimum duration of each isolation session was 3 h 30 min and the maximum duration was 4 h. We randomly assigned an added 0–30 min of isolation time to each session (in steps of 5 min, separately for each session), so that participants could not predict the precise end of the isolation period in either session. The average isolation duration was similar between sessions (iso total: m = 3 h 46 min, std = 11.2 min; iso media: m = 3 h 47 min, std = 10.0 min; t(39) = −0.28, p = 0.79). All participants underwent all three experimental conditions (baseline, iso total and iso media). Each participant was pseudo-randomly assigned to one order, with the restrictions that baseline was always first and that each order was approximately equally likely in the full sample. This design allowed us to compare effects of both types of isolation to a baseline unaffected by the experience of isolation, and to compare effects of being totally isolated to being isolated with access to virtual interactions, while keeping other factors (such as spending time physically alone in a room) constant.
Note that power calculations for this study were made for within-subject effects of isolation (see also the preregistration); between-subject analyses (i.e., analyses of predictors of individual differences in sensitivity to isolation) should therefore be taken as preliminary.
During the iso total session, participants had access to a variety of non-social activities (such as games, puzzles, sudoku, books, drawing/writing supplies). We also encouraged participants to bring activities that they would like to occupy themselves with during the isolation. Examples of what participants chose to bring with them included school homework, a jewellery making kit, art materials and nail polish.
As we aimed to keep all social interactions during this session to a minimum, participants were given extensive instructions about the isolation procedures and the subsequent behavioural paradigms before starting social isolation. Participants gave their phones and laptops to the experimenter and were guided to a room containing an armchair, a desk and an office chair, and a fridge with a selection of food and beverages. The room had a roof window, which allowed daylight to enter but kept participants from seeing other people outside the building. Participants remained in the room for the duration of the isolation and the subsequent behavioural testing. Participants were told that they could spend the isolation time however they wanted, with the restrictions that they should fill out the questionnaire every hour and avoid sleeping during isolation. Participants were reminded to fill out the questionnaires via an alarm that sounded every hour during isolation; a final alarm at the end of isolation indicated that they should start the behavioural tasks. Participants were provided with a desktop computer (with parental controls enabled) that allowed them to visit only our Slack channel (Slack is online messaging software allowing communication within a group; www.slack.com) and the webpage containing our online questionnaires. Messaging in Slack was restricted to emergencies (that is, cases in which participants ran into problems requiring assistance from the research team during isolation). During isolation and the subsequent behavioural testing, we monitored participants via a live camera, which allowed a researcher to check in on the participant without interacting with them. The camera provided only a live stream of the participant in the room to the experimenter and did not store any recordings. Participants were informed about the camera in an information sheet they received before agreeing to participate in the experiment. Participants completed an online questionnaire (see Questionnaires for details) every hour during the social isolation period. Behavioural testing started immediately after the social isolation: participants first filled out questionnaires and then began the behavioural tasks, which they started themselves (i.e., there was no interaction with the experimenter between the end of isolation and the beginning of behavioural testing). After the session, a member of the research team chatted with the participants about their experiences during isolation and made sure they were not feeling troubled.
Procedures during the iso media session were the same as in the iso total session, except that participants could bring any electronic devices with them into the room and were told they could use them as much as they wanted. Most participants brought their tablet or laptop as well as their phones. See the Questionnaires section for descriptive data on how participants engaged with others virtually during the iso media session.
MRI scanning
MRI data were collected on an ultra-high-field 7 Tesla scanner (Siemens 7T Terra) located at the Wolfson Brain Imaging Centre, University of Cambridge. For each participant, an MP2RAGE sequence was used to collect T1-weighted structural images in 224 interleaved sagittal slices with 0.7-mm isotropic voxels (FOV: 224 mm). We also collected a field map (phase-difference B0 estimation; echo time 1 (TE1) = 1.11 ms, echo time 2 (TE2) = 3.06 ms) to control for spatial distortions, which are particularly problematic in ultra-high-field scanning60. Furthermore, DTI images were acquired with a protocol of 90 contiguous 1.4-mm-thick axial slices with an in-plane resolution of 1.4 × 1.4 mm, providing full brain coverage. The DTI data are outside the scope of this manuscript and will be described in a separate paper.
During acquisition of the anatomical images (~ 7 min in total), participants underwent a practice run of the MID task (see “MID task” for details).
Subsequently, we collected functional data during four runs of the MID task. Each functional run consisted of 250 volumes with 96 T2*-weighted echo-planar slices (TR = 1500 ms, TE = 25 ms, FOV = 192 mm, 128 × 128 matrix, yielding a voxel size of 1.5 × 1.5 × 1.5 mm³) acquired as a partial-head volume in an anteroposterior phase-encoding direction using interleaved slices. The MID task took approximately 25 min in total.
Functional MRI: Neural Reward Sensitivity
Using functional magnetic resonance imaging (fMRI), we measured the blood-oxygen-level-dependent (BOLD) signal at each voxel across successive scans (voxel time-series) in response to anticipating rewards of different magnitudes during a monetary incentive delay (MID61) task. The MID task is a well-established task used to study neural correlates of reward anticipation (i.e., anticipating winning money) and was used here as a marker of neural reward sensitivity. During the task, participants could win either £0.20 (small win) or £5 (large win), or lose either £0.20 (small loss) or £5 (large loss). On some trials, participants did not win or lose any money; these served as control trials. On each trial, participants were first presented with a cue indicating whether they could win or lose money on that trial. The cue was a yellow star with the amount of the possible win (or loss) written inside (i.e., 0 (control trials), 0.2 (small win trials), 5 (large win trials), −0.2 (small loss trials), −5 (large loss trials)). Following the cue, participants saw a white circle presented for a jittered time interval (range 1.5–4 s, mean 2.5 s), briefly replaced by a white square presented for 100 ms, after which the circle reappeared. Participants’ task was to press a button as fast as possible when they saw the square. Following their response, participants were presented with feedback revealing whether they won (or avoided a loss) on that trial (presented for 500 ms). At the end of each trial participants saw a fixation cross for a jittered time interval (range 2–6 s, mean 3.5 s) before the next trial began. The task was adaptive: the time window within which participants needed to respond narrowed or widened depending on their response times, ensuring that participants won on most (~80%) trials (to keep the task rewarding) but that the task was not too easy (to keep participants engaged). Participants underwent a practice run of the task during the anatomical MP2RAGE scan, and data from this run were used to calibrate the initial response window for the MID task.
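For illustration, such an adaptive response window can be implemented as a weighted staircase. The following is a minimal Python sketch, not the task's actual code; the step size, bounds and the 1:4 narrow/widen ratio (which makes the hit rate converge towards ~80%) are illustrative assumptions.

```python
# Minimal sketch of an adaptive response window (weighted staircase).
# Assumed values: 25 ms steps, 150-500 ms bounds. With a 1:4 ratio of
# narrowing (after hits) to widening (after misses), the window settles
# where 0.8 * step == 0.2 * (4 * step), i.e., at ~80% hits.
def update_response_window(window_ms, hit, step_ms=25,
                           min_ms=150, max_ms=500):
    if hit:
        window_ms -= step_ms       # hit: narrow the window (harder)
    else:
        window_ms += 4 * step_ms   # miss: widen the window (easier)
    return max(min_ms, min(max_ms, window_ms))
```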
Questionnaires
Baseline. At the beginning of the behavioural testing in the baseline session, participants completed self-report questionnaires assessing trait and state anxiety (the State-Trait Anxiety Inventory (STAI62)) and depression (the Center for Epidemiological Studies Depression (CES-D) scale63). We report exploratory analyses assessing interactions between individual differences in trait anxiety and depression, in addition to measures of chronic loneliness (UCLA loneliness scale26) collected during the online screening, and the effects of our experimental manipulations in the supplementary text. Measures of self-reported loneliness (0–100), social craving (0–100), boredom (0–100) and current affect (using the Positive and Negative Affect Schedule (PANAS64)) were also acquired at the beginning of the baseline session. In addition, participants were asked whether they drink alcohol and whether they vape (at least once per month). If yes, participants were asked to rate their current urge to drink alcohol and to vape on a scale from (0) not at all to (100) extremely. In accordance with the University COVID-19 risk assessment, participants who reported that they smoke were excluded from participation, and we therefore did not ask about craving in the smoking domain. Because only 55% of our participants (N = 22) reported drinking alcohol at all, and only two participants reported vaping, there was not sufficient power to assess effects of isolation on alcohol or vape craving.
Isolation sessions. Following each isolation session (iso media and iso total), immediately before participants started the behavioural tasks, we assessed state anxiety and craving for alcohol and vaping. In addition, participants self-reported loneliness, social craving, boredom and current affect at the beginning of each isolation session (T0) and then after every hour of isolation for 3 hours (T1, T2, T3), followed by a final rating at the end of isolation (T4; 3 hours 30 min to 4 hours of isolation, varying between participants). Effects of isolation on state anxiety are reported in the supplementary text.
All statistical analyses using self-report questionnaires were conducted on measures taken after 3 hours (T3) of isolation. This allowed us to capture participants’ affective state after a substantial period of isolation but before they knew it was over. Plots including loneliness and mood ratings at the final time point (T4) are shown in Fig. S5, and statistics using these time points are included in the supplementary text. Note that the general results remain the same regardless of whether T3 or T4 ratings were used. Boredom measures were used in exploratory analyses to assess whether self-reported boredom interacted with the effects of isolation (see supplementary text).
After the iso media session, participants completed a questionnaire which asked them to report how much they engaged in virtual social interactions during the session (percentage of time spent engaging in virtual social interactions during session (0-100%)). We also asked participants to indicate which method(s) (texting/messaging, voice calls, video calls, commenting/posting, gaming, other) and platform(s) (Instagram, Facebook, Facebook messenger, Snapchat, TikTok, Twitter, WhatsApp, other) they mostly used for their virtual social interactions during the session. Participants were asked to list whom they interacted with virtually (i.e., friends, family, acquaintances, romantic partner, other). Participants could select multiple options and, if they selected “other”, they were asked to specify.
Participants self-reported that, on average, they spent 47% of their time engaging in virtual social interactions during the iso media session (std = 28%; range = 5–100%). Most participants (35 out of 40) engaged in virtual social interactions for more than 20% of the iso media session, and 18 out of 40 participants (45%) spent more than 50% of the session engaging in virtual social interactions. We asked participants what they used for connecting with others (multiple selections possible): the majority reported using instant messaging (37 out of 40), some reported posting (9), and a small number reported engaging in voice calls (3), video calls (2) or gaming (3). Participants mainly used Snapchat (28), Instagram (27) and WhatsApp (23) to connect with others, followed by TikTok (10), Twitter (3), Discord (3) and Facebook/Facebook Messenger (2). The majority of participants connected virtually with friends (38), followed by family (19), romantic partners (13) and acquaintances (4).
Behavioural: Reward tasks
We used two computerized behavioural tasks to measure reward responsiveness in two domains (i.e., reward seeking and reward learning) at baseline and at the end of the two isolation sessions (iso total and iso media).
Reward seeking. We used an effort-based decision-making (EBDM) task, an adapted version of the Effort Expenditure for Rewards Task (EEfRT65) in which we manipulated and orthogonalized reward and effort. Participants were presented with a series of trials offering different combinations of high and low rewards for successfully completing “hard tasks” and “easy tasks”. First, participants underwent two calibration trials in which the task measured their maximum effort (the maximum number of button presses performed in 10 s with the index finger). Based on this calibration, “hard task” trials required 80% of the participant’s maximum number of button presses in 10 s to succeed, and “easy task” trials required 40%. Rewards earned could be either 1 point (low reward) or 4 points (high reward), and participants were shown the combination of required effort (hard/easy) and potential reward (high/low) on each trial before deciding whether they wanted to perform that trial’s task. Points were converted into monetary value at the end of each session.
In addition, we added the element of context (nature vs social). After successfully performing a trial, participants saw a message indicating how much they won (1 point or 4 points, depending on the reward value of that trial) together with pictures of landscapes (nature context; e.g., pleasant pictures of mountains, beaches, rivers, etc.) or social pictures (social context; e.g., pleasant pictures of people hugging, laughing together, etc.). Participants saw either 1 picture or 4 pictures, corresponding to the number of points they earned in that trial (i.e., 1 picture for 1 point and 4 pictures for 4 points). This allowed us to measure the extent to which effort-based decision-making was modulated by reward and whether isolation selectively enhanced reward seeking in social vs non-social contexts. In total, there were 8 conditions (high effort – high reward, low effort – high reward, high effort – low reward and low effort – low reward, each in either a nature or social context), with 6 trials per condition (48 trials in total), presented in random order.
Participants had to indicate whether or not they wanted to play each trial. There was no time restriction for participants’ decisions. If participants completed the trial successfully (i.e., pumped up the bar within the time limit), they saw the social or nature pictures described above. Pictures were presented for 8 s together with the number of points participants won. After each trial, a fixation cross was presented for 0.5 s.
Reward Learning. The ability to learn stimulus–reinforcement associations and to reverse them based on probabilistic feedback was measured using a probabilistic reinforcement and reversal learning task66,67. In the present task, participants were shown two slot machines and asked to choose between them to obtain a reward. One slot machine was rewarded 80% of the time; the other was rewarded 20% of the time. Participants needed to learn from feedback through trial and error which slot machine was rewarded more often. After 7 trials, the reward contingencies switched and participants needed to learn the new reward contingencies. Feedback was given via symbols (non-social feedback) and facial expressions (social feedback) in two counterbalanced blocks (28 trials per block, in total 56 trials). Rewards (positive feedback) were represented by either a plus symbol (+; non-social) or a smiling face (social), while the absence of a reward (negative feedback) was represented by a zero symbol (0; non-social) or a neutral face (social).
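To make the trial structure concrete, here is a minimal Python sketch of one 28-trial block of the 80:20 reversal schedule described above; the exact outcome randomization used in the task is not specified here, so the sampling below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def reversal_block(n_trials=28, reversal_every=7, p_reward=0.8):
    """Return, per trial, the index of the 'good' (80%-rewarded) slot
    machine and whether each machine would pay out on that trial."""
    good = np.zeros(n_trials, dtype=int)
    current = 0
    for t in range(n_trials):
        if t > 0 and t % reversal_every == 0:
            current = 1 - current          # contingencies reverse
        good[t] = current
    pay_good = rng.random(n_trials) < p_reward        # rewarded 80% of trials
    pay_bad = rng.random(n_trials) < (1 - p_reward)   # rewarded 20% of trials
    return good, pay_good, pay_bad
```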
The two facial pictures (smiling and neutral) were generated by averaging facial pictures of Caucasian young adults. To do this, we used happy and neutral faces from the Averaged Karolinska Directed Emotional Faces set68. We averaged the female and male faces for each emotion (happy and neutral) rendering them ambiguous as to gender. Photoshop was used to create the averages. We cropped the images to remove the background/hair and display just the face. Facial pictures were displayed in black and white to match the non-social feedback. Participants were given 1 s to respond on each trial and then received feedback for 0.5 s. A fixation cross was presented for 0.5 s between each trial.
Participants also underwent three other tasks during the experiment (a go/no-go task, a peer influence task and a threat learning task; see the full description of experimental procedures at https://osf.io/kbgsv). The order of the tasks was counterbalanced between participants, with the exception that the threat learning task was always presented after the reward tasks reported here, so that there would be no effect of threat exposure on reward processing. Results from the other behavioural tasks will be reported elsewhere.
Data analysis
Reward Seeking. We calculated the number of played trials in the EBDM task and mean response times across all trials for each condition (high effort – high reward, low effort – high reward, high effort – low reward and low effort – low reward, in a nature or social context; 8 conditions in total). Response times (RTs) to choose whether to undergo an effortful task have been shown to be indicative of strength of preference27 and were used here as the main measure of reward seeking, owing to the low variance in participants’ choice data (most participants chose to play every trial in the task). We assessed whether the number of played trials and RTs when deciding to play differed between sessions (baseline, iso total and iso media). We used mixed effects models to investigate fixed effects of session, context, reward and effort on choices and RTs (for trials in which participants decided to do the effortful task), using separate models for choices and RTs. Subject was included as a random effect (allowing intercepts and slopes to vary between participants69). Calibration (i.e., the number of button presses participants managed at the beginning of each session, from which hard and easy tasks were calculated) was added as a control variable in the model.
The command for both models (i.e., choice and RT data) was: fitlme(Data,'Response~(session*effort*reward*context)+(Calibration)+(session*effort*reward*context|subjectID)').
Reward Learning. Participants’ choices from the RL task were analysed using a computational reinforcement learning and decision-making model for probabilistic reversal learning tasks66.
To assess learning during the task, we first extracted the trial-by-trial responses for each participant, then employed a computational reinforcement learning model for probabilistic reversal learning using a hierarchical Bayesian modelling approach66 to estimate the learning rates for each participant.
Here, individual parameters are drawn from group-level normal distributions. We used standard normal priors for the group-level means (e.g., refs. 70–72). For the group-level standard deviations, we used half-Cauchy prior distributions, which tend to give sharper and more reasonable estimates than uniform or inverse-Gaussian priors (see ref. 73 for details).
To select a model that best captured the behaviour of participants, we first compared model fits between three different reversal learning models on the data from each session. Each model employs the Rescorla-Wagner value update rule30 but differs in terms of how information is integrated. The Rescorla-Wagner update rule assumes that individuals assign and update internal stimulus value signals based on the prediction error, i.e., the mismatch between outcome (received reward/punishment following choice of this stimulus) and prediction (expected value of choosing this stimulus). The following models were compared:
(i) A reward-punishment model29, which expands the classic Rescorla-Wagner model of conditioning30 with separate learning rates for reward and punishment trials, here treating non-wins as punishments:
$$V_{c,t}= \left\{\begin{array}{ll}V_{c,t-1} + \eta^{rew}(O_{t-1}- V_{c,t-1}), & \text{if } O_{t-1}>0\\ V_{c,t-1}+ \eta^{nrew}(O_{t-1}- V_{c,t-1}), & \text{if } O_{t-1}<0\end{array}\right. \tag{1}$$
where ηrew is the learning rate for rewards, ηnrew is the learning rate for non-rewards, and O is the outcome received. In this model, only the value of the chosen stimulus is updated: Vc,t is the value of choice c on trial t; Ot−1 > 0 indicates a win and Ot−1 < 0 indicates no win.
(ii) An experience-weighted attraction model31, which includes an “experience” weight parameter that decouples acquisition and reversal by allowing the balance between past experience and new information to increasingly tip in favour of past experience:
$$n_{c,t}= n_{c,t-1}\times \rho + 1 \tag{2}$$
$$V_{c,t}= (V_{c,t-1}\times \phi \times n_{c,t-1}+ O_{t-1})/n_{c,t} \tag{3}$$
where nc,t is the “experience weight” of the chosen stimulus on trial t, updated on every trial using the experience decay factor ρ. Vc,t is the value of choice c on trial t for outcome O received in response to that choice, and φ is the decay factor for previous payoffs. In this model, φ is equivalent to the inverse of the learning rate in Rescorla-Wagner models.
(iii) A fictitious update model33, which includes an update rule for the unchosen option, reflecting the knowledge individuals gain about the unchosen option, here with separate learning rates for positive and negative prediction errors:
$$V_{c,t}=\left\{\begin{array}{ll}V_{c,t-1}+\eta^{rew}\left(O_{t-1}-V_{c,t-1}\right), & \text{if } O_{t-1}>0\\ V_{c,t-1}+\eta^{nrew}\left(O_{t-1}-V_{c,t-1}\right), & \text{if } O_{t-1}<0\end{array}\right. \tag{4}$$
$$V_{nc,t}=\left\{\begin{array}{ll}V_{nc,t-1}+\eta^{nrew}\left(-O_{t-1}-V_{nc,t-1}\right), & \text{if } O_{t-1}>0\\ V_{nc,t-1}+\eta^{rew}\left(-O_{t-1}-V_{nc,t-1}\right), & \text{if } O_{t-1}<0\end{array}\right. \tag{5}$$
where the values V of the chosen (c) and unchosen (nc) stimulus are updated with the actual prediction error and the counterfactual prediction error on each trial t, respectively; O is the outcome received. The learning rate η is divided into ηrew (learning rate for rewards; learning_pos) and ηnrew (learning rate for non-rewards; learning_neg), governing the magnitude of value updates driven by positive and negative prediction errors.
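As a concrete reference for Eqs. (4) and (5), the following is a minimal numpy sketch of a single trial of the fictitious update rule, assuming two options and outcomes coded as +1 (reward) and −1 (non-reward); variable names are ours, not those of the hBayesDM implementation.

```python
import numpy as np

def fictitious_update(V, choice, outcome, eta_rew, eta_nrew):
    """One trial of Eqs. (4)-(5). V: values of the two options;
    choice: 0 or 1; outcome: +1 (reward) or -1 (non-reward)."""
    unchosen = 1 - choice
    # learning rates swap with outcome sign, per Eqs. (4)-(5)
    eta_c, eta_nc = (eta_rew, eta_nrew) if outcome > 0 else (eta_nrew, eta_rew)
    V[choice] += eta_c * (outcome - V[choice])         # actual PE
    V[unchosen] += eta_nc * (-outcome - V[unchosen])   # counterfactual PE
    return V
```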
Model fitting. For all models, a softmax choice function was used to compute the action probability given the action values. On each trial t, the action probability of choosing option A (over B) was defined as follows:
$$p\left(A\right) = \frac{1}{1+ e^{-\beta (V_{A}- V_{B})}} \tag{6}$$
where β is the inverse temperature parameter that governs the stochasticity of choice; the choice probability is the inverse-logit (softmax) transform of the scaled value difference. Higher β values denote decisions driven by relative value, whereas lower β values denote more choice stochasticity.
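Equation (6) is a one-liner in code; a minimal sketch with an illustrative check:

```python
import numpy as np

def p_choose_A(V_A, V_B, beta):
    """Softmax (inverse-logit) choice probability of option A (Eq. 6)."""
    return 1.0 / (1.0 + np.exp(-beta * (V_A - V_B)))

# e.g., with a value difference of 0.6 and beta = 3, p(A) ~= 0.86;
# as beta -> 0, p(A) -> 0.5 (fully stochastic choice)
print(p_choose_A(0.8, 0.2, beta=3.0))
```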
Parameter estimation was performed with hierarchical Bayesian analysis using the Stan language in R via the hBayesDM package66. Posterior inference was performed using Markov chain Monte Carlo (MCMC) sampling, using the mode of the posterior distribution as the summary parameter for individual participants. Four MCMC chains were used, with 1000 post-warmup iterations per chain, resulting in 4000 valid MCMC samples. Model convergence was assessed by examining R-hat values, an index of the convergence of the chains74. R-hat values of all models were lower than 1.1, suggesting that MCMC samples were well mixed and converged to stationary distributions.
The models were fitted separately to the data from each session (baseline, iso total and iso media) and condition (social, non-social) to assess whether model fit would be affected by session or condition.
Model comparison. Model fit was assessed using Bayesian bootstrap and model averaging, whereby log-likelihoods for each model were evaluated at the posterior simulations and a weight was obtained for each model. We compared Bayesian model weights between models (higher weights denoting better fit). Model parameters were then extracted from the model that provided the best fit within a data set (here defined as a model weight > 0.50). Model fit was best for the fictitious update model in every data set (Table S5).
Following model selection, we extracted individual participants’ learning rates and inverse temperature parameters from the best performing model (i.e., the fictitious update model) for each session (baseline, iso total and iso media) and condition (social, non-social).
To compare differences in model parameters, we used mixed effects models to estimate the fixed effects of feedback condition (social vs non-social) and session (baseline, iso total and iso media) on three parameters of interest: learning_pos (learning rate from positive prediction errors (PEs)), learning_neg (learning rate from negative PEs) and beta (inverse temperature). Separate models were calculated for each parameter. Subject was included as a random effect (allowing intercepts and slopes to vary between participants69).
The command for each model was:
fitlme(Data,'Parameter ~ (session * condition) + (session * condition | subjectID)')
Simulations. Whether a given difference in parameters indicates “better” or “worse” behaviour can depend heavily on the task design and on the values of the other model parameters37. We therefore used data simulations to identify the optimal learning parameters for the present task environment, to aid interpretation of the results. Parameter combinations were taken from a grid spanned by the learning rate (η; minimum value/maximum value/steps = 0/1/60) and the inverse temperature (β; 0/20/60). Each of these virtual participants then completed 28 trials, corresponding to the actual number of trials per condition (social, non-social) in each session (baseline, iso total, iso media), with a reward schedule of 80:20. For each virtual participant, we calculated the percentage of correct choices as the percentage of choices of the option associated with the higher reward probability. To reduce random noise due to the finite number of samples, we smoothed the resulting images with a Gaussian filter (std = 2). We identified the optimal learning rate and inverse temperature as the values that resulted in the highest choice accuracy (average of the 10 highest accuracy scores in the simulations; Fig. S2).
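A minimal Python sketch of this grid simulation follows, using a single learning rate and the softmax rule from Eq. (6); the number of simulated agents per grid cell is our assumption (the published analysis may differ), and the reversal schedule matches the task (reversal every 7 trials, 80:20 rewards).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
etas = np.linspace(0, 1, 60)    # learning-rate grid
betas = np.linspace(0, 20, 60)  # inverse-temperature grid
accuracy = np.zeros((60, 60))
n_agents = 50                   # agents per grid cell (assumed)

for i, eta in enumerate(etas):
    for j, beta in enumerate(betas):
        n_correct = 0
        for _ in range(n_agents):
            V, good = np.zeros(2), 0
            for t in range(28):              # trials per condition
                if t > 0 and t % 7 == 0:
                    good = 1 - good          # reversal every 7 trials
                p1 = 1 / (1 + np.exp(-beta * (V[1] - V[0])))  # Eq. (6)
                c = int(rng.random() < p1)
                n_correct += (c == good)
                p_rew = 0.8 if c == good else 0.2
                o = 1.0 if rng.random() < p_rew else -1.0
                V[c] += eta * (o - V[c])     # Rescorla-Wagner update
        accuracy[i, j] = n_correct / (n_agents * 28)

smoothed = gaussian_filter(accuracy, sigma=2)   # std = 2, as in the text
top10 = np.argsort(smoothed, axis=None)[-10:]   # 10 highest-accuracy cells
ii, jj = np.unravel_index(top10, smoothed.shape)
print("optimal eta ~", etas[ii].mean(), "; optimal beta ~", betas[jj].mean())
```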
Data analysis and visualization for the reward tasks were implemented in Python (version 3.7) using Jupyter notebooks (packages: Pandas, NumPy and Seaborn), RStudio and Matlab 2020a.
MRI Data Analysis
Preprocessing. We used open-source preprocessing pipelines for fMRI data, developed through the nipy and nipype initiatives75. We used the heudiconv python application, which uses dcm2niix to convert raw scanner data into the NIfTI image file format and then organizes the data into a BIDS-formatted directory structure. The fMRIprep application76 was used to minimally preprocess the anatomical and functional data (using default settings, but including susceptibility distortion correction using fieldmaps; see below).
Because we collected fMRI data at 7T using an MP2RAGE sequence, a format that is not currently supported by fMRIprep, we first processed the MP2RAGE data so it could be used by fMRIprep. To this end, we applied an offline reconstruction method using in-house code that reconstructs a phase-sensitive inversion recovery (PSIR) image from the MP2RAGE scans (NIfTI files). The PSIR file was then passed to fMRIprep and treated as a T1-weighted anatomical scan.
Using fMRIprep, we then skull-stripped anatomical images, first roughly using the atlas-based ANTS program77, and refined this using information from FreeSurfer surfaces after reconstruction was completed78. Brain tissue segmentation was performed with the FMRIB Software Library (FSL) FAST program79. Images were spatially normalized to MNI-space using the multiscale, mutual-information based, nonlinear registration scheme implemented in ANTS. We visually inspected brain masks, tissue segmentation and FreeSurfer surfaces. Susceptibility distortion correction was performed using phase-difference B0 estimation80.
A reference image for each run was generated from the input BOLD timeseries. A functional brain mask was created using a combination of FSL, ANTS, AFNI and nilearn tools81. Using FSL’s MCFLIRT program82, we estimated and corrected for head motion, resulting in a coregistered BOLD series as well as motion-based confound regressors. Any run containing framewise displacement greater than 0.4 mm on more than 25% of the total frames was excluded from additional analyses. Additional confound regressors were generated, including other measures of motion (framewise displacement and DVARS) and anatomical CompCor83 timeseries derived from cerebrospinal fluid (CSF) and white matter tissue segments. The reference image of each run was aligned with the anatomical image using FreeSurfer’s program “bbregister”84. The timepoint-to-functional-reference transformation, the functional-reference-to-anatomical transformation and the anatomical-to-MNI transformation were concatenated into a single transformation and used to transform each functional timeseries into MNI template space.
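The run-exclusion rule can be expressed compactly; below is a minimal pandas sketch, assuming a recent fMRIprep confounds file (the framewise_displacement column name follows current fMRIprep output conventions and is an assumption for other versions).

```python
import pandas as pd

def run_excluded(confounds_tsv, fd_thresh=0.4, max_frac=0.25):
    """Flag a run for exclusion if framewise displacement exceeds
    fd_thresh (mm) on more than max_frac of frames. Assumes an
    fMRIprep confounds file, e.g. *_desc-confounds_timeseries.tsv."""
    fd = pd.read_csv(confounds_tsv, sep="\t")["framewise_displacement"]
    # NaN on the first frame compares False, so it never counts as a spike
    return (fd > fd_thresh).mean() > max_frac
```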
Spatial smoothing was performed on the fMRIprep outputs with a 6 mm smoothing kernel using FSL’s SUSAN tool85, which uses segmentation boundaries to avoid smoothing across tissue types. MRIQC, an open-source quality assurance software tool86, was used to generate additional reports which display Image Quality Metrics (IQMs).
Functional MRI Data Modelling. Analyses were conducted using the nipype framework75. For run-level analyses, the preprocessed timeseries were assessed with algorithms from the Artifact Removal Toolbox (ART) to identify frames within the run with an abnormal amount of motion (0.4 mm of total displacement, or an intensity spike greater than 3 standard deviations from the mean). The design matrix included boxcars for the experimental conditions convolved with a double-gamma hemodynamic response function (HRF), and nuisance regressors representing frame-wise motion, the anatomical CompCor regressors derived from white matter and CSF, as well as impulse regressors for volumes identified by ART. A high-pass filter was applied to the design matrix and the smoothed data. The model was evaluated using FSL’s FILM87. Subject-level contrast maps were generated using FSL’s FLAME87 in mixed-effects mode.
ROIs in MID task. To analyse the functional data from the MID task, we applied whole-brain analyses and region of interest (ROI) analyses. Whole-brain data were used to generate group-level contrasts for the high reward vs no reward conditions, to assess whether the resulting group-level brain activation was in line with expected activation for this contrast38. For the ROI analyses, we included regions that have been consistently activated during reward anticipation in an Activation Likelihood Estimation meta-analysis of 50 fMRI studies using the MID task (Oldham et al. 201838). These regions comprised: the ventral striatum (VS), midbrain (MB), amygdala (AMY), anterior insula (AI), occipital cortex (OC), thalamus (TH) and supplementary motor area (SMA). ROIs were created using the Harvard–Oxford cortical and subcortical probabilistic anatomical atlases included in FSL. As no anatomical mask was available for the anterior insula, we created ROIs based on the peak activation locations reported in Oldham et al. 2018 (left AI: −40, 14, −8; right AI: 34, 24, −2). We drew 5-mm spheres around these peak coordinates and created a binary mask combining left and right AI.
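For the AI spheres, a mask can be built directly from the peak coordinates; the following nibabel/numpy sketch shows one way to do it (the reference-image filename is hypothetical, and the actual masks may have been drawn with other tooling).

```python
import numpy as np
import nibabel as nib

def sphere_mask(ref_img, centers_mm, radius_mm=5.0):
    """Binary mask of spheres around MNI-space coordinates (in mm)."""
    shape, affine = ref_img.shape[:3], ref_img.affine
    ijk = np.indices(shape).reshape(3, -1).T          # all voxel indices
    xyz = nib.affines.apply_affine(affine, ijk)       # voxel -> mm coords
    mask = np.zeros(len(xyz), dtype=bool)
    for c in centers_mm:
        mask |= np.linalg.norm(xyz - np.asarray(c), axis=1) <= radius_mm
    return nib.Nifti1Image(mask.reshape(shape).astype(np.uint8), affine)

# combined bilateral AI mask from the reported peaks
# ref = nib.load("mni_reference.nii.gz")  # hypothetical reference image
# ai_mask = sphere_mask(ref, [(-40, 14, -8), (34, 24, -2)])
```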
Quantifying neural reward sensitivity.
Univariate. We extracted mean activation from the contrast high reward > no reward during the anticipation phase of the MID task as a measure of univariate neural reward sensitivity from each ROI (7 ROIs in total).
Multivariate. Multivariate analyses can detect differences between conditions with higher sensitivity than conventional univariate analyses88. We therefore also employed a multivariate measure of neural reward sensitivity by quantifying how well the multivariate spatial patterns of activity in each ROI distinguished high reward from no reward during the anticipation phase of the MID task. Multivariate classification accuracy can be seen as an index of the amount of information about a variable of interest available in the BOLD signal89,90, such that individual differences in classification accuracy indicate neural reward sensitivity in the multivariate domain. However, differences between multivariate and univariate tests do not afford conclusions about the nature or dimensionality of the neural signal (as both can stem from the same source88). Thus, these measures were used here to assess the same concept (i.e., neural reward sensitivity), and we corrected for multiple comparisons across the different measures. For each ROI, we extracted the β values from the generalized linear model (that is, the amplitude of the fitted HRF) of the response to the two conditions of interest (high reward, no reward) for each trial in each run (28 trials per condition per run × 4 runs), resulting in 112 β values per voxel. Responses were extracted from all voxels in the anatomically identified ROIs in each participant (see ROIs in MID task for ROI definition). No additional feature selection was applied. All multivariate analyses were conducted with the PyMVPA91 toolbox in Python (http://www.pymvpa.org) and Matlab 2020a. We first trained a machine learning algorithm to decode neural patterns of high reward vs no reward for each subject within each ROI. We used 4-fold cross-validated linear support vector machine (SVM) classification, training the classifier on three of the four MID task runs and testing on the left-out run. This was repeated four times (4 folds) so that each run served as the testing data set once. We then averaged classification accuracies across folds to create a measure of classification accuracy for each ROI for each participant. To obtain confidence intervals of the mean classification accuracy for each ROI, we used bootstrapping: we generated 1,000 datasets by randomly sampling with replacement from the classification accuracies for each ROI across participants, using Matlab’s bootci function.
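The analysis was run in PyMVPA; as an illustration of the cross-validation scheme, here is an equivalent scikit-learn sketch (array names are ours), where leaving one of the four runs out per fold is exactly 4-fold cross-validation:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def roi_classification_accuracy(X, y, runs):
    """X: (n_trials, n_voxels) single-trial betas for one ROI;
    y: condition labels (high reward vs no reward);
    runs: run index (1-4) per trial. Train on 3 runs, test on the
    left-out run, and average accuracy across the 4 folds."""
    scores = cross_val_score(LinearSVC(), X, y,
                             groups=runs, cv=LeaveOneGroupOut())
    return scores.mean()
```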
First, we tested whether classification accuracies for each ROI across participants were significantly above chance, indicating the ability to decode high reward vs no reward representations in that ROI. While the selected ROIs were chosen based on highly consistent univariate responsiveness to the MID task across studies, they might not necessarily also represent reward anticipation in the multivariate domain. For hypothesis testing at the group level, we used a permutation analysis following the methods of Stelzer et al.92 (also described in Tomova et al.24). This nonparametric approach does not depend on assumptions about the distribution of classification accuracies92,93. To generate a null distribution from the data, we followed the steps described in Stelzer et al., summarized below. We shuffled the condition labels randomly during training within each run, and then used the same 4-fold cross-validation approach as in the original analysis to obtain prediction accuracies for each fold, which we averaged to create a measure of classification accuracy for each ROI for each participant. We performed this permutation analysis 100 times per participant (thus creating 100 random permutations), resulting in 100 accuracy values per participant. To create a null distribution at the group level, we then randomly drew one of the 100 accuracy values for each participant and calculated the mean across participants. This procedure was repeated 10⁵ times for each ROI, creating the null distribution for each ROI. We calculated the probability P of obtaining a mean accuracy value in the null distribution that is equal to or higher than the true mean from the analyses. Following Stelzer et al., we rejected the null hypothesis of no group-level decoding if P < 0.001.
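A minimal numpy sketch of this group-level permutation test (following Stelzer et al.92; array names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def group_level_p(true_mean, perm_acc, n_draws=100_000):
    """perm_acc: (n_subjects, 100) permutation accuracies for one ROI.
    Draw one permutation accuracy per participant, average across
    participants, repeat n_draws (10^5) times, and return
    P(null mean >= observed mean)."""
    n_sub, n_perm = perm_acc.shape
    draws = rng.integers(0, n_perm, size=(n_draws, n_sub))
    null_means = perm_acc[np.arange(n_sub), draws].mean(axis=1)
    return (null_means >= true_mean).mean()
```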
We were able to decode high reward vs no reward representations in all 7 selected ROIs: VS (mean accuracy = 0.568, bootstrapped CI = [0.540, 0.594], P < 0.001); MB (mean accuracy = 0.551, bootstrapped CI = [0.525, 0.582], P < 0.001); OC (mean accuracy = 0.579, bootstrapped CI = [0.552, 0.606], P < 0.001); AI (mean accuracy = 0.545, bootstrapped CI = [0.521, 0.573], P < 0.001); TH (mean accuracy = 0.560, bootstrapped CI = [0.534, 0.592], P < 0.001); SMA (mean accuracy = 0.568, bootstrapped CI = [0.536, 0.601], P < 0.001); AMY (mean accuracy = 0.506, bootstrapped CI = [0.482, 0.534], P < 0.001).
We used the classification accuracy obtained from each participant from the SVM classification as a measure of multivariate neural reward sensitivity for subsequent analyses.
For both measures of neural reward sensitivity (univariate and multivariate), we used Pearson correlations to test for an association between neural reward sensitivity and measures of reward seeking (RTs from the high reward/high effort conditions (mean across nature and social contexts) in the EBDM task) and reward learning (learning_pos rates in the RL task) in the isolation session. We calculated separate correlations for each ROI for each measure of neural reward sensitivity (univariate and multivariate; 14 tests in total) and report results as significant at p < 0.003 (0.05/14). ROI data analysis and visualization were implemented in Python (version 3.7) using Jupyter notebooks (packages: PyMVPA, SciPy, Pandas, NumPy and Seaborn) and Matlab 2020a.
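For reference, a minimal SciPy sketch of this correlation analysis with the Bonferroni threshold (the dictionary structure and names are illustrative, not from the original analysis code):

```python
from scipy.stats import pearsonr

ALPHA = 0.05 / 14  # Bonferroni: 7 ROIs x 2 reward-sensitivity measures

def correlate_with_behaviour(reward_sensitivity, behaviour):
    """reward_sensitivity: {(roi, measure): per-participant values};
    behaviour: per-participant reward-seeking RTs or learning_pos rates."""
    results = {}
    for key, vals in reward_sensitivity.items():
        r, p = pearsonr(vals, behaviour)
        results[key] = (r, p, p < ALPHA)  # significant at p < 0.003
    return results
```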