General Experimental Procedures
In Study 1, testing sessions took place over 5 consecutive days at the University of Oxford, Department of Psychiatry at Warneford Hospital. On the first visit, participants completed a screening interview to assess their eligibility, responded to a set of demographic questions, and completed a battery of psychological questionnaires. Eligible participants then continued with the first day of learning and completed 3 blocks of a simple reinforcement learning (RL) task in order to learn the associations between shapes and rewards. In line with the aims of the study, participants’ affective state was manipulated using a Wheel of Fortune (WoF) paradigm adapted from Eldar and Niv (2015).
To probe value-based recall of affective memories, after the training days (i.e. on the last 2 days of the lab-based study and the last day of the online study) we asked participants to make decisions in a two-option forced-choice (TOFC) preference task in which various combinations of the abstract shapes they had learned about were paired with each other. Although no explicit feedback was given to participants in the preference test, they continued to accumulate money based on the reward probability of the chosen shape (e.g. a 90% chance of winning 2p if the participant selected the shape associated with 90% reward probability in the learning phase). In the preference tests, accumulated money was calculated but not displayed on-screen, so that participants could not use this information as a proxy for their decision performance, which would otherwise contaminate preference choice behaviour with further learning.
In Study 2, testing sessions took place over 3 consecutive days and were delivered using an online platform (due to the global COVID-19 pandemic). We manipulated the reward probabilities in each RL block pre- and post-WoF in a balanced manner in order to investigate how discrete affective events influence human RL. Further details of the experimental procedures and the statistical analysis approach are given below. All tasks were presented on a laptop running MATLAB (MathWorks Inc) with Psychtoolbox (v3.1).
Training phase
In the lab-based study (Study 1) on each of the learning days, participants started with a baseline block (30 trials), followed by two more blocks (each 90 trials), with rest periods between the blocks. In the online study (Study 2), all blocks contained 40 trials each. The shapes in the RL tasks were selected from the Agathodaimon and Dingbat Cobogo fonts. During the RL blocks, participants were asked to learn the reward probabilities associated with each shape through trial and error. Participants were explicitly informed that on any given trial if one shape is rewarded the other one is not rewarded (such that the probabilities are p versus 1-p). A green frame appeared around the rewarded option to provide feedback on participant choice. For every correct choice made, participants collected 2 pence and they were asked to accumulate as much reward as possible. Participants started with £15 on day 1 and a running total at the bottom of the screen was updated at the start of each subsequent trial.
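The reward rule described above (exactly one shape rewarded per trial, 2 pence per correct choice, a running total starting at £15) can be sketched as follows. This is an illustrative simulation rather than the task code used in the study; the function names, the `policy` callback, and the left/right labelling are assumptions introduced here.

```python
import random

REWARD_PENCE = 2           # payout per correct choice (from the text)
START_TOTAL_PENCE = 1500   # the £15 starting balance on day 1 (from the text)


def run_trial(p_left, choice, rng):
    """Simulate one trial in which exactly one of the two shapes is rewarded.

    p_left is the reward probability of the left shape, so the right shape
    is rewarded with probability 1 - p_left. Returns (rewarded_side, payout).
    """
    rewarded = "left" if rng.random() < p_left else "right"
    payout = REWARD_PENCE if choice == rewarded else 0
    return rewarded, payout


def run_block(p_left, n_trials, policy, seed=0):
    """Run one block and return the running total in pence.

    policy is a hypothetical zero-argument callable returning "left" or "right".
    """
    rng = random.Random(seed)
    total = START_TOTAL_PENCE
    for _ in range(n_trials):
        _, payout = run_trial(p_left, policy(), rng)
        total += payout
    return total
```

For example, `run_block(0.9, 30, lambda: "left")` simulates a 30-trial baseline block for a participant who always picks the 90% shape.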
In total, participants were asked to learn the reward probabilities associated with 14 shapes in the lab-based study and 24 shapes in the online study. Participant preferences between these shapes were later probed in the preference test, which took place on the 4th and 5th days of the lab-based study and on the 3rd (final) day of the online study.
Wheel of Fortune
In order to influence the participants’ affective state, a single WoF draw was used in the break between the first and second blocks on each of the learning days. Participants were told that they could either win money, lose money, or receive nothing (blank). Unknown to the participants, the draw was not random but fixed, such that in the lab-based study each participant won (+£15), lost (-£11), or received a blank outcome (£0) across the 3 days (see Figure 1 and its legend for a detailed description of the experiment). In the online study, we excluded the blank condition and focused only on win (+£14) and loss (-£7) outcomes, and adjusted their outcome magnitudes considering the commonly observed salience differences between wins and losses30. To control for an effect of order, these outcomes were counterbalanced across participants. In Eldar and Niv (2015)31, individual trials were rewarded with 25 cents and participants could experience either a loss or a win of $7 after a single WoF draw (a 1/28 ratio of trial reward to WoF outcome). In contrast, we decided to increase the magnitude of the WoF outcome relative to the trial reward to ensure that these affective events (i.e. WoF outcomes) would feel more salient to participants. Therefore, in our experimental design, each correct prediction during RL led to a smaller reward (2p). As illustrated in Figure 1A, win and loss outcomes were sandwiched between large magnitudes of opposing outcomes, creating a near-miss effect aimed at strengthening the affective impact of the WoF. We used these large-magnitude WoF outcomes to experimentally induce negative or positive memory biases.
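The difference in scale between the WoF outcome and the per-trial reward can be made concrete with a short calculation. The sketch below uses only the amounts stated in the text, expressed in minor currency units (cents or pence) to avoid floating-point rounding; the function name and dictionary keys are illustrative.

```python
def outcome_to_reward_ratio(wof_outcome_minor, trial_reward_minor):
    """Return how many single-trial rewards one WoF outcome is worth.

    Both amounts are given in minor currency units (cents or pence).
    The sign of the outcome is ignored so wins and losses are comparable.
    """
    return abs(wof_outcome_minor) / trial_reward_minor


ratios = {
    "Eldar and Niv (2015), win/loss": outcome_to_reward_ratio(700, 25),
    "Study 1 (lab), win": outcome_to_reward_ratio(1500, 2),
    "Study 1 (lab), loss": outcome_to_reward_ratio(-1100, 2),
    "Study 2 (online), win": outcome_to_reward_ratio(1400, 2),
    "Study 2 (online), loss": outcome_to_reward_ratio(-700, 2),
}

for label, ratio in ratios.items():
    print(f"{label}: one WoF outcome = {ratio:.0f} trial rewards")
```

Under these figures, a single WoF outcome is worth 28 trial rewards in the original paradigm and between 350 and 750 trial rewards in the present design.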
Happiness ratings
During the learning days, participants were asked to report their current happiness. In the lab-based study (Study 1), we used a Likert scale from 1-9 (with higher numbers indicating greater happiness), whereas in the online study (Study 2) we used a visual analogue scale. Participants were asked to provide a happiness rating at different time points during the experiment, for example before starting the first RL block, more critically immediately before and after the WoF, and at the end of their training on any given day.
Preference tests
In the last phase of the experiment, we conducted preference tests. In order to assess the test-retest reliability of participant preferences, we conducted the preference test twice in the lab-based study, once on the 4th day and again on the 5th day. Participants were presented with random pairs of all the shapes they had encountered during the training days and, on each trial, were asked to choose the shape with the higher reward probability. The aim was to examine whether a WoF-induced change of affective state would bias their valuations of the learned shapes. Written and spoken instructions were given, and participants were presented with a print-out of all the shapes (14 in Study 1 (lab) and 24 in Study 2 (online)) before the task began, to provide them with an opportunity to recall the learned shapes. Within each study, shapes were randomly paired with each other to cover all possible combinations, leading to 400/456 trials (lab-based versus online) presented across three blocks, and the order of these trials was randomised across participants. We were particularly interested in the pairs of shapes that had objectively identical reward probabilities but appeared after different WoF outcomes, and thus should have been encoded under different affective influences (Figure 1). Therefore, the majority of trials presented during the preference tests prioritised randomly pairing equal-probability shapes, such that the value difference between the options would be near 0 (Supplementary Figure 7).
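As a rough illustration of the pairing logic, the sketch below enumerates the unique shape pairs and picks out the equal-probability pairs of interest. It is not the trial-generation code used in the study: the function names, shape labels, and probabilities in the usage note are hypothetical. Note that 14 and 24 shapes yield 91 and 276 unique pairs respectively, fewer than the reported 400 and 456 trials, so some pairs were presumably presented more than once; how repeats were allocated is not stated in the text.

```python
from itertools import combinations


def unique_pairs(n_shapes):
    """Return every unordered pair of shape indices (each pair once)."""
    return list(combinations(range(n_shapes), 2))


def equal_probability_pairs(shape_probs):
    """Return pairs of shapes that share the same reward probability.

    shape_probs maps a shape label to its reward probability.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(shape_probs), 2)
        if shape_probs[a] == shape_probs[b]
    ]
```

With a hypothetical set such as `{"A_win": 0.9, "A_loss": 0.9, "B_win": 0.6, "B_loss": 0.6}`, `equal_probability_pairs` returns the two pairs whose members differ only in the WoF outcome that preceded learning.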
In the preference tests (both Study 1 and 2), participants did not receive any explicit feedback about whether their choices were correct or not, so they had to rely on what they had previously learned. In order to prevent further learning, participants' running total did not appear on the screen during a block of trials, but they did continue to accumulate money based on whether their choice was correct and the reward probability of the chosen shape. Their running total was only displayed between blocks to provide an indication of their performance in the previous block. Administration of preference tests took roughly 30 minutes for the online study and 50 minutes for the in-lab study with pupillometry.
Questionnaire measures
In addition to the computerised task, participants were asked to respond to a series of self-report questionnaires on the first day of their testing, prior to completion of the first learning block. These questionnaires included: (1) the Beck Depression Inventory (BDI-II)32, a standard measure of depression; (2) the Spielberger State-Trait Anxiety Inventory (STAI)33, an anxiety measure comprising trait (anxiety proneness) and state (current state of anxiety) subscales; (3) the Positive and Negative Affect Schedule (PANAS), assessing the feeling and expression of positive and negative emotions34; (4) the Behavioural Inhibition/Behavioural Activation Scales (BIS/BAS), reflecting aversive motivation and appetitive motivation35; and (5) the Mood Disorder Questionnaire (MDQ)36, a screening tool for Bipolar Disorder.
Statistical analyses
We used repeated-measures analysis of variance (rmANOVA) models to investigate the effects of our experimental manipulations. All significant main effects and interaction terms were followed up by post hoc tests, corrected for multiple comparisons using Bonferroni correction. All analyses were conducted in MATLAB and SPSS v25. In the lab-based study, 6 participants were excluded from the analyses of pupillometry data due to signal dropout affecting more than half of the trials, and in total 2 participants were excluded from the behavioural analysis (one reported intentionally selecting the lower-probability shapes during the training phase; the other dropped out of the study before preference test 2). In total, 5 participants were excluded from the online study: 4 individuals who did not give more than 4 unique mood ratings across 16 assessment points, and one individual who consistently selected the lower-probability shapes more frequently than the higher-probability shapes. Across all statistical models, reward probability, valence and behaviour in pre- and post-WoF blocks were entered as within-subject factors, whereas WoF order (e.g. whether participants experienced a win or a loss outcome on the 1st day) and shape set, which randomised the associations between shape identities and reward probabilities, were entered as between-subject factors.
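The exclusion rules for the online study and the Bonferroni adjustment can be written compactly. The sketch below is one possible reading of those rules, not the analysis code used in the study. All function names are illustrative, and the Bonferroni helper adjusts p-values (multiplying by the number of comparisons), which is an assumption, since the text does not say whether p-values or the alpha threshold were adjusted.

```python
def too_few_unique_ratings(ratings, max_unique=4):
    """Flag a participant who gave no more than max_unique distinct ratings.

    ratings is the list of mood ratings across all assessment points
    (16 per participant in the online study).
    """
    return len(set(ratings)) <= max_unique


def prefers_lower_probability(chose_higher):
    """Flag a participant who picked the lower-probability shape more often.

    chose_higher is a list of booleans, one per trial, True when the
    higher-probability shape was chosen.
    """
    n_higher = sum(chose_higher)
    return len(chose_higher) - n_higher > n_higher


def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

A participant would then be excluded if either flag is true, and a post hoc test would count as significant if its adjusted p-value falls below the chosen alpha.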
Pupillometry
On the first day of preference tests (day 4) in the lab-based study, we collected pupillometry data to assess physiological responses to affective memories during value-based decision-making. Participants' heads were positioned 70 cm from the computer screen and stabilised with a chin rest. The eye-tracking system (Eyelink 1000 Plus; SR Research, Ottawa, Canada) was linked to the presentation computer through an ethernet connection. The sampling rate for pupillometry was set to 500 Hz, recording from both eyes. The preference test was presented on a VGA monitor. A fixation cross marking the middle of the screen separated the presented pair of shapes.
Preprocessing of the pupillary data involved removing eye blinks that were identified using the built-in filter of the Eyelink system. A linear interpolation was implemented for all missing data points (including blinks). The resulting pupil trace was processed through a low pass Butterworth filter (cut-off of 3.75 Hz) and then z-transformed across the preference test session37,38. In order to assess phasic response to task-related variables, we performed baseline correction by subtracting the mean pupil size during the 2-second baseline period prior to our epochs of interest (i.e. decision and outcome) from each time point in the post-stimuli presentation period. Individual trials were excluded from the pupillometry analysis if more than 50% of the data from the outcome period had been interpolated18,37. The preprocessing resulted in a single set of pupil time-series per participant containing pupil dilation data for each of the included trials.
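Several of the preprocessing steps above can be sketched with the Python standard library alone: linear interpolation, z-transformation, baseline correction, and trial exclusion. This is a minimal illustration, not the pipeline used in the study. The low-pass Butterworth filtering step is left out to keep the sketch dependency-free; missing samples are represented as `None`; gaps at the edges of the trace take the nearest valid value; and the z-transform uses the population standard deviation. All of these are assumptions introduced here.

```python
from statistics import mean, pstdev

SAMPLING_RATE_HZ = 500                    # from the text
BASELINE_SAMPLES = 2 * SAMPLING_RATE_HZ   # 2-second baseline window


def interpolate_missing(trace):
    """Linearly interpolate None samples.

    Returns (filled_trace, mask) where mask[i] is True if sample i was
    interpolated. Leading and trailing gaps take the nearest valid value.
    """
    valid = [i for i, v in enumerate(trace) if v is not None]
    if not valid:
        raise ValueError("trace contains no valid samples")
    filled = list(trace)
    mask = [v is None for v in trace]
    for i in range(valid[0]):                       # leading gap
        filled[i] = trace[valid[0]]
    for i in range(valid[-1] + 1, len(trace)):      # trailing gap
        filled[i] = trace[valid[-1]]
    for left, right in zip(valid, valid[1:]):       # interior gaps
        step = (trace[right] - trace[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = trace[left] + step * (i - left)
    return filled, mask


def z_transform(trace):
    """Standardise a whole-session trace to zero mean and unit variance."""
    mu, sd = mean(trace), pstdev(trace)
    return [(v - mu) / sd for v in trace]


def baseline_correct(trace, onset, baseline_len=BASELINE_SAMPLES):
    """Subtract the mean of the pre-onset baseline from the post-onset samples."""
    base = mean(trace[onset - baseline_len:onset])
    return [v - base for v in trace[onset:]]


def keep_trial(mask, start, end, max_fraction=0.5):
    """Keep a trial unless more than max_fraction of the window was interpolated."""
    window = mask[start:end]
    return sum(window) / len(window) <= max_fraction
```

In use, `interpolate_missing` and `z_transform` would be applied once to the full session trace, and `baseline_correct` and `keep_trial` once per epoch of interest, with `start` and `end` marking the outcome period.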