Participants.
Thirty-three right-handed participants (18–27 years old, 22 females) took part in the fMRI experiment, which used a variant of the Pavlovian conditioning task. Informed consent was obtained from each participant, in accordance with a protocol approved by the Beijing Normal University Research Ethics Committee.
Experimental paradigm.
In a variant of the Pavlovian conditioning task, two different cues were stochastically associated with either rewards (gain) or punishers (loss). The two conditions were alternately and randomly intermixed (Fig. 2). The associated gain and loss values (unit: Chinese Yuan) were randomly drawn from a beta distribution, with the mean ∈ [±22, ±26, ±30, ±34, ±38] and a common standard deviation of 3.6 within each block of 4–8 trials. The number of trials within each block was randomly drawn from a uniform distribution between 4 and 8 trials. Hence, the outcomes were noisy and volatile62–64. The sequence was randomly generated for each participant. Although neighboring blocks of the same cue type always had different mean values, the cue-associated values appeared to vary continuously within [±10, ±50], and the change points between neighboring blocks were not apparent, owing to the large standard deviations within each block. Each participant was required to explicitly report the prediction value associated with each presented cue by moving a bar to the target position with button presses, where right or left button presses increased or decreased the magnitude, respectively. Specifically, pressing the right or left index-finger button added 1 to or subtracted 1 from the current position, respectively; the middle-finger buttons produced steps of 5, the ring-finger buttons produced steps of 10, and the little-finger button was used to submit the prediction value. The participants were given no information about the environment associated with the task and were merely instructed to learn from the outcomes.
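The outcome schedule described above can be sketched as follows. This is a minimal illustration, not the authors' stimulus code. It assumes the beta distribution was rescaled to the interval [10, 50] (the text only says that values appeared to vary within that range), and the names `beta_params` and `generate_outcomes` are hypothetical.

```python
import random

BLOCK_MEANS = [22, 26, 30, 34, 38]   # block means in Chinese Yuan (from the text)
SD = 3.6                             # within-block standard deviation (from the text)
LO, HI = 10.0, 50.0                  # ASSUMED support of the rescaled beta distribution


def beta_params(mean, sd, lo=LO, hi=HI):
    """Convert a mean and SD on [lo, hi] into beta shape parameters (a, b)."""
    mu = (mean - lo) / (hi - lo)          # mean on the unit interval
    var = (sd / (hi - lo)) ** 2           # variance on the unit interval
    nu = mu * (1.0 - mu) / var - 1.0      # method-of-moments concentration
    return mu * nu, (1.0 - mu) * nu


def generate_outcomes(n_trials, sign=+1, rng=None):
    """Return (outcomes, block_means) for one cue; sign=+1 gain, -1 loss."""
    rng = rng or random.Random()
    outcomes, block_means = [], []
    prev_mean = None
    while len(outcomes) < n_trials:
        # Neighboring blocks of the same cue always have different means.
        mean = rng.choice([m for m in BLOCK_MEANS if m != prev_mean])
        block_len = rng.randint(4, 8)     # uniform over 4..8 trials, inclusive
        a, b = beta_params(mean, SD)
        for _ in range(block_len):
            if len(outcomes) == n_trials:
                break
            value = LO + (HI - LO) * rng.betavariate(a, b)
            outcomes.append(sign * value)
            block_means.append(sign * mean)
        prev_mean = mean
    return outcomes, block_means
```

Calling `generate_outcomes(30, sign=-1)` would give one run's worth of loss outcomes for a single cue under these assumptions.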
Task sequence.
Each trial started with a 1-s presentation of a fractal image, as the valence-associated cue (i.e., CS). After the cue presentation, the participants reported the valence (prediction) associated with the cue within 3 s. The initial position of the cursor always began at the prediction value reported for the previous trial with the same cue, or at the central position (± 30) for the first trial of each run. Immediately after reporting the prediction value, the participants reported their confidence rating, using a scale from 1 (indicating completely uncertain) to 8 (indicating completely certain), regarding the prediction precision, within 2 s. After a uniformly random jitter, lasting between 3 and 5 s, the actual associated value (outcome) was presented as feedback, for 1 s. The inter-trial interval (ITI) was uniformly random, lasting from 4 to 6 s, causing the prediction and feedback phases to be temporally separated by a 3–7 s gap. Each run consisted of 30 gain trials and 30 loss trials, and a total of eight runs were performed.
The outcomes of one gain trial and one loss trial were independently and randomly chosen to be added to each participant’s basic payment (100 Chinese Yuan, approximately 15 US dollars). In addition, each participant was told that a further bonus of 40 Chinese Yuan would be awarded for good performance in predicting the cue-associated values. In fact, all participants received this bonus. Prior to the fMRI experiment, each participant practiced two runs of the task outside the scanner.
Behavioral analysis.
We used a simple model-free RL model to characterize the underlying learning process associated with the prediction update (i.e., the RW model2,3). Each participant’s prediction (p) was assumed to update through a trial-by-trial recursive process, as follows:
$$\mathrm{p}_{\mathrm{n}} = \mathrm{p}_{\mathrm{n}-1} + \alpha \, \mathrm{pe}_{\mathrm{n}} \tag{1}$$
where \(\mathrm{pe}_{\mathrm{n}} = \mathrm{o}_{\mathrm{n}} - \mathrm{p}_{\mathrm{n}-1}\) denotes the prediction error, \(\mathrm{o}_{\mathrm{n}}\) denotes the actual outcome, \(\alpha\) denotes the constant learning rate, and \(\mathrm{p}_{1} = \pm 30\). The updating process is driven by the prediction error.
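The recursion in Eq. 1 can be written as a short loop. The sketch below is illustrative only; the function name `rw_predictions` and its arguments are hypothetical, and the starting value is passed in explicitly rather than fixed at ±30.

```python
def rw_predictions(outcomes, alpha, p_init):
    """Return the prediction held after each outcome under Eq. 1."""
    p = p_init
    predictions = []
    for o in outcomes:
        pe = o - p              # prediction error: pe_n = o_n - p_(n-1)
        p = p + alpha * pe      # Eq. 1: p_n = p_(n-1) + alpha * pe_n
        predictions.append(p)
    return predictions
```

For example, with a learning rate of 0.5, a starting prediction of 30 and two outcomes of 40, the prediction moves halfway toward the outcome on each trial.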
Alternatively, the participants might progressively acquire cue-outcome associations, as described by the Pearce-Hall (PH) model24, which accounts for the associability of, or attention to, the cue, as follows:
$$\alpha_{\mathrm{i}} = \left(1-\gamma\right)\alpha_{\mathrm{i}-1} + \gamma \beta \left|\mathrm{pe}_{\mathrm{i}}\right| \tag{2}$$
where \(\alpha_{\mathrm{i}}\) denotes the associability strength at trial i, \(\gamma\) denotes the decay constant of the learning rate, and \(\beta\) is a scaling coefficient.
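A minimal sketch of the associability update in Eq. 2 is given below. The text does not spell out how the associability enters the prediction update, so the sketch assumes that it simply replaces the constant learning rate in Eq. 1. The function name `ph_predictions` and the initial associability `alpha_init` are hypothetical.

```python
def ph_predictions(outcomes, gamma, beta, alpha_init, p_init):
    """Return (predictions, associabilities) under Eq. 2.

    ASSUMPTION: the prediction is updated as p_i = p_(i-1) + alpha_i * pe_i,
    i.e. Eq. 1 with the trial-wise associability in place of a constant rate.
    """
    p, alpha = p_init, alpha_init
    predictions, alphas = [], []
    for o in outcomes:
        pe = o - p
        # Eq. 2: decay the old associability and add the scaled |PE|.
        alpha = (1.0 - gamma) * alpha + gamma * beta * abs(pe)
        p = p + alpha * pe
        predictions.append(p)
        alphas.append(alpha)
    return predictions, alphas
```

Under this form, large unsigned prediction errors raise the effective learning rate on the current update, and a run of small errors lets it decay.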
We fitted the trial-by-trial predictions with the outcomes, and calculated the Bayesian information criterion (BIC) for each model in each individual participant by transforming the minimum residual sum of squares into a log-likelihood. We used nonlinear optimization algorithms, implemented in MATLAB (R2012b; MathWorks Inc., Natick, Massachusetts), to separately estimate the parameters of the gain and loss conditions for each participant.
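One common way to turn a minimum residual sum of squares (RSS) into a BIC is to assume Gaussian residuals with the variance set to its maximum-likelihood value, RSS/n. The sketch below follows that convention; whether the authors used exactly this form is an assumption. A coarse grid search stands in for the MATLAB nonlinear optimizer, and the residual is assumed to be the difference between each reported prediction and the model prediction held before that trial's outcome. All function names are hypothetical.

```python
import math


def rw_rss(reports, outcomes, alpha, p_init):
    """Residual sum of squares between reported and RW-model predictions."""
    p, rss = p_init, 0.0
    for r, o in zip(reports, outcomes):
        rss += (r - p) ** 2       # report is compared with the pre-outcome prediction
        p += alpha * (o - p)      # Eq. 1 update once the outcome is seen
    return rss


def fit_alpha(reports, outcomes, p_init, grid_size=101):
    """Grid search over alpha in [0, 1]; returns (min_rss, best_alpha)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min((rw_rss(reports, outcomes, a, p_init), a) for a in grid)


def bic_from_rss(rss, n_obs, n_params):
    """BIC under Gaussian residuals with ML variance rss / n_obs."""
    sigma2 = rss / n_obs
    log_lik = -0.5 * n_obs * (math.log(2.0 * math.pi * sigma2) + 1.0)
    return n_params * math.log(n_obs) - 2.0 * log_lik
```

With this definition, a model with more free parameters needs a correspondingly lower RSS to reach the same BIC.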
Further, to illustrate that the learning rates changed with the PEs, we divided the trials with positive PEs and the trials with negative PEs separately into six equally sized bins across the gain and loss conditions for each participant (trials with PEs equal to zero were omitted). The mean learning rate in each bin was estimated as the slope of a linear regression of the prediction changes on the PEs (Fig. 6a).
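The binning procedure can be sketched as follows. The text does not state whether the regression included an intercept; this sketch assumes a regression through the origin, so that the slope is the ratio of prediction change to PE within the bin. The function name `binned_learning_rates` is hypothetical.

```python
def binned_learning_rates(pes, dps, positive=True, n_bins=6):
    """Estimate a learning rate per PE bin for one PE sign.

    pes: prediction errors; dps: the prediction changes that followed them.
    Trials with PE == 0 are omitted. Returns one slope per bin (None if empty).
    ASSUMPTION: slopes come from a least-squares fit through the origin.
    """
    pairs = sorted(
        (pe, dp) for pe, dp in zip(pes, dps)
        if (pe > 0 if positive else pe < 0)
    )
    n = len(pairs)
    rates = []
    for i in range(n_bins):
        # Integer edges give bins that differ in size by at most one trial.
        chunk = pairs[i * n // n_bins:(i + 1) * n // n_bins]
        sxx = sum(pe * pe for pe, _ in chunk)
        sxy = sum(pe * dp for pe, dp in chunk)
        rates.append(sxy / sxx if sxx > 0 else None)
    return rates
```

A learning rate that rises from the low-PE bins to the high-PE bins would indicate that larger errors drive proportionally larger updates.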
fMRI parameters.
All fMRI experiments were conducted using a 3-T Siemens Trio MRI system, with a 12-channel head coil (Siemens, Germany). Functional images were acquired with a single-shot gradient-echo T2* echo-planar imaging (EPI) sequence, with a volume repetition time of 2 s, an echo time of 30 ms, a slice thickness of 3.0 mm, and an in-plane resolution of 3.0 × 3.0 mm² (field of view: 19.2 × 19.2 cm²; flip angle: 90 degrees). Thirty-eight axial slices were taken, with an interleaved acquisition, parallel to the anterior commissure-posterior commissure line.
fMRI analyses.
The fMRI analyses were conducted using FMRIB’s Software Library (FSL)65. To correct for rigid head motion, all EPI images were realigned to the first volume of the first scan. Data sets in which the translation motions were larger than 2.0 mm or the rotation motions were larger than 1.0 degree were to be discarded; no data from this experiment met these exclusion criteria. Brain matter was separated from non-brain matter using a mesh deformation approach. The EPI images were then registered to the individual high-resolution structural images and subsequently to Montreal Neurological Institute (MNI) space, using affine registration with 12 degrees of freedom, and the data were resampled at a resolution of 2 × 2 × 2 mm³. Spatial smoothing, with a 4-mm Gaussian kernel (full width at half-maximum), and high-pass temporal filtering, with a cutoff of 0.005 Hz, were applied to all fMRI data.
For the first-level analyses, two events were applied to each trial. The first event represented the prediction phase, time-locked to the onset of the cue presentation, with the sum of the cue presentation duration (1 s) and the response time (RT) of the prediction report representing the event duration. The second event represented the feedback phase, time-locked to the onset of the outcome presentation, with the presentation duration (1 s) as the event duration. Six general linear model (GLM) analyses with parametric regression were separately applied to the feedback phase of the current trial and to the prediction phase of the subsequent trial with the same cue as follows. (1) We first used the signed PEs (SPEs) to regress with the trial-by-trial fMRI activities, during both the feedback and prediction phases, separately for the gain and loss conditions, to specify the neural encoding of the SPEs in the dopaminergic reward system. (2) To further test the neural encoding of the SPEs in the dopaminergic reward system, we repeated the same process but separated the positive PEs from the negative PEs in both the gain and loss conditions (Supplementary Fig. 2). (3) Because the SPEs and the unsigned PEs (UPEs) were uncorrelated (mean r = 0.01, P = 0.48), we simultaneously regressed the SPEs and the UPEs with the trial-by-trial fMRI activities in both the feedback and prediction phases of the gain and loss conditions (Figs. 4 & 5). (4) We used the reported confidence as a parameter to regress the trial-by-trial fMRI activities during the prediction phase of the gain and loss conditions (Fig. 5). (5) To examine the neural correlates of the CS-associated valences, we regressed the outcomes and the predictions with the trial-by-trial fMRI activities during the feedback and prediction phases of the gain and loss conditions, respectively (Supplementary Fig. 3). 
(6) As both the outcomes at the current trial (mean r = 0.57) and the predictions at the subsequent trial (mean r = 0.44) were highly correlated with the SPEs at the current trial, we then regressed the outcomes and the predictions, after orthogonalization with respect to the SPEs, with the trial-by-trial fMRI activities during the feedback and prediction phases of the gain and loss conditions, respectively (Supplementary Fig. 3). We added the currently irrelevant SPEs or UPEs associated with the alternative CS as confounding variables during both the feedback and prediction phases in each trial. All the regressors were convolved with the canonical hemodynamic response function, using double-gamma kernels. Further, we also used a delta function at the onsets of both phases to look for possibly sharp phasic neural responses to the SPEs. The results were very similar to those obtained with the GLMs described above.
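The orthogonalization step in analysis (6) can be illustrated with a small sketch. It assumes that the outcomes (or predictions) were orthogonalized with respect to the SPEs by removing the component linearly explained by the demeaned SPEs. The function name `orthogonalize` is hypothetical, and in practice this step would be handled inside the GLM software rather than by hand.

```python
def orthogonalize(x, ref):
    """Return demeaned x with the part linearly explained by ref removed.

    Both inputs are demeaned first, so the result is uncorrelated with ref.
    Assumes ref is not constant (its demeaned sum of squares is non-zero).
    """
    mean_x = sum(x) / len(x)
    mean_r = sum(ref) / len(ref)
    xc = [v - mean_x for v in x]
    rc = [v - mean_r for v in ref]
    # Least-squares coefficient of xc on rc.
    b = sum(a * c for a, c in zip(xc, rc)) / sum(c * c for c in rc)
    return [a - b * c for a, c in zip(xc, rc)]
```

After this step, any variance shared between the outcomes and the SPEs is attributed to the SPE regressor, and the orthogonalized regressor carries only the remaining, SPE-independent variance.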
For the group-level analyses, we used FMRIB’s Local Analysis of Mixed Effects (FLAME), which models both the ‘fixed effects’ of within-participant variance and the ‘random effects’ of between-participant variance. Statistical parametric maps were thresholded at z > 3.1, with P < 0.05 after family-wise error (FWE) correction for multiple comparisons using Gaussian random field theory, unless mentioned otherwise.
Regions-of-interest (ROI) definition.
We focused our analyses on the three ROIs of the dopaminergic reward system: the VS, putamen, and putative SNc. These ROIs were defined as the voxels within the anatomically defined regions that reached a significance level of z > 2.6 (P < 0.005) for the parametric regression of the positive PEs with fMRI activities during the gain condition in the voxel-wise whole-brain analysis. Therefore, the defined ROIs of the VS and putative SNc should agree with the conventional regions of the dopaminergic reward system thought to be responsive to the reward PEs. We then assessed the regression values of fMRI activities with the negative PEs, the SPEs, the UPEs, the cue-associated values and the reported confidence in these ROIs. The anatomical regions of the VS and putamen were defined by the Harvard Subcortical Structures Atlas (including probabilities > 0.5), and the anatomical region of the putative SNc was defined by a mask around the ventral tegmental area/SNc45 (MNI coordinates: x: −8 to +6, y: −26 to −14, z: −20 to −12). The ROI of the amygdala was extracted in the same way as that of the VS. The ROI of the PMC was defined using the same approach, as the voxels within an anatomically defined PMC area that reached a significance level of z > 2.6 (P < 0.005) for the parametric regression of the SPEs with fMRI activities in both the gain and loss conditions during both the feedback and prediction phases in the voxel-wise whole-brain analysis.
ROI analyses.
The beta values of the GLMs were averaged across the voxels of each ROI. Further, we also used a trial-based GLM to obtain the trial-by-trial values of the response activities during the prediction and feedback phases. Unlike the standard GLM analyses described above, which use two common regressors across all the trials, here each trial had its own independent regressors66,67. We then divided all the trials for each participant equally into ten bins, according to the normalized SPEs. The mean response beta value in each bin was calculated (Fig. 5b, Supplementary Fig. 4b).
Psycho-physiological interaction (PPI) analyses.
To calculate the voxel-wise functional connectivity between the VS region (the seed ROI) and the voxels across the whole brain that changed with the PE magnitudes (i.e., UPEs), we performed another voxel-wise GLM analysis, in which the time course of the VS region (physiological factor), the median-split UPEs (large: 1; small: −1; psychological factor), and their interaction were entered into the feedback phase as three parametric modulation regressors (Fig. 6b-c). Statistical parametric maps were thresholded at z > 2.6, with P < 0.05 after FWE correction for multiple comparisons.
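The three PPI terms can be illustrated at the trial level as below. This is a simplified sketch: it works on one seed value per trial rather than on the full time course, skips convolution with the hemodynamic response function, and assigns UPEs equal to the median to the small group. All three choices are assumptions made for illustration. The function names are hypothetical.

```python
import statistics


def median_split(upes):
    """Code each unsigned PE as +1 (large) or -1 (small) relative to the median."""
    med = statistics.median(upes)
    # ASSUMPTION: values equal to the median are coded as small (-1).
    return [1 if u > med else -1 for u in upes]


def ppi_terms(seed, upes):
    """Return (physiological, psychological, interaction) regressors."""
    mean_seed = sum(seed) / len(seed)
    physio = [s - mean_seed for s in seed]       # demeaned seed (VS) signal
    psych = median_split(upes)                   # +1 / -1 psychological factor
    interaction = [a * b for a, b in zip(physio, psych)]
    return physio, psych, interaction
```

A voxel whose activity loads on the interaction term, over and above the two main effects, is one whose coupling with the seed differs between large and small UPEs.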