Metacognition in monkeys in both the memory and perception domains
To show that macaques are capable of metacognition, we quantified this capacity using bias-free metacognitive efficiency (H-model meta-d′/d′). We compared the animals’ scores to zero using one-sample t tests and found that the meta-index values of all monkeys were above zero for both tasks (Figure 2c and d; meta-perception: H-model meta-d′/d′: Mars, t(19) = 5.685, p < 0.001; Saturn, t(19) = 5.639, p < 0.001; Uranus, t(19) = 10.55, p < 0.001; Neptune, t(19) = 9.458, p < 0.001; meta-memory: H-model meta-d′/d′: Mars, t(19) = 9.012, p < 0.001; Saturn, t(19) = 5.639, p < 0.001; Uranus, t(19) = 4.159, p < 0.001; Neptune, t(19) = 3.621, p < 0.001).
We then replicated the results with the phi coefficient (meta-perception: phi coefficient: Mars, t(19) = 3.643, p < 0.001; Saturn, t(19) = 6.245, p < 0.001; Uranus, t(19) = 6.722, p < 0.001; Neptune, t(19) = 3.423, p < 0.001; meta-memory: phi coefficient: Mars, t(19) = 4.135, p < 0.001; Saturn, t(19) = 2.962, p = 0.004; Uranus, t(19) = 2.252, p = 0.018; Neptune, t(19) = 1.838, p = 0.041).
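For readers unfamiliar with the two quantities used above, the following is a minimal sketch of how a daily phi coefficient and a one-sample t statistic against zero could be computed. It assumes confidence is binarized as high/low and accuracy as correct/incorrect; the function names and the count layout are illustrative and are not taken from the authors' analysis code.

```python
import math
from statistics import mean, stdev


def phi_coefficient(high_correct, high_incorrect, low_correct, low_incorrect):
    """Phi coefficient of a 2x2 confidence-by-accuracy table (hypothetical layout)."""
    num = high_correct * low_incorrect - high_incorrect * low_correct
    den = math.sqrt(
        (high_correct + high_incorrect)
        * (low_correct + low_incorrect)
        * (high_correct + low_correct)
        * (high_incorrect + low_incorrect)
    )
    # A degenerate table (an empty row or column) carries no association.
    return num / den if den else 0.0


def one_sample_t(values, mu0=0.0):
    """Return (t, degrees of freedom) for a one-sample t test against mu0."""
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / math.sqrt(n))
    return t, n - 1
```

With 20 daily scores per monkey, `one_sample_t` returns 19 degrees of freedom, matching the t(19) values reported above.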
To further validate these results, we combined all trials per monkey across all days and then performed subject-based distribution simulations on each monkey. By randomly shuffling all the pairings between “responses” (correct/incorrect) and their corresponding “confidence levels” (high/low) within each subject, we generated 2,000 random pairings for each animal and simulated 4,000 metacognitive scores per animal (both the H-model meta-d′/d′ and the phi coefficient). These scores represent cases in which the animals had no metacognitive ability. We then tested these simulated scores against animals’ actual scores using a minimum statistic method34; we found that the animals indeed performed significantly above chance metacognitive ability in both tasks (all p values < 0.001; Table 1).
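The shuffling logic can be sketched as follows. This is an illustration only: it uses the phi coefficient as the metacognitive score and a simple empirical p value, which is a stand-in for, not a reproduction of, the minimum statistic method cited above. The names `shuffled_null` and `empirical_p` are hypothetical.

```python
import math
import random


def phi(correct, confident):
    """Phi coefficient between two equal-length lists of 0/1 values."""
    a = sum(1 for c, h in zip(correct, confident) if c and h)          # correct, high
    b = sum(1 for c, h in zip(correct, confident) if not c and h)      # incorrect, high
    c_ = sum(1 for c, h in zip(correct, confident) if c and not h)     # correct, low
    d = sum(1 for c, h in zip(correct, confident) if not c and not h)  # incorrect, low
    den = math.sqrt((a + b) * (c_ + d) * (a + c_) * (b + d))
    return (a * d - b * c_) / den if den else 0.0


def shuffled_null(correct, confident, n_shuffles=2000, seed=0):
    """Null distribution of phi obtained by breaking the response-confidence pairing."""
    rng = random.Random(seed)
    shuffled = list(confident)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # marginal counts are preserved, the pairing is not
        null.append(phi(correct, shuffled))
    return null


def empirical_p(observed, null):
    """One-sided empirical p value with the usual +1 correction."""
    return (sum(1 for v in null if v >= observed) + 1) / (len(null) + 1)
```

Because shuffling keeps each animal's overall accuracy and overall proportion of high-confidence trials fixed, the null scores differ from the real score only in the trial-by-trial pairing, which is the quantity of interest.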
As a control to rule out any possible contribution of training effects, we compared the animals’ metacognition scores between the first ten days and the second ten days of testing. We found no difference in metacognitive performance between the two halves in either perception (H-model meta-d′/d′: t(39) = -0.314, p = 0.755) or memory (H-model meta-d′/d′: t(39) = 0.89, p = 0.378). These results show that the metacognitive ability of the animals was stable across the whole testing period. For completeness, we checked the monkeys’ cognitive performance and found that they improved moderately in the second half in the memory task (accuracy: t(39) = -2.266, p = 0.029) but not in the perception task (t(39) = -1.083, p = 0.285).
TMS of BA46d impairs metacognitive performance but not cognitive performance
We then turned to our main question. We tested whether TMS of BA46d would affect metacognition on perceptual decision-making. We performed a 2 (TMS phase: on-judgement/on-wagering) × 2 (TMS: TMS-46d/TMS-sham) mixed-design repeated-measures ANOVA for metacognitive efficiency with TMS phase as a within-subjects factor and TMS as a between-subjects factor. We found a significant interaction between TMS phase and TMS modulation in both monkeys (Neptune, F(1,18) = 6.431, p = 0.021; Uranus, F(1,18) = 10.718, p = 0.004). The interaction was driven by lower metacognitive efficiency following TMS of BA46d than following sham treatment in the on-judgement phase condition (paired t tests: Neptune, t(9) = 3.675, p = 0.002; Uranus, t(9) = 2.741, p = 0.013), whereas no difference in metacognitive efficiency was found in the on-wagering phase (paired t tests: Neptune, t(9) = -0.3, p = 0.768; Uranus, t(9) = -0.841, p = 0.411); see Figure 3a and b. We replicated the metacognition deficit in the on-judgement phase with the phi coefficient (paired t tests: Neptune, t(9) = 3.51, p = 0.002; Uranus, t(9) = 5.637, p < 0.001).
These meta-indices are based on how the subjects rate their confidence and reflect how meaningful a subject’s confidence (reflected here by time wagering) is in distinguishing between correct and incorrect responses. Accordingly, we performed a three-way ANOVA (TMS phase: on-judgement/on-wagering × TMS: TMS-46d/TMS-sham × Confidence: unreached/reached) on task performance (accuracy) and observed a significant three-way interaction in both monkeys (Neptune, F(1,2313) = 5.530, p = 0.019; Uranus, F(1,2295) = 6.910, p = 0.009). The TMS effect was stronger in the on-judgement TMS phase (TMS × Confidence interaction: Neptune, F(1,1167) = 10.672, p = 0.001; Uranus, F(1,1160) = 10.404, p < 0.001; Figure 3c) than in the on-wagering TMS phase (TMS × Confidence interaction: Neptune, F(1,1146) = 0.003, p = 0.954; Uranus, F(1,1135) = 0.309, p = 0.579; Figure 3d). The effects in the on-judgement TMS phase were driven by higher accuracy following TMS-46d than TMS-sham in the unreached trials (Mann–Whitney U tests: Neptune, p = 0.001; Uranus, p < 0.001) but not in the reached trials (Mann–Whitney U tests: Neptune, p = 0.235; Uranus, p = 0.192). These findings confirmed that TMS targeting BA46d impairs metacognitive ability on a trial-by-trial level.
We further verified that type 1 task performance and mean wagered time were not affected by TMS. As expected, task performance (daily accuracy), reaction time (RT), and wagered time (WT) were not different between the two TMS conditions in either the on-judgement phase (paired t test, all p values > 0.1 for accuracy, RT, and WT in both monkeys) or the on-wagering phase (paired t test, all p values > 0.1 for accuracy, RT, and WT in both monkeys). These findings confirmed our first hypothesis that the monkey dlPFC is critical for meta-perception and that such effects are independent of perception processes.
Instantiation of TMS-induced impairment: Reduced accuracy-tracking ability of wagered time, altered reaction time–wagered time association, and altered trial-difficulty psychometric curve
We examined whether TMS would affect the ability of WT to track task performance in the two TMS phases (on-judgement/on-wagering). We focused our analysis on catch trials and incorrect trials, since we could not measure the precise WT for some trials (i.e., correct reached trials; see methods). We performed logistic regression on correctness with WT, TMS (TMS-46d/TMS-sham), and cross-product items as factors to test whether TMS of BA46d affected the response-tracking precision of WT. We found a significant interaction between TMS and WT in the on-judgement TMS phase (both monkeys: β3 = -0.149, standard error = 0.029, odds ratio = 0.862, z = -5.115, p < 0.001, Figure 3e) but not during the on-wagering phase (both monkeys: β3 = 0.010, standard error = 0.030, odds ratio = 1.010, z = 0.321, p = 0.748, Figure 3f). This effect in the on-judgement phase was driven by higher WT in correct trials than in incorrect trials in the TMS-sham condition (Mann–Whitney U tests: Neptune, p < 0.001; Uranus, p < 0.001, Figure 3i and j) but not in the TMS-46d condition (Mann–Whitney U tests: Neptune, p = 0.98; Uranus, p = 0.45, Figure 3g). We also confirmed that WT can predict the trial outcomes in a graded manner in the on-wagering phase (β1 = 0.152, standard error = 0.020, odds ratio = 1.164, z = 7.631, p < 0.001). These results revealed that TMS of BA46d, when administered during the on-judgement phase, affects metacognitive performance. We obtained the same results when we performed these logistic regressions on the two monkeys separately (Table 2).
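The reported coefficients, odds ratios, and z values are linked by simple identities, which the sketch below makes explicit. It assumes the z values are Wald statistics (coefficient divided by standard error) with a two-sided normal p value, as is conventional for logistic regression output; the function name `wald_summary` is illustrative.

```python
import math


def wald_summary(beta, se):
    """Return (odds ratio, Wald z, two-sided p) for a logistic regression coefficient."""
    z = beta / se
    odds_ratio = math.exp(beta)              # odds ratio is exp(beta)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail probability
    return odds_ratio, z, p


# Interaction term in the on-judgement phase, using the rounded values reported above.
odds_ratio, z, p = wald_summary(-0.149, 0.029)
print(round(odds_ratio, 3), round(z, 2), p < 0.001)
```

Applied to the rounded on-judgement values, this gives an odds ratio of about 0.862 and a z of about -5.14; the small gap from the reported z = -5.115 presumably reflects rounding of β3 and its standard error.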
Second, metacognitive abilities in animals are often confounded by behavioural association35. For example, animals are believed to make use of cues (environmental cues such as stimulus conditions and self-generated cues such as response time) to determine confidence instead of performing the task metacognitively. To rule out this possibility, we calculated the correlation between RT and WT in both experiments to check whether the monkeys relied on RT as an associative cue to determine confidence. The results showed no correlation between RT and WT in the domain-comparison experiment (Figure 4a), indicating that the macaques did not rely on RT as an associative cue to determine their WT. We then utilized this phenomenon to verify the effect of TMS. WT was significantly negatively correlated with RT during the on-judgement TMS phase only in the TMS-46d condition (r = -0.195, p < 0.001) and not in the TMS-sham condition (Figure 4b). We found a significant difference in correlation coefficients between TMS-46d and TMS-sham in the on-judgement phase (z = -2.24, p = 0.0251). It is possible that the monkeys started to rely on RT as an associative cue after having received TMS on area 46d, which hampered their metacognitive ability. As a control comparison, no difference was found between TMS conditions in the on-wagering phase (Figure 4c).
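A difference between two correlation coefficients from independent sets of trials is commonly tested with Fisher's r-to-z transformation. The sketch below shows that test; whether this is the exact procedure behind the z value reported above is an assumption, and the sample sizes in the usage note are hypothetical because trial counts per condition are not given in this section.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two independent correlations; returns (z, two-sided p)."""
    z1 = math.atanh(r1)                      # Fisher transformation of each r
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail probability
    return z, p
```

For example, `compare_correlations(-0.195, 600, -0.07, 600)` would test the TMS-46d correlation against a hypothetical sham correlation of -0.07 with 600 trials in each condition.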
Moreover, as seen in the rodent literature, WT can be expressed as a function of the strength of evidence (e.g., odour mixture ratio in their task) and response outcome (correct/incorrect)18; the level of confidence should increase with evidence strength (resolution difference in our experiments) for correct trials and decrease with evidence strength for incorrect trials. We fitted a generalized linear model (GLM) to predict WT with four variables, namely TMS (TMS-46d/TMS-sham), TMS phase (on-judgement/on-wagering phase), resolution difference, and correctness, together with their cross-product items. We found a four-way interaction in the monkeys (Neptune, βTMS × TMS phase × correctness × resolution difference = -60.66, p = 0.010; Uranus, βTMS × TMS phase × correctness × resolution difference = -44.76, p = 0.019). Trial-difficulty psychometric curves of these results illustrated that the effects were driven by a strengthened correctness × resolution difference interaction in the TMS-sham condition (including trials in both the on-judgement TMS phase and the on-wagering TMS phase) (Neptune, βcorrectness × resolution difference = 48.99, p < 0.001; Uranus, βcorrectness × resolution difference = 42.20, p < 0.001) and no effect in the TMS-46d on-judgement condition (Neptune, βcorrectness × resolution difference = 13.55, p = 0.119; Uranus, βcorrectness × resolution difference = -2.50, p = 0.753, Figure 5c).
Critically, the correctness × resolution difference interaction was driven by the increased WT for correct trials in the TMS-sham condition (including trials in both the on-judgement TMS phase and the on-wagering TMS phase) (Neptune, βresolution difference = 27.47, p < 0.001; Uranus, βresolution difference = 27.76, p < 0.001) and decreased WT for incorrect trials (Neptune, βresolution difference = -21.51, p < 0.001; Uranus, βresolution difference = -14.43, p < 0.001, Figure 5d-f). These results suggest that in the TMS-sham condition, WT increased with resolution difference for correct trials and decreased with resolution difference for incorrect trials irrespective of TMS phase, whereas this pattern was disrupted during the on-judgement phase in the TMS-46d condition. Additionally, we confirmed that perceptual performance was intact by performing logistic regression on response outcomes with resolution difference, TMS (TMS-46d/TMS-sham), and their cross-product item as factors. We found no interactions for either the on-judgement TMS phase or the on-wagering TMS phase in the monkeys (all p values > 0.05).
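The signature described above, WT rising with evidence strength on correct trials and falling on incorrect trials, can be illustrated by fitting a separate least-squares slope for each outcome. This is a deliberately simplified sketch, not the full GLM with cross-product terms used in the analysis; the trial tuple layout and function names are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def wt_slopes_by_outcome(trials):
    """trials: iterable of (resolution_difference, correct, wagered_time) tuples.

    Returns the slope of WT on resolution difference for correct and incorrect trials.
    """
    slopes = {}
    for label, flag in (("correct", True), ("incorrect", False)):
        xs = [diff for diff, correct, _ in trials if correct == flag]
        ys = [wt for _, correct, wt in trials if correct == flag]
        slopes[label] = ols_slope(xs, ys)
    return slopes
```

Under intact confidence reporting, the "correct" slope would be expected to be positive and the "incorrect" slope negative; a flattening of both would correspond to the loss of the correctness × resolution difference interaction.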
Qualities of monkey metacognition: Wagered time (WT) is diagnostic of the animals’ performance
To further substantiate these results, we verified that the monkeys could indicate their confidence through their trial-by-trial wagered time, showing in a number of analyses that wagered time is diagnostic of the animals’ performance. First, we compared the accuracy in reached (high-confidence) and unreached (low-confidence) trials; chi-square tests revealed that the monkeys had higher accuracy in higher-confidence trials in both meta-perception (all four monkeys: χ²(1) = 31.88, p < 0.001; for individual monkeys: all p values < 0.05, Figure 6a) and meta-memory (all four monkeys: χ²(1) = 13.41, p < 0.001; for individual monkeys: all p values < 0.05, Figure 6b). Second, to test whether the WT tracked the response outcomes, we performed logistic regression on response outcomes with WT, task (memory/perception), and the cross-product as factors. We confirmed that the WT could accurately predict the trial outcome (β1 = 0.033, standard error = 0.007, odds ratio = 1.033, z = 4.586, p < 0.001; Figure 6e). We found no interaction between task and WT (β3 = 0.014, standard error = 0.011, odds ratio = 1.014, z = 1.335, p = 0.182), indicating that WT in both memory and perception tasks tracked the response outcomes. These results showed that the trial-wise wagered time was diagnostic of the animals’ decision outcome, reflecting that the monkeys were aware of their judgement outcome. All results held when we performed the analyses for each monkey individually (Table 3).
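The comparison of accuracy between reached and unreached trials amounts to a chi-square test on a 2 × 2 table. A minimal sketch follows, assuming a Pearson chi-square without continuity correction (whether a correction was applied in the analysis is not stated); the counts in the test of this sketch are hypothetical, not the animals' data.

```python
import math


def chi_square_2x2(reached_correct, reached_incorrect,
                   unreached_correct, unreached_incorrect):
    """Pearson chi-square (df = 1, no continuity correction) and its p value."""
    a, b = reached_correct, reached_incorrect
    c, d = unreached_correct, unreached_incorrect
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, chi2 is a squared standard normal deviate,
    # so the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

The same tail formula applied to the pooled statistic reported above, `math.erfc(math.sqrt(31.88 / 2))`, yields a value far below 0.001, in line with the reported p < 0.001.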
Qualities of monkey metacognition: Evidence regarding domain specificity
While we found a positive correlation between the perception and memory domains in daily individual accuracy (r(80) = 0.271; p = 0.0151; Figure 7a), their respective metacognitive efficiency scores did not correlate (r(80) = 0.1134; p = 0.3164; right panel in Figure 7b). This prompted us to examine the domain specificity with bias-free metacognitive efficiency (H-model meta-d′/d′). To assess the potential covariation between metacognitive abilities, we calculated a domain-generality index (DGI) for each subject. We quantified each monkey’s domain generality as well as the mean across the two tasks (Figure 7c and d). Specifically, we shuffled the task types (memory/perception) across all 40 days (20 days of memory and 20 days of perception) within each subject. This procedure was repeated 1,000 times, and we obtained 40,000 simulated DGI values for each monkey. We found that all monkeys’ DGIs were above the simulated values, as confirmed by Mann–Whitney U tests against the mean of the simulated data (Mars: 0.167; Saturn: 0.182; Uranus: 0.350; Neptune: 0.260; Mann–Whitney U test results: all p values < 0.001, Figure 7e). Additionally, we employed pairwise correlation to assess the similarity of the two tasks across and within subjects (Figure 7g). The matrix of pairwise correlations was hierarchically clustered (Figure 7h), revealing two distinct clusters in which data from the same domain in multiple monkeys grouped together (whereas within-monkey data did not). This indicates that the within-task similarity of metacognitive efficiency was stronger than the within-subject similarity. Together, these results suggest domain-specific constraints on metacognitive ability that transcend the individual animal level.