The effectiveness of social signaling depends upon both the intended content of the signal (i.e., the socioemotional “ground truth”) and the interpretation of the signal by an observer (i.e., socioemotional inference; Mehu & Scherer, 2012). To better understand how social signaling is processed in the brain, we sought to dissociate the neural patterns underlying “ground truth” and inference in a dynamic naturalistic storytelling paradigm and to test how these patterns relate to empathic accuracy. We found that both the “ground truth” (i.e., the target’s self-reported emotional intensity) and an observer’s inferences about the target’s emotional intensity could be predicted from observers’ brain activity. The multivariate brain patterns derived from these predictions are dissociable; however, when the models’ predictions align, observers make more accurate inferences. These findings suggest that there is some latent representation of a target’s “ground truth” emotional intensity that observers transform into conscious inference. Moreover, they suggest that participants quickly and accurately represent another’s mental state even when they make an incorrect inference.
How can participants “represent” the “ground truth” of another person’s social signals while that “representation” remains dissociable from their conscious inferences? Human adults have well-developed schemas of social information that are activated when they perceive prototypical socioemotional expressions (Izard, 2007). Prior work has found that rich, category-specific visual features can be readily mapped to distinct emotions that are coded across many brain regions, including primary visual cortex (Kragel et al., 2019). Unless a target is intentionally trying to deceive an observer, or has a disorder that impacts socioemotional communication, a target will convey information in a manner that activates the correct schema in an observer (Chang et al., 2021). That is the function of social signaling: to convey information in a manner that the intended recipient will quickly and accurately understand (Mehu & Scherer, 2012). We suspect that the “ground truth” pattern revealed in this investigation captures this process of schema activation in the observers. Schema activation in this investigation is (a) specific to the intensity of a signal and (b) highly dynamic and multimodal.
The “ground truth” pattern relies significantly on brain areas implicated in speech comprehension (right angular gyrus; Seghier, 2013) and scene construction (PCC and calcarine sulcus; Irish et al., 2015), as well as mentalizing and social cognition (bilateral superior frontal gyrus, precuneus, and anterior insula; Van Overwalle & Baetens, 2009). It does not, however, rely on the amygdala, a brain region heavily implicated in emotion perception (Spunt & Adolphs, 2019). The whole-brain “ground truth” pattern is also distinct from the Picture Induced Negative Emotion Signature (PINES), a multivariate pattern that predicts the emotional intensity of negative static images with high accuracy (Chang et al., 2015). PINES was trained on emotional images, only some of which were social. Therefore, it is possible that our “ground truth” pattern is specific to the processing of socioemotional schemas. This pattern may be something the brain constructs during dynamic social interactions to create a stable representation of the social target. This interpretation is inspired by constructionist theories of emotion, which hold that emotions are not fixed entities with distinct and specific brain circuits, but flexible processes that are inseparable from the context in which they emerge (Barrett, 2017; Russell, 2003).
It is important to note that none of the subcortical regions commonly implicated in emotion perception (i.e., the amygdala, striatum, and periaqueductal gray) were significantly weighted in the “ground truth” or inference models. Instead, the distributed patterns primarily included cortical regions that perform complex multisensory integration (i.e., angular gyrus, frontal gyrus, PCC, temporal pole, and the precuneus; Scheliga et al., 2022). From a constructionist viewpoint, the socioemotional “ground truth” pattern may reflect neural processing that abstracts the literal input (i.e., facial expressions, body movements, speech and vocal intonations) into socioemotional schemas related to intensity. Similarly, the inference model may represent a subsequent stage of processing, in which the information the target signaled is related to the observer’s past and current experiences and their expectations for the future. Indeed, the inference pattern relied both on mentalizing networks and on brain areas implicated in social abstraction and somatosensory processing, the temporal pole and S1, respectively (Quandt et al., 2017). Given that the “ground truth” and inference models were (1) dissociable, (2) verifiable in their own held-out validation sets, and (3) distinct from other published models of emotion induction, like PINES, we suspect that they capture unique components of social signal processing (schema activation and deliberate inference formation) that can only be experimentally evoked by dynamic, naturalistic stimuli.
When the “ground truth” and inference models’ predictions aligned in the brains of the observers, observers made more accurate inferences. Furthermore, the two unique models could be combined to predict the empathic accuracy of observers in held-out validation trials. Interestingly, when individuals made inaccurate inferences, activity in right S1 and PHG increased with the intensity of their inferences when controlling for the intensity of the “ground truth” representation. This suggests that somatosensory simulations may support the transformation of socioemotional “ground truth” into a conscious, reportable inference. Part of the transformation from “ground truth” to inference in this paradigm requires a motor response: Observers must update their ratings with a button press. Therefore, it is difficult to disentangle this brain pattern from button pressing entirely; however, this pattern of brain activity was dissimilar from patterns associated with finger tapping in the NeuroSynth database. Furthermore, though the left lateralization of activation in M1 was consistent with the (right) hand making the ratings, the positive clusters in S1 and the PHG were ipsilateral to the rating hand, suggesting this activity is not reducible to task-induced motion. This is consistent with prior research showing that S1 activates during motor imagery and empathy (Blakemore et al., 2005; Hooker et al., 2010; Jafari et al., 2020; Schaefer et al., 2020). Furthermore, prior studies of imagination have found right lateralization of imagined stimuli (Reddan et al., 2018). Indeed, a growing body of empathy research supports a role for somatosensory simulations in the understanding of social interactions (Genzer et al., 2022; Jospe et al., 2020, 2022). As for the PHG, activity in this region is implicated in episodic memory and the processing of scenes (for a review, see Aminoff et al., 2013).
Taken together, these results suggest that inference involves an internalizing of the events described by the target, and that people simulate the actions described and relate them to their own prior experiences and expectations.
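The alignment-accuracy relationship described above can be sketched with a minimal simulation. The sketch below is purely illustrative and is not the study’s analysis pipeline: all variable names, data shapes, coupling values, and noise scales are assumptions. It shows one plausible operationalization, computing a per-trial Pearson correlation between the two models’ predicted time courses ("alignment") and relating it to a simulated empathic-accuracy score.

```python
# Illustrative sketch (simulated data, not the study's pipeline):
# per-trial alignment between "ground truth" and inference model
# predictions, related to a simulated empathic-accuracy score.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_timepoints = 20, 100

# Simulated per-trial time courses of the "ground truth" model's prediction
gt_pred = rng.standard_normal((n_trials, n_timepoints))

# Inference predictions share a variable amount of signal with "ground truth"
coupling = rng.uniform(0.0, 1.0, n_trials)
noise = rng.standard_normal((n_trials, n_timepoints))
inf_pred = coupling[:, None] * gt_pred + np.sqrt(1 - coupling[:, None] ** 2) * noise

# Per-trial alignment: Pearson r between the two prediction time courses
alignment = np.array([np.corrcoef(gt_pred[i], inf_pred[i])[0, 1]
                      for i in range(n_trials)])

# Simulated empathic accuracy: increases with alignment, plus trial noise
accuracy = 0.6 * alignment + 0.2 * rng.standard_normal(n_trials)

# Simple linear fit: does model alignment track empathic accuracy?
slope, intercept = np.polyfit(alignment, accuracy, deg=1)
r = np.corrcoef(alignment, accuracy)[0, 1]
print(f"slope={slope:.2f}, r={r:.2f}")
```

In this toy setup, trials whose two predicted time courses correlate more strongly also yield higher simulated accuracy, mirroring the qualitative finding that alignment between the two brain-based models tracks empathic accuracy.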
This study has several limitations. First, the “ground truth” and inference ratings occur spontaneously and sometimes simultaneously in this paradigm; therefore, we are unable to directly model the transformation of “ground truth” into inference in the brain. Second, though the stimuli themselves are highly dynamic and complex, the models are trained to predict only a single dimension of socioemotional information: intensity. This was done because naturalistic stories often signal positive and negative information at a faster rate than we can sample the brain data (see Polimeni & Lewis, 2021). For example, a participant may be describing both the sadness and the love they felt after the death of a family member. These are intense, complex emotions; therefore, removing valence from individual ratings allowed us to better model dynamic shifts in emotion signaling and to isolate signatures of signal intent from observer inference. Further validation of our models on other naturalistic and social audiovisual data is necessary to determine their sensitivity and specificity to both a target’s self-reported internal emotional state and an observer’s conscious inference of that state.
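The valence-removal step can be illustrated with a minimal sketch. The rating values and scale below are hypothetical (the original rating scale is not restated here); the point is only that intensity discards the sign of a valenced rating, so rapidly alternating positive and negative signals still yield a stable intensity target.

```python
# Hypothetical sketch of collapsing valenced ratings into intensity.
# The rating values and scale are assumptions for illustration only.
import numpy as np

# Continuous ratings sampled over a story: negative = unpleasant, positive = pleasant
valenced_ratings = np.array([-3.0, -1.0, 0.0, 2.0, 4.0])

# Intensity discards the sign, so mixed emotions (e.g., grief and love)
# that alternate faster than the fMRI sampling rate remain well-defined
intensity = np.abs(valenced_ratings)
print(intensity)  # → [3. 1. 0. 2. 4.]
```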
Various neuroimaging studies have attempted to predict aspects of socioemotional processing from human brain activity (see Figure S4 for a summary); however, this is one of the first investigations to situate socioemotional processing within the ethological framework of social signaling and inference. Most existing models were developed to predict an observer’s internal emotional state after it was experimentally influenced by an image or story, and the stimuli are often exaggerated social signals (e.g., an image of an angry actor pointing a gun). This is the first investigation to delineate brain activity related to the “ground truth” of social signals from brain activity related to the inferences observers make when they perceive these signals. The dissociability of these processes provides a new foundation for emotion research and delineates promising new targets for clinical intervention.
Sharing information is essential to the well-being of individuals and the communities they are a part of because individuals must interact with each other to achieve personal needs that cannot be achieved alone. Effective signaling can engender social bonds, mutual aid, and collaboration (Mauss et al., 2011). Ineffective signaling, however, can be costly. A missed alarm call can result in death, while misunderstanding social signals can result in ostracization or rejection. People with autism spectrum disorder (ASD) experience this from both ends: They have difficulty understanding the intentions, thoughts, and feelings of others and have difficulty being understood. As a result, people with ASD experience high levels of loneliness and social isolation relative to other disability groups (Causton-Theoharis et al., 2009; Mazurek, 2014). However, little research emphasis has been placed on people’s ability to decode the social signals of people with ASD. The current study creates a new avenue for such investigations because it allows researchers, for the first time, to disentangle a signal’s intent from an observer’s inference in dynamic, real-world situations.
In summary, our study examines how humans interpret dynamic social signals and make inferences about the internal states of others. Using naturalistic stimuli, fMRI data, and machine learning, we established two distinct neural signatures: one predicting the “ground truth” intended emotional intensity of a target and another predicting an observer’s inferences. These neural patterns are dissociable, indicating that they reflect separate components of socioemotional processing. Notably, when these brain-based models align, individuals demonstrate higher empathic accuracy. This work offers insight into the brain processes that underlie the interpretation of social signals and has the potential to inform treatments for social and emotional disorders.