Embarrassment is a social emotion that most of us experience in daily life. It occurs when a desired social image of oneself is threatened (Miller & Tangney, 1994; Tangney et al., 1996). It can be seen as a self-conscious and, simultaneously, other-conscious emotion: While one is more self-aware in an embarrassing situation, one is also concerned about other people's judgment (Tangney et al., 1996). Embarrassment is experienced mostly in public, in the presence of other people. However, it can also occur in private when an audience is imagined, and it is more likely to occur among strangers than among loved ones. Embarrassment is usually accompanied by typical physiological changes, such as blushing (Hofmann et al., 2006; Keltner, 2005; Miller, 2012), and by changes in the voice and in non-verbal behavior, such as avoiding eye contact or lowering one's head. Tangney et al. (1996) categorized embarrassment as a "negatively valenced emotion" (p. 1264), but it can also have positive consequences for both the affected person and the audience: According to Miller (2012), people often react helpfully in embarrassing situations, and displaying embarrassment, for example by blushing, elicits a favorable impression of the affected person. Stocks et al. (2011) additionally distinguished between personal and empathic embarrassment. While the former is experienced for oneself, the latter is experienced for another person, for example, while observing them perform an embarrassing task. This study, however, focused solely on personal embarrassment.
Core aspects of embarrassment, such as the fear of negative evaluation, the fear of being rejected by others, and heightened self-consciousness, are also important aspects of social anxiety (SA) and social anxiety disorder (SAD). According to Rozen and Aderka (2023), embarrassment, SA, and SAD were consistently associated with each other in physiological measures, neural activity, and self-reported emotions in both clinical and non-clinical samples. Socially anxious people are generally more easily embarrassed and respond with more intense embarrassment than less socially anxious people (Leary & Hoyle, 2013; Rozen & Aderka, 2023). As with embarrassment, SA occurs mostly in public; in private, it requires an imagined audience or an imagined reaction from others. While embarrassment is sudden, brief, and triggered by an actual misstep, SA develops gradually, persists over a longer period of time, and can occur without one having done anything wrong (Miller, 2009).
Embarrassment is generally neglected in research on basic emotions (Simon-Thomas et al., 2009). In particular, studies of vocal cues across emotions typically examine either the classical basic emotions described by Ekman and Friesen (1971) or emotions other than embarrassment (e.g., Devillers & Vidrascu, 2007; Juslin et al., 2018; Patel et al., 2011; Sauter et al., 2010). Given the consistent association between embarrassment and SA, and given that embarrassment remains a neglected emotion, it is important to investigate the emotion itself, its relationship to other emotions, and its association with SA from both a basic research and a clinical perspective.
Therefore, the main goal of this paper was to examine embarrassment exploratorily and to capture the emotion from different perspectives. On the one hand, embarrassment can be compared categorically to other emotions; that is, it can be described in terms of how it relates to and shares information with them. Several emotional speech data sets in different languages are available for this purpose, in which the recordings have already been labelled and evaluated (see, e.g., Burkhardt et al., 2005). On the other hand, embarrassment itself can be described dimensionally in more depth. Grimm et al. (2007) proposed a three-dimensional emotion space consisting of the axes valence, activation, and dominance (VAD), which was also used in this study to describe embarrassment dimensionally.
Previous basic research on emotions has used either classical subjective approaches from clinical psychology, relying mainly on participants' self-reports and the ratings of a few individual experts, or more objective measures such as neuroimaging and psychophysiological methods (Bastin et al., 2016). When embarrassment studies combine subjective and objective measures, the objective measures often focus on somatic or neuronal features (Hofmann et al., 2006; Müller-Pinzler et al., 2012) but rarely on voice parameters. The same applies to indicators of SA or embarrassment in general: Research has most often investigated body, hand, and head movements or gaze activity (Keltner et al., 2019). According to Weeks et al. (2012), voice parameters have several advantages as physiological indicators of, for example, SAD: They are less susceptible to response biases and more objective than self-report questionnaires. Moreover, even when participants do not say the same words, the paralinguistic information in their answers can be compared objectively (Burkhardt et al., 2005).
Human speech can be divided into verbal (linguistic) and non-verbal (paralinguistic) sounds. While verbal sounds clearly play an important role in communication, non-verbal aspects such as paralinguistics carry a great deal of additional information in conversations, including the emotional and mental state of the speaker (Kadali & Mittal, 2020). Conversely, changes in the voice may indicate changes in a person's mental and emotional state. This study therefore combined subjective measures (see Section 2.3) with objective engineering and machine learning approaches (see Section 2.4) to examine embarrassment.
To describe and examine embarrassment from different perspectives, this paper had the following four goals: The first goal was to induce embarrassment in participants, to test whether the induction was successful, and, if so, to examine how embarrassment was related to SA. We hypothesized that embarrassment would indeed be induced, with participants being significantly more embarrassed during the embarrassment induction task than during the pre- and post-induction periods. Verifying that embarrassment was induced was the only goal with a hypothesis. It formed the foundation for the three remaining, exploratory goals, for which we used our acoustic data to gain further insights by means of automatic speech processing techniques. The second goal was to test how well our trained model could predict the phase (pre-induction, embarrassment, or post-induction) of our sample data and thereby to demonstrate the robustness of our embarrassment data. The third goal was to adopt a dimensional approach and map embarrassment onto the VAD dimensions. The fourth and final goal was to compare embarrassment with other emotions, thus following a categorical approach. For the third and fourth goals, publicly available emotional speech corpora (i.e., acoustic samples with emotional labels) were used to train our models.