Interpersonal coordination refers to the temporal alignment of two or more individuals while they interact with each other (Hoehl, Fairhurst, & Schirmer, 2020). Growing interest in the dynamics of real-world social interactions (Redcay & Schilbach, 2019; Schilbach et al., 2013) has shown that interpersonal coordination is present across various domains, such as bodily movements (Chartrand & Bargh, 1999; Hale, Ward, Buccheri, Oliver, & Hamilton, 2020; Ramseyer & Tschacher, 2011), physiological signals (Feldman, Magori-Cohen, Galili, Singer, & Louzoun, 2011; Konvalinka et al., 2011) and brain activity (Hirsch, Noah, Zhang, Dravida, & Ono, 2018; Stephens, Silbert, & Hasson, 2010). Across all these domains, it is widely agreed that interpersonal coordination has positive effects on social interactions (Hoehl et al., 2020), facilitating communication (Hasson, Ghazanfar, Galantucci, Garrod, & Keysers, 2012) and increasing affiliation (Lakin, Jefferis, Cheng, & Chartrand, 2003). However, the specific patterns of interpersonal coordination remain poorly understood. One reason for this could be that traditional methods for recording and analysing interpersonal coordination in dyadic social interactions fail to capture the full richness of interaction dynamics.
Here, we examine the interpersonal coordination of head nods and hand movements, using advanced methods (high-resolution automated motion capture and wavelet coherence analysis) across three different conversational contexts. Tracking conversation behaviour across different contexts will allow us to test hypotheses about why people engage in particular patterns of nodding or hand movements, and thus to interpret what these actions might mean. In addition, we can test whether the behaviour of individual participants is consistent from one context to another: do some people always engage in a lot of nodding regardless of context, while others rarely nod? If individual behaviour is consistent, this would support the development of automated methods that could discriminate personality traits (Heerey, 2015) or even diagnose psychiatric disorders from patterns of social behaviour (Georgescu et al., 2019). Thus, this paper aims to explore what nodding and hand movements mean in conversation and how this type of data could be used in future research. We first review current knowledge about head nodding and hand movement patterns in conversation and then detail our experimental manipulations.
Head Nodding Behaviour
Many non-verbal signals during a conversation are centred on the head (e.g., eye-gaze, blinks, facial expressions, and head movements), and a listener's attention is typically drawn to the speaker's head and face during conversation (Argyle & Cook, 1976). Head nodding is regarded as a distinct social signal that is particularly sensitive to conversational demands and can convey several different meanings (Poggi, D'Errico, & Vincze, 2010), from signalling attention and understanding (Hadar et al., 1983; Kendon, 2002) to requesting information and passing turns (Duncan, 1972). Recent work from our lab has developed an automated method that can identify and quantify two distinct types of nods: fast nods and coordinated slow nods (Hale et al., 2020). Following that work, we define fast nods as vertical head movements faster than 1.5 Hz, and slow nods as those below 1.5 Hz. By examining how fast nods and slow nods are used across different conversational contexts, we aim to understand the meaning of nodding as a social signalling behaviour.
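To make this frequency criterion concrete, the sketch below shows one simple way the split could be operationalised. It is a minimal illustration, not the published detection pipeline of Hale et al. (2020): the function name, the FFT-based estimate of dominant frequency, and the use of raw head-pitch segments are all our own assumptions.

```python
import numpy as np

FS = 120.0          # motion capture sampling rate (Hz), as used in the present study
NOD_SPLIT_HZ = 1.5  # boundary between slow and fast nods (Hale et al., 2020)

def classify_nod(pitch_segment, fs=FS, split_hz=NOD_SPLIT_HZ):
    """Label a segment of vertical head movement as a 'fast' or 'slow'
    nod according to its dominant oscillation frequency."""
    x = np.asarray(pitch_segment, dtype=float)
    x = x - x.mean()                            # remove static head posture
    spectrum = np.abs(np.fft.rfft(x))           # magnitude spectrum of the segment
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    dominant = freqs[spectrum.argmax()]         # frequency carrying the most power
    return "fast" if dominant >= split_hz else "slow"

# Example: a 2 Hz head oscillation lasting one second is a fast nod
t = np.arange(0, 1.0, 1.0 / FS)
print(classify_nod(np.sin(2 * np.pi * 2.0 * t)))  # -> "fast"
```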
We consider three potential meanings of a head nod: backchannelling, mimicry and joint attention. First, backchannelling is the flow of information in a conversation from the listener 'back' to the speaker. Verbal backchannels include linguistic vocalizations such as 'uh-huh', whereas non-verbal backchannels include facial expressions and head movements such as nodding. For example, a listener may nod their head to show that they are listening, or even to indicate agreement with what the speaker is saying (Allwood & Cerrato, 2003; Duncan, 1972). Previous research (Hale et al., 2020) showed that fast nods are produced mainly when participants are listening and receiving new information, which suggests that fast nodding might be a backchannel. The present study will test whether this holds across different information-exchange contexts.
Second, nodding could be a type of mimicry. Mimicry arises when one person copies the gestures, actions, or postures of another (Chartrand & Bargh, 1999). It typically occurs spontaneously during interactions and is believed to act as a 'social glue' that facilitates bonding and affiliation between people (Lakin et al., 2003). Previous work (Hale et al., 2020) identified coordinated slow nodding on a time-scale that matches previous reports of mimicry (Hömke, Holler, & Levinson, 2018). If this behaviour is a form of mimicry linked to social affiliation, we would expect it to be present across many different conversational contexts, regardless of the topic of conversation.
Third, nodding could represent a type of joint attention, which arises when two people gaze at the same object at the same time, typically with one leading the gaze and the other following (Emery, 2000). In raw motion capture data, this gaze-following pattern might look like a nodding action if both people are looking down at an object held in the hands, as was the case in Hale et al. (2020). That is, it is possible that the 'nodding coordination' in the previous study arose primarily because both participants were jointly attending to an object held in the hands. If this interpretation is correct, then conversations in a different context without the picture should not show coordinated slow nodding behaviour.
To summarise, we have described two types of head nodding behaviour (fast nods and coordinated slow nods) and three social meanings that could be applied to them: backchannelling, mimicry and joint attention. Changing the context of a conversation provides a way to distinguish the social meanings of the different nodding behaviours. Here, we create three different conversation tasks that allow us to manipulate information sharing and joint attention targets, and thereby to distinguish between these different interpretations of nodding behaviour. Before detailing these tasks, we describe the hand movement behaviours that are the second focus of the present paper.
Hand Movement in Conversation
During conversation, co-speech hand movements are tightly linked to speech at the temporal and semantic level (Kita & Özyürek, 2003; Loehr, 2007). For instance, beat gestures are rapid movements used as temporal cues to emphasise relevant information (McNeill, 1992), whereas iconic gestures have high semantic content and are used to describe an object or action to disambiguate complex sentences (Kelly, Kravitz, & Hopkins, 2004; Kelly, Ward, Creigh, & Bartolotti, 2007). In fact, several studies show that co-speech hand gestures facilitate attention capture, affect speech comprehension, and improve learning and memory in both speakers and listeners (Cook, Mitchell, & Goldin-Meadow, 2008; Kendon, 1972; Marstaller & Burianová, 2013; McNeill, 1992). Another type of (non-co-speech) hand movement is the self-grooming gesture: actions used to clean or maintain parts of the body (e.g., fixing the hair) in order to give a positive impression to others and increase affiliation (Daly, Hogg, Sacks, Smith, & Zimring, 1983). Despite the critical role of hand gestures in conversation and social interaction, little is known about their dynamics at the interpersonal level.
Single-participant studies have shown that individuals coordinate with or imitate hand actions presented in video-clips or by virtual characters (Genschow, Florack, & Wänke, 2013; Pan & Hamilton, 2015; Stel et al., 2010), but to our knowledge only two previous studies have investigated hand gesture coordination in face-to-face conversation. Holler and Wilkin (2011) used a referential communication task (Clark & Wilkes-Gibbs, 1986) in which dyads were given two identical sets of cards depicting figure-like stimuli and were instructed to discuss them with the aim of placing the cards on a table in the same order. They found that participants imitated each other's co-speech gestures during the conversation, and that such imitation played an important role in establishing mutual shared understanding. In another study, Ramseyer and Tschacher (2016) investigated hand movement imitation during conversation in the context of a natural psychotherapy session. In a single-case analysis, they found that patient and therapist imitated each other's hand movements, and that levels of interpersonal coordination across sessions were positively associated with the patient's ratings of affiliation with the therapist.
Although these studies provide evidence of interpersonal coordination of hand gestures, they rely on slow, albeit precise, coding of video recordings by trained observers. Here, we aimed to determine whether hand movement dynamics can be captured with high-resolution motion capture and interpreted using the same automated framework that we used for head nods. This is an exploratory analysis, which will test whether there is interpersonal coordination of hand movements that can be detected with motion capture, and whether this coordination varies across conversational contexts.
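As an illustration of how hand data could enter the same framework, a natural first step is to reduce each hand's 3D trajectory to a one-dimensional speed signal that can be analysed in the same way as head pitch. The sketch below is our own simplification (the function name and the assumption of marker positions in metres at 120 Hz are illustrative, not taken from the studies above):

```python
import numpy as np

def hand_speed(positions, fs=120.0):
    """Reduce an (n_frames, 3) array of hand marker positions (metres)
    to a 1-D speed signal (m/s) suitable for wavelet coherence analysis."""
    positions = np.asarray(positions, dtype=float)
    velocity = np.diff(positions, axis=0) * fs  # frame differences -> velocity (m/s)
    return np.linalg.norm(velocity, axis=1)     # Euclidean speed per frame
```

A scalar speed signal discards movement direction, but it makes the hand analysis directly comparable to the head-nod analysis: one coherence spectrum per dyad and context.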
Changing Conversational Contexts
In the study of both head nodding and hand movements, it is clear that examining behaviour in a single context is not enough to interpret the social meaning of a behaviour or to provide a general analysis. Thus, the present study placed participants in three different conversational contexts. First, we used a picture description task which has previously proved valuable in our lab and elsewhere (Chartrand & Bargh, 1999; Hale et al., 2020; van Baaren et al., 2009). Here, one participant holds a picture of a complex scene and must describe it to their partner, who listens and can then ask questions about the picture. Each trial lasts only 90 seconds and is divided equally into monologue and dialogue phases. This task is highly structured, with one person in the role of the 'leader' who holds the picture and speaks most of the time. The presence of the picture also provides a clear target for joint attention.
The second, 'video recall', task was selected to create a conversation with common ground (i.e., shared knowledge) that engages memory but does not involve the exchange of any new information. At two points during the data collection session, participants watched a 3 min wordless children's animation together. Later, they were asked to recall the animation in detail, working together to describe as much as they could. This tends to be a slow, unstructured conversation in which both participants discuss events with which they are familiar.
The third, 'meal planning', task was developed by Chovil (1991) and Tschacher et al. (2014) as a way to encourage a fun and relaxed conversation between strangers. Participants were asked to spend 5 min planning a meal using ingredients they both dislike. This topic induces a general exchange of information about food preferences together with joint planning of the meal. The exchanges tend to be short and dynamic, with laughter and overlapping speech.
Figure 1 illustrates these three conversation tasks and a sample of the turn-taking behaviour in each. Panel A illustrates the picture description task, in which one participant (here, blue) speaks for the majority of the time, providing information about the picture to their partner; the picture itself provides a joint attention target. Panel B illustrates the video discussion task, in which participants recall the short movie (i.e., share 'common ground') but do not exchange any new information. Panel C illustrates the meal planning task, in which both participants share information and often speak quickly, with overlaps.
Measuring Interpersonal Coordination
To understand changes in movement behaviour across these different conversational contexts, it is important to precisely measure and appropriately analyse the behaviour of our participants. Traditional video-coding methods have high accuracy but are very time-consuming and difficult to quantify objectively (Holler & Wilkin, 2011). Video-based analyses can quantify motion energy (Ramseyer & Tschacher, 2011, 2016), but their resolution is limited because they quantify pixel changes on a flat image. Motion capture technologies provide high-resolution recordings of bodily movement in 3D space (Feese, Arnrich, Tröster, Meyer, & Jonas, 2011; Hale et al., 2020; Poppe, Van der Zee, Heylen, & Taylor, 2013). The present study uses this method to record head and hand position at a high sampling rate (120 Hz) while two participants interact face-to-face.
To analyse the data, we use wavelet coherence analysis (Fujiwara & Daibo, 2016; Issartel, Marin, Gaillot, Bardainne, & Cadopi, 2006). This provides a measure of interpersonal correlation at each frequency component and time-point in the interaction. Information in the frequency domain has been useful in distinguishing different types of nodding behaviour: for instance, recent work in our lab using wavelet coherence analysis (Hale et al., 2020) identified fast and slow nods as distinct behaviours. The present study extends this approach to different contexts, to test how context changes nodding behaviour.
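For readers unfamiliar with the technique, the sketch below outlines the core computation: each signal is decomposed with a complex Morlet wavelet, and coherence at each time-frequency point is the smoothed cross-spectrum normalised by the smoothed auto-spectra. This is a simplified illustration rather than the exact analysis of Hale et al. (2020); in particular, it smooths in time only, whereas full implementations (e.g., the pycwt package) also smooth across scales and offer significance testing.

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w0=6.0):
    """Continuous wavelet transform of x at the requested frequencies,
    using complex Morlet wavelets evaluated in the Fourier domain.
    Per-scale normalisation is omitted: it cancels in the coherence ratio."""
    x = np.asarray(x, dtype=float)
    fft_x = np.fft.fft(x - x.mean())
    omega = 2 * np.pi * np.fft.fftfreq(x.size, d=1.0 / fs)
    out = np.empty((freqs.size, x.size), dtype=complex)
    for i, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)                  # wavelet scale for frequency f
        psi_hat = np.pi ** -0.25 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        out[i] = np.fft.ifft(fft_x * psi_hat)     # convolution theorem
    return out

def smooth_time(a, width):
    """Moving-average smoothing along the time axis."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(np.convolve, 1, a, kernel, mode='same')

def wavelet_coherence(x, y, fs, freqs, smooth_width=None):
    """Magnitude-squared wavelet coherence of two equal-length signals."""
    if smooth_width is None:
        smooth_width = int(fs)                    # ~1 s smoothing window
    wx, wy = morlet_cwt(x, fs, freqs), morlet_cwt(y, fs, freqs)
    sxy = smooth_time(wx * np.conj(wy), smooth_width)
    sxx = smooth_time(np.abs(wx) ** 2, smooth_width)
    syy = smooth_time(np.abs(wy) ** 2, smooth_width)
    return np.abs(sxy) ** 2 / (sxx * syy + 1e-12) # guard against division by zero

# Example: two noisy signals sharing a slow (0.5 Hz) oscillation
fs = 120.0
t = np.arange(0, 60, 1.0 / fs)
rng = np.random.default_rng(0)
a = np.sin(2 * np.pi * 0.5 * t) + 0.5 * rng.standard_normal(t.size)
b = np.sin(2 * np.pi * 0.5 * t + 0.3) + 0.5 * rng.standard_normal(t.size)
coh = wavelet_coherence(a, b, fs, freqs=np.geomspace(0.2, 5.0, 30))  # (30, 7200) map
```

The output is a time-frequency map in which sustained high coherence around 0.5 Hz would indicate coordination at that frequency, even when the two signals are offset in phase.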
The Present Study
The present study combined a high-resolution motion capture system with wavelet coherence analysis to investigate head and hand motion patterns of dyads as they engaged in three conversational tasks with varying degrees of structure and common ground. The aim of the study was to address three major questions.
Question 1: What do head-nodding signals mean? We hypothesise that, if fast nods are a backchannel signalling 'information received', they should be found in contexts where participants exchange novel information (the picture description and meal planning tasks) but not in the video discussion task. If coherence in slow nodding reflects affiliation, it should be found across all contexts; if it reflects joint attention, it should be found only in the picture description task, where an object (the picture) is available to look at.
Question 2: Are individual levels of head nodding correlated across contexts? If head nodding is a robust individual signature with the potential to act as a clinical marker, it should be consistent across contexts. For example, an individual who nods a lot in the picture description task should also nod a lot in the video discussion task, and this tendency might correlate with personality measures. By testing for this pattern, we can explore the potential of nodding measures as a way to quantify individual differences in social behaviour.
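In practice, this question reduces to correlating each participant's nod rate between contexts. A minimal sketch, using synthetic data purely to illustrate the intended test (real input would be the per-participant nod rates from the automated detection described above):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
trait = rng.normal(10, 3, size=30)           # stable individual nodding tendency (synthetic)
picture_task = trait + rng.normal(0, 1, 30)  # nod rates (nods/min) in the picture task
video_task = trait + rng.normal(0, 1, 30)    # nod rates (nods/min) in the video task

r, p = pearsonr(picture_task, video_task)    # cross-context consistency
print(f"r = {r:.2f}, p = {p:.3f}")           # a high r would indicate a stable signature
```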
Question 3: What are the patterns of interpersonal coordination of hand movement across contexts? This question is more exploratory, as there is little prior data on hand coordination, so we considered two aspects. First, can the wavelet coherence methods we used for nodding detect any robust pattern of hand movement coordination and, if so, at what frequencies? Second, does hand movement coordination change across contexts? Given the absence of previous studies on this topic, we did not make specific predictions for the patterns in each conversational task. However, we hypothesised that, if interpersonal coordination of hand gestures facilitates communication, dyads would generally show more interpersonal coordination of hand gestures when the task was unstructured and there was no common ground.