Participants & Experimental setup
Ten triads (30 participants) took part in the study. Participants were between 20 and 35 years old and reported normal hearing. The experiment was conducted in Danish, and all participants were native Danish speakers. Most participants were students at the Technical University of Denmark. When forming the triads, we aimed to create mixed-gender groups in which all three participants were strangers to each other prior to the experiment. However, due to scheduling difficulties, these criteria had to be relaxed: three triads were same-gender groups, and two triads included pairs of individuals who were acquainted before the experiment. The experiment lasted about 2.5 hours, and participants were offered hourly monetary compensation for their participation. All experiments were approved by the Science-Ethics Committee for the Capital Region of Denmark (reference H-16036391) and were carried out in accordance with relevant guidelines and regulations. All participants provided informed consent prior to participation in the experiment.
During the experiments, participants were seated facing each other in an equilateral triangle, approximately 1.5 m apart. Background noise was played back via an array of eight loudspeakers (Dynaudio BM6P) placed 2.4 m from the center. The loudspeakers were driven by a sonible d:24 amplifier, and each played a Danish monologue28, resulting in spatially distributed multi-talker noise. The monologues lasted approximately 90 seconds each and were looped for the duration of the conversation. The noise was presented at a combined sound pressure level (SPL) of either 48 dB or 78 dB, referred to as the “quiet” and the “noisy” conditions, respectively. Because multiple masking speech sources were presented simultaneously, none of them was individually intelligible in either condition. Behind the loudspeakers, a circular black curtain fully enclosed the participant area to minimize visual distractions.
Task
In each round, participants went through three main phases, as visualized in Fig. 1. First, participants answered a series of 28 general knowledge questions on a given topic. The topics of the questions and subsequent conversations were Hollywood movies (Which of these two movies is older?), Copenhagen landmarks (Which of these two places is closer to the city center?), and European countries (Which of these two countries has the larger population?). For each question, two response alternatives were given, each accompanied by a visual illustration and a label. The 28 questions were presented on a touch-screen tablet and comprised all unique pairwise combinations of eight items (e.g., eight Hollywood movies).
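The count of 28 questions is the number of unordered pairs that can be drawn from eight items, 8 · 7 / 2 = 28. A minimal Python sketch of how such a question list could be enumerated; the item labels are hypothetical placeholders, and question order and the on-screen placement of the two options are not modelled.

```python
from itertools import combinations

# Hypothetical placeholder labels; the actual eight items per topic are not listed here.
items = [f"item_{i}" for i in range(1, 9)]

# Every unordered pair of distinct items appears exactly once.
pairs = list(combinations(items, 2))
print(len(pairs))  # 28
```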
Participants were instructed to select one of the two options and to provide a confidence level, expressed as a percentage between 50% and 100%, with 50% indicating no preference for either option and 100% indicating absolute certainty in the decision. They were instructed to interpret this scale as their estimated probability of having answered the question correctly. After answering the 28 questions, the participants discussed them within their triad. They were instructed to view the task as a collaborative effort and to aim to improve both their own performance and that of the other group members. This was intended to encourage participants to share their beliefs and to ask for help on questions where they were unsure, thereby facilitating a free exchange of information. To further aid the discussion, each participant was given a sheet displaying the eight items from the preceding question round. When the 10-minute time limit was reached, or the conversation concluded naturally, the sheet was removed and participants individually answered the same 28 questions again, without talking. At the end of each round, participants received feedback in the form of percent-correct scores for their pre- and post-discussion responses.
Prior to the main experiment, each group completed a short trial round on a separate topic not included in the main experiment. This trial round familiarized participants with the task and interface and helped them overcome any initial awkwardness in the conversations. For each topic, there were two lists of 28 questions, one for each noise condition. The three-phase procedure was thus repeated six times, once for each combination of the three topics and the two noise conditions. The order of topics and conditions was randomized between groups, with the restriction that the same topic never appeared in two consecutive rounds. A brief break was included after the third or fourth round.
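The ordering constraint can be met in several ways, and the text does not specify how the randomization was implemented. One possible sketch, assuming rejection sampling over all six topic-condition combinations (the function name and topic labels are hypothetical):

```python
import random

TOPICS = ["movies", "landmarks", "countries"]  # hypothetical short labels
CONDITIONS = ["quiet", "noisy"]


def draw_round_order(rng: random.Random) -> list[tuple[str, str]]:
    """Return six (topic, condition) rounds with no topic repeated back to back.

    Rejection sampling: shuffle all topic-condition combinations and
    reshuffle until no two consecutive rounds share a topic.
    """
    rounds = [(t, c) for t in TOPICS for c in CONDITIONS]
    while True:
        rng.shuffle(rounds)
        if all(a[0] != b[0] for a, b in zip(rounds, rounds[1:])):
            return list(rounds)


order = draw_round_order(random.Random(1))
```

Rejection sampling draws uniformly from the valid orders; with three topics appearing twice each, about one shuffle in three satisfies the constraint, so few redraws are needed.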
Group decision model
The group decision model was based on the confidence-weighted majority voting (CWMV) model23. That model was originally used in the context of a perceptual task in which participants estimated the probabilities of biased coin-flip sequences. Adapting it to the general-knowledge paradigm was straightforward, as participants were asked to give confidence ratings expressing their estimated probability of being correct. The CWMV model predicts a group's combined confidence rating, \(C_g\), in a binary decision scenario where each individual member's prior confidence is known, assuming that the group confidence is reached through a consensus decision. Given prior confidence ratings \(C_i\) from \(M\) individuals, \(C_g\) is predicted to be:
$$C_g \sim N\left(k\sum_{i=1}^{M} C_i^{\beta},\ \sigma\right) \tag{1}$$
The confidence ratings \(C_g\) and \(C_i\) are measured in log-odds units, such that \(C=\ln\left(\frac{c}{1-c}\right)\), where \(c\) is a confidence rating on a bounded scale from 0 to 1. Here, \(c=0 \Rightarrow C=-\infty\) indicates maximal confidence in one option, \(c=1 \Rightarrow C=\infty\) indicates maximal confidence in the other option, and \(c=0.5 \Rightarrow C=0\) indicates no preference for either option. The parameters \(k\), \(\beta\), and \(\sigma\) are free parameters that control the shape of the probability distribution of posterior confidences given a set of \(M\) prior confidence ratings. Note that this definition differs slightly from the originally proposed one, as the errors are assumed to be normally distributed around \(C_g\) rather than \(c_g\). Modelling the errors this way mitigates the truncation problems that arise when the bounded quantity \(c_g\in[0,1]\) is modelled as normally distributed, as pointed out by the authors of the original study23.
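The transform and its inverse can be written in a few lines of Python. This is an illustrative sketch; the helper `signed_confidence` and its sign convention (ratings near 1 favouring the first option) are assumptions about how a choice plus a 50% to 100% rating maps onto the 0 to 1 scale, not details given in the text.

```python
import math


def to_log_odds(c: float) -> float:
    """C = ln(c / (1 - c)) for a confidence c strictly between 0 and 1."""
    return math.log(c / (1.0 - c))


def from_log_odds(C: float) -> float:
    """Inverse transform (logistic function): c = 1 / (1 + exp(-C))."""
    return 1.0 / (1.0 + math.exp(-C))


def signed_confidence(chose_first: bool, confidence: float) -> float:
    """Hypothetical helper: map a choice and a 0.5-1.0 rating onto the 0-1 scale.

    Assumed convention: c -> 1 favours the first option, c -> 0 the second.
    """
    return confidence if chose_first else 1.0 - confidence


for c in (0.5, 0.75, 0.9):
    print(c, round(to_log_odds(c), 3))  # log-odds: 0.0, 1.099, 2.197
```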
While the CWMV model was originally intended for cases where a deliberating group makes a single consensus decision, this was not the case in the present study. To allow for individual posterior decisions, \(C_g\) was replaced with \(C_j^{p}\), the posterior (indicated by the superscript \(p\)) confidence of member \(j\). Furthermore, separate weights were introduced for every combination of members \(j\) and \(i\) (including \(i=j\)) by adding these indices to the free parameter \(k\). For simplicity, the \(\beta\) parameter was dropped from the model (equivalent to fixing \(\beta=1\)). This allowed us to derive an analytical solution for the maximum likelihood estimator (MLE) of the model parameters, eliminating the need for numerical methods when inferring them. The resulting group decision model used in this study was thus:
$$C_j^{p} \sim N\left(\sum_{i\in\{a,b,c\}} k_{j,i}\,C_i,\ \sigma\right) \tag{2}$$
Here, the index \(i\) runs over the three group members, \(a\), \(b\), and \(c\). This makes explicit that we studied triads, although the model is, in principle, applicable to any group size. The free parameter \(k_{j,i}\) acts as a weighting factor on the prior confidence rating \(C_i\): it controls how much participant \(j\) is influenced by group member \(i\)'s prior confidence rating when making their own posterior decision or, in other words, how much information \(j\) obtained from member \(i\). The weight \(k_{j,j}\) is member \(j\)'s weight on their own prior confidence.
Using individual weights for each group member allows the model to account for the fact that individual members might gain more or less information from each other due to factors like hearing status, susceptibility to noise, personality factors, etc. The model defined by Eq. (2) can thus be considered an extension of CWMV for cases where consensus decisions are not enforced, and where individual differences in impact on the final decision are accounted for.
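To make Eq. (2) concrete, the sketch below draws posterior confidences for one member from the model. It is illustrative only: `sample_posterior` is a hypothetical helper, the weights and the uniformly drawn prior confidences are made up, and \(\sigma\) is treated as a standard deviation, which the notation in Eq. (2) leaves open.

```python
import numpy as np


def sample_posterior(C_prior: np.ndarray, k_j: np.ndarray, sigma: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Draw posterior log-odds confidences of member j under Eq. (2).

    C_prior: shape (N, 3), prior log-odds of members a, b, c on N trials.
    k_j:     shape (3,), weights [k_{j,a}, k_{j,b}, k_{j,c}].
    sigma:   spread of the Gaussian error, treated here as a standard deviation.
    """
    mean = C_prior @ k_j  # sum_i k_{j,i} * C_i for every trial
    return rng.normal(loc=mean, scale=sigma)


# Hypothetical example: member a leans mostly on their own prior (k_{a,a} = 0.8).
rng = np.random.default_rng(0)
C_prior = rng.uniform(-4.6, 4.6, size=(28, 3))
C_post_a = sample_posterior(C_prior, np.array([0.8, 0.3, 0.3]), sigma=0.5, rng=rng)
```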
The model weights \(\mathbf{k}\) were estimated by maximum likelihood. Given \(N\) trials of prior and posterior confidences from three group members, the MLE of the weight vector \(\mathbf{k}_j\) of group member \(j\) can be derived from the following system of equations (see supplementary materials for derivation details):
$$\sum_{n=1}^{N}\left(\begin{bmatrix}C_{a,n}\\ C_{b,n}\\ C_{c,n}\end{bmatrix}\begin{bmatrix}C_{a,n}\\ C_{b,n}\\ C_{c,n}\end{bmatrix}^{T}\begin{bmatrix}k_{j,a}\\ k_{j,b}\\ k_{j,c}\end{bmatrix}\right)=\sum_{n=1}^{N}\begin{bmatrix}C_{a,n}C_{j,n}^{p}\\ C_{b,n}C_{j,n}^{p}\\ C_{c,n}C_{j,n}^{p}\end{bmatrix} \tag{3}$$
For clarity, the summation operators are taken to act on each row separately. \(C_{i,n}\) denotes the prior confidence of member \(i\) on the \(n\)th trial, and \(C_{j,n}^{p}\) is the posterior confidence of member \(j\) on the \(n\)th trial. Given observations of the prior confidences \(C_{i,n}\) and the posterior confidences \(C_{j,n}^{p}\), this system of equations can be solved for \(\mathbf{k}_j=\begin{bmatrix}k_{j,a} & k_{j,b} & k_{j,c}\end{bmatrix}^{T}\), the weight vector of member \(j\).
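For completeness, a brief sketch of where Eq. (3) comes from, assuming that \(\sigma\) in Eq. (2) is a standard deviation and that the errors are independent across trials; \(\mathbf{C}_n=\begin{bmatrix}C_{a,n} & C_{b,n} & C_{c,n}\end{bmatrix}^{T}\) is shorthand introduced here, and the supplementary derivation may be organized differently. The log-likelihood of member \(j\)'s posterior confidences is

$$\ell(\mathbf{k}_j,\sigma) = -\frac{N}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}\left(C_{j,n}^{p} - \mathbf{C}_n^{T}\mathbf{k}_j\right)^2$$

Setting its gradient with respect to \(\mathbf{k}_j\) to zero gives

$$\frac{\partial \ell}{\partial \mathbf{k}_j} = \frac{1}{\sigma^2}\sum_{n=1}^{N}\mathbf{C}_n\left(C_{j,n}^{p} - \mathbf{C}_n^{T}\mathbf{k}_j\right) = \mathbf{0} \;\Longrightarrow\; \sum_{n=1}^{N}\mathbf{C}_n\mathbf{C}_n^{T}\mathbf{k}_j = \sum_{n=1}^{N}\mathbf{C}_n C_{j,n}^{p}$$

which is Eq. (3). When the \(3\times 3\) matrix on the left-hand side is invertible (i.e., the three members' prior confidences are linearly independent across trials), the estimators are

$$\hat{\mathbf{k}}_j = \left(\sum_{n=1}^{N}\mathbf{C}_n\mathbf{C}_n^{T}\right)^{-1}\sum_{n=1}^{N}\mathbf{C}_n C_{j,n}^{p}, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N}\left(C_{j,n}^{p} - \mathbf{C}_n^{T}\hat{\mathbf{k}}_j\right)^2$$

This is ordinary least squares without an intercept, and \(\hat{\mathbf{k}}_j\) does not depend on \(\sigma\).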
When estimating weights from the experimental data, confidence ratings of 100% were first truncated to 99% to prevent infinite values when converting the ratings to the log-odds domain. This limited the magnitude of the confidence scale in the log-odds domain to \(\pm\ln\left(\frac{0.99}{1-0.99}\right)\approx\pm 4.60\).
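The size of this bound is easy to check numerically. A minimal sketch, assuming ratings are stored as fractions between 0.5 and 1.0 together with the chosen option; `capped_log_odds` is a hypothetical helper and its sign convention is an assumption:

```python
import math

CAP = 0.99  # ratings of 100% are capped at 99% before the transform


def capped_log_odds(chose_first: bool, confidence: float, cap: float = CAP) -> float:
    """Log-odds of a rating after capping (assumed: positive favours the first option)."""
    conf = min(confidence, cap)
    c = conf if chose_first else 1.0 - conf
    return math.log(c / (1.0 - c))


bound = math.log(CAP / (1.0 - CAP))
print(round(bound, 2))  # 4.6
```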
When using Eq. (3) to estimate weights from observed data, the weights are assumed to be invariant across decisions: \(N\) distinct decisions are used to estimate each group member's weight vector \(\mathbf{k}_j\). However, the conditions under which communication takes place may affect the weights, so that the members of a given group might apply different weights to each other depending on the condition. For example, background noise can reduce the audibility of other group members, making their utterances less clear to the listener(s). This could, in turn, cause the listener to reduce their weight towards others, as the cues those members shared were less salient or judged to be less reliable. The weight vectors can thus serve as a quantitative measure of how information is exchanged in the group, and they can be compared across conditions to explore how these dynamics are affected by an intervention.
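A sketch of how Eq. (3) could be solved in practice, with weights estimated separately per condition. Assumptions: NumPy, a hypothetical helper `estimate_weights`, prior confidences simulated uniformly within the ±4.6 log-odds range, made-up true weights, and 84 trials per condition (three topics of 28 questions each). Real ratings will not be uniformly distributed.

```python
import numpy as np


def estimate_weights(C_prior: np.ndarray, C_post_j: np.ndarray) -> tuple[np.ndarray, float]:
    """MLE of k_j and sigma for member j under Eq. (2).

    C_prior:  shape (N, 3), prior log-odds of members a, b, c.
    C_post_j: shape (N,), posterior log-odds of member j.
    """
    A = C_prior.T @ C_prior      # sum over trials of C_n C_n^T, shape (3, 3)
    b = C_prior.T @ C_post_j     # sum over trials of C_n * C^p_{j,n}, shape (3,)
    k_j = np.linalg.solve(A, b)  # normal equations of Eq. (3)
    resid = C_post_j - C_prior @ k_j
    sigma = float(np.sqrt(np.mean(resid ** 2)))  # MLE of the error SD
    return k_j, sigma


# Hypothetical demo: one member's weights estimated separately in each condition.
rng = np.random.default_rng(0)
true_k = {"quiet": np.array([0.6, 0.5, 0.5]), "noisy": np.array([0.9, 0.2, 0.2])}
estimates = {}
for cond, k in true_k.items():
    C_prior = rng.uniform(-4.6, 4.6, size=(84, 3))
    C_post = C_prior @ k + rng.normal(0.0, 0.5, size=84)
    estimates[cond] = estimate_weights(C_prior, C_post)[0]
```

`np.linalg.solve` solves the normal equations directly; for nearly collinear prior confidences, `np.linalg.lstsq(C_prior, C_post_j, rcond=None)` returns the same least-squares solution in a numerically more robust way.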
Weight distances
To make quantitative claims about the effect of an intervention on the information exchange weights, a meaningful measure of distance between weights is required. Here, we quantified the distance between two weight vectors as the arccosine of their cosine similarity, i.e., the angle between them, referred to as the cosine or angular distance. Results are reported in radians, corresponding to the angle between two weight vectors in the three-dimensional space of the decision weights. A distance of zero radians between two weight vectors (i.e., the vectors are parallel and point in the same direction) indicates that, if the set of prior confidence ratings is held constant, the two vectors yield identical posterior decisions in terms of binary choices. More generally, the smaller the angular distance between two vectors, the more similar the posterior decisions from which they were estimated. The angular distance can also be used to quantify the relative weight towards individual group members, by computing the distance between a weight vector and the individual axes in \(k\)-space.
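A minimal NumPy sketch of the angular distance described above (the function name is hypothetical). Clipping the cosine guards against floating-point values marginally outside [-1, 1], which would otherwise make `np.arccos` return NaN.

```python
import numpy as np


def angular_distance(u, v) -> float:
    """Angle in radians between vectors u and v (arccos of the cosine similarity)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_sim = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos_sim, -1.0, 1.0)))


# Distance between a hypothetical weight vector and member b's axis (second unit vector).
d_to_b = angular_distance([0.7, 0.2, 0.4], np.eye(3)[1])
```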
These two uses of the angular distance are illustrated in Fig. 2. Two hypothetical weight vectors, \(\mathbf{k}_i\) and \(\mathbf{k}_j\), belonging to members \(i\) and \(j\), are shown in blue and red, respectively. The notation \(D(\cdot,\cdot)\) refers to the angular distance between two vectors, measured in radians. The bold line shows the angular distance between the two weight vectors, i.e., \(D(\mathbf{k}_i,\mathbf{k}_j)\). Each of the three axes in Fig. 2 can be thought of as "belonging" to a specific group member; e.g., the \(k_b\)-axis belongs to member \(b\), as this dimension represents the weight towards member \(b\). Defining \(\hat{\mathbf{m}}\) as the unit vector along member \(m\)'s axis, \(D(\mathbf{k}_i,\hat{\mathbf{m}})\) measures the distance between member \(i\)'s weight vector and member \(m\)'s axis. As \(D(\mathbf{k}_i,\hat{\mathbf{m}})\to 0\) rad, the posterior (binary) decisions made by member \(i\) approach \(m\)'s prior decisions, although the posterior confidence values may differ from \(m\)'s prior confidences by a constant scale factor, which corresponds to the magnitude of the weight vector.
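The claim that only the direction of a weight vector determines the binary posterior choices can be checked numerically. In this sketch, the predicted choice on each trial is taken as the sign of the mean in Eq. (2), ignoring the error term; the prior confidences and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
C_prior = rng.uniform(-4.6, 4.6, size=(1000, 3))  # hypothetical priors of a, b, c


def binary_choices(k) -> np.ndarray:
    """Predicted binary choice per trial: sign of the Eq. (2) mean, noise ignored."""
    return np.sign(C_prior @ np.asarray(k, dtype=float))


k_i = np.array([0.7, 0.2, 0.4])  # hypothetical weight vector of member i

# Rescaling k_i changes the magnitude of the posterior confidences, not the choices.
same_choices_after_scaling = np.array_equal(binary_choices(k_i), binary_choices(2.5 * k_i))

# A weight vector lying on member b's axis (distance 0 rad) reproduces b's prior choices.
matches_b_prior = np.array_equal(binary_choices([0.0, 1.3, 0.0]), np.sign(C_prior[:, 1]))
print(same_choices_after_scaling, matches_b_prior)  # True True
```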
Assuming that weights are non-negative, the maximum possible value of \(D(\mathbf{k}_i,\hat{\mathbf{m}})\) is \(\frac{\pi}{2}\) rad, which occurs only if \(\mathbf{k}_i\) is orthogonal to \(\hat{\mathbf{m}}\), i.e., if member \(i\)'s weight towards \(m\) is zero. This would happen if \(i\) completely ignored any information shared by \(m\) when making their posterior decisions. A negative weight towards a member can only arise if the information shared by that member is "inverted" before it is integrated into the posterior answer. This would most likely occur only if participants believed they were being deliberately deceived by another member or if, for some other reason, they believed the other member to be consistently more likely to be wrong than right. In the present experiment, we assumed that such behavior would not take place, as the task was explicitly collaborative, and thus that negative weights would occur only as statistical anomalies.
Weight distance summary statistics
The possible directions of weight vectors spanned by non-negative weights are illustrated as the gray hemisphere in Fig. 2. The weights provide an abstract representation of how information is exchanged between individuals in a particular group. To facilitate comparison across groups, four summary statistics were defined based on the information exchange weights and the angular distances between them: overall weight change, self-weighting, weight equality, and weight similarity. Each is introduced separately below and is associated with a different view of what constitutes successful information exchange, so together they provide complementary perspectives on how to interpret the weights estimated with the decision model.
The first summary statistic, overall weight change, was quantified as \(D(\mathbf{k}_N,\mathbf{k}_Q)\), where \(\mathbf{k}_N\) and \(\mathbf{k}_Q\) denote a given participant's weights in the noisy and quiet conditions, respectively. This statistic was motivated by the idea that the quiet condition may be thought of as an “ideal” communication scenario, in which no factors inhibit communication. Participants were thus assumed to use the weights that came naturally to them, given their individual personality traits and the group's social dynamics. In this view, any substantial change in weights away from those in the quiet condition would represent a detriment to the information exchange process, since different weights imply different posterior choices.
The second summary statistic, self-weighting, was defined using the relative weight towards oneself, i.e., \(D(\mathbf{k}_a,\hat{\mathbf{a}})\) for the self-weighting of group member \(a\). A low value of \(D(\mathbf{k}_a,\hat{\mathbf{a}})\) indicates a high degree of self-weighting. Self-weighting is particularly interesting because its magnitude depends on how much new information the participant receives during the experiment. For example, consider a hypothetical “impossible” communication scenario in which the noise is so loud that participants cannot exchange any information. In such a scenario, each group member would be forced to simply repeat their prior decisions in the post-conversation round. This would result in weights equal to one towards oneself and zero towards others, so each participant's weight vector would be parallel to their own axis. As the noise level gradually increases from quiet towards this infinitely loud limit, one might expect an equally gradual increase in self-weighting, reflecting that information from other group members becomes harder to obtain or less reliable. If the noise level used in this study was loud enough to impair information exchange, self-weighting should therefore be higher in the noisy condition.
The third summary statistic was weight equality. Defining the uniform weight \(\hat{\mathbf{u}}=\begin{bmatrix}1 & 1 & 1\end{bmatrix}\), the distance \(D(\mathbf{k},\hat{\mathbf{u}})\) was used to quantify how far a given weight vector is from uniform weighting (only the direction of \(\hat{\mathbf{u}}\) matters for the angular distance). A low value of \(D(\mathbf{k},\hat{\mathbf{u}})\) thus indicates high weight equality. This statistic is motivated by mathematical considerations about the optimal weights that interacting agents can use to combine information in decision-making tasks 27,29. In the original CWMV model, this observation is one of the motivations for transforming the raw confidence ratings into log-odds23. Assuming that the confidence ratings \(c\) provided by participants reflect their probability of being correct on a given trial, the ideal weight vector \(\mathbf{k}\) in the present model would be uniform, since the log-odds transformation of the confidence ratings is already performed in the model via \(C=\ln\left(\frac{c}{1-c}\right)\). Under these assumptions, the uniform weight is the weight an ideal observer would use, and non-uniform weights are interpreted as reflecting non-ideal information exchange. If noise impairs information exchange, weight equality should thus be expected to be higher in the quiet condition.
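A worked example of why equal weights on log-odds are ideal, under assumptions beyond those stated above: each member's confidence is calibrated, the members' evidence is conditionally independent given the correct answer, and both options are a priori equally likely. Bayes' rule then multiplies the members' odds, so their log-odds add with unit weights. With hypothetical ratings \(c_a=0.8\), \(c_b=0.6\), and \(c_c=0.5\):

$$C_a = \ln\frac{0.8}{0.2} = \ln 4 \approx 1.386, \qquad C_b = \ln\frac{0.6}{0.4} = \ln 1.5 \approx 0.405, \qquad C_c = \ln\frac{0.5}{0.5} = 0$$

$$1\cdot C_a + 1\cdot C_b + 1\cdot C_c = \ln 6 \approx 1.792 \;\Longrightarrow\; c = \frac{6}{1+6} = \frac{6}{7} \approx 0.857$$

Member \(c\)'s 50% rating contributes nothing, and the combined confidence (about 85.7%) exceeds that of the most confident member.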
The fourth and final summary statistic was weight similarity, quantified by the distance between the weights of each pair of individuals in a group, i.e., \(D(\mathbf{k}_a,\mathbf{k}_b)\), \(D(\mathbf{k}_a,\mathbf{k}_c)\), and \(D(\mathbf{k}_b,\mathbf{k}_c)\). Lower values of these distances indicate that members used more similar weights. Weight similarity may be related to successful information exchange, as similar weights mean that group members make similar posterior decisions. One route to such decision similarity is if (1) group members successfully share with each other all relevant cues they used to make their prior decisions, and (2) each group member judges the validity of each shared cue similarly when making their posterior decision. In this view, weights that are close together indicate both successful exchange of information and collective agreement on the validity of the exchanged information: the closer together members' weights are, the more successful the exchange of information. If noise impairs information exchange, weight similarity should thus be expected to be higher in the quiet condition.
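The four statistics can be computed directly from a triad's weight vectors. A sketch, assuming per-member weights stored in dictionaries keyed by member name; all names are hypothetical, and how the per-member values were aggregated across members or groups is not specified here.

```python
import numpy as np


def angle(u, v) -> float:
    """Angular distance D(u, v) in radians."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    cos_sim = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos_sim, -1.0, 1.0)))


MEMBERS = ("a", "b", "c")
AXES = {m: np.eye(3)[i] for i, m in enumerate(MEMBERS)}  # each member's own axis
UNIFORM = np.ones(3)                                     # direction of u = [1 1 1]
PAIRS = (("a", "b"), ("a", "c"), ("b", "c"))


def summary_statistics(K_quiet: dict, K_noisy: dict) -> dict:
    """Four summary statistics for one triad.

    K_quiet, K_noisy: map member name -> weight vector [k_{j,a}, k_{j,b}, k_{j,c}].
    """
    stats = {"overall_change": {}, "self_weighting": {}, "equality": {}, "similarity": {}}
    for m in MEMBERS:
        stats["overall_change"][m] = angle(K_noisy[m], K_quiet[m])  # D(k_N, k_Q)
    for cond, K in (("quiet", K_quiet), ("noisy", K_noisy)):
        stats["self_weighting"][cond] = {m: angle(K[m], AXES[m]) for m in MEMBERS}
        stats["equality"][cond] = {m: angle(K[m], UNIFORM) for m in MEMBERS}
        stats["similarity"][cond] = {(p, q): angle(K[p], K[q]) for p, q in PAIRS}
    return stats
```

As a sanity check, the “impossible” scenario (each member's noisy weights lying on their own axis) gives zero self-weighting distance in that condition, while uniform quiet weights give zero equality and similarity distances in quiet.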
Statistical analysis
The four summary statistics were compared between the two conditions using permutation tests. For the individual-level analysis, permuted samples were created by randomly shuffling the noise and quiet labels of each participant's confidence ratings 10,000 times. New weights were estimated for each permuted sample, and permuted summary statistics were calculated from these weights. All tests were two-tailed, except for the test of the overall weight change statistic, which was one-tailed because this statistic is non-negative by definition. For the population-level analysis, permutation tests were performed using the median of the permuted weight change statistics from the individual-level analysis.
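A sketch of the individual-level, one-tailed test for overall weight change, assuming that condition labels are shuffled across one participant's trials and that weights are re-estimated via Eq. (3) in each permutation. The helper names and the add-one p-value convention are assumptions, not details from the text; the two-tailed tests of the other statistics and the population-level test are not shown.

```python
import numpy as np


def _weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve the normal equations of Eq. (3) for one member's weight vector."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def _angle(u: np.ndarray, v: np.ndarray) -> float:
    """Angular distance in radians."""
    cos_sim = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos_sim, -1.0, 1.0)))


def overall_change_permutation_test(X, y, is_noisy, n_perm=10_000, seed=0):
    """One-tailed permutation test of D(k_N, k_Q) for a single participant j.

    X:        (N, 3) prior log-odds of members a, b, c on all trials.
    y:        (N,) posterior log-odds of participant j.
    is_noisy: (N,) boolean array, True for trials in the noisy condition.
    """
    rng = np.random.default_rng(seed)

    def stat(labels: np.ndarray) -> float:
        k_noisy = _weights(X[labels], y[labels])
        k_quiet = _weights(X[~labels], y[~labels])
        return _angle(k_noisy, k_quiet)

    observed = stat(is_noisy)
    # Null distribution: shuffle condition labels across trials and re-estimate.
    null = np.array([stat(rng.permutation(is_noisy)) for _ in range(n_perm)])
    # Add-one convention so that p is never exactly zero.
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, float(p_value)
```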