Results of Experiment 1: Nameability of proportionally presented gesture and speech fragments
The twenty gestures (mean length = 1771.00 ms, SD = 307.98) and speech segments (mean length = 447.08 ms, SD = 93.48) were each divided into fragments of 5 different durations relative to their minimal lexical length, i.e., 0.5, 0.75, 1, 1.25, and 1.5 DP/IP. For each gesture and speech fragment, the last answer a participant gave was taken as the response indicating comprehension. Nameability was calculated as the percentage of participants who provided the most commonly used label.
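For illustration, the nameability measure can be computed in a few lines of Python. This is a minimal sketch assuming each fragment's responses are collected as a simple list of labels; the data layout and names are hypothetical, not the study's actual pipeline.

```python
from collections import Counter

def nameability(labels):
    """Proportion of participants who produced the modal (most common) label.

    `labels` holds the final answer given by each participant for one
    gesture or speech fragment (hypothetical data layout).
    """
    _, n_modal = Counter(labels).most_common(1)[0]
    return n_modal / len(labels)

# Example: 7 of 10 participants converge on "cut" -> nameability = 0.70
print(nameability(["cut"] * 7 + ["snip"] * 2 + ["open"]))
```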
A one-way analysis of variance (ANOVA) showed a significant main effect of fragment duration on gesture nameability (F(4) = 7.630, p <.001, ηp2 =.135). Overall, gesture nameability increased with presentation time (Figure 1C): it was smallest when gestures were presented at 0.5 DP (mean =.35, SD =.16) and largest when gestures were presented at 1.5 DP (mean =.56, SD =.23), with intermediate values at 0.75 DP (mean =.40, SD =.17), 1 DP (mean =.46, SD =.20), and 1.25 DP (mean =.52, SD =.21).
A similar pattern was found across the five speech conditions (0.5 IP: mean =.41, SD =.20; 0.75 IP: mean =.53, SD =.18; 1 IP: mean =.64, SD =.19; 1.25 IP: mean =.77, SD =.14; 1.5 IP: mean =.86, SD =.12) (Figure 1D), with a one-way ANOVA showing a significant main effect of fragment duration on speech nameability (F(4) = 46.226, p <.001, ηp2 =.487).
Importantly, there was a significant Pearson correlation between gesture fragment length and gesture nameability (Pearson's r =.996, p <.001) (Figure 1C), as well as between speech fragment length and speech nameability (Pearson's r =.999, p <.001) (Figure 1D). Together, these results indicate that the lexical information of both gesture and speech unfolded proportionally over processing time, as measured relative to the semantic discrimination point.
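A minimal Python sketch of these two analyses (a one-way ANOVA across duration conditions and a Pearson correlation of duration with mean nameability), using simulated per-item nameabilities for illustration; scipy's between-groups `f_oneway` is a stand-in and may differ from the exact model used in the study.

```python
import numpy as np
from scipy import stats

# Hypothetical layout: rows = 20 items, columns = the five duration
# conditions (0.5, 0.75, 1, 1.25, 1.5 DP); values = per-item nameability.
rng = np.random.default_rng(0)
durations = np.array([0.5, 0.75, 1.0, 1.25, 1.5])
nameability = np.clip(
    0.35 + 0.2 * (durations - 0.5) + rng.normal(0, 0.05, (20, 5)), 0, 1)

# One-way ANOVA across the five duration conditions
f, p = stats.f_oneway(*nameability.T)
print(f"F = {f:.3f}, p = {p:.3g}")

# Pearson correlation between fragment duration and mean nameability
r, p_r = stats.pearsonr(durations, nameability.mean(axis=0))
print(f"r = {r:.3f}, p = {p_r:.3g}")
```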
Results of Experiment 2: ERP evidence of the dynamic neural stages for proportionally presented gesture-speech integration
Behavioral results
Three of the five gesture/speech fragments were used: 0.75 DP/IP (before_DP/IP), DP/IP, and 1.25 DP/IP (after_DP/IP). Gesture fragments were presented as primes and were immediately followed by speech fragments. Two experimental factors were manipulated: gender congruency (e.g., a gesture performed by a man combined with a male voice, or a gesture performed by a woman combined with a male voice) and semantic congruency (e.g., a man or a woman performing a 'cut' gesture while saying the Mandarin word '剪jian3 (cut)', or a man or a woman performing a 'spray' gesture while saying '剪jian3 (cut)').
A 3 (gesture fragments) * 3 (speech fragments) * 2 (semantic congruency) repeated-measures ANOVA revealed a significant main effect of semantic congruency (F(1, 29) = 38.618, p <.001, ηp2 =.57), with longer reaction times (RTs) for semantically incongruent (mean = 561.60 ms, SD = 65.89) than semantically congruent (mean = 553.60 ms, SD = 62.75) trials. There was also a significant three-way interaction among gesture fragments, speech fragments, and semantic congruency (F(3.655, 105.995) = 2.556, p =.048, ηp2 =.081), reflecting that the magnitude of the semantic congruency effect, used as an index of gesture-speech integration, was modulated by the interplay between the semantic constraints imposed by gesture and the speech representation presented.
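Below is a hedged sketch of this design using statsmodels' AnovaRM on simulated long-format RT data; the degrees of freedom it reports are uncorrected, whereas the values reported above are Greenhouse-Geisser corrected. The same 3 * 3 * 2 model is reused for the ERP amplitudes in each time window analyzed later.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one mean RT per subject per cell of the
# 3 (gesture fragment) x 3 (speech fragment) x 2 (congruency) design.
rng = np.random.default_rng(1)
rows = []
for subj in range(1, 31):
    for g in ["before_DP", "DP", "after_DP"]:
        for s in ["before_IP", "IP", "after_IP"]:
            for c in ["congruent", "incongruent"]:
                rt = 553 + (8 if c == "incongruent" else 0) + rng.normal(0, 20)
                rows.append((subj, g, s, c, rt))
df = pd.DataFrame(rows, columns=["subject", "gesture", "speech",
                                 "congruency", "rt"])

# Repeated-measures ANOVA; dfs are uncorrected here, unlike the
# Greenhouse-Geisser-corrected values reported in the text.
print(AnovaRM(df, depvar="rt", subject="subject",
              within=["gesture", "speech", "congruency"]).fit())
```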
The proportionally increased presentation of gesture information did not affect the gesture-speech semantic congruency effect, as reflected by a nonsignificant interaction of gesture fragments by semantic congruency (F(1.804, 52.314) =.879, p =.411, ηp2 =.029). Likewise, the nonsignificant interaction of speech fragments by semantic congruency (F(1.925, 55.826) =.791, p =.454, ηp2 =.027) indicated that the semantic congruency effect was not modulated by the amount of speech information presented.
Additionally, simple effects analysis with Bonferroni correction showed a significant RT difference between the semantically incongruent and congruent conditions in the before_DP/before_IP (F(1, 29) = 7.369, p =.011, ηp2 =.203), DP/IP (F(1, 29) = 14.13, p =.001, ηp2 =.328), after_DP/IP (F(1, 29) = 7.141, p =.012, ηp2 =.198) and after_DP/after_IP (F(1, 29) = 22.617, p <.001, ηp2 =.438) conditions (Figure 2A).
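A sketch of one such Bonferroni-corrected simple-effects comparison on simulated per-subject mean RTs; a paired t test is used here for simplicity (for a single-df contrast, F = t², so this is equivalent to the F tests reported above).

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject mean RTs (n = 30) for one gesture x speech cell
rng = np.random.default_rng(2)
congruent = rng.normal(553, 60, 30)
incongruent = congruent + rng.normal(8, 10, 30)

# Paired comparison; for a single-df contrast, F = t**2
t, p = stats.ttest_rel(incongruent, congruent)
p_bonf = min(p * 9, 1.0)  # Bonferroni correction over the 9 cells tested
print(f"t(29) = {t:.2f}, p = {p:.4f}, Bonferroni-corrected p = {p_bonf:.4f}")
```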
For the control factor of gender congruency, there was also a significant main effect (F(1, 29) = 84.403, p <.001, ηp2 =.744), with longer RTs when speech and gesture were produced by individuals of different genders (mean = 570.80 ms, SD = 66.39) than by individuals of the same gender (mean = 545.26 ms, SD = 63.08). Nonsignificant interactions indicated that gender did not influence the semantic information conveyed by either gesture or speech during gesture-speech integration: gesture fragments by gender congruency (F(1.995, 57.867) =.382, p =.684, ηp2 =.013); speech fragments by gender congruency (F(1.768, 51.277) =.015, p =.978, ηp2 =.001); and the three-way interaction of gesture fragments, speech fragments and gender congruency (F(3.650, 105.849) = 2.044, p =.100, ηp2 =.066) (Figure 2B).
ERPs
Figure 3 presents the grand-average ERPs elicited by the nine experimental conditions. Overall, semantically incongruent gesture-speech pairs elicited larger negative ERPs (mean = -1.059, SE =.172) than pairs in which the prime gesture carried information congruent with the subsequent speech (mean = -.934, SE =.170). Based on the averaged ERP waveform, three components were identified and further analyzed: an early N1 effect from 0-100 ms (18), the N400 component from 300-500 ms (10, 26), and an LPC from 500-800 ms (27) (Figure 3). The gender congruency factor was not analyzed further, as paired t tests showed no significant effect of gender congruency on the amplitude of the N1 (t(29) = -1.325, p =.196), N400 (t(29) = 1.068, p =.294), or LPC (t(29) =.352, p =.727).
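The component amplitudes entering these analyses are mean voltages within each window. A minimal numpy sketch, assuming a hypothetical epochs array time-locked to speech onset (shapes and sampling rate are illustrative):

```python
import numpy as np

# Hypothetical epochs array: (n_trials, n_channels, n_times), 500 Hz,
# time zero at speech onset.
sfreq = 500
times = np.arange(-0.2, 0.8, 1 / sfreq)
epochs = np.random.default_rng(3).normal(0, 1, (60, 32, times.size))

def mean_amplitude(epochs, times, tmin, tmax):
    """Mean voltage within [tmin, tmax), averaged over trials;
    returns one value per channel."""
    mask = (times >= tmin) & (times < tmax)
    return epochs[:, :, mask].mean(axis=(0, 2))

n1 = mean_amplitude(epochs, times, 0.0, 0.1)    # N1: 0-100 ms
n400 = mean_amplitude(epochs, times, 0.3, 0.5)  # N400: 300-500 ms
lpc = mean_amplitude(epochs, times, 0.5, 0.8)   # LPC: 500-800 ms
```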
N1 effect: 0-100 ms time window
A 3 (gesture fragments) * 3 (speech fragments) * 2 (semantic congruency) ANOVA on the early ERP component (0-100 ms after speech onset) revealed a significant main effect of gesture fragments (F(1.582, 45.881) = 31.947, p <.001, ηp2 =.524), with the largest negative amplitude in the after_DP condition (mean = -1.822, SE =.326), the smallest in the before_DP condition (mean = -.879, SE =.251), and the DP condition in between (mean = -1.418, SE =.309) (Figure 4C). A further 3 (gesture fragments) * 6 (ROIs) ANOVA indicated that the main effect of gesture fragments peaked over anterior and central sites in both hemispheres: LA (F(2, 28) = 32.249, p <.001, ηp2 =.697), RA (F(2, 28) = 28.650, p <.001, ηp2 =.672), LC (F(2, 28) = 19.430, p <.001, ηp2 =.581) and RC (F(2, 28) = 17.067, p <.001, ηp2 =.549). A main effect of gesture fragments was also found at the midline electrodes (F(2, 28) = 28.822, p <.001, ηp2 =.673) (Figure 4A).
There were no such early effects of speech fragments: the 3 (gesture fragments) * 3 (speech fragments) * 2 (semantic congruency) ANOVA revealed no main effect of speech fragments (F(1.958, 56.775) =.014, p =.985, ηp2 <.001) (Figure 4B). There was neither a significant main effect of semantic congruency (F(1, 29) =.975, p =.332, ηp2 =.033) nor an interaction of semantic congruency with gesture fragments (F(1.682, 48.780) =.192, p =.788, ηp2 =.007) or speech fragments (F(1.965, 56.973) =.505, p =.603, ηp2 =.017). Taken together, these results indicate that the early ERP effect was driven by an increasing top-down gesture constraint resulting from the proportionally added lexical representation.
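For the ROI-level analyses, per-channel window amplitudes are first averaged within each of the six regions (LA, RA, LC, RC, LP, RP). The channel groupings below are hypothetical placeholders for the study's actual montage:

```python
import numpy as np

# Hypothetical per-channel N1 mean amplitudes (e.g., from the sketch above)
channel_amps = np.random.default_rng(4).normal(-1.0, 0.5, 32)

# Hypothetical ROI groupings (channel indices); the study's actual montage
# and assignments will differ.
rois = {
    "LA": [0, 1, 2],    "RA": [3, 4, 5],
    "LC": [6, 7, 8],    "RC": [9, 10, 11],
    "LP": [12, 13, 14], "RP": [15, 16, 17],
}

# One mean amplitude per ROI, which would then enter the
# 3 (gesture fragments) * 6 (ROIs) ANOVA
roi_means = {name: channel_amps[idx].mean() for name, idx in rois.items()}
print(roi_means)
```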
N400 effect: 300-500 ms time window
For the 300-500 ms epochs, the 3 (gesture fragments) * 3 (speech fragments) * 2 (semantic congruency) ANOVA revealed significant main effects of gesture fragments (F(1.979, 57.394) = 11.646, p <.001, ηp2 =.287) and semantic congruency (F(1, 29) = 11.834, p =.002, ηp2 =.290), as well as a significant three-way interaction (F(6.955, 201.691) = 2.202, p =.036, ηp2 =.071). There was no main effect of speech fragments (F(1.895, 54.944) =.785, p =.455, ηp2 =.026), nor a two-way interaction of gesture fragments by semantic congruency (F(1.907, 55.290) = 2.218, p =.121, ηp2 =.071) or of speech fragments by semantic congruency (F(1.964, 56.960) =.201, p =.815, ηp2 =.007). These results suggest that the neural correlate of gesture-speech integration, indexed by a significant N400 effect between incongruent and congruent pairs, was modulated by the interplay between the effects of gesture fragments and speech fragments.
Additionally, separate 3 (gesture fragments) * 3 (speech fragments) * 2 (semantic congruency) ANOVAs for each of the six ROIs demonstrated a significant N400 effect in the before_DP/before_IP condition in the LA (F(1, 29) = 4.186, p =.050, ηp2 =.126), RA (F(1, 29) = 4.227, p =.049, ηp2 =.127), LC (F(1, 29) = 4.175, p =.050, ηp2 =.126) and RC (F(1, 29) = 4.402, p =.045, ηp2 =.132); in the DP/IP condition in the LC (F(1, 29) = 5.477, p =.026, ηp2 =.159), LP (F(1, 29) = 6.450, p =.017, ηp2 =.182) and RP (F(1, 29) = 5.056, p =.032, ηp2 =.148); in the after_DP/IP condition in the LA (F(1, 29) = 8.277, p =.007, ηp2 =.222), RA (F(1, 29) = 11.961, p =.002, ηp2 =.292), LC (F(1, 29) = 5.582, p =.025, ηp2 =.161) and RC (F(1, 29) = 6.355, p =.017, ηp2 =.180); and in the after_DP/after_IP condition in the LA (F(1, 29) = 13.795, p =.001, ηp2 =.322), RA (F(1, 29) = 12.402, p =.001, ηp2 =.300), LC (F(1, 29) = 12.344, p =.001, ηp2 =.299), RC (F(1, 29) = 9.240, p =.005, ηp2 =.242), LP (F(1, 29) = 6.683, p =.015, ηp2 =.187) and RP (F(1, 29) = 6.397, p =.017, ηp2 =.181) (Figure 5A).
For the average of the midline electrodes, a separate 3 (gesture fragments) * 3 (speech fragments) * 2 (semantic congruency) ANOVA indicated a significant three-way interaction (F(3.642, 105.627) = 3.240, p =.018, ηp2 =.101), with no significant interaction between gesture fragments and semantic congruency (F(1.603, 46.487) =.477, p =.582, ηp2 =.016) or between speech fragments and semantic congruency (F(1.975, 57.273) =.332, p =.716, ηp2 =.011). Simple effects analysis showed a significant semantic congruency effect in the before_DP/before_IP (F(1, 29) = 14.588, p =.001, ηp2 =.335), DP/IP (F(1, 29) = 7.008, p =.013, ηp2 =.195), and after_DP/after_IP (F(1, 29) = 16.329, p <.001, ηp2 =.360) conditions (Figure 5B). These results indicate that significant gesture-speech integration, as reflected by the N400 effect, occurred in conditions in which the top-down gesture constraint was balanced with the bottom-up speech presentation.
Most importantly, over the midline electrodes the N400 amplitude decreased linearly (became less negative) across the 9 experimental manipulations (3 gesture fragments * 3 speech fragments) as the amount of information conveyed in the two modalities gradually increased, with significant correlations in both the semantically congruent (Pearson's r =.934, p <.001) and semantically incongruent (Pearson's r =.831, p =.006) conditions. Specifically, the largest negative N400 amplitude was found in the before_DP/before_IP condition, and the smallest negative N400 amplitude was observed in the after_DP/after_IP condition. This suggests that, in addition to the N400 effect, the N400 amplitude itself may be related to the probability of the activated representation and the degree of intersecting semantic features between gesture and speech.
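A sketch of this amplitude-by-condition correlation, with hypothetical midline N400 means ordered by increasing combined gesture and speech information (values invented to match the reported trend, not the study's data):

```python
from scipy import stats

# Hypothetical midline N400 means (in microvolts) for the 9 cells, ordered
# by increasing combined gesture + speech information; values invented to
# match the reported trend (less negative as information grows).
condition_rank = list(range(1, 10))
n400_means = [-1.9, -1.7, -1.6, -1.4, -1.2, -1.1, -0.9, -0.7, -0.5]

r, p = stats.pearsonr(condition_rank, n400_means)
print(f"r = {r:.3f}, p = {p:.4f}")  # amplitude shrinks toward zero
```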
LPC effect: 500-800 ms time window
The 3 (gesture fragments) * 3 (speech fragments) * 2 (semantic congruency) ANOVA for the 500-800 ms epochs showed a significant main effect of semantic congruency (F(1, 29) = 6.915, p =.014, ηp2 =.193). There was no main effect of gesture fragments (F(1.894, 54.940) = 1.255, p =.292, ηp2 =.041) or speech fragments (F(1.760, 51.035) = 1.490, p =.235, ηp2 =.049), nor a two-way interaction of gesture fragments by semantic congruency (F(1.553, 45.044) =.614, p =.506, ηp2 =.021) or of speech fragments by semantic congruency (F(1.864, 54.049) =.208, p =.798, ηp2 =.007).
However, there was a significant three-way interaction among gesture fragments, speech fragments, and semantic congruency (F(3.390, 98.297) = 4.226, p =.005, ηp2 =.127). Simple effects analysis with Bonferroni correction showed a significant ERP difference between the semantically congruent and incongruent conditions in the before_DP/before_IP (F(1, 29) = 16.458, p <.001, ηp2 =.362), DP/IP (F(1, 29) = 6.834, p =.014, ηp2 =.191) and after_DP/after_IP (F(1, 29) = 12.812, p =.001, ηp2 =.306) conditions.
ROI-wise analyses demonstrated a significant LPC effect in the before_DP/before_IP condition in the LA (F(1, 29) = 16.469, p <.001, ηp2 =.362), RA (F(1, 29) = 9.745, p =.004, ηp2 =.252), LC (F(1, 29) = 18.330, p <.001, ηp2 =.387), RC (F(1, 29) = 10.543, p =.003, ηp2 =.267), LP (F(1, 29) = 5.757, p =.023, ηp2 =.166) and RP (F(1, 29) = 7.756, p =.009, ηp2 =.211); in the DP/IP condition in the LC (F(1, 29) = 9.479, p =.005, ηp2 =.246), RC (F(1, 29) = 4.231, p =.049, ηp2 =.127) and LP (F(1, 29) = 11.504, p =.002, ηp2 =.284); and in the after_DP/after_IP condition in the LA (F(1, 29) = 15.737, p <.001, ηp2 =.352), RA (F(1, 29) = 12.644, p =.001, ηp2 =.304), LC (F(1, 29) = 13.102, p =.001, ηp2 =.311), RC (F(1, 29) = 8.582, p =.007, ηp2 =.228), LP (F(1, 29) = 9.074, p =.005, ηp2 =.238) and RP (F(1, 29) = 5.948, p =.021, ηp2 =.170) (Figure 6A).
A gesture fragments * speech fragments * semantic congruency ANOVA on the average of the midline electrodes also revealed a significant three-way interaction (F(3.436, 99.652) = 4.025, p =.004, ηp2 =.122). Simple effects analysis with Bonferroni correction found a significant semantic congruency effect in the before_DP/before_IP (F(1, 29) = 14.873, p =.001, ηp2 =.339), DP/IP (F(1, 29) = 6.003, p =.021, ηp2 =.172), and after_DP/after_IP (F(1, 29) = 15.150, p =.001, ηp2 =.343) conditions (Figure 6B). This finding implies that the LPC associated with gesture-speech integration emerged only when neither modality carried dominant information.