We investigated whether cortical processing of isolated sounds characterized by vowel formant structure and/or periodicity (pitch) differs between children with ASD and TD children, and whether differences in the processing of these key vowel features contribute to poor perception of words in noise in children with ASD. In both groups, the presence of periodicity and formant structure was associated with a sustained processing negativity (SPN), an early (starting before 100 ms) negative shift of current in the primary and non-primary auditory cortical areas that persisted for several hundred milliseconds. We found no evidence of atypical processing of periodicity (f0) in non-vocal, spectrally complex sounds lacking formant composition in children with ASD. In contrast, the SPN evoked by vowel-like sounds characterized by formant structure was significantly reduced in children with ASD compared with TD children, regardless of the periodicity of the sound. This SPN reduction emerged relatively late (around 150–200 ms after vowel onset) and was localized bilaterally to auditory areas anterior to the primary auditory cortex (parabelt area A4 and/or the pINS cortex). In the left, but not the right, hemisphere, a reduced SPN in response to vowels predicted poor recognition of words presented against AM noise. Overall, our results suggest that impaired processing of vowel formant composition in children with ASD contributes to their impaired ability to recover words from glimpses of speech interrupted by masking noise.
Attenuated processing of vowel formant structure in children with ASD
The presence of the SPN in response to periodicity or formant structure is consistent with previous findings in neurotypical children and adults [39, 41, 88] and extends these findings to children with ASD, at least those with phrasal speech. Similar sustained negative shifts of current have been observed in several recent MEG and EEG studies in response to sounds that form acoustic patterns distinguished by their temporal properties, such as periodicity [91], or by their frequency composition, either static or coherently changing over time [30, 38, 40, 92–94]. It has been suggested that this negative shift reflects a fundamental cortical mechanism of automatic grouping in the auditory modality [94]. The early (< 100 ms) latency of the SPN in our study (Fig. 4) is consistent with evidence for the remarkably early sensitivity of the human auditory cortex to acoustic patterns [30] and, in particular, to vowels [88].
As familiar vocal sounds deeply shaped by the experience of verbal communication, vowels are unique auditory objects. Studies have repeatedly shown that certain areas of the secondary auditory cortex and adjacent regions show a preference for conspecific vocalizations in humans [95–97] and non-human primates [79]. Although this remains debated [96], it has been suggested that voices resemble faces in many ways: both are “special”, both carry information about the personality and emotional state of the individual, and both are processed in specialized cortical areas [95].
In this respect, the decrease in the SPN evoked by periodic and non-periodic vowel-like sounds in children with ASD is a remarkable finding. This decrease can be attributed specifically to an attenuated response to formant composition rather than to the periodicity of the vowel amplitude envelope (fundamental frequency / pitch), because the latter cue was absent in non-periodic vowels. Since children with ASD had normal SPNs to non-vocal sounds characterized by f0 periodicity, as well as normal responses to control non-periodic non-vocal stimuli (see Fig. 6A and B), the reduced SPNs to vowels cannot be explained by a general decrease in response amplitude or by a non-specific deficit in auditory pattern processing.
Despite the early start of the SPN (< 100 ms after stimulus onset), group differences in the SPN emerged relatively late (> 150 ms after stimulus onset) and were located predominantly in non-primary auditory areas (Fig. 5). The spared functional activity of the primary auditory cortex in children with ASD in our study is in line with fMRI findings in individuals with ASD [98, 99]. It is also consistent with the results of Engineer et al. [100], who found in a rodent model of autism that non-primary auditory cortical areas are more vulnerable than the primary auditory cortex to prenatal factors leading to autism.
The presence of a typical SPN up to at least 150 ms after vowel onset (Fig. 6) suggests that the reduced response to vowels in children with ASD is not inherited from earlier stages of analysis, such as tonotopic processing of formant frequencies [101], detection of harmonicity in complex sounds [102], or detection of an acoustic pattern [30]. At the same time, the timing of the vowel-related SPN reduction generally agrees with a meta-analysis of MMN/MMF studies, which concluded that responses to phoneme changes (either vowels or syllables) are reduced in individuals with ASD [48]. Taken together, these considerations suggest that the processing deficit in children with ASD arises at the stage of phonetic analysis.
Notably, the time at which we observed the SPN decrease in children with ASD coincides with the time at which isolated vowel sounds are categorized into distinct phoneme categories (e.g., /u/ vs. /a/), approximately 175 ms after stimulus onset [103]. This stage, referred to as acoustic-phonetic mapping, distinguishes brain responses that reflect the true internalized percept of a vowel category from those that index the acoustic properties of the vowel [103]. Deficits in phoneme category perception (e.g., assigning vowels along the /y/ - /i/ continuum to the category /y/ or /i/) have previously been reported in children with ASD despite preserved or even superior phoneme discrimination (e.g., judging two vowels from the /y/ - /i/ continuum as the same or different) [104]. It is therefore likely that the neurofunctional abnormalities underlying the decreased SPN to vowels in children with ASD reflect impaired acoustic-phonetic mapping, which is necessary for phoneme categorization. This hypothesis is consistent with the observation that, in the left hemisphere, the SPN reduction was most reliable and persistent in the mid-STG region lateral to the primary auditory cortex in Heschl's gyrus (parabelt area A4 according to HCPMMP1 [73]) (Fig. 5). This area has been proposed as the initial STG waypoint of the ventral auditory stream, the auditory pathway optimized for recognizing acoustic patterns [34, 105, 106], especially those representing conspecific communication calls [107, 108]. In humans, this parabelt auditory region plays a crucial role in phoneme encoding [34, 109, 110]. A meta-analysis of neuroimaging studies of speech processing [34] concluded that phoneme recognition is associated with activation in the left mid-STG region, whereas the integration of phonemes into more complex patterns (i.e., words) is localized to the left anterior STG.
This conclusion received strong support from a recent study of a patient with extensive lesions of the bilateral STS and left anterior STG, which showed that the intact mid-STG region alone can effectively support explicit vowel categorization despite the presence of “pure word deafness” [111].
The SPN in children with ASD was also decreased in the temporo-insular regions medially adjacent to Heschl's gyrus and extending anteriorly (Fig. 5). The pINS has strong structural and functional connections with the auditory cortex [77, 112–115] and responds to a wide variety of acoustic stimuli [78]. Nevertheless, recordings of its neuronal responses in humans [116] and non-human primates [79] have shown that this auditory region of the pINS is sensitive to conspecific vocalizations and can relay this auditory information onward to the anterior insula, which is involved in evaluating the affective signals conveyed by vocal sounds. Future studies could test whether the reduced SPN in pINS regions is associated with impaired recognition of emotion in human voices in ASD [117].
Apart from auditory signals, the pINS contains neuronal representations of somatosensory, motor, visual, vestibular, and limbic signals and is thought to be involved in multisensory integration [118]. The right insula seems to be particularly important for audiovisual integration [119]. In this regard, it is interesting that the vowel-induced SPN reduction in our participants with ASD was strongest in the right pINS (Fig. 5). Future research could investigate whether atypical activity or connectivity of the right pINS contributes to the severe deficit in audiovisual integration during phoneme recognition in children with ASD [120].
However, the effects found in the insula in our study should be interpreted with caution, because MEG localization error increases for deep structures such as the insula [74].
Suppressed processing of vowel formant structure is associated with word-in-noise perception difficulties in children with ASD
The reduced negativity underlying the processing of formant structure in children with ASD predicted the severity of their word recognition problems in AM noise: diminished SPN responses to vowels in the left hemisphere were associated with lower WiNam scores (Table 2, Fig. 7). This finding has several important implications for interpreting the vowel processing deficit and its impact on auditory speech recognition in ASD.
First, although WiN performance in children with ASD showed some developmental improvement across childhood, neither age nor IQ could explain the correlations between the reduced SPN and lower WiNam scores (Table 2). The lack of correlation between WiN scores and IQ agrees with previous findings of speech-in-noise recognition difficulties even in high-functioning individuals with ASD [13]. Instead, our results suggest that these difficulties may be caused, at least in part, by deficient vowel processing in the non-primary auditory cortex.
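The check described above, namely whether an SPN-WiNam correlation survives once age and IQ are taken into account, is commonly expressed as a partial correlation. The sketch below is a minimal illustration under stated assumptions, not the analysis behind Table 2: it regresses both variables on the covariates by ordinary least squares and correlates the residuals. The function name `partial_corr`, the variable names, and the synthetic data are hypothetical.

```python
import numpy as np


def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after regressing out the covariates.

    x, y       : 1-D arrays of length n (hypothetical per-child SPN amplitude
                 and WiNam score).
    covariates : array of shape (n,) or (n, k) (e.g., age and IQ).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.asarray(covariates, dtype=float)
    if c.ndim == 1:
        c = c[:, np.newaxis]
    # Design matrix: intercept column followed by the covariates.
    design = np.column_stack([np.ones(len(x)), c])
    # Ordinary least-squares fits of x and y on the covariates.
    beta_x, *_ = np.linalg.lstsq(design, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Residuals: the parts of x and y not linearly explained by the covariates.
    res_x = x - design @ beta_x
    res_y = y - design @ beta_y
    # Partial correlation is the Pearson r between the two residual vectors.
    return float(np.corrcoef(res_x, res_y)[0, 1])


# Illustrative use with synthetic (not real) data for 30 hypothetical children.
rng = np.random.default_rng(0)
n = 30
age = rng.uniform(7, 12, n)
iq = rng.normal(100, 15, n)
spn = rng.normal(size=n)
winam = 0.8 * spn + 0.1 * age + rng.normal(scale=0.5, size=n)
r = partial_corr(spn, winam, np.column_stack([age, iq]))
print(f"partial r (controlling for age and IQ): {r:.2f}")
```

A significance test for the partial r would use n − 2 − k degrees of freedom, where k is the number of covariates; that step is omitted here for brevity.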
The passive presentation of auditory stimuli and the presence of the SPN deficit as early as ~150–200 ms after sound onset, i.e., at a preattentive stage of processing, make a contribution of higher-order factors such as voluntary attention or motivation unlikely. Nevertheless, involuntary orienting of attention to auditory stimuli may still contribute to differences between the ASD and TD groups. Indeed, P3a-like responses to periodic and non-periodic vowels were observed in both TD and ASD children in our study, likely reflecting an involuntary shift of attention to perceptually salient speech stimuli [121]. These responses were left-lateralized, consistent with the left-hemispheric bias of the P3a novelty response to speech revealed in the auditory cortex by intracranial recordings in patients [89]. The elevated P3a to periodic vowels in the left hemisphere of our participants with ASD suggests that their reduced negative responses to vowels are unlikely to result from inattention to the auditory stream containing speech sounds, as was previously suggested [122]. On the contrary, their involuntary attention seems to be captured by perceptually salient periodic vowels to a greater degree than in TD children.
The excessive P3a-like response could contribute to the decrease in the left-hemispheric SPN to periodic vowels and to its correlation with WiNam scores in children with ASD, but it can hardly explain the general trend toward SPN reduction or the common correlation pattern for periodic and non-periodic vowels. Several arguments support this view. (1) No group differences in P3a-like responses, and no distinct P3a-like peaks, were observed in the right hemisphere, despite the prominent right-hemispheric SPN attenuation in the ASD vs. TD group (Fig. 4, 6). (2) For non-periodic vowels, the group differences in vowel-related negativity started already around 150 ms (Fig. 6D), i.e., in a time interval when the P3a is not yet evident. (3) Group differences in P3a amplitude and in the frequency of occurrence of the P3a peak were found for periodic vowels only, whereas in the ASD group the SPN was reduced for both periodic and non-periodic vowels. (4) In the left hemisphere, the SPN was a better predictor of WiNam scores in children with ASD than the amplitude of the P3a-like component (Supplementary Table S2).
Although it is beyond the scope of this paper, the possible role of an enhanced left-hemispheric P3a-like response to speech sounds in autism deserves mention. Previous studies have shown that the P3a can be relatively independent of the preceding negativity. For example, Torppa et al. [123], studying children with cochlear implants and children with normal hearing, observed a smaller MMN but a larger P3a in response to speech sounds. Vlaskamp et al. [124] found that tone duration deviants elicited a smaller MMN but a larger P3a in children with ASD. It has been hypothesized that the larger P3a reflects increased recruitment of neural resources to compensate for less efficient automatic processing of salient sounds that lie outside the current focus of attention [124].
Second, despite altered vowel-evoked SPN in the auditory cortex of both hemispheres (Fig. 5), correlations with WiNam scores were found only in the left hemisphere (Table 2), indicating a specific relationship between the functional integrity of the left secondary auditory cortex and the ability to recognize words in fluctuating noise in children with ASD. Our previous study, which used the same stimuli to compare SPN responses in neurotypical children and adults [41], showed that a left-hemispheric asymmetry in the vowel-evoked SPN was present in adults but not in children, whose SPN amplitude was equal in both hemispheres. The correlation of WiNam scores with the left-hemispheric, but not the right-hemispheric, SPN to vowels in children with ASD suggests that some degree of left-hemispheric specialization for vowel processing is already present in childhood and possibly increases with age, driven by the need to integrate the encoding of vowel spectral composition with a predominantly left-lateralized language system (see [125] for the concept of graded hemispheric specialization).
The left STG region that most reliably distinguished ASD from TD participants in the present study is remarkably similar to the region found to be sensitive to sentence intelligibility, which in turn depends on slow temporal modulation of the speech signal at the syllable rate (3–4 Hz) [105]. Our results do not exclude a role of the left A4 region in sentence intelligibility, perhaps through top-down interactions between phonetic and higher-level (lexical, syntactic, working memory, etc.) processes, but they suggest that this region is tuned to the spectro-temporal composition of vowels and that weakening of this tuning hinders WiNam task performance.
Third, while the atypical left-hemispheric processing of vowels in children with ASD correlated with WiNam scores, it did not correlate with WiNst scores (Table 2). This pattern of correlations suggests that impaired vowel processing interferes with the ability of listeners with ASD to use dips in noise to capture acoustic cues. Psychoacoustic studies have shown that, in listeners with normal hearing, information important for word recognition in fluctuating noise is conveyed both by the temporal fine structure (TFS) of vowels, i.e., the carrier frequencies of formants, and by their common amplitude envelope (f0 / pitch) [52]. In our study, WiNam scores in children with ASD correlated with the SPN evoked by both periodic and non-periodic vowels (Table 2). Since both types of vowels have formant structure, but non-periodic vowels lack f0, atypical processing of formant frequencies seems to be a crucial factor contributing to the difficulties in perceiving words in AM noise in children with ASD. However, the late occurrence of the SPN reduction (> 150 ms after stimulus onset) and its location in the non-primary auditory cortex (left mid-STG region) suggest that poor “dip listening” is due to insufficient grouping of formants into a “vowel object” rather than to deficient decoding of TFS separately for each formant frequency. Consistent with this hypothesis, a recent study of older adults found that impaired central grouping of acoustic patterns is a major contributor to their deficits in processing speech in noise [126].
Our study has several limitations. First, we restricted the analysis to temporal cortical regions, where the amplitude of the response to sound is maximal, whereas important differences in the processing of linguistic stimuli in autism can also be observed outside the auditory cortex, for example in inferior frontal regions [135]. Second, we presented vowel stimuli, which are very special, overlearned conspecific auditory objects. It would be important to clarify whether the ASD-related SPN deficit is specific to vowels or is also observed for other auditory objects with constant or coherently changing frequency composition. Third, since we used simple words to test for speech-in-noise processing difficulties in ASD, one should be cautious about generalizing the findings to more complex linguistic constructions such as sentences. In sentences, speech recognition in noise may be supported by prosody, the slow (syllable-rate) envelope of the speech signal [52], and higher-order semantic cues [136], which are absent or less important for isolated words. Fourth, we did not control participants' attention to the auditory stimuli, so we cannot rule out that differences in attention allocation affected the results. Comparing responses in passive and active listening paradigms may help clarify the role of attention in the observed SPN differences. Fifth, only boys participated in this study. There are multiple sex differences in individuals with ASD (time of diagnosis, genetic burden, neurological and cognitive abnormalities) that may influence the variables investigated here [137]. Our sample size did not allow us to analyze the effect of sex; we therefore limited our sample to males, who constitute the majority of individuals diagnosed with ASD. More research is needed to determine whether the findings extend to girls with ASD.
Directions for future research
Word recognition in amplitude-modulated noise depends on multiple integrative processes occurring at different levels of the brain hierarchy and involving numerous feedforward, recurrent, and top-down interactions [127, 128]. In the highly heterogeneous ASD population, difficulties with speech perception in noise may arise for a variety of reasons, attributable to impairments at different stages of the auditory pathway or at higher hierarchical levels. Thus, our results indicating a role of impaired processing of vowel formant structure in WiN perception deficits in children with ASD do not exclude the contribution of other factors. In some children with autism, poor WiN recognition may be due to deficits already present at the subcortical level [22–24, 129, 130], as indexed by the frequency-following response to speech sounds [131]. It will be important to investigate whether impaired analysis of the temporal fine structure (TFS) of sound [132] in the brainstem and deficient cortical processes leading to the formation of auditory objects contribute independently to poor WiN performance in children with ASD. Our findings do not rule out the “cognitive” hypothesis, which, based on behavioral results, attributes poor masking release in individuals with ASD to a weakness of the domain-unspecific mechanisms that integrate glimpsed fragments into meaningful speech [55, 56]. However, since our study was not designed to test this hypothesis, additional neuroimaging research is warranted to address this issue.
Impaired speech-in-noise hearing is one of the central symptoms of auditory processing disorder (APD), i.e., difficulties in recognizing and interpreting sounds that result from central auditory nervous system dysfunction [133], which are often seen in children with ASD and other neurodevelopmental disorders [134]. Determining the level of speech signal analysis at which this dysfunction occurs is important for developing effective, personalized interventions for auditory processing abnormalities, not only in ASD but also in other neurodevelopmental disorders. In this respect, our findings contribute to an emerging profile of children with developmental listening difficulties that may be caused by abnormal speech processing at different levels of the central nervous system.