The present study aimed to investigate the impact of hearing loss on adaptation to noise in speech recognition. We found that, in aided conditions, the SRT for words in competition with SSN worsens with increasing hearing loss, for both natural and vocoded words. We also found that adaptation to noise decreases as hearing loss increases, again for both natural and vocoded words. These relationships remained when the effect of age on PTA was partialled out, which indicates that the loss of adaptation is related to hearing loss per se rather than to ageing.
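The age adjustment mentioned above can be expressed as a partial correlation: correlate the residuals of PTA and of the adaptation measure after regressing each on age. A minimal Python sketch with synthetic data (variable names and all numeric values are illustrative, not the study's):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after partialling out z:
    correlate the residuals of x~z and y~z (all 1-D arrays)."""
    Z = np.column_stack([np.ones_like(z), z])  # design matrix with intercept
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residuals of x given z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residuals of y given z
    return np.corrcoef(rx, ry)[0, 1]

# Synthetic, illustrative data only (not the study's values):
rng = np.random.default_rng(0)
age = rng.uniform(20.0, 80.0, 40)
pta = 0.5 * age + rng.normal(0.0, 8.0, 40)              # PTA worsens with age
adaptation = 4.0 - 0.05 * pta + rng.normal(0.0, 1.0, 40)  # adaptation shrinks with PTA

r = partial_corr(pta, adaptation, age)  # PTA-adaptation link with age removed
```

The residual method is algebraically equivalent to the textbook partial-correlation formula, so either route gives the same coefficient.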
The loss of adaptation to noise adds to the loss of speech information
Previous studies have shown that hearing loss impairs the encoding of and thus the access to speech TFS (Hopkins et al., 2008; Kale and Heinz, 2010; Lorenzi et al., 2006; Moore, 2008, 2011) and envelope information (Füllgrabe et al., 2003; Kale and Heinz, 2010; Ozmeral et al., 2018). It also degrades the resolution of the long-term speech spectrum in the auditory system because of poor frequency selectivity (Başkent, 2006; Henry et al., 2005; Leek and Summers, 1996; Noordhoek et al., 2000; Stelmachowicz et al., 1985; Turner et al., 1999). In addition, hearing loss at frequencies higher than those commonly measured in the audiogram (8-20 kHz) contributes to impaired speech recognition in broadband noise (Zadeh et al., 2019). Here, we demonstrate that hearing loss not only impairs access to speech acoustic cues, but also impairs the ability of listeners to adapt to the noise.
Figure 4 illustrates the contribution of various factors to the loss of intelligibility for the present HI listeners with the largest hearing losses (supplementary Fig. S1 shows similar data for participants with smaller losses). Panel A shows the SRTs for the six NH listeners with the best PTA thresholds (from 0 to 3 dB HL; mean = 1.7 dB HL) and for the six HI listeners with the worst PTA thresholds (from 64 to 83 dB HL; mean = 70.0 dB HL). NH listeners showed better SRTs for natural than for vocoded speech, and even better SRTs when they had the opportunity to adapt to the noise. In contrast, HI listeners showed worse SRTs than NH listeners (p<0.05) and virtually constant SRTs across conditions (p≥0.293). The results can be interpreted as follows. First, because the vocoder preserved only the speech envelope and spectral information below 8.5 kHz (the highest cutoff frequency of the filter bank in the vocoder), the worse SRTs for vocoded words for HI than for NH listeners (VocNoPrec in Fig. 4A,B) reveal an impaired ability of HI listeners to use envelope and/or spectral information below 8.5 kHz. This impairment explains 52% of the total SRT loss (Fig. 4C). Second, because the recognition of natural speech depends on all the same cues as the recognition of vocoded speech plus TFS (Moore, 2011) and high-frequency (>8.5 kHz) information (Zadeh et al., 2019), the difference between natural and vocoded words reveals the ability to use TFS and high-frequency spectral information. Because NH but not HI listeners showed better SRTs for natural than for vocoded words (VocNoPrec vs. NatNoPrec in Fig. 4A), our results suggest that NH but not HI listeners benefit from adding TFS or high-frequency spectral information to speech. The lack of access to TFS and high-frequency spectral speech information explains 38% of the total SRT loss (Fig. 4C). Lastly, the improvement in SRT when adding a precursor shows the benefit of adaptation to noise. NH but not HI listeners showed adaptation (Fig. 4A), and the loss of adaptation explains the remaining 10% of the total SRT loss (Fig. 4C). Altogether, the present results show that HI listeners have difficulty recognizing audible speech in noise not only because they are less able to encode and use speech acoustic cues but also because they are less able to adapt to the noise background.
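The partition just described is simple arithmetic on condition-mean SRTs, and can be sketched as follows. The condition labels follow the logic of Fig. 4, but the numbers below are invented for illustration and do not reproduce the study's 52/38/10% split:

```python
# Hypothetical mean SRTs in dB SNR (lower = better); values are made up.
nh = {"VocNoPrec": -4.0, "NatNoPrec": -7.0, "NatPrec": -8.0}  # normal hearing
hi = {"VocNoPrec": 4.0, "NatNoPrec": 4.0, "NatPrec": 4.0}     # hearing impaired

total_loss = hi["NatPrec"] - nh["NatPrec"]  # overall SRT elevation for HI

# Component 1: impaired use of envelope/spectral cues below 8.5 kHz
env_spec = hi["VocNoPrec"] - nh["VocNoPrec"]
# Component 2: lost benefit from TFS / high-frequency information
tfs_hf = (nh["VocNoPrec"] - nh["NatNoPrec"]) - (hi["VocNoPrec"] - hi["NatNoPrec"])
# Component 3: lost benefit from adaptation to the noise precursor
adapt = (nh["NatNoPrec"] - nh["NatPrec"]) - (hi["NatNoPrec"] - hi["NatPrec"])

shares = {name: 100.0 * value / total_loss
          for name, value in [("envelope/spectral", env_spec),
                              ("TFS/high-freq", tfs_hf),
                              ("adaptation", adapt)]}
```

By construction the three components sum exactly to the total SRT loss, so the percentage shares always sum to 100 whatever the condition means are.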
On the reduced adaptation to noise for HI listeners
We have found that hearing loss impairs adaptation to noise (Figs. 3 and 4). Multiple mechanisms could underlie this result (Marrufo-Pérez and Lopez-Poveda, 2022; Willmore and King, 2023). Among them may be smaller MOC reflex (MOCR)-mediated adjustments of the cochlear gain in the damaged cochlea. MOC fibers are reflexively activated by sound with a time course of 277±62 ms (Backus and Guinan, 2006). MOC efferents terminate on OHCs, and their activation inhibits the cochlear amplifier gain, linearizing BM responses and reducing compression (Murugasu and Russell, 1996). Jennings et al. (2018a) reasoned that HI listeners do not show adaptation in AM detection because they have more linear BM responses and hence less scope for MOCR-mediated BM linearization. Smaller adaptation to noise in speech recognition might likewise occur if reduced MOCR-mediated BM linearization produces less enhancement of the speech envelope at the output of the BM in HI listeners. Some studies, however, have shown that adaptation to noise in speech recognition or AM detection can occur without MOCR effects (Marrufo-Pérez et al., 2018a, 2019).
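The envelope-enhancement argument can be illustrated with a toy broken-stick BM input/output function (all parameter values below are invented for illustration, not fitted to data): compressive growth shrinks the dB contrast between envelope peaks and troughs, whereas a gain-reduced, linearized response preserves it.

```python
import numpy as np

def bm_output_db(level_db, gain_db=40.0, knee_db=30.0, ratio=0.25):
    """Toy broken-stick BM input/output in dB: linear growth with gain_db
    below knee_db, compressive growth (slope = ratio) above it."""
    return np.where(level_db <= knee_db,
                    level_db + gain_db,
                    knee_db + gain_db + ratio * (level_db - knee_db))

# Hypothetical speech-envelope trough and peak levels (dB SPL), both above knee.
env_in = np.array([50.0, 70.0])

compressed = bm_output_db(env_in)  # full cochlear gain, compressive
# Crude stand-in for MOCR action: reduced gain and a fully linear slope.
linearized = bm_output_db(env_in, gain_db=20.0, ratio=1.0)

contrast_compressed = compressed[1] - compressed[0]  # peak-trough contrast (dB)
contrast_linear = linearized[1] - linearized[0]
```

With these numbers the 20-dB input contrast survives intact through the linearized response but shrinks to 5 dB through the compressive one, which is the sense in which MOCR-mediated linearization could "enhance" the speech envelope.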
Another proposed mechanism for adaptation to noise is a shift of the dynamic range of auditory neurons toward the most common level in the noise preceding the word (Ainsworth and Meyer, 1994; Marrufo-Pérez et al., 2018a, 2019, 2020; Marrufo-Pérez and Lopez-Poveda, 2022). When auditory neurons are presented with a noise whose level varies little around a most common value, they shift their dynamic ranges toward that level, so long as it is above the neuron's threshold. This adaptation increases a neuron's sensitivity to changes in sound level (Dean et al., 2005; Wen et al., 2009). The improvement in sensitivity to level changes, however, is smaller when the variance in the stimulus level is large (Rabinowitz et al., 2011). Because audiometric hearing loss is often associated with the loss or dysfunction of inner hair cells (IHCs) and OHCs (Liberman and Dodds, 1984; Wu et al., 2020), it is possible that reduced peripheral auditory compression due to OHC loss makes the IHC receptor-potential representation of the noise more fluctuating than normal. This could leave neurons without a prevailing level to adapt to, and thus result in less adaptation to noise for HI than for NH listeners (Marrufo-Pérez et al., 2020). It remains to be shown, however, whether the inherent fluctuations of a steady noise at the output of a linear BM response are large enough to impair dynamic range adaptation to sound-level statistics.
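The dynamic-range shift can be sketched with a generic sigmoidal rate-level function (parameters invented for illustration): moving the function's midpoint toward the prevailing noise level maximizes the neuron's sensitivity, i.e., the local slope in spikes/s per dB, at that level.

```python
import numpy as np

def rate(level_db, midpoint_db, max_rate=200.0, slope=0.3):
    """Toy sigmoidal rate-level function for an auditory neuron (spikes/s)."""
    return max_rate / (1.0 + np.exp(-slope * (level_db - midpoint_db)))

def sensitivity(level_db, midpoint_db, d=0.5):
    """Local slope around level_db (spikes/s per dB), by central difference."""
    return (rate(level_db + d, midpoint_db)
            - rate(level_db - d, midpoint_db)) / (2.0 * d)

noise_level = 65.0  # hypothetical most common level in the precursor (dB SPL)

s_before = sensitivity(noise_level, midpoint_db=30.0)        # unadapted
s_after = sensitivity(noise_level, midpoint_db=noise_level)  # midpoint shifted
```

Before adaptation the neuron is saturated at the noise level and its slope there is nearly zero; after shifting its midpoint onto the prevailing level, the slope approaches the sigmoid's maximum (max_rate x slope / 4 = 15 spikes/s per dB with these parameters).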
A third potential mechanism is related to the disruption of the cues that listeners use to segregate the speech and noise stimuli. Jennings et al. (2018b) measured detection thresholds for short pure tones presented 2 or 197 ms after the onset of a 400-ms narrowband noise masker with either a flattened or an inherently fluctuating temporal envelope. They found that, when the probe was delayed in the noise, detection thresholds improved for tones presented in the flattened noise but worsened for tones presented in the fluctuating noise. Jennings et al. (2018b) reasoned that listeners relied on a temporal envelope-based cue to detect the probe and that the fluctuations of the preceding noise disrupted this cue. The amplitude fluctuations of a steady noise can also impair speech recognition when the noise is presented simultaneously with the speech, presumably because it is hard to distinguish the noise fluctuations from the amplitude fluctuations that convey speech information (Stone et al., 2011, 2012). It is uncertain, however, whether precursor fluctuations can hinder speech recognition. If they did, the more linear BM responses of HI listeners would enhance the precursor noise fluctuations, making it harder for HI listeners to distinguish noise from speech fluctuations and resulting in smaller adaptation for HI than for NH listeners. This does not seem to be the case here, however: although SRTs were sometimes worse with the precursor, the worsening occurred for both NH and HI listeners and therefore does not appear to be related to hearing loss (points below zero in Fig. 3C, D).
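For readers wishing to generate stimuli of this kind, a common way to flatten the inherent envelope of a narrowband noise is to divide the noise by its Hilbert envelope. The sketch below (all parameters arbitrary; this is a generic construction, not necessarily the exact procedure of Jennings et al., 2018b) shows that a single division sharply reduces the envelope's coefficient of variation:

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

fs = 16000                     # sampling rate (Hz)
rng = np.random.default_rng(1)

# Narrowband noise: 1 s of Gaussian noise band-passed around 1 kHz.
sos = butter(4, [900.0, 1100.0], btype="bandpass", fs=fs, output="sos")
noise = sosfiltfilt(sos, rng.standard_normal(fs))

env = np.abs(hilbert(noise))                 # inherent temporal envelope
flattened = noise / np.maximum(env, 1e-12)   # divide out the envelope

env_flat = np.abs(hilbert(flattened))
cv_inherent = env.std() / env.mean()         # envelope fluctuation, original
cv_flat = env_flat.std() / env_flat.mean()   # envelope fluctuation, flattened
```

Note that dividing by the envelope broadens the spectrum slightly; fully spectrally matched "low-noise noise" stimuli typically iterate the filter-and-divide steps, which is omitted here for brevity.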
Implications
The present study shows that adaptation to noise in speech recognition decreases with increasing hearing loss, even when the potential confounding effect of age is factored out. This finding is relevant in at least two ways. First, as explained earlier, HI listeners find it harder to recognize audible speech in noisy settings than NH listeners (Başkent et al., 2006; Duquesnoy and Plomp, 1983; Hopkins et al., 2008; Lopez-Poveda et al., 2014; Moore et al., 1999; Summers et al., 2013). Impaired access to speech acoustic cues (envelope, TFS, spectrum) has been shown to contribute to this impaired intelligibility. The present study reveals that the impact of hearing loss on speech recognition can be underestimated if adaptation is disregarded (Fig. 4). Second, research on the factors that affect speech-in-noise intelligibility (for both NH and HI listeners) is often conducted without regard to adaptation or, more generally, to temporal effects. Indeed, the speech-to-noise onset delay varies widely across studies (e.g., 500 ms in Johannesen et al., 2016; 3 s in Souza et al., 2019), is sometimes not even reported (Saunders and Forsline, 2006; Tognola et al., 2019; Wu et al., 2021a, 2021b), and is rarely justified. The present study suggests that the relevance of some factors may differ depending on when the speech is presented relative to the noise onset. For instance, cochlear mechanical dysfunction does not predict intelligibility in speech maskers for HI listeners (e.g., Lopez-Poveda et al., 2017), but it could predict the SRT impairment related to a loss of adaptation to noise.