About 30% of stroke patients worldwide suffer from language disorders, and in the majority of cases the impairment remains chronic [1]. Anomia, or word-finding difficulty, is a ubiquitous characteristic of aphasia that significantly compromises the communication and quality of life of individuals affected by stroke [2], [3]. Consequently, aphasia rehabilitation largely incorporates strategies that foster the recovery of impaired naming and communication by facilitating access to linguistic content.
Mapping a lexical concept onto a verbal structure requires multiple steps [4], [5]. First, there is the intention to articulate a specific concept in speech, followed by so-called lexical access, which consists of the retrieval of a target word from the lexicon [6], [7]. At this stage, the selected concept activates the target lemma, i.e., the semantic and syntactic properties of the lexical item [8], which in turn triggers the phonological system that defines the speech form. The latter supports verbal execution, in which the articulatory shape of a word, in the context of other words, forms a sentence-like utterance [9]. In stroke-induced aphasia, depending on the lesion site and extent, some or all stages of this naming process may be impaired, leading to high variability in language deficits among affected individuals. Consequently, standard naming therapy, or so-called cueing, is designed to address different phases and aspects of both retrieval and production [3]. For example, the well-established phonological cueing approach targets the ability to retrieve the phonemes underlying the articulation of a word [10], [11]. To this aim, patients are given verbal cues that provide the initial sound(s) of the target word (e.g., “p” for “pancake”). Another therapeutic method is semantic cueing, which targets the activation of lexical-semantic association networks [12], [13]. As such, semantic cueing consists of providing information that categorizes, describes, or defines target words (e.g., “it goes well with maple syrup” for a pancake).
In the clinical context, cueing is considered beneficial because it facilitates naming, resulting in higher accuracy and faster reaction times in speech production. Indeed, phonological, semantic, and mixed approaches substantially improve not only immediate but also long-term naming performance, as well as functional communicative effectiveness [14]–[18]. Critically, similar effects are reported when the cues are administered through technology-based methods, even to individuals with persisting aphasia [19]–[23]. This finding is particularly relevant in the context of the rapid advancement of self-managed, computer-based exercises for individuals with aphasia, which are becoming widely tested and used not only as part of clinical inpatient care during the acute and subacute stages but also after hospital discharge, in patients’ homes [24].
The beneficial effects of cueing, whereby the naming of target words becomes faster and more accurate, are usually attributed to priming mechanisms occurring within the residual bilateral language network [25]–[27]. Depending on the type of cue administered (e.g., initial phoneme, full word), imaging studies report increased activity in regions including the right anterior insula, inferior frontal, and dorsal anterior cingulate cortices, as well as the left premotor cortex [28]. One account holds that cueing elicits activation of lexical representations at the phonological and semantic levels in a selective manner [29], thus enabling the recovery of phonological or semantic deficits, respectively. This hypothesis, however, seems at odds with the notion that during therapeutic tasks such as picture naming, semantic information contained in the stimuli might automatically activate phonological information, and vice versa [30], [31]. This apparent contradiction can be reconciled by the interactive activation approach to word production, which proposes that lexical retrieval occurs within a distributed language network in which nodes are connected across semantic, lexical, and phonological levels of representation in a feedforward and feedback manner (i.e., bidirectionally) [5]. Indeed, an analysis of the language connectome in both healthy controls and brain tumor patients revealed a broad network spanning about 25% of the total human connectome [32]. Given this architecture, therapy-induced stimulation at the level of the semantic system can activate phonological and orthographic processing, and vice versa. This, in turn, may explain why several studies report higher efficacy of combined (i.e., mixed) cueing therapy than of semantic or phonological primes delivered independently [18], [30].
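The bidirectional spreading of activation posited by the interactive activation account can be illustrated with a toy simulation. The following sketch is purely illustrative and not the model of [5]: all node names, weights, and the decay parameter are hypothetical, chosen only to show how a semantic cue can raise activation at the phonological level via the shared lexical node.

```python
# Toy interactive-activation network: semantic, lexical, and phonological
# nodes linked bidirectionally (feedforward and feedback). Hypothetical
# example, not the authors' model or any published parameterization.

# Bidirectional connections: (node_a, node_b, weight)
EDGES = [
    ("sem:breakfast", "lex:pancake", 0.6),
    ("sem:syrup",     "lex:pancake", 0.5),
    ("lex:pancake",   "pho:/p/",     0.7),
    ("lex:pancake",   "pho:/ae/",    0.7),
]

def spread(activation, edges, decay=0.5, steps=3):
    """Spread activation for a few discrete steps; each link acts in both
    directions, so semantic input can reach phonology and vice versa."""
    act = dict(activation)
    for _ in range(steps):
        nxt = {node: value * decay for node, value in act.items()}
        for a, b, w in edges:
            nxt[b] = nxt.get(b, 0.0) + w * act.get(a, 0.0)  # feedforward
            nxt[a] = nxt.get(a, 0.0) + w * act.get(b, 0.0)  # feedback
        act = nxt
    return act

# A purely semantic cue (e.g., "goes well with maple syrup") activates
# only semantic nodes...
result = spread({"sem:syrup": 1.0, "sem:breakfast": 1.0}, EDGES)
# ...yet activation reaches the phonological level via the lexical node.
assert result["pho:/p/"] > 0.0
```

Running the cue through a few spreading steps leaves the target lemma most active while its phonemes also receive activation, which is the qualitative pattern the interactive account uses to explain cross-level priming.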
Further supporting this network interpretation is the observation that speech perception is governed by general principles of statistical inference across all available perceptual sources [33], suggesting that similar principles of Bayesian inference are involved in cueing-based rehabilitation strategies.
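To make the inference principle concrete, a standard textbook formulation of Bayesian cue combination (a schematic sketch, not drawn from [33]) treats each perceptual source $c_i$ as an independent likelihood over the intended percept $s$:

```latex
p(s \mid c_1, \dots, c_n) \;\propto\; p(s) \prod_{i=1}^{n} p(c_i \mid s),
\qquad
\hat{s} = \sum_{i} w_i \,\hat{s}_i, \quad
w_i = \frac{\sigma_i^{-2}}{\sum_{j} \sigma_j^{-2}}
```

Under Gaussian assumptions, the combined estimate $\hat{s}$ is a precision-weighted average of the single-cue estimates, so adding any informative cue (e.g., a phonological or semantic prime) sharpens the posterior over candidate words.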
In this study, we aim to test the inference-based network perspective on language and its deficits. To this end, we propose two novel cueing strategies and investigate their effects on naming in the context of a within-subjects longitudinal clinical study with post-stroke aphasia patients. On the one hand, we investigated the so-called Silent Visuomotor Cues (SVC) strategy. SVCs provide articulatory information about target words in the form of silent videos displaying the lip movements of a speech and language therapist during naming [34]. On the other hand, we studied Semantic Auditory Cues (SAC). Here, the primes consisted of acoustic information semantically relevant to the target words, such as the sound of ringing for “telephone” or the sound of an engine revving up for “car.”
First, the motivation to investigate SVC was grounded in neurophysiological evidence that strongly supports the notion of perceptual functions of speech production centers. In particular, it has been demonstrated that part of the ventrolateral frontal cortex in humans (Brodmann’s area 44), initially thought to be engaged exclusively in the control of motor aspects of speech production [35], [36], is also involved in the processing of orofacial gestures [37], [38]. This is well illustrated in a magnetoencephalography (MEG) study in which the authors compared the activation of the human Mirror-Neuron System (MNS), including Broca’s area, during execution, observation, and imitation of verbal and nonverbal lip forms [38]. The stimuli were presented as static images illustrating orofacial gestures that merely imply action (i.e., motionless). Interestingly, the results yielded strong responses evoked bilaterally in the MNS, including Brodmann’s areas 44/45 (Broca’s area), during pure perception of lip forms. This finding demonstrates that viewing visual orofacial stimuli is sufficient to trigger activity in the distributed language network, including areas involved in word-finding and speech production. We therefore hypothesized that providing SVC, that is, muted videos presenting lips articulating a target word, might improve verbal performance, suggesting improved retrieval in participants with aphasia.
Second, we aimed to empirically explore the effects of SAC on lexical access and verbal execution in the same group. We chose to study whether semantically relevant sounds positively impact naming based on the notion of an embodied, inference-driven language network, which proposes that auditory and conceptual brain systems are neuroanatomically and functionally coupled [39], [40], driven by the statistics of real-world interaction [41], [42]. Specifically, a functional magnetic resonance imaging (fMRI) study [39] revealed that cortical activations induced by listening to sounds of objects and animals (e.g., ringing or barking) overlap with activations induced by merely reading words that denote entities with auditory features (e.g., “telephone,” “dog”). The authors reported this overlap in the posterior superior temporal gyrus (pSTG) and middle temporal gyrus (MTG), suggesting that common neural sources underlie auditory perception and the processing of words with acoustic features. Critically, the MTG plays a significant role within the brain’s language network during syntactic processing in both comprehension and production of speech [43]. For example, the MTG was shown to subserve the retrieval, including selection and integration, of lexical–syntactic information in a syntactic ambiguity resolution task [44], [45]. Interestingly, the pSTG is also involved in speech production, as evidenced by clinical studies of conduction aphasia [46] as well as behavioral and imaging experiments with healthy subjects performing tasks that included word generation [47], reading [48], syllable rehearsal [49], and naming [50]–[52]. Hence, we hypothesized that providing aphasia patients with SAC may facilitate naming, possibly by activating brain regions involved in language production.
By analogy with phonological and semantic cueing, we reasoned that if the proposed SVC and SAC strategies are beneficial for the recovery of anomic disturbances in aphasia, they will foster naming accuracy and communication skills. We delivered and tested the efficacy of both types of cues in the context of a longitudinal clinical intervention in which participants underwent peer-to-peer Virtual Reality (VR)-based language therapy using the Rehabilitation Gaming System for aphasia (RGSa) [22], which incorporates principles of Intensive Language Action Therapy (ILAT) [53], [54].