Summary of findings
We found electrodes responsive to visual word presentation across occipital, temporal, parietal, and frontal lobes with a wide variety of response shapes, including both increases (enhancement) and decreases (suppression) in high gamma power. Responses in occipital lobe and fusiform gyrus were strongest, fastest, and had the highest percentage of enhanced responses, while frontal subregions had the slowest responses with the lowest percentage of enhanced responses. We further analyzed the diversity of temporal response shapes among electrodes using agglomerative clustering and found both sustained and transient responses at a variety of latencies. Finally, we analyzed neural sensitivity to a hierarchy of word features, including visual, phonological, lexical, and semantic features, to provide a detailed account of stimulus encoding over time and space. Anatomically, we found early occipital representation of visual features, concurrent in lingual gyrus with sensitivity to word neighborhood size, followed shortly by sensitivity in fusiform gyrus to frequency, letter, and phoneme, sensitivity in inferior frontal gyrus to frequency and semantics, and late representation of several features along the lateral sulcus. Furthermore, the electrode populations with different temporal response shapes revealed by our clustering analysis showed significant encoding of stimulus features: we found a broadly sensitive, enhanced cluster with early representation of visual, phonological, and lexical features; three separate mid-latency enhanced clusters with specialized sensitivity to frequency, semantics, and phonemes and letters, respectively; and sparse representation of semantics in a mid-latency suppressed cluster. Taken together, our results provide evidence of both feed-forward and feedback processing during visual word recognition and demonstrate that stimulus encoding can be achieved by anatomically distributed networks.
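The clustering approach referenced above can be illustrated with a minimal sketch: agglomerative (hierarchical) clustering applied to an electrodes-by-timepoints matrix of response time courses, grouping electrodes by response shape. The data below are synthetic, and the choices of Ward linkage and a two-cluster cut are hypothetical, not those of the study.

```python
# Illustrative sketch only: agglomerative clustering of per-electrode
# high-gamma time courses (electrodes x timepoints). Synthetic data;
# linkage method and cluster count are hypothetical choices.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)                   # 0-1 s after word onset
sustained = np.exp(-((t - 0.4) / 0.5) ** 2)      # slow, sustained shape
transient = np.exp(-((t - 0.15) / 0.05) ** 2)    # fast, transient shape

# 10 noisy "electrodes" of each response type
X = np.vstack(
    [sustained + 0.05 * rng.standard_normal(t.size) for _ in range(10)]
    + [transient + 0.05 * rng.standard_normal(t.size) for _ in range(10)]
)

# z-score each time course so clustering reflects shape, not amplitude
Xz = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

Z = linkage(Xz, method="ward")                   # hierarchical merge tree
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
print(labels)
```

With clearly distinct shapes and low noise, the two simulated response families fall into separate clusters; on real data, the number of clusters and the distance metric would need to be chosen and validated carefully.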
Spatiotemporal feature sensitivity
Our results provide a detailed map of feature representation in time and space that builds upon an extensive literature on visual word recognition. To that end, it is useful to note how our observations compare with previous findings. Previous studies have implicated occipital lobe, fusiform gyrus, and the N150 in orthographic processing; we found low-level visual sensitivity occurring within 100 ms in lingual gyrus and in the early, enhanced functional cluster 2, which might underlie an early orthography-specific ERP, though we do not see letter sensitivity in fusiform until 380 ms. Our observation of early phoneme sensitivity is plausibly consistent with the finding that the N250 is modulated by phonological content 24,31–34, while late phonological encoding in precentral gyrus is consistent with literature suggesting its involvement in phonological decoding 6,9,52. Notably, we find that phonological sensitivity is similar to, but not precisely overlapping with, letter sensitivity, and in fact precedes letter sensitivity in most areas. These similarities could be driven by the relatively high correlation between letter and phoneme information in English generally and in our stimulus set specifically (S3); a larger set of test words exhibiting contrasting attributes of each hierarchical type could better tease these factors apart in future studies.
We observe substantial lexical sensitivity both for orthographic neighborhood size and for word frequency. Neighborhood sensitivity largely overlaps with bitmap sensitivity in lingual gyrus and the broadly representative cluster 2, suggesting that visual processing may coincide with the activation of orthographically similar words, at higher spatial resolution and shorter latency than previously found (150 ms 45). Our observation of frequency sensitivity is consistent with previous studies implicating word frequency in fusiform gyrus 22,53 and inferior frontal gyrus 22, and in the range of 100–200 ms 46–48. We further demonstrate that frequency sensitivity is specifically represented by the anatomically distributed cluster 4 at 160 ms. Our results suggest that lexical features are represented early and robustly, and are distributed across multiple brain areas.
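The orthographic neighborhood size feature discussed above is typically defined as the number of same-length words in the lexicon that differ from the target by exactly one letter (Coltheart's N). A minimal sketch of that computation, using a tiny hypothetical lexicon rather than the study's stimulus set:

```python
# Illustrative sketch: orthographic neighborhood size (Coltheart's N),
# i.e., the count of same-length lexicon words differing from the
# target by exactly one letter. The lexicon here is hypothetical.
lexicon = {"cat", "bat", "hat", "cot", "cab", "dog", "cart"}

def neighborhood_size(word: str, lexicon: set[str]) -> int:
    """Count one-letter-substitution neighbors of `word` in `lexicon`."""
    return sum(
        len(w) == len(word)
        and sum(a != b for a, b in zip(w, word)) == 1
        for w in lexicon
    )

print(neighborhood_size("cat", lexicon))  # bat, hat, cot, cab -> 4
```

Words with many such neighbors are thought to co-activate orthographically similar lexical entries, which is one interpretation of the early neighborhood sensitivity reported here.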
Anatomically, semantic sensitivity is observed in areas reliably implicated in semantic processing, including inferior frontal gyrus, superior frontal gyrus, and middle temporal gyrus 9. Further, we observe semantic sensitivity as early as 180 ms in the anatomically heterogeneous functional cluster 5, which could indicate an anatomically distributed semantic code, consistent with recent landmark neuroimaging studies 54,55. This timing precedes the extensively studied, semantically sensitive N400 ERP 35, and is early enough to be compatible with the hypothesis that the N400 is the summation of the M250 and M350 51.
Interestingly, our analyses revealed several anatomical areas and a few clusters that did not show significant sensitivity to any of the selected features. There are several possible explanations for this. First, this may reflect a limitation of the experimental design: while the electrodes kept for analysis respond strongly to word presentation, our lack of a control task means that this activity may not be specific to words, but may instead reflect more general visual or cognitive processing. Second, our analysis across time, space, and subjects with varying levels of noise may simply lack the statistical power to detect all effects in our data. Third, our feature list is not exhaustive, and it is possible that these areas are sensitive to aspects of the stimulus that were not tested here.
Feed-forward vs. connectionist
Evaluating a hierarchy of features at simultaneous spatiotemporal resolution allows us to trace the flow of information through the brain and to contribute to the debate between the feed-forward and connectionist accounts of visual word recognition. The strictest form of the feed-forward model posits a functional cascade in which visual features are first decoded into letters, then mapped to an item in the orthographic lexicon (likely in fusiform gyrus) before additional representations can be activated 2–6. Notably, a study using an overlapping portion of this dataset found that the subsequent memory effect does appear to be distributed in a hierarchical, feed-forward stream 41. However, our anatomical results are incompatible with such a strict feed-forward view of visual word recognition in several ways: 1) we observe phonological sensitivity as early as visual sensitivity; 2) sensitivity to orthographic neighborhood size occurs very shortly after bitmap sensitivity and before letter sensitivity; 3) fusiform gyrus shows frequency sensitivity at 180 ms, followed by sensitivity to phoneme at 280 ms and only then to letter at 380 ms. Additionally, in our analysis of cluster sensitivity, neighborhood and frequency sensitivity preceded letter sensitivity. Contrary to the strict feed-forward model, these results suggest an early role for lexical and phonological representations, especially in occipital cortex and fusiform gyrus. However, neither do our results suggest a completely integrated response, as we do observe some systematic separation of features in time and anatomical location. Anatomically, we see bitmap, neighborhood, and phoneme representation emerge within 100 ms, followed shortly by frequency encoding at 180 ms, semantic encoding at 240 ms, and letter encoding at 380 ms. Functionally, we observe clusters with early multi-feature representation, but also individual clusters with feature-specific sensitivity at middle latencies.
Overall, these results suggest that the mechanisms underlying visual word recognition include both feed-forward and feedback processing.
Role of fusiform gyrus
Fusiform gyrus has received special attention in the reading literature and sits at the center of two major debates: first, whether its function is specific to visual word processing and selective to word identities 2,8,11–15,56, and second, whether it serves as a strictly feed-forward hub of orthographic information 2–7,19–23,28,53,57. In this study, we find significant fusiform sensitivity to frequency (consistent with previous studies 22,23,53) as early as 120 ms, to phoneme as early as 280 ms, and to letter at 380 ms. Further study is needed to determine whether this lexical and phonological sensitivity arises from top-down or bottom-up influences on fusiform gyrus, but our results demonstrate that fusiform is not limited to encoding orthographic information. This result may be consistent with a role in representing individual word identities. Further ECoG research using traditional false-font paradigms could address this question more directly.
Anatomically distributed feature encoding
The spatiotemporal resolution of iEEG allowed us to decouple encoding networks in our data from anatomical boundaries. Using clustering analysis, we uncovered a variety of anatomically heterogeneous temporal response shapes. Moreover, these functional populations showed significant feature sensitivity, suggesting that anatomically distributed networks can in fact be relevant to stimulus encoding. We find evidence for an early “hub” cluster that represents a combination of visual, phonological, and lexical features within 100 ms of word onset and may feed forward to other functional groups. By 180 ms, both frequency and semantic information have been decoded in separate populations. Whether by concurrent computation or by feedback, by 200 ms the hub cluster 2 has also come to represent word frequency. After this early, rapid processing, additional feature sensitivity emerges that could represent late-stage checking of the word identity 58. Notably, we observed clusters that encode multiple features across different timepoints as well as clusters with sensitivity to specific features. The notion of distributed encoding has gained recent traction in the human language literature, especially in the semantic domain 54,55,59. Further, distributed encoding could help explain the wide range of locations and timepoints implicated in our anatomical sensitivity analysis and in any literature that analyzes data by region of interest: if a particular feature is encoded by a network of similarly behaving neurons spread across multiple areas, anatomical analyses may detect feature encoding in several of the involved areas.
Limitations & future work
Limitations of this work include nonuniform coverage across subjects and brain areas, a constraint inherent to invasive neurophysiology in humans. Due to the limited time for testing and the experiment’s dual role as a memory test, the range and quantity of stimuli were limited, and many subjects saw each word only once; further work with repeated trials would enable useful decoding and reliability analyses. The task demands on the subject (to recall each list of words) may also affect the neural response; comparison with lexical decision or other reading tasks would allow assessment of the task's impact on processing. Including controls such as false fonts, consonant strings, or pronounceable pseudowords would allow more fine-grained study of particular word features and neural mechanisms.