Abstract
Reading is a recent cultural invention that exploits the intrinsic abilities of the visual system to process text. However, the neural mechanism that enables us to read efficiently remains unclear. Fluent reading could arise from the formation of specialized detectors for letter combinations. Alternatively, the representation of words could be more compositional, like the default representation in visual cortex, wherein the neural response to an object can be predicted from the responses to its parts. Here, we provide evidence for the latter hypothesis by constructing a model in which the response to a string is predicted from single-letter responses. This model is purely visual and does not incorporate any linguistic factors.
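The compositional idea can be illustrated with a minimal sketch. The abstract states only that a string's response is predicted from single-letter responses. Everything below is an illustrative assumption, not the authors' model: the letter response vectors are random stand-ins (hypothetical `make_letter_responses`), the string response is a position-weighted sum of letter vectors (hypothetical `string_response`, default weights `1/(i+1)`), and dissimilarity is Euclidean distance.

```python
import math
import random

def make_letter_responses(alphabet="abcdefghijklmnopqrstuvwxyz", n_units=8, seed=0):
    """Hypothetical single-letter responses: one random vector per letter.
    In real data these would be estimated from measurements, not sampled."""
    rng = random.Random(seed)
    return {ch: [rng.gauss(0.0, 1.0) for _ in range(n_units)] for ch in alphabet}

def string_response(s, letters, weights=None):
    """Assumed compositional rule: the string's response is a weighted sum of
    its letters' responses, with weights that depend on letter position."""
    n_units = len(next(iter(letters.values())))
    if weights is None:
        # Hypothetical position weights: earlier letters contribute more.
        weights = [1.0 / (i + 1) for i in range(len(s))]
    out = [0.0] * n_units
    for w, ch in zip(weights, s):
        for k, r in enumerate(letters[ch]):
            out[k] += w * r
    return out

def dissimilarity(a, b):
    """Euclidean distance between two response vectors (an assumed metric)."""
    return math.dist(a, b)

if __name__ == "__main__":
    letters = make_letter_responses()
    word = string_response("word", letters)
    for other in ["word", "wrod", "ward"]:
        d = dissimilarity(word, string_response(other, letters))
        print(f"word vs {other}: {d:.3f}")
```

Position-dependent weights matter here: with equal weights, anagrams such as "word" and "wrod" would receive identical responses, so a purely additive code could not distinguish a word from its transposed-letter nonword.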
We tested this model's ability to predict human performance in two tasks. The first was visual search, in which subjects had to find an oddball target string embedded among distractors. The second was lexical decision, in which subjects had to indicate whether a given string was a word or not. In both tasks, the model predicted human performance accurately without invoking any lexical or linguistic factors. To investigate the underlying neural correlates, we measured brain activity using fMRI while subjects performed a lexical decision task. We found that dissimilarities between words and nonwords in visual search corresponded best with neural dissimilarities in the Lateral Occipital region (LO). By contrast, lexical decision times, which were best predicted by word-nonword dissimilarities in the compositional model, were best matched by the overall activation of the Visual Word Form Area (VWFA). Thus, viewing a string of letters activates a compositional code in higher visual areas, and subsequent decisions about its lexical status are computed in the VWFA.