Abstract
We investigated how visual processing and linguistic processing interact by determining the extent to which people's interpretations of visual scenes are affected by the sentences used to describe these scenes. Thirty-one college students watched captioned videos in which 10 shapes were moving. In a third of the trials, all 10 shapes moved randomly (“random” trials). In the remaining trials, one shape “chased” another shape (against a background of 8 randomly moving shapes), with the distance between the two shapes being either 60 pixels or 120 pixels.
A caption appeared under each video (including the “random” videos) that read “A SHAPE1 is VERB-ing a SHAPE2,” where VERB was 1 of 4 verbs (chase, flee, lead, or follow). Notice that all 4 verbs can be used to describe the same visual scene, albeit from different perspectives. After each video, subjects answered the question “Was a SHAPE1 VERB-ing a SHAPE2?”
Analyses revealed that how quickly a “chasing” event was detected WAS affected by visual characteristics of the videos (e.g., whether there was chasing and, if so, the distance between the shapes), but was NOT affected by the linguistic content of the captions (the sentence's verb and veracity). This finding is consistent with linguistic information not affecting the visual percept of an event. Conversely, how quickly a query was answered WAS affected by the linguistic content of the captions (the sentence's verb and veracity), but was NOT affected by the distance between the objects (a strictly visual characteristic that was not encoded in the caption). This suggests that perceptual information has relatively little residual impact on a linguistic task once the visual percept is encoded. Taken as a whole, these results suggest that, at least during the early phases of processing, visual processing and linguistic processing are fairly independent of one another.