Abstract
Real-world scenes are filled with objects representing not only visual information but also meanings and semantic relations with other objects in the scene. The guidance of eye movements by visual appearance (low-level visual features) has been well studied in terms of both its bottom-up and top-down aspects. However, the effects of object meaning and object relations on eye movements have received comparatively little attention, because several hurdles complicate their study: (1) object segmentation is difficult, (2) semantic relations among objects are hard to define, and (3) a quantitative measure of semantic guidance has to be developed.
This is the first study to measure semantic guidance, made possible by the work of two other research groups: the development of the LabelMe object-annotated image database (Russell et al., 2008) and the LSA@CU text/word latent semantic analysis tool, which computes the conceptual distance between two terms (Landauer et al., 1998). First, we generated a series of semantic salience maps, one for each eye fixation, approximating the transition probabilities of the following saccade to the other objects in the scene under the assumption that eye movements were entirely guided by the semantic relations between objects. Subsequently, these semantic salience maps were combined, and the final strength of guidance was measured by a Receiver Operating Characteristic (ROC) analysis, which computes the extent to which the actual eye fixations followed the ideal semantic salience map.
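To make this procedure concrete, the following is a minimal Python sketch of the core computation under stated assumptions: `lsa_similarity` is a hypothetical stand-in for the LSA@CU tool, and the scene, object masks, and similarity values are illustrative toys, not data or code from the study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def lsa_similarity(label_a: str, label_b: str) -> float:
    """Hypothetical stand-in for the LSA@CU tool: returns a semantic
    similarity in [0, 1] between two object labels. In the actual study
    this value comes from latent semantic analysis of text corpora."""
    toy = {("plate", "cup"): 0.8, ("plate", "window"): 0.2,
           ("cup", "window"): 0.1}
    if label_a == label_b:
        return 1.0
    return toy.get((label_a, label_b), toy.get((label_b, label_a), 0.3))

def semantic_salience_map(fixated_label, object_masks, shape):
    """Build the salience map for one fixation: every pixel of each
    non-fixated object receives that object's similarity to the
    currently fixated object, normalized into transition probabilities."""
    sal = np.zeros(shape)
    for label, mask in object_masks.items():
        if label != fixated_label:
            sal[mask] = lsa_similarity(fixated_label, label)
    total = sal.sum()
    return sal / total if total > 0 else sal

# Toy scene: three rectangular object masks in a 100 x 100 image,
# playing the role of LabelMe-style object annotations.
shape = (100, 100)
masks = {}
for label, (r0, r1, c0, c1) in {"plate": (10, 30, 10, 30),
                                "cup": (10, 30, 60, 80),
                                "window": (60, 90, 40, 90)}.items():
    m = np.zeros(shape, dtype=bool)
    m[r0:r1, c0:c1] = True
    masks[label] = m

# Current fixation on "plate"; the next saccade actually lands in "cup".
sal = semantic_salience_map("plate", masks, shape)
actual_next = masks["cup"]

# ROC analysis: do the pixels of the actually fixated next object score
# higher on the semantic salience map than the rest of the scene?
auc = roc_auc_score(actual_next.ravel().astype(int), sal.ravel())
print(f"ROC AUC = {auc:.3f}")  # values above 0.5 indicate guidance
```

In this sketch, an area under the ROC curve of 0.5 corresponds to chance, so semantic guidance manifests as AUC values reliably above 0.5 when maps from many fixations are combined.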
Our analysis confirms the existence of semantic guidance, that is, eye movements during scene inspection are guided by a semantic factor based on the conceptual relations between the currently fixated object and the target object of the following saccade. This guidance may facilitate memorization of the scene for later recall by leading viewers to inspect semantically related objects consecutively.
Supported in part by NIH grant R15EY017988 to M.P.