Abstract
Inferring a viewer's task from eye movements, known as the inverse Yarbus process, is challenging. Previous approaches have relied on aggregate measures, such as mean fixation duration or saccade velocity, or on hand-selected areas of interest (AOIs), achieving classification accuracies above chance but with high loss rates. Here, we used Capsule-based Convolutional Neural Networks (CapsuleNets), a recently introduced modification of convolutional neural networks (CNNs), to identify a participant's task (counting objects vs aesthetic judgement) based only on saliency maps derived from raw eye fixation coordinates. Traditional CNNs are widely used to classify visual data, such as types of flowers or species of animals, and are highly accurate in many settings. CNNs function analogously to the human visual system: highly sparse representations such as edges or patches feed progressively more specific layers encoding features such as faces or noses, and classification is ultimately based on the activation of each layer. However, CNNs struggle to preserve the spatial relationships among features, which restricts their utility for ambiguous or irregular images, or for images in which spatial information is especially important; both conditions hold when discriminating between saliency maps produced under different tasks. By introducing capsules, the network can exploit the spatial arrangement of features, such as a tight cluster of fixation points around an expected search target versus a more dispersed cluster for a non-target. Results show that CapsuleNets improve classification accuracy and reduce loss on saliency maps by up to 35% compared to traditional CNNs. This approach to saliency map analysis provides a classification method for eye movement data that generalizes well across tasks and stimulus types.
Meeting abstract presented at VSS 2018
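The approach described above has two stages: converting raw fixation coordinates into saliency maps, then classifying those maps with a capsule network. The sketch below illustrates the first stage under simple assumptions (fixations given as (x, y) pixel coordinates, a Gaussian kernel for smoothing); the function name, image size, and kernel width are placeholders rather than the study's actual parameters.

```python
# Sketch: turn raw fixation coordinates into a normalised saliency map.
# Names and parameter values here are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency_map(fixations, image_size=(600, 800), sigma=25.0):
    """fixations: iterable of (x, y) pixel coordinates; sigma: smoothing in pixels."""
    h, w = image_size
    density = np.zeros((h, w), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w:
            density[yi, xi] += 1.0                    # accumulate fixation counts
    saliency = gaussian_filter(density, sigma=sigma)  # smooth counts into a map
    return saliency / saliency.max() if saliency.max() > 0 else saliency
```

The capsule mechanism that lets the network exploit agreement in the spatial arrangement of features is routing-by-agreement. The NumPy sketch below shows the standard squash nonlinearity and dynamic routing loop; the capsule counts and dimensions are chosen arbitrarily for illustration and are not taken from the reported model.

```python
# Sketch: routing-by-agreement between capsules, which weights features by how
# well their spatial predictions agree.
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    """Shrink vectors so their length lies in [0, 1) while keeping direction."""
    norm_sq = np.sum(v ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * v / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: predictions from lower capsules, shape (num_in, num_out, dim_out)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                 # routing logits
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)        # coupling coefficients
        s = np.einsum('ij,ijk->jk', c, u_hat)       # weighted vote per output capsule
        v = squash(s)                               # output capsule vectors
        b = b + np.einsum('ijk,jk->ij', u_hat, v)   # raise logits where votes agree
    return v

# Illustrative usage: 32 lower-level capsules voting for 2 task classes.
rng = np.random.default_rng(0)
v = dynamic_routing(rng.normal(size=(32, 2, 16)))
print(np.linalg.norm(v, axis=-1))                   # capsule lengths ~ class evidence
```

In a full capsule network, u_hat would come from learned transformation matrices applied to convolutional capsule outputs, and the predicted task would be read off as the output capsule with the longest vector.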