Abstract
The effect of visual task on the pattern and parameters of eye movements has long been investigated in oculomotor studies of human vision. Much less has been done on the inverse problem: inferring the visual task from eye movements. Visual search is a core component of human vision and plays an important role in everyday life. In our previous work we developed an ergodic HMM-based model to infer the visual task in pop-out search by locating the focus of covert attention. In this paper, we improve that model to infer the task in conjunction search in an eye-typing application, where users type a character string by directing their gaze across an on-screen keyboard. In this scenario, inferring the task amounts to determining which word has been eye-typed. The inherent complexity of conjunction search usually requires off-target fixations before the target is located. These off-target fixations are not randomly distributed, however; their pattern depends on the target, because the brain tends to direct the gaze toward objects that appear similar to it. We therefore propose a tri-state HMM (TSHMM) to model the cognitive process of attention, with the three states representing fixations on the target, on similar non-target objects, and on dissimilar non-target objects. We train a TSHMM for each character with the Baum-Welch algorithm to capture the dynamics of attention during search, construct a lexicon network by concatenating the character models, and use the token-passing technique to recover the best state sequence for the test data. The results show a substantial improvement over our previous model. They can be improved further by imposing a priori constraints on character order through a dictionary of valid words.
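To make the modeling pipeline concrete, the sketch below illustrates the per-character TSHMM idea in Python using the hmmlearn library. This is not the authors' implementation: the Gaussian emission model, the choice of fixation features (duration and distance to the intended key), and the simplified per-character scoring in place of the full lexicon-network token-passing decoder are all illustrative assumptions.

```python
# Minimal sketch of per-character TSHMM training and decoding.
# Assumptions (not from the original abstract): hmmlearn's GaussianHMM,
# fixation features = [duration_ms, distance_to_key_px], and character
# recognition by per-model log-likelihood instead of token passing.
import numpy as np
from hmmlearn import hmm

def train_tshmm(fixation_sequences):
    """Fit a 3-state HMM (target / similar non-target / dissimilar
    non-target) to fixation feature sequences for one character.

    Each sequence is an (n_fixations, n_features) array.
    """
    X = np.concatenate(fixation_sequences)           # stack all sequences
    lengths = [len(s) for s in fixation_sequences]   # per-sequence lengths
    model = hmm.GaussianHMM(n_components=3,
                            covariance_type="diag",
                            n_iter=100)              # Baum-Welch (EM) fit
    model.fit(X, lengths)
    return model

def most_likely_character(char_models, test_sequence):
    """Score a test fixation sequence under every per-character TSHMM
    and return the character with the highest log-likelihood."""
    scores = {c: m.score(test_sequence) for c, m in char_models.items()}
    return max(scores, key=scores.get)
```

In the full system described above, the per-character models would instead be concatenated into a lexicon network and decoded jointly with token passing, optionally constrained to a dictionary of valid words.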
Meeting abstract presented at VSS 2012