Abstract
Eye-tracking studies over static scenes typically yield a large number of fixations that must be grouped into meaningful clusters to obtain salient regions of human interest. Such salient regions give valuable insight into top-down and bottom-up cues in the visual scene. I develop an information-theoretic measure to improve the detection of salient regions from eye-fixation data over social and natural scenes containing multiple entities of human interest. Eye-fixation clustering approaches typically rely on (a) prior knowledge of the number of clusters and distance-based thresholds [Santella et al., ETRA, 2005], (b) priors on typical object sizes and inter-object distances to identify long saccades that might land on new regions [Katti et al., ACM MM, 2010], or (c) thresholds on visual conspicuity to detect modes in smoothed fixation distributions [Judd et al., ICCV, 2009]. My method uses KL divergence to assess the information gain due to changes in the distribution of regions of interest discovered by different threshold values for methods of type (a), (b) and (c). For each scene with accompanying eye-fixation data, the clusters arising from successive threshold choices i and i+1 are converted into continuous probability distributions D_i and D_{i+1}. The cumulative sum of absolute KL divergence values, Σ abs(KL(D_{i+1} || D_i)), is then computed over successive threshold choices i and i+1. This cumulative sum grows slowly over threshold choices that reliably estimate the clusters embedded in the fixation data, which in turn helps to identify a good range of threshold values. Additionally, the measure can score scenes for the presence of strong visual structure arising from social and affective cues or from the aesthetic placement of visual elements. Analysis of scenes from three large-scale public eye-fixation datasets demonstrates the effectiveness of the proposed measure; illustrative results are provided as supplementary material.
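The cumulative measure described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the abstract does not say how clusters are turned into continuous distributions, so the sketch assumes one isotropic Gaussian per cluster centre, evaluated on a coarse pixel grid and normalised. The function names (cluster_density, kl_divergence, cumulative_abs_kl) and the parameters step, sigma and eps are hypothetical choices made only for illustration.

```python
import math
from itertools import accumulate


def cluster_density(centers, weights, width, height, step=10, sigma=30.0, eps=1e-12):
    """Assumed conversion: a weighted sum of isotropic Gaussians, one per
    cluster centre, sampled on a grid and normalised to sum to 1."""
    cells = []
    for y in range(0, height, step):
        for x in range(0, width, step):
            value = eps  # small floor so that log(p / q) is always defined
            for (cx, cy), w in zip(centers, weights):
                dist_sq = (x - cx) ** 2 + (y - cy) ** 2
                value += w * math.exp(-dist_sq / (2.0 * sigma ** 2))
            cells.append(value)
    total = sum(cells)
    return [v / total for v in cells]


def kl_divergence(p, q):
    """Discrete KL(p || q) for two distributions sampled on the same grid."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def cumulative_abs_kl(densities):
    """Running sum of abs(KL(D_{i+1} || D_i)) over successive thresholds.

    densities[i] is the distribution obtained with the i-th threshold.
    abs() mirrors the abstract; KL of normalised distributions is already
    non-negative up to rounding error.
    """
    steps = [
        abs(kl_divergence(densities[i + 1], densities[i]))
        for i in range(len(densities) - 1)
    ]
    return list(accumulate(steps))


# Hypothetical example: three thresholds on a 200 x 200 pixel scene.
# The first two thresholds find the same two clusters; the third merges them.
two_clusters = cluster_density([(60, 100), (140, 100)], [0.5, 0.5], 200, 200)
same_again = cluster_density([(60, 100), (140, 100)], [0.5, 0.5], 200, 200)
merged = cluster_density([(100, 100)], [1.0], 200, 200)
curve = cumulative_abs_kl([two_clusters, same_again, merged])
```

Under these assumptions, curve stays flat across the two thresholds that agree on the cluster structure and rises once the clusters merge, which is the slow-growth behaviour the measure looks for when selecting a threshold range.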
Meeting abstract presented at VSS 2013