Abstract
How do features interact to guide human gaze? Models of human attention have assumed a linear combination of features to construct a final saliency map that drives human overt/covert attention. We asked what role, if any, second-order feature interactions play in attracting attention.
We examined the eye movements of 8 subjects while they watched videos containing natural and synthetic scenes. Five low-level feature channels, Color (C), Intensity (I), Orientation (O), Flicker (F) and Motion (M), were computed using center-surround differences for each of 46,489 video frames shown to the subjects, and a total of 11,430 saccades were analyzed. We compared four models: i) a simple unweighted sum of 1st-order terms, i.e., C, I, O, F, M; ii) a weighted linear sum of 1st-order terms; iii) an unweighted sum of 1st-order terms and all 2nd-order multiplicative feature interaction terms (CC, CI, CO, CF, CM, etc.); and iv) a weighted linear combination of 1st- and 2nd-order terms. For the weighted combinations, the weights were learned using a genetic algorithm (GA) that optimizes a cost function defined as the difference between the distribution of human saccade end points and the salient locations computed by the respective models. The optimal solution was found in a search space of size 2^20 for the model incorporating only 1st-order terms and of size 2^80 for the model incorporating both 1st- and 2nd-order terms.
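For concreteness, a minimal sketch of how such a combined saliency map could be formed from the five feature channels is given below; the function and weight names are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np
from itertools import combinations_with_replacement

def combined_saliency(channels, w1, w2):
    """Combine 1st-order feature maps and 2nd-order multiplicative
    interaction terms into a single saliency map.

    channels : dict mapping channel name ('C', 'I', 'O', 'F', 'M')
               to a 2-D center-surround feature map (all same shape).
    w1       : dict of 1st-order weights, keyed by channel name.
    w2       : dict of 2nd-order weights, keyed by channel pair, e.g. ('C', 'I').
    """
    names = list(channels)
    saliency = np.zeros_like(channels[names[0]])

    # 1st-order terms: weighted sum of the raw feature maps (C, I, O, F, M)
    for n in names:
        saliency += w1[n] * channels[n]

    # 2nd-order terms: weighted sum of all pairwise products (CC, CI, CO, ...)
    for a, b in combinations_with_replacement(names, 2):
        saliency += w2[(a, b)] * channels[a] * channels[b]

    return saliency
```

Setting all weights to one recovers the unweighted variants (models i and iii), while the GA-optimized weights correspond to models ii and iv.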
We found that the optimized 1st-order model performed significantly better than all other models (p<0.05). Furthermore, models incorporating 2nd-order interactions did not improve predictive power in explaining the eye movements of human subjects.