As further motivation for our work, we noticed that many computational models that aim to predict where people look use features at multiple scales. Some are biologically inspired bottom-up computational models based on multiscale low-level image features (Hou & Zhang, 2007; Itti & Koch, 2000; Itti, Koch, & Niebur, 1998; Koch & Ullman, 1985; Li, 2002; Parkhurst, Law, & Niebur, 2002; Parkhurst & Niebur, 2003; Peters, Iyer, Itti, & Koch, 2005; Privitera & Stark, 2000; Rosenholtz, 1999; Torralba, 2003; van Zoest, Donk, & Theeuwes, 2004). Other models include top-down features such as face and person detection (Cerf, Harel, Einhäuser, & Koch, 2008; Hershler & Hochstein, 2005, 2006; VanRullen, 2006), horizon line or context detection (Ehinger, Hidalgo-Sotelo, Torralba, & Oliva, 2009; Torralba, Oliva, Castelhano, & Henderson, 2006), text detection, object detection, or a combination of several of these (Judd, Durand, & Torralba, 2009; Oliva, Torralba, Castelhano, & Henderson, 2003). Others use mathematical approaches (Avraham & Lindenbaum, 2009; Bruce & Tsotsos, 2006, 2009; Kienzle, Wichmann, Schölkopf, & Franz, 2007) or natural image statistics (Zhang, Tong, Marks, Shan, & Cottrell, 2008) to predict fixations. Some models also attempt to predict fixations during visual search tasks, where low-level image features have little or no impact on fixations (Einhäuser, Rutishauser, & Koch, 2008; Einhäuser, Spain, & Perona, 2008; Henderson, Brockmole, Castelhano, & Mack, 2006; Navalpakkam & Itti, 2007; Rao, Zelinsky, Hayhoe, & Ballard, 2002; Tsotsos et al., 1995; Underwood, Foulsham, van Loon, Humphreys, & Bloyce, 2006). For the design of future models, it is therefore of interest to determine whether all levels of image features are equally important.