Several researchers have incorporated top-down or goal-driven saliency into their models. They use high-level features based on those from earlier databases and apply learning mechanisms to determine the model parameters.
Tatler and Vincent (2009) proposed a model that incorporates oculomotor behavioral biases into a statistical model to improve fixation prediction. They argue that a good understanding of how humans move their eyes is more beneficial than purely salience-based approaches.
Torralba, Oliva, Castelhano, and Henderson (2006) proposed an attentional guidance approach that combines bottom-up saliency, scene context, and top-down mechanisms to predict the image regions likely to be fixated by humans in real-world scenes. Within a Bayesian framework, the model computes global features that capture the context and structure of the image, and top-down task information can be incorporated through scene priors.
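As a schematic illustration of this kind of Bayesian combination (not the authors' exact formulation), the sketch below modulates a bottom-up term, taken here as the rarity of a local feature, by a scene-context prior over likely target locations; the single feature channel and the Gaussian row prior are illustrative assumptions.

```python
import numpy as np

def contextual_guidance_sketch(local_features, prior_center_row, prior_sigma):
    """Combine bottom-up rarity with a scene-context prior (illustrative only).

    local_features : 2-D array of one local feature map (e.g., local contrast).
    prior_center_row, prior_sigma : parameters of an assumed vertical Gaussian
        prior on where the task-relevant target tends to appear in the scene.
    """
    # Bottom-up term: rarity of the local feature value under a global
    # Gaussian fit, i.e., low-probability features are more salient.
    mu, std = local_features.mean(), local_features.std() + 1e-8
    likelihood = np.exp(-0.5 * ((local_features - mu) / std) ** 2)
    bottom_up = 1.0 / (likelihood + 1e-8)

    # Top-down term: scene-context prior over image rows (e.g., pedestrians
    # tend to appear near the horizon line in street scenes).
    rows = np.arange(local_features.shape[0])[:, None]
    context_prior = np.exp(-0.5 * ((rows - prior_center_row) / prior_sigma) ** 2)

    # Combination: bottom-up saliency modulated by the contextual prior.
    guidance = bottom_up * context_prior
    return guidance / guidance.max()

# Example: a random "contrast" map with a prior centred on the middle rows.
rng = np.random.default_rng(0)
guidance_map = contextual_guidance_sketch(rng.random((60, 80)),
                                          prior_center_row=30, prior_sigma=10)
```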
Cerf, Frady, and Koch (2009) proposed a model that adds several high-level semantic features, such as faces, text, and objects, to predict human eye fixations.
Judd, Ehinger, Durand, and Torralba (2009) proposed a learning-based method to predict saliency. They used 33 features, including low-level features such as intensity, color, and orientation; mid-level features such as a horizon line detector; and high-level features such as a face detector and a person detector. They trained a binary classifier on these features using a support vector machine (SVM).
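The sketch below illustrates this kind of learning-based formulation: per-pixel feature vectors labelled as fixated or non-fixated are fed to a linear SVM, whose decision values then serve as a saliency score. The three feature channels and the synthetic labels are placeholders, not the 33 features or the data used by Judd et al.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)

# Placeholder per-pixel features (e.g., intensity contrast, a face-detector
# response, distance to the image centre); real models stack many channels.
n_pixels = 2000
features = rng.random((n_pixels, 3))

# Placeholder labels: 1 for pixels sampled from fixated regions, 0 otherwise.
labels = (0.6 * features[:, 0] + 0.4 * features[:, 1]
          + 0.1 * rng.standard_normal(n_pixels) > 0.5).astype(int)

# Train a binary classifier separating fixated from non-fixated pixels.
svm = LinearSVC(C=1.0, max_iter=10000).fit(features, labels)

# The signed distance to the decision boundary acts as a continuous
# saliency score for new pixels.
saliency_scores = svm.decision_function(rng.random((5, 3)))
```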
Lee, Huang, Yeh, and Chen (2011) proposed a model that adds faces as a high-level feature to predict salient positions in video. They used support vector regression (SVR) to learn the relationship between features and visual attention.
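A minimal sketch of this regression variant is shown below, mapping per-region feature vectors to continuous attention values with scikit-learn's SVR; the feature channels and target values are synthetic placeholders rather than the authors' data.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)

# Placeholder features per video region (e.g., motion energy, face response)
# and a continuous attention value derived from an eye-tracking density map.
X = rng.random((500, 4))
y = 0.5 * X[:, 0] + 0.3 * X[:, 3] + 0.05 * rng.standard_normal(500)

# Support vector regression learns the feature-to-attention mapping.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.01).fit(X, y)
predicted_attention = svr.predict(rng.random((3, 4)))
```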
Zhao and Koch (2011) proposed a model similar to that of Itti, Koch, and Niebur (1998), but with faces as an extra feature. Their model combines feature maps with learned weightings and solves the resulting minimization problem using an active set method. The model also implements center bias in two ways.
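The sketch below illustrates learning such combination weights by least squares under a non-negativity constraint, solved with SciPy's NNLS routine (an active-set method); the feature maps and the target fixation map are synthetic placeholders, not the authors' data or exact objective.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)

# Placeholder conspicuity maps (e.g., color, intensity, orientation, face),
# each flattened into a column of the design matrix.
h, w, n_channels = 40, 60, 4
feature_maps = rng.random((h * w, n_channels))

# Placeholder target: an empirical fixation-density map, flattened.
target_fixation_map = rng.random(h * w)

# Learn non-negative weights that best combine the feature maps into the
# fixation map; nnls uses the Lawson-Hanson active-set algorithm.
weights, residual = nnls(feature_maps, target_fixation_map)

# The learned weights linearly combine the channels into a saliency map.
saliency = (feature_maps @ weights).reshape(h, w)
```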
Among the models described above, some focus on adding high-level features to improve predictive performance, whereas others use machine learning techniques to clarify the relationship between features and their saliency. However, the so-called high-level features are vaguely defined concepts and do not cover all types of environments. Moreover, most of the learning processes fail to consider that the same feature values are likely to have different saliency values in different contexts.