Abstract
Movie posters, social graphics, and product advertisements are designed to capture the viewer’s attention and efficiently redirect it to relevant information. For natural images, computational models of saliency have been successful at predicting where the average observer is likely to look. However, such models have been lacking for graphic designs.
For graphic designs, which are composed of image and text elements, knowing what holds an observer’s attention for a longer period is more relevant than knowing what pops out in a bottom-up manner. Analogous to saliency, we define an ‘importance map’ as a heatmap that assigns each image pixel a real value indicating the probability that observers would find that image region important. Whereas ground truth for saliency is collected with an eye tracker, we instead ask human participants to annotate the regions of graphic designs they think are important (using the methodology of O’Donovan 2014). Averaging the annotations of 25-30 observers generates smooth ground-truth importance maps. We collected importance maps for 1000 designs across 5 classes (webpages, movie posters, mobile UIs, infographics, and advertisements) and present them as the Imp1k dataset.
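As a minimal sketch of the averaging step, suppose each participant's annotation has been rasterized into a binary mask the size of the design (this representation, and the names `Mask`, `average_annotations`, and `toy_masks`, are assumptions for illustration, not the dataset's actual format). The per-pixel mean over participants then gives a value in [0, 1] that can be read as the fraction of observers who marked that pixel as important.

```python
from typing import List

# Assumed representation: 1 where an annotator marked the pixel important, else 0.
Mask = List[List[int]]


def average_annotations(masks: List[Mask]) -> List[List[float]]:
    """Average per-pixel binary annotations into an importance map in [0, 1]."""
    if not masks:
        raise ValueError("need at least one annotation mask")
    height, width = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != height or any(len(row) != width for row in m):
            raise ValueError("all masks must share the same shape")
    n = len(masks)
    # Per-pixel mean over all annotators.
    return [
        [sum(m[y][x] for m in masks) / n for x in range(width)]
        for y in range(height)
    ]


# Toy example: three hypothetical annotators on a 2x3 design.
toy_masks = [
    [[1, 1, 0], [0, 0, 0]],
    [[1, 0, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 0, 0]],
]
importance = average_annotations(toy_masks)
```

With only three annotators the map is coarse; averaging over more observers yields more distinct levels per pixel, which is one reason the resulting maps look smoother.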
We also introduce a computational model of importance for graphic designs, trained using Imp1k. Our model is a deep neural network that simultaneously predicts the class of a graphic design with 95% accuracy and its importance map with a Pearson’s correlation coefficient (CC) of 0.827 and a KL divergence of 0.159 relative to ground truth. We extend our model by training it with natural images as a sixth class, and demonstrate that the same model can predict saliency maps on natural images and importance maps on graphic designs. Finally, we show how our Unified Model of Saliency and Importance (UMSI) can be used to generate automated suggestions within interactive design applications.
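The two map-level scores reported above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the evaluation code behind the reported numbers: the Pearson correlation is computed over flattened pixel values, and the KL score is taken as the Kullback-Leibler divergence of the prediction from the ground truth after both maps are normalized to sum to 1, with a small `eps` for numerical stability. The function names and the `eps` value are hypothetical.

```python
import math
from typing import List, Sequence

Map2D = Sequence[Sequence[float]]


def _flatten(m: Map2D) -> List[float]:
    return [float(v) for row in m for v in row]


def pearson_cc(pred: Map2D, gt: Map2D) -> float:
    """Pearson correlation between two maps; 1 is a perfect linear match."""
    p, g = _flatten(pred), _flatten(gt)
    if len(p) != len(g):
        raise ValueError("maps must have the same number of pixels")
    mean_p, mean_g = sum(p) / len(p), sum(g) / len(g)
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_g = sum((b - mean_g) ** 2 for b in g)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_p * var_g)


def kl_divergence(pred: Map2D, gt: Map2D, eps: float = 1e-12) -> float:
    """KL(gt || pred) over maps normalized to sum to 1; 0 is a perfect match."""
    p, g = _flatten(pred), _flatten(gt)
    if len(p) != len(g):
        raise ValueError("maps must have the same number of pixels")
    sum_p, sum_g = sum(p), sum(g)
    if sum_p <= 0 or sum_g <= 0:
        raise ValueError("maps must have positive total mass")
    p = [v / sum_p for v in p]
    g = [v / sum_g for v in g]
    # eps (assumed value) guards against log(0) and division by zero.
    return sum(b * math.log(eps + b / (a + eps)) for a, b in zip(p, g))


# Toy 2x2 maps, values chosen only for illustration.
gt_map = [[0.0, 0.5], [1.0, 0.5]]
pred_map = [[0.1, 0.4], [0.9, 0.6]]
cc = pearson_cc(pred_map, gt_map)
kl = kl_divergence(pred_map, gt_map)
```

Higher CC and lower KL both indicate a closer match. The two are complementary: CC is invariant to linear rescaling of the maps, while KL penalizes a prediction that assigns little mass where the ground truth has a lot.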