Abstract
How do we know that a kitchen is a kitchen by looking? Here we start with the intuition that although scenes consist of visual features and objects, scene categories should reflect how we use category information. In our daily lives, rather than asking ourselves whether we are in (for example) a “beach” scene, we tend to use scene category information for actions such as navigation and search. Because we act within scenes, we test the hypothesis that scene categories arise from scene functions. We collected a large-scale scene category similarity matrix (5 million trials) by asking observers to simply decide whether two images were from the same or different categories. To serve as a noise ceiling (maximum possible correlation) for comparison among computational models, we used bootstrapping to compute the correlation among observers (r=0.75). Using the actions from the American Time Use Survey, we normed these scenes by what actions could be done in each scene (1.4 million trials). We found a strong relationship between category distance and functional distance (r=0.50, or 66% of the maximum possible correlation from noise ceiling). Furthermore, the function-based model outperformed alternative models of object-based similarity (r=0.33), visual features from a convolutional neural network (r=0.39), lexical distance between category names (r=0.27), and models of simple visual features such as color histograms (r=0.09), Gabor wavelets (r=0.07), and the gist descriptor (r=0.11). Using hierarchical linear regression, we found that functions captured 85.5% of overall explained variance, with nearly half of the explained variance captured only by functions, implying that the predictive power of alternative models was due to their shared variance with the function-based model.
These results challenge the dominant school of thought that visual features and objects are sufficient for categorization, and provide immediately testable hypotheses about the functional organization of the human visual system.
Meeting abstract presented at VSS 2015
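The comparisons described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes that each model is summarized as a square, symmetric distance matrix over scene categories, that models are compared with a Pearson correlation over the unique category pairs, and that "variance captured only by" a predictor means the drop in R² when that predictor is removed from a linear regression. All function names are hypothetical.

```python
import numpy as np


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def matrix_correlation(dist_a, dist_b):
    """Pearson correlation over the unique category pairs of two distance matrices."""
    return float(np.corrcoef(upper_triangle(dist_a), upper_triangle(dist_b))[0, 1])


def fraction_of_ceiling(r_model, r_ceiling):
    """Express a model correlation as a fraction of the noise ceiling."""
    return r_model / r_ceiling


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return float(1.0 - residuals.var() / y.var())


def unique_variance(y, predictors, column):
    """Assumed definition: R^2 lost when one predictor column is dropped."""
    reduced = np.delete(predictors, column, axis=1)
    return r_squared(y, predictors) - r_squared(y, reduced)


# Rounded values from the abstract: r = 0.50 against a ceiling of r = 0.75.
print(round(fraction_of_ceiling(0.50, 0.75), 3))
```

With the rounded values quoted in the abstract, the ratio comes out at about 0.67; the reported 66% presumably reflects unrounded correlations. In the regression helpers, `y` would hold the human category distances for each category pair and each column of `predictors` would hold one model's distances for the same pairs.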