Abstract
Indoor scenes typically contain a wealth of cues that are potentially useful for recognition, including characteristic global spatial properties and objects contained in the scene. Which of these cues do people actually use for accurate and quick scene categorization? Here we investigated this question in the spatial frequency (SF) domain. Using the SF Bubbles technique (Willenbockel et al., 2010), we examined which SFs are significantly correlated with observers' response times (RTs) in a scene categorization task with four categories: bathroom, bedroom, kitchen, and office. The base stimuli consisted of 800 gray-scale photographs (256 × 256 pixels) equated in luminance histograms and rated as typical exemplars of the respective category. They were SF filtered trial-by-trial using 20 randomly distributed Gaussian "bubbles" with a standard deviation of 1.8. The stimuli were presented in random order at a visual angle of 6 degrees and remained on the screen until the observer's response. Observers were instructed to press the space bar as soon as they recognized the scene category and, upon stimulus offset, to press the key corresponding to the correct category. Feedback was provided after each trial. Mean accuracy across observers was 88.16%; mean RT was 486 ms. RTs did not differ significantly between categories. A multiple linear regression on the transformed RTs from correct trials and the respective SF filters revealed two SF bands significantly linked with fast responses: one around 3 cycles per image (cpi) and one around 28 cpi (octave width about 1). Interestingly, the latter SF band overlaps with the SFs found to be correlated with fast and accurate object recognition (Caplette et al., 2014). Our results suggest that people use a combination of the scene gist conveyed by low SFs and object information conveyed by a narrow band of relatively high SFs for effectively recognizing complex indoor scenes.
Meeting abstract presented at VSS 2017
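
The random SF filtering and the regression of RTs on the filters described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`sf_bubbles_profile`, `apply_sf_filter`, `regress_rt_on_filters`) are hypothetical, the SF axis is taken to be linear from 0 to 128 cpi (the published procedure may sample the axis differently), the standard deviation of 1.8 is interpreted in samples along that axis, and the unspecified RT transform is assumed to be a log followed by z-scoring.

```python
import numpy as np


def sf_bubbles_profile(n_freqs=128, n_bubbles=20, sigma=1.8, rng=None):
    """Random 1-D SF sampling profile built from Gaussian 'bubbles'.

    Assumption: linear axis of 0..n_freqs cycles per image, sigma in axis samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    freqs = np.arange(n_freqs + 1)
    centers = rng.uniform(0, n_freqs, size=n_bubbles)
    # Sum one Gaussian per bubble, then cap at 1 where bubbles overlap.
    bumps = np.exp(-0.5 * ((freqs[:, None] - centers[None, :]) / sigma) ** 2)
    return np.clip(bumps.sum(axis=1), 0.0, 1.0)


def apply_sf_filter(image, profile):
    """Weight each Fourier component of a square image by profile[radial SF]."""
    n = image.shape[0]
    f = np.fft.fftfreq(n) * n  # frequencies in cycles per image
    radius = np.hypot(f[:, None], f[None, :])
    # Map each 2-D frequency to the nearest sampled SF (corners use the last sample).
    idx = np.minimum(np.rint(radius).astype(int), len(profile) - 1)
    filtered = np.fft.ifft2(np.fft.fft2(image) * profile[idx])
    return np.real(filtered)


def regress_rt_on_filters(profiles, rts):
    """Ordinary least squares of transformed RTs on the per-trial SF profiles.

    Assumption: RTs are log-transformed and z-scored; the abstract does not
    specify the transform. Returns one coefficient per sampled SF; negative
    values indicate SFs associated with faster responses.
    """
    log_rt = np.log(rts)
    z = (log_rt - log_rt.mean()) / log_rt.std()
    design = np.column_stack([np.ones(len(z)), profiles])
    coefs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coefs[1:]  # drop the intercept
```

In a real analysis the coefficients would still need a significance test with correction for multiple comparisons across SFs, which this sketch leaves out.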