Abstract
Feature integration theory (FIT) provides a framework for parsing visual input into basic features and for binding those features into integrated percepts. The idea of feature parsing and integration remains central to mechanistic explanations of visual search: search performance becomes less efficient when a search target is defined by multiple features than by a single feature. However, this framework has been tested primarily with basic, localized features (e.g., color, orientation). In our study, we extend FIT to ecologically more realistic scene features. We conducted a series of visual search experiments in which participants searched for a target scene among distractor scenes. The target and distractor scenes were drawn from a two-dimensional parametric feature space of indoor scenes. Specifically, we manipulated complex, high-level features, such as indoor lighting and scene layout, using generative adversarial networks, and generated target and distractor scenes along each axis of the space. The target scenes could be discriminated from distractors on the basis of either a single feature or the conjunction of the two features. When participants performed this task across different search array set sizes, search became less efficient, with slower response times and lower accuracy, when the target was defined by a feature conjunction. The effect persisted after luminance and RMS contrast were ruled out as potential confounds. These results extend the FIT framework to ecologically realistic scene features.