Abstract
The primate visual system is hierarchically organized (Felleman and Van Essen, 1991), such that cells in higher visual areas respond to increasingly complex image features (Duffy and Wurtz, 1991; Kobatake and Tanaka, 1994; Movshon et al., 1985). A widespread assumption is that the complexity of neuronal tuning in higher visual areas arises from the combination of simpler features, coded by the pattern of activity of afferent cells in lower visual areas. This feature combination scheme is thought to be conjunctive, i.e., AND-like (Baker, Behrmann, and Olson, 2002; Wang, Tanifuji, and Tanaka, 1998): the response to a combination of parts can be supra-linear (super-additive). Moreover, the locations of the combined features matter: sub-regions within a cell's receptive field can have different orientation preferences (Anzai, Peng, and Van Essen, 2007). Despite its simplicity, however, this "conjunction of localized features" mental model has rarely been explicitly or thoroughly tested. Furthermore, this qualitative model can be implemented computationally in a variety of ways, and it is unclear which (if any) specific implementation can account for the seemingly disparate phenomena observed across ventral and dorsal visual areas. Indeed, it is unclear whether these visual areas even share a common computational mechanism for increasing feature complexity. Here, we review the evidence supporting such a model for both the ventral and dorsal streams. In addition, we present a specific computational implementation, taken from a more general model of visual processing (Jhuang et al., 2007; Riesenhuber and Poggio, 1999; Serre et al., 2007), and show that it accounts for a variety of results in ventral and dorsal visual areas. Overall, we find evidence that a conjunction of localized features may indeed underlie complex shape and motion feature selectivity.
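To make the abstract's central idea concrete, the following is a minimal toy sketch (our illustration, not the implementation presented in the paper) of a conjunctive, AND-like combination of localized features: each hypothetical afferent signals whether an oriented feature is present at a specific sub-region of the receptive field, and the conjunction unit multiplies the afferent responses, so its response to both parts together exceeds the sum of its responses to either part alone (super-additivity). The stimulus encoding, response values, and function names are all illustrative assumptions.

```python
import math

def afferent_response(stimulus, part):
    """Localized afferent: strong (1.0) if its preferred oriented feature
    is present at its sub-region of the receptive field, weak (0.1) otherwise.
    The 1.0/0.1 values are arbitrary illustrative choices."""
    location, orientation = part
    return 1.0 if stimulus.get(location) == orientation else 0.1

def conjunction_unit(stimulus, parts):
    """AND-like (conjunctive) combination: product of the localized
    afferent responses, yielding a supra-linear response to the
    conjunction of parts."""
    return math.prod(afferent_response(stimulus, p) for p in parts)

# A unit tuned to a vertical edge on the left AND a horizontal edge on the right.
parts = [("left", "vertical"), ("right", "horizontal")]

both       = conjunction_unit({"left": "vertical", "right": "horizontal"}, parts)
left_only  = conjunction_unit({"left": "vertical"}, parts)
right_only = conjunction_unit({"right": "horizontal"}, parts)

# The response to the full conjunction exceeds the sum of the responses
# to the parts presented in isolation (super-additive), and swapping the
# parts' locations would destroy the response (location specificity).
print(both, left_only, right_only)
```

Note the multiplicative combination is only one way to realize AND-like tuning; the model discussed in the paper (Jhuang et al., 2007; Riesenhuber and Poggio, 1999; Serre et al., 2007) uses its own specific operations.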