In anatomical terms, the layers of our model can be interpreted as follows. The input layer represents incoming image information with Gabor wavelets, which resemble the receptive fields of neurons in primary visual cortex (V1). The biological counterpart of our Assembly Layer would be an area such as central or anterior inferotemporal cortex (IT). Neurons there respond to stimuli from large parts of the visual field, and they code for complex shapes (Tanaka, 1996, 2003) similar to the face parts represented by the Assembly Layer. Information about object position and scale can be read out from IT neurons (Hung, Kreiman, Poggio, & DiCarlo, 2005), contrary to the assumptions of pure pooling models; this suggests that our control units may reside there as well. In the cortex, of course, the mapping from V1 to IT is not direct but passes through intermediate stages, including V2 and V4. Our current model does not account for these stages, but future extensions will include them. We have previously described the likely form of such multistage routing (Wolfrum & von der Malsburg, 2007b) and a possible ontogenetic mechanism for it (Wolfrum & von der Malsburg, 2007a). Finally, the Gallery of our model might correspond to an area such as the fusiform face area (FFA), which is specialized for face recognition (Kanwisher & Yovel, 2006; Tsao, Freiwald, Tootell, & Livingstone, 2006). Note that we do not model face detection (as distinct from recognition). In the brain, too, detection appears to take place outside the FFA. Summerfield et al. (2006) found neurons in medial frontal cortex that are selectively active when subjects must make a face vs. non-face decision, independently of face identity. Likewise, patients with prosopagnosia recognize faces as faces but cannot identify them (Zhao, Chellappa, Phillips, & Rosenfeld, 2003). As discussed earlier, face recognition is special because all faces share a generic shape, yet recognizing one individual among thousands requires high sensitivity to fine differences. This might become possible through competitive interaction (which is in fact the mechanism by which recognition takes place in our Gallery Layer) within the small and compact FFA (Kanwisher, 2006). Beyond faces, there is evidence that the FFA can also serve as an area of expertise for other object classes (Gauthier, Skudlarski, Gore, & Anderson, 2000; Tarr & Gauthier, 2000). In the same way, our model is not confined to face recognition; it could be applied to any object class that has a prototypical shape and requires high sensitivity to small differences among its members.
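Why competition can turn small differences in similarity into a decisive identification can be illustrated with a toy example. The sketch below is an illustrative assumption, not the dynamics of our Gallery Layer: it uses a hypothetical replicator-style competition in which each gallery unit's activity is multiplied by its similarity to the input and all activities are then renormalized, so that units compete for a fixed total activity. After t iterations each unit's activity is proportional to its similarity raised to the power t, so a 1% advantage grows into a near-complete win. The function name `competitive_recognition` and the similarity values are hypothetical.

```python
def competitive_recognition(similarities, steps=500):
    """Hypothetical replicator-style competition (not the paper's Gallery dynamics).

    Each unit's activity is scaled by its similarity to the input, then all
    activities are renormalized to sum to one. The shared normalization is the
    competitive coupling: one unit can only gain at the expense of the others.
    After t steps, activity_i is proportional to similarities[i] ** t.
    """
    n = len(similarities)
    x = [1.0 / n] * n  # start from uniform activity across gallery units
    for _ in range(steps):
        x = [xi * si for xi, si in zip(x, similarities)]
        total = sum(x)
        x = [xi / total for xi in x]
    return x


# Hypothetical similarities of a probe face to five stored gallery faces;
# the best match is only 0.01 ahead of the runner-up.
sims = [0.91, 0.90, 0.85, 0.80, 0.70]
print([round(a, 3) for a in competitive_recognition(sims, steps=1)])
print([round(a, 3) for a in competitive_recognition(sims, steps=500)])
```

After one step the activities differ only slightly, whereas after many steps nearly all activity is concentrated on the best-matching unit. Multiplicative competition was chosen here only because its outcome can be computed in closed form; other winner-take-all schemes based on lateral inhibition would serve the same illustrative purpose.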