Abstract
Subjects perform poorly at recognizing upside-down faces. This face-inversion effect contrasts with subjects’ performance on inverted objects, which is far less impaired. Experimental results suggest that a similar, though weaker, effect occurs for mono-oriented objects such as cars: performance on inverted cars falls between performance on inverted faces and on other inverted objects. We build an anatomically inspired neurocomputational model to explore this effect. Our model includes a foveated retina and the log-polar mapping from the visual field to V1. Under this transformation, changes in scale appear as horizontal translations, leading to scale equivariance; rotations similarly appear as vertical translations, leading to rotation equivariance. When this representation is fed into a standard convnet, the network’s translation invariance yields rotation and scale invariance. It may seem surprising that a rotation-invariant network shows any inversion effect at all. The reason is a crucial topological difference between scale and rotation: the rotation dimension is discontinuous, with the angular coordinate in V1 ranging from 90 degrees (vertically up) to 270 degrees (vertically down). Hence, when a face is inverted, the configural information in the face is disrupted, while feature information is relatively unaffected. We show that as the model learns more faces, i.e., becomes a face expert, its performance on inverted faces degrades. A similar effect occurs when the model learns car models at the subordinate level, but not when it learns cars at the basic level. A standard convolutional network shows the same effect, except that its performance on inverted faces degrades to near zero, unlike the log-polar network. This suggests that the inversion effect arises from visual expertise, where configural information becomes relevant as more objects are learned at the subordinate level. Contrary to conventional wisdom, the model suggests that the inversion effect begins at the level of V1 rather than in higher-level visual areas.
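To make the geometry concrete, the sketch below is a minimal log-polar resampling in NumPy (our illustration, not the paper’s implementation; the function `log_polar` and its parameters are hypothetical). It verifies numerically that a rotation of the input appears as a cyclic vertical shift of the output, the equivariance the abstract describes; scaling would analogously shift the output horizontally.

```python
# Minimal sketch (ours, not the paper's code) of a log-polar resampling.
import numpy as np

def log_polar(image, out_h=128, out_w=128):
    """Resample a square grayscale image into log-polar coordinates.

    Rows index polar angle, columns index log-radius: rotating the
    input cyclically shifts the output vertically, and rescaling the
    input shifts it horizontally.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rs = np.exp(np.linspace(0.0, np.log(min(cy, cx)), out_w))  # log-spaced radii
    thetas = np.linspace(0.0, 2.0 * np.pi, out_h, endpoint=False)
    ys = np.round(cy + rs[None, :] * np.sin(thetas[:, None])).astype(int)
    xs = np.round(cx + rs[None, :] * np.cos(thetas[:, None])).astype(int)
    return image[np.clip(ys, 0, h - 1), np.clip(xs, 0, w - 1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((257, 257))       # odd size -> integer center pixel
    lp = log_polar(img)
    lp_rot = log_polar(np.rot90(img))  # rotate the input by 90 degrees
    # The rotation shows up as a cyclic shift by a quarter of the rows;
    # the residual should be (near) zero up to nearest-neighbor rounding.
    print(np.abs(lp_rot - np.roll(lp, -lp.shape[0] // 4, axis=0)).max())
```

Note that this sketch covers the full 0–360 degree circle, so the vertical shift is perfectly cyclic; in the model, V1’s angular coordinate spans only 90 to 270 degrees, and it is this cut that breaks the equivariance under inversion and disrupts configural information.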