Abstract
While the rhesus monkey is widely used as an animal model of human visual processing, it is not known whether high-level visual behaviors, such as invariant object recognition, are quantitatively comparable across rhesus monkeys and humans. To address this question, we systematically compared the object recognition behavior of two monkeys (M, Z) with that of human subjects. To enforce true object recognition (rather than image matching), several thousand naturalistic images, each containing one foreground object, were generated by rendering a 3D model of each object with randomly chosen viewing parameters (2D position, 3D rotation, and viewing distance) and placing that foreground object view onto a randomly chosen natural image background. Monkeys were trained on a match-to-sample paradigm: a 100 ms foveal presentation of a sample image (randomly chosen from the thousands possible) was followed immediately by the lateral presentation of two response images, each displaying a single canonical-view object. Monkey M responded by holding gaze fixation on the selected image for 700 ms, while monkey Z touched the selected image on a touchscreen. Data from 554 human subjects performing the same tasks on Amazon Mechanical Turk were aggregated to characterize mean human object recognition behavior, and data from 25 additional MTurk subjects were used to characterize individual human subject behavior. To date, we have compared monkeys and humans on 16 objects. Our results show that monkeys not only match human performance but also exhibit a pattern of object confusions that is highly correlated with the pooled human confusion pattern (M: 0.8550; Z: 0.8148; noise-corrected r) and is statistically indistinguishable from that of individual human subjects (p = 0.48, exact test). Importantly, these common patterns of 3D object confusion are not shared with low-level visual representations (pixels, V1-like). Taken together, these results suggest that rhesus monkeys and humans share a neural "shape" representation that directly underlies object perception.
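The stimulus-generation step described above can be summarized in code. The following is a minimal sketch assuming uniform sampling over plausible parameter ranges; all function names, parameter names, and ranges here are illustrative assumptions, not the authors' actual rendering pipeline.

```python
# Hypothetical sketch of the stimulus-generation step: each image pairs one
# rendered object view (random 2D position, 3D rotation, viewing distance)
# with a randomly chosen natural background. Names and ranges are
# illustrative, not taken from the study.
import random

def sample_viewing_params(rng: random.Random) -> dict:
    """Draw one random set of viewing parameters for a foreground object."""
    return {
        # 2D position of the object center, in normalized image coordinates
        "x": rng.uniform(-0.3, 0.3),
        "y": rng.uniform(-0.3, 0.3),
        # 3D rotation (Euler angles, degrees) applied to the 3D model
        "rx": rng.uniform(0.0, 360.0),
        "ry": rng.uniform(0.0, 360.0),
        "rz": rng.uniform(0.0, 360.0),
        # viewing distance expressed as a scale factor on rendered size
        "scale": rng.uniform(0.5, 2.0),
    }

def make_stimulus(obj_id: str, backgrounds: list, rng: random.Random) -> dict:
    """Pair one random object view with a randomly chosen natural background."""
    return {
        "object": obj_id,
        "background": rng.choice(backgrounds),
        "view": sample_viewing_params(rng),
    }

if __name__ == "__main__":
    rng = random.Random(0)
    bgs = [f"bg_{i:03d}.png" for i in range(100)]  # placeholder backgrounds
    for stim in (make_stimulus("obj_car", bgs, rng) for _ in range(3)):
        print(stim)
```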
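The abstract does not specify the form of the noise correction; a common choice, assumed here, is Spearman's correction for attenuation, which divides the raw monkey-human correlation by the geometric mean of each pattern's internal reliability (e.g., estimated from split halves of trials):

\[
\tilde{r}_{mh} = \frac{r_{mh}}{\sqrt{r_{mm'}\, r_{hh'}}}
\]

where \(r_{mh}\) is the raw correlation between the monkey and pooled-human confusion patterns, and \(r_{mm'}\) and \(r_{hh'}\) are the split-half reliabilities of the monkey and human patterns, respectively. Under this convention, a noise-corrected r near 1 indicates that the two patterns agree up to measurement noise.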
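The "exact test" is likewise not spelled out; one plausible reading, assumed here, is an empirical test asking whether a monkey's consistency with the pooled human pattern falls within the distribution of individual-human consistencies. A hypothetical sketch:

```python
# Hypothetical sketch of an exact test: is the monkey's human-consistency
# exchangeable with those of individual human subjects? The two-sided
# empirical construction below is an assumption; the abstract does not
# specify how the test was built.

def empirical_p_value(monkey_r: float, human_rs: list[float]) -> float:
    """Two-sided empirical p-value for monkey_r relative to the
    distribution of individual human consistencies."""
    n_below = sum(r <= monkey_r for r in human_rs)
    n_above = sum(r >= monkey_r for r in human_rs)
    # Twice the smaller tail, with a +1 correction so p is never exactly 0
    # under the null that the monkey is exchangeable with the humans.
    return min(1.0, 2.0 * (min(n_below, n_above) + 1) / (len(human_rs) + 1))

if __name__ == "__main__":
    # Illustrative numbers only; not data from the study.
    human_rs = [0.78, 0.81, 0.83, 0.85, 0.86, 0.88, 0.90]
    print(empirical_p_value(0.855, human_rs))
```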
Meeting abstract presented at VSS 2014