Abstract
Rapid categorization has been studied extensively in recent years. How the visual system achieves object recognition with such speed and accuracy, however, remains a matter of debate. Here we show that a specific implementation (Riesenhuber & Poggio, 1999; Serre & Poggio, VSS 2005) belonging to a class of feedforward theories of object recognition, which extend the Hubel & Wiesel simple-to-complex cell hierarchy from V1 to AIT, can predict the pattern of performance achieved by human observers on a rapid animal vs. non-animal categorization task.
We generated a balanced set of stimuli by selecting animal images from four different subcategories based on body size and viewing distance from the camera (from heads to full bodies in clutter). Recognition performance by human observers (n = 21) was tested with a backward-masking paradigm, i.e., a 20 ms stimulus presentation followed by a variable inter-stimulus interval (ISI) and then an 80 ms mask.
We found that the feedforward model could predict the pattern of performance of human observers (both hit rate and d') on the different animal subcategories for an ISI of 30 ms, with an overall correlation between the model and the human observers of ρ = 0.72, p < 0.01. To further challenge the model, we tested the effect of image rotation on recognition performance. Consistent with previous psychophysics results (Guyonneau et al., ECVP 2005), both human observers and the model were fairly robust to image orientation.
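
For readers unfamiliar with the d' measure reported above, the following is a minimal sketch of the standard signal-detection definition, d' = z(hit rate) - z(false-alarm rate), where z is the inverse of the standard normal cumulative distribution. The function name `d_prime` and the clipping value `eps` are illustrative assumptions and do not describe the analysis code used in the study.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate, eps=1e-3):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Inverse CDF of the standard normal distribution (the z-transform).
    z = NormalDist().inv_cdf
    # Clip rates away from 0 and 1, where the z-transform is undefined.
    # The value of eps is an assumption, not taken from the study.
    h = min(max(hit_rate, eps), 1 - eps)
    f = min(max(false_alarm_rate, eps), 1 - eps)
    return z(h) - z(f)


if __name__ == "__main__":
    # Hypothetical rates, for illustration only.
    print(d_prime(0.80, 0.20))
```

Unlike the hit rate alone, d' accounts for false alarms on non-animal images, so an observer (or model) that simply answers "animal" more often does not obtain a higher score.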