Abstract
Recent work has highlighted a seemingly sharp divergence between human and machine vision: whereas people exhibit a shape bias, preferring to classify objects according to their shape (Landau et al. 1988), standard ImageNet-trained CNNs prefer to use texture (Geirhos et al. 2018). However, existing studies have tested people under different conditions from those faced by a feedforward CNN, presenting stimuli long enough for feedback and attentive processes to come online, and using tasks which may bias judgments towards shape. Does this divergence remain when testing conditions are more fairly aligned? In six pre-registered experiments (total N=1064) using brief stimulus presentations (50 ms), we asked participants whether a stimulus exactly matched a target image (e.g. a feather-textured bear). Stimuli either matched the target (a) exactly (the same image), (b) in shape but not texture (“shape lure”, e.g. a pineapple-textured bear), or (c) in texture but not shape (“texture lure”, e.g. a feather-textured scooter), or else matched in neither shape nor texture (“filler”). We tested whether false-alarm rates differed for shape lures versus fillers, for texture lures versus fillers, and for texture lures versus shape lures. This paradigm avoids explicit object categorization and naming, allowing us to test whether a shape bias is already present in perception, regardless of how shape is weighted in subsequent cognitive and linguistic processing. We find that people do rely on shape more than texture, false-alarming significantly more often for shape lures than for texture lures. However, although shape-biased, participants are still lured by texture information, false-alarming significantly more often for texture lures than for fillers. These findings are robust to stimulus type (including multiple previously studied stimulus sets) and mask type (pink noise, scramble, no mask), and establish a new benchmark for assessing the extent to which feedforward computer vision models are “humanlike” in their shape bias.
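
As a rough illustration of the analysis logic described above (a minimal sketch, not the paper's analysis code), the following Python snippet computes per-participant false-alarm rates in each non-matching condition and runs the three pairwise contrasts. The column names, the simulated data, and the use of paired t-tests are all assumptions made for illustration; the paper's actual modelling choices may differ.

```python
# Minimal sketch: per-condition false-alarm rates and the three contrasts
# (shape lure vs filler, texture lure vs filler, shape lure vs texture lure).
# Column names ("subject", "condition", "responded_match") are hypothetical.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
conditions = ["shape_lure", "texture_lure", "filler"]

# Toy trial-level data: each row is one non-matching trial;
# responded_match == 1 means the participant false-alarmed ("same image").
trials = pd.DataFrame({
    "subject": np.repeat(np.arange(30), 60),
    "condition": np.tile(np.repeat(conditions, 20), 30),
})
p_fa = {"shape_lure": 0.35, "texture_lure": 0.20, "filler": 0.10}  # illustrative values only
trials["responded_match"] = rng.binomial(1, trials["condition"].map(p_fa))

# False-alarm rate per subject and condition.
fa = (trials.groupby(["subject", "condition"])["responded_match"]
            .mean()
            .unstack("condition"))

# The three pre-registered comparisons, here as paired t-tests for simplicity.
for a, b in [("shape_lure", "filler"),
             ("texture_lure", "filler"),
             ("shape_lure", "texture_lure")]:
    t, p = stats.ttest_rel(fa[a], fa[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}")
```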