Abstract
Can multiscale image statistics (Portilla & Simoncelli, 2000) explain ensemble perception phenomena better than feedforward object recognition? In the conventional view, objects' properties are rapidly measured and averaged to provide scene gist information (Alvarez, 2011), but this mechanism does not fit observed patterns of human performance. We showed (Cain, Dobkins, Vul, VSS 2016) that mean circle size judgments were systematically biased when comparing different numbers of items: participants selected the display with more items as larger, producing robust shifts in the point of subjective equality (PSE). Others have noted this perturbation (Chong & Treisman, 2005; Sweeny et al., 2014), yet maintain the conventional view. We replicated the experiment and developed a computational model of how texture statistics could, without individuating or measuring objects, explain both the successes and failures of human ensemble perception. We trained linear support vector machines (SVMs) with three feature sets that reflect increasing amounts of image structure: pixel statistics (multiscale luminance properties, 16 features), marginal statistics (pixel statistics + autocorrelations, 421 features), and full texture statistics (marginal statistics + cross-correlations, 2456 features). The 48 easiest Equal set-size trials formed the training set; on each 2AFC trial, we computed the difference in these statistics between the two displays. Each SVM's classifications on the remaining 768 trials were used to fit its psychometric function. Compared to humans' PSE shifts, the pixel SVM's PSE shift was too extreme (a 2:1 ratio, as predicted), while both higher-order statistics (marginal and full) matched humans' PSE shifts. Because the marginal and full SVMs responded identically, the additional cross-correlation features are unnecessary for explaining human behavior on this task, while the autocorrelations of the marginal statistics are crucial.
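The modeling pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the feature differences, labels, and stimulus axis here are synthetic stand-ins (the actual features would come from the Portilla–Simoncelli texture statistics of each display), and the 48/768 trial split and cumulative-Gaussian psychometric fit follow the description in the abstract.

```python
# Hypothetical sketch of the SVM + psychometric-function pipeline.
# All data below are synthetic placeholders, not the study's stimuli.
import numpy as np
from sklearn.svm import LinearSVC
from scipy.optimize import curve_fit
from scipy.stats import norm

rng = np.random.default_rng(0)

# Per-trial differences in texture statistics (display A - display B),
# here faked as Gaussian noise; 16 features = the pixel-statistics set.
n_train, n_test, n_features = 48, 768, 16
X_train = rng.normal(size=(n_train, n_features))
y_train = (X_train[:, 0] > 0).astype(int)   # toy labels: "A judged larger"
X_test = rng.normal(size=(n_test, n_features))

# Train on the easiest Equal set-size trials, classify the remaining trials.
svm = LinearSVC(C=1.0)
svm.fit(X_train, y_train)
choices = svm.predict(X_test)               # model's 2AFC responses

# Fit a cumulative-Gaussian psychometric function to the model's choices
# as a function of a stimulus axis (e.g., log size ratio); the fitted mean
# is the PSE, and its displacement from zero is the bias of interest.
log_ratio = X_test[:, 0]                    # toy stimulus axis

def psychometric(x, mu, sigma):
    return norm.cdf(x, loc=mu, scale=sigma)

params, _ = curve_fit(psychometric, log_ratio, choices, p0=[0.0, 1.0])
pse, slope = params                         # PSE shift = pse - 0
```

Swapping `n_features` for 421 (marginal) or 2456 (full) features changes only the input dimensionality; the training, classification, and PSE-fitting steps are identical across the three models.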
Our ideal observers successfully reproduce robust human biases on an ensemble mean task without explicit object representation. This texture-statistics approach could be applied to arbitrary scene stimuli, offering a more parsimonious account of parallel preattentive processing.
Meeting abstract presented at VSS 2018