Our psychophysical task entails learning the shape of a novel silhouette. The brief display time makes the task challenging because the visual information obtained from one fixation is insufficient to determine the shape exactly, given the reduced resolution of the periphery. In this
appendix, we describe a probabilistic model for representing visual information about the stimulus and how this representation is updated with information acquired from new fixations.
We represent the stimulus shape as a collection of edgelets, or small straight-line segments, that approximate the continuous shape boundary. Each edgelet can assume any one of eight possible orientations, a discretization of all possible orientations from 0° to 180°. There are a total of n edgelet orientations along the boundary, labeled x_i, where i = 1, 2, …, n and x_i ∈ {1, 2, …, 8} for each i. We set n equal to the number of boundary (edge) pixels. The edgelet orientations are unknown to the observer and must be inferred from visual information.
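As an illustrative sketch only (the 22.5° bin width and the helper name bin_orientation are our own choices, not specified in the model), the discretization of a boundary tangent angle into one of the eight orientation labels might look like this:

```python
N_ORIENTATIONS = 8                      # eight discrete edgelet orientations
BIN_WIDTH = 180.0 / N_ORIENTATIONS      # 22.5 degrees per orientation bin

def bin_orientation(theta_deg):
    """Map a boundary tangent angle (degrees) to an orientation label 1..8.

    Orientations are treated modulo 180 degrees, so 0 and 180 fall in the
    same bin.
    """
    return int((theta_deg % 180.0) // BIN_WIDTH) + 1

# Example: a tangent angle of 95 degrees maps to label 5.
print(bin_orientation(95.0))
```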
We have defined this edgelet representation, rather than a pixel-based representation, to reduce our computational load. This simplification incorrectly assumes perfect knowledge of edge locations in the stimulus, completely ignoring positional uncertainty. However, positional uncertainty is roughly 10-fold less than orientation uncertainty across eccentricities (Levi et al., 1985; White, Levi, & Aitsebaomo, 1992). Thus, ignoring positional uncertainty is unlikely to affect the topology of our strategy prediction maps.
The visual information obtained about the edgelets is modeled using the responses of a bank of filters that measure the frequency of each orientation in a local region. The responses of a population of oriented filters within a neighborhood of radius r(E) are represented as a histogram over the eight orientations. We choose r(E) to be equal in size to a “perceptive hypercolumn,” as described by Levi et al. (1985) for vernier acuity in the periphery. Specifically, r(E) is the distance at which small flankers begin to elevate thresholds for a vernier acuity stimulus. It is thought that these flankers encroach on the orientation-selective cells analyzing the vernier stimulus, so this distance is a rough measure of the extent of an orientation hypercolumn. Because this is a perceptual finding, Levi et al. term it the perceptive hypercolumn. Quantitatively,

r(E) = s(E + E_2),

where E_2 is the eccentricity at which acuity drops to half its foveal value and s is the slope. We further interpret r(E) as an effective radius over which the visual system spatially pools orientation information (Figure A1). Unpublished data from our laboratory support the Levi et al. parameters.
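To make the scaling concrete, the pooling radius can be written as a one-line function; the slope and E_2 values below are placeholders for illustration, not the fitted parameters of Levi et al. (1985):

```python
def pooling_radius(ecc_deg, slope=0.1, e2_deg=0.8):
    """Perceptive-hypercolumn radius r(E) = s * (E + E2), in degrees.

    slope and e2_deg are illustrative placeholders only.
    """
    return slope * (ecc_deg + e2_deg)

# The radius grows linearly with eccentricity:
for ecc in (0.0, 2.0, 8.0):
    print(ecc, pooling_radius(ecc))
```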
More precisely, let E_i(F) denote the eccentricity of location i relative to fixation F. Thus, we write r(E_i(F)) for the radius of the histogram at edgelet i given fixation F. The histogram is normalized by the total number of edgelets within the radius so that all the histogram entries sum to 1. For each edgelet i viewed from fixation F, we denote the histogram by h_i(F), where the boldface indicates that it is a vector with eight components (see Figure A1).
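A minimal sketch of this computation, assuming the edgelets are stored as an array of positions and an array of orientation labels (that data layout, and the helper names, are ours):

```python
import numpy as np

def orientation_histogram(i, fixation, positions, orientations,
                          pooling_radius_fn, n_orient=8):
    """Normalized orientation histogram h_i(F) for edgelet i under fixation F.

    positions         : (n, 2) array of edgelet locations (deg visual angle)
    orientations      : (n,) array of orientation labels in 1..8
    fixation          : (2,) array, fixation location F
    pooling_radius_fn : maps eccentricity E_i(F) to the radius r(E_i(F))
    """
    ecc = np.linalg.norm(positions[i] - fixation)          # E_i(F)
    r = pooling_radius_fn(ecc)                             # r(E_i(F))
    dists = np.linalg.norm(positions - positions[i], axis=1)
    local = orientations[dists <= r]                       # edgelets in the pool
    hist = np.bincount(local - 1, minlength=n_orient).astype(float)
    return hist / hist.sum()                               # entries sum to 1
```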
The histogram h_i(F) provides a summary of the shape boundary near edgelet i. If the boundary is perfectly straight within the receptive field radius, then the histogram will show the presence of only one orientation in the entire local population, which uniquely determines all the edgelet orientations in that population. Conversely, a flatter, higher-entropy histogram indicates that the local shape is more complex.
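For example, the Shannon entropy of the histogram quantifies this intuition (zero bits for a single-orientation histogram, three bits for a uniform one over eight bins); the two example histograms below are made up for illustration:

```python
import numpy as np

def histogram_entropy(h):
    """Shannon entropy (bits) of a normalized histogram, ignoring empty bins."""
    h = np.asarray(h, dtype=float)
    nz = h[h > 0]
    return float(-(nz * np.log2(nz)).sum())

print(histogram_entropy([1, 0, 0, 0, 0, 0, 0, 0]))   # straight boundary: 0.0
print(histogram_entropy([0.125] * 8))                 # complex boundary: 3.0
```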
We model the evidence that h_i(F) provides about edgelet orientation x_i using a simple likelihood model: P(h_i(F) | x_i, E_i(F)) = h_{i,x_i}(F)/Z, where h_{i,x_i}(F) is the x_i-th component of h_i(F), that is, the fraction of edgelets within the pooling neighborhood with orientation x_i, and Z is a normalization constant. For an intuitive interpretation of the likelihood function, notice that if h_i(F) is 0 for some component x_i = z, then no edgelet in the local population has orientation z; thus, the likelihood P(h_i(F) | x_i, E_i(F)) equals 0 for x_i = z, which rules out the possibility that x_i = z. Conversely, the higher the value of P(h_i(F) | x_i, E_i(F)) for a component value x_i = z, the more likely it is that the true value of x_i is z.
A simple uniform prior is placed on the distribution of orientations: P(x_i) = 1/8, which means that all orientations are a priori equally likely. Using Bayes' rule, we obtain the posterior distribution

P(x_i | h_i(F), E_i(F)) = P(h_i(F) | x_i, E_i(F)) P(x_i) / Z′,

where Z′ is a normalization constant.
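A sketch of this update, assuming the histogram is available as an eight-component vector (with the uniform prior, the posterior is simply the renormalized histogram); the function name is ours:

```python
import numpy as np

def posterior_single_fixation(h_iF, prior=None):
    """P(x_i | h_i(F), E_i(F)) via Bayes' rule.

    h_iF  : eight-component histogram h_i(F); its x_i-th entry is
            proportional to the likelihood P(h_i(F) | x_i, E_i(F)).
    prior : P(x_i); defaults to the uniform prior 1/8.
    """
    h_iF = np.asarray(h_iF, dtype=float)
    if prior is None:
        prior = np.full_like(h_iF, 1.0 / len(h_iF))
    unnorm = h_iF * prior            # likelihood times prior
    return unnorm / unnorm.sum()     # division by Z' normalizes

# With a uniform prior, the posterior equals the normalized histogram:
print(posterior_single_fixation([0.5, 0.25, 0.25, 0, 0, 0, 0, 0]))
```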
Given the posterior distributions over all the edgelets, the entropy map is defined as described in the next section. Similarly, the posterior probability can be updated for multiple fixations F_1 and F_2 as follows:

P(x_i | h_i(F_1), h_i(F_2)) = P(h_i(F_1) | x_i, E_i(F_1)) P(h_i(F_2) | x_i, E_i(F_2)) P(x_i) / Z_2,

where Z_2 is a normalization constant. Although this approximation has some undesirable properties (such as making the marginal distribution more peaked if the same fixation is made repeatedly), it provides a simple mechanism for combining histogram evidence from multiple, distinct fixations.
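A sketch of this combination rule; applying it twice with the same histogram reproduces the "peaking" behavior noted above (names and data layout are ours):

```python
import numpy as np

def combine_fixations(histograms, n_orient=8):
    """P(x_i | h_i(F_1), ..., h_i(F_k)): multiply the per-fixation likelihood
    terms with the uniform prior and renormalize (the division by Z_2)."""
    post = np.full(n_orient, 1.0 / n_orient)       # uniform prior P(x_i)
    for h in histograms:
        post = post * np.asarray(h, dtype=float)   # independent likelihood terms
    return post / post.sum()

h = [0.5, 0.25, 0.25, 0, 0, 0, 0, 0]
print(combine_fixations([h]))       # one fixation
print(combine_fixations([h, h]))    # same fixation twice -> more peaked posterior
```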