Open Access
Article  |   August 2018
Frequency tuning of shape perception revealed by classification image analysis
Journal of Vision August 2018, Vol.18, 9. doi:https://doi.org/10.1167/18.8.9

      John Wilder, Ingo Fruend, James H. Elder; Frequency tuning of shape perception revealed by classification image analysis. Journal of Vision 2018;18(8):9. https://doi.org/10.1167/18.8.9.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Classification image analysis is a powerful technique for elucidating linear detection and discrimination mechanisms, but it has primarily been applied to contrast detection. Here we report a novel classification image methodology for identifying linear mechanisms underlying shape discrimination. Although prior attempts to apply classification image methods to shape perception have been confined to simple radial shapes, the method proposed here can be applied to general 2-D (planar) shapes of arbitrary complexity, including natural shapes. Critical to the method is the projection of each target shape onto a Fourier descriptor (FD) basis set, which allows the essential perceptual features of each shape to be represented by a relatively small number of coefficients. We demonstrate that under this projection natural shapes are low pass, following a relatively steep power law. To efficiently identify the observer's classification template, we employ a yes/no paradigm and match the spectral density of the stimulus noise in FD space to the power law density of the target shape. The proposed method generates linear template models for animal shape detection that are predictive of human judgments. These templates are found to be biased away from the ideal, overly weighting lower frequencies. This low-pass bias suggests that higher frequency shape processing relies on nonlinear mechanisms.

Introduction
A considerable portion of primate visual cortex appears to be involved in the coding of object shape (Connor, Brincat, & Pasupathy, 2007). Although objects in our visual world are generally three dimensional, the boundary of a 3-D object projects to the retina as a closed contour, and the shape of this planar contour provides an important cue for object detection and recognition (Elder & Velisavljević, 2009). In this paper, we focus on the visual coding of this planar shape information. 
There are many theories of planar shape representation but little consensus on which is the best account of human shape perception. Early theories (Attneave, 1954; Hoffman & Richards, 1984) emphasized local features of the contour, such as curvature extrema and inflections, and later work generalized these features across scale-space (Mokhtarian & Mackworth, 1986; Dubinskiy & Zhu, 2003). Alternatively, shape can be encoded as a sum over global basis functions of the contour (Granlund, 1972; Pavlidis, 1980). 
An alternative to a contour-based representation is a symmetry axis representation in which a planar shape is represented by its skeleton and an associated distance function (Blum, 1973; Feldman & Singh, 2006; Kimia, Tannenbaum, & Zucker, 1995). This approach has the advantage of making perceptually salient shape symmetries explicit and capturing regional properties of shape that are not directly represented by contour methods. 
A third class of theory considers shape as a process of transformation or growth (Elder, Oleskiw, Yakubovich, & Peyré, 2013; Grenander, Srivastava, & Saini, 2007; Jain, Zhong, & Lakshmanan, 1996; Leyton, 1989; Sharon & Mumford, 2006; Thompson, 1917). These theories have a natural generative expression that can be used to support inference with noisy or incomplete visual data and are more able to capture topological properties of objects (Elder et al., 2013). 
In order to test these theories as models for human shape perception, a general method for measuring human sensitivity to shape features is required. Ideally, the method will not be biased toward a specific theory and can be applied to a wide variety of shapes, including natural shapes. One candidate is the classification image methodology, which measures how human judgments vary with random perturbations in the stimulus in order to identify selectivity for visual features. This methodology has been used effectively to reveal aspects of linear (Beard & Ahumada, 1998; Murray, Bennett, & Sekuler, 2002) and nonlinear (Morgenstern & Elder, 2012; Nandy & Tjan, 2007; Solomon, 2002) spatial contrast encoding. The question we ask in this paper is whether it can be applied to shape perception. 
Classification images
In a standard yes/no classification image experiment, the observer is required to discriminate between two deterministic visual signals (signal 1 and signal 0; Figure 1). On each trial, a random sampler selects one of these two signals, and a random sample of white Gaussian spatial pixel noise is added. The resulting noisy stimulus is then displayed to the observer, who must decide which of the two signals is present. 
Figure 1. The standard classification image experiment.
It can be shown (Green & Swets, 1966) that, to generate the largest proportion of correct responses, an ideal observer must have an accurate model, called a template, of the difference (signal 1 − signal 0) of the two (noiseless) signals (Figure 2). The ideal observer then computes the inner product of this template with the noisy stimulus and forms a decision by comparing this decision variable with a fixed threshold. 
Figure 2. The linear template model of visual detection.
If a human uses the same strategy but with an imperfect template and internal additive Gaussian noise, that template can be estimated with the classification image methodology. Specifically, an unbiased estimate of the template is obtained by computing the mean of the stimulus noise for each of the four possible (signal, response) pairs, adding the two means for the response = signal 1 trials and subtracting the two means for the response = signal 0 trials (Ahumada, 2002; Beard & Ahumada, 1998; Murray et al., 2002). We henceforth refer to this method for template estimation as the noise-averaging method. 
It has also become increasingly popular to view this problem within the framework of the generalized linear model (GLM) and to estimate the template by maximizing the likelihood of the observer's responses (Abbey & Eckstein, 2001; Knoblauch & Maloney, 2008; Murray, 2011; Solomon, 2002). Once again, we assume that an observer's judgment is formed by computing the inner product of an imperfect template with the noisy stimulus and then comparing against a fixed threshold. Logistic regression generates a maximum likelihood estimate of the template under the assumption that internal noise follows a logistic distribution, and probit regression assumes that internal noise follows a normal distribution. 
Although the GLM approach is not guaranteed to be unbiased, by explicitly maximizing the likelihood of the data, it may be more efficient than the noise-averaging method. Because studies of the relative efficiency of the two methods have yielded mixed results (Abbey & Eckstein, 2001; Knoblauch & Maloney, 2008; Murray, 2011), we evaluate both approaches to the estimation of shape templates (see Linear systems identification method). 
Applying the classification image methodology to shape
In the standard form of the classification image methodology, noise is added in the luminance domain in the form of a perturbation in the gray level at each pixel, and the method estimates the weight assigned to the gray level at each pixel. This is useful for estimating human contrast sensitivity but is not appropriate for understanding shape perception. This is because shape is defined not in the luminance domain, but in the spatial domain as a sequence of spatial (x, y) coordinates in the image. Thus, to understand shape perception, noise must be added not to the gray levels of the image but to the spatial coordinates of the shape. 
Kurki, Saarinen, and Hyvärinen (2014) have recently adapted the classification image methodology to explore the perception of an interesting class of planar shapes called radial frequency (RF) patterns. An RF pattern is a closed planar contour that can be conveniently represented in polar coordinates with origin at the center of the shape. RF patterns are, by definition, shapes that can be represented as a sum of simple radial basis shapes whose radial coordinates are sinusoidal functions of polar angle. 
To explore human perceptual sensitivity to features of these shapes, Kurki et al. (2014) displayed each shape as a pattern of bright spots and then added random noise to the radial coordinates of the elements. Under the assumption that human shape discrimination is based upon an inner product of the stimulus with a stored template of the difference in the radial position of these spots for the two signals, the classification image technique can be used to estimate this template. Specifically, Kurki et al. used classification image methodology to examine the perceptual sensitivity to perturbations of the so-called RF4 pattern, which has the appearance of a rounded square. By sampling these patterns at a relatively small number of locations, Kurki et al. were able to keep the dimensionality of the stimulus low, which allowed reasonably accurate estimates of the classification image. The results of these experiments suggest that human observers integrate information globally over the shape in order to form a judgment. 
We are inspired by this successful application of the classification image technique to simple shapes and in this paper seek to extend this success in a number of important ways. First, the stimuli employed by Kurki et al. (2014) consisted of a disconnected pattern of circular blobs that must be perceptually integrated into a coherent object. In order to dissociate this process of perceptual organization from the perception of shape, it is desirable to employ smooth, continuous contours. 
Second, although RF patterns are an interesting class of shape, shapes that are not roughly circular cannot generally be represented as RF patterns, and this excludes many natural shapes (e.g., animals, human bodies). Given that our visual system has evolved to process natural shapes, it is important to be able to directly measure how these more complex shapes are processed. 
Generalizing to complex natural shapes
In the present study, we use animal shapes drawn from the Hemera photo object database as visual stimuli. Animal shapes are of particular ecological relevance, figuring prominently in the earliest known examples of cave art (Aubert et al., 2014) and are known to be processed efficiently by the primate visual system (Elder & Velisavljević, 2009; Fabre-Thorpe, Richard, & Thorpe, 1998; Thorpe, Fize, & Marlot, 1996). We represent these shapes as high-resolution polygons, which appear as smooth continuous contours when rendered on a digital display. 
An immediate issue arises: These smooth continuous contour stimuli generally consist of thousands of points—too many dimensions to estimate efficiently and reliably using a classification image methodology. To address this issue, we employ a Fourier descriptor (FD) representation of shape (Granlund, 1972; Pavlidis, 1980) in which each vertex (x, y) of the polygon is represented as a coordinate x + yi in the complex plane. By taking the Fourier transform of the complex vector representing these vertices, we obtain the FD representation, which represents the complex amplitude of the shape at each frequency over the index space of the vector. We stress that by coding a shape as a function of arc length rather than polar angle, the FD representation generates a complete (i.e., invertible) description of an arbitrary polygon in the plane and, thus, can faithfully represent general shapes, including natural shapes. This is quite distinct from radial basis functions, which can only represent shapes that are functions of polar angle (e.g., convex shapes, star-like shapes). Importantly, one can capture the main features of a natural shape using only a small number of the lowest FD frequency components, limiting the dimensionality of the stimulus to a manageable level. 
To be precise, suppose a deterministic polygonal shape (the signal) consists of the complex N-vector s representing the sequence of N vertices sj, \(j \in [0, \ldots ,N - 1]\). The low-pass FD representation of the shape is then the complex 2M-vector S, where  
\begin{equation}\tag{1}{S_k} = {1 \over N}\sum\limits_{j = 0}^{N - 1} {s_j} {e^{ - 2\pi ijk/N}},\quad k \in \left[ { - M, \ldots ,M - 1} \right].\end{equation}
 
Here k indexes frequency: k = 0 is the DC component that determines the location of the stimulus (fixed to zero), the sequence k = −1, −2, …, −M represents low to high negative frequencies and the sequence k = 1, 2, …, M − 1 represents low to high positive frequencies. (Although for real input signals Fourier coefficients for negative frequencies are simply the conjugates of the coefficients for the corresponding positive frequencies, this is not the case for complex signals.) 
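The FD transform of Equation 1 is simply a discrete Fourier transform of the complex contour, so it can be computed with a standard FFT. The sketch below is a minimal illustration (not the authors' code); the `1/N` normalization and the frequency range `[-M, M-1]` follow Equation 1, and negative frequencies are read from the wrapped FFT indices.

```python
import numpy as np

def fd_coefficients(vertices, M):
    """Low-pass Fourier descriptor (Equation 1) of a closed polygon.

    vertices: (N, 2) array of (x, y) contour points.
    Returns a dict mapping each frequency k in [-M, M-1] to the
    complex coefficient S_k.
    """
    s = vertices[:, 0] + 1j * vertices[:, 1]   # vertices as x + yi
    N = len(s)
    F = np.fft.fft(s) / N                      # S_k for k = 0..N-1
    # Negative frequencies k < 0 live at index N + k of the FFT output.
    return {k: F[k % N] for k in range(-M, M)}

def fd_reconstruct(S, N):
    """Invert a low-pass FD dict back to N spatial samples."""
    F = np.zeros(N, dtype=complex)
    for k, c in S.items():
        F[k % N] = c
    s = np.fft.ifft(F) * N                     # undo the 1/N in Equation 1
    return np.stack([s.real, s.imag], axis=1)
```

Because an ellipse contains only the frequencies k = ±1, reconstructing one from its FD coefficients recovers it exactly, which makes a convenient sanity check.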
To limit the dimensionality of the estimation problem, we require that MN. In the section on dimensionality, we demonstrate through simulation how using a smaller value for M leads to more accurate estimation of template coefficients. In our psychophysical experiments, we set M = 16; Figure 3 shows an example. Because we must represent both real and imaginary FD coefficients, this shape is described by a total of 64 parameters. These low-pass animal shapes will be used as signal 1 for our discrimination experiments. For signal 0, we construct a stimulus consisting only of the first fundamental (S−1 + S+1) of the FD representation for the corresponding animal shape with all other coefficients (including DC) set to zero. The resulting signal 0 stimulus traces an ellipse roughly approximating the animal shape. Note that, because the DC component and first fundamental are matched between signals 1 and 0, discrimination must be based on the 29 positive and negative higher frequency components k = ±2, …, ±(M − 1), −M. 
Figure 3. The Fourier descriptor (FD) representation of a planar shape.
In our experiments, the visual stimulus \({\bf{\tilde S}}\) will consist of one of the two possible signal shapes S corrupted by a complex Gaussian noise vector N added in the FD domain: \({\bf{\tilde S}} = {\bf{S}} + {\bf{N}}\). The real and imaginary components of the noise are drawn independently. In the spatial domain, our noisy stimulus \({\bf\tilde s}\) will be given by \({\bf\tilde s} = {\bf{s}} + {\bf{n}}\), where n is the inverse Fourier transform of N. Note that because the Fourier transform is linear, n will also be Gaussian. 
Matching the noise to the signal
Although the standard classification image method uses white noise, natural images tend not to be white but low pass, containing more energy at low frequencies than high frequencies (Field, 1987). Natural shapes, when represented in the FD domain, also tend to be low pass. In particular, we find that, for animal shapes drawn from the Hemera photo object database, spectral density decreases roughly linearly in log–log space, indicating a power law: \({S_k}\propto |k|^{ - \alpha }\). Figure 4 shows the spectral amplitudes for the three animal stimuli we employ in this study. We see that the amplitude spectra are roughly approximated by power laws with exponents α in the range 1.3 to 1.7. (Power law fits explain between 48% and 65% of the variance in these three examples.) 
Figure 4. Amplitude spectrum of the three animal shapes used in this study.
Due to the low-pass nature of the shapes, adding sufficient white stimulus noise to reduce human performance to 75% correct would drive the signal-to-noise ratio (SNR) for higher frequencies to zero, rendering them useless and preventing any estimation of higher frequency coefficients of the human template. Also, there would be a large difference in the amplitude spectrum of signal 0 (white) and signal 1 (low-pass) stimuli, which might dominate the observer's decision at the expense of phase information. For these reasons, we elected to roughly match the spectral density of the added noise to the spectral amplitude of the signal difference, using low-pass power-law noise with exponent matching the power-law fit to the amplitude spectrum of the animal shape. This roughly equalizes the expected SNR of low and high shape frequencies for the ideal observer although deviations of the stimulus amplitude spectrum from the power-law model mean that the exact utility of each frequency will vary. The FD representation makes the addition of low-pass noise straightforward as the noise added to each FD coefficient of the stimulus remains independent. 
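Because the noise is added independently to each FD coefficient, generating spectrally matched noise amounts to drawing complex Gaussian samples whose standard deviation falls as |k|^(−α). The sketch below is an illustration of this idea, not the authors' implementation; `gain` (the overall noise level) is a free parameter, and leaving the DC and fundamental noise-free is our assumption, matching the fact that those components are identical for the two signals.

```python
import numpy as np

def power_law_fd_noise(M, alpha, gain, rng=None):
    """Complex Gaussian FD noise with std proportional to |k|^(-alpha).

    Real and imaginary parts are drawn independently for each
    frequency k = -M..M-1. DC (k = 0) and the fundamental (k = +-1)
    are left noise-free here (an assumption: those components are
    matched between the two signals). Returns {k: complex sample}.
    """
    rng = np.random.default_rng(rng)
    noise = {}
    for k in range(-M, M):
        if abs(k) <= 1:
            noise[k] = 0j                       # matched components: no noise
            continue
        sigma = gain * abs(k) ** (-alpha)       # power-law envelope
        noise[k] = complex(rng.normal(0, sigma), rng.normal(0, sigma))
    return noise
```

With the signal amplitude and the noise standard deviation both falling as |k|^(−α), the expected per-frequency SNR is roughly constant, as described above.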
The ideal observer
The ideal observer uses the matched filter W that maximizes SNR and, hence, discriminability d′. If the noise were white, the ideal observer would form a real-valued scalar decision variable R given by \(R = {\rm{Re}}\left( {{{\bf{W}}^H}{\bf\! {\tilde S}}} \right)\), where W = S1S0 is the signal difference, WH is the Hermitian conjugate (i.e., the conjugate transpose) of W and the operator Re() takes the real part of its argument. 
For our nonwhite noise, the matched filter is also shaped by the noise: \({{\bf{W}}^I} = {\Sigma ^{ - 1}}\Delta {\bf{S}}\), where Σ is the real-valued covariance of the added noise over frequencies k (Kay, 1998). Because in our case the noise at each frequency is independent, Σ is diagonal, and the ideal template can be written as  
\begin{equation}\tag{2}W_k^I = {S_k}/\sigma _k^2 \propto {S_k}{\left| k \right|^{2\alpha }},k = \pm 2, \ldots , \pm (M - 1), - M.\end{equation}
Because the spectral amplitude of the signal S falls roughly as \(|k{|^{ - \alpha }}\), the ideal observer's template is actually high pass, increasing in amplitude with frequency, roughly as \(|k{|^\alpha }\). This makes sense because, although the SNR is by design equalized across frequencies, the signal amplitudes are falling roughly as \(|k{|^{ - \alpha }}\), and so higher frequency components must be boosted to contribute equally to the decision variable.  
By Parseval's theorem, the decision variable R can also be computed in the spatial domain: \(R = {\rm{Re}}\left( {{{\bf{w}}^H}{\bf\tilde s}} \right)\), where w is the inverse Fourier transform of W. In other words, the observer's computation can be thought of as an inner product in either the FD or spatial domains, and the corresponding templates are related through the Fourier transform. 
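The prewhitened matched filter of Equation 2 reduces, for diagonal Σ, to a per-frequency weighted inner product. A minimal sketch (ours, not the authors' code) of the ideal decision variable in the FD domain:

```python
import numpy as np

def ideal_decision(stim, dS, sigma2):
    """Ideal-observer decision variable R = Re(W^H stim).

    stim, dS, sigma2: dicts over FD frequencies k, where dS is the
    signal difference S1 - S0 and sigma2[k] the noise variance at k.
    The template W_k = dS_k / sigma2[k] implements Equation 2
    (prewhitened matched filter for diagonal noise covariance).
    """
    R = 0.0
    for k, d in dS.items():
        if sigma2.get(k, 0) == 0:
            continue                      # noise-free (matched) components
        W_k = d / sigma2[k]
        R += (np.conj(W_k) * stim[k]).real
    return R
```

The observer would respond "signal 1" when R exceeds a fixed threshold, e.g. the value of R midway between the two noiseless signals.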
Computing the shape-classification image
We model human perceptual shape discrimination using the linear template model (Figure 2) in the FD domain. In the standard noise-averaging classification image method the added noise is white, and an unbiased estimate \({\bf{\widehat W}}\) of the observer's inner template W can be computed as  
\begin{equation}\tag{3}{\bf{\widehat W}} = ({{\bf{\overline N}}_{11}} + {{\bf{\overline N}}_{01}}) - ({{\bf{\overline N}}_{10}} + {{\bf{\overline N}}_{00}})\quad ({\rm{White\ stimulus\ noise}}),\!\end{equation}
where \({{\bf{\overline N}}_{ij}}\) is the mean of the added noise over all trials in which the stimulus contained signal i and the observer indicated signal j (Ahumada, 2002; Beard & Ahumada, 1998; Murray et al., 2002).  
In our case, the noise is low pass, not white. Abbey and Eckstein (2002) considered the discrimination of two distinct signals embedded in additive nonwhite Gaussian noise within a two-alternative forced-choice (2AFC) experimental paradigm. In particular, they showed that, for a linear observer with additive Gaussian internal noise, generalizing from white to nonwhite stimulus noise involves normalization of the estimated template by the covariance of the added noise. Murray (2016) extended this result to a yes/no paradigm, but because that proof is embedded in the context of an analysis of more general decision rules, we provide in Appendix B a proof specific to the yes/no task with a threshold decision rule. In particular, we prove that an unbiased estimate \({\bf{\widehat W^{\prime} }}\) of the observer's inner template can be computed for general multivariate Gaussian stimulus noise, added independently to the real and imaginary coefficients, by dividing the biased estimate by the real-valued noise covariance Σ:  
\begin{equation}\tag{4}{\bf{\widehat W^{\prime} }} = {\Sigma ^{ - 1}}{\bf{\widehat W}}\quad ({\rm{General\ case}}).\end{equation}
 
In our case, the covariance matrix is diagonal, and the coefficients of the observer template are given by  
\begin{equation}\tag{5}{\widehat W^{\prime} _k} = {\widehat W_k}/\sigma _k^2 = |k{|^{2\alpha }}{\widehat W_k}.\end{equation}
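Equations 3 through 5 translate directly into a few lines of array code. The following is a sketch under our own conventions (trial-wise arrays rather than the authors' data format): average the added noise within each (signal, response) cell, combine the means with the signs of Equation 3, then divide by the per-frequency noise variance as in Equations 4 and 5.

```python
import numpy as np

def classification_image(noise, signal, response, sigma2):
    """Covariance-normalized noise-averaging estimate (Equations 3-5).

    noise:    (T, K) complex array, FD noise added on each of T trials
    signal:   (T,) array of presented signals (0 or 1)
    response: (T,) array of observer responses (0 or 1)
    sigma2:   (K,) noise variance per FD frequency
    Returns the (K,) complex template estimate W'.
    """
    noise = np.asarray(noise)
    def cell_mean(i, j):
        m = (signal == i) & (response == j)
        return noise[m].mean(axis=0) if m.any() else 0.0
    # Equation 3: add "respond 1" cells, subtract "respond 0" cells.
    W = (cell_mean(1, 1) + cell_mean(0, 1)) - (cell_mean(1, 0) + cell_mean(0, 0))
    return W / sigma2                        # Equations 4-5 (diagonal Sigma)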
 
Because the GLM framework does not depend upon the distribution of the stimulus, no adjustments to it are required to estimate the observer template \({\bf{\widehat W^{\prime} }}\). 
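To make the GLM route concrete, here is a bare-bones logistic-regression fit by gradient ascent on the Bernoulli log-likelihood. This is a sketch, not the authors' analysis code: the design matrix is assumed to hold the real and imaginary parts of the noisy stimulus coefficients on each trial, and a library GLM with a logit or probit link would do the same job more robustly.

```python
import numpy as np

def glm_template(X, y, lr=0.5, steps=2000):
    """Logistic-regression estimate of a linear observer template.

    X: (T, D) real design matrix (e.g. Re and Im FD parts of the noisy
       stimulus per trial); y: (T,) 0/1 responses.
    Plain gradient ascent on the Bernoulli log-likelihood.
    Returns (weights, bias); weights estimate the template up to scale.
    """
    T, D = X.shape
    w, b = np.zeros(D), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(respond "1")
        g = y - p                                 # per-trial gradient term
        w += lr * (X.T @ g) / T
        b += lr * g.mean()
    return w, b
```

Fitting this model to responses simulated from a known linear observer recovers the template direction, which is the quantity of interest since the template is only identifiable up to scale.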
Assessing the shape-classification image
In a white-noise framework, the ideal template is simply the signal difference, and deviations of the human observer from ideal can be assessed by comparing the estimated human template with the signal difference. However, in our low-pass stimulus noise framework, the ideal observer must also normalize the stimulus by the noise covariance: \(W_k^I \propto {S_k}/\sigma _k^2 = {S_k}|k{|^{2\alpha }}\), resulting in a high-pass template. In order to visualize human template tuning to the low-pass FD frequency structure of natural shapes, we therefore scale estimated templates by the noise covariance \(\sigma _k^2 = |k{|^{ - 2\alpha }}\). For the noise-averaging method, this cancels the earlier normalization in the calculation of the classification image (Equation 5), and so the similarity of human and ideal observer tuning can be assessed by simply comparing the un-normalized classification image \({\bf{\widehat W}}\) (Equation 3) with the signal difference ΔS. However, for the GLM method, the estimated template \({\bf{\widehat W^{\prime} }}\) must be multiplied by the noise covariance Σ prior to comparison with the signal difference ΔS. 
The human observer template W is only identifiable up to a scale factor. To facilitate comparison with the ideal observer, we scale the classification image \({\bf{\widehat W}}\) by the positive scale factor β that minimizes a measure of deviation from the signal difference ΔS. We expect uncertainty in the estimated coefficients of the observer template W to scale with the standard deviation of the stimulus noise σk. We therefore determine the scale factor β that minimizes the weighted squared deviation between the observer template and the signal difference:  
\begin{equation}\tag{6}\beta = \arg \mathop {\min }\limits_{\beta ^{\prime} } \sum\limits_{k = \pm 2, \ldots , \pm (M - 1), - M} {\left( {1/\sigma _k^2} \right)} {\left| {\beta ^{\prime} {{\widehat W}_k} - {S_k}} \right|^2} = \arg \mathop {\min }\limits_{\beta ^{\prime} } \sum\limits_{k = \pm 2, \ldots , \pm (M - 1), - M} | k{|^{2\alpha }}{\left| {\beta ^{\prime} {{\widehat W}_k} - {S_k}} \right|^2}.\end{equation}
 
Scaling estimated templates by the noise covariance also allows us to visualize the templates in the spatial domain. The estimated spatial template is simply the inverse Fourier transform \({\bf{\widehat w}}\) of the estimated (and noise-scaled) FD template \({\bf{\widehat W}}\), and tuning can be assessed by comparing \({\bf{\widehat w}}\) to the signal difference Δs. Note that the fundamental of the FD representation forms an ellipse in the spatial domain that acts as a kind of “scaffold” that higher FD frequencies modulate. Because the fundamental for signal 0 and signal 1 are matched, the fundamental for the signal difference is zero. As a result, rendering the signal difference or estimated human template without the fundamental generates an uninterpretable squiggle. This is rectified by adding the fundamental to the ideal and estimated human templates prior to displaying in the spatial domain. 
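The minimization in Equation 6 is a one-dimensional weighted least-squares problem with a closed-form solution: β is the weighted projection of the signal difference onto the estimated template. A short sketch (ours, with the positivity constraint applied by clipping):

```python
import numpy as np

def template_scale(W_hat, S, weights):
    """Closed-form minimizer of the weighted squared error in Equation 6.

    W_hat, S, weights: (K,) arrays of the estimated template, the
    signal difference, and the per-frequency weights 1/sigma_k^2.
    Setting the derivative of sum_k w_k |beta*W_hat_k - S_k|^2 to zero
    gives beta = sum_k w_k Re(conj(W_hat_k) S_k) / sum_k w_k |W_hat_k|^2,
    clipped at zero to keep the scale factor positive.
    """
    num = np.sum(weights * (np.conj(W_hat) * S).real)
    den = np.sum(weights * np.abs(W_hat) ** 2)
    return max(num / den, 0.0)
```

For example, if the estimated template is exactly twice the signal difference, the recovered scale factor is 0.5 regardless of the weights.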
Simulations
Before reporting psychophysical results, we present the results of three simulation studies that have informed the design and validation of our method. 
Linear systems identification method
To compare noise-averaging and GLM approaches to template estimation, we conducted a simulation experiment using the rabbit shape as signal. The stimulus noise gain was set to the mean value required for our three human observers to perform at 75% correct (see Experiment 2 below). We simulated a noisy ideal observer model, adding internal Gaussian noise to bring performance down to 75% correct. We generated estimates \({\bf{\widehat W}}\) of the simulated observer template using both noise-averaging and probit GLM methods for 100–5,000 trials, repeating the experiment 30 times. These estimates were evaluated by computing the squared error (Equation 6) of the (noise-scaled) estimated template \({\bf{\widehat W}}\) relative to the signal difference. We found that the two methods led to very similar accuracies with the GLM having a slight edge for experiments with between 500 and 2,500 trials (Figure 5a). Because the experiments described below employ 1,500 trials, we employ the GLM method in most of the analyses to follow. 
Figure 5. Results of template estimation simulations using the rabbit shape as signal. The simulated observer used an ideal template with added internal Gaussian noise. Plots show mean and standard error over 30 repetitions. (a) Total weighted squared error (Equation 6) for the noise-averaging and probit GLM template estimation methods as a function of the number of trials. (b) Weighted squared error at each frequency (Equation 6) for a 1,500-trial experiment as a function of the dimensionality M of the stimulus.
Dimensionality
To evaluate how the dimensionality M of the low-pass FD shape representation affects the accuracy of template estimation, we conducted a second simulation, again using the rabbit shape as signal but varying the dimensionality M of both the signal and the noise components of the stimulus (i.e., the number of low-frequency harmonics) and adjusting the gain of the stimulus noise to maintain 75% correct performance. For each value of M, we ran 30 simulated experiments of 1,500 trials each and used the GLM method to estimate the observer template. Figure 5b shows that, as the stimulus dimensionality M increases, the mean error of estimated observer template coefficients also increases for all estimated FD frequencies. This result demonstrates the importance of using a low-dimensional FD subspace to obtain accurate shape template estimation. 
Amplitude and phase information
The Gaussian noise added to the FD coefficients induces noise in both the FD amplitude and phase domains. To interpret the psychophysical results that follow, it is helpful to know how this noise determines the information available in these two domains for our shape-discrimination task. The amplitudes of the noisy FD coefficients follow a smooth, positively skewed distribution defined on \([0,\infty )\) known as a Rice distribution (Rice, 1945, equations 3.7–10). In the FD phase domain, the stimulus noise induces a smooth, symmetric, unimodal distribution centered on the signal phase. When conditioned on amplitude, the phase distribution is von Mises; the marginal phase distribution is more complicated, and we derive an analytical expression for it in Appendix A.
To understand how this low-pass noise affects the information available in the FD amplitude and phase domains, we employed our analytical models for the marginal amplitude and phase distributions of the noisy FD coefficients (Appendix A) to implement “limited ideal” observer models that are allowed to use either only one complex FD frequency component or just the amplitude or the phase of that one FD component. We then ran these models on the experiment, using the rabbit shape as signal, to determine their sensitivity (d′) (Figure 6). We found that although both FD amplitude and phase carry information, phase is, on average, somewhat more informative than amplitude (mean d′ of 1.2 for phase vs. 0.75 for amplitude). A similar pattern is seen for the other two shapes used in this study. 
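As a rough illustration of how the sensitivity of an amplitude-only channel might be estimated, the sketch below draws Rice-distributed amplitudes by Monte Carlo and converts ROC area to d′ via d′ = √2 Φ⁻¹(AUC). The coefficient values and noise level are hypothetical, and the conversion assumes equal-variance Gaussian decision variables; this is not the authors' analytical limited-ideal computation.

```python
import numpy as np
from scipy.stats import norm, mannwhitneyu

rng = np.random.default_rng(1)
S1, S0 = 1.0 + 0.5j, 0.3 + 0.0j   # hypothetical FD coefficients at one frequency
sigma, n = 0.6, 5000              # per-component noise std, Monte Carlo samples

def amplitudes(S):
    """Amplitudes of a noisy complex FD coefficient (Rice distributed)."""
    return np.abs(S + rng.normal(scale=sigma, size=n)
                    + 1j * rng.normal(scale=sigma, size=n))

a1, a0 = amplitudes(S1), amplitudes(S0)

# ROC area from the Mann-Whitney U statistic, then d' = sqrt(2) * Phi^{-1}(AUC).
auc = mannwhitneyu(a1, a0, alternative="greater").statistic / (n * n)
dprime = np.sqrt(2) * norm.ppf(auc)
print(round(dprime, 2))
```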
Figure 6
 
Sensitivity (d′) of FD coefficients, their amplitudes, and phases for the shape discrimination task (rabbit shape).
Experiment 1
Method
Participants
There were three observers: author JW (O1), author IF (O2), and a third naïve observer (O3). Observers gave informed consent prior to participation. 
Stimuli
The observer was required to discriminate between two signal shapes: signal 1 and signal 0. For signal 1, we employed three different animal shapes in a blocked design: a deer, a rabbit, and a turtle (Figure 4). The shapes were derived from the Hemera object data set, which consists of blue-screened images of isolated objects, roughly 600 × 500 pixels in size. The bounding contours of the objects were extracted at pixel resolution and represented in the FD domain using only the 16 lowest negative and 15 lowest positive frequencies. For signal 0, we employed the first fundamental (S−1 + S+1) of the FD representation for the corresponding animal shape (signal 1), which traces an ellipse roughly approximating the animal shape. 
On each trial, the visual stimulus consisted of one of these two signals corrupted by independent low-pass Gaussian noise added in the FD domain. In particular, we set \(\sigma_k \propto |k|^{-\alpha}\), where σk is the standard deviation of the added Gaussian noise at frequency k and α is the power-law exponent of the spectral density of the natural shape being discriminated (signal 1). Note that independent noise was added to the real and imaginary components of each FD coefficient.
Each shape was shown as a white contour, centered on the screen and subtending roughly 16° × 16° of visual angle, against a dark gray background. At the viewing distance of 57 cm, each pixel subtended roughly 1.5 arcmin.
Design and procedure
The experiment adhered to the tenets of the Declaration of Helsinki. 
The experiment was blocked by animal. There were 1,500 trials per animal, run in three sessions of 500 trials each. Before each session, the observer was (re)familiarized with the two signal shapes. First the (noise-free) animal shape (signal 1) was shown until the observer pressed a button and then the (noise-free) ellipse (signal 0) shape was displayed. A second button press ended the familiarization procedure. 
On each trial, one of the two possible signals was randomly selected with uniform probability, a random sample of low-pass Gaussian noise was added in the FD domain, and the inverse Fourier transform was applied to this noisy FD signal to generate the displayed spatial stimulus (Figure 7). The observer was given unlimited time to indicate with left/right arrow keys which of the two signals was shown. Feedback was provided in the form of a 1-s auditory tone (high pitch indicating correct, low pitch indicating incorrect).1 
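Stimulus generation on each trial could be sketched along these lines. The frequency range matches the 16 negative and 15 positive harmonics described above, but the FD convention, power-law exponent, noise gain, and signal coefficients are illustrative assumptions rather than the authors' values.

```python
import numpy as np

rng = np.random.default_rng(2)
# 16 negative and 15 positive harmonics, as in the stimuli described above.
freqs = np.r_[np.arange(-16, 0), np.arange(1, 16)]
alpha, gain = 2.0, 0.05        # hypothetical power-law exponent and noise gain

def make_stimulus(S, npts=256):
    """Corrupt FD coefficients S with low-pass complex Gaussian noise
    (sigma_k proportional to |k|^-alpha, independent on real and imaginary
    parts) and invert the Fourier series to a closed spatial contour."""
    sigma = gain * np.abs(freqs).astype(float) ** -alpha
    S_noisy = S + rng.normal(scale=sigma) + 1j * rng.normal(scale=sigma)
    t = np.arange(npts) / npts
    # z(t) = sum_k S_k exp(2*pi*i*k*t), sampled at npts points.
    return (S_noisy * np.exp(2j * np.pi * np.outer(t, freqs))).sum(axis=1)

# Example signal 0: the fundamental pair alone, which traces an ellipse.
S = np.zeros(len(freqs), dtype=complex)
S[freqs == 1], S[freqs == -1] = 1.0, 0.3
contour = make_stimulus(S)
print(contour.shape)
```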
Figure 7
 
Stimulus generation. The plots show real and imaginary FD coefficients across frequency.
An adaptive psychometric procedure (Quest, Watson, & Pelli, 1983) was used to adjust the gain of the noise after each trial in order to maintain performance near 75% correct. 
Results
We found that the Quest procedure reliably maintained performance near 75% correct (Table 1). Figure 8 shows an example of the estimated human shape template in both FD and spatial domains for one observer and one shape. Note that the noise-averaging and GLM methods generate very similar templates. Qualitatively, the estimated human template seems to roughly track the ideal at low frequencies but is attenuated at high frequencies. Figures 9a and 9d show that this low-pass bias is consistent across observers and shapes.
Table 1
 
Experiment 1. Performance (percentage correct) for each observer and shape.
Figure 8
 
Experiment 1. Example shape classification image estimated with noise-averaging and GLM methods.
Figure 9
 
Experiment 1. Estimated spatial observer templates \({\bf{\widehat w}}\).
To quantitatively assess this apparent low-pass bias, we fit power-law models to the estimated human template coefficients \({\widehat W_k}\) (Figure 10). Observe that the fall-off in spectral density as a function of frequency is much steeper for the estimated human templates than for the corresponding ideal templates.
Figure 10
 
Experiment 1. Amplitude spectrum for ideal and estimated observer templates; α is the maximum likelihood estimate of the power law exponent in \({S_k} \propto |k{|^{ - \alpha }}\), i.e., the negative of the slope of the best-fitting line, shown in blue.
One prediction of the linear template model is that templates estimated using signal 0 trials only should not be systematically different from templates estimated using signal 1 trials only. Further, it has been shown that nonlinearities in human detection and discrimination mechanisms tend to bias templates toward the signal (Beard & Ahumada, 1998; Goris, Zaenen, & Wagemans, 2008; Morgenstern & Elder, 2012; Nandy & Tjan, 2007; Solomon, 2002). Thus, if human shape-discrimination mechanisms involve substantial nonlinearities, templates estimated from signal 1 trials only are predicted to look more like the animal shapes than templates estimated from signal 0 trials only. Conversely, any resemblance between the animal shapes and templates estimated from signal 0 trials only cannot be due to such bias because the animal shapes were not present in the stimulus for any of these trials. 
Spatial observer templates estimated from signal 0 trials only and signal 1 trials only are shown for the GLM method in Figure 9b and c, respectively, and for the noise-averaging method in Figure 9e and f, respectively. We observe that templates derived from signal 0 trials only still track the low frequencies of the animal stimuli, indicating a significant linear component to this shape-discrimination task. At the same time, systematic differences between templates derived from signal 0 trials and those derived from signal 1 trials indicate the presence of nonlinearities. Qualitatively, the signal 1 templates seem to exhibit sharper features than the signal 0 templates. 
To assess whether these nonlinearities can be explained in terms of differences in shape-frequency tuning, we fit power-law models to the templates estimated from signal 0 trials only and from signal 1 trials only; Figure 11 summarizes the results. Despite the apparent difference in the spatial templates, there is no consistent difference in the spectral slopes (α exponents of the best-fitting power laws) for templates estimated from all trials, signal 0 trials only, or signal 1 trials only, F(1, 12) = 0.29, p = 0.6. This suggests that these nonlinearities have more to do with the phase tuning of human shape-discrimination mechanisms.
Figure 11
 
Experiment 1. Power-law exponents for estimated human and ideal templates. Error bars represent standard error of the mean.
Do these nonlinearities bias the signal 1 templates toward the animal shape? To test this, we computed the weighted squared deviation of estimated templates from ideal (i.e., the objective minimized by Equation 6). Figure 12 shows that templates estimated from signal 1 trials are indeed, on average, slightly closer to the ideal templates, and a three-way ANOVA (signal 0 vs. 1 × shape × observer) reveals that this main effect is statistically significant, F(1, 12) = 23.5, p = 0.0004. 
Figure 12
 
Experiment 1. Deviation (root mean weighted squared deviation, Equation 6) of estimated observer and ideal templates. Error bars represent standard error of the mean.
Evaluating the model
We employ two methods to evaluate the estimated linear shape templates as models of human shape discrimination. Specifically, we compare the agreement between a linear model MH based upon the estimated human shape template and a linear model MI based upon the ideal template. To avoid overfitting, MH templates and responses were computed using leave-one-out cross-validation over the 1,500 trials of the experiment. 
First, we use the t score method employed by Morgenstern and Elder (2012), which is related to the measure of choice probability introduced by Britten, Newsome, Shadlen, Celebrini, and Movshon (1996). Although the original choice probability method was nonparametric, in our experiments the Gaussian nature of the stimulus noise means that model responses will also be Gaussian distributed, making a parametric approach appropriate. 
The t score method measures the agreement between the scalar values of the model decision variable and the binary responses of the human observer. The premise is that if the model decision variable is causal on the human responses, its value should be predictive of those responses. To assess this, the trials are first partitioned into a subset in which the stimulus contained signal 0 and a subset in which the stimulus contained signal 1. Then, for each of these subsets, the t score for the difference in the mean model response when the observer responded signal 1 versus when they responded signal 0 is computed:  
\begin{equation}\tag{7}t = {{{\mathbb{E}}\left[ {{R_M}|{R_H} = 1} \right] - {\mathbb{E}}\left[ {{R_M}|{R_H} = 0} \right]} \over {\sqrt {{\rm{Var}}\left[ {{R_M}|{R_H} = 1} \right]/{n_1} + {\rm{Var}}\left[ {{R_M}|{R_H} = 0} \right]/{n_0}} }}.\end{equation}
Here RM is the model response, RH is the human response, and n1 and n0 are the number of trials in which the human observer responded signal 1 and signal 0, respectively. To be consistent with human judgments, the model should generate high values when the observer responds signal 1 and low values when the observer responds signal 0, thus producing a large positive t score.  
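Equation 7 is straightforward to compute. The sketch below applies it to synthetic responses; the data are fabricated solely to exercise the function, and in the paper the score is computed separately within the signal 0 and signal 1 trial subsets.

```python
import numpy as np

def t_score(r_model, r_human):
    """Equation 7: Welch-style t for the difference in mean model response
    between trials the human labeled signal 1 vs. signal 0, for one subset
    of trials (all containing the same signal)."""
    r_model, r_human = np.asarray(r_model, float), np.asarray(r_human)
    r1, r0 = r_model[r_human == 1], r_model[r_human == 0]
    se = np.sqrt(r1.var(ddof=1) / len(r1) + r0.var(ddof=1) / len(r0))
    return (r1.mean() - r0.mean()) / se

rng = np.random.default_rng(3)
r_h = rng.integers(0, 2, 1500)         # synthetic human responses
r_m = r_h + rng.normal(size=1500)      # model responses correlated with human
t = t_score(r_m, r_h)
print(round(t, 1))
```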
Figure 13 compares the t scores for a linear template model MH based on the estimated human template with those for a linear template model MI based on the ideal template; t scores are generally higher for MH, F(29, 1) = 10.3, p = 0.0033, indicating that the model based upon the estimated human template is more consistent with human behavior than the ideal template model. 
Figure 13
 
Experiment 1. t score measure of agreement between linear template models of shape discrimination based upon estimated human templates (MH) and the ideal template (MI).
Our second method of evaluation takes into account the presence of internal noise in our human observers. Our human observers perform in the 73%–79% correct range on our shape-discrimination task. Simulations reveal that the ideal observer performs at 100% correct for the same stimuli, and a linear model based upon the estimated human templates performs in the 76%–81% correct range. This suggests that at least part of the inefficiency in human observer responses derives from internal noise, and to be complete, a model of human performance must take this noise into account. 
To this end, we revise our models MH and MI to include added internal zero-mean Gaussian noise. MH employs the estimated observer template, and MI employs the ideal template. The gain of the internal noise was adjusted so that the proportion correct of the model matched that of the human observer. Specifically, we measured proportion correct for the model over a range of noise gains, fit a sigmoid function, and then from this function estimated the noise gain that would generate the proportion correct attained by the human observer. 
To assess these models, we measured trial-by-trial agreement with human psychophysical responses. Figure 14a shows the trial-by-trial agreement of the two models with the human data (i.e., the proportion of model responses matching human responses). These are compared against the agreement expected by chance for an observer that matches human performance but is otherwise statistically independent, given by \(p_c^2 + {(1 - {p_c})^2}\), where pc is proportion correct. The estimated observer template model MH is consistently more predictive of human judgments than the ideal template model MI, and a three-way ANOVA (model × shape × observer) reveals that this difference (main effect of model) is significant, F(1, 12) = 26.42, p = 0.0002. This result clearly indicates the utility of the shape-classification image method: It produces a model that is significantly more predictive of human performance than could be obtained by simply degrading an ideal observer model with noise.
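The agreement measure and its chance baseline reduce to a few lines; a minimal sketch (function names are ours):

```python
import numpy as np

def agreement(resp_a, resp_b):
    """Proportion of trials on which two binary response vectors match."""
    return float(np.mean(np.asarray(resp_a) == np.asarray(resp_b)))

def chance_agreement(pc):
    """Agreement expected from an observer matched in proportion correct pc
    but otherwise statistically independent: pc^2 + (1 - pc)^2."""
    return pc ** 2 + (1 - pc) ** 2

print(chance_agreement(0.75))   # -> 0.625
```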
Figure 14
 
Experiment 1. (a) Trial-by-trial agreement of observer template model MH and ideal template model MI with human responses. (b) Trial-by-trial internal consistency of observer template model MH and ideal template model MI with different samples of internal noise. The blue horizontal bar and shading indicate mean and standard error of the agreement/consistency expected by chance. In both cases, this is given by \(p_c^2 + {(1 - {p_c})^2}\) and represents a model observer that matches human proportion correct pc but for which all errors are due to internal noise.
The trial-by-trial agreement of model MH with human judgments is in the 70%–76% range. Should this be considered good? If MH were a perfect model of the human observer, we would expect its agreement with the human observers to be comparable to its internal consistency, i.e., the agreement between its responses to identical stimuli (same stimulus noise sample, but different internal noise samples). The results of this analysis are shown in Figure 14b. We observe that the internal consistency of the noisy ideal observer MI is at chance levels, indicating that internal noise is the dominant factor limiting its performance. In contrast, we observe much higher internal consistencies for our noisy observer template model MH, in the range of 72%–90%, indicating that stimulus noise and internal noise jointly determine its performance. Importantly, we note that the internal consistency of MH is considerably higher than its agreement with our human observers. This discrepancy shows that MH is not a perfect model of human shape detection; deviations could include both inaccuracies in the estimated template as well as unmodeled nonlinearities in the human visual detection mechanism. 
To gain further insight into how far MH is from being a perfect model, it would be helpful to also know the internal consistency of the human observer when presented with repeated trials of exactly the same stimulus (same signal, same noise sample). This internal consistency yields an upper bound on the agreement any model with the same internal consistency could hope to achieve with the human data. If model MH achieves agreement near to this upper bound, it should be judged a good model. This motivates our second experiment. 
Experiment 2
Our second experiment was identical to our first with the exception that each stimulus (signal + noise) was repeated twice, in separate trials, separated by a random interval. This double-pass technique allows us to measure human internal consistency and compare this with model–observer agreement. 
Method
Methods are identical to Experiment 1 except where noted below. 
Participants
Three human observers participated in this experiment. O1 and O2 were the authors (JW and IF) who were also observers in Experiment 1. O3 was a new observer, naïve to the purpose of the experiment. 
Stimuli
Stimuli were the same as for Experiment 1
Design and procedure
The procedure was identical to Experiment 1 with the following exceptions. Although for Experiment 1 we used Quest to adapt the gain of the noise from trial to trial, in Experiment 2 the gain of the noise was set independently for the three shapes to match the average noise threshold over observers from Experiment 1. The noise was then fixed at this gain for all trials and for all three observers. 
Observers performed 3,000 trials per shape: two passes of 1,500 trials each. Each trial of the first pass involved an independent sample of the stimulus noise. These same stimuli (with the same noise samples) were presented again in the second pass, but in a different random order. Exactly the same noise samples were used for all observers, allowing consistency both within an observer and agreement between observers to be computed. Although the order of the trials was randomized so that the delay between repeated presentations varied widely, all observers saw the stimuli in the same order. 
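Internal consistency in such a double-pass design can be scored by realigning the second pass to the first-pass stimulus order; a sketch, with variable names of our own choosing:

```python
import numpy as np

def internal_consistency(resp1, resp2, order2):
    """Proportion of repeated stimuli receiving the same response on both
    passes. resp1[i] is the response to stimulus i on the first pass;
    order2[j] is the stimulus index shown on second-pass trial j."""
    resp1, resp2 = np.asarray(resp1), np.asarray(resp2)
    return float(np.mean(resp1[np.asarray(order2)] == resp2))

# Toy check: the second pass repeats the first with identical responses.
rng = np.random.default_rng(4)
resp1 = rng.integers(0, 2, 1500)
order2 = rng.permutation(1500)
print(internal_consistency(resp1, resp1[order2], order2))   # -> 1.0
```

Because all observers saw the same noise samples in the same order, the same realignment supports the between-observer agreement computation as well.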
Results
We found that the fixed noise gains selected for each shape yielded performance close to 75% correct for all observers (Table 2). Figure 15 shows the estimated templates. The results are qualitatively similar to the results of Experiment 1
Table 2
 
Experiment 2. Performance (percentage correct) for each observer and shape.
Figure 15
 
Experiment 2. Spatial observer templates \({\bf{\widehat w}}\) estimated from (a) all trials, (b) signal 0 trials only, and (c) signal 1 trials only.
As for Experiment 1, we employed two methods to evaluate the estimated linear shape templates as models of human shape discrimination. (For details of the methods, please see the earlier section on Evaluating the model.) Figure 16 compares the t scores for a (noiseless) linear template model MH based on the estimated human template with those for a (noiseless) linear template model MI based on the ideal template; t scores are generally higher for MH, F(29, 1) = 25.1, p = 2.5 × 10−5, indicating that the model based upon the estimated human template is more consistent with human behavior than the ideal template model. 
Figure 16
 
Experiment 2. t score measure of agreement between linear template models of shape discrimination based upon estimated human templates (MH) and the ideal template (MI).
As for Experiment 1, we also added internal noise to our observer models MH and MI to match the performance of our human observers. Figure 17a shows the trial-by-trial agreement of the two resulting models with the human data (i.e., the proportion of model responses matching human responses) together with the average agreement between different human observers as a reference. A three-way ANOVA (model × shape × observer) reveals a significant main effect of model, F(2, 20) = 31.78, p = 6.2 × 10−7, and a (protected) least significant difference (LSD) post hoc test indicates that the model based on the estimated observer templates MH is significantly more predictive of human responses than the noisy ideal observer model MI, t(2) = 5.36, p = 3.12 × 10−5. At the same time, the agreement between different human observers exceeds that between MH and human observers, t(2) = 2.43, p = 0.025. We conclude from this analysis that, although the noisy observer template model MH provides a better account of human judgments than the noisy ideal template model MI, there is an important aspect of human shape discrimination, common to our three observers, that is captured by neither model. 
Figure 17
 
Experiment 2. (a) Trial-by-trial agreement of observer template model MH and ideal template model MI with human responses compared with agreement between observers. (b) Trial-by-trial internal consistency of observer template model MH and ideal template model MI with different samples of internal noise, compared with internal (within) consistency of human observers. The blue horizontal bar and shading indicate mean and standard error of the agreement/consistency expected by chance. In both cases, this is given by \(p_c^2 + {(1 - {p_c})^2}\) and represents a model observer that matches human proportion correct pc but for which all errors are due to internal noise.
The double-pass design also allows us to compute the internal consistency of each of our human observers and compare against the internal consistency of our models (Figure 17b). As in Experiment 1, the internal consistency of the noisy ideal model MI hovers around chance levels, indicating that internal noise is the dominant factor limiting its performance, and the internal consistency of the noisy observer template model MH is consistently higher, indicating that both stimulus and internal noise jointly limit its performance. A three-way ANOVA (model × shape × observer) reveals a significant effect of model on internal consistency, F(2, 20) = 33.96, p = 3.7 × 10−7. Post hoc tests, again using Fisher's protected LSD, reveal that model MH has significantly higher internal consistency (lower internal noise) than MI, t(8) = 5.54, p = 2.0 × 10−5. At the same time, our human observers have higher internal consistency than MH, t(8) = 2.50, p = 0.02. 
Figure 18 provides a more detailed comparison of the human data with our noisy models. Here we varied the internal noise gain for the two models over a broad range to sweep out curves that relate performance (proportion correct) to internal consistency; note that the gain of the internal noise decreases from left to right as these curves are traversed. We also plot our nine experimental measurements (3 observers × 3 shapes) as points in this space. The substantial downward displacement of these points relative to the ideal observer highlights the substantial degree of systematic inefficiency present in the human visual system: for the same level of internal noise, proportion correct for our human observers may be up to 10% lower than for the noisy ideal. The fact that the human data points fall near to the estimated classification image observer (MH) curves suggests that MH does a reasonable job in capturing this systematic inefficiency. 
Figure 18
 
Proportion correct versus internal consistency for the human data (symbols), observer template model MH (colored curves), and the ideal template model MI (black curve).
Discussion
Contributions
In this paper, we have introduced a new yes/no classification image methodology to explore the discrimination of 2-D (planar) shape. In contrast to prior work (Kurki et al., 2014), our method can be applied to fully general shapes, including ecologically important stimuli, such as animals. Critical to the success of the technique is the projection of shape stimuli onto an FD basis, which allows the essential features of the shapes to be captured by a small number of coefficients, limiting the dimensionality of the observer template that must be estimated. 
Because natural shapes are steeply low pass (Figure 4), efficient identification of these weights also depends upon the use of low-pass (correlated) stimulus noise. We have provided a methodology for analyzing responses from a yes/no shape-discrimination task using low-pass stimulus noise and shown that an unbiased estimate of the observer template can be obtained by the standard noise-averaging method as long as the mean noise fields are normalized by the noise covariance (Appendix B; see also Murray, 2016). Alternatively, templates can be estimated by maximizing the likelihood of a generalized linear model.
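The covariance-normalized noise-averaging estimator can be sketched as follows. The toy observer, noise spectrum, and trial count are illustrative; for the independent per-frequency FD noise used here the covariance is simply diagonal.

```python
import numpy as np

def classification_image(noise, resp, cov):
    """Noise-averaging template estimate for correlated stimulus noise:
    the difference of mean noise fields, normalized by the noise covariance."""
    diff = noise[resp == 1].mean(axis=0) - noise[resp == 0].mean(axis=0)
    return np.linalg.solve(cov, diff)

# Toy check: a linear observer viewed through steeply low-pass noise.
rng = np.random.default_rng(5)
M, n = 8, 50000
sd = 1.0 / np.arange(1, M + 1)                 # low-pass noise spectrum
noise = rng.normal(size=(n, M)) * sd
w_true = rng.normal(size=M)
resp = (noise @ w_true + rng.normal(size=n) > 0).astype(int)
w_hat = classification_image(noise, resp, np.diag(sd ** 2))
print(round(np.corrcoef(w_hat, w_true)[0, 1], 2))
```

Without the covariance normalization the raw difference of mean noise fields estimates Σw rather than w, which in correlated noise distorts the template's spectral shape.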
There are several aspects of our results attesting to the utility of this new methodology. First, the method yields observer templates bearing a resemblance to the target shapes (Figures 9 and 15), demonstrating that the method is picking up information relevant to the task. Second, templates estimated from signal 0 trials only are similar, although not identical, to templates estimated from signal 1 trials, showing that the resemblance of estimated templates to target animal shapes is not just an artifact of a nonlinearity in the human shape-discrimination system (Ahumada & Beard, 1999; Morgenstern & Elder, 2012; Nandy & Tjan, 2007; Solomon, 2002). Third, using two different evaluation methods, we find that the trial-by-trial agreement between human responses and models based upon estimated observer templates is significantly higher than for models based upon ideal templates. Fourth, the internal consistency of the estimated noisy observer template model MH approaches the internal consistency of human observers when matched to human performance (Figure 17). Finally, a plot of performance (proportion correct) versus internal consistency (Figure 18) reveals that human performance is limited by a substantial degree of systematic inefficiency, roughly matched by the systematic inefficiency of our estimated observer templates (model MH). 
In summary, it seems that the proposed method is sound and can help characterize human shape-discrimination performance. In particular, what it appears to tell us here is that human shape discrimination can be modeled, to some approximation, within the linear template framework and that discrimination is based primarily on lower shape frequencies. Consistent with this selectivity for lower frequencies, Figure 18 shows that the human data and the MH model are closest to ideal for the turtle shape, which is the most low pass of the three shapes tested (Figure 4). 
Open questions
Why would human shape-discrimination mechanisms revealed by our method be biased toward lower shape frequencies? One possibility is that spatial uncertainty generates greater phase uncertainty for high shape frequencies than for low frequencies. For example, suppose that spatial uncertainty can be modeled as a Gaussian process \(\sim {\cal N}\left( {0,{\sigma ^2}} \right)\) in the arc-length coordinate of the shape. This would cause phase uncertainty to scale roughly as \(k\sigma\), where k is the FD frequency. To take this into account, the observer should attenuate these higher shape frequencies in the linear template. This hypothesis could be tested in the future by measuring performance as a function of the variance of added phase noise to identify the equivalent internal phase noise for both high and low frequencies (Pelli & Farell, 1999).
The small but apparent differences between shape templates estimated from signal 0 and signal 1 trials (Figures 9 and 15) show that there are significant nonlinearities in human shape-discrimination mechanisms. What is the nature of these nonlinearities? One possibility is that the human visual system deals with the increase in phase uncertainty with FD frequency by shifting from linear to phase-invariant mechanisms at higher FD frequencies. To test this idea, we modified the input to our linear template model to include not only the complex FD stimulus coefficients Sk, but also their phase-invariant moduli \(\left| {S_k} \right|\). We reasoned that if the human visual system shifts from linear to phase-invariant encoding at higher frequencies, this expanded model should yield higher agreement with the human data, and we should see a shallower fall-off in the \(\left| {S_k} \right|\) template coefficients than in the Sk template coefficients. We found that, in fact, the agreement with the human data for the two models was very similar, and the low-pass fall-off was nearly identical for the linear and phase-invariant coefficients. We conclude from this analysis that nonlinearities in human shape discrimination cannot be accounted for by a simple shift from linear to phase-invariant encoding of the stimulus at higher frequencies.
If not a simple shift to phase-invariant mechanisms at higher frequencies, what could explain the evidence for the nonlinear encoding we see in our data? Prior work in the domain of spatial vision may provide insight. Classification image analysis in the power-spectrum domain (Morgenstern & Elder, 2012) suggests that large-field contrast grating detection is based on an incoherent (phase-invariant) energy pooling but over highly localized linear filters. Similarly, it is possible that in our experiments higher shape frequencies are coded at least partially incoherently by localized shape mechanisms and combined through nonlinear (e.g., energy) pooling. Candidate localized shape encoding mechanisms include shapelets (Dubinskiy & Zhu, 2003) and formlets (Elder et al., 2013). It is also quite possible that higher FD frequency components are not processed independently from other components. For example, coding of higher FD frequencies may be conditioned upon phase alignment with lower FD frequencies. 
Given these possibilities, we should be careful in our interpretation of the low-pass bias in estimated linear templates; this bias does not necessarily mean that higher shape frequencies are not important, only that they are not used in a linear way. Repeating the experiments reported here but with higher shape frequencies removed could serve to better quantify the nonlinear role of higher frequencies in shape discrimination. A modest decline in performance would suggest that these higher frequencies are not critical to human performance (whether processed linearly or nonlinearly), and a substantial drop in performance would suggest that the high frequencies are being used but in a nonlinear way. 
We must also ask to what degree the task we set for our observers is a reasonable approximation of how we use shape “in the wild.” In our experiments, observers were asked to repeatedly discriminate the same noisy animal shape from a noisy ellipse, always presented at exactly the same location, orientation, and size. This is quite unlike the typical way we process shape information in day-to-day tasks in our normal visual environment, in which we must be prepared to discriminate between thousands of different shapes that may appear at arbitrary locations, poses, and scales. 
We note that our method could easily be adapted to vary the location, orientation, and scale of the shape, which would make the task more natural. Generalizing from two shape categories to many is more challenging. Although there have been some attempts to generalize the classification image methodology to more than two-way discrimination (Dai & Micheyl, 2010; Knoblauch & Maloney, 2008; Watson, 1998), as with the two-class method, these multiclass methods estimate difference templates, not the templates themselves. This problem is magnified in the multiclass case because the template for a category contains the negative images from each other class (Murray, 2011). In addition, the number of trials needed to obtain reliable estimates of the templates increases with the number of alternative categories. 
A third issue concerns the nature of the noise. We employed low-pass noise here in order to match the spectral density of the signal (the shape) and to improve the efficiency of template estimation. However, prior work (Abbey & Eckstein, 2007) suggests that human observers can, under some conditions, adapt their templates to be more efficient depending upon the spectral density of the noise, e.g., by up-weighting higher frequencies in low-pass noise. This issue points clearly to new experiments: estimate the observer shape template with different types of noise and assess how estimated templates adapt. 
Acknowledgment
The authors would like to acknowledge funding by the Natural Sciences and Engineering Research Council of Canada (NSERC) CREATE Program in Vision Science & Applications, and the NSERC Discovery Grant Program. IF also acknowledges receiving York University Bridging Funds. 
Commercial relationships: none. 
Corresponding author: John Wilder. 
Address: Department of Computer Science, University of Toronto, Toronto, Canada. 
References
Abbey, C. K., & Eckstein, M. P. (2001). Maximum-likelihood and maximum-a-posteriori estimates of human observer templates. In E. A. Krupinski & D. P. Chakraborty (Eds.), Proceedings of SPIE, 4324 (pp. 114–122). Bellingham, WA: SPIE. https://doi.org/10.1117/12.431179.
Abbey, C. K., & Eckstein, M. P. (2002). Optimal shifted estimates of human-observer templates in two-alternative forced-choice experiments. IEEE Transactions on Medical Imaging, 21 (5), 429–440.
Abbey, C. K., & Eckstein, M. P. (2007). Classification images for simple detection and discrimination tasks in correlated noise. Journal of the Optical Society of America A, 24 (12), B110–B124.
Ahumada, A. J.,Jr. (2002). Classification image weights and internal noise level estimation. Journal of Vision, 2 (1): 8, 121–131, https://doi.org/10.1167/2.1.8. [PubMed] [Article]
Ahumada, A. J.,Jr., & Beard, B. (1999). Classification images for detection [Abstract]. Investigative Ophthalmology and Visual Science, 40 (4), S572.
Attneave, F. (1954). Some informational aspects of visual perception. Psychological Review, 61 (3), 183–193.
Aubert, M., Brumm, A., Ramli, M., Sutikna, T., Saptomo, E. W., Hakim, B.,… Dosseto, A. (2014, October 9). Pleistocene cave art from Sulawesi, Indonesia. Nature, 514 (7521), 223–227.
Beard, B. L., & Ahumada, A. J.,Jr. (1998). Technique to extract relevant image features for visual tasks. In B. E. Rogowitz & T. N. Pappas (Eds.), Proceedings of SPIE, 3299 (pp. 79–85). Bellingham, WA: SPIE. https://doi.org/10.1117/12.320099.
Bishop, C. (2006). Pattern recognition and machine learning. New York: Springer.
Blum, H. (1973). Biological shape and visual science (part 1). Journal of Theoretical Biology, 38, 205–287.
Britten, K. H., Newsome, W. T., Shadlen, M. N., Celebrini, S., & Movshon, J. A. (1996). A relationship between behavioral choice and the visual responses of neurons in macaque MT. Visual Neuroscience, 13, 87–100.
Connor, C., Brincat, S., & Pasupathy, A. (2007). Transformation of shape information in the ventral pathway. Current Opinion in Neurobiology, 17, 140–147.
Dai, H., & Micheyl, C. (2010). Psychophysical reverse correlation with multiple response alternatives. Journal of Experimental Psychology: Human Perception and Performance, 36 (4), 976–993.
Dubinskiy, A., & Zhu, S. (2003). A multiscale generative model for animate shapes and parts. In Proceedings of the 9th IEEE ICCV (pp. 249–256). Los Alamitos, CA: IEEE Society.
Elder, J. H., Oleskiw, T. D., Yakubovich, A., & Peyré, G. (2013). On growth and formlets: Sparse multi-scale coding of planar shape. Image and Vision Computing, 31, 1–13.
Elder, J. H., & Velisavljević, L. (2009). Cue dynamics underlying rapid detection of animals in natural scenes. Journal of Vision, 9 (7): 7, 1–20, https://doi.org/10.1167/9.7.7. [PubMed] [Article]
Fabre-Thorpe, M., Richard, G., & Thorpe, S. J. (1998). Rapid categorization of natural images by rhesus monkeys. Neuroreport, 9 (2), 303–308.
Feldman, J., & Singh, M. (2006). Bayesian estimation of the shape skeleton. Proceedings of the National Academy of Sciences, USA, 103 (47), 18014–18019.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4 (12), 2379–2394.
Goris, R. L. T., Zaenen, P., & Wagemans, J. (2008). Some observations on contrast detection in noise. Journal of Vision, 8 (9): 4, 1–15, https://doi.org/10.1167/8.9.4. [PubMed] [Article]
Granlund, G. H. (1972). Fourier preprocessing for hand print character recognition. IEEE Transactions on Computers, C-21, 195–201.
Green, D., & Swets, J. (1966). Signal detection theory and psychophysics. New York: Wiley.
Grenander, U., Srivastava, A., & Saini, S. (2007). A pattern-theoretic characterization of biological growth. IEEE Transactions on Medical Imaging, 26 (2), 648–659.
Hoffman, D., & Richards, W. (1984). Parts of recognition. Cognition, 18 (1–3), 65–96.
Jain, A., Zhong, Y., & Lakshmanan, S. (1996). Object matching using deformable templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (3), 267–278.
Kay, S. (1998). Fundamentals of statistical signal processing: Detection theory. Englewood Cliffs, NJ: Prentice-Hall.
Kimia, B., Tannenbaum, A., & Zucker, S. (1995). Shapes, shocks and deformations I: The components of two dimensional shape and the reaction diffusion space. International Journal of Computer Vision, 15, 189–224.
Knoblauch, K., & Maloney, L. T. (2008). Estimating classification images with generalized linear and additive models. Journal of Vision, 8 (16): 10, 1–19, https://doi.org/10.1167/8.16.10. [PubMed] [Article]
Kurki, I., Saarinen, J., & Hyvärinen, A. (2014). Investigating shape perception by classification images. Journal of Vision, 14 (12): 24, 1–19, https://doi.org/10.1167/14.12.24. [PubMed] [Article]
Leyton, M. (1989). Inferring causal history from shape. Cognitive Science, 13, 357–387.
Mokhtarian, F., & Mackworth, A. (1986). Scale-based description and recognition of planar curves and two dimensional shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8 (1), 34–43.
Morgenstern, Y., & Elder, J. (2012). Local visual energy mechanisms revealed by detection of global patterns. Journal of Neuroscience, 32 (11), 3679–3696.
Murray, R. F. (2011). Classification images: A review. Journal of Vision, 11 (5): 2, 1–25, https://doi.org/10.1167/11.5.2. [PubMed] [Article]
Murray, R. F. (2016). Classification images in a very general decision model. Vision Research, 123, 26–32.
Murray, R. F., Bennett, P. J., & Sekuler, A. B. (2002). Optimal methods for calculating classification images: Weighted sums. Journal of Vision, 2 (1): 6, 79–104, https://doi.org/10.1167/2.1.6. [PubMed] [Article]
Nandy, A., & Tjan, B. (2007). The nature of letter crowding as revealed by first- and second-order classification images. Journal of Vision, 7 (2): 5, 1–26, https://doi.org/10.1167/7.2.5. [PubMed] [Article]
Pavlidis, T. (1980). Algorithms for shape analysis of contours and waveforms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 301–312.
Pelli, D. G., & Farell, B. (1999). Why use noise? Journal of the Optical Society of America A, 16 (3), 647–653.
Rice, S. O. (1945). Mathematical analysis of random noise. The Bell System Technical Journal, 24 (1), 46–156.
Sharon, E., & Mumford, D. (2006). 2D-shape analysis using conformal mapping. International Journal of Computer Vision, 70 (1), 55–75.
Solomon, J. A. (2002). Noise reveals visual mechanisms of detection and discrimination. Journal of Vision, 2 (1): 7, 105–120, https://doi.org/10.1167/2.1.7. [PubMed] [Article]
Thompson, D. (1917). On growth and form. Cambridge, UK: Cambridge University Press.
Thorpe, S., Fize, D., & Marlot, C. (1996, June 6). Speed of processing in the human visual system. Nature, 381, 520–522.
Watson, A. B. (1998). Multi-category classification: Template models and classification images [Abstract]. Investigative Ophthalmology and Visual Science, 39 (Suppl. 4), S912.
Watson, A. B., & Pelli, D. G. (1983). Quest: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33 (2), 113–120.
Footnotes
1  Although we defined correctness in terms of the signal (0 or 1) used to generate the stimulus and not in terms of the ideal observer response, for the range of noise gains used in our psychophysical experiments, the ideal observer always generated the correct response, and so the two were equivalent.
Appendix A: Phase distribution of a complex normal variable
Let z = x + iy be a complex normal random variable, where the real and imaginary components may have different means but identical variance: \(x\sim {\cal N}\left( {{x_0},{\sigma ^2}} \right),y\sim {\cal N}\left( {{y_0},{\sigma ^2}} \right)\). z can also be represented in polar coordinates (r, θ), where x = r cos θ, y = r sin θ. We wish to identify the probability density of z in this polar coordinate frame. 
By the multivariate change of variables theorem, we have that  
\begin{equation}\tag{8}{p_{R,\Theta }}(r,\theta ) = \left| {{{\partial (x,y)} \over {\partial (r,\theta )}}} \right|{p_{X,Y}}(r\cos \theta ,r\sin \theta ),\!\end{equation}
where \(\left| {{{\partial (x,y)} \over {\partial (r,\theta )}}} \right|\) is the Jacobian of the inverse transformation and evaluates to \(\left| {{{\partial (x,y)} \over {\partial (r,\theta )}}} \right| = r\).  
Thus, we have  
\begin{equation}\tag{9}\begin{split}p_{R,\Theta }(r,\theta ) &= rp_{X,Y}(r\cos \theta ,r\sin \theta )\\ &= {r \over {2\pi \sigma ^2}}\exp \bigg\{ - {1 \over {2\sigma ^2}}\Big( {{{( {r\cos \theta - {r_0}\cos {\theta _0}} )}^2}} + {{( {r\sin \theta - {r_0}\sin {\theta _0}} )}^2} \Big) \bigg \}\\ &= {r \over {2\pi \sigma ^2}}\exp \bigg\{ - {1 \over {2\sigma ^2}}\Big( {{r^2} - 2r{r_0}\cos ( {\theta - {\theta _0}} ) + r_0^2} \Big) \bigg\}\\ &= {r \over {2\pi \sigma ^2}}\exp \bigg\{ - {1 \over {2\sigma ^2}}\Big( {{( {r - {r_0}\cos ( {\theta - {\theta _0}} )} )}^2} + r_0^2{\sin }^2( {\theta - {\theta _0}} ) \Big) \bigg\}\\ &= {r \over {2\pi {\sigma ^2}}}\exp \bigg\{ { - {1 \over {2{\sigma ^2}}}{{( {r - {r_0}\cos ( {\theta - {\theta _0}} )} )}^2}} \bigg\}f(\theta ),\end{split}\end{equation}
where \(f(\theta ) \buildrel \Delta \over = \exp \left\{ { - {{r_0^2} \over {2{\sigma ^2}}}{{\sin }^2}\left( {\theta - {\theta _0}} \right)} \right\}.\)  
Defining \(r^{\prime} = r - {r_0}\cos \left( {\theta - {\theta _0}} \right),r^{\prime} \in \left[ { - {r_0}\cos \left( {\theta - {\theta _0}} \right),\infty } \right)\), this can be written as  
\begin{equation}\tag{10}{p_{R^{\prime} ,\Theta }}\left( {r^{\prime} ,\theta } \right) = {1 \over {2\pi {\sigma ^2}}}\left( {r^{\prime} + {r_0}\cos \left( {\theta - {\theta _0}} \right)} \right)\times \exp \left( { - {{{{r^{\prime} }^2}} \over {2{\sigma ^2}}}} \right)f(\theta ).\end{equation}
 
Marginalizing over r′ on the domain \(r^{\prime} \in \left[ { - {r_0}\cos \left( {\theta - {\theta _0}} \right),\infty } \right)\) then yields  
\begin{equation}\tag{11}p_\Theta(\theta) = \bigg({1\over {2\pi}}\exp\bigg(-{r^2_0\cos^2(\theta - \theta_0) \over2\sigma^2}\bigg) + {r_0\cos(\theta-\theta_0) \over \sqrt{2\pi}\sigma}G\bigg({r_0\cos(\theta-\theta_0) \over \sigma}\bigg)\bigg)f(\theta),\!\end{equation}
where G(x) is the cumulative distribution function of a standard normal variable.  
Defining \(u(\theta ) = \left( {{r_0}/\sigma } \right)\cos \left( {\theta - {\theta _0}} \right)\) and \(v(\theta ) = \left( {{r_0}/\sigma } \right)\sin \left( {\theta - {\theta _0}} \right)\), this can be expressed more compactly as  
\begin{equation}\tag{12}p_\Theta (\theta ) = \bigg( {{1 \over {2\pi }}\exp ( { - {u^2}/2} ) + {u \over {\sqrt {2\pi } }}G(u)} \bigg)\times \exp \left( { - {v^2}/2} \right).\end{equation}
 
Thus, the marginal distribution of the phase θ is symmetric about θ0. We have verified this equation by sampling z and comparing against the resulting empirical density of the phase. 
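This check is easy to reproduce. The sketch below (with arbitrary parameter values) confirms numerically that the phase density derived above integrates to 1 over a full cycle and matches the empirical phase distribution of sampled z.

```python
import math
import random

def phase_density(theta, r0, sigma, theta0):
    """Marginal density of the phase of z = x + iy, where
    x ~ N(r0*cos(theta0), sigma^2) and y ~ N(r0*sin(theta0), sigma^2)."""
    u = (r0 / sigma) * math.cos(theta - theta0)
    v = (r0 / sigma) * math.sin(theta - theta0)
    G = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))  # standard normal CDF G(u)
    return (math.exp(-u * u / 2) / (2 * math.pi)
            + u * G / math.sqrt(2 * math.pi)) * math.exp(-v * v / 2)

r0, sigma, theta0 = 2.0, 0.7, 0.4  # arbitrary test parameters

# 1) The density should integrate to 1 over (-pi, pi] (midpoint rule)
n = 20000
dt = 2 * math.pi / n
total = sum(phase_density(-math.pi + (k + 0.5) * dt, r0, sigma, theta0)
            for k in range(n)) * dt

# 2) Monte Carlo: probability mass in a window around theta0 should match sampled phases
lo, hi = theta0 - 0.5, theta0 + 0.5
mass = sum(phase_density(lo + (k + 0.5) * (hi - lo) / n, r0, sigma, theta0)
           for k in range(n)) * (hi - lo) / n
random.seed(0)
trials = 100000
hits = 0
for _ in range(trials):
    x = random.gauss(r0 * math.cos(theta0), sigma)
    y = random.gauss(r0 * math.sin(theta0), sigma)
    if lo < math.atan2(y, x) < hi:
        hits += 1
```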
Appendix B: Yes/no classification images with correlated stimulus noise
In the standard classification image method with additive white Gaussian noise, an unbiased estimate \({\bf{\widehat w}}\) of the observer's inner template w can be computed as  
\begin{equation}\tag{13}{\bf{\widehat w}} = ({{\bf{\overline n}}_{01}} + {{\bf{\overline n}}_{11}}) - ({{\bf{\overline n}}_{00}} + {{\bf{\overline n}}_{10}}),\!\end{equation}
where \({{\bf{\overline n}}_{ij}}\) is the mean of the added noise over all trials in which the stimulus contained signal i and the observer indicated signal j (Ahumada, 2002; Murray et al., 2002).  
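As a concrete illustration of this estimator, the following sketch simulates a yes/no experiment with a linear observer in white Gaussian noise and recovers its template; the template, signals, and criterion distribution are arbitrary stand-ins, not values from the experiments reported here.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_trials = 16, 40000

w = rng.standard_normal(m)
w /= np.linalg.norm(w)                      # hypothetical observer template, ||w|| = 1
s0, s1 = np.zeros(m), 0.5 * w               # the two signals
beta = 0.25 + 0.3 * rng.standard_normal(n_trials)   # random criterion (internal variability)

sig = rng.integers(0, 2, n_trials)          # signal shown on each trial
noise = rng.standard_normal((n_trials, m))  # additive white Gaussian stimulus noise
stim = np.where(sig[:, None] == 1, s1, s0) + noise
resp = (stim @ w > beta).astype(int)        # linear yes/no observer

# Estimator above: add the mean noise on "yes" trials, subtract it on "no" trials
nbar = {(i, j): noise[(sig == i) & (resp == j)].mean(axis=0)
        for i in (0, 1) for j in (0, 1)}
w_hat = (nbar[0, 1] + nbar[1, 1]) - (nbar[0, 0] + nbar[1, 0])

cos_sim = w_hat @ w / np.linalg.norm(w_hat)  # cosine similarity to the true template
```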
What if the noise is Gaussian but not white? Abbey and Eckstein (2002) considered the discrimination of two distinct real signals embedded in additive nonwhite (correlated) Gaussian noise, within a 2AFC experimental paradigm. In particular, they showed that, for a linear observer with additive Gaussian internal noise, generalizing from white to nonwhite noise involves normalization of the estimated template by the covariance of the noise. Here we show that this result generalizes to a yes/no task. We first prove the claim for real signals and then show that it easily generalizes to complex-valued stimuli. 
Let \({\bf{\tilde s}} = {\bf{s}} + {\bf{n}}\) be an m-dimensional real-valued random vector representing a visual stimulus, where s is a binary signal variable taking one of two values s0 or s1 and \({\bf{n}}\sim {\cal N}\left( {0,\Sigma } \right)\) is added zero-mean multivariate normal stimulus noise with covariance matrix Σ. Following Ahumada (2002), we model the internal source of variability by assuming that the criterion β is also a random variable. We let \(R \in \{ 0,1\} \) represent the two possible responses of the observer. 
Let \({{\bf{\overline n}}_{ij}} = {\mathbb{E}}\left[ {{\bf{n}}|{\bf{s}} = {{\bf{s}}_i},R = j} \right]\). Then  
\begin{equation}\tag{14}{{\bf{\overline n}}_{i0}} = {{\mathbb{E} }_{{\bf{n}}\beta }}\left[ {{\bf{n}}|{{\bf{w}}^ \top }\left( {{{\bf{s}}_i} + {\bf{n}}} \right) < \beta } \right],\!\end{equation}
and  
\begin{equation}\tag{15}{{\bf{\overline n}}_{i1}} = {{\mathbb{E} }_{{\bf{n}}\beta }}\left[ {{\bf{n}}|{{\bf{w}}^ \top }\left( {{{\bf{s}}_i} + {\bf{n}}} \right) > \beta } \right],\!\end{equation}
where w is an m-vector representing the observer template. Without loss of generality, we assume that \(||{\bf{w}}||\; = 1\).  
Let U be an orthonormal rotation matrix with first column w, so that \({{\bf{U}}^ \top }{\bf{w}} = {{\bf{e}}_1} \buildrel \Delta \over = {\left[ {1,0, \ldots ,0} \right]^ \top }\), and let \({\bf{n^{\prime} }} \buildrel \Delta \over = {{\bf{U}}^ \top }{\bf{n}}\) and \({\bf{s}}_i^\prime \buildrel \Delta \over = {{\bf{U}}^ \top }{{\bf{s}}_i}\). Note that \({\bf{n^{\prime} }}\sim {\cal N}\left( {0,\Sigma ^{\prime} } \right)\) is also zero-mean multivariate normal with covariance matrix \(\Sigma ^{\prime} = {{\bf{U}}^ \top }\Sigma {\bf{U}}\). Note also that \({{\bf{w}}^ \top }\left( {{{\bf{s}}_i} + {\bf{n}}} \right) = {\left( {{{\bf{U}}^ \top }{\bf{w}}} \right)^ \top }{{\bf{U}}^ \top }\left( {{{\bf{s}}_i} + {\bf{n}}} \right) = {\bf{e}}_1^ \top \left( {{\bf{s}}_i^\prime + {\bf{n^{\prime} }}} \right) = {s^{\prime} _{i1}} + {n^{\prime} _1}\), where \({s^{\prime} _{i1}}\) and \({n^{\prime} _1}\) are the first elements of \({\bf{s}}_i^\prime\) and n′, respectively. 
In this new coordinate frame, \({{\bf{\overline n}}_{i1}}\) can be expressed as  
\begin{equation}\tag{16}{{\bf{\overline n}}_{i1}} = {{\mathbb{E} }_{{\bf{n}}\beta }}\left[ {{\bf{n}}|{{\bf{w}}^ \top }\left( {{{\bf{s}}_i} + {\bf{n}}} \right) > \beta } \right] = {\bf{U}}{{\mathbb{E} }_{{\bf{n^{\prime} }}\beta }}\left[ {{\bf{n^{\prime} }}|s_{i1}^\prime + {n_1^\prime} > \beta } \right].\end{equation}
 
Let us now consider the conditional expectation of each element \({n^{\prime}_k}\) of the noise \({\bf n^{\prime}}\) in this new coordinate frame. Consider first the conditional expectation of the first element \({n^{\prime}_1}\):  
\begin{equation}\tag{17}\begin{split}{\mathbb{E} }_{{\bf n}^{\prime}}\left[ {n_1^\prime|{n_1^\prime} > \beta - s_{i1}^\prime} \right] &= \int_{\beta - {s_{i1}^\prime}}^\infty {{n_1^\prime}} p\left( {{n_1^\prime}} \right)d{n^{\prime} _1}\\ &= {1 \over {\sqrt {2\pi } {{\sigma_1 ^{\prime} }}}}\int_{\beta - {s_{i1}^\prime}}^\infty {{n_1^\prime}} \exp \left( { - {{n^{\prime2} _1} \over {2\sigma ^{\prime2} _1}}} \right)d{n^{\prime}_1}\\ &= {{ - {{\sigma_1^{\prime} }}} \over {\sqrt {2\pi } }}\left. {\exp \left( { - {{n^{\prime2} _1} \over {2\sigma ^{\prime2} _1}}} \right)} \right|_{\beta - {{s_{i1}^\prime}}}^\infty\\ &= {{{{\sigma_1^{\prime} }}} \over {\sqrt {2\pi } }}\exp \left( { - {{{{\left( {\beta - {s_{i1}^\prime}} \right)}^2}} \over {2\sigma ^{\prime2} _1}}} \right),\end{split}\end{equation}
where \({\sigma^{\prime}_1}\) is the standard deviation of \({n^{\prime}_1}\).  
Now consider the conditional expectations of the remaining elements \({{n^{\prime}_k}, k \ne 1}\):  
\begin{equation}\tag{18}{\mathbb{E} }_{\bf{n^{\prime}}}\left[ {n^{\prime} _k}|{s_{i1}^\prime} + {n^{\prime} _1} > \beta \right] = {\mathbb{E} }_{\bf{n^{\prime} }}\left[ {n^{\prime} _k}|{n^{\prime} _1} > \beta - {s_{i1}^\prime} \right] = \int_{\beta - {s_{i1}^\prime}}^\infty p \left( {n^{\prime} _1} \right)\int_{ - \infty }^\infty {n^{\prime} _k} p\left( {n^{\prime} _k}|{n^{\prime} _1} \right)d{n^{\prime} _k}d{n^{\prime} _1}.\end{equation}
 
Because \({n^{\prime} _1}\) and \({n^{\prime} _k}\) are jointly normal, the conditional random variable \({n^{\prime} _k}|{n^{\prime} _1}\sim {\cal N}\left( {{\mu ^{\prime}_{k|1}},\sigma ^{\prime2} _{k|1}} \right)\) is univariate normal with mean and variance given by  
\begin{equation}\tag{19}{\mu ^{\prime} _{k|1}} = {\left( {{\sigma ^{\prime}_{1k}}/{\sigma ^{\prime}_1}} \right)^2}{n^{\prime} _1},\!\end{equation}
 
\begin{equation}\tag{20}\sigma ^{\prime2} _{k|1} = \sigma ^{\prime2} _k - \sigma ^{\prime4} _{1k}/\sigma ^{\prime2} _1,\!\end{equation}
where \(\sigma ^{\prime2} _k\) is the variance of \({n^{\prime} _k}\) and \(\sigma ^{\prime2} _{1k}\) is the covariance of \({n^{\prime} _1}\) and \({n^{\prime} _k}\). (See, for example, Bishop, 2006, p. 87, equations 2.81–2.82.)  
Thus, we have  
\begin{equation}\tag{21}{{\mathbb{E}}_{{\bf{n^{\prime} }}}}\left[ {{n^{\prime}_k}|{s_{i1}^\prime} + {n^{\prime}_1} > \beta } \right] = \int_{\beta - {s_{i1}^\prime}}^\infty {{\mu ^{\prime}_{k|1}}} p\left( {{n^{\prime}_1}} \right)d{n^{\prime} _1} = {\left( {{\sigma ^{\prime}_{1k}}/{\sigma ^{\prime} _1}} \right)^2}\int_{\beta - {s^{\prime} _{i1}}}^\infty {{n^{\prime }_1}} p\left( {{n^{\prime }_1}} \right)d{n^{\prime} _1} = {{\sigma ^{\prime2} _{1k}} \over {\sqrt {2\pi } {\sigma ^{\prime} _1}}}\exp \left( { - {{{{\left( {\beta - {s^{\prime} _{i1}}} \right)}^2}} \over {2\sigma ^{\prime2} _1}}} \right).\end{equation}
 
As a result, we can write  
\begin{equation}\tag{22}{{\bf{\overline n}}_{i1}} = {\bf{U}}{{\mathbb{E}}_{{\bf{n^{\prime} }}\beta }}\left[ {{\bf{n^{\prime} }}|{s_{i1}^\prime} + {n^{\prime} _1} > \beta } \right] = {1 \over {\sqrt {2\pi } {\sigma ^{\prime} _1}}}{{\mathbb{E}}_\beta }\left[ {\exp \left( { - {{{{\left( {\beta - {s_{i1}^\prime}} \right)}^2}} \over {2\sigma ^{\prime2} _1}}} \right)} \right]\times {\bf{U}}{\left[ {\sigma ^{\prime2} _1,\sigma ^{\prime2} _{12}, \ldots ,\sigma ^{\prime2} _{1m}} \right]^ \top } = {c_{i1}}{\bf{U}}{{\mathbb{E}}_{{\bf{n^{\prime} }}}}\left[ {{\bf{n^{\prime} }}{n^{\prime} _1}} \right],\!\end{equation}
where \({c_{i1}} = {1 \over {\sqrt {2\pi } {\sigma ^\prime _1}}}{{\mathbb{E}}_\beta }\left[ {\exp \left( { - {{{{\left( {\beta - {s^{\prime} _{i1}}} \right)}^2}} \over {2\sigma ^{\prime2} _1}}} \right)} \right]\) is a positive proportionality constant.  
This result can be transformed back to the original pixel coordinates by applying the inverse rotation U and making the substitution \({n^{\prime} _1} = {{\bf{w}}^ \top }{\bf{n}} = {{\bf{n}}^ \top }{\bf{w}}\) before taking the expectation:  
\begin{equation}\tag{23}{{\bf{\overline n}}_{i1}} = {c_{i1}}{{\mathbb{E}}_{{\bf{n^{\prime} }}}}\left[ {{\bf{Un^{\prime} }}{n^{\prime} _1}} \right] = {c_{i1}}{{\mathbb{E}}_{\bf{n}}}\left[ {{\bf{n}}{{\bf{n}}^ \top }} \right]{\bf{w}} = {c_{i1}}\Sigma {\bf{w}}.\end{equation}
 
Thus, we have that an unbiased estimate of the observer template w can be obtained by premultiplying \({{\bf{\overline n}}_{i1}}\) by the inverse covariance of the stimulus noise:  
\begin{equation}\tag{24}{\bf{w}} = {\left( {{c_{i1}}\Sigma } \right)^{ - 1}}{{\bf{\overline n}}_{i1}},\quad {\rm{with}}\quad {c_{i1}} = {1 \over {\sqrt {2\pi } {\sigma ^{\prime} _1}}}{{\mathbb{E}}_\beta }\left[ {\exp \left( { - {{{{\left( {\beta - {s_{i1}^\prime}} \right)}^2}} \over {2\sigma ^{\prime2} _1}}} \right)} \right].\end{equation}
 
It is straightforward to show that for \({{\bf{\overline n}}_{i0}}\) an analogous equation holds but with a negative proportionality constant. Specifically,  
\begin{equation}\tag{25}{\bf{w}} = {\left( {{c_{i0}}\Sigma } \right)^{ - 1}}{{\bf{\overline n}}_{i0}},\quad {\rm{with}}\quad {c_{i0}} = - {1 \over {\sqrt {2\pi } {\sigma ^{\prime} _1}}}{{\mathbb{E}}_\beta }\left[ {\exp \left( { - {{{{\left( {\beta - {s^{\prime} _{i1}}} \right)}^2}} \over {2\sigma ^{\prime2} _1}}} \right)} \right].\end{equation}
 
We now generalize this result to complex-valued signals. In our complex-valued linear template model, discrimination is based on a real-valued scalar decision variable r given by \(r = \rm{Re}( {{{\bf{w}}^{\it{H}}}{\bf{\tilde s}}}\ )\), where w is the complex-valued observer template and \({\bf{\tilde s}}\) is the complex-valued noisy stimulus. This can be re-expressed as a sum of two real-valued inner products:  
\begin{equation}\tag{26}r = {\bf{w}}_x^ \top {{\bf{\tilde s}}_x} + {\bf{w}}_y^ \top {{\bf{\tilde s}}_y},\!\end{equation}
where wx and wy are the real and imaginary components of w and \({{\bf{\tilde s}}_x}\) and \({{\bf{\tilde s}}_y}\) are the real and imaginary components of \({\bf{\tilde s}}\), respectively. This can be reduced to a single real-valued inner product if we stack the real and imaginary components of the template and stimulus:  
\begin{equation}\tag{27}{{\bf{w}}_{xy}} = {\left[ {{\bf{w}}_x^ \top ,{\bf{w}}_y^ \top } \right]^ \top },{{\bf{\tilde s}}_{xy}} = {\left[ {{\bf{\tilde s}}_x^ \top ,{\bf{\tilde s}}_y^ \top } \right]^ \top } \to r = {\bf{w}}_{xy}^ \top {\bf{\tilde s}}_{xy}.\end{equation}
From the proof above, we know that an unbiased estimate \({{\bf{\widehat w^{\prime} }}_{xy}}\) of the real-valued template wxy can be obtained by normalizing the biased estimate (Equation 13) by the covariance of \({{\bf{\tilde s}}_{xy}}\):  
\begin{equation}\tag{28}{\bf\widehat w}^\prime_{xy} = \Sigma _{xy}^{ - 1}{{\bf{\widehat w}}_{xy}}.\end{equation}
Because there is a one-to-one correspondence between the real-valued coefficients of the template wxy and the real and imaginary coefficients of the complex-valued template w, Equation 28 also yields an unbiased estimate \({\bf{\widehat w^{\prime} }}\) of the latter. In our particular case, because the same real-valued independent and identically distributed noise process \(\sim {\cal N}\left( {0,\Sigma } \right)\) is used to generate both real and imaginary components of the stimulus, Σxy is block-diagonal and can be written as  
\begin{equation}\tag{29}{\Sigma _{xy}} = \left[ {\matrix{ \Sigma&{{{\bf{0}}_m}} \cr {{{\bf{0}}_m}}&\Sigma \cr } } \right],\!\end{equation}
and so  
\begin{equation}\tag{30}{\bf{\widehat w^{\prime} }} = {\Sigma ^{ - 1}}{\bf{\widehat w}}.\end{equation}
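The correlated-noise result can be verified numerically. The sketch below (all parameter choices arbitrary, not taken from the experiments) runs a simulated yes/no linear observer in low-pass Gaussian noise, computes the raw classification image of Equation 13, and shows that normalizing by the inverse noise covariance, as in Equation 30, brings the estimate much closer to the true template.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n_trials = 12, 60000

# Correlated (low-pass) stimulus noise, loosely analogous to the power-law FD noise
Sigma = np.diag(np.arange(1, m + 1) ** -1.5)
L = np.linalg.cholesky(Sigma)

w = rng.standard_normal(m)
w /= np.linalg.norm(w)                       # true observer template
s0, s1 = np.zeros(m), 0.4 * w                # the two signals
beta = 0.2 + 0.2 * rng.standard_normal(n_trials)   # random criterion

sig = rng.integers(0, 2, n_trials)
noise = rng.standard_normal((n_trials, m)) @ L.T   # noise ~ N(0, Sigma)
stim = np.where(sig[:, None] == 1, s1, s0) + noise
resp = (stim @ w > beta).astype(int)               # linear yes/no observer

# Raw estimate (Equation 13) is biased toward Sigma @ w ...
nbar = {(i, j): noise[(sig == i) & (resp == j)].mean(axis=0)
        for i in (0, 1) for j in (0, 1)}
w_raw = (nbar[0, 1] + nbar[1, 1]) - (nbar[0, 0] + nbar[1, 0])
# ... and normalizing by the noise covariance (Equation 30) removes the bias
w_unbiased = np.linalg.solve(Sigma, w_raw)

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
cos_raw, cos_unbiased = cos(w_raw, w), cos(w_unbiased, w)
```

Note that the covariance normalization amplifies estimation noise at the weakly represented (high) frequencies, which is why matching the noise spectrum to the signal spectrum matters for efficiency.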
 
Figure 1
 
The standard classification image experiment.
Figure 1
 
The standard classification image experiment.
Figure 2
 
The linear template model of visual detection.
Figure 2
 
The linear template model of visual detection.
Figure 3
 
The Fourier descriptor (FD) representation of a planar shape.
Figure 3
 
The Fourier descriptor (FD) representation of a planar shape.
Figure 4
 
Amplitude spectrum of the three animal shapes used in this study.
Figure 4
 
Amplitude spectrum of the three animal shapes used in this study.
Figure 5
 
Results of template estimation simulations using the rabbit shape as signal. The simulated observer used an ideal template with added internal Gaussian noise. Plots show mean and standard error over 30 repetitions. (a) Total weighted squared error (Equation 6) for the noise averaging and probit GLM template estimation methods as a function of the number of trials. (b) Weighted squared error at each frequency (Equation 6) for a 1,500-trial experiment as a function of the dimensionality M of the stimulus.
Figure 5
 
Results of template estimation simulations using the rabbit shape as signal. The simulated observer used an ideal template with added internal Gaussian noise. Plots show mean and standard error over 30 repetitions. (a) Total weighted squared error (Equation 6) for the noise averaging and probit GLM template estimation methods as a function of the number of trials. (b) Weighted squared error at each frequency (Equation 6) for a 1,500-trial experiment as a function of the dimensionality M of the stimulus.
Figure 6
 
Sensitivity (d′) of FD coefficients, their amplitudes, and phases for the shape discrimination task (rabbit shape).
Figure 6
 
Sensitivity (d′) of FD coefficients, their amplitudes, and phases for the shape discrimination task (rabbit shape).
Figure 7
 
Stimulus generation. The plots show real and imaginary FD coefficients across frequency.
Figure 7
 
Stimulus generation. The plots show real and imaginary FD coefficients across frequency.
Figure 8
 
Experiment 1. Example shape classification image estimated with noise-averaging and GLM methods.
Figure 8
 
Experiment 1. Example shape classification image estimated with noise-averaging and GLM methods.
Figure 9
 
Experiment 1. Estimated spatial observer templates \({\bf{\widehat w}}\).
Figure 9
 
Experiment 1. Estimated spatial observer templates \({\bf{\widehat w}}\).
Figure 10
 
Experiment 1. Amplitude spectrum for ideal and estimated observer templates; α is the maximum likelihood estimate of the power law exponent in \({S_k} \propto |k{|^{ - \alpha }}\), i.e., the negative of the slope of the best-fitting line, shown in blue.
Figure 10
 
Experiment 1. Amplitude spectrum for ideal and estimated observer templates; α is the maximum likelihood estimate of the power law exponent in \({S_k} \propto |k{|^{ - \alpha }}\), i.e., the negative of the slope of the best-fitting line, shown in blue.
Figure 11
 
Experiment 1. Power-law exponents for estimated human and ideal templates. Error bars represent standard error of the mean.
Figure 11
 
Experiment 1. Power-law exponents for estimated human and ideal templates. Error bars represent standard error of the mean.
Figure 12
 
Experiment 1. Deviation (root mean weighted squared deviation, Equation 6) of estimated observer and ideal templates. Error bars represent standard error of the mean.
Figure 12
 
Experiment 1. Deviation (root mean weighted squared deviation, Equation 6) of estimated observer and ideal templates. Error bars represent standard error of the mean.
Figure 13
 
Experiment 1. t score measure of agreement between linear template models of shape discrimination based upon estimated human templates (MH) and the ideal template (MI).
Figure 13
 
Experiment 1. t score measure of agreement between linear template models of shape discrimination based upon estimated human templates (MH) and the ideal template (MI).
Figure 14
 
Experiment 1. (a) Trial-by-trial agreement of observer template model MH and ideal template model MI with human responses. (b) Trial-by-trial internal consistency of observer template model MH and ideal template model MI with different samples of internal noise. The blue horizontal bar and shading indicate mean and standard error of the agreement/consistency expected by chance. In both cases, this is given by \(p_c^2 + {(1 - {p_c})^2}\) and represents a model observer that matches human proportion correct pc but for which all errors are due to internal noise.
Figure 14
 
Experiment 1. (a) Trial-by-trial agreement of observer template model MH and ideal template model MI with human responses. (b) Trial-by-trial internal consistency of observer template model MH and ideal template model MI with different samples of internal noise. The blue horizontal bar and shading indicate mean and standard error of the agreement/consistency expected by chance. In both cases, this is given by \(p_c^2 + {(1 - {p_c})^2}\) and represents a model observer that matches human proportion correct pc but for which all errors are due to internal noise.
Figure 15
 
Experiment 2. Spatial observer templates \({\bf{\widehat w}}\) estimated from (a) all trials, (b) signal 0 trials only, and (c) signal 1 trials only.
Figure 15
 
Experiment 2. Spatial observer templates \({\bf{\widehat w}}\) estimated from (a) all trials, (b) signal 0 trials only, and (c) signal 1 trials only.
Figure 16
 
Experiment 2. t score measure of agreement between linear template models of shape discrimination based upon estimated human templates (MH) and the ideal template (MI).
Figure 17
 
Experiment 2. (a) Trial-by-trial agreement of observer template model \(M_H\) and ideal template model \(M_I\) with human responses, compared with agreement between observers. (b) Trial-by-trial internal consistency of observer template model \(M_H\) and ideal template model \(M_I\) with different samples of internal noise, compared with internal (within-observer) consistency of human observers. The blue horizontal bar and shading indicate the mean and standard error of the agreement/consistency expected by chance. In both cases, this is given by \(p_c^2 + (1 - p_c)^2\) and represents a model observer that matches human proportion correct \(p_c\) but for which all errors are due to internal noise.
Figure 18
 
Proportion correct versus internal consistency for the human data (symbols), the observer template model \(M_H\) (colored curves), and the ideal template model \(M_I\) (black curve).
Table 1
 
Experiment 1. Performance (percentage correct) for each observer and shape.
Table 2
 
Experiment 2. Performance (percentage correct) for each observer and shape.