**Recently in vision science there has been great interest in understanding the perceptual representations of complex multidimensional stimuli. Therefore, it is becoming very important to develop methods for performing psychophysical experiments with multidimensional stimuli and efficiently estimating psychometric models that have multiple free parameters. In this methodological study, I analyze three efficient implementations of the popular Ψ method for adaptive data collection, two of which are novel approaches to psychophysical experiments. Although the standard implementation of the Ψ procedure is intractable in higher dimensions, I demonstrate that my implementations generalize well to complex psychometric models defined in multidimensional stimulus spaces and can be implemented very efficiently on standard laboratory computers. I show that my implementations may be of particular use for experiments studying how subjects combine multiple cues to estimate sensory quantities. I discuss strategies for speeding up experiments and suggest directions for future research in this rapidly growing area at the intersection of cognitive science, neuroscience, and machine learning.**

I use the term *multidimensional* to refer to the case where there are multiple stimulus dimensions, which gives rise to models having a relatively large number of free parameters. The major problem posed by the need to identify multidimensional sensory-processing models from experimental data is collecting sufficient data to attain accurate estimates of model parameters in the time available to work with a subject. One approach to speeding up the collection of sensory data is to design stimuli adaptively, so that at each trial one presents stimuli optimized for the goal of accurate parameter estimation (Chaloner & Verdinelli, 1995; Lewi, Butera, & Paninski, 2009; DiMattina & Zhang, 2011). This general approach of adaptively choosing stimuli during an experiment to maximize some utility function goes by a variety of names, including *adaptive design optimization* (Cavagnaro, Myung, Pitt, & Kujala, 2010) and *optimal experimental design* (Atkinson, Donev, & Tobias, 2007; DiMattina & Zhang, 2011), and many different algorithms of this kind have been proposed and applied in experiments to accelerate the process of estimating psychometric function parameters (Hall, 1981; Watson & Pelli, 1983; Kontsevich & Tyler, 1999; Kujala & Lukka, 2006; Lesmes, Lu, Baek, & Albright, 2010; Prins, 2013b). However, with few exceptions (Kujala & Lukka, 2006; Lesmes et al., 2010), nearly all of these procedures have been applied to estimating psychometric functions defined in 1-D stimulus spaces.

Psychophysical data are typically modeled as arising from a Bernoulli process whose success probability is determined by the value of some *psychometric function* *F*(**x**, **θ**) mapping stimulus parameter values **x** to the range [0, 1]. The goal of characterizing this function's parameters **θ** is to accurately model the process generating the data. For an *n*-alternative forced-choice experiment, the probability Ψ of a correct response given stimulus parameters **x** is given by

Ψ(**x**) = *π*_{l}*π*_{c} + (1 − *π*_{l})*F*(**x**, **θ**),  (1)

where *π*_{l} is the lapse rate and *π*_{c} is the chance of obtaining a correct response by guessing (Kuss, Jakel, & Wichmann, 2005). One popular choice of the psychometric function is given by

*F*(*x*, **θ**) = *σ*(*θ*_{0} + *θ*_{1}*x*),  (2)

where

*σ*(*u*) = 1/(1 + *e*^{−u})

is the logistic or sigmoidal function commonly used in machine learning and neural modeling (Murphy, 2012). Although there are many choices of sigmoid-shaped curves commonly used in modeling psychophysical data, all of them yield very similar estimates of important psychophysical parameters like sensory thresholds (Kingdom & Prins, 2010). A different parameterization of the psychometric function in Equation 2 that makes the psychophysical parameters of interest more explicit may be written as

*F*(*x*; *λ*, *β*) = *σ*(*β*(*x* − *λ*)),

where *λ* is the *threshold* and *β* the *sensitivity*. This formulation is equivalent to that in Equation 2, with *θ*_{1} = *β* and *θ*_{0} = −*βλ*. Figure 2a illustrates a sigmoidal psychometric function.
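To make the notation concrete, here is a minimal Python sketch of the response model (the study's published code is MATLAB/Octave; the function names here are my own), assuming the common form Ψ(**x**) = *π*_{l}*π*_{c} + (1 − *π*_{l})*F*(**x**, **θ**) with the logistic choice of *F*:

```python
import math

def sigmoid(u):
    """Logistic function sigma(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def psi_correct(x, theta0, theta1, lapse=0.02, guess=0.5):
    """Probability of a correct response, assuming the form
    Psi = pi_l * pi_c + (1 - pi_l) * F, with the logistic
    psychometric function F(x) = sigma(theta0 + theta1 * x).
    `lapse` is the lapse rate pi_l, `guess` the chance rate pi_c."""
    F = sigmoid(theta0 + theta1 * x)
    return lapse * guess + (1.0 - lapse) * F

# Threshold/sensitivity form: theta1 = beta, theta0 = -beta*lambda,
# so that F crosses 0.5 exactly at x = lambda.
beta, lam = 1.0, 0.0
p_at_threshold = psi_correct(lam, -beta * lam, beta, lapse=0.0)
```

With the threshold/sensitivity parameterization, setting *θ*_{1} = *β* and *θ*_{0} = −*βλ* places the halfway point of *F* exactly at the threshold *λ*.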

Many psychophysical experiments are concerned with how multiple stimulus dimensions **x** = (*x*_{1}, …, *x*_{N})^{T} interact to affect perception. For instance, recent investigations have considered how observers combine multiple cues like texture gradients and stereoscopic information for estimating the slant of a surface (Knill & Saunders, 2003; Hillis et al., 2004), while other investigations have considered how haptic and visual information are combined in performing reaching movements (Ernst & Banks, 2002). Other studies have considered how various cues are integrated when subjects perceive complex natural stimuli like occlusion boundaries or surfaces defined by multiple cues like texture, color, and luminance (Landy & Kojima, 2001; McGraw et al., 2003; Ing et al., 2010; DiMattina et al., 2012; Saarela & Landy, 2012). In the simplest case, cues are combined in a linear manner, and one can predict the response to simultaneous variations of multiple cues from the responses to each cue in isolation (Ernst & Banks, 2002; Knill & Saunders, 2003). However, several previous studies have demonstrated that sensory cues do not always combine linearly (Frome, Buck, & Boynton, 1981; Saunders & Knill, 2001; Zhou & Mel, 2008), making it necessary to covary several feature dimensions simultaneously to fully characterize the nature of their interactions.

A model of two stimulus variables *x*_{1} and *x*_{2} which captures the contribution of each individual variable as well as their possible multiplicative interaction may be written as

*F*(**x**, **θ**) = *σ*(*θ*_{0} + *θ*_{1}*x*_{1} + *θ*_{2}*x*_{2} + *θ*_{12}*x*_{1}*x*_{2}).  (6)

Generalizing Equation 6 to three stimulus variables is relatively straightforward, and in this case we have

*F*(**x**, **θ**) = *σ*(*θ*_{0} + *θ*_{1}*x*_{1} + *θ*_{2}*x*_{2} + *θ*_{3}*x*_{3} + *θ*_{12}*x*_{1}*x*_{2} + *θ*_{13}*x*_{1}*x*_{3} + *θ*_{23}*x*_{2}*x*_{3} + *θ*_{123}*x*_{1}*x*_{2}*x*_{3}).  (7)

For an *N*-dimensional stimulus space, the parameter-space dimensionality *M* of the full interaction model will be

*M* = 2^{N}.  (8)

Note that models of this kind remain linear in their parameters: Defining *y*_{1} = *x*_{1}, *y*_{2} = *x*_{2}, and *y*_{3} = *x*_{1}*x*_{2} rewrites Equation 6 as a linear model in the transformed stimulus variables. The general form of this reparameterized model is

*F*(**y**, **θ**) = *σ*(*θ*_{0} + Σ_{i=1}^{M−1} *θ*_{i}*y*_{i}),  (9)

where **y** = (*y*_{1}, …, *y*_{M−1})^{T} are the reparameterized stimuli, with *M* being the parameter-space dimensionality as given in Equation 8.
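Because the interaction model is linear in its parameters, the reparameterization amounts to a one-line feature map. A small Python sketch (my own notation, not the study's code):

```python
import numpy as np

def interaction_features(x):
    """Feature map for the 2-D interaction model: the stimulus
    x = (x1, x2) becomes y = (x1, x2, x1*x2), so the model is
    linear in its M - 1 = 3 non-intercept parameters."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x2])

def F(x, theta):
    """2-D interaction model: sigma(theta0 + theta1*x1 + theta2*x2
    + theta12*x1*x2), written via the feature map above."""
    u = theta[0] + np.dot(theta[1:], interaction_features(x))
    return 1.0 / (1.0 + np.exp(-u))

# A nonzero interaction term theta12 means the response to combined
# cues cannot be predicted from each cue presented alone.
theta_true = np.array([-3.0, 1.0, 1.0, 1.0])  # (theta0, theta1, theta2, theta12)
```

The same trick extends to three stimulus variables by adding the pairwise and triple products to the feature map.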

This *curse of dimensionality* (Bellman, 1961; Bishop, 2006) is quite familiar from studies using classification images (Mineault, Barthelme, & Pack, 2009; Murray, 2011), as well as from sensory neurophysiology (Wu, David, & Gallant, 2006). Even in estimating the univariate psychometric function in Equation 2, a large amount of data must be collected in order to obtain confidence intervals tight enough to reveal the effects of experimental manipulations on important psychometric function parameters like thresholds and slopes (Maloney, 1990; Wichmann & Hill, 2001). For this reason, numerous procedures have been developed to speed up the collection of psychophysical data using adaptive stimulus generation, the most popular of which is the Ψ method (Kontsevich & Tyler, 1999). These methods greatly decrease the number of stimuli needed to obtain reliable estimates of psychometric function parameters, at the cost of increased trial duration due to the need to iteratively compute the next stimulus.

The standard implementation of the Ψ method represents the posterior density over model parameters on a fixed discrete grid *S*_{Θ} of *N*_{θ} parameter supports, writing *p*_{n}(**θ**_{k}) to denote the original continuous density evaluated on the support **θ**_{k}. As new stimulus-response observations (**x**_{n+1}, *r*_{n+1}) are obtained, this density is updated using Bayes's rule:

*p*_{n+1}(**θ**_{k}) = *p*(*r*_{n+1} | **x**_{n+1}, **θ**_{k}) *p*_{n}(**θ**_{k}) / Σ_{j} *p*(*r*_{n+1} | **x**_{n+1}, **θ**_{j}) *p*_{n}(**θ**_{j}).

On each trial, the method chooses the next stimulus **x**_{n+1} to present by finding the stimulus in a discrete set *X* for which the subsequent posterior entropy *H*_{n+1}, averaged over possible subject responses *r* = 0 or 1, is minimized. In other words, entropy is estimated by integrating out uncertainty about the true model parameters over the current posterior density. Mathematically, we write

**x**_{n+1} = arg min_{**x** ∈ X} Σ_{r} *H*_{n+1}(**x**, *r*) *p*(*r* | **x**),  (13)

where

*H*_{n+1}(**x**, *r*) = −Σ_{k} *p*_{n+1}(**θ**_{k} | **x**, *r*) ln *p*_{n+1}(**θ**_{k} | **x**, *r*),

with the calculation of *p*(*r* | **x**) = Σ_{k} *p*(*r* | **x**, **θ**_{k}) *p*_{n}(**θ**_{k}) accomplished by integration over the current posterior. Efficient implementation of this procedure relies on a precomputed lookup table of the value of *p*(*r* | **x**_{i}, **θ**_{j}) for *r* = 0 and 1 over all **θ**_{j} ∈ Θ and **x**_{i} ∈ *X*. A full description of the Ψ method and run-time analysis is given elsewhere (Kontsevich & Tyler, 1999; Kujala & Lukka, 2006).
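The selection and update steps described above can be sketched compactly. The following Python code is an illustrative reimplementation (not the study's MATLAB code), assuming the lookup table of *p*(*r* = 1 | **x**_{i}, **θ**_{j}) has already been precomputed:

```python
import numpy as np

def psi_update(p_post, like1, i, r):
    """Bayes-rule update (normalized) of the discrete posterior after
    observing response r in {0, 1} to stimulus index i.
    like1[i, j] = p(r = 1 | x_i, theta_j), precomputed before the experiment."""
    lik = like1[i] if r == 1 else 1.0 - like1[i]
    w = lik * p_post
    return w / w.sum()

def psi_select(p_post, like1):
    """Choose the stimulus index minimizing expected posterior entropy,
    averaging over both possible responses weighted by p(r | x)."""
    eps = 1e-12
    n_stim = like1.shape[0]
    expected_H = np.zeros(n_stim)
    for i in range(n_stim):
        for lik in (like1[i], 1.0 - like1[i]):       # r = 1, then r = 0
            pr = float(np.dot(lik, p_post))          # predictive p(r | x_i)
            post = lik * p_post / max(pr, eps)       # p_{n+1}(theta | x_i, r)
            H = -np.sum(post * np.log(post + eps))   # posterior entropy
            expected_H[i] += pr * H
    return int(np.argmin(expected_H))
```

In a real implementation the inner loop would be vectorized over stimuli, as in the study's MATLAB code; the loop form here is only for clarity.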

I first consider the Grid-Ψ method for *D*_{x} = 1 and *D*_{x} = 2 stimulus dimensions. I simulated psychophysical experiments in MATLAB on two different systems (3.30 GHz Intel Xeon-64 and 3.0 GHz Intel i7-32) and quantified the improvements in parameter estimation due to the Grid-Ψ method as well as the per-trial running time. To quantify the estimation error on a trial-by-trial basis, we obtain a point estimate **θ̂**_{n} from the posterior after each trial and measure its distance

*E*_{n} = ‖**θ̂**_{n} − **θ**_{T}‖  (14)

from the true parameters **θ**_{T}. For a hypothetical observer having true parameters **θ**_{T} = (0, 1)^{T}, the Grid-Ψ method (a special case of *optimal experimental design*, or OED) greatly increases the accuracy of our parameter estimates when compared to independent and identically distributed (IID) sampling from a discrete grid of evenly spaced points (i.e., the method of constant stimuli) and yields a posterior density with lower entropy (Figure 3a, right panel). Stimulus placement is shown in Figure 3b, with the blue curve indicating the empirical stimulus probabilities from all trials and all 100 Monte Carlo experiments. We see from this plot that the Grid-Ψ method concentrates its sampling near the endpoints of the linear region of the psychometric function (black dashed line, scaled to [0, 0.3]). These findings are consistent with the original study of the Ψ method (Kontsevich & Tyler, 1999), where stimulus placement was concentrated in a similar region of the psychometric function. Further analysis reveals that at the endpoints of the linear region, there is a large change in the probability of a correct response when the psychometric function parameters are varied, for both the sigmoidal form (Equation 2) and the form used by Kontsevich and Tyler (1999; see also Supplementary Figure S1).

In this example I used *L*_{x} = *L*_{θ} = *L* = 51 levels per stimulus and parameter dimension, with stimuli being chosen from a uniform grid on the interval [−7, 7]. To generate the grid of parameter values, I uniformly spaced the threshold *λ* and sensitivity *β* and set *θ*_{1} = *β*, *θ*_{0} = −*βλ*. When efficiently implemented using fully vectorized MATLAB/Octave code (available at http://itech.fgcu.edu/faculty/cdimattina/), the 1-D Grid-Ψ method is quite fast, taking less than 10 ms/trial (6.97 ± 0.18 ms, *N*_{trials} = 5,000) on a high-end workstation (Intel Xeon-64) and about 50 ms/trial (49.13 ± 1.31 ms, *N*_{trials} = 5,000) on a midrange system (Intel i7-32). Similar but more modest improvements were seen over the more restricted range of stimuli [−5, 5], which more closely corresponds to the dynamic range of the 1-D sigmoid for this simulated observer in Figure 3a (Supplementary Figure S2). However, in general the appropriate dynamic range for a given observer may be unknown prior to the experiment, making it wise to err on the side of caution and include at least a few stimuli along each tested dimension which will always be detected (or missed) by every subject.

In the 2-D case, the memory demands of the standard grid implementation made it impossible to use a grid as dense (*L* = 51) as the one I used in the 1-D case. Using a much less dense grid (*L* = 21) permitted implementation of this method, but it took nearly 4 s/trial (Intel Xeon-64 workstation) to generate the next stimulus (3.96 ± 0.013 s, *N*_{trials} = 100), making it far too slow for use in actual psychophysical experiments. In the 2-D implementation, I used a factorial grid of stimulus values *x*_{1}, *x*_{2} ∈ [0, 5] and defined a factorial grid of parameter values by uniformly spacing log *θ*_{1}, log *θ*_{2}, log *θ*_{12} ∈ [−1, 1] and setting *θ*_{0} = −*λβ*, where *λ* ∈ [0, 4] and log *β* ∈ [−1, 1]. As in the 1-D case, we obtain a substantial reduction in error (Figure 3c, left panel) and entropy (Figure 3c, right panel) with the Grid-Ψ procedure for a hypothetical observer having true parameters **θ**_{T} = (−3, 1, 1, 1)^{T}. In this example, the true observer had a nonzero interaction term *θ*_{12}, which led to stimulus placement along the diagonal *x*_{1} = *x*_{2} of the stimulus space (Figure 3d) as well as along each of the individual stimulus axes. As in the 1-D case, stimuli were placed in regions of the stimulus space where there is a large change in the probability of correct response with respect to each of the psychometric function parameters (Supplementary Figure S3). This simple example nicely illustrates the necessity of simultaneously covarying stimulus parameters when there is the potential for nonlinear interactions. By contrast, for simulations on models similar to Equation 6 but without a nonlinear interaction term *θ*_{12}—i.e., *F*(**x**, **θ**) = *σ*(*θ*_{0} + *θ*_{1}*x*_{1} + *θ*_{2}*x*_{2})—we find that stimulus placement is concentrated along the individual cardinal axes (Supplementary Figure S4). This validates the standard procedure of characterizing individual parameter dimensions separately when their interactions are linear (Hillis et al., 2004). I now consider three alternative implementations of the Ψ procedure which are tractable in higher dimensions.

The Prior-Ψ method is based on a *particle filter* (Carpenter, Clifford, & Fearnhead, 1999), which represents a continuous posterior density *p*_{n}(**θ**) by a discrete density using the expression

*p̂*_{n}(**θ**) = Σ_{i} *w*_{i}^{(n)} δ(**θ** − **θ**_{i}),

where the supports *S*_{Θ} are sampled from the prior *p*_{0}(**θ**) and fixed through the experiment. Because the *N*_{θ} particles are sampled directly from *p*_{0}(**θ**), the usual importance-sampling correction cancels, leading to the simplified importance-weight update rule:

*w*_{i}^{(n+1)} ∝ *w*_{i}^{(n)} *p*(*r*_{n+1} | **x**_{n+1}, **θ**_{i}).

A practical requirement of this method is a number *N*_{θ} of supports sampled from some prior density *p*_{0}(**θ**). This naturally gives rise to the question of how to specify an appropriate prior distribution. One recent study (Kim, Pitt, Lu, Steyvers, & Myung, 2014) suggests a principled way to specify priors, namely via hierarchical Bayesian modeling, where the results of previous experiments are used to estimate a hyperprior which can in turn define a prior for subsequent experiments. In the present study, I am interested in the effects of using different sampling strategies (adaptive and nonadaptive) while controlling for the prior shape, and I do not systematically consider the problem of prior specification or the possible effects of prior misspecification.
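A minimal Python sketch of the Prior-Ψ bookkeeping, under the assumption of a 1-D logistic observer model and a Gaussian prior (the names and the observer model here are illustrative, not the study's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_particles(n, mu, sigma):
    """Draw n particles theta_i from a Gaussian prior p0; the particles
    stay fixed for the whole experiment, only the weights change."""
    return rng.normal(mu, sigma, size=(n, len(mu)))

def update_weights(w, particles, x, r):
    """Simplified importance-weight update for prior-sampled particles:
    w_i <- w_i * p(r | x, theta_i), then renormalize.  Here the observer
    model is a 1-D logistic: p(r = 1 | x, theta) = sigma(theta0 + theta1*x)."""
    eta = particles[:, 0] + particles[:, 1] * x
    p1 = 1.0 / (1.0 + np.exp(-eta))
    lik = p1 if r == 1 else 1.0 - p1
    w = w * lik
    return w / w.sum()

particles = sample_particles(2000, mu=[0.0, 1.0], sigma=[2.0, 0.5])
weights = np.full(len(particles), 1.0 / len(particles))
```

Stimulus selection then proceeds exactly as in the grid implementation, with sums over particles (weighted by *w*_{i}) replacing sums over grid supports.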

In the 1-D examples (Figure 4a, b) I used *N*_{θ} = 1,000 particles, and in the 2-D examples (Figure 4c through f) I used *N*_{θ} = 5,000 or 10,000. For comparison, in the Grid-Ψ method for 2-D with *L* = 21 levels per parameter dimension, we have *N*_{θ} = *L*^{4} = 194,481 particles. In one 2-D example (Figure 4c, d), I used a Gaussian prior with *σ* in each dimension equal to one half the range spanned by the corresponding grid used in the Grid-Ψ example. This guaranteed that the set of particles sampled approximately the same range of parameter values as before. Similar results were obtained using the uniform prior implicitly assumed in the Grid-Ψ method, from which the same numbers of particles were sampled (Figure 4e, f).

Figure 5 plots the median stimulus selection time (left panel) and the error *E*_{n} (right panel) as a function of the number of particles *N*_{θ} used to represent the posterior for Prior-Ψ and Grid-Ψ (black diamond). Supplementary Figure S5 shows the median final error *E*_{ML}. We see from Figure 5 (left panel) that one can certainly increase the speed of the implementation by reducing the number of particles used to represent the posterior, but this may reduce the accuracy of the final parameter estimates (Figure 5, right panel; Supplementary Figure S5). On the 32-bit system, implementing Prior-Ψ can potentially be slow in cases where large numbers of particles are needed (*N*_{θ} = 5,000: 737 ± 31 ms; *N*_{θ} = 10,000: 1.60 ± 0.39 s; *N*_{trials} = 10,000). Although stimulus selection times of nearly 1 s may be acceptable for many experiments, over thousands of trials this overhead can add tens of minutes to the experiment duration. Therefore, it is of great interest to develop faster implementations of the Ψ method, especially for generalizations to even more stimulus dimensions, where more particles are needed to accurately represent the posterior density.

A well-known issue with particle filters is weight degeneracy, where after many trials only a few particles **θ**_{i} have probability mass appreciably different from zero. A standard diagnostic is the number *N*_{eff} of effective particles in a particle filter, given by the expression

*N*_{eff} = 1 / Σ_{i} (*w*_{i}^{(n)})^{2}.

*N*_{eff} quickly decreases during the course of the experiments presented in Figure 4. Although my examples did not require resampling for accurate estimation, for some problems it may be useful. The trade-off to consider is whether the improvement in estimation accuracy justifies the additional time needed for the MCMC resampling step, which takes longer as the experiment progresses, since the likelihood function has more terms. This decision is clearly dependent on the particular problem.
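The effective-particle diagnostic is a one-liner; a sketch:

```python
import numpy as np

def effective_particles(w):
    """Effective number of particles N_eff = 1 / sum_i w_i^2
    for normalized importance weights w."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

# Uniform weights give N_eff = N; a filter collapsed onto a single
# particle gives N_eff = 1, signaling that resampling may be needed.
```

A common rule of thumb (an assumption here, not the study's criterion) is to resample whenever *N*_{eff} falls below some fraction of *N*_{θ}, such as *N*_{θ}/2.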

The goal of experimental design is to choose a set of stimuli *d* = {**x**_{1}, **x**_{2}, …, **x**_{m}}, known as a *design*, which will optimize some utility function *U*(*d*) specifying some desired experimental goal, for instance prediction of new observations (O'Hagan & Kingman, 1978), model comparison (Cavagnaro et al., 2010; Cavagnaro, Pitt, & Myung, 2011), or accurate estimation of model parameters (Lewi et al., 2009). For parameter-estimation problems, a valid design must include at least as many points as there are parameter dimensions: For instance, for the standard 1-D psychometric function shown in Figure 1a (which has two parameters), a valid design *d* = {**x**_{1}, **x**_{2}} for estimation must contain two unique stimuli. In a Bayesian setting, we may formally state the design problem as finding

*d*^{*} = arg max_{d} *U*(*d*), where *U*(*d*) = ∫∫ *u*(*d*, **y**, **θ**) *p*(**y** | *d*, **θ**) *p*(**θ**) *d***y** *d***θ**,  (19)

where *u*(*d*, **y**, **θ**) measures the utility of design *d* when observations **y** are generated with true parameters **θ**. Integrating over the observations **y** conditioned on *d*, **θ**, we obtain the expression

*U*(*d*) = ∫ *U*(*d* | **θ**) *p*(**θ**) *d***θ**,  (20)

where the quantity *U*(*d* | **θ**) = ∫ *u*(*d*, **y**, **θ**) *p*(**y** | *d*, **θ**) *d***y** is the *conditional expected utility* of the design *d*. In general, exact evaluation of the integrals in Equations 19 and 20 is not analytically tractable and typically requires numerical Monte Carlo methods (Chaloner & Verdinelli, 1995). OED can also be implemented sequentially using a greedy algorithm where only a single stimulus is chosen on each trial to maximize the expected utility, and in fact many of the information-theoretic stimulus design methods in the neuroscience and psychology literature are simply special cases of sequential optimal design with expected posterior entropy as the utility function (Kontsevich & Tyler, 1999; Lewi et al., 2009; DiMattina & Zhang, 2011; Cavagnaro et al., 2010).

The Lookup-Ψ method assumes a discrete set of designs *D* = {*d*_{1}, *d*_{2}, …, *d*_{m}} we may present. Assuming that we have precomputed the conditional expected utility *U*(*d*_{j} | **θ**_{i}) of each design *d*_{j} for all **θ**_{i}, we can then approximate the expected utility of design *d*_{j} for trial *n* + 1 given our current discrete posterior as *U*_{n+1}(*d*_{j}) = Σ_{i} *U*(*d*_{j} | **θ**_{i}) *p*_{n}(**θ**_{i}). The expected utility of every design in *D* may be computed using a simple matrix multiplication

**q**_{n+1} = **U**^{T} **p**_{n},  (21)

where **p**_{n} is the vector of current posterior probabilities and [**q**_{n+1}]_{j} = *U*_{n+1}(*d*_{j}). At each iteration, we present the design which maximizes the expected utility, i.e., the design corresponding to the largest component of **q**_{n+1}. In order to ensure that there are designs in our set *D* which are optimal for the possible states of nature in *S*_{Θ}, prior to the experiment we precompute for each of the **θ**_{i} ∈ *S*_{Θ} the optimal design, yielding an *N*_{θ} × *N*_{θ} matrix **U**. However, since a design which is optimal for a given **θ**_{i} ∈ *S*_{Θ} is also nearly optimal for nearby **θ**_{k} ∈ Θ, we can reduce the dimensionality of the matrix **U**, and thereby speed up the multiplication in Equation 21, by only including designs in *D* for a subset of *N*_{d} points **θ**_{i} ∈ Θ.

Note that instead of choosing a single stimulus on every trial, the Lookup-Ψ method presents a set of *K* = dim(**θ**) stimuli (i.e., a design *d* ∈ *D*) every *K* trials. Also, the term "lookup" should not be confused with the precomputation step in the standard Ψ technique, which simply precomputes *p*(*r* | **x**_{i}, **θ**_{j}) for all stimuli **x**_{i} and supports **θ**_{j}. In this context, "lookup" refers to the fact that for each design in our set, we precompute the expected utility of that design for each of the supports in our discrete approximation of the posterior and store the value in a matrix **U** which remains fixed throughout the experiment. Since the matrix **U** is defined on the fixed set of supports specified prior to the experiment, this method may not be as readily amenable to MCMC resampling as Prior-Ψ, due to the need to also update the design set *D* and the utility matrix **U** along with the particles.
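The per-trial work of Lookup-Ψ then reduces to Equation 21 plus an argmax; a Python sketch (illustrative names, not the study's code):

```python
import numpy as np

def expected_utilities(U, p_post):
    """Equation 21: q = U^T p, where U[i, j] = U(d_j | theta_i) is the
    precomputed conditional expected utility of design d_j under
    support theta_i, and p_post is the current discrete posterior.
    Returns q with q[j] = expected utility of design d_j."""
    return U.T @ p_post

def best_design(U, p_post):
    """Present the design corresponding to the largest component of q."""
    return int(np.argmax(expected_utilities(U, p_post)))
```

Because the matrix **U** is fixed, the only per-trial cost is this single matrix-vector product, which is what makes the method so much faster than recomputing expected entropies over all candidate stimuli.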

For the 1-D psychometric function (Equation 2), the optimal design *d*^{*} for estimating the true parameters **θ**_{T} = (0, 1)—equivalent to *λ* = 0, *β* = 1—is given by the stimuli *x*_{1}, *x*_{2} = ±1.54. These stimuli are located at the regions of the psychometric curve where the change in the observer's correct-response probability with respect to each of the parameters is large (Supplementary Figure S1). These two stimuli are where all of my implementations of the 1-D Ψ method concentrate much of their sampling (Figures 3b and 4b), eventually alternating between them as the experiment progresses. This makes sense in light of the fact that asymptotically the posterior density becomes well described by a Gaussian (Kay, 1993), at which point minimizing the entropy is equivalent to maximizing the determinant of the Fisher information (Atkinson et al., 2007). Indeed, in the original implementation of the Ψ method, as the experiment progressed the stimulus presentations also alternated between two stimuli located near regions of the psychometric function where the response probability changed greatly with respect to the function parameters (Kontsevich & Tyler, 1999).

Although the design *d* = {*x*_{1}, *x*_{2}} = ±1.54 is optimal for estimating parameters *λ* = 0 and *β* = 1, we see from Figure 6b that nearby designs also have high utility. This near optimality of nearby designs makes it possible to compute optimal designs for only a subset of the *N*_{θ} particles used to represent the posterior density, thereby reducing the space and time requirements of the lookup-table method. Figure 6c illustrates, for a grid of different parameter values (left panel), the optimal designs (right panel) for those values. We see that nearby parameter values map to nearby optimal designs, consistent with the fact that the mapping between parameter space and design space is smooth and continuous. A derivation of the D-optimal design for Equation 2 is shown in Figure 6a, and an analytical approximation to this design is given in the Appendix. Similarly, the numerically optimized D-optimal design for the 2-D psychometric function (Equation 6) is shown in Figure 7. Note that it concentrates design points in regions of the psychometric function where the change in response probability with respect to each of the parameters is high (Supplementary Figure S3). However, it is important to remember that the D-optimal design for a set of parameters is not simply the concatenation of D-optimal designs for each parameter individually. Therefore, the D-optimal design in Figure 7 is not identical to the locations of the maxima in Supplementary Figure S3.
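The ±1.54 design can be checked directly. For the logistic model with *λ* = 0 and *β* = 1, a symmetric two-point design {−*a*, +*a*} has Fisher information determinant 4*a*²*w*(*a*)², with *w* = *σ*(1 − *σ*); a direct search over *a* recovers the optimum. (A sketch: the determinant formula is my own derivation from the standard logistic-regression Fisher information, consistent with the D-optimality criterion discussed above.)

```python
import numpy as np

def fisher_det(a):
    """det of the 2x2 Fisher information for the logistic model
    F(x) = sigma(theta0 + theta1*x) at (theta0, theta1) = (0, 1),
    for the symmetric two-point design {-a, +a}.  Each design point
    contributes w * [[1, x], [x, x^2]] with w = sigma*(1 - sigma),
    and the two-point total has det = 4 * a^2 * w(a)^2."""
    s = 1.0 / (1.0 + np.exp(-a))
    w = s * (1.0 - s)
    return 4.0 * (a ** 2) * (w ** 2)

a_grid = np.linspace(0.01, 5.0, 50000)
a_star = a_grid[np.argmax(fisher_det(a_grid))]
# a_star comes out near 1.54, matching the design discussed above.
```

The stationarity condition works out to *a*(2*σ*(*a*) − 1) = 1, whose root is *a* ≈ 1.543.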

The Lookup-Ψ method is very fast, taking only a few milliseconds (*N*_{trials} = 2,500) on the 64-bit Xeon and about 3 ms (2.52 ± 0.37 ms, *N*_{trials} = 2,500) on the 32-bit i7 to generate the next (four-stimulus) design for the 2-D psychometric function with *N*_{θ} = 5,000 particles and *N*_{d} = 1,250 designs. By contrast, the Prior-Ψ procedure with the same number of particles used to represent the posterior (*N*_{θ} = 5,000) took nearly 1 s/stimulus (737 ± 31 ms, *N*_{trials} = 10,000) on the 32-bit system for estimating this same 2-D psychometric function. A direct comparison of methods with *N*_{θ} = 5,000 particles sampled from the same Gaussian prior demonstrates that the Lookup-Ψ method offers a tremendous speedup over the Prior-Ψ procedure (300× on the i7) without sacrificing final accuracy as measured by Equation 14 (Prior-Ψ error = 0.360 ± 0.239; Lookup-Ψ error = 0.369 ± 0.231; *N*_{mc} = 100).

The Laplace-Ψ method approximates the posterior *p*_{n}(**θ**) as a Gaussian centered on the posterior mode **μ**_{n}. As new observations (**x**_{n+1}, *r*_{n+1}) are obtained, the mode and covariance estimates are updated until the mode converges to a final estimate with a sufficiently small covariance. In the case where the system response model *p*(*r* | **x**, **θ**) is Gaussian, one may analytically compute the new mode **μ**_{n+1} and new covariance **Σ**_{n+1} recursively using the Kalman-filter update formulas (Kay, 1993). However, when the response model is not Gaussian or cannot reasonably be approximated as one (for instance, the psychophysical model in Equation 1), the new posterior mode **μ**_{n+1} cannot be computed analytically and must be found using numerical optimization. Fortunately, since a single observation generally does not drastically change the location of the posterior mode, this optimization is quite fast. Given the new mode **μ**_{n+1}, computation of the covariance **Σ**_{n+1} is straightforward using the formulas presented in this section.

The Gaussian approximation is obtained from a second-order Taylor expansion around the mode **μ**_{n} of the log-posterior:

ln *p*_{n}(**θ** | *D*_{1:n}) ≈ ln *p*_{n}(**μ**_{n} | *D*_{1:n}) − ½ (**θ** − **μ**_{n})^{T} **Σ**_{n}^{−1} (**θ** − **μ**_{n}),  (24)

where

**Σ**_{n}^{−1} = −∇∇ ln *p*_{n}(**θ** | *D*_{1:n}) |_{**θ** = **μ**_{n}}  (25)

and *D*_{1:n} = {(**x**_{1}, *r*_{1}), (**x**_{2}, *r*_{2}), …, (**x**_{n}, *r*_{n})} is all of the stimulus-response data collected during the first *n* trials. Exponentiating both sides of Equation 24 yields a Gaussian approximation *p*_{n} centered at **μ**_{n} with covariance **Σ**_{n}, with the entropy of this Gaussian density given by

*H*_{n} = ½ ln det **Σ**_{n} + (*M*/2)(1 + ln 2π).  (26)

To choose the next stimulus **x**_{n+1}, the Ψ method minimizes the expected entropy of the subsequent Gaussian posterior *p*_{n+1}(**θ** | *D*_{1:n}, *r*_{n+1}, **x**_{n+1}), where *r*_{n+1} ∈ {0, 1} is the (unknown) subject response for trial *n* + 1.

Computing the predictive distribution *p*(*r* | **x**) = ∫ *p*(*r* | **x**, **θ**) *p*_{n}(**θ**) *d***θ** is straightforward and can be accomplished by Monte Carlo integration. From Equation 26 we see that computing the entropies *H*_{n+1}(**x**, *r*) simply amounts to calculating the two possible determinants of **Σ**_{n+1}, which we find using

**Σ**_{n+1}^{−1} = **Σ**_{n}^{−1} − ∇∇ ln *p*(*r* | **x**, **θ**) |_{**θ** = **μ**_{n}}  (27)

for the cases where *r* = 0 and *r* = 1. Note that we evaluate Equation 27 using the current posterior mode **μ**_{n}, which is a reasonable approximation assuming that successive posterior modes are nearby. Applying this method to the generic multivariate regression model (Equation 9), we can readily compute the required second derivatives in closed form.
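A compact Python sketch of this Laplace machinery for a logistic model with a Gaussian prior (Newton mode-finding standing in for the study's use of fminunc.m; names are my own):

```python
import numpy as np

def grad_hess(theta, X, R, mu0, S0inv):
    """Gradient and Hessian of the negative log-posterior for a
    logistic observer p(r = 1 | x, theta) = sigma(theta^T x) with a
    Gaussian prior N(mu0, S0).  X is (n, d) (first column of ones
    for the intercept); R is the (n,) vector of 0/1 responses."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    g = X.T @ (p - R) + S0inv @ (theta - mu0)
    H = (X * (p * (1.0 - p))[:, None]).T @ X + S0inv
    return g, H

def laplace_update(theta, X, R, mu0, S0inv, n_newton=20):
    """Newton iterations to the posterior mode mu_n; the inverse
    Hessian at the mode is the Laplace covariance Sigma_n, and the
    Gaussian entropy depends only on log det Sigma_n (Equation 26)."""
    for _ in range(n_newton):
        g, H = grad_hess(theta, X, R, mu0, S0inv)
        theta = theta - np.linalg.solve(H, g)
    _, H = grad_hess(theta, X, R, mu0, S0inv)
    Sigma = np.linalg.inv(H)
    d = len(theta)
    entropy = 0.5 * np.log(np.linalg.det(Sigma)) + 0.5 * d * (1.0 + np.log(2.0 * np.pi))
    return theta, Sigma, entropy
```

Because the log-posterior here is concave, Newton's method started from the previous mode converges in a handful of iterations, which is what keeps the per-trial cost so low.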

For these simulations I used the same true parameters **θ**_{T} and discrete stimulus search grid as in the other examples (*L* = 21 evenly spaced stimuli on [0, 5] in the 2-D case), and defined my initial prior with **μ**_{0} = (−1, 0.5, 0.5, 0.5)^{T} and **Σ**_{0} = 2 · **I**. I found that the Laplace-Ψ procedure was extremely fast, taking only about 20 ms (18.83 ± 1.88 ms, *N*_{trials} = 10,000) to choose the next stimulus on the 32-bit Intel i7 for the 2-D psychometric function. In my implementations, due to the relatively low dimensionality of the parameter space (4–8 dimensions), I made use of the MATLAB/Octave command fminunc.m (supplied with the gradient and starting from **μ**_{n}) to update the posterior mode, and I found this to be fast (15.69 ± 2.91 ms, *N* = 100) even for the highest dimensional examples I analyzed.

Defining a criterion error *E*_{iid} (Equation 14) as the error obtained by IID stimulus presentation after *N*_{iid} trials, we define the trial speedup factor *S* = *N*_{iid}/*N*_{oed}, where *N*_{oed} is the number of trials needed on average by an adaptive method to reach the criterion error *E*_{iid}. This factor tells us how many times faster OED methods are than IID methods at attaining the same accuracy. Supplementary Figure S7 plots this speedup factor for all of the methods for the case of the 2-D psychometric function (Equation 6). We see that for this particular example, each of these methods is about 3.5–4.5 times more efficient than IID sampling in terms of the savings in number of stimulus presentations. Clearly, the degree of benefit obtained by adaptive stimulus optimization methods will in general depend on the particular problem. Therefore, the results presented here in the context of a few specific examples provide an existence proof that such methods may be useful in some situations.

As a final test, I simulated adaptive estimation of the true parameters **θ**_{T} = (−4, 1, 1, 0.5, 0, 1, 0, 1)^{T} of the eight-parameter model (Equation 7) defined on three-dimensional stimuli **x** = (*x*_{1}, *x*_{2}, *x*_{3})^{T}. In this hypothetical example, stimulus feature *x*_{3} interacts multiplicatively with stimulus feature *x*_{1}, and therefore it is inappropriate to assume linear cue combination. Our stimulus space is a discrete 3-D grid with *L* = 21 levels uniformly spaced on [0, 4], for a total of 21^{3} = 9,261 possible stimuli. Our Gaussian prior had mean **μ**_{0} = (0, 0.5, 0.5, 0.5, 0, 0, 0, 0)^{T} and covariance **Σ**_{0} = 2 · **I**.

As in the 2-D example with a nonzero interaction term *θ*_{12} (Figure 3), many of the stimuli chosen by the Ψ procedure lie on the diagonals of the stimulus space (Supplementary Figure S8). This demonstrates the crucial importance of covarying stimulus features when their perceptual interactions are nonlinear. I found that when searching over the full grid of *N*_{x} = 9,261 stimuli for the stimulus optimizing Equation 13, stimulus selection times were somewhat slow (Xeon-64: 827 ± 5 ms, *N*_{trials} = 25,000) due to the large number of function evaluations needed. I now demonstrate two possible ways to speed up the stimulus selection times.

The number of stimuli in a factorial grid grows as *L*^{D}, where *L* is the number of levels per dimension and *D* is the dimensionality. Therefore, one may wish to reduce the number of function evaluations by either optimizing Equation 13 over a continuous stimulus space or periodically winnowing the stimulus space so that only stimuli which have had high utility over past trials are retained. The justification for this winnowing of the stimulus space is that as one learns more about the true parameters of the system, fewer stimuli are potentially useful for refining one's estimate. The expected information gain (normalized by the current entropy) is illustrated in Figure 11 for the 1-D psychometric function (Equation 2), and we see that as the experiment progresses, fewer stimuli have high expected utility, with the algorithm eventually alternating between the two points (±1.54) which comprise the D-optimal design for the true system parameters (Figure 6a).
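Winnowing itself is cheap; here is a sketch of one way to do it (keeping a fixed fraction of the currently most informative stimuli — the specific rule and names are my own suggestion, not the study's procedure):

```python
import numpy as np

def winnow(candidates, expected_gain, keep_frac=0.5):
    """Periodically discard candidate stimuli whose expected
    information gain has fallen below the top keep_frac fraction,
    shrinking the search set as the posterior sharpens."""
    k = max(2, int(np.ceil(len(candidates) * keep_frac)))
    keep = np.argsort(expected_gain)[::-1][:k]
    return candidates[np.sort(keep)]
```

Applying such a rule every few hundred trials shrinks the per-trial search geometrically while retaining the high-utility stimuli that the algorithm ends up alternating between anyway.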

As the number of particles *N*_{θ} used to represent the posterior density grows, the time required by the Prior-Ψ procedure to generate the next stimulus can become experimentally inconvenient. The Lookup-Ψ method is more forgiving in this regard: Since the design set *D* = {*d*_{1}, …, *d*_{m}} remains fixed throughout the experiment, it should not be too time consuming, upon resampling a new set of particles (much smaller than the initial sample from the prior), to recompute the utility *U*(*d* | **θ**) of each design in *D* for each particle, particularly if the design set is winnowed to exclude designs of low expected utility as the experiment progresses. Furthermore, given a particle **θ**_{i}, for many models it may be possible to analytically compute an approximately optimal design (see Appendix). The Lookup-Ψ method is *O*(*N*_{d} · *N*_{θ}) and demonstrates a substantial improvement over the *O*(*N*_{x} · *N*_{θ}) Prior-Ψ method for an identical number *N*_{θ} of supports (for *N*_{d} ≪ *N*_{θ}), and I suggest that this method may be a powerful substitute for Prior-Ψ in situations where a large number of particles is needed to accurately represent the posterior density, thereby making Prior-Ψ intractable. Further exploration of this approach is potentially of great interest to researchers in statistical learning as well as in the cognitive sciences.


The optimal design *d* = {*x*_{1}, *x*_{2}} is the set of observations which maximizes the determinant of the Fisher information matrix, given by

*J*(**θ**|*d*) = ∑_{i=1}^{2} *σ*′(**θ**^{T}**x**_{i}) **x**_{i}**x**_{i}^{T},

where for the two possible responses (*r* = 0, 1) we have *p*(*r* = 1|**x**, **θ**) = *σ*(**θ**^{T}**x**) = *η* and *p*(*r* = 0|**x**, **θ**) = 1 − *η*, using the simplifying notation **x** = (1, *x*)^{T}. Applying the definition in Equation 30, it is simple to show that

det *J*(**θ**|*d*) = *σ*′(**θ**^{T}**x**_{1}) *σ*′(**θ**^{T}**x**_{2}) (*x*_{1} − *x*_{2})^{2},

and therefore, without loss of generality, we may take **θ** = (*α*_{1}, *α*_{2})^{T} = (0, 1)^{T}. In general one cannot solve analytically for the optimal design *d*^{*}, due to the presence of transcendental functions, but using the (*λ*, *β*) parametrization (Equation 4) and making some approximations allows us to solve analytically for an approximately D-optimal design. To see this, assume that the solution is symmetric about the threshold *λ*, so that *σ*′(−*u*) = *σ*′(*u*) at the two design points. Since (*λ*, *β*) = (0, 1), this gives us an approximate optimal design of *d*^{*} = { ±1.54}.
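The value quoted above can be checked numerically. The sketch below (Python/NumPy; a toy verification, not code from this study) maximizes the determinant of the logistic-model Fisher information over symmetric two-point designs {−*u*, +*u*} for **θ** = (0, 1)^{T}:

```python
import numpy as np

def sigma_prime(u):
    # Derivative of the logistic function sigma(u) = 1 / (1 + exp(-u)).
    s = 1.0 / (1.0 + np.exp(-u))
    return s * (1.0 - s)

def det_fisher(x1, x2):
    # det J(theta | d) = sigma'(x1) sigma'(x2) (x1 - x2)^2 for theta = (0, 1)^T.
    return sigma_prime(x1) * sigma_prime(x2) * (x1 - x2) ** 2

# Grid search over symmetric designs d = {-u, +u}.
u = np.linspace(0.01, 5, 50000)
u_star = u[np.argmax(det_fisher(-u, u))]
# u_star ≈ 1.543, matching d* = {±1.54}
```

Equivalently, setting the derivative of *u*·*σ*′(*u*) to zero gives the transcendental condition *u* tanh(*u*/2) = 1, whose root is *u* ≈ 1.5434.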