**Comparing models facilitates testing different hypotheses regarding the computational basis of perception and action. Effective model comparison requires stimuli for which models make different predictions. Typically, experiments use a predetermined set of stimuli or sample stimuli randomly. Both methods have limitations; a predetermined set may not contain stimuli that dissociate the models, whereas random sampling may be inefficient. To overcome these limitations, we expanded the psi-algorithm (Kontsevich & Tyler, 1999) from estimating the parameters of a psychometric curve to distinguishing models. To test our algorithm, we applied it to two distinct problems. First, we investigated dissociating sensory noise models. We simulated ideal observers with different noise models performing a two-alternative forced-choice task. Stimuli were selected randomly or using our algorithm. We found using our algorithm improved the accuracy of model comparison. We also validated the algorithm in subjects by inferring which noise model underlies speed perception. Our algorithm converged quickly to the model previously proposed (Stocker & Simoncelli, 2006), whereas if stimuli were selected randomly, model probabilities separated slower and sometimes supported alternative models. Second, we applied our algorithm to a different problem—comparing models of target selection under body acceleration. Previous work found target choice preference is modulated by whole body acceleration (Rincon-Gonzalez et al., 2016). However, the effect is subtle, making model comparison difficult. We show that selecting stimuli adaptively could have led to stronger conclusions in model comparison. We conclude that our technique is more efficient and more reliable than current methods of stimulus selection for dissociating models.**

*m*discrete psychophysical models best describes subjects' behavior, under the assumption that the model underlying subjects' behavior is contained in the set of models. Under a traditional experimental approach an experimenter would present a number of stimuli

**x**to a subject and obtain the corresponding responses to these stimuli

**r**. Using Bayes' rule, we can compute the probability of a particular psychophysical model

*m*given the responses and stimuli as:

*p*(

*m*) is the prior probability of each model

*m*,

*p*(

*m*|

**r**,

**x**) is the posterior distribution of each model and

*p*(

**r**|

**x**,

*m*) is referred to as the marginal likelihood. The marginal likelihood is obtained by marginalizing over the parameters

*θ*of the particular model:

**x**that were presented to the subject. Different stimuli and responses produce different posterior distributions of models. We can characterize the quality of a possible posterior using a particular utility function. Following previous work in model comparison, we use the entropy of the posterior distribution to characterize its quality (Cavagnaro et al., 2013; Cavagnaro et al., 2010; Cavagnaro et al., 2011; DiMattina, 2016):

*x*and parameters

*θ*on discrete grids, similar to Kontsevich and Tyler (1999). This requires three quantities: a prior distribution over models

*p*(

*m*), a prior distribution of parameters for each model

*p*(

*θ|m*), and a likelihood look-up table for each model

*p*(

*r*|

*x*,

*θ*,

*m*), which represents the probability of a response given a model and parameter set. Using these quantities, we can design an iterative algorithm to select the optimal stimuli on a trial-to-trial basis, which is as follows:

- Compute the posterior distribution of models given response
*r*in the next trial to stimulus*x*:Note,\begin{equation}{p_t}(m|r,x) = {{{p_t}(r|x,m){p_t}(m)} \over {\sum\nolimits_m {{p_t}(r|x,m){p_t}(m)} }}\end{equation}Display Formula \(\sum\nolimits_m {p_t} (r|x,m){p_t}(m)\) can also be written*p*(_{t}*r*|*x*) and should be stored as the term is also used in Step 4. - Use
*x*_{t}_{+1}as the stimulus on the next trial to receive response*r*_{t}_{+1}. - Because Step 1 requires a prior on the parameters
*p*(_{t}*θ*|*m*), this prior must be recursively updated in addition to updating the model priors. As such we set the parameter and model priors to their posteriors:\begin{equation}\eqalign{ {p_t}(\left. \theta \right|m,{r_{t + 1}},{x_{t + 1}})&= {{{p_t}(\left. \theta \right|m)p(\left. {{r_{t + 1}}} \right|{x_{t + 1}},\theta ,m)} \over {\sum\nolimits_\theta {p_t} (\left. \theta \right|m)p(\left. {{r_{t + 1}}} \right|{x_{t + 1}},\theta ,m)}} \cr {p_{t + 1}}(\left. \theta \right|m)&= {p_t}(\left. \theta \right|m,{r_{t + 1}},{x_{t + 1}}) \cr {p_{t + 1}}(m)&= {p_t}(\left. m \right|{r_{t + 1}},{x_{t + 1}}) \cr} \end{equation} - Return to the first step until the desired number of trials is completed or sufficient model evidence has been obtained.

*s*

_{2}and a reference

*s*

_{1}, described by:

*s*

_{2}–

*s*

_{1}with a mean

*α*and variance

*λ*is a lapse rate accounting for trials where an observer guesses randomly, and

*α*is a bias parameter accounting for biases in subject's responses. We assume the subject's sensory noise changes with the stimulus in one of three ways. The first, and simplest model, assumes sensory variance is independent of the stimulus. We denote this the constant noise model. The second model assumes that the standard deviation of the sensory noise increases linearly with the signal intensity, and thus has zero standard deviation if the signal is absent. This model is referred to as the Weber model. Finally, we consider a model where the sensory noise is nonzero when the signal is absent and also has a linearly increasing part, which we will refer to as the generalized model.

*β*to be kept in a similar range for each model); for the Weber model we assume

*σ*

^{2}= (

*βs*)

^{2}, and for the generalized model we assume

*σ*

^{2}=

*γ*

^{2}+ (

*βs*)

^{2}. The above response model means we can parametrize a subject's response behavior (regardless of model) using four parameters,

*θ*= [

*α*,

*β*,

*γ*,

*λ*].

*p*(

*θ*|

*m*), we assumed a uniform discrete distribution for each parameter and that the parameters are independent. Finally, for the prior over models

*p*(

*m*) we used a uniform distribution over the three models.

*γ*, it was not used for these models). The stimuli for these trials were either selected adaptively using our algorithm, or randomly from the same stimulus grid. This led to a total of 12,000 simulated datasets.

^{2}. The fixation dot was 0.2° in size and drawn in the center of the screen. The stimuli were displayed at a resolution of 1,024 × 768 on a gamma-corrected 17-in. Iiyama HM903DTB monitor (Iiyama, Tokyo, Japan) viewed from a distance of approximately 43.5 cm.

*s*

_{1}°/s and the other (the probe) with speed

*s*

_{2}°/s. The subject was asked to judge which of the two was faster and indicate this with a button press. The position of the reference stimulus (left or right of fixation) was randomized on each trial. The experiment was split into two sessions, the ordering of which was counterbalanced across subjects. In one session (algorithm session) subjects performed 1,500 trials, 750 of which were adaptive trials and 750 were random trials. On an adaptive trial, the Gabor speeds were selected using our algorithm based on the previous stimuli (and responses) generated by this algorithm; on a random trial the speed of each Gabor was selected randomly from the stimulus grid. The stimuli and parameter grids used were the same as for the simulation experiment. In this session the screen was refreshed at 72 Hz.

*s*

_{1}was randomly selected from a set of five possible values, the value of

*s*

_{2}on this trial was then selected using the psi-marginal algorithm (Prins, 2013; see Table 2 for the grids used). This was done in order to maximize the information gain about

*μ*(the point of subjective equality) and

*σ*(the standard deviation) for this particular value of

*s*

_{1}under the assumption the probability of a subject's response follows:

*σ*

^{2}is the variance of the normal distribution and

*μ*is the mean of the distribution. Selecting stimuli in this manner allows us to assess how effective the more traditional fixed reference approach is to separating sensory noise models compared to our algorithm. In this session, stimuli were refreshed at 144 Hz. Note that the probe

*s*

_{2}had a denser grid in this session (see Table 2 compared to Table 1); this allows us to better estimate the psychometric curve of each subject but may also give an advantage to this method in terms of model comparison. Prior to each session, subjects performed 20 practice trials from the respective session.

*i*is the trial index,

**r**is a vector of subject response,

**s**is a vector of the reference stimuli,

_{1}**s**is a vector of probe stimuli, and

_{2}*Bern*stands for a Bernoulli distribution.

*scipy.optimize.minimize*function. The L-BFGS-B is an iterative algorithm designed to optimize a nonlinear function subject to parameter boundaries (Byrd et al., 1995). The parameter bounds were set to those in Table 1, with the exception of

*β*which had bounds of [0.001, 1]. To ensure a global minimum was found we used 200 random initializations and selected the parameter set with the highest log-likelihood. The initial values were obtained by drawing each parameter value from a continuous uniform distribution with the same bound as those specified above.

*k*is the number of parameters of the model. It is important to note that computing model probabilities using Equation 1 also implicitly corrects for the number of parameters (MacKay, 2003).

*β*and

*γ*(see Figure 3). Figure 3 shows that the model probabilities are primarily green to yellow when the generative model is Weber or Constant. This indicates both methods mostly select the correct model. We can also see that in general adaptive sampling produces model probabilities that trend closer to 1 (i.e., yellow), indicating stronger evidence in favor of the correct model. When the generative model is the Generalized model, a substantial number of simulations produce probabilities supporting alternative models (indicated by the blue shading). At first this seems counterintuitive. However, for small values of

*γ*and

*β*, the generalized model becomes almost equivalent to the Weber and constant model. Because these have fewer parameters, they are favored in this situation.

*r*is the subject's response,

*ϕ*is the phase at which the targets are presented, Φ is a cumulative Gaussian with mean

*μ*and standard deviation

*σ*evaluated at the SOA. For the constant model,

*μ*is a fixed value across phases:

*μ*=

*α*. In this model choices are independent of the phase of the motion. The sinusoidal model entails

*μ*changes sinusoidally as a function of phase, and is thus written

*μ*=

*α*+

*β*sin(

*ϕ*+

*ϕ*

_{0}), in which

*α*,

*β*, and

*ϕ*

_{0}are free parameters representing a subject's fixed bias, amplitude of the modulation, and phase offset, respectively. Regardless of the model, we can parameterize the subject response probability using

*θ*= [

*α*,

*β*,

*ϕ*,

_{o}*σ*].

*i*is the trial index,

*N*is the number of trials,

**r**is a vector of subject responses,

**SOA**is a vector of the SOA's the subject was presented,

*is a vector containing the phase the targets were presented at, and*

**ϕ***Bern*stands for a Bernoulli distribution.

*α*,

*β*, and

*σ*which had bounds of [−250, 250], [0, 250], and [0.1, 250] respectively. To ensure a global minimum was found, we used 300 random initializations and selected the parameter set with the highest log-likelihood. The initial values were obtained by drawing each parameter value from a continuous uniform distribution with the same bound as those specified above.

*λ*set to 0) to provide us with a semiparametric estimate of BTD for each phase.

*β*and

*ϕ*

_{0}and thus they were removed from the parameter set when simulating this model. The stimuli for these trials were selected either randomly from the stimulus grid shown in Table 4 or using our adaptive algorithm. This led to a total of 8,000 simulated datasets. Additional simulations were performed based on a truncated Gaussian parameter distribution, reflecting the estimated behavioral parameter range (see Supplementary Material S1).

*β*is not at zero. This implies the modulation of BTD is sinusoidal but the effect on the log-likelihood is insufficient to overcome the penalization for the additional parameters. This is also supported by the model predictions shown in Figure 7, illustrating that the sinusoidal model is a closer fit than the constant model to the independent estimate of BTD for each phase.

*σ*and

*β*(see Figure 10). If the generative model is the constant bias model, both adaptive and random sampling method lead to model probabilities favoring the correct model but the adaptive method produces only slightly higher probabilities. Adaptive sampling leads to the probability of the correct model being slightly higher (indicated by a more yellow hue), which leads to a larger proportion of Bayes factors being over 3. By contrast, when the generative model is the sinusoidal model, the model probabilities range from strongly in favor of the sinusoidal model to strongly in favor of the constant model for both sampling methods. This is understandable because the smaller the amplitude of the sinusoid, the closer the sinusoidal model becomes to the constant model and thus penalizing for the additional parameters leads to favoring the simpler constant model. Interestingly, the shift in model probabilities from sinusoidal to constant is dependent on the variability of a subjects decisions; the smaller

*σ*is, the lower

*β*can be, while still inferring in favor of the sinusoidal model.

*α*estimate.

*bioRxiv*, 150052, https://doi.org/10.1101/150052.

*(pp. 1026–1034). Red Hook, NY: Curran Associates, Inc.*

*Advances in neural information processing systems**, 8 (11), e1002771, https://doi.org/10.1371/journal.pcbi.1002771.*

*PLoS Computational Biology**, 15 (3): 7, 1–16, https://doi.org/10.1167/15.3.7. [PubMed] [Article]*

*Journal of Vision**, 19 (6), 716–723, https://doi.org/10.1109/TAC.1974.1100705.*

*IEEE Transactions on Automatic Control**, 117 (6): 2250–2261, https://doi.org/10.1152/jn.00022.2017.*

*Journal of Neurophysiology**, 7 (6), e1002080, https://doi.org/10.1371/journal.pcbi.1002080.*

*PLoS Computational Biology**. New York, NY: Springer.*

*Model selection and multimodel inference: A practical information-theoretic approach*(2nd ed.)*, 16 (5), 1190–1208, https://doi.org/10.1137/0916069.*

*SIAM Journal on Scientific Computing**, 59 (2), 358–375, https://doi.org/10.1287/mnsc.1120.1558.*

*Management Science**, 22 (4), 887–905.*

*Neural Computation**, 18 (1), 204–210, https://doi.org/10.3758/s13423-010-0030-4.*

*Psychonomic Bulletin & Review**. Chichester, UK: John Wiley & Sons.*

*Theory of probability: A critical introductory treatment**, 15 (9): 5, 1–20, https://doi.org/10.1167/15.9.5. [PubMed] [Article]*

*Journal of Vision**, 16 (9): 1, 1–18, https://doi.org/10.1167/16.9.1. [PubMed] [Article]*

*Journal of Vision**, 23 (9), 2242–2288.*

*Neural Computation**Frontiers in Neural Circuits, 7*, https://doi.org/10.3389/fncir.2013.00101.

*, 11 (6): 16, 1–19, https://doi.org/10.1167/11.6.16. [PubMed] [Article]*

*Journal of Vision**(pp. 2377–2385). Red Hook, NY: Curran Associates, Inc.*

*Advances in neural information processing systems**, 16 (6): 15, 1–17, https://doi.org/10.1167/16.6.15. [PubMed] [Article]*

*Journal of Vision**. (arXiv: 1112.5745).*

*arXiv:1112.5745*[cs, stat]*, 9 (3), 90–95, https://doi.org/10.1109/MCSE.2007.55.*

*Computing in Science Engineering**. Cambridge, UK: Cambridge University Press.*

*Probability theory: The logic of science**, 13 (8), 1020–1026, https://doi.org/10.1038/nn.2590.*

*Nature Neuroscience**. Retrieved from http://www.scipy.org/.*

*SciPy: Open source scientific tools for Python**, 90 (430), 773–795, https://doi.org/10.1080/01621459.1995.10476572.*

*Journal of the American Statistical Association**, 7 (6), e40216, https://doi.org/10.1371/journal.pone.0040216.*

*PLoS One**, 9 (2), e1002927.*

*PLoS Computational Biology**, 26 (11), 2465–2492.*

*Neural Computation**, 39 (16), 2729–2737, https://doi.org/10.1016/S0042-6989(98)00285-5.*

*Vision Research**, 2 (9), e943, https://doi.org/10.1371/journal.pone.0000943.*

*PLoS One**, 50 (4), 369–389, https://doi.org/10.1016/j.jmp.2005.12.005.*

*Journal of Mathematical Psychology**.*

*ArXiv*:1409.7552v1*, 112 (26), 8142–8147, https://doi.org/10.1073/pnas.1500361112.*

*Proceedings of the National Academy of Sciences**, 10 (3): 17, 1–21, https://doi.org/10.1167/10.3.17. [PubMed] [Article]*

*Journal of Vision**. Cambridge, UK: Cambridge University Press.*

*Information theory, inference and learning algorithms**, 26 (4), 609–619.*

*Vision Research**( Vol. 445, pp. 51–56). SciPy Austin, TX.*

*Proceedings of the 9th Python in Science Conference**, 11 (12), e1004649, https://doi.org/10.1371/journal.pcbi.1004649.*

*PLoS Computational Biology**, 12 (Oct), 2825–2830.*

*Journal of Machine Learning Research**Frontiers in Neuroinformatics*,

*2*, https://doi.org/10.3389/neuro.11.010.2008.

*, 31 (47), 17220–17229, https://doi.org/10.1523/jneurosci.2028-11.2011.*

*Journal of Neuroscience**, 26 (5), 508–522, https://doi.org/10.1016/j.jco.2010.04.001.*

*Journal of Complexity**(pp. 3–18). Amsterdam, the Netherlands: Elsevier.*

*Closed Loop Neuroscience**, 13 (7): 3, 1–17, https://doi.org/10.1167/13.7.3. [PubMed] [Article]*

*Journal of Vision**, 116 (3), 977–985, https://doi.org/10.1152/jn.01071.2015.*

*Journal of Neurophysiology**, 12 (4), e1004859, https://doi.org/10.1371/journal.pcbi.1004859.*

*PLoS Computational Biology**(pp. 1361–1368). Cambridge, MA: MIT Press.*

*Advances in neural information processing systems 17**, 9 (4), 578–585, https://doi.org/10.1038/nn1669.*

*Nature Neuroscience**, 13 (2), 22–30, https://doi.org/10.1109/MCSE.2011.37.*

*Computing in Science Engineering**, 8 (12): 8, 1–13, https://doi.org/10.1167/8.12.8. [PubMed] [Article]*

*Journal of Vision**, 5 (6), 598–604, https://doi.org/10.1038/nn0602-858.*

*Nature Neuroscience**x*

_{1}and

*x*

_{2}, one for the reference and one for the probe. We model these as normally distributed random variables, with a mean centered on the true reference and probe values and a variance that is a function of the underlying sensory noise model. As such we can write

*x*

_{1}and

*x*

_{2}as

*x*

_{2}>

*x*

_{1}and 0 otherwise. In order to derive the distribution of an observer's response it is useful to note this is equivalent to

*x*

_{2}–

*x*

_{1}> 0. As

*x*

_{2}and

*x*

_{1}are normally distributed random variables, subtracting them produces another normally distributed variable Δ

*x*. Therefore the subject's response probability can be written as:

*,*

_{x}*s*

_{2}–

*s*

_{1}, with a mean of 0 and variance

*α*to account for small deviations from unbiased behavior and a lapse term

*λ*to account for lapses in the task. Therefore, the final response probability can be written,