**Abstract**
In psychophysics, researchers usually apply a two-level model for the analysis of the behavior of the single subject and the population. This classical model has two main disadvantages. First, the second level of the analysis discards information on trial repetitions and subject-specific variability. Second, the model does not easily allow assessing the goodness of fit. As an alternative to this classical approach, here we propose the Generalized Linear Mixed Model (GLMM). The GLMM separately estimates the variability of fixed and random effects, it has a higher statistical power, and it allows an easier assessment of the goodness of fit compared with the classical two-level model. GLMMs have been frequently used in many disciplines since the 1990s; however, they have been rarely applied in psychophysics. Furthermore, to our knowledge, the issue of estimating the point-of-subjective-equivalence (PSE) within the GLMM framework has never been addressed. Therefore, the article has two purposes: It provides a brief introduction to the usage of the GLMM in psychophysics, and it evaluates two different methods to estimate the PSE and its variability within the GLMM framework. We compare the performance of the GLMM and the classical two-level model on published experimental data and simulated data. We report that the estimated values of the parameters were similar between the two models and Type I errors were below the confidence level in both models. However, the GLMM has a higher statistical power than the two-level model. Moreover, one can easily compare the fit of different GLMMs according to different criteria. In conclusion, we argue that the GLMM can be a useful method in psychophysics.

The *Generalized Linear Model* (GLM; Agresti, 2002) is usually applied to these data sets; in its simple form, the model has two parameters, typically the intercept and the slope of the linear predictor. More sophisticated models have been proposed to take into account lapses and guesses by the participant (Wichmann & Hill, 2001a; Yssaad-Fesselier & Knoblauch, 2006), and a priori hypotheses of the experimenter (Kuss, Jäkel, & Wichmann, 2005). Ordinary GLMs, as well as other models of the kind mentioned above, assume that the responses are independent and conditionally identically distributed. While the responses of a single subject may approximately satisfy these assumptions, the repeated responses collected from more than one subject generally do not. Indeed, nonstationarities due, for example, to learning or fatigue may violate these assumptions even in the case of a single subject: In this context, a beta-binomial model has been proposed to deal with nonstationary responses (Fründ, Haenel, & Wichmann, 2011). In the case of repeated responses from more than one subject, ordinary GLMs treat the errors within subjects in the same manner as the errors between subjects, and tend to produce invalid standard errors of the estimated parameters. A typical approach to overcome this problem consists in applying a two-level analysis (e.g., Morrone, Ross, & Burr, 2005; Pariyadath & Eagleman, 2007; Johnston et al., 2008). First, the parameters of the psychometric function are estimated for each subject. Next, the individual estimates are pooled to perform the second-level analysis, and inference is carried out, for example, by means of *t*-test or ANOVA statistics.
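The two-level analysis described above can be sketched numerically. Below is a minimal Python illustration with synthetic data (all parameter values and function names are ours, not the article's): each subject's responses are fitted with a probit model by maximum likelihood, and the individual PSEs are then passed to a one-sample *t*-test.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, ttest_1samp

rng = np.random.default_rng(1)
x = np.repeat(np.arange(700, 901, 25), 40)       # 9 durations x 40 trials

def neg_loglik(beta, x, y):
    # Probit psychometric function: P(Y = 1) = Phi(b0 + b1 * x)
    p = norm.cdf(beta[0] + beta[1] * x).clip(1e-9, 1 - 1e-9)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

pses = []
for subject in range(7):
    b0 = -8.0 + rng.normal(0, 0.3)               # subject-specific intercept
    y = rng.binomial(1, norm.cdf(b0 + 0.01 * x)) # simulated binary responses
    fit = minimize(neg_loglik, x0=[-8.0, 0.01], args=(x, y), method="Nelder-Mead")
    pses.append(-fit.x[0] / fit.x[1])            # first level: PSE = -b0/b1

# Second level: is the group PSE different from the 800-ms reference?
t_stat, p_value = ttest_1samp(pses, 800)
```

Note that, as discussed below, only the point estimates `pses` survive into the second level; the subject-specific standard errors are discarded.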

As an alternative, here we consider *Generalized Linear Mixed Models* (GLMM; Cox, 1958; Rasch, 1961; Breslow & Clayton, 1993; Agresti, 2002; Bolker et al., 2009; Knoblauch & Maloney, 2012). The GLMM is an extension of the GLM that allows the analysis of clustered categorical data, as in the case of repeated responses from different subjects. In the GLMM, the first and second levels of the analysis are implemented simultaneously. Some of the advantages of the GLMM with respect to the two-level analysis are: (a) the GLMM takes the whole ensemble of the responses as input data, (b) it separately estimates the variability of fixed and random effects, and (c) it allows an easier assessment of the goodness of fit.

^{1}To our knowledge, the issue of estimating the PSE has never been addressed within the GLMM framework.

In this article, we refer to the classical two-level model as the *Parameter-As-Outcome Model* (PAOM). Since the GLM is closely related to the GLMM, we first introduce some general concepts of the GLM, and then extend these concepts to the GLMM. Mathematical details on the GLM and the GLMM are provided in the Appendix. Next, we introduce our algorithm for the estimate of the PSE within the GLMM framework. Finally, we compare the performance of the GLMM and the two-level approach on two different sets of published experimental data (Johnston et al., 2008; Moscatelli & Lacquaniti, 2011), and on simulated data.

Throughout the article, we consider *yes-or-no discrimination paradigms* (Klein, 2001), that is, experimental paradigms for which the corresponding psychometric function has a range (0, 1). In our prototypical experiment on time perception, participants are presented with two stimuli, a reference stimulus of constant duration and a test stimulus of variable duration, and they are asked to judge whether the duration of the test is longer or shorter than that of the reference. The duration of the test stimulus is randomly chosen from a given set (method of constant stimuli; Treutwein, 1995).

In a GLM, the response variable *Y* consists of multiple independent observations from a distribution of the exponential family (e.g., the Normal distribution for a continuous response variable, the Poisson distribution for counts, the Binomial distribution for proportions). The predictor is a linear function of one or more explanatory variables. Let *x*_{jk} denote the *k* experimental variables (each one repeated over a number *j* of trials), and *β*_{k} the parameters of the model; the linear predictor is then ∑_{k} *β*_{k}*x*_{jk}. The link function *g* relates the response variable to the linear predictor, so that the range of the latter (−∞ to +∞) is the same as that of *g*(*Y*). For example, in the psychophysics of time perception, let *x*_{ij} be the duration of the test stimulus and *Y*_{ij} the response variable for subject *i* and trial *j*; *Y*_{ij} = 0 if the test has been judged shorter than the reference, and *Y*_{ij} = 1 if judged longer. We can link the probability of a longer response *P*(*Y*_{ij} = 1) with the linear predictor by means of the *probit* link function Φ^{−1}:

Φ^{−1}[*P*(*Y*_{ij} = 1)] = *β*_{0i} + *β*_{1i}*x*_{ij},   (Equation 1)

where *β*_{0i} is the intercept and *β*_{1i} is the slope of the linear function. If *β*_{1i} > 0, the *probit* function is the inverse of the standard Normal cumulative distribution (otherwise, if *β*_{1i} < 0, the curve is the complement of the cdf). The two parameters *β*_{0i} and *β*_{1i} extend the model to the entire class of Normal cumulative distributions: the parameter *β*_{0i} shifts the curve to the left or to the right, while the parameter *β*_{1i} affects the rate of increase of the curve. The subscript *i* in Equation 1 indicates that, at this first level of the analysis, a separate model is fit for each single subject. The point-of-subjective-equivalence (*PSE*_{i}) is a function of the parameters *β*_{0i} and *β*_{1i}:

*PSE*_{i} = −*β*_{0i}/*β*_{1i}.   (Equation 2)

The *PSE*_{i} estimates the accuracy of the response, while the slope of the linear function, *β*_{1i}, estimates its precision. The noise of the response is an inverse function of the slope, commonly referenced as the *Just-Noticeable-Difference* (JND).
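Using the intercept and slope reported later in Table 1, the PSE follows directly from Equation 2. A small Python check (the 75% criterion used for the JND below is a common convention and is our choice; the article only states that the JND is an inverse function of the slope):

```python
import numpy as np
from scipy.stats import norm

b0, b1 = -5.5896, 0.0067        # intercept and slope from Table 1

# Psychometric function: P(Y = 1) = Phi(b0 + b1 * x)
pse = -b0 / b1                  # Equation 2; here about 834 ms
jnd = norm.ppf(0.75) / b1       # JND as an inverse function of the slope

# At x = PSE, the probability of a "longer" response is exactly 0.5
p_at_pse = norm.cdf(b0 + b1 * pse)
```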

The responses of different subjects are likely to differ even for the same stimulus: For a given stimulus *x*_{ij}, the response from subject 1 is *Y**_{1j} | *x*_{1j} ∼ *N*(*μ*_{1j}, *σ*²_{1j}), the response from subject 2 is *Y**_{2j} | *x*_{2j} ∼ *N*(*μ*_{2j}, *σ*²_{2j}), and the response from subject *m* is *Y**_{mj} | *x*_{mj} ∼ *N*(*μ*_{mj}, *σ*²_{mj}). Therefore, the responses are not identically distributed between different subjects. Fitting such a dataset with an ordinary GLM, the resulting errors might be correlated.

In the PAOM, population-level inference is then carried out by means of *t*- or *F*-statistics.^{2} Moreover, the PAOM does not retain the information about the number of repetitions per subject. The number of repetitions has only an indirect effect on the power of the analysis (it reduces the subject-specific standard error, making the parameter estimates more reliable). As an additional point, the PAOM implicitly assumes that different subjects have the same weight in the second-level analysis, which is an incorrect assumption when, for example, the number of trials differs from subject to subject. Finally, the selection of the best statistical model may be difficult: We may have to decide whether to include a given parameter, or we may have to choose between different link functions. While several criteria of model comparison can be applied to each subject, they may not easily extend to the whole population.

In the GLMM, *β*_{0} and *β*_{1} are the *fixed-effect parameters*. Note that, unlike in Equation 1, they are not indexed by the *i* subscript, being the same for all subjects. The other difference with respect to the ordinary GLM is in the error terms. By introducing the latent variable *Y**_{ij}, we define the model as:

*Y**_{ij} = *β*_{0} + *β*_{1}*x*_{ij} + *u*_{i} + *ε*_{ij},

where the error term *ε*_{ij} represents the variability *within subjects* and the error term *u*_{i} the variability *between subjects*; the error term *u*_{i} is also known as the *random-effects parameter*.^{3} In GLMMs, the errors *ε*_{ij} are independent only conditional on the random parameter *u*_{i} (see the Appendix for further information). The random effect *u*_{i} shifts the psychometric function of each subject along the *x*-axis; therefore it is also called a *random location parameter*. Within the GLMM framework, we can model multiple sources of between-subjects variability. In Examples 1 and 2 (see Results section), we show how to model between-subjects variability in both the accuracy and the precision of the response: This accounts for the more general case, in which the psychometric function of each subject may have a different value of PSE and/or slope (or equivalently, JND).

*m* psychometric functions were fitted to the *m* subjects.

A first criterion for comparing two nested models is the likelihood ratio (LR) test, based on the statistic LR = −2(*L*_{0} − *L*_{1}), where *L*_{0} and *L*_{1} are the maximized log-likelihood functions of the two models. Under the null hypothesis that the simpler model, *M*_{0}, is better than *M*_{1}, the LR has a large-sample χ²_{(1)} distribution. The likelihood ratio test is adequate for testing fixed-effect parameters if the sample size is relatively large (Bolker et al., 2009; Austin, 2010). On the other hand, the LR may not have a standard χ²_{(1)} distribution when testing for the variance of the random-effect parameter *u*_{i}. In this case, the null hypothesis that the variance of the random component is zero places the true value of the variance on the boundary of the parameter space defined by the alternative hypothesis. The limiting distribution of the LR does not fully approach a *χ*² random variable, and therefore this statistic tends to be conservative when testing for the variance of a random effect. The LR test can be used with these caveats in mind; as shown in the examples, we compared the test with other criteria in order to draw inferences on random effects. In a similar fashion to the LR test, it is possible to "profile" the change of the likelihood versus each parameter of the model. This approach is not available in the current version of lme4, but will be available in a future release of the package.

A second criterion is the Akaike Information Criterion (AIC), defined as AIC = 2*k* − 2*L*, where *L* is the maximized log-likelihood function and *k* is the number of parameters of the model. Therefore, the AIC balances the fit and the complexity of the model. We can compare the AIC of two models in order to take a decision on a given parameter; the preferred model is the one with the minimum AIC value. The AIC provides a criterion for model selection, but it is not a statistical test on the model's parameters.
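Both criteria reduce to a few lines of arithmetic. A toy check (the log-likelihood values are invented, chosen only so that the LR statistic matches the χ²_{(1)} = 18.3 reported in Example 1):

```python
from scipy.stats import chi2

# Maximized log-likelihoods of two nested models (toy values, ours)
ll_simple, k_simple = -512.40, 4    # M0, without the extra parameter
ll_full, k_full = -503.25, 5        # M1, with the extra parameter

# Likelihood ratio test: LR = -2(L0 - L1) ~ chi-square, df = k1 - k0
lr = -2 * (ll_simple - ll_full)
p_value = chi2.sf(lr, df=k_full - k_simple)

# AIC = 2k - 2L; the model with the smaller AIC is preferred
aic_simple = 2 * k_simple - 2 * ll_simple
aic_full = 2 * k_full - 2 * ll_full
```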

The Wald statistic is the ratio *z* between the estimated parameter and its standard error, and tests the null hypothesis *β* = 0. We refer the variable *z* to the standard normal distribution *Z* to get one-sided or two-sided *p*-values. Equivalently, for the two-sided alternative, the random variable *z*² has a χ²_{(1)} distribution. There are three caveats about the Wald statistic. First, it may not provide a reliable inference for small sample sizes. Second, the *Z* and χ²_{(1)} approximations are only appropriate for a GLMM without overdispersion (overdispersion means that the variance in the data is higher than the variance predicted by the statistical model; see, for example, Agresti, 2002; Bolker et al., 2009; Durán Pacheco, Hattendorf, Colford, Mäusezahl, & Smith, 2009). Third, the Wald statistic is not a good choice for testing random-effect parameters, since the distribution of the estimated variance scaled against its asymptotic standard error does not converge to a normal distribution. We therefore recommend using a criterion for model comparison (such as the AIC discussed above) in order to evaluate whether the inclusion of a random-effect parameter is justified.
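As a concrete instance, applying the Wald statistic to the interaction parameter of Table 4 (the estimate and standard error are from the table; the arithmetic below is ours):

```python
from scipy.stats import norm

estimate, std_error = 0.00147, 0.00035   # theta_3 in Table 4

z = estimate / std_error                 # Wald z statistic, H0: beta = 0
p_two_sided = 2 * norm.sf(abs(z))        # refer z to the standard normal
```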

The fitting function *glmer* (package *lme4*) provides, for each subject, adjustments to the fixed effects of the model. Formally, they are not parameters of the model, but are considered as conditional modes. The algebraic sums of these conditional modes and the fixed parameters adjust the intercept and slope of the model to the responses of each subject. In Figures S1 and S2 we used the fixed parameters and the conditional modes for a graphical representation of the model.

In the GLMM, the PSE is estimated from the fixed effects as PSE = −*β*_{0}/*β*_{1}, where *β*_{0}, *β*_{1} are the fixed-effect parameters of the model. Within the Delta method, the (1 − *α*) confidence interval is equal to:

PSE ± *z*_{1−α/2} σ_{PSE},  with  σ²_{PSE} ≈ (1/*β*_{1}²)[Var(*β*_{0}) + PSE²·Var(*β*_{1}) + 2·PSE·Cov(*β*_{0}, *β*_{1})].   (Equation 11)
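A sketch of the Delta-method interval in Python (the covariance value is invented for illustration; the variances reuse the squared standard errors of Table 1):

```python
import numpy as np
from scipy.stats import norm

b0, b1 = -5.5896, 0.0067                  # fixed effects (Table 1)
vcov = np.array([[0.5502**2, -3.6e-4],    # Var(b0) and Cov(b0, b1); Cov assumed
                 [-3.6e-4, 0.0007**2]])   # Cov(b0, b1) and Var(b1)

pse = -b0 / b1

# Delta method: se(PSE) = sqrt(g' V g), with g the gradient of -b0/b1
grad = np.array([-1.0 / b1, b0 / b1**2])
se_pse = float(np.sqrt(grad @ vcov @ grad))

alpha = 0.05
z = norm.ppf(1 - alpha / 2)
ci = (pse - z * se_pse, pse + z * se_pse)  # symmetric (1 - alpha) interval
```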

Consider an experiment with *m* subjects, *d* possible values of a continuous, independent variable (such as the stimulus duration *x*_{ij}), and *n* repetitions for each value. First, we fit the GLMM to the original data in order to estimate the fixed effects, as well as the variance and covariance matrix of the random effects. Consider, for example, a GLMM with random intercept and slope. Then, we simulate *m* pairs of random predictors from a multivariate normal distribution (R package *mnormt*) with a mean equal to zero, and with variance and covariance equal to the values estimated by the model. For each subject *i*, we adjust the fixed effects with the respective simulated values of the random predictors. Thus, we get *m* values of intercept and slope (*β*_{0i} and *β*_{1i} for *i* = 1…*m*). Using the estimates *β*_{0i} and *β*_{1i}, responses *Y*_{ij} are randomly generated from a Binomial(*n*_{ij}, *π̂*_{ij}), where *j* = 1, …, *d*; *i* = 1, …, *m*; and *π̂*_{ij} = Φ(*β*_{0i} + *β*_{1i}*x*_{ij}). We use the R function *rbinom* to generate random numbers from the binomial distribution. Then we fit the simulated responses again with the GLMM and determine the fixed-effect estimates *β*_{0}^{sim}, *β*_{1}^{sim}, and PSE^{sim} = −(*β*_{0}^{sim}/*β*_{1}^{sim}) for these data. This simulation of data is repeated a large number *B* of times, providing the bootstrap estimates PSE_{1}^{sim}, PSE_{2}^{sim}, …, PSE_{B}^{sim}. These values are ranked from the smallest to the largest. Many methods are available for identifying a 100(1 − *α*)% confidence interval from this rank-ordered distribution (as described in Efron, 1987). According to the percentile method, the lower and upper confidence limits are the bootstrap estimates whose ranks are, respectively, (*α*/2) · *B* and (1 − *α*/2) · *B*. This is denoted as the bootstrap interval.

We used the R function *glmer* of the package *lme4* (package version 0.999375-42; Bates et al., 2011) in order to fit the GLMM. We used our own R program to simulate data, to plot the curves in Figure S1, and to estimate the PSE and its variability. The scripts of the R-based program have been tested on Linux Ubuntu 12.04 (32-bit) and Mac OS X 10.7.4 (64-bit) using R version 2.15.0. Our R program, together with detailed comments, can be obtained from the corresponding author or from the following web site: http://mixedpsychophysics.wordpress.com.

*yes* responses over the total number of trials).

| Parameter | Estimate | Standard Error | z value | Pr(> \|z\|) |
| --- | --- | --- | --- | --- |
| Intercept | −5.5896 | 0.5502 | −10.16 | < 0.0001 |
| Slope | 0.0067 | 0.0007 | 10.02 | < 0.0001 |

In the PAOM, we estimated the slope and the PSE_{i} for each subject and condition according to Equations 1 and 2. In the second-level analysis, we tested (with paired *t*-test) the effect of motion conditions on PSE_{i} and slope separately. The difference in slope was highly significant (*t* = 5.16, *df* = 6, *p* = 0.002), whereas the difference in PSE_{i} was not significant (*t* = −0.78, *df* = 6, *p* = 0.46). Tables 2 and 3 show the mean and standard deviation of these two parameters. It is worth noting that, in step two, this model discards the subject-specific standard error (i.e., Column 3 of Table 1). Thus, the *SD* in Tables 2 and 3 estimates only the between-subjects variability.

| | Condition | Mean | SD |
| --- | --- | --- | --- |
| 1 | Down | 0.00789 | 0.00233 |
| 2 | Up | 0.00648 | 0.00266 |

| | Condition | Mean | SD |
| --- | --- | --- | --- |
| 1 | Down | 825.11 | 44.79 |
| 2 | Up | 837.64 | 48.60 |

Next, we fitted the GLMM to the same data (*probit* link function). We included three random-effects parameters (the random intercept, the random slope, and their correlation) and four fixed-effects parameters. The model is the following:

*Y**_{ij} = *θ*_{0} + *θ*_{1}*x*_{ij} + *θ*_{2}*d*_{ij} + *θ*_{3}*x*_{ij}*d*_{ij} + *u*_{i0} + *u*_{i1}*x*_{ij} + *ε*_{ij},

where *Y**_{ij} is the latent variable as defined in Equation 4, *x*_{ij} is the stimulus duration, *d*_{ij} is the dummy variable for the experimental condition (0 for the upward and 1 for the downward condition), *x*_{ij}*d*_{ij} is the interaction between the stimulus duration and the dummy variable, *θ*_{0}, …, *θ*_{3} are the fixed-effects coefficients, and *u*_{i0}, *u*_{i1} are the random-effects coefficients. The fixed-effect parameter corresponding to the slope, *θ*_{1}, estimates the precision of the response in the baseline (upward) condition; the higher the slope, the higher the precision. The parameter of the interaction between the stimulus duration and the dummy variable, *θ*_{3}, tests whether the slope is significantly different in the two experimental conditions.

| Parameter | Estimate | Std. Error | z value | Pr(> \|z\|) |
| --- | --- | --- | --- | --- |
| θ_{0} (Intercept) | −5.35950 | 0.81099 | −6.60859 | < 0.0001 |
| θ_{1} (Motion Duration) | 0.00632 | 0.00084 | 7.53519 | < 0.0001 |
| θ_{2} (Downward) | −1.13510 | 0.28860 | −3.93308 | < 0.0001 |
| θ_{3} (Interaction) | 0.00147 | 0.00035 | 4.19224 | < 0.0001 |

We compared this model with a simpler one including a single random effect (the random intercept only; LR test: *p* < 0.001). The large differences in AIC and in the LR test are both in favor of the model with two random effects. We next examined the behavior of each subject by means of a model-fit plot (Figure S2). The model with a single random effect (green in Figure S2) provides a poor fit, mainly due to the responses of Subject 2, a possible outlier. We removed this subject from the dataset and compared the two models again. The difference in AIC between the two models was now much smaller, the AIC being respectively 286 for the random-slope model and 291 for the single-random-effect model.

As shown in Table 4, the interaction parameter (*θ*_{3}) was highly significant (z Wald statistic). In order to confirm the results, we fitted the data again with a simpler model, excluding the parameter of interest *θ*_{3}. This simpler model assumes the same slope for the two motion conditions. We then compared the two models with the LR test; according to the test, the simpler model fits poorly compared with the former (χ²_{(1)} = 18.3; *p* < 0.001). Coherently, the first model, including the parameter *θ*_{3}, has a smaller AIC than the other. In summary, the two methods of model comparison (LR test and AIC) and the Wald statistic led us to a similar conclusion on the parameter *θ*_{3}: The difference between the upward and downward motion conditions was statistically significant, the responses being significantly more precise in the downward condition.

*SD*) in the downward condition, and 848 ± 20 ms in the upward condition. We also evaluated the PSE with the bootstrap method (number of sampled datasets *B* = 600). The PSE was 833 ms (95% confidence interval: 792–866 ms) in the downward condition and 847 ms (95% confidence interval: 798–880 ms) in the upward condition. In both experimental conditions, the distribution of the PSE estimated with the bootstrap method has a negative skewness, whereas the Delta method assumes a solution symmetric about the mean (Equation 11).

In summary, both models showed an effect of the motion condition on the precision (*slope*) but not on the accuracy (*PSE*) of the response. The estimates of the two parameters were similar between the two models, with a difference of about 1% to 2% of the value of each parameter. On the other hand, it does not make sense to compare the standard deviations of the two models, because they have completely different interpretations. In the PAOM, the *SD* is a measure of the variability between subjects. In the GLMM, it is possible to estimate both the variability due to the experimental condition (i.e., Table 4) and the variability between subjects (the *SD* and correlation of the random effects).

We then repeated the analysis on subsamples of the dataset, testing the null hypothesis *H*_{0}: *β*_{1down} = *β*_{1up} with both GLMM and PAOM (we used the LR test with GLMM). For each sample width, we repeated the analysis with six different combinations of subjects. As shown in Figure 1, we could always reject the null hypothesis. The *p*-values were on average smaller with GLMM than with PAOM. In this example, we performed more analyses than usually expected in a research article in order to illustrate different possibilities and potential pitfalls of GLMM.

In the second example, we analyzed the dataset from Johnston et al. (2008; experiment 1, pp. 2–4). We chose this dataset because the analysis performed in the original article showed a difference in the PSE between the two conditions. Each experiment consisted of an invisible flickering and a control condition (140 trials per condition). The duration of the standard stimulus was fixed at 500 ms. Five subjects were tested.

First, we applied the PAOM, comparing the two experimental conditions by means of a paired *t*-test. The difference in slope was not significant (*t* = 1.88, *df* = 4, *p* = 0.13), whereas the difference in PSE was highly significant (*t* = 8.83, *df* = 4, *p* < 0.001). The average difference was 33 ms. Then we applied the GLMM (*probit* link function) to the same data set. The model was the following:

*Y**_{ij} = *η*_{0} + *η*_{1}*x*_{ij} + *η*_{2}*d*_{ij} + *η*_{3}*x*_{ij}*d*_{ij} + *u*_{i0} + *u*_{i1}*x*_{ij} + *ε*_{ij},

where *Y**_{ij} is the latent variable, *x*_{ij} is the stimulus duration, *d*_{ij} is the dummy variable for the experimental condition (either 1 or 0), *x*_{ij}*d*_{ij} is the interaction between the stimulus duration and the dummy variable, *η*_{0}, …, *η*_{3} are the fixed-effects coefficients, and *u*_{i0}, *u*_{i1} are the random-effects parameters. As in the previous example, we included three random-effects parameters (the random intercept, the random slope, and their correlation). Table 5 reports the estimate and standard error of the fixed-effect parameters. The significance of each parameter of the model was assessed by means of the Wald statistic.

| Parameter | Estimate | Standard error | z value | Pr(> \|z\|) |
| --- | --- | --- | --- | --- |
| η_{0} (Intercept) | −4.61845 | 0.65614 | −7.039 | < 0.0001 |
| η_{1} (Duration) | 0.00914 | 0.00137 | 6.688 | < 0.0001 |
| η_{2} (Control Condition) | 1.14258 | 0.40957 | 2.790 | 0.00528 |
| η_{3} (Interaction) | −0.00180 | 0.00081 | −2.222 | 0.02631 |

*SD*) in the Invisible Flicker condition and 505 ± 7 ms in the Control condition. The difference between the two estimated values of the PSE was 32 ms. We further evaluated the PSE with the bootstrap method (number of sampled datasets *B* = 600). The PSE was 473 ms (95% confidence interval: 458–490 ms) in the Invisible Flicker condition and 505 ms (95% confidence interval: 491–523 ms) in the Control condition. Crucially, the two confidence intervals estimated with the bootstrap method do not overlap.

For the simulations, we first estimated the subject-specific parameters *β*_{0i} and *β*_{1i} from our original dataset. We then centered each fitted model on the average of the continuous predictor, by choosing an appropriately shifted value of the intercept:

*β**_{0i} = −*x̄* *β*_{1i},

where *x̄* is the average value of the continuous predictor (equal to 800) and *β*_{1i} is the fitted slope of the model. According to Equation 2, the *PSE*_{i} of each "shifted" model was:

*PSE*_{i} = −*β**_{0i}/*β*_{1i} = *x̄* = 800.

Responses *Y*_{ij} were randomly generated from a Binomial(*n*_{ij}, *π̂*_{ij}), where *i* is the subject, *j* is the stimulus level, and *π̂*_{ij} = Φ(*β**_{0i} + *β*_{1i}*x*_{ij}). We used the R function *rbinom* to generate random numbers from the binomial distribution. The simulated data set consists of the set of counts along with their associated predictors, sample sizes, and subject labels. The procedure was repeated 1,000 times, resulting in 1,000 randomly generated data sets. Each sampled dataset was then fitted with a GLMM (random intercept and slope) and a PAOM.

Within the PAOM, we performed a one-sample *t*-test for each simulated experiment. In each *t*-test, the alternative hypothesis stated that the true value of the PSE was different from 800. Consistent with expectation, we rejected the null hypothesis in 49/1,000 samples (with a significance level *α* equal to 0.05). According to the GLMM, the grand mean of the PSE was 799.95. The 95% CIs did not include the theoretical value of the PSE (800) in 49/1,000 samples. Thus, the two methods turned out to be unbiased, and Type I errors for the PSE were within the confidence level of 0.05.

In a second set of simulations, we added a simulated effect of the experimental condition on the slope. Responses were generated according to:

*π̂*_{ij} = Φ[*β**_{0i} + *β*_{2i}*d*_{ij} + (*β*_{1i} + *δd*_{ij})*x*_{ij}],   (Equation 16)

where *d*_{ij} is a dummy variable (*d*_{ij} = 1 in one half of the observations), *β*_{2i} = −[800 · (*β*_{1i} + *δ*)] − *β**_{0i}, and *δ* accounts for a simulated experimental effect. According to this equation, PSE = 800 for both *d*_{ij} = 1 and *d*_{ij} = 0, while the difference in slope between *d*_{ij} = 1 and *d*_{ij} = 0 is equal to *δ* (actually, *δ* is a parameter of the model, just as *β**_{0i}, *β*_{1i}, and *β*_{2i}; we used this notation to highlight that this specific parameter accounts for a *simulated difference* between experimental conditions). Using this algorithm, we first checked the two models for Type I errors. With respect to *δ*, the hypothesis *H*_{0} states that the parameter is not significantly different from 0; thus we set *δ* = 0 in Equation 16 and simulated 1,000 data sets. In this case, we simulated 720 trials per subject (360 trials in each experimental condition, with *d*_{ij} = 1 or *d*_{ij} = 0). With a confidence level of 0.05, we would expect *δ* to be significantly different from 0 in less than 5% of the samples. We tested the parameter by means of the Wald statistic. The parameter was significantly different from 0 in 42/1,000 samples with the PAOM (Figure 3) and in 38/1,000 samples with the GLMM (Figure 4). Furthermore, according to the two methods, the two estimates of the PSE were unbiased (Figures 5 and 6), and the PSE was significantly different from 800 in less than 5% of the samples.

Next, we simulated experiments with *δ* > 0. We always chose plausible values of the parameter (according to our original data set). In the different simulations, *δ* was 0.0005, 0.001, 0.002, and 0.0026. According to the PAOM, the parameter was significantly different from zero in, respectively, 210/1,000, 593/1,000, 996/1,000, and 1,000/1,000 samples. According to the GLMM, the parameter was significantly different from zero in, respectively, 283/1,000, 795/1,000, 1,000/1,000, and 1,000/1,000 samples. As shown in Figure 7, the GLMM has a higher power than the PAOM (0.0005 ≤ *δ* ≤ 0.002), while Type I errors are similar (*δ* = 0).

In summary, Type I errors were within the confidence level (*p* = 0.05) in both the PAOM and the GLMM. In these simulations, we computed the *p*-values of the *δ* parameter by means of the z Wald statistic, whereas in Example 1 and Example 2 we performed statistical inference by means of the LR test and model comparison. On real data, the z Wald statistic is unreliable if overdispersion occurs; this is not the case with simulated data, where the programmer can control the sources of variability. According to previous studies, the Type I error rate in lme4 depends on the number of clusters, the number of repetitions, and the within-cluster correlations (Austin, 2010; Zhang et al., 2011). Austin (2010) found that the empirical Type I error rates were acceptable as long as the number of subjects and the number of repetitions per subject were larger than five. Similarly, Bolker et al. (2009) suggested the rule of thumb of including more than five subjects or clusters for each random effect, and more than 10 samples for each level of the linear predictor. The power of the analysis depends on the sample size, the effect size, and its sources of variability, that is, the rate of *yes* responses (which affects the variance of a Binomial distribution), the overdispersion, and the heterogeneity between subjects or clusters. Checking previous studies in the literature may help in predicting the correct sample size to address a specific experimental question.

In the GLMM, the likelihood function involves an integral over the random effects that has no closed-form solution and must be approximated numerically. The Gauss-Hermite method approximates the integral with a *weighted* sum that evaluates the function at certain *quadrature points*. The approximation improves as the number of quadrature points increases; in the *lme4* package it is possible to choose the number of points. Alternatively, the likelihood function can be approximated using Markov Chain Monte Carlo (MCMC) methods. The R package *lme4* allows the MCMC method for Linear Mixed Models (LMM; Pinheiro & Bates, 2000); however, at the time we wrote the article, this was not possible for GLMMs. The R package MCMCglmm (Hadfield, 2010) allows fitting GLMMs by means of MCMC algorithms. This package follows a Bayesian approach, and therefore the researcher has to choose a prior distribution for the parameters of the model. The Gauss-Hermite and Monte Carlo integration methods provide likelihood approximations, and the parameter estimates converge to the ML estimates as the number of quadrature points and the sample size increase. Alternatively, the penalized quasi-likelihood (PQL) method maximizes the quasi-likelihood rather than the likelihood function. The PQL may yield biased estimates with binary data (Goldstein & Rasbash, 1996); Agresti (2002) recommends using ML rather than PQL methods. Finally, it is worth noting that the likelihood ratio test cannot be used with PQL fitting, because the test requires the ML. All the methods discussed above are implemented in several R packages (for a comparison, see Bolker et al., 2009; Austin, 2010; Zhang et al., 2011). The R-based code that we used in this article is available on our web site (http://mixedpsychophysics.wordpress.com).
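The Gauss-Hermite idea in miniature: the marginal probability of a *yes* response, obtained by integrating the probit model over a Gaussian random intercept, is approximated by a weighted sum over quadrature nodes (the parameter values below are assumed by us). For the probit link, this particular integral even has a closed form, Φ(η/√(1 + σ_u²)), which the quadrature reproduces:

```python
import numpy as np
from scipy.stats import norm

b0, b1, sigma_u = -8.0, 0.01, 0.6    # assumed fixed effects and random-effect SD
eta = b0 + b1 * 820.0                # linear predictor at x = 820 ms

# Gauss-Hermite: integral of exp(-t^2) f(t) dt ~ sum_i w_i f(t_i);
# substituting u = sqrt(2)*sigma_u*t expresses E_u[Phi(eta + u)] as a weighted sum
nodes, weights = np.polynomial.hermite.hermgauss(20)
p_quad = np.sum(weights * norm.cdf(eta + np.sqrt(2) * sigma_u * nodes)) / np.sqrt(np.pi)

# Closed form of the probit-normal integral, used here as a check
p_exact = norm.cdf(eta / np.sqrt(1 + sigma_u**2))
```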

Mixed models can accommodate both *nested random effects* and *crossed random effects* (Bolker et al., 2009). Both types of random effects can be implemented in lme4.

Overdispersion may occur in binomial response models (both *logit* and *probit* GLM). In GLMMs, it is possible to include subject-level random effects in order to account for overdispersion (Agresti, 2002). Other tips on GLMM and overdispersion are in Agresti (2002), Bolker et al. (2009), and the related web site: www.glmm.wikidot.com.

Some experimental paradigms require an extra parameter *γ* that accounts for the baseline probability of the response. The probability of a *Yes* response is therefore:

*P*(*Y*_{ij} = 1) = *γ* + (1 − *γ*)Φ(*β*_{0} + *β*_{1}*x*_{ij}).

A model of this kind is no longer a GLM and cannot be fitted directly with *glmer*.
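A sketch of such a function with an assumed baseline rate (the formulation P = γ + (1 − γ)Φ(·) is the standard guessing-rate parameterization, e.g., Wichmann & Hill, 2001a; the numbers are ours):

```python
import numpy as np
from scipy.stats import norm

def psychometric_with_baseline(x, b0, b1, gamma):
    # Lower asymptote gamma instead of 0: P(Y = 1) = gamma + (1 - gamma)*Phi(b0 + b1*x)
    return gamma + (1 - gamma) * norm.cdf(b0 + b1 * x)

x = np.array([600.0, 800.0, 1000.0])
p = psychometric_with_baseline(x, b0=-8.0, b1=0.01, gamma=0.05)
# The curve now runs from gamma (not 0) up to 1
```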

Agresti, A. (2002). *Categorical data analysis* (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In *Proceedings of the 2nd International Symposium on Information Theory* (pp. 267–281). Budapest, Hungary: Akadémiai Kiadó.

Austin, P. C. (2010). Estimating multilevel logistic regression models when the number of clusters is low: A comparison of different statistical software procedures. *The International Journal of Biostatistics*, 6(1), 16.

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. *Journal of Memory and Language*, 59, 390–412.

Bolker, B. M., Brooks, M. E., Clark, C. J., Geange, S. W., Poulsen, J. R., Stevens, M. H. H., & White, J.-S. S. (2009). Generalized linear mixed models: A practical guide for ecology and evolution. *Trends in Ecology and Evolution*, 24(3), 127–135.

Breslow, N. E., & Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. *Journal of the American Statistical Association*, 88, 9–25.

*Journal of the Royal Statistical Society B*, 59(2), 431–446.

Casella, G., & Berger, R. L. (2002). *Statistical inference* (2nd ed.). Pacific Grove, CA: Duxbury Press.

Cox, D. R. (1958). The regression analysis of binary sequences. *Journal of the Royal Statistical Society B*, 20, 214–242.

*Psychological Methods*, 3, 186–205.

*Journal of Mathematical Psychology*, 54, 304–313.

Durán Pacheco, G., Hattendorf, J., Colford, J. M., Mäusezahl, D., & Smith, T. (2009). *Statistics in Medicine*, 28(24), 2989–3011.

Efron, B. (1987). Better bootstrap confidence intervals. *Journal of the American Statistical Association*, 82, 171–200.

Efron, B., & Tibshirani, R. J. (1993). *An introduction to the bootstrap*. New York: Chapman & Hall.

*Statistics in Medicine*, 22(12), 1977–1988.

*Spatial Vision*, 11, 135–139.

Fox, J.-P. (2010). *Bayesian item response modeling: Theory and applications*. New York: Springer.

Fründ, I., Haenel, N. V., & Wichmann, F. A. (2011). Inference for psychometric functions in the presence of nonstationary behavior. *Journal of Vision*, 11(6):16, 1–19, http://www.journalofvision.org/content/11/6/16, doi:10.1167/11.6.16.

Goldstein, H., & Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. *Journal of the Royal Statistical Society A*, 159, 505–513.

Hadfield, J. D. (2010). MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. *Journal of Statistical Software*, 33(2), 1–22.

*Behavior Research Methods*, 39(1), 101–117.

Johnston, A., et al. (2008). *Vision Research*, 48(17), 1852–1858.

*The Statistician*, 50(1), 41–50.

Klein, S. A. (2001). Measuring, estimating, and understanding the psychometric function: A commentary. *Perception & Psychophysics*, 63(8), 1421–1455.

Knoblauch, K., & Maloney, L. T. (2012). *Modeling psychophysical data in R*. New York: Springer.

Kuss, M., Jäkel, F., & Wichmann, F. A. (2005). Bayesian inference for psychometric functions. *Journal of Vision*, 5(5):8, 478–492, http://www.journalofvision.org/content/5/5/8, doi:10.1167/5.5.8.

*Perception & Psychophysics*, 47, 127–134.

Mériaux, S., Roche, A., Dehaene-Lambertz, G., Thirion, B., & Poline, J.-B. (2006). *Human Brain Mapping*, 27(5), 402–410.

Mooney, C. Z., & Duval, R. D. (1993). *Bootstrapping: A non-parametric approach to statistical inference*. Newbury Park, CA: Sage.

Morrone, M. C., Ross, J., & Burr, D. (2005). Saccadic eye movements cause compression of time as well as space. *Nature Neuroscience*, 8(7), 950–954.

Moscatelli, A., & Lacquaniti, F. (2011). The weight of time: Gravitational force enhances discrimination of visual motion duration. *Journal of Vision*, 11(4):5, 1–17, http://www.journalofvision.org/content/11/4/5, doi:10.1167/11.4.5.

*Experimental Brain Research*, 210(1), 25–32.

Pariyadath, V., & Eagleman, D. (2007). The effect of predictability on subjective duration. *PLoS One*, 2(11), e1264, doi:10.1371/journal.pone.0001264.

Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-effects models in S and S-PLUS*. New York: Springer-Verlag.

Quené, H., & van den Bergh, H. (2008). Examples of mixed-effects modeling with crossed random effects and with binomial data. *Journal of Memory and Language*, 59, 413–425.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In *Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics, and Probability, Volume 4: Contributions to Biology and Problems of Medicine* (pp. 321–333). Berkeley, CA: University of California Press.

*Psychometrika*, 72, 621–642.

*Modern methods of data analysis* (pp. 325–373). Newbury Park, CA: Sage.

Treutwein, B. (1995). Adaptive psychophysical procedures. *Vision Research*, 35(17), 2503–2522.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness of fit. *Perception & Psychophysics*, 63(8), 1293–1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. *Perception & Psychophysics*, 63(8), 1314–1329.

*BMC Neuroscience*, 5, 7–21.

Yssaad-Fesselier, R., & Knoblauch, K. (2006). Modeling psychometric functions in R. *Behavior Research Methods*, 38(1), 28–41.

Zhang, H., et al. (2011). On fitting generalized linear mixed-effects models for binary responses using different statistical packages. *Statistics in Medicine*. Epub ahead of print: http://onlinelibrary.wiley.com/doi/10.1002/sim.4265/full (accessed October 9, 2012).

^{2}In other fields of neuroscience, two-level statistical models have been proposed to take the subject-specific variability into account; see, for example, Mériaux, Roche, Dehaene-Lambertz, Thirion, & Poline, 2006.

^{3}In Equations 4 and 5, we labeled the within- and between-subjects sources of variability with the *ε*_{ij} and *u*_{i} error terms. These two error terms do not appear in Equation 3 because it is an expression for the expected value. Alternatively, we could write the between-subjects variability as a random-effects parameter multiplied by a random weight predictor *Z*_{i}: *Y**_{ij} = *β*_{0} + *β*_{1}*x*_{ij} + *u*_{i}*Z*_{i} + *ε*_{ij}. The random predictor *Z*_{i} equals 1 for subject *i* and 0 otherwise. In matrix notation: ***Y**** = ***Xβ*** + ***Zu*** + ***e***. We assume that *u* has a normal distribution with mean 0 and variance *σ*²_{u}. This second notation has also been frequently used (Breslow & Clayton, 1993; Agresti, 2002).

Consider again the latent-variable model with the *probit* link function (as in Equation 1). If the error term *ε*_{ij}, instead of a standard Gaussian, has a Gaussian distribution with 0 mean and variance *σ*², we can easily derive Equation A4. The terms *v*_{im} and *v*_{in} are error terms from different trials within the same subject. While error terms within the same subject are positively correlated (Equation A8), they are independent conditional on the random parameter *u*_{i}. The higher *σ*²_{u}, the higher the correlation between the error terms of the same subject. According to the model (Equations A5 and A6), *v*_{im} and *v*_{in} are independent only for a degenerate distribution of *u*_{i}, that is, if *σ*²_{u} = 0.

_{i}