Integration of proximity and good continuation cues is analyzed as a probabilistic inference problem in contour grouping. A Bayesian framework was tested in a multistable dot lattice experiment. In rectangular lattices, distance ratio and global orientation of rows and columns were manipulated. Discollinearity was introduced by imposing zigzag in one orientation, by either fixed or stochastic displacement of elements. Results indicate that proximity and good continuation are generally treated as independent sources of information, added to prior orientation log-odds to produce the odds of grouping percepts. Distance likelihood is well captured by a power law, and discollinearity likelihoods by generalized Laplace distributions, with higher kurtosis for stochastic zigzag. While observers prefer vertical over horizontal orientations, the exact prior distribution is idiosyncratic. Perceptual grouping along cardinal axes is less affected by distance, but more by discollinearity, than along oblique orientations. Results are qualitatively and quantitatively compared to ecological statistics of contours (J. H. Elder & R. M. Goldberg, 2002). The potential of hierarchically extended Bayes models for a better understanding of principles in cue integration is discussed.

*x* as the symbol for the realization of an observable variable *X*, such as a sensory measurement, which is related to an unobservable state of the world *ξ*_{i}, the *i*^{th} among *k* mutually exclusive possibilities. If we have sensory measurements *X*_{1} and *X*_{2}, the posterior probability is given by the equation:

*X*_{1} and *X*_{2} are conditionally independent, the multidimensional elements in the expression can be factored:

*ξ* = *ξ*_{1} versus *ξ* = *ξ*_{2}, it makes sense to consider their posterior odds

*ξ* = *ξ*_{1} and negative if it favors the hypothesis *ξ* = *ξ*_{2}, is also referred to as *weight of evidence*. Recent evidence shows that the log-odds is a quantity represented by the nervous system in a situation where a decision has to be made in the face of a probabilistic outcome. For instance, the firing rate of LIP neurons in rhesus monkeys is roughly proportional to experimentally manipulated log-odds in a probabilistic classification task (Yang & Shadlen, 2007). As with humans, the rhesus monkeys did not deterministically choose their behavioral decision to match the most likely alternative, which would have optimized their chance of getting a reward.
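The additivity of evidence in log-odds can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the analysis code of this study: the function names, the priors, and the likelihood values are hypothetical. It only shows that, under conditional independence, adding the log Bayes factor of each measurement to the prior log-odds gives the same posterior as applying Bayes' rule to the factored joint likelihood.

```python
import math

def posterior_log_odds(prior_1, prior_2, likelihood_pairs):
    """Prior log-odds plus one log Bayes factor per conditionally independent cue."""
    log_odds = math.log(prior_1 / prior_2)
    for lik_1, lik_2 in likelihood_pairs:
        # Weight of evidence contributed by this cue: positive favors hypothesis 1.
        log_odds += math.log(lik_1 / lik_2)
    return log_odds

def posterior_by_bayes_rule(prior_1, prior_2, likelihood_pairs):
    """Posterior probability of hypothesis 1 from the factored joint likelihood."""
    joint_1 = prior_1 * math.prod(l1 for l1, _ in likelihood_pairs)
    joint_2 = prior_2 * math.prod(l2 for _, l2 in likelihood_pairs)
    return joint_1 / (joint_1 + joint_2)

def inverse_logit(log_odds):
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical priors and likelihood pairs (hypothesis 1, hypothesis 2) for two cues.
cues = [(0.8, 0.4), (0.3, 0.6)]
log_odds = posterior_log_odds(0.6, 0.4, cues)
probability = inverse_logit(log_odds)
```

In this made-up example the two cues carry equal and opposite weights of evidence, so the posterior odds fall back to the prior odds.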

*a* and *b*. Apart from the inter-vector angle (here constrained to 90°), the most important construction parameter of a lattice is the ratio of inter-element distances along the two lattice vectors: ∣*b*∣/∣*a*∣. Our experimental methodology is largely based on the procedure introduced by Kubovy and Wagemans (1995): subjects reported the perceived global organization upon presentation of a dot lattice.

*a*-orientation, with alternating sign on each dot. We will refer to these as fixed zigzag lattices (FZZ). In a second operationalization of discollinearity, stimuli were constructed by displacing the dots by a random amount along the *a*-vector. The magnitude of the displacement was determined by the absolute value of a normally distributed random deviate. We will refer to these constructions as stochastic zigzag lattices (SZZ). Both perturbation methods result in a structure with zigzag in the *b*-orientation, while dots remain collinear in the *a*-orientation.

*θ*). In the FZZ lattices, where the zigzag is fixed, there is a straightforward and constant relation between displacement ±d and turning angle *θ*. Note that row spacing (*b*′) has to undergo a compression to keep the inter-dot distance along *b* constant (Figure 1). For fixed zigzag lattices, inter-row spacing and displacement were calculated to match previously chosen levels for ∣*b*∣/∣*a*∣ and *θ*.
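The relation between displacement, row spacing, and turning angle in a fixed zigzag can be sketched numerically. The construction below is an assumption inferred from the description above, not taken from the stimulus software: successive dots along *b* are shifted by +d and −d along the *a*-orientation, so consecutive dipoles differ by 2d in the *a*-direction, and the turning angle is the change of direction between consecutive dipoles. Function and variable names are hypothetical.

```python
import math

def fzz_geometry(b_length, theta_deg):
    """Displacement d and compressed row spacing b' for a fixed zigzag.

    Assumes rows alternate between +d and -d along the a-orientation, so each
    dipole along the zigzag spans (2d, b') and deviates theta/2 from the
    global b-orientation.
    """
    half_angle = math.radians(theta_deg) / 2.0
    displacement = b_length * math.sin(half_angle) / 2.0
    row_spacing = b_length * math.cos(half_angle)
    return displacement, row_spacing

def turning_angle_deg(displacement, row_spacing):
    """Angle between two consecutive dipoles, (2d, b') and (-2d, b')."""
    return math.degrees(2.0 * math.atan2(2.0 * displacement, row_spacing))

# Example: |a| = 1, |b|/|a| = 1.25, turning angle 20 degrees.
d, b_prime = fzz_geometry(1.25, 20.0)
inter_dot_distance = math.hypot(2.0 * d, b_prime)
```

Under these assumptions the row spacing shrinks by a factor cos(θ/2), which is the compression needed to keep the inter-dot distance along *b* at its nominal value.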

*assign* a local orientation, namely the tangent to the visual spline connecting the dot to its neighbors. Under any smooth interpolation function, the tangents to the spline at the dot centers will be parallel with the global lattice orientation. The virtual orientations of dot dipoles thus have an angular deviation from co-circularity that exactly equals the turning angle. At least in zigzag patterns, this makes turning angles numerically comparable to discollinearity expressed as deviation from co-circularity.

∣*v*∣, and the turning angle, *θ*. The larger the distance between contour elements, the smaller the likelihood that they belong to the same contour. Similarly, the more *θ* deviates from 0, the smaller the likelihood that the elements belong to the same contour. These principles are not only dictated by common sense, but have also been measured in the study of the statistics of grouping principles in natural scenes (Elder & Goldberg, 2002; Geisler, Perry, Super, & Gallogly, 2001; Sigman, Cecchi, Gilbert, & Magnasco, 2001).

*b* rather than *a* being the “true” contour: the posterior log-odds equal the sum of the evidence that contour elements are to be found at a distance *d*_{v} if a contour *v* does exist, that contour elements are found at a turning angle *θ*_{v}, added to the prior log-odds ratio of *b* rather than *a* constituting a contour. Except for the stimulus manipulations, the only difference between *a* and *b* is their orientation. Thus, a prior for *v* represents the probability that a contour is to be found at the orientation of *v*, *ρ*_{v}.

*P*(*d*_{v}∣*v*) indicates the likelihood that a contour present in the *v*-orientation contains image elements with a distance *d*_{v}. It is more likely that two nearby contour elements are connected by the same contour than two elements located further apart (e.g., Geisler et al., 2001). A monotonically decreasing function of distance, bounded to exclude negative values, is needed to capture this qualitative aspect. The likelihood that a subcontour of a given length *d* exists in a *randomly picked* contour is determined by *P*(*l* > *d*). This probability, the complement of the cumulative distribution function, is called the survival function of *l* evaluated at *d*.

*p*(*l*) = *λe*^{−λl}. The probability that a contour without known length contains a subcontour of length *d*_{v}, as based on the survival function, is

*p*(*l*) = *q* *l*_{0} > 0, below which it is 0. The complement of the cumulative distribution function, assuming that *d*_{v} ≥ *d*_{0}, is given by

^{−q}. The corresponding Bayes factor is

*θ* between sampled contour elements? Feldman and Singh (2005) propose the most generic circular distribution to model the likelihood of a given turning angle, the von Mises density function. The likelihood of a turning angle in a contour is *e*^{β cos θ}(2*πI*_{0}(*β*))^{−1}. Hereby we assume that the distribution is centered on 0°, and not shifted as in Feldman and Singh, who specifically discuss the case of closed contour figures. *I*_{0}(·) is the modified Bessel function of the first kind of order 0, needed to ensure that the distribution integrates to 1. *β* is a concentration parameter: a high *β* corresponds to a distribution with most of its density mass near the mean. A low *β* corresponds to a more spread density.

*a*-orientation. Knowing that *θ*_{a} = 0, the likelihood equation for the *a*-contour candidate simplifies to *e*^{β}(2*πI*_{0}(*β*))^{−1}. After simplification, the log Bayes factor is

*β*′*e*^{−β′∣θ∣}. Taking into account that in this experiment, *θ*_{a} = 0, the log Bayes factor for discollinearity under a Laplace distribution is very simple:

^{1}

∣*θ*_{b}∣ rather than ∣*θ*_{b}∣ itself, then a generalized Laplacian distribution, as used by Elder and Goldberg (2002) in their good continuation metrics, is in order. The probability function of the generalized Laplace is proportional to exp[−(*c*∣*x*∣/*σ*)^{γ}], with *c* = √(Γ(3/*γ*)/Γ(1/*γ*)). The normalizing constant of this function depends on both *γ* and *σ*, but for a chosen set of parameters, it does not matter in the log Bayes factor:
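As a hedged illustration of why the normalizing constants drop out, the sketch below computes log Bayes factors for discollinearity under the three candidate densities discussed in this section. It is a reconstruction from the density shapes named above, not the original equations: the sign convention (*b* relative to *a*, so negative values count against *b*), the function names, and all parameter values are assumptions.

```python
import math

def log_bf_von_mises(theta_b, beta, theta_a=0.0):
    """log[p(theta_b) / p(theta_a)] for a von Mises density centered on 0.

    The constant (2*pi*I0(beta))**-1 appears in both densities and cancels.
    """
    return beta * (math.cos(theta_b) - math.cos(theta_a))

def log_bf_laplace(theta_b, beta_prime, theta_a=0.0):
    """Same ratio for a Laplace density proportional to exp(-beta' * |theta|)."""
    return -beta_prime * (abs(theta_b) - abs(theta_a))

def generalized_laplace_c(gamma):
    """c = sqrt(Gamma(3/gamma) / Gamma(1/gamma)), which makes sigma the standard deviation."""
    return math.sqrt(math.gamma(3.0 / gamma) / math.gamma(1.0 / gamma))

def log_bf_generalized_laplace(theta_b, sigma, gamma, theta_a=0.0):
    """Same ratio for a density proportional to exp(-(c*|x|/sigma)**gamma)."""
    scale = (generalized_laplace_c(gamma) / sigma) ** gamma
    return -scale * (abs(theta_b) ** gamma - abs(theta_a) ** gamma)

# Hypothetical parameter values; angles in radians, with theta_a = 0 as in the experiment.
theta_b = math.radians(20.0)
evidence = {
    "von_mises": log_bf_von_mises(theta_b, beta=10.0),
    "laplace": log_bf_laplace(theta_b, beta_prime=3.0),
    "generalized_laplace": log_bf_generalized_laplace(theta_b, sigma=0.5, gamma=1.5),
}
```

With *γ* = 1 the generalized form reduces to the Laplace case with *β*′ = √2/*σ*, and with *γ* = 2 it reduces to a Gaussian, which closely tracks the von Mises form for small angles when *σ*² = 1/*β*.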

*φ*(*ρ*_{a}) is the prior density of the orientation of the *a*-vector, and *φ*(*ρ*_{b}) of the *b*-vector, we will write the log prior odds as:

*ρ*_{b} always equals *ρ*_{a} + 90°. We will show later that the application of priors is somewhat more idiosyncratic, subject to qualitative variations across participants. Because this is a more empirical matter, we will discuss the appropriateness of alternative models for the log prior odds in the Results section.

∣*b*∣/∣*a*∣ parameter was manipulated from 4/5 (0.8) to 5/4 (1.25) in five steps along a logarithmic scale (0.8, √0.8 ≈ 0.8944, 1, √1.25 ≈ 1.118, 1.25). In a crossed-factorial design, the discollinearity *θ* also took five different values, in a linear range from 0° to 20° (0°, 5°, 10°, 15°, 20°), where 0° corresponds to collinearity. We did not go beyond 20° to stay clear of grouping orientations that are not part of the response alternatives. The discollinearity manipulation always applied to the *b*-vector. Dots along the *a*-vector were always collinear.

*a*-vector was set from 0° to 170° in intervals of 10°.
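The factorial structure of the design can be enumerated in a few lines. This is only a sketch of the stimulus grid described above; the variable names are hypothetical, and the number of repetitions per cell is not represented.

```python
import itertools

# Five distance ratios, equally spaced on a logarithmic scale from 0.8 to 1.25.
distance_ratios = [1.25 ** exponent for exponent in (-1.0, -0.5, 0.0, 0.5, 1.0)]

# Five turning angles along the b-vector, in degrees.
turning_angles = [0, 5, 10, 15, 20]

# Eighteen global orientations of the a-vector, in degrees.
orientations = list(range(0, 180, 10))

# Fully crossed design: one tuple (ratio, theta, rho_a) per stimulus condition.
design = list(itertools.product(distance_ratios, turning_angles, orientations))
```

The list `design` has 5 × 5 × 18 = 450 entries, and the distance ratios are symmetric around 1 on the log scale.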

*a*-) or the zigzag (*b*-) vector based on the resemblance to the icons. During the instructions, the participants were told that the wiggly line meant that the dots did not have to be perfectly collinear to constitute a valid organization. A third disk was empty. Participants could use it to indicate that they did not perceive any global grouping at all, or that their organization did not correspond to the options offered in the response screen. After the participant had clicked on one of the disks with a red dot indicating the position of the mouse pointer, the circular viewing window appeared on the screen, and the trial cycle restarted.

*a* or *b* are less important as potential grouping orientations in rectangular lattice patterns. The blank disk accommodates the few expected “neither-*a*-nor-*b*” groupings. At the same time, the blank option has the function of removing lapse trials, avoiding the estimation of a lapse rate post hoc (as was done by, e.g., Bleumers, De Graef, Verfaillie, & Wagemans, 2008).

^{2}

*a*-, *b*-, and ‘blank’ choices given by each subject for each stimulus. In qualitative terms, the pattern that we expect contains three components:

- For a distance ratio ∣*b*∣/∣*a*∣ < 1 the number of *b*-responses is larger than the number of *a*-responses when no discollinearity is involved. The number of *b*-responses declines as the distance ratio increases.
- Since the discollinearity *θ* is only applied to the *b*-vector, a larger *θ* will weaken grouping along the *b*-orientation, resulting in fewer *b*-responses.
- Response frequencies are modulated by global lattice orientation.

*a*-, *b*-, and blank responses for each combination of distance ratio and discollinearity are shown for two representative subjects in Figure 4. Within one set of axes, the levels of distance ratio increase along the abscissa, while the levels of discollinearity increase along the vertical dimension. The results for FZZ (fixed zigzag) and SZZ (stochastic zigzag) stimuli are presented next to each other.

*θ* = 0°). From left to right, the relative proportion of *b*-responses (in green) decreases in favor of the *a*-responses (in red). This is the pattern of data expected on the basis of grouping by proximity. The particular condition with ∣*a*∣ = ∣*b*∣ is a starting point to explore the main effect of discollinearity (prediction 2). In principle, proximity should not play a role, since dots in the *a*- and *b*-vector orientations are equidistant. Thus the increase in *a*-responses one can observe moving up along the middle column in each set of axes (where ∣*b*∣/∣*a*∣ = 1.0000) is purely the influence of discollinearity. In the margins, below the horizontal and to the left of the vertical axes, one can inspect the evolution of response frequencies as a function of distance and discollinearity respectively, collapsed over the other experimental factors. Comparing the modulation in frequencies, it is obvious that, at least in the range in which the variables were manipulated, distance is much more important for grouping than discollinearity. The global orientation preference (prediction 3) can be read from the radial line graph in the corner of the data layout. Each of the radial lines is drawn in the orientation *ρ* in which the *a*-vector could be oriented (0° to 170°). The length of a line corresponds to the proportion of *a*-responses relative to the sum of *a*- and *b*-responses. The pattern at the right, for subject PF, for instance, is clearly indicative of a large preference for vertical over horizontal grouping.

*i*^{th} distance and *j*^{th} discollinearity along *b* is simply the sum of their log Bayes factors within each level (*k*) of global orientation. The *a*-vector is not indexed, because its inter-dot distances and collinearity are held constant in the experiment. Each level of each stimulus variable is represented by a separate log Bayes factor, except for the logical constraints that the evidence provided by the proximity cue equals 0 if *d*_{a} = *d*_{b}, as for the discollinearity cue if *θ*_{b} = *θ*_{a} = 0. With 4 degrees of freedom for the influence of distance ratio, 4 degrees of freedom for discollinearity, and 18 degrees of freedom for global orientation, there are 26 parameter estimates for 5 × 5 × 18 = 450 combinations of the variables involved.

*d*_{a} = *d*_{b} with *θ*_{a} = *θ*_{b} = 0, because the *a*- and the *b*-vector cannot be distinguished in this condition. Logically, the log Bayes factor for this case is 0, and grouping odds purely reflect the prior odds. This generic model thus has 24 degrees of freedom for the specification of the likelihood function: one for each (*i*, *j*) combination, minus one. Together with the 18 levels of the prior odds, this amounts to 42 parameters to be estimated.
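The parameter counts of the two models follow from simple bookkeeping, sketched below. The function name and dictionary keys are hypothetical; the counts themselves restate the reasoning given above.

```python
def count_parameters(n_distance=5, n_theta=5, n_orientation=18):
    """Free parameters of the independent and the generic (interaction) model."""
    # Independent model: one log Bayes factor per level of each cue, minus the
    # level fixed at 0 (d_a = d_b, theta_b = theta_a = 0), plus one prior
    # log-odds per global orientation.
    independent = (n_distance - 1) + (n_theta - 1) + n_orientation
    # Generic model: one log Bayes factor per (distance, theta) cell, minus the
    # single cell in which a and b cannot be distinguished, plus the priors.
    generic = (n_distance * n_theta - 1) + n_orientation
    return {
        "independent": independent,
        "generic": generic,
        "df_difference": generic - independent,
        "conditions": n_distance * n_theta * n_orientation,
    }

counts = count_parameters()
```

The difference in parameter counts is the number of degrees of freedom available for testing the interaction between the two cues.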

*χ*^{2} distributed with 42 − 26 = 16 degrees of freedom. However, due to the high number of free parameters in this specific case, we opted to rely on bootstrapping rather than asymptotic theory to approximate the distribution of the test statistic and the corresponding *p*-value. In addition, we combined the individual test statistics to obtain a “global” *p*-value under the appropriate distribution. This meta-analytical procedure allows for the quantification of the statistical merit of competing models as evaluated *across* observers (see 1 for details).

*p*-values > .50 for 4 subjects and one *p*-value between .25 and .50. One *p*-value was nearly significant, with *p* ≈ .06; the likelihood ratio statistic reached significance at the .05 level in a single case, with *p* ≈ .02. For the subset of SZZ stimuli, for two observers we found *p* > .50, and for the others .25 < *p* ≤ .50. Summed deviance differences, both across and separated per stimulus type, all produced global *p*-values near .50. Based on this analysis there is no evidence that we should reject the simpler independent model in favor of the generic model with more parameters: including interaction only trivially improves model fit. Therefore, we can conclude that observers combine grouping cues as independent pieces of information. In the parametrically guided analysis (Formalization of cue interactions section), we will return to the possibility that the volunteer with a .02 *p*-value for interaction with the FZZ stimuli (BN) is an exception in the group.

∣*b*∣ − ∣*a*∣. Under the power law, expressed in Equation 6, response log-odds are a linear function of log(∣*b*∣) − log(∣*a*∣). The Pure Distance Law (PDL) states that the log-odds are a linear function of (∣*b*∣ − ∣*a*∣)/min(∣*a*∣, ∣*b*∣). In the range of tested inter-element distances, the difference between these predictors is numerically small. The validity of the parametric models according to the exponential model, the power law, and the PDL can be evaluated statistically by comparison with the generic independence model of Equation 11 in a likelihood ratio test. The models according to the exponential, the power law, and the PDL are not mutually nested. Therefore, comparisons among these have to be based on how well each fits the data empirically, as well as on theoretical considerations.

*p*-values of aggregated deviances indicate significant deviations from all three models. When the most extreme deviance was excluded from the calculation, *p* exceeded .15 for the power law model. This suggests that the tested data set (FZZ for subject CSE), with a deviance much larger than average, is atypical. Selectively excluding data sets (“jackknifing”, see the section on meta-analytical principles in 1) did not increase the *p*-value for the exponential model. Therefore, unlike the power law, it fails globally.

*does* apply, inversion of values of ∣*b*∣/∣*a*∣ (e.g., from 0.8 to 1.25) leads to sign inversion of the response log-odds (e.g., from a logit of 6 to a logit of −6). The deviation of data from scale invariance is then most easily evaluated by how the nonparametric log-odds for the most extreme distance ratios differ in absolute value. Deviations from the power law are unsystematic, generally moderate and rarely significant. Predictions of the exponential model, on the other hand, tend to underestimate the curvature of the data. The CSE/FZZ data set is the only set with curvature still lower than exponential. Subject CSE's pattern of FZZ data points toward a global preference for a convex likelihood function.

*n* = 7 at *α* = .05).

*γ* = 1), as well as a very good approximation to the von Mises distribution (*γ* = 2) as special cases. We will analyze these alternatives in the same way as we did with proximity, by comparing their model fits and predictions with the free-form, nonparametric model of independent combination of grouping cues (Equation 11), as well as among each other.

*p*-values of added deviance differences confirmed this pattern; the “losing” model for each stimulus type deviated from the data to generate *p* < .01, versus *p* ≈ .44 (von Mises, FZZ) and *p* ≈ .14 (Laplace, SZZ). The latter value seems to signal a somewhat poor fit, but 40% of the global deviance was contributed by one participant (JW).

*p* > .5.

_{2}(1) = 0, Laplace) and 2 (log_{2}(2) = 1, von Mises). SZZ results tend to cluster near *γ* = 1, FZZ either near or above *γ* = 2. The right panel gives an idea of how the parameters translate into subjective probability distributions. Apart from the large variability among observers, it is obvious that, in general, the SZZ discollinearity distributions are more kurtotic than the FZZ distributions. The lowest estimates we obtained for *γ* coincide with the corresponding co-circularity estimate as reported in Elder and Goldberg (2002) (log_{2}(*γ*) ≈ −.136). On the other hand, the inferred standard deviations *σ* of the distributions are 2.8 to 6.7 times larger than the standard deviation of Elder and Goldberg's co-circularity distribution, even when orientation priors are excluded from the model for comparability.

*k*^{th} global orientation:

*a*-orientation, the log-odds of response frequencies would be linearly dependent on *θ*_{b}^{γ} − *θ*_{a}^{γ}. Compared to the model with a completely free likelihood function (Equation 11), this model performed very well, with 11 out of 14 fits where the increase in deviance was not statistically significant. In the three other cases, two *p*-values were situated between .01 and .05 (CLS/FZZ and JW/SZZ) and one *p* ≈ .0015 (CSE/FZZ), for the reasons mentioned before.

*β* on turning angle (*β*^{(j)}), or dependence of the value of *σ* or *γ* on the distance ratio (*σ*^{(i)}, *γ*^{(i)}). The only interaction that emerged in a consistent manner occurred in the BN/FZZ data set. As one can verify in Figure 9, the modulation of distance grouping by discollinearity is ordinally coherent: the larger the discollinearity along the *b*-vector, the more strongly response frequencies are determined by the distance ratio (*p* ≈ .002, for the log-likelihood ratio test of the model with *β*^{(j)} versus fixed distance weight *β*). Apparently, when *b* is already at a “disadvantage” because ∣*b*∣ is larger than ∣*a*∣, discollinearity gets “punished” more, as if collinearity and proximity mutually reinforce each other. If, on the other hand, *b* is short compared to *a*, discollinearity does not play any role of importance.

*relative* preference of one versus the other global orientation. From a Bayesian point of view, that is, supposing that one knows the ecological statistics of contour orientations, it is possible to develop normative guidelines about how relative orientation probabilities should be evaluated. According to the data of Coppola et al. (1998), horizontal and vertical contours are more prevalent than other orientations in all types of scenes (indoor, outdoor, and forest scenes). The way orientation frequency drops off with deviation from the cardinal axes and whether vertical or horizontal orientations are more frequent depend on the type of scene. Analysis of the data of Coppola et al., digitized from their histograms, revealed that orientation distributions can be thought of as discrete mixtures of at least three important components: horizontal and near-horizontal orientations, vertical and near-vertical orientations, and a set of orientations uniformly distributed between 0° and 180°. The weights of these three components and the shape of the near-vertical and near-horizontal distributions are scene-specific. Orientation frequencies in urban scenes decrease from cardinal axes in a way that is reminiscent of the Laplace distribution and are dominated by vertical orientations. In natural scenes, the drop-off is more gradual and has the characteristics of a von Mises distribution, and horizontal orientations are slightly more prevalent than vertical ones. Hansen and Essock (2004) also arrive at a conclusion of horizontal preponderance for scenes devoid of manufactured structures.

*a*- or the *b*-vector. In rectangular lattices, the orientations involved in a stimulus and a stimulus rotated 90° are exactly the same (e.g., 10° and 100°). Lattices with *ρ*_{a} = 10° (*ρ*_{b} = 100°) and *ρ*_{a} = 100° (*ρ*_{b} = 10°) involve identical prior probabilities, with differently assigned labels. Therefore, it would make sense to include the following constraint in the model:

*p* < .05) reductions in the goodness-of-fit for more than half of the data sets (4/7 FZZ and 4/7 SZZ sets).

*does* make a difference whether a certain vector is an *a*- or a *b*-vector; in other words, the experimental manipulations of proximity and collinearity “interact” with the global orientations when converging toward a grouping percept. To understand better the origin of this effect, we re-fitted models with orientation-dependent estimates for *β*, *γ* and *σ*, while *including* the inversion constraint, forcing global orientation preferences to behave as reflecting prior probabilities. In order to obtain stable estimates and to allow the model-fitting algorithm to converge, we tested inclusion of orientation-specific *β*, *γ* and *σ* separately. In addition, we partitioned the stimulus presentations into near-horizontal lattices (*ρ*_{a} = 170°, 0°, 10°), near-vertical lattices (*ρ*_{a} = 80°, 90°, 100°), and oblique sets near 30°, near 60°, near 120° and near 150°.

*β* is larger for obliquely oriented lattices than for lattices with main vectors near the cardinal axes. Proximity is treated as if it were less informative at horizontal and vertical orientations. The opposite is the case for *γ* and *σ* (not shown). Discollinearity affects grouping *more* at vertical, and especially at horizontal orientations. With *γ* inversely related to kurtosis, the implied distribution of turning angles is clustered more tightly near 0° for horizontal lines. When *γ* is held constant, *σ* follows a similar pattern: the spread of the implied discollinearity distribution is tighter for horizontal and vertical lines than for diagonal ones. In summary, observers seem to use the assumption that horizontal and vertical contours are longer and straighter than oblique lines.

- In general, proximity and collinearity are treated as independent sources of information for grouping.
- Global orientation of a grouping candidate strongly influences its salience, reflecting prior assumptions about orientations of contours. The observers in this study preferred vertical orientations over horizontal ones, although with large inter-individual variation in the effect size.
- The most successful model for the subjective likelihood function of distance is based on a power law. However, predictions from the Pure Distance Law are practically indistinguishable.
- The most successful model for the subjective likelihood function of discollinearity is based on the generalized Laplace distribution. Parameter estimates point toward higher kurtosis in the subjective distribution of angles along contours with stochastic zigzag than with regular zigzag.
- Observers diverge significantly in the weight of evidence assigned to different grouping cues. Nevertheless, there is a remarkable consensus in the principle of independence of cues in their combination, as well as in the importance of evidence, with a 25% increase in distance being much more detrimental to grouping than a zigzag of 20°.
- Further analysis revealed that grouping by proximity is weakened for cardinal lattice orientations, while discollinearity exerts a stronger influence for these orientations. This is consistent with an internal model in which horizontal and vertical contours tend to be longer and straighter than contours in oblique orientations.

*P*(*x*∣*ξ*), then four independent realizations of the same variable will have a joint likelihood of *P*(*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}∣*ξ*) = *P*(*x*_{1}∣*ξ*)*P*(*x*_{2}∣*ξ*)*P*(*x*_{3}∣*ξ*)*P*(*x*_{4}∣*ξ*). If the four realizations of *X* have the same value, as do four distances along a chain of equidistant dots, *P*(*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}∣*ξ*) = *P*(*x*∣*ξ*)^{4}. Generalizing, for *n* identical measurements, the joint likelihood function gets raised to the *n*^{th} power, or the log-likelihood multiplied by *n*, compared to the case of a single measurement.

*n* dipoles as independent sources of information rather than 1 raises the probability distribution for the distance in each orientation to the *n*^{th} power. This would mean that the estimates obtained for *β* in the Results section are an *n*-fold multiple of Elder and Goldberg's 2.92 estimate (see Figure 6). The estimates suggest that subjects might use 6–7 dipoles per orientation to evaluate grouping odds based on distance. Similarly, while the co-circularity cue in Elder and Goldberg has a standard deviation of approximately 1.34 radians, or about 0.42 in log_{2} units, a joint set of *n* discollinearities will generate a downward shift of the estimated standard deviation. This is somewhat harder to evaluate numerically, but in any case, variations like these might help to explain where inter-subject variability originates.
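The scaling argument can be checked with a few lines of arithmetic. This is a sketch under the assumptions stated in the text: *n* identical, independent measurements, and a power-law exponent of 2.92 taken from Elder and Goldberg (2002) as the single-dipole reference. The function names and the *β* value used in the example are hypothetical, not estimates from this study.

```python
import math

def joint_log_likelihood(single_likelihood, n):
    """Log-likelihood of n independent measurements that share the same value."""
    return sum(math.log(single_likelihood) for _ in range(n))

def implied_dipole_count(beta_estimate, reference_exponent=2.92):
    """Number of dipoles n for which n * reference_exponent equals the fitted weight."""
    return beta_estimate / reference_exponent

# n identical measurements multiply the single-measurement log-likelihood by n.
four_dots = joint_log_likelihood(0.3, 4)

# Hypothetical fitted distance weight, for illustration only.
n_dipoles = implied_dipole_count(19.0)
```

A fitted distance weight of about 19 would thus correspond to roughly 6.5 dipoles under this reading.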

*do* behave normatively: with a single exception, the contributions of proximity and collinearity were additive in log-odds, which points toward Bayesian independent cue combination. This is not the first time that grouping laws have been shown to operate additively in the response logit. In an earlier study, we obtained the same result for proximity and alignment in Gabor lattices (Claessens & Wagemans, 2005). In a recent publication, Kubovy and van den Berg (2008) discuss the independence in the combination of proximity and brightness similarity. The linear increase or decrease in grouping log-odds provoked by manipulations of brightness consistency is largely compatible with the generalized Laplace fitted by Elder and Goldberg (in fact, with *γ* = .89 it is an almost pure Laplace distribution), and with the independence of brightness and proximity in ecological data. Results in various dot lattice experiments thus converge, possibly not in the exact numerical parameter values but at least in the qualitative principles, toward an optimal Bayesian combination of different sources of grouping information.

*b*-response with a chance that can be calculated by the inverse of the logit:

*b*-response, per trial, depends on the stimulus at hand; in terms of lattice parameters, summarized in *d*_{a}, *d*_{b}, *θ*_{a}, *θ*_{b}, *ρ*_{a}, *ρ*_{b}. Let us write that, for the *i*^{th} trial, the log-odds are log(*P*(*b*∣*d*_{bi}, *θ*_{bi}, *ρ*_{bi})/*P*(*a*∣*d*_{ai}, *θ*_{ai}, *ρ*_{ai})); the inverse logit, as given above, transforms this to what we will write as *P*(*a*-response)_{i} and *P*(*b*-response)_{i}. The exact way in which the log-odds are determined by the stimulus parameters depends on the model used and the values of model parameters. The maximum likelihood principle dictates that the “best” estimates for the values of model parameters are those that make the set of observed responses the most probable. Maximizing the likelihood is equivalent to minimizing the deviance (−2 × log-likelihood). That is, parameter estimates are established by minimizing

*N* the total number of non-blank trials, and *I*(*v*)_{i} an indicator function that is 1 if the response in the *i*^{th} trial corresponds to the vector *v*, and 0 otherwise. Deviance was minimized through a numerical algorithm (quasi-Newton), available in the SAS nlmixed procedure (SAS Institute Inc., 2004), with realistic starting values, and boundary conditions where applicable. The inverse of the approximated Hessian of the negative log-likelihood, evaluated at the maximum-likelihood estimates, is a variance–covariance matrix used to calculate standard errors.
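The quantity being minimized can be sketched as follows. This is a minimal illustration of the inverse logit and of the deviance of binary *a*/*b* responses as defined in words above, not the SAS code used for the analyses; the function names and the toy data are hypothetical, and the simple `exp` form is adequate only for log-odds of moderate size.

```python
import math

def inverse_logit(log_odds):
    """Probability of a b-response given the log-odds of b versus a."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def deviance(log_odds_per_trial, responses):
    """-2 * log-likelihood of the observed non-blank responses ('a' or 'b')."""
    log_likelihood = 0.0
    for log_odds, response in zip(log_odds_per_trial, responses):
        p_b = inverse_logit(log_odds)
        # The indicator function selects the probability of the observed response.
        p_observed = p_b if response == "b" else 1.0 - p_b
        log_likelihood += math.log(p_observed)
    return -2.0 * log_likelihood

# Toy data: model log-odds that agree with the responses yield a lower deviance.
responses = ["b", "a", "b", "b"]
agreeing = deviance([1.5, -1.0, 2.0, 0.5], responses)
uninformed = deviance([0.0, 0.0, 0.0, 0.0], responses)
```

A model that predicts log-odds of 0 on every trial has a deviance of 2*N* log 2, which is a convenient baseline against which fitted models can be compared.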

*χ*^{2} distributed, given certain regularity conditions. This is the core of the likelihood ratio test. Imagine that we were to test whether the threshold *μ* of a certain psychometric function *p* = *F*(*x*, *μ*, *σ*) equals 0. We would fit the model M_{1}, without constraints, leading to maximum likelihood l_{1}, with maximum likelihood estimates *μ̂*_{1} and *σ̂*_{1}. Fitting of model M_{0} against the data would yield estimate *σ̂*_{0} and likelihood l_{0}, with *μ*_{0} = 0. The likelihood ratio, l_{0}/l_{1}, summarizes how much support we have for the null hypothesis *μ* = 0 versus the model without constraints. A value near 1 shows that the constraint does not greatly affect model fit, and that *μ* can be discarded as a free parameter without much loss. The actual statistic used to evaluate whether the constraint should be maintained or rejected is minus twice the log likelihood ratio (−2logLR), −2log(l_{0}/l_{1}); note that this equals the absolute difference between deviances. If the sample is sufficiently large, and if the null hypothesis is true, −2logLR is distributed as *χ*^{2}, with as degrees of freedom the number of parameters held fixed by the hypothesized constraint. In our example, to test statistically whether *μ* might be 0, at significance level *α*, we would look up the 1 − *α* quantile in the *χ*^{2} distribution with *df* = 1. Only if −2logLR as obtained for a sample of data exceeds this criterion would we reject *μ* = 0, at least in the classical Neyman-Pearson framework of hypothesis testing. The likelihood ratio test procedure is described in many classical and standard statistical textbooks (e.g., Bain & Engelhardt, 1992), especially in those covering categorical data analysis (e.g., Agresti, 2002).
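For the single-constraint example, the test can be sketched without any statistical library, because the survival function of a *χ*^{2} variable with one degree of freedom has a closed form in terms of the complementary error function. The function names and the log-likelihood values below are hypothetical; tests with more degrees of freedom would need a general *χ*^{2} routine, which this sketch does not provide.

```python
import math

def lr_statistic(loglik_constrained, loglik_free):
    """-2 log(l0 / l1), equal to the deviance of M0 minus the deviance of M1."""
    return -2.0 * (loglik_constrained - loglik_free)

def chi2_sf_df1(x):
    """P(X > x) for a chi-square variable with df = 1, via P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))

# Hypothetical maximized log-likelihoods of the constrained (M0) and free (M1) fits.
statistic = lr_statistic(loglik_constrained=-312.4, loglik_free=-310.1)
p_value = chi2_sf_df1(statistic)
reject_null = p_value < 0.05
```

With these made-up values the statistic is 4.6, which exceeds the .95 quantile of about 3.84, so the constraint would be rejected at *α* = .05.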

*k*, and the number of observations *n*, then AIC is the deviance “corrected” with 2 × *k*, while BIC corrects the deviance with log(*n*) × *k*. Notwithstanding their obvious similarity, AIC and BIC have very different theoretical motivations. Yet both can be seen as a deviance that is penalized for relying on estimation to fit the data. BIC is considerably more severe in this penalty, which explains why model selection based on AIC and BIC can give quite different results.
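The two criteria can be written down in a few lines. The sketch below uses two hypothetical models with made-up deviances and parameter counts, chosen so that the criteria disagree; log denotes the natural logarithm, as is standard for BIC.

```python
import math

def aic(deviance, k):
    """Deviance penalized by 2 per estimated parameter."""
    return deviance + 2 * k

def bic(deviance, k, n):
    """Deviance penalized by log(n) per estimated parameter."""
    return deviance + math.log(n) * k

# Hypothetical (deviance, k) pairs for two models fitted to the same data.
models = {"simple": (120.0, 2), "complex": (112.0, 5)}
n = 500  # assumed number of observations

best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n))
print("AIC prefers:", best_aic, "| BIC prefers:", best_bic)
```

With these numbers the three extra parameters reduce the deviance by 8, which outweighs the AIC penalty of 6 but not the BIC penalty of 3 × log(500), so the two criteria select different models.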

*χ*^{2} distribution of deviance differences. However, due to the high number of estimated parameters involved as well as a considerable number of expected 0-frequency observations in the test of independent combination of grouping cues, we deemed it safer, in this particular case, to rely on bootstrapping techniques to obtain more accurate *p*-values. Specifically, we proceeded along the lines of the following scheme:

- Establish the maximum likelihood estimates (MLE) under each of the competing models. Calculate the deviance for each: *D*_{M0}, *D*_{M1}. The test statistic is the absolute difference between the deviances.
- Use the MLE of the parameters to generate artificial data sets, sampling simulated response frequencies from binomial distributions under the assumption that M_{0}, the constrained model, is true.
- Perform an MLE procedure for both M_{0} and M_{1} on the simulated data sets. Calculate *D*_{M0} − *D*_{M1} for each.
- Fit a gamma distribution to the bootstrapped distribution of deviance differences.

*p*-values with the raw bootstrapped distribution, the gamma distribution provides a smoothing that is theoretically related to the asymptotically expected *χ*^{2} distribution.
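The bootstrap scheme can be illustrated on a deliberately simple pair of nested binomial models, for which the MLEs are available in closed form. Everything in this sketch is an assumption made for illustration: the toy models (M_{0}: one common response probability; M_{1}: one probability per condition), the made-up response frequencies, the helper names, and the method-of-moments gamma fit, since the fitting method for the gamma distribution is not specified in the text. The models actually compared in this paper require numerical optimization instead.

```python
import math
import random

def loglik(successes, trials, p):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

def deviance_difference(successes, trials):
    """Return (D_M0 - D_M1, pooled p) for the two toy models."""
    p0 = sum(successes) / sum(trials)  # MLE under M0
    ll0 = sum(loglik(s, n, p0) for s, n in zip(successes, trials))
    ll1 = sum(loglik(s, n, s / n) for s, n in zip(successes, trials))
    return max(0.0, 2.0 * (ll1 - ll0)), p0

def gamma_sf(x, shape, scale):
    """Upper tail of a gamma distribution via the series expansion of the
    regularized incomplete gamma function (adequate for moderate x)."""
    if x <= 0.0:
        return 1.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    log_cdf = shape * math.log(z) - z - math.lgamma(shape) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_cdf))

def bootstrap_p_value(successes, trials, n_boot=500, seed=1):
    """Parametric bootstrap of the deviance difference under M0."""
    rng = random.Random(seed)
    observed, p0 = deviance_difference(successes, trials)
    boot = []
    for _ in range(n_boot):
        # Simulate binomial response frequencies assuming M0 is true.
        sim = [sum(rng.random() < p0 for _ in range(n)) for n in trials]
        boot.append(deviance_difference(sim, trials)[0])
    # Gamma fit by matching mean and variance (an assumed fitting method).
    mean = sum(boot) / n_boot
    var = sum((d - mean) ** 2 for d in boot) / (n_boot - 1)
    shape, scale = mean * mean / var, var / mean
    return observed, shape, scale, gamma_sf(observed, shape, scale)

# Made-up response frequencies in three conditions of 40 trials each.
observed, shape, scale, p_boot = bootstrap_p_value([18, 22, 31], [40, 40, 40])
print(f"deviance difference = {observed:.2f}, p = {p_boot:.4f}")
```

A *χ*^{2} distribution with *df* degrees of freedom is a gamma distribution with shape *df*/2 and scale 2, so fitted values near these indicate that the asymptotic approximation would have been adequate.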

*p*-value for each observer–stimulus type combination.

*p* will be under a .05 significance level for purely statistical reasons. An isolated case of significance should not lead to automatic rejection of the null hypothesis. The models described in this paper were fitted at subject level, and therefore we have seven *p*-values for each model comparison within each type of discollinearity. Several scenarios are possible. The most convenient situation, at least from a data-analytic point of view, is that either all *p*-values are reasonably near 0.5, without any significance, or all *p*-values are below 0.05. These are strong cases for absence or presence of the tested effect, respectively. But this is the exception rather than the rule. What to do if all *p*-values approach 5% without individually signaling significance? What if one or more *p*-values reach significance, but most do not? How to conclude whether there is an effect, whether we are confronted with an outlier or an entirely different group of subjects, whether the significance is a normal statistical fluctuation, or whether yet another stochastic mechanism is at work?

*μ* in a psychometric function is 0. In the testing procedure, we would fit both the unconstrained and the constrained (*μ* = 0) model, per subject *j*, where *j* = 1 to *J*. The maximum likelihoods for both models would be transformed into *J* × 2 deviances. Under the null hypothesis, the differences between these, Δ*D*_{j} = *D*_{M0,j} − *D*_{M1,j}, approximately follow a *χ*^{2}_{df=1} distribution. Distribution theory states that the sum of independent *χ*^{2}-distributed variables is also *χ*^{2} distributed, with the sum of the degrees of freedom of the constituent distributions as its shape parameter. In short, Δ*D*_{j} ∼ *χ*^{2}_{df=k} ⇒ Σ_{j} Δ*D*_{j} ∼ *χ*^{2}_{df=Jk}. This is a very useful result to establish a “global effect” criterion: if the *aggregated* deviance exceeds the .95 quantile of a *χ*^{2}_{df=Jk} distribution, a “global effect” exists. Besides serving as a very welcome summary of the global pattern of observers' data, the combination of deviance differences provides a large increase in statistical power compared to the individual fits. On the other hand, this does not allow us to discard the individual deviance differences or *p*-values from our discussion. It is very well possible that one of our volunteers is an outlier and is subject to a completely idiosyncratic effect. In this case, the global deviance difference will be largely constituted by the deviance difference of one single individual; in other words, while most of the deviance differences would be in the “body” of the *χ*^{2}_{df=k} distribution, one would find one deviance difference far in the “tail”. A case like this is easily discovered by jackknifing: leaving out each participant in turn, re-calculating the aggregated deviance, and re-evaluating global significance. Another useful tool to diagnose this type of situation is the quantile plot, in which the ordered deviance differences are plotted against their expected values in a sample of size *J* from a *χ*^{2}_{df=k} distribution. One can also plot the *p*-values, which should be approximately uniformly distributed in the interval [0, 1] if the null hypothesis is true. If all deviance-difference points cluster very much toward one side, there is a global effect. If the points are scattered around the middle of the theoretical distribution, there is a global absence of effect. If all points are reasonably close to the middle of the theoretical distribution, but one approaches the boundary or is situated far in the tail, we have an outlier. If one group of points is near the median and another group near the extreme, the participants are subdivided into two groups.
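The global criterion and the jackknife diagnostic can be sketched as follows. The seven deviance differences are made up to mimic the outlier scenario described above (one value far in the tail), and all function names are hypothetical. The *χ*^{2} tail probability is computed with a short series expansion so that the example needs only the Python standard library.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable, via the series
    expansion of the regularized incomplete gamma function."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    log_cdf = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_cdf))

def global_test(delta_d, k=1):
    """p-value of the aggregated deviance difference, with df = J * k."""
    return chi2_sf(sum(delta_d), k * len(delta_d))

def jackknife(delta_d, k=1):
    """Global p-values obtained by leaving out each participant in turn."""
    return [global_test(delta_d[:j] + delta_d[j + 1:], k)
            for j in range(len(delta_d))]

# Made-up deviance differences for J = 7 observers, k = 1 constraint each.
delta_d = [0.4, 1.1, 0.2, 2.3, 0.9, 0.6, 15.0]

individual_p = [chi2_sf(d, 1) for d in delta_d]
p_global = global_test(delta_d)
p_jack = jackknife(delta_d)

print(f"global p = {p_global:.4f}")
for j, (p_ind, p_loo) in enumerate(zip(individual_p, p_jack), start=1):
    print(f"observer {j}: individual p = {p_ind:.3f}, "
          f"global p without this observer = {p_loo:.3f}")
```

In this constructed example the aggregated test is significant, but significance disappears once the seventh observer is left out, which is the signature of a single outlier rather than a global effect.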

*a*- versus *b*-responses, would not depend on the number of response alternatives offered. The independence of the relative preference between two choice alternatives (here, *a* and *b*) from the presence or absence of other choice options (here, the blank) is an axiomatic idea in choice theory known as independence from irrelevant alternatives (Luce, 1959). Although substantial differences between participants in the tendency to use the blank option do exist, the results support our reliance on this assumption: the pattern of *a*- and *b*-response frequencies for the subset of perfectly collinear stimuli (*θ* = 0°) is identical to the pattern obtained in the analyses by Kubovy et al. (1998) for the same stimuli with four response alternatives.