The flash-lag effect is the phenomenon in which a flash presented adjacent to a continuously moving object is perceived to lag behind it. Phenomenally, the flash appears spatially shifted relative to the moving stimulus, and the amount of lag has often been quantified as the flash’s nulling position, that is, the physical spatial offset needed to establish perceptual alignment. The present study offers a better way to summarize flash-lag data. Instead of plotting data in terms of space alone, the psychometric function of the observer’s relative-position judgment is drawn on a spatiotemporal plot. The psychological process underlying the illusory lag is formulated as spatiotemporal bias and uncertainty, which are estimated as the spatiotemporal convolution kernel that best explains the spatiotemporal psychometric function. Two empirical procedures of kernel estimation are described. The first fits the free parameters of the kernel to experimental data for a continuous motion trajectory. The second gives an analytical solution for the kernel using experimental data for a random motion trajectory. The two procedures yield similar kernels, with negligible spatial bias and uncertainty and substantial temporal bias and uncertainty. In addition, it is demonstrated that an experimental manipulation of the temporal predictability of the flash can change the temporal bias in the estimated kernel. The results of this novel analysis indicate that the flash-lag effect can be viewed as a spatiotemporal correlation structure, largely characterized by the tendency to compare the position of the flash in the past with the position of the moving item in the present.

^{2} gray on the 24.4-cd/m^{2} white background) arranged collinearly with a gap of 20 pixels (they always moved synchronously as though they were a single bar; the singular form “bar” will be used to refer to these two rectangles). In each trial, the moving bar made a horizontal translation at a constant speed along a line 2 degrees below the fixation point (leftward or rightward, chosen randomly on each trial). At a random time, another upright rectangle (4 pixels × 16 pixels; its brightness having been subjectively equated with that of the moving bar), the flash, was briefly presented for one frame, with its vertical position between the two rectangles comprising the moving bar. The flash’s horizontal position was chosen randomly from a range of position levels appropriate for constructing a psychometric function. The observer judged whether the flash was seen to the left or right of the moving bar. The best-fit cumulative Gaussian was estimated by the maximum likelihood method.

*Ψ*(*x*) (Figure 3A)? Clearly, it stems from the bias and uncertainty that must occur whenever the visual system encodes signals from the outer world. If the observer is perfectly noise-free such that the flash’s position and the moving bar’s position are represented with perfect accuracy and precision, the psychometric function simply reduces to a step function, *Φ*(*x*) (Figure 3B). In a realistic system, however, some bias and uncertainty are inevitable. Suppose that for some reason a flash spatially shifted in the direction of motion is seen as aligned with the moving bar and that there is a spatial range around this bias within which one cannot be sure whether the flash and the moving bar are perceptually aligned. Assuming that the bias and uncertainty are characterized by the mean (*μ*_{x}) and standard deviation (*σ*_{x}) of a Gaussian, one can plot the probability density function of perceptual alignment, *K*(*x*) (Figure 3C). This is the distribution of the physical position of the flash that is perceptually aligned with the moving bar presented at position zero. When a flash is presented at some physical position along the abscissa, the probability of seeing it to the right of the moving bar is calculated as the integral of *K*(*x*) up to this position. Thus, $\Psi(x) = \int_{-\infty}^{x} K(\xi)\,d\xi$. For the following explanations and fitting procedures, however, let us rewrite this relationship in a mathematically equivalent form, $\Psi(x) = \int_{-\infty}^{\infty} \Phi(x - \xi)\,K(\xi)\,d\xi$, that is, the convolution of *Φ* with kernel *K*.
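The equivalence between convolving a step function with a Gaussian kernel and a cumulative Gaussian can be checked numerically. The following is a minimal sketch, not the author's procedure: the function names, the Riemann-sum integration, and the values `MU_X = 2.0` and `SIGMA_X = 1.5` are all assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Kernel K(x): probability density of perceptual alignment."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def step(x):
    """Perfect psychometric function: 0 left of the bar, 1 right of it."""
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return 0.5

def observed(x, mu, sigma, dx=0.01, half_width=8.0):
    """Riemann-sum approximation of the convolution of the step with the kernel."""
    start = mu - half_width * sigma
    n = int(round(2.0 * half_width * sigma / dx))
    total = 0.0
    for i in range(n + 1):
        xi = start + i * dx
        total += step(x - xi) * gaussian_pdf(xi, mu, sigma) * dx
    return total

def cumulative_gaussian(x, mu, sigma):
    """Closed-form integral of the Gaussian kernel up to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical bias and uncertainty, in pixels (illustrative values only).
MU_X, SIGMA_X = 2.0, 1.5

for x in (-2.0, 0.0, 2.0, 4.0):
    print(x, round(observed(x, MU_X, SIGMA_X), 3),
          round(cumulative_gaussian(x, MU_X, SIGMA_X), 3))
```

The two printed columns should agree to within the discretization error of the sum, which is the sense in which the integral form and the convolution form describe the same psychometric function.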

*K*) incorporates all bias and uncertainty generated in the “black box” process, which is responsible for all differences between the perfect psychometric function (*Φ*) and the one that is observed (*Ψ*). In any event, the key concept of this section is that *Ψ* = *Φ* * *K*.

*Ψ* was observed.

*Φ* is given.

*K* is the solution to the behavior of the “black box” process. Unfortunately, however, the kernel *K* shown in Figure 3C is not the only correct solution, but just one of many. Why? Because bias and uncertainty can also occur along *time*.

*actual* measurement at each spatiotemporal position. By applying the best-fit cumulative Gaussian shown as the green curve in Figure 2E, one gets the smoothly curved surface shown in Figure 5A. Each point on this surface indicates the percentage of “right” responses to the flash presented at each spatiotemporal position. Let us call this surface a spatiotemporal psychometric function.

*p* = *f*(*x*, *t*), where *f* indicates the percentage of “right” responses to the flash at a spatiotemporal point (*x*, *t*). Likewise, the profile in Figure 4B is *p* = *f*(*x* − 5, *t* − 5), plotted relative to the moving bar presented at (−5, −5). Then the superposition, shown in Figure 4C, is to add *f*(*x* + *χ*, *t* + *τ*) if and only if (*χ*, *τ*) is along the motion trajectory (rightward at 1 pixel/frame). As the motion trajectory in space-time can be written as *m*(*x*, *t*) = bool[*x* = *t*] (where bool[*Q*] is 1 if *Q* is true, 0 if false), the profile in Figure 4C is simply written as $p = \sum_{\chi}\sum_{\tau} f(x + \chi,\, t + \tau)\, m(\chi, \tau)$. This equation is the definition of spatiotemporal correlation between *f*(*x*, *t*) and *m*(*x*, *t*). In this context, the trajectory of the moving bar (the white diagonal line) can also be called the autocorrelation of the moving bar’s position. Now the abscissa and ordinate of Figure 4C can be viewed as the relative position and time, respectively, between the flash and moving bar, the latter of which is always located at the origin. Let us call this format a spatiotemporal correlogram.
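The superposition rule can be sketched on a small integer grid. This is only an illustration under stated assumptions: `toy_profile` is a hypothetical response profile with a single nonzero point, `SPAN` is an arbitrary summation range, and the function names are not from the original study.

```python
def trajectory(chi, tau):
    """m(chi, tau) = bool[chi = tau]: rightward motion at 1 pixel/frame."""
    return 1 if chi == tau else 0

def correlogram(f, m, x, t, span):
    """Add f(x + chi, t + tau) for every (chi, tau) weighted by m(chi, tau)."""
    total = 0
    for chi in range(-span, span + 1):
        for tau in range(-span, span + 1):
            total += f(x + chi, t + tau) * m(chi, tau)
    return total

def toy_profile(x, t):
    """Hypothetical profile: one unit response 1 pixel right of the bar, same frame."""
    return 1 if (x, t) == (1, 0) else 0

SPAN = 5  # assumed summation range in pixels and frames
grid = {
    (x, t): correlogram(toy_profile, trajectory, x, t, SPAN)
    for x in range(-3, 4)
    for t in range(-3, 4)
}

# Print with time increasing upward and space increasing rightward.
for t in range(3, -4, -1):
    print("".join("#" if grid[(x, t)] else "." for x in range(-3, 4)))
```

With this toy profile the nonzero cells form a diagonal stripe parallel to the motion trajectory and displaced by one pixel, which shows how a single local profile is smeared along the trajectory by the correlation.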

*Ψ*, the perfect psychometric function *Φ*, and the internal kernel *K* introduced in the previous section are spatiotemporal functions (see Figure 5A–C), *Ψ*(*x*, *t*), *Φ*(*x*, *t*), and *K*(*x*, *t*). The shape of *Φ* is the performance of a hypothetical noise-free system in which everything is always registered with perfect accuracy and precision: the percentage of “right” responses is always 100% for every flash on the right of the motion trajectory, whereas it is always 0% on the left. The goal of the analysis is to discover the internal kernel *K* that satisfies the relationship *Ψ* = *Φ* * *K*. There is a problem, however: *K* is not determined uniquely.

*K*(*x*, *t*). In the previous section, *K*(*x*) was assumed to be a Gaussian function of space, with its *μ*_{x} and *σ*_{x} characterizing spatial bias and uncertainty, respectively. However, if the visual system somehow produces spatial bias and uncertainty, for the same reason it should produce temporal bias and uncertainty as well. When the observer is supposed to judge the relative position between a simultaneously seen pair of motion and flash, perceptual simultaneity might be biased so that it tends to be between a moving bar at the present and a flash in the past, and moreover to an uncertain extent in the past. As in the case of spatial bias and uncertainty, let us assume that temporal bias and uncertainty are characterized by a Gaussian function (with *μ*_{t} and *σ*_{t}). Putting them together, the kernel *K*(*x*, *t*), which is the probability density function of perceptual spatiotemporal alignment, forms a 2D Gaussian in space-time. To avoid further complication, its covariance is hereafter assumed to be zero, i.e., its spatial and temporal components are independent of each other. (In fact, in a preliminary version of the fitting analysis described below, the space-time correlation coefficient *ρ* was also included as one of the free parameters, but it yielded a best-fit *ρ* of only 0.079.)

*K*(*x*) illustrated in the previous section is now described as a special case of the 2D Gaussian, when *μ*_{t} = 0 and *σ*_{t} → 0. Indeed, convolution of *Φ* with this particular *K* equals *Ψ*. Another extreme example of *K* is a pure temporal function. *K* could also take other shapes in between. Note that all these candidates equally satisfy the relationship *Ψ* = *Φ* * *K*. Therefore, although the above analysis clearly proposes that the flash-lag effect be viewed as a spatiotemporal correlation structure, the particular data set used in Figure 5 is not informative enough to determine the shape of *K*.
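The ambiguity can be made concrete with a small sketch. It assumes a convention that is not spelled out in the text: *Φ* equals 1 to the right of a trajectory *x* = *vt*, the convolution is taken over both space and time, and negative *μ*_{t} means "flash in the past". Under those assumptions a zero-covariance Gaussian kernel yields, at the moment of the flash, a cumulative Gaussian centered at *μ*_{x} − *v*·*μ*_{t} with spread √(*σ*_{x}² + *v*²*σ*_{t}²). The two kernels and the speeds below are hypothetical.

```python
import math

def psi_at_flash(x, v, mu_x, sigma_x, mu_t, sigma_t):
    """Predicted proportion of "right" responses for a flash at offset x, bar speed v.

    Closed form derived under the assumptions stated in the lead-in;
    it is an illustration, not the fitting code of the original study.
    """
    pse = mu_x - v * mu_t
    spread = math.sqrt(sigma_x ** 2 + (v * sigma_t) ** 2)
    return 0.5 * (1.0 + math.erf((x - pse) / (spread * math.sqrt(2.0))))

# Two hypothetical kernels: (mu_x, sigma_x, mu_t, sigma_t).
PURE_SPATIAL = (5.0, 3.0, 0.0, 0.0)    # pixels only
PURE_TEMPORAL = (0.0, 0.0, -5.0, 3.0)  # frames only, biased toward the past

for v in (1.0, 2.0):  # pixels/frame, illustrative speeds
    for x in (0.0, 5.0, 10.0):
        a = psi_at_flash(x, v, *PURE_SPATIAL)
        b = psi_at_flash(x, v, *PURE_TEMPORAL)
        print(v, x, round(a, 3), round(b, 3))
```

At a single speed the two kernels predict the same curve, whereas at a second speed they diverge, which is why data from one speed condition alone cannot decide between them.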

*K* maintains a common shape irrespective of the moving bar’s speed, one can find the best-fit parameters of *μ*_{x}, *σ*_{x}, *μ*_{t}, and *σ*_{t} that minimize the residual between the model (*Φ* * *K*) and the observed data (*Ψ*) for all speed conditions. Using the 66 (11 position levels × 6 speeds) points as the data set, Levenberg-Marquardt nonlinear optimization yielded (*μ*_{x}, *σ*_{x}, *μ*_{t}, *σ*_{t}) = (2.10 pixels, 1.75 pixels, −4.95 frames, 3.38 frames). The spatiotemporal plot of the kernel is shown in Figure 5C. Convolution of the perfect psychometric function with this best-fit spatiotemporal kernel resulted in a fairly good approximation to the actual data; the resulting theoretical profiles are drawn as the red curves in Figure 2. Importantly, exactly the same kernel is used throughout the six different speed conditions. These sigmoidal shapes are virtually indistinguishable from the one-dimensional cumulative-Gaussian fit applied separately for each condition (green curves in Figure 2).
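As a rough consistency check on these fitted values, one can work out what they imply for individual speed conditions. The derivation below is a sketch under assumptions not stated in the text: *Φ* equals 1 to the right of a trajectory *x* = *vt*, where *v* is a symbol introduced here for the bar's speed, and the kernel has zero covariance. The speeds of 1 and 2 pixels/frame are illustrative and are not necessarily among the six tested speeds.

```latex
\begin{align*}
\mathrm{PSE}(v) &= \mu_x - v\,\mu_t = 2.10 + 4.95\,v \ \text{pixels},\\
\sigma(v) &= \sqrt{\sigma_x^{2} + v^{2}\sigma_t^{2}}
           = \sqrt{1.75^{2} + (3.38\,v)^{2}} \ \text{pixels},\\
v = 1:&\quad \mathrm{PSE} \approx 7.05,\qquad \sigma \approx 3.81,\\
v = 2:&\quad \mathrm{PSE} \approx 12.00,\qquad \sigma \approx 6.98.
\end{align*}
```

Under this reading, *μ*_{x} acts as the intercept and −*μ*_{t} as the slope of the point of subjective equality plotted against speed, which is one way to see why several speeds are needed to separate the spatial and temporal components of the kernel.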

*μ*_{t}), indicating that the observer somehow compared the relative position between a moving bar at the present and a flash in the past, as though they were stimuli seen simultaneously. As the kernel is elongated temporally (*σ*_{t}), the perceptual simultaneity between the moving bar and flash has a large temporal uncertainty. The kernel also shows a small amount of spatial uncertainty (*σ*_{x}), but it is in fact in excellent agreement with the sensitivity of spatial vernier acuity between a flash and a stationary bar at the tested eccentricity range. (A control experiment measured this acuity by performing the same test with the speed of the bar at 0 pixels/frame. The *σ*_{x} in this condition was found to be 1.75 pixels.) Finally, the kernel has a slight spatial offset in the direction of motion (*μ*_{x}). Thus, the observer somehow judged the moving bar and the flash with a 2-pixel offset as perceptually aligned.

*Ψ* to a combination of *Φ* and *K* was made possible only by preparing multiple samples from different speed conditions and by finding best-fit parameters.

*Φ* is quite rectangular: the percentage of “right” responses should be 0% on the left of the current jumping bar, 100% on the right of it, and should remain at chance otherwise (Figure 7B). The observed percentage of “right” responses for an actual observer is also plotted in the form of a spatiotemporal correlogram. Each response is plotted at the spatiotemporal position of the flash relative to that of the current jumping bar, which is always located at the center of the correlogram (Figure 7A).

*K* that satisfies the relationship *Ψ* = *Φ* * *K*. One could solve this by maximum likelihood estimation of the four free parameters of a 2D Gaussian, as was done in the previous section. However, the randomness of the jumping bar greatly helps break the question down into a few easier ones. Specifically, the randomness makes spatial and temporal correlation structures orthogonal to each other, which means that the spatial and temporal components of *K* can be estimated separately.

*K*. This is the only source that brings about the horizontal sigmoid near the center of *Ψ*, because however the temporal component of *K* may change, it would only distort *Ψ* vertically. Therefore, the spatial component of *K* can be estimated as the one that satisfies the relationship $\sum_{t} \Psi(x, t) = \left[\sum_{t} \Phi(x, t)\right] * K(x)$. The temporal summation of *Ψ* is plotted in Figure 8, as *Ψ*(*x*), and the best-fit cumulative Gaussian is superimposed; the perfect psychometric function for it is shown as *Φ*(*x*).

*K*(*x*) is the deconvolution of *Ψ*(*x*) and *Φ*(*x*), but in this case it can be calculated simply as the first-order derivative of *Ψ*(*x*). The result was a Gaussian with parameters (*μ*_{x}, *σ*_{x}) = (0.042 pixels, 1.08 pixels). *μ*_{x} was extremely close to zero, which means that there was no response bias in space. *σ*_{x} was as small as 1 pixel, indicating this observer’s good vernier-acuity performance around perceptual alignment.
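Because *Φ*(*x*) is a step, deconvolution reduces to differentiation, and the kernel parameters can be read off as the mean and standard deviation of the derivative. The sketch below illustrates this on a synthetic cumulative Gaussian; the grid spacing `DX` and the values `TRUE_MU` and `TRUE_SIGMA` are assumptions for illustration, not the observer's data.

```python
import math

def cumulative_gaussian(x, mu, sigma):
    """Synthetic stand-in for the fitted spatial psychometric function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical parameters in pixels; the real values come from fitted data.
TRUE_MU, TRUE_SIGMA = 0.5, 1.2
DX = 0.05

xs = [i * DX for i in range(-200, 201)]            # positions from -10 to 10
psi = [cumulative_gaussian(x, TRUE_MU, TRUE_SIGMA) for x in xs]

# Central difference approximates the first-order derivative of psi.
inner = xs[1:-1]
kernel = [(psi[i + 1] - psi[i - 1]) / (2.0 * DX) for i in range(1, len(xs) - 1)]

# Treat the derivative as a density and take its first two moments.
mass = sum(k * DX for k in kernel)
mean = sum(x * k * DX for x, k in zip(inner, kernel)) / mass
variance = sum((x - mean) ** 2 * k * DX for x, k in zip(inner, kernel)) / mass
sd = math.sqrt(variance)

print(round(mean, 3), round(sd, 3))
```

With noisy empirical data the derivative would amplify noise, so differentiating a smooth fitted curve, as in this sketch, is usually the safer choice.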

*K*. Convolution of *Φ*(*x*, *t*) with *K*(*x*) estimated above would only produce a little spatial blur at the transition between 0% and 100%, leaving the temporal structure unchanged. Thus, following the same logic as above, all temporal shift and blur observed in *Ψ*(*x*, *t*) remain to be explained by the temporal component of *K*. The spatial summation of *Ψ* is plotted as *Ψ*(*t*) (with the responses in the negative portion of space flipped and merged to the positive portion, and with data close to the ordinate excluded), and its low-pass-filtered curve is superimposed; the perfect psychometric function for it is shown as *Φ*(*t*). As *K*(*t*) is the deconvolution of *Ψ*(*t*) and *Φ*(*t*), it was calculated as the division of *Ψ*(*t*) by *Φ*(*t*) in the Fourier domain. The result of deconvolution is shown by the green curve. Another way to find *K* is to estimate the best-fit pair of *μ*_{t} and *σ*_{t} by the maximum likelihood method, which was also tested. The result of the fit, (*μ*_{t}, *σ*_{t}) = (−7.97 frames, 6.29 frames), is overlaid (blue). The two estimated profiles of *K*(*t*) were in good agreement with each other.
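Division in the Fourier domain can be sketched with a plain discrete Fourier transform. Everything specific here is an assumption for illustration: the 64-frame circular time axis, the 3-frame rectangular `ideal` profile standing in for *Φ*(*t*), the synthetic Gaussian `true_kernel`, and the `floor` guard against near-zero spectral components. It is not the author's implementation.

```python
import cmath
import math

def dft(signal, inverse=False):
    """Plain O(N^2) discrete Fourier transform, adequate for short profiles."""
    n = len(signal)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(signal[j] * cmath.exp(sign * 2j * math.pi * k * j / n)
                  for j in range(n))
        out.append(acc / n if inverse else acc)
    return out

def circular_convolve(a, b):
    """Circular convolution, matching the periodicity implied by the DFT."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def deconvolve(observed, ideal, floor=1e-6):
    """Divide spectra term by term, zeroing components where the ideal is tiny."""
    spec_obs, spec_ideal = dft(observed), dft(ideal)
    ratio = [o / i if abs(i) > floor else 0j for o, i in zip(spec_obs, spec_ideal)]
    return [value.real for value in dft(ratio, inverse=True)]

N = 64  # assumed number of frames on a circular time axis

def lag(index):
    """Map an array index to a signed lag in frames."""
    return index if index < N // 2 else index - N

# Hypothetical ideal temporal profile: a 3-frame rectangular pulse.
ideal = [1.0 / 3.0 if j < 3 else 0.0 for j in range(N)]
# Hypothetical kernel: Gaussian centered 8 frames in the past, SD 3 frames.
raw = [math.exp(-0.5 * ((lag(j) + 8.0) / 3.0) ** 2) for j in range(N)]
true_kernel = [r / sum(raw) for r in raw]

observed = circular_convolve(ideal, true_kernel)
recovered = deconvolve(observed, ideal)
peak_lag = lag(max(range(N), key=lambda j: recovered[j]))
print(peak_lag)
```

In this noise-free sketch the division recovers the kernel almost exactly. With real response data the division amplifies noise at frequencies where the ideal spectrum is small, which is presumably why a low-pass-filtered curve is used in the analysis above.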

*K* was found to be biased toward the past, with considerable side lobes in time (Murakami, 2001).

*K* can be reconstructed as the product of the two orthogonal components (Figure 7C). This shape seems to share several aspects with the estimate in the previous section. First, its peak is located 5–8 frames in the past. Second, temporal uncertainty is so large as to span more than 10–20 frames. Third, spatial uncertainty is within the range of vernier-acuity sensitivity measured with stationary stimuli. However, the spatial offset found in the previous continuous-motion condition is absent (it should be so; see “Discussion”).

*K*. The stimulus and procedure were otherwise identical to the original experiment described in the previous section.

*K*(*x*, *t*) was estimated independently at each phase along the time series around the schedule change. The results exhibited no systematic change in the estimated spatial component of *K*, which was still comparable to the range of vernier-acuity sensitivity, indicating that the overall task difficulty was effectively kept constant across schedules. However, a small but significant (*t* test, *p* < 0.05) change in the estimated temporal bias *μ*_{t} was observed, as clearly shown in Figure 9. Specifically, the estimated values of *μ*_{t} for the fixed-interval (predictable) phases were smaller than those for the variable-interval (unpredictable) phases by roughly one frame, although the absolute baseline of *μ*_{t} varies across observers. (The decrease in the temporal uncertainty *σ*_{t} was also significant for the author’s data but not for other observers.)

*y*-intercept of Figure 2H, where the spatial PSE is plotted as a function of the moving bar’s speed. Whereas a purely temporal account of the flash-lag effect predicts a linear regression that passes exactly through the origin, the actual data lie slightly above this prediction. Thus, the result suggests that the observer somehow tended to judge a slightly overreached flash (i.e., by about half the width of each rectangular stimulus; see Figure 1) as perceptually aligned with the moving bar, irrespective of speed. It is questionable, however, whether this is really one of the general characteristics of the flash-lag effect. Previous studies have sometimes shown similar positive *y*-intercepts (Krekelberg & Lappe, 1999), but not always (Nijhawan, 1994; Kirschfeld & Kammer, 1999). Repeatability across observers and across experimental situations is, therefore, open to future investigation.

*Ψ*(*x*, *y*, *t*).