**Abstract**:

**It is conventionally assumed that the goal of the visual system is to derive a perceptual representation that is a veridical reconstruction of the external world: a reconstruction that leads to optimal accuracy and precision of metric estimates, given sensory information. For example, 3-D structure is thought to be veridically recovered from optic flow signals in combination with egocentric motion information and assumptions of the stationarity and rigidity of the external world. This theory predicts veridical perceptual judgments under conditions that mimic natural viewing, while ascribing nonoptimality under laboratory conditions to unreliable or insufficient sensory information—for example, the lack of natural and measurable observer motion. In two experiments, we contrasted this optimal theory with a heuristic theory that predicts the derivation of perceived 3-D structure based on the velocity gradients of the retinal flow field without the use of egomotion signals or a rigidity prior. Observers viewed optic flow patterns generated by their own motions relative to two surfaces and later viewed the same patterns while stationary. When the surfaces were part of a rigid structure, static observers systematically perceived a nonrigid structure, consistent with the predictions of both an optimal and a heuristic model. Contrary to the optimal model, moving observers also perceived nonrigid structures in situations where retinal and extraretinal signals, combined with a rigidity assumption, should have yielded a veridical rigid estimate. The perceptual biases were, however, consistent with a heuristic model which is only based on an analysis of the optic flow.**

*optimal-observer model*) makes two major assumptions in order to make this problem computationally tractable; specifically, the ambiguity in sensory signals is resolved through the use of (a) extraretinal information regarding egomotion and (b) an assumption of the rigidity of the external world. We show here that perceived 3-D structure is inconsistent with this model and that neither of the two assumptions appears to be used in deriving 3-D structure from optic flow. Instead, we show that perceptual judgments are consistent with a model in which 3-D structure is directly derived from optic flow following certain simple heuristics, which correctly predict large biases away from veridicality, even under stimulus conditions that mimic natural full-cue conditions.

$\omega_r$ of the two-plane configuration in a viewer-centered coordinate frame, which generates an optic flow pattern carrying information about the 3-D structure (Koenderink & van Doorn, 1975, 1978; Koenderink, 1986; Braunstein & Tittle, 1988). The optic flow information, however, is not sufficient for an accurate estimate of the metric structure of the two-plane configuration. Each surface projects a velocity gradient, termed deformation ($\mathit{def}$): in this particular case, the rate of change of retinal velocity magnitudes along the horizontal dimension (Liter, Braunstein, & Hoffman, 1993; Domini, Caudek, & Proffitt, 1997; Caudek & Domini, 1998; Domini, Caudek, Turner, & Favretto, 1998; Liter & Braunstein, 1998; Domini & Caudek, 1999). The velocity gradient is ambiguous, since $\mathit{def}$ specifies infinite combinations of surface slant $\sigma$ and relative rotation $\omega_r$, being $\mathit{def} = \omega_r \tan\sigma$. In Figure 1b, it can be seen how surfaces with different slants viewed by observers moving by different amounts generate the same $\mathit{def}$. In theory, an accurate measurement of the pattern of retinal accelerations would eliminate this ambiguity, but it is known that this measurement is highly unreliable and therefore useless (Norman & Todd, 1993; Domini & Braunstein, 1998; Eagle & Hogervorst, 1999; Hogervorst & Eagle, 2000).
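The ambiguity of the velocity gradient can be illustrated numerically. The following is a minimal sketch, assuming only the relation that deformation equals relative rotation times the tangent of slant; the function names and the example value of 0.2 rad/s are hypothetical choices made for illustration.

```python
import math

def deformation(omega_r, slant_deg):
    """Velocity gradient projected by a surface: def = omega_r * tan(slant)."""
    return omega_r * math.tan(math.radians(slant_deg))

def rotation_for(def_value, slant_deg):
    """Relative rotation that produces def_value at the given slant."""
    return def_value / math.tan(math.radians(slant_deg))

# Hypothetical observed velocity gradient (rad/s), chosen only for illustration.
target_def = 0.2

# Several slant/rotation pairs that all project the same velocity gradient.
pairs = [(slant, rotation_for(target_def, slant)) for slant in (15.0, 30.0, 45.0, 60.0)]

for slant, omega_r in pairs:
    print(f"slant = {slant:4.1f} deg, omega_r = {omega_r:.3f} rad/s, "
          f"def = {deformation(omega_r, slant):.3f} rad/s")
```

Every pair printed by the loop yields the same deformation, so the deformation alone cannot identify which slant and rotation produced it.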

$\omega_r$ and thus of $\sigma$ from $\mathit{def}$, since $\sigma = \tan^{-1}(\mathit{def}/\omega_r)$. Therefore, an optimal-observer model combining sensory information (retinal and extraretinal) with a *prior for stationarity* is able to accurately derive the metric structure of the two-plane configuration. Note that a prior for stationarity also implies *a prior for rigidity* (Ullman, 1979; Grzywacz & Hildreth, 1987). Since static surfaces do not change their structure during the observer's motion, most 3-D transformations generated by egomotion are necessarily rigid. For example, in the case of the two-plane configuration, if the surfaces are static in the world, then the dihedral angle between the two surfaces does not change either. In the remaining part of this article, we will therefore refer to this prior as the *stationarity/rigidity prior*.

*heuristic-observer model*, which ignores both egocentric information and a prior for stationarity/rigidity, perfectly predicts the empirical results.

$\omega_{r1} \neq \omega_{r2}$. The sensory information available for estimating $\omega_{r1}$ and $\omega_{r2}$ is the velocity gradients ($\mathit{def}_1$ and $\mathit{def}_2$) and an estimate of the observer's egomotion. The latter can also be described in terms of angular velocity, since when the viewpoint shifts along the horizontal axis with linear speed $T_x$, as in our experiments, it produces a rotation of the surface relative to the observer, in a head-centric coordinate frame, of angular velocity $\omega_e = T_x / z_f$, where $z_f$ is the fixation distance (see Figure 1b). An estimate of the angular velocity $\hat{\omega}_e$ can be provided by extraretinal and proprioceptive signals (Buizza, Leger, Droulez, Bertoz, & Schmid, 1980; Ferman, Collewijn, Jansen, & van den Berg, 1987; Nawrot, 2003; Nawrot & Joyce, 2006; Gu, DeAngelis, & Angelaki, 2007; Bennur & Gold, 2008; Gu, Angelaki, & DeAngelis, 2008; Liu & Angelaki, 2009; Nawrot & Stroyan, 2012).

$\omega_{r1}$ and $\omega_{r2}$ from $\mathit{def}_1$, $\mathit{def}_2$, and $\hat{\omega}_e$ is that a surface rotation independent of the observer can also contribute to $\omega_{r1}$ and $\omega_{r2}$ (Figure 1b). If, during a head movement, the surface also rotates (with angular velocity $\omega_s$), then the relative angular velocity between surface and observer is $\omega_r = \omega_s - \omega_e$ (Figure 1b, right panel). The problem of estimating the relative rotation of the two surfaces is therefore ill posed, since the velocity gradients corresponding to each surface ($\mathit{def}_1$ and $\mathit{def}_2$) define a set of two equations and four unknowns ($\omega_{r1}$, $\omega_{r2}$, $\sigma_1$, $\sigma_2$):
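Written out with the relation between deformation, relative rotation, and slant given earlier, the two constraints would read as follows. This is a reconstruction from the surrounding definitions and may differ in layout from the equation typeset in the original article.

```latex
\begin{aligned}
\mathit{def}_1 &= \omega_{r1}\,\tan\sigma_1,\\
\mathit{def}_2 &= \omega_{r2}\,\tan\sigma_2 .
\end{aligned}
```

Each measured velocity gradient fixes only the product of a rotation and the tangent of a slant, which is why two measurements leave four unknowns underdetermined.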

$P(\mathit{def}_1, \mathit{def}_2 \mid \omega_{r1}, \omega_{r2})$ is the likelihood term, $P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ the prior, and $P(\mathit{def}_1, \mathit{def}_2)$ a normalizing constant. Since the likelihood term is the product of two likelihoods, then

$\omega_{r1}$ and $\omega_{r2}$) depends on both the gradient of the optic flow produced by each surface (through the likelihood terms) and an additional term that, as we will see shortly, restricts the possible solutions on the basis of egomotion information ($\hat{\omega}_e$) and a priori assumptions about the rotation of the surfaces in the world.

$\mathit{def}$ is ambiguous, this solution is never veridical (see Appendix A for the mathematical proof). No matter what the physical 3-D rotation of a surface is, the most likely rotation is a monotonically increasing function of $\mathit{def}$. Thus, if the two-plane configuration undergoes a 3-D rigid rotation but each surface projects a different value of $\mathit{def}$ ($\mathit{def}_1 \neq \mathit{def}_2$), the predicted perceptual interpretation is that of a nonrigid structure. Conversely, if the two surfaces rotate by different amounts but produce the same value of $\mathit{def}$ ($\mathit{def}_1 = \mathit{def}_2$), the nonrigid rotation is predicted to be perceived as rigid. Indeed, these predictions are compatible with the findings of Domini et al. (1997) showing that the two-plane configuration is perceived as rigid only if the two surfaces generate the same value of $\mathit{def}$.

$P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ of Equation 3 can change the maximum likelihood estimate (MLE), since it incorporates both information about the observer's egomotion and the stationarity/rigidity prior (see Appendix B for the mathematical proof). On the basis of how precisely the motion of the observer is measured and how strong the stationarity/rigidity prior is, we can discriminate between two possible models:

- The *optimal-observer* (OO) model includes a strong stationarity/rigidity prior and a precise measurement of the observer's egomotion. For an active observer, this model correctly derives a rigid interpretation whenever the optic flow is compatible with such an interpretation. Most importantly, in the unnatural experimental condition in which the observer is static, the model predicts suboptimal performance, incorrectly assigning nonrigid interpretations to rigid stimuli. For a passive observer, this model substantially mimics the MLE interpretation, thus predicting our previous findings [see “Optimal observer (OO)” in Appendix C].
- The *heuristic-observer* (HO) model predicts the behavior of an observer who has access to noisy egomotion information, and embeds a weak stationarity/rigidity prior. It can be shown that this model makes predictions for both active and passive observers that are qualitatively identical to those of the MLE model [see “Heuristic observer (HO)” in Appendix C].

*Are biases in a passive observer's judgments of rigidity due to a visual system that ignores both the stationarity/rigidity prior and the observer's egomotion (HO model), or do they take place because passive viewing of the optic flow forces the visual system to operate in nonideal conditions (OO model)?*

*reference surface*, was always stationary in an allocentric reference frame ($\omega_{sr} = 0$). The other surface, referred to as the *target surface*, rotated in depth by an amount proportional to the observer's own motion $\omega_e$. Thus, the amount of rotation of the target surface was defined by a gain factor $g$ so that $\omega_{st} = g\omega_e$. The factor $g$ was made to vary within the range [−1, +1]. At the extreme values, $g = +1$ caused an allocentric rotation $\omega_{st}$ of the target surface of equal magnitude but opposite direction to the relative rotation $\omega_e$ induced by the observer's translation; whereas $g = -1$ caused an allocentric rotation $\omega_{st}$ of the target surface equal to the relative rotation $\omega_e$. Only $g = 0$ specified a static target surface and, therefore, a rigid structure.

$g$ giving rise to a chance-level performance, and (b) the just-noticeable difference (JND), that is, the smallest $g$ variation from the PSR which gave rise to a nonrigid percept. The deformations projected by the reference and target surfaces were $\mathit{def}_r = -\omega_e \tan\sigma_r$ and $\mathit{def}_t = (g\omega_e - \omega_e)\tan\sigma_t$, respectively.

$g$ values tested in our experiments. To this purpose, we considered the viewing parameters used in Experiment 1 (lateral head translation of 160 mm performed at 125 mm/s at a viewing distance of 668 mm), but the results of the simulation generalize to Experiment 2 as well. The inputs to the HO and OO models were the average values of $\mathit{def}$ and the average observer's motion $\omega_e$ measured during the experimental sessions. For the passive viewing condition it was assumed that the egomotion signal specifies an immobile observer ($\omega_e = 0$).

$g$ (bottom x-axis) and the corresponding difference between the deformations of the two surfaces, $\mathit{def}_r - \mathit{def}_t$ (top x-axis). The bottom row shows the difference between the maximum a posteriori (MAP) estimates ($\hat{\omega}_{rr} - \hat{\omega}_{rt}$) of the relative angular velocities of the reference ($\hat{\omega}_{rr}$) and target ($\hat{\omega}_{rt}$) surfaces. In the top row, the same data are replotted in terms of the probability of perceiving the target surface as rotating slower than the reference surface, by assuming Gaussian noise on the measurement of $\hat{\omega}_{rr} - \hat{\omega}_{rt}$.

$\hat{\omega}_{rr} - \hat{\omega}_{rt} > 0$, when the deformation of the target surface is smaller than the deformation of the reference surface ($\mathit{def}_r - \mathit{def}_t > 0$), and vice versa when the deformation of the target surface is larger than the deformation of the reference surface ($\mathit{def}_r - \mathit{def}_t < 0$). This prediction is in agreement with previous results (Domini et al., 1997). However, for the active observer the predictions of the two models differ substantially (red). For the active observer, the OO model predicts a bias towards a rigid interpretation, whereas the HO model predicts a similar performance for both the active and the passive observer.

$\omega$ of 60°/s. A white virtual marker was rendered, ideally superimposed on the physical marker, and 40 measures of instantaneous angular displacement $\theta$ between the physical marker and the virtual marker were collected by taking photographs with a Nikon D90 (resolution 3216 × 2136, ISO 6400, exposure time = 1/100, focal length = 38 mm) centered and aligned with the turntable (Movie 1). Two levels of complexity of the graphic scene were also tested: a low level, in which just the virtual marker was displayed, and a high level, in which the marker was displayed together with a complex mesh with more than $10^6$ vertices. The estimated system lag ($\theta/\omega$) was about 27.9 ± 1.26 ms and was independent of graphical complexity (26.8 ± 1.91 ms vs. 28.9 ± 1.66 ms in the low vs. high complexity condition, *t* = 0.82, *df* = 38, *p* = 0.4). According to the results of Allison, Harris, Jenkin, Jasiobedzka, and Zacher (2001), such a low system lag combined with the low head-translation velocity used in our experiments (about 22°/s) is not likely to produce artificial distortions of the perception of stability.

^{2}). The quasirectangular projections were separated by a vertical gap (15 mm) at the center of the screen. In the active condition, the motion of the dots on the screen was calculated in real time by tracking the observers' vantage point. The dots on each simulated planar surface were projected onto the image plane by using a generalized perspective pinhole model, with the observer's right eye position as the center of projection. In the passive condition the same optic flow was replayed to the passive observer.

*def* on perceived surface orientation and motion is exactly the same for the two possible directions of the translatory movement (Fantoni, Caudek, & Domini, 2012). Given the repetitive nature of the observer movement, this also facilitated the execution of the task.

$\omega_e$ about the vertical axis of the simulated planar surfaces relative to the observer of about ±6.75° and ±7.85°, respectively.

$\sigma_r$ of the reference surface was 30° in Experiment 1 and 25° in Experiment 2, and remained constant throughout a trial. The simulated slant $\sigma_t$ of the target surface was, instead, coupled in real time with the observer's motion through the following equation: where $g$ is the rotation gain and $\alpha$ the visual direction.
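The coupling equation itself did not survive in this copy of the text. One form consistent with the surrounding statements (a constant slant $\sigma_{0t}$ when the gain is zero, and a surface angular velocity equal to the gain times the rate of change of the visual direction) is sketched below. It is an assumed reconstruction, and the sign convention and any offset in the definition of $\alpha$ may differ from the original Equation 4.

```latex
\sigma_t(t) = \sigma_{0t} + g\,\alpha(t)
\qquad\Longrightarrow\qquad
\frac{d\sigma_t}{dt} = g\,\dot{\alpha}(t)
```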

$g$ was 0, the target surface was stationary with a constant slant $\sigma_t = \sigma_{0t}$. A negative gain ($g < 0$) made the surface rotate in the same direction as the observer's gaze, whereas a positive gain produced the opposite rotation. The time derivative of Equation 4 is the angular velocity of the surface, $\omega_{st} = g\dot{\alpha} = g\omega_e$. Since $\omega_{rt} = \omega_{st} - \omega_e$ is the relative rotation between the target surface and the observer, $\omega_{rt} = g\omega_e - \omega_e$. Therefore, the instantaneous deformation of the target surface is $\mathit{def}_t = (g\omega_e - \omega_e)\tan\sigma_t$, which on average is equal to $(g\omega_e - \omega_e)\tan\sigma_{0t}$. Thus a positive gain reduces the amount of $\mathit{def}$ projected by the target surface, whereas a negative gain increases it. Most critically, there is a value of rotation gain for which the target surface generates the same average $\mathit{def}$ as the reference surface (Figure 3).
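The gain at which the two surfaces project the same average deformation can be worked out from the expressions above. Setting $(g\omega_e - \omega_e)\tan\sigma_{0t} = -\omega_e\tan\sigma_r$ and dividing by $\omega_e$ gives $g = 1 - \tan\sigma_r / \tan\sigma_{0t}$. The sketch below evaluates this for the slants used in the two experiments. The function name `equal_def_gain` is hypothetical, and the calculation assumes that the average target slant equals $\sigma_{0t}$.

```python
import math

def equal_def_gain(reference_slant_deg, target_slant_deg):
    """Gain g at which the target projects the same average def as the reference.

    Follows from (g - 1) * tan(target slant) = -tan(reference slant).
    """
    t_ref = math.tan(math.radians(reference_slant_deg))
    t_tgt = math.tan(math.radians(target_slant_deg))
    return 1.0 - t_ref / t_tgt

# Slant combinations described in the text: (reference slant, initial target slant).
conditions = {
    "Experiment 1, target 45 deg": (30.0, 45.0),
    "Experiment 1, target 55 deg": (30.0, 55.0),
    "Experiment 2, target 45 deg": (25.0, 45.0),
}

for label, (ref, tgt) in conditions.items():
    print(f"{label}: equal-def gain = {equal_def_gain(ref, tgt):.3f}")
```

Under a purely deformation-based account, these gains are where a rigid percept would be expected, so they offer a natural comparison point for the measured points of subjective rigidity.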

$\sigma_{0t}$ of the target surface could take on two possible values (45° and 55°), whereas only one value (45°) was possible in Experiment 2.

*translatory* components (inversely proportional to the deviation between the observer's right-eye visual axis and the stimulus center, during the corresponding active vision trial) and *rotational* components (proportional to the three degrees of freedom of head rotations performed by the observer during the corresponding active vision trial). The translatory components of our passive displays (on average, 0.5 cm), together with the display durations (0.7 s) and display size (on average, 5°), were all small enough to prevent vection. This was confirmed by preliminary interviews with the observers.

*g* = ±0.9, ±0.5). Only participants with more than 80% correct responses were admitted to the experimental session.

*not* more likely to see a rigid structure (Figure 4a) than in the passive condition (Figure 4b). This result is opposite to the prediction of the OO model, that is, a flattening of the psychometric function in the active condition (Figure 2, right).

*def* difference was even larger in the active than in the passive viewing condition. This may indicate that the active observers had an advantage over the passive observers for the measurement of the velocity gradients, for example through a process of retinal stabilization (Oosterhoff, van Damme, & van de Grind, 1993; Cornilleau-Pérès & Droulez, 1994; Aytekin & Rucci, 2012). A similar speculation was made in previous studies (Caudek, Fantoni, & Domini, 2011; Fantoni et al., 2012), but it warrants further research.

*psignifit* software (Wichmann & Hill, 2001). The goodness of fit of each best-fit psychometric curve was assessed with the 95% confidence interval criterion based on Monte Carlo simulations of 10,000 data sets.

*p*-values were obtained using Markov chain Monte Carlo simulations (10,000 samples).

*t* = 21.9, *p* < 0.001). A similar bias was found in passive vision (PSR = 0.452 ± 0.024, *t* = 18.71, *p* < 0.001). Furthermore, the PSR depended on the slant of the target surface: A larger slant corresponded to a larger PSR in both the active (PSR at 45° = 0.37, PSR at 55° = 0.53; *t* = 6.04, *p* < 0.01) and passive (PSR at 45° = 0.39, PSR at 55° = 0.51; *t* = 2.99, *p* < 0.01) conditions. These effects were accounted for by our LME model, revealing a significant main effect of the slant of the target surface (*t* = 4.41, *p* < 0.001).

*t* = 2.81, *p* < 0.01; viewing condition: *t* = 3.66, *p* < 0.01). Importantly, the JND in the active condition was significantly smaller than the JND in the passive condition (0.16 and 0.26, respectively, *t* = 3.66, *p* < 0.01), a result that is opposite to the prediction of the OO model. This effect cannot be explained by the unbalanced temporal ordering of the passive and active vision phases. Because the passive vision phase always came *after* the active vision phase, any effect of practice should have produced the opposite result.

*g* = 0) was systematically perceived as nonrigid, since the target surface was mostly perceived as rotating faster than the reference surface. Instead, nonrigid structures (*g* = PSR) were systematically perceived as undergoing a rigid transformation.

*def* difference for both active (Figure 5a) and passive (Figure 5b) observers, two surfaces are perceived as undergoing the same rotation in depth only when they project identical velocity gradients (same *def*). When the projected velocity gradients are discernibly different, the two-plane configuration is perceived as undergoing a nonrigid transformation. A statistical analysis on PSR and JND recoded as a function of *def* difference revealed that *def* was the only determinant of the perceptual responses, and the additional contribution of the slant of the target surface was not significant (PSR: *t* = 0.42, not significant; JND: *t* = 0.036, not significant).

*shrinking or expanding*, which required a Euclidean estimate of the 3-D structure.

*t* = 15.03, *p* < 0.001), which is close to those found in Experiment 1, and did not depend on the tilt of the target surface (*t* = 0.947, not significant), on whether the observer was active or passive (*t* = 0.175, not significant), or on the interaction of these two variables (*t* = 0.954, not significant). Similar results were obtained on the JNDs with the same LME model (tilt: *t* = 1.87, not significant; viewing condition: *t* = 2.2, not significant; tilt × viewing condition: *t* = 0.9, not significant).

*def* difference was 0 (PSR = −0.021, *t* = 1.51, not significant).

*def*s) projected by the two planar surfaces. When the *def* difference was detectably different from 0, the two surfaces appeared as undergoing different amounts of rotation in depth.

*def* component of the optic flow. Since *def* is inherently ambiguous, the MLE solution is in general inaccurate: It assigns to two different values of *def* two different values of estimated angular velocities, independent of whether or not the two *def*s were generated by surfaces undergoing the same rotational motion. For a passive observer, the MLE solution is the best the visual system can do with only optic flow information.

*instantaneous* optic flow was *always* compatible with a rigid transformation, even for the nonrigid displays.

*both* the passive and active conditions, as predicted by the OIE model described in Appendix C.

*both* the prior for rigidity is uninformative and egomotion information is disregarded. This model is therefore equivalent to the MLE model proposed by Domini and Caudek (2003) for perception of 3-D structure from the passively viewed optic flow, which only relies on the information provided by the local velocity gradients (*def*). The predictions of this model were also confirmed in a series of studies showing systematic biases in both the perception of motion and slant of actively viewed planar surfaces (Fantoni et al., 2010, 2012; Caudek et al., 2011).

*def*, while both ignoring statistically plausible priors, like stationarity or rigidity, and potentially available egocentric signals, remains open. Two possible explanations could be attempted.

*Perception as Bayesian inference* (pp. 409–423). New York: Cambridge University Press.

*IEEE Virtual Reality*, 2001, 3, 247–254, http://dx.doi.org/10.1109/VR.2001.913793.

*Vision Research*, 70, 7–17.

*Vision Research*, 38, 187–194.

*lme4. Linear mixed effects models using S4 classes* (R package Version 0.9975) [Computer software]. Retrieved from http://cran.rproject.org/web/pakages/lme4/index.html.

*Nature Neuroscience*, 11, 1121–1122.

*Journal of Experimental Psychology: Human Perception and Performance*, 14, 582–590.

*Experimental Brain Research*, 71, 406–410.

*Journal of Experimental Psychology: Human Perception and Performance*, 24, 609–621.

*PLoS ONE*, 6 (4), e18731, doi:10.1371/journal.pone.0018731.

*Data fusion for sensory information processing systems*. Norwell, MA: Kluwer Academic Publishers.

*Biological Cybernetics*, 97, 461–477.

*Vision Research*, 34, 2331–2336.

*Vision Research*, 35, 453–462.

*Journal of Experimental Psychology: Human Perception and Performance*, 24, 1273–1295.

*Sensory cue integration* (pp. 120–143). New York: Oxford University Press.

*Journal of Experimental Psychology: Human Perception and Performance*, 25, 426–444.

*Trends in Cognitive Sciences*, 7, 444–449.

*Journal of Experimental Psychology: Human Perception and Performance*, 23, 1111–1129.

*Perception and Psychophysics*, 60, 747–760.

*Journal of Vision*, 13 (2): 15, 1–14, http://www.journalofvision.org/content/13/2/15, doi:10.1167/13.2.15.

*Journal of Vision*, 8 (14): 5, 1–10, http://journalofvision.org/8/14/5/, doi:10.1167/8.14.5.

*Vision Research*, 39, 1713–1722.

*Nature Neuroscience*, 3, 69–73.

*Trends in Cognitive Sciences*, 8, 162–169.

*PLoS ONE*, 7 (3), e33911, doi:10.1371/journal.pone.0033911.

*Journal of Vision*, 10 (5): 12, 1–20, http://www.journalofvision.org/content/10/5/12, doi:10.1167/10.5.12.

*Vision Research*, 27, 811–828.

*Current Biology*, 16 (4), 428–432, doi:10.1016/j.cub.2006.01.019.

*Journal of the Optical Society of America, A*, 4, 503–518.

*Nature Neuroscience*, 11, 1201–1210.

*Nature Neuroscience*, 10, 1038–1047.

*Journal of Experimental Psychology: Human Perception and Performance*, 26, 934–955.

*Experimental Brain Research*, 163, 388–399.

*Proceedings of the National Academy of Sciences, USA*, 2010, 1–6, doi:10.1073/pnas.1016211108.

*Annual Review of Psychology*, 55, 271–304.

*Journal of Vision*, 7 (7): 5, 1–24, http://journalofvision.org/content/7/7/5/, doi:10.1167/7.7.5.

*Trends in Neurosciences*, 27 (12), 712–719, doi:10.1016/j.tins.2004.10.007.

*Kybernetik* (pp. 224–247). Munich: Oldenberg.

*Optica Acta*, 22, 773–791.

*Vision Research*, 35, 389–412.

*Journal of Experimental Psychology: Human Perception and Performance*, 24, 1257–1272.

*Perception*, 22, 1441–1465.

*Journal of Neuroscience*, 29, 8936–8945.

*Journal of Vision*, 3 (11): 17, 841–851, http://www.journalofvision.org/content/3/11/17, doi:10.1167/3.11.17.

*Vision Research*, 46, 4709–4725.

*Vision Research*, 59, 64–71, doi:10.1016/j.visres.2012.02.007.

*Perception and Psychophysics*, 53, 279–291.

*Perception and Psychophysics*, 48, 179–187.

*Perception*, 22, 99.

*Perception and Psychophysics*, 64, 717–731.

*Vision Research*, 42, 1991–2003.

*Journal of the Optical Society of America*, 2, 343–349.

*Perception and Psychophysics*, 52, 446–452.

*Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology* (pp. 161–170). New York: ACM.

*The interpretation of visual motion*. Cambridge, MA: MIT Press.

*Journal of Vision*, 3 (5): 1, 318–332, http://www.journalofvision.org/content/3/5/1, doi:10.1167/3.5.1.

*Handbook of experimental phenomenology: Visual perception of shape, space and appearance* (pp. 181–204). Hoboken, NJ: Wiley-Blackwell.

*Psychological Science*, 24 (9), 1673–1685, doi:10.1177/0956797613477867.

*Journal of Neuroscience*, 33, 17081–17088.

*Perception and Psychophysics*, 15, 339–343.

*Journal of Neurophysiology*, 91 (4), 1913–1918, doi:10.1152/jn.01044.2003.

*Psychological Science*, 14, 340–346.

*Vision Research*, 41, 3023–3037.

*Nature*, 409, 85–88.

*Trends in Cognitive Science*, 9, 431–438.

*Perception and Psychophysics*, 63, 1293–1313.

*def*s $P(\mathit{def}_i \mid \omega_{ri})$, $i = 1, 2$, can be calculated by integrating over the nuisance variable $\sigma$ the image formation model $P(\mathit{def}_i \mid \omega_{ri}, \sigma)$ multiplied by the prior distribution $P(\sigma)$:

$\mathit{def}$, $P(\mathit{def}_i \mid \omega_{ri}, \sigma)$ is a Gaussian distribution with mean at the measured value of the velocity gradient ($\mathit{def}$) and variance

$P(\sigma)$ is considered to be an uninformative Gaussian centered at 0 with variance

$P(\mathit{def}_i \mid \omega_{ri}, \sigma)$, $P(\sigma)$, and $P(\mathit{def}_i \mid \omega_{ri})$ for $\mathit{def}_1 = 0.2$ rad/s (Figure A1a) and $\mathit{def}_2 = 0.35$ rad/s (Figure A1b). In this simulation, we considered an observer moving at $\omega_e = 19.35$°/s viewing a static two-plane configuration for which $\sigma_1 = 30$° and $\sigma_2 = 45$°. We now consider the maximum likelihood estimate (MLE) of $\omega_{r1}$ (blue lines) and $\omega_{r2}$ (red lines), which are the values of relative angular velocities maximizing $P(\mathit{def}_1, \mathit{def}_2 \mid \omega_{r1}, \omega_{r2})$. These values are different from the real value of $\omega_r = \omega_e = 19.35$°/s (Figure A1c, green circle) and also different from each other (Figure A1c, x- and y-coordinates of the red outlined circle). Therefore, the MLE is that of a nonrigid structure, since the estimated rotation of the surface producing a larger value of $\mathit{def}$ is larger. The result of this simulation is compatible with the findings of Domini and colleagues (1997) showing that perceived angular velocity is a monotonically increasing function of $\mathit{def}$ and that the two-plane configuration is perceived as rigid only if the two surfaces generate the same value of $\mathit{def}$.
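The marginalization described in this appendix can be sketched numerically. The block below is a rough illustration only: it integrates a Gaussian measurement model over slant on a grid and picks the rotation with the highest marginal likelihood. The noise level `s_def`, the width `s_sigma` of the slant prior, the grid ranges, and the function names are all assumed values chosen for the example; they are not the parameters used in the article's simulation.

```python
import numpy as np

def marginal_likelihood(def_obs, omega_grid, s_def=0.02, s_sigma=np.radians(30.0)):
    """Unnormalized P(def | omega_r), summing over slant on a uniform grid.

    s_def and s_sigma are assumed values, not taken from the article.
    """
    sigma = np.linspace(-1.5, 1.5, 3001)           # slant grid (rad), short of +/- pi/2
    prior = np.exp(-0.5 * (sigma / s_sigma) ** 2)   # Gaussian slant prior centered at 0
    predicted = np.outer(omega_grid, np.tan(sigma)) # def predicted by each (omega, slant)
    like = np.exp(-0.5 * ((def_obs - predicted) / s_def) ** 2)
    return (like * prior).sum(axis=1)

def mle_rotation(def_obs, omega_grid=None):
    """Rotation (rad/s) on the grid that maximizes the marginal likelihood."""
    if omega_grid is None:
        omega_grid = np.linspace(0.01, 2.0, 400)
    return omega_grid[np.argmax(marginal_likelihood(def_obs, omega_grid))]

# The two deformation values used in the appendix simulation (rad/s).
omega_hat_1 = mle_rotation(0.2)
omega_hat_2 = mle_rotation(0.35)
print(f"MLE rotation for def = 0.20: {omega_hat_1:.3f} rad/s")
print(f"MLE rotation for def = 0.35: {omega_hat_2:.3f} rad/s")
```

With these assumed settings the larger deformation receives the larger estimated rotation, so two surfaces with unequal deformations are assigned unequal rotations, which is the nonrigid interpretation discussed above.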

$P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ of Equation 3 can change the MLE interpretation (see Appendix A), since it incorporates information about the observer's egomotion and the stationarity/rigidity prior. It can be shown that

where $P(\omega_{r1} \mid \omega_e) = P_{\omega_s}(\omega_{s1} = \omega_{r1} + \omega_e)$ and $P(\omega_{r2} \mid \omega_e) = P_{\omega_s}(\omega_{s2} = \omega_{r2} + \omega_e)$, with $P_{\omega_s}(\omega_{s1})$ and $P_{\omega_s}(\omega_{s2})$ indicating the a priori distributions over the surface angular velocities $\omega_{s1}$ and $\omega_{s2}$. These a priori distributions (modeled as Gaussians centered at 0 with standard deviation $s_{\omega_s}$) are sharply peaked at 0 if the surfaces are assumed to be stationary in the world, since a stationary surface is defined by $\omega_s = 0$. If both a priori distributions are narrowly peaked at 0, then both surfaces are stationary and as a consequence the structure is rigid.

$P(\omega_e \mid \hat{\omega}_e)$, a Gaussian distribution centered at $\hat{\omega}_e$ with standard deviation $s_{\omega_e}$, defines the precision of the measurement of the egomotion angular velocity $\hat{\omega}_e$. This distribution is narrowly peaked at $\hat{\omega}_e$ if the egomotion estimate is very precise.

$P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ and its influence on the posterior critically depends on the precision of the measurement of the egomotion angular velocity $\hat{\omega}_e$, and the strength of the prior for stationarity/rigidity.

$\omega_{r1}$, $\omega_{r2}$) that maximizes the posterior, differs from the MLE can be seen in Figures C1 through C3. Depending on the values of $s_{\omega_e}$ (precision of egocentric motion estimate) and $s_{\omega_s}$ (strength of the stationarity/rigidity prior), we can foresee the following three qualitatively different models.

$s_{\omega_e}$ and $s_{\omega_s}$ are very small, $P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ is sharply peaked at the veridical solution $\omega_{r1} = \omega_{r2} = \hat{\omega}_e$ (Figure C1b, left panel). In this case, $P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ has a strong influence on the widely spread likelihood function (Figure C1b, middle panel), therefore producing MAP estimates (Figure C1b, right panel) defining a rigid interpretation ($\omega_{r1} = \omega_{r2}$).

$P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ is sharply peaked at the solution $\omega_{r1} = \omega_{r2} = 0$, since the sensed egocentric motion is zero (Figure C1a, left panel). In this case, $P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ only pulls the MAP solution towards small rotations, but it effectively constitutes a noninformative prior, since it does not favor any particular solution that is different from zero.

$P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ is widely spread, since the system has no access to reliable information about the observer's egomotion (large $s_{\omega_e}$) and does not assume that surfaces in the world are stationary (large $s_{\omega_s}$). In this case, $P(\omega_{r1}, \omega_{r2} \mid \hat{\omega}_e)$ constitutes a noninformative prior which does not change the MLE interpretation: For both passive and active observers, the MAP estimate defines a nonrigid transformation.

$\omega_{r1}$, $\omega_{r2}$) are the same. In this case, the measurement of $\omega_e$ is uncertain, but the strong stationarity prior imposes the condition that $\omega_{r1} = \omega_{r2}$. The MAP estimate defines a rigid transformation for both active and passive observers.
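The three regimes described in Appendices B and C can be summarized with a small calculation. The sketch below assumes that the prior over the two relative rotations is obtained by marginalizing over the true egomotion, that is, that each relative rotation equals an independent zero-mean Gaussian surface rotation (standard deviation $s_{\omega_s}$) minus a shared Gaussian egomotion term (mean $\hat{\omega}_e$, standard deviation $s_{\omega_e}$). Under that assumption the prior is a bivariate Gaussian whose moments have a closed form. The function name and the numerical values are hypothetical, and the sign of the mean follows from $\omega_s = \omega_r + \omega_e$, whereas the text quotes the peak location without a sign.

```python
import math

def prior_moments(omega_e_hat, s_omega_e, s_omega_s):
    """Mean, marginal SD, and correlation of the assumed Gaussian prior
    over (omega_r1, omega_r2), with omega_ri = omega_si - omega_e."""
    mean = -omega_e_hat                      # common mean of both relative rotations
    var = s_omega_s ** 2 + s_omega_e ** 2    # variance of each relative rotation
    cov = s_omega_e ** 2                     # shared egomotion term couples the two
    return mean, math.sqrt(var), cov / var

omega_e_hat = 0.34  # rad/s, roughly 19.35 deg/s as in the appendix simulation

# (s_omega_e, s_omega_s) values below are illustrative assumptions only.
regimes = {
    "precise egomotion, strong stationarity prior": (0.01, 0.01),
    "noisy egomotion, weak stationarity prior": (1.0, 1.0),
    "noisy egomotion, strong stationarity prior": (1.0, 0.01),
}

for label, (s_e, s_s) in regimes.items():
    mean, sd, corr = prior_moments(omega_e_hat, s_e, s_s)
    print(f"{label}: mean = {mean:.2f}, sd = {sd:.3f}, correlation = {corr:.4f}")
```

In this reading, the first regime gives a prior tightly concentrated at a single rigid solution, the second gives a broad prior that barely constrains the likelihood, and the third gives a broad ridge along equal relative rotations, which favors rigid interpretations without fixing the amount of rotation.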