Research Article  |   October 2010
Integration of visual and inertial cues in perceived heading of self-motion
Journal of Vision October 2010, Vol.10, 1. doi:10.1167/10.12.1

      Ksander N. de Winkel, Jeroen Weesie, Peter J. Werkhoven, Eric L. Groen; Integration of visual and inertial cues in perceived heading of self-motion. Journal of Vision 2010;10(12):1. doi: 10.1167/10.12.1.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

In the present study, we investigated whether the perception of heading of linear self-motion can be explained by Maximum Likelihood Integration (MLI) of visual and non-visual sensory cues. MLI predicts smaller variance for multisensory judgments compared to unisensory judgments. Nine participants were exposed to visual, inertial, or visual–inertial motion conditions in a moving base simulator, capable of accelerating along a horizontal linear track with variable heading. Visual random-dot motion stimuli were projected on a display with a 40° horizontal × 32° vertical field of view (FoV). All motion profiles consisted of a raised cosine bell in velocity. Stimulus heading was varied between 0 and 20°. After each stimulus, participants indicated whether perceived self-motion was straight-ahead or not. We fitted cumulative normal distribution functions to the data as a psychometric model and compared this model to a nested model in which the slope of the multisensory condition was subject to the MLI hypothesis. Based on likelihood ratio tests, the MLI model had to be rejected. It seems that the imprecise inertial estimate was weighed relatively more than the precise visual estimate, compared to the MLI predictions. Possibly, this can be attributed to low realism of the visual stimulus. The present results concur with other findings of overweighing of inertial cues in synthetic environments.

Introduction
Perception of self-motion and -orientation in the environment is based on neural integration of inputs from the visual, vestibular, kinesthetic, and tactile senses. There has been extensive research on this topic, in particular on the contribution of the vestibular and visual systems (e.g., Bischof, 1974; Guedry, 1974; Howard, 1982). The visual system provides multidimensional information on self-motion by means of optic flow and on spatial orientation by means of visual frame and polarity information (Howard & Childerson, 1994). The vestibular system in the inner ear detects linear motion by means of the otolith organs. Furthermore, there is evidence for extra-vestibular sensory neurons dedicated to the perception of gravity, the so-called “graviceptors” (Mittelstaedt, 1996; Zaichik, Rodchenko, Rufov, Yashin, & White, 1999). The dynamics of the different sensory systems and interactions between them have been represented in mathematical models to explain fundamental psychophysical characteristics of self-motion and -orientation (e.g., Bos & Bles, 2002; Zupan, Merfeld, & Darlot, 2002). For example, models have successfully described the multisensory processes in the onset of perceived self-motion (e.g., Henn, Cohen, & Young, 1980; Young, Dichgans, Murphy, & Brandt, 1973), the neural disambiguation of the gravito-inertial force into gravity and linear acceleration by means of visual information (MacNeilage, Banks, Berger, & Bülthoff, 2007), and visual–vestibular interaction in the perception of self-tilt (Vingerhoets, De Vrijer, Van Gisbergen, & Medendorp, 2009). However, heading perception, i.e., the direction of linear motion along the naso-occipital axis, has mainly been studied as a visual task with non-moving observers and has not received much attention in the literature on multisensory perception. 
Furthermore, although multisensory perception has been associated with higher precision than unisensory perception for several perceptual tasks, it is not yet known whether this principle also holds for the perception of heading. In this paper, we describe an experiment in which we measured variability in visual, inertial, and combined visual–inertial perceptions of heading in order to test whether multisensory stimuli yielded more precise heading judgments than their unisensory constituents. 
Visual heading perception
During self-motion, heading is specified by the point in the optic array from which the surrounding image radially expands (the focus of expansion, FOE). Heading can be estimated by localizing this point (Gibson, 1950; Warren & Hannon, 1988). Warren, Morris, and Kalish (1988) presented participants with moving horizontal random-dot planes and instructed them to report whether it looked as if they moved to the left or right of a target; 75% correct thresholds were in the range of 0.65° to 1.2°, depending somewhat on dot density and stimulus speed. In a paper by Telford, Howard, and Ohmi (1995), a 75% correct detection threshold of 5.5° was reported, averaged over participants. Accuracy of heading judgments depends on the part of the fovea stimulated, the eccentricity of the FOE (Crowell & Banks, 1993; Warren & Kurtz, 1992), and the coherence of motion of particles in the optic flow field (Gu, Angelaki, & DeAngelis, 2008). 
Inertial heading perception
The vestibular system responds to rotational and linear accelerations of the head associated with changes of self-motion. It is less useful for the perception of the actual speed and trajectory of self-motion. For these aspects, we rely on the visual system. However, the otolith organs do provide information on the direction of linear acceleration (Gu et al., 2008; Gu, DeAngelis, & Angelaki, 2007; Ohmi, 1996; Telford et al., 1995), as do several extra-vestibular graviceptors throughout the body (Mittelstaedt, 1996; Zaichik et al., 1999). For humans, a 75% correct detection threshold of non-zero heading (i.e., not straight ahead) for inertial stimuli has been reported to be 11.4°, averaged over participants (Telford et al., 1995). Thresholds as small as 1.3° have been reported for extensively trained macaque monkeys, suggesting that inertial signals can indeed contribute to precise heading judgments (Gu et al., 2007). Since extra-vestibular sensors respond to the same environmental stimuli as the vestibular system and are difficult to isolate from the vestibular system, we will treat them as a single system, here collectively designated inertial heading sensors. 
Multisensory heading perception
It is likely that the brain combines estimates of heading made by multiple senses in a way that allows us to benefit from having multiple sources of information. Such a benefit can be expressed by improvements in precision (i.e., a reduction of variance) of the estimate. A well-known integration scheme is Bayesian Integration (BI). BI models have been applied to describe the influence of prior knowledge on sensory integration (e.g., Bresciani, Dammeier, & Ernst, 2006; Jürgens & Becker, 2006; Laurens & Droulez, 2006; MacNeilage et al., 2007). In its most simple form, the Bayesian scheme is essentially Maximum Likelihood Integration (MLI; Ernst & Banks, 2002). This is a statistically optimal strategy to combine multiple cues (“observations”). Assuming normality and independence of noises in internal representations, MLI effectively states that multisensory estimates of the same environmental property are combined as a weighted average, with weights proportional to each estimate's reliability (i.e., inverse of its variance; Howard, 1997). MLI yields minimal variance in the integrated estimate among all weighted averages. Thus, MLI predicts how the (parameters of the) multisensory condition relate to the (parameters of) unisensory conditions. It has been shown that the brain acts according to MLI for several psychophysical phenomena, such as integration of visual and haptic information on an object's size (Ernst & Banks, 2002) and for integration of multisensory information on sequences of events (e.g., Andersen, Tiipana, & Sams, 2005; Shams, Ma, & Beierholm, 2005). In these studies, multisensory estimates were more precise than unisensory estimates. In a recent study, Gu et al. (2008) investigated whether this integration strategy also holds for the perception of heading. Macaque monkeys were trained to perform a discrimination task in which they were passively moved along a linear track in the horizontal plane with a certain heading. 
The monkeys indicated whether the experienced motion was to the left or right of straight-ahead heading. The results suggested that the monkeys were more precise in their judgments when multisensory visual–inertial cues were presented than when either cue was presented in isolation, consistent with the MLI hypothesis. Furthermore, Fetsch, Turner, DeAngelis, and Angelaki (2009) recently showed that cue weighting is a dynamic process. They presented macaque monkeys and humans with multisensory heading stimuli. Reliability of the visual heading cue was manipulated between successive trials by varying coherence of the direction of moving dots in the visual field between 25% and 70%. It was observed that the weight attributed to each cue was updated on a trial-to-trial basis. 
Our interest in the present study was to investigate whether multisensory presentation yields a more precise estimate compared to its unisensory constituents, when the objects in the optical array move in a completely coherent fashion. We hypothesized that the variance of the estimate would be smallest when both visual and inertial cues were available, compared to when either cue was presented in isolation. We focused on MLI theory, which gives a precise prediction of the extent to which the variance is reduced. The study was performed in a moving base simulator that allowed us to independently manipulate visual and inertial motion cues. 
Methods
Participants
Nine paid volunteers took part in this experiment (six men, three women; mean age 28.7 years, standard deviation 6.9). All participants reported normal vestibular function and normal or corrected-to-normal vision. After receiving general instructions about the experimental goals and procedures, all participants signed an informed consent form. 
Apparatus
The experiment was performed using the DESDEMONA simulator at the TNO Institute in Soesterberg, The Netherlands (Bles & Groen, 2009). This centrifuge-based simulator features a motion platform with six degrees of freedom. For this study, only two degrees of freedom were used: rotation about the cabin's vertical yaw axis and linear motion along the 8-m horizontal track. Participants were seated on a padded seat inside the cabin and secured by a five-point safety harness. Foam cushions were placed between a headrest and the left and right sides of the head to minimize head movements. 
Inside the cabin, an out-the-window (OTW) visual stimulus was projected on a screen about 1.5 m in front of the participant. Participants wore a mask that restricted their field of view to 40° (horizontal) × 32° (vertical) of the OTW display, blocking stationary visual cues from the cabin interior. The mask served as a substitute for the absent cockpit canopy and made the OTW scenery appear in the background. The latter is important, since it is known that vection (the visually induced sense of self-motion) is induced more effectively when a visual motion stimulus is presented in the perceptual background (Howard & Heckmann, 1989). Participants wore an audio headset that allowed for continuous contact with the experimenters. Responses were given verbally and were noted by the experimenters. 
Stimuli
Stimuli consisted of visual, inertial, or combined visual–inertial linear horizontal motion with different headings. Heading is defined as the direction of motion with respect to the median plane of the body: A heading of 0° corresponds to linear forward motion along the participants' naso-occipital axis; a 90° heading corresponds to linear rightward motion along the inter-aural axis. We did not expect consistent differences between perceptions of left- and rightward motions. Therefore, only rightward motion was presented, which reduced the number of trials. 
Visual stimuli consisted of linear horizontal motion through a star field. Different angles of visual heading were achieved by shifting the focus of expansion (FOE) sideways. The star field consisted of a cloud of solid white circles, placed in random positions on a three-dimensional grid in a dark surrounding environment. This stimulus was generated at random with each trial ( Figure 1). The objects never appeared at the FOE. Displacement of the visual objects was coupled linearly with the inertial motion. Absolute velocity of the visual motion was arbitrary since participants neither had objective information about the distance between objects, nor could they determine their size. Velocity of movement through the star field was determined in a pilot study; the stimulus amplitude was chosen such that it subjectively matched the velocity of the inertial motion. Participants never reported a feeling of discrepancy between the visual and inertial velocities. 
Figure 1
 
Screen capture of the visual stimulus.
Inertial stimuli consisted of motion along the linear horizontal track of the simulator over a total length of 7 m. The velocity profile was a raised cosine bell with a maximum velocity of 1.5 m/s; maximum acceleration was 0.5 m/s². Each motion profile lasted 9.3 s (see additional material: HeadingI.wmv). Since the vestibular system responds to acceleration rather than to constant velocity, a continuously changing velocity profile was used to ensure a vestibular response throughout the motion. 
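For readers who want to reproduce the motion profile, a raised cosine bell with these parameters can be sketched as follows (the sampling resolution is an arbitrary choice; the profile itself follows the parameters stated above):

```python
import numpy as np

T, v_max = 9.3, 1.5                      # stimulus duration (s), peak velocity (m/s)
t = np.linspace(0.0, T, 10001)
v = 0.5 * v_max * (1.0 - np.cos(2.0 * np.pi * t / T))   # raised cosine bell

# travelled distance: integral of v(t) = v_max * T / 2 ≈ 7 m (the track length)
distance = float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t)))

# peak acceleration: max of dv/dt = pi * v_max / T ≈ 0.5 m/s^2
peak_acc = float(np.max(np.gradient(v, t)))
```

The parameters are internally consistent: a 9.3-s raised cosine bell peaking at 1.5 m/s covers v_max·T/2 ≈ 7 m with a peak acceleration of π·v_max/T ≈ 0.5 m/s², matching the track length and maximum acceleration quoted above.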
Participants reported whether or not they perceived a heading of 0°. Psychometric curves (cumulative normal distribution functions) of the probability that a participant perceived the stimulus as being not straight ahead as a function of heading angle were determined based on six different fixed stimulus headings. When using fixed stimuli, it is important that the data points fall as much as possible on the steepest part of the psychometric curve. Based on previous reports, we knew that the detection thresholds differ between the different sensory modalities. For example, visual detection thresholds fall in the range of 0.65°–5.5° (Telford et al., 1995; Warren et al., 1988, respectively) and an inertial detection threshold of 11.4° has been reported (Telford et al., 1995). The range of stimulus values for which the percentage correct responses gradually increases thus varies between sensory modalities. Therefore, we used different ranges of stimulus values for each condition, which we verified in a pilot experiment (Table 1). 
Table 1
 
Heading angles used in the three conditions. V, I, and C stand for the visual, inertial, and combined visual–inertial condition, respectively.
Heading angle α
10° 15° 20°
V x x x x x x
I x x x x x x
C x x x x x x
Procedure
Stimuli were presented in separate simulator runs of about 30 s. At the beginning of each run, the cabin was positioned at one end of the linear track. The run started with rotation of the cabin about its yaw axis to orient the participant at the desired stimulus heading relative to the linear track. The cabin always rotated the longest distance (i.e., around 180°) with an angular velocity between 12 and 13.33°/s, depending on the stimulus angle. The duration of this rotation was kept constant to eliminate it as a possible cue to stimulus heading. To allow the response of the semicircular canals to this yaw rotation to wash out, a 6-s pause was implemented before the actual stimulus started. After this pause, the cabin was moved over 7 m to the other end of the linear track in the inertial and visual–inertial conditions. In the visual-only condition, the cabin remained stationary at the end of the linear track, and only visual motion was presented. Linear motion stimuli always lasted 9.3 s. Following stimulus presentation, participants gave their verbal response in a 1-s pause before the next run started with the reorientation of the cabin. Participants judged the direction of perceived self-motion in a two-alternative forced-choice (2AFC) task: after each stimulus, they indicated whether they perceived the motion as "straight ahead" or "not straight ahead." Participants were instructed to use all sensory information on self-motion, irrespective of whether it was inertial, visual, or both. 
The runs were presented in three separate blocks, or conditions: a condition with visual-only (V) runs, a condition with inertial (I) runs, and a condition with combined visual–inertial (C) runs. In each condition, all six stimulus angles were presented 10 times, totaling 60 runs per condition. Participants also experienced an extra set of 60 visual–inertial stimuli to answer another research question, which will be reported elsewhere. The order of stimuli within a condition was randomized. The order of conditions was randomized as much as possible using Latin squares. Each participant performed a total of 240 runs in four 30-min blocks. After each block, they had a 15-min break outside the simulator. Including instruction, the whole experiment lasted about 4 h for each participant. 
Theoretical model
We assume that participants have an internal noisy but "continuous" representation X* of their heading angle α, with normally distributed noise, X* ∼ N(α, σ), where the standard deviation σ reflects the size of the noise in the random variable. We expect participants to perceive themselves as moving "not straight ahead," and to respond accordingly (i.e., binary response X = 1), when a certain internal threshold is exceeded: X* > τ (Long, 1997, Section 3.2). As the heading angle increases, the probability of responding "not straight ahead" also increases. 
Maximum likelihood integration hypothesis
MLI hypothesizes how the standard deviation parameters for the multisensory (combined) condition ( σ c) depend on the standard deviation parameters for the unisensory inertial ( σ i) and visual ( σ v) conditions. The value of an environmental property, such as an assessment of heading, can be represented by an “internal” random variable X*. When we have two assessments, X i* and X v*, of a single environmental property, as is the case with multiple senses in the combined condition, the value of that property can be estimated by a convex (i.e., coefficients sum to one) combination of the unisensory representations  
$$X_c^* = w_i X_i^* + w_v X_v^*, \qquad w_i, w_v > 0, \quad w_i + w_v = 1, \tag{1}$$
where the w's are weights. Assuming unbiased unisensory noisy representations of the true heading angle α, the linear combination is also an unbiased noisy representation of α. Since a linear combination of normal variates is itself normally distributed, the noise in the combined estimate X_c* is also normally distributed. Assuming that the random noises are stochastically independent, the variance σ_c² of X_c* is  
$$\sigma_c^2 = w_i^2 \sigma_i^2 + w_v^2 \sigma_v^2. \tag{2}$$
The statistical method of Maximum Likelihood (ML) can be used to derive a prediction for the weights (w_i and w_v). Since we assumed normally distributed noise, the likelihood L_j for the internal representation X_j* in sensory condition j is  

$$L_j(\alpha; X_j^*, \sigma_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left(-\frac{1}{2}\left(\frac{X_j^* - \alpha}{\sigma_j}\right)^2\right). \tag{3}$$
In the multisensory condition, the likelihood function of (X_i*, X_v*) is given by the product of the likelihoods of the unisensory variables X_j*, because we assume that the noises are independent across senses. Treating the σ_j as known, the maximum of this function yields the ML estimate of the true angle α in terms of these parameters. It can be shown that the maximizing estimate indeed takes the linear form of Equation 1, with  

$$w_i = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_i^2}, \qquad w_v = \frac{\sigma_i^2}{\sigma_v^2 + \sigma_i^2}. \tag{4}$$
Hence, the variance σ_c² of X_c* corresponding to these weights is  

$$\sigma_c^2 = \frac{\sigma_i^2 \sigma_v^2}{\sigma_i^2 + \sigma_v^2}. \tag{5}$$
We conclude that MLI yields a precise prediction of how the variance in the combined condition depends on the variances in the unisensory conditions. As an aside, we mention that the same prediction ( Equation 5) can also be attained by another statistical principle, namely that the weights w i and w v are chosen so that the variance σ c 2 is minimal across all convex combinations. 
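To make Equations 4 and 5 concrete, the sketch below computes the MLI weights and the predicted combined standard deviation; the numeric inputs are illustrative values close to the average unisensory standard deviations reported in the Results section:

```python
import numpy as np

def mli_combine(sigma_i, sigma_v):
    """MLI weights (Equation 4) and predicted combined SD (Equation 5)
    for two independent, unbiased cues."""
    w_i = sigma_v**2 / (sigma_v**2 + sigma_i**2)   # weight on the inertial cue
    w_v = sigma_i**2 / (sigma_v**2 + sigma_i**2)   # weight on the visual cue
    sigma_c = ((sigma_i**2 * sigma_v**2) / (sigma_i**2 + sigma_v**2)) ** 0.5
    return w_i, w_v, sigma_c

w_i, w_v, sigma_c = mli_combine(sigma_i=7.20, sigma_v=1.07)
```

Because σ_c² is the product of the unisensory variances divided by their sum, σ_c is always smaller than the smaller of the two unisensory standard deviations; this reduction in variance is the testable prediction of MLI.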
Data analysis
Since we assume normally distributed, unbiased internal representations of heading, the model amounts to a probit regression of the binomially distributed binary response X on the condition j = (i, v, c), the angle α, and their interaction, where i, v, and c represent the inertial, visual, and combined conditions, respectively. More specifically,  
$$\pi_{j\alpha} = \Pr(X_{j\alpha} = 1) = \Phi\!\left(\frac{\alpha - \tau_j}{\sigma_j}\right), \tag{6}$$
where Φ denotes the cumulative standard normal distribution. The model is a dichotomous analogue to the familiar ANCOVA for continuous responses, where we regress the binary dependent variable “response” on the independent variable “sensory modality tested,” in the presence of the continuous independent variable “heading angle.” 
We estimated the parameters ( τ j, σ j) of the psychometric curve for condition j by maximum likelihood estimation (MLE, not to be confused with MLI), assuming that all observations of a participant were stochastically independent since no feedback on performance was provided. We found considerable inter-subject differences in psychometric curve parameters. Since the number of participants was too small to warrant a random effect specification, and we had large numbers of observations per participant, we fitted the model with six parameters (three τs and three σs) separately for each participant. A Pearson's χ 2 showed satisfactory goodness of fit of the psychometric curves, so that we finally fitted the model ( Equation 6) with the MLI-induced constraint ( Equation 5). This restricted model has five parameters (three τs and two σs), as the slope in the combined condition was predicted by MLI of the parameters of unisensory conditions. Since the τ parameter is free to vary for each condition, a comparison of the unrestricted and restricted models is only affected by the slopes ( σ) of the fitted functions. A comparison of the fit of the models with and without this constraint, using likelihood ratio tests with one degree of freedom, allowed us to test whether our data support the MLI hypothesis. 
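The fitting procedure can be sketched as follows. The data below are simulated, and the stimulus angles, starting values, and optimizer settings are arbitrary choices, so this is an illustration of the method rather than the actual analysis code:

```python
import numpy as np
from scipy.stats import norm, chi2
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# simulated 2AFC data for one condition: heading angles and binary
# "not straight ahead" responses generated from a known (tau, sigma)
angles = np.repeat([0.0, 2.0, 4.0, 8.0, 15.0, 20.0], 10)
true_tau, true_sigma = 5.0, 3.0
responses = (rng.normal(angles, true_sigma) > true_tau).astype(int)

def neg_log_lik(params, a, x):
    """Negative log likelihood of the probit model (Equation 6)."""
    tau, sigma = params
    p = norm.cdf((a - tau) / sigma)
    p = np.clip(p, 1e-9, 1.0 - 1e-9)          # guard against log(0)
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1.0 - p))

fit = minimize(neg_log_lik, x0=[4.0, 2.0], args=(angles, responses),
               method="Nelder-Mead", bounds=[(-10.0, 30.0), (0.1, 30.0)])
tau_hat, sigma_hat = fit.x

# the likelihood ratio test then compares the maximized log likelihoods of
# the unrestricted and MLI-constrained fits:
#   LR = 2 * (loglik_unrestricted - loglik_restricted)
#   p  = chi2.sf(LR, df=1)
```

The restricted fit differs only in that σ_c is not a free parameter but is computed from σ_i and σ_v via Equation 5 inside the likelihood.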
In other work on testing MLI in cue integration (e.g., Fetsch et al., 2009; Helbig & Ernst, 2007), the standard deviations σ of the underlying Gaussians for the unisensory and multisensory conditions are often derived from the slope of fitted cumulative Gaussians: 
$$\mathrm{slope} = \sigma\sqrt{2}. \tag{7}$$
The standard deviation of the multisensory condition is subsequently compared to the value that is predicted using the standard deviations of the unisensory conditions ( Equation 5). Our approach essentially does not differ but is a more direct evaluation of the MLI hypothesis that is statistically more efficient. 
Results
Figure 2 shows the results for each individual participant. In each panel, the fitted probability that a stimulus was not perceived as “straight ahead” (SA) is plotted against heading angle for each condition (visual, inertial, and combined visual–inertial). The curves plotted through the data points are cumulative normal distribution functions. The shaded areas represent pointwise 95% confidence intervals. Table 2 describes the two fitted models and two associated tests for each participant. More specifically, the table lists the log likelihood of the unrestricted model, Pearson's goodness-of-fit test for the unrestricted model, the log likelihood of the MLI-restricted model, and a likelihood ratio test for the MLI restriction. 
Figure 2
 
Data points and fitted psychometric functions for each participant (panels) and condition with 95% confidence intervals for the visual (red), inertial (blue), and combined (green) conditions.
Table 2
 
Model log likelihoods and parameters. Pearson's χ 2 goodness-of-fit test results are presented for the unrestricted model; p-values of χ 2 are based on a chi-square distribution with 12 degrees of freedom. Significant goodness-of-fit results indicate poor model fit; p-values for likelihood ratio tests are based on a chi-square distribution with 1 degree of freedom. Thus, small p-values indicate that the MLI hypothesis on the variance has to be rejected.
Participant | Log likelihood (unrestricted) | Pearson χ² | p | Log likelihood (MLI-constrained) | LR | p
1 −85.69 10.84 0.542 −86.02 0.65 0.419
2 −69.87 21.77 0.040 −74.89 10.03 0.002
3 −69.31 44.31 0.000 −75.14 11.67 0.001
4 −88.61 17.11 0.146 −93.04 8.85 0.003
5 −82.62 7.68 0.810 −86.05 6.87 0.009
6 −67.18 12.97 0.371 −79.74 25.13 0.000
7 −51.75 12.80 0.384 −71.71 39.92 0.000
8 −75.76 13.57 0.330 −84.24 16.97 0.000
9 −71.53 9.38 0.671 −89.58 36.10 0.000
The fit of the probit model is based on a Pearson's χ 2 test for goodness of fit with 180 observations equally distributed over 18 heading angle × condition combinations. In general, the model fits the data well, and hence we may proceed below to test whether the 6 parameters of this model satisfy the MLI constraint. The violation for participant number three is caused by a poor fit of the model in the combined condition. The results reported below do not differ substantially if we include or exclude participant three. 
A likelihood ratio test showed large differences in the parameters of the fitted cumulative normal distributions ( τ, σ) between participants ( χ 2(48) = 168.92, p < 0.001). Differences between the curves were also assessed for each participant individually using Wald χ 2 tests ( Table 3). 
Table 3
 
Statistical comparison between the parameters of the psychometric curves for the visual (V), inertial (I), and combined (C) conditions.
Participant
1 2 3 4 5 6 7 8 9
V vs. I * * * * * * * * ns
V vs. C * * ns * * * * * ns
I vs. C * ns * * * ns ns ns ns
 

“ns” stands for “not significant”. * p < 0.05.

According to the MLI hypothesis, the variance of the multisensory estimate should be lower than the variance of either unisensory estimate. We compared an unrestricted model (see Data analysis section) to a model in which the standard deviation of the multisensory condition was constrained to the value predicted by MLI (Equation 5). The average standard deviations for the visual, inertial, and combined visual–inertial conditions were 1.07° (SD = 0.68), 7.20° (SD = 3.94), and 3.56° (SD = 1.04), respectively. The model likelihoods and likelihood ratios, as well as a measure of model goodness of fit, are given in Table 2. The associated observed and predicted standard deviations are presented in Figure 3. 
Figure 3
 
Standard deviations ( SD; square root of the variance) for each participant and condition. The dots represent the observed inertial (blue), visual (red), and combined (green) visual/inertial SDs, respectively. The location of the black and green dots in terms of the x-coordinate indicates the associated visual weight. Black dots are the MLI-predicted optimal weights ( x-coordinates) and corresponding SDs ( y-coordinates). If MLI theory applies, the black and green dots should be close to each other.
Combined over all participants, we have strong evidence against MLI (χ²(9) = 156.19, p < 0.001). For all but one participant, we have strong evidence that their heading perceptions conflict with MLI. To allow comparison with previous studies on heading perception, we also calculated the group's average 75% detection thresholds for the three conditions, using the estimated model parameters and the inverse cumulative distribution function. These values amounted to 4.6° (SD = 1.3), 16.1° (SD = 5.2), and 9.4° (SD = 1.94) for the visual, inertial, and combined visual–inertial conditions, respectively. Note that these thresholds indicate the shift of a subjective judgment from "straight ahead" to "not straight ahead," and should be seen as a "Point of Subjective Straight Heading." 
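The conversion from fitted parameters to a 75% threshold amounts to inverting the probit model in Equation 6; a minimal sketch, with hypothetical parameter values for illustration:

```python
from scipy.stats import norm

def threshold_75(tau, sigma):
    """Heading angle at which Pr("not straight ahead") = 0.75,
    obtained by inverting the psychometric function."""
    return tau + sigma * norm.ppf(0.75)

# hypothetical parameter values, for illustration only
alpha_75 = threshold_75(tau=3.0, sigma=2.0)
```

Since norm.ppf(0.75) ≈ 0.674, the 75% point lies about two-thirds of a standard deviation above the fitted threshold τ.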
Discussion
The results of this study showed that the 75% correct detection threshold for deviations from subjective straight ahead was much lower for visual motion than for inertial motion in all participants. The average 75% threshold was 4.6° in the visual condition and 16.1° for the inertial condition. These values correspond to earlier findings, e.g., by Telford et al. (1995). However, the present multisensory results differ from earlier observations. We found that the detection threshold in the combined condition was always larger (average 9.4°) than in the visual-only condition, while Telford et al. reported a near identical visual and combined visual–inertial threshold of 5.5° and 5.7°, respectively. More importantly, they reported a slight but non-significant reduction in variance when both visual information and vestibular information were available, compared to when only visual information was available. This is in line with the predictions of a Maximum Likelihood Integration (MLI) model. According to MLI, the variance of a combined estimate should always be smaller than the variance of the best unisensory constituent. Although Telford et al. did not explicitly test the MLI hypothesis, Gu et al. (2008) found supporting evidence for MLI in macaque monkeys. In contrast, our results indicated that the variance of the combined estimate was actually larger than that of the best unisensory estimate. Provided that our assumptions on independent and normally distributed errors in the internal representation of stimuli were met, this implies that the weight assigned to the inertial estimate was too large and that MLI has to be rejected. 
Our observation does concur with other findings that inertial cues in a synthetic simulator environment are weighted more heavily relative to visual cues (Groen & Bles, 2004; Groen, Valenti Clari, & Hosman, 2001; Harris, Jenkin, & Zikovitz, 2000). Furthermore, Fetsch et al. (2009) also observed that some participants tended to overweight inertial cues. Remarkably, this overweighting seemed to occur mostly for multisensory stimuli with the highest level (70%) of visual motion coherence (the proportion of dots in the visual stimulus moving coherently in the same direction). In the present experiment, the dots in the visual cue moved in a fully coherent fashion (100%), and we observed overweighting of the inertial cue in every participant. Although the different experimental setup used by Fetsch et al. prohibits direct comparison of the results, it seems that the less reliable inertial information receives more weight as motion in a random-dot pattern becomes more coherent. 
The observed overweighting of inertial cues might be explained by a violation of the assumption that sensory estimates are unbiased. Although this assumption is standard for MLI, Todd, Christensen, and Guckes (2010) recently showed that when it does not hold, a 2AFC experiment can yield biased estimates of the size of the internal noise. Using these biased estimates in an integration scheme will result in erroneous estimates of the sensory weights. Although a 2AFC paradigm, such as that used in the present study, does identify sensory noise (variable error), it provides no information on a possible bias (constant error). Therefore, we cannot verify whether sensory estimates were biased. For this purpose, it would be necessary to measure heading judgments on a continuous scale. 
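Todd et al.'s point can be illustrated with a small Monte Carlo simulation. The observer model below (a fixed decision criterion on the absolute internal heading estimate) is a hypothetical decision rule chosen for illustration; it shows that, at a fixed internal noise level, a constant bias shifts the 75% detection threshold, so thresholds cannot be read directly as noise estimates.

```python
import random

def detection_threshold(bias, sigma, criterion=2.0, n=20000, step=0.1):
    """75% 'not straight ahead' threshold for a simulated observer whose
    internal estimate is heading + bias + Gaussian noise, and who reports
    'not straight ahead' when |estimate| exceeds a fixed criterion.
    The decision rule and criterion are hypothetical, for illustration."""
    heading = 0.0
    while heading < 30.0:
        responses = sum(abs(random.gauss(heading + bias, sigma)) > criterion
                        for _ in range(n))
        if responses / n >= 0.75:
            return heading
        heading += step
    return None

random.seed(1)
unbiased = detection_threshold(bias=0.0, sigma=3.0)
biased = detection_threshold(bias=1.5, sigma=3.0)
# Same internal noise, but the biased observer reaches 75% at a smaller
# heading; a threshold read as a noise estimate would thus be misleading.
```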
An alternative explanation of this overweighting is that participants did not perceive the visual and inertial motions as associated. In the present study, inertial motion may have given a much stronger and more compelling sensation of self-motion than the random-dot pattern, making it likely that the visual cue was interpreted as a separate event. Although in the debriefing all participants asserted that they had followed the instruction to use all available sensory information, they may instinctively have discarded the visual cue. If so, performance in the multisensory condition would correspond to performance in the inertial condition, which was the case for five out of nine participants. As noted by Fetsch et al. (2009), this is analogous to causal inference models (Körding et al., 2007; Sato, Toyoizumi, & Aihara, 2007). These models include a step prior to the actual integration of multisensory estimates: it is first evaluated whether two cues arise from a single source or from multiple sources. When multisensory cues are attributed to separate events, integration does not occur. 
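The causal inference step described above can be sketched for two Gaussian heading cues. The sketch follows the general structure of the Körding et al. (2007) model; the prior width, the prior probability of a common cause, and the choice to average over causal structures are illustrative assumptions, not parameters estimated in the present study.

```python
import math

def causal_inference_heading(x_vis, x_in, sigma_vis, sigma_in,
                             sigma_prior=20.0, p_common=0.5):
    """Model-averaged heading estimate in a Koerding-style causal
    inference scheme; all prior parameters are illustrative."""
    var_v, var_i, var_p = sigma_vis**2, sigma_in**2, sigma_prior**2

    # Likelihood of both measurements under a single common source
    # (zero-mean Gaussian prior over heading):
    var_c = var_v*var_i + var_v*var_p + var_i*var_p
    like_c = math.exp(-((x_vis - x_in)**2 * var_p
                        + x_vis**2 * var_i + x_in**2 * var_v)
                      / (2 * var_c)) / (2 * math.pi * math.sqrt(var_c))

    # Likelihood under two independent sources:
    like_s = (math.exp(-x_vis**2 / (2 * (var_v + var_p)))
              / math.sqrt(2 * math.pi * (var_v + var_p))
              * math.exp(-x_in**2 / (2 * (var_i + var_p)))
              / math.sqrt(2 * math.pi * (var_i + var_p)))

    # Posterior probability that the cues share a common cause:
    post_c = p_common * like_c / (p_common * like_c
                                  + (1 - p_common) * like_s)

    # Conditional estimates: reliability-weighted fusion vs inertial only
    # (assuming the inertial cue dominates when cues are segregated):
    fused = ((x_vis/var_v + x_in/var_i)
             / (1/var_v + 1/var_i + 1/var_p))
    inertial_only = (x_in/var_i) / (1/var_i + 1/var_p)
    return post_c * fused + (1 - post_c) * inertial_only, post_c
```

With similar cues the model fuses them; with large discrepancies the common-cause posterior collapses and the estimate falls back on the inertial cue, mimicking the pattern observed in five of our nine participants.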
The quality of the visual stimulus itself may affect the extent to which it induces a sensation of self-motion (vection). It has been suggested that binocular visual cues are more effective in inducing vection than monocular cues, although Fetsch et al. (2009) did not find any effects of stereo vision on cue integration. It has, however, been shown that a larger FoV (Allison, Howard, & Zacher, 1999) and photorealistic visual cues (Trutoiu, Mohler, Schulte-Pelkum, & Bülthoff, 2009) enhance vection. Compared to the stimuli used in the present experiment, the stimuli used by Fetsch et al., Gu et al. (2008), and Telford et al. (1995) were displayed with a larger FoV; moreover, the stimulus used by Telford et al. depicted the actual surroundings, which may have increased vection. In future experiments, we plan to investigate the effects of these visual factors on cue integration. 
Supplementary Materials
Movie - Movie File 
Acknowledgments
This research was supported by Grant Number ALW-GO-MG/08-04 of the Netherlands Institute for Space Research (SRON). 
Commercial relationships: none. 
Corresponding author: Ksander N. de Winkel. 
Email: ksander.dewinkel@tno.nl. 
Address: Kampweg 5, 3769 DE Soesterberg, The Netherlands. 
References
Allison, R. S., Howard, I. P., & Zacher, J. E. (1999). Effect of field size, head motion and rotational velocity on roll vection and illusory self-tilt in a tumbling room. Perception, 28, 299–306.
Andersen, T. S., Tiippana, K., & Sams, M. (2005). Maximum likelihood integration of rapid flashes and beeps. Neuroscience Letters, 380, 155–160.
Bischof, N. (1974). Optic-vestibular orientation to the vertical. In H. Kornhuber (Ed.), Handbook of sensory physiology: Vestibular system. Psychophysics, applied aspects and general interpretations (pp. 155–190). Berlin, Germany/New York: Springer-Verlag.
Bles, W., & Groen, E. L. (2009). The DESDEMONA motion facility: Applications for space research. Microgravity Science and Technology, 21, 281–286.
Bos, J. E., & Bles, W. (2002). Theoretical considerations on canal–otolith interaction and an observer model. Biological Cybernetics, 86, 191–207.
Bresciani, J.-P., Dammeier, F., & Ernst, M. O. (2006). Vision and touch are automatically integrated for the perception of sequences of events. Journal of Vision, 6(5):2, 554–564, http://www.journalofvision.org/content/6/5/2, doi:10.1167/6.5.2.
Crowell, J. A., & Banks, M. S. (1993). Perceiving heading with different retinal regions and types of optic flow. Perception & Psychophysics, 53, 325–337.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Fetsch, C. R., Turner, A. H., DeAngelis, G. C., & Angelaki, D. E. (2009). Dynamic reweighting of visual and vestibular cues during self-motion perception. Journal of Neuroscience, 29, 15601–15612.
Gibson, J. J. (1950). Perception of the visual world. Boston: Houghton Mifflin.
Groen, E. L., & Bles, W. (2004). How to use body tilt for the simulation of self-motion. Journal of Vestibular Research, 14, 375–385.
Groen, E. L., Valenti Clari, M. S. V., & Hosman, R. J. A. (2001). Evaluation of perceived motion during a simulated takeoff run. Journal of Aircraft, 38, 600–606.
Gu, Y., Angelaki, D. E., & DeAngelis, G. C. (2008). Neural correlates of multisensory cue integration in macaque MSTd. Nature Neuroscience, 11, 1201–1210.
Gu, Y., DeAngelis, G. C., & Angelaki, D. E. (2007). A functional link between area MSTd and heading perception based on vestibular signals. Nature Neuroscience, 10, 1038–1047.
Guedry, F. E. (1974). Psychophysics of vestibular sensation. In H. H. Kornhuber (Ed.), Handbook of sensory physiology (Vol. 6, pp. 3–154). Berlin, Germany: Springer-Verlag.
Harris, L. R., Jenkin, M., & Zikovitz, D. C. (2000). Visual and non-visual cues in the perception of self-motion. Experimental Brain Research, 135, 12–21.
Helbig, H. B., & Ernst, M. O. (2007). Optimal integration of shape information from vision and touch. Experimental Brain Research, 179, 595–606.
Henn, V., Cohen, B., & Young, L. R. (1980). Visual–vestibular interaction in motion perception and the generation of nystagmus. Neurosciences Research Program Bulletin, 18, 457–651.
Howard, I. P. (1982). Human visual orientation. New York: Wiley.
Howard, I. P. (1997). Interactions within and between the spatial senses. Journal of Vestibular Research, 7, 311–345.
Howard, I. P., & Childerson, L. (1994). The contribution of motion, the visual frame and visual polarity to sensations of body tilt. Perception, 23, 753–762.
Howard, I. P., & Heckmann, T. (1989). Circular vection as a function of the relative sizes, distances, and positions of two competing visual displays. Perception, 18, 657–665.
Jürgens, R., & Becker, W. (2006). Perception of angular displacement without landmarks: Evidence for Bayesian fusion of vestibular, optokinetic, podokinesthetic and cognitive information. Experimental Brain Research, 174, 528–543.
Körding, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception. PLoS ONE, 2, e943.
Laurens, J., & Droulez, J. (2006). Bayesian processing of vestibular information. Biological Cybernetics, 96, 389–404.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage.
MacNeilage, P. R., Banks, M. S., Berger, D. R., & Bülthoff, H. H. (2007). A Bayesian model for the disambiguation of gravito-inertial force by visual cues. Experimental Brain Research, 179, 263–290.
Mittelstaedt, H. (1996). Somatic graviception. Biological Psychology, 42, 53–74.
Ohmi, M. (1996). Egocentric perception through interaction among many sensory systems. Cognitive Brain Research, 5, 87–96.
Sato, Y., Toyoizumi, T., & Aihara, K. (2007). Bayesian inference explains perception of unity and ventriloquism aftereffect: Identification of common sources of audiovisual stimuli. Neural Computation, 19, 3335–3355.
Shams, L., Ma, W. J., & Beierholm, U. (2005). Sound-induced flash illusion as an optimal percept. Neuroreport, 16, 1923–1927.
Telford, L., Howard, I. P., & Ohmi, M. (1995). Heading judgments during active and passive self-motion. Experimental Brain Research, 104, 502–510.
Todd, J. T., Christensen, J. C., & Guckes, K. M. (2010). Are discrimination thresholds a valid measure of variance for judgments of slant from texture? Journal of Vision, 10(2):20, 1–18, http://www.journalofvision.org/content/10/2/20, doi:10.1167/10.2.20.
Trutoiu, L. C., Mohler, B. J., Schulte-Pelkum, J., & Bülthoff, H. H. (2009). Circular, linear, and curvilinear vection in a large-screen virtual environment with floor projection. Computers & Graphics, 33, 47–58.
Vingerhoets, R. A. A., De Vrijer, M., Van Gisbergen, J. A. M., & Medendorp, W. P. (2009). Fusion of visual and vestibular tilt cues in the perception of visual vertical. Journal of Neurophysiology, 101, 1321–1333.
Warren, W. H., & Hannon, D. J. (1988). Direction of self-motion is perceived from optical flow. Nature, 336, 162–163.
Warren, W. H., & Kurtz, K. J. (1992). The role of central and peripheral vision in perceiving the direction of self-motion. Perception & Psychophysics, 51, 443–454.
Warren, W. H., Morris, M. W., & Kalish, M. (1988). Perception of translational heading from optical flow. Journal of Experimental Psychology: Human Perception and Performance, 14, 646–660.
Young, L. R., Dichgans, J., Murphy, R., & Brandt, Th. (1973). Interaction of optokinetic and vestibular stimuli in motion perception. Acta Oto-Laryngologica, 76, 24–31.
Zaichik, L. E., Rodchenko, V. V., Rufov, I. V., Yashin, Y. P., & White, A. D. (1999). Acceleration perception. American Institute of Aeronautics and Astronautics, 4334, 512–520.
Zupan, L. H., Merfeld, D. M., & Darlot, C. (2002). Using sensory weighting to model the influence of canal, otolith and visual cues on spatial orientation and eye movements. Biological Cybernetics, 86, 209–230.
Figure 1
 
Screen capture of the visual stimulus.
Figure 2
 
Data points and fitted psychometric functions for each participant (panels) and condition with 95% confidence intervals for the visual (red), inertial (blue), and combined (green) conditions.
Figure 3
 
Standard deviations (SD; square root of the variance) for each participant and condition. The dots represent the observed inertial (blue), visual (red), and combined (green) visual–inertial SDs. The x-coordinate of the black and green dots indicates the associated visual weight. Black dots mark the MLI-predicted optimal weights (x-coordinates) and corresponding SDs (y-coordinates). If MLI theory applies, the black and green dots should lie close to each other.
Table 1
 
Heading angles used in the three conditions. V, I, and C stand for the visual, inertial, and combined visual–inertial condition, respectively.
Heading angle α

| Condition | | | | 10° | 15° | 20° |
| V | x | x | x | x | x | x |
| I | x | x | x | x | x | x |
| C | x | x | x | x | x | x |
Table 2
 
Model log likelihoods and parameters. Pearson's χ² goodness-of-fit test results are presented for the unrestricted model; p-values for χ² are based on a chi-square distribution with 12 degrees of freedom, and significant goodness-of-fit results indicate poor model fit. p-values for the likelihood ratio tests are based on a chi-square distribution with 1 degree of freedom; small p-values thus indicate that the MLI hypothesis on the variance has to be rejected.
| Participant | Log likelihood (unrestricted) | Pearson χ² | p (goodness of fit) | Log likelihood (MLI-constrained) | LR | p (LR test) |
| 1 | −85.69 | 10.84 | 0.542 | −86.02 | 0.65 | 0.419 |
| 2 | −69.87 | 21.77 | 0.040 | −74.89 | 10.03 | 0.002 |
| 3 | −69.31 | 44.31 | 0.000 | −75.14 | 11.67 | 0.001 |
| 4 | −88.61 | 17.11 | 0.146 | −93.04 | 8.85 | 0.003 |
| 5 | −82.62 | 7.68 | 0.810 | −86.05 | 6.87 | 0.009 |
| 6 | −67.18 | 12.97 | 0.371 | −79.74 | 25.13 | 0.000 |
| 7 | −51.75 | 12.80 | 0.384 | −71.71 | 39.92 | 0.000 |
| 8 | −75.76 | 13.57 | 0.330 | −84.24 | 16.97 | 0.000 |
| 9 | −71.53 | 9.38 | 0.671 | −89.58 | 36.10 | 0.000 |
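The likelihood ratio tests in Table 2 can be reproduced from the tabled log likelihoods alone; with one constrained parameter the reference distribution is chi-square with 1 degree of freedom, whose survival function has a closed form. A short sketch:

```python
import math

def lr_test(ll_unrestricted, ll_constrained):
    """Likelihood ratio statistic and p-value for one nested constraint
    (df = 1), as used to test the MLI-constrained model against the
    unrestricted psychometric model."""
    lr = 2.0 * (ll_unrestricted - ll_constrained)
    # chi-square(1) survival function: P(X > lr) = erfc(sqrt(lr / 2))
    p = math.erfc(math.sqrt(lr / 2.0))
    return lr, p

# Participant 2 from Table 2: LR = 2(-69.87 - (-74.89)) = 10.04,
# matching the tabled 10.03 up to rounding of the log likelihoods.
lr, p = lr_test(-69.87, -74.89)
```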
Table 3
 
Statistical comparison between the parameters of the psychometric curves for the visual (V), inertial (I), and combined (C) conditions.
| Comparison | Participant 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| V vs. I | * | * | * | * | * | * | * | * | ns |
| V vs. C | * | * | ns | * | * | * | * | * | ns |
| I vs. C | * | ns | * | * | * | ns | ns | ns | ns |

“ns” stands for “not significant”; *p < 0.05.
