Open Access
Article  |   May 2025
Eye-movement patterns for perceiving bistable figures
Author Affiliations
  • Yi-Hsuan Hsu
    Department of Psychology, National Taiwan University, Taipei, Taiwan
  • Chien-Chung Chen
    Department of Psychology, National Taiwan University, Taipei, Taiwan
    Neurobiology and Cognitive Science Center, National Taiwan University, Taipei, Taiwan
Journal of Vision May 2025, Vol.25, 3. doi:https://doi.org/10.1167/jov.25.6.3
      Yi-Hsuan Hsu, Chien-Chung Chen; Eye-movement patterns for perceiving bistable figures. Journal of Vision 2025;25(6):3. https://doi.org/10.1167/jov.25.6.3.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Bistable figures can generate two different percepts alternating with each other. It is suggested that eye fixation plays an important role in bistable figure perception because it helps us selectively focus on certain image features. We tested how the shift of percept is related to the eye-fixation pattern and whether inhibition of return (IOR) plays a role in this process. IOR refers to the phenomenon where, after attention remains at the same image location for a period, the inhibition to the mechanisms supporting that location increases. Consequently, visual attention shifts to a new location, and reallocation to the original location is suppressed. We used an eye tracker to record the observers’ eye movements during observation of the duck/rabbit figure and the Necker cube while recording their percept reversals. In Experiment 1, we showed there were indeed different eye fixation patterns for different percepts. In addition, fixation shifts across different regions occurred before the percept reversal. In Experiment 2, we examined the influence of inward bias on the duck/rabbit figure and found that it had a significant effect on the first percept but that this effect diminished over time. In Experiment 3, a mask was added to the attended region to remove the local saliency. This manipulation increased the number of percept reversals and fixation shifts across different regions. That is, the change in local saliency can cause a fixation shift and thus reverse our perception. Our results show that what we perceive depends on where we look.

Introduction
Bistable figures are images that can be interpreted in two distinct ways with the same physical structure. When viewing such figures, observers may experience spontaneous percept reversals between the two possible interpretations (Long & Toppino, 2004). Classic examples include the duck/rabbit figure (Torrey, 1970), which can be perceived as either a duck facing left or a rabbit facing right, and the Necker cube (Necker, 1832), which can be perceived either as if the cube is viewed from above (top view) or below (bottom view). 
Even at the beginning of the scientific investigation of bistable figures, the role of eye fixation and attention was noted. Necker (1832) suggested that the shift of the percept in the Necker cube is affected by eye fixation at different positions in the figure. Modern authors have confirmed this observation (Meng & Tong, 2004). Necker (1832) attributed this effect to an optical cause (distinct vision at fixation), but modern authors consider it an attentional effect (Kawabata, Yamagami, & Noaki, 1978; Meng & Tong, 2004; Peterson & Gibson, 1991; Sato, Laeng, Nakauchi, & Minami, 2020; Toppino, 2003). 
Rather than instructing observers to shift their eye fixation actively, some studies have used eye-tracking techniques to show that, under free viewing, observers’ eye-fixation patterns varied with the changing percept of the Necker cube (Glen, 1940; Ellis & Stark, 1978; Einhäuser, Martin, & König, 2004; Polgári, Causin, Weiner, Bertschy, & Giersch, 2020; van Dam & van Ee, 2005, 2006). Nakatani, Orlandi, and van Leeuwen (2011) showed that eye blinks and saccades occurred during percept reversals, and this result was considered a representation of the reallocation of attentional resources. Ellis and Stark (1978) showed that viewers often fixated on the vicinity of the vertices of the Necker cube during perceptual reversals. Einhäuser et al. (2004) found that changes in the perceived orientation of the Necker cube were more likely to occur when eye fixations were at the extreme positions of the eye movements, suggesting that percept shifts are linked to exploratory gaze patterns. Van Dam and van Ee (2005, 2006) showed that saccades and fixations correlate with perceptual flips in the Necker cube. Kietzmann, Geuter, and König (2011) also showed distinct eye-fixation patterns for different percepts of ambiguous patterns. These results demonstrated that the eye-fixation pattern changes while an observer is viewing a bistable figure and that such change is correlated with the shift of the percept. 
Several theoretical frameworks aim to explain the change of the fixation pattern over time. Koch and Ullman (1987) suggested that visual attention and eye fixation can be captured by salient image components. Itti and Koch (2000; also see Itti, Koch, & Niebur, 1998) developed a saliency model in which the visual system combines the responses of the channels for color, luminance intensity, and orientation to create a saliency map. Later authors emphasized the relative importance of higher-level structures and objects (Elazary & Itti, 2008; Stoll, Thrun, Nuthmann, & Einhäuser, 2015) for saliency computation. In this model, the most salient location in an image is the one containing the most important information, and it attracts the focus of attention (FOA), a small disk representing the part of the image an observer attends to or fixates on (Itti & Koch, 2000). However, after a prolonged fixation, the inhibition of the neural mechanisms responsive to the current FOA increases, which leads to reduced saliency of the fixated image component. The FOA then shifts to the next most salient location and will not return to the original location until its underlying neural mechanism recovers from inhibition. This phenomenon is known as inhibition of return (IOR). Note that IOR was originally observed in shifts of covert attention (Posner & Cohen, 1984; Klein, 2000). A saliency map may not be necessary for IOR to work under different experimental paradigms for attention. Nevertheless, this framework has also been widely utilized in predicting human fixation patterns (Foulsham & Underwood, 2007; Parkhurst, Law, & Niebur, 2002; Parkhurst & Niebur, 2003; Underwood, Foulsham, van Loon, Humphreys, & Bloyce, 2006). 
In sum, the image component that an observer focuses on has a profound effect on the percept (Meng & Tong, 2004; Necker, 1832), and observers viewing a bistable pattern do show a change in fixation pattern that is correlated with the percept shift (Einhäuser et al., 2004; Kietzmann et al., 2011). In addition, it is suggested that the tracks of eye movement can be explained by the saliency of the image components and IOR. Thus, it is possible that the percept shifts on bistable figures are due to the saliency of the image components. That is, if there are two salient regions within a bistable figure, each containing features that lead to different interpretations, the fixation shift caused by IOR could bring about a reversal of our interpretations of the image. After all, shifting fixation to another location means the observer would acquire different information and, consequently, a different percept. To test this hypothesis, in Experiment 1, we used an eye tracker to record the observers’ eye movements when they viewed the bistable figures because eye movements can usually imply the corresponding shift of fixation (Hoffman & Subramaniam, 1995; Shepherd, Findlay, & Hockey, 1986). The fixation maps under different percepts should have dissimilar patterns centered on different salient regions. Furthermore, besides the most frequently used Necker cube, we also included the duck/rabbit figure, which, to our knowledge, has never been used in past eye-movement studies, to test the generality of the earlier eye-fixation findings. 
In addition to image features, other factors can also influence the perception of bistable figures (Goolkasian & Woodberry, 2010; Rock, Gopnik, & Hall, 1994). Inward bias is one such factor. Chen and Scholl (2014) manipulated the location of the duck/rabbit figure within a surrounding frame and measured the observers’ first percept upon seeing this figure as well as the total duration of different percepts during the stimulus presentation. When the figure was placed near the right border of the frame, the observers had the highest probability of perceiving it as a duck, whereas placing the figure near the left border yielded the lowest probability. When the figure was in the center, both percepts occurred with an equal probability. This inward bias (Palmer, Gardner, & Wickens, 2008) was more pronounced for the first percept. In Experiment 2, we further investigated whether the inward bias would affect the shift of FOA and, in turn, the effect of saliency in bistable figures. 
In the saliency model by Itti and Koch (2000), regions with higher saliency are more likely to attract our visual attention. In other words, if the local saliency is removed, visual attention is more likely to shift to another salient location. This should lead to a quicker shift of FOA and, in turn, more frequent percept reversals on bistable figures. In Experiment 3, we introduced a mask to block the part of the figure that the observers were fixated on while viewing the bistable figures. This mask effectively eliminated the salient information from the masked region and, in turn, caused the fixation to relocate to another salient area. In this situation, the observers should make more fixation shifts between different regions of interest (ROIs) and thus show a greater frequency of percept reversals. 
Experiment 1
We first investigated whether the FOA shift across regions was associated with different percepts. We tested whether there was a fixation pattern related to each percept of the bistable figures and how such patterns change over time before and after the percept reversal. 
Method
Apparatus
The stimuli were presented on a 19-inch monitor with 1,280 (H) × 1,024 (V) spatial resolution and a refresh rate of 60 Hz in a small, quiet room. The room was normally lit (12.2 lux in front of the display) from a fluorescent light above. The observers sat in front of the monitor at a viewing distance of 60 cm, with their heads stabilized by a chin rest. At this distance, each pixel on the monitor subtended a visual angle of 0.028° × 0.028° at the center of the display. An Eyelink 1000 Plus (SR Research, Ottawa, Canada) corneal-reflection eye tracker was used to record the observers’ right-eye movements at a sampling rate of 1,000 Hz. The eye tracking was achieved with a recording camera placed below the monitor. The recording camera was adjusted to the height and angle of the observer’s face in advance. 
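The per-pixel visual angle quoted above can be checked with a short calculation. The sketch below is only illustrative: it assumes that the 19-inch figure is the viewable diagonal and that the pixels are square (neither is stated in the text), and the function name is hypothetical.

```python
import math

# Values taken from the text.
H_PIXELS, V_PIXELS = 1280, 1024
VIEWING_DISTANCE_MM = 600.0

# Assumption (not stated in the text): 19 inches is the viewable diagonal
# and the pixels are square.
DIAGONAL_MM = 19 * 25.4


def pixel_visual_angle_deg(diagonal_mm, h_pixels, v_pixels, distance_mm):
    """Visual angle (degrees) subtended by one pixel at the display center."""
    diagonal_pixels = math.hypot(h_pixels, v_pixels)
    pixel_pitch_mm = diagonal_mm / diagonal_pixels
    return math.degrees(2 * math.atan(pixel_pitch_mm / (2 * distance_mm)))


angle = pixel_visual_angle_deg(DIAGONAL_MM, H_PIXELS, V_PIXELS, VIEWING_DISTANCE_MM)
print(f"{angle:.3f} deg per pixel")
```

Under these assumptions the result rounds to the 0.028° reported above.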
The eye-tracker calibration routine and stimuli generation were all written in MATLAB R2021a (https://www.mathworks.com/) with the PsychToolbox-3 (Brainard, 1997). The stimulus presentation computer was synchronized with the eye tracker through Transmission Control Protocol/Internet Protocol (TCP/IP). Observers were asked not to move their heads during the experiment. They made responses on a gamepad according to their perception change. The direction of a joystick on the gamepad was either left or right, forcing observers to make a choice. Apparatus settings were the same for all three experiments reported in this article. 
Observers
Twenty-three observers (11 women), aged between 21 and 35 years, participated in Experiment 1. All had normal (20/20) or corrected-to-normal vision. Informed consent was received from each observer before the experiment started. After completing the experiment, the observers received financial compensation for their time. 
Stimuli
There were two black-and-white bistable figures used in this experiment: the duck/rabbit figure (Torrey, 1970) and the Necker cube (Necker, 1832). The size of the duck/rabbit figure was 1,024 (H) × 630 (V) pixels, and that of the Necker cube was 600 (H) × 540 (V) pixels. The black-and-white images had luminance ranging from 0.12 to 101.7 cd/m². The figures were always presented in the center of the screen. Each figure was presented in two runs. In total, there were four runs. The order of the runs was randomized for each observer. 
Procedure
After signing informed consent, observers were escorted to the testing room. They were first shown the two experimental bistable figures to ensure they could successfully perceive and understand both interpretations of each stimulus. This step was necessary because the study aimed to investigate the change of eye movements during the percept reversal. Before the experiment started, a practice session was conducted to familiarize the observers with the task. The data from the practice session were not included in the data analysis. The practice session contained two runs that lasted 15 s each. The main experiment started after the observers had fully understood the experimental tasks. 
The main session started with calibration and validation of the eye tracker with a 9-point (3 × 3) grid and a 1-point drift check. The experiment started immediately after a successful validation (error must be less than 1° on average and less than 1.5° at maximum). Before each run, the screen showed an instruction to remind the observers of the two percepts of the figure and their corresponding responses on the joystick with left and right arrows. Observers pressed a button when ready. After that, a black fixation point located at the center of the screen was shown for 200 ms, followed by the 30-s stimulus presentation, where the eye movements were recorded (see Figure 1). During the experiment, the observers had to continuously report their percept by switching the joystick to the corresponding direction. If they experienced a percept reversal, they had to move the joystick to the other side as soon as possible; otherwise, they should keep the joystick at the same position. Making no choice and leaving the joystick at the center were not allowed. 
Figure 1.
 
The experimental procedure in a single run.
The total duration of Experiment 1 was approximately 10 min. In the middle of the experiment, we conducted a drift check to ensure the validity of the eye-tracking recording in case the observer’s head moved away from the original position. When viewing the images, the observers were allowed to move their eyes freely and were asked not to control their percept intentionally. 
Data analysis
The time course for the eye movement was time-locked to each finger response. The eye positions recorded during blinks or while the observer looked away from the test stimuli (2.9% of all data) were not included in further data analysis. We then calculated the number of eye fixations at each pixel position. For the visualization of fixation patterns, we first divided the whole screen into an 80 × 64 grid of cells, each 16 × 16 pixels. Then we calculated the frequency of eye-fixation samples inside each cell to derive the fixation density. Throughout the article, the fixation density is shown as heatmaps, with brighter colors indicating a higher fixation density, and it was smoothed by a Gaussian filter with a space constant (“standard deviation”) of 16 pixels, or the width of one grid cell, for better visualization. 
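The binning and smoothing steps described above can be sketched as follows. This is a minimal illustration and not the authors' MATLAB code: the function names, the kernel truncation radius of three cells, and the zero padding at the screen edges are all assumptions made for the example.

```python
import math

CELL_PX = 16               # cell size in pixels (from the text)
N_COLS, N_ROWS = 80, 64    # 1,280 / 16 and 1,024 / 16 (from the text)


def bin_samples(samples):
    """Count gaze samples per grid cell; at 1,000 Hz one sample is about 1 ms."""
    grid = [[0.0] * N_COLS for _ in range(N_ROWS)]
    for x, y in samples:
        col, row = int(x // CELL_PX), int(y // CELL_PX)
        if 0 <= col < N_COLS and 0 <= row < N_ROWS:  # drop off-screen samples
            grid[row][col] += 1.0
    return grid


def gaussian_kernel(sigma_cells=1.0, radius=3):
    """Normalized 1-D Gaussian; sigma of 1 cell equals 16 pixels.

    The truncation radius is an assumption, not a value from the text.
    """
    weights = [math.exp(-0.5 * (k / sigma_cells) ** 2)
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(values, kernel):
    """Convolve one row or column, treating cells beyond the edge as zero."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(values):
                acc += w * values[j]
        out.append(acc)
    return out


def smooth(grid, kernel):
    """Separable 2-D Gaussian smoothing: rows first, then columns."""
    rows = [smooth_1d(row, kernel) for row in grid]
    cols = [smooth_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Synthetic gaze samples (x, y) in screen pixels; not real data.
samples = [(640.0, 512.0)] * 300 + [(100.0, 900.0)] * 50 + [(-5.0, 10.0)]
raw = bin_samples(samples)
density = smooth(raw, gaussian_kernel())
```

Because the kernel is normalized, smoothing redistributes fixation time between neighboring cells without changing the total, as long as the fixations lie away from the screen border.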
The regions of interest were determined from the fixation map. We summed the number of eye-fixation samples inside each ROI across all the observers in 100-ms bins from 1,000 to 0 ms before each response. This fixation count per millisecond was further scaled by the area (deg²) of the ROI. We defined the time point of the fixation shift as the moment at which the fixation density in the destination ROI exceeded that in the starting ROI. 
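The ROI time course and the fixation-shift time point described above can be sketched as below. This is a hedged illustration: the ROI coordinates, the synthetic gaze trace, and the use of linear interpolation between bin centers to locate the crossing are assumptions for the example, since the text does not state how the crossing was located within a bin.

```python
BIN_MS = 100       # bin width (from the text)
WINDOW_MS = 1000   # analysis window before each response (from the text)
PX_DEG = 0.028     # degrees per pixel (from the Apparatus section)


def in_roi(x, y, roi):
    """roi = (left, top, width, height) in pixels."""
    left, top, width, height = roi
    return left <= x < left + width and top <= y < top + height


def roi_area_deg2(roi):
    _, _, width, height = roi
    return (width * PX_DEG) * (height * PX_DEG)


def roi_timecourse(samples, roi):
    """Fixation density per bin: samples per ms, scaled by ROI area (deg^2).

    samples: (t_ms, x, y) with t_ms relative to the response (negative = before).
    """
    counts = [0] * (WINDOW_MS // BIN_MS)
    for t, x, y in samples:
        if -WINDOW_MS <= t < 0 and in_roi(x, y, roi):
            counts[int((t + WINDOW_MS) // BIN_MS)] += 1
    area = roi_area_deg2(roi)
    return [c / BIN_MS / area for c in counts]


def crossover_time(start, dest):
    """Time (ms, negative = before response) at which dest first exceeds start.

    Linear interpolation between bin centers is an assumption of this sketch.
    """
    centers = [-WINDOW_MS + BIN_MS * (i + 0.5) for i in range(len(start))]
    diff = [d - s for s, d in zip(start, dest)]
    for i in range(1, len(diff)):
        if diff[i - 1] <= 0 < diff[i]:
            frac = -diff[i - 1] / (diff[i] - diff[i - 1])
            return centers[i - 1] + frac * (centers[i] - centers[i - 1])
    return None


# Hypothetical ROIs and a synthetic trace that jumps ROIs 500 ms before response.
start_roi = (100, 100, 200, 200)
dest_roi = (500, 100, 200, 200)
trace = [(t, 150, 150) if t < -500 else (t, 550, 150) for t in range(-1000, 0)]

start_curve = roi_timecourse(trace, start_roi)
dest_curve = roi_timecourse(trace, dest_roi)
shift_time = crossover_time(start_curve, dest_curve)
print(shift_time)
```

For pooled data, the sample list would simply contain the time-locked samples from all responses and observers.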
Results
Fixation density
Figure 2 illustrates eye-fixation density maps for both the duck/rabbit figure (top row) and the Necker cube (bottom row). The pseudocolor indicates fixation density for grids with a total fixation duration greater than 50 ms. For the duck/rabbit figure, when the observers reported seeing a rabbit, their eyes were fixated on the regions on the right side of the figure that contained the image features resembling a rabbit's eye and mouth. In contrast, when they reported seeing a duck, they fixated on the regions on the left part, which contained features resembling a duck's eye and mouth. Overall, it appears that eyes and mouths are crucial features in forming perception. To identify regions associated with each percept, we calculated the difference between the density maps of the duck and rabbit (upper right in Figure 2). Blue regions indicate where the observers were more likely to fixate their eyes (t(22) = 8.35, p < 0.0001) when perceiving the figure as a duck, whereas the red regions were more likely to be fixated on (t(22) = 6.32, p < 0.0001) when perceiving the figure as a rabbit. Based on this result, we determined the duck (450 (H) × 280 (V)) and rabbit (280 (H) × 350 (V)) ROI, indicated by blue and red rectangles in Figure 2, respectively, for subsequent analysis. 
Figure 2.
 
The average fixation density maps under the two percepts for the duck/rabbit figure (top) and the Necker cube (bottom). The left and center panels show the fixation density map of different percepts of each figure. The pseudocolor indicates fixation density. The right-most panel shows the difference between the two density maps of each figure (rabbit minus duck for the top row and bottom view minus the top view for the bottom row). The unit of the color bar is the fixation duration (ms) per pixel. The red rectangle indicates the ROI for the rabbit or the bottom view, and the blue rectangle indicates the ROI for the duck or the top view. In the duck/rabbit figure, the green rectangle represents the ROI with fixations for both percepts.
For the Necker cube, the observers showed increased fixation in the upper-right regions of the figure when they perceived it from the bottom view and in the lower-left regions when they perceived it from the top view. As with the duck/rabbit figure, we calculated the difference between the density maps for the top and bottom views. The blue regions indicate higher fixation density for the top view, and the red regions indicate higher fixation density for the bottom view. We thus determined the ROIs for the top view (blue rectangle in Figure 2) and bottom view (red rectangle in Figure 2). Both ROIs had the same size (260 (H) × 400 (V)). 
Time course of fixation shift
Figure 3 shows the change of fixation density before the percept reversal from rabbit to duck (Figure 3a) and from duck to rabbit (Figure 3b). The cluster of fixation density shifted to the right (to the mouth of the rabbit) and then to the left (to the beak of the duck) before the observers reported seeing the duck. Conversely, before the observers reported the rabbit percept, the cluster of fixation density shifted in the opposite direction. Figure 4 shows the change of fixation density, pooled across the two types of percept change, in the starting and the destination ROIs. The fixation density in the starting ROI decreased over time, and that in the destination ROI increased. The two fixation density curves crossed each other at 485 ms before the finger response, at which point the fixation density in the destination ROI started to surpass that in the starting ROI. Similar results were found for the Necker cube (Figure 5), where the fixation density in the destination ROI started to surpass that in the starting ROI at 244 ms before the finger response (see Figure 6). Note that a shift of fixation is the consequence of a saccade, which takes about 200 ms to initiate and 20 to 40 ms to execute (Fischer & Ramsperger, 1984). The decision to make the eye movement would therefore occur 480 to 720 ms before the finger response for perceptual reversal. Compare this time course to the 200-ms reaction time needed for making the response (Haith & Bestmann, 2020) or the estimated 340-ms interval between the decision for perceptual reversal and the manual response (Kornmeier & Bach, 2012): the decision to make the eye movement should take place long before the decision to report the percept reversal. To put this time course into perspective, in our experiment, the duration of each percept reported by the observers was, on average, 2,435 ms (SD = 395 ms) for the duck/rabbit figure and 3,028 ms (SD = 403 ms) for the Necker cube. 
That is, on average, the eye fixation shifts occurred at about 70% to 85% of the percept duration. 
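The timing argument above can be traced with a few lines of arithmetic. The sketch below shows one reading of how the reported numbers combine (crossover time plus saccade latency plus saccade duration); the text does not spell out the exact calculation, so the function names and the way the terms are added are assumptions.

```python
SACCADE_LATENCY_MS = 200        # time to initiate a saccade (from the text)
SACCADE_DURATION_MS = (20, 40)  # time to execute a saccade (from the text)

# Crossover times and mean percept durations reported in the text.
figures = {
    "duck/rabbit": {"crossover_ms": 485, "percept_ms": 2435},
    "Necker cube": {"crossover_ms": 244, "percept_ms": 3028},
}


def decision_lead_ms(crossover_ms):
    """Earliest and latest estimate of the eye-movement decision time,
    in ms before the finger response."""
    shortest, longest = SACCADE_DURATION_MS
    base = crossover_ms + SACCADE_LATENCY_MS
    return (base + shortest, base + longest)


def fraction_elapsed(lead_ms, percept_ms):
    """Fraction of the mean percept duration that has passed at the decision."""
    return 1.0 - lead_ms / percept_ms


for name, values in figures.items():
    lo, hi = decision_lead_ms(values["crossover_ms"])
    early = fraction_elapsed(hi, values["percept_ms"])
    late = fraction_elapsed(lo, values["percept_ms"])
    print(f"{name}: decision {lo}-{hi} ms before response, "
          f"{early:.0%}-{late:.0%} of the percept duration")
```

Under this reading, the decision falls roughly 460 to 725 ms before the response, close to the 480 to 720 ms stated above, and at about 70% (duck/rabbit) and 85% (Necker cube) of the mean percept duration.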
Figure 3.
 
The fixation density map for every 200-ms period with a 100-ms bin width (± 50 ms) before the behavioral response from rabbit to duck (a) and from duck to rabbit (b). The red and blue rectangles are the rabbit and duck ROIs, respectively.
Figure 4.
 
The change of fixation density for viewing the duck/rabbit figure from 1,000 ms before the finger response. The blue and red curves indicate the fixation density within the starting and the destination ROI, respectively. The dashed line indicates the time of equal fixation for the ROIs. The shaded area denotes the 1 standard error of measurement.
Figure 5.
 
The fixation density map for every 200-ms period with a 100-ms bin width (± 50 ms) before the behavioral response for the top view (a) and bottom view (b). The red and blue rectangles denote the ROI of the bottom and top view, respectively.
Figure 6.
 
The change of fixation density for viewing the Necker cube from 1,000 ms before the finger response. The meanings of the symbols and curves are the same as in Figure 4.
Discussion
The eye-fixation density maps revealed that the observers indeed fixated on different regions of a bistable figure when they reported different percepts. This result is consistent with Ellis and Stark (1978), Einhäuser et al. (2004), and Kietzmann et al. (2011), who also showed that fixation patterns correlate with percept shifts during the viewing of a bistable pattern. 
The fixation patterns, however, do not seem to match what one would expect from image saliency. For the Necker cube, one would expect the most salient locations of the image to be the vertices where three lines join. Instead, the fixation density was highest in the two regions along the two central vertical lines of the cube (Figure 2), which are obviously less salient than the vertices. Thus, image saliency seems to have limited power to predict fixation patterns. 
Note that the left fixation site on the Necker cube, indicated by the blue rectangle in Figure 2, is at the center of the foremost surface of the Necker cube in the top view. Similarly, the right fixation site, indicated by the red rectangle in Figure 2, is at the center of the foremost surface of the cube in the bottom view. One possibility is that the frontal surfaces might provide more meaningful information for observers to form new percepts. When an observer attends to one of the two surfaces, that surface is more likely to be seen in front of the other one. That is, whichever surface of the cube is attended can determine which angle of view is perceived. After all, it is known that the attended surface tends to be considered the figure rather than the background in figure-ground segregation (Driver & Baylis, 1996; Vecera, Flevaris, & Filapek, 2004). The attended surface thus plays an important role in how an observer interprets a scene. Furthermore, it is known that observers tend to fixate on the centers of objects rather than on the edges predicted by the saliency model (Foulsham & Kingstone, 2013; Nuthmann & Henderson, 2010). Thus, it is not surprising that the observers fixated on the center of the surface, rather than the vertices, as our data show. 
For the Necker cube, our data also showed that the participants fixated on the left region when they shifted their percept from the bottom view to the top view and on the right region when they shifted their percept from the top view to the bottom view. Our result is consistent with that of Ellis and Stark (1978) and Einhäuser et al. (2004), who reported a left-side bias of the fixation pattern when observers shifted their percept to the top view and a right-side bias when they shifted to the bottom view. However, these results are inconsistent with Meng and Tong (2004), who suggested that fixating on the upper-right corner of the square made observers more likely to perceive the cube from the top view, whereas fixating on the lower-left corner made them perceive the cube from the bottom view. This discrepancy may arise from differences in experimental tasks. In our study, as well as in Ellis and Stark (1978) and Einhäuser et al. (2004), observers were allowed to freely view the entire figure. Thus, we measured how the observers actively acquired information from the stimuli. Meng and Tong's observers were asked to focus on specific corners, which also inadvertently directed their attention to the corresponding surface (Kawabata, 1986) and limited the observers’ information intake. Such a difference may have consequences for how an observer interprets the Necker cube. As discussed above, it seems that the observers consider the fixated surface to be in front and construct their percepts accordingly. On the other hand, the lower-left vertex of the Necker cube is the foremost corner in the bottom view, and the upper-right vertex is the foremost corner in the top view. In Meng and Tong (2004), the observers therefore seem to have interpreted the corner they fixated on as the foremost corner and used it to construct their percept. It is thus possible that observers consider whatever image feature they focus on as the front and use this to interpret the images. 
Our result suggests that the surface is what observers would focus on if they were allowed to inspect the image freely. When asked to focus on the corner, they were forced to use it as the front feature and thus produced the opposite result from free viewing. 
Similarly, in the black-and-white duck/rabbit figure we used, the locations with the most image saliency would be the ones with the highest density of black ink. (There are two reasons for this effect. First, the contrast energy is determined by the sum of the squared luminance difference between the black pixels and the white background, which occupies most of the image area and dominates the mean luminance. Thus, a higher density of black pixels results in more luminance changes, leading to higher contrast energy. Second, the increased density of black pixels leads to lower mean luminance in the local region compared to the background, thereby resulting in a higher Michelson or Weber contrast.) However, our observers tended to focus on the eye and mouth of a rabbit or duck rather than on those darker areas (Figure 2). This focus on the mouth and eye regions may be explained by their significance in face identification (Abudarham & Yovel, 2016; Hessels, 2020) and their association with facial representations (Rhodes, 1988; Sekiguchi, 2011). Indeed, faces can capture visual attention very rapidly, even when observers are told not to focus on them (Crouzet, Kirchner, & Thorpe, 2010; Langton, Law, Burton, & Schweinberger, 2008). Cerf, Harel, Einhäuser, and Koch (2007) even proposed that faces require their own saliency channel to increase the accuracy of predicted salient regions in figures containing faces. Thus, our result suggests that the saliency of the duck/rabbit figure is not solely dependent on early visual features but rather on the implied meaning of those image components in face perception. 
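The parenthetical argument about black-pixel density can be illustrated numerically. The sketch below is a toy example: the patch size, the black-pixel fractions, and the function name are hypothetical, and the contrast definitions follow the verbal description in the paragraph above rather than any formula given by the authors. Only the two luminance values come from the Stimuli section.

```python
L_BLACK, L_WHITE = 0.12, 101.7  # cd/m², luminance range from the Stimuli section


def patch_stats(black_fraction, n_pixels=256):
    """Contrast measures for a patch with a given fraction of black pixels."""
    n_black = round(black_fraction * n_pixels)
    # Contrast energy: summed squared luminance difference between the
    # black pixels and the white background.
    energy = n_black * (L_WHITE - L_BLACK) ** 2
    # Mean luminance of the patch drops as more pixels are black.
    mean_lum = (n_black * L_BLACK + (n_pixels - n_black) * L_WHITE) / n_pixels
    # Weber contrast of the patch mean relative to the white background.
    weber = (mean_lum - L_WHITE) / L_WHITE
    # Michelson contrast between the white background and the patch mean.
    michelson = (L_WHITE - mean_lum) / (L_WHITE + mean_lum)
    return {"energy": energy, "mean": mean_lum,
            "weber": weber, "michelson": michelson}


sparse = patch_stats(0.1)  # lightly inked region (hypothetical)
dense = patch_stats(0.4)   # heavily inked region (hypothetical)
for name, stats in (("sparse", sparse), ("dense", dense)):
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In this toy setting the denser patch has higher contrast energy, lower mean luminance, and larger Weber and Michelson contrast magnitudes, which is the pattern a purely luminance-based saliency account would predict.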
Thus, instead of image saliency, when viewing bistable figures, the observers tend to fixate on meaningful features, such as the frontal surfaces of the Necker cube or the eyes and mouths of the duck/rabbit figures. Such a result is consistent with the previous studies showing that objects are salient in complex scenes (Elazary & Itti, 2008; Stoll et al., 2015). Henderson, Brockmole, Castelhano, and Mack (2007) also showed that observers tend to fixate on image features that are semantically informative. 
We also showed that the fixation shift occurred long before the finger response during the viewing of the bistable figures, and this lead time is longer than the reaction time needed for moving the joystick, which is around 200 ms (Haith & Bestmann, 2020). Consider that the shift of fixation itself is the consequence of a saccade, which takes about 200 ms to initiate (Fischer & Ramsperger, 1984). The decision to make the eye movement would be 480 to 720 ms before the finger response for percept reversal. Thus, the decision to make the eye movement should take place long before the decision to report the percept shift, which was measured as about 340 ms before the finger response (Kornmeier & Bach, 2012). Therefore, the shift of fixation is not a consequence of the percept reversal but a driving factor in changing our interpretation of bistable figures. 
Taken together, our result implies that, when viewing a bistable figure, the fixation of the observers is first attracted by a meaningful feature of the image. The observers then construct the percept that is consistent with the meaning of the feature they focus on. 
Experiment 2
In Experiment 2, we examined the inward bias on bistable figures. Chen and Scholl (2014) suggested that the inward bias can influence the percept of the duck/rabbit figure, especially the first percept upon seeing the figure. Here we also used the duck/rabbit figure to test the effect of inward bias on eye-fixation patterns. 
Method
Observers
All observers in Experiment 1 also participated in Experiment 2.
Stimuli
In this experiment, only a duck/rabbit figure with a smaller size (512 (H) × 315 (V)) was used. It could be placed on the left, center, or right sides of the screen; therefore, there were three conditions in total. In the left and right conditions, the center of the image was shifted 5° horizontally from the center of the display. Each condition included two runs for a total of six. All runs were randomly presented to the observers. 
Procedure
All the procedures were the same as those used in Experiment 1. All the observers participated in Experiment 2 immediately after they finished Experiment 1. No additional practice session was conducted before Experiment 2 because the task was quite similar to the first one. The total duration of Experiment 2 was approximately 10 min. 
Results
Fixation density
Figure 7 displays the average density maps of eye fixations under each percept for the three stimulus locations. Regardless of the stimulus location, the fixations were mostly located at the right part of the figure when the observers reported seeing a rabbit and at the left part of the figure when they reported seeing a duck. This result is consistent with that of Experiment 1. We then identified the duck (240 (H) × 210 (V)) and rabbit (190 (H) × 210 (V)) ROIs, denoted by blue and red rectangles, respectively, in Figure 7, with the same procedure as that used in Experiment 1. The relative positions and the sizes of the ROIs were the same across all three conditions. 
Figure 7.
 
The average fixation density maps for the two percepts of the duck/rabbit figure placed at the left (top row), center (middle row), and right (bottom row) locations. The pseudocolor indicates the fixation duration (ms) per pixel. The red and blue rectangles are the rabbit and duck ROIs, respectively. The green rectangles represent the ROI for the overlapped area of the two density maps.
Figure 8.
 
The percentage of the first percept of the duck/rabbit figure at the left, center, and right locations. The orange and cyan bars indicate the percentages of duck and rabbit percepts, respectively. **p < 0.01 Bonferroni corrected.
First percept
Each observer's first response in each run at the three locations was extracted to calculate the total percentages of the first percept. Figure 8 shows the results. When the duck/rabbit figure was positioned on the left side of the screen, the percentage of seeing a rabbit first was higher (rabbit: 65.2%; duck: 34.8%). On the other hand, when it was on the right side, the percentage of seeing a duck first was higher (rabbit: 32.6%; duck: 67.4%). When the figure was centered, the percentage of each percept was almost equal (rabbit: 47.8%; duck: 52.2%). Three pairwise chi-squared tests were conducted to compare the percentages of first perceiving the duck and rabbit between locations (left, center, and right). There was a significant difference between left and right (χ²(1, N = 46) = 9.79, p = 0.0018 < α = .017, Bonferroni corrected from .05) but not between left and center (χ²(1, N = 46) = 2.83, p = 0.092) or between center and right (χ²(1, N = 46) = 2.22, p = 0.14). The results support the notion that when the figure appears near the surrounding border, it is more likely to be seen as facing inward for the first percept. 
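The pairwise comparisons above can be checked with a short sketch. It assumes a Pearson chi-squared test on a 2 × 2 table without continuity correction, and it assumes rabbit/duck counts back-calculated from the reported percentages with 46 first responses per location (the N given in the tests). The function name `chi2_2x2` and the count table are illustrative and not taken from the authors' analysis code.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = num / den
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# ASSUMED counts (rabbit first, duck first) out of 46 first responses per
# location, inferred from the reported percentages (e.g., 30/46 = 65.2%).
first_percept = {"left": (30, 16), "center": (22, 24), "right": (15, 31)}

alpha = 0.05 / 3  # Bonferroni correction for three comparisons

results = {}
for loc_a, loc_b in [("left", "right"), ("left", "center"), ("center", "right")]:
    chi2, p = chi2_2x2(*first_percept[loc_a], *first_percept[loc_b])
    results[(loc_a, loc_b)] = (chi2, p)
    print(f"{loc_a} vs {loc_b}: chi2 = {chi2:.2f}, p = {p:.4f}, significant = {p < alpha}")
```

Under these assumptions, the three statistics round to the reported values of 9.79, 2.83, and 2.22, and only the left versus right comparison falls below the corrected threshold.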
Total duration of each percept
We calculated the percentage durations of the two percepts in a run in each condition (left: rabbit 54.2%, duck 45.8%; center: rabbit 51.9%, duck 48.1%; right: rabbit 49.9%, duck 50.1%) and the difference between them (Figure 9). Unlike the first percept, the total duration of perceiving the duck and rabbit throughout the run was about the same at each location. There was no significant difference between locations (left and right, t(45) = 1.29, p = 0.2; left and center, t(45) = 1.015, p = 0.32; and center and right, t(45) = 0.72, p = 0.47). The results support our prediction that the inward bias diminishes over time. 
Figure 9.
 
The percentage of the total duration of the duck or rabbit percept for the duck/rabbit figure at the left, center, and right locations. The orange and cyan bars indicate the percentages of duck and rabbit percepts, respectively.
Duration of percept
Behaviorally, the observers reported seeing the rabbit for, on average, 2,908, 2,241, and 2,512 ms and seeing the duck for 2,343, 1,879, and 2,156 ms in the left, center, and right conditions, respectively. A repeated-measures ANOVA showed that there was no significant location effect (F(2, 44) = 1.04, p = 0.36) for the duration difference between the two percepts. The inward bias did not seem to affect the duration of the percept. Some readers may be interested in whether a larger stimulus would make a percept last longer, as a larger image may require the eye or attention to travel a longer distance. We compared the duration between percept shifts in the center condition here with that in Experiment 1, which used a larger stimulus at the same location, and found that the size had a significant effect on percept duration (t(22) = 8.9, p < 0.0001). 
Discussion
Our results confirmed the context effect on the perception of bistable figures. The duck/rabbit figure is more likely to be seen as facing inward (Chen & Scholl, 2014; Palmer et al., 2008). However, this preference seems to only affect the first percept of the figure as the overall duration of each percept was about the same. 
Chen and Scholl (2014) showed a significant difference in the rabbit/duck judgment between the left and right conditions, even when the response was pooled throughout the complete session. This may be because each session in their experiment lasted only 15 s, which is half of the 30-s duration we used. The longer duration allowed sufficient time for the observers to experience more percept reversals, which balanced the proportion of each percept. To confirm this, we compared the accumulated percentages of perceiving the rabbit in the windows of 0 to 10 s and 0 to 20 s in each session between the left and right conditions. As shown in Figure 10, for the left condition, the percentage of perceiving the rabbit was the highest for the first response (65.2%) and decreased in the windows of 0 to 10 s (58.2%), 0 to 20 s (54.4%), and 0 to 30 s (54.2%). Conversely, for the right condition, the percentage of perceiving the rabbit was the lowest for the first percept (32.6%) and increased gradually from 0 to 10 s (46.7%) to 0 to 20 s (49.1%) and 0 to 30 s (49.9%). Paired t tests comparing the left and right conditions showed a significant difference in the accumulated percentages of perceiving the rabbit during 0 to 10 s (t(45) = 2.36, p < 0.05) but not during 0 to 20 s (t(45) = 1.8, p = 0.079). As the observation time increased, the percentages of perceiving the rabbit at the two locations gradually converged. Thus, the inward bias in perceiving the duck/rabbit figure weakens with prolonged observation. Similar attenuation of the bias in the first percept after prolonged viewing has been reported in other forms of multistable perception (Carter & Cavanagh, 2007; Chong & Blake, 2006; Hupé & Pressnitzer, 2012; Wegner, Grenzebach, Bendixen, & Einhäuser, 2021). 
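The accumulated percentage for a given window can be illustrated with a minimal sketch. It assumes that each run is stored as a list of (onset in seconds, percept) reports, that a percept lasts until the next report or the end of the 30-s run, and that the percentage is taken over the time with a reported percept. The function `rabbit_share`, the data layout, and the example run are hypothetical and are not the authors' pipeline.

```python
def rabbit_share(events, window_end, run_end=30.0):
    """Percentage of reported-percept time spent on 'rabbit' within [0, window_end].

    events: list of (onset_seconds, percept) sorted by onset. Each percept is
    ASSUMED to last until the next report or until run_end.
    """
    rabbit = 0.0
    total = 0.0
    for i, (onset, percept) in enumerate(events):
        offset = events[i + 1][0] if i + 1 < len(events) else run_end
        # Clip the percept interval to the analysis window.
        start = min(onset, window_end)
        stop = min(offset, window_end)
        span = stop - start
        total += span
        if percept == "rabbit":
            rabbit += span
    return 100.0 * rabbit / total if total > 0 else float("nan")

# Made-up run for illustration only: rabbit is reported first, then the
# reports alternate.
run = [(1.0, "rabbit"), (5.0, "duck"), (8.0, "rabbit"), (12.0, "duck"), (21.0, "rabbit")]
shares = {end: rabbit_share(run, end) for end in (10.0, 20.0, 30.0)}
for end, share in shares.items():
    print(f"0 to {end:.0f} s: rabbit {share:.1f}%")
```

Averaging such per-run values across observers for each window would give curves of the kind plotted in Figure 10; how the authors handled the time before the first report is not stated here, so the denominator used in this sketch is an assumption.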
Figure 10.
 
The accumulated percentages of perceiving the rabbit for different time intervals in the left (blue curve) and right (red curve) conditions. Error bars represent 1 standard error of measurement. *p < 0.05, **p < 0.01 (Bonferroni corrected).
The weakening of inward bias with prolonged observation might be understood in the context of IOR. The initial dominance of one particular percept suggests that inward bias can alter the salience of an image feature and, in turn, the percept of a bistable figure. However, as the observing time increases, the inhibition to the focused location also increases. This reduces the saliency of the first fixated location and shifts the focus to the next location. Eventually, such IOR-induced shifts drown out the effect of the initial preference and balance the probability of perceiving the two interpretations. Thus, unlike the first percept, the proportions of total duration of the two percepts throughout a run showed no significant difference. 
Experiment 3
In Experiment 1, we showed how the time course of the percept reversal relates to that of the fixation shift between the ROIs. Here, we further tested whether the fixation shift could result from a decrease in local saliency. The phenomenon of IOR (Itti & Koch, 2000) suggests that after a prolonged fixation, the inhibition of the neural mechanisms responsive to the currently fixated area increases, which leads to reduced saliency of the fixated image component. The fixation then shifts to the next most salient location. We added a white circular mask to remove the saliency of the fixated area, mimicking the consequence of the inhibition, and observed whether the number of fixation shifts and the percept reversal frequency would both increase compared to Experiment 1.
Method
Observers
Twenty observers (9 women) from Experiments 1 and 2 were recruited for Experiment 3. They were aged between 21 and 33 years. 
Stimuli
Experiment 3 contained all of the stimuli from Experiments 1 and 2. Therefore, there were five conditions in total, including duck/rabbit (big), duck/rabbit (small, left), duck/rabbit (small, center), duck/rabbit (small, right), and the Necker cube. Each condition contained two runs for a total of 10 runs. All the runs were randomly presented to the observers. 
Procedure
The procedure was almost the same as that of Experiment 1, except that a white circular mask would appear at the location the observers fixated on to interrupt the foveal viewing. This removed the saliency of the fixated region as all features there were blocked from view. The mask had a diameter of 4.2°, larger than the 2° diameter of the fovea (Wandell, 1995), considering the approximate 1° accuracy of the eye tracker. For each frame, the center of the mask was placed at the eye-fixation position at the beginning of the previous frame. We also added a practice session to help the observers become familiar with the presence of the mask while viewing the figure. The practice session consisted of two runs, each lasting 15 s. The total duration of Experiment 3 was approximately 25 min. 
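The one-frame lag of the gaze-contingent mask can be sketched as follows. This is a minimal illustration, not the authors' display code: the function names, the pixel coordinates, and the value of `px_per_deg` (pixels per degree of visual angle) are assumptions, and only the 4.2° diameter comes from the text.

```python
def mask_centers(gaze_samples, initial=None):
    """Yield the mask center for each frame.

    gaze_samples: gaze position (x, y) sampled at the beginning of each frame.
    The mask drawn on frame k is centered on the gaze position sampled at the
    beginning of frame k - 1; the first frame has no mask unless `initial` is given.
    """
    previous = initial
    for gaze in gaze_samples:
        yield previous
        previous = gaze

def is_masked(pixel, center, diameter_deg=4.2, px_per_deg=40.0):
    """Return True if `pixel` lies inside the circular mask.

    px_per_deg is a HYPOTHETICAL display scale, not a value from the paper.
    """
    if center is None:
        return False
    radius_px = 0.5 * diameter_deg * px_per_deg
    dx = pixel[0] - center[0]
    dy = pixel[1] - center[1]
    return dx * dx + dy * dy <= radius_px * radius_px

# Made-up gaze samples for three consecutive frames.
gaze = [(100, 100), (120, 100), (300, 200)]
centers = list(mask_centers(gaze))
print(centers)
```

With this scheme the mask always trails the eye by one frame, so a large saccade can briefly expose the new fixation target before the mask catches up; the 4.2° diameter leaves a margin around the 2° fovea for this lag and for tracker error.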
Results
Fixation density
Figure 11 displays the average density maps of eye fixations under the two alternative percepts for all five conditions in Experiment 3. The fixation patterns here closely resembled those in Experiments 1 and 2, suggesting that adding the mask did not significantly impact the locations of fixation. In line with this, the ROIs determined in Experiments 1 and 2 were also employed in Experiment 3.
Figure 11.
 
The average fixation density maps for the two percepts in all five conditions in Experiment 3. From top to bottom: duck/rabbit (big), the Necker cube, duck/rabbit (small, left), duck/rabbit (small, center), and duck/rabbit (small, right). The meanings of pseudocolors and symbols are the same as in Figure 2.
Percept reversal frequency
As shown in Figure 12, percept reversal frequency was determined by counting the number of times the observers moved the joystick from left to right or vice versa per experimental run. For comparison, the data of the same observers in Experiments 1 to 3 were analyzed by a repeated-measures ANOVA, which showed a statistically significant mask effect (F(1, 171) = 150.6, p < 0.001). We also conducted paired t tests to compare percept-reversal frequencies between Experiment 3 and Experiment 1 or 2 for each of the five conditions. The results revealed a significantly higher frequency of percept reversal during foveal masking across all conditions: t(19) = 3.93, 4.10, 4.37, 5.76, and 5.49 for the Necker cube, duck/rabbit (big), duck/rabbit (small, center), duck/rabbit (small, left), and duck/rabbit (small, right), respectively (all p < 0.001). The blocking of the fixated area did increase the frequency of percept reversal. 
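The counting rule described above can be sketched in a few lines. The sketch assumes the joystick trace is a sequence of samples coded as 'L', 'R', or None for a neutral position, and that neutral samples neither count as nor reset a reversal; this coding and the function name `count_reversals` are illustrative assumptions.

```python
def count_reversals(joystick_states):
    """Count left-to-right and right-to-left changes in one run.

    joystick_states: sequence of 'L', 'R', or None (ASSUMED neutral/no report).
    Neutral samples are skipped, so L -> None -> R counts as one reversal and
    L -> None -> L counts as none.
    """
    reversals = 0
    last = None
    for state in joystick_states:
        if state is None:
            continue
        if last is not None and state != last:
            reversals += 1
        last = state
    return reversals

# Made-up trace for illustration only.
trace = ["L", "L", None, "R", "R", "L", None, "L", "R"]
n_reversals = count_reversals(trace)
print(n_reversals)  # 3
```

The per-run counts obtained this way would then be averaged per observer and condition before the paired comparisons between the masked and unmasked experiments.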
Figure 12.
 
The frequency of percept reversals in one experimental run of 30 s. The blue bars indicate the result from Experiments 1 and 2, and the yellow bars indicate that from Experiment 3. DR denotes the duck/rabbit figure conditions. Error bar = 1 standard error. ***p < 0.001.
Fixation shift frequency
The frequency of fixation shifts between the two different ROIs was calculated for each run and observer. Five paired t tests were conducted to compare fixation shift frequencies between Experiment 3 and Experiment 1 or 2 for each condition. Figure 13 shows the results. Again, a repeated-measures ANOVA showed a significant mask effect (F(1, 171) = 94.17, p < 0.001). For all duck/rabbit conditions, the observers’ fixations shifted between different ROIs significantly more frequently in Experiment 3 than in Experiments 1 and 2: t(19) = 2.31, p = 0.016 < α = .02, for duck/rabbit (big) and t(19) = 9.79, 13.01, and 7.96, p < 0.001, for duck/rabbit (small, center), duck/rabbit (small, left), and duck/rabbit (small, right), respectively. This result is aligned with our prediction that both fixation shifts and percept-reversal frequencies would increase in Experiment 3 compared to Experiments 1 and 2. It is noteworthy that the conditions with a higher frequency of perceptual reversal (Figure 12) also exhibit a higher rate of fixation shifts from one ROI to another (Figure 13). Specifically, the correlation between the frequencies of perceptual reversal and eye fixation shifts for the averaged data was 0.85 (df = 18, p < 0.001). 
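A minimal sketch of how shifts between ROIs and their correlation with reversal counts could be computed is given here. The ROI sizes follow the duck and rabbit rectangles of Experiment 2 (with H read as horizontal extent), but the ROI positions, the fixation coordinates, and the rule that fixations outside both ROIs are skipped are assumptions for illustration only.

```python
import math

def roi_of(point, rois):
    """Return the name of the ROI containing `point`, or None."""
    x, y = point
    for name, (left, top, width, height) in rois.items():
        if left <= x < left + width and top <= y < top + height:
            return name
    return None

def count_roi_shifts(fixations, rois):
    """Count changes of ROI across successive fixations.

    Fixations outside every ROI are skipped (an ASSUMPTION of this sketch).
    """
    shifts = 0
    last = None
    for point in fixations:
        roi = roi_of(point, rois)
        if roi is None:
            continue
        if last is not None and roi != last:
            shifts += 1
        last = roi
    return shifts

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL ROI positions (left, top, width, height) in pixels; only the
# widths and heights come from the text.
rois = {"duck": (0, 50, 240, 210), "rabbit": (300, 50, 190, 210)}
# Made-up fixation sequence: duck, duck, rabbit, outside, rabbit, duck.
fixations = [(100, 100), (120, 150), (350, 120), (260, 400), (360, 130), (50, 60)]
n_shifts = count_roi_shifts(fixations, rois)
print(n_shifts)  # 2
```

Given per-condition or per-observer averages of reversal counts and shift counts, `pearson_r` returns the kind of coefficient reported above; which data points entered the reported correlation is not specified beyond df = 18.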
Figure 13.
 
The frequency of fixation shift between ROIs. The blue bars indicate the result from Experiments 1 and 2, and the yellow bars indicate that from Experiment 3. DR denotes the duck/rabbit figure conditions. Error bar = 1 standard error.
Discussion
Introducing a mask to block the fixated region of the image increased the frequency of both the shift of FOA and perceptual reversal. Blocking the fixated region forces the fixation to shift to another salient area. However, the mask then followed the eye movement and blocked this new fixated region again, prompting another shift. This process thus resulted in a higher frequency of fixation shift, which in turn affected the frequency of percept reversals. 
From this, we can infer that during free viewing of bistable figures, once observers attend to the same location for a while and the local saliency is transiently suppressed due to IOR, their FOA will shift to another salient region, which carries the features that lead to the alternative percept. Again, after a while, the saliency of this new salient region is suppressed while the original location's saliency gradually recovers. Visual attention then returns to the original location. This transient decrease in the local saliency of the attended region causes our FOA to continually shift between different regions, similar to the effect observed with the white mask introduced in Experiment 3. This also results in spontaneous percept reversals because different regions carry features that lead to different percepts. 
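The alternation described in this paragraph can be illustrated with a toy simulation. This is not the authors' model or the Itti and Koch (2000) implementation; it is a deliberately simple two-location sketch in which the attended location loses saliency at a rate `decay`, the unattended location recovers at a rate `recovery`, and attention switches once the attended saliency falls below a fraction `ratio` of the other. All parameter names and values are assumptions, and a faster decay stands in for the mask removing the saliency of the fixated region.

```python
def simulate_ior(steps, decay, recovery=0.05, ratio=0.5):
    """Count attention shifts between two locations under a toy IOR rule.

    decay:    fractional loss of saliency per step at the attended location
    recovery: fractional recovery toward 1.0 per step at the unattended location
    ratio:    attention switches when attended saliency < ratio * other saliency
    """
    saliency = [1.0, 1.0]
    attended = 0
    shifts = 0
    for _ in range(steps):
        other = 1 - attended
        # Inhibition accumulates at the attended location.
        saliency[attended] *= 1.0 - decay
        # The previously inhibited location gradually recovers.
        saliency[other] += recovery * (1.0 - saliency[other])
        if saliency[attended] < ratio * saliency[other]:
            attended = other
            shifts += 1
    return shifts

free_viewing = simulate_ior(steps=300, decay=0.1)  # slow build-up of inhibition
masked = simulate_ior(steps=300, decay=0.3)        # mask modeled as faster saliency loss
print(free_viewing, masked)
```

In this sketch, attention keeps alternating between the two locations, and a faster loss of saliency at the attended location yields more shifts in the same number of steps, which is the qualitative pattern the paragraph attributes to IOR and to the mask.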
We also found that even when the observers had limited, if any, information in the foveal region, the fixation density maps (Figure 11) were quite similar to those in the unmasked conditions (Figure 2). This might be because the observers acquired information by continuously moving their eyes so that they could piece together the local features using their peripheral vision. In the context of IOR, this means that the saliency maps are computed before the shift of fixation. Without this precomputation, no salient features would be available to guide eye movements. Thus, the saliency maps for an image remain the same regardless of whether foveal vision is occluded. This results in the similar fixation patterns for the masked (Figure 11) and the unmasked conditions (Figure 2) observed in our experiment. 
General discussion
In Experiment 1, we observed distinct fixation patterns for different percepts in both the duck/rabbit figure and the Necker cube. We then examined the time course for the eye movement related to the finger response. The results showed that fixation shifts preceded percept reversals. This supports the idea that perception is influenced by gaze direction (Kawabata et al., 1978; Meng & Tong, 2004). In Experiment 2, the results showed an initial bias in the perception of the duck/rabbit figure based on its location (Chen & Scholl, 2014). However, this effect diminished over time. The percentage of perceiving each percept gradually converged. In Experiment 3, we found that masking the attended region led to more fixation shifts across ROIs and, in turn, more frequent percept reversals. This result further confirms that the shift of FOA can be caused by the decrease of local saliency, which is consistent with the assumption of IOR (Itti & Koch, 2000; Itti et al., 1998). 
A further examination showed that the saliency map cannot predict the fixation patterns on the bistable figures very well. The saliency may depend more on meaningfulness than on early visual features (Henderson & Hayes, 2017). This could also explain why inward bias occurs while perceiving bistable figures without any change in the physical structure of the image. We tend to interpret the figure as facing inward when it appears near the frame border because of our aesthetic preference (Palmer et al., 2008). That is, the feature that contains information consistent with this tendency could become more meaningful. Therefore, its saliency will increase, and this area will attract our attention first. However, this dominance cannot persist for long. IOR eventually leads to a shift of FOA and, in turn, a spontaneous change in perception, which balances the proportion of the two percepts in the long run. Even though the first percept could be determined by inward bias, neither percept would continuously prevail due to IOR. 
The effect of IOR was further confirmed in Experiment 3. Introducing a mask to the fixated region can result in more fixation shifts, leading to a higher frequency of percept reversals. The mask's effects mirrored those of IOR, as both force our FOA to shift between different regions. This suggests that when we are freely viewing a bistable figure, the spontaneous percept reversal may also be attributed to IOR, as it prevents our FOA from staying at the same location and thus prevents our perception from maintaining stability. 
In sum, we investigated the relationship between gaze direction, saliency, and perception in bistable figures. We demonstrated that fixation shifts precede percept reversals, supporting the notion that gaze influences perception. Certain aspects of the fixation shift align with the concept of IOR, such as the alternation of fixation between salient regions and the promotion of the eye-fixation shifts by the reduction of responses at the fixation. However, in our results, saliency does not seem to be derived from early vision features, such as contrast or luminance, but rather from the meaningfulness of the features that support an interpretation of the image. 
Acknowledgments
Supported by NSTC 112-2423-H-002-002 to C.-C.C. 
Commercial relationships: none. 
Corresponding author: Chien-Chung Chen. 
Address: Department of Psychology, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Da'an District, Taipei 106319, Taiwan. 
References
Abudarham, N., & Yovel, G. (2016). Reverse engineering the face space: Discovering the critical features for face identification. Journal of Vision, 16(3), 40, https://doi.org/10.1167/16.3.40. [PubMed]
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436. [PubMed]
Carter, O., & Cavanagh, P. (2007). Onset rivalry: Brief presentation isolates an early independent phase of perceptual competition. PLoS One, 2(4), e343, https://doi.org/10.1371/journal.pone.0000343. [PubMed]
Cerf, M., Harel, J., Einhäuser, W., & Koch, C. (2007). Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems, 20, https://doi.org/10.1109/WACV.2012.6163035.
Chen, Y.-C., & Scholl, B. J. (2014). Seeing and liking: Biased perception of ambiguous figures consistent with the “inward bias” in aesthetic preferences. Psychonomic Bulletin & Review, 21(6), 1444–1451, https://doi.org/10.3758/s13423-014-0617-2. [PubMed]
Chong, S. C., & Blake, R. (2006). Exogenous attention and endogenous attention influence initial dominance in binocular rivalry. Vision Research, 46(11), 1794–1803, https://doi.org/10.1016/j.visres.2005.10.031. [PubMed]
Crouzet, S. M., Kirchner, H., & Thorpe, S. J. (2010). Fast saccades toward faces: Face detection in just 100 ms. Journal of Vision, 10(4), 16, https://doi.org/10.1167/10.4.16.
Driver, J., & Baylis, G. C. (1996). Edge-assignment and figure-ground segmentation in short-term visual matching. Cognitive Psychology, 31, 248–306. [PubMed]
Einhäuser, W., Martin, K. A., & König, P. (2004). Are switches in perception of the Necker cube related to eye position? European Journal of Neuroscience, 20(10), 2811–2818, https://doi.org/10.1111/j.1460-9568.2004.03722.x.
Elazary, L., & Itti, L. (2008). Interesting objects are visually salient. Journal of Vision, 8(3), 1–15, https://doi.org/10.1167/8.3.3. [PubMed]
Ellis, S. R., & Stark, L. (1978). Eye movements during the viewing of Necker cubes. Perception, 7(5), 575–581, https://doi.org/10.1068/p070575. [PubMed]
Fischer, B., & Ramsperger, E. (1984). Human express saccades: Extremely short reaction times of goal directed eye movements. Experimental Brain Research, 57(1), 191–195, https://doi.org/10.1007/BF00231145. [PubMed]
Foulsham, T., & Kingstone, A. (2013). Optimal and preferred eye landing positions in objects and scenes. Quarterly Journal of Experimental Psychology, 66(9), 1707–1728, https://doi.org/10.1080/17470218.2012.762798.
Foulsham, T., & Underwood, G. (2007). How does the purpose of inspection influence the potency of visual salience in scene perception? Perception, 36(8), 1123–1138, https://doi.org/10.1068/p5659. [PubMed]
Glen, J. S. (1940). Ocular movements in reversibility of perspective. The Journal of General Psychology, 23(2), 243–281, https://doi.org/10.1080/00221309.1940.10544334.
Goolkasian, P., & Woodberry, C. (2010). Priming effects with ambiguous figures. Attention, Perception, & Psychophysics, 72(1), 168–178, https://doi.org/10.3758/APP.72.1.168. [PubMed]
Haith, A., & Bestmann, S. (2020). Preparation of movement. In Poeppel, D., Mangun, G. R., & Gazzaniga, M. S. (Eds.), The cognitive neurosciences (6th ed., pp. 541–548). Cambridge, MA: The MIT Press, https://doi.org/10.7551/mitpress/11442.003.0059.
Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In van Gompel, R. P. G., Fischer, M., Murray, W. S., & Hill, R. L. (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam: Elsevier, https://doi.org/10.1016/B978-008044980-7/50027-6.
Henderson, J. M., & Hayes, T. R. (2017). Meaning-based guidance of attention in scenes as revealed by meaning maps. Nature Human Behaviour, 1(10), 743–747, https://doi.org/10.1038/s41562-017-0208-0. [PubMed]
Hessels, R. S. (2020). How does gaze to faces support face-to-face interaction? A review and perspective. Psychonomic Bulletin & Review, 27(5), 856–881, https://doi.org/10.3758/s13423-020-01715-w. [PubMed]
Hoffman, J. E., & Subramaniam, B. (1995). The role of visual attention in saccadic eye movements. Perception & Psychophysics, 57(6), 787–795, https://doi.org/10.3758/BF03206794. [PubMed]
Hupé, J. M., & Pressnitzer, D. (2012). The initial phase of auditory and visual scene analysis. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 367(1591), 942–953, https://doi.org/10.1098/rstb.2011.0368. [PubMed]
Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10–12), 1489–1506, https://doi.org/10.1016/S0042-6989(99)00163-7. [PubMed]
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259, https://doi.org/10.1109/34.730558.
Kawabata, N. (1986). Attention and depth perception. Perception, 15(5), 563–572, https://doi.org/10.1068/p150563. [PubMed]
Kawabata, N., Yamagami, K., & Noaki, M. (1978). Visual fixation points and depth perception. Vision Research, 18(7), 853–854, https://doi.org/10.1016/0042-6989(78)90127-X. [PubMed]
Kietzmann, T. C., Geuter, S., & König, P. (2011). Overt visual attention as a causal factor of perceptual awareness. PLoS ONE, 6(7), e22614, https://doi.org/10.1371/journal.pone.0022614. [PubMed]
Klein, R. M. (2000). Inhibition of return. Trends in Cognitive Sciences, 4(4), 138–147, https://doi.org/10.1016/s1364-6613(00)01452-2.
Koch, C., & Ullman, S. (1987). Shifts in selective visual attention: Towards the underlying neural circuitry. In Vaina, L. M. (Ed.), Matters of intelligence: Conceptual structures in cognitive neuroscience (pp. 115–141). Amsterdam, Netherlands: Springer, https://doi.org/10.1007/978-94-009-3833-5_5.
Kornmeier, J., & Bach, M. (2012). Ambiguous figures—what happens in the brain when perception changes but not the stimulus. Frontiers in Human Neuroscience, 6, 51, https://doi.org/10.3389/fnhum.2012.00051. [PubMed]
Langton, S. R., Law, A. S., Burton, A. M., & Schweinberger, S. R. (2008). Attention capture by faces. Cognition, 107(1), 330–342, https://doi.org/10.1016/j.cognition.2007.07.012. [PubMed]
Long, G. M., & Toppino, T. C. (2004). Enduring interest in perceptual ambiguity: Alternating views of reversible figures. Psychological Bulletin, 130(5), 748, https://doi.org/10.1037/0033-2909.130.5.748. [PubMed]
Meng, M., & Tong, F. (2004). Can attention selectively bias bistable perception? Differences between binocular rivalry and ambiguous figures. Journal of Vision, 4(7), 2, https://doi.org/10.1167/4.7.2.
Nakatani, H., Orlandi, N., & van Leeuwen, C. (2011). Precisely timed oculomotor and parietal EEG activity in perceptual switching. Cognitive Neurodynamics, 5(4), 399–409, https://doi.org/10.1007/s11571-011-9168-7. [PubMed]
Necker, L. A. (1832). LXI. Observations on some remarkable optical phænomena seen in Switzerland; and on an optical phænomenon which occurs on viewing a figure of a crystal or geometrical solid. The London and Edinburgh Philosophical Magazine and Journal of Science, 1(5), 329–337, https://doi.org/10.1080/14786443208647909.
Nuthmann, A., & Henderson, J. M. (2010). Object-based attentional selection in scene viewing. Journal of Vision, 10(8), 20, https://doi.org/10.1167/10.8.20. [PubMed]
Palmer, S. E., Gardner, J. S., & Wickens, T. D. (2008). Aesthetic issues in spatial composition: Effects of position and direction on framing single objects. Spatial Vision, 21(3), 421, https://doi.org/10.1163/156856808784532662. [PubMed]
Parkhurst, D., & Niebur, E. (2003). Scene content selected by active vision. Spatial Vision, 16(2), 125–154, https://doi.org/10.1163/15685680360511645. [PubMed]
Parkhurst, D., Law, K., & Niebur, E. (2002). Modeling the role of salience in the allocation of overt visual attention. Vision Research, 42(1), 107–123, https://doi.org/10.1016/S0042-6989(01)00250-4. [PubMed]
Peterson, M. A., & Gibson, B. S. (1991). Directing spatial attention within an object: Altering the functional equivalence of shape description. Journal of Experimental Psychology: Human Perception and Performance, 17(1), 170, https://doi.org/10.1037/0096-1523.17.1.170. [PubMed]
Polgári, P., Causin, J.-B., Weiner, L., Bertschy, G., & Giersch, A. (2020). Novel method to measure temporal windows based on eye movements during viewing of the Necker cube. PLoS One, 15(1), e0227506, https://doi.org/10.1371/journal.pone.0227506. [PubMed]
Posner, M. I., & Cohen, Y. (1984). Components of visual orienting. In Bouma, H., & Bouwhuis, D. (Eds.), Attention and performance X: Control of language processes (pp. 531–556). Hillsdale, NJ: Erlbaum.
Rhodes, G. (1988). Looking at faces: First-order and second-order features as determinants of facial appearance. Perception, 17(1), 43–63, https://doi.org/10.1068/p17004. [PubMed]
Rock, I., Gopnik, A., & Hall, S. (1994). Do young children reverse ambiguous figures? Perception, 23(6), 635–644, https://doi.org/10.1068/p230635. [PubMed]
Sato, F., Laeng, B., Nakauchi, S., & Minami, T. (2020). Cueing the Necker cube: Pupil dilation reflects the viewing-from-above constraint in bistable perception. Journal of Vision, 20(4), 7, https://doi.org/10.1167/jov.20.4.7. [PubMed]
Sekiguchi, T. (2011). Individual differences in face memory and eye fixation patterns during face learning. Acta Psychologica, 137(1), 1–9, https://doi.org/10.1016/j.actpsy.2011.01.014. [PubMed]
Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye movements and spatial attention. The Quarterly Journal of Experimental Psychology Section A, 38(3), 475–491, https://doi.org/10.1080/14640748608401609.
Stoll, J., Thrun, M., Nuthmann, A., & Einhäuser, W. (2015). Overt attention in natural scenes: objects dominate features. Vision Research, 107, 36–48, https://doi.org/10.1016/j.visres.2014.11.006. [PubMed]
Toppino, T. C. (2003). Reversible-figure perception: Mechanisms of intentional control. Perception & Psychophysics, 65, 1285–1295, https://doi.org/10.3758/BF03194852. [PubMed]
Torrey, C. C. (1970). Trace localization and the recognition of visual form. The American Journal of Psychology, 83(4), 591–600, https://doi.org/10.2307/1420692.
Underwood, G., Foulsham, T., van Loon, E., Humphreys, L., & Bloyce, J. (2006). Eye movements during scene inspection: A test of the saliency map hypothesis. European Journal of Cognitive Psychology, 18(3), 321–342, https://doi.org/10.1080/09541440500236661.
van Dam, L. C. J., & van Ee, R. (2005). The role of eye movements in bistability from perceptual and binocular rivalry and the role of voluntary control. Journal of Vision, 5(8), 704, https://doi.org/10.1167/5.8.704.
van Dam, L. C. J., & van Ee, R. (2006). The role of saccades in exerting voluntary control in perceptual and binocular rivalry. Vision Research, 46(6), 787–799, https://doi.org/10.1016/j.visres.2005.10.011. [PubMed]
Vecera, S. P., Flevaris, A. V., & Filapek, J. C. (2004). Exogenous spatial attention influences figure-ground assignment. Psychological Science, 15, 20–26. [PubMed]
Wandell, B. A. (1995). Foundations of vision: Behavior, neuroscience and computation. Sunderland, MA: Sinauer.
Wegner, T. G. G., Grenzebach, J., Bendixen, A., & Einhäuser, W. (2021). Parameter dependence in visual pattern-component rivalry at onset and during prolonged viewing. Vision Research, 182, 69–88, https://doi.org/10.1016/j.visres.2020.12.006. [PubMed]
Figure 1.
 
The experimental procedure in a single run.
Figure 2.
 
The average fixation density maps under the two percepts for the duck/rabbit figure (top) and the Necker cube (bottom). The left and center panels show the fixation density map of different percepts of each figure. The pseudocolor indicates fixation density. The right-most panel shows the difference between the two density maps of each figure (rabbit minus duck for the top row and bottom view minus the top view for the bottom row). The unit of the color bar is the fixation duration (ms) per pixel. The red rectangle indicates the ROI for the rabbit or the bottom view, and the blue rectangle indicates the ROI for the duck or the top view. In the duck/rabbit figure, the green rectangle represents the ROI with fixations for both percepts.
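As a rough illustration of what the density and difference maps in this figure represent, the Python sketch below accumulates fixation duration (ms) per pixel and subtracts one percept's map from the other's. The function names, the (x, y, duration) input format, the toy fixations, and the ROI convention are assumptions made for illustration, and no spatial smoothing is applied; this is not the authors' analysis code.

```python
import numpy as np

def fixation_density(fixations, height, width):
    """Accumulate fixation duration (ms) per pixel.

    fixations: iterable of (x, y, duration_ms), where x is the column
    and y is the row (hypothetical input format).
    """
    density = np.zeros((height, width), dtype=float)
    fix = np.asarray(list(fixations), dtype=float).reshape(-1, 3)
    cols = np.rint(fix[:, 0]).astype(int)
    rows = np.rint(fix[:, 1]).astype(int)
    # Ignore fixations that fall outside the image.
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    # np.add.at sums correctly when several fixations land on one pixel.
    np.add.at(density, (rows[inside], cols[inside]), fix[inside, 2])
    return density

def roi_sum(density, roi):
    """Total density inside a rectangle given as (left, top, right, bottom)."""
    left, top, right, bottom = roi
    return density[top:bottom, left:right].sum()

# Toy data: fixations recorded under each percept on a 10 x 5 pixel image.
rabbit = fixation_density([(2, 1, 300), (2, 1, 200), (7, 3, 100)], 5, 10)
duck = fixation_density([(7, 3, 400), (2, 1, 50)], 5, 10)

# Difference map (rabbit minus duck), as in the right-most panel.
difference = rabbit - duck
```

Positive values in `difference` mark pixels fixated longer under the rabbit percept, and negative values mark pixels fixated longer under the duck percept.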
Figure 3.
 
The fixation density map for every 200-ms period with a 100-ms bin width (± 50 ms) before the behavioral response from rabbit to duck (a) and from duck to rabbit (b). The red and blue rectangles are the rabbit and duck ROIs, respectively.
Figure 4.
 
The change in fixation density while viewing the duck/rabbit figure during the 1,000 ms before the finger response. The blue and red curves indicate the fixation density within the starting and the destination ROI, respectively. The dashed line indicates the time of equal fixation for the two ROIs. The shaded area denotes 1 standard error of measurement.
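A minimal sketch of how such a time course and its crossover point might be computed is given below. It assumes gaze samples time-stamped relative to the finger response and already labeled by ROI; the function names, the 10-ms sampling step, the 100-ms window spacing, and the toy data are hypothetical, and the ±50-ms window is borrowed from the bin width in the Figure 3 caption. It is not the authors' procedure.

```python
import numpy as np

def roi_time_course(times_ms, in_roi, centers_ms, half_width_ms=50):
    """Fraction of gaze samples inside an ROI in each window (center +/- half width)."""
    times_ms = np.asarray(times_ms, dtype=float)
    in_roi = np.asarray(in_roi, dtype=bool)
    out = np.full(len(centers_ms), np.nan)
    for i, center in enumerate(centers_ms):
        window = np.abs(times_ms - center) <= half_width_ms
        if window.any():
            out[i] = in_roi[window].mean()
    return out

def crossover_time(centers_ms, start_curve, dest_curve):
    """First window center where the destination curve reaches the start curve."""
    ahead = np.asarray(dest_curve) >= np.asarray(start_curve)
    idx = np.flatnonzero(ahead)
    return centers_ms[idx[0]] if idx.size else None

# Toy data: one gaze sample every 10 ms from -1000 ms to the response (0 ms).
t = np.arange(-1000, 1, 10)
in_start = t < -400   # gaze stays in the starting ROI until -400 ms
in_dest = ~in_start   # then moves to the destination ROI

centers = np.arange(-1000, 1, 100)
start_curve = roi_time_course(t, in_start, centers)
dest_curve = roi_time_course(t, in_dest, centers)
crossover = crossover_time(centers, start_curve, dest_curve)
```

In this toy example the gaze leaves the starting ROI at -400 ms, so the two curves cross in the window centered at -400 ms.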
Figure 5.
 
The fixation density map for every 200-ms period with a 100-ms bin width (± 50 ms) before the behavioral response for the top view (a) and bottom view (b). The red and blue rectangles denote the ROI of the bottom and top view, respectively.
Figure 6.
 
The change in fixation density while viewing the Necker cube during the 1,000 ms before the finger response. The meanings of the symbols and curves are the same as in Figure 4.
Figure 7.
 
The average fixation density maps for the two percepts of the duck/rabbit figure placed at the left (top row), center (middle row), and right (bottom row) locations. The pseudocolor indicates the fixation duration (ms) per pixel. The red and blue rectangles are the rabbit and duck ROIs, respectively. The green rectangles represent the ROI for the overlapped area of the two density maps.
Figure 8.
 
The percentage of the first percept of the duck/rabbit figure at the left, center, and right locations. The orange and cyan bars indicate the percentages of duck and rabbit percepts, respectively. **p < 0.01 (Bonferroni corrected).
Figure 9.
 
The percentage of the total duration of the duck or rabbit percept for the duck/rabbit figure at the left, center, and right locations. The orange and cyan bars indicate the percentages of duck and rabbit percepts, respectively.
Figure 10.
 
The accumulated percentages of perceiving the rabbit over different time intervals in the left (blue curve) and right (red curve) conditions. Error bars represent 1 standard error of measurement. *p < 0.05, **p < 0.01 (Bonferroni corrected).
Figure 11.
 
The average fixation density maps for the two percepts in all five conditions in Experiment 3. From top to bottom: duck/rabbit (big), the Necker cube, duck/rabbit (small, left), duck/rabbit (small, center), and duck/rabbit (small, right). The meanings of pseudocolors and symbols are the same as in Figure 2.
Figure 12.
 
The frequency of percept reversals in one experimental run of 30 s. The blue bars indicate the result from Experiments 1 and 2, and the yellow bars indicate that from Experiment 3. DR denotes the duck/rabbit figure conditions. Error bar = 1 standard error. ***p < 0.001.
Figure 13.
 
The frequency of fixation shift between ROIs. The blue bars indicate the result from Experiments 1 and 2, and the yellow bars indicate that from Experiment 3. DR denotes the duck/rabbit figure conditions. Error bar = 1 standard error.