Letters to the Editor  |   August 2013
Does spatio-temporal filtering account for nonretinotopic motion perception? Comment on Pooresmaeili, Cicchini, Morrone, and Burr (2012)
Journal of Vision August 2013, Vol.13, 19. doi:https://doi.org/10.1167/13.10.19
      Aaron M. Clarke, Marc Repnow, Haluk Öğmen, Michael H. Herzog; Does spatio-temporal filtering account for nonretinotopic motion perception? Comment on Pooresmaeili, Cicchini, Morrone, and Burr (2012). Journal of Vision 2013;13(10):19. https://doi.org/10.1167/13.10.19.

      © ARVO (1962-2015); The Authors (2016-present)

How can nonretinotopic motion be explained? Pooresmaeili, Cicchini, Morrone, and Burr (2012) proposed a one-stage rather than a two-stage model. Although we find the approach interesting, we found serious flaws that make it impossible to evaluate the validity of their model. 
Humans make three to four eye movements each second and constantly move their heads and bodies. Despite these movements, the world appears stable. The generally accepted explanation for this phenomenon is that the brain uses an efference copy of movement commands to discount the self-generated retinal motion (Monteon, Martinez-Trujillo, Wang, & Crawford, 2005; Monteon, Wang, Martinez-Trujillo, & Crawford, 2013; Sommer & Wurtz, 2002; Von Helmholtz, 1866). For example, if a horizontal eye movement to the left is executed, the human brain can compute the retinal image shift even before the eye movement and use this information to suppress the spurious motion cues that occur. However, in addition to our own predictable movements, objects often move in a complex and unpredictable manner, for which no efference copy can be used. For example, a reflector on a moving bicycle wheel appears to move in circles, even though the reflector's physical trajectory follows a cycloidal path (Duncker, 1929; Johansson, 1973). As with eye movements, we dissociate the object's global motion from the relative motions of its parts. 
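The bicycle example can be made concrete with a short sketch. It assumes a wheel rolling without slipping and uses hypothetical values for the wheel radius and for the distance of the reflector from the hub (neither is given in the text). Subtracting the hub's translation from the reflector's trajectory leaves a circle of constant radius, which is the decomposition into common and relative motion described above. With the reflector mounted inside the rim, the screen trajectory is strictly a curtate cycloid.

```python
import numpy as np

def reflector_path(theta, wheel_radius=1.0, reflector_radius=0.7):
    """Reflector and hub positions for a wheel rolling without slipping along +x.

    wheel_radius and reflector_radius are illustrative values, not taken from the text.
    """
    # Common motion: the hub translates horizontally at the height of the wheel radius.
    hub = np.stack([wheel_radius * theta, np.full_like(theta, wheel_radius)], axis=-1)
    # Relative motion: the reflector circles the hub.
    relative = reflector_radius * np.stack([-np.sin(theta), -np.cos(theta)], axis=-1)
    return hub + relative, hub

theta = np.linspace(0.0, 4.0 * np.pi, 200)   # two full wheel rotations
path, hub = reflector_path(theta)

# Removing the common (hub) motion leaves a pure rotation of constant radius.
radii = np.linalg.norm(path - hub, axis=1)
```

In this sketch `path` traces the cycloidal trajectory, whereas `radii` is constant, so the motion relative to the hub is circular.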
To test for retinotopic versus nonretinotopic processing where no efference copy can be used, we have previously created a test based on the Ternus-Pikler display (Boi, Öğmen, Krummenacher, Otto, & Herzog, 2009). We presented subjects with two disks (Figure 1A). In the left disk, a white dot moved up-down, and in the right disk it moved left-right. Retinotopic motion was perceived. When we added a third disk and created the percept of group motion where the three disks appear to move in tandem, a rotating central dot was perceived. The rotation is the “nonretinotopic” combination of the two retinotopic dot motions (Figure 1B). This finding can be conceptualized as a two-stage process wherein first the disk motion is processed to establish a reference frame against which dot rotation then emerges. 
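The relation between the two retinotopic dot motions and the perceived rotation can be illustrated with a toy bookkeeping sketch. It is not the stimulus code: the four dot offsets, the strict frame-by-frame alternation of the group position, and the assumption that the outer disks carry a stationary central dot are all hypothetical simplifications introduced here.

```python
# Dot offsets (x, y) relative to a disk's center; all values are illustrative assumptions.
CENTER, UP, RIGHT, DOWN, LEFT = (0, 0), (0, 1), (1, 0), (0, -1), (-1, 0)

# Assumed rotation of the dot inside the group's central disk over four frames.
central_disk_dot = [UP, RIGHT, DOWN, LEFT]

def retinotopic_streams(central_dot):
    """Dot offsets seen at the left-center and right-center screen locations."""
    left_loc, right_loc = [], []
    for frame, dot in enumerate(central_dot):
        if frame % 2 == 0:
            # Group in its left position: the central disk sits at the left-center location.
            left_loc.append(dot)
            right_loc.append(CENTER)  # assumed: outer disks carry a stationary central dot
        else:
            # Group shifted right: the central disk sits at the right-center location.
            left_loc.append(CENTER)
            right_loc.append(dot)
    return left_loc, right_loc

def follow_group(left_loc, right_loc):
    """Read the dot from whichever screen location currently holds the central disk."""
    return [left_loc[f] if f % 2 == 0 else right_loc[f] for f in range(len(left_loc))]

left_loc, right_loc = retinotopic_streams(central_disk_dot)
nonretinotopic = follow_group(left_loc, right_loc)
```

Under these assumptions, `left_loc` contains only vertical displacements and `right_loc` only horizontal ones, whereas reading the dot out in the reference frame of the moving group (`nonretinotopic`) recovers the rotation.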
Figure 1
 
Ternus-Pikler display. (A) In the “no-flankers” condition, two black disks are presented at the same locations separated by an inter-stimulus interval (ISI). A white dot is perceived to be moving up-down in the left disk, and left-right in the right disk (Movie 1). (B) In the group motion condition, a third disk is added either to the left or right of the center disks. When the ISI is sufficiently long, e.g., 100 ms, three disks are perceived to move back and forth as a group. The group-motion creates a nonretinotopic reference frame (see arrows that were not shown in the actual display). Features are integrated according to this nonretinotopic reference frame. A rotating dot is perceived in the center disk of a given frame which results from the nonretinotopic integration of the retinotopic dot motions of the two center-most disks. The only differences between A and B lie in the outer disks. The percept of rotation is illusory in the sense that, retinotopically, only up-down and left-right motions are present. Attention is crucial. It is nearly impossible to attend to the retinotopic up-down or left-right motions in the group motion condition (Movie 2). (C) The “illusory” dot motion disappears when the ISI is reduced to 0 ms. Contrary to the group motion condition, the disks now are perceived at four positions: the two center disks at the center positions and the outer disk jumping from the far left to the far right position. As in the no-flankers condition, up-down motion is perceived in the center-left disk and right-left motion in the center-right disk.
Pooresmaeili et al. (2012), hereafter referred to as “PCMB,” attempted to design a simpler model to explain our nonretinotopic motion paradigm, one in which the Ternus-Pikler motion is not explicitly taken into account. To this end, PCMB devised a new psychophysical experiment (Figure 2) and a model (Figure 3) based on spatio-temporal filtering. Although this is an interesting and parsimonious approach, we found several issues with both the experimental design and the modeling. 
Figure 2
 
(A) In the two-bar condition, observers were asked to attend to the left bar. Orange outlines (not present in the actual display) mark the task-relevant, attended location. (B) In the three-bar condition, observers attended to the same retinotopic location as in the two-bar condition (again outlined in orange). (C) We tested six observers using the same stimuli and procedures as used in PCMB. Five out of our six observers showed a pattern of results similar to PCMB's. Upward and downward percepts varied with δ in both conditions, but were biased towards upward motion in the three-bar condition. (D) We used the same three-bar stimulus as in the previous condition, but observers attended to the nonretinotopic, blue outlined bars in B (shown by blue dots and blue fitting functions). A very different pattern of results occurs compared to when observers attended to the retinotopic location. In general, performance depends less on δ when observers attend to the nonretinotopic motion than when they attend to the retinotopic motion (black curves, replotted from C, vs. blue curves). However, inter-observer variability is high. While observer S1 only perceived the nonretinotopic motion (the blue points all lined up at a proportion of “upward” responses of 1.0), the other observers were influenced to varying degrees by the retinotopic motion, as their performance changed with δ. Observer S2's performance was nearly at chance level. It seems that this observer had problems with the paradigm in general.
Figure 3
 
(A) The schematic shows the model as reported in PCMB. The stimulus was Fourier transformed, filtered for upward and downward motion, and inverse Fourier transformed. Next, the maxima of the upward and downward filter outputs were computed and subtracted, yielding “upwards motion energy.” The model, however, is more complex than reported (see Figure 5 in Appendix IV for full details). Inspection of their code shows that after the upward- and downward-motion filtering, the result is windowed (inset with superimposed red windows) and the maximum is taken over space at each time point. A second nonlinear operation is carried out, where the absolute value of the minimum up-down motion energy over time is subtracted from the maximum. Finally, the above operations are carried out for both the two-bar and three-bar stimuli, and the results are normalized by the maximum of the two outputs (see Appendix IV.1). (B) Simulation results for different filter widths (wd). Data points mark model outputs and solid lines mark best fitting cumulative Gaussian functions. Results and fits for the two-bar condition are plotted in red, while those for the three-bar condition are plotted in black. For each sub-plot, the x-axis plots the phase offset δ (see Figure 2), while the y-axis plots the model's upwards motion bias. (C) PCMB's human psychophysical data (solid curves) with superimposed model results (dashed curves) selected from the wd = 0.67 filter for the two-bar condition and from the wd = 1.01 filter for the three-bar condition (see black arrow). (D) Filter selection rather than filter sensitivity matters. Assume no filters can discriminate between the two- and three-bar stimuli. In this case, the black curves would, for example, be identical to the red curves. Still, we obtain very similar model outputs when we take the outputs from the wd = 0.67 and wd = 1.01 filters (see red arrows projecting from B to D). 
The very same situation occurs if there is no Ternus motion, i.e., when we filter the two-bar stimulus with the same choice of filters, see Figure 4B.
PCMB attempted to model our dot rotation paradigm shown in Figure 1. However, they chose a fundamentally different experimental design in which nonretinotopic motion is only indirectly addressed (Appendix I). It is unclear why PCMB chose such an indirect method of assessing nonretinotopic perception, given that our original paradigm is much simpler and leads to clear-cut differences between nonretinotopic and retinotopic motion percepts (see, for example, Boi et al., 2009, “Motion adaptation”). 
In addition to problems with the experimental paradigm, we found problems with their model (Appendix II). First, their model is based on arbitrarily selecting two different filters for two different stimulus conditions, rather than on optimizing filter sensitivity to the differences between the two stimuli. This filter selection allows the model to fit the psychophysical data even when the filters cannot discriminate between the experimental conditions (Appendix II.1). As such, it is clear that the model cannot inherently distinguish between retinotopic and nonretinotopic motion. Second, because of its indirect approach, the paradigm risks picking up artifactual signals (Appendix II.2). Third, the model is strongly overparameterized and, despite this overparameterization, we found significant differences between the model fits and the experimental data (Appendix II.3). Fourth, none of the model's predictions were tested, nor was the model tested on control conditions, which is usually considered standard in modeling, particularly given the number of parameters. Fifth, the model is not as simple or biologically plausible as claimed (Appendix II.4). Sixth, it is unclear whether the model can be extended in a straightforward fashion to other types of stimuli as claimed (Appendix II.5). 
PCMB stressed that their current model is an existence proof and that more complex stimuli can also be modeled in a similar fashion (Pooresmaeili et al., 2012, p. 13). Whether this is true cannot easily be verified, because the validity of the model is not easy to evaluate. However, we do not want to claim that it is impossible to find a “one-stage” spatiotemporal model that captures nonretinotopic motion perception, i.e., a model not requiring an explicit nonretinotopic processing stage. Ultimately, this is also a question of how complex a single stage of a model is allowed to be. At the moment, however, we suggest that to capture both the group motion of the disks and the dot rotation within the central disk, a two-stage process seems to be better suited. Here, for example, a low-level motion grouping process can determine a common motion representative for the group (cf. Duncker, 1929; Johansson, 1973). This common motion vector can then serve as a reference frame upon which nonretinotopic computations can take place (Öğmen & Herzog, 2010). 
Splitting up nonretinotopic processing into two processing stages, a general back-end for providing a reference frame and a front-end for specific feature processing, also seems to be more appropriate than a one-stage model because nonretinotopic processing is not limited to motion stimuli. With the Ternus-Pikler display we have shown that fine-grained spatial information (Öğmen, Otto, & Herzog, 2006), visual search (Boi et al., 2009), and attention (Boi, Vergeer, Öğmen, & Herzog, 2011) are processed nonretinotopically, whereas adaptation occurs mainly in retinotopic coordinates (Boi et al., 2011). In addition, many kinds of motion processing occur in nonretinotopic coordinates, e.g., dot motion (Figure 1), grating motion (Figure 2), and ambiguous motion (Van Boxtel & Koch, 2012). Along the same lines, it was shown that binocular rivalry can occur in nonretinotopic coordinates (Vergeer, Boi, Öğmen, & Herzog, unpublished). Future research is needed to show how these types of nonretinotopic processing can be modeled without the use of an efference copy. 
Supplementary Materials
Acknowledgments
Aaron Clarke was funded by the Sinergia project “State representation in reward based learning in human healthy observers, schizophrenic patients, and models of perceptual learning” (project number CRSIK0_122697) of the Swiss National Science Foundation (SNF). Haluk Öğmen was supported in part by NIH grant R01 EY018165. We would like to thank A. Pooresmaeili, G. M. Cicchini, M. C. Morrone, and D. Burr for making their code available to us. 
Commercial relationships: none. 
Corresponding author: Aaron M. Clarke. 
Address: Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, Switzerland. 
References
Anderson, S. J., & Burr, D. C. (1987). Receptive field size of human motion detection units. Vision Research, 27(4), 621–635.
Bach, M. (1996). The Freiburg visual acuity test—Automatic measurement of visual acuity. Optometry & Vision Science, 73, 49–53.
Boi, M., Öğmen, H., Krummenacher, J., Otto, T. U., & Herzog, M. H. (2009). A (fascinating) litmus test for human retino- vs. non-retinotopic processing. Journal of Vision, 9(13):5, 1–11, http://www.journalofvision.org/content/9/13/5, doi:10.1167/9.13.5.
Boi, M., Vergeer, M., Öğmen, H., & Herzog, M. H. (2011). Nonretinotopic exogenous attention. Current Biology, 21, 1732–1737.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Duncker, K. (1929). Über induzierte Bewegung. Psychologische Forschung, 12, 180–259.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception & Psychophysics, 14(2), 201–211.
Monteon, J. A., Martinez-Trujillo, J. C., Wang, H., & Crawford, J. D. (2005). Cross-coupled adaptation of eye and head position commands in the primate gaze control system. Neuroreport, 16(11), 1189–1192.
Monteon, J. A., Wang, H., Martinez-Trujillo, J., & Crawford, J. D. (2013). Frames of reference for eye–head gaze shifts evoked during frontal eye field stimulation. European Journal of Neuroscience, 37(11), 1754–1765.
Öğmen, H., & Herzog, M. H. (2010). The geometry of visual perception: Retinotopic and nonretinotopic representations in the human visual system. Proceedings of the IEEE, 98(3).
Öğmen, H., Otto, T. U., & Herzog, M. H. (2006). Perceptual grouping induces non-retinotopic feature attribution in human vision. Vision Research, 46(19), 3234–3242.
Pelli, D. G. (1997). The videotoolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.
Pooresmaeili, A., Cicchini, G. M., Morrone, M. C., & Burr, D. (2012). “Non-retinotopic processing” in Ternus motion displays modeled by spatiotemporal filters. Journal of Vision, 12(1):10, 1–15, http://www.journalofvision.org/content/12/1/10, doi:10.1167/12.1.10.
Sommer, M. A., & Wurtz, R. H. (2002). A pathway in primate brain for internal monitoring of movements. Science, 296, 1480–1482.
Van Boxtel, J. J. A., & Koch, C. (2012). Visual rivalry without spatial conflict. Psychological Science, 23(4), 410–418.
Vergeer, M., Boi, M., Öğmen, H., & Herzog, M. H. (2013). Binocular suppression in non-retinotopic coordinates. Manuscript submitted for publication.
Von Helmholtz, H. (1866). Handbuch der physiologischen Optik. Leipzig: Voss.
Appendix I: Psychophysics
First, we reproduced PCMB's psychophysical results (Figure 2C; for experimental details see Appendix III). Observers fixated in both the two- and three-bar conditions on the retinotopic bar outlined in orange in Figures 2A and 2B. In the two-bar condition, the proportion of upward responses increased when δ increased. We found a slight bias towards downward motion, similar to PCMB's results (in PCMB's experiment, the annulling phase was 10°; in our experiments it was 7°). In the three-bar condition, observers attended to the same bar as in the two-bar condition. Here, the psychometric function also varies with δ but is shifted toward upward motion responses (difference between three-bar and two-bar annulling phases Δδ = 23.5°, t(5) = 2.66, p = 0.022, one-tailed, Figure 2C). 
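As a minimal sketch of how an annulling phase can be read off a psychometric function, the following fits a cumulative Gaussian by a coarse least-squares grid search and takes its mean as the phase offset at which “upward” and “downward” responses are equally likely. The data points are noiseless and generated from assumed parameters (mean 7°, standard deviation 30°); they are hypothetical, not measured values, and the grid search merely stands in for whatever fitting procedure was actually used.

```python
import math
import numpy as np

def cumulative_gaussian(delta, mu, sigma):
    """Proportion of 'upward' responses as a function of phase offset delta (degrees)."""
    z = (np.asarray(delta, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def fit_annulling_phase(delta, p_up, mus, sigmas):
    """Least-squares grid search; returns (mu, sigma) of the best-fitting curve.

    mu is the annulling phase: the delta at which P('upward') = 0.5.
    """
    best = None
    for mu in mus:
        for sigma in sigmas:
            err = float(np.sum((cumulative_gaussian(delta, mu, sigma) - p_up) ** 2))
            if best is None or err < best[0]:
                best = (err, float(mu), float(sigma))
    return best[1], best[2]

# Hypothetical noiseless data generated from known parameters, NOT measured values.
delta = np.arange(-90.0, 91.0, 30.0)
p_up = cumulative_gaussian(delta, mu=7.0, sigma=30.0)

mu_hat, sigma_hat = fit_annulling_phase(
    delta, p_up,
    mus=np.arange(-45.0, 46.0, 1.0),
    sigmas=np.arange(10.0, 61.0, 5.0),
)
```

With this convention, a positive annulling phase means that fewer than half of the responses are “upward” at δ = 0, which corresponds to the downward bias described for the two-bar condition.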
As mentioned, PCMB's goal was to model nonretinotopic motion perception as it occurs in the Ternus-Pikler display. To ease modeling, PCMB chose drifting gratings instead of a rotating dot. There is, however, a fundamental difference between our paradigm and PCMB's, which is not primarily about dot rotation versus drifting gratings, but about task instruction and attention. In our group motion condition, observers attended to the moving center disk, in which the rotating dot was perceived (Figure 1B). In PCMB's three-bar condition, observers attended to the retinotopic motion stream outlined by the orange rectangles in Figure 2B, i.e., observers attended to the same motion stream as in the two-bar condition. Hence, PCMB's paradigm measured how strongly retinotopic motion (orange outlines) is modulated by nonretinotopic motion (blue outlines) rather than measuring nonretinotopic motion directly, as we did. Their task instruction corresponds in our dot motion paradigm to asking observers to attend to the left-center disk and to make judgments on the retinotopic up-down motion. This task is nearly impossible, particularly without a fixation dot (try to see the up-down motion of the left center disk in Movie 2). To enforce retinotopic motion perception in their paradigm, PCMB placed a fixation dot on the bar. In addition, PCMB used drifting gratings, which had a rather low object-to-background contrast and no horizontal structure, which is important for bar-to-bar motion correspondences, thus further weakening group motion. However, group motion is key for nonretinotopic perception. 
Here, we show that the differences in task instruction come with large differences in performance. We asked observers to attend to the nonretinotopically moving center bar (blue outlines) instead of the retinotopic bar (orange outlines) in Figure 2B. To avoid compromising group motion, we omitted the fixation dot. Because the nonretinotopic motion is always upwards in PCMB's paradigm, we altered the stimulus so that the grating drifted upwards on half of the trials and downwards on the other half, giving rise to a balanced binary decision task (we plot the downward motion results as upward motion for comparison with Figure 2C). Observer S1 generally correctly perceived the nonretinotopic motion in the three-bar condition (Figure 2D). For observers S5 and S6, this was also partly the case, whereas for observers S3 and S4, it was less the case. Observer S2 seemed unable to do the task. In general, the nonretinotopic motion percept was not very pronounced for most observers (see for yourself in Movie 3). It remains unclear whether the difficulty in perceiving the nonretinotopic motion stems from the group motion being weak, or from the drifting grating motion being very coarsely sampled (in steps of 90° for the attended bars and varying with δ in steps anywhere from 0° to 180° on the unattended bar). Overall, the focus of attention clearly mattered for all observers (except for S2), as the blue curves are clearly above the black ones, t(5) = 3.89, p = 0.012, two-tailed. 
Figure 4
 
(A) Contour plots of PCMB's spatio-temporal filters (colored lines) at one time slice superimposed on the three-bar stimulus (top) and the two-bar stimulus (bottom) for the two filter widths selected to model the psychophysical data. Note that in the top subplot, the filter integrates much more information from the neighboring bar than does the filter displayed in the bottom subplot. We used the same filter sizes as used in PCMB's simulation code. (B) When we applied the very same two filters to the two-bar stimuli, we obtained comparable fits to the psychometric functions as with the three-bar and two-bar conditions in A respectively (see Figure 3D). Hence, what matters is filter size and not filter sensitivity, or the stimulus condition (i.e., two bar or three bar). The difference is that in the wd = 1.01 condition there is more overlap with the right-hand bar than in the wd = 0.67 condition. This shows that ad-hoc filter selection rather than inherent filter sensitivity matters.
PCMB's indirect approach is not well suited for assessing and modeling nonretinotopic motion perception. First, the model needs to capture the role of attention, which strongly modulates perception and performance. Second, the model needs to capture the degree to which retinotopic motion can be perceived in the group motion condition. In our dot motion paradigm, it is nearly impossible to perceive the retinotopic motions without the fixation dot, and the same also occurs for drifting gratings when group motion is strong and the contrast of the gratings is high (see Boi et al., 2009). In PCMB's paradigm, however, both percepts seem to be possible depending on task instruction and, likely, the use of the fixation dot. To us, it does not seem straightforward to model the role of a fixation dot and, along with it, the change in attention that makes a big difference in the percept, as we showed with the modified three-bar condition (Figure 2D). In general, it seems that this aspect adds complexity to the topic that should be addressed after a basic model is available. Third, a direct approach can produce clear-cut psychophysical results, as is evident, for example, in observer S1, which makes it easy to pit nonretinotopic against retinotopic conditions in the modeling. Finally, we will show in Appendix II.2 that such an indirect approach may come with the risk of picking up motion components that are not necessarily of nonretinotopic origin but still account for a clear difference between the two-bar and three-bar conditions. 
Appendix II: Modeling
PCMB's model architecture is shown in Figure 3A. Essentially, the stimulus is filtered for upward and downward motion and the two signals are compared: the maxima of the upward and downward filter outputs are computed and subtracted, yielding “upwards motion energy” (Figure 3A; details in Appendix IV.1). In the following, we address some problems with the model and simulations. 
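The reported pipeline (Fourier transform, directional filtering, inverse transform, maximum, subtraction) can be sketched as follows. This is a simplified illustration written for this comment, not PCMB's code: the stimulus is reduced to a vertical-position × time array, the directional filters are plain quadrant masks without the spatial envelope of width wd, and the windowing, the second nonlinearity, and the normalization described in Figure 3A are omitted. The function names and the convention that “upward” means drifting toward larger row indices are assumptions.

```python
import numpy as np

def directional_energy(stimulus):
    """Split a (space, time) stimulus into upward and downward motion energy."""
    spectrum = np.fft.fft2(stimulus)
    ky = np.fft.fftfreq(stimulus.shape[0])[:, None]  # spatial frequency (rows)
    kt = np.fft.fftfreq(stimulus.shape[1])[None, :]  # temporal frequency (columns)
    # Assumed convention: under NumPy's FFT sign convention, a pattern drifting
    # toward larger row indices has its energy where ky * kt < 0.
    up = np.fft.ifft2(spectrum * (ky * kt < 0))
    down = np.fft.ifft2(spectrum * (ky * kt > 0))
    return np.abs(up) ** 2, np.abs(down) ** 2

def upward_motion_energy(stimulus):
    """Maximum upward energy minus maximum downward energy."""
    up, down = directional_energy(stimulus)
    return float(up.max() - down.max())

# Test gratings: 4 cycles across 64 rows, shifting by one row per time step.
y = np.arange(64)[:, None]
t = np.arange(64)[None, :]
grating_up = np.cos(2.0 * np.pi * (4 * y - 4 * t) / 64.0)
grating_down = np.cos(2.0 * np.pi * (4 * y + 4 * t) / 64.0)

energy_up = upward_motion_energy(grating_up)
energy_down = upward_motion_energy(grating_down)
```

For these test gratings the readout is positive for the grating drifting in the assumed upward direction and negative for its mirror image, which is the opponent behavior that the description above implies.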
Appendix II.1: Filter selection
PCMB ran the model for 12 different filter widths for both the two-bar and three-bar conditions. Each panel of Figure 3B shows the model output for the two conditions (two bar and three bar) with one filter width. To fit the model outputs to the psychophysical data, PCMB chose a filter width of 0.67 degrees for the two-bar and 1.01 degrees for the three-bar condition (Figures 3B, C, and 4). This is an invalid, ad-hoc procedure, unless the filter selection occurs inherently in a way that is justified by biological or computational constraints. With such an ad-hoc procedure, the model becomes unfalsifiable, which becomes immediately evident when considering the following examples. Assume that none of the 12 filters were sensitive to the differences between the two-bar and three-bar stimuli, in which case the black curves would be perfectly overlapping with the red curves in each subplot of Figure 3B. Still, the psychophysical data could be matched by the same choice of filters as shown in Figure 3D. As a second example, we can also reproduce the psychophysical results when filtering only the two-bar stimulus with filters of widths 0.67 and 1.01 (Figures 3D and 4B). Hence, we obtain the same model results even if there is no Ternus-Pikler motion, which indicates that filter selection matters rather than filter sensitivity. 
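The logic of the first example can be illustrated with a toy model that is, by construction, blind to the stimulus condition. The response function below and its dependence on filter width are invented for illustration only and do not correspond to PCMB's filters; only the two filter widths (0.67 and 1.01) are taken from the text.

```python
import math

def toy_model_output(delta_deg, wd, n_bars):
    """Hypothetical filter response; n_bars is deliberately ignored."""
    shift = -60.0 * (wd - 0.5)  # invented dependence of the bias on filter width
    return 0.5 * (1.0 + math.erf((delta_deg - shift) / (30.0 * math.sqrt(2.0))))

deltas = range(-90, 91, 30)

# Same filter for both conditions: the curves coincide, as they must.
same_filter_gap = max(
    abs(toy_model_output(d, 1.01, 3) - toy_model_output(d, 1.01, 2)) for d in deltas
)

# A different filter per condition: an apparent "condition effect" appears,
# although the model cannot tell the two-bar from the three-bar stimulus.
selected_gap = max(
    abs(toy_model_output(d, 1.01, 3) - toy_model_output(d, 0.67, 2)) for d in deltas
)
```

In this sketch `same_filter_gap` is zero while `selected_gap` is not, so the separation between the two curves is produced entirely by the choice of filter and carries no information about the stimulus.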
It is particularly noteworthy that PCMB took great care to increase filter sensitivity. For example, PCMB added several nonlinearities to the readout process, apparently for no other reason than to make the filters sensitive to the differences between the two-bar and three-bar stimuli. However, as shown above, the within-filter effects are rather negligible compared to the large effects of ad-hoc filter selection. 
PCMB provided two justifications for their particular selection of filter widths (0.67 and 1.01). First, PCMB postulated that the filter widths should scale with the overall stimulus widths, i.e., the ratio of the two filter widths should be 2 : 3 because the ratio of the stimulus widths is 2 : 3 (two bar : three bar). Second, PCMB claimed that the chosen filter widths are “in close correspondence” with receptive field sizes found previously (Anderson & Burr, 1987). We think there are several problems with these arguments. First, the ratio of the overall stimulus widths is not 2 : 3, because this ignores the spaces between the bars (see Appendix IV.9). Second, a wide range of ratios of wd's can be chosen with basically the same model outcomes, because the wd associated with the two-bar condition can be selected from a wide range of values at which the filters' responses to the two- and three-bar stimuli do not differ. Third, the absolute values of the wd's are hardly constrained by the large 2′ (minutes) to 7° (degrees) interval of receptive field sizes found by Anderson and Burr (1987). Fourth, the results of Anderson and Burr's (1987) study are not applicable here because receptive field sizes in that study were measured along the grating's motion direction and not perpendicular to it, as was the case here (Appendix IV.6). 
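The first point can be written out explicitly. As a sketch, assume bars of equal width w separated by equal gaps g, with the overall stimulus width measured from the outer edge of the first bar to the outer edge of the last (the symbols w and g are introduced here for illustration; the actual dimensions are given in Appendix IV.9):

```latex
\begin{align}
  W_2 &= 2w + g, \qquad W_3 = 3w + 2g, \\
  \frac{W_2}{W_3} &= \frac{2w + g}{3w + 2g}
                   = \frac{2}{3} - \frac{g}{3\,(3w + 2g)} .
\end{align}
```

Under these assumptions the ratio equals 2 : 3 only if g = 0 and is smaller than 2 : 3 for any nonzero gap.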
Could PCMB have used a single filter width to explain their data? For example, for a filter width of 1.22, the filter with the maximal difference between the two-bar and three-bar condition, the three-bar condition shows a consistently stronger upward motion bias than does the two-bar condition, which is in line with the psychophysical results. Interestingly, the filter response for the two-bar condition is also biased towards upward motion. PCMB claimed that this bias is in line with the psychophysical data because there is an upward motion bias. However, this is not true. In fact, the bias in the psychophysical data for the two-bar condition is towards downward motion (in terms of annulling phases, +10° in their data and +7° in ours, versus around −27° as predicted by the model's 1.22 filter). Surprisingly, it seems that the two-bar condition is particularly hard to explain with their model because for all, even the smallest, filters there is a slight upward motion bias. 
In addition, the difference between the two-bar and three-bar conditions with the wd = 1.22 filter is much smaller than the difference observed in the psychophysical data. In the model the annulling phase of the 1.22 filter is around −44° for the three-bar condition and −27° for the two-bar condition, whereas it is around −27° (three-bar) and +10° (two-bar) for PCMB's psychophysical data. 
Appendix II.2: Unspecific motion components
As mentioned, PCMB have chosen an indirect approach by measuring how strongly retinotopic motion is modulated by nonretinotopic motion. Because of the indirect approach, the particular stimulus, and the complex model, it may well be that the model picks up unspecific signals which are not truly of nonretinotopic origin. For example, as mentioned above, the model shows a clear bias for upward motion in the two-bar condition (Figure 3B; PCMB's figure 11B). This bias is present even for the smallest filters, which integrate retinotopic motion from only one bar. Hence, the bias cannot come from nonretinotopic motion components. Moreover, these findings raise doubts about whether the within-filter differences for the two-bar and three-bar conditions are exclusively due to truly nonretinotopic motion rather than to unspecific signals of unclear origin. 
Appendix II.3: Overparametrization
There are only a few degrees of freedom in the experimental data, namely, the psychometric function's threshold in the two- and three-bar conditions (for a discussion on the slopes of the psychometric functions see Appendix IV.7). On the other hand, the model has a complex structure with many degrees of freedom, such as the five filter parameters (μy, μt, σx, σy, and σt), the degrees of freedom hidden in the readout process, and the free choice of filter wd's, as discussed in the “Filter selection” section above. It is surprising that, despite the extra degrees of freedom, the model's outputs differed significantly from the experimental data, χ2(2) = 18.3, p = 0.0001 = 0.01% (see Appendix IV.8). 
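The χ2 statistic quoted above can be reproduced from the summary values listed in Appendix IV.8. The following is a minimal sketch rather than the original analysis code: the variable and function names are ours, and the closed-form tail probability exp(−χ2/2) applies only because the test assumes exactly 2 degrees of freedom.

```python
import math

N_OBSERVERS = 6  # number of subjects, as stated in Appendix IV.8

# Summary values taken from Appendix IV.8 (degrees of phase).
observed_mean = {"two_bar": 10.2, "three_bar": -27.2}   # mean PSEs
observed_sd = {"two_bar": 9.89, "three_bar": 18.38}     # inferred standard deviations
model_pse = {"two_bar": -6.6, "three_bar": -34.8}       # PSEs predicted by the model


def chi_square():
    """Sum of squared, standard-error-scaled deviations between data and model."""
    total = 0.0
    for condition in observed_mean:
        standard_error = observed_sd[condition] / math.sqrt(N_OBSERVERS)
        deviation = observed_mean[condition] - model_pse[condition]
        total += (deviation / standard_error) ** 2
    return total


def p_value_df2(chi2):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


chi2 = chi_square()
print(f"chi2 = {chi2:.1f}, p = {p_value_df2(chi2):.4f}")
```

Almost all of the statistic comes from the two-bar condition, where the model predicts an upward bias while the data show a downward one.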
Appendix II.4: Simplicity
PCMB suggested that their model has the advantage of being simple and biologically plausible because of the use of spatio-temporal filters with receptive field profiles similar to those of motion sensitive neurons. This would be true if their model were based only on the filter outputs, but their model actually used at least three nonlinear operations following their filtering operation (see Appendix IV.1 and Figure 5) and additionally involved the aforementioned arbitrary filter selection. 
Appendix II.5: Generalizability
To explain the differences between the two- and three-bar conditions, PCMB chose different filters for the two-bar and three-bar condition (as shown in Figure 3B, C). As mentioned above, we obtain similar fits to the psychophysical data when we filter the two-bar stimulus with the same choice of filters (Figure 3B, D). Obviously, the Ternus-Pikler motion is not important for the model, which is at the heart of the spatio-temporal filtering approach aiming to account for nonretinotopic motion processing without explicitly computing both the Ternus-Pikler-reference frame and the task-relevant grating motion. Although this is indeed a parsimonious approach, the success of such an approach may be restricted to certain types of stimuli and to motion processing in particular. 
For example, PCMB's model seems particularly tailored to the specific stimulus and task used in their experiment. Because of the larger filter width in the three-bar condition (wd = 1.01), vertical motion is taken into account from the attended and from the neighboring bars. In the two-bar condition, however, due to the smaller filter (wd = 0.67), vertical motion is taken into account mainly from the attended bar. Hence, the model explains “retinotopic” motion processing in the two-bar condition by the smaller filter width, taking motion from only one bar into account. “Nonretinotopic” motion processing in the three-bar condition apparently occurs because of the larger filter width, integrating more vertical motion from neighboring bars. 
This approach works for stimuli in which the Ternus-Pikler and the task-relevant motion signals have sufficiently different motion directions, e.g., if they are orthogonal to each other. However, it remains an open question whether the model can easily be expanded to the dot-rotation paradigm, or any other paradigms where motion components of all directions matter, as claimed by PCMB on page 13 where they say “Obviously, with a more complex neural network considering a population of neurons, it would be possible to simulate more precisely the data observed here and, by extension, those using the more complex stimuli of Boi et al.” Moreover, in our paradigm the retinotopic and nonretinotopic outcomes are not simple additions within the same stimulus dimension (nonretinotopic vertical motion modulates retinotopic vertical motion), but instead, are made of two different Gestalts (up-down, left-right motion in the case of retinotopic organization versus stationary dots and rotation in the case of nonretinotopic organization). 
Appendix II.6: Model parameters
Some values reported in PCMB did not match the values used in the actual simulation. Furthermore, there were discrepancies between the simulation values and the values used in the psychophysical experiment. For example, the stimulus on : off ratio was 1 : 7 in the simulation, whereas it was 4 : 7 (120 ms : 210 ms) in the psychophysical experiment. A list of further issues of this type can be found in Appendix IV.2.
Appendix III: Methods and results of the psychophysical experiment
Observers
Six observers from the École Polytechnique Fédérale de Lausanne (EPFL) participated in this study (five naive and one author). All observers had normal or corrected-to-normal vision as assessed by the Freiburg visual acuity test (Bach, 1996). Observers were told that they could quit the experiment at any time, and written informed consent was obtained. Observers were remunerated for participation (20 CHF per hour). 
Apparatus
Experiments were conducted using an Iiyama VisionMaster Pro 454 monitor (Iiyama, Hoofddorp, The Netherlands) running at a screen resolution of 1024 × 768 and a refresh rate of 85 Hz. The pixel resolution was approximately 72 pixels per inch. Experiments were scripted using the Psychtoolbox for Windows XP (Brainard, 1997; Pelli, 1997). The computer was equipped with an ATI Radeon X300SE graphics card (AMD, Sunnyvale, CA) and the color lookup table was linearized by passing the input voltages through a gamma function to produce linear luminance increments. The viewing distance was 60 cm. 
Procedures
We used the exact same stimulus configuration as PCMB, shown in Figure 2A and B. For the two-bar stimulus, the left bar had variable up/down motion depending on parameter δ (Figure 2A). For the three-bar stimulus, the central bar always appeared to be moving upwards, while the left bar again had variable motion depending on δ. Each bar subtended a visual angle of 1.2° wide and 3.2° high with a space between bars of 1.2°. The Michelson contrast of the sinusoids was set to 50% and the background luminance was 12 cd/m2, as reported by PCMB. 
In PCMB's publication there were two conditions: the two-bar and three-bar conditions, both of which involved attending to a fixed retinotopic location and making motion discriminations at that location. We repeated these two conditions and added an additional condition where subjects viewed the same three-bar stimulus, but attended to the nonretinotopic central bar. In PCMB's experiment the central bar always underwent upward motion. In order to avoid participant response biases and to ensure attention to the task, we modified the stimulus so that on half of the trials the stimulus was flipped upside-down (undergoing downward motion) and we reverse-coded the participants' responses so that if they responded “up” on these flipped trials then we recorded the response as “down” and vice versa for “down” responses. Participants completed five blocks of 30 trials per condition. The stimulus order was randomized across observers. 
All procedures used on human subjects conformed to the Declaration of Helsinki. 
Appendix IV: Detailed comments on modeling
  1.  
    There are several steps taken in the model that were not clearly reported. PCMB, p. 13, refer to their processing sequence as “... a single spatiotemporal filter followed by a basic non-linearity.” Mathematically, the sequence of operations they actually adopted is:
  •  A. 
    Convolve the input I(x, y, t) with upward- and downward-motion selective filters (Fup and Fdown) by Fourier transforming, filtering, and inverse Fourier transforming.   
  •  B. 
    Spatially window the filter outputs at the attended stimulus bar (see Figure 3A for their window function).
  •  C. 
    Take the maximum filter output over space at each time point.   
  •  D. 
    Take the difference between the results for the upward and downward signals over time.  
  •  E. 
    Take the absolute value of the maximum and minimum of the result over time.   
  •  F. 
    Take the difference between DMaxup–down and DMinup–down.  
Figure 5
 
PCMB's model. The stimulus sequence is first convolved with two spatiotemporal filters of a given filter width (wd) in the upward and downward directions. PCMB then window the results and take the maximum over x, y-space at each time point. Next, they take the difference between the upward and downward maxima at each time point. The results are subject to a rectifying nonlinearity taking the absolute value. PCMB find the maximum and minimum of these rectified values in time and subtract them. The above operations are performed over all phases (δ) and for all stimuli (two bar and three bar), and the overall maximum is found. This overall maximum is then used to divisively normalize the outputs for the individual phases and stimuli so that the new global maximum is one. This entire process is repeated independently for each filter width (wd) used to fit the data. Although wd values are never mixed between two-bar and three-bar stimuli when computing the global normalization factor, the resultant psychometric curves for different wd values are used to fit the human data for the two-bar and three-bar conditions.
Figure 6
 
Simulation results using PCMB's code with the windowing function set to fill the entire image. Comparing the results presented here with those in Figure 3B, we see that the model predicts the same psychophysical results even when attention is dispersed over the entire figure.
The parameter Ψ is then calculated for each stimulus type (two-bar and three-bar) and for each phase (see Figure 5). Once calculated, the final output, which gives the points on the psychometric function plotted in Figure 3B for one filter width, is Ψ divided by its maximum over all phases and both stimulus types, so that the global maximum is one (Figure 5). This sequence (schematically illustrated in Figure 5) is clearly more than “... a single spatiotemporal filter followed by a basic nonlinearity.” In fact, there are at least three nonlinearities following the filtering operation: the max operation of Step C, the max and min operations of Step E, and the absolute value of Step E. 
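The readout stages that follow the filtering (Steps B through F, plus the global normalization described in the Figure 5 caption) can be sketched as below. This is an illustration under stated assumptions and not PCMB's code: the filter outputs of Step A are assumed to be given as lists of 2-D frames, windowing is assumed to be a pointwise multiplication with a 0/1 mask, Step E is read as |max| and |min| of the up-minus-down signal over time, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Frame = Sequence[Sequence[float]]  # one time slice, indexed [y][x]


def spatial_max(frames: Sequence[Frame], window: Frame) -> List[float]:
    """Steps B-C: window each frame (assumed multiplicative mask), then take
    the maximum over space at each time point."""
    return [
        max(v * w for row, w_row in zip(frame, window) for v, w in zip(row, w_row))
        for frame in frames
    ]


def readout(up: Sequence[Frame], down: Sequence[Frame], window: Frame) -> float:
    """Steps B-F for one stimulus and one phase; returns the unnormalized Psi."""
    # Step D: difference between upward and downward maxima at each time point.
    diff = [u - d for u, d in zip(spatial_max(up, window), spatial_max(down, window))]
    # Step E: absolute values of the maximum and minimum over time.
    d_max, d_min = abs(max(diff)), abs(min(diff))
    # Step F: difference between the two.
    return d_max - d_min


def normalize(psi: Dict[str, float]) -> Dict[str, float]:
    """Divide every Psi by the global maximum over all stimuli and phases."""
    peak = max(psi.values())
    return {key: value / peak for key, value in psi.items()}


# Toy filter outputs: two time points, 2 x 2 pixels, window covering the top row.
up = [[[1, 0], [0, 0]], [[3, 0], [0, 0]]]
down = [[[0, 2], [0, 0]], [[0, 1], [0, 0]]]
window = [[1, 1], [0, 0]]
print(readout(up, down, window))
```

Even in this reduced form, the spatial max, the temporal max and min, and the absolute value are three separate nonlinear operations applied after the linear filter.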
  2.  
    Implementation and parametrization of the model:
  •  A.  
    The background in PCMB's psychophysical experiment was gray, but black in their simulations.
  •  B.   
    The stimulus to interstimulus interval (ISI) time ratio was 4 : 7 in their psychophysical experiment and 1 : 7 in their simulations.
  •  C.   
    The spatial window used in the simulations was not aligned with the bar (see Figure 3A). Also, there were two readout windows, even though the subject's task was to attend to only one of the bars. The placement or rationale for these windows was not described. We attempted to reproduce PCMB's results without restricting the search for maxima to the windowed region depicted in Figure 3A and found qualitatively similar but quantitatively different results (Figure 6).
  3.  
    PCMB calculated the values for the horizontal filter width wd incorrectly. To see this, consider the following. The horizontal filter width wd is related to the low-pass filter's parameter σx in Fourier space. The profile of the low-pass filter used in their simulation is Gaussian-like:
$$H(\omega_x) = \exp\left(-\frac{\omega_x^2}{\sigma_x^2}\right) \tag{1}$$
In general, the Fourier transform ($\mathcal{F}$) of a Gaussian-like function is also Gaussian-like. More specifically:
$$\mathcal{F}\left\{e^{-a x^2}\right\}(\omega) = \sqrt{\frac{\pi}{a}}\,\exp\left(-\frac{\omega^2}{4a}\right) \tag{2}$$
We substitute σx in Equation 1 with sx · ω0, where sx is the filter width in multiples of the fundamental frequency ω0 in the fast Fourier transform (FFT). This frequency depends on the window size wFFT used for the FFT: ω0 = 2π/wFFT.
Setting ω = ωx and equating the exponents on the right sides of Equations 1 and 2 yields:
$$a = \frac{(s_x\,\omega_0)^2}{4} = \left(\frac{\pi\,s_x}{w_{FFT}}\right)^2 \tag{3}$$
PCMB did not report exactly what the filter width wd refers to. We assume wd to be 2σ of the Gaussian. We can calculate wd as follows:
$$w_d = 2\sigma = \frac{2}{\sqrt{2a}} = \sqrt{\frac{2}{a}} \tag{4}$$
Replacing the factor a with its equivalent from the right hand side of Equation 3 yields:
$$w_d = \frac{\sqrt{2}\,w_{FFT}}{\pi\,s_x} \tag{5}$$
PCMB used a window width of 512 pixels and a stimulus bar width of 20 pixels (according to their code). The stimulus bar width was reported to be 1.2°, and wFFT = 1.2° · 512/20 = 30.72°. Using this value, wd may finally be computed as 13.83°/sx. Further examination of the simulation code revealed that the simulation was run for sx = 1, 2, ..., 12 (according to the 12 subplots in Figure 3B) and the wd's were reported to be 6.11°/sx, which is fairly far off from the correct values. 
If we instead assume wd to be the full width at half maximum (FWHM) of the Gaussian rather than 2σ, we need to change Equations 4 and 5 accordingly or apply the general conversion for a Gaussian, FWHM = $\sqrt{2\ln 2}$ · 2σ. Doing so yields wd = 16.28°/sx.
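The two width values above can be checked numerically. The sketch below assumes a Fourier-domain low-pass profile of the form exp(−ωx²/σx²) with σx = sx · 2π/wFFT, which is the form that reproduces the quoted 13.83°/sx; the constant and function names are ours, and the last printed column is simply the 6.11°/sx rule reported by PCMB for comparison.

```python
import math

BAR_WIDTH_DEG = 1.2   # reported stimulus bar width in degrees
BAR_WIDTH_PX = 20     # bar width in pixels, per PCMB's code as described above
FFT_WINDOW_PX = 512   # FFT window width in pixels, per PCMB's code as described above


def fft_window_deg():
    """Width of the FFT window in degrees of visual angle."""
    return BAR_WIDTH_DEG * FFT_WINDOW_PX / BAR_WIDTH_PX


def wd_two_sigma(s_x):
    """Spatial width (2 * sigma) for the assumed profile exp(-w^2 / sigma_x^2)."""
    return math.sqrt(2.0) * fft_window_deg() / (math.pi * s_x)


def wd_fwhm(s_x):
    """Same filter, with the width expressed as full width at half maximum."""
    return math.sqrt(2.0 * math.log(2.0)) * wd_two_sigma(s_x)


# Compare both width definitions with the reported 6.11 deg / s_x rule.
for s_x in range(1, 13):
    print(s_x, round(wd_two_sigma(s_x), 2), round(wd_fwhm(s_x), 2), round(6.11 / s_x, 2))
```

Under either width definition the computed values are more than twice the reported ones for every sx.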
  4.  
    The filter parameters reported by PCMB (μy, μt, σy, σt) are given without units, which makes filtering operations impossible to replicate.
  5.  
    The temporal window used in the Fourier domain covered exactly two full cycles of the Ternus-Pikler stimulus. Since the Fourier transform assumes periodicity, this manipulation assumes the Ternus-Pikler motion lasts forever, and ignores transient effects arising from stimulus onset and offset.
  6.  
    PCMB justify the choice of the wd values based on previous psychophysical experiments (Anderson & Burr, 1987). However, the paradigm in the paper by PCMB and the one presented by Anderson and Burr (1987) are very different. First, the range in Anderson and Burr (1987) is not 2 arc seconds to 7 degrees (as reported in PCMB, p. 13), but 2 arc minutes to 7 degrees. Second, in Anderson and Burr's (1987) paper the receptive field sizes were measured along the motion direction and not perpendicular to it as it is for the filters in PCMB. Third, the experiments in Anderson and Burr (1987) used smooth motion with a peak temporal frequency of 8 Hz, not apparent motion with a peak frequency of 0.8 Hz as used in PCMB. Fourth, the experiments were run at a mean luminance of 400 cd/m2 instead of 12 cd/m2 as in PCMB.
  7.  
    Linking hypothesis: PCMB normalized the filter outputs by the maximum over all conditions in order to compare the model output with the results of the psychophysical experiments. This approach is not only biologically implausible, but it also neglects the influence of noise on the psychometric function's slope. If there were no noise and the subject made decisions, for example, based on the sign of the readout signal, then the decision would always be the same at a given test level (here δ), which would result in a step response function. As noise increases, however, the decisions are not that clear-cut anymore and the psychometric function deviates from the step function accordingly. The form of the psychometric function is not only determined by the signal to noise ratio but also by where in the process noise occurs, especially if the process contains nonlinearities as in PCMB's model. Because these factors are unknown, the particular form of the psychometric function, including the slope, should not be considered when comparing the model output to the experimental results. The only quantities that are invariant with respect to the linking hypothesis and its unknown factors are the offsets of the psychometric functions along the δ-axis (annulling phases).
  8.  
    In order to quantitatively compare the model with the experimental outcome, we inferred the standard deviations of the subjects' points of subjective equality (PSEs) by measuring the data points in PCMB's figure 6(b), which revealed sd2bar = 9.89° and sd3bar = 18.38°. Given N = 6 subjects, the corresponding standard errors are se2bar = 4.04° and se3bar = 7.50° (se = sd/√N). The mean thresholds were provided explicitly: m2bar = 10.2°, m3bar = −27.2°. The PSEs predicted by the model were inferred from PCMB's figure 11C: μ2bar = −6.6°, μ3bar = −34.8°. This results in a χ2 value of χ2 = ∑i=2bar,3bar[(mi − μi)/sei]2 = 18.3. We tested this value against a χ2-distribution assuming 2 degrees of freedom, which is very much in favor of the model as the degrees of freedom lost in the course of model fitting are not taken into account, and found the model's prediction error to be highly significant (p = 0.0001 = 0.01%).
  9.  
    PCMB use the ratio of the overall stimulus widths of the three- and the two-bar conditions to justify the ratio of the wd's picked for these conditions and claim the ratio to be 3 : 2 ( = 1.5). This ratio does not, however, take into account the gaps between the vertical bars, which would result in a ratio of (3 + 2) : (2 + 1) = 5 : 3 ( = 1.67) (the gaps have the same width as the bars). Moreover, the stimulus in the three-bar condition actually covers four bar positions on the screen, i.e., in retinotopic space, as the elements are displaced horizontally from one frame to the next. This results in a stimulus ratio of (4 + 3) : (2 + 1) = 7 : 3 ( = 2.3). Since the model does not assume that a nonretinotopic reference frame will be established, it seems to be more appropriate here to use the stimulus size in retinotopic space. Thus, the wd's should have had a ratio of 1 : 2.3 instead of 1 : 1.5.
  10.  
    PCMB used a sigmoidal Gaussian to fit their experimental data, which is well suited for nonperiodic paradigms; however, for their periodic motion experiment (where δ is periodic) a cumulative von Mises function would have been better suited.
Figure 1
 
Ternus-Pikler display. (A) In the “no-flankers” condition, two black disks are presented at the same locations separated by an inter-stimulus interval (ISI). A white dot is perceived to be moving up-down in the left disk, and left-right in the right disk (Movie 1). (B) In the group motion condition, a third disk is added either to the left or right of the center disks. When the ISI is sufficiently long, e.g., 100 ms, three disks are perceived to move back and forth as a group. The group-motion creates a nonretinotopic reference frame (see arrows that were not shown in the actual display). Features are integrated according to this nonretinotopic reference frame. A rotating dot is perceived in the center disk of a given frame which results from the nonretinotopic integration of the retinotopic dot motions of the two center-most disks. The only differences between A and B lie in the outer disks. The percept of rotation is illusory in the sense that, retinotopically, only up-down and left-right motions are present. Attention is crucial. It is nearly impossible to attend to the retinotopic up-down or left-right motions in the group motion condition (Movie 2). (C) The “illusory” dot motion disappears when the ISI is reduced to 0 ms. Contrary to the group motion condition, the disks now are perceived at four positions: the two center disks at the center positions and the outer disk jumping from the far left to the far right position. As in the no-flankers condition, up-down motion is perceived in the center-left disk and right-left motion in the center-right disk.
Figure 2
 
(A) In the two-bar condition, observers were asked to attend to the left bar. Orange outlines (not present in the actual display) mark the task-relevant, attended location. (B) In the three-bar condition, observers attended to the same retinotopic location as in the two-bar condition (again outlined in orange). (C) We tested six observers using the same stimuli and procedures as used in PCMB. Five out of our six observers showed a pattern of results similar to PCMB's. Upward and downward percepts varied with δ in both conditions, but were biased towards upward motion in the three-bar condition. (D) We used the same three-bar stimulus as in the previous condition, but observers attended to the nonretinotopic, blue outlined bars in B (shown by blue dots and blue fitting functions). Obviously, a very different pattern of results occurs compared to when observers attended to the retinotopic location. In general, performance depends less on δ when observers attend to the nonretinotopic motion than when they attend to the retinotopic motion (black curves, replotted from C, vs. blue curves). However, inter-observer variability is high. While observer S1 only perceived the nonretinotopic motion (the blue points all lined up at a proportion “upward” responses of 1.0), the other observers were influenced to varying degrees by the retinotopic motion because performance changed with δ. Observer S2's performance was nearly at chance level. It seems that this observer had problems with the paradigm in general.
Figure 3
 
(A) The schematic shows the model as reported in PCMB. The stimulus was Fourier transformed, filtered for upward and downward motion, and inverse Fourier transformed. Next, the maximum of the upward and downward filter outputs were computed and subtracted, yielding “upwards motion energy.” The model, however, is more complex than reported (see Figure 5 in Appendix IV for full details). Inspection of their code shows that after the upward- and downward-motion filtering, the result is windowed (inset with superimposed red windows) and the maximum is taken over space at each time point. A second nonlinear operation is carried out, where the absolute value of the minimum up-down motion energy over time is subtracted from the maximum. Finally, the above operations are carried out for both the two-bar and three-bar stimuli, and the results are normalized by the maximum of the two outputs (see Appendix IV.1). (B) Simulation results for different filter widths (wd). Data points mark model outputs and solid lines mark best fitting cumulative Gaussian functions. Results and fits for the two-bar condition are plotted in red, while those for the three-bar condition are plotted in black. For each sub-plot, the x-axis plots the phase offset δ (see Figure 2), while the y-axis plots the model's upwards motion bias. (C) PCMB's human psychophysical data (solid curves) with superimposed model results (dashed curves) selected from the wd = 0.67 filter for the two-bar condition and from the wd = 1.01 filter for the three-bar condition (see black arrow). (D) Filter selection rather than filter sensitivity matters. Assume no filters can discriminate between the two- and three-bar stimuli. In this case, the black curves would, for example, be identical to the red curves. Still, we obtain very similar model outputs when we take the outputs from the wd = 0.67 and wd = 1.01 filters (see red arrows projecting from B to D). 
The very same situation occurs if there is no Ternus motion, i.e., when we filter the two-bar stimulus with the same choice of filters, see Figure 4B.
Figure 4
 
(A) Contour plots of PCMB's spatio-temporal filters (colored lines) at one time slice superimposed on the three-bar stimulus (top) and the two-bar stimulus (bottom) for the two filter widths selected to model the psychophysical data. Note that in the top subplot, the filter integrates much more information from the neighboring bar than does the filter displayed in the bottom subplot. We used the same filter sizes as used in PCMB's simulation code. (B) When we applied the very same two filters to the two-bar stimuli, we obtained comparable fits to the psychometric functions as with the three-bar and two-bar conditions in A respectively (see Figure 3D). Hence, what matters is filter size and not filter sensitivity, or the stimulus condition (i.e., two bar or three bar). The difference is that in the wd = 1.01 condition there is more overlap with the right-hand bar than in the wd = 0.67 condition. This shows that ad-hoc filter selection rather than inherent filter sensitivity matters.
Figure 5
 
PCMB's model. The stimulus sequence is first convolved with two spatiotemporal filters of a given filter width (wd) in the upward and downward directions. PCMB then window the results and take the maximum over x, y-space at each time point. Next, they take the difference between the upward and downward maxima at each time point. The results are subject to a rectifying nonlinearity taking the absolute value. PCMB find the maximum and minimum of these rectified values over time and subtract the minimum from the maximum. The above operations are performed over all phases (δ) and for all stimuli (two bar and three bar), and the overall maximum is found. This overall maximum is then used to divisively normalize the outputs for the individual phases and stimuli so that the new global maximum is one. This entire process is repeated independently for each filter width (wd) used to fit the data. Although wd values are never mixed between two-bar and three-bar stimuli when computing the global normalization factor, the resultant psychometric curves for different wd values are used to fit the human data for the two-bar and three-bar conditions.
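The post-filtering stages described in this caption can be illustrated with a minimal sketch. It is not PCMB's code: the function names (`spatial_max`, `motion_response`, `normalize`), the multiplicative form of the window, the nested-list data layout, and the toy numbers are all assumptions made for illustration, and the spatiotemporal filtering itself is taken as already done.

```python
def spatial_max(frame, window):
    """Maximum of the window-weighted values in one [y][x] frame.

    Assumption: the window is applied multiplicatively and the filter
    outputs are non-negative energies.
    """
    return max(
        value * weight
        for row, weight_row in zip(frame, window)
        for value, weight in zip(row, weight_row)
    )


def motion_response(up, down, window):
    """Scalar response for one stimulus at one phase.

    up, down: filter outputs indexed [t][y][x] (hypothetical layout).
    window:   weights indexed [y][x].
    """
    # Difference of the upward and downward spatial maxima at each
    # time point, passed through the absolute-value nonlinearity.
    rectified = [
        abs(spatial_max(u, window) - spatial_max(d, window))
        for u, d in zip(up, down)
    ]
    # Maximum minus minimum of the rectified values over time.
    return max(rectified) - min(rectified)


def normalize(responses):
    """Divide every response by the global maximum over stimuli and phases."""
    peak = max(responses.values())
    if peak == 0:
        return dict(responses)
    return {key: value / peak for key, value in responses.items()}


# Toy example: two time points, 2 x 2 pixels, window covering the image.
window = [[1, 1], [1, 1]]
up = [[[0, 1], [2, 0]], [[4, 0], [0, 0]]]
down = [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]
response = motion_response(up, down, window)

# Hypothetical responses keyed by (stimulus, phase) for one filter width.
normalized = normalize({("two-bar", 0): 2, ("three-bar", 0): 4})
```

In this sketch `normalize` is called once per filter width, so the normalization factor never mixes wd values; choosing different wd curves for the two-bar and three-bar conditions happens only afterwards, at the fitting stage.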
Figure 6
 
Simulation results using PCMB's code with the windowing function set to fill the entire image. Comparing the results presented here with those in Figure 3B, we see that the model predicts the same psychophysical results even when attention is dispersed over the entire figure.