Research Article  |   June 2005
Is prior knowledge of object geometry used in visually guided reaching?
Bruce Hartung, Paul R. Schrater, Heinrich H. Bülthoff, Daniel Kersten, Volker H. Franz; Is prior knowledge of object geometry used in visually guided reaching? Journal of Vision 2005;5(6):2. doi:10.1167/5.6.2.

Abstract

We investigated whether humans use prior knowledge of the geometry of faces in visually guided reaching. When an observer views the inside of a mask of a face, the mask is often perceived as being a normal (convex) face, instead of the veridical, hollow (concave) shape. In this “hollow-face illusion,” prior knowledge of the shape of faces dominates perception, even when in conflict with information from binocular disparity. Computer images of normal and hollow faces were presented, such that depth information from binocular disparity was consistent or in conflict with prior knowledge of the geometry. Participants reached to touch either the nose or cheek of the faces or gave verbal estimates of the corresponding distances. We found that reaching to touch was dominated by prior knowledge of face geometry. However, hollow faces were estimated to be flatter than normal faces. This suggests that the visual system combines binocular disparity and prior assumptions, rather than completely discounting one or the other. When we compared the magnitude of the hollow-face illusion across tasks, its flattening effect was similar for the verbal and reaching tasks.

Introduction
Current literature on both robotic and human reaching assumes that most of the information used when planning and executing a visually guided reach is visually available at the time of the reach. In other words, it is assumed that prior knowledge is not used. The exception to this is prior knowledge about calibrations, such as camera calibrations and calibrations between cameras and manipulators, or between the eye and the hand for human observers. However, given the well-known ambiguities in visually extracting object shape (Belhumeur, Kriegman, & Yuille, 1999), the use of prior information for shape may be critical for making successful visually guided reaches. In this study we explore whether visually guided reaching in humans uses prior knowledge of geometry: specifically, prior knowledge of the geometry of faces. 
Faces are convenient targets for our experiments, in part because of the well-known “hollow-face illusion” (Gregory, 1973). When an observer views the inside of a mask or mold of a face, depth estimates from binocular disparity conflict with prior knowledge of the shape of faces. It seems that prior knowledge “wins,” and the mask is seen as a convex face (i.e., having normal geometry). This is the “hollow-face illusion.” 
One may wonder if the hollow-face illusion is simply a manifestation of a general convexity bias, as demonstrated by Langer and Bülthoff (2001), rather than depending on familiarity with faces. If this were the case, we would expect that inverted versions of less familiar objects would exhibit the same effect as the hollow-face illusion, and that the effect of the illusion would be of the same magnitude. However, this is not the case. As shown by Hill and Bruce (1994), a “hollow-potato illusion” has a smaller effect on verbal tasks than does the hollow-face illusion. This suggests that the hollow-face illusion is more than just a manifestation of a general convexity bias and that prior knowledge of an object’s geometry is used when making verbal depth estimates. However, it does not immediately follow that the visual system guides reaches using the same prior knowledge. The questions of how prior knowledge and binocular disparity are combined, and what strategies are used to combine these cues, are also still open. 
In this study, we use the hollow-face illusion to test whether prior knowledge of geometry is used when making visually guided reaches, and whether this knowledge combines with or supersedes binocular stereo information in both reaching and verbal tasks. Finally, we investigate whether the use of prior knowledge depends on the task performed by comparing shape estimates from verbal tasks to those from reaching tasks. We will briefly discuss each of these research questions. 
1.1 Does prior knowledge affect reaches to faces?
In theory, reaches could be controlled completely by information present at the time of the reach, for example, binocular disparity (Hespanha, Dodds, Hager, & Morse, 1999). On the other hand, prior knowledge of an object’s geometry could be used in combination with binocular disparity to make a more accurate estimate of the object’s geometry. If the visual system does use prior knowledge of an object’s geometry, then reaches should be affected by the hollow-face illusion, as tested by our experiment. 
Because previous work (Hill & Bruce, 1993, 1996) has shown that the hollow-face illusion affects verbal estimates of face geometry, one may be tempted to assume that other types of tasks will also be affected. However, Schrater and Kersten (2000) have shown that optimal cue combination depends on the task being performed. For us, this may mean that the optimal combination of prior knowledge and binocular disparity is different for verbal tasks than for reaching tasks. Prior knowledge may be a cue that dominates when the task is to verbally estimate a familiar object’s geometry, whereas binocular disparity may dominate when the task is to guide a reach to the same object. Indeed, some studies (e.g., Bridgeman, Lewis, Heit, & Nagle, 1979; Bridgeman, Peery, & Anand, 1997; Milner & Goodale, 1995) suggest that illusions that affect some verbal (or more general “perceptual”) tasks do not affect the visual control of reaching tasks, and that different types of reaching tasks may be affected differently. More specifically, if haptic feedback is provided at the end of the reach, the reach may not be affected by the illusion, but if haptic feedback is not provided, it may be. However, the target stimuli used in the cited experiments were not designed to test for the effects of prior knowledge. In those experiments, participants estimated the size, length, or position of abstract geometric entities, such as lines and circles. As these do not have a “typical” or expected size, these experiments did not test the effects of prior knowledge inherent in the hollow-face illusion used in our experiment. 
1.2 How is prior knowledge combined with binocular disparity?
While the previous section asks if the visual system uses prior knowledge, it does not ask how information from prior knowledge of geometry is combined with information from binocular disparity. When prior knowledge and disparity information are in conflict, it is possible that the visual system uses a winner-take-all strategy, using only prior knowledge when reaching to sufficiently familiar objects. It is also possible that the visual system combines these sources of information to yield a shape estimate that forms a compromise between the two sources. For example, the visual system may use a weighted combination of the information from binocular disparity and prior knowledge of geometry. We will compare reaches to hollow faces with reaches to normal faces. If the visual system uses a winner-take-all strategy in which prior knowledge of geometry is the winner, reaches should be the same for both hollow and normal faces. 
1.3 Is the magnitude of the hollow-face illusion task dependent?
While the first question asks whether prior knowledge is used for reaching tasks, it does not ask whether prior knowledge affects the three types of tasks equally. We will compare the magnitude of the effect of the illusion on all three tasks. 
In our experiment, we presented participants with computer-generated images of convex (normal) and concave (hollow) faces, such that depth information from binocular disparity was consistent or in conflict with prior knowledge of the geometry. We used an experimental setup that enabled us to minimize other potential cues for face geometry. For example, faces were rendered as Lambertian surfaces with directional light sources, such that shading would not bias the participants toward the concavity or convexity of the faces. Participants reached to either the nose or cheek of the faces or gave verbal estimates of the corresponding distances. If prior knowledge about the geometry of faces does affect participants’ reaches, we expect them to reach to concave faces as if they were convex; therefore, we expect them to reach to the nose as if it were in front of the cheek, even though it is behind. 
Methods
Participants viewed concave and convex faces and made verbal and reach estimates that indicated their perceived shape of the face. 
2.1 Participants
Five naïve University of Tübingen students took part in the study. In return for their participation, they received a payment of 13 DM (approximately 6.5 US$ or 6.5 EUR) per hour. 
2.2 Apparatus
The faces were stereo pairs rendered using OpenGL, scaled to normal size for an adult head. The faces were taken from the Tübingen Face Database (Troje & Bülthoff, 1996; Blanz & Vetter, 1999). For the sake of simplicity, it was important to choose a lighting model that would not add an additional source of information for determining the concavity/convexity of the face. To that end, each face was rendered as a Lambertian surface, lit by a single, directional light source along the view direction. Because the concavity/convexity of the face was determined by a scaling in the view direction, this light source created shading that provided only ambiguous information for determining the concavity or convexity of the face. 
It was necessary to present the faces in such a way that participants would be able to reach to the perceived location of the face images. To achieve this, stimuli were rendered on a CRT suspended above a mirror, as shown in Figure 1. The faces’ location, as defined by binocular disparity and perspective cues, was behind the mirror. As shown in the figure, this setup allowed participants to place their hands at the location of the faces. 
Figure 1
 
Participants viewed computer-generated images of a face in stereo. The image was reflected from a CRT onto a mirror. Participants were able to interact with the graphics at the location of the image, underneath the mirror. Haptic feedback was provided by a PHANToM™ force feedback device. (Adapted from an illustration by Marc O. Ernst.)
A chin rest and headrest were used to maintain a consistent viewing position. For the reaching tasks, the participant’s right index finger was placed into the thimble of a PHANToM™ force feedback device that was used to give haptic feedback as well as to measure the trajectory of the finger. 
2.3 Procedure
Each participant performed one verbal and two reaching tasks. Before the trials began, participants were instructed about which parts of the nose and cheek were the targets. So that participants would not be biased, they were not told to touch the face from the inside or from the outside, just to approach the target from “the side.” In each task, one of three faces was presented at a distance of 460, 490, or 520 mm from the viewpoint to the center of the face. This range was selected because the stereo-graphics effect began to degrade for faces closer than 460 mm, and participants were not able to reach faces further than 520 mm, due to the configuration of the haptic workspace. Faces were presented in two different orientations. In half of the trials, the faces were oriented such that the participant viewed a normal (convex) face, and in the other half, they viewed a hollow (concave) face. The 36 possible trial types (3 faces × 3 distances × 2 targets × 2 orientations) were presented in randomized order. The randomization and the relatively large number of possible trial types make it unlikely that participants were able to guess which condition they were in. In concave trials, the nose was at the same distance from the viewer as the cheek was during convex trials. Likewise, in concave trials, the cheek was at the same distance from the viewer as the nose was during convex trials. The order of tasks (verbal, haptic, and non-haptic) was randomized within participants. 
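The fully crossed design described above (36 trial types, each repeated to give 396 trials per task) can be sketched as follows. The face identifiers are placeholders of ours; the paper does not name the face models or specify the randomization beyond shuffling:

```python
import itertools
import random

# Placeholder identifiers; the paper does not name the three face models.
faces = ["face_A", "face_B", "face_C"]
distances_mm = [460, 490, 520]
targets = ["nose", "cheek"]
orientations = ["convex", "concave"]

# The 36 trial types: 3 faces x 3 distances x 2 targets x 2 orientations.
trial_types = list(itertools.product(faces, distances_mm, targets, orientations))

def make_task_block(repetitions=11, seed=None):
    """Repeat every trial type and shuffle, so that the condition of the
    upcoming trial is unpredictable (36 x 11 = 396 trials per task)."""
    rng = random.Random(seed)
    block = trial_types * repetitions
    rng.shuffle(block)
    return block
```

Note that 11 repetitions of the 36 trial types matches the 33 repetitions per condition reported below, because each of the 12 conditions (2 targets × 2 orientations × 3 distances) is realized by 3 face models.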
In the verbal task, participants were asked to give a verbal estimate of the distance from their viewing position to either the nose or the cheek of the faces. Estimates were given in arbitrary units, chosen by the participant. The participants were instructed that their eyes were at zero, and were told to use any metric they were comfortable with, so long as they were consistent. In each trial, the face was shown and a tone sounded. The face was removed from view after 2 s and a second tone sounded. Participants were instructed to respond before the second tone. This limit was imposed to keep the response time similar between the reaching and verbal estimates. Each participant made distance estimates for two types of targets (nose or cheek) on two types of faces (concave and convex) at three distances (460, 490, and 520 mm). Each condition was repeated 33 times for a total of 396 trials in the verbal task. 
In the non-haptic reaching task, participants were asked to touch either the nose or cheek of the face. The mirror occluded the participant’s finger, but a “virtual finger” in the form of a ball was presented at the position of the fingertip. The finger was not visible at its starting point. In each trial, a face was shown and a tone sounded. The face was removed as soon as the finger came into view. Because the face and finger were both rendered objects, we were able to ensure that the finger and face were never visible at the same time. A second tone sounded 2 s after the first. Participants were instructed to complete the reach before the second tone. This limit was imposed to keep the response time similar between the reaching and verbal estimates. The final Z-position of the finger was recorded as the estimated depth of the target, where the Z-axis is the view direction, with its origin between the participant’s eyes. Participants were asked to touch the side of the nose, or the side of the cheek, so this reach would be consistent with the haptic task, described below. Each participant made reaches to exactly the same stimulus conditions as in the verbal tasks (for a total of 396 trials). 
The haptic task was similar to the non-haptic task, with the addition of haptic feedback at the tip of the index finger. In all other respects, the haptic task was identical to the non-haptic task (again, participants performed a total of 396 trials). To ensure that the haptic feedback did not give information about the true distance to the target, ambiguous haptic feedback was given. As shown in Figure 2, a board was rendered in haptic space using a PHANToM™ force-feedback device. The position of the board in the X direction was consistent with the X position of the target nose or cheek in the trial. The board was fairly short (4 cm) in the Y direction, so participants would miss the board if their reaches were not accurate in the Y direction. The board gave the correct feedback at the X and Y coordinates of the target (nose or cheek), but at any Z (distance). In this way we (a) ensured that participants received haptic feedback, which is important because a lack of haptic feedback might change the planning and dynamics of the reaching movement (e.g., Goodale, Jakobson, & Keillor, 1994); and (b) excluded the possibility that participants adopted a strategy of simply moving the finger forward until they touched the target object. In that case, participants would not have needed to rely exclusively on visual information, and we could not have drawn inferences about the underlying visual information processing. The board was haptically rendered to be somewhat sticky, such that it discouraged the participants from sliding their fingers forward along the board and discovering its true shape. In the post-experiment interview, participants were asked if they discovered anything strange about the shape of the face from touch. None reported that they did. They were asked directly if the haptic feedback was convincing, and all agreed that it was. 
Figure 2
 
a. Faces were rendered in graphics space. Participants were asked to reach and touch the side of the nose or the cheek of the face. b. During haptic trials, a “board” was rendered in haptic space to give feedback at the correct X and Y, but at any Z. c. This figure shows the two spaces in relation to each other. Note that participants could not see the haptically rendered board.
To be sure that the participants had not consciously used a different strategy during cue-conflict trials, they were interviewed after all trials were completed. Participants were first asked if they noticed anything different about any of the faces. Some participants noted that different face models were used, and that some were further away than others, but none reported noticing the inversion of the faces. Participants were then asked directly if they had noticed that some of the faces were concave. Again, none reported knowing that they were concave. 
2.4 Data analysis
Each trial resulted in a measurement of the estimated distance to the nose or cheek. In the two reaching tasks, these distances were measured in millimeters, such that we could use these values directly for our analyses. In the verbal estimation task, estimates were given by the participants in arbitrary units. Therefore, we performed a normalization relative to the maximum estimate given by each participant, such that we calculated the estimated distance as a percentage of this maximum estimate. Further, because we were interested in the perceived depth (thickness) of the face, not the distance to individual targets, we calculated, for each orientation and depth, the average difference between the nose and cheek responses as a measure of the perceived depth of the face (Figure 3). We used a significance level of α = .05 in all our statistical analyses. All error bars indicate ±1 SEM. 
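The two analysis steps above, normalizing verbal estimates and converting nose/cheek distances into a depth measure, can be sketched in a few lines. The function names are ours, not from the original analysis code:

```python
def normalize_verbal(estimates):
    """Rescale one participant's verbal estimates (arbitrary units) to a
    percentage of that participant's maximum estimate."""
    peak = max(estimates)
    return [100.0 * e / peak for e in estimates]

def perceived_depth(cheek_distance, nose_distance):
    """Perceived depth (thickness) of the face: distance to the cheek minus
    distance to the nose. Positive values mean the nose was judged to be in
    front of the cheek, i.e., a convex percept."""
    return cheek_distance - nose_distance
```

For a veridically perceived convex face the depth measure is positive; for a veridically perceived concave face it would be negative with the same magnitude.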
Figure 3
 
The difference between the nose and cheek distance estimates was calculated as a measure of the participant’s depth estimate.
Results
We calculated repeated measures ANOVAs for the reached distance in the reaching tasks (2 task × 3 distance × 2 concave/convex × 2 nose/cheek) and for the estimated distance in the verbal estimation task (3 distance × 2 concave/convex × 2 nose/cheek). The results are graphically depicted in Figure 4. We first describe the results of the ANOVAs in a compact fashion and then relate them to our research questions. 
Figure 4
 
Upper row. Average distances estimated in the verbal task and reached in the two reaching tasks as a function of the distance of the face, the target (nose vs. cheek), and the type of face (concave vs. convex). Lower row. Average depth (i.e., distance to the cheek minus distance to the nose) as a function of the type of face (concave vs. convex) for each of the three tasks. Error bars indicate ±1 SEM. In the upper row we did not present error bars because here the SEM contains between- and within-participants variance and is therefore not informative for our repeated measures analysis.
In the reaching tasks, we found a main effect of distance, F(2,8) = 4.8, p = .043, indicating that participants responded to larger distances with longer reaches (see upper panel of Figure 4). This effect was similar for the non-haptic and the haptic tasks (main effect of task was not significant: F(1,4) = 2.8, p = .171). Participants reached further to cheeks than to noses (main effect of nose/cheek: F(1,4) = 10.3, p = .033). This effect was modulated by the hollow-face illusion (interaction nose/cheek × concave/convex: F(1,4) = 8.3, p = .045) and was slightly different for the two tasks at different distances (interaction nose/cheek × distance × task: F(2,8) = 9.8, p = .007). All other main effects and interactions were not significant. 
In the verbal estimation task, we also found a main effect of distance, F(2,8) = 7.3, p = .016, indicating that participants responded to larger distances with larger estimates. Participants estimated cheeks to be further than noses (main effect of nose/cheek: F(1,4) = 25.7, p = .007), and this effect was modulated by the hollow-face illusion (interaction nose/cheek × concave/convex: F(1,4) = 49.5, p = .002). All other main effects and interactions were not significant. 
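The tests reported above come from multi-factor repeated measures ANOVAs. As a simplified, one-factor illustration of the repeated-measures logic (the error term is the condition × subject interaction rather than pooled between-subject variance), a minimal sketch:

```python
def rm_anova_one_factor(data):
    """One-factor repeated-measures ANOVA on data[s][j], the response of
    subject s in condition j. Returns (F, df_condition, df_error).
    Simplified illustration only; the reported analyses crossed several
    within-subject factors."""
    n = len(data)      # number of subjects
    k = len(data[0])   # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Error term: the condition x subject interaction.
    ss_err = ss_total - ss_cond - ss_subj
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_err / df_err)
    return f, df_cond, df_err
```

Treating each subject as their own control in this way is what gives the small error degrees of freedom (e.g., F(2,8) with five participants and three distances).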
3.1 Does prior knowledge affect reaches to faces?
The first question that we wanted to answer was whether all three tasks were affected by the hollow-face illusion. If so, then the distance to the nose should be estimated to be less than the distance to the cheek for the concave faces, even though the distance to the nose was greater, as defined by binocular disparity. The depths (as the difference between the cheek and nose distance estimates) are plotted in the lower row of Figure 4. If participants respond veridically, the depths should be positive for convex faces, as the nose is closer to the observer, and negative for the concave faces, with the same magnitude as the convex estimate. 
Inspecting the figure shows that in all tasks, convex as well as concave depth estimates were positive. This indicates that the hollow-face illusion affected all three tasks (cf. the significant nose/cheek main effects in the ANOVAs). In all three tasks, the depth estimates were decreased in the concave conditions relative to the convex conditions (cf. the nose/cheek × concave/convex interaction in the ANOVAs). This indicates that the binocular information is not totally discounted. But even if we calculate separate analyses for the concave conditions alone, we still get significantly positive effects in the reaching tasks, t(4) = 2.8, p = .048, and a strong trend in the verbal estimation task, t(4) = 2.7, p = .055. Together, these results indicate that prior knowledge is stronger than binocular information, but cannot totally override it. 
Note that the depth-reducing effect of binocular information is similar for all three tasks (verbal estimation: concave depth is 44% of convex depth; non-haptic: 46%; haptic: 61%), which suggests similar cue combination strategies for the different tasks. In the following two sections we will further explore the question of cue combination strategies by using the individual data from each participant (instead of the averaged group data). By this approach we are able to further justify our claims, because we exclude the possibility of artifacts that might be caused by averaging the data of single participants into group data. 
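Under the simplest linear-combination reading of these ratios (an assumption of ours, not an analysis from the study): with weights on prior and stereo summing to one, stereo signaling depth −D for a concave face while the prior signals +D, and both cues agreeing on +D for a convex face, the concave/convex ratio r implies a prior weight of (1 + r)/2:

```python
def implied_prior_weight(concave_depth, convex_depth):
    """Back-of-the-envelope gloss: assume the depth estimate is
    w_prior * D - (1 - w_prior) * D for a concave face (cues in conflict)
    and D for a convex face (cues agree). Then concave/convex = 2*w_prior - 1,
    so w_prior = (1 + ratio) / 2."""
    ratio = concave_depth / convex_depth
    return (1.0 + ratio) / 2.0

# Ratios reported above: verbal 44%, non-haptic 46%, haptic 61% of convex depth.
weights = {task: implied_prior_weight(r, 1.0)
           for task, r in [("verbal", 0.44), ("non-haptic", 0.46), ("haptic", 0.61)]}
```

On this reading the prior receives roughly 70–80% of the weight in every task, consistent with the claim that prior knowledge dominates but does not fully discount disparity.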
3.2 How is prior knowledge combined with binocular disparity?
When viewing the concave faces, prior knowledge is in conflict with binocular disparity. We were interested in how this conflict was reconciled. The visual system may make a weighted combination of disparity and priors, or it may use a winner-take-all strategy in which one is completely disregarded. 
For each task, we plotted the depth estimate in the concave condition versus the depth estimate in the convex condition for each observer. The plots are shown in Figure 5. Each data point is the average for one participant. If prior knowledge completely dominates the depth estimates, the concave and convex depths should be the same, and we would expect the data points to lie on the line of slope 1.0 (which is plotted in red). If stereo information completely dominates the depth estimates, then we would expect the data points to lie on the line of slope −1.0 (which is plotted in blue). If the two cues are weighted, but prior knowledge is weighted more heavily, then the data points will lie above y = 0 (in the yellow wedge). Similarly, if stereo information is weighted more heavily, the data points will lie below y = 0 (in the green wedge). As can be seen, for all tasks and for all participants, the data points lie in the yellow wedge, and some of them even lie on the red slope-1 line. That is, prior knowledge dominates the depth estimates for all tasks and for all participants. For participant ST, prior knowledge completely dominates the depth estimates in both reaching tasks. 
Figure 5
 
Average convex depth estimate versus the average concave depth estimate for each of the three tasks for each participant. For all tasks and for all participants, the data points lie above y = 0 (in the yellow wedge), and some of them lie on the (red) slope-1 line. That is, prior knowledge dominates the depth estimates for all tasks and for all participants. For participant ST, prior knowledge completely dominates the depth estimates in both reaching tasks. Error bars indicate ±1 SEM.
While prior knowledge dominates the depth judgments,the presence of conflicting binocular disparity flattens the face for all participants in the verbal task, and for all but one participant in the reaching tasks. This means that although prior knowledge is very strong, it does not appear that the cue conflict is resolved with a winner-take-all strategy. 
3.3 A comparison of the illusion’s effect on each task
We found that the hollow-face illusion affected each of the three types of tasks. It is also of interest to compare the effects quantitatively. This is not straightforward when the comparison is between the verbal task and either of the two reaching tasks: the estimates given in the verbal task were in arbitrary units, chosen by each participant, whereas the estimates given in the reaching tasks were in millimeters. To compare relative differences between these measures, we used the following geometrical analysis. Each data point in Figure 6 is the average depth estimate for one task, plotted against the average estimate for another task, for one orientation (concave vs. convex) for one participant. For example, consider the comparison of the non-haptic and verbal depth estimates for participant SH in Figure 6. The non-haptic depth estimate in the concave condition was half the size of the estimate in the convex condition. If the illusion had a similar effect on the perceived depth in the verbal task, apart from a change of units, we would expect a similar relationship in the verbal task. This is what we found: the verbal estimates of participant SH in the concave condition were roughly half the size of the verbal estimates in the convex condition. More generally, if the data point of the concave condition lies on the line that connects the origin with the data point of the convex condition, then the illusion’s effect on cue weighting is the same for both tasks. If the data point of the concave condition lies above this line, then the illusion’s effect is stronger for the task on the y-axis. If the data point of the concave condition lies below this line, then the illusion’s effect is stronger for the task on the x-axis. 
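The geometric criterion above can be restated as a ratio test: the concave data point lies on the origin-to-convex line exactly when the concave/convex depth ratio is the same for both tasks. A minimal sketch; the function name and tolerance are ours, and a real comparison would account for measurement error rather than use a fixed tolerance:

```python
def same_illusion_effect(task_a, task_b, tol=1e-6):
    """Each argument is a (concave_depth, convex_depth) pair for one task.
    Returns True if the concave point lies on the line through the origin
    and the convex point, i.e., the concave/convex ratios match within tol.
    A larger ratio for one task means the illusion flattened that task less."""
    (cc_a, cx_a), (cc_b, cx_b) = task_a, task_b
    return abs(cc_a / cx_a - cc_b / cx_b) <= tol
```

For example, a reaching task with depths (5, 10) and a verbal task with depths (50, 100) share the ratio 0.5, so the illusion's relative effect is the same despite the different units.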
Figure 6
 
Each data point is the average depth estimate for one task, plotted against the average estimate for another task, for one orientation (convex vs. concave) for one participant. If the effect of the illusion on both tasks is the same, the concave data point should lie on the line between the origin and the convex data point. This is true for three of the five participants when comparing the verbal task to either reaching task. The illusion had the same effect on the two reaching tasks for a different group of three of the five participants. Error bars indicate ±1 SEM.
As shown in Figure 6, the illusion’s effect is similar for three of the five participants when comparing the verbal task to either reaching task. One of the remaining participants shows a lesser effect on the reaching tasks (AL, shown in blue), and one shows a greater effect (ST, shown in green). That is, the weighting given to binocular disparity versus prior knowledge is the same for three of the five participants. For one participant, binocular disparity is weighted more heavily; for another, prior knowledge is weighted more heavily. 
Discussion
4.1 Prior knowledge and reach
The first question we wanted to answer was whether the motor system uses prior knowledge about the objects that it is reaching to. We found that participants do not reach to a nose that is behind a cheek; that is, the motor system is affected by the hollow-face illusion. 
One might argue, however, that this effect could be the result of a general convexity bias rather than of prior knowledge about facial geometry. We believe this is less plausible because of results showing that the hollow-face illusion is more than a convexity bias for verbal judgments (Hill & Bruce, 1994), coupled with the similarity between the reaching and verbal data (cf. Figure 4 and Figure 6). In fact, a general convexity account would require stronger assumptions to hold. In particular, it would require that (a) the general convexity bias is stronger in reaching tasks than in the verbal task and (b) the increase in strength is exactly large enough to make the illusion’s effect on the two tasks the same. 
Also, our findings are consistent with previous research showing that the motor system takes into account prior knowledge about an object in different grasping tasks (e.g., Gordon, 1993; Fikes, Klatzky, & Lederman, 1994; Haffenden & Goodale, 2000). 
Not only is prior knowledge used to guide reaches, it can even dominate binocular disparity for the given stimuli, as is shown in Section 3.2. This raises the question: how are binocular disparity and prior knowledge combined? 
4.2 Cue combination
The second question we addressed was how prior knowledge and binocular disparity interact when in conflict. A simple strategy would be a winner-take-all approach in which the visual system relies solely on either binocular disparity or prior knowledge for its depth estimate. Because the hollow-face illusion exists, it is clear that the visual system does not rely solely on binocular disparity. The data in Section 3.2 show that concave faces are estimated to be flatter than convex faces, so the visual system is not relying solely on prior knowledge either. Clearly, depth information from prior knowledge and binocular disparity is being combined in some way. 
Integration of prior knowledge with current data has a simple interpretation in terms of Bayesian models of perception. Previous work on surface depth perception has provided strong evidence for a model of depth cue integration that combines information in a statistically optimal fashion (for reviews see Ernst & Bülthoff, 2004; Bülthoff & Yuille, 1996; Yuille & Bülthoff, 1996; Landy, Maloney, Johnston, & Young, 1995) using Bayesian inference. In these models, cue information is modeled using a likelihood function (the conditional probability of the cue value given a depth) for each cue. Cues and prior information (in the form of probability distributions) are then integrated by multiplying the distributions. In the simplest form of these models, likelihood functions and priors can be modeled as Gaussian distributions on depth or shape variables. In this case, the optimal estimate (maximum a posteriori) has a particularly simple form: a linear combination of the maximum likelihood depth/shape estimates from each distribution, weighted by its inverse variance (reliability). Linear cue integration models can also serve as useful approximations to optimal statistical inference even when the distributions are not Gaussian (Yuille & Bülthoff, 1996). 
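As an illustration, the inverse-variance weighting described above can be sketched in a few lines of code. The function name and all numerical values here are hypothetical, chosen only to show the mechanics of reliability-weighted combination; they are not fitted to our data.

```python
def combine_gaussian_cues(estimates, variances):
    """MAP estimate under Gaussian likelihoods and prior: the
    inverse-variance-weighted mean of the individual cue estimates."""
    reliabilities = [1.0 / v for v in variances]
    weighted_sum = sum(r * e for r, e in zip(reliabilities, estimates))
    return weighted_sum / sum(reliabilities)

# Illustrative example: a noisy disparity cue says the nose protrudes
# 5 cm (variance 4), while the face prior says 4 cm (variance 1).
depth = combine_gaussian_cues(estimates=[5.0, 4.0], variances=[4.0, 1.0])
# The combined estimate lies between the two cues, pulled toward the
# more reliable one (here, the prior).
```

Because the weights are reliabilities, a low-variance prior can dominate a high-variance disparity signal without the latter being discarded.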
A linear cue integration model for our experiment is shown in Equation 1:

d = wb db + wp dp + woc doc,  (1)

where d is the combined depth estimate, db is the individual depth estimate from binocular disparity, and dp represents the depth expected from prior knowledge. wb and wp are weights on those individual depth estimates that represent their relative reliabilities. Finally, woc doc represents some unknown linear combination of other cues (e.g., pictorial cues, shape from shading, etc.) or priors (e.g., a bias toward surface smoothness) that may affect perception of our face stimuli. 
Note that db changes sign between the convex and concave cases. If wb is small compared to wp, d will be smaller for concave faces than for convex faces but will not change sign, so the concave faces will appear to be flatter than the convex faces but will not be perceived to be concave. This is consistent with our results. However, this is not the only model consistent with these results. 
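This sign behavior is easy to verify numerically. The weights and depths below are purely illustrative assumptions (not measured values), and the other-cue term of Equation 1 is omitted:

```python
wb, wp = 0.2, 0.8   # assumed weights: prior knowledge dominates disparity
dp = 4.0            # assumed depth expected from the face prior (cm)

d_convex = wb * (+4.0) + wp * dp   # disparity agrees with the prior
d_concave = wb * (-4.0) + wp * dp  # disparity signals a hollow face

# d_concave is smaller than d_convex yet still positive: the hollow face
# appears flatter than the normal face but is not perceived as concave.
```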
An alternative explanation for our results can be formulated using robust approaches to statistical cue combination (Clark & Yuille, 1990; Maloney & Landy, 1989; Landy, Maloney, & Young, 1991; Schunck, 1989; Sinha & Schunck, 1992). In robust cue combination, data are disregarded if they fall too far outside of expected parameters or if they are inconsistent with other data assumed to be reliable. It is possible that when viewing a convex face, where prior knowledge and binocular disparity are in agreement, wb has its typical value. However, the conflict between prior knowledge and binocular disparity generated by viewing a concave face may result in the binocular disparity information being "thrown out" as unreliable. In this case, wb would be set equal to zero. If we also assume that wp and dp are the same in the concave case as in the convex case, and that woc doc includes a strong bias toward a smooth surface, the new estimate d will be smaller in the concave case, and the face will appear to be flatter than in the convex case. Therefore, this robust statistical approach could also be consistent with our results. Because our data do not test these assumptions, the nature of depth cue combination in the motor system must be resolved by further study. 
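The robust alternative can be sketched in the same style. Again, all parameter values are illustrative assumptions, and the flat-surface term stands in for the assumed smoothness bias in the other-cue term of Equation 1:

```python
def robust_depth(db, dp, wb=0.2, wp=0.7, w_smooth=0.1, conflict_threshold=5.0):
    """Robust variant of Equation 1: the disparity cue is discarded
    outright when it conflicts too strongly with the prior."""
    if abs(db - dp) > conflict_threshold:
        wb = 0.0  # disparity "thrown out" as unreliable
    # The smoothness bias pulls the estimate toward a flat surface (depth 0).
    return wb * db + wp * dp + w_smooth * 0.0

d_convex = robust_depth(db=+4.0, dp=4.0)   # cues agree; both contribute
d_concave = robust_depth(db=-4.0, dp=4.0)  # 8 cm conflict: disparity dropped
# With disparity discarded, the concave estimate rests on the prior alone
# and again comes out flatter than the convex case, but still convex.
```

Both the linear and the robust sketches reproduce a flattened-but-convex concave face, which is why our data cannot distinguish between them.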
4.3 Task dependence
Finally, we show that the magnitude of the hollow-face illusion is similar for all three tasks (cf. Figure 4 and Figure 6). This can be parsimoniously explained if we assume that in all tasks the depth estimates are generated by the same mechanism. Our results are consistent with studies that found similar effects of visual illusions on perception, grasping, pointing, and saccades (e.g., Pavani, Boscagli, Benvenuti, Rabuffetti, & Farne, 1999; van Donkelaar, 1999; Franz, Gegenfurtner, Bülthoff, & Fahle, 2000; Dassonville & Bala, 2004) and might help to resolve the current debate on the question of whether motor behavior and perception rely on fundamentally different processing of visual information (e.g., Bridgeman, Kirch, & Sperling, 1981; Aglioti, DeSouza, & Goodale, 1995; for reviews, see Bruno, 2001; Carey, 2001; Franz, 2001; Smeets & Brenner, 2001; Glover, 2002; Goodale & Westwood, 2004). 
Kroliczak, Heard, Goodale, and Gregory (in press) have recently described an experiment in which participants were required to "flick" a small target object (a little magnet) off a location on masks of convex or concave faces. These flicking movements were directed at the real, rather than the illusory, locations of the targets and therefore did not show an effect of the hollow-face illusion. Kroliczak et al. (in press) interpreted their results as consistent with the hypothesis of distinct visual pathways for perceptual judgments versus goal-directed movements. We see, however, two limitations of this conclusion. 
First, Kroliczak et al. (in press) did not use ambiguous feedback (as we did in the present study). That is, participants were required to really flick the little magnets from the masks, and the magnets were always located at the real, not at the illusory, location on the faces. In consequence, a participant whose motor system was deceived by the hollow-face illusion could not perform the flicking at all and should have stopped in midair, trying to flick unsuccessfully. It seems plausible that such a participant immediately changed the motor strategy to accomplish the task. This could happen in two ways: (a) The participant could try to use any available cue to detect whether the current stimulus is the normal or the hollow face and, in the case of the hollow face, simply move farther than the visual input would normally tell the motor system. There were ample cues in this experiment that allowed participants to discriminate between hollow and normal faces. For example, the magnets were always convex, such that for the hollow face there was a conflict between the concave shape of the face and the convex shape of the magnets. Also, the faces were illuminated by a little spotlight placed either above the normal face or below the hollow face. Such a spotlight creates a brightness gradient, making its position detectable by the participant and therefore allowing a discrimination between normal and hollow faces. (b) The participant could weight the binocular information more heavily in this task to detect the real positions of the magnets. (For practical reasons the binocular information was artificially degraded in this study, but this need not prevent its use through a higher weighting; see our discussion of Bayesian models above.) 
In summary, a "fair" experimental procedure would require either that the target object is presented at both the illusory and the real positions on the face, or (even better) that flicking is always successful, no matter at which distance the participant attempts to perform it. This is what we achieved by the use of a virtual environment and ambiguous feedback (cf. Figure 2). 
Second, Kroliczak et al. (in press) found no effect of the hollow-face illusion in their flicking task but did find an effect in a pointing task (which was similar to the flicking task but required no flicking and was performed more slowly). They interpreted this as an indication that flicking was controlled by a system other than the slow pointing movements (dorsal vs. ventral streams, respectively). However, an analysis of the computational requirements of various tasks provides another level of explanation for the various ways in which cues may be combined (or rejected), beyond this interpretation of two distinct systems. Schrater and Kersten (2000) used decision theory to show that cue combination for optimal depth estimation depends crucially on the representation of depth (see Geisler & Kersten, 2002, for a simple illustration of decision theory for perceptual estimations). In particular, the best estimate of the depth of a target depends on how (not just whether) information about a background surface is represented. Reaching movements could depend on whether the target object is treated as distinct from the surface or as part of the surface. This, in turn, could depend on visual factors (whether a target is in contact, not in contact, or a surface marking) and also on task prerequisites (e.g., "flicking" implies removability; touching does not). In addition to decision theoretic constraints, dynamical constraints with respect to the goal of the reach should also play an important role in determining visual motor trajectories. The kinematics, up to the point of expected contact, can depend on the expected consequences beyond the time of contact. For example, if a target is being touched with a movement perpendicular to a background surface, any follow-through of the movement would be blocked by the surface, and thus the background surface depth is an important piece of information. If the target is being flicked, it is free to move tangentially to the surface, and the background surface depth is less crucial. Task constraints may modulate cue integration through changes in attentional allocation. 
Conclusion
Using hollow faces as a target for distance estimations, we have shown that prior knowledge of object shape can dominate shape-from-binocular-disparity information in reaching tasks as well as in verbal tasks. The shape estimates from the two sources of information are combined, rather than one being thrown out as completely unreliable. The resulting shape estimates are similar for both verbal and reaching tasks, which is what we would expect if the same cue combination strategy is being used for the reaching and the verbal tasks. 
Acknowledgments
This work was first presented at the 2001 Vision Sciences Society Conference, Sarasota, Florida (cf. Hartung, Franz, Kersten, & Bülthoff, 2001). The work was supported by National Institutes of Health Grants R01 EY11507 and R01 EY015261-01, the Max Planck Society, and grant FA 119/15-2 from the Deutsche Forschungsgemeinschaft. 
Commercial relationships: none. 
Corresponding author: Volker H. Franz. 
Address: University of Giessen, Otto-Behaghel-Strasse 10F, 35394 Giessen, Germany. 
References
Aglioti, S., DeSouza, J. F. X., & Goodale, M. A. (1995). Size-contrast illusions deceive the eye but not the hand. Current Biology, 5(6), 679–685.
Belhumeur, P., Kriegman, D., & Yuille, A. (1999). The bas-relief ambiguity. International Journal of Computer Vision, 35(1), 33–44.
Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH '99 Conference Proceedings, 187–194.
Bridgeman, B., Lewis, S., Heit, G., & Nagle, M. (1979). Relation between cognitive and motor-oriented systems of visual position perception. Journal of Experimental Psychology: Human Perception and Performance, 5, 692–700.
Bridgeman, B., Kirch, M., & Sperling, A. (1981). Segregation of cognitive and motor aspects of visual function using induced motion. Perception & Psychophysics, 29, 336–342.
Bridgeman, B., Peery, S., & Anand, S. (1997). Interaction of cognitive and sensorimotor maps of visual space. Perception & Psychophysics, 59, 456–469.
Bruno, N. (2001). When does action resist visual illusions? Trends in Cognitive Sciences, 5(9), 379–382.
Bülthoff, H. H., & Yuille, A. L. (1996). A Bayesian framework for the integration of visual modules. In J. McClelland & T. Inui (Eds.), Attention and performance XVI: Information integration in perception and communication (pp. 49–70). Cambridge, MA: MIT Press.
Clark, J. J., & Yuille, A. L. (1990). Data fusion for sensory information processing systems. Boston: Kluwer Academic Publishers.
Carey, D. P. (2001). Do action systems resist visual illusions? Trends in Cognitive Sciences, 5(3), 109–113.
Dassonville, P., & Bala, J. K. (2004). Perception, action, and Roelofs effect: A mere illusion of dissociation. PLoS Biology, 2(11), 1936–1945.
Ernst, M. O., & Bülthoff, H. H. (2004). Merging the senses into a robust percept. Trends in Cognitive Sciences, 8(4), 162–169.
Fikes, T. G., Klatzky, R. L., & Lederman, S. J. (1994). Effects of object texture on precontact movement time in human prehension. Journal of Motor Behavior, 26, 325–332.
Franz, V. H. (2001). Action does not resist visual illusions. Trends in Cognitive Sciences, 5(11), 457–459.
Franz, V. H., Gegenfurtner, K. R., Bülthoff, H. H., & Fahle, M. (2000). Grasping visual illusions: No evidence for a dissociation between perception and action. Psychological Science, 11(1), 20–25.
Geisler, W. S., & Kersten, D. (2002). Illusions, perception and Bayes. Nature Neuroscience, 5, 508–510.
Glover, S. (2002). Visual illusions affect planning but not control. Trends in Cognitive Sciences, 6(7), 288–292.
Goodale, M. A., Jakobson, L. S., & Keillor, J. M. (1994). Differences in the visual control of pantomimed and natural grasping movements. Neuropsychologia, 32, 1159–1178.
Gordon, A. M., Westling, G., Cole, K. J., & Johansson, R. S. (1993). Memory representations underlying motor commands used during manipulation of common and novel objects. Journal of Neurophysiology, 69, 1789–1796.
Goodale, M. A., & Westwood, D. A. (2004). An evolving view of duplex vision: Separate but interacting cortical pathways for perception and action. Current Opinion in Neurobiology, 14, 203–211.
Gregory, R. L. (1973). The confounded eye. In R. L. Gregory & E. H. Gombrich (Eds.), Illusion in nature and art (pp. 49–96). London: Duckworth.
Haffenden, A. M., & Goodale, M. A. (2000). The effect of learned perceptual associations on visuomotor programming varies with kinematic demands. Journal of Cognitive Neuroscience, 12(6), 950–964.
Hartung, B., Franz, V. H., Kersten, D., & Bülthoff, H. H. (2001). Is the motor system affected by the hollow face illusion? [Abstract]. Journal of Vision, 1(3), 256a, http://journalofvision.org/1/3/256/, doi:10.1167/1.3.256.
Hespanha, J., Dodds, Z., Hager, G. D., & Morse, A. S. (1999). What tasks can be performed with an uncalibrated stereo vision system? The International Journal of Computer Vision, 35(1), 65–85.
Hill, H., & Bruce, V. (1993). Independent effects of lighting, orientation, and stereopsis on the hollow-face illusion. Perception, 22, 887–897.
Hill, H., & Bruce, V. (1994). A comparison between the hollow-face and 'hollow-potato' illusions. Perception, 23, 1335–1337.
Hill, H., & Bruce, V. (1996). Effects of lighting on the perception of facial surfaces. Journal of Experimental Psychology: Human Perception and Performance, 22, 986–1004.
Kroliczak, G., Heard, P., Goodale, M. A., & Gregory, R. L. (in press). Dissociation of perception and action unmasked by the hollow-face illusion. Cognitive Brain Research.
Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. J. (1995). Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35, 389–412.
Landy, M. S., Maloney, L. T., & Young, M. J. (1991). Psychophysical estimation of the human depth combination rule. Proceedings of the SPIE, 1383, 247–254.
Langer, M. S., & Bülthoff, H. H. (2001). A prior for global convexity in local shape-from-shading. Perception, 30, 403–410.
Maloney, L. T., & Landy, M. S. (1989). A statistical framework for robust fusion of depth information. Proceedings of the SPIE, 1199, 1154–1163.
Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford: Oxford University Press.
Pavani, F., Boscagli, I., Benvenuti, F., Rabuffetti, M., & Farne, A. (1999). Are perception and action affected differently by the Titchener circles illusion? Experimental Brain Research, 127, 95–101.
Schrater, P. R., & Kersten, D. (2000). How optimal depth cue integration depends on the task. International Journal of Computer Vision, 40(1), 73–91.
Schunck, B. G. (1989). Robust estimation of image flow. Proceedings of the SPIE, 1198, 116–127.
Sinha, S. S., & Schunck, B. G. (1992). A two stage algorithm for discontinuity-preserving surface reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 36–55.
Smeets, J. B. J., & Brenner, E. (2001). Action beyond our grasp. Trends in Cognitive Sciences, 5(7), 287.
Troje, N. F., & Bülthoff, H. H. (1996). Face recognition under varying poses: The role of texture and shape. Vision Research, 36, 1761–1771.
Yuille, A. L., & Bülthoff, H. H. (1996). Bayesian decision theory and psychophysics. In D. Knill & W. Richards (Eds.), Perception as Bayesian inference (pp. 123–161). Cambridge: Cambridge University Press.
van Donkelaar, P. (1999). Pointing movements are affected by size-contrast illusions. Experimental Brain Research, 125(4), 517–520.
Figure 1
 
Participants viewed computer-generated images of a face in stereo. The image was reflected from a CRT onto a mirror. Participants were able to interact with the graphics at the location of the image, underneath the mirror. Haptic feedback was provided by a PHANToM™ force feedback device. (Adapted from an illustration by Marc O. Ernst.)
Figure 2
 
a. Faces were rendered in graphics space. Participants were asked to reach and touch the side of the nose or the cheek of the face. b. During haptic trials, a "board" was rendered in haptic space to give feedback at the correct X and Y, but at any Z. c. This figure shows the two spaces in relation to each other. Note that participants could not see the haptically rendered board.
Figure 3
 
The difference between the nose and cheek distance estimates was calculated as a measure of the participant's depth estimate.
Figure 4
 
Upper row. Average distances estimated in the verbal task and reached in the two reaching tasks as a function of the distance of the face, the target (nose vs. cheek), and the type of face (concave vs. convex). Lower row. Average depth (i.e., distance to the cheek minus distance to the nose) as a function of the type of face (concave vs. convex) for each of the three tasks. Error bars indicate ±1 SEM. In the upper row we did not present error bars because here the SEM contain between- and within-participants variance and are therefore not informative for our repeated measures analysis.
Figure 5
 
Average convex depth estimate versus the average concave depth estimate for each of the three tasks for each participant. For all tasks and for all participants, the data points lie above y = 0 (in the yellow wedge), and some of them lie on the (red) slope = 1 line. That is, prior knowledge dominates the depth estimates for all tasks for all participants. For participant ST, prior knowledge completely dominates the depth estimates in both reaching tasks. Error bars indicate ±1 SEM.
Figure 6
 
Each data point is the average depth estimate for one task, plotted against the average estimate for another task, for one orientation (convex vs. concave) for one participant. If the effect of the illusion on both tasks is the same, the concave data point should lie on the line between the origin and the convex data point. This is true for three of the five participants when comparing the verbal task to either reaching task. The illusion had the same effect on the two reaching tasks for a different group of three of the five participants. Error bars indicate ±1 SEM.