Finally, stimuli from the candidate spotlight size (the smallest spotlight size leading to performance similar to natural viewing) were compared to natural viewing stimuli. The goal here was to assess how much information was preserved in the spotlight condition compared to natural viewing when considering observers' fixations. Indeed, it is not straightforward to infer the perceptual span for face recognition based on our gaze-contingent manipulation. We needed to quantify how much information (in terms of spatial extent) in the critical spotlight was identical to the natural viewing condition. To address this question, we needed to consider four challenges. First, the spotlight is a Gaussian aperture, so the information from the target face and the average face blends progressively. Second, visual acuity drops off with retinal eccentricity. Thus, small extrafoveal differences in high spatial frequencies between the spotlight and natural viewing stimuli might not be captured by the visual system. Third, the similarity between spotlight and natural viewing stimuli might depend on how dissimilar a given target face (presented centrally) is from the average face (displayed extrafoveally). Finally, the similarity between spotlight and natural viewing stimuli might depend on the actual fixation location. To address these challenges, we used a procedure that aimed at mimicking early constraints of the visual system (e.g., acuity drop-off with eccentricity), while also considering the targets' typicality (how similar a specific target face is to the average face), as well as fixation locations. First, for each participant and stimulus, we applied a retinal filter (Targino Da Costa & Do, 2014) to the spotlight of interest and natural viewing stimuli according to fixation locations in the critical spotlight condition (see Figure 2 for the Facespan reconstruction pipeline). The retinal filter parameters were the distance to the screen (700 mm in our viewing conditions), the stimulus size in pixels, and the lossy parameter (Δ = 25, chosen to reflect the perceptual loss of human vision). We then used the Structural SIMilarity index (SSIM; Wang, Bovik, Sheikh, & Simoncelli,
2004) to quantify, independently for each target face, the similarity between spotlight and natural viewing stimuli after retinal filtering at each of the fixation locations in the critical spotlight condition. The SSIM compares the luminance, contrast, and structure of two images to assess their similarity pixel by pixel. In the next step, we used the pixel-test (Chauvin, Worsley, Schyns, Arguin, & Gosselin,
2005) to assess significance in the SSIM maps corresponding to each target face and fixation location. The pixel-test (Chauvin et al.,
2005), which is based on Random Field Theory (RFT), allowed us to highlight the information that was significantly preserved between the spotlight and natural viewing conditions (i.e., the pixels whose similarity was significantly high). We used the following parameters for the pixel-test, as recommended by Chauvin et al. (
2005): sigma = 20, cluster test threshold = 2.7,
p = 0.05. RFT provides the probability of observing a cluster of pixels exceeding a threshold in a smooth Gaussian random field, while taking into account the spatial correlation inherent to the data set. More details about the SSIM and the pixel-test can be found in Wang et al. (
2004) and Chauvin et al. (
2005), respectively. Finally, we centered and averaged, across stimuli, observers, and fixations, the areas of information preserved by the critical spotlight manipulation (i.e., pixels with significant SSIM according to the pixel-test after retinal filtering of the spotlight and natural viewing stimuli). The rationale of this approach is that the amount of preserved information, corresponding to the minimal spotlight beyond which additional information does not improve performance, is the Facespan.
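
The SSIM step described above can be illustrated with a minimal Python sketch. This is not the implementation used in the study: the function name `ssim_map`, the representation of grayscale images as nested lists, and the uniform 3 × 3 window are assumptions made for brevity (Wang et al., 2004, use an 11 × 11 Gaussian-weighted window). The constants K1 = 0.01 and K2 = 0.03 follow Wang et al. (2004). Retinal filtering and the pixel-test are not reproduced here.

```python
def ssim_map(img_a, img_b, win=3, data_range=255.0, k1=0.01, k2=0.03):
    """Return a per-pixel SSIM map for two equally sized grayscale images.

    Illustrative simplification: local statistics come from a uniform
    square window clipped at the image borders, not a Gaussian window.
    """
    c1 = (k1 * data_range) ** 2  # stabilises the luminance term
    c2 = (k2 * data_range) ** 2  # stabilises the contrast/structure term
    height, width = len(img_a), len(img_a[0])
    radius = win // 2
    out = [[0.0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            a = [img_a[j][i] for j in rows for i in cols]
            b = [img_b[j][i] for j in rows for i in cols]
            n = len(a)

            # Local means, variances, and covariance within the window.
            mu_a = sum(a) / n
            mu_b = sum(b) / n
            var_a = sum((p - mu_a) ** 2 for p in a) / n
            var_b = sum((q - mu_b) ** 2 for q in b) / n
            cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n

            # SSIM = luminance term * contrast/structure term.
            numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
            denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
            out[y][x] = numerator / denominator
    return out


if __name__ == "__main__":
    # Toy 3 x 3 "images": an intensity ramp and its inverse.
    natural = [[0, 50, 100], [50, 100, 150], [100, 150, 200]]
    spotlight = [[200 - v for v in row] for row in natural]
    for row in ssim_map(natural, spotlight):
        print([round(v, 3) for v in row])
```

In the procedure described above, a map of this kind would be computed for each target face and fixation location on the retinally filtered stimuli, and then passed to the pixel-test to identify the significantly preserved regions.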