The experimental setup consisted of a motorized linear stage with a sliding carriage (see Figure 2). A thin metal pole was mounted on top of the carriage, and a physical object (the pointer) was mounted on top of the pole. The participant could move the pointer in two directions, closer to or farther from themselves, with the help of a controller (maximum speed: 5 cm/s). Given the mechanical characteristics of the system, the pointer could be positioned with a precision of 1 mm. The participant sat facing the linear stage and wore the headset. The seating height was varied using an adjustable chair. To ensure a uniform viewing angle and to minimize the possible effect of head motion on perceptual judgments, participants rested their chins on a chinrest fixed to the tabletop. Head fixation and an adjustable-height occluding surface ensured that the participant could not see the rail and therefore could not use its appearance as an additional depth cue.
Before the task, each participant underwent a display calibration procedure. First, the interpupillary distance of each participant was measured; this value was then supplied to the image rendering engine as a rendering parameter. To fine-tune the alignment in software, a calibration image was shown to the participant through the headset on each focal plane separately. Guided by the calibration image, the physical stimulus on the linear stage was set to the distance corresponding to the given focal plane. Similar to the procedure of Livingston, Ellis, White, Feiner, and Lederer (2006), the participant was asked to adjust the digital image offset for two parts of the calibration image while looking at the physical stimulus. The adjustments continued until the participant saw the calibration image as a symmetrical cross. The calibration steps were repeated twice for all focal planes to test the consistency and accuracy of the obtained results. Careful calibration allowed us to accurately present visual stimuli with the intended vergence and focal distances while keeping the visual angle constant along the line of sight.
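As a purely hypothetical illustration (the headset's rendering interface is not described here, and all names, values, and the offset structure below are our assumptions, not the authors' implementation), the per-plane offsets obtained in this procedure could be stored and applied along the following lines:

```python
# Hypothetical sketch: keep one pixel offset per focal plane, set when the participant
# reports that the calibration cross looks symmetrical, and apply it when rendering on
# that plane. Names, values, and structure are illustrative assumptions only.
ipd_mm = 63.0  # example measured interpupillary distance, passed to the rendering engine

plane_offsets_px = {   # focal plane distance (cm) -> (dx, dy) calibration offset in pixels
    45: (0, 0),
    65: (0, 0),
    115: (0, 0),
    530: (0, 0),
}

def apply_calibration_offset(position_px, focal_plane_cm):
    """Shift a rendered image position by the calibration offset of its focal plane."""
    dx, dy = plane_offsets_px[focal_plane_cm]
    x, y = position_px
    return (x + dx, y + dy)
```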
Next, the perceptual distance matching task followed. The variability in depth cues was achieved by switching between the multifocal mode (consistent-cues condition) and a single focal plane mode (inconsistent-cues condition), the latter obtained by deactivating all but one display plane. Both conditions were thus realized with the same headset, ensuring identical attributes of the conveyed images (i.e., field of view, image brightness, refresh rate, and color balance).
In both conditions, the vergence stimulus varied with the rendered image distance. The focal stimulus, however, was equal to the vergence stimulus in the consistent-cues condition and fixed in the inconsistent-cues condition. The image was presented at three distances from the participant: 45 cm, 65 cm, and 115 cm, corresponding to demands of 2.22 D, 1.54 D, and 0.87 D, respectively. These rendered image distances were chosen to match the distances of the focal planes when the display was driven in the multifocal mode. In the consistent-cues condition, the images were displayed on the focal planes whose distances coincided with the rendered image distances. In the inconsistent-cues condition, only the display plane with a focal distance of 530 cm (0.19 D) was used. The induced conflict magnitude c between the stimuli to vergence and accommodation was calculated as c = 1/dv − 1/da, where dv is the rendered image distance and da is the focal plane distance (both in meters). As a result, the conflict magnitude was 2.03 D, 1.35 D, or 0.68 D, depending on the rendered image distance, when the display was driven in the single focal plane mode. Trials were blocked by cue-consistency condition, and the order of the conditions was counterbalanced across participants.
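For reference, a minimal sketch in Python (variable names are ours) reproduces the reported dioptric demands and conflict magnitudes from the distances given above:

```python
# Dioptric demand and vergence-accommodation conflict for the rendered image
# distances used in the experiment (distances converted to meters).
rendered_distances_m = [0.45, 0.65, 1.15]   # 45 cm, 65 cm, 115 cm
fixed_focal_plane_m = 5.30                  # single focal plane (inconsistent-cues) mode

for d_v in rendered_distances_m:
    demand_D = 1.0 / d_v                                  # vergence/accommodation demand
    conflict_D = 1.0 / d_v - 1.0 / fixed_focal_plane_m    # c = 1/dv - 1/da
    print(f"{d_v * 100:.0f} cm: demand {demand_D:.2f} D, conflict {conflict_D:.2f} D")

# Prints: 45 cm: demand 2.22 D, conflict 2.03 D
#         65 cm: demand 1.54 D, conflict 1.35 D
#         115 cm: demand 0.87 D, conflict 0.68 D
```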
The initial session included two repetitions of the task per rendered image distance to familiarize the participants with the visual stimulus, the task, and the setup. The experimental session then followed.
The participant was shown a separate image for each eye using the headset. Provided that the fusional reserves ensured proper merging of the two images, the participant saw a single image with one star at the center of a rectangular arch and circles at its corners. If stereoscopic fusion failed, the participant experienced diplopia; participants were asked to inform the experimenter immediately about a double image, in which case the experiment was terminated. The contours of all visual stimuli were white. To avoid the potential effect of monocular suppression on spatial judgments in augmented reality (Rosales, Pointon, Adams, Stefanucci, Creem-Regehr, Thompson, & Bodenheimer, 2019), different circles were shown separately to each eye. The total number of circles (across both eyes) was chosen at random at the beginning of each trial and ranged from two to four. The possible locations were the upper right, upper left, lower right, and lower left corners. The participant was asked to inform the experimenter immediately if any circle disappeared during a trial.
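As an illustration only (our sketch, not the authors' stimulus code; the assumptions that circles occupy distinct corners and that each circle is assigned to one eye at random are ours), the per-trial circle layout could be generated as follows:

```python
import random

CORNERS = ["upper left", "upper right", "lower left", "lower right"]

def sample_circle_layout(rng=None):
    """Draw 2-4 circles in total and assign each to one eye and one corner."""
    rng = rng or random.Random()
    n_circles = rng.randint(2, 4)                # total across both eyes
    corners = rng.sample(CORNERS, n_circles)     # assumed distinct corner positions
    layout = {"left_eye": [], "right_eye": []}
    for corner in corners:
        layout[rng.choice(["left_eye", "right_eye"])].append(corner)
    return layout

# Example output: {'left_eye': ['lower right'], 'right_eye': ['upper left', 'upper right']}
```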
At the beginning of each trial, if the participant saw one star and one arch, they reported the number of circles they perceived. The time countdown began when this response was submitted. The experiment was not time-constrained; however, the participants were instructed to complete the task as accurately and quickly as possible. The participant moved the pointer with the controller to align it with the apparent position of the projected star. When the participant finished the alignment, they reported it and closed their eyes until the next instruction. As soon as the response was given, the time countdown was stopped and the matched distance was recorded. Next, the experimenter moved the physical pointer to one of the predefined initial distances (±5, ±10, ±15, or ±20 cm from the rendered image distance), the sequence of which varied randomly across trials, rendered image distances, and cue-consistency conditions. The experimenter then started the next trial and asked the participant to open their eyes. Eight repetitions of the perceptual matching task were performed at each rendered image distance. Each participant thus completed 2 (cue-consistency conditions) × 3 (rendered image distances) × 8 (repetitions) = 48 trials of perceptual distance matching, and the experiment yielded a total of 40 (participants) × 48 (trials) = 1920 trials for the analysis.
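To make the design explicit, the sketch below (our reconstruction, not the authors' code; whether rendered image distances were interleaved within a block and whether each initial offset occurred exactly once per distance are assumptions) builds one participant's trial list:

```python
import random

CONDITIONS = ["consistent", "inconsistent"]              # blocked; order counterbalanced
DISTANCES_CM = [45, 65, 115]
INITIAL_OFFSETS_CM = [-20, -15, -10, -5, 5, 10, 15, 20]  # 8 repetitions per distance

def build_trials(condition_order, seed=0):
    """Return the 48 perceptual distance matching trials for one participant."""
    rng = random.Random(seed)
    trials = []
    for condition in condition_order:                     # trials blocked by condition
        for distance in DISTANCES_CM:
            offsets = INITIAL_OFFSETS_CM[:]
            rng.shuffle(offsets)                          # random offset sequence
            for offset in offsets:
                trials.append({"condition": condition,
                               "rendered_distance_cm": distance,
                               "initial_pointer_cm": distance + offset})
    return trials

trials = build_trials(CONDITIONS)
assert len(trials) == 48          # 40 participants x 48 trials = 1920 trials in total
```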