The LME model showed that the pupil orienting response (β = 0.003 ± 0.0001, t = 4.04, p < 0.001), baseline pupil size (β = 0.0009 ± 0.0003, t = 2.53, p = 0.014), and dwell time (β = 0.10 ± 0.01, t = 7.29, p < 0.001) significantly predicted the subsequent building duration (Figure 3). Notably, Condition did not significantly predict build duration in this model (β = 0.04 ± 0.12, t = 0.34, p = 0.736), meaning that these effects were not driven by differences in delay time.
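As an illustration only (not the authors' actual analysis code), a linear mixed-effects model of this form can be specified with statsmodels' mixedlm, with the four fixed-effect predictors and random intercepts per participant. All variable names, units, and the simulated data below are hypothetical; the simulated coefficients merely mirror the reported magnitudes.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 20 participants, 300 trials in total.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "participant": rng.integers(0, 20, n),     # grouping factor (random intercepts)
    "orienting": rng.normal(0, 50, n),         # pupil orienting response (a.u.)
    "baseline": rng.normal(0, 100, n),         # baseline pupil size (a.u.)
    "dwell": rng.normal(2.0, 0.5, n),          # dwell time (s)
    "condition": rng.integers(0, 2, n),        # delay condition (categorical)
})
# Simulate build durations using the reported coefficient magnitudes.
df["build_duration"] = (
    3.0 + 0.003 * df["orienting"] + 0.0009 * df["baseline"]
    + 0.10 * df["dwell"] + rng.normal(0, 0.5, n)
)

# LME: fixed effects for the four predictors, random intercept per participant.
model = smf.mixedlm(
    "build_duration ~ orienting + baseline + dwell + C(condition)",
    data=df, groups=df["participant"],
).fit()
print(model.summary())
```

The fixed-effect estimates in the summary table correspond to the β values reported in the text; the non-significant Condition term is what licenses the claim that the pupil effects are not confounded with delay time.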
p = 0.736). These findings show that stronger orienting responses predicted longer build durations. This relatively fast (and often ignored) pupil response thus revealed how much one will encode into VWM, which is in line with the
depth of sensory processing account (
Binda & Gamlin, 2017;
Mathôt & Van der Stigchel, 2015;
Strauch et al., 2022). For the first time, to the best of our knowledge, we are reporting a physiological marker (namely, the pupil orienting response) that reveals changes in the depth of encoding during a shift from external sampling to internal storing. Even preceding the pupil orienting response, a relatively small baseline pupil size significantly predicted longer build durations, seconds before building even commences. This finding is potentially compatible with the adaptive gain theory (
Aston-Jones & Cohen, 2005), wherein tonic LC firing and thus baseline pupil size are linked to task performance. When baseline pupil sizes were relatively small, this approached the “peak” of the inverted-U, whereas participants may have been highly aroused whenever their pupils were relatively large during baseline, leading to worse performance. Because the copy task is a quite effortful task (i.e., remembering and placing six items) (see
Robison & Unsworth, 2019), it is likely that the arousal levels of participants were indeed positioned on this “right” side of the inverted-U curve. Finally, in line with previous work, longer dwell times were linked with storing more visual information in VWM (
Draschkow et al., 2021;
Somai et al., 2020). Together, these results show that ocular metrics reflect not only where and how long attention is deployed but also how strongly it is deployed, as well as the current attentional state; these attentional aspects jointly determine how much is encoded into VWM.