**We regularly interact with moving objects in our environment. Yet, little is known about how we extrapolate the future movements of visually perceived objects. One possibility is that movements are experienced by a mental visual simulation, allowing one to internally picture an object's upcoming motion trajectory, even as the object itself remains stationary. Here we examined this possibility by asking human participants to make judgments about the future position of a falling ball on an obstacle-filled display. We found that properties of the ball's trajectory were highly predictive of subjects' reaction times and accuracy on the task. We also found that the eye movements subjects made while attempting to ascertain where the ball might fall had significant spatiotemporal overlap with those made while actually perceiving the ball fall. These findings suggest that subjects simulated the ball's trajectory to inform their responses. Finally, we trained a convolutional neural network to see whether this problem could be solved by simple image analysis as opposed to the more intricate simulation strategy we propose. We found that while the network was able to solve our task, the model's output did not effectively or consistently predict human behavior. This implies that subjects employed a different strategy for solving our task, and bolsters the conclusion that they were engaging in visual simulation. The current study thus provides support for visual simulation of motion as a means of understanding complex visual scenes and paves the way for future investigations of this phenomenon at a neural level.**

_{(L)}, or right, *P*_{(R)}. Since there were only two possible options, the sum of these probabilities was always 1. We were able to use these probabilities to devise a new way of assigning an uncertainty score to each board, within the context of this particular strategy. We determined this alternate uncertainty score with the following formula: *Uncertainty* = 1 − |*P*_{(L)} − *P*_{(R)}|. Thus, a board for which the model predicted a *P*_{(L)} value of 0.99 and a *P*_{(R)} value of 0.01 would be classified as low uncertainty. On the other hand, a board for which the model predicted a *P*_{(L)} value of 0.51 and a *P*_{(R)} value of 0.49 would be classified as high uncertainty. Here again, we transformed our uncertainty scores to a 0–100 scale. We then analyzed whether this new, image-analysis-based method of assigning board uncertainty was predictive of reaction time and accuracy, and if so, how it compared to the simulation-based method of assigning uncertainty described above.

*F*(1, 198) = 44.39, *p* < 0.001, *R*^{2} = 0.1831. This is congruent with our hypothesis, and is compatible with the idea that subjects were indeed carrying out visual simulations. To ensure that this effect was not driven primarily by a small subset of subjects (note that the previous regression was carried out using the mean reaction times of all of our subjects), we repeated the same analysis on a subject-by-subject basis. We found that simulation length was a significant predictor of reaction time for each individual subject. The distribution of the slopes for the 16 individual regressions is shown in Figure 3C. A single-sample *t* test revealed that the mean of this distribution was significantly greater than zero, *t*(15) = 15.28, *p* < 0.001. We also noted that accuracy on this task was not significantly predicted by simulation length, *F*(1, 198) = 2.406, *p* = 0.122, *R*^{2} = 0.012. This too is unsurprising, since subjects' overall accuracy was very high. Further, as we did not impose any time constraints on our subjects, the effect of the speed-accuracy trade-off was largely reflected in reaction time, with no notable effect on accuracy. Figure 3B depicts the effect of simulation uncertainty on reaction time and accuracy. Here we note that as simulation uncertainty increased, so did reaction time (*F*(1, 198) = 240.5, *p* < 0.001, *R*^{2} = 0.5485). As before, we repeated the same analysis on a subject-by-subject basis and found that simulation uncertainty was a significant predictor of reaction time for each individual subject. The distribution of the slopes for the 16 individual regressions is shown in Figure 3D. A single-sample *t* test revealed that the mean of this distribution was significantly greater than zero, *t*(15) = 24.258, *p* < 0.001. Finally, an increase in simulation uncertainty predicted a decrease in accuracy, *F*(1, 198) = 44.46, *p* < 0.001, *R*^{2} = 0.1834. These findings too are consistent with our hypotheses.
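The subject-by-subject analysis described above (one simple regression per subject, then a single-sample *t* test on the resulting slopes) can be sketched as follows. This is a hypothetical illustration with simulated data; the variable names, effect sizes, and noise levels are our own assumptions, not the study's values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_boards = 16, 200

# Hypothetical data: per-board simulation lengths, and per-subject reaction
# times that grow with simulation length plus subject-specific noise.
sim_length = rng.uniform(0, 100, size=n_boards)
rts = 0.5 + 0.01 * sim_length + rng.normal(0, 0.5, size=(n_subjects, n_boards))

# One simple linear regression per subject; collect the 16 slopes.
slopes = [stats.linregress(sim_length, subject_rts).slope for subject_rts in rts]

# Single-sample t test: is the mean of the slope distribution above zero?
t_stat, p_val = stats.ttest_1samp(slopes, popmean=0)
print(f"t({n_subjects - 1}) = {t_stat:.2f}, p = {p_val:.3g}")
```

With 16 subjects the test has 15 degrees of freedom, matching the *t*(15) statistics reported above.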

*p* < 0.001), this was not the case for length (β = 0.0027728, *p* = 0.889) or the interaction term (β = 0.0004302, *p* = 0.224). The results of this multiple regression model confirm that simulation length and uncertainty do indeed have differential effects on task accuracy. In summary, we found that mean reaction time and accuracy across subjects on this task could successfully be predicted by metrics describing the ball's trajectory. Our data thus support the idea that subjects were carrying out visual simulations as a strategy for solving this task.
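A multiple regression with an interaction term of this kind can be sketched with an ordinary least-squares fit. The data below are hypothetical and the coefficients are our own assumptions; only the model structure (accuracy regressed on length, uncertainty, and their product) mirrors the analysis above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical per-board predictors and accuracy scores: accuracy here
# depends (by construction) on uncertainty but not on length.
length = rng.uniform(0, 100, n)
uncertainty = rng.uniform(0, 100, n)
accuracy = 95 - 0.1 * uncertainty + rng.normal(0, 2, n)

# Design matrix: intercept, length, uncertainty, length x uncertainty.
X = np.column_stack([np.ones(n), length, uncertainty, length * uncertainty])
betas, *_ = np.linalg.lstsq(X, accuracy, rcond=None)
print(betas)  # one coefficient per column of X
```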

*t* test of the means of the intersection in the actual and shuffled conditions revealed a significant difference between the two, *t*(15) = 6.6626, *p* < 0.001.

A paired *t* test of the means of the edit distance in the actual and shuffled conditions revealed a significant difference between the two, *t*(15) = −7.7454, *p* < 0.001.

*t*(15) = 2.1525, *p* < 0.05, and temporal, *t*(15) = −3.7013, *p* < 0.001, similarity between pre- and post-response eye movements was lower on trials that subjects answered incorrectly than on correct trials. This finding is in line with our hypothesis, and suggests that one possible reason subjects responded incorrectly on a trial is that they simulated the wrong ball path, leading to the wrong answer. Finally, we separately analyzed and compared each subject's degree of spatial and temporal overlap across the first 20 and last 20 trials of the session. This comparison is crucial because subjects were explicitly asked to pursue the falling ball, raising the possibility that, over time, this instruction implicitly trained them to use eye movements to predict where the ball would fall. A paired *t* test of the mean intersection of the first 20 versus last 20 trials for each subject showed no significant difference, *t*(15) = −0.89119, *p* = 0.3869. A paired *t* test of the mean edit distance of the first 20 versus last 20 trials for each subject yielded the same outcome, *t*(15) = 0.71735, *p* = 0.4842. Based on these results, we consider it unlikely that repeatedly pursuing the falling ball led to an evolution in strategy or entrainment of simulation.
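The temporal similarity measure used above (edit distance between fixation sequences) is conventionally computed as a Levenshtein distance over fixations discretized into screen-region labels. A minimal sketch, assuming such a discretization has already been applied (the region labels and example sequences below are illustrative, not the study's data):

```python
def edit_distance(seq_a: str, seq_b: str) -> int:
    """Levenshtein distance: the minimum number of insertions, deletions,
    and substitutions needed to turn seq_a into seq_b."""
    m, n = len(seq_a), len(seq_b)
    # dp[i][j] = distance between the first i labels of seq_a
    # and the first j labels of seq_b.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if seq_a[i - 1] == seq_b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

# Each letter labels the screen region of one fixation, in temporal order.
pre_response = "ABCCD"   # fixations made while predicting the fall
post_response = "ABCD"   # fixations made while watching the ball fall
# edit_distance(pre_response, post_response) -> 1 (one repeated 'C' deleted)
```

A low edit distance between the pre- and post-response sequences indicates high temporal overlap between the simulated and perceived trajectories.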

*F*(1, 198) = 15.49, *p* < 0.001, *R*^{2} = 0.072, and accuracy, *F*(1, 198) = 16.72, *p* < 0.001, *R*^{2} = 0.077, on this task. We thus had two possible, valid models for explaining subject behavior: one based on simulation of the ball's trajectory, and the other based purely on computations of the spatial relationships between onscreen objects. To distinguish between these two possibilities, we assessed how much variance in subject behavior was accounted for by the two uncertainty metrics pertaining to each strategy (i.e., simulation uncertainty in Figure 3B and CNN uncertainty in Figure 5B). We note that the variance in reaction times explained by the CNN model is extremely small (*R*^{2} = 0.072), whereas the variance explained by the simulation model is far greater (*R*^{2} = 0.5485). The same applies for task accuracy (i.e., *R*^{2} values of 0.07 and 0.18 for the CNN and simulation models, respectively). The fact that the simulation model is a much better predictor of subjects' behavior strongly suggests that subjects were engaging in visual simulation. To corroborate this finding, we ran the same regression analyses on a subject-by-subject basis. We found that CNN uncertainty was in fact not a significant predictor of reaction time for six of our subjects, whereas simulation uncertainty was a significant predictor for all 16 subjects. A comparison of the slopes and *R*^{2} values of these two regression models across subjects shows that the simulation uncertainty model consistently yielded significantly higher slopes, *t*(15) = 16.931, *p* < 0.001, and *R*^{2} values, *t*(15) = 12.31, *p* < 0.001. Notably, for the majority of subjects (12 out of 16), the *R*^{2} value for the CNN model barely exceeded 0. The nonoverlapping distributions of these values for all 16 subjects are shown in Figures 5C and 5D. Finally, calculating the Akaike information criterion (AIC) for both models returned a lower value for the simulation model than for the CNN model (ΔAIC = 143.956), further supporting the idea that subjects were simulating the ball's motion trajectory. Overall, we conclude that while there may be various valid approaches to solving this task, our subjects' behavior is best explained by a visual simulation strategy as opposed to a global image-analysis strategy that might be exploited by a CNN.
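The AIC comparison above can be illustrated for least-squares regressions, where AIC reduces to *n*·ln(RSS/*n*) + 2*k* (RSS: residual sum of squares; *k*: number of fitted parameters). The residual sums of squares below are hypothetical placeholders, not the study's values; only the comparison logic is shown:

```python
import numpy as np

def aic_least_squares(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares model: n * ln(RSS / n) + 2k."""
    return n * np.log(rss / n) + 2 * k

# Two single-predictor models fit to the same n = 200 mean reaction
# times (k = 2: slope + intercept). RSS values are hypothetical.
n = 200
aic_sim = aic_least_squares(rss=10.0, n=n, k=2)  # simulation-uncertainty model
aic_cnn = aic_least_squares(rss=21.0, n=n, k=2)  # CNN-uncertainty model

delta_aic = aic_cnn - aic_sim  # positive: the simulation model is preferred
print(f"delta AIC = {delta_aic:.1f}")
```

Because both models have the same *k*, the ΔAIC here depends only on the ratio of residual sums of squares; the lower-AIC model is the better-supported one.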
