Abstract
Computational learning of visual systems has seen remarkable success, especially during the last decade. A large part of this success can be attributed to the availability of large data sets tailored to specific domains. Most training is performed over unordered data samples that are assumed to be independent, and more data correlates with better performance. This work considers what we observe from humans as our sample. In hundreds of trials with human subjects, we found that samples are not independent, and that ordered sequences are our observation of internal visual functions. We investigate human visuospatial capabilities through a real-world experimental paradigm. Previous literature posits that comparison represents the most rudimentary form of psychophysical tasks. As an exploration into dynamic visual behaviours, we employ the same-different task in 3D: are two physical 3D objects visually identical? Human subjects are presented with the task while afforded freedom of movement to inspect two real objects within a physical 3D space. The experimental protocol is structured to ensure that all eye and head movements are oriented toward the visual task. We show that no training was needed to achieve good accuracy, and we demonstrate that efficiency improves with practice on various levels, in contrast with modern computational learning. Subjects make extensive use of eye and head movements to acquire visual information from appropriate viewpoints in a purposive manner. Furthermore, we show that fixations and corresponding head movements are well-orchestrated, encompassing visual functions that are composed dynamically and tailored to task instances. We present a set of triggers that we observed to activate those functions. A deeper understanding of this intricate interplay is essential for developing human-like computational learning systems. Unravelling the "why" behind all these functionalities, that is, their purpose, poses an exciting challenge.
While human vision may appear effortless, the intricacy of visuospatial functions is staggering.