Yulia Kotseruba, John Tsotsos; Visual Attention in Dynamic Environments and its Application to Playing On-line Games. Journal of Vision 2014;14(10):523. doi: 10.1167/14.10.523.
© ARVO (1962-2015); The Authors (2016-present)
We examine the visual processing requirements of complex visual tasks by building a system capable of playing Jump'n'Run on-line games (e.g., CANABALT, http://www.adamatomic.com/canabalt) in real time. Such games are visually complex, while gameplay remains simple: move the character as far as possible through the map and help it avoid obstacles by pressing a single button. In our setup, video is streamed from a camera pointed at the monitor, and button presses are controlled by the computer. The current gaze position imposes a fovea and periphery in each frame. The theoretical foundation for our work is the Selective Tuning model of visual attention (Tsotsos 2011) and the accompanying Cognitive Programs framework (Tsotsos 2013). We implement relevant parts of the model and show how it enables interaction between the high-level knowledge of the game and the low-level, context-independent algorithms used for bottom-up image processing. Since our focus is visual attention, we did not learn gameplay logic and instead hard-coded Cognitive Programs as a hierarchy of Finite State Automata (FSAs): each FSA is composed of elements that are in turn decomposed into FSAs. These elements include detection and tracking of characters and obstacles, edge and line detection, construction of saliency maps, selection of regions of interest, foveation, changing gaze position, decisions regarding visual contents, spatial relations, etc. We learn game physics by using regression analysis to find the relationship between the duration of the button press and sampled jump trajectories. We show that this representation is sufficient for this task and that the inclusion of attentive mechanisms permits us to achieve real-time performance. In particular, several elements help optimize vision algorithms by reducing the search space and partially eliminating image artefacts introduced by the camera. Based on the current state of the game, we are able to make assumptions about the next events and adjust the image processing hierarchy accordingly.
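
The regression step described above can be illustrated with a minimal sketch. The abstract does not specify the form of the regression, so the code below assumes, purely for illustration, a linear relationship between press duration and horizontal jump length. The function names (`fit_line`, `press_duration_for`) and all sample values are hypothetical and are not taken from the original system.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def press_duration_for(distance, slope, intercept):
    """Invert the fitted line: press duration needed for a desired jump length."""
    return (distance - intercept) / slope


# Hypothetical samples (invented for illustration, not measured data):
# button press duration in milliseconds and observed jump length in pixels.
durations_ms = [50, 100, 150, 200, 250]
jump_px = [62, 110, 161, 208, 259]

slope, intercept = fit_line(durations_ms, jump_px)

# Estimate how long to hold the button to clear a 180-pixel gap.
needed_ms = press_duration_for(180, slope, intercept)
print(f"hold the button for about {needed_ms:.0f} ms")
```

Inverting the fitted model is what makes it useful for control: once the size of an upcoming gap has been estimated from the image, the press duration follows directly. A real trajectory model would likely need more than one predictor (for example, the character's current speed), which this sketch omits.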
Meeting abstract presented at VSS 2014