Abstract
Recent work in Deep Reinforcement Learning has demonstrated that a parameterized model can learn to solve complex tasks from a sparse reward signal. A consequence of this learning is often a meaningful latent representation of the observational data. The composite nature of neural networks opens the possibility of learning joint representations across not just one, but multiple sensory streams of information. In this work, we train a neural network to learn a joint spatial representation that combines separate egocentric and allocentric visual streams, corresponding to a 3D first-person view and a 2D map view. We use a simple 3D environment with a goal-driven navigation task. To fully explore the relationship between the two information streams, we employ multiple experimental conditions in which each stream contains a variable amount of relevant spatial information. The egocentric perspective contains one of three levels of information: "None", "Partial" (the goal is invisible), or "Full" (the goal is visible). Likewise, the allocentric perspective contains one of three levels of information: "None", "Partial" (the goal is present, but self-location is not indicated), or "Full" (both the goal position and self-location are indicated). We demonstrate the novel result that a goal-driven reward signal can be used to guide the learning of a joint representation between allocentric and egocentric visual streams. Additionally, in the condition with imperfect information in both streams ("Partial"-"Partial"), the network learned to combine the streams into a representation containing near-perfect global self-location and orientation information, even though this information was not explicitly available in either visual stream, enabling near-optimal performance.
We compare these learned representations to those prototypical of the mammalian "cognitive map," and compare behavioral results from our trained models with those of human participants.