Abstract
In this talk I will draw a link to computer vision and recent techniques for addressing the problem of predicting the future. Some of the representations used to address this problem in computer vision are reminiscent of current views on scene understanding in humans. Given a single static picture, humans can not only interpret the instantaneous content captured by the image, but also infer the chain of dynamic events that are likely to happen in the near future. Similarly, when a human observes a short video, it is easy to decide whether the event taking place in the video is normal or unexpected, even if the video depicts a place unfamiliar to the viewer. This is in contrast with work in computer vision, where current systems rely on thousands of hours of video recorded at a single place in order to identify what constitutes an unusual event. I will discuss techniques for predicting the future based on a large collection of stored memories. We show how, relying on large collections of videos and using global image features, such as the ones used to model fast scene recognition, we can retrieve events stored in memory that are similar to the query, and how we can build a simple model of the distribution of expected motions. The model can then predict what is likely to happen in the future, as well as evaluate how unusual a particular event is.
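As a rough illustration of the retrieve-then-predict idea described above, the following minimal sketch looks up the stored events whose global descriptor is closest to a query, summarizes their motions, and scores how far a new motion falls from that summary. It is not the method presented in the talk: the two-dimensional descriptors and motion vectors are made-up toy values, the per-dimension Gaussian is an assumed stand-in for the "simple model of the distribution of expected motions", and all function names are hypothetical.

```python
import math


def retrieve_similar(query, memory_features, k=3):
    """Return indices of the k stored descriptors closest to the query (Euclidean)."""
    order = sorted(range(len(memory_features)),
                   key=lambda i: math.dist(query, memory_features[i]))
    return order[:k]


def fit_motion_model(motions, min_var=1e-3):
    """Per-dimension mean and variance of the retrieved motion vectors.

    Assumption: dimensions are treated as independent; min_var keeps the
    variance away from zero when the retrieved motions are nearly identical.
    """
    n = len(motions)
    dims = len(motions[0])
    mean = [sum(m[d] for m in motions) / n for d in range(dims)]
    var = [max(sum((m[d] - mean[d]) ** 2 for m in motions) / n, min_var)
           for d in range(dims)]
    return mean, var


def unusualness(motion, mean, var):
    """Variance-normalized squared distance from the expected motion."""
    return sum((x - mu) ** 2 / v for x, mu, v in zip(motion, mean, var))


# Toy memory of (global descriptor, observed motion) pairs; values are invented.
memory = [
    ([0.0, 0.1], [1.0, 0.0]),   # scene type A: things move to the right
    ([0.1, 0.0], [0.9, 0.1]),
    ([0.0, 0.0], [1.1, -0.1]),
    ([5.0, 5.1], [0.0, 1.0]),   # scene type B: things move upward
    ([5.1, 5.0], [0.1, 0.9]),
]
features = [f for f, _ in memory]
motions = [m for _, m in memory]

query = [0.05, 0.05]  # descriptor of a new image that resembles scene type A
neighbors = retrieve_similar(query, features, k=3)
mean, var = fit_motion_model([motions[i] for i in neighbors])

predicted_motion = mean                                  # most likely motion
score_expected = unusualness([1.0, 0.0], mean, var)      # agrees with memory
score_surprising = unusualness([-1.0, 0.0], mean, var)   # opposite direction
```

In this toy setting the query retrieves only scene type A, so the predicted motion points to the right and a leftward motion receives a much higher unusualness score than a rightward one. A real system would replace the toy descriptors with global image features computed from video frames and would likely need a richer motion model.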