Abstract
I show that both the dynamical properties of V1 receptive fields and the spiking nature of neural activity are well suited to represent time-varying natural images in terms of a sparse code. Image sequences are modeled as a superposition of space-time kernels which are convolved with a set of coefficient signals. When the coefficient signals are constrained to be sparse — i.e., rarely active — the basis functions that emerge have properties similar to the measured receptive fields of V1 simple cells. That is, they are spatially localized, oriented, and bandpass, and they translate as a function of time. Thus, these receptive fields are well suited to represent time-varying natural images using few active neurons, providing a simple and economical description of the environment. When a movie is encoded using the learned basis functions, the resulting output signals have a spike-like character: they are mostly zero, interspersed with brief non-zero values that are punctate in time. This is in stark contrast to the continuous, time-varying pixel values that constitute the input stream. Together, these observations suggest that both the receptive field properties and the spiking nature of neural activity go hand in hand — i.e., they are not separate aspects of neural function, but rather parts of a unified efficient coding strategy. I also show how the image model may be used in a generative mode to synthesize movies for use in both psychophysical and physiological experiments.
Supported by NIMH R29-MH057921.
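The generative model summarized above — a movie expressed as a sum of space-time kernels, each convolved in time with a sparse coefficient signal — can be sketched as follows. All sizes, the random stand-in kernels, and the event counts are illustrative placeholders, not values from the paper; a learned basis would replace the random `phi`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 8 basis functions,
# each a 12x12-pixel patch extending 7 frames in time.
n_basis, px, pt = 8, 12, 7
T = 64  # movie length in frames

# Random stand-ins for learned space-time kernels phi_i(x, y, tau).
phi = rng.standard_normal((n_basis, px, px, pt))

# Sparse coefficient signals a_i(t): mostly zero, with a few brief,
# punctate events per unit (3 events here, chosen arbitrarily).
a = np.zeros((n_basis, T))
for i in range(n_basis):
    events = rng.choice(T, size=3, replace=False)
    a[i, events] = rng.standard_normal(3)

# Generative model: I(x, y, t) = sum_i (a_i * phi_i)(x, y, t),
# where * denotes convolution over the time axis.
movie = np.zeros((px, px, T + pt - 1))
for i in range(n_basis):
    for x in range(px):
        for y in range(px):
            movie[x, y] += np.convolve(a[i], phi[i, x, y])
```

Note the asymmetry the abstract emphasizes: the coefficient arrays `a` are almost entirely zero (spike-like), while the synthesized `movie` is a dense, continuously varying pixel stream.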