Abstract
We measured the spatiotemporal chromatic properties of the natural world using a calibrated high-speed digital video camera. Our video clips, each lasting 10 seconds and recorded at 200 Hz with a stationary camera, featured a wide variety of scenes, ranging from temporal texture (such as grass blowing in the wind and waves breaking on the sea) to meaningful spatiotemporal structure (such as people communicating using British Sign Language). The raw video output was calibrated and combined to closely approximate the human luminance, red-green and blue-yellow channels (Lovell et al. 2004). By analysing the power spectrum of the 3D FFT of each video, we characterised the natural world as conveyed to the visual cortex. Examination of spatial characteristics showed that the amplitudes of the various spatial frequencies are, as expected, well characterised by a 1/f^n relationship (where f denotes spatial frequency), with n close to 1 for the luminance channel. In the temporal domain, the overall statistics follow a 1/ω^n pattern (where ω denotes temporal frequency), with values of n substantially less than 1 for all three channels. However, when examined on a video-by-video basis, a markedly different temporal structure can be observed (e.g. peaks in the temporal spectrum at 6 Hz for waves in a river). We note that such peaks are invariant to viewing distance, and we propose that vision may use this invariant structure to extract temporal gist from a scene. The spatiotemporal sensitivities of visual organisms may well be driven by a need to capture such information optimally.
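The analysis summarised above can be illustrated with a minimal sketch; it is not the authors' pipeline. It assumes a single-channel clip stored as a NumPy array of shape (frames, height, width), takes the power spectrum of its 3D FFT, averages over spatial frequency to obtain a temporal amplitude spectrum, and fits the exponent n of a 1/ω^n law by least squares in log-log coordinates. The function names, the toy clip size and the synthetic 6 Hz flickering grating are all hypothetical choices made for illustration. Changing the number of grating cycles across the image stands in, loosely, for a change in viewing distance.

```python
import numpy as np

FPS = 200        # frame rate stated in the abstract (Hz)
N_FRAMES = 400   # 2 s toy clip (assumption; the real clips last 10 s)
SIZE = 16        # toy image width and height in pixels (assumption)


def temporal_amplitude_spectrum(video, fps):
    """Return (frequencies in Hz, amplitude) for the positive temporal frequencies.

    video: array of shape (frames, height, width) holding one channel.
    """
    video = video - video.mean()                  # remove the DC term
    power = np.abs(np.fft.fftn(video)) ** 2       # power spectrum of the 3D FFT
    temporal_power = power.mean(axis=(1, 2))      # average over spatial frequencies
    freqs = np.fft.fftfreq(video.shape[0], d=1.0 / fps)
    keep = freqs > 0
    return freqs[keep], np.sqrt(temporal_power[keep])


def fit_exponent(freqs, amplitude):
    """Least-squares estimate of n in amplitude ~ 1/freq^n (log-log line fit)."""
    slope, _intercept = np.polyfit(np.log(freqs), np.log(amplitude), 1)
    return -slope


def make_toy_clip(cycles_across_image, rng):
    """Synthetic clip: a vertical grating flickering at 6 Hz plus weak white noise."""
    t = np.arange(N_FRAMES) / FPS
    x = np.arange(SIZE) / SIZE
    flicker = np.sin(2 * np.pi * 6.0 * t)[:, None, None]
    grating = np.cos(2 * np.pi * cycles_across_image * x)[None, None, :]
    noise = 0.1 * rng.standard_normal((N_FRAMES, SIZE, SIZE))
    return flicker * grating + noise


rng = np.random.default_rng(0)
peaks = []
for cycles in (2, 4):  # coarser vs finer grating, standing in for near vs far viewing
    freqs, amp = temporal_amplitude_spectrum(make_toy_clip(cycles, rng), FPS)
    peaks.append(freqs[np.argmax(amp)])   # temporal frequency of the largest peak
print("peak temporal frequencies (Hz):", peaks)
```

In this toy case the spatial frequency of the grating changes between the two clips while the location of the temporal peak does not, which is the sense in which a temporal peak is invariant to viewing distance. For real footage, `fit_exponent` would be applied to the spectrum returned by `temporal_amplitude_spectrum`, ideally after windowing the clip to limit spectral leakage, a step omitted here for brevity.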