By recording the response of fly photoreceptors during visual stimulation containing a target seen against a cluttered background, we show that the process of enhancing target salience begins at the earliest stages of vision. Temporal processing by photoreceptors alone, in the absence of any spatial interactions, improved the discrimination of targets from background (essentially a spatial task) by around 70%. Furthermore, this improvement is captured by elaborate models of photoreceptor temporal non-linear dynamics.
Although previous work has looked at photoreceptor representation of targets under uniform background conditions (Burton & Laughlin,
2003), this is the first study to show the complete neuronal image formed by a fly's photoreceptor array under real-world environmental conditions (i.e., a cluttered background), utilizing both time and space. The results show that the temporal pre-processing performed by the photoreceptors greatly enhances the ability not only to visualize scenes within the limited bandwidth of neuronal signaling but also to identify small targets as they move against complex backgrounds. We also show that this ability does not depend on relative motion between the target and the surround, nor on the type of scene.
It is suspected that non-linear processing by the visual system is matched to the statistics of natural scenes (Simoncelli & Olshausen,
2001); hence, a major aim of this study was to examine the effect of such processing in a naturalistic setting. This allowed the reconstruction of a full 3-dimensional (2 space and 1 time) representation of the neural activity of a biological photoreceptor array under realistic lighting and was the only way to investigate questions such as how the identification of a mobile target within a complex moving scene is affected by photoreceptor processing.
This same movie sequence has been used recently to illustrate coding properties of motion sensitive neurons in the fly lobula plate (Nordström et al.,
2008). Ideally a sequence such as the one presented here would be based upon a real pursuit flight. However, the analysis of fly chases is complicated by highly irregular flight maneuvers (Boeddeker, Kern, & Egelhaaf,
2003), making tracking and flight reconstruction complex. Possible saccadic movements of the head (Land, 1973; Wagner, 1986) make it almost impossible to determine the exact orientation of the eyes during free flight; such measurements have only been possible in restricted environments (van Hateren & Schilstra, 1999).
In determining the course for our robot, we attempted to mimic a plausible chase sequence with the target being pursued in such a way as to place it in the top center of the visual field, just as real flies do (Collett & Land,
1978). Unavoidable delays between recording a chase and reconstructing it (i.e., analysis of video footage to determine fly position) would lead to changes in lighting and other environmental variables thereby reducing the similarity to the real event. Since photoreceptor responses vary based on recent history (Matic & Laughlin,
1981), our approach made it possible to obtain a sequence that started and ended at approximately the same point and orientation in space in order to facilitate appropriate photoreceptor adaptation states when the movie was played in a loop. Finally, since the aim of the paper was to determine the ability of photoreceptor processing to enhance target salience, the findings hold regardless of the flight dynamics used. In some ways the sequence is a “worst case,” as it is known that flies will reduce the contribution of the background by viewing targets against the sky (Wehrhahn et al.,
1982) and utilizing head movements to minimize the movement of the target on the retina (van Hateren & Schilstra,
1999). Although our reconstructed image sequence is perhaps best described as “naturalistic” rather than a true natural sequence, it represents an accurate glimpse into the neuronal representation of an entire scene in both space and time and is relevant for applications of artificial target detection systems.
By extending the approach to analyze responses to all pixels within a scene, it was possible to examine information transfer in a 3-dimensional sense for a well-defined visual task. In other words, we were able to look at the relationship between features within the scene and their surround, both before and after processing. This has not been previously possible.
Here we show that target detectability was enhanced significantly after processing by the biological photoreceptor and that functionally this enhancement is captured by the implementation of a photoreceptor model (van Hateren & Snippe,
2001). Hence, although there are some differences in the detail (i.e., raw target
z-score values), the model appears to explain almost all of the biological response from a practical informational point of view in this specific, but highly relevant, task of small target detection. Where differences did occur, they were more likely the result of biological or recording noise than of genuine differences between the processing by the model and the photoreceptor.
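As a minimal sketch of the kind of detectability measure alluded to above, a target z-score can be computed as the target response expressed in standard deviations of the surrounding scene. The function and variable names here are illustrative, not taken from the original study, and the exact metric used in the paper may differ in detail:

```python
import numpy as np

def target_z_score(frame, target_mask):
    """Z-score of the target pixels relative to the rest of the scene.

    `frame` is a 2-D array of (recorded or modeled) photoreceptor
    responses; `target_mask` is a boolean array marking target pixels.
    A larger absolute z-score means the target stands out more clearly
    from the background distribution.
    """
    background = frame[~target_mask]
    target_level = frame[target_mask].mean()
    return (target_level - background.mean()) / background.std()
```

Under this kind of measure, any temporal processing that pushes the target response further from the background distribution (or narrows that distribution) increases detectability, even with no spatial interactions at all.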
Perhaps the most surprising aspect of this is that the enhancement is independent of 2-dimensional interactions, i.e., it does not require complex center-surround interactions. In almost all other approaches to studying detection of small targets among cluttered backgrounds, a fundamental operation is to enhance the difference between the target and the local surround by means of a classical center-surround operation (spatial high-pass filtering), where the weighted values of adjacent pixels are subtracted from the central pixel. This enhances small objects and makes them easier to detect. This operation takes place in the vertebrate retina (Baccus,
2007) and the second order neurons (large monopolar cells) of insects (Shaw,
1984); however, it is not present in insect photoreceptors (Smakman & Stavenga,
1987). By only playing one pixel at a time, and by using an extended source so that the surrounding pixels are all driven to the same level, we have removed any possible spatial interactions that could be fed back to the photoreceptors from higher order neurons as seen in Drosophila (Zheng et al.,
2006). In any case, there is no evidence that stimulation outside a photoreceptor's receptive field influences the response, either directly or via possible higher order feedback (James,
1992). Thus, our signal is just processed in the time domain. In other words, the enhancement of spatial information that we demonstrate in this system is purely the result of temporal processing, occurring independently on every pixel.
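For contrast with the purely temporal processing demonstrated here, the classical center-surround operation described above (weighted neighboring pixels subtracted from the central pixel) can be sketched as follows. The equal neighbor weights and edge handling are illustrative choices, not a model of any particular retina:

```python
import numpy as np

def center_surround(image):
    """Classical center-surround (spatial high-pass) operation: subtract
    the mean of each pixel's eight neighbors from the pixel itself.

    Small, isolated features survive this operation; smooth regions are
    driven toward zero. Edge pixels reuse their nearest values
    (edge-replication padding).
    """
    padded = np.pad(image, 1, mode="edge")
    # Sum the eight shifted copies of the padded image, then crop the
    # 1-pixel border; the crop discards the rows/columns affected by
    # np.roll's wrap-around.
    surround = sum(
        np.roll(np.roll(padded, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )[1:-1, 1:-1] / 8.0
    return image - surround
```

A uniform region maps to zero while a single bright pixel is preserved (and ringed by negative values), which is exactly the small-feature enhancement that, in the fly, must instead emerge from per-pixel temporal dynamics.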
When considering the effect of the static and dynamic non-linear components of the photoreceptor processing, it is important to keep the fly's behavior in mind. While the static components alone provided a larger improvement in target detectability during the simulated flight, they performed much worse under the simulated hovering/perching condition and when there was no relative motion between background and target in the panoramic scenes. Since it is often under low velocity conditions (e.g., hovering) that male flies first detect potential mates or rivals for pursuit, this situation should not be discounted. Furthermore, it is important to note that while the target was an important object within the moving environment it was not the only one of consequence. The dynamic model of the photoreceptor enhanced other objects, such as the tree branches, as seen in
Figure 4, which are important for navigational purposes and were not as prevalent in the static model. Unlike the dynamic model, the static model of photoreceptor function does not continuously monitor and adjust gain, and so does not make the best use of the available output bandwidth (van Hateren,
1997). While that has no impact in a noise-free model, it would be a serious limitation in biology where the signal-to-noise ratio is a major consideration (van Hateren,
1992a) or in a bandwidth-limited second stage of processing, whether an artificial 8-bit digital system or a noisy interneuron. Finally, after-images of the target caused by the system dynamics, which decreased the detectability of the target according to the types of measurements used in this paper, can be exploited in more sophisticated bio-inspired target detection models to enhance target salience (Wiederman, Shoemaker, & O'Carroll,
2007).
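The distinction between static and dynamic non-linear components can be caricatured in a few lines: a static non-linearity applies a fixed input-output curve, while a dynamic one continuously rescales its gain to recent input history. This is a deliberately minimal sketch, not the van Hateren & Snippe (2001) model; the log compression, time constant, and divisive form are all illustrative assumptions:

```python
import numpy as np

def static_response(intensity):
    """Static non-linearity: a fixed, history-free log-like compression."""
    return np.log1p(intensity)

def dynamic_response(intensity, tau=20.0):
    """Dynamic non-linearity sketch: divisive gain control driven by an
    exponentially low-passed estimate of recent intensity.

    A sudden brightening produces a large transient that then adapts
    away, keeping the output centered within a limited response range.
    """
    alpha = 1.0 / tau
    adapt = float(intensity[0])
    out = np.empty(len(intensity))
    for t, i in enumerate(intensity):
        adapt += alpha * (i - adapt)   # low-pass adaptation state
        out[t] = i / (adapt + 1e-9)    # divisive gain control
    return out
```

For a step in intensity, the static response simply settles at a new fixed level, whereas the dynamic response transiently emphasizes the change and then returns toward baseline, which is the bandwidth-conserving behavior discussed above.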
In providing insight into the degree to which the photoreceptor representation improves the salience of features of interest, our work highlights the importance of appropriate sampling and processing for subsequent higher-order visual tasks. This suggests that a biomimetic approach to modeling this “front-end” processing may help simplify artificial vision approaches to segregating target motion from background motion. Given our recent description of neurons capable of this task (Barnett et al.,
2007; Nordström et al.,
2006), the fly thus provides a proof of concept that this complex task may be solved by simple and robust mechanisms. Additionally, a model that performs just as well in this task as the biological system of the fly shows that a solution to this difficult problem need not involve complex spatial interactions.
In order to test these findings under different conditions, it would have been preferable to reconstruct multiple movies from different locations to ensure the results are scene-invariant. This, however, would be a major undertaking, as the time required to perform these electrophysiological experiments is large. Fully recording a 10-s sequence utilizing both rotation and translation, covering a 90 × 67 degree patch of space with 3 repeats (as described in this paper), requires approximately 50 hours of useful intracellular recordings. A more practical approach was to show that similar target enhancement was achieved when targets were inserted into panoramic images that were animated to simulate yaw rotation only. While this type of motion is not as accurate as the full movie sequences, it does show that the enhancement of target salience is not specific to one scene or one type of motion and that the photoreceptor model also works under such conditions. This opens the possibility of using the model in place of real photoreceptors to more fully explore the parameter space, including testing different scenarios, speeds, and target luminances.
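Simulating pure yaw from a panoramic image amounts to horizontally shifting a 360-degree strip frame by frame. The sketch below shows one plausible way to generate such a stimulus sequence; the function name, the whole-pixel shift quantization, and the assumption that the image width spans exactly 360 degrees are all choices made here for illustration:

```python
import numpy as np

def yaw_sequence(panorama, deg_per_frame, n_frames):
    """Animate pure yaw rotation by horizontally rolling a panorama.

    `panorama` is a 2-D array whose width is assumed to span 360
    degrees. Rotation angles are converted to whole-pixel shifts, so
    very slow rotations are quantized to the pixel grid.
    """
    px_per_deg = panorama.shape[1] / 360.0
    for f in range(n_frames):
        shift = int(round(f * deg_per_frame * px_per_deg))
        # Negative roll moves the scene leftward, i.e., a rightward yaw.
        yield np.roll(panorama, -shift, axis=1)
```

Each yielded frame can then be fed, pixel by pixel, through a photoreceptor model, making it cheap to sweep rotation speeds, scenes, and target luminances without further intracellular recording.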
This shows that just as the optical design of eyes is adapted for different tasks (Hughes,
1977; Lythgoe,
1979), neuronal processing may also differ in animals displaying different behaviors. Further testing is required to investigate this hypothesis, including a within-species comparison between male flies (which do chase small targets) and female flies (which do not). It would also be interesting to test for neurophysiological differences between flies with different optical specializations (i.e., acute vs. bright zones) (Straw et al.,
2006) to see if target detection is altered.