August 2008
Volume 8, Issue 11
Research Article  |   August 2008
Photoreceptor processing improves salience facilitating small target detection in cluttered scenes
Journal of Vision August 2008, Vol.8, 8. doi:10.1167/8.11.8
      Russell S. A. Brinkworth, Eng-Leng Mah, Jodi P. Gray, David C. O'Carroll; Photoreceptor processing improves salience facilitating small target detection in cluttered scenes. Journal of Vision 2008;8(11):8. doi: 10.1167/8.11.8.

      © 2016 Association for Research in Vision and Ophthalmology.

Abstract

Target detection amidst clutter is a challenging task for both natural and artificial vision, yet one solved at the level of neurons in the 3rd optic ganglion of insects. These neurons are capable of responding to the motion of small objects, even against complex moving backgrounds. While the basic physiology has been investigated, little is known about how these cells are able to reject background motion while robustly responding to such small stimuli. By recording intracellularly from fly photoreceptors stimulated with natural image sequences containing a target viewed against a complex moving background, we show that the process of target detection begins at the earliest stages of vision. The temporal processing by photoreceptors alone, in the absence of any spatial interactions, improved the discrimination of targets (essentially a spatial task) by around 70%. This enhancement of target salience can be explained by elaborate models of photoreceptor temporal non-linear dynamics. The functional principles outlined in this work could be applied in areas such as robotics and surveillance, medical imaging, or astronomy, anywhere it is necessary to detect a small item in a cluttered surround.

Introduction
Visual target detection against a cluttered, moving background is a challenging task for any system, natural or artificial. The ability to extract the most important features from a scene, by making them more easily distinguished from the surround (i.e., increasing their salience), is critical. Flying insects have solved this problem (Collett & Land, 1975; Land & Collett, 1974). Recordings from neurons in the 3rd optic ganglion (functionally similar to sections of the mammalian visual cortex) show responses to the motion of objects smaller than the angular subtense of a single photoreceptor (Barnett, Nordström, & O'Carroll, 2007; Nordström, Barnett, & O'Carroll, 2006). While the basic physiology (Collett, 1971; O'Carroll, 1993) and possible neuroanatomical connections (Egelhaaf, 1985; Gilbert & Strausfeld, 1992) of these neurons have been investigated, little is known about the underlying mechanisms that allow them to reject background motion while robustly responding to stimuli with a spatial extent smaller than the separation between photoreceptors. 
Application of standard information theory, using sinusoidal stimuli and accounting for various sources of noise, shows that the optics of insect eyes are optimized (given their physical limitations) (Snyder, Stavenga, & Laughlin, 1977). Differences in optical design of compound eyes can be attributed to different operating conditions (Snyder, 1977) with both lifestyle and environment influencing these parameters (Snyder, Bossomaier, & Hughes, 1990). Sexual dimorphism within species (Land & Eckert, 1985) and interspecies differences (Straw, Warrant, & O'Carroll, 2006) are clearly related to specialized behavior, despite the occupation of similar environments. 
Previously white noise stimuli were used to identify the signaling properties of photoreceptors in a dynamic sense (Juusola, Uusitalo, & Weckström, 1995), showing that non-linear dynamics of the cells aid in the optimal use of the limited signaling range of individual neurons (de Ruyter van Steveninck & Laughlin, 1996). The metabolic cost of running such neuronal systems is high (Laughlin, de Ruyter van Steveninck, & Anderson, 1998), indicating selective pressure for efficient coding. Additionally the surround response kernel of photoreceptors in hoverflies corresponded to optical cross-talk rather than a response to the surround via lateral connections or feedback (James, 1990). However, these results were generally limited to linear systems analysis, a potential problem given the highly non-linear nature of visual processing (Matic & Laughlin, 1981; Payne & Howard, 1981). 
This previous work showed that a major role of early sensory processing is to maximize information transfer through noisy neurons with limited dynamic range (bandwidth) (Attneave, 1954; Srinivasan, Laughlin, & Dubs, 1982; van Hateren, 1992a). Recently there has been a move to use more natural stimuli in vision research (van Hateren & van der Schaaf, 1996). These natural stimuli aim to maintain the inverse relationship between power and spatial frequency (i.e., more power at lower spatial frequencies), which is believed to be important in visual processing (Dror, O'Carroll, & Laughlin, 2000; Field, 1987). However, these studies have either focused on information processing over time, at the level of single samples in space (van Hateren, 1997), or they have been based on somewhat artificial animation of single stationary images in order to reconstruct the spatial “neural image” (van Hateren, 1992b). 
It is possible to devise computational models to accurately explain the behavior of photoreceptors to natural scenes (van Hateren & Snippe, 2001). However, the absence of a defined task only yields a general analysis of information transfer rates. Male flies have optical specializations for photoreceptors in bright or acute zones (the functional equivalent of a fovea) believed to be associated with one such task, the pursuit of other flies within complex environments (Collett & Land, 1978; Wehrhahn, Poggio, & Bülthof, 1982). Hence, such general analysis may overlook additional dynamic non-linear processing that aids in specific tasks. 
One such study to analyze the “neuronal image” of a moving target has found that the frontal photoreceptors of the male Musca domestica detect targets much more accurately, and from longer distances, than those of the female. Furthermore, the response properties of photoreceptors could not be predicted from white noise analysis (Burton & Laughlin, 2003), suggesting that higher order non-linear mechanisms may play a role in enhancing target detection. However, this study only investigated the detection of targets against a constant (uncluttered) background. In this condition, the target is highly salient as it is the only feature. While this situation is ideal (and behaviorally relevant when detecting a target from behind and below against the sky), it does not cover the much more difficult condition of a target viewed among clutter, as would be experienced during the conspecific pursuit flights observed for several different fly species. 
In contrast to previous approaches, we have reconstructed the photoreceptor representation across all points in space within a moving scene with a specific purpose: the identification of a small moving target in a cluttered moving background. This allowed us to investigate the degree to which “early” processing enhanced the relationship between image points (i.e., between a moving target and the texture within the local patch of the background against which it moved) and thus to evaluate the role photoreceptor processing plays in this difficult visual task. Additionally, we have used an animation technique to display to the fly six 360-degree panoramic stationary images of the world, littered with target-like objects, to show how this salience enhancement is influenced by different environments and is not dependent on a velocity difference between target and background. Importantly, unlike other approaches, we have investigated the extent to which the purely temporal processing of the photoreceptors improves the performance of a spatial task: discriminating between a target and the local surround. 
We show that discrimination of targets from their backgrounds is improved in a manner that cannot be accounted for by simple response dynamics but can be explained by a computational model incorporating non-linear luminance adaptation. Target salience may thus be improved in the earliest stages of biological vision, prior to any complex spatial processing, to extract moving features from the background. 
Methods
Movie acquisition and processing
An area scan monochrome 14-bit camera (XCD-V50, Sony™) was mounted on a customized robotic platform fitted with a damped, inertial stabilization system and optical encoders on the wheels. Due to the physical constraint of the robot (wheel based and terrestrial), there was no change in the elevation of the camera above the ground, no sideslip component of motion and the stabilization arm prevented changes in pitch and roll. Hence, the motion was limited to forward translation and yaw rotation only. 
A wide-angle lens (TF2.8DA-8, Fujinon™) provided an effective field of view of 90 degrees. A green filter (N52-534, Edmund™) was used to approximately match the spectral pass-band of the camera to the blue-green sensitivity of fly R1–R6 photoreceptors (Srinivasan & Guy, 1990; Stavenga, 1976). The camera was programmed to alter the shutter speed after each frame (5 different shutter speeds used) thus increasing the dynamic range of the images captured to 90 dB, providing detail in both the dark and light parts of the scene and reducing noise in any one pixel by taking multiple samples. The movement of the platform was 200 times slower than typical fly speeds (Collett & Land, 1975) so the 5-Hz image acquisition rate could be scaled to 1 kHz for playback during electrophysiological experiments. The robot was moved along a curving path through a visually rich outdoor scene, under bright sunlight and directed in a manner similar to the “saccadic” mode of flight of many fly species, with periods of forward motion interspersed by turns towards a new trajectory (Frye & Dickinson, 2001; Kern, van Hateren, Michaelis, Lindemann, & Egelhaaf, 2005). Position and orientation at the start and end of the robot's path were the same, such that the subsequent movie could be played in a loop without discontinuities. The non-linear (gamma) characteristics of the camera were quantified and accounted for by comparing images of the same scene taken at different shutter speeds. This normalization allowed the combination of the frames taken at different shutter speeds into a single high dynamic range image. The slow movement of the platform meant that error from motion between frames when combining the images taken at different shutter speeds was minimal, especially when accounting for the spatial resolution properties of the fly. 
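The combination of frames taken at different shutter speeds into one high dynamic range value can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes pixel values have already been linearized (gamma removed) and scaled to 0..1, that clipped samples are discarded (stated in the text only for the panoramas), and that the remaining estimates are simply averaged. The function name and the `saturation` threshold are hypothetical.

```python
def merge_exposures(samples, saturation=0.98):
    """Estimate scene radiance for one pixel from several exposures.

    samples: list of (linear_value, shutter_time_s) pairs, value in 0..1.
    Assumption: radiance is proportional to linear_value / shutter_time.
    """
    # Keep only samples that are not clipped at the top of the sensor range.
    estimates = [value / time for value, time in samples if value < saturation]
    if not estimates:
        # Every sample is clipped: fall back to the shortest exposure.
        value, time = min(samples, key=lambda s: s[1])
        return value / time
    # Averaging several estimates also reduces noise in the pixel.
    return sum(estimates) / len(estimates)


# Example: true radiance 100 (arbitrary units); the longest exposure clips.
pixel = [(0.1, 0.001), (0.4, 0.004), (1.0, 0.016)]
radiance = merge_exposures(pixel)
```

Averaging the unclipped estimates is the simplest possible weighting; a weighted scheme that favors mid-range samples would be a natural refinement.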
A small (1.4 × 2.8 degree) black target (luminance 0) was inserted into the reconstructed high dynamic range frames to simulate a simple chase sequence and animated in a manner to include random jitter (maximum 0.70 degrees/frame) and “turns” that preceded the actual platform motion by 0.1 s. The target size was selected to represent the approximate size of another fly at an appropriate distance for a chase scenario (Yeates & Dodson, 1990) and has been shown to strongly excite higher order cells in the fly lobula complex (Nordström, Barnett, Moyer de Miguel, Brinkworth, & O'Carroll, 2008; Nordström et al., 2006). The uniform black color was chosen to act as a constant for a valid comparison to be made throughout the duration of the movie. Data from wheel-mounted optical encoders were analyzed to determine the yaw velocity of the robotic platform. The difference between the phase advanced trajectory and that of the robot was then used as the azimuth for the target. The target elevation was varied slowly and randomly between 32 degrees and 11 degrees above the horizon, the part of the visual field associated with pursuit flight (Collett & Land, 1978; Wehrhahn et al., 1982). 
In order to mimic the optical properties of the dorsal eye of a male hoverfly (Stavenga, 2003; Straw et al., 2006), each frame was convolved with a Gaussian of half-width 1.4 degrees based on blur expected from an average facet diameter of 40 microns and hexagonally sampled with a 1 degree separation along the horizontal (0.866 degree vertical). This ratio of 1.4 between the blur half width and spatial resolution (Δθρ) is also the same as seen in bees (Laughlin & Horridge, 1971; van Hateren, Srinivasan, & Wait, 1990). The resulting data set had just over 10 000 frames each with 90 hexels (hexagonal pixels) along the horizontal and 58 along the vertical (90 × 67 degrees). 
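The optical blur and hexagonal sampling described above can be sketched as follows. The sketch assumes that "half-width" means the full width at half maximum of the Gaussian, and that alternate rows of the hexagonal lattice are offset by half the horizontal spacing; both are assumptions, and all function names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_deg):
    """Convert a Gaussian full width at half maximum to a standard deviation."""
    return fwhm_deg / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def hex_centres(n_cols, n_rows, dx=1.0, dy=0.866):
    """Centres (degrees) of a hexagonal lattice; odd rows shifted by dx / 2."""
    return [((col + 0.5 * (row % 2)) * dx, row * dy)
            for row in range(n_rows) for col in range(n_cols)]


def blurred_sample(image, px_per_deg, cx_deg, cy_deg, sigma_deg):
    """Gaussian-weighted mean of image pixels around one sampling centre."""
    radius = int(math.ceil(3.0 * sigma_deg * px_per_deg))  # truncate at 3 sigma
    x0 = int(round(cx_deg * px_per_deg))
    y0 = int(round(cy_deg * px_per_deg))
    total = 0.0
    weight_sum = 0.0
    for y in range(max(0, y0 - radius), min(len(image), y0 + radius + 1)):
        for x in range(max(0, x0 - radius), min(len(image[0]), x0 + radius + 1)):
            dx = x / px_per_deg - cx_deg
            dy = y / px_per_deg - cy_deg
            weight = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_deg ** 2))
            total += weight * image[y][x]
            weight_sum += weight
    return total / weight_sum


sigma = fwhm_to_sigma(1.4)  # about 0.59 degrees for a 1.4 degree half-width
```

Normalizing by the summed weights keeps the sample unbiased near the image edges, where part of the kernel falls outside the frame.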
Panorama acquisition and processing
A Nikon D-70 digital camera was used to obtain the high dynamic range panoramic images from six locations around Adelaide (see Figure 9). Each location was selected to demonstrate target detection within different environmental conditions, such as a densely treed area which would make target detection difficult (A), a fully urban environment with dramatically different dark and bright areas (B), a sparse area, in which it would be easy to detect targets (C), a setting displaying a mixture of urban and organic components (D), a location known to be populated by Eristalis tenax (E), and a moderately treed area similar to that used in the movie (F). These images also represented a range of luminance and contrast conditions. Each panorama was obtained using a series of 12 overlapping panels saved in NEF (raw) format. In order to capture components of the scenes that exceeded the dynamic range of the camera sensor, each panel was captured at a range of exposure levels. For each exposure level, saturated pixels were discarded and local luminance was then established using calibration of the luminance/value (gamma) curve for the camera pixels (Debevec & Malik, 1997) and then converted to floating point format using custom software written in LabView (National Instruments™). Image sizes were 8000 × 1600 pixels (360 × 72 degrees). 
One hundred targets of size 1.4 × 1.4 degrees were inserted into every panorama in a pseudorandom manner. No targets were placed within 4.6 degrees of the top or bottom of the images to ensure a sufficiently large surround for analysis. Targets on the same row were placed a minimum of 45 degrees apart to prevent the response to one target influencing the response of another. Finally, no target was placed within 9 degrees of any other target to prevent any target being included in the surround analysis of another target. As with the movie described above only the green channel was used, with the images blurred and resampled for fly optics resulting in 62 rows covering the 72 degrees vertical field of view. 
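The three placement constraints above lend themselves to simple rejection sampling, sketched below. This is an illustration under stated assumptions rather than the procedure actually used: azimuth is treated as wrapping around the 360-degree panorama, the 9-degree exclusion is taken as a Euclidean distance in degrees, and elevation is quantized to the 62 resampled rows. Function names, the seed, and `max_tries` are hypothetical.

```python
import math
import random


def wrap_diff(a, b, width=360.0):
    """Smallest azimuth difference on a panorama that wraps around."""
    d = abs(a - b) % width
    return min(d, width - d)


def place_targets(n=100, width=360.0, height=72.0, rows=62, margin=4.6,
                  same_row_gap=45.0, min_sep=9.0, seed=0, max_tries=100000):
    rng = random.Random(seed)
    row_height = height / rows
    # Rows whose centre lies at least `margin` degrees from top and bottom.
    allowed_rows = [r for r in range(rows)
                    if margin <= (r + 0.5) * row_height <= height - margin]
    placed = []  # list of (azimuth_deg, row_index)
    for _ in range(max_tries):
        az = rng.uniform(0.0, width)
        row = rng.choice(allowed_rows)
        ok = True
        for other_az, other_row in placed:
            d_az = wrap_diff(az, other_az, width)
            d_el = (row - other_row) * row_height
            if other_row == row and d_az < same_row_gap:
                ok = False  # too close along the same playback row
                break
            if math.hypot(d_az, d_el) < min_sep:
                ok = False  # would fall inside another target's surround
                break
        if ok:
            placed.append((az, row))
            if len(placed) == n:
                return placed
    raise RuntimeError("could not place all targets; relax the constraints")


targets = place_targets()
```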
Electrophysiology for movie recordings
Electrophysiological recordings were made by inserting aluminosilicate microelectrodes pulled on a Sutter Instruments P97 filled with 2.0M KCl (tip resistance ca. 120 megaohms) into R1–R6 photoreceptors in the dorsal eye of male hoverflies, Eristalis tenax. Only photoreceptors in the bright zone were used for recording as they are the ones most associated with the detection of small targets (Barnett et al., 2007). The microelectrode was connected to a pre-amplifier (npi BA-1S) and the amplified output was filtered using a 50-Hz line noise removal filter (Hum Bug, Quest Scientific) before digital sampling at 5 kHz on a 16-bit data acquisition card (NI PCI6221, National Instruments™). Only healthy recordings of more than 20 minutes duration, from photoreceptors with membrane potentials below −40 mV (typically <−50 mV) and giving maximal responses greater than 30 mV (typically >40 mV) were included in analysis. 
Individual hexels from the movie sequence were played out through an ultra bright green Light Emitting Diode (LED; Luxeon Star, Luxeon™), driven by a calibrated custom designed high current amplifier (350 mA maximum current output), and the luminance level was fully controlled via the data acquisition system. Light from the LED was delivered to the eye via a quartz light-guide centered on the receptive field of the photoreceptor and subtending an angle >90 degrees. Because this completely filled the receptive field of the photoreceptor, the brightness of the LED could be calibrated directly against the luminance of the captured scenes. This system had fast temporal response characteristics (>1 kHz), wide enough dynamic range (maximum contrast >1:10⁶), and sufficient brightness (70 000 cd/m²) to fully reproduce the acquired data set at high speed and real-world luminance levels. 
The movie was played back in a vertical raster fashion starting with the upper left hexel. Since the start and end points of the movie were very close to the same point in space, the adaptation level of the photoreceptor was deemed sufficiently close to permit the playback of adjacent hexels without an intermediate stimulus to reset the adaptation level. The exception to this was when the first hexel in a column was played as it either represented the beginning of the movie or it came after the bottom hexel in the adjacent column. In such cases, the data for that point in space was played twice and the first set discarded. This method is illustrated in Figure 1, which also includes an example of a single luminance trace over time and the corresponding photoreceptor response. The resulting intracellular responses to the varying luminance levels were then reconstructed into a complete movie sequence representing the world as it would be “seen” by a fly. 
Figure 1
 
Method for movie playback to photoreceptors. Reconstruction of the neural representation of a natural scene in 2D spatial and temporal domains. (A) Segment of an original image of a natural scene, the optical input to the ommatidia and the photoreceptor signals corresponding to the optical input. Note the hexagonal sampling as in the insect eye. (B) Temporal sequence of images, the signals of such temporal sequences were played back sequentially to the eye of a fly one pixel over time in a vertical raster fashion (starting from the top left and moving down the column until the bottom then restarting from the top one column to the right). (C) The luminance input to a single ommatidium and the response of a photoreceptor to this optical input ( x = 45 degrees, y = 21 degrees). The photoreceptor membrane potential is reported after an average reverse normalization scale was applied to the data. While the photoreceptor membrane potential does follow some of the general patterns of the input luminance there are some notable transformations. Of most relevance are the semi-logarithmic response, which exaggerates the response to the darker areas, and adaptation, which causes the cell to respond in a decreasing way to constant or repetitive stimulation. Adaptation is most noticeable around 3.5 s and again around 7.5 s where an increasing luminance results in a decreasing response from the cell.
In order to test the health of the photoreceptor, and to provide normalization parameters for both long recordings and between cells of different insects, control stimuli were presented before the first hexel in every column. These consisted of 24.5 s adaptation at a steady 700 cd/m² (equivalent to partial shade in a typical outdoor scene), followed by 5 × 100 ms square wave pulses (duty cycle of 50%) alternating between 0 (LED off) and 1 400 cd/m², and finished with 14.5 s at 700 cd/m². These luminance levels were selected for the test values as preliminary experiments showed that photoreceptors generate near full-scale responses to this level of contrast, without obvious bleaching or other non-linear effects from the bright pulses. The difference between the average responses to the bright and dark phases of the pulses was used as the normalization factor. An example of the normalization factors over time for a single photoreceptor recording is shown in Figure 2.
Figure 2
 
Cell health over time. Response levels of a single photoreceptor to identical square wave stimuli played between rows of the movie. These results were used to normalize the recordings so as to compensate for slight changes in cell health that inevitably occur over such long recordings and to permit data from different cells to be compared. The baseline gave an indication of the change in the DC offset of the recording while the gain indicated the cell health by way of the size of the response to a constant stimulus. Cell health degraded a little between 366 and 377 minutes of recording with the cell finally lost between 581 and 591 minutes.
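The normalization by control pulses described above can be sketched as follows. The gain is taken as the difference between the mean responses to the bright and dark phases of the pulses, as stated in the text; defining the baseline as the mean dark-phase response is an assumption made here for illustration, and the function names are hypothetical.

```python
from statistics import mean


def control_parameters(response_mv, led_on):
    """Return (baseline, gain) in mV from the response to the control pulses.

    response_mv: membrane potential samples recorded during the pulses.
    led_on: matching booleans, True while the LED is at the bright level.
    """
    bright = [v for v, on in zip(response_mv, led_on) if on]
    dark = [v for v, on in zip(response_mv, led_on) if not on]
    gain = mean(bright) - mean(dark)   # normalization factor from the text
    baseline = mean(dark)              # assumed definition of the DC offset
    return baseline, gain


def normalize(trace_mv, baseline, gain):
    """Express a recorded trace in units of the control-pulse response."""
    return [(v - baseline) / gain for v in trace_mv]


baseline, gain = control_parameters([-60.0, -60.0, -20.0, -20.0],
                                    [False, False, True, True])
scaled = normalize([-40.0], baseline, gain)
```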
In only one case was it possible to maintain stable intracellular recordings from a single photoreceptor for the entire 16.25 hours required to show the complete movie. In cases where the test sequences showed a significant deterioration in cell health, the recording was suspended and the entire column of previously recorded data was removed from further analysis. A total of 30 different cells were used from 12 different male flies to generate sufficient data for the complete reconstruction of every hexel in the movie three times. 
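The quoted recording time can be checked approximately from figures given elsewhere in the text. The sketch below assumes a 10.3 s movie (the duration given in the Movie 1 caption), one repeated hexel per column, one control sequence per column, and that each of the 5 control pulses is a single 100 ms cycle; the constant names are hypothetical. Under these assumptions the total comes to roughly 16.2 hours, close to the 16.25 hours stated.

```python
HEXEL_COLUMNS = 90
HEXEL_ROWS = 58
MOVIE_S = 10.3                      # movie duration at 1 kHz playback
CONTROL_S = 24.5 + 5 * 0.1 + 14.5   # adaptation + pulses + re-adaptation


def total_recording_hours():
    """Approximate time needed to play every hexel of the movie once."""
    playback = HEXEL_COLUMNS * HEXEL_ROWS * MOVIE_S
    repeats = HEXEL_COLUMNS * MOVIE_S      # first hexel of each column played twice
    controls = HEXEL_COLUMNS * CONTROL_S   # control stimuli before each column
    return (playback + repeats + controls) / 3600.0


hours = total_recording_hours()
```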
In order to display the result on a standard 8-bit gray-scale display (and for purposes of producing figures and the movie for display), the reconstructed averaged sequence of image frames was scaled by applying limits and a gamma correction. This involved scaling the data between 0 and 255. To do this, the white point (255) was set to correspond to the 99.9th percentile value (i.e., the largest 0.1% of the data). Unlike the luminance signal, the photoreceptor response did not have a preset 0 point so it was defined as the 0.1th percentile value (i.e., the lowest 0.1% of the data was set to 0). All luminance data were scaled between the end points using a gamma of 2 to correct for the non-linearity of conventional displays. 
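The display scaling described above can be sketched as follows. The percentile limits come from the text; the interpolating percentile and the choice to apply the gamma as an exponent of 1/2 (encoding for a display of gamma 2) are assumptions, and the function names are hypothetical.

```python
def percentile(sorted_values, pct):
    """Percentile by linear interpolation between the closest ranks."""
    k = (len(sorted_values) - 1) * pct / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def to_8bit(values, lo_pct=0.1, hi_pct=99.9, gamma=2.0):
    """Map values to 0..255 between two percentile limits with gamma encoding."""
    ordered = sorted(values)
    black = percentile(ordered, lo_pct)
    white = percentile(ordered, hi_pct)
    out = []
    for v in values:
        # Clip to the limits, then scale to the 0..1 range.
        x = min(max((v - black) / (white - black), 0.0), 1.0)
        out.append(int(round(255.0 * x ** (1.0 / gamma))))
    return out


levels = to_8bit([float(i) for i in range(1001)])
```

For the luminance signal the text fixes the black point at zero rather than at a percentile; that case corresponds to replacing `black` with 0.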
Panoramic experiments
Images were played back to the photoreceptors in a similar manner to the movie but utilizing a horizontal raster fashion commencing from the top left. To ensure a correct level of photoreceptor adaptation, the first row of the image was played twice and the first discarded. Repeating subsequent rows was not necessary as the previous row represented a sufficiently close match for adaptation purposes. 
The playback rate of the panoramas was 1000 samples/s, which represented a yaw rotation of 45 degrees/s. This rotation speed was selected as it corresponded closely to the average of the rectified and logarithmically transformed yaw values from the movie sequence (46.06 degrees/s). Each panoramic image was recorded both with and without targets inserted into the image 6 times for averaging purposes. 
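The relationship between playback rate and simulated yaw velocity can be sketched as follows. The quoted 45 degrees/s follows if each row is played at one sample per original pixel column (8000 samples per 360 degrees), which is an assumption here. The second function treats the "average of the rectified and logarithmically transformed yaw values" as a geometric mean of absolute yaw rates, ignoring zero values; that reading is also an assumption, and both function names are hypothetical.

```python
import math


def yaw_rate_deg_per_s(samples_per_s, samples_per_rev, fov_deg=360.0):
    """Angular velocity simulated by playing one panorama row as a time series."""
    return samples_per_s * fov_deg / samples_per_rev


def log_mean_abs(values):
    """Mean of log(|v|), transformed back to the original units."""
    rectified = [abs(v) for v in values if v != 0]
    return math.exp(sum(math.log(v) for v in rectified) / len(rectified))


panorama_rate = yaw_rate_deg_per_s(1000, 8000)
```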
Modeling
A 2D array of photoreceptors, covering the entire movie area, was simulated using LabView (National Instruments™). The simulation was an elaboration of a previously published parametric model (Mah, Brinkworth, & O'Carroll, 2008; van Hateren & Snippe, 2001). This model had 4 basic components: (1) dynamic pixel-wise control of gain and low-pass filter corner frequency, (2, 3) short and mid-term adaptations that serve to compress the dynamic range while enhancing temporal changes, and (4) a static saturating non-linearity. This processing has been shown to be functionally similar to temporal processing by primate cones (van Hateren & Snippe, 2006). 
The input to the model was the same image sequence used to drive the LED in the electrophysiological experiments but was run through twice to ensure correct initial conditions for the model's filters. The output of the model was processed in the same way as the neuronal image recorded from the biological photoreceptors. 
For the panoramas, a 1D version of the model was used and the input was equivalent to that provided to the biological photoreceptors. 
Statistical analysis
All averaged results in the text are given in the form mean ± standard deviation. Two calculations were performed to quantify the difference between the target and background, local contrast (LC), and z-score (Z), as a means of determining how easily the target could be distinguished from the background. These quantities are defined below:  
$$LC = \frac{P_{xy} - \bar{P}_{xy}}{P_{xy} + \bar{P}_{xy}} \qquad (1)$$

Where $P_{xy}$ is the value of the point corresponding to the center of the target and $\bar{P}_{xy}$ is the mean value of the 12 next-nearest neighbors to the target point on a hexagonal grid. Local contrast is an adaptation of the Michelson definition of contrast and is a dimensionless quantity bound between ±1 where negative numbers correspond to a darker center than surround.

$$Z = \frac{P_{xy} - \bar{P}_{xy}}{\sigma_{xy}} \qquad (2)$$

Where $P_{xy}$ is the value of the point corresponding to the center of the target, $\bar{P}_{xy}$ is the mean of the 12 next-nearest neighbors to the target point on a hexagonal grid, and $\sigma_{xy}$ is the standard deviation of the 12 next-nearest neighbors. 
In each of these cases, local contrast and z-score, the next-nearest neighbors rather than the 6 neighboring points were used for analysis for two main reasons. Firstly the size of the target was larger than the inter-receptor distance, hence producing overlap into the surrounding points. Secondly, even if the target size was smaller, the targets were not always centered in the middle of a detection point, so while the “closest match” was used there was always some overlap of the target between the point closest to the target center and at least one of the 6 neighboring points. 
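The local contrast and z-score calculations can be sketched on a hexagonal grid as follows. The sketch stores the grid in axial coordinates and takes the 12 next-nearest neighbors to be the ring of cells at a hexagonal distance of 2 from the center; it also uses the sample standard deviation. The coordinate system, that reading of "next-nearest", the choice of standard deviation, and all names are assumptions made for illustration.

```python
from statistics import mean, stdev


def ring_offsets(radius=2):
    """Axial (dq, dr) offsets of all cells at exactly `radius` hexagonal steps."""
    span = range(-radius, radius + 1)
    return [(dq, dr) for dq in span for dr in span
            if max(abs(dq), abs(dr), abs(dq + dr)) == radius]


def surround_values(grid, q, r):
    """Values of the 12 cells in the second ring around (q, r)."""
    return [grid[(q + dq, r + dr)] for dq, dr in ring_offsets(2)]


def local_contrast(grid, q, r):
    """Michelson-style contrast of the center against the surround mean."""
    centre = grid[(q, r)]
    surround = mean(surround_values(grid, q, r))
    return (centre - surround) / (centre + surround)


def z_score(grid, q, r):
    """Center minus surround mean, in units of surround standard deviation."""
    values = surround_values(grid, q, r)
    return (grid[(q, r)] - mean(values)) / stdev(values)


# Example: a black target (value 0) on a mildly textured background.
grid = {(0, 0): 0.0}
for i, (dq, dr) in enumerate(ring_offsets(2)):
    grid[(dq, dr)] = 0.4 if i % 2 == 0 else 0.6
lc = local_contrast(grid, 0, 0)   # darker center gives a negative contrast
z = z_score(grid, 0, 0)
```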
Due to the non-Gaussian nature of the sample distribution, standard parametric tests (i.e., z-score <−2.2 corresponds to p < 0.05 for n = 12) were not valid. Thus, to illustrate the level of separability between target and background, receiver operating characteristic (ROC) curves were constructed (Hanley & McNeil, 1982). Histograms of the z-scores for 50 randomly sampled points per movie frame (total 515 000 for the entire movie) and 22 064 (4.45% of total pixels) randomly selected points from the panoramic images were used as the background reference values. These distributions of the rectified z-scores (one for each of the analysis groups) were compared to the distribution of rectified target z-scores. ROC curves were constructed by plotting the percentage area under the histogram of background points (false positives) versus the percentage under the histogram of target points (true positives) corresponding to different threshold levels. Three parameters were then calculated to quantify the difference in the ability to distinguish targets from background. The 50% target detection rate was the percentage of false positives at which 50% of targets were detected. The 1% error rate was the percentage of targets detected given 1% false positives (corresponding to p = 0.01). The ROC area was the percentage area under the ROC curve between 0.01% and 100% false positives on a log scale. 
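The ROC construction and two of the three summary parameters can be sketched as follows. This illustration works directly on lists of z-scores rather than on histograms, which is an assumed simplification, and the function names are hypothetical. It evaluates every distinct threshold, which is adequate for small examples but slow for samples as large as those used in the study.

```python
def rates(target_z, background_z, threshold):
    """(false positive rate, true positive rate) for one rectified threshold."""
    tp = sum(abs(z) >= threshold for z in target_z) / len(target_z)
    fp = sum(abs(z) >= threshold for z in background_z) / len(background_z)
    return fp, tp


def roc_curve(target_z, background_z):
    """ROC points for every distinct rectified z-score, strictest first."""
    thresholds = sorted({abs(z) for z in list(target_z) + list(background_z)},
                        reverse=True)
    return [rates(target_z, background_z, t) for t in thresholds]


def targets_at_error_rate(target_z, background_z, max_fp=0.01):
    """Fraction of targets detected when false positives are held to max_fp."""
    curve = roc_curve(target_z, background_z)
    return max((tp for fp, tp in curve if fp <= max_fp), default=0.0)


def false_positives_at_detection(target_z, background_z, min_tp=0.5):
    """Smallest false positive rate at which min_tp of targets are detected."""
    curve = roc_curve(target_z, background_z)
    return min(fp for fp, tp in curve if tp >= min_tp)


background = [0.01 * i for i in range(100)]
targets = [-3.0, -2.0, -0.5, 0.2]
detected = targets_at_error_rate(targets, background)
```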
Results
A copy of the movie showing both the luminance input and the average photoreceptor response ( n = 3), played at 1/8th of the speed presented to the fly, is shown as Movie 1.
 
Movie 1
 
Photoreceptor processing of visual scene. Left: Original luminance image as captured by the camera with target (approximately 1.4 by 2.8 hexagonal pixels in size) inserted after optical blur and hexagonal sampling to mimic the resolution of the fly. Right: Neuronal representation of the scene as recorded from the biological photoreceptors. Movie has been slowed to 1/8th real-time to better show the dynamics of the photoreceptor processing (total duration equivalent to 10.3 s). Sequences were scaled to fit onto an 8-bit look-up table such that the largest 0.1% of the values (across the entire movie) were mapped to white and the lowest 0.1% of the values were mapped to black. Every point within the neuronal representation was made up of the average of three different photoreceptor cells. The target and various background features such as trees and bushes are much more salient after processing by the biological photoreceptor.
Global analysis
Histograms of both the normalized luminance and photoreceptor responses covering the entire duration of the movie are shown in Figure 3. Consistent with earlier work that used a single pixel representation to look at luminance changes over time (van Hateren, 1997), and predictions of models for the visual system (van Hateren & Snippe, 2001), the bandwidth is more evenly utilized when processed by the photoreceptors than by a linear representation of luminance. This “gaussification” of luminance distribution is consistent with a closer to optimal representation of information (Shannon, 1948), thereby increasing the information available to a low dynamic range and/or noise limited system. 
Figure 3
 
Distribution of values. Histogram of both the normalized luminance and the averaged normalized neuronal response values recorded over the entire movie sequence. Histogram bin width is 1% of used dynamic range. The membrane potential is reported after an average reverse normalization scale was applied to the averaged data. The adaptive non-linear encoding of brightness levels by the photoreceptors made better use of the range of values available. On the scale used in the movie reproduction, the white point (100% luminance value) corresponded to approximately 22 500 cd/m², while the black and white points for the photoreceptors corresponded to −58.8 mV and −14.54 mV, respectively.
Figure 4 shows an example of how biological processing improved the ability to distinguish the target from the background. As indicated in the histograms of the whole movie (see Figure 3), qualitative inspection of this single frame suggests that more detail is present in both the general background and salient features, such as the small target (upper center) and the tree branches (right), which have a higher contrast with respect to the local background levels in the photoreceptor reconstruction of the movie than in the unprocessed luminance image.
Figure 4
 
Single frame from the final reconstructed movie. Upper: Original luminance image as captured by the camera with target inserted. Middle: Luminance image after optical blur and hexagonal sampling to mimic the resolution of the fly. Lower: Neuronal representation of the scene as recorded from the biological photoreceptors. Image corresponds to 9.9 s after commencement of the movie (total duration 10.3 s). White circles identify the target position in each view. Images were scaled to fit onto an 8-bit look-up table such that the largest 0.1% of the values (across the entire movie) were mapped to white and the lowest 0.1% of the values were mapped to black. Every point within the neuronal representation was made up of the average of three different photoreceptor cells, which were normalized with respect to control stimuli played periodically throughout the experiment. The target (approximately 1.4 by 2.8 hexagonal pixels in size) was centered 42 hexagonal pixels from the left and 7 hexagonal pixels from the top in the photoreceptor reconstruction. The processing carried out by the biological photoreceptor enhanced not only the target against the local background but also various features within the background, such as tree branches. Vertical stripes in the reconstructed photoreceptor recordings highlight slight changes in the response of the cells over time. Note that due to the color filter on the acquisition camera this represented only the green spectrum.
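The display scaling described in the caption (top 0.1% of values mapped to white, bottom 0.1% to black) can be sketched as follows. The function names and the synthetic data are hypothetical, and linear mapping between the two limits is an assumption, since the caption does not state the interpolation used.

```python
def percentile_limits(values, tail=0.001):
    """Return (black, white) levels that leave `tail` of the samples beyond each limit."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(round(n * tail))          # samples allowed to saturate at each end
    return ordered[k], ordered[n - 1 - k]

def to_8bit(value, black, white):
    """Map a value linearly onto 0..255, clipping anything outside [black, white]."""
    if white == black:
        return 0
    scaled = (value - black) / (white - black)
    return round(255 * min(max(scaled, 0.0), 1.0))

# Hypothetical data: 10 000 samples with one extreme outlier at each end.
samples = [float(i) for i in range(10000)]
samples[0], samples[-1] = -1e6, 1e6
black, white = percentile_limits(samples)
pixels = [to_8bit(v, black, white) for v in samples]
```

Setting the limits from percentiles rather than the absolute minimum and maximum keeps a few extreme samples from compressing the rest of the image into a narrow band of gray levels.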
Target detection
To quantify whether photoreceptor processing altered the ability to discriminate the target from the background, local contrast (Equation 1) and z-score (Equation 2) functions were calculated for every frame, centered on the target position and computed with respect to the next-nearest neighbor pixels (n = 12 on a hexagonal grid, see Figure 5 inset). For the example frame (Figure 4 middle), the contrast of the target with the local surround for the luminance image was −0.52 and the z-score was −2.12. This z-score corresponded to a confidence level of 3.72% (p = 0.0372) based on the non-target distribution of input values (two-tailed distribution). This level represents 194 false positives per frame (total frame size of 90 × 58) and is thus too permissive for accurate target detection. The contrast and z-score values improved to −0.85 and −8.03, respectively, in the neuronal representation of the image (Figure 4 lower), a z-score that indicated high confidence in the statistical independence of the target from the background (p = 0.00446) and corresponded to an average false positive rate of less than 0.25 per frame.
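The figure of 194 false positives per frame for the luminance image follows from multiplying the tail probability by the number of pixels in a frame. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the calculation assumes every pixel is tested against the same threshold.

```python
FRAME_WIDTH, FRAME_HEIGHT = 90, 58        # hexagonal pixels per frame, from the text
N_PIXELS = FRAME_WIDTH * FRAME_HEIGHT     # 5220 pixels

def expected_false_positives(p_value, n_pixels=N_PIXELS):
    """Expected number of background pixels per frame passing a threshold
    whose tail probability under the background distribution is p_value."""
    return p_value * n_pixels

luminance_fp = expected_false_positives(0.0372)   # luminance example frame
```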
Figure 5
 
Target detection statistics. (a) z-score of the target and the local surround (next-nearest neighbors) over time for both the normalized luminance and averaged photoreceptor representation. (b) Ratio of photoreceptor and luminance z-score. The response of the photoreceptors consistently had a larger z-score (ratio >1 most times), meaning it would be easier to reliably determine target position from individual still frames. The data were smoothed by use of a zero-phase 15 ms moving average filter for display purposes. Inset: Hexagonal grid showing the center pixel identified to be the center of the target (black) and the next-nearest neighbors (gray) used as the local surrounding pixels for the calculation of z-score. The z-score is the difference between the target pixel and local surround divided by variation in the local surround; it indicates how easily the target pixel could be distinguished from the surround.
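The verbal definition of the z-score in the caption can be written as a short function. This is a sketch only: the function name and the example values are hypothetical, and the use of the sample standard deviation as the measure of "variation in the local surround" is an assumption, since Equation 2 is not reproduced here.

```python
import statistics

def target_z_score(target, surround):
    """z-score of a target pixel relative to its local surround:
    (target - mean of surround) / standard deviation of surround."""
    mean = statistics.mean(surround)
    spread = statistics.stdev(surround)   # sample standard deviation (assumption)
    return (target - mean) / spread

# Hypothetical values: a dark target against its 12 next-nearest neighbors.
surround = [0.50, 0.55, 0.45, 0.52, 0.48, 0.60,
            0.40, 0.51, 0.49, 0.53, 0.47, 0.50]
z = target_z_score(0.20, surround)
```

A dark target gives a negative z-score, as in the values reported in the text; a more uniform surround with the same mean difference gives a z-score of larger magnitude.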
Figure 5A shows the z-scores of the target for both the normalized luminance and photoreceptor responses over the entire duration of the movie. The improvement in target detectability illustrated by the example frame in Figure 4 was commonplace over the entire sequence, with the ratio of target/background z-scores in the neuronal image versus the luminance image (Figure 5B) rarely less than 1, and frequently much greater.
Calculating the contrast of the target with respect to the local surroundings seriously underestimated the improvements granted by photoreceptor processing. The average local contrast of the target in the luminance image was −0.49 ± 0.11 (mean ± standard deviation). The photoreceptor representation showed only a modest 10% improvement (−0.54 ± 0.24). However, when the z-score, which accounts for not only the average difference between the surround and the target but also the variability in the surround, was calculated the improvements could be better quantified. The average z-score for the target in the luminance image was −3.05 ± 1.67 while in the photoreceptor representation it was, on average, 28% greater at −3.89 ± 2.04. 
Modeling
To determine how much of this improvement in target detectability could be explained by known photoreceptor response properties, we implemented a modified (Mah et al., 2008) software version (Brinkworth, Mah, & O'Carroll, 2007) of a photoreceptor model originally developed by van Hateren and Snippe (2001). 
Using only single-pixel analysis techniques, the model output confirmed previous findings by van Hateren and Snippe that the model photoreceptor is a good mimic of the biological system, with a high correlation (r² = 0.962 ± 0.0211) and significant coherence well beyond 100 Hz (see Figure 6 for comparison between biological photoreceptors and model). In addition, there was an improvement in the z-score of the target in the simulation over both the raw luminance and biological photoreceptors (average z-score −4.14 ± 2.21). However, differences in the overall responses, and the only approximately Gaussian histogram distributions, necessitated further non-parametric analysis.
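For reference, the correlation measure quoted above can be computed per pixel as the squared Pearson correlation between the recorded and modeled time series. The sketch below uses hypothetical traces and a hypothetical function name; how the study pooled values across pixels or cells is not specified here.

```python
import math

def r_squared(x, y):
    """Square of the Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical traces: a "recorded" response and a scaled, offset "model" of it.
recorded = [math.sin(0.1 * i) for i in range(200)]
modelled = [2.0 * v + 0.5 for v in recorded]
r2 = r_squared(recorded, modelled)
```

Because the measure is insensitive to gain and offset, a model that reproduces the shape of the response scores close to 1 even when its output is on a different scale.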
Figure 6
 
Comparing biological and modeling results. (a) Averaged response of three biological photoreceptor cells (n = 11) and the photoreceptor model to a square wave stimulus of duration 3 s. Pre- and post-stimulus luminance was 70 cd/m² while the stimulus amplitude was 7 000 cd/m² (corresponding to 0.1% and 10% of the maximum brightness of the stimulus LED). Dashed line shows the mean pre-stimulus level. Model results have been normalized to show the similarity in shape between the model and its biological equivalent. (b) Average coherence from all pixels in the movie (90 × 58) between raw luminance (linear model) and the photoreceptor model with the biological photoreceptor over the entire movie duration. With the exception of power line interference at multiples of 50 Hz, the model is a very close match to the biological photoreceptor. By comparing the difference with the luminance coherence in the frequency range DC to 150 Hz, the photoreceptor model was found to be 34.7 ± 26.2% (mean ± standard deviation) better at predicting the cell response than a basic linear model.
Statistical analysis
In order to quantify the improvement in target detection achieved by the three systems (luminance, biological photoreceptor, and photoreceptor model), normalized histograms of target and background z-scores were generated and receiver operating characteristic (ROC) curves constructed. These results are illustrated in Figure 7.
Figure 7
 
Target detectability. Normalized histograms of rectified target and background z-score values for (a) luminance, (b) photoreceptor, and (c) model. (d) Receiver operating characteristic (ROC) curve for the z-score data, calculated by plotting the area under the background histogram (false positives) versus the area under the target histogram (targets detected) to the right of all possible detection limit values. Dotted vertical lines indicate the 1% value, i.e., the level below which 99% of the background data lie, corresponding to 1% false positives. The rate of target detection beyond this limit is more than 1.7 times larger in the photoreceptor and model representations than in the luminance image, while the horizontal shift in the ROC curves at the 50% target detection level is about a factor of 4. This means that for a detection level around this point the photoreceptor and model responses would have about 4 times fewer false positives than the raw luminance. There was no practical or statistical difference in the ROC curves or the histogram limits between the photoreceptor representation and the response of the model. ROC curves show the number of targets correctly detected (y-axis) for any level of false detection events (x-axis).
From the non-parametric ROC analysis, it was found that the number of false positives for a detection level of 50% was reduced by 73.0% for the photoreceptor (0.520%) and 77.7% for the model (0.428%) when compared to the raw luminance (1.92%). The target detectability at the 1% error rate improved from 34.9% in the luminance image to 61.2% and 64.4% in the photoreceptor and model representations respectively. The area under the ROC curve (using a log false positive axis) improved from 44.6% before processing to 58.6% in the photoreceptor representation and 59.5% in the model. 
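The three summary measures used here (false positive rate at 50% detection, detection rate at 1% false positives, and ROC area on a logarithmic false positive axis) can be derived from two lists of z-scores as sketched below. All function names and the toy data are hypothetical. The step interpolation between thresholds and the 0.01% to 100% integration range (taken from the Table 1 caption) are assumptions about details the text does not spell out.

```python
import math

def roc_points(target_z, background_z):
    """(false positive rate, detection rate) pairs for every candidate threshold.
    Scores are rectified (absolute value) first, as in Figure 7."""
    target = [abs(z) for z in target_z]
    background = [abs(z) for z in background_z]
    thresholds = sorted(set(target + background), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # ">= t" so each threshold admits the score that defines it
        fp = sum(b >= t for b in background) / len(background)
        tp = sum(s >= t for s in target) / len(target)
        points.append((fp, tp))
    return points

def detection_at_fp(points, fp_limit=0.01):
    """Best detection rate achievable without exceeding the false positive limit."""
    return max(tp for fp, tp in points if fp <= fp_limit)

def fp_at_detection(points, tp_goal=0.5):
    """Smallest false positive rate at which the detection goal is reached."""
    return min(fp for fp, tp in points if tp >= tp_goal)

def log_roc_area(points, fp_min=1e-4, fp_max=1.0):
    """Area under the ROC curve on a log10 false positive axis, normalized to 0..1."""
    edges = sorted({fp_min, fp_max} |
                   {fp for fp, _ in points if fp_min < fp < fp_max})
    area = 0.0
    for a, b in zip(edges, edges[1:]):
        # detection rate is held at the value already achieved at the left edge
        area += detection_at_fp(points, a) * (math.log10(b) - math.log10(a))
    return area / (math.log10(fp_max) - math.log10(fp_min))

# Toy data (hypothetical): background z-scores within +/-3, eight target z-scores.
background_z = [i / 100 for i in range(-300, 301)]
target_z = [-4.0, -3.5, -2.0, -1.0, -5.0, -0.5, -4.5, -2.5]
pts = roc_points(target_z, background_z)
```

In this toy case half of the targets lie beyond every background value, so 50% detection is reached with no false positives at all. The logarithmic axis gives the low false positive region, which matters most for a sparse target, as much weight as the rest of the curve.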
The analysis of the neurobiological and modeling results indicates that photoreceptor-like processing is extremely beneficial in target detection. While there were some differences between the photoreceptor and model responses, these were minimal within the pass-band and most likely due to residual noise in the photoreceptor recordings, despite averaging several presentations. Importantly, with respect to the target detection task, the model faithfully captures the practical improvement that characterizes the biological photoreceptors. While it is possible that additional physiological components not incorporated in the model of photoreceptor function, such as the synaptic feedback recently described for photoreceptors in dipteran flies (Zheng et al., 2006), might play a role in photoreceptor response, they had little impact under such conditions.
Static versus dynamic components
In order to determine whether it was the dynamic components of the photoreceptor or the underlying static non-linearities that were contributing to this increase in target detectability, we created a static version of the photoreceptor model, shown in Figure 8. This model is essentially logarithmic over most of the coding range with a linear region under low luminance conditions indicative of a Naka–Rushton transform (x/(x + c)).
Figure 8
 
Static photoreceptor response curve. The transfer function of the photoreceptor model with all dynamic components removed by short-circuiting all low-pass filters. In this case the response is divorced from all history and the model depends only on the current luminance input. The response is linear at low luminance but logarithmic over most of the coding range. Values over 0.6 were only achievable at very high luminance levels (larger than generated under these conditions) or, in the case of the dynamic model, with rapid changes from very dark to very bright stimuli.
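The Naka–Rushton form x/(x + c) mentioned in the text has the qualitative behavior described in the caption, as the sketch below shows. It illustrates the named transform only, not the static photoreceptor model itself, and the half-saturation constant c = 1.0 is a hypothetical value chosen for convenience.

```python
def naka_rushton(x, c=1.0):
    """Static saturating non-linearity x / (x + c) for non-negative input x.
    c is the half-saturation constant (hypothetical value, not from the paper)."""
    return x / (x + c)

low = naka_rushton(0.001)                          # x << c: close to x / c (linear)
half = naka_rushton(1.0)                           # x == c: exactly 0.5
step_down = naka_rushton(1.0) - naka_rushton(0.5)  # halving the input around c
step_up = naka_rushton(2.0) - naka_rushton(1.0)    # doubling the input around c
```

Halving and doubling the input around c change the output by the same amount (1/6 each), which is the signature of an approximately logarithmic response in the middle of the range.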
When the static photoreceptor model was used to process the movie sequence, the area under the resulting ROC curve was much higher than that derived from the dynamic model (69.5% static compared to 59.5%). This increase in target detectability over the dynamic model indicates that it is the static non-linear properties of the photoreceptor that improved the ability to detect moving targets among moving clutter. However, when we kept the background image constant and simulated the same target motion path, the area under the ROC curve was 94.7% for the dynamic model and only 56.3% for the static model. Hence, at low (or no) self-motion velocities, such as when flies are perched or hovering, the dynamic adaptations within the photoreceptor are extremely beneficial for target detection.
Multiple environments
Table 1 provides a summary of the three target detectability criteria used (50% detection rate, 1% error rate, and ROC area) for each of the 6 panoramic images. Under each of the conditions tested, the ability to discriminate between the background and the targets was significantly improved by processing with the biological or model photoreceptors. This shows that a velocity difference between the target and the background is not necessary for an improvement in target detection to be achieved. Paired-sample t-tests also showed that the model did significantly better than the biological photoreceptor (p < 0.05) at detecting targets among the panoramic images, regardless of the parameter used to quantify target detection. Unlike in the movie condition, when the background was in motion, the static non-linear photoreceptor model did not do as well as the dynamic model in detecting the targets in any of the environments tested.
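As a worked example of the paired comparison, the sketch below computes a paired t statistic for the photoreceptor and model values of the 50% detection rate across images A to F, using the numbers listed in Table 1. The function name is hypothetical, and whether the original analysis was carried out in exactly this way (for instance, two-tailed) is an assumption.

```python
import math
import statistics

# 50% detection rate values for images A-F, copied from Table 1.
photoreceptor = [4.88, 0.98, 0.01, 0.67, 2.21, 1.51]
model = [4.49, 0.44, 0.01, 0.32, 1.34, 0.93]

def paired_t_statistic(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err

t_stat = paired_t_statistic(photoreceptor, model)

# Two-tailed critical value of Student's t for p = 0.05 with 5 degrees of freedom.
T_CRITICAL_DF5 = 2.571
significant = t_stat > T_CRITICAL_DF5
```

The statistic comes out at roughly 3.86, above the critical value for 5 degrees of freedom, which is consistent with the p < 0.05 reported in the text for this measure.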
Table 1
 
Target detection statistics. The ability to discriminate between targets and background in the 6 panoramic images depicted in Figure 9 is improved by the application of photoreceptor-like processing by a biological photoreceptor, a model that accounts for the non-linear temporal dynamics (model), and a model that only accounts for the static non-linearities of photoreceptor processing (static model). The 50% detection rate indicates the number of false targets detected given that 50% of the real targets are identified (a low number means fewer errors and better detection). The 1% error rate is the number of targets identified given that the chance of detecting a false target is 1% (a high number means more targets could be identified, i.e., better detection). The ROC area is the area under the receiver operating characteristic curve between 0.01% and 100% based on a logarithmic false positive axis (higher numbers mean that on average a target will be easier to detect). Image A was a highly cluttered scene, which made target detection extremely challenging. Image B was an urban scene with structural components not normally seen in nature. Image C was a sparse scene, meaning it was relatively easy to detect the targets even without processing, while images D, E, and F represented a midlevel of complexity. Due to resolution limits, the minimum possible value for the 50% detection rate was 0.01%, which corresponded to 50% of targets being detectable without any false positives. SEM stands for standard error of the mean. The better detection levels obtained by the model over the photoreceptor are likely the result of the model being essentially free from noise. The better detection rates for the model that includes the dynamic properties reflect the importance of such processing under these conditions (target speed matched to background).
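Measure | Representation | A | B | C | D | E | F | Mean | SEM
50% detection rate | Luminance | 9.29 | 1.61 | 0.01 | 2.42 | 6.25 | 6.25 | 4.31 | 1.43
50% detection rate | Photoreceptor | 4.88 | 0.98 | 0.01 | 0.67 | 2.21 | 1.51 | 1.71 | 0.70
50% detection rate | Model | 4.49 | 0.44 | 0.01 | 0.32 | 1.34 | 0.93 | 1.26 | 0.67
50% detection rate | Static model | 5.25 | 0.68 | 0.01 | 1.18 | 3.63 | 2.33 | 2.18 | 0.81
1% error rate | Luminance | 6 | 42 | 86 | 42 | 22 | 22 | 36.67 | 11.35
1% error rate | Photoreceptor | 23 | 51 | 93 | 55 | 37 | 43 | 50.33 | 9.70
1% error rate | Model | 27 | 58 | 94 | 60 | 43 | 52 | 55.67 | 9.11
1% error rate | Static model | 18 | 56 | 91 | 49 | 31 | 36 | 46.83 | 10.39
ROC area | Luminance | 26.45 | 44.28 | 83.30 | 49.52 | 36.59 | 36.59 | 46.12 | 8.09
ROC area | Photoreceptor | 35.25 | 51.56 | 88.59 | 59.89 | 46.42 | 47.22 | 54.82 | 7.50
ROC area | Model | 37.30 | 55.77 | 89.52 | 61.49 | 49.71 | 50.57 | 57.39 | 7.21
ROC area | Static model | 32.81 | 52.37 | 87.39 | 55.37 | 43.52 | 42.30 | 52.29 | 7.74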
The largest difference between the model and biological photoreceptor was in image B, the most “urban” of the images and hence the least like the natural world the fly visual system evolved to operate in. However, the image was also the darkest used (in real-world luminance terms) so a decrease in the SNR of the biological photoreceptor could also be responsible for the larger difference between the biology and the model. This also explains why image B was the only condition in which the static model (also noise free) did better than the biological photoreceptor. 
The detectability of the target was substantially higher in the movie sequence than in the panoramic images that most closely corresponded to the scene (compare Figure 7D and Figures 9A, 9E, and 9F). Of particular relevance is panorama F, which was taken at the site of the movie sequence. In this case there is an almost 3-fold reduction in the number of false targets at the 50% detection level in the movie compared to the panorama. This difference was not caused by the location of the targets because when analysis of the panoramas was limited only to the upper region of the image (i.e., the region populated by the target in the movie), the target detection rate was even worse, although still substantially better than the raw luminance image (data not shown). This shows that, while target detection is still enhanced without them, relative motion cues are exploited by the early visual system to enhance target salience just as they are to improve the performance of more elaborate bioinspired target detection algorithms (Wiederman, Shoemaker, & O'Carroll, 2008).
Figure 9
 
Panorama results. Input images and the associated receiver operating characteristic (ROC) curves for targets placed within them for both the raw luminance images (luminance) as well as the image processed by the photoreceptor (photoreceptor) and the model of the photoreceptor (model). The input images used were the green channel of high dynamic range panoramic images. These images have been normalized, gamma corrected, and reduced to 8 bits of dynamic range for reproduction but were kept in raw format for the analysis and the input to the photoreceptor and the model. Processing by the biological photoreceptor and the model resulted in an improvement in target detectability (as seen in a left shift of the ROC curves) in all cases, regardless of the complexity of the scene.
Discussion
By recording the response of fly photoreceptors during visual stimulation containing a target seen against a cluttered background, we show that the process of enhancing target salience begins at the earliest stages of vision. Temporal processing by photoreceptors alone, in the absence of any spatial interactions, improved the discrimination of targets from background (essentially a spatial task) by around 70%. Furthermore, this improvement is captured by elaborate models of photoreceptor temporal non-linear dynamics. 
Although previous work has looked at photoreceptor representation of targets under uniform background conditions (Burton & Laughlin, 2003), this is the first study to show the complete neuronal image of a fly under real-world environmental conditions (i.e., cluttered background) utilizing both time and space. The results show that the temporal pre-processing performed by the photoreceptors greatly enhances the ability to not only visualize scenes within the limited bandwidth of neuronal signaling but also to identify small targets as they move against complex backgrounds. We also show that this ability is not dependent on either relative motion between the target and the surround or on the type of scene. 
It is suspected that non-linear processing by the visual system is matched to the statistics of natural scenes (Simoncelli & Olshausen, 2001); hence, a major aim of this study was to examine the effect of such processing in a naturalistic setting. This allowed the reconstruction of a full 3-dimensional (2 space and 1 time) representation of the neural activity of a biological photoreceptor array under realistic lighting and was the only way to investigate questions such as how the identification of a mobile target within a complex moving scene is affected by photoreceptor processing.
This same movie sequence has been used recently to illustrate coding properties of motion-sensitive neurons in the fly lobula plate (Nordström et al., 2008). Ideally a sequence such as the one presented here would be based upon a real pursuit flight. However, the analysis of fly chases is complicated by highly irregular flight maneuvers (Boeddeker, Kern, & Egelhaaf, 2003), making tracking and flight reconstruction complex. Possible saccadic movements of the head (Land, 1973; Wagner, 1986) make it almost impossible to determine the exact orientation of the eyes during free flight; such measurements have only been possible in restricted environments (van Hateren & Schilstra, 1999).
In determining the course for our robot, we attempted to mimic a plausible chase sequence with the target being pursued in such a way as to place it in the top center of the visual field, just as flies do (Collett & Land, 1978). Unavoidable delays between recording a chase and reconstructing it (i.e., analysis of video footage to determine fly position) would lead to changes in lighting and other environmental variables, thereby reducing the similarity to the real event. Since photoreceptor responses vary based on recent history (Matic & Laughlin, 1981), our approach made it possible to obtain a sequence that started and ended at approximately the same point and orientation in space in order to facilitate appropriate photoreceptor adaptation states when the movie was played in a loop. Finally, since the aim of the paper was to determine the ability of photoreceptor processing to enhance target salience, the findings still hold regardless of the flight dynamics used. In some ways the sequence is a “worst case” as it is known that flies will reduce the contribution of the background by viewing targets against the sky (Wehrhahn et al., 1982) and utilizing head movements to minimize the movement of the target on the retina (van Hateren & Schilstra, 1999). Although our reconstructed image sequence is perhaps best described as “naturalistic” rather than a true natural sequence, it represents an accurate glimpse into the neuronal representation of an entire scene in both space and time and is relevant for applications of artificial target detection systems.
By extending the approach to analyzing responses to all pixels within a scene, it was possible to look at information transfer in a 3-dimensional sense for a well-defined visual task. In other words, we were able to look at the relationship between features within the scene and their surround, both before and after processing. This has not been previously possible. 
Here we show that target detectability was enhanced significantly after processing by the biological photoreceptor and that functionally this enhancement is captured by the implementation of a photoreceptor model (van Hateren & Snippe, 2001). Hence, although there are some differences in the detail (i.e., raw target z-score values), the model appears to explain almost all of the biological response from a practical informational point of view in this specific, but highly relevant, task of small target detection. Where differences did occur they were more likely to be the result of biological or recording noise rather than significant differences between the processing by the model and photoreceptor. 
Perhaps the most surprising aspect of this is that the enhancement is independent of 2-dimensional interactions, i.e., it does not require complex center-surround interactions. In almost all other approaches to studying detection of small targets among cluttered backgrounds, a fundamental operation is to enhance the difference between the target and the local surround by means of a classical center-surround operation (spatial high-pass filtering), where the weighted values of adjacent pixels are subtracted from the central pixel. This enhances small objects and makes them easier to detect. This operation takes place in the vertebrate retina (Baccus, 2007) and the second order neurons (large monopolar cells) of insects (Shaw, 1984); however, it is not present in insect photoreceptors (Smakman & Stavenga, 1987). By only playing one pixel at a time, and by using an extended source so that the surrounding pixels are all driven to the same level, we have removed any possible spatial interactions that could be fed back to the photoreceptors from higher order neurons as seen in Drosophila (Zheng et al., 2006). However, there is no evidence that stimulation outside a photoreceptor's receptive field influences the response either directly or via possible higher order feedback (James, 1992). Thus, our signal is just processed in the time domain. In other words, the enhancement of spatial information that we demonstrate in this system is purely the result of temporal processing, occurring independently on every pixel. 
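For contrast with the purely temporal, per-pixel processing studied here, the classical center-surround operation described above can be sketched in a few lines. The square grid, the four-neighbor surround, and the equal weights are assumptions made for brevity (the fly eye samples on a hexagonal lattice), and the function name is hypothetical.

```python
def center_surround(image, row, col, surround_weight=1.0):
    """Central pixel minus the weighted mean of its four edge-adjacent neighbors.
    Square grid and equal neighbor weights are simplifying assumptions."""
    neighbors = [image[row - 1][col], image[row + 1][col],
                 image[row][col - 1], image[row][col + 1]]
    return image[row][col] - surround_weight * sum(neighbors) / len(neighbors)

# Hypothetical 3 x 3 patches: a uniform background and one with a dark central spot.
uniform = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
spot = [[5, 5, 5], [5, 1, 5], [5, 5, 5]]

flat_response = center_surround(uniform, 1, 1)   # uniform region is suppressed
spot_response = center_surround(spot, 1, 1)      # small dark object stands out
```

This operation needs the values of neighboring pixels at the same instant, which is exactly the spatial interaction that the stimulus protocol in this study excluded.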
When considering the effect of the static and dynamic non-linear components of the photoreceptor processing, it is important to keep the behavior in mind. While the static components alone provided a larger improvement in target detectability during the simulated flight, they performed much worse under the simulated hovering/perching condition and when there was no relative motion between background and target in the panoramic scenes. Since it is often under low velocity conditions (e.g., hovering) that male flies first detect potential mates or rivals for pursuit, this situation should not be discounted. Furthermore, it is important to note that while the target was an important object within the moving environment, it was not the only one of consequence. The dynamic model of the photoreceptor enhanced other objects, such as the tree branches seen in Figure 4, that are important for navigational purposes and were not as prevalent in the static model. Unlike the dynamic model, the static model of photoreceptor function does not make the best use of the available output bandwidth by continuously monitoring and changing the gain (van Hateren, 1997). While that has no impact in a noise-free model, it would be a serious limitation in biology, where the signal-to-noise ratio is a major consideration (van Hateren, 1992a), or in a bandwidth-limited second stage of processing, whether an artificial 8-bit digital system or a noisy interneuron. Finally, after-images of the target caused by the system dynamics, which decreased the detectability of the target according to the types of measurements used in this paper, can be exploited in more sophisticated bio-inspired target detection models to enhance target salience (Wiederman, Shoemaker, & O'Carroll, 2007). 
In providing insight into the degree to which the photoreceptor representation improves the salience of features of interest, our work highlights the importance of appropriate sampling and processing for subsequent higher-order visual tasks. This suggests that a biomimetic approach to modeling this “front-end” processing may have potential to simplify approaches to artificial vision for subsequent segregation of the motion of target from background. Given our recent description of neurons capable of this task (Barnett et al., 2007; Nordström et al., 2006), the fly thus provides a proof of concept that this complex task may be solved by simple and robust mechanisms. Additionally, a model that performs just as well in this task as the biological system of the fly shows that a solution to this difficult problem need not involve complex spatial interactions. 
In order to test these findings under different conditions, it would have been preferable to reconstruct multiple movies from different locations to ensure the results are scene-invariant. This, however, would be a major undertaking, as the time required to perform these electrophysiological experiments is large. To fully record a 10-s sequence utilizing both rotation and translation covering a 90 × 67 degree patch of space with 3 repeats (as described in this paper) requires approximately 50 hours of useful intracellular recordings. A more effective way was to show that similar target enhancement was achieved when targets were inserted into panoramic images that were animated to simulate yaw rotation only. While this type of motion is not as accurate as the full movie sequences, it does show that the enhancement of target salience is not specific to one scene or one type of motion and that the photoreceptor model also works under such situations. This leads to the possibility of using the model in place of real photoreceptors to more fully explore the parameter space, including testing different scenarios, speeds, and target luminances. 
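The recording time quoted above can be roughly checked from figures given elsewhere in the paper. The sketch below assumes that every pixel of the 90 × 58 hexagonal grid (Figure 6) was played for the full 10.3 s movie duration (Figure 4), one pixel at a time, and repeated 3 times; the function name is hypothetical. The estimate counts stimulus playback only.

```python
# Values taken from the captions: a 90 x 58 hexagonal pixel grid (Figure 6)
# and a total movie duration of 10.3 s (Figure 4); 3 repeats per the text.
N_COLUMNS = 90
N_ROWS = 58
MOVIE_DURATION_S = 10.3
REPEATS = 3


def playback_hours(columns, rows, duration_s, repeats):
    """Hours of stimulus playback if every pixel's time series is played
    in full, one pixel at a time, `repeats` times."""
    total_seconds = columns * rows * duration_s * repeats
    return total_seconds / 3600.0


hours = playback_hours(N_COLUMNS, N_ROWS, MOVIE_DURATION_S, REPEATS)
print(f"{hours:.1f} h of playback")  # prints "44.8 h of playback"
```

Under these assumptions playback alone accounts for about 45 hours. The remainder of the approximately 50 hours is plausibly taken up by the control stimuli played between rows (Figure 2), although that breakdown is not stated in the text.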
This shows that just as the optical design of eyes is adapted for different tasks (Hughes, 1977; Lythgoe, 1979), neuronal processing may also be different in animals displaying different behaviors. Further testing is required in order to investigate this hypothesis including an analysis within species between male flies (which do chase small targets) and female flies (which do not). It will also be interesting to test for neurophysiological differences between flies with different optical specializations (i.e., acute vs. bright zones) (Straw et al., 2006) to see if target detection is altered. 
Acknowledgments
We thank the manager of the Botanic Gardens of Adelaide for allowing insect collection. The project was supported by grants from the Australian Research Council (LP0667744) and the US Air Force Office of Scientific Research (FA 9550-04-1-0294). We would also like to thank the anonymous reviewers whose constructive comments substantially improved this manuscript. Copies of the high dynamic range movie and panoramas used in this study are available upon request to the corresponding author RSAB. 
Commercial relationships: none. 
Corresponding author: Russell Brinkworth. 
Email: russell.brinkworth@adelaide.edu.au. 
Address: Biomimetic Vision Laboratory, School of Molecular and Biomedical Science, The University of Adelaide, Adelaide SA 5005, Australia. 
References
Attneave, F. (1954). Some informational aspects of visual perception. Psychological Review, 61, 183–193. [PubMed] [CrossRef] [PubMed]
Baccus, S. A. (2007). Timing and computation in inner retinal circuitry. Annual Review of Physiology, 69, 271–290. [PubMed] [CrossRef] [PubMed]
Barnett, P. D. Nordström, K. O'Carroll, D. C. (2007). Retinotopic organization of small-field-target-detecting neurons in the insect visual system. Current Biology, 17, 569–578. [PubMed] [Article] [CrossRef] [PubMed]
Boeddeker, N. Kern, R. Egelhaaf, M. (2003). Chasing a dummy target: Smooth pursuit and velocity control in male blowflies. Proceedings of the Royal Society B: Biological Sciences, 270, 393–399. [PubMed] [Article] [CrossRef]
Brinkworth, R. S. A. Mah, E. L. O'Carroll, D. C. (2007). Bioinspired pixel-wise adaptive imaging.
Burton, B.G. Laughlin, S.B. (2003). Neural images of pursuit targets in the photoreceptor arrays of male and female houseflies Musca domestica. Journal of Experimental Biology, 206, 3963–3977. [PubMed] [Article] [CrossRef] [PubMed]
Collett, T. (1971). Visual neurones for tracking moving targets. Nature, 232, 127–130. [PubMed] [CrossRef] [PubMed]
Collett, T. S. Land, M. F. (1975). Visual control of flight behaviour in the hoverfly, Syritta pipiens L.. Journal of Comparative Physiology A, 99, 1–66 [CrossRef]
Collett, T. S. Land, M. F. (1978). How hoverflies compute interception courses. Journal of Comparative Physiology A, 125, 191–204. [CrossRef]
Debevec, P. E. Malik, J. (1997). Recovering high dynamic range radiance maps from photographs.
de Ruyter van Steveninck, R. R. Laughlin, S. B. (1996). The rate of information transfer at graded-potential synapses. Nature, 379, 642–645. [CrossRef]
Dror, R. O. O'Carroll, D. C. Laughlin, S. B. (2000). The role of natural image statistics in biological motion estimation. Springer Lecture Notes in Computer Science, 181, 492–501.
Egelhaaf, M. (1985). On the neuronal basis of figure-ground discrimination by relative motion in the visual-system of the fly: II Figure-detection cells, a new class of visual interneurones. Biological Cybernetics, 52, 195–209. [CrossRef]
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, Optics and Image Science, 4, 2379–2394. [PubMed] [CrossRef] [PubMed]
Frye, M. A. Dickinson, M. H. (2001). Fly flight: A model for the neural control of complex behavior. Neuron, 32, 385–388. [PubMed] [Article] [CrossRef] [PubMed]
Gilbert, C. Strausfeld, N. J. (1992). Small-field neurons associated with oculomotor and optomotor control in muscoid flies: Functional organization. Journal of Comparative Neurology, 316, 72–86. [PubMed] [CrossRef] [PubMed]
Hanley, J. A. McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36. [PubMed] [Article] [CrossRef] [PubMed]
Hughes, A. Crescitelli, F. (1977). The topography of vision in mammals. Handbook of sensory physiology. (VII pp. 613–756). Berlin: Springer-Verlag.
James, A. C. (1990). White-noise studies in the fly lamina.
James, A. C. Pinter, R. B. Nabet, B. (1992). Nonlinear operator network models of processing in the fly lamina. Nonlinear vision: Determination of neural receptive fields, function, and networks. (pp. 39–73). London: CRC Press.
Juusola, M. Uusitalo, R. O. Weckström, M. (1995). Transfer of graded potentials at the photoreceptor-interneuron synapse. Journal of General Physiology, 105, 117–148. [PubMed] [Article] [CrossRef] [PubMed]
Kern, R. van Hateren, J. H. Michaelis, C. Lindemann, J. P. Egelhaaf, M. (2005). Function of a fly motion-sensitive neuron matches eye movements during free flight. PLoS Biology, 3,
Land, M. F. (1973). Head movement of flies during visually guided flight. Nature, 243, 299–300. [CrossRef]
Land, M. F. Collett, T. S. (1974). Chasing behaviour of houseflies. Journal of Comparative Physiology A, 156, 525–538. [CrossRef]
Land, M. F. Eckert, H. M. (1985). Maps of the acute zones of fly eyes. Journal of Comparative Physiology. A, 156, 525–538. [CrossRef]
Laughlin, S. B. de Ruyter van Steveninck, R. R. Anderson, J. C. (1998). The metabolic cost of neural information. Nature Neuroscience, 1, 36–41. [PubMed] [Article] [CrossRef] [PubMed]
Laughlin, S. B. Horridge, G. A. (1971). Angular sensitivity of the retinula cells of dark-adapted worker bee. Journal of Comparative Physiology A, 74, 329–335.
Lythgoe, J. N. (1979). The ecology of vision. Oxford: Clarendon Press.
Mah, E. L. Brinkworth, R. S. O'Carroll, D. C. (2008). Implementation of an elaborated neuromorphic model of a biological photoreceptor. Biological Cybernetics, 98, 357–369. [PubMed] [CrossRef] [PubMed]
Matic, T. Laughlin, S. B. (1981). Changes in the intensity-response function of an insect's photoreceptors due to light adaptation. Journal of Comparative Physiology A, 145, 169–177. [CrossRef]
Nordström, K. Barnett, P. D. Moyer de Miguel, I. M. Brinkworth, R. S. O'Carroll, D. C. (2008). Sexual dimorphism in the hoverfly motion vision pathway. Current Biology, 18, 661–667. [PubMed] [Article] [CrossRef] [PubMed]
Nordström, K. Barnett, P. D. O'Carroll, D. C. (2006). Insect detection of small targets moving in visual clutter. PLoS Biology, 4,
O'Carroll, D. C. (1993). Feature-detecting neurons in dragonflies. Nature, 357, 336–339.
Payne, R. Howard, J. (1981). Response of an insect photoreceptor: A simple log-normal model. Nature, 290, 415–416. [CrossRef]
Shannon, C. E. (1948). The mathematical theory of communication. Bell System Technical Journal, 27, 3–91. [CrossRef]
Shaw, S. R. (1984). Early visual processing in insects. Journal of Experimental Biology, 112, 225–251. [PubMed] [PubMed]
Simoncelli, E. P. Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216. [PubMed] [CrossRef] [PubMed]
Smakman, J. G. J. Stavenga, D. G. (1987). Angular sensitivity of blowfly photoreceptors—Broadening by artificial electrical coupling. Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 160, 501–507. [CrossRef]
Snyder, A. W. (1977). Acuity of compound eyes: Physical limitations and design. Journal of Comparative Physiology A, 116, 161–182. [CrossRef]
Snyder, A. W. Bossomaier, T. J. Hughes, A. Blakemore, C. (1990). The theory of comparative eye design. Vision: Coding and efficiency. (pp. 45–52). Cambridge: Cambridge University Press.
Snyder, A. W. Stavenga, D. G. Laughlin, S. B. (1977). Spatial information capacity of compound eyes. Journal of Comparative Physiology A, 116, 183–207. [CrossRef]
Srinivasan, M. V. Guy, R. G. (1990). Spectral properties of movement perception in the dronefly Eristalis. Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 166, 287–295.
Srinivasan, M. V. Laughlin, S. B. Dubs, A. (1982). Predictive coding: A fresh view of inhibition in the retina. Proceedings of the Royal Society of London B: Biological Sciences, 216, 427–459. [PubMed] [CrossRef]
Stavenga, D. G. (1976). Fly visual pigments Difference in visual pigments of blowfly and dronefly peripheral retinula cells. Journal of Comparative Physiology A, 111, 137–152. [CrossRef]
Stavenga, D. G. (2003). Angular and spectral sensitivity of fly photoreceptors I Integrated facet lens and rhabdomere optics. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 189, 1–17. [PubMed]
Straw, A. D. Warrant, E. J. O'Carroll, D. C. (2006). A “bright zone” in male hoverfly (Eristalis tenax) eyes and associated faster motion detection and increased contrast sensitivity. Journal of Experimental Biology, 209, 4339–4354. [PubMed] [Article] [CrossRef] [PubMed]
van Hateren, J. H. (1992a). A theory of maximizing sensory information. Biological Cybernetics, 68, 23–29. [PubMed] [CrossRef]
van Hateren, J. H. (1992b). Real and optimal neural images in early vision. Nature, 360, 68–70. [PubMed] [CrossRef]
van Hateren, J. H. (1997). Processing of natural time series of intensities by the visual system of the blowfly. Vision Research, 37, 3407–3416. [PubMed] [CrossRef] [PubMed]
van Hateren, J. H. Schilstra, C. (1999). Blowfly flight and optic flow II Head movements during flight. Journal of Experimental Biology, 202, 1491–1500. [PubMed] [Article] [PubMed]
van Hateren, J. H. Snippe, H. P. (2001). Information theoretical evaluation of parametric models of gain control in blowfly photoreceptor cells. Vision Research, 41, 1851–1865. [PubMed] [CrossRef] [PubMed]
van Hateren, J. H. Snippe, H. P. (2006). Phototransduction in primate cones and blowfly photoreceptors: Different mechanisms, different algorithms, similar response. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 192, 187–197. [PubMed] [CrossRef]
van Hateren, J. H. Srinivasan, M. V. Wait, P. B. (1990). Pattern recognition in bees: Orientation discrimination. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 167, 649–654. [CrossRef]
van Hateren, J. H. van der Schaaf, A. (1996). Temporal properties of natural scenes. Proceedings of the IS&T/SPIE, San Jose
Wagner, H. (1986). Flight performance and visual control of flight of the free-flying housefly (Musca-domestica L.): II. Pursuit of targets. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 312, 581–595. [CrossRef]
Wehrhahn, C. Poggio, T. Bülthof, H. (1982). Tracking and chasing in houseflies (Musca. Biological Cybernetics, 45, 123–130. [CrossRef]
Wiederman, S. D. Shoemaker, P. A. O'Carroll, D. C. (2008). A model for the detection of moving targets in visual clutter inspired by insect physiology. PLoS One, 3,
Yeates, D. K. Dodson, G. N. (1990). The mating system of a bee fly (Diptera: Bombyliidae): I Non-resource-based hilltop territoriality and a resource-based alternative. Journal of Insect Behavior, 3, 603–617. [CrossRef]
Zheng, L. de Polavieja, G. G. Wolfram, V. Asyali, M. H. Hardie, R. C. Juusola, M. (2006). Feedback network controls photoreceptor output at the layer of first visual synapses in Drosophila. Journal of General Physiology, 127, 495–510. [PubMed] [Article] [CrossRef] [PubMed]
Wiederman, S. Shoemaker, P. A. O'Carroll, D. C. (2007). Biologically inspired small target detection mechanisms. ISSNIP conference, Melbourne, Australia.
Figure 1
 
Method for movie playback to photoreceptors. Reconstruction of the neural representation of a natural scene in 2D spatial and temporal domains. (A) Segment of an original image of a natural scene, the optical input to the ommatidia, and the photoreceptor signals corresponding to the optical input. Note the hexagonal sampling as in the insect eye. (B) Temporal sequence of images. The signals of such temporal sequences were played back sequentially to the eye of a fly, one pixel at a time, in a vertical raster fashion (starting from the top left and moving down the column until the bottom, then restarting from the top one column to the right). (C) The luminance input to a single ommatidium and the response of a photoreceptor to this optical input (x = 45 degrees, y = 21 degrees). The photoreceptor membrane potential is reported after an average reverse normalization scale was applied to the data. While the photoreceptor membrane potential does follow some of the general patterns of the input luminance, there are some notable transformations. Of most relevance are the semi-logarithmic response, which exaggerates the response to the darker areas, and adaptation, which causes the cell to respond in a decreasing way to constant or repetitive stimulation. Adaptation is most noticeable around 3.5 s and again around 7.5 s, where an increasing luminance results in a decreasing response from the cell.
Figure 2
 
Cell health over time. Response levels of a single photoreceptor to identical square wave stimuli played between rows of the movie. These results were used to normalize the recordings so as to compensate for slight changes in cell health that inevitably occur over such long recordings and to permit data from different cells to be compared. The baseline gave an indication of the change in the DC offset of the recording while the gain indicated the cell health by way of the size of the response to a constant stimulus. Cell health degraded a little between 366 and 377 minutes of recording with the cell finally lost between 581 and 591 minutes.
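One plausible form of such a normalization is sketched below. The caption does not specify the exact procedure, so this assumes a simple scheme in which the control baseline is subtracted and the result is divided by the control gain; the function and parameter names are hypothetical.

```python
def normalize_recording(samples_mv, baseline_mv, gain_mv):
    """Rescale membrane potentials using the nearest control stimulus.

    baseline_mv: response level before the control square wave (DC offset).
    gain_mv:     size of the response to the control square wave.
    Assumed scheme: subtract the baseline, then divide by the gain.
    """
    if gain_mv <= 0:
        raise ValueError("gain must be positive; the cell may have been lost")
    return [(v - baseline_mv) / gain_mv for v in samples_mv]


# Example with made-up values: a 20 mV control response on a -60 mV baseline.
normalized = normalize_recording([-60.0, -50.0, -40.0], -60.0, 20.0)
```

With this scheme a healthy cell and a slightly degraded cell that respond to the same stimulus map onto the same normalized scale, which is what allows data from different cells and recording times to be combined.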
Figure 3
 
Distribution of values. Histogram of both the normalized luminance and the averaged normalized neuronal response values recorded over the entire movie sequence. Histogram bin width is 1% of used dynamic range. The membrane potential is reported after an average reverse normalization scale was applied to the averaged data. The adaptive non-linear encoding of brightness levels by the photoreceptors made better use of the range of values available. On the scale used in the movie reproduction, the white point (100% luminance value) corresponded to approximately 22 500 cd/m2, while the black and white points for the photoreceptors corresponded to −58.8 mV and −14.54 mV, respectively.
Figure 4
 
Single frame from the final reconstructed movie. Upper: Original luminance image as captured by the camera with target inserted. Middle: Luminance image after optical blur and hexagonal sampling to mimic the resolution of the fly. Lower: Neuronal representation of the scene as recorded from the biological photoreceptors. Image corresponds to 9.9 s after commencement of the movie (total duration 10.3 s). White circles identify the target position in each view. Images were scaled to fit onto an 8-bit look-up table such that the largest 0.1% of the values (across the entire movie) were mapped to white and the lowest 0.1% of the values were mapped to black. Every point within the neuronal representation was made up of the average of three different photoreceptor cells, which were normalized with respect to control stimuli played periodically throughout the experiment. The target (approximately 1.4 by 2.8 hexagonal pixels in size) was centered 42 hexagonal pixels from the left and 7 hexagonal pixels from the top in the photoreceptor reconstruction. The processing carried out by the biological photoreceptor enhanced not only the target against the local background but also various features within the background, such as tree branches. Vertical stripes in the reconstructed photoreceptor recordings highlight slight changes in the response of the cells over time. Note that due to the color filter on the acquisition camera this represented only the green spectrum.
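The 8-bit scaling described in this caption can be sketched as follows. This is a minimal illustration that assumes a nearest-rank percentile and linear mapping between the black and white points; the function name and the `clip_fraction` parameter are hypothetical, and the caption does not state which percentile method was used.

```python
def scale_to_8bit(values, clip_fraction=0.001):
    """Map values to 0..255 so that the lowest `clip_fraction` of the data
    saturates to black and the highest `clip_fraction` saturates to white."""
    ordered = sorted(values)
    last = len(ordered) - 1
    # Nearest-rank percentiles (an assumption; other definitions exist).
    black = ordered[round(clip_fraction * last)]
    white = ordered[round((1.0 - clip_fraction) * last)]
    if white <= black:
        raise ValueError("values have no usable range")
    scaled = []
    for v in values:
        t = (v - black) / (white - black)
        t = min(1.0, max(0.0, t))  # clip values outside the black/white points
        scaled.append(round(255 * t))
    return scaled


# Example: 1001 evenly spaced values pooled "across the entire movie".
example_values = [float(i) for i in range(1001)]
example_8bit = scale_to_8bit(example_values)
```

Computing the black and white points once over the whole movie, rather than per frame, keeps the brightness mapping constant from frame to frame.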
Figure 5
 
Target detection statistics. (a) z-score of the target and the local surround (next-nearest neighbors) over time for both the normalized luminance and averaged photoreceptor representation. (b) Ratio of photoreceptor and luminance z-score. The response of the photoreceptors consistently had a larger z-score (ratio >1 most times), meaning it would be easier to reliably determine target position from individual still frames. The data were smoothed by use of a zero-phase 15 ms moving average filter for display purposes. Inset: Hexagonal grid showing the center pixel identified to be the center of the target (black) and the next-nearest neighbors (gray) used as the local surrounding pixels for the calculation of z-score. The z-score is the difference between the target pixel and local surround divided by variation in the local surround; it indicates how easily the target pixel could be distinguished from the surround.
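The z-score defined in this caption can be written out as a short sketch. It assumes that "variation in the local surround" means the sample standard deviation of the surround pixels and that the surround is summarized by its mean; the function name is hypothetical.

```python
import statistics


def target_z_score(center_value, surround_values):
    """z-score of the centre pixel relative to its local surround.

    Assumes the surround is summarized by its mean and that its variation
    is the sample standard deviation (needs at least two surround pixels).
    """
    surround_mean = statistics.fmean(surround_values)
    surround_sd = statistics.stdev(surround_values)
    if surround_sd == 0:
        raise ValueError("surround has no variation; z-score is undefined")
    return (center_value - surround_mean) / surround_sd


# Example with made-up values: surround mean 2.0 and standard deviation 1.0.
z = target_z_score(5.0, [1.0, 2.0, 3.0])
```

A large absolute z-score means the center pixel differs from its neighbors by much more than the neighbors differ among themselves, which is why it serves as a measure of how easily the target could be distinguished in a single frame.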
Figure 6
 
Comparing biological and modeling results. (a) Averaged response of three biological photoreceptor cells (n = 11) and the photoreceptor model to a square wave stimulus of duration 3 s. Pre- and post-stimulus luminance was 70 cd/m2 while the stimulus amplitude was 7 000 cd/m2 (corresponding to 0.1% and 10% of the maximum brightness of the stimulus LED). Dashed line shows the mean pre-stimulus level. Model results have been normalized to show the similarity in shape between the model and its biological equivalent. (b) Average coherence from all pixels in the movie (90 × 58) between raw luminance (linear model) and the photoreceptor model with the biological photoreceptor over the entire movie duration. With the exception of power line interference at multiples of 50 Hz, the model is a very close match to the biological photoreceptor. By comparing the difference with the luminance coherence in the frequency range DC to 150 Hz, the photoreceptor model was found to be 34.7 ± 26.2% (mean ± standard deviation) better at predicting the cell response than a basic linear model.
Figure 7
 
Target detectability. Normalized histograms of rectified target and background z-score values for (a) luminance, (b) photoreceptor, and (c) model. (d) Receiver operating characteristic (ROC) curve for the z-score data, calculated by plotting the area under the background histogram (false positives) versus the area under the target histogram (targets detected) to the right of all possible detection limit values. Dotted vertical lines indicate the 1% value, i.e., the level below which 99% of the background data fall, corresponding to 1% false positives. The rate of target detection outside of this limit is more than 1.7 times larger in the photoreceptor and model representation than in the luminance image, while the horizontal shift in the ROC curves at the 50% target detection level is about 4 times. This means that for a detection level around this point the photoreceptor and model responses would have about 4 times fewer false positives than the raw luminance. There was no practical or statistical difference between the ROC curves or the histogram limits between the photoreceptor representation and the response of the model. ROC curves show the number of targets correctly detected (y-axis) for any level of false detection events (x-axis).
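The construction of the ROC curve and of the 1% limit described in this caption can be sketched as follows. The sketch works directly on lists of z-scores rather than on binned histograms, which is an assumption made for brevity; the function names are hypothetical and the example data are made up.

```python
def roc_points(target_z, background_z):
    """(false-positive rate, detection rate) for every possible threshold,
    using rectified (absolute) z-scores as in the caption."""
    targets = [abs(z) for z in target_z]
    background = [abs(z) for z in background_z]
    points = []
    for threshold in sorted(set(targets + background), reverse=True):
        fp = sum(1 for z in background if z >= threshold) / len(background)
        tp = sum(1 for z in targets if z >= threshold) / len(targets)
        points.append((fp, tp))
    return points


def detection_rate_at_fp(target_z, background_z, fp_rate=0.01):
    """Fraction of targets above the level exceeded by at most `fp_rate`
    of the background values (the dotted 1% line in the histograms)."""
    background = sorted(abs(z) for z in background_z)
    allowed = int(fp_rate * len(background) + 1e-9)  # background values allowed above
    if allowed >= len(background):
        threshold = float("-inf")
    else:
        threshold = background[len(background) - allowed - 1]
    targets = [abs(z) for z in target_z]
    return sum(1 for z in targets if z > threshold) / len(targets)


# Made-up example: 100 background values and 4 target values.
example_background = [i / 100 for i in range(100)]
example_targets = [0.5, 0.985, 1.5, -2.0]
rate_at_1_percent = detection_rate_at_fp(example_targets, example_background)
curve = roc_points(example_targets, example_background)
```

In this example one background value in a hundred lies above the limit, and three of the four targets exceed it. Comparing such rates between the luminance and photoreceptor representations is the kind of comparison the caption reports.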
Figure 8
 
Static photoreceptor response curve. The transfer function of the photoreceptor model with all dynamic components removed by short-circuiting all low-pass filters. In this case the response is divorced from all history and the model depends only on the current luminance input. The response is linear at low luminance but logarithmic over most of the coding range. Values over 0.6 were only achievable at very high luminance levels (larger than generated under these conditions) or, in the case of the dynamic model, with rapid changes from very dark to very bright stimuli.
Figure 9
 
Panorama results. Input images and the associated receiver operating characteristic (ROC) curves for targets placed within them for both the raw luminance images (luminance) as well as the image processed by the photoreceptor (photoreceptor) and the model of the photoreceptor (model). The input images used were the green channel of high dynamic range panoramic images. These images have been normalized, gamma corrected, and reduced to 8 bits of dynamic range for reproduction but were kept in raw format for the analysis and the input to the photoreceptor and the model. Processing by the biological photoreceptor and the model resulted in an improvement in target detectability (as seen in a left shift of the ROC curves) in all cases, regardless of the complexity of the scene.
Table 1
 
Target detection statistics. The ability to discriminate between targets and background in the 6 panoramic images depicted in Figure 9 is improved by the application of photoreceptor-like processing by a biological photoreceptor, a model that accounts for the non-linear temporal dynamics (model), and a model that only accounts for the static non-linearities of photoreceptor processing (static model). The 50% detection rate is the false-positive rate (in %) at which 50% of the real targets are identified (lower numbers mean fewer errors and better detection). The 1% error rate is the number of targets identified when the chance of detecting a false target is 1% (higher numbers mean that more targets could be identified, i.e., better detection). The ROC area is the area under the receiver operating characteristic curve between 0.01% and 100% based on a logarithmic false positive axis (higher numbers mean that on average a target will be easier to detect). Image A was a highly cluttered scene, which made target detection extremely challenging. Image B was an urban scene with structural components not normally seen in nature. Image C was a sparse scene, meaning it was relatively easy to detect the targets, even without processing. Images D, E, and F represented a midlevel of complexity. Due to resolution limits, the minimum possible value for the 50% detection rate was 0.01%, which corresponded to 50% of targets being detectable without any false positives. SEM stands for standard error of the mean. The better detection levels obtained by the model over the photoreceptor are likely the result of the model being essentially free from noise. The better detection rates for the model that includes the dynamic properties reflect the importance of such processing under these conditions (target speed matched to background).
Measure             Representation  Image A  Image B  Image C  Image D  Image E  Image F  Mean   SEM
50% detection rate  Luminance       9.29     1.61     0.01     2.42     6.25     6.25     4.31   1.43
50% detection rate  Photoreceptor   4.88     0.98     0.01     0.67     2.21     1.51     1.71   0.70
50% detection rate  Model           4.49     0.44     0.01     0.32     1.34     0.93     1.26   0.67
50% detection rate  Static model    5.25     0.68     0.01     1.18     3.63     2.33     2.18   0.81
1% error rate       Luminance       6        42       86       42       22       22       36.67  11.35
1% error rate       Photoreceptor   23       51       93       55       37       43       50.33  9.70
1% error rate       Model           27       58       94       60       43       52       55.67  9.11
1% error rate       Static model    18       56       91       49       31       36       46.83  10.39
ROC area            Luminance       26.45    44.28    83.30    49.52    36.59    36.59    46.12  8.09
ROC area            Photoreceptor   35.25    51.56    88.59    59.89    46.42    47.22    54.82  7.50
ROC area            Model           37.30    55.77    89.52    61.49    49.71    50.57    57.39  7.21
ROC area            Static model    32.81    52.37    87.39    55.37    43.52    42.30    52.29  7.74
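The three measures in Table 1 can be read off an ROC curve as sketched below. This is an illustrative reconstruction under assumptions, not the authors' code: all function names are hypothetical, the curve is a synthetic analytic one, and the normalization of the logarithmic ROC area (dividing by the width of the log axis) is assumed rather than stated in the source. The last helper computes the mean and standard error across images; using the sample standard deviation (`ddof=1`) is also an assumption, although it reproduces the tabulated 4.31 and 1.43 for the luminance 50% detection row.

```python
import numpy as np

def detected_at_false_rate(false_pos, detected, rate=0.01):
    """Fraction of targets detected at a given false-positive fraction ("1% error rate")."""
    return float(np.interp(rate, false_pos, detected))

def false_rate_at_detection(false_pos, detected, level=0.5, floor=1e-4):
    """False-positive fraction at which `level` of targets are detected.

    `floor` mirrors the 0.01% resolution limit mentioned in the caption.
    """
    idx = min(int(np.searchsorted(detected, level, side="left")), len(false_pos) - 1)
    return max(float(false_pos[idx]), floor)

def log_roc_area(false_pos, detected, lo=1e-4, hi=1.0, samples=401):
    """Area under the ROC curve on a log10 false-positive axis, as a fraction of the maximum."""
    grid = np.logspace(np.log10(lo), np.log10(hi), samples)
    tpr = np.interp(grid, false_pos, detected)
    x = np.log10(grid)
    area = np.sum(0.5 * (tpr[1:] + tpr[:-1]) * np.diff(x))  # trapezoid rule
    return float(area / (x[-1] - x[0]))

def mean_sem(values):
    """Mean and standard error of the mean (sample SD, ddof=1, assumed)."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1) / np.sqrt(values.size))

# Synthetic ROC curve (NOT data from the paper): detected = false_pos ** 0.25.
fp = np.logspace(-4, 0, 200)
det = fp ** 0.25

hit_rate = detected_at_false_rate(fp, det)   # close to 0.01 ** 0.25
fp_at_half = false_rate_at_detection(fp, det)  # close to 0.5 ** 4
area = log_roc_area(fp, det)
chance_area = log_roc_area(fp, fp)           # curve where detected == false_pos

# Luminance row of the 50% detection rate, images A to F, from Table 1.
lum_mean, lum_sem = mean_sem([9.29, 1.61, 0.01, 2.42, 6.25, 6.25])
print(hit_rate, fp_at_half, area, chance_area, lum_mean, lum_sem)
```

Under this normalization a chance-level curve scores about 0.11 rather than 0.5, because the logarithmic axis gives most of its width to very low false-positive rates. Multiplying the returned fractions by 100 expresses them as percentages.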