Research Article  |   April 2008
Cue combination and color edge detection in natural scenes
Author Affiliations
  • Chunhong Zhou
Second Sight Medical Products, Inc., Sylmar, CA, USA. zhou@2-sight.com
  • Bartlett W. Mel
Biomedical Engineering Department and Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, USA. http://lnc.usc.edu, mel@usc.edu
Journal of Vision April 2008, Vol. 8, 4. doi:https://doi.org/10.1167/8.4.4
Abstract

Biological vision systems are adept at combining cues to maximize the reliability of object boundary detection, but given a set of co-localized edge detectors operating on different sensory channels, how should their responses be combined to compute overall edge probability? To approach this question, we collected joint responses of red-green and blue-yellow edge detectors, both on and off edges, using a human-labeled image database as ground truth (D. Martin, C. Fowlkes, D. Tal, & J. Malik, 2001). From a Bayesian perspective, the rule for combining edge cues is linear in the individual cue strengths when the ON-edge and OFF-edge joint distributions are (1) statistically independent and (2) lie in an exponential ratio to each other. Neither condition held in the color edge data we collected, and the function P(ON∣cues)—dubbed the “combination rule”—was correspondingly complex and nonlinear. To characterize the statistical dependencies between edge cues, we developed a generative model (“saturated common factor,” SCF) that provided good fits to the measured ON-edge and OFF-edge joint distributions. We also found that a divisive normalization scheme derived from the SCF model transformed raw edge detector responses into values with simpler distributions that satisfied both preconditions for a linear combination rule. A comparison to another normalization scheme (O. Schwartz & E. Simoncelli, 2001) suggests that apparently minor details of the normalization process can strongly influence its performance. Implications of the SCF normalization scheme for cue combination in biological sensory systems are discussed.

Introduction
Detecting object boundaries is one of the central problems of natural vision. The problem is all the more pressing given that some of the most important objects—predators and prey—are often well camouflaged. Even without intentional camouflage, an object's surface properties can accidentally match those of its background, causing border contrast to weaken or disappear. The problem is compounded by poor lighting conditions, shadows, haze, etc. To combat these problems, biological visual systems are adept at combining cues from multiple sources, including depth, motion, luminance, color, and texture, to maximize the reliability of boundary detection under the widest variety of circumstances (Rivest & Cavanagh, 1996; Badcock & Westheimer, 1985; Gray & Regan, 1997; McGraw, Whitaker, Badcock, & Skillen, 2003). 
The problem of cue combination in the context of boundary detection may be simply stated: Given a set of N local edge detectors applied at a given location in the scene, each tuned to a different sensory cue, how should their responses be combined mathematically to arrive at the ON-edge probability P(ON∣cues)? 
Within a Bayesian framework, optimal cue combination depends heavily on the statistics of natural scenes (Grenander, 1996; Knill, 1998; Mamassian, Landy, & Maloney, 2002; Grzywacz & Yuille, 1990), and in the particular case of edge detection, on the conditional joint distributions P(cues∣ON) and P(cues∣OFF). Previous authors have collected ON-edge and OFF-edge conditional distributions from natural images. Using an adaptive binning method, Konishi, Yuille, Coughlan, and Zhu (2003) measured the joint statistics of brightness and color gradients both on and off edges in a human-labeled image database, and then used Bayes' rule to compute the empirical posterior probability P(ON∣cues) at every point in an image. The method yielded good edge-detection results. Fine, MacLeod, and Boynton (2003) gathered joint statistics of the luminance, red-green, and blue-yellow color differences between pixels in natural images as a function of their spatial separation. By adopting the assumption that nearby pixels most often lie on the same “stuff,” whereas pixels taken from different images most likely lie on different materials, they estimated same-surface vs. different-surface conditional distributions of their color-difference measurements using unlabelled images, analogous to the collection of ON-edge and OFF-edge conditional statistics as discussed above. Then using Bayes' rule, they derived an empirical cue-combination rule that computed the probability that two pixels lie on the same surface using differences in the three color channels as cues. The surface locality assumption and the Bayesian approach were validated in the sense that the resulting cue-combination rule provided fairly good predictions of human subjects' performance on an image segmentation task. 
Direct tabulation of boundary statistics to arrive at an optimal cue-combination rule, as in the abovementioned approaches, has the advantage of simplicity, and involves few assumptions. The main disadvantage is that an ever-larger space of boundary cue joint statistics must be tabulated each time a new cue is added to the mix. Furthermore, tables of statistics do not increase our understanding of the origin and statistical properties of natural boundaries, nor of the functional form of the optimal combination rule. It is difficult, therefore, to exploit such an approach to make inferences about the circuitry and operations that biological sensory systems may use to combine cues. 
Overview of the approach
Given the natural ties between cue-combination theory and natural image statistics, in the next two sections we review ideas and methods from both to frame the problem of optimal boundary detection in natural images. We turn first to cue-combination theory, to learn what makes cue combination easy. The key insight is that when classifying a cue vector into categories (e.g., ON-edge vs. OFF-edge), class-conditional independence of the cues greatly simplifies the process by satisfying the most demanding precondition for a linear combination rule. We then turn to previous work in natural image statistics, from which we learn not to expect co-localized edge cues to be class-conditionally independent—far from it. We nonetheless gain insight from this work as to why co-localized cues are contaminated with higher-order correlations, and what kinds of (divisive normalization) operations can help to rid cues of their undesirable dependencies. 
In the body of the work we describe our new contributions as follows: (1) We measure in a controlled setting the joint statistics of two co-localized edge cues operating in different color channels; (2) we propose a simple generative model (called “saturated common factor,” SCF) that helps explain the specific form of the joint distributions we see and especially the higher-order correlations; (3) we invert the SCF model mathematically to arrive at a simple, biologically plausible divisive normalization scheme; (4) we test the SCF normalization scheme on the labeled database and find that the normalized edge-detector responses are not only class-conditionally independent but also satisfy the other critical condition for a linear combination rule; (5) we compare the SCF normalization scheme to another scheme that is similar in spirit but based instead on a sum-of-squares normalization term (Schwartz & Simoncelli, 2001), finding that the SCF normalization is more effective at eliminating the higher-order dependencies between edge detector channels; and (6) we point out the relationships between the SCF and other similar models and flesh out interpretations and predictions pertaining to the neural circuitry needed for sensory cue combination. 
When a linear combination rule is best
Previous approaches to cue combination have often included assumptions that lead to a linear combination rule (Yuille & Bulthoff, 1996; Ernst & Banks, 2002; Olman & Kersten, 2004), such as the assumption that the cues are class-conditionally independent (CCI) Gaussian variables. In the context of two edge detectors operating separately in red-green and blue-yellow color channels, the CCI assumption means that (1) in cases where an edge is present, the magnitude of the red-green change across the boundary provides no information about the magnitude of the blue-yellow change, and (2) the same holds in cases when there is no boundary present. (Note that when the CCI assumption holds for a set of cues, overall statistical independence between the cues does not hold, since a large response measured within any individual channel increases the probability that a boundary is present, which leads to higher expected values for all of the other boundary cues.) 
Given multiple estimators $\hat{s}_i$ of a continuous target variable S, the CCI Gaussian assumption leads to a cue-combination rule that is linear, with weights depending on the individual cue “reliabilities”:

$$\hat{S} = \sum_i w_i \hat{s}_i \tag{1}$$
where

$$w_i = \frac{1/\delta_i^2}{\sum_j 1/\delta_j^2} \tag{2}$$

and $\delta_i^2$ is the variance of the distribution $P(S \mid \hat{s}_i)$ (Yuille & Bulthoff, 1996).
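To make Equations 1 and 2 concrete, here is a minimal NumPy sketch (ours, not the authors'; the estimates and variances are invented purely for illustration):

```python
import numpy as np

def combine_estimates(s_hat, var):
    """Reliability-weighted cue combination (Equations 1 and 2).

    s_hat : per-cue estimates of the target variable S
    var   : per-cue variances delta_i^2; a smaller variance means a more
            reliable cue, and hence a larger weight.
    """
    w = (1.0 / var) / np.sum(1.0 / var)   # Equation 2
    return np.sum(w * s_hat)              # Equation 1

# A reliable cue (variance 0.1) dominates an unreliable one (variance 1.0):
print(combine_estimates(np.array([2.0, 3.0]), np.array([0.1, 1.0])))  # ~2.09
```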
Similar results can be obtained for cues {r_1, r_2, … r_N} that provide evidence for a binary target variable T. For example, T might indicate the presence or absence of an edge at a given image location/orientation. As is the case for the Perceptron model (Bishop, 1996), Bayes' rule tells us that

$$P(T \mid \text{cues}) = \frac{P(\text{cues} \mid T)\,P(T)}{P(\text{cues} \mid T)\,P(T) + P(\text{cues} \mid \bar{T})\,P(\bar{T})} = \frac{1}{1 + \frac{P(\text{cues} \mid \bar{T})\,P(\bar{T})}{P(\text{cues} \mid T)\,P(T)}}. \tag{3}$$
 
Using the assumptions that the cues are CCI Gaussian random variables, with each cue having the same variance in class T and $\bar{T}$ (despite different means), it can be shown that

$$P(T \mid r_1, r_2, \ldots, r_N) = f\left(\sum_i w_i r_i\right), \tag{4}$$

where the $w_i$'s are related to cue variances and f(x) is a logistic function. It is easy to see that the optimal combination rule remains linear even under the weaker assumption that $P(r_1, r_2, \ldots \mid T)$ and $P(r_1, r_2, \ldots \mid \bar{T})$, though potentially violating the CCI condition, are arbitrary nonspherical Gaussian distributions with the same covariance matrix (Jacobs, 1995). This is because a single linear transformation of the cues, built into the combination rule (Equation 4), can eliminate the dependencies in both distributions simultaneously.
We observed that a linear combination rule still applies for CCI sensory cues, even if non-Gaussian, as long as the ratio of the class-conditional distributions for each individual cue is a decaying exponential function of the cue's value:

$$\frac{P(r_i \mid \bar{T})}{P(r_i \mid T)} \propto e^{-\alpha_i r_i}. \tag{5}$$

This is the case since, under the CCI assumption,

$$\frac{P(r_1, r_2, \ldots \mid \bar{T})}{P(r_1, r_2, \ldots \mid T)} = \frac{P(r_1 \mid \bar{T})\, P(r_2 \mid \bar{T}) \cdots}{P(r_1 \mid T)\, P(r_2 \mid T) \cdots} \propto e^{-\alpha_1 r_1}\, e^{-\alpha_2 r_2} \cdots, \tag{6}$$

where the $\alpha_i$'s are constants. Plugging Equation 6 into Equation 3, we have

$$P(T \mid r_1, r_2, \ldots, r_N) = \frac{1}{1 + \frac{P(\bar{T})}{P(T)}\, e^{-(\alpha_1 r_1 + \alpha_2 r_2 + \cdots)}} = f\!\left(\sum_{i=1}^{N} \alpha_i r_i\right), \tag{7}$$
where f is again a logistic function. This is the familiar “neural” operation consisting of a weighted sum of inputs followed by a sigmoidal activation function. A number of simple class-conditional distributions satisfy the ratio requirement of Equation 5, including any two Gaussian distributions with the same covariance (the special case mentioned above) and any two exponential distributions. Note that the exponential ratio assumption of Equation 5 is equivalent to the assumption that each cue by itself has a sigmoidal target function:

$$P(T \mid r_i) = \frac{1}{1 + \frac{P(r_i \mid \bar{T})\, P(\bar{T})}{P(r_i \mid T)\, P(T)}} = \frac{1}{1 + A\, e^{-\alpha_i r_i}}, \tag{8}$$

where A is a constant.
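The equivalence between the exact posterior and the logistic-of-weighted-sum form is easy to verify numerically. The sketch below is our illustration, assuming CCI exponential class-conditional densities with rates q_on and q_off (one family of distributions satisfying Equation 5); the two functions return identical values:

```python
import numpy as np

def posterior_exact(r, q_on, q_off, p_on):
    """Exact P(T | r_1..r_N) via Bayes' rule (Equation 3),
    with CCI exponential likelihoods in each class."""
    lik_on = np.prod(q_on * np.exp(-q_on * r))
    lik_off = np.prod(q_off * np.exp(-q_off * r))
    return lik_on * p_on / (lik_on * p_on + lik_off * (1.0 - p_on))

def posterior_linear(r, q_on, q_off, p_on):
    """Logistic of a weighted sum (Equation 7); here alpha_i = q_off - q_on."""
    alpha = q_off - q_on
    bias = len(r) * np.log(q_off / q_on) + np.log((1.0 - p_on) / p_on)
    return 1.0 / (1.0 + np.exp(-(alpha * np.sum(r) - bias)))

r = np.array([0.8, 0.3])
print(posterior_exact(r, q_on=1.0, q_off=5.0, p_on=0.03))   # ~0.0915
print(posterior_linear(r, q_on=1.0, q_off=5.0, p_on=0.03))  # same value
```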
In addition to their simplicity from a computational perspective, linear combination rules have the advantage that they make it particularly easy to incorporate new cues as they become available. Linear combination rules account for human performance in a variety of visual tasks, including orientation discrimination (Rivest, Boutet, & Intriligator, 1997), estimation of depth (Landy, Maloney, Johnston, & Young, 1995), luminance (Maloney, 1999, 2002; Maloney & Yang, 2004), motion (Derrington & Badcock, 1985; Ledgeway & Smith, 1994; Scott-Samuel & Georgeson, 1999; Wilson & Kim, 1994), texture localization (Landy & Kojima, 2001), and surface orientation (Knill, 1998). Results inconsistent with linear combination rules have also been found (Bulthoff & Mallot, 1998; Porrill, Frisby, Adams, & Buckley, 1999; Saunders & Knill, 2001; Frome, Buck, & Boynton, 1981), telling us that biological systems are certainly capable of managing more complex combination schemes when necessary. 
About the class-conditional independence assumption: It's not likely to hold
The optimal rule for combining cues is simplest when the detectors to be combined are class conditionally independent. However, previous work on the statistics of natural images tells us that nearby filter responses often show pronounced higher-order correlations that cannot be eliminated by a linear operation such as a whitening transform (Wainwright & Simoncelli, 2000; Schwartz & Simoncelli, 2001; Liang, Simoncelli, & Lei, 2000; Zetzsche & Röhrbein, 2001; Parra, Spence, & Sajda, 2000; Fine et al., 2003; Karklin & Lewicki, 2003, 2005). In particular, given that a strong response is measured in one filter, other nearby filters typically show larger variances. This “bowtie”-shaped dependency can arise from several factors that scale filter response distributions up and down on a region-by-region basis, including lighting factors (e.g., variations in light intensity and/or the angles between surfaces and light sources), variations in surface texture across the scene, and atmospheric conditions such as mist or haze that reduce the contrast of more distant objects. 
A basic assumption of this work is that regional common factor–induced variability in filter responses is undesirable because it is confounded with the more important filter variations tied to the underlying physical structure of the scene. Various authors have shown that higher-order correlations between nearby filters can be suppressed by divisively normalizing filter responses using a locally computed energy-like measurement. Wainwright and Simoncelli (2000) proposed a Gaussian Scale Mixture model of image formation, in which contrast factors multiplicatively scale the original Gaussian-distributed wavelet coefficients. The process is reversed by divisive normalization, allowing the raw sensory variables to be recovered. Parra et al. (2000) proposed a similar model in which filter coefficients are drawn from a spherical random distribution whose overall scale is modulated, and Zetzsche and Röhrbein (2001) proposed a divisive normalization scheme based on the observation that natural signals are often separable in polar coordinates. More recently, Karklin and Lewicki (2003, 2005) showed that the multiplicative factors operating on filter responses can themselves be viewed as spatially varying variance images, and efficiently encoded using learned basis functions tailored to specific image contexts. 
It is interesting to note that the divisive normalization schemes associated with these analyses of natural image statistics are very similar, and in some cases identical, to the divisive operations proposed to implement contrast gain control in neural circuits (Reichardt & Poggio, 1979; Ohzawa, Sclar, & Freeman, 1982; Bonds, 1989; Nelson, 1991; Albrecht & Geisler, 1991; Geisler & Albrecht, 1992, 1997; Heeger, 1992; Carandini, Heeger, & Movshon, 1997; Bonin, Mante, & Carandini, 2005). 
Given that none of these previous studies were primarily concerned with classification tasks, such as edge detection using optimal combinations of filter values, they focused on overall rather than class-conditional dependencies among filter values. Nonetheless, based on these earlier studies we hypothesized that our color edge data set would exhibit a similar type of higher-order dependency in the class-conditional (ON-edge and OFF-edge) distributions, and that some form of divisive normalization would be needed to eliminate these dependencies as a precursor to the cue-combination stage. 
Results
Color edge statistics
The color opponent space
Three hundred images from the Corel database for which human segmentations were available (Martin, Fowlkes, Tal, & Malik, 2001) were used to gather color edge statistics (Figure 1). Just as for cone responses in natural scenes (Ruderman, Cronin, & Chiao, 1998), the RGB values in the image database were highly correlated (Figure 2A). Since this redundancy, left unchecked, would propagate directly through to the color edge statistics, we ran a fast fixed-point ICA algorithm (Hyvarinen & Oja, 1997) on a random sample of 1.5 million pixels from the database. This led to an uncorrelated color-opponent space of familiar form (Wandell, 1995; Ruderman et al., 1998) with O1 and O2 corresponding to red-green and blue-yellow opponent axes, respectively, and O3 related to pixel intensity: 
$$\begin{pmatrix} O_1 \\ O_2 \\ O_3 \end{pmatrix} = \begin{pmatrix} 5.2 & -4.9 & -1.7 \\ 1.3 & 1.9 & -3.0 \\ 1.0 & 0.0 & 1.0 \end{pmatrix} \begin{pmatrix} R' \\ G' \\ B' \end{pmatrix} \tag{9}$$

with scaling factors

$$R' = 0.7R, \quad G' = 1.1G, \quad B' = 1.0B \tag{10}$$

introduced to simplify the matrix.
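The procedure can be reproduced with standard tools. As a rough illustration (not the authors' code), the sketch below uses scikit-learn's FastICA as a stand-in for the fast fixed-point ICA algorithm of Hyvarinen and Oja (1997); the random pixels are a placeholder for the 1.5 million database samples:

```python
import numpy as np
from sklearn.decomposition import FastICA  # stand-in for the fixed-point ICA used here

def fit_opponent_space(pixels, scale=(0.7, 1.1, 1.0)):
    """Learn a decorrelating color-opponent transform from sampled RGB pixels.

    The per-channel scaling mirrors Equation 10; the learned unmixing matrix
    plays the role of the 3x3 matrix in Equation 9 (its rows must be inspected,
    and reordered or sign-flipped by hand, to identify the R-G, B-Y, and
    intensity axes).
    """
    ica = FastICA(n_components=3, random_state=0)
    ica.fit(pixels * np.array(scale))
    return ica.components_  # rows ~ O1, O2, O3, up to order and sign

pixels = np.random.rand(100_000, 3)  # placeholder for sampled database pixels
print(fit_opponent_space(pixels))
```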
Figure 1
 
Example images and contours used to gather ON-edge and OFF-edge statistics. ON-edge pixels were defined as those lying on and roughly aligned with human-drawn contours that functioned as ground truth (Martin et al., 2001). Human-contour orientation was determined using a lookup table-driven line-tracking method (Liow, 1991). OFF-edge pixels were defined to be at least 4 pixels from the closest boundary pixel. Human labels were concentrated on main object contours and outlines, omitting many bona fide local edges; for the significance of this data collection bias, see “Discussion.”
Figure 2
 
Remapping of RGB values into a uniformly distributed red-green, blue-yellow opponent color space. A. R vs. G values for randomly drawn pixels from 300 indoor and outdoor scenes in the Corel database. Correlation coefficient is 0.9 and similarly high for R-B and B-G pairs. B. An independent components analysis (ICA)-derived linear transformation ( Equation 9) gives two decorrelated color-opponent channels. Marginal distributions are also shown. C. Histogram equalization was achieved by integrating one-dimensional marginal densities from B and mapping data into a uniformly distributed two-dimensional color-opponent space (D). Slight striations in D are the result of JPEG quantization. Correlation between resulting R-G and B-Y values was 0.08.
Data in the two color-opponent channels clustered near the origin (Figure 2B). A histogram equalization step (Figure 2C) was used to spread the two color-opponent values uniformly from 0 to 1 on their respective axes (Figure 2D). In the remainder of the work, we focus on edge statistics within the normalized R-G and B-Y color channels shown in Figure 2D.
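Equalization via the empirical CDF takes only a few lines; a minimal sketch (ours; the mid-rank convention and the Laplacian placeholder data are arbitrary choices):

```python
import numpy as np

def equalize(values):
    """Map values to approximately Uniform(0, 1) via their empirical CDF.

    Applied separately to the O1 (R-G) and O2 (B-Y) channels, this spreads
    the data uniformly over [0, 1], as in Figure 2D.
    """
    ranks = np.argsort(np.argsort(values))   # rank of each sample, 0..N-1
    return (ranks + 0.5) / len(values)       # mid-rank maps into (0, 1)

o1 = np.random.laplace(size=10_000)          # placeholder for values clustered near zero
print(equalize(o1).min(), equalize(o1).max())  # ~0 and ~1
```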
Computing oriented edges
To suppress JPEG artifacts, a Gaussian filter (σ = 2 pixels) was used to smooth the color-opponent images (Figures 3A and 3B). A custom “pairwise difference” (PD) oriented edge detector (Figure 3C) was applied separately to the smoothed R-G and B-Y color channels, giving rise to raw edge responses r1 and r2, respectively, at each pixel (Figure 3D). A second type of edge detector based on an oriented Gabor filter was also used in some experiments (Figure 11). 
Figure 3
 
Oriented edge detection within R-G and B-Y color channels. A. Original color image. B. Slightly smoothed R-G and B-Y channels shown as intensity images. C. Pairwise-difference (PD) edges were computed as follows: At each of 8 neighboring pixel locations along the edge axis (only 4 are shown), the difference across the edge (skipping the central pixel) was computed, passed through a sigmoid function, and summed. The sigmoid was x/(x + 0.2) for x ≥ 0, and x/(0.2 − x) for x < 0. PD values were computed at 8 orientations (0, 22.5, 45, … 157.5 deg), using a simple interpolation scheme for the oblique orientations. D. Absolute values of PD detector responses (subsequently referred to as r1 and r2) are shown for the two color channels. A black pixel is drawn wherever the PD response magnitude at any orientation exceeded a threshold of 0.4. Because of the blurring operation, PD filter responses along boundaries were up to several pixels wide at this threshold. Examples of complementary responses in two edge channels are indicated by red and green circles; blue circles show a contour containing both types of color edge energy.
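Read literally, the caption's recipe for a single PD response can be sketched as follows. This is our reading, not the authors' code: the choice of a vertical edge axis, the offsets ±1 to ±4, and the one-pixel skip across the edge are all assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pd_sigmoid(x):
    """Saturating nonlinearity from the Figure 3 caption."""
    return x / (x + 0.2) if x >= 0 else x / (0.2 - x)

def pd_response_vertical(img, x, y):
    """Pairwise-difference response to a vertical edge at (x, y).

    At 8 positions along the edge axis (offsets +/-1..4, assumed), take the
    difference across the edge while skipping the central column, pass it
    through the sigmoid, and sum.
    """
    total = 0.0
    for dy in (-4, -3, -2, -1, 1, 2, 3, 4):
        diff = img[y + dy, x + 1] - img[y + dy, x - 1]  # difference across the edge
        total += pd_sigmoid(diff)
    return total

channel = gaussian_filter(np.random.rand(64, 64), sigma=2)  # smoothed color channel stand-in
print(pd_response_vertical(channel, 32, 32))
```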
Statistics of R-G and B-Y edges
In agreement with previous measurements of spatial contrast values in natural images ( Figure 4A) (Bell & Sejnowski, 1995; Balboa & Grzywacz, 2003; Wainwright, Simoncelli, & Willsky, 2000), the marginal distributions of r1 and r2 were S-shaped on a log scale (Figure 4B). 
Figure 4
 
Distributions of spatial contrast values in natural images. A. Distribution of contrast measurement in natural images shows a characteristic S shape on a log plot (from Balboa & Grzywacz, 2003). B. The unconditional marginal distributions of the red-green and blue-yellow pairwise-difference (PD) detector responses r1 and r2 collected from 300 database images. Small-dashed line shows marginal distribution produced by the generative model shown in Figure 6.
Human-labeled image contours (Martin et al., 2001) were used to sort PD responses into ON-edge and OFF-edge classes (see Figure 1 caption for details), and the class-conditional distributions were collected (Figure 5). Despite the fact that the R-G and B-Y values at each pixel were nearly uncorrelated (r = 0.08, see Figure 2D), the values of r1 and r2 were moderately correlated within both ON-edge and OFF-edge classes (r = 0.36 and 0.43, respectively). Furthermore, as expected from previous studies (Wegmann & Zetzsche, 1990; Wainwright & Simoncelli, 2000; Schwartz & Simoncelli, 2001; Parra et al., 2000; Zetzsche & Röhrbein, 2001), the variance of either variable increased with the value of the other (Figure 5, lower panels). 
Figure 5
 
Class-conditional ON-edge and OFF-edge distributions for spatially superimposed R-G and B-Y edge detectors. Upper. Contour plots of r1 and r2. Given the sparseness of the human contour labeling, the OFF-edge distribution made up 97% of the collected data. Gray stripes show values of ri's used to collect conditional distributions shown in bottom row. Lower. One-dimensional distributions of r2 conditioned on four values of r1 (0, 0.2, 0.4, and 0.6) reveal a typical higher-order correlation (see text). Probabilities are on a log(10) scale.
A second feature common to the two joint distributions was the transition of contour shapes from relatively straight diagonal contours near the origin, to round contours in the intermediate ranges, to square contours far from the origin (Figure 5, upper panels). 
A generative model
We searched for a simple generative model that would (1) help explain how co-localized edge detectors operating in independent sensory channels could give rise to the particular form of correlated joint distributions shown in Figure 5, and (2) provide an avenue to recover the underlying edge variables, which we presumed to be class-conditionally independent. We proposed that the responses r1, r2, … rN of a collection of edge detectors applied at image location (x, y) and orientation θ are generated by a four-step process (Figure 6): 
Figure 6
 
Diagram of the saturated common factor generative model. Distal edge magnitudes e_i are assumed to be exponentially distributed both on and off edges. The block diagram applies to both cases, differing only in the mean 1/q of the exponential distributions. Distal edge values are multiplied by an exponentially distributed common factor C that affects all sensory channels. Each scaled variable R_i is then passed through a saturating nonlinearity g(x) to yield the measured edge-detector responses r_1, … r_N.
  1.  
    P(ON) denotes the prior probability that a physical edge exists at any given location/orientation, with P(OFF) = 1 − P(ON). In the first stage of the generative model, the ON or OFF class is chosen at random according to this prior probability, measured to be 3% in the labeled images.
  2.  
    If the ON-edge class is selected, the raw edge magnitudes e_1, … e_N in the N sensory channels are drawn independently from an exponential distribution with mean

    $$\langle e_i \rangle_{ON} = \frac{1}{q_{ON}}, \tag{11}$$

    that is,

    $$P_{ON}(e_i) = q_{ON}\, e^{-q_{ON} e_i}. \tag{12}$$

    Edge magnitudes evaluated at OFF-edge sites are also assumed to be independent across sensory channels and exponentially distributed, though with a smaller mean value:

    $$\langle e_i \rangle_{OFF} = \frac{1}{q_{OFF}} < \frac{1}{q_{ON}}, \qquad P_{OFF}(e_i) = q_{OFF}\, e^{-q_{OFF} e_i}. \tag{13}$$
    This difference in expected value reflects the fact that human contour labels were generally well aligned with local edge structures; the larger mean values for ON-edge responses based on human labeling can be clearly seen in Figure 5.
  3.  
    A “common factor” C representing the local lighting conditions, texture, or other modulatory influence is drawn from a third exponential distribution with mean ⟨C⟩ = 1/p, giving

    $$P(C) = p\, e^{-pC}. \tag{14}$$

    The factor C multiplies the raw edge strengths in all N sensory channels to give the scaled sensory responses

    $$R_i = C \cdot e_i. \tag{15}$$
  4.  
    Finally, the scaled sensor responses are passed through a saturating nonlinearity with knee K to give the measured edge detector responses

    $$r_i = g(R_i) = \frac{R_i}{R_i + K}. \tag{16}$$
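These four steps translate directly into a forward sampler. The sketch below is our illustration (parameter values taken from Table 1, introduced below); sampling from it reproduces joint statistics qualitatively like those of Figure 7:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scf(n, n_cues=2, p_on=0.03, q_on=1/0.563, q_off=1/0.086, p=1.0, K=1.0):
    """Draw measured edge responses r_i from the saturated common factor model.

    Follows Figure 6: (1) choose the ON/OFF class, (2) draw exponential edge
    magnitudes e_i with mean 1/q, (3) scale all channels by a shared
    exponential factor C with mean 1/p, (4) saturate with g(x) = x/(x + K).
    """
    on = rng.random(n) < p_on
    q = np.where(on, q_on, q_off)[:, None]
    e = rng.exponential(1.0, size=(n, n_cues)) / q
    C = rng.exponential(1.0 / p, size=(n, 1))
    R = C * e
    return R / (R + K), on

r, on = sample_scf(1_000_000)
print(np.corrcoef(r[~on, 0], r[~on, 1])[0, 1])  # OFF-edge correlation, cf. Figure 5
```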
By deriving an expression for the cumulative joint distribution of r1 and r2 from this “saturated common factor” model, and computing the partial derivative evaluated using either q = q_ON or q = q_OFF, we arrived at a parameterized expression for the class-conditional joint distributions P(r1, r2 ∣ ON) and P(r1, r2 ∣ OFF) (see the appendix). Maximum likelihood fits of the model to the empirical joint distributions of Figure 5 are shown in Figure 7. The model-generated distributions show both the gradual transitions from diagonal to square contours and the increasing variance of the conditional distributions. 
Figure 7
 
ON-edge and OFF-edge distributions generated by the SCF model using only a single parameter q (with p = K = 1). Plots are maximum likelihood fits to the distributions shown in Figure 5. Similar features include gradual transition from diagonal to square contours, and increasing variance of either variable conditioned on an increasing value of the other.
Surprisingly, only a single parameter was needed to generate each of the plots of Figure 7. Whereas each two-dimensional plot nominally depends on the three parameters p, K, and either q_ON or q_OFF, the SCF model is sensitive only to the product p·q·K (see the appendix). It is thus possible to fix both p and K to 1 and maximize the likelihood of the data varying only q. The values of q_ON and q_OFF found by this procedure are shown in Table 1. P(ON) was determined by estimating the fraction of pixels covered by human labels in the image database, and played no role in the fits shown in Figure 7.
Table 1
 
Parameters of the saturated common factor model used to generate the distributions in Figure 7.
Description                               Parameter             Value
Prior probability of edge                 P(ON)                 0.03
Mean raw ON-edge response                 ⟨e⟩_ON = 1/q_ON       0.563
Mean raw OFF-edge response                ⟨e⟩_OFF = 1/q_OFF     0.086
Mean of common factor                     ⟨C⟩ = 1/p (fixed)     1
Knee of sensor saturation function g()    K (fixed)             1
The optimal combination rule
Given the prior probability P(ON) and the joint distributions of co-localized edge detector responses for both ON-edge and OFF-edge classes, the combination rule P(ON ∣ r1, r2) follows directly from the application of Bayes' rule (Equation 3). Contour and surface plots of the combination rule derived from the empirically tabulated likelihood functions (Figure 5) are shown in Figure 8A. For comparison, the combination rule derived from the modeled likelihoods (Figure 7) is shown in Figure 8B. The two functions show a similar progression of contour shapes moving away from the origin, reminiscent of the progression of contour shapes in the underlying class-conditional likelihood functions. One notable feature of the combination rule is that near the origin, where the two cues are both weak, the more diagonally oriented contours mean the edge probability is closer to a function of the sum (or average) of the two cues, whereas the square-shaped contours far from the origin indicate that the edge probability there is governed roughly by the MAX of r1 and r2. The fact that the combination rule expressed in terms of the unprocessed edge detector responses r1 and r2 defies simple description is in keeping with the fact that the raw edge values satisfy neither of the preconditions for a linear combination rule of the form given in Equation 7. We conjectured that an appropriate normalization of r1 and r2, obtained by inverting the SCF model, could recover the two exponentially distributed underlying physical edge variables e1 and e2 if they exist, that is, if the edge responses obtained from the human-labeled images were in fact generated by an SCF-like process. 
Figure 8
 
Plots of the combination rule P(ON ∣ r1, r2). A. Contour and surface plots of Equation 7 using measured likelihood tables P(r1, r2 ∣ ON) and P(r1, r2 ∣ OFF). B. Corresponding plots derived from the saturated common factor–generated likelihoods using the parameters in Table 1. C. Plots generated directly from edge-detector responses using Equations 18, 21, and 22. Sigmoid was steeper than in A, though contour shapes near the decision point (i.e., the 50% probability level near the 8th contour) were quite similar.
Recovering the CCI edge variables: First expand, then divide
Given the form of the SCF model, recovery of the underlying variables e_i from measured edge detector responses involves two steps. First, each variable is expanded through the function h(r) = g⁻¹(r) = Kr/(1 − r), with K = 1, to undo the effect of the compressive nonlinearity that we assume has acted on the edge-detector outputs. Applying the expansive nonlinearity unbends the contours found in the class-conditional joint distributions of r1 and r2, giving rise to the straight diagonal contours in the joint distributions of the intermediate variables R1 and R2 (Figure 9A). The contours of P(R1, R2 ∣ OFF) are not, however, equally spaced as they would be in a log plot for two independent exponential variables. Thus, the expansive nonlinearity leads to a simpler relationship between the two edge detector values, but does not remove their higher-order statistical dependencies. 
Figure 9
 
Two-step saturated common factor (SCF) normalization procedure. A. Each detector response is passed through the expansive nonlinearity h(x), which inverts the presumed saturating nonlinearity g(x); this results in diagonal, though unevenly spaced, contours in the joint distribution. B. Estimated value of C from Equation 17 plotted vs. actual value in a Monte Carlo simulation of the SCF model using the parameters in Table 1. Correlation coefficient is r = 0.75; correlation between the sum-of-squares normalizer and the actual C was 0.47. C. The R_i are then divided by Ĉ, leading to a close-to-independent exponential joint distribution. D. Conditional slices are now nearly superimposed on each other (slices taken at e1 = 0, 1, 2, and 3), indicating higher-order dependencies have been largely eliminated—compare to the lower row in Figure 5. E and F. Same as C and D but for ON-edge distributions.
The second step in inverting the SCF model is to divide the intermediate variables R_i by the most probable common factor, approximated as

$$\hat{C} \approx \frac{1}{2p}\left(\sqrt{N^2 + 4\, p\, q_{OFF} \sum_i R_i} - N\right), \tag{17}$$

where N = 2 is the number of available cues (see the appendix for the derivation of Equation 17). This approximation leads in turn to an approximation of the most probable distal edge value

$$\hat{e}_i = \frac{R_i}{\hat{C}} = \frac{h(r_i)}{\hat{C}} \approx \frac{2p \cdot h(r_i)}{\sqrt{4\, p\, q_{OFF} \sum_{j=1}^{N} h(r_j) + N^2} - N + \varepsilon}. \tag{18}$$

For analytical tractability, Equation 17 approximates the most probable value of C given R1 and R2 rather than its (presumably larger) mean value. To adjust for the underestimate, we added a constant ε = 0.5 to the denominator of Equation 18. The primary effect of the constant was to expand the range of {e1, e2, …} tuples produced by the normalization to include points near the origin. Divide-by-zero errors that occur when r1 = r2 = 0 were also avoided. 
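In code, the two-step normalization is compact. The sketch below is our reading of Equations 17 and 18 (q_OFF from Table 1, p = K = 1, ε = 0.5 as above):

```python
import numpy as np

def scf_normalize(r, q_off=1/0.086, p=1.0, K=1.0, eps=0.5):
    """Two-step SCF normalization (Equations 17 and 18).

    r : (n_samples, N) array of raw edge-detector responses in [0, 1).
    Step 1 expands each response with h(r) = K*r/(1 - r), inverting the
    presumed saturation; step 2 divides by the estimated common factor.
    """
    R = K * r / (1.0 - r)                     # expansive nonlinearity h
    N = r.shape[1]
    root = np.sqrt(N**2 + 4 * p * q_off * R.sum(axis=1, keepdims=True))
    # (root - N) equals 2*p*C_hat, the most probable common factor (Equation 17)
    return 2 * p * R / (root - N + eps)       # Equation 18

r = np.array([[0.3, 0.5], [0.0, 0.0]])
print(scf_normalize(r))  # the all-zero row stays finite thanks to eps
```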
Equation 18 is similar in spirit to previously proposed normalization schemes, in that the raw filter responses are first passed through an expansive nonlinearity and then divisively normalized by a term involving a sum of the expanded variables. The details are different, however. In particular, the function h(r) appears in lieu of the more common squaring nonlinearity (Heeger, 1992; Bonin et al., 2005; Wainwright & Simoncelli, 2000; Parra et al., 2000; Schwartz & Simoncelli, 2001; Zetzsche & Röhrbein, 2001); the divisive factor here is also sensitive to the number of available cues through the value of N.
The common factor can be rewritten as

$$\hat{C} \approx \frac{2\, q_{OFF}\, \langle R \rangle}{1 + \sqrt{1 + 4\, p\, q_{OFF}\, \langle R \rangle / N}}. \tag{19}$$

This form makes it clear that when N grows large, the most probable common factor simplifies to

$$\hat{C} \approx q_{OFF} \cdot \langle R \rangle, \quad \text{for } N \gg 1. \tag{20}$$

This means that when many independent sensory cues are available, the average of the expanded cues ⟨R⟩ converges to the common multiplier, up to the constant of proportionality q_OFF. When the number of cues is small, however, the sparse prior for C causes Ĉ to grow sublinearly with increasing values of ⟨R⟩. 
Testing the normalization
To test the quality of the estimates of C (via Equation 17) when only two color edge values are available, two channels of data (r1 and r2) were generated by drawing from the SCF model using the parameters of Table 1. A scatter plot of C vs. Ĉ is shown in Figure 9B. The correlation between estimated and actual values of C was r = 0.75. When a sum-of-squares normalizer (Equation 23) was used in lieu of Equation 17, estimates of C were degraded (r = 0.46), though this was partly to be expected given that the data in this case were known to be generated by the SCF model. 
A more demanding test of the normalization formula would involve edge data from natural images, where the underlying generating process is unknown. Again using the estimate of C given by Equation 17, we tested whether the normalization of Equation 18 would lead to CCI exponentially distributed edge values e1 and e2—a premise of the SCF model. The joint distributions of the post-normalization edge variables are shown in Figures 9C–9F. The roughly evenly spaced diagonal contours (Figures 9C and 9E) and the nearly overlapping conditional probability slices (Figures 9D and 9F) confirm that the higher-order correlations present in the measured filter values r1 and r2 have been mostly eliminated, and that the resulting CCI edge variables are approximately exponentially distributed. As can be seen from the greater residual dependencies in the ON-edge data (compare panels D and F in Figure 9), the parameters of Table 1 were optimized to model OFF-edge statistics, since those data accounted for 97% of the total data set. 
Given that the two requirements for a linear combination rule are approximately met by e1 and e2, that is, class-conditional independence and an exponential ratio of the two class-conditional distributions, we expected from Equation 7 that the combination rule derived empirically from the normalized edge variables e1 and e2 would be a linear combination of the two variables passed through a sigmoidal output function

$$P(ON \mid r_1, r_2, \ldots) = \frac{1}{1 + \exp\left[-\left(\sum_{i=1}^{N} w_i e_i - D\right)\right]}, \tag{21}$$

where the e_i are computed as in Equation 18, and assuming the cues have equal means both on and off edges, the sigmoid steepness and threshold parameters would be

$$w_i = q_{OFF} - q_{ON}, \qquad D = \log\left[\left(\frac{q_{OFF}}{q_{ON}}\right)^{N} \frac{P(OFF)}{P(ON)}\right]. \tag{22}$$
The diagonal, sigmoidally spaced contours of the empirical combination rule shown in Figure 10 confirm this. 
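On top of the normalized cues, the combination rule is a weighted sum and a sigmoid; a minimal sketch of Equations 21 and 22 (our illustration, with Table 1's values as defaults):

```python
import numpy as np

def p_on_given_cues(e, q_on=1/0.563, q_off=1/0.086, p_on=0.03):
    """Linear combination rule for SCF-normalized cues (Equations 21 and 22)."""
    w = q_off - q_on                                        # common weight w_i
    D = e.shape[1] * np.log(q_off / q_on) + np.log((1.0 - p_on) / p_on)
    return 1.0 / (1.0 + np.exp(-(w * e.sum(axis=1) - D)))

e = np.array([[0.1, 0.1],    # two weak normalized cues -> low edge probability
              [1.5, 2.0]])   # two strong cues -> probability near 1
print(p_on_given_cues(e))
```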
Figure 10
 
Linear combination rule after saturated common factor normalization. A. Contour and surface plots show that the combination rule is a sigmoidal function of the sum of the two normalized edge detector responses e1 and e2.
Comparison to another divisive normalization scheme and generality
We asked whether the details of the SCF normalization scheme are critical for eliminating the higher-order statistical dependencies between co-localized edge detectors, or whether any similar scheme would perform in a roughly equivalent way. We also asked whether good performance of the SCF normalization scheme might be tied to the particular edge-detection filter we used up to this point—the PD filter—or whether the scheme would also work well when using a more conventional edge detecting filter. 
Previously proposed schemes for divisive normalization of sensory signals have often involved an energy-like computation in which measured filter values are squared and summed over a neighborhood, in effect a measure of local contrast (Heeger, 1992; Parra et al., 2000; Schwartz & Simoncelli, 2001; Zetzsche & Röhrbein, 2001; Carandini et al., 1997; Bonin et al., 2005). For example, Schwartz and Simoncelli (2001) proposed a normalization of the form

$$e_i = \frac{r_i^2}{\sum_{j \neq i} w_j r_j^2 + \sigma^2}, \tag{23}$$

in which the w_j and σ are constants. We compared the SCF normalization with the Schwartz-Simoncelli (S-S) equation, which we extended to allow for an arbitrary exponent k:

$$e_i = \frac{r_i^k}{\sum_{j \neq i} w_j r_j^k + \sigma^2}. \tag{24}$$
 
Lacking a likelihood function comparable to that derived for the SCF model, the parameter k was optimized by systematic search with visual examination of the results. Results for S-S normalization applied to the PD edge-detector data set are shown in Figure 11A. In comparison to Figures 9C–9F, the post-normalized ON-edge and OFF-edge distributions using the S-S normalization show greater residual higher-order dependencies, and lead to a combination rule with pronounced nonlinear interactions between the normalized edge variables. A similar pattern is seen when comparing the SCF and S-S normalization schemes using a Gabor filter as the edge-detection operator (Figure 11B). 
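For reference, the generalized S-S normalizer of Equation 24 is equally compact; a sketch (ours) with the parameters reported below for the PD data set and equal weights across channels:

```python
import numpy as np

def ss_normalize(r, k=2.5, sigma=0.74, w=1.0):
    """Generalized Schwartz-Simoncelli normalization (Equation 24).

    Each response is raised to the power k and divided by a weighted sum of
    the other channels' k-th powers plus sigma^2.
    """
    rk = r**k
    others = rk.sum(axis=1, keepdims=True) - rk   # sum over j != i
    return rk / (w * others + sigma**2)

r = np.array([[0.3, 0.5]])
print(ss_normalize(r))
```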
Figure 11
 
Comparison of saturated common factor (SCF) normalization to that proposed by Schwartz and Simoncelli (2001). A. Class-conditional distributions and combination rule after S-S normalization. ON and OFF distributions retain significant higher-order correlations (compare to Figure 9). The resulting combination rule based on normalized variables remains nonlinear (compare to Figure 10). S-S parameters were σ = 0.74, k = 2.5, and w = 1.0. B. Comparison between SCF and S-S normalization when edges were extracted using conventional Gabor rather than pairwise-difference filters applied separately to the R-G and B-Y color channels (σw = 1, σh = 4, and λsin = 4). SCF normalization again leads to simpler, nearly independent exponential joint distributions, and the resulting SCF-derived combination rule is correspondingly more linear. SCF parameters for Gabors determined from ML fit: K = p = 1 (fixed) and q = 192; S-S parameters: k = 1.5, σ = 0.3, and w = 0.8.
Overall, the SCF normalization leads to a simpler pattern of results, including a more complete elimination of the higher-order dependencies, a nearly exponential appearance of the post-normalized variables, and an approximately linear combination rule. This suggests that the SCF model, by assuming underlying exponentially distributed variables, an exponentially distributed common factor, and a compressive nonlinearity applied separately to each channel, provides a fairly good representation of the process leading from physical edges to measured edge detector responses. 
Color edge detection using the SCF normalization
Examples of images processed using the SCF-derived cue-combination scheme (Equations 18, 21, 22, and Figure 8C) are shown in Figure 12. All three color channels were used (i.e., including the intensity channel). The lightness value at each pixel is tied to the maximum value of P(ON ∣ r1, r2, r3) over the 8 cue-combined orientation channels analyzing that pixel. 
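In pipeline form, the per-pixel output can be sketched as follows. This is our illustration: the array layout, the placeholder responses, and the reuse of Equation 18 with the small ε of Figure 12 are assumptions, and the final sigmoid is replaced by simple contrast enhancement in the actual figure.

```python
import numpy as np

def edge_map(r, q_off=1/0.086, eps=0.001):
    """Per-pixel edge strength from multi-channel, multi-orientation responses.

    r : (n_orientations, n_channels, H, W) raw PD responses in [0, 1).
    Each orientation's channels are SCF-normalized (Equation 18) and summed;
    the maximum over the orientation channels is kept at each pixel.
    """
    n_ch = r.shape[1]
    R = r / (1.0 - r)                                       # h(r) with K = 1
    root = np.sqrt(n_ch**2 + 4 * q_off * R.sum(axis=1, keepdims=True))  # p = 1
    e = 2 * R / (root - n_ch + eps)                         # Equation 18
    return e.sum(axis=1).max(axis=0)                        # sum channels, max over orientations

r = np.random.rand(8, 3, 32, 32) * 0.8                      # placeholder responses
print(edge_map(r).shape)                                    # (32, 32)
```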
Figure 12
 
Sample edge-detected images. Pairwise-difference edges were run in 3 color channels, including intensity channel O3. Edge values were normalized with Equation 18 (with ε = 0.001) and summed. Parameters were as in Table 1. Given uncertainty in the true prior P(ON), in lieu of the final sigmoidal operation (Equations 21 and 22), which depends strongly on P(ON), we used ordinary contrast enhancement (+55 setting in Adobe Photoshop). Gray level is thus monotonically related to the max value of P(ON ∣ r1, r2, r3) over the 8 orientation channels at each pixel. The same parameters were used for all 4 images.
Discussion
The ability to make optimal use of available cues for the detection of object boundaries is critical for survival. We have pointed out that the problem of combining multiple cues for edge detection lies at a crossroads between theories of optimal cue combination and natural image statistics. Our approach was motivated by the idea that biological and machine vision systems should not only strive to combine edge cues optimally, but for reasons of parsimony, they should be structured to allow easy incorporation of additional cues whenever they become available in the course of evolution. One means to achieve easy extensibility is to design the system so that edge cues can be combined linearly. In this way, new cues that become available can simply be added into the existing mix with appropriate coefficients. 
A Bayesian formulation of the problem tells us that a linear combination rule is optimal when the cues are class-conditionally independent and satisfy certain distributional assumptions (Jacobs, 1995; see “When a linear combination rule is best” section). In color edge detection, class-conditional independence would mean that in the class of all true edges (and similarly for the class of all non-edges), the red-green edge score carries no information about the blue-yellow or luminance edge scores, and vice versa. Statistical independence is a frequent topic of discussion in connection with sparse V1-like image representations, including results showing V1-like-oriented filters emerge from independent components analysis (Olshausen & Field, 1996; Bell & Sejnowski, 1995). In actuality, however, when filter responses are collected from nearby locations in natural images, strong statistical dependencies are observed between them (Wainwright & Simoncelli, 2000; Schwartz & Simoncelli, 2001; Liang et al., 2000; Zetzsche & Röhrbein, 2001; Parra et al., 2000; Wainwright, Schwartz, & Simoncelli, 2001; Fine et al., 2003; Karklin & Lewicki, 2003, 2005). 
What is the source of these dependencies in the context of color edge detection? Even assuming the luminance, red-green, and blue-yellow values at each pixel are statistically independent in natural images (they are not), it does not follow that edge detectors operating separately on the three color channels would be statistically independent. There are at least two reasons for this. First, the appearance and disappearance of physical edges within the overlapping receptive fields of two or more co-localized detectors will induce a dependency between them. This is because an edge is a space-occupying object whose occurrence will as a rule boost edge detector responses in all color channels simultaneously. Isoluminant, iso-R-G, or iso-B-Y edges are counterexamples to this rule, but such cases are relatively rare. Second, even when the joint statistics of two or more co-localized edge detectors are conditioned on the presence or absence of a true edge, thus factoring out the spatial edge–induced correlation, numerous studies of natural image statistics have shown that regional lighting, geometric, or texture variations can introduce higher-order correlations between nearby detectors (Wegmann & Zetzsche, 1990; Wainwright & Simoncelli, 2000; Schwartz & Simoncelli, 2001; Parra et al., 2000; Zetzsche & Röhrbein, 2001; Karklin & Lewicki, 2003, 2005). A prime example is the modulatory effect of lighting intensity (e.g., sun vs. shade) on the outputs of all of the spatial filters within a local neighborhood. 
In point of fact, therefore, co-localized edge detectors exhibit both (1) a correlation arising from their spatially overlapping receptive fields and co-responsiveness to most physical edges—this is a “good” kind of correlation, akin to the consensus one seeks from a panel of medical experts who use different methods to diagnose the same disease—and (2) a “bad” kind of higher-order correlation induced by lighting, geometric, or texture-related factors that modulate the spatial filters in a region up and down together, hence the term “common factor.” It is these undesirable correlations that lead a set of co-localized edge detectors to violate the class-conditional independence assumption. And according to cue-combination theory, this makes edge cues in their un-normalized form more difficult to combine. 
We conclude that an extensible boundary-detecting system should set up a first layer of circuitry that attempts to eliminate the higher-order correlations between edge cues using an appropriate divisive normalization scheme, while leaving the good, target-related correlations intact. A number of divisive normalization schemes have been previously proposed in addition to the one we describe here. Assuming the post-normalization cues satisfy the required distributional assumptions (see “When a linear combination rule is best” section), they may simply be added to obtain a measure of the overall edge probability at any given location. With this scenario in mind, we developed a simple generative model to explain certain qualitative features of the joint distributions of two color edge channels operating on and off edges in complex indoor and outdoor scenes. We found that the SCF model with the parameters in Table 1 generated good facsimiles of our empirical data based on human-labeled contours, including the progression from diagonal to square contours seen in the ON-edge and OFF-edge distributions (Figure 5A), and the general form of the higher-order correlation between the two edge variables (Figure 5B). A more stringent validation of the generative model lay in its ability to selectively eliminate the undesirable higher-order statistical dependencies between co-localized edge detectors. To verify this, we inverted the SCF model to arrive at a two-stage normalization scheme that is similar in spirit to, but different in the details from, previously proposed normalization schemes. When we applied the SCF-derived normalization scheme to the R-G and B-Y edge responses measured in the human-labeled data set, we recovered nearly independent exponentially distributed edge variables, and correspondingly, a linear combination rule. That the SCF-normalized color edge values would be class-conditionally independent, exponentially distributed, and would lead to a linear combination rule was not a foregone conclusion. For comparison, we applied an alternative (sum-of-squares) normalization scheme and found that it was less well suited to normalize and combine the edge cues in our data set under two different assumed edge detector models (PD and Gabor) (Figure 11). This suggests that the details of the normalization scheme can be quite important, and that the SCF model may provide a particularly good description of the joint response statistics of co-localized edge detectors in natural images. 
Appropriateness of the human-labeled data set
The human-labeled boundaries in the Martin et al. (2001) data set were used to sort local oriented edge-detector responses into ON-edge vs. OFF-edge classes. The appropriateness of this data set for our purposes might be questioned, given that the human labelers clearly focused their attention on major object boundaries rather than local edges (Figure 1). Many strong local edge responses would have been missing from the ON-edge class and misclassified as OFF-edge data. In addition, our automated edge-sorting algorithm included in the ON-edge class any response from an edge filter lying underneath and roughly aligned with a label from any of the 5 human subjects. This criterion was probably too liberal, leading to the inappropriate assignment of some weak filter responses to the ON-edge class. The combined effect of these two choices means that our ON-edge distribution probably contained an excess of weak edge values and a shortage of strong ones, and vice versa for the OFF-edge distribution. The prior probability of an edge P(ON) could also have been biased away from its true value, but it is difficult to say in which direction. 
Despite these non-ideal properties of our data set, it seems unlikely that a change in the human labeling strategy, or an improvement in our mode of collecting ON-edge vs. OFF-edge data from the labeled images, would change our results in any fundamental way. Recall that the SCF-derived normalization, whose parameters were fit based on the human-labeled data set, led to post-normalized color edge values that were close to exponentially distributed and class-conditionally independent. Moreover, judging by the comparison to another (sum-of-squares) normalization method (Figure 11), the simplification that we observed both in the joint distributions and in the combination rule after applying the SCF normalization was not an inevitable outcome. It seems unlikely that the failure of human subjects to label many bona fide local edges, and the overly inclusive criterion we used to define the ON-edge class based on the human labels, would conspire to simplify the statistics of the data set in such a way that the distributional assumptions of the SCF model would be accidentally satisfied. Nonetheless, it would be interesting to apply the SCF model and its corresponding normalization scheme to a labeled data set that emphasizes local edges rather than major object boundaries (Wilson, Ing, & Geisler, 2006).
Are ON-edge and OFF-edge structures fundamentally different?
It is at first surprising that the joint distributions in Figure 5 show a similar kind of higher-order correlation in both ON- and OFF-edge data. This observation clashes with the intuitive notion that edges constitute different "stuff" than non-edges, and should therefore react differently to the modulatory influences that generate higher-order correlations between co-localized edge detectors. However, the natural world does not in fact contain a clean dichotomy between edges and non-edges, any more than it contains a clean dichotomy between chairs and non-chairs or any other natural category. Rather, an edge detector presented with an image patch asks the question, "How good an edge do I see at my preferred location, orientation, and scale?" The answer is continuous-valued rather than categorical: a measure of the match between the complex, variegated local image structure and the edge detector's canonical multi-dimensional tuning curve. The key observation is that a local image structure rated as a poor edge by one particular detector, and thus relegated to that detector's OFF-edge distribution, will often qualify as a good edge at a nearby location, orientation, and/or scale. It follows that all edge-detector responses evoked by a local image structure, including both the strong responses evoked in well-matched edge channels and the weak responses evoked in poorly matched channels, are subject to the same regional scaling factors that generate higher-order correlations between co-localized edge channels. This may explain why higher-order correlations are seen even in the OFF-edge distribution for any given filter. Put another way, an OFF-edge distribution might be more appropriately labeled "NOT-ON-MY-edge."
When the use of human labels as ground truth isn't circular
In a statistical approach to cue combination, a kind of circularity would seem to arise when a cue-combining edge-detection system, such as a human observer, is used to sort data into ON-edge and OFF-edge ground truth categories, with the intention of using that ground truth to evaluate the performance of another cue-combining system that is specifically designed to be a model of the first. In the limit in which the model system becomes equivalent to the ground truth–generating system in design and sophistication, the likelihood functions P(cues∣ON) and P(cues∣OFF) would degenerate to reflect the deterministic assignment of inputs to ON- and OFF-edge categories by the ground truth classifier. That is, the ON-edge cue distribution would have non-zero probability domains exactly complementary to those of the OFF-edge distribution (unlike the distributions of Figure 5, which contain no regions of zero probability), and the combination rule, unlike the rules shown in Figure 8, would be a binary-valued function that perfectly replicates the ground truth classifications. The reason this degeneracy does not occur in the present context is that the human-drawn labels used to classify pixels into ON- and OFF-edge categories are the product of an extremely sophisticated long-range contour processing network residing within the human visual system. This ultra-high-end contour-processing network has far more information about object contour structure available to it than is available to any collection of local edge detectors, and far more sophisticated machinery for processing that information. The inability of a set of cues to perfectly reproduce the ground truth classifications, as indicated in our case by the heavily overlapping ON-edge and OFF-edge distributions, attests to the non-circularity of the approach.
Rationale for the compressive nonlinearity in the SCF model
A key feature of the SCF model is the compressive nonlinearity g(x) applied to each filter channel after it has been scaled by the common factor. One rationale for including this nonlinearity in the generative model for edges is that the dynamic range of distal edge magnitudes that can occur within natural scenes far exceeds the dynamic range of the physical devices—the neurons or camera pixels—that are designed to represent those values. For example, consider the border between two dark matte surfaces whose luminances differ by only 5%, vs. the border between a dark matte surface and a bright specular surface whose luminances can easily differ by a factor of 100. The need to cope with such a large range of edge magnitudes, in some cases within the same scene, means that edge-representing variables within a typical physical edge-detection system will, for practical reasons, carry range-compressed signals.
Beyond the need to cope with dynamic range limitations, applying a compressive transform to a low-level sensory variable can have representational advantages as well. For example, a logarithmic function has the convenient property that just-noticeable differences in stimulus intensity will be proportional to the absolute stimulus magnitude (subject to certain assumptions); this is the basis of the Weber-Fechner psychophysical scaling law. A compressive transform may also simplify the joint distributions of two or more sensory variables: Ruderman et al. (1998) found that a logarithmic transform of retinal cone responses led to a simpler, more symmetrical joint distribution of pixel data in a PCA-derived color-opponent space like that found in the retina. Compressive transforms can also increase coding efficiency in sparsely distributed signals (Laughlin, 1981). Thus, on both practical and representational grounds, it seems reasonable to include an explicit compressive nonlinearity in a generative model for low-level visual cues. As a cautionary note in interpreting neural data, however, the fact that a neuron's response saturates with increasing stimulus intensity does not necessarily signify that the system has applied an explicit compressive nonlinearity to an underlying sensory variable. For example, a sublinear stimulus-response curve can arise as a byproduct of a divisive normalization of the very kind we are concerned with here (e.g., see Bonin et al., 2005). In particular, following the Bayesian logic of the SCF and other similar models, a sublinear response to increasing stimulus intensity may reflect (1) the system's assumption that the measured stimulus intensity is the product of a true feature value and a contaminating multiplicative factor, (2) the Bayesian inference that an increase in stimulus intensity is partly due to an increase in the contaminating factor, and (3) the fact that division by an ever-increasing factor drags the curve down into the sublinear range. Viewed in this way, the cell's sublinear stimulus-response curve simply reflects the system's best guess as to the distal feature's true value given the magnitude of the conflated stimulus presented as input. The situation could be more complicated than this, however, since the above argument does not rule out that an overt compressive nonlinearity has been applied after the division operation, for range compression or other purposes. 
Comparison to other normalization schemes
The inclusion of an explicit compressive nonlinearity in the generative (i.e., forward) direction of the SCF model underscores an interesting difference between it and related models. In virtually all normalization schemes, including the SCF normalization, the input variables feeding into the common factor estimate (i.e., the denominator) are run through an expansive nonlinearity before they are summed (Equation 17). In the SCF model, the purpose of this operation is transparent: the expansive function h(x) is the inverse of the compressive function g(x) that is presumed to have acted on the variables prior to their arrival as inputs. In other normalization schemes, the expansive nonlinearity is most often a squaring operation (Heeger, 1992; Carandini et al., 1997; Parra et al., 2000; Schwartz & Simoncelli, 2001; Zetzsche & Röhrbein, 2001; Bonin et al., 2005). The rationale for the squaring nonlinearity, however, is not that it inverts a square-root operation that was applied earlier in the generative process. Rather, the square can be traced to the idea that the common factor modulates the variance or "energy" of the population response (Heeger, 1992), or to the assumption that the underlying sensory variables are Gaussian distributed (Wainwright & Simoncelli, 2000).
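The contrast between the two rationales can be made concrete with a toy calculation. Assuming, as in the SCF analysis, that responses were generated by compressing scaled edge values through g(R) = R/(R + K), the matched expansive function h(r) = Kr/(1 − r) recovers the scaled values exactly, whereas a generic squaring nonlinearity yields a different quantity. A minimal sketch:

```python
import numpy as np

K = 1.0
R = np.array([0.1, 1.0, 10.0])   # "true" common-factor-scaled edge values
r = R / (R + K)                  # measured, range-compressed responses

print(K * r / (1.0 - r))         # h(r) = g^{-1}(r): recovers 0.1, 1.0, 10.0 exactly
print(r**2)                      # squaring: ~0.008, 0.25, ~0.83 -- a different quantity
```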
Linking the compressive nonlinearity to MAX-like behavior
Whatever its source or purpose, a compressive nonlinearity profoundly affects the joint distributions of the measured sensory variables as well as the form of the optimal combination rule (Figure 13). It also points to an unexpected connection between compressed sensory variables and combination rules with MAX-like behavior. Consider a cue-combination system whose input variables have been compressed by a sublinear function g(x), where g(x) can be thought of as a power function with a gradually decreasing exponent; both g(x) = log(1 + x) and g(x) = x/(x + K) have this property, whereas g(x) = x^{1/2} does not, since its exponent (1/2) is constant. If the combination rule is a function of a sum of the original uncompressed variables, P(T∣x_1, x_2, …, x_N) = f(x_1 + x_2 + … + x_N), then the combination rule plotted in the space of the original variables x_i will have straight diagonal contours—this is true by definition. The output function f(x) changes only the contour spacing. In contrast, when the combination rule is expressed in terms of the sensory variables after they have been compressed, that is, where each original coordinate x_i has been replaced by g(x_i), then the combination rule will have iso-response contours more like those in Figures 8, 13B, and 13C (but not 13D), progressing from close to diagonal near the origin to squarish as any one of the compressed variables grows large. Such a "LIN-MAX" progression of contours can be modeled as a family of curves of constant p-norm
$$
r_p = \left( x_1^{p} + x_2^{p} + \cdots + x_N^{p} \right)^{1/p} \tag{25}
$$
with steadily increasing values of p. Depending on the range of p values encountered, which depends on the function g(x), the combination rule is approximately a function of the Hamming (city-block) length of the input vector (p = 1) when the inputs are all weak; the Euclidean length (p = 2) in intermediate zones; and the MAX of the inputs (p = ∞) when any one or more of the input variables grows large.
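The decreasing-exponent property, and the resulting contour warp, are easy to check numerically. The sketch below estimates the local power-law exponent d log g / d log x for the three functions named above (taking K = 1 in x/(x + K)), and re-expresses the linear rule x1 + x2 in compressed coordinates to expose the LIN-MAX progression; the evaluation points and grid ranges are arbitrary choices.

```python
import numpy as np

def local_exponent(g, x, dx=1e-5):
    """Local power-law exponent d(log g)/d(log x), by central differences."""
    return x * (g(x + dx) - g(x - dx)) / (2.0 * dx) / g(x)

for x in (0.1, 1.0, 10.0, 100.0):
    print(x,
          local_exponent(np.log1p, x),                 # exponent falls toward 0
          local_exponent(lambda t: t / (t + 1.0), x),  # exponent falls toward 0
          local_exponent(np.sqrt, x))                  # stays fixed at 0.5

# A linear rule x1 + x2, re-plotted in the compressed coordinates u_i = g(x_i)
# with g(x) = log(1 + x): contours of z over (u1, u2) show the LIN-MAX warp,
# nearly diagonal near the origin and squarish when either coordinate is large.
u1, u2 = np.meshgrid(np.linspace(0.0, 4.0, 200), np.linspace(0.0, 4.0, 200))
z = np.expm1(u1) + np.expm1(u2)   # expm1 inverts log1p
```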
Figure 13
 
A compressive nonlinear transform warps a linear combination rule. A. Three compressive nonlinear functions; the first two grow with progressively decreasing exponent, while the square root grows with constant power. B–D. Contour plots of a linear combination rule f(x1, x2) = x1 + x2 expressed in the compressed coordinates g(x1) and g(x2), where the particular g(x) function is shown in each inset. The LIN-MAX pattern is present in B and C, but not D. z-axis values are plotted on a log scale, affecting only the spacing of the contours. The output function f(x) is linear in these examples; substituting a nonlinear output function would again affect only the spacing of the contours.
Thus one effect of the compressive nonlinearity is to make a single strong cue more powerful than several weak cues when the cue vectors are equated for length, usually taken to mean the Hamming or Euclidean length. In the context of edge detection, a LIN-MAX rule can be understood intuitively in this way: When all of the available edge detectors at a point in an image are responding weakly, corresponding to the linear range of g(x) along each cue dimension, the combination rule P(ON∣r1, r2, …) depends on the average of the detector responses—we may say this since the average is proportional to the sum. Averaging is an appropriate strategy when the goal is to suppress independent noise. In the MAX-like regime, on the other hand, a strong response in any single edge channel corresponds to an overridingly strong distal edge value in that channel, owing to the range compression that has previously occurred. Assuming that an edge is almost certain to be present at an image location based on a single strong cue, the edge probability has little room to increase as other cues are added. The MAX-like regime is thus the result of a probability ceiling effect.
The difference between linear and MAX-like modes of cue combination becomes exaggerated as the number of cues increases. For example, if 5 cues are available and each has a value ranging from 0 to 1, a cue vector (1, 0, 0, 0, 0) is as strong a combination as cue vector (1, 1, 1, 1, 1) according to a MAX rule, while a linear rule would greatly overestimate the strength of the second input in this case. Likewise, a vector of balanced weak cues (0.05, 0.05, 0.05, 0.05, 0.05) would be as powerful a cue combination as (0.2, 0, 0, 0, 0) for a linear rule, but would have its strength greatly underestimated by a pure MAX rule. The lesson here is that when input variables have been range compressed, a fixed combination rule that depends inalterably on either the sum or the MAX of the cues will yield poor estimates of edge probability for some cue combinations. This provides the rationale for first applying an expansive nonlinearity to the input variables—the first stage of the SCF normalization—to convert a LIN-MAX combination rule into a simpler linear one. 
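The arithmetic in this example is easy to tabulate. In the sketch below (pure illustration, no model parameters involved), the SUM and MAX columns reproduce the over- and underestimates described above.

```python
import numpy as np

cue_vectors = {
    "one strong":   np.array([1.0, 0.0, 0.0, 0.0, 0.0]),
    "five strong":  np.ones(5),
    "five weak":    np.full(5, 0.05),
    "one moderate": np.array([0.2, 0.0, 0.0, 0.0, 0.0]),
}
for name, v in cue_vectors.items():
    # SUM treats five strong cues as 5x one strong cue; MAX treats five weak
    # cues as no better than a single 0.05 cue.
    print(f"{name:12s}  sum={v.sum():.2f}  max={v.max():.2f}")
```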
The connection between range compression of sensory cues and LIN-MAX combination rules has interesting implications for neural integration of sensory signals, and for cue combination at the perceptual level. Range compression through a logarithmic or similar transform is a common feature of biological sensory systems, suggesting that a LIN-MAX pattern of summation of sensory cues might be found at the single neuron level. This possibility could be tested in a cortical neuron that receives input from two or more independent sensory channels, for example, color-luminance cells in primate V1 (Johnson, Hawken, & Shapley, 2001). Assuming an idealized LIN-MAX combination rule and a well-behaved output function (e.g., corresponding to a typical F-I curve), the firing rate of such a cell should increase roughly linearly with the superposition of weak sensory cues, but proportionally less as progressively stronger cues are combined, and should show response saturation for even a single strong cue. As a corollary, wherever multiple cues are integrated, one strong cue should be a more effective stimulus than multiple weak cues—assuming the cue vectors are equated using standard 1-norm or 2-norm measures.
As a caveat, given uncertainties in the precise form of g(x) and the output function f(x), the strongest predictions regarding a neuron's response to multiple cues derive from the shapes of the iso-response contours. This is because the contours of the combination rule can be determined experimentally without specific knowledge of either f or g, as long as the input cue values r1, r2, …, rN are known and can be manipulated. More problematic are predictions framed in terms of a neuron's "summation arithmetic" (i.e., whether summation of responses to two or more stimuli is linear, sublinear, MAX-like, etc.). Response magnitudes depend in nontrivial ways on both f and g and on interactions between them, and these functions may be difficult to determine in specific cases. Furthermore, summation arithmetic can be profoundly affected by competitive interactions among stimuli presented simultaneously within a cell's receptive field, including division-like normalizing operations of the kind under consideration here. In contrast to measures of summation arithmetic, the iso-response contours of a cue-combination cell can be identified without knowing f or g. Iso-response contours can even remain stable after the input cues have been divisively normalized—as long as the contours in both class-conditional joint distributions start out all the same shape. (This holds, for example, in the case of diagonal contours, as shown in the transition from Figure 9B to Figure 9C.)
The arithmetic of response summation for two stimuli presented separately and together has been examined in the monkey striate and extrastriate cortex, though almost always in the context of spatial summation of stimuli presented at different classical RF locations (Movshon, 1978; Reynolds, Chelazzi, & Desimone, 1999; Gawne & Martin, 2002; Lampl, Ferster, Poggio, & Riesenhuber, 2004). Though the applicability of these data to cue combination is tenuous, the data nonetheless contain useful lessons. As previously mentioned, pair interactions are often competitive, meaning that the response to a pair of stimuli lies between the responses to the two individual stimuli. Competitive interactions, incidentally, can be symptomatic of a divisive normalization process. Beyond the frequent reports of competition, however, no single or simple pattern of spatial summation appears to apply to all neurons or all stimuli. Reynolds et al. (1999) found that the combined response to a pair of stimuli in V2 and V4 was the average of the two individual responses—on average. But they also showed that a wide variety of actual outcomes underlay the “average on average” property, ranging from MIN and below to MAX and above. Lampl et al. (2004) measured subthreshold summation in V1 neurons, concluding that summation was a MAX on average, that is, the combined response was about equally often larger or smaller than the maximum of the two individual responses. Other authors have reported a wide range of summation outcomes as well, with MAX-like summation in a subset of the cases (Gawne & Martin, 2002; Avillac, Ben Hamed, & Duhamel, 2007). The lack of a clear pattern in these data as a whole could reflect the concern expressed above that predictions of response magnitudes and response summation arithmetic are beset by uncertainties, whereas predictions based on the shapes of iso-response contours may prove easier to interpret. These issues will have to be resolved through further experimentation, including experiments specifically designed to test cue-combination rules. 
If neurons do exhibit LIN-MAX summation, perception should perhaps follow a similar pattern. Taking the perceptual salience of an edge as a measure of P(ON∣cues), a LIN-MAX rule implies that the salience of an edge should increase substantially with the superposition of a set of weak cues, but with a progressively lower exponent as the cues grow stronger (Figure 14).
Figure 14
 
Demo of apparent increases in edge probability when color cues of different strengths are combined. The first 3 images in each column show a sinusoidal grating defined by (1) isoluminant L-M bars, (2) isoluminant B-(L+M)/2 bars, and (3) intensity bars. The fourth image shows the RGB superposition of the 3 single-cue gratings. RGB saturation was avoided by building all gratings on an RGB pedestal of (128, 128, 128) and making sure RGB values never exceeded 80% of their maximum (0.8 × 255 = 204) or dropped below 20% of their maximum (0.2 × 255 = 51). Single-cue chromatic bars were created using the following method provided by Elizabeth Johnson: A full-strength L-isolating grating was created by multiplying the vector (0.792, −0.16, −0.026) by a sinusoidal modulation of amplitude (max − min) and adding it to a gray RGB pedestal (see above). Weaker gratings were made by scaling down the modulation. The corresponding RGB directions for the M and S cones were (1.26, −0.65, 0.0032) and (0.15, −0.25, 0.71), respectively. Very weak single-cue edges (left column) are virtually invisible separately but combine to create a discernible partition. Cues of intermediate strength (middle column) are perceptible, but when combined lead to a clear increase in edge salience. Combining strong cues (right column) leads to diminishing returns.
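For readers who wish to reproduce the demo, the following sketch constructs one cone-isolating grating following the recipe in the caption above. It assumes an 8-bit RGB image and a horizontal sinusoid; only the RGB direction, the pedestal, and the range limits come from the caption, while the image size, spatial frequency, and amplitude are arbitrary choices.

```python
import numpy as np

L_DIR = np.array([0.792, -0.16, -0.026])   # L-cone-isolating RGB direction (caption)
PEDESTAL = 128.0                            # gray pedestal (128, 128, 128)

def cone_grating(direction, amplitude, width=256, height=128, cycles=4):
    """Sinusoidal grating along x, modulated along a cone-isolating RGB direction."""
    x = np.linspace(0.0, 2.0 * np.pi * cycles, width)
    modulation = amplitude * np.sin(x)                 # scale down for weaker cues
    rows = PEDESTAL + np.outer(modulation, direction)  # shape (width, 3)
    img = np.broadcast_to(rows, (height, width, 3))
    # Caption's range rule: stay within 20%-80% of the 8-bit range.
    assert img.min() >= 0.2 * 255 and img.max() <= 0.8 * 255, "RGB out of range"
    return img.astype(np.uint8)

grating = cone_grating(L_DIR, amplitude=60.0)          # a mid-strength L grating
```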
The normalization pool: How many filters is best?
For simplicity, we applied the SCF model to de-correlate and combine just two edge cues. Estimates of overall edge probability would likely be improved by including additional co-localized edge cues, most obviously the intensity channel, which is the third principal component of natural image spectra. The performance improvement expected with the inclusion of additional cues would flow from two sources. First, the larger the number of valid cues that are available, the greater the mutual information between the cue vector and the edge indicator variable. In the limit where many highly informative cues are available, P(ON∣cues) would approach a binary-valued function indicating complete certainty about the presence or absence of an edge in all situations. This type of performance improvement stems from the progressive divergence of the two class-conditional distributions P(cues∣ON) and P(cues∣OFF) as new valid cue dimensions are added. The improvement would obtain whether or not a common factor was included in the generative model for edges. In the realistic case where a common factor exists, a second source of improved performance arises from better estimates of the common factor using the larger number of available cues (see Equations 21, 22, and surrounding text). 
These two distinct sources of improvement in edge-detection performance point to a dissociation between the use of cues to decide whether an edge is present based on local spatial structure, and the use of cues to estimate the common factor; different sets and numbers of filters could contribute to these two processes. For example, whereas in our experiments both computations were based on just two co-localized edge cues, the common factor estimate could in principle be based on a much larger pool of edge detector outputs from the surrounding neighborhood. The question thus arises: What is the optimal collection of filters from which to estimate the common factor modulating two or more co-localized edge channels? Wainwright et al. (2000) have suggested that the higher-order correlations among filters in natural images may have a hierarchical structure, involving separable contributions at different spatial scales. Karklin and Lewicki (2003, 2005) showed that the distribution of common factors (i.e., filter variances) across space and orientation could be modeled as a linear combination of variance basis functions, just as basis functions are used to generate image patches in conventional applications of ICA. In both of these approaches, the "scale parameter" at any given point/orientation derives from a substantial block of image data, and potentially the entire image. As the region from which a local scale parameter is computed increases in size, however, two effects could actually worsen the estimate, especially if—for reasons of biological computational parsimony—the estimate relies on relatively simple averaging methods such as Equation 21. First, common factors are regional, but the regions needn't be large. Thus, the spatial scale of the pool should match the scale over which lighting conditions, textural variations, and other common factor–inducing processes typically change. Considering that texture changes occur on the scale of object surfaces, which can be small and have sharp borders, and that lighting conditions can also vary on a fine spatial scale (consider dappled sunlight under a forest canopy), pressure exists to keep simple average-based common factor estimates quite localized, as was done in our experiments. A second effect is that the statistical structure of natural images produces spatial correlations between filters that are not straightforward to factor out. For example, unlike two co-localized, co-oriented edge detectors in different color channels, which are roughly independent both ON and OFF edges, two filters at 180°-shifted phases, or at orthogonal orientations, or in co-linear arrangements, can have strong positive or negative correlations as the case may be, both ON and OFF edges. These departures from independence in the surrounding pool of filters will inevitably complicate the common factor estimate, perhaps even pushing the necessary computations beyond the capabilities of a realistic neural circuit. This again highlights the potential advantage of restricting the pool to contain only those filters whose outputs are, as a group, related to the unknown common factor in an easy-to-compute way. That a small number of filter values can provide a good estimate of the common factor is supported by the fact that the SCF model was able to eliminate most of the higher-order correlations between R-G and B-Y edges in the color edge data set using only the two cues themselves.
Further research will be needed to determine how neural circuits have managed the tradeoff between the need for effective normalization processes and the need for tractable neural algorithms. 
Appendix A
Computing the joint density of two edge cues from the SCF model
We begin with the cumulative joint distribution of $r_1$ and $r_2$:
$$
\begin{aligned}
F_{r_1, r_2}(r_1, r_2) &= \int_0^{\infty} \int_0^{\frac{r_1 K}{(1 - r_1) C}} \int_0^{\frac{r_2 K}{(1 - r_2) C}} P(C)\, P(e_1)\, P(e_2)\, de_2\, de_1\, dC \\
&= p q^2 \int_0^{\infty} e^{-pC} \left( \int_0^{\frac{r_1 K}{(1 - r_1) C}} e^{-q e_1}\, de_1 \right) \left( \int_0^{\frac{r_2 K}{(1 - r_2) C}} e^{-q e_2}\, de_2 \right) dC \\
&= p \int_0^{\infty} e^{-pC} \left( 1 - e^{-\frac{r_1 q K}{(1 - r_1) C}} \right) \left( 1 - e^{-\frac{r_2 q K}{(1 - r_2) C}} \right) dC \\
&= 1 - 2 \sqrt{\frac{p\, r_1 q K}{1 - r_1}}\; K_1\!\left( 2 \sqrt{\frac{r_1 p q K}{1 - r_1}} \right) - 2 \sqrt{\frac{p\, r_2 q K}{1 - r_2}}\; K_1\!\left( 2 \sqrt{\frac{r_2 p q K}{1 - r_2}} \right) \\
&\quad + 2 \sqrt{\frac{p\, (r_1 + r_2 - 2 r_1 r_2)\, q K}{(1 - r_1)(1 - r_2)}}\; K_1\!\left( 2 \sqrt{\frac{p q K (r_1 + r_2 - 2 r_1 r_2)}{(1 - r_1)(1 - r_2)}} \right),
\end{aligned}
\tag{A1}
$$
where $K_1(\cdot)$ is the modified Bessel function of the second kind.
 
The joint density of $r_1$ and $r_2$ is obtained as the mixed partial derivative of the cumulative distribution:
$$
P(r_1, r_2) = \frac{\partial^2 F_{r_1, r_2}(r_1, r_2)}{\partial r_1\, \partial r_2}. \tag{A2}
$$
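Because Equation A1 is easy to transcribe incorrectly, a Monte Carlo check is reassuring. The sketch below simulates the SCF generative process with arbitrary parameter choices (not the Table 1 fits), assumes the saturating nonlinearity g(R) = R/(R + K) used in the derivation above, and compares the empirical cumulative probability against the closed form; K1 is scipy.special.kv(1, ·).

```python
import numpy as np
from scipy.special import kv   # modified Bessel function of the second kind

rng = np.random.default_rng(0)
p, q, K = 1.0, 2.0, 1.0        # arbitrary test values

C  = rng.exponential(1.0 / p, 1_000_000)   # common factor, mean 1/p
e1 = rng.exponential(1.0 / q, C.size)      # distal edge values, mean 1/q
e2 = rng.exponential(1.0 / q, C.size)
g = lambda R: R / (R + K)                  # saturating nonlinearity
r1, r2 = g(C * e1), g(C * e2)

def F(a, b):
    """Closed-form joint CDF from Equation A1."""
    term = lambda s: 2.0 * np.sqrt(p * s) * kv(1, 2.0 * np.sqrt(p * s))
    s1 = q * K * a / (1.0 - a)
    s2 = q * K * b / (1.0 - b)
    return 1.0 - term(s1) - term(s2) + term(s1 + s2)

a, b = 0.3, 0.5
print(F(a, b), np.mean((r1 <= a) & (r2 <= b)))   # the two values should agree closely
```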
 
Fitting the SCF model to the data using only one parameter per joint distribution
To fit the generative model (Equations 12–16), we maximized the likelihood
$$
L = P(r_1, r_2, \ldots \mid q, p, K), \tag{A3}
$$
in which the joint density function P(·) is defined as in Equation A2. Notice that $R_i = C \cdot e_i$, where C and the $e_i$ are both exponentially distributed random variables. An exponential variable with mean λ is equivalent to the constant λ times an exponentially distributed variable with mean 1, so the distribution of the product $C \cdot e$ is invariant as long as the product of the two means, 1/(p·q), remains constant (or, equivalently, the reciprocal p·q). This extra degree of freedom makes it possible to fix p and vary only q. Furthermore, given that the saturating nonlinearity g(·) in the SCF model can be expressed as g(x/K), where the parameter K simply scales the input variable, the flexibility provided by K can be subsumed by the scaling factor attached to $C \cdot e$. The distribution of the output $g(C \cdot e)$ is therefore invariant as long as the product p·q·K remains constant. This allowed us to fix p = K = 1 during the optimization process while varying only q.
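This invariance is easy to verify by simulation. In the sketch below (arbitrary parameter values, with g(x) = x/(x + K) as above), two settings with different p, q, and K but the same product p·q·K yield matching response quantiles.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_r(p, q, K, n=200_000):
    """Draw responses r = g(C * e) with g(x) = x / (x + K)."""
    C = rng.exponential(1.0 / p, n)
    e = rng.exponential(1.0 / q, n)
    return (C * e) / (C * e + K)

deciles = np.linspace(0.1, 0.9, 9)
print(np.quantile(sample_r(1.0, 6.0, 1.0), deciles))   # p*q*K = 6
print(np.quantile(sample_r(2.0, 1.0, 3.0), deciles))   # p*q*K = 6 -> same distribution
```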
We minimized the negative log-likelihood:
$$
L = -\sum_i \log P\!\left( r_1^i, r_2^i \mid q,\, p = 1,\, K = 1 \right). \tag{A4}
$$
 
Filter response distributions collected ON- and OFF-edges were fitted separately, using 120,000 data points in each case, resulting in the parameters $q_{\mathrm{ON}}$ and $q_{\mathrm{OFF}}$ shown in Table 1.
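A one-parameter fit of this kind can be sketched in a few lines. In the sketch below, which is one possible implementation rather than the authors' code, the joint density needed in Equation A4 is approximated by finite-differencing the closed-form CDF of Equation A1 (with p = K = 1), and the negative log-likelihood is minimized over q with a bounded scalar search; the arrays r1 and r2 are assumed to hold response pairs strictly inside (0, 1), and the search bounds are arbitrary.

```python
import numpy as np
from scipy.special import kv
from scipy.optimize import minimize_scalar

def joint_cdf(a, b, q):
    """Equation A1 with p = K = 1."""
    term = lambda s: 2.0 * np.sqrt(s) * kv(1, 2.0 * np.sqrt(s))
    s1, s2 = q * a / (1.0 - a), q * b / (1.0 - b)
    return 1.0 - term(s1) - term(s2) + term(s1 + s2)

def neg_log_likelihood(q, r1, r2, d=1e-4):
    # Mixed finite difference of the CDF approximates the joint density (A2).
    dens = (joint_cdf(r1 + d, r2 + d, q) - joint_cdf(r1 + d, r2, q)
            - joint_cdf(r1, r2 + d, q) + joint_cdf(r1, r2, q)) / d**2
    return -np.sum(np.log(np.maximum(dens, 1e-300)))

# r1, r2: arrays of co-localized detector responses from one class (ON or OFF)
# fit = minimize_scalar(lambda q: neg_log_likelihood(q, r1, r2),
#                       bounds=(0.5, 500.0), method="bounded")
# q_hat = fit.x
```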
Computing the most probable common factor
We start with an expression for the posterior probability:
$$
\begin{aligned}
L = P(C \mid R_1, R_2, \ldots) &\propto P(R_1, R_2, \ldots \mid C)\, P(C) \\
&= \left[ P(R_1, R_2, \ldots \mid C, \mathrm{ON})\, P(\mathrm{ON}) + P(R_1, R_2, \ldots \mid C, \mathrm{OFF})\, P(\mathrm{OFF}) \right] P(C) \\
&= \left[ P_e\!\left( e_1 = \frac{R_1}{C},\, e_2 = \frac{R_2}{C}, \ldots \;\middle|\; \mathrm{ON} \right) \frac{P(\mathrm{ON})}{C^N} + P_e\!\left( e_1 = \frac{R_1}{C},\, e_2 = \frac{R_2}{C}, \ldots \;\middle|\; \mathrm{OFF} \right) \frac{P(\mathrm{OFF})}{C^N} \right] P(C).
\end{aligned}
\tag{A5}
$$
 
Using Equations 12 through 16 and the parameters from Table 1, and setting the partial derivative with respect to C to 0, we have
$$
\frac{\partial L}{\partial C} = \frac{p\, e^{-pC}}{C^N} \left\{ P(\mathrm{ON})\, q_{\mathrm{ON}}^N\, e^{-\frac{q_{\mathrm{ON}}}{C} \sum R_i} \left( -\frac{N}{C} - p + \frac{q_{\mathrm{ON}} \sum R_i}{C^2} \right) + P(\mathrm{OFF})\, q_{\mathrm{OFF}}^N\, e^{-\frac{q_{\mathrm{OFF}}}{C} \sum R_i} \left( -\frac{N}{C} - p + \frac{q_{\mathrm{OFF}} \sum R_i}{C^2} \right) \right\} = 0. \tag{A6}
$$
 
Equation A6 has no straightforward explicit solution. However, given that all variables in Equation A6 are positive, it can only equal 0 if the two terms in parentheses have opposite signs (writing $q_n = q_{\mathrm{ON}}$ and $q_f = q_{\mathrm{OFF}}$):
$$
\left( -\frac{N}{C} - p + \frac{q_n \sum R_i}{C^2} \right) \left( -\frac{N}{C} - p + \frac{q_f \sum R_i}{C^2} \right) < 0. \tag{A7}
$$
 
The solution C to Equation A7, and hence to Equation A6, lies between the solutions of the following two equations:
$$
-\frac{N}{C} - p + \frac{q_n \sum R_i}{C^2} = 0, \qquad -\frac{N}{C} - p + \frac{q_f \sum R_i}{C^2} = 0, \tag{A8}
$$
which are
$$
C_1 = \frac{1}{2p} \left( \sqrt{N^2 + 4 p\, q_n \sum R_i} - N \right), \qquad C_2 = \frac{1}{2p} \left( \sqrt{N^2 + 4 p\, q_f \sum R_i} - N \right). \tag{A9}
$$
 
Given that in natural images $q_{\mathrm{OFF}} > q_{\mathrm{ON}}$ and $P(\mathrm{OFF}) \gg P(\mathrm{ON})$, we expect the solution to Equation A6 to be close to $C_2$:
$$
\hat{C} \approx \frac{1}{2p} \left( \sqrt{N^2 + 4 p\, q_f \sum R_i} - N \right). \tag{A10}
$$
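In code, the bracketing roots of Equation A9 and the approximation of Equation A10 amount to a few lines; the parameter values below are placeholders for the Table 1 fits.

```python
import numpy as np

def common_factor_bounds(R, p, q_on, q_off):
    """Roots C1, C2 of Equation A9; Equation A10 takes C2 as the estimate."""
    S, N = np.sum(R), len(R)
    root = lambda q: (np.sqrt(N**2 + 4.0 * p * q * S) - N) / (2.0 * p)
    return root(q_on), root(q_off)

C1, C2 = common_factor_bounds(R=[0.8, 2.5], p=1.0, q_on=5.0, q_off=20.0)
C_hat = C2   # the exact maximum of Equation A5 lies between C1 and C2
```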
 
Acknowledgments
Thanks to Gary Holt, co-designer of the PD filter, for useful discussions in early phases of this work. Thanks also to Fritz Sommer, Allan Yuille, the anonymous reviewers, and the editor for critical comments on the manuscript; to Elizabeth Johnson for the cone-isolating stimulus method; and to Chait Ramachandra for invaluable technical assistance. This work was funded through grants from the Office of Naval Research, Army Research Office, National Science Foundation, and National Institutes of Health.
Commercial relationships: none. 
Corresponding author: Bartlett W. Mel. 
Email: mel@usc.edu. 
Address: Second Sight Medical Products, Inc., Sylmar, CA, USA. 
References
Albrecht, D. G. Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7, 531–546.
Avillac, M. Ben Hamed, S. Duhamel, J. R. (2007). Multisensory integration in the ventral intraparietal area of the macaque monkey. Journal of Neuroscience, 27, 1922–1932.
Badcock, D. R. Westheimer, G. (1985). Spatial location and hyperacuity: The centre/surround localization contribution function has two substrates. Vision Research, 25, 1259–1267.
Balboa, R. M. Grzywacz, N. M. (2003). Power spectra and distribution of contrasts of natural images from different habitats. Vision Research, 43, 2527–2537.
Bell, A. J. Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129–1159.
Bishop, C. M. (1996). Neural networks for pattern recognition. Oxford: Oxford University Press.
Bonds, A. B. (1989). Role of inhibition in the specification of orientation selectivity of cells in the cat striate cortex. Visual Neuroscience, 2, 41–55.
Bonin, V. Mante, V. Carandini, M. (2005). The suppressive field of neurons in lateral geniculate nucleus. Journal of Neuroscience, 25, 10844–10856.
Bülthoff, H. H. Mallot, H. A. (1988). Integration of depth modules: Stereo and shading. Journal of the Optical Society of America A, Optics and Image Science, 5, 1749–1758.
Carandini, M. Heeger, D. J. Movshon, J. A. (1997). Linearity and normalization in simple cells of the macaque primary visual cortex. Journal of Neuroscience, 17, 8621–8644.
Derrington, A. M. Badcock, D. R. (1985). Separate detectors for simple and complex grating patterns? Vision Research, 25, 1869–1878.
Ernst, M. O. Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Fine, I. MacLeod, D. I. A. Boynton, G. M. (2003). Surface segmentation based on the luminance and color statistics of natural scenes. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 20, 1283–1291.
Frome, F. S. Buck, S. L. Boynton, R. M. (1981). Visibility of borders: Separate and combined effects of color differences, luminance contrast, and luminance level. Journal of the Optical Society of America, 71, 145–150.
Gawne, T. J. Martin, J. M. (2002). Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. Journal of Neurophysiology, 88, 1128–1135.
Geisler, W. S. Albrecht, D. G. (1992). Cortical neurons: Isolation of contrast gain control. Vision Research, 32, 1409–1410.
Geisler, W. S. Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14, 897–919.
Gray, R. Regan, D. (1997). Vernier step acuity and bisection acuity for texture-defined form. Vision Research, 37, 1717–1723.
Grenander, U. (1996). Elements of pattern theory. Baltimore: Johns Hopkins University Press.
Grzywacz, N. M. Yuille, A. L. (1990). A model for the estimate of local image velocity by cells in the visual cortex. Proceedings of the Royal Society of London B: Biological Sciences, 239, 129–161.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9, 181–197.
Hyvärinen, A. Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation, 9, 1483–1492.
Jacobs, R. A. (1995). Methods for combining experts' probability assessments. Neural Computation, 7, 867–888.
Johnson, E. N. Hawken, M. J. Shapley, R. (2001). The spatial transformation of color in the primary visual cortex of the macaque monkey. Nature Neuroscience, 4, 409–416.
Karklin, Y. Lewicki, M. S. (2003). Learning higher-order structures in natural images. Network, 14, 483–499.
Karklin, Y. Lewicki, M. S. (2005). A hierarchical Bayesian model for learning nonlinear statistical regularities in nonstationary natural signals. Neural Computation, 17, 397–423.
Knill, D. C. (1998). Ideal observer perturbation analysis reveals human strategies for inferring surface orientation from texture. Vision Research, 38, 2635–2656.
Konishi, S. Yuille, A. L. Coughlan, J. M. Zhu, S. C. (2003). Statistical edge detection: Learning and evaluating edge cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 57–74.
Lampl, I. Ferster, D. Poggio, T. Riesenhuber, M. (2004). Intracellular measurements of spatial integration and the MAX operation in complex cells of the cat primary visual cortex. Journal of Neurophysiology, 92, 2704–2713.
Landy, M. S. Kojima, H. (2001). Ideal cue combination for localizing texture-defined edges. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 18, 2307–2320.
Landy, M. S. Maloney, L. T. Johnston, E. B. Young, M. (1995). Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35, 389–412.
Laughlin, S. (1981). A simple coding procedure enhances a neuron's information capacity. Zeitschrift für Naturforschung C: Biosciences, 36, 910–912.
Ledgeway, T. Smith, A. T. (1994). Evidence for separate motion-detecting mechanisms for first- and second-order motion in human vision. Vision Research, 34, 2727–2740.
Liang, Y. Simoncelli, E. P. Lei, Z. (2000). Color channels decorrelation by ICA transformation in the wavelet domain for color texture analysis and synthesis. Computer Vision and Pattern Recognition, 1, 1606–1611.
Liow, Y. T. (1991). A contour tracing algorithm that preserves common boundaries between regions. CVGIP, 3, 313–321.
Maloney, L. T. (1999). Physics-based approaches to modeling surface color perception (pp. 387–422). Cambridge: Cambridge University Press.
Maloney, L. T. (2002). Illuminant estimation as cue combination. Journal of Vision, 2(6):6, 493–504, http://journalofvision.org/2/6/6/, doi:10.1167/2.6.6.
Maloney, L. T. Yang, J. N. (2004). The illuminant estimation hypothesis and surface color perception. Oxford: Oxford University Press.
Mamassian, P. Landy, M. S. Maloney, L. T. (2002). Bayesian modelling of visual perception. In R. Rao, B. Olshausen, & M. Lewicki (Eds.), Probabilistic models of the brain: Perception and neural function (pp. 13–36). Cambridge: MIT Press.
Martin, D. Fowlkes, C. Tal, D. Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Proceedings of the International Conference on Computer Vision, Vancouver, BC, Canada.
McGraw, P. V. Whitaker, D. Badcock, D. R. Skillen, J. (2003). Neither here nor there: Localizing conflicting visual attributes. Journal of Vision, 3(4):2, 265–273, http://journalofvision.org/3/4/2/, doi:10.1167/3.4.2.
Movshon, J. A. (1978). Hypercomplexities in the visual cortex. Nature, 272, 305–306.
Nelson, S. B. (1991). Temporal interactions in the cat visual system: I. Orientation-selective suppression in the visual cortex. Journal of Neuroscience, 11, 344–356.
Ohzawa, I. Sclar, G. Freeman, R. D. (1982). Contrast gain control in the cat visual cortex. Nature, 298, 266–268.
Olman, C. Kersten, D. (2004). Classification objects, ideal observers & generative models. Cognitive Science, 28, 227–239.
Olshausen, B. A. Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.
Parra, L. Spence, C. Sajda, P. (2000). High-order statistical properties arising from the non-stationarity of natural signals. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems (Vol. 13, pp. 782–792). Cambridge: MIT Press.
Porrill, J. Frisby, J. P. Adams, W. J. Buckley, D. (1999). Robust and optimal use of information in stereo vision. Nature, 397, 63–66.
Reichardt, W. Poggio, T. (1979). Figure-ground discrimination by relative movement in the visual system of the fly. Biological Cybernetics, 35, 81–100.
Reynolds, J. H. Chelazzi, L. Desimone, R. (1999). Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience, 19, 1736–1753.
Rivest, J. Boutet, I. Intriligator, J. (1997). Perceptual learning of orientation discrimination by more than one attribute. Vision Research, 37, 273–281.
Rivest, J. Cavanagh, P. (1996). Localizing contours defined by more than one attribute. Vision Research, 36, 53–66.
Ruderman, D. L. Cronin, T. W. Chiao, C. (1998). Statistics of cone responses to natural images: Implications for visual coding. Journal of the Optical Society of America A, 15, 2036–2045.
Saunders, J. A. Knill, D. C. (2001). Perception of 3D surface orientation from skew symmetry. Vision Research, 41, 3163–3183.
Schwartz, O. Simoncelli, E. P. (2001). Natural signal statistics and sensory gain control. Nature Neuroscience, 4, 819–825.
Scott-Samuel, N. E. Georgeson, M. A. (1999). Does early non-linearity account for second-order motion? Vision Research, 39, 2853–2865.
Wainwright, M. J. Schwartz, O. Simoncelli, E. P. (2001). Natural image statistics and divisive normalization: Modeling nonlinearity and adaptation in cortical neurons. In R. Rao, B. Olshausen, & M. Lewicki (Eds.), Statistical theories of the brain. Cambridge: MIT Press.
Wainwright, M. J. Simoncelli, E. P. (2000). Scale mixtures of Gaussians and the statistics of natural images. In S. Solla, T. Leen, & K. R. Muller (Eds.), Advances in neural information processing systems (pp. 855–861). Cambridge: MIT Press.
Wainwright, M. J. Simoncelli, E. P. Willsky, A. S. (2000). Random cascades of Gaussian scale mixtures and their use in modeling natural images with application to denoising. Proceedings of the 7th International Conference on Image Processing (pp. 260–263). Vancouver, BC, Canada.
Wandell, B. A. (1995). Foundations of vision. Sunderland, MA: Sinauer Associates, Inc.
Wegmann, B. Zetzsche, C. (1990). Statistical dependence between orientation filter outputs used in a human-vision-based image code. Visual Communications and Image Processing, 1360, 909–922.
Wilson, H. R. Kim, J. (1994). A model for motion coherence and transparency. Visual Neuroscience, 11, 1205–1220.
Wilson, J. A. Ing, A. D. Geisler, W. S. (2006). Chromatic differences within surfaces and across surface boundaries [Abstract]. Journal of Vision, 6(6):559.
Yuille, A. Bülthoff, H. H. (1996). Bayesian decision theory and psychophysics. In D. Knill & W. Richards (Eds.), Perception as Bayesian inference (pp. 123–161). Cambridge: Cambridge University Press.
Zetzsche, C. Röhrbein, F. (2001). Nonlinear and extra-classical receptive field properties and the statistics of natural scenes. Network, 12, 331–350.
Figure 1
 
Example images and contours used to gather ON-edge and OFF-edge statistics. ON-edge pixels were defined as those lying on and roughly aligned with human-drawn contours that functioned as ground truth (Martin et al., 2001). Human-contour orientation was determined using a lookup table-driven line-tracking method (Liow, 1991). OFF-edge pixels were defined to be at least 4 pixels from the closest boundary pixel. Human labels were concentrated on main object contours and outlines, omitting many bona fide local edges; for the significance of this data collection bias, see “Discussion.”
Figure 2
 
Remapping of RGB values into a uniformly distributed red-green, blue-yellow opponent color space. A. R vs. G values for randomly drawn pixels from 300 indoor and outdoor scenes in the Corel database. Correlation coefficient is 0.9 and similarly high for R-B and B-G pairs. B. An independent components analysis (ICA)-derived linear transformation ( Equation 9) gives two decorrelated color-opponent channels. Marginal distributions are also shown. C. Histogram equalization was achieved by integrating one-dimensional marginal densities from B and mapping data into a uniformly distributed two-dimensional color-opponent space (D). Slight striations in D are the result of JPEG quantization. Correlation between resulting R-G and B-Y values was 0.08.
Figure 3
 
Oriented edge detection within R-G and B-Y color channels. A. Original color image. B. Slightly smoothed R-G and B-Y channels shown as intensity images. C. Pairwise-difference (PD) edges were computed as follows: At each of 8 neighboring pixel locations along the edge axis (only 4 are shown), the difference across the edge (skipping the central pixel) was computed, passed through a sigmoid function and summed. Sigmoid was x/(x + 0.2) for x ≥ 0, and x/(0.2 − x) for x < 0. PD values were computed at 8 orientations (0, 22.5, 45, … 157.5 deg), using a simple interpolation scheme for the oblique orientations. D. Absolute values of PD detector responses (subsequently referred to as r 1 and r 2) are shown for the two color channels. A black pixel is drawn wherever the PD response magnitude at any orientation exceeded a threshold of 0.4. Because of the blurring operation, PD filter responses along boundaries were up to several pixels wide at this threshold. Examples of complementary responses in two edge channels are indicated by red and green circles; blue circles show a contour containing both types of color edge energy.
Figure 4
 
Distributions of spatial contrast values in natural images. A. Distribution of contrast measurement in natural images shows a characteristic S shape on a log plot (from Balboa & Grzywacz, 2003). B. The unconditional marginal distributions of the red-green and blue-yellow pairwise-difference (PD) detector responses r1 and r2 collected from 300 database images. Small-dashed line shows marginal distribution produced by the generative model shown in Figure 6.
Figure 5
 
Class-conditional ON-edge and OFF-edge distributions for spatially superimposed R-G and B-Y edge detectors. Upper. Contour plots of r1 and r2. Given the sparseness of the human contour labeling, the OFF-edge distribution made up 97% of the collected data. Gray stripes show values of ri's used to collect conditional distributions shown in bottom row. Lower. One-dimensional distributions of r2 conditioned on four values of r1 (0, 0.2, 0.4, and 0.6) reveal a typical higher-order correlation (see text). Probabilities are on a log(10) scale.
Figure 6
 
Diagram of the saturated common factor generative model. Distal edge magnitudes ei are assumed to be exponentially distributed both ON and OFF edges. The block diagram applies to both cases, differing only in the mean 1/q of the exponential distributions. Distal edge values are multiplied by an exponentially distributed common factor C that affects all sensory channels. Each scaled variable Ri is then passed through a saturating nonlinearity g(x) to yield the measured edge-detector responses r1…rN.
Figure 7
 
ON-edge and OFF-edge distributions generated by the SCF model using only a single parameter q (with p = K = 1). Plots are maximum likelihood fits to the distributions shown in Figure 5. Similar features include gradual transition from diagonal to square contours, and increasing variance of either variable conditioned on an increasing value of the other.
Figure 8
 
Plots of the combination rule P(ON∣r1, r2). A. Contour and surface plots of Equation 7 using measured likelihood tables P(r1, r2∣ON) and P(r1, r2∣OFF). B. Corresponding plots derived from the saturated common factor–generated likelihoods using the parameters in Table 1. C. Plots generated directly from edge-detector responses using Equations 18, 21, and 22. The sigmoid was steeper than in A, though contour shapes near the decision point (i.e., the 50% probability level near the 8th contour) were quite similar.
Figure 9
 
Two-step saturated common factor (SCF) normalization procedure. A. Each detector response is passed through the expansive nonlinearity h(x), which inverts the presumed saturating nonlinearity g(x); this results in diagonal, though unevenly spaced, contours in the joint distribution. B. Estimated value of C from Equation 17 plotted vs. actual value in a Monte Carlo simulation of the SCF model using the parameters in Table 1. The correlation coefficient is r = 0.75; the correlation between the sum-of-squares normalizer and the actual C was 0.47. C. The Ri's are then divided by Ĉ, leading to a close-to-independent exponential joint distribution. D. Conditional slices are now nearly superimposed on each other (slices taken at e1 = 0, 1, 2, and 3), indicating that higher-order dependencies have been largely eliminated—compare to the lower row in Figure 5. E and F. Same as C and D but for ON-edge distributions.
Figure 10
 
Linear combination rule after saturated common factor normalization. A. Contour and surface plots show that the combination rule is a sigmoidal function of the sum of the two normalized edge detector responses e1 and e2.
Figure 11
 
Comparison of saturated common factor (SCF) normalization to that proposed by Schwartz and Simoncelli (2001). A. Class-conditional distributions and combination rule after S-S normalization. ON and OFF distributions retain significant higher-order correlations (compare to Figure 9). The resulting combination rule based on the normalized variables remains nonlinear (compare to Figure 10). S-S parameters were σ = 0.74, k = 2.5, and w = 1.0. B. Comparison between SCF and S-S normalization when edges were extracted using conventional Gabor rather than pairwise difference filters applied separately to the RG and BY color channels (σw = 1, σh = 4, and λsin = 4). SCF normalization again leads to simpler, nearly independent exponential joint distributions, and the resulting SCF-derived combination rule is correspondingly more linear. SCF parameters for Gabors determined from ML fit: K = p = 1 (fixed) and q = 192; S-S parameters: k = 1.5, σ = 0.3, and w = 0.8.
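For reference, one commonly written form of Schwartz-Simoncelli-style divisive normalization is R_i = r_i^k / (σ^k + Σ_j w·r_j^k). Treating the caption's k as the shared exponent is our assumption about how the fitted parameters enter; the sketch below is illustrative, not a quote of either paper's equations:

```python
import numpy as np

def ss_normalize(r, sigma=0.74, k=2.5, w=1.0):
    # Schwartz-Simoncelli-style divisive normalization (assumed form):
    #   R_i = r_i**k / (sigma**k + w * sum_j r_j**k)
    # Defaults are the panel A parameters quoted in the caption.
    r = np.asarray(r, dtype=float)
    return r**k / (sigma**k + w * np.sum(r**k))
```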
Figure 12
 
Sample edge-detected images. Pairwise-difference edge detectors were run in 3 color channels, including the intensity channel O3. Edge values were normalized with Equation 18 (with ε = 0.001) and summed; parameters were as in Table 1. Given uncertainty in the true prior P(ON), in lieu of the final sigmoidal operation (Equations 21 and 22), which depends strongly on P(ON), we used ordinary contrast enhancement (+55 setting in Adobe Photoshop). Gray level is thus monotonically related to the maximum value of P(ON∣r1, r2, r3) over the 8 orientation channels at each pixel. The same parameters were used for all 4 images.
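The pipeline behind these images can be summarized in a few lines. The normalization step below is a stand-in for Equation 18, which is not reproduced in this excerpt; the array layout is an illustrative choice:

```python
import numpy as np

def edge_map(resp, eps=1e-3):
    # resp: raw edge responses of shape (3 channels, 8 orientations, H, W)
    # for the RG, BY, and intensity (O3) channels.
    resp = np.asarray(resp, dtype=float)
    # Stand-in for Equation 18: divide by a common factor pooled across
    # the cue channels, with a small constant (cf. the caption's 0.001).
    c_hat = resp.mean(axis=0, keepdims=True) + eps
    e = resp / c_hat
    s = e.sum(axis=0)      # sum normalized cues across channels
    return s.max(axis=0)   # max over the 8 orientation channels, per pixel
```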
Figure 13
 
A compressive nonlinear transform warps a linear combination rule. A. Three compressive nonlinear functions; the first two grow with progressively decreasing exponent, while the square root grows with constant power. B-D. Contour plots of a linear combination rule f(x1, x2) = x1 + x2 expressed in the compressed coordinates g(x1) and g(x2), where the particular g(x) function is shown in each inset. The LIN-MAX pattern is present in B and C, but not in D. The z-axis values are plotted on a log scale, affecting only the spacing of the contours. The output function f(x) is linear in these examples; substituting a nonlinear output function would again affect only the spacing of the contours.
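The warping in panels B-D amounts to evaluating f at the de-compressed coordinates, i.e., plotting f(g⁻¹(y1), g⁻¹(y2)) over the compressed axes. A minimal sketch with the illustrative choice g(x) = log(1 + x), which is one compressive function of the kind shown in panel A, not necessarily one of the three plotted:

```python
import numpy as np

def warped_rule(y1, y2):
    # Linear rule f(x1, x2) = x1 + x2 viewed in compressed coordinates
    # y = g(x), with the illustrative choice g(x) = log(1 + x), whose
    # inverse is g_inv(y) = exp(y) - 1.
    g_inv = np.expm1
    return g_inv(y1) + g_inv(y2)

# Values over a grid of compressed coordinates (contour plotting omitted;
# the caption notes the z-axis is shown on a log scale).
y = np.linspace(0.0, 2.0, 101)
Y1, Y2 = np.meshgrid(y, y)
Z = warped_rule(Y1, Y2)
```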
Figure 14
 
Demonstration of apparent increases in edge probability when color cues of different strengths are combined. The first 3 images in each column show a sinusoidal grating defined by (1) isoluminant L-M bars, (2) isoluminant B-(L+M)/2 bars, and (3) intensity bars. The fourth image shows the RGB superposition of the 3 single-cue gratings. RGB saturation was avoided by building all gratings on an RGB pedestal of (128, 128, 128) and making sure RGB values never exceeded 80% of the maximum (0.8 × 255 = 204) or dropped below 20% (0.2 × 255 = 51). Single-cue chromatic bars were created using the following method, provided by Elizabeth Johnson: a full-strength L-isolating grating was created by multiplying the vector (0.792, −0.16, −0.026) by a sinusoidal modulation of amplitude (max − min) and adding the result to a gray RGB pedestal (see above). Weaker gratings were made by scaling down the modulation. The corresponding RGB directions for the M and S cones were (1.26, −0.65, 0.0032) and (0.15, −0.25, 0.71), respectively. Very weak single-cue edges (left column) are virtually invisible separately but combine to create a discernible partition. Cues of intermediate strength (middle column) are perceptible but, when combined, lead to a clear increase in edge salience. Combining strong cues (right column) leads to diminishing returns.
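The caption's grating recipe translates directly into code. The grid size, spatial period, and amplitude below are illustrative choices; the pedestal, clipping range, and L-isolating direction are taken from the caption:

```python
import numpy as np

def single_cue_grating(direction, amplitude, size=256, period=32):
    # Vertical sinusoidal bars on the gray pedestal from the caption.
    x = np.arange(size)
    mod = amplitude * np.sin(2 * np.pi * x / period)      # 1-D modulation
    bars = np.tile(mod, (size, 1))                        # (size, size)
    img = 128.0 + bars[:, :, None] * np.asarray(direction, dtype=float)
    # Keep values in the caption's 20-80% range (51-204) to avoid RGB
    # saturation; the caption scaled amplitudes, so clipping is a shortcut.
    return np.clip(img, 51, 204).astype(np.uint8)

# L-isolating grating using the RGB direction quoted in the caption.
l_grating = single_cue_grating((0.792, -0.16, -0.026), amplitude=60)
```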
Table 1
 
Parameters of the saturated common factor model used to generate the distributions in Figure 7.
Description                                 Parameter            Value
Prior probability of edge                   P(ON)                0.03
Mean raw ON-edge response                   ⟨e_ON⟩^(1/q_ON)      0.563
Mean raw OFF-edge response                  ⟨e_OFF⟩^(1/q_OFF)    0.086
Mean of common factor                       ⟨C⟩^(1/p)            1 (fixed)
Knee of sensor saturation function g()      K                    1 (fixed)