May and Zhaoping claim that nonlinear summation is not useful in detecting signals that are not limited by a fixed net contrast, and that even if it were a useful mechanism to detect plaids, it would not be useful for, say, contour integration. In fact, the original article already noted that, in those cases where components do not overlap spatially, nonlinear summation is not strictly necessary. However, it certainly could facilitate conjunction detection even in those cases.
The point is illustrated in Figure 1. The center panel shows the response of the nonlinear summation mechanism described in Peirce (2007), as a function of contrast in two input channels. A decision boundary has been added to the plot indicating all points where the response to the compound is greater than the maximal response to any single component presented alone (by at least 5% of the total dynamic range of the cell). This decision boundary might represent the threshold required to trigger a response, either at the spike-generating stage or in some later readout mechanism. It should be clear that the mechanism would be useful in detecting the conjunction in nearly all cases, excluding only those where one or both components have a low contrast, as discussed below. The only difference between spatially overlapping and spatially separated components is that spatially overlapping components are limited to the lower left portion of the graph. That portion, for the linear summation model (left panel), does not include any correctly detected conjunctions; hence the necessity for an AND gate (either multiplicative or with nonlinear summation) for these types of compounds.
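The decision rule described above can be sketched in a few lines. This is a minimal illustration only: the hyperbolic-ratio form of the nonlinearity, the function names, and every parameter value are assumptions chosen for the example, not the values used in Peirce (2007) or in Figure 1.

```python
def saturating(x, semi_sat, exponent):
    """Hyperbolic-ratio (Naka-Rushton style) nonlinearity; an assumed form."""
    return x**exponent / (x**exponent + semi_sat**exponent)


def channel(contrast):
    # First-stage response of one input channel (hypothetical parameters).
    return saturating(contrast, semi_sat=0.2, exponent=2.0)


def linear_sum(c1, c2):
    # Linear summation model: the response is the sum of the two contrasts.
    return c1 + c2


def nonlinear_sum(c1, c2):
    # Sum the saturated channel responses, then apply a second nonlinearity.
    return saturating(channel(c1) + channel(c2), semi_sat=1.5, exponent=4.0)


def decision_threshold(model, margin=0.05):
    # Largest response to a single full-contrast component presented alone,
    # plus a margin of 5% of the model's dynamic range.
    best_single = max(model(1.0, 0.0), model(0.0, 1.0))
    dynamic_range = model(1.0, 1.0) - model(0.0, 0.0)
    return best_single + margin * dynamic_range


def detects(model, c1, c2):
    # True when the compound response crosses the decision boundary.
    return model(c1, c2) > decision_threshold(model)


# Spatially overlapping components share a fixed net contrast (c1 + c2 <= 1).
print(detects(linear_sum, 0.5, 0.5))     # overlapping compound, linear model
print(detects(nonlinear_sum, 0.5, 0.5))  # overlapping compound, nonlinear model
print(detects(nonlinear_sum, 1.0, 0.0))  # single high-contrast component
```

Under these assumed parameters the linear model never crosses its boundary while net contrast is capped at 1, whereas the nonlinear model does so for mid-contrast compounds and stays below the boundary for a single component.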
The principal reason that May and Zhaoping claim that nonlinear summation cannot support AND operations is that, with strong stimulation in one channel alone, there is always some degree of response; the sigmoidal nonlinearities described cannot force the summed signals back to zero. May and Zhaoping implicitly assume that the decision about whether the conjunction is present or absent is based on whether the mechanism responds at all; there is an implicit boundary set at zero impulses per second. This is natural enough in mathematics but is not the only option, and for neurons, it seems an unrealistic decision boundary. Neuronal responses are noisy, and to allow tolerance to the noise, a readout/decision mechanism should surely set its decision boundary at some reasonable level above zero. The natural choice would be to set it at some level just above the maximum response that can be generated from either component alone. When that is done, the “decision” to be made from the multiplicative AND gate (Figure 1, right panel) and one using nonlinear summation is actually very similar. Furthermore, it takes very little extension (e.g., the addition of a spike-generating mechanism with an appropriate response threshold) to convert this into a “genuine” AND gate, for which there is no response at all when one of the components is not present. The sources that May and Zhaoping cite as existing evidence for multiplication (e.g., binocular obligate cells of Hubel & Wiesel, 1962) could equally be modeled by nonlinear summation of two signals, as in the middle panel of Figure 1, with no need for multiplication.
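The extension mentioned above, a response threshold set just above the largest single-component response, can be sketched as follows. The nonlinearity, its parameters, the 5% headroom, and the names `summed_drive`, `thresholded_and`, and `multiplicative_and` are all hypothetical choices for illustration, not a description of any published model.

```python
def saturating(x, semi_sat, exponent):
    """Hyperbolic-ratio nonlinearity; an assumed form with assumed parameters."""
    return x**exponent / (x**exponent + semi_sat**exponent)


def summed_drive(c1, c2):
    # Nonlinear summation: saturate each channel, sum, then saturate again.
    first_stage = saturating(c1, 0.2, 2.0) + saturating(c2, 0.2, 2.0)
    return saturating(first_stage, 1.5, 4.0)


# Threshold placed just above the largest response that one component alone
# can produce (5% headroom is an arbitrary illustrative choice).
SPIKE_THRESHOLD = 1.05 * summed_drive(1.0, 0.0)


def thresholded_and(c1, c2):
    # Output is rectified: nothing emerges unless the drive exceeds threshold.
    return max(0.0, summed_drive(c1, c2) - SPIKE_THRESHOLD)


def multiplicative_and(c1, c2):
    # The multiplicative AND gate, for comparison.
    return c1 * c2


for c1, c2 in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (1.0, 1.0)]:
    print(c1, c2, thresholded_and(c1, c2), multiplicative_and(c1, c2))
```

With the threshold in place, both gates are silent when either component is absent and respond when both are present at moderate or high contrast.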
The second criticism of the proposed mechanism was that, for the stimuli we used in our plaid (e.g., McGovern & Peirce, 2010; Peirce & Taylor, 2006) and curvature (e.g., Hancock & Peirce, 2008) adaptation experiments, the contrast of components was constant in the compound and component conditions and, therefore, that the sum of responses to the components would exactly equal the whole. This was obviously necessary in those studies, in order to show that adaptation to the “whole” is greater than to the “sum of the parts”. Again, the authors fail to consider the response (or readout) threshold. With this small addition, which I clearly should have made more explicit in the original description of the mechanism, it quickly becomes clear that either high-contrast component alone would fall outside the decision boundary and result in no response, whereas the compound stimulus would cross the boundary and a response would result.
May and Zhaoping's third assertion is, however, quite correct. The nonlinear summation mechanism will fail to detect conjunctions when the components are presented with very low contrast. As a result, if the visual system did use nonlinear summation in the detection of conjunctions then it might fail, or might have to resort to a different mechanism, when conjunctions are presented at low contrast.
In fact, that prediction appears to hold true. Certainly, selective adaptation to plaids, the phenomenon that caused me to think about the mechanism in the first place, falls off dramatically with probe contrast; by a Michelson contrast of 0.1, it is swamped by adaptation to the component gratings (McGovern & Peirce, 2010). Similarly, Meese and Freeman (1995) show that plaid patterns tend to be perceived as two overlapping gratings at low contrast rather than as a single coherent pattern. Sarah Hancock has now attempted to collect similar data for the curvature aftereffect (the CAE, as described by Hancock & Peirce, 2008) but found the task of identifying the magnitude of curvature too difficult to be able to measure an aftereffect when probes had a Michelson contrast below 0.1 (Hancock, unpublished observation). For radial frequency patterns, sensitivity is relatively constant for probe contrasts between 1.0 and 0.125 but then drops substantially when contrast is reduced further (Wilkinson, Wilson, & Habak, 1998). Similarly, sensitivity to contours in a field of Gabor patches plummets when contrast falls below 0.1 but is roughly constant above that (McIlhagga & Mullen, 1996). May and Zhaoping are correct to point out that for low contrasts the mechanism would predict a failure to detect or accurately discriminate conjunctions. What they fail to point out is that this fits very well indeed with data from numerous psychophysical studies into a wide range of “mid-level” visual tasks.
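The predicted low-contrast failure can be illustrated by sweeping the contrast of a compound whose two components are matched. This is a sketch under stated assumptions: the hyperbolic-ratio nonlinearity and its parameters are hypothetical, so the contrast at which detection fails here depends entirely on those choices and is not fitted to any of the data cited above.

```python
def saturating(x, semi_sat, exponent):
    """Hyperbolic-ratio nonlinearity; an assumed form with assumed parameters."""
    return x**exponent / (x**exponent + semi_sat**exponent)


def single_response(c):
    # One component alone at contrast c.
    return saturating(saturating(c, 0.2, 2.0), 1.5, 4.0)


def compound_response(c):
    # Both components present, each at contrast c.
    return saturating(2.0 * saturating(c, 0.2, 2.0), 1.5, 4.0)


# Decision boundary: largest single-component response plus 5% of the
# dynamic range (taken here as the response to the full-contrast compound).
threshold = single_response(1.0) + 0.05 * compound_response(1.0)

contrasts = [i / 100 for i in range(1, 101)]
detected = [c for c in contrasts if compound_response(c) > threshold]
lowest_detected = min(detected)

print("lowest compound contrast crossing the boundary:", lowest_detected)
```

Below the cut-off the compound response never reaches the boundary, so the conjunction goes undetected; above it detection is reliable across the remaining contrast range.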
The fact that the visual system might perform conjunction detection simply by summing the nonlinear responses that we already know to exist does not mean that we would necessarily want to do that in computational modeling projects. Nonlinear summation would seem easier to implement in neural circuits than multiplication (it does not require the three layers of neurons needed to perform the Babylonian trick that May and Zhaoping suggest). In mathematics, however, simply multiplying signals is often the more straightforward option and can yield very similar results, as shown in Figure 1.