Placing the nonlinearity and threshold in the feature detector is a fiction, a useful but unrealistic assumption that is adequate for the purposes of this paper, but ultimately to be replaced by the nonlinear combination rule of the feature integrator. We cannot suppose that the feature detector and integrator are both linear because the cascade of two linear operations is equivalent to a single linear operation, so the two stages would collapse to one consisting of more-complex-shaped-feature detectors and no second stage. Selfridge’s (1959) pandemonium model suffers from this defect. Minsky and Papert (1969) showed that the computational power of a one-layer network (i.e., simple feature detection), which they dubbed a “perceptron,” is quite limited and cannot solve some important problems, such as closure. The second-order literature is full of tasks that people can do, but which an isolated feature detector cannot (see Chubb, Olzak, & Derrington, 2001; Landy & Graham, 2004). However, if the layers are separated by a nonlinearity then they don’t collapse and the perceptron limits do not apply.
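The collapse of two linear stages into one can be illustrated with a minimal sketch. The weights, the function names, and the use of half-wave rectification as the intervening nonlinearity are all assumptions made here for illustration; none of them comes from the models cited above.

```python
# Illustrative sketch: the weights below are hypothetical, chosen only to
# make the collapse easy to see. They are not taken from the paper.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]

def rectify(vector):
    """Half-wave rectification, standing in for any nonlinearity between stages."""
    return [max(0.0, x) for x in vector]

DETECTORS = [[1.0, -1.0], [-1.0, 1.0]]  # first stage: two opposed linear detectors
INTEGRATOR = [[1.0, 1.0]]               # second stage: sums the detector outputs

def linear_cascade(stimulus):
    """Two linear stages applied one after the other."""
    return matvec(INTEGRATOR, matvec(DETECTORS, stimulus))

def collapsed_single_stage(stimulus):
    """One linear stage whose weights are the product of the two stages."""
    return matvec(matmul(INTEGRATOR, DETECTORS), stimulus)

def nonlinear_cascade(stimulus):
    """The same two stages, separated by a rectifying nonlinearity."""
    return matvec(INTEGRATOR, rectify(matvec(DETECTORS, stimulus)))

for stimulus in ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(stimulus,
          linear_cascade(stimulus),
          collapsed_single_stage(stimulus),
          nonlinear_cascade(stimulus))
```

With these particular weights the two linear stages cancel, so both linear versions return zero for every stimulus. The rectified cascade instead returns the absolute difference of the two inputs, so it responds only when they differ, which no single linear stage with these inputs can do.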
For the purposes of this paper we assume a high threshold within the feature detector. The high-threshold assumption is known to be wrong, because it cannot cope with the finding that when observers lower their criterion both the hit and false alarm rates rise together (Nachmias & Steinman, 1965). Nor can it cope with the finding that when noise is added to the display, the observer’s threshold is at a constant effective signal-to-noise ratio (Pelli, 1985; Pelli & Farell, 1999). (Also see Palmer, Verghese, & Pavel, 2000.) However, for a given criterion and noise level, the high-threshold assumption remains a popular and useful simplification because it captures the steepness of real psychometric functions (much steeper than a cumulative normal) and the relatively sudden disappearance of the signal as contrast is reduced below threshold, and because it allows surprisingly accurate calculations of the increase in visibility with added features (Pelli, 1985; Graham, 1989). In fact, the criterion and noise effects tell us that the nonlinearity cannot lie within the feature detector, and must, instead, be in the integrator, in the way the (linear) feature signals are combined. For example, for detection and near-threshold discrimination, a maximum rule (output is maximum of the many inputs) among 100 to 1000 linear detectors exhibits appropriate dependence on criterion, added noise, and number of features (Pelli, 1985). The tasks considered here (e.g., letter identification) clearly require different nonlinear combination rules, which are still a mystery.
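The criterion dependence of a maximum rule can be sketched with a small simulation. This is only an illustration under stated assumptions, not the model of Pelli (1985): the detector outputs are taken to be independent unit-variance Gaussian noise, the signal is added to a single detector, and the function names, signal strength, and criterion values are hypothetical.

```python
import random

def max_rule_response(n_detectors, signal, rng):
    """Return the largest of n noisy linear detector outputs.

    Assumption: each output is independent unit-variance Gaussian noise,
    and the signal is added to the first detector only.
    """
    outputs = [rng.gauss(0.0, 1.0) for _ in range(n_detectors)]
    outputs[0] += signal
    return max(outputs)

def yes_rate(responses, criterion):
    """Fraction of trials on which the maximum exceeds the criterion."""
    return sum(r > criterion for r in responses) / len(responses)

def simulate(n_detectors=100, signal=3.0, n_trials=1000, seed=0):
    """Simulate signal-present and signal-absent trials with a fixed seed."""
    rng = random.Random(seed)
    signal_trials = [max_rule_response(n_detectors, signal, rng)
                     for _ in range(n_trials)]
    blank_trials = [max_rule_response(n_detectors, 0.0, rng)
                    for _ in range(n_trials)]
    return signal_trials, blank_trials

signal_trials, blank_trials = simulate()
for criterion in (3.0, 2.5):  # hypothetical strict and lax criteria
    print("criterion", criterion,
          "hit rate", yes_rate(signal_trials, criterion),
          "false alarm rate", yes_rate(blank_trials, criterion))
```

In this sketch, lowering the criterion raises the hit rate and the false alarm rate together, because both kinds of trial are judged against the same noisy maximum. That is the qualitative pattern that a high threshold inside the detector cannot produce.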