**Saunders and Chen (2015) have recently proposed a Bayesian model of the perception of 3D shape from texture that they claim is superior to an alternative model based on scaling contrast that was originally proposed by Todd, Thaler, Dijkstra, Koenderink, and Kappers (2007). This commentary will review a variety of empirical findings that are relevant to this issue, and it will also evaluate how well these findings can be explained by different possible models that have been proposed in the literature. The results will demonstrate that the scaling contrast model can account for almost all of the factors that can influence apparent shape from texture with just two free parameters. The Bayesian model of Saunders and Chen has a greater number of free parameters than the scaling contrast model, yet it is not sufficiently developed to make quantitative predictions about any of these factors. Moreover, there are several other processes that would be necessary to actually implement their model that are not mentioned in their theoretical discussion. These include some mechanism for measuring veridical slant from texture, and a mechanism for computing global 3D shape from local estimates of optical slant.**

**Figure 1**

**Figure 2**

In Equation 1, the depth term (indexed by $\phi$) is the relative depth of a surface point in a visual direction ($\phi$), $S'_{\phi}$ is the projected length of an optical texture element in that direction, $S'_{\text{Max}}$ and $S'_{\text{Min}}$ are the maximum and minimum projected texture lengths in the entire image, and $k$ is a constant. Astute readers will recognize that Equation 1 is a form of contrast normalization, a mechanism that is observed in a wide variety of sensory processes. That is why the model is referred to as scaling contrast. Todd et al. (2007) also provided a mathematical analysis to show that scaling contrast uniquely specifies the depth contrast within a visible scene.

$k$ across all of the different conditions. Note that this produces almost perfect fits to the observers' judgments, and that it captures all of the different effects described above involving camera angles, viewing angles, and the sign of surface curvature.

$k$). Given the remarkably accurate fits of perceived shape from texture that it provides, the scaling contrast model sets a high bar for any alternative model that is proposed to replace it.

**Figure 3**

**Figure 4**

$\sigma$) at a given surface location using the following equation: where $S'$ is the projected major axis of an optical texture element, and $C'$ is the compression of its projected minor axis.

**Figure 5**

$D'$ is the projected distance between neighboring optical texture elements in the direction of slant, and $S'_{1}$ and $S'_{2}$ are the projected lengths of those texture elements (see Figure 5). In the limit of an infinitesimally small $D'$, the right side of Equation 3 equals the normalized scaling gradient (Gårding, 1992). It is interesting to note that Equations 1 and 3 are quite similar to one another. The primary difference is that scaling contrast exploits the entire range of scale differences within an image, whereas the normalized scaling gradient only considers them within local neighborhoods. Because of this difference, the scaling gradient is much more sensitive to noise than is scaling contrast. For example, if a 2% error were applied to the measured values of $S'_{1}$ or $S'_{2}$ within a 5° neighborhood, the estimated slant would deviate from the ground truth by 13° for a 5° slant and 3° for a 60° slant. For scaling contrast, on the other hand, a 2% error in the measured values of $S'_{\text{Max}}$ or $S'_{\text{Min}}$ would produce estimated slant errors of only 1.6° and 1.5°, respectively, for depicted slants of 5° and 60°.

$\sigma$), because $\sigma$ varies linearly with viewing direction for planar surfaces (see Todd et al., 2005). For example, if a 50° slanted surface is observed with a visual angle of 60° (see Figure 3A), the optical slants will vary from a maximum value ($\sigma_{\text{Max}}$) of 80° to a minimum value ($\sigma_{\text{Min}}$) of 20°.
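The arithmetic behind the 80° and 20° values can be sketched in a few lines. This is a minimal illustration, assuming that the optical slant of a planar surface changes by one degree per degree of visual direction along the direction of slant, so that the extremes sit half a field of view above and below the slant at the center of the image. The function name `optical_slant_range` is hypothetical and does not come from any of the cited papers.

```python
def optical_slant_range(surface_slant_deg, field_of_view_deg):
    """Return (sigma_max, sigma_min) in degrees across the field of view.

    Assumption (for illustration only): the surface is planar, and optical
    slant varies linearly with visual direction with unit slope, so the
    extremes lie half a field of view on either side of the central slant.
    """
    half_fov = field_of_view_deg / 2.0
    sigma_max = surface_slant_deg + half_fov
    sigma_min = surface_slant_deg - half_fov
    return sigma_max, sigma_min


# The example from the text: a 50 deg slant viewed with a 60 deg visual angle.
sigma_max, sigma_min = optical_slant_range(50.0, 60.0)
print(sigma_max, sigma_min)  # 80.0 20.0
```

Under this assumption, widening the field of view raises the maximum optical slant and lowers the minimum by the same amount.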

$\sigma_{\text{Max}}$). He also demonstrated that when $\sigma_{\text{Max}}$ is held constant, changes in the field of view that increase or decrease $\sigma_{\text{Min}}$ have no significant effect on performance. Because Saunders and Chen (2015) assume that discrimination thresholds of slant from texture are proportional to the spreads of the likelihood distributions, it follows from their hypothesis that changes in the field of view that do not alter $\sigma_{\text{Max}}$ should have no significant effect on judged slant. As it turns out, many of the field of view manipulations in Todd et al. (2005) and Todd et al. (2007) satisfied this condition, and the results in those cases do not confirm the Bayesian prediction. For example, when viewed from the correct visual angles, the two concave surfaces depicted in Figure 1 both have identical $\sigma_{\text{Max}}$ values of 55°, but the apparent depth-to-width ratio of the 60° image is 80% larger than the one observers perceive for the 20° version. This finding indicates that Saunders and Chen (2015) will have a difficult time explaining the effects of field of view within their proposed Bayesian perspective.

$S'_{\text{Max}}$ and increase the measured value of $S'_{\text{Min}}$. Thus, the effects of averaging for irregular textures would be expected to reduce the apparent depths of surfaces, which is consistent with the empirical findings of Todd et al. (2005) and Todd et al. (2007).

**Figure 6**

$\sigma$) at a particular location along a given scan line of the surface can be computed using the following equation: where $w$ is the width of an optical texture element at that location in the direction of the scan line, and $w_{\text{Max}}$ is the maximum width in that direction among all the elements on the scan line. This model provides an excellent account of observers' perceptions of shape from texture for surfaces rendered under orthographic projection or with small camera angles. However, when applied to images with significant amounts of perspective, it produces systematic errors that are incompatible with observers' shape judgments. For example, an analysis of directional width gradients predicts that the planar surface depicted in Figure 3A should appear highly curved. Because of its inability to cope with perspective, Todd and Thaler (2010) argued that this model is only viable if used in conjunction with scaling contrast. According to this account, observers' judgments will be based on scaling contrast whenever there are measurable systematic variations of scaling within the pattern of optical stimulation, and directional width gradients are used as a fallback whenever these variations are absent.
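To make the role of $w$ and $w_{\text{Max}}$ concrete, the sketch below estimates slant along one scan line from element widths. It assumes pure orthographic foreshortening, in which a width in the direction of slant shrinks by the cosine of the slant, giving $\sigma = \arccos(w / w_{\text{Max}})$. That relation is an assumption adopted here for illustration and is not necessarily the exact equation used by Todd and Thaler (2010). The function name `slants_from_widths` is hypothetical.

```python
import math


def slants_from_widths(widths):
    """Estimate slant in degrees for each texture element on one scan line.

    Assumption (not taken from the source): orthographic foreshortening,
    so w = w_max * cos(slant), and the widest element is treated as
    frontoparallel (slant of 0 degrees).
    """
    if not widths:
        raise ValueError("widths must not be empty")
    if min(widths) < 0:
        raise ValueError("widths must be non-negative")
    w_max = max(widths)
    if w_max <= 0:
        raise ValueError("at least one width must be positive")
    # Ratios lie in [0, 1], which is inside the domain of acos.
    return [math.degrees(math.acos(w / w_max)) for w in widths]


# Hypothetical widths along a scan line, in arbitrary image units.
example_widths = [1.0, math.cos(math.radians(30.0)), 0.5]
print(slants_from_widths(example_widths))
```

Because the estimate depends only on the ratio $w / w_{\text{Max}}$, any change in width is read as a change in slant. That includes the scaling changes introduced by perspective, which fits the observation above that the model mistakes a planar surface for a curved one when perspective is significant.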

*Journal of Mathematical Imaging and Vision, 2*, 327–350.

*The perception of the visual world*. Boston, MA: Houghton Mifflin.

*Vision Research, 38*, 1683–1711.

*Vision Research, 38*, 2635–2656.

*International Journal of Computer Vision, 23*, 149–168.

*Dissertation Abstracts International, 42*, 1454. (University Microfilms No. 58-5594).

*Vision Research, 47*, 411–427.

*Vision Research, 45*, 1501–1517.