The influence of local structure, however, is more difficult to understand. Orientation and contrast boundary segmentation performance is the same for the intact and locally scrambled conditions at a given density (Zavitz & Baker, 2013), which suggests that local phase alignment is not encoded by the mechanisms that segment these boundaries. Yet in Experiment 1, performance is the same in the INT-GS and LS-GS conditions at low densities but differs at the highest density tested. Furthermore, in the density segmentation task, we see a small but consistent performance impairment in the INT condition. Because phase scrambling reduces sparseness even on a small scale, LS micropatterns may be effectively slightly less sparse than INT micropatterns. Moreover, because this difference in sparseness arises from the micropatterns themselves, it may grow as more micropatterns are added (as in the high-density conditions), so that INT and LS textures with the same number of micropatterns differ increasingly in sparseness. If this were the case, in Experiment 1 we would expect to see segmentation occurring in the INT-LS conditions. It does occur for some observers, who show what might be a slight trend toward lower thresholds at higher densities, as this account would predict. We would also expect the LS-GS condition to be more difficult than the INT-GS condition at higher densities, which appears to be the case. Finally, in Experiment 2, the LS condition would be easier if the effective sparseness difference between its constituent halves were greater than that implied simply by the difference between their numbers of micropatterns. Thus local differences in sparseness might explain these effects of local structure in human observers, though it is unclear why the model is not sensitive to these differences. One possibility is that we have used inappropriate spatial frequencies or bandwidths for the model's first-stage filters. It is unlikely that the model's spatial frequencies are too low to match those of human vision, because the highest spatial frequency filter used in the model (160 cycles per image, about 25 cycles/°) is well into the high-frequency roll-off of the human contrast sensitivity function (Campbell & Robson, 1968). There are no gaps in the model's first-stage spatial frequency representation, so within the range of 3–25 cycles/°, no information is lost. However, the optimal tuning characteristics of the first-stage filters remain an interesting question; see, for example, Westrick and Landy (2013).
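
As a rough illustration of the sparseness argument above, the following Python sketch phase-scrambles a single toy micropattern and compares its kurtosis before and after scrambling. Everything in it is an assumption made for illustration: the Gaussian-blob micropattern, the use of kurtosis as a sparseness index, the scrambling method (borrowing the phase spectrum of white noise), and all function names. It does not reproduce the stimuli, the model, or the analyses used in the experiments.

```python
import numpy as np


def kurtosis(x):
    """Fourth standardized moment of the pixel values (about 3 for Gaussian noise)."""
    x = np.asarray(x, dtype=float).ravel()
    x = x - x.mean()
    return float(np.mean(x ** 4) / np.mean(x ** 2) ** 2)


def phase_scramble(img, rng):
    """Keep the amplitude spectrum of img; replace its phase spectrum.

    The replacement phases come from real-valued white noise, so they are
    conjugate-symmetric and the inverse transform is real up to rounding.
    """
    amplitude = np.abs(np.fft.fft2(img))
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * noise_phase)))


def toy_micropattern(size=32, sigma=2.0):
    """Hypothetical stand-in for a micropattern: a compact Gaussian blob."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))


rng = np.random.default_rng(0)
intact = toy_micropattern()
scrambled = phase_scramble(intact, rng)

k_intact = kurtosis(intact)
k_scrambled = kurtosis(scrambled)
print(f"kurtosis intact:    {k_intact:.2f}")
print(f"kurtosis scrambled: {k_scrambled:.2f}")
```

In this toy case the scrambled patch has the same amplitude spectrum as the intact one but spreads its energy across the whole patch, so its kurtosis should be lower. Whether the difference accumulates with micropattern density in full textures, as suggested above, is not something this sketch tests.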