In this section, we show how the GMM can be used to induce features from the two-pixel example and discuss some of the characteristics that the stimuli and features must possess for this method to work.
The GMM tries to find clusters in the data set. Each cluster is represented as a Gaussian distribution with as many dimensions as there are pixels in the displays (two in this example). These clusters can be thought of as local modes in the data. The GMM will only successfully recover the features if they correspond to the modes in the data distribution.
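As a minimal sketch of such a fit (assuming scikit-learn's GaussianMixture as a stand-in for whatever fitting code was actually used), the covariance_type="spherical" setting gives each component equal variance on every dimension, matching the simple model used in this section:

```python
from sklearn.mixture import GaussianMixture
import numpy as np

def fit_feature_gmm(stimuli: np.ndarray, n_components: int) -> np.ndarray:
    """Fit a GMM to an (n_trials, n_pixels) array of stimuli and return the
    component means, which serve as the candidate features."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="spherical").fit(stimuli)
    return gmm.means_
```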
Figure 3 is a representation of the stimulus space for the two-pixel example. The x- and y-axes of the figure correspond to the grayscale values for the first and second pixels, respectively. Targets A and B are shown in blue and red, respectively. As described previously, 1,000 stimuli were generated, 500 from each target.
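A rough sketch of this generation step is shown below; the target values, noise level, and random seed are illustrative assumptions (in the actual simulations the contrast is adjusted so that the simulated observer performs near 71% correct):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.15                                  # assumed pixel-noise standard deviation
target_A = np.array([0.45, 0.60])             # hypothetical grayscale values for Target A
target_B = np.array([0.60, 0.45])             # hypothetical grayscale values for Target B

stimuli_A = target_A + sigma * rng.standard_normal((500, 2))   # 500 noisy "A" trials
stimuli_B = target_B + sigma * rng.standard_normal((500, 2))   # 500 noisy "B" trials
stimuli = np.vstack([stimuli_A, stimuli_B])                    # 1,000 stimuli in total
```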
To determine whether the GMM properly recovers features from data, we created a benchmark data set in which the correct features are known. To create data sets with known features, we simulated an observer with the feature structure as shown in Figure 4 (in the column labeled “features”) and represented in Figure 3. Notice that, although the Category A and B features are better predictors of Targets A and B, respectively, these features do not perfectly match the targets. Recall that it is not necessary to assume that all image pixels are relevant to each feature; a feature could match only part of one target. The single feature for Category A, for example, ignores Pixel 1 (as represented by the “x” in Figure 4 and by the blue horizontal line in Figure 3).
A model of human decision making, illustrated in Figure 4, was used to generate simulated data for this task. For each trial, the noisy test stimulus is compared to each of the features. The likelihood ratio, L(F_l), that each feature, F_l, is present in the stimulus (relative to random noise) is calculated. Using a Bayesian-inspired decision rule in which the likelihood ratio of each feature is weighted based on how well the feature discriminates the two targets, these likelihood ratios are then combined within each response category (L(A) and L(B) for Response Categories A and B, respectively). More weight is given to more diagnostic features. The observer selects the response with the greatest combined likelihood ratio. Some features may be sensitive only to a part of a target, as is the case for the Category A feature, which ignores Pixel 1 and matches a stimulus based solely on Pixel 2. Details of this decision model for the simulated observer are given in the
Appendix.
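As a rough illustration only, the sketch below shows one way such a weighted likelihood-ratio rule could be implemented; the Gaussian noise model, the neutral-gray "noise alone" alternative, the value of sigma, and the supplied diagnosticity weights are all assumptions standing in for the details given in the Appendix:

```python
import numpy as np

def feature_log_lr(stimulus, feature, sigma=0.15):
    """Log-likelihood ratio that `feature` (rather than noise alone) is present
    in the stimulus; pixels the feature ignores are marked NaN (the 'x' entries)."""
    used = ~np.isnan(feature)
    resid_feature = stimulus[used] - feature[used]   # residual if the feature is present
    resid_noise = stimulus[used] - 0.5               # residual under neutral-gray noise alone
    return (np.sum(resid_noise ** 2) - np.sum(resid_feature ** 2)) / (2 * sigma ** 2)

def combined_log_lr(stimulus, features, weights, sigma=0.15):
    """Weighted combination of the feature likelihood ratios for one response
    category; more diagnostic features receive larger weights."""
    return sum(w * feature_log_lr(stimulus, f, sigma) for f, w in zip(features, weights))

def respond(stimulus, features_A, weights_A, features_B, weights_B, sigma=0.15):
    """Return the response category with the greater combined likelihood ratio."""
    log_LA = combined_log_lr(stimulus, features_A, weights_A, sigma)
    log_LB = combined_log_lr(stimulus, features_B, weights_B, sigma)
    return "A" if log_LA > log_LB else "B"
```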
This observer model was used to classify the two-pixel stimuli into one of two classes: A or B. The classifications of all 1,000 stimuli are shown in the top panel of Figure 3 as blue and red dots. (Because the contrasts have been adjusted so that performance is near 71%, the targets appear more similar than they did in Figure 1.) It is clear that there are no obvious modes near the features. Indeed, when the GMM was applied to these data (with one component for Category A and two for Category B), the positions of the recovered features were clearly biased away from the actual features and were distributed to capture the range of classified data (see Figure 3).
It could be argued, however, that the GMM did capture the general pattern present in the features; the recovered B features, for example, are darker on Pixel 1 than the recovered A feature, and one is lighter than the other on Pixel 2. This result, however, is mostly an accident of the restricted two-pixel stimulus space. In contrast, consider the simulation illustrated in Figure 5. The data were generated as before, except that the Pixel 2 value of the second B feature was shifted toward neutral gray. Because the feature shift did not substantially change the classification boundary (top panel), the features recovered by the GMM did not substantially shift. These simulations illustrate that when all of the data are used, the GMM tends to find modes that cover the classified data, without direct regard for the features.
Feature recovery would be far more robust if there were clear modes near the features. One way to identify stimuli that are close to a feature (and far from other features) is to utilize only those trials on which the likelihood of one of the responses was far greater than that of the other response. This is the equivalent of asking observers for confidence ratings and using only high-confidence responses for analysis. Such a confidence measure was generated in the following manner. For each trial, the probability that the model would respond “A” was calculated as P(A) = L(A)/(L(A) + L(B)), and likewise for “B”, P(B) = L(B)/(L(A) + L(B)) = 1 − P(A). Assuming the features are sufficiently distinct, high-confidence stimuli tend to be produced by high activation of a single feature.
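A minimal sketch of this confidence computation, working from the hypothetical log likelihood ratios of the earlier sketch for numerical stability:

```python
import numpy as np

def p_respond_A(log_LA: float, log_LB: float) -> float:
    """P(A) = L(A) / (L(A) + L(B)), computed from log likelihood ratios;
    P(B) is simply 1 - P(A)."""
    m = max(log_LA, log_LB)                 # subtract the larger log value before exponentiating
    LA, LB = np.exp(log_LA - m), np.exp(log_LB - m)
    return LA / (LA + LB)
```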
For the first two-pixel simulation, the stimuli associated with a high-confidence response (the top 15% for each response class) are shown in the bottom panel of Figure 3. When only these high-confidence trials were analyzed, clear modes were produced near the features, allowing the GMM to recover features that were quite close to the generating features.
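A sketch of this high-confidence analysis, again assuming scikit-learn and the hypothetical arrays from the earlier sketches; the component counts (one for Category A, two for Category B) follow the example above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def high_confidence_features(stimuli, responses, confidence, label,
                             n_components, top_frac=0.15):
    """Fit a GMM to the most confident `top_frac` of trials that received
    response `label` and return the component means as recovered features.
    `confidence` holds the probability of the chosen response on each trial;
    `responses` is an array of "A"/"B" labels."""
    mask = responses == label
    cutoff = np.quantile(confidence[mask], 1.0 - top_frac)   # threshold for the top 15%
    subset = stimuli[mask & (confidence >= cutoff)]
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="spherical").fit(subset)
    return gmm.means_

# For the example in the text: one recovered feature for Category A, two for B.
# features_A = high_confidence_features(stimuli, responses, confidence, "A", 1)
# features_B = high_confidence_features(stimuli, responses, confidence, "B", 2)
```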
There are a number of factors that might influence the ability of the GMM to recover a feature. It is perhaps most obvious that the features should be sufficiently distinct from each other. If features are too similar, then the modes near them will merge, rendering them unrecoverable. The B modes in the bottom panel of Figure 5, for example, have merged, and thus, the two GMM-recovered features are identical. Real observers, however, are unlikely to select extremely similar features for a given task. If features are very similar, then it will also be difficult to determine the number of features that best fit the data—two close features might produce the same classifications as one feature near their mean.
The problem of identifying the number of feature components is further exacerbated if the GMM model is generalized. For simplicity, we have utilized a GMM model having Gaussian components with a diagonal covariance matrix and equal variance on each dimension. If the variances were allowed to differ, a single Gaussian component (an ellipse with a long axis parallel to the horizontal axis) could well represent the data points in the upper panels of Figures 3 and 5. Investigation of the more general GMM model is a goal for future work.
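In the scikit-learn stand-in used in the earlier sketches, this generalization amounts to changing a single argument:

```python
from sklearn.mixture import GaussianMixture

# Equal variance on every dimension (the simple model used in this section):
gmm_equal = GaussianMixture(n_components=2, covariance_type="spherical")
# Separate variance per dimension, allowing elongated axis-aligned components:
gmm_diag = GaussianMixture(n_components=2, covariance_type="diag")
# Fully general covariance, allowing arbitrarily oriented ellipses:
gmm_full = GaussianMixture(n_components=2, covariance_type="full")
```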
It is interesting to consider cases where the observer selects features for a given task that are systematic distortions of the targets relative to the stimulus space. The top panel of Figure 6 shows such a case for high-confidence responses. The chosen features are sufficient to do an excellent job of classifying, but because they are effectively outside the range of stimuli, the recovered features are systematically biased inward. The opposite situation is shown in the lower panel. Now, the features are very similar to the targets and, more importantly, to each other. Stimuli that are similar to features from the opposite class cannot produce high-confidence responses. Hence, the recovered features move outward from the actual ones. These results occur because two forces compete to produce high-confidence responses: The stimuli must be close to a feature from one class and far from the features from other classes.
Because analysis of the entire stimulus (target plus noise) tends to recover the targets and not the characteristics of the observer's classification strategy, classification images are created from the noise fields (i.e., the stimuli with the targets subtracted). To make the relationship between the generating and recovered features clear, the results in this section are analyses of the signal-plus-noise data. The results do not differ from those obtained by applying the GMM to the noise fields. This convenient outcome is not typical. In general, the presence of the targets in the data sets will distort the recovered features. A simple example will make this clear. Consider a three-pixel target with grayscale values (ranging from 0 to 1) of (.7, .6, .6). Assume that there is a feature with grayscale values of (.9, .9, x), where the “x” indicates that the value for this pixel is not used when matching this feature to a stimulus. Assume that this feature is far from any other features and also much more likely than the other features to match stimuli generated by the (.7, .6, .6) target. If the full stimuli are analyzed, the high-confidence data for the (.7, .6, .6) target trials would tend to cluster around (.9, .9, .6). The values for the first two pixels come from the feature; because the feature is insensitive to the third pixel, the value of that dimension comes from the target. Removing the target and fitting the GMM to the noise fields would remove its influence on the third pixel (because the noise has a zero mean and is normally distributed, unused pixels tend toward 0). This example illustrates the benefits of analyzing the noise fields, that is, the target-plus-noise stimuli with the targets subtracted.
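A minimal sketch of this subtraction step, reproducing the three-pixel example from the text (the function name is illustrative, not the authors'):

```python
import numpy as np

def noise_fields(stimuli: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Subtract the target shown on each trial from the corresponding stimulus;
    the GMM is then fit to these noise fields rather than to the raw stimuli."""
    return stimuli - targets

# The three-pixel example from the text: a high-confidence cluster near
# (.9, .9, .6) minus the (.7, .6, .6) target leaves (.2, .3, .0).
print(noise_fields(np.array([[0.9, 0.9, 0.6]]),
                   np.array([[0.7, 0.6, 0.6]])))
```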
This example also suggests that, although using the noise fields from high-confidence data produces excellent results, a subtle bias from the target still remains. Consider the first two pixels of this three-pixel example. Although the first two pixels of the generating feature are equal (.9), when the targets are subtracted, the expected recovered values are not (i.e., (.9, .9, .6) − (.7, .6, .6) = (.2, .3, .0)). Fortunately, this bias will only be extreme in the unlikely circumstance that observers are using features that are very different from the targets (e.g., if the feature were (.9, .1, x)). A subtler model of feature induction would take this negative bias into account. Because the bias is related to the subtraction of targets from the stimuli, explicitly modeling the case in which a feature ignores some pixels could solve the problem. The GMM model does not currently do this but could be extended to handle ignored pixels in the future.
In summary, the GMM will do an excellent job of recovering features that are well separated, frequently activated by the stimuli, and relatively close to their associated targets.