Existing studies of sensory integration demonstrate how the reliabilities of perceptual cues or features influence perceptual decisions. However, these studies tell us little about the influence of feature reliability on visual learning. In this article, we study the implications of feature reliability for perceptual learning in the context of binary classification tasks. We find that finite sets of training data (i.e., the stimuli and corresponding class labels used on training trials) contain different information about a learner's parameters associated with reliable versus unreliable features. In particular, the statistical information provided by a finite number of training trials strongly constrains the set of possible parameter values associated with unreliable features, but only weakly constrains the parameter values associated with reliable features. Analyses of human subjects' performances reveal that subjects were sensitive to this statistical information. Additional analyses examine why subjects were sub-optimal visual learners.

*p*(*scene property* ∣ *feature value*). If this distribution has a small variance, then the feature provides highly precise or diagnostic information about the scene property and, thus, is regarded as a reliable feature. In contrast, if this distribution has a large variance, then the feature provides imprecise information about the scene property and is regarded as an unreliable feature.

*p*(*curvature* ∣ *stereo cue*) has a small variance) and, thus, is reliable, but the haptic cue provides imprecise information (i.e., *p*(*curvature* ∣ *haptic cue*) has a large variance) and, thus, is unreliable. In this case, the model will form its estimate of curvature as a weighted average of the estimate based on the visual cue and the estimate based on the haptic cue. Because the stereo cue is more reliable, the curvature estimate based on this cue will be assigned a large weight. In contrast, the haptic cue is less reliable, meaning that the curvature estimate based on it will be assigned a small weight.

*A* or class *B*. Auditory feedback indicates the correctness of the learner's decision.

*X*_{1} is an unreliable indicator of class membership, whereas *X*_{2} is a reliable indicator.

*S* = *w*_{1}*x*_{1} + *w*_{2}*x*_{2}, where *x*_{1} and *x*_{2} are the current stimulus values of the features *X*_{1} and *X*_{2}, respectively, and *w*_{1} and *w*_{2} are the learner's weights or parameters. If the sum *S* is positive, the learner is likely to decide that the stimulus belongs to class *A*; otherwise, the learner is likely to decide that the stimulus belongs to class *B*.
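As a minimal sketch of this decision rule (function names are illustrative, not from the original study):

```python
def weighted_sum(w, x):
    # S = w1*x1 + w2*x2, generalized to any number of features
    return sum(w_i * x_i for w_i, x_i in zip(w, x))

def classify(w, x):
    # a positive sum favors class A; otherwise class B
    return "A" if weighted_sum(w, x) > 0 else "B"
```

With weights such as [0.0, 2.0], the unreliable feature is ignored and the decision is driven entirely by the reliable feature.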

*w*_{1} and *w*_{2}. For us, an important question is: How much information does the training data (i.e., the 600 stimuli and their corresponding class labels, which were presented on the training trials) provide about good values of the parameters? To address this question, we examine the probability distributions of the parameters given the training data, *p*(*w*_{1} ∣ {*data*}) and *p*(*w*_{2} ∣ {*data*}).

*p*(*w*_{1} ∣ {*data*}) and *p*(*w*_{2} ∣ {*data*}) for the classification task illustrated in the left panel. For parameter *w*_{1}, the parameter associated with the unreliable feature *X*_{1}, the distribution is centered at zero and has a small variance. In other words, the training data indicate with high certainty that the value of this parameter should be zero. For parameter *w*_{2}, the parameter associated with the reliable feature *X*_{2}, the distribution is centered at a positive value and has a large variance. That is, the data indicate that feature *X*_{2} should be positively weighted, but there is significant uncertainty as to the exact value to which *w*_{2} should be set. Thus, according to the distributions in Figure 2, the training data provide very different statistical information about the parameters associated with reliable versus unreliable features.

*p*(*w*_{i} ∣ {*data*}) for all weights *w*_{i}, where {*data*} refers to the finite set of visual stimuli and their corresponding class labels used on training trials) indicates with high precision that an unreliable feature is unreliable. In contrast, the information provided by the data indicates with low precision the exact relevance of a reliable feature.

*A* is 0.7, but still judged the stimulus as belonging to class *B* on an experimental trial. If so, this would suggest that the subject engaged in “exploration”, a strategy that can be useful in many learning situations (Bellman, 1956; Sutton & Barto, 1998).

^{T} and [1 −1]^{T}. As illustrated in the leftmost column of Figure 3, three versions of the task were created differing in their covariance matrices. In all versions, the covariance matrices for classes *A* and *B* were identical, diagonal matrices. The covariance structures were isotropic in the first version, meaning that stimulus features *X*_{1} and *X*_{2} had equal variances (*σ*_{X1}^{2} = *σ*_{X2}^{2} = 1). Because of the placement of the mean vectors, and because these variances were equal, the two stimulus features were equally reliable indicators of class membership. The variance of *X*_{1} was relatively large and the variance of *X*_{2} was small in the second version (*σ*_{X1}^{2} = 25, *σ*_{X2}^{2} = 1). Consequently, *X*_{1} was an unreliable indicator of class membership, whereas *X*_{2} was reliable. In the final version, the variance of *X*_{1} was small and the variance of *X*_{2} was large (*σ*_{X1}^{2} = 1, *σ*_{X2}^{2} = 25), meaning that *X*_{1} was a reliable feature, but *X*_{2} was unreliable.

*A* (one minus this value is the probability that a stimulus belongs to class *B*). Let [*x*_{1} *x*_{2}]^{T} denote a stimulus, where *x*_{1} and *x*_{2} are the stimulus values for features *X*_{1} and *X*_{2}, respectively. Let *y* = 1 denote that the stimulus belongs to class *A*, and *y* = 0 denote that the stimulus belongs to class *B*. The logistic regressor works as follows. It first calculates a weighted sum, denoted *S*, of the stimulus feature values: *S* = Σ_{i} *w*_{i}*x*_{i}, where {*w*_{i}} is the set of parameters of the regressor. It then uses this weighted sum and the logistic function to calculate the probability that the stimulus belongs to class *A*: *p*(*y* = 1 ∣ stimulus) = 1 / (1 + *e*^{−S}).
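The logistic computation can be sketched in a few lines (illustrative function name):

```python
import math

def logistic_prob(w, x):
    # p(y = 1 | stimulus) = 1 / (1 + exp(-S)), with S = sum_i w_i * x_i
    S = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return 1.0 / (1.0 + math.exp(-S))
```

Note that *p*(*y* = 0 ∣ stimulus) is simply one minus this value.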

[*w*_{1} *w*_{2}]^{T} of a logistic regressor. The maximum likelihood model is referred to as the ML model with infinite data. For each task version, its parameters were set to values that maximized the likelihood of a fictional data set containing an infinite number of data items: *w*_{i} = (*μ*_{i}^{A} − *μ*_{i}^{B}) / *σ*_{i}^{2}, where *μ*_{i}^{A} and *μ*_{i}^{B} are the values of feature *X*_{i} for the prototypes of classes *A* and *B*, respectively, and *σ*_{i}^{2} is the variance of feature *X*_{i} (Bishop, 2006).
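This formula is straightforward to compute; a sketch follows (the mean vectors used below are illustrative stand-ins, not necessarily the exact experimental values):

```python
def ml_infinite_weights(mu_A, mu_B, var):
    # w_i = (mu_i^A - mu_i^B) / sigma_i^2 for each feature i
    return [(a - b) / v for a, b, v in zip(mu_A, mu_B, var)]
```

For example, if the class means differ only on the second feature and the first feature has variance 25, the unreliable feature receives a weight of exactly zero.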

*p*(*y* = 1 ∣ stimulus) and *p*(*y* = 0 ∣ stimulus). The model used a vague prior distribution on each parameter, *p*(*w*_{i}) ∼ *N*(0, 100^{2}). A single chain was run, and 100,000 samples were collected. The first 10,000 samples were discarded as burn-in. After examining the autocorrelation function of the samples, the chain was then thinned to every 10th sample to reduce correlations among nearby samples. Thus, the results for the Bayesian model were based on 9,000 samples.^{1}

*w*_{1} and *w*_{2}, respectively. The point estimates of the parameter values for the ML model with infinite data are given by the red dashed lines. The distributions are the posterior marginal distributions calculated by the Bayesian model.

*p*({*data*} ∣ *w*_{1}, *w*_{2}), for each version of the task. For the first task version (left graph), in which stimulus features *X*_{1} and *X*_{2} are equally reliable, contours of equal likelihood are diagonally oriented ellipses. For the second task version (middle graph), in which *X*_{1} was an unreliable feature and *X*_{2} was reliable, the likelihood function in the local region near its peak is relatively steep along dimension *w*_{1} and flat along dimension *w*_{2}. In other words, the likelihood changes quickly as the value of *w*_{1} is perturbed. However, it changes slowly as the value of *w*_{2} is perturbed. For the final task version (right graph), in which *X*_{1} was a reliable feature and *X*_{2} was unreliable, the likelihood changes slowly along *w*_{1} and quickly along *w*_{2}.
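This steep-versus-flat pattern can be checked numerically. The sketch below uses illustrative class means and variances (not the exact experimental values) and hypothetical function names; it evaluates the logistic log-likelihood at the optimal weights and at perturbed weights:

```python
import math
import random

def log_sigmoid(s):
    # numerically stable log(1 / (1 + exp(-s)))
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))

def log_likelihood(w, data):
    # data: list of ((x1, x2), y) pairs; logistic log-likelihood of weights w
    ll = 0.0
    for x, y in data:
        s = sum(w_i * x_i for w_i, x_i in zip(w, x))
        ll += log_sigmoid(s) if y == 1 else log_sigmoid(-s)
    return ll

def simulate(n, rng):
    # illustrative task: class means (1, 1) and (1, -1);
    # X1 is unreliable (SD 5), X2 is reliable (SD 1)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        x = (rng.gauss(1.0, 5.0), rng.gauss(1.0 if y == 1 else -1.0, 1.0))
        data.append((x, y))
    return data
```

Perturbing the weight on the unreliable feature lowers the log-likelihood far more than the same perturbation of the weight on the reliable feature, matching the contour plots described above.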

*A* were randomly set to either 1.0 or −1.0. The coefficients for class *B* were the negatives of the coefficients for class *A*. In addition, a matrix *K* was added to each prototype, where *K* consisted of the background luminance plus an arbitrary image constructed in the null space of the basis feature set (the addition of this arbitrary matrix prevented the prototypes from appearing as contrast-reversed versions of the same image). In summary, a prototype was computed using the following equation: prototype = Σ_{i} *c*_{i}*F*_{i} + *K*, where *F*_{i} is basis feature *i* and *c*_{i} is its corresponding linear coefficient.

{*c*_{i}} defining the prototype for that class. This was done by adding noise to each coefficient: the exemplar's value for feature *i* was set to *c*_{i} + *ɛ*_{i}, where *ɛ*_{i} is a random sample from a normal distribution with mean zero and variance *σ*_{i}^{2}. This variance is referred to as a feature's noise variance. Importantly, each feature had its own noise variance, and the magnitude of this variance determined the reliability of the feature. Features with small noise variances tended to have coefficient values near one of the class prototypes. Therefore, these features were highly diagnostic of whether an exemplar belonged to class *A* or *B*. In contrast, features with large noise variances tended to have coefficient values far from the class prototypes. These features were less diagnostic of an exemplar's class membership. To avoid outliers, if a feature's coefficient value was more than two standard deviations from the corresponding value for the prototype, then this value was discarded and a new value was sampled. Consequently, the exemplars from the two classes were linearly separable.
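The sampling scheme can be sketched as follows (illustrative function name; `random.Random` is used for reproducibility):

```python
import random

def sample_exemplar(prototype_coeffs, noise_sds, rng):
    # exemplar coefficient = prototype coefficient + eps, eps ~ N(0, sigma_i^2);
    # values more than 2 SDs from the prototype are discarded and resampled
    coeffs = []
    for c, sd in zip(prototype_coeffs, noise_sds):
        while True:
            value = c + rng.gauss(0.0, sd)
            if abs(value - c) <= 2.0 * sd:
                coeffs.append(value)
                break
    return coeffs
```

The 2-SD rejection step is what guarantees that exemplars from the two classes remain linearly separable.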

*A* and *B*. Subjects were instructed to decide which of the two prototypes had appeared in the test stimulus and responded by pressing the key corresponding to the selected prototype. Subjects received immediate auditory feedback after every trial indicating the correctness of their response. In addition, after every 15 trials, a printed message appeared on the screen indicating their (percent correct) performance on the previous 15 trials.

*σ*^{2} = 1). The remaining features served as unreliable features and were assigned a large noise variance (*σ*^{2} = 25). In Task 2, the roles of the two sets of features were swapped such that the reliable features were made unreliable, and the unreliable features were made reliable.

*A* or *B*, as opposed to the subject's responses or estimates of the correct class labels (the latter is considered in the next section).

*ML*_{IO}^{∞}, the parameters were set to values that maximized the likelihood function based on a fictional data set containing an infinite number of data items. As described in the Two-dimensional binary classification task section, parameter *w*_{i} was set using the equation *w*_{i} = (*μ*_{i}^{A} − *μ*_{i}^{B}) / *σ*_{i}^{2}, where *μ*_{i}^{A} and *μ*_{i}^{B} are the values of feature *X*_{i} for the prototypes of classes *A* and *B*, respectively, and *σ*_{i}^{2} is the variance of feature *X*_{i} (Bishop, 2006).

*BM*_{IO}, used finite data sets based on the subject's experimental trials. Recall that the experiment contained two tasks in which the sets of reliable and unreliable features were swapped between tasks. The trials devoted to each task were divided into 6 blocks of 600 trials each. A data item used when estimating *BM*_{IO}'s parameter values consisted of the representation of a test stimulus displayed on an experimental trial along with a class label for that stimulus. A stimulus was encoded by its representation in the space of visual basis features (i.e., the 20 linear coefficients used to construct the stimulus). The class label was set in a stochastic manner using the probabilities from the ML model with infinite data (i.e., the true posterior probabilities *p*(*y* = 1 ∣ stimulus) and *p*(*y* = 0 ∣ stimulus)). *BM*_{IO} used the set of data items associated with a single block of trials. Thus, it was simulated 12 times, once for each experimental block. On each simulation, the model inferred the joint distribution of its parameters using a Markov chain Monte Carlo sampling method (see 1). Because the two classes of data items in a data set were linearly separable in the space defined by the visual basis features, there are many different logistic regressors that could be fit to a data set. That is, the data did not provide a strong constraint on the model's distributions of parameters. As a result, the sampling procedure of a model with a vague prior distribution [e.g., *p*(*w*_{i}) ∼ *N*(0, 100^{2})] often did not converge within a reasonable number of iterations. We therefore used a prior distribution on each parameter with a small variance [*p*(*w*_{i}) ∼ *N*(0, 2)].^{2} Three Markov chains were run, and 100,000 samples were collected from each chain (see 1 for details on how the chains were initialized). The Gelman–Rubin scale reduction factor was used to diagnose convergence (Gelman, 1996).^{3} Based on this factor, the initial 10,000 samples from the first chain were discarded as burn-in. To reduce correlations among nearby samples, this chain was then thinned to every 10th sample. Thus, the posterior joint distributions of *BM*_{IO} were based on 9,000 samples.

*ML*_{IO}^{∞} are given by the red dashed lines. The distributions are the posterior marginal distributions calculated by *BM*_{IO}.

*BM*_{IO}'s parameters across all experimental blocks. The black lines correspond to parameters associated with reliable features in Task 1 (unreliable in Task 2), and the red lines correspond to parameters associated with unreliable features in Task 1 (reliable in Task 2). It seems that there are enough trials within a single block for *BM*_{IO} to learn the reliabilities of the features.

*A* or *B*. The most interesting result is that the posterior marginal distributions of the model's parameters had small variances for parameters associated with unreliable features, and large variances for parameters associated with reliable features. In other words, the information in the training data constrains the values of parameters associated with unreliable features with high precision but constrains the values of parameters associated with reliable features with low precision. We next report the results of a Bayesian model fit to the subject's experimental data. That is, this model is estimated from the subject's response, or estimate of the class label, on each experimental trial.

*BM*_{subj}, that used finite data sets based on the subject's trials in an experimental block. A data item consisted of the representation of a test stimulus displayed on a trial along with the subject's response or estimate of the correct class label for that stimulus. The model used a vague prior distribution [*p*(*w*_{i}) ∼ *N*(0, 100^{2})]. Three Markov chains were run, and 100,000 samples were collected from each chain (see 1 for further details). The Gelman–Rubin scale reduction factor was used to diagnose convergence (Gelman, 1996). Based on this factor, the first 10,000 samples from the first chain were discarded as burn-in. After examining the autocorrelation functions for the samples, the first chain was then thinned to every 10th sample to reduce correlations among nearby samples. The remaining samples were used to estimate the posterior joint distribution of *BM*_{subj}'s parameters.

*BM*_{subj}'s performances (black dots and lines; a dot indicates the mean and error bars denote one standard deviation around the mean) on each experimental block. The distribution of *BM*_{subj}'s performances on a block was obtained by sampling from its joint distribution of parameters. Clearly, *BM*_{subj} provides a good fit to the subject's performances.

*BM*_{subj} and the point estimates of the ideal observer *ML*_{IO}^{∞}. Define the “normalized dot product” to be the quantity (*w*_{subj} · *w*_{IO}) / (∥*w*_{subj}∥ ∥*w*_{IO}∥), where *w*_{subj} is a sample of parameter values drawn from the joint distribution of parameters for *BM*_{subj} and *w*_{IO} is the vector of parameter point estimates of *ML*_{IO}^{∞}. This quantity is analogous to a correlation coefficient (Michel & Jacobs, 2008, referred to the square of this quantity as “template efficiency”). It is near one when *w*_{subj} and *w*_{IO} are similar, and near zero when *w*_{subj} and *w*_{IO} are unrelated. Figure 9 shows the median normalized dot product (error bars show the 25th and 75th percentiles of the distribution of normalized dot products) at each experimental block. The black points and line show the data based on the ideal observer *ML*_{IO}^{∞} for Task 1 of the experiment, whereas the red points and line are based on the ideal observer for Task 2. Clearly, the parameter values of *BM*_{subj} are closer to the optimal point estimates based on Task 1's stimulus noise structure during the first half of the experiment. They are closer to the optimal estimates based on Task 2's noise structure during the second half of the experiment.
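Given its described behavior (near one for similar parameter vectors, near zero for unrelated ones), the normalized dot product is presumably the cosine of the angle between the two parameter vectors; a sketch under that assumption:

```python
import math

def normalized_dot(w_subj, w_io):
    # (w_subj . w_io) / (||w_subj|| * ||w_io||)
    dot = sum(a * b for a, b in zip(w_subj, w_io))
    norm_subj = math.sqrt(sum(a * a for a in w_subj))
    norm_io = math.sqrt(sum(b * b for b in w_io))
    return dot / (norm_subj * norm_io)
```

Note that this measure is invariant to the overall magnitude of either vector; it compares only their directions.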

*BM*_{subj} across all experimental blocks. Black lines correspond to parameters associated with reliable features in Task 1 (unreliable in Task 2), and red lines correspond to parameters associated with unreliable features in Task 1 (reliable in Task 2). Although there is considerable noise in the mean data, the overall trend is as expected; the black lines in the left graph tend to be at larger values in the first half of the experiment, and the red lines are at larger values in the second half. Importantly, the standard deviations are larger for parameters associated with reliable features, and smaller for parameters associated with unreliable features.

*BM*_{subj}. The graphs on the left and right are based on the trials in blocks 6 and 12, the final blocks for Tasks 1 and 2, respectively. The red lines show the parameter point estimates from *ML*_{IO}^{∞}, the ideal observer with infinite data described above (the red lines in Figures 6 and 11 are identical although the scales of the graphs are different).

*BM*_{subj} and *BM*_{IO}, the Bayesian models trained with the subject's responses and with the true posterior probabilities over class labels, respectively. Recall that *BM*_{IO}'s parameter distributions associated with unreliable features have small variances, and its distributions associated with reliable features have large variances. Above, we reasoned that this outcome follows from the nature of the constraints imposed by the training data. If people are sensitive to these constraints, then models that are fit to human subjects' responses will show similar behaviors. The results of *BM*_{subj} displayed in Figure 11 verify that this is indeed the case. The distributions of *BM*_{subj}, like those of *BM*_{IO}, have significantly larger variances for parameters associated with reliable features.

*BM*_{subj}'s distributions are smaller than those of *BM*_{IO}. This can be explained by the fact that the set of stimuli that the subject labeled as class *A* and the set that he or she labeled as class *B* overlapped (in the space defined by the visual basis features), whereas the true classes did not. As a consequence, the training data for *BM*_{subj} placed strong constraints on *BM*_{subj}'s possible parameter values. The constraints placed by the training data for *BM*_{IO} were comparatively weaker.

*BM*_{subj} is illustrated in Figure 11.

*BM*_{subj}'s parameters typically have expected values with correct signs. On both blocks 6 and 12, the expected values of 8 of the 10 parameters associated with reliable features have the same signs as the optimal point estimates of the ideal observer *ML*_{IO}^{∞}. However, these values are much smaller (in magnitude) than the optimal point estimates. This result is surprising because the (percent correct) performance of *BM*_{subj} would be significantly improved if its parameter distributions were located at larger values.^{4} There are at least two possible explanations for this outcome (see Eckstein, Abbey, Pham, & Shimozaki, 2004; Jacobs, 2009, for other discussions of sub-optimal visual learning).

*BM*_{subj}'s posterior marginal parameter distributions are located at small values.

*S* = Σ_{i} *w*_{i}*x*_{i}, is mapped to the probability that the subject judged a stimulus as belonging to class *A* (*y* = 1) using a modified logistic function: *p*(*y* = 1 ∣ stimulus) = 1 / (1 + *e*^{−S/β}) (the original logistic function is recovered by setting *β* = 1). In this new model, the parameter *β* is analogous to a variance parameter. If *β* is a small value (e.g., *β* = 0.1), then the model will tend to believe that a stimulus belongs to class *A* with a probability of either 1 or 0 (intermediate probabilities will be rare). In this case, the model is essentially deterministic, and the model is said to “exploit” its current knowledge. If *β* is a large value (e.g., *β* = 10), the model will tend to believe that a stimulus belongs to class *A* with an intermediate probability (extreme probabilities near 1 or 0 will be rare). It will appear to be at least partially random. For example, if the model believes that the probability that a stimulus belongs to class *A* is 0.6, then it will judge the stimulus as belonging to class *A* with a probability of 0.6 and will judge the stimulus as belonging to class *B* with a probability of 0.4. In this case, the model is said to “explore”. In the field of machine learning, there is much discussion of the advantages and disadvantages of exploration and exploitation. Exploration is often thought to be useful when a learner has incomplete knowledge of its environment or when an environment is non-stationary (Bellman, 1956; Sutton & Barto, 1998; note that the exploitation/exploration trade-off is closely related to a sub-optimal decision-making strategy known as “probability matching” [e.g., Newell, Lagnado, & Shanks, 2007]).

{*w*_{i}} and the parameter *β* in the modified logistic function. Consider a version of this new model in which the expected parameter values are relatively large in magnitude; in fact, suppose they are roughly equal to the optimal point estimates of the ideal observer *ML*_{IO}^{∞}. However, the parameter *β* in the new model is set to a moderately large value, meaning that the model is moderately random. This new model would show the same (percent correct) performance as the original model *BM*_{subj} (and as was shown by the subject). However, it leads to different implications about the subject's behavior. According to the original model, the subject was sub-optimal because he or she under-estimated the information carried by each reliable feature about a stimulus category. Based on the new model, the subject properly estimated the information carried by each feature, but the subject's performance was sub-optimal because he or she did not exploit this knowledge but rather engaged in exploratory behavior. Future research will need to design experiments that distinguish the predictions of these two models.

*p*(*w*_{i} ∣ {*data*}) for all parameters *w*_{i}, where {*data*} refers to the finite set of visual stimuli and their corresponding class labels used on training trials) strongly constrains the set of possible parameter values associated with unreliable features but only weakly constrains the possible parameter values associated with reliable features.

*a priori* that a parameter is unlikely to have a large value, this information can be incorporated by placing an appropriately chosen prior distribution (one that has a small mass over large values) over that parameter. The use of prior information makes inference more robust and less variable by constraining the set of possible values that parameters can take.

*i*th data item consist of a vector of covariate variables, denoted *x*_{i}, and a scalar response variable, denoted *y*_{i}. In addition, let *z*_{i} denote a latent variable such that *z*_{i} = *x*_{i}^{T}*w* + *ɛ*_{i}, where *ɛ*_{i} is a sample from a standard logistic distribution. The response variable *y*_{i} is related to the latent variable *z*_{i} by the following equation: *y*_{i} = 1 if *z*_{i} > 0, and *y*_{i} = 0 otherwise.

*N*(*μ*, *σ*^{2}*I*), with mean vector *μ* and covariance matrix *σ*^{2}*I*. (In this case, it is difficult to construct an efficient Gibbs sampler because the full conditional distribution of the weights has a standard form only if *ɛ*_{i} is distributed according to a Gaussian distribution.) H&H solved this problem by introducing an additional latent variable, denoted *λ*_{i}, and by making the noise variable dependent on this new latent variable as follows: *ɛ*_{i} ∣ *λ*_{i} ∼ *N*(0, *λ*_{i}), with *λ*_{i} = (2*ψ*_{i})^{2} and *ψ*_{i} ∼ *KS*, where *KS* is the Kolmogorov–Smirnov distribution. Importantly, the conditional distribution of *ɛ*_{i} given *λ*_{i} is Gaussian, whereas the marginal distribution of *ɛ*_{i} is logistic (Andrews & Mallows, 1974).

*Logistic*(*x*_{i}^{T}*w*, 1, *y*_{i}) is a truncated logistic distribution with mean *x*_{i}^{T}*w* whose truncation side depends on *y*_{i}: if *y*_{i} = 1, the distribution is truncated below 0; otherwise, it is truncated above 0. In these equations, *X* is a matrix whose *i*th row is the covariate variable *x*_{i}, and *z* and *λ* denote the collections of latent variables {*z*_{i}} and {*λ*_{i}}, respectively. H&H used a rejection sampling method to sample from the conditional distribution of *λ*_{i} because this distribution does not have a standard form.

*BM*_{IO}) produced a single chain of 100,000 samples. The variables {*λ*_{i}} were initialized to 1, and the variables {*z*_{i}} were initialized to values sampled from a truncated logistic distribution with mean parameter 0 and scale parameter 1 (the side of truncation depended on *y*_{i}). The first 10,000 samples of the chain were discarded as burn-in, and the remaining samples were then thinned to every 10th sample.
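One standard way to draw such truncated samples is inverse-CDF sampling; a sketch follows (illustrative function name), assuming truncation below 0 when *y*_{i} = 1 and above 0 otherwise:

```python
import math
import random

def sample_truncated_logistic(mean, scale, y, rng):
    # logistic CDF: F(z) = 1 / (1 + exp(-(z - mean) / scale))
    F0 = 1.0 / (1.0 + math.exp(mean / scale))  # CDF evaluated at 0
    if y == 1:
        u = F0 + rng.random() * (1.0 - F0)     # uniform over [F(0), 1): yields z > 0
    else:
        u = rng.random() * F0                  # uniform over [0, F(0)): yields z < 0
    # invert the CDF: z = mean + scale * logit(u)
    return mean + scale * math.log(u / (1.0 - u))
```

Because the logistic CDF has a closed-form inverse, no rejection step is needed for this part of the sampler.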

*BM*_{IO} and *BM*_{subj} each produced three chains of 100,000 samples for each experimental block. In Chain 1, the variables {*λ*_{i}} were initialized to 1, and the variables {*z*_{i}} were initialized to values sampled from a truncated logistic distribution with mean parameter 0 and scale parameter 1. In Chain 2, the variables {*λ*_{i}} were initialized to values sampled from a uniform distribution on the interval [0.5, 1.5], and the variables {*z*_{i}} were initialized to values sampled from a truncated logistic distribution whose mean was sampled from a uniform distribution on the interval [0, 1] and whose scale was set to 5. Chain 3 was initialized in the same manner as Chain 2. Relative to Chain 2, however, it reversed the update order of the variables {*z*_{i}} and {*λ*_{i}}. The first 10,000 samples of Chain 1 were discarded as burn-in, and the remaining samples were thinned to every 10th sample.

^{1}In our research, we also considered models containing lapse parameters (Wichmann & Hill, 2001). These models are useful when subjects' responses seem to be random (stimulus-independent) guesses on significant numbers of trials. However, we found that the subjects in Michel and Jacobs (2008) had small lapse rates, and thus, we omit models with lapse parameters from this article.

^{2}For a binary classification task with linearly separable classes, a maximum likelihood estimator of a logistic regressor's weights is not well defined because the likelihood function can always be increased by increasing the magnitudes of the weights. To circumvent this problem, practitioners typically seek weights that maximize the likelihood function and are not too large in magnitude (so-called maximum penalized likelihood estimation). In a Bayesian setting, this corresponds to placing a relatively restrictive prior distribution on the logistic weights.

^{3}Roughly, the Gelman–Rubin scale reduction factor is a mathematical tool designed to detect when multiple chains, each initialized in its own way, are showing similar statistical properties, meaning that the chains have converged to the same distribution. The time period prior to convergence is referred to as “burn-in”, and the chains' samples during burn-in are discarded.

^{4}The subject's performance (and, thus, *BM*_{subj}'s performance) was sub-optimal. To better understand why, we did the following. We fit a logistic regressor to the subject's responses using maximum likelihood estimation. It could be that the vector of parameter estimates is too small in magnitude, points in the wrong direction, or both. We scaled the magnitude of this vector, maintaining its direction, and measured the performance of a logistic regressor whose parameter values were set to this scaled vector. By increasing the magnitude of the vector, a logistic regressor could increase its performance from about 77% correct to 83% correct on block 6, and from 83% correct to 90% correct on block 12. The remaining error is due to the fact that this vector points in the wrong direction.