A simple model that exemplifies the above ideas and qualitatively reproduces the main experimental results is as follows. Let f be the probability that one percept is correct; hence 1 − f is the probability that the competing percept is correct. Assume that each percept is “chosen” with the same probability that it is expected to be correct (i.e., either f or 1 − f), and that the cost of switching is proportional to the square of the alternation rate r. If the initial state is the one with probability f, then the probability of transitioning to the other percept and of that percept being the correct one is (1 − f)r. This happens with probability f(1 − f)r. If the initial state is the one with probability 1 − f, then the probability of transitioning to the other percept and of that percept being the correct one is fr. This situation happens with probability (1 − f)fr. For the sake of simplicity, we assume that exploration occurs mainly during the initial periods after the transitions, rather than continuously throughout the whole dominance epochs. Therefore, the expected gain from exploration is, on average, 2f(1 − f)r. After subtracting the cost of exploration, the total expected gain per unit time after each transition is

$$2f(1-f)\,r - a r^{2},$$
where a is a positive constant. For the cost of transitions, we chose the squared rate rather than a linear or other nonlinear dependence because it is the simplest case that leads to nontrivial results (i.e., not always choosing either the more likely or the less likely percept). Note that when the alternation rate is zero, the reward obtained from exploration is zero. Exploring with a large alternation rate is not optimal either, because the quadratic cost term makes it very costly. There is an intermediate alternation rate at which the expected reward per unit time reaches a maximum; setting the derivative of the expected gain with respect to r to zero shows that this maximum is attained when

$$r = \frac{f(1-f)}{a}.$$
This optimal alternation rate is in turn a function of f; it closely resembles the entropy of the two-percept distribution and has a maximum at f = 1/2 (not shown). This simplistic model illustrates, consistently with the experimental results, that perceptual alternations can lead to reward maximization, and that the brain can pay the higher cost of an increased alternation rate when the sensory input is highly ambiguous. Although the hypothesis that perceptual bistability is a form of exploration is consistent with the experimental observations, further research is required to determine its adequacy.
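
The behavior of this model can be checked numerically with a short sketch. It assumes the expected gain takes the form described above, that is, the exploration reward 2f(1 − f)r minus a quadratic switching cost a·r², so that the optimum lies at r = f(1 − f)/a. The function names, the choice a = 1, the grid range, and the use of entropy in bits are illustrative assumptions and are not part of the original model description.

```python
import math


def expected_gain(f, r, a=1.0):
    """Expected gain per unit time: exploration reward minus quadratic cost.

    Assumed form: 2 f (1 - f) r - a r^2, with a > 0 (hypothetical default a = 1).
    """
    return 2.0 * f * (1.0 - f) * r - a * r ** 2


def optimal_rate(f, a=1.0):
    """Closed-form maximizer: the derivative 2 f (1 - f) - 2 a r vanishes here."""
    return f * (1.0 - f) / a


def binary_entropy(f):
    """Entropy in bits of the two-outcome distribution (f, 1 - f)."""
    if f in (0.0, 1.0):
        return 0.0
    return -f * math.log2(f) - (1.0 - f) * math.log2(1.0 - f)


def numeric_optimum(f, a=1.0, steps=10001, r_max=1.0):
    """Brute-force grid search over r in [0, r_max], as a check on the closed form."""
    grid = [r_max * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda r: expected_gain(f, r, a))


if __name__ == "__main__":
    a = 1.0
    for f in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(
            f"f={f:.1f}  r_opt={optimal_rate(f, a):.4f}  "
            f"r_grid={numeric_optimum(f, a):.4f}  H={binary_entropy(f):.4f}"
        )
```

Both the optimal rate and the entropy are symmetric about f = 1/2 and peak there, which is the qualitative similarity noted in the text. For values of a much smaller than 1, `r_max` would need to be enlarged so that the grid still contains the optimum.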