Our algorithm was successful at equating the selections of each color: 16 participants had exactly equal numbers of correct responses across the three training conditions, and the remaining 26 had only one more correct trial in the highest condition than in the lowest. Unsurprisingly, an ANOVA on the number of correct responses for each condition showed that the conditions did not differ (F(2, 82) = 0.04, p = 0.96, ηp² = 0.001), and, as Figure 4 (top panel) shows, the average number of correct responses was equated across conditions within each block. However, as in the previous experiments, the percentage correct (see Figure 4, bottom panel) did differ across conditions (F(2, 82) = 7.56, p = 0.001, ηp² = 0.156), with significantly better performance in the rewarded condition (M = 74.1%, SE = 1.8%) than in either the punished (M = 66.2%, SE = 2.8%) or the neutral (M = 69.3%, SE = 2.0%) condition (both ts(41) > 3.22, ps < 0.002, ds > 0.39). The punished condition did not differ significantly from the neutral condition (t(41) = 1.37, p = 0.178, BF01 = 3.37). The number of correct detections was equated even though the percentage correct differed because the algorithm produced significantly (F(2, 82) = 7.71, p = 0.001, ηp² = 0.16) fewer rewarded target trials (M = 187.7, SE = 5.11) than neutral (M = 201.12, SE = 5.09) or punished (M = 225.5, SE = 10.54) trials (both ts(41) > 3.15, ps < 0.004, ds > 0.40). The punished condition also had significantly more trials than the neutral condition (t(41) = 2.13, p = 0.04, d = 0.45). Because the number of correct responses was equated, the net gain/loss during training was close to 0, and the final payoff was essentially constant across participants, in the range of $10 ± $0.10 across conditions.
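To make the equating logic concrete, the following is a minimal, hypothetical sketch (not the authors' procedure or code) of how an adaptive routine could yoke the number of correct detections across conditions by continuing to present target trials in a condition until a common quota of correct responses is reached, so that lower-accuracy conditions accumulate more trials. The accuracy values and the correct-response quota below are illustrative only.

```python
import random

def trials_to_reach_quota(p_correct: float, quota: int, rng: random.Random) -> int:
    """Simulate target trials in one condition until `quota` correct detections occur."""
    trials = correct = 0
    while correct < quota:
        trials += 1
        if rng.random() < p_correct:  # a correct detection occurs with probability p_correct
            correct += 1
    return trials

rng = random.Random(0)
quota = 150  # illustrative common number of correct detections per condition
for name, p in [("rewarded", 0.741), ("neutral", 0.693), ("punished", 0.662)]:
    n = trials_to_reach_quota(p, quota, rng)
    print(f"{name}: {n} target trials needed for {quota} correct detections")
```

Under this kind of scheme, the condition with the highest hit rate needs the fewest target trials to reach the quota, which mirrors the reported pattern of fewer rewarded trials than neutral or punished trials despite equal correct-response counts.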