There is an intimate connection between action and reward. By systematically varying rewards and punishments in visuo-motor tasks, we pose problems to the movement planning system and, in doing so, we can potentially reveal aspects of movement planning not otherwise observable.
There are now several studies where experimenters impose rewards and punishments on possible outcomes in motor tasks and evaluate how close subjects come to maximizing expected gain (Hudson, Maloney, & Landy, 2008; Trommershäuser, Gepshtein, Maloney, Landy, & Banks, 2005; Trommershäuser, Landy, & Maloney, 2006; Trommershäuser, Maloney, & Landy, 2003a, 2003b). Overall, the results of these studies give the impression that people are nearly optimal movers even when they have little experience in a particular motor task. Consequently, the large, qualitative failures observed in Wu et al. (2009) are striking.
In that study, subjects were asked to allocate time between two successive reaching movements to targets as the experimenter varied the rewards associated with hitting the targets. Subjects either did not vary their allocation of time or varied it in the wrong direction even when one target was as much as five times more valuable than the other. In related experiments involving only a single reaching movement, subjects did vary the time allocated to the movement so as to maximize their performance (Battaglia & Schrater, 2007; Dean et al., 2007; Hudson et al., 2008).
In Experiment 1, we considered the possibility that the observed failures in allocating time were the consequence of a lack of experience with allocating time between movements. If so, a session of motor practice before the motor decision task should move human performance toward optimal performance, maximizing expected gain.
In the first part of the experiment, we trained subjects to divide up the total movement time in different ways (constrained timing task). They were able to do so (but see below) and in doing so they could observe the consequences of varying timing on their accuracy in both movements. However, in the second part of the experiment, when they were left to choose a timing strategy, they did not vary their allocation of time between targets. We observed a similar pattern of failures to that observed by Wu et al. (2009). Training with the constrained timing task did not lead to improved performance in the choice timing task.
One interesting finding of Experiment 1 was that when people attempt to divide the movement time of the sequential movements in a constrained ratio, their actual division regresses toward a ratio that is very close to the ratio in their spontaneous time division. This outcome hinted that subjects resist varying time allocation away from a specific default value, their preferred ratio. A second interesting finding was that the dwell times (the time subjects spent in contact with the first target before initiating the second movement) had a simple reciprocal relation to the proportion of time allocated to the first movement.
A third intriguing phenomenon revealed by Experiment 1 is the unusually high efficiencies found. Even though subjects did not vary their time allocation between the two movements, their winnings were not reduced as much as we predicted they would be. Battaglia and Schrater (2007) reported a similar phenomenon. Their task was to reach a target within a time limit for monetary rewards. The exact position of the target was hidden, and the location of the hidden target was signaled to the observer by dots sampled from an isotropic bivariate Gaussian distribution whose mean was the hidden target location. Subjects viewed the distribution of the dots, judged the target position, and then made the reach. The number of dots increased at a fixed rate after a trial began until the reach was initiated. Given the limited total time, there was a tradeoff between viewing time and movement time. Subjects outperformed the predicted maximal expected gain although their time tradeoff deviated significantly from the optimal one predicted by the model. Battaglia and Schrater attributed the unexpectedly good performance to “increased participant motivation” in the experimental task relative to the baseline task. But they were still puzzled by the larger movement time variability in the experimental task, which could not be the result of higher motivation.
In our case, a motivation-difference explanation is untenable because our subjects were rewarded in both the training and test sessions. We considered a second explanation, that constraining time allocation reduces spatial accuracy of movements. We confirmed this possibility in Experiment 2 by comparing the performance in two conditions that differed only in that time allocation was constrained in one but not in the other. In the constrained time condition we required subjects to carry out the movement with the timings that they would have freely chosen had the choice of time allocation been left up to them. We found that there was a cost of constraining time allocation that, in our task, was about a 17% reduction in expected gain.
Based on previous work concerning single movements (Carlton, 1994; Zelaznik, Mone, McCabe, & Thaman, 1988), we might expect that smaller spatial variance comes only at a cost of larger temporal variance. As shown in Figure 11, this is not the case. In Figure 11, we plot the temporal standard deviation for subjects and conditions in both Experiments 1 and 2. For each experiment, we ran a repeated-measures one-way ANOVA for all its constrained timing and choice timing conditions. For Experiment 2, there is no significant difference among conditions, F(3, 9) = 0.56, p = .66. For Experiment 1, the effect of condition is significant, F(6, 42) = 15.12, p < .001. However, a Tukey's HSD test shows that the significant differences are either between two constrained timing conditions or between a choice and a constrained timing condition, with the standard deviation of the choice timing condition significantly less than that of the constrained timing condition, exactly the reverse of what we might expect given previous work. Subjects achieve higher spatial accuracy without detectable decreases in temporal accuracy (Experiment 2) or even with increases in temporal accuracy (Experiment 1).
We are left with two questions: Why does choice time allocation in sequential reaching movements improve the spatial accuracy of reaching without a concomitant decrease in temporal accuracy? And why do subjects not vary their time allocation as we vary reward? We address these two questions next.
Soechting and Flanders (1998) emphasized that imposing different constraints on motor dynamics may lead the motor system to adopt qualitatively different solutions for motor control. That is, the motor system can switch “motor strategies” in response to changes in task demands. In a recent paper, for example, Welchman, Stanley, Schomers, Miall, and Bülthoff (2010) found that movements made in reaction to an opponent's movements are faster than movements initiated voluntarily. They argue that different movement types have different neural bases.
While there may be a fixed relation between speed and accuracy for any one strategy, the relation between speed and spatial or temporal accuracy for movements generated by two different strategies is less clear.
Meyer, Smith, and Wright (1982) considered the different functional forms of speed–accuracy tradeoff (SAT) found by Fitts (1954) and Schmidt, Zelaznik, and Frank (1978; Schmidt et al., 1979). Fitts constrained spatial accuracy and asked subjects to maximize speed. He found that the relation between speed and accuracy was logarithmic in form, a relationship known as Fitts' Law. Schmidt et al. constrained speed and asked subjects to be as spatially accurate as possible. They found an SAT that was linear in form, not logarithmic.
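For reference, the two functional forms can be written in their commonly cited textbook versions. The symbols are not taken from the text above and are assumptions for illustration: MT is movement time, D is movement distance, W is target width, W_e is effective target width (the spread of movement endpoints), and a, b, and k are empirical constants.

```latex
% Logarithmic SAT (Fitts' Law): spatial accuracy is constrained by the
% target width W and movement time MT is minimized.
MT = a + b \log_2\!\left(\frac{2D}{W}\right)

% Linear SAT (Schmidt et al.): movement time MT is constrained and the
% endpoint spread W_e grows in proportion to average velocity D / MT.
W_e = k \, \frac{D}{MT}
```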
Meyer et al. (1982) conjecture that the different forms of SAT, linear and logarithmic, resulted from the use of different motor strategies (models): the symmetric impulse variability model and the overlapping impulse model.
The symmetric impulse variability model assumes that a reaching movement is produced by generating a single force pulse whose cycle runs from the start to the end of the movement. Both spatial and temporal uncertainty are determined by the choice of force pulse and increase linearly.
The overlapping impulse model assumes that a reach is the result of a series of small, overlapping force pulses. The model allows for the possibility of multiple spatial corrections during the reach and a consequent reduction of spatial uncertainty. Meyer et al. (1982) show that the logarithmic SAT (Fitts' Law) is a consequence of adopting this model.
Similarly, Bye and Neilson (2008) proposed their BUMP model of motor control, which includes two motor control strategies: fixed horizon control and receding horizon control. The basic assumption of the BUMP model is that movement control is a discrete-time process consisting of multiple intervals. In each interval, typically 100–200 ms, motor commands for the upcoming movement stage are computed. When a movement is close to its end, fixed horizon control and receding horizon control differ in whether the motor commands generated in each interval are supposed to end at a specific time. Fixed horizon control allows more accurate control of timing but poses a more difficult computational problem than receding horizon control. Interpreting our results in terms of the BUMP model, we conjecture that choice time allocation in sequential movements induces receding horizon control while constrained time allocation leads to fixed horizon control.
Todorov and Jordan (2002; Todorov, 2004) and others (Diedrichsen & Gush, 2009) conjecture that the motor system could minimize the effort of motor control by allowing variances in task-irrelevant dimensions to increase. This conjecture implies an effective switch of motor control strategy in the face of different task situations.
In the constrained timing trials, we were able to model the SATs for two successive movements as we changed the constraints on timing. But when the timing constraint was removed altogether, subjects reached with greater spatial accuracy than we would expect based on their performance in the constrained task. We conjecture that this change in SAT corresponds to a shift in motor strategy, a shift in how the reaching movement is generated and controlled.
What is unusual about the observed speed and accuracy in the choice timing conditions is that subjects (with one exception) do not vary mean timing as we vary the rewards associated with successful completion of the first or second reaching movement. We conjecture that they cannot. That is, the movement they adopt for the choice timing task is generated by a privileged motor strategy that, given the conditions of our experiment, can divide time between the two movements in only one ratio, the one observed.
That is, the privileged timing strategy achieves a higher spatial accuracy for the same speed as the constrained timing strategy, but, with this strategy, the motor system has no freedom in allocating time between the two movements. Only one time division is possible. We further conjecture that the privileged time allocation corresponds to the preferred time durations identified in analyzing the constrained timing data of Experiment 1.
If our conjecture concerning two strategies is valid, then we may schematize the possible speed–accuracy tradeoffs available to the subject: a range of SATs available through constrained timing where accuracy smoothly increases with increased time duration of each of two movements and an isolated point, corresponding to the privileged timing strategy with only one possible allocation of times between the two movements.
If the privileged strategy leads to higher spatial accuracy than the constrained strategy at every speed across the range employed in the experiment, then it is always the strategy to employ in order to maximize expected gain in Experiment 1. A subject maximizing expected gain should not vary timing as we vary reward over the range of rewards employed and speeds evoked, and that is what subjects (with one exception) did.
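The argument above can be illustrated with a toy calculation. The following is a minimal sketch under assumed values, not the model fitted in the experiments: endpoint scatter is taken to be isotropic Gaussian with a standard deviation inversely proportional to movement duration, targets are circular, and the privileged strategy is assumed to use a single fixed ratio with a smaller scatter. All names and numbers (`K`, `RADIUS`, `PRIVILEGED_RATIO`, `PRIVILEGED_SD_SCALE`, the range of ratios) are hypothetical.

```python
import math

# All constants below are hypothetical, chosen only for illustration.
K = 0.5                    # assumed scatter constant: sigma = K / duration
RADIUS = 1.0               # assumed target radius (same units as sigma)
TOTAL_TIME = 1.0           # total time to divide between the two movements
PRIVILEGED_RATIO = 0.5     # assumed single ratio available to the privileged strategy
PRIVILEGED_SD_SCALE = 0.6  # assumed scatter reduction for the privileged strategy
# Ratios available to the constrained strategy (0.20 to 0.80 in steps of 0.01).
RATIOS = [0.2 + 0.01 * i for i in range(61)]


def hit_probability(duration, sd_scale):
    """Probability that an isotropic Gaussian endpoint lands in a circular target."""
    sigma = sd_scale * K / duration
    return 1.0 - math.exp(-RADIUS ** 2 / (2.0 * sigma ** 2))


def expected_gain(ratio, rewards, sd_scale):
    """Expected gain when a fraction `ratio` of the time goes to the first movement."""
    v1, v2 = rewards
    t1 = ratio * TOTAL_TIME
    t2 = (1.0 - ratio) * TOTAL_TIME
    return v1 * hit_probability(t1, sd_scale) + v2 * hit_probability(t2, sd_scale)


def compare(rewards):
    """Best constrained allocation versus the single privileged allocation."""
    gain_c, ratio_c = max((expected_gain(r, rewards, 1.0), r) for r in RATIOS)
    gain_p = expected_gain(PRIVILEGED_RATIO, rewards, PRIVILEGED_SD_SCALE)
    return {
        "constrained_ratio": ratio_c,
        "constrained_gain": gain_c,
        "privileged_gain": gain_p,
    }


if __name__ == "__main__":
    for rewards in [(5, 1), (3, 3), (1, 5)]:
        print(rewards, compare(rewards))
```

With these assumed values, the best constrained allocation shifts toward the more valuable target, yet the privileged strategy at its single fixed ratio still yields the higher expected gain in all three reward conditions. Other parameter choices could reverse this ordering; the sketch only shows that an unvarying allocation can be consistent with maximizing expected gain.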
If this analysis is correct, then subjects did err but in only one respect. In the constrained timing task in Experiment 2, they should have employed the same movement strategy as they did in the choice timing task. It may be that subjects knew that the privileged motor strategy led to greater spatial accuracy than the constrained strategy but incorrectly believed that it led to lower temporal accuracy as well. Thus, when the instructions put an emphasis on time, subjects used the constrained motor strategy, intending to sacrifice spatial accuracy for temporal accuracy. They did not know the sacrifice was in vain.