Abstract
Humans can solve computationally difficult planning problems, in large part owing to the ability to break problems down into parts by defining subgoals. Here we investigated how people make use of visual subgoals when facing a visually grounded planning task. Specifically, we employed a virtual block tower assembly paradigm wherein participants sought to generate plans to accurately reconstruct a series of towers based on their silhouettes. In this task, visual subgoals are defined as rectangular regions of the building area. We asked two questions: First, does the use of well-chosen visual subgoals reduce the cognitive cost of planning? Second, how sensitive are people to these cognitive-cost savings when selecting visual subgoals? For each of 128 towers, we generated pairs of visual subgoals that each delimited an equally sized portion of the tower, with one predicted to take more planning time to solve than the other according to a classical algorithm known as best-first search. We found that participants (N=80) took less time and made fewer errors on average when completing the subgoal predicted to be less costly, establishing differences in actual planning complexity between otherwise similar visual subgoals. We then presented these subgoal pairs to a separate group of participants (N=80), who were asked to judge which subgoal in each pair they would prefer to build. We found that participants were systematically biased towards the less costly subgoal (72.1%; 95% CI: [70.1%, 74.1%]), indicating that they are sensitive to the planning cost of these visual subgoals even without having built them first. More broadly, these findings raise important questions concerning the mental processes that explain how people define and judge the tractability of visual subgoals.