Abstract
The hand grasp type provides visual information about a person’s intended action. We conducted an experiment to evaluate how accurately humans can judge basic manipulation action categories from static images and how much image information they need to make this judgment. We classified manipulation actions into three basic categories: ‘power action’ (including actions such as lifting, cutting, or hitting with force), ‘precision action’ (such as playing an instrument, writing, or stirring with a straw), and ‘casual movement’ (such as showcasing, posing, or gesturing). Images were shown in sequences of three: the first showing the bare hands segmented from the background, the second showing square patches around the hands, and the third showing the full image. Mechanical Turk workers were shown 39 images and asked to judge the action category. 57 subjects participated, of whom 48 provided valid responses. Classification accuracy was nearly the same when subjects saw the hand patches as when they saw the whole image (71% vs. 77% on average over all actions). Accuracy decreased significantly when subjects saw only the segmented hands (54% on average over all images); a t-test confirmed this difference as significant at the 99.9% confidence level. The findings confirm the importance of the grasp type as a symbolic feature for action interpretation. The statistics will also be used to provide confidence intervals for our computational algorithms that recognize human manipulation actions from images and video.
Meeting abstract presented at VSS 2015
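As an illustration of the reported significance test, the sketch below shows how the hands-only vs. full-image comparison might be run as a paired t-test over per-subject accuracies. The abstract does not specify the exact test variant, and the arrays here are synthetic placeholders matched only to the reported group means, not the study's data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects = 48  # valid responses reported in the abstract

# Hypothetical per-subject accuracies (proportion correct) per viewing condition;
# means follow the abstract (54%, 71%, 77%), the spread is an assumption.
acc_hands_only = rng.normal(0.54, 0.08, n_subjects).clip(0, 1)
acc_hand_patch = rng.normal(0.71, 0.08, n_subjects).clip(0, 1)
acc_full_image = rng.normal(0.77, 0.08, n_subjects).clip(0, 1)

# Segmented hands vs. full image: the comparison reported as significant.
t, p = stats.ttest_rel(acc_hands_only, acc_full_image)
print(f"hands-only vs. full image: t = {t:.2f}, p = {p:.4g}")

# Hand patches vs. full image: reported as nearly the same accuracy.
t2, p2 = stats.ttest_rel(acc_hand_patch, acc_full_image)
print(f"hand patches vs. full image: t = {t2:.2f}, p = {p2:.4g}")

A within-subject (paired) test is assumed here because each subject saw all three presentation conditions; with independent groups, scipy.stats.ttest_ind would be the analogous call.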