Abstract
When grasping objects, humans visually identify relevant object properties such as size, shape and material to select appropriate hand poses and surface contact regions. Prior research has mainly focused on precision-grip grasps, with hand-object contact approximated as points (by projecting markers attached to the thumb and index fingertips onto the object). This oversimplifies real-world grasps, which often employ more than two digits and typically feature extensive regions of the fingers and palm in contact with the object. Here, we provide the first extensive examination of how humans visually select both hand poses and surface contact regions during unconstrained multidigit (i.e., whole-hand) grasps with diverse objects and tasks. Participants grasped, lifted and manipulated objects varying in size, shape and material while we tracked a total of 28 markers attached to the object and to selected locations on the hand, and recorded synchronized video footage from multiple views. From the markers, we derived hand joint poses, on which we performed a principal component analysis. The dimensionality of hand poses could be substantially reduced: five principal components explained 95% of the variance in hand poses. Additionally, we exploited recent advances in hand mesh reconstruction to generate a mesh model of the participant’s hand for every frame. Intersections between the reconstructed hand and object meshes allowed us to estimate the contact regions between hand and object surfaces. We validated the approach by per-pixel comparisons between hand mesh renderings and video footage from the same camera view. Contact region estimates were validated using objects coated in thermochromic paint, which allowed us to image contact regions immediately after grasping. Our measurements reveal large differences between precision-grip and whole-hand grasp poses and contact areas, as well as large and systematic variations in unconstrained grasps across object properties (shape, material) and tasks.
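
The dimensionality-reduction step can be illustrated with a minimal sketch. This is not the paper’s analysis pipeline: the joint-angle data below is synthetic (a few latent "synergies" mixed into many joint angles), standing in for the poses derived from the tracked markers, and the array shapes are illustrative assumptions. It only shows how scikit-learn’s PCA selects the smallest number of components explaining 95% of the variance.

```python
# Minimal sketch, assuming synthetic data in place of the measured hand poses.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for recorded hand poses: n_frames x n_joint_angles.
# Real data would be joint angles derived from the tracked hand markers.
n_frames, n_angles = 5000, 24
latent = rng.normal(size=(n_frames, 5))             # few underlying pose synergies
mixing = rng.normal(size=(5, n_angles))             # how synergies drive each joint
joint_angles = latent @ mixing + 0.1 * rng.normal(size=(n_frames, n_angles))

# A float in (0, 1) asks PCA for the fewest components reaching that
# fraction of explained variance.
pca = PCA(n_components=0.95)
pose_scores = pca.fit_transform(joint_angles)

print(f"{pca.n_components_} components explain "
      f"{pca.explained_variance_ratio_.sum():.1%} of the variance")
```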
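
Similarly, the idea of estimating contact regions from intersecting hand and object meshes can be sketched with the trimesh library. The meshes below are simple placeholder solids, not reconstructed hand or object models, and the 2 mm contact tolerance is an illustrative assumption; labeling hand vertices inside (or very near) the object surface is a simplified proxy for the paper’s mesh-intersection approach.

```python
# Minimal sketch, assuming trimesh and placeholder meshes for hand and object.
import trimesh

# Placeholders: in practice the hand mesh would come from per-frame hand mesh
# reconstruction and the object mesh from the known object model.
hand = trimesh.creation.icosphere(subdivisions=3, radius=0.04)
hand.apply_translation([0.05, 0.0, 0.0])
obj = trimesh.creation.box(extents=[0.08, 0.08, 0.08])

# Label a hand vertex as "in contact" if it lies inside the object mesh or
# within a small tolerance of its surface. trimesh's signed distance is
# positive inside the mesh and negative outside.
tol = 0.002  # 2 mm, an illustrative choice
dist = trimesh.proximity.signed_distance(obj, hand.vertices)
contact = dist > -tol

print(f"{contact.sum()} of {len(hand.vertices)} hand vertices in contact")

# Contact region on the hand surface: faces whose vertices are all in contact.
contact_faces = contact[hand.faces].all(axis=1)
contact_area = hand.area_faces[contact_faces].sum()
print(f"estimated contact area: {contact_area * 1e4:.2f} cm^2")
```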