Abstract
On January 6, 2021, a mob attacked the U.S. Capitol seeking to stop certification of the presidential election; in the aftermath, five lay dead. Some insurrectionists were recognizable from photos of that day, but others were masked or obscured. To aid in identifying (or exonerating) photographed suspects, we ask how accurately physical attributes – height and weight – can be estimated from a single photograph. Volunteers (n=58) had their height and weight measured and were then photographed in neutral and dynamic poses, both in a studio with no surrounding structures and in a hallway surrounded by familiar structures (a doorway, stairs). Study participants (n=325) recruited from Amazon Mechanical Turk were asked to estimate the height and weight of the volunteers depicted in 58 photos. The median absolute height/weight error is 8.4cm/9.1kg in the studio and 6.4cm/7.5kg in the hallway. When estimates were pooled across 20 respondents, the studio and hallway errors improved to 5.5cm/6.7kg and 3.9cm/5.8kg. By comparison, a state-of-the-art, deep-learning-based computer-vision system was used to estimate 3D body pose and shape (scaled to be consistent with a gender-specific inter-pupillary distance), from which height and weight were estimated. Its median studio and hallway errors of 7.3cm/8.0kg and 5.0cm/10.4kg are smaller than those of individual respondents and larger than those of pooled responses, but – with the exception of pooled weight – not statistically so. Lastly, ten licensed photogrammetrists recruited for the study were statistically less accurate at height estimation than our pooled participants and no different at weight estimation. Pooled non-expert human estimates of physical attributes from a photo are surprisingly accurate, even in the absence of reference objects. It is not immediately obvious how these estimates are made, but naive observers remain the most accurate way to estimate basic physical attributes from a single photograph.
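The pooling of estimates across respondents can be sketched as a simple aggregation of independent guesses. The snippet below is a minimal illustration only; it assumes the pooled statistic is the per-photo median, and the function name and sample values are hypothetical, not taken from the study:

```python
from statistics import median

def pooled_estimate(estimates):
    """Aggregate independent guesses for one photo by taking their median.

    This is an assumed pooling rule for illustration; the study's exact
    aggregation statistic is not specified in this abstract.
    """
    return median(estimates)

# Hypothetical height guesses (cm) from 20 respondents for a single photo.
guesses = [170, 175, 168, 180, 172, 169, 178, 174, 171, 176,
           173, 167, 182, 170, 175, 172, 168, 177, 174, 171]
print(pooled_estimate(guesses))  # → 172.5
```

Pooling works here because individual over- and under-estimates tend to cancel, which is consistent with the abstract's finding that 20-respondent pooled errors are markedly smaller than individual ones.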