Abstract
How does what we say reflect what we see? A powerful approach to representing objects is for the mind to encode them according to their shortest possible “description length”. Intriguingly, such information-theoretic encoding schemes often predict a non-linear relationship between an image’s “objective” complexity and the actual resources devoted to representing it, because highly complex stimuli might have simple underlying explanations (e.g. if they were generated randomly). How widely are such schemes implemented in the mind? Here, we explore a surprising relationship between the perceived complexity of images and the complexity of spoken descriptions of those images. We generated a library of visual shapes and quantified their complexity as the cumulative surprisal of their internal skeletons, essentially measuring the amount of information in the objects. Subjects then freely described these shapes in their own words, producing more than 4000 unique audio clips. We found that the length of these spoken descriptions predicted explicit judgments of perceived complexity (made by a separate group of subjects), as well as ease of visual search in arrays containing simple and complex objects. More surprisingly, the spoken descriptions revealed a striking quadratic relationship between the objective complexity of the stimuli and the length of their descriptions: both low-complexity and high-complexity stimuli received relatively short verbal descriptions, with description length peaking for objects of intermediate complexity. Follow-up experiments went beyond individual objects to complex arrays that varied in how visually grouped or random they were, and found the same pattern: highly grouped and highly random arrays were tersely described, while moderately grouped arrays garnered the longest descriptions.
These results establish a connection between linguistic expression and visual perception: the way we describe images can reveal how our visual systems process them.
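As a rough illustration of the two quantities discussed above, the following Python sketch computes a cumulative surprisal (the sum of -log2 p over a set of components) and fits a quadratic to description length as a function of complexity. It is a minimal sketch under stated assumptions, not the analysis used in this work: the function names, the per-component probabilities, and the synthetic data are all hypothetical, and the abstract does not specify how skeleton probabilities or description lengths were actually computed.

```python
import numpy as np


def cumulative_surprisal(probs):
    """Return the summed surprisal, in bits, of a sequence of probabilities.

    Assumption: each element is the probability of one shape component
    (e.g. one skeletal branch) under some generative model.
    """
    p = np.asarray(probs, dtype=float)
    if np.any(p <= 0) or np.any(p > 1):
        raise ValueError("probabilities must lie in (0, 1]")
    return float(-np.log2(p).sum())


def fit_quadratic(complexity, length):
    """Least-squares fit of length ~ a*c**2 + b*c + k.

    Returns (a, b, k, peak), where peak = -b / (2a) is the complexity at
    which the fitted curve reaches its extremum (a maximum when a < 0).
    """
    a, b, k = np.polyfit(complexity, length, deg=2)
    peak = -b / (2 * a) if a != 0 else float("nan")
    return float(a), float(b), float(k), float(peak)


if __name__ == "__main__":
    # Hypothetical component probabilities for a single shape.
    print(cumulative_surprisal([0.5, 0.25, 0.125]))

    # Synthetic (made-up) data with an inverted-U shape plus noise,
    # used only to show how the fit is called.
    rng = np.random.default_rng(0)
    c = np.linspace(0.0, 10.0, 50)
    y = -(c - 5.0) ** 2 + 30.0 + rng.normal(scale=1.0, size=c.size)
    a, b, k, peak = fit_quadratic(c, y)
    print(f"a={a:.2f}, b={b:.2f}, k={k:.2f}, peak at c={peak:.2f}")
```

A negative fitted coefficient `a` corresponds to the inverted-U pattern described above, in which description length is greatest at intermediate complexity.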
Acknowledgement: JHU Science of Learning Institute