Abstract
To recognize an object, we detect and bind the features it is made of. We also merge information across the senses into a coherent percept of our external environment. In general, how well do we combine information from several sources, be they features, cues, or sensory modalities? Building on the classic efficiency approach (Tanner and Birdsall, 1958), here we introduce a “relative efficiency” paradigm to assess binding. We measure the energy threshold for recognition as a function of object extent (word length), and for an audiovisual combination as opposed to each component (audio or visual) alone. Efficient binding has a fixed energy threshold, independent of length or distribution among modalities. Inefficient binding requires more energy as length or number of modalities increases. Our results reveal a striking dichotomy. Energy is integrated inefficiently within each modality: Observers need more energy to recognize longer words, whether seen or heard. However, text and speech summate perfectly: Observers require the same overall energy, irrespective of its distribution across eye and ear. Thus, to see and hear a word, we inefficiently combine features but efficiently combine streams.
M.D. is supported by post-doctoral fellowships from the Belgian Fonds National de la Recherche Scientifique and the Belgian American Educational Foundation. This work was also supported by NIH grants R01-DC05660 to D.P. and R01-EY04432 to D.G.P.