Abstract
INTRODUCTION. Why do people associate spiky shapes with the word “Kiki” and rounded shapes with the word “Bouba”? One hypothesis is that our mouths form angular shapes while saying “Kiki” and rounded shapes while saying “Bouba”. An alternative hypothesis is that sharp objects produce high-pitched (“Kiki”-like) sounds and rounded objects produce low-pitched (“Bouba”-like) sounds when struck. We investigated these hypotheses using behavioural and computational experiments.
METHODS. Ten spiky and ten rounded shapes were created. In Experiments 1 & 2, subjects had to associate each shape with a Kiki-like or Bouba-like word. In Experiments 3 & 4, subjects heard audio clips and had to associate each clip with spiky or rounded shapes. The audio clips included pronounceable Kiki-Bouba words; their digitally reversed counterparts; pure tones with mean frequencies matched to the Kiki-Bouba words; and sounds generated when objects were struck.
RESULTS. In Experiments 1 & 2, 80% of subjects associated spiky and rounded shapes with Kiki-like and Bouba-like words respectively. In Experiments 3 & 4, subjects showed the Kiki-Bouba effect for pronounceable words, reversed words, natural sounds and pure tones. A sound’s mean frequency was positively correlated with the fraction of times it was classified as Kiki-like. To assess whether these associations can be learnt, we took a neural network trained for object recognition (VGG-16) and asked whether it could predict the sound spectrum from the object shape on a database of natural objects and their associated sounds (34 natural objects, 7 materials, falling on 5 surfaces). After training this model, we queried it for the predicted sound spectra of Kiki-like and Bouba-like shapes. The mean frequency of the predicted spectra was strongly correlated with the Kiki-ness of these shapes (r = 0.8664, p = 7.87e-07).
CONCLUSIONS. We conclude that the Kiki-Bouba effect is a consequence of natural associations that can be learnt between objects and their associated sounds.
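The central quantity in the results, a sound's mean frequency and its correlation with how often the sound is judged Kiki-like, can be illustrated with a minimal sketch. The abstract does not define "mean frequency", so the sketch assumes a power-weighted mean of the spectrum (the spectral centroid). The function names, the test tones and the `kiki_fraction` values are hypothetical placeholders chosen for illustration; they are not data or code from the study.

```python
import cmath
import math


def mean_frequency(signal, sample_rate):
    """Power-weighted mean frequency (spectral centroid) in Hz.

    Assumption: 'mean frequency' is taken to be the spectral centroid;
    the abstract does not state the exact definition used.
    """
    n = len(signal)
    total_power = 0.0
    weighted_sum = 0.0
    # Plain DFT over the positive-frequency bins (the DC bin k = 0 is skipped).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        power = abs(coeff) ** 2
        total_power += power
        weighted_sum += power * (k * sample_rate / n)
    return weighted_sum / total_power


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def tone(freq_hz, sample_rate=8000, n_samples=400):
    """Synthetic pure tone standing in for an audio clip."""
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


# Hypothetical stimuli: four pure tones of increasing pitch.
tone_freqs = [200, 400, 800, 1600]
mean_freqs = [mean_frequency(tone(f), 8000) for f in tone_freqs]

# Made-up fractions of "Kiki" responses per clip, for illustration only.
kiki_fraction = [0.2, 0.4, 0.6, 0.9]

r = pearson_r(mean_freqs, kiki_fraction)
print(f"mean frequencies (Hz): {[round(f) for f in mean_freqs]}")
print(f"Pearson r = {r:.3f}")
```

For recorded clips, an FFT routine would replace the explicit DFT loop, which is written out here only to keep the sketch free of dependencies. The same correlation step would apply to spectra predicted by a model, with the mean frequency computed from each predicted spectrum rather than from a waveform.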