Second, we assessed how well participants agreed with each other by calculating the mean correlation coefficient between the ratings of all participants per stimulus, which yielded r = 0.26 for softness ratings and r = 0.21 for weight ratings. As a more formal measure, we also calculated the intraclass correlation (ICC; McGraw & Wong, 1996). The ICC measures the reliability of ratings and can take values between 0 (low similarity between ratings) and 1 (high similarity between ratings). We report ICC(C,1) for consistency between ratings (i.e., relative agreement) and ICC(A,1) for absolute agreement between ratings (
McGraw & Wong, 1996). For softness ratings, we obtained ICC(C,1) = 0.28 and ICC(A,1) = 0.22, both of which were significantly different from zero (F[478, 6692] = 6.71, p < 0.001, and F[478, 311.49] = 6.71,
p < 0.001, respectively). For weight ratings, we obtained ICC(C,1) = 0.12 and ICC(A,1) = 0.10, both of which were significantly different from zero (F[478, 6692] = 3.08, p < 0.001, and F[478, 766.09] = 3.08, p < 0.001, respectively). Together, these results show that, despite substantial variation among participants, their softness and weight ratings agreed above chance, indicating that shape had a genuine effect on perceived softness and weight. At the same time, agreement was lower for weight ratings than for softness ratings, suggesting that softness from shape was more immediate to participants than weight from shape.