Abstract
Measuring the subjective similarity of stimuli—for example, the visual impression of materials or the categories in object images—can be achieved through multidimensional scales. These scales represent each stimulus as a point, with inter-point distances reflecting similarity measurements from behavioral experiments. An intuitive task used in this context is the ordinal triplet comparison: “Is stimulus i more similar to stimulus j or k?”. Modern ordinal embedding algorithms infer the (metric) scale from a subset of (ordinal) triplet comparisons and remain robust to observer response noise. However, the unknown residual error raises concerns about how to interpret the scale’s exact shape and whether additional data would be necessary or helpful. These observations call for an examination of the scale’s stability. Here, we present an approach to visualize the variation of comparison-based scales via bootstrapping techniques and a probabilistic model. Simulation experiments demonstrate that variation is broadly captured by an ensemble of scales estimated from resampled trials. However, common methods to align the ensemble members in size, rotation, and translation can distort the local variation. For example, standardization results in zero variation at some points but inflates it at others, while Procrustes analysis spreads “noise” uniformly across all scale points. To address this, we propose a probabilistic model that identifies the variation at individual scale points. In essence, we “wiggle” scale points and observe how the correspondence with the triplet responses changes, which indicates their level of stability. These localized estimates are combined across the ensemble to provide a robust measure of variation. Simulations validate our approach, while case studies on behavioral datasets demonstrate its practical relevance. In these case studies, we visualize perceptual estimates through regions instead of points and identify the most variable stimuli or trials. Beyond data analysis, our stability measures enable downstream tasks such as adaptive trial selection to expedite experiments.
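As a rough illustration of the bootstrap-ensemble idea summarized above, the following sketch resamples triplet trials with replacement, re-fits a scale for each resample, and Procrustes-aligns the resulting ensemble before computing per-point spread. It is not the authors' implementation: the hinge-loss embedding, the toy stimuli, and all parameter choices are placeholders standing in for a real ordinal embedding algorithm and behavioral data.

```python
# Minimal sketch (assumed setup, not the paper's code): bootstrap an ensemble of
# ordinal embeddings from resampled triplet trials and align them with Procrustes.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
n_stimuli, n_dim = 10, 2

def embedding_loss(x_flat, triplets, n, d, margin=0.1):
    """Hinge loss: for each triplet (i, j, k), stimulus i should lie closer to j than to k."""
    X = x_flat.reshape(n, d)
    d_ij = np.sum((X[triplets[:, 0]] - X[triplets[:, 1]]) ** 2, axis=1)
    d_ik = np.sum((X[triplets[:, 0]] - X[triplets[:, 2]]) ** 2, axis=1)
    return np.maximum(0.0, margin + d_ij - d_ik).sum()

def fit_embedding(triplets, n, d):
    """Fit a scale from ordinal triplets by minimizing the hinge loss from a random start."""
    x0 = rng.normal(size=n * d)
    res = minimize(embedding_loss, x0, args=(triplets, n, d), method="L-BFGS-B")
    return res.x.reshape(n, d)

# Toy ground-truth scale and noiseless triplet responses (placeholders for behavioral data).
X_true = rng.normal(size=(n_stimuli, n_dim))
idx = np.array([(i, j, k) for i in range(n_stimuli)
                for j in range(n_stimuli) for k in range(n_stimuli)
                if len({i, j, k}) == 3])
d_ij = np.linalg.norm(X_true[idx[:, 0]] - X_true[idx[:, 1]], axis=1)
d_ik = np.linalg.norm(X_true[idx[:, 0]] - X_true[idx[:, 2]], axis=1)
triplets = np.where((d_ij < d_ik)[:, None], idx, idx[:, [0, 2, 1]])

# Bootstrap ensemble: resample trials, re-fit, and align each scale to a reference.
reference = fit_embedding(triplets, n_stimuli, n_dim)
ensemble = []
for _ in range(20):
    sample = triplets[rng.integers(0, len(triplets), size=len(triplets))]
    X_b = fit_embedding(sample, n_stimuli, n_dim)
    _, X_aligned, _ = procrustes(reference, X_b)  # removes size, rotation, translation
    ensemble.append(X_aligned)

# Per-point spread across the aligned ensemble; as the abstract notes, such naive
# alignment can distort the local variation that the probabilistic model targets.
spread = np.stack(ensemble).std(axis=0).mean(axis=1)
print(np.round(spread, 3))
```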