Abstract
Purpose: Assessing how well one image matches another is a critical component of many image analysis operations. Two norms commonly used for this purpose are L1 and L2, which are specific instances of the Lp family of norms, also known as the Minkowski metric. These metrics are often used interchangeably, as there is typically no principled reason for choosing one over the other. Given that the purpose of most image analysis operations is to maximize image quality for a human observer, it stands to reason that the human visual system is the “gold standard” of image similarity. The goal of these experiments is to determine whether the L1 or the L2 metric produces results more congruent with human notions of image similarity.
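As a minimal sketch of the two metrics, the snippet below computes the Lp (Minkowski) distance between two images and specializes it to p = 1 and p = 2. The representation of an image as a flat list of pixel intensities, the function names, and the toy pixel values are assumptions made for illustration; they are not taken from the experiments.

```python
def minkowski_distance(a, b, p):
    """Lp distance between two equally sized images given as flat pixel lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def l1_distance(a, b):
    """Sum of absolute pixel differences (p = 1)."""
    return minkowski_distance(a, b, 1)


def l2_distance(a, b):
    """Square root of the sum of squared pixel differences (p = 2)."""
    return minkowski_distance(a, b, 2)


# Hypothetical toy images: four pixels each.
original = [10, 10, 10, 10]
spread = [12, 12, 12, 12]  # small error at every pixel
spike = [10, 10, 10, 15]   # one larger error at a single pixel

print(l1_distance(original, spread), l1_distance(original, spike))
print(l2_distance(original, spread), l2_distance(original, spike))
```

In this toy case the two metrics rank the candidates differently: L1 places `spike` closer to `original`, while L2 places `spread` closer, because squaring penalizes the single large error more heavily. Disagreements of this kind are what make a comparison against human judgments possible.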
Methods: In two experiments we sought to determine whether observers preferred image matches derived using the L1 or the L2 metric. Subjects were asked to decide which of two images more closely matched a third, where the two images were those in a reference set that were most similar to the third under the L1 and L2 metrics, respectively. The first experiment used images created by vector quantization (with L1 or L2 as the distortion function) that were matched for semantic content. The second experiment used images that were abstract, with no high-level meaning.
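The selection of the two candidate images can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: images are flat lists of pixel intensities, `best_match` and `make_trial` are hypothetical helper names, and the rule of skipping targets for which both metrics select the same reference image is an assumption.

```python
def lp_distance(a, b, p):
    """Lp distance between two equally sized images given as flat pixel lists."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def best_match(target, reference_set, p):
    """Index of the reference image closest to the target under the Lp metric."""
    return min(
        range(len(reference_set)),
        key=lambda i: lp_distance(target, reference_set[i], p),
    )


def make_trial(target, reference_set):
    """Pair the L1-best and L2-best matches for one target image.

    Returns None when both metrics pick the same image (assumed rule:
    such a target gives the observer nothing to choose between).
    """
    l1_index = best_match(target, reference_set, 1)
    l2_index = best_match(target, reference_set, 2)
    if l1_index == l2_index:
        return None
    return {"l1_match": l1_index, "l2_match": l2_index}


# Hypothetical toy data: one target and three reference images.
target = [10, 10, 10, 10]
reference_set = [
    [12, 12, 12, 12],  # small error everywhere
    [10, 10, 10, 15],  # one larger error
    [30, 30, 30, 30],  # far from the target under both metrics
]

trial = make_trial(target, reference_set)
print(trial)
```

In an actual trial the observer would see the target together with the two selected reference images and indicate which one looks more similar.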
Results: There was a small but highly significant preference for the L1 metric in both experiments. In each experiment 54% of all responses indicated that the L1 match was more similar to the original image, and 11 of 12 subjects selected the L1 match on more than half the trials.
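To give a feel for how unlikely the subject-level count is under chance, the sketch below applies an exact sign test to the reported 11 of 12 subjects. This test is an assumption chosen for illustration; the abstract does not say which statistical test was used, and the function name is hypothetical.

```python
from math import comb


def sign_test_one_sided(successes, n):
    """P(X >= successes) for X ~ Binomial(n, 0.5), computed exactly."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# 11 of 12 subjects chose the L1 match on more than half the trials.
p_one_sided = sign_test_one_sided(11, 12)
p_two_sided = min(1.0, 2 * p_one_sided)

print(p_one_sided)  # 13 / 4096, about 0.0032
print(p_two_sided)  # 26 / 4096, about 0.0063
```

If subjects had no systematic preference, a split at least this lopsided would occur in well under 1% of such experiments, which is consistent with a preference that is small in magnitude yet reliable across observers.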
Conclusions: These results suggest that the L1 metric may better capture human notions of image similarity. This gives a principled reason for choosing the L1 metric rather than the L2 metric for use in applications related to the retrieval, manipulation, and compression of natural images. Further analysis of results provides preliminary clues about why the L1 metric may correspond better with human notions of similarity.