We use a conventional metric of image difference that is intuitively appealing due to its simplicity. This is the Euclidean distance E, or L2 norm. If the images are tri-plane, RGB colored images, as in Figure 1,
E can be calculated using the following formula:

$$E = \sqrt{\frac{1}{3N} \sum_{n=1}^{N} \sum_{i=1}^{3} \left(p_{ni} - q_{ni}\right)^2}$$

where
$p_{ni}$ and $q_{ni}$ are the intensities of the corresponding pixels in the two images, with i the image plane (i = 1, 2, 3 for R, G, B), n the pixel (i.e., with a unique x, y coordinate), and
N the number of pixels per image. Euclidean distance has the important property that it defines a straightforward measure of the distance between two images and provides the same answer irrespective of the orthonormal basis used to represent the images, e.g., pixels, Fourier, Haar, etc. (Horn & Johnson, 1990). We are certainly not arguing that the Euclidean distance is the proper perceptual metric. Rather, we argue that E is a relatively neutral metric, providing a useful measure for comparing the relative sensitivities to the different types of image transformation shown in
Figure 1. It is widely believed that simple visual discrimination tasks are mediated by filters in the early stages of the visual cortex, for example primate area V1, that are tuned to various orientations and spatial frequencies (DeValois & DeValois, 1991). Under the simplest model, in which the visual system calculates the difference between two images from the differences between the magnitudes of m linear, orthonormal filter responses, the Euclidean distance calculated from the filter responses produces similar answers to that calculated from pixel intensities.

We should also emphasize that Euclidean distance is a somewhat unusual metric for describing affine transforms. In a Euclidean pixel space, most affine transforms trace a curved trajectory through the space. Although a monotonic increase in the affine transformation (e.g., a progressively larger shift to the left) will typically result in a monotonic increase in the Euclidean distance, the relationship is not a simple linear one. Therefore, although Euclidean distance is a valid metric of the physical distance between two images and is easily calculated, we do not expect it to be an accurate perceptual metric. Indeed, the failure of this physical metric is at the core of this study.
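
The properties of E discussed in this section can be illustrated with a minimal sketch in plain Python. It is not the code used in the study. The function names (`l2`, `rgb_distance`, `haar`, `shift`), the representation of an image as a list of (R, G, B) tuples, and the division by 3N are all assumptions made for illustration; dropping the normalization gives the raw L2 norm. The sketch computes E for an RGB image pair, checks that the distance is unchanged under an orthonormal Haar transform, and shows that E grows monotonically but not linearly as a one-dimensional bar is shifted.

```python
import math


def l2(u, v):
    """Raw Euclidean (L2) distance between two equal-length flat vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rgb_distance(p, q):
    """Distance E between two RGB images given as lists of (R, G, B) tuples.

    Assumption of this sketch: the raw L2 distance is divided by
    sqrt(3N), i.e. E is a root-mean-square difference per plane and pixel.
    """
    if len(p) != len(q):
        raise ValueError("images must have the same number of pixels")
    flat_p = [value for pixel in p for value in pixel]
    flat_q = [value for pixel in q for value in pixel]
    return l2(flat_p, flat_q) / math.sqrt(3 * len(p))


def haar(v):
    """Orthonormal Haar transform of a list whose length is a power of 2."""
    out = list(v)
    n = len(out)
    while n > 1:
        half = n // 2
        # Scaling by 1/sqrt(2) keeps each stage orthonormal.
        sums = [(out[2 * k] + out[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        diffs = [(out[2 * k] - out[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        out[:n] = sums + diffs
        n = half
    return out


def shift(v, k):
    """Circular shift of a 1-D signal to the left by k samples."""
    k %= len(v)
    return v[k:] + v[:k]


if __name__ == "__main__":
    # Basis invariance: same distance in the pixel basis and the Haar basis.
    a = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.4, 0.8]
    b = [0.2, 0.4, 0.6, 0.3, 0.1, 0.9, 0.5, 0.0]
    print("pixel basis:", l2(a, b), "Haar basis:", l2(haar(a), haar(b)))

    # Shifting a bar: distance rises with shift size, but not in proportion.
    bar = [0.0] * 4 + [1.0] * 4 + [0.0] * 8
    for k in range(5):
        print("shift", k, "distance", l2(bar, shift(bar, k)))
```

In the bar example the distance grows with the square root of the number of changed samples rather than with the shift itself, which is one simple instance of the curved trajectory described above.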