Unlike the contrast perception example of the previous section, the SSIM model is too complex for us to solve for the MAD stimulus pairs analytically. But it is differentiable, and thus allows an alternative approach based on iterative numerical optimization, as illustrated in Figure 7. First, an initial distorted image is generated by adding a random vector in the image space to the reference image. Now consider a level set of $M_1$ (i.e., set of all images having the same value of $M_1$) as well as a level set of $M_2$, each containing the initial image. Starting from the initial image, we iteratively move along the $M_1$ level set in the direction in which $M_2$ is maximally increasing/decreasing. The iteration continues until a maximum/minimum $M_2$ image is reached. Figure 7 also demonstrates the reverse procedure for finding the maximum/minimum $M_1$ images along the $M_2$ level set. The maximally increasing/decreasing directions may be computed from the gradients of the two image quality metrics, as described in 3. This gradient descent/ascent procedure does not guarantee that we will reach the global minimum/maximum on the level set (i.e., we may get “stuck” in a local minimum). As such, a negative result (i.e., the two images are indiscriminable) may not be meaningful. Nevertheless, a positive result may be interpreted unambiguously.
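The level-set search described above can be illustrated with a minimal sketch. It is not the implementation used in this work: MSE plays the role of the fixed metric, and a hypothetical per-pixel weighted MSE (`weighted_mse`, with an arbitrary weight map `w`) stands in for the second differentiable metric, since a full SSIM gradient would not fit in a short example. The function name `search_along_level_set`, the step size, and the iteration count are likewise assumptions chosen for illustration. Each iteration removes from the gradient of the second metric its component along the gradient of the first, takes a step, and then rescales the distortion so that the image returns exactly to the original MSE level set.

```python
import numpy as np

def mse(x, ref):
    """Mean squared error between image x and the reference."""
    return float(np.mean((x - ref) ** 2))

def weighted_mse(x, ref, w):
    """Hypothetical stand-in for a second differentiable quality metric."""
    return float(np.mean(w * (x - ref) ** 2))

def search_along_level_set(ref, init, w, sign=+1.0, step=5.0, n_iter=200):
    """Move along the MSE level set through `init`, increasing (sign=+1)
    or decreasing (sign=-1) the weighted MSE. Pixel values are not clipped."""
    x = init.astype(float).copy()
    target = mse(x, ref)                      # MSE value to be held fixed
    for _ in range(n_iter):
        d = x - ref
        g1 = 2.0 * d / d.size                 # gradient of MSE
        g2 = 2.0 * w * d / d.size             # gradient of weighted MSE
        # Project g2 onto the tangent plane of the MSE level set.
        g2_tan = g2 - (np.vdot(g2, g1) / np.vdot(g1, g1)) * g1
        norm = np.linalg.norm(g2_tan)
        if norm < 1e-12:                      # no tangential direction left
            break
        x = x + sign * step * g2_tan / norm   # fixed-length step
        # The step leaves the level set slightly; rescale to return to it.
        d = x - ref
        x = ref + d * np.sqrt(target / np.mean(d ** 2))
    return x

rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 255.0, size=(16, 16))      # toy "reference image"
init = ref + rng.normal(0.0, 32.0, size=ref.shape)
w = rng.uniform(0.1, 2.0, size=ref.shape)         # arbitrary weight map
x_max = search_along_level_set(ref, init, w, sign=+1.0)
x_min = search_along_level_set(ref, init, w, sign=-1.0)
print(mse(init, ref), mse(x_max, ref), mse(x_min, ref))
print(weighted_mse(x_min, ref, w), weighted_mse(init, ref, w),
      weighted_mse(x_max, ref, w))
```

For MSE the return to the level set is a simple rescaling because its level sets are spheres centered on the reference image. For a metric such as SSIM, that correction step would itself require a one-dimensional search along the metric's gradient.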
Figure 8 shows an example of this image synthesis process, where the intensity range of the reference image is [0, 255] and the initial image A was created by adding independent white Gaussian noise with MSE = 1024. Visual inspection of the images indicates that both models fail to capture some aspects of perceptual image quality. In particular, images B and C have the same MSE with respect to the reference image (top left), yet image B has very high quality, while image C poorly represents many important structures in the reference image. Thus, MSE clearly fails to provide a consistent measure of image quality. On the other hand, images D and E have the same SSIM value. Although both images have very noticeable artifacts, the distortions in image E are concentrated in local regions where they are extremely noticeable, leading to subjectively lower overall quality than image D. Computer animations of MAD competition between MSE and SSIM can be found at
http://www.ece.uwaterloo.ca/~z70wang/research/mad/.
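As a small worked illustration of the initial image A described above: white Gaussian noise with MSE = 1024 corresponds to a per-pixel standard deviation of sqrt(1024) = 32 on the [0, 255] intensity scale. The sketch below is an assumption about how such an image could be generated, not the procedure used here. The function name `add_white_gaussian_noise` is hypothetical, and the noise is rescaled so that the empirical MSE matches the target exactly rather than only in expectation. The result is not clipped to [0, 255], since clipping would change the MSE.

```python
import numpy as np

def add_white_gaussian_noise(ref, target_mse, seed=0):
    """Return ref plus independent Gaussian noise whose empirical MSE
    equals target_mse exactly (values are not clipped)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(ref.shape)
    # Rescale so that mean(noise ** 2) equals target_mse.
    noise *= np.sqrt(target_mse / np.mean(noise ** 2))
    return ref.astype(float) + noise

ref = np.full((64, 64), 128.0)            # flat gray toy reference image
image_a = add_white_gaussian_noise(ref, target_mse=1024.0)
print(np.mean((image_a - ref) ** 2))      # empirical MSE of the noisy image
print(np.std(image_a - ref))              # close to 32, the noise std
```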