Abstract
In this abstract, we propose an adaptation of the area under the curve (AUC) metric to measure the adversarial robustness of an object classification model over a particular epsilon-interval (an interval of adversarial perturbation strengths). The adapted metric facilitates comparisons across models that have different initial performance under the Fast Gradient Sign Method (FGSM) attack [Goodfellow et al. 2014]. It can be used to determine how adversarially sensitive a model is to different image distributions (or some other variable), and/or to measure how robust a model is relative to other models. We applied this adversarial robustness metric to MNIST, CIFAR-10, and a Fusion dataset (CIFAR-10 + MNIST), where models were trained on either a digit or an object recognition task using a LeNet, ResNet50, or fully connected network (FullyConnectedNet) architecture, and found the following: 1) CIFAR-10 models are more adversarially sensitive than MNIST models when attacked using FGSM; 2) pretraining with a different image distribution and task sometimes carries over the adversarial sensitivity induced by that image distribution and task to the resultant model; 3) both the image distribution and the task that a model is trained on can affect the adversarial sensitivity of the resultant model. Collectively, our results imply non-trivial differences between the learned representational spaces of perceptual systems that have been exposed to different image statistics or tasks (mainly objects vs. digits). Moreover, these results hold even when the models are equalized to the same level of performance, or when they are exposed to approximately matched image statistics of fusion images but trained on different tasks.
Altogether, our empirical analysis of the adversarial sensitivity of machine vision systems provides insight into the interplay between the learned task, the computations in a network/system, and image structure for the general goal of object recognition.
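
To make the idea of an AUC-style robustness score concrete, the following is a minimal sketch rather than the exact metric used in this work. The abstract does not specify the normalization, so the sketch assumes one plausible choice: accuracies are divided by the accuracy at the smallest epsilon (so that models with different initial performance become comparable), and the area is divided by the length of the epsilon-interval (so that a model that never degrades scores 1.0). The function name `normalized_robustness_auc` and its arguments are hypothetical.

```python
import numpy as np


def normalized_robustness_auc(epsilons, accuracies):
    """Area under the accuracy-vs-epsilon curve, rescaled by initial accuracy.

    ASSUMPTION (not taken from the abstract): accuracies are divided by the
    accuracy at the smallest epsilon, and the area is divided by the interval
    length, so a curve that stays at its initial accuracy scores 1.0.
    """
    eps = np.asarray(epsilons, dtype=float)
    acc = np.asarray(accuracies, dtype=float)

    if eps.ndim != 1 or eps.shape != acc.shape or eps.size < 2:
        raise ValueError("need two 1-D sequences of equal length >= 2")
    if np.any(np.diff(eps) <= 0):
        raise ValueError("epsilons must be strictly increasing")
    if acc[0] <= 0:
        raise ValueError("initial accuracy must be positive")

    # Express each accuracy relative to the initial performance.
    relative = acc / acc[0]

    # Trapezoidal rule over the epsilon-interval, written out explicitly.
    area = np.sum(0.5 * (relative[1:] + relative[:-1]) * np.diff(eps))

    # Divide by the interval length so scores are comparable across intervals.
    return float(area / (eps[-1] - eps[0]))
```

Under these assumptions, two models whose accuracy decays with the same relative shape receive the same score even if one starts at a lower accuracy, which is the kind of comparison across different initial performance that the metric is meant to support. The accuracy values passed in would come from evaluating a model on FGSM-perturbed inputs at each perturbation strength.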