Although outperforming almost all alternative approaches on many vision tasks, CNNs are surprisingly sensitive to barely visible perturbations of the input images (
Szegedy et al., 2013). An adversarial attack on a classifier function
\(f\) adds a noise pattern
\(\mathbf {\eta }\) to an input image
\(\mathbf {x}\) so that
\(f(\mathbf {x} + \mathbf {\eta })\) does not return the correct class
\(y=f(\mathbf {x})\). Furthermore, the attacker ensures that some
\(p\)-norm of
\(\mathbf {\eta }\) does not exceed
\(\epsilon\). In many cases, including this work, the infinity-norm is chosen, and the
\(\epsilon\) values are in the set
\(\lbrace {1}/{255}, {2}/{255}, ...\rbrace\). Thus, for example, for
\(\epsilon = {1}/{255}\), each 8-bit pixel value is increased or decreased by at most 1.
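To make this budget concrete, the following is a minimal NumPy sketch (with made-up image and perturbation values, not data from our experiments) of clipping a perturbation to the infinity-norm ball of radius \(\epsilon = 1/255\) before adding it to an image scaled to \([0, 1]\):

```python
import numpy as np

# Minimal sketch (illustrative values): enforce an infinity-norm budget of
# eps = 1/255 on a perturbation and apply it to an image scaled to [0, 1].
eps = 1.0 / 255.0

image = np.random.rand(32, 32, 3)        # hypothetical input image in [0, 1]
eta = 0.01 * np.random.randn(32, 32, 3)  # some raw perturbation

eta = np.clip(eta, -eps, eps)            # now ||eta||_inf <= eps
adversarial = np.clip(image + eta, 0.0, 1.0)

# Each pixel moves by at most eps, i.e., by roughly one 8-bit step.
print("max change in 8-bit units:", 255 * np.abs(adversarial - image).max())
```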
Goodfellow et al. (2014) argue that the main reason for the sensitivity to adversarial examples is the linearity of CNNs: With a high-dimensional input, one can substantially change a linear neuron’s output, even with small perturbations. Consider the output of an LN-neuron for an input
\(\mathbf {x}\) with dimension
\(n\) perturbed by
\(\mathbf {\eta }\). We choose
\(\mathbf {\eta }\) to be the elementwise sign of the weight vector multiplied by
\(\epsilon\):
\(\mathbf {\eta }=sign(\mathbf {w}) \cdot \epsilon\). Thus,
\(\mathbf {\eta }\) roughly points in the direction of the optimal stimulus (which is also the gradient), but its infinity-norm does not exceed
\(\epsilon\). Assuming that the mean absolute value of
\(\mathbf {w}\) is
\(m\),
\(f_{LN}(\mathbf {\eta })\) is approximately equal to
\(\epsilon n m\). Accordingly, a significant change of the LN-neuron’s output can be achieved by a small
\(\epsilon\) value if the input dimension
\(n\) is large, which is the case for many vision-related tasks. This gradient-ascent method can also be applied to nonlinear neurons. Within a local region, the output of almost any function
\(f\) can be approximated by a linear function. To optimally increase the output, the input needs to be moved along the gradient direction. The fast gradient sign method (FGSM;
Goodfellow et al., 2014) perturbs the original input image
\(\mathbf {x}\) by adding
\(\mathbf {\eta }=\epsilon sign(\nabla f(\mathbf {x}))\). Another approach is to define
\(\mathbf {\eta }\) to be the gradient times a positive step size
\(\tau\) followed by clipping to
\(\mathbf {\eta } \in [-\epsilon , +\epsilon ]^{n}\). The clipped iterative gradient ascent (CIGA) greedily moves along the direction of the highest linear increase,
\begin{eqnarray}
\mathbf {\eta }_{0} &=& \mathbf {0}; \quad \tau > 0 \nonumber \\
\mathbf {q}_{i+1} &=& \mathbf {\eta }_{i} + \tau \nabla f(\mathbf {x} + \mathbf {\eta }_{i}) \nonumber \\
\eta _{i+1}^{j} &=& \min (\max (q_{i+1}^{j}, -\epsilon ), \epsilon ),
\end{eqnarray}
with
\(q_{i}^{j}\) being the \(j\)th entry of the unbounded result \(\mathbf {q}_{i}\) at the \(i\)th iteration step. In the following, we use CIGA to illustrate the principle, and in our experiments, we employ FGSM, as it is a widely recognized adversarial attack method. In an iso-response contour plot, one can easily spot the direction of the gradient, which is orthogonal to the iso-response contours (
Paiton et al., 2020). In
Figure 9 on the left, the gradient for an LN-neuron is parallel to the optimal stimulus (black line). As long as the initial input yields a nonzero gradient, each step of CIGA maximally increases the LN-neuron output. Thus, the algorithm’s effectiveness is only bounded by
\(\epsilon\) but largely independent of the initial input
\(\mathbf {x}\). For a step size larger than
\(\epsilon\), CIGA finds the optimal solution in one step. We now investigate the effects of CIGA on a simplified version of an FP-neuron:
\begin{equation}
F(\mathbf {x}) = \mathbf {x}^T \mathbf {v} \mathbf {g}^T \mathbf {x}.
\end{equation}
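As a point of reference for the following discussion, the CIGA recursion above amounts to only a few lines of code; the sketch below is illustrative (the function name, interface, and toy usage are our own assumptions, not a reference implementation):

```python
import numpy as np

def ciga(grad_fn, x, eps, tau, steps):
    """Clipped iterative gradient ascent, following the recursion above.

    grad_fn: gradient of the attacked output with respect to the input
    x:       original input, eps: infinity-norm budget, tau: step size
    """
    eta = np.zeros_like(x)                    # eta_0 = 0
    for _ in range(steps):
        q = eta + tau * grad_fn(x + eta)      # unbounded update q_{i+1}
        eta = np.clip(q, -eps, eps)           # clip each entry to [-eps, eps]
    return eta

# Toy usage on a linear neuron f(x) = w^T x, whose gradient is simply w:
w = np.array([0.5, -1.0, 2.0])
x = np.array([0.1, 0.2, 0.3])
eta = ciga(lambda z: w, x, eps=1 / 255, tau=0.01, steps=10)
print(eta)  # ends up at eps * sign(w), the optimal bounded perturbation
```

For the linear neuron in this toy usage, a single iteration with a sufficiently large step size already saturates the clipping, in line with the observation above that CIGA then finds the optimal solution in one step.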
Note that in the following particular example, the input is chosen to yield nonnegative projections on
\(\mathbf {v}\) and
\(\mathbf {g}\); thus, we can remove the ReLUs. The resulting gradient is
\begin{equation}
\nabla F(\mathbf {x}) = (\mathbf {v}^{T}\mathbf {x}) \mathbf {g} + (\mathbf {g}^{T}\mathbf {x}) \mathbf {v}.
\end{equation}
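This expression can be checked numerically; the short sketch below (with arbitrary example vectors, chosen only for illustration) compares the analytic gradient with a central finite-difference estimate:

```python
import numpy as np

# Simplified FP-neuron F(x) = (x^T v)(g^T x) and its analytic gradient,
# checked against central finite differences (vectors are arbitrary examples).
rng = np.random.default_rng(0)
v, g, x = rng.normal(size=5), rng.normal(size=5), rng.normal(size=5)

def F(z):
    return (z @ v) * (g @ z)

def grad_F(z):
    return (v @ z) * g + (g @ z) * v   # the expression derived above

h = 1e-6
numeric = np.array([(F(x + h * e) - F(x - h * e)) / (2 * h) for e in np.eye(5)])
print(np.allclose(numeric, grad_F(x), atol=1e-5))  # True
```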
The effectiveness of an iteration step strongly depends on the current position. The highest possible increase would be obtained along the line defined by the optimal stimulus. In
Figure 9 on the right, this is the black line. If the initial input is located on this line, any step in the gradient direction yields an optimal increase of the FP-neuron output. However, for any other position with a nonzero gradient, an unbounded iteration step would move toward the optimal stimulus line. The blue curve in
Figure 9 shows the path for several iterations of CIGA: Starting above the optimal stimulus line, the path slowly converges toward it, eventually running almost parallel to it. Once the
\(\epsilon\) threshold of 1 is reached in the horizontal dimension, the (now bounded) path runs parallel to the vertical dimension to increase the neuron output further. The optimal solution is found once the
\(\epsilon\) bound is also reached in the vertical dimension. The important difference when comparing with LN-neurons is that there are numerous conditions (depending on
\(\tau\),
\(\mathbf {x}\),
\(\gamma\), and
\(\epsilon\)) where CIGA would need several steps to find an optimal solution. This reduced effectiveness of the gradient ascent illustrates why hyperselective neurons are more robust against adversarial attacks; for example, if
\(\epsilon\) is too small,
\(\tau\) is chosen poorly, or too few iterations are performed, an attack might not increase the FP-neuron output by much. Note that single neurons are usually not the target of adversarial attacks; instead, the gradient is computed for the classification loss function. Still, the argument holds that hyperselective neurons are harder to activate than LN-neurons, resulting in increased robustness.
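To make the multi-step behavior concrete, the following sketch (with our own two-dimensional example vectors, starting point, and step size, chosen to mimic the situation described for Figure 9, right, rather than its exact setting) runs CIGA on the simplified FP-neuron and reports when each coordinate of the perturbation saturates at \(\epsilon\); in contrast to the single step that suffices for an LN-neuron, several iterations are needed, and the horizontal coordinate saturates before the vertical one:

```python
import numpy as np

# 2-D sketch of the CIGA path on the simplified FP-neuron F(x) = (x^T v)(g^T x).
# All vectors and hyperparameters are our own illustrative choices.
v = np.array([1.0, 0.2])
g = np.array([0.2, 1.0])
x = np.array([0.1, 0.6])          # initial input above the optimal stimulus line
eps, tau, steps = 1.0, 0.05, 100

def F(z):
    return (z @ v) * (g @ z)

def grad_F(z):
    return (v @ z) * g + (g @ z) * v

eta = np.zeros_like(x)
sat = [None, None]                # iteration at which each coordinate hits eps
for i in range(1, steps + 1):
    eta = np.clip(eta + tau * grad_F(x + eta), -eps, eps)
    for j in range(2):
        if sat[j] is None and eta[j] >= eps:
            sat[j] = i

print("horizontal coordinate saturates at step", sat[0])
print("vertical coordinate saturates at step", sat[1])   # later than sat[0]
print("output increase:", F(x + eta) - F(x))
```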