
class: middle, center, title-slide

Deep Learning

Lecture 4: Adversarial attacks and defenses



.bold[Gilles Louppe]
[email protected]


We have seen that (convolutional) neural networks achieve super-human performance on a large variety of tasks.

Soon enough, it seems like:

- neural networks will replace your doctor;
- neural networks will drive your car;
- neural networks will compose the music you listen to.

But is that the end of the story?



.center.width-80[] .center[A recipe for success, or is it?]


class: middle

Adversarial attacks


Locality assumption

"The deep stack of non-linear layers are a way for the model to encode a non-local generalization prior over the input space. In other words, it is assumed that is possible for the output unit to assign probabilities to regions of the input space that contain no training examples in their vicinity.

It is implicit in such arguments that local generalization—in the very proximity of the training examples—works as expected. And that in particular, for a small enough radius $\epsilon > 0$ in the vicinity of a given training input $\mathbf{x}$, an $\mathbf{x} + \mathbf{r}$ satisfying $||\mathbf{r}|| < \epsilon$ will get assigned a high probability of the correct class by the model."

.pull-right[(Szegedy et al, 2013)]


Adversarial examples

$$\begin{aligned} &\min ||\mathbf{r}||_2 \\ \text{s.t. } &f(\mathbf{x}+\mathbf{r})=y'\\ &\mathbf{x}+\mathbf{r} \in [0,1]^p \end{aligned}$$ where

- $y'$ is some target label, different from the original label $y$ associated with $\mathbf{x}$,
- $f$ is a trained neural network.
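A relaxed form of this problem can be solved by plain gradient descent. The following is a minimal PyTorch sketch, not the box-constrained L-BFGS procedure of Szegedy et al.: the names `targeted_attack`, `f` and `y_target` are illustrative, and the penalty coefficient `c` trades off the norm of the perturbation against reaching the target class.

```python
import torch
import torch.nn.functional as F

def targeted_attack(f, x, y_target, c=0.1, steps=200, lr=0.01):
    # Penalty relaxation of the constrained problem above:
    # minimize c * ||r||_2^2 + cross_entropy(f(x + r), y_target),
    # keeping x + r inside [0, 1]^p by clamping.
    r = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([r], lr=lr)
    for _ in range(steps):
        x_adv = (x + r).clamp(0.0, 1.0)
        loss = c * r.pow(2).sum() + F.cross_entropy(f(x_adv), y_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (x + r.detach()).clamp(0.0, 1.0)
```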

class: middle

.center.width-100[]

.center[(Left) Original images $\mathbf{x}$. (Middle) Noise $\mathbf{r}$. (Right) Modified images $\mathbf{x}+\mathbf{r}$,
all of which are classified as 'Ostrich'. (Szegedy et al, 2013)]


Even simpler, take a step along the direction of the sign of the gradient at each pixel: $$\mathbf{r} = \epsilon\,\text{sign}(\nabla_\mathbf{x} \ell(y, f(\mathbf{x})))$$ where $\epsilon$ is the magnitude of the perturbation.
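A minimal PyTorch sketch of this fast gradient sign step (Goodfellow et al, 2014), here in its untargeted form where the step increases the loss of the correct label `y`; `f` is assumed to return logits:

```python
import torch
import torch.nn.functional as F

def fgsm(f, x, y, epsilon=0.007):
    # One signed gradient step: r = epsilon * sign(grad_x loss(y, f(x))).
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(f(x), y)
    loss.backward()
    r = epsilon * x.grad.sign()
    return (x + r).detach().clamp(0.0, 1.0)
```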

--




.center.width-100[]

.center[The panda on the right is classified as a 'Gibbon'. (Goodfellow et al, 2014)]


class: middle

.center.width-70[]

.footnote[Credits: Breaking things is easy (Papernot and Goodfellow, 2016)]


Not just for neural networks

Many other machine learning models are subject to adversarial examples (see the sketch below for the linear case), including:

- Linear models
  - Logistic regression
  - Softmax regression
  - Support vector machines
- Decision trees
- Nearest neighbors
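The linear case makes the mechanism transparent: for a score $\mathbf{w}^T\mathbf{x} + b$, the gradient with respect to the input is just $\mathbf{w}$, so the sign perturbation shifts the score by $\epsilon ||\mathbf{w}||_1$, which grows with the input dimension. A toy NumPy illustration with hypothetical weights:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 1000                        # input dimension
w = rng.normal(size=p)          # hypothetical logistic-regression weights
b = 0.0

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

x = rng.uniform(0.0, 1.0, size=p)

# The input gradient of the score is w, so the FGSM perturbation is
# epsilon * sign(w); pushing against it shifts the score by epsilon * ||w||_1.
epsilon = 0.05
x_adv = np.clip(x - epsilon * np.sign(w), 0.0, 1.0)

print(predict_proba(x), predict_proba(x_adv))
```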

Fooling neural networks


.center.width-100[]

.center[(Nguyen et al, 2014)]


class: middle

.center.width-60[]

.center[(Nguyen et al, 2014)]


One pixel attacks

.center.width-40[]

.center[(Su et al, 2017)]


Universal adversarial perturbations

.center.width-40[]

.center[(Moosavi-Dezfooli et al, 2016)]


Fooling deep structured prediction models

.center.width-100[]

.center[(Cisse et al, 2017)]


class: middle

.center.width-100[]

.center[(Cisse et al, 2017)]


class: middle

.center.width-100[]

.center[(Cisse et al, 2017)]


Attacks in the real world

.center[

<iframe width="640" height="480" src="https://www.youtube.com/embed/oeQW5qdeyy8?&loop=1&start=0" frameborder="0" volume="0" allowfullscreen></iframe>

]


class: middle

.center[

<iframe width="640" height="480" src="https://www.youtube.com/embed/YXy6oX1iNoA?&loop=1&start=0" frameborder="0" volume="0" allowfullscreen></iframe>

]


class: middle

.center[

<iframe width="640" height="480" src="https://www.youtube.com/embed/zQ_uMenoBCk?&loop=1&start=0" frameborder="0" volume="0" allowfullscreen></iframe>

]


Security threat

Adversarial attacks pose a security threat to machine learning systems deployed in the real world.

Examples include:

- fooling real classifiers trained via remotely hosted APIs (e.g., Google),
- fooling malware detector networks,
- obfuscating speech data,
- displaying adversarial examples in the physical world to fool systems that perceive them through a camera.

class: middle

.center.width-100[]

.footnote[Credits: Adversarial Examples and Adversarial Training (Goodfellow, 2016)]


class: middle

Adversarial defenses


Defenses



.center.width-100[]

.footnote[Credits: Adversarial Examples and Adversarial Training (Goodfellow, 2016)]


Failed defenses

"In this paper we evaluate ten proposed defenses and demonstrate that none of them are able to withstand a white-box attack. We do this by constructing defense-specific loss functions that we minimize with a strong iterative attack algorithm. With these attacks, on CIFAR an adversary can create imperceptible adversarial examples for each defense.

By studying these ten defenses, we have drawn two lessons: existing defenses lack thorough security evaluations, and adversarial examples are much more difficult to detect than previously recognized."

.pull-right[(Carlini and Wagner, 2017)]
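For intuition, such a "strong iterative attack" is in the spirit of iterated FGSM (projected gradient descent): many small signed gradient steps, each followed by a projection back onto an $\ell_\infty$ ball around the original input. The sketch below is an illustration of this idea in PyTorch, not the specific defense-aware attacks of Carlini and Wagner; `f` is assumed to return logits.

```python
import torch
import torch.nn.functional as F

def pgd_attack(f, x, y, epsilon=0.03, alpha=0.007, steps=40):
    # Iterated FGSM: repeat small signed gradient steps and project back
    # into the L-infinity ball of radius epsilon around the original input.
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(f(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```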






.center[Adversarial attacks and defenses remain an open research problem.]


class: end-slide, center
count: false

The end.


Further readings