Robust Physical-World Attacks on Deep Learning Visual Classification


Summary

Although deep neural networks (DNNs) perform well in a variety of applications, they are vulnerable to adversarial examples resulting from small-magnitude perturbations added to the input data. Inputs modified in this way can be mislabeled as a target class in targeted attacks or as a random class different from the ground truth in untargeted attacks. However, recent studies have demonstrated that such adversarial examples have limited effectiveness in the physical world due to changing physical conditions—they either completely fail to cause misclassification or only work in restricted cases where a relatively complex image is perturbed and printed on paper. In this paper, we propose a general attack algorithm—Robust Physical Perturbations (RP2)— that takes into account the numerous physical conditions and produces robust adversarial perturbations. Using a real-world example of road sign recognition, we show that adversarial examples generated using RP2 achieve high attack success rates in the physical world under a variety of conditions, including different viewpoints. Furthermore, to the best of our knowledge, there is currently no standardized way to evaluate physical adversarial perturbations. Therefore, we propose a two-stage evaluation methodology and tailor it to the road sign recognition use case. Our methodology captures a range of diverse physical conditions, including those encountered when images are captured from moving vehicles. We evaluate our physical attacks using this methodology and effectively fool two road sign classifiers. Using a perturbation in the shape of black and white stickers, we attack a real Stop sign, causing targeted misclassification in 100% of the images obtained in controlled lab settings and above 84% of the captured video frames obtained on a moving vehicle for one of the classifiers we attack.
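
At a high level, RP2 searches for a perturbation that stays adversarial across the physical conditions a deployed sign would encounter. The following is a rough sketch of that robust-optimization objective; the notation is ours and simplified (the paper's formulation also includes a printability term):

    \delta^{*} = \operatorname*{arg\,min}_{\delta} \;
        \lambda \, \lVert M_{x} \cdot \delta \rVert_{p}
        + \mathbb{E}_{x_{i} \sim X^{V}} \,
        J\big( f_{\theta}\big( x_{i} + T_{i}(M_{x} \cdot \delta) \big), \, y^{*} \big)

Here \delta is the perturbation, M_x is a mask confining it to the surface of the sign (for example, sticker-shaped regions), X^V is a set of sign images sampled under varying distances, angles, and lighting, T_i maps the perturbation into the conditions of each sampled image, J is the loss driving the classifier f_\theta toward the attacker's target label y^*, and \lambda trades off perturbation size against attack strength.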


FAQ

  • Did you attack a real self-driving car? 
    • No.
  • Okay, what did you attack?
    • We attacked a deep neural network-based classifier for U.S. road signs. In the context of our work, a classifier is a neural network that interprets road signs. A car would potentially use a camera to take pictures of road signs, crop them, and then feed them into a road sign classifier. We did not attack object detectors, a different type of machine learning model that analyzes an image of the entire scene and locates and labels the signs without requiring cropped inputs. Object detection is a very different machine learning problem and presents different challenges for attackers.
      To the best of our knowledge, there is currently no publicly available classifier for U.S. road signs. Therefore, we trained a network on the LISA dataset, a U.S. sign dataset comprising road signs such as Stop, Speed Limit, Yield, Right Turn, and Left Turn. The model consists of three convolutional layers followed by a fully connected layer and was originally defined as part of the Cleverhans library; a minimal sketch of a comparable architecture appears after this FAQ. Our final classifier accuracy was 91% on the test dataset.
  • What are your findings?
    • We show that it is possible to make physical modifications to road signs that cause the trained classifier (discussed above) to misinterpret the meaning of the signs. For example, we were able to trick the classifier into interpreting a Stop sign as a Speed Limit 45 sign, and a Right Turn sign as either a Stop or an Added Lane sign. Our physical modification for a real Stop sign is a set of black and white stickers. See the resources section below for examples.
  • What resources does an attacker need?
    • An attacker needs a color printer for sticker attacks and a poster printer for poster-printing attacks. The attacker would also need a camera to take an image of the sign they wish to attack.
  • Who is a casual observer and why do these modifications to road signs not raise suspicion?
    • A casual observer is anyone in the street or in vehicles. Our algorithm produces perturbations that look like graffiti. As graffiti is commonly seen on road signs, it is unlikely that casual observers would suspect that anything is amiss.
  • Based on this work, are current self-driving cars at risk?
    • No. We did not attack a real self-driving car. However, our work does highlight potential issues that future self-driving car algorithms might have to address. A more complete attack on a self-driving car would have to target the entire control pipeline, which includes many more steps than classification. One such step, which is out of the scope of our work, is object detection: identifying the region of an image taken by a car camera where a road sign is to be found. We focus our efforts on attacking classifiers using physical object modifications, because classifiers are commonly studied in research on adversarial examples. Although it is unlikely that our attacks on classifiers would work against detectors “out of the box,” it is quite possible that future work will find robust attacks on object detectors, in a similar vein to our work on attacking classifiers.
  • Should I stop using the autonomous features (parking, freeway driving) of my car? Or is there any immediate concern?
    • We again stress that our attack was crafted for the trained neural network discussed above. As it stands today, this attack would most likely not work as-is on existing self-driving cars.
  • By revealing this vulnerability, aren’t you helping potential hackers?
    • No. On the contrary, we are helping manufacturers and users address potential problems before hackers can take advantage of them. As computer security researchers, we are interested in identifying the security risks of emerging technologies, with the goal of helping improve the security of future versions of those technologies. The security research community has found that evaluating the risks of a technology while it is still developing makes it much easier to confront and address security problems before adversarial pressure manifests. One example is the modern automobile; another is the modern smart home. In both cases, there is progress toward better security. We hope that our results start a fruitful conversation on securing cyber-physical systems that use neural networks to make important control decisions.
  • Are you doing demos or interviews?
    • As our work is in progress, we are currently focused on improving and fine-tuning the scientific techniques behind our initial results. We created this FAQ in response to the unanticipated media interest and to answer questions that have arisen in the meantime. In the future, we may upload video demonstrations of the attack, and may accept interview invitations. For the time being, we have uploaded our experimental attack images on this website.
  • Whom should we contact if we have more questions?
    • We are a team of researchers at various institutions. Please see below for a list of team members and institutions involved in the project. In order to streamline communication, we have created an alias that reaches all team members. We strongly recommend that you contact [email protected] if you have further questions.
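
For readers who want a concrete picture of the classifier described in this FAQ, the following is a minimal sketch of a network with the same overall shape as the LISA-CNN (three convolutional layers followed by a fully connected layer). The filter counts, kernel sizes, 32x32 input crops, 17-class output, and the helper name build_sign_classifier are illustrative assumptions for this sketch, not the exact trained model.

    # Illustrative sketch only: a small road-sign classifier with the same
    # overall shape as the LISA-CNN described in the FAQ above. Layer sizes,
    # input resolution, and class count are assumptions, not the trained model.
    import tensorflow as tf

    def build_sign_classifier(num_classes=17, input_shape=(32, 32, 3)):
        return tf.keras.Sequential([
            tf.keras.Input(shape=input_shape),          # cropped RGB sign image
            tf.keras.layers.Conv2D(64, 8, strides=2, padding="same", activation="relu"),
            tf.keras.layers.Conv2D(128, 6, strides=2, padding="valid", activation="relu"),
            tf.keras.layers.Conv2D(128, 5, strides=1, padding="valid", activation="relu"),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(num_classes),         # logits, one per sign class
        ])

    model = build_sign_classifier()
    model.compile(
        optimizer="adam",
        # Integer class labels; softmax is applied inside the loss (from_logits=True).
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

An attacker with white-box access to such a model's architecture and weights can then run an optimization of the kind sketched under the Summary above to craft the sticker perturbation.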

Example Drive-By Test Video

Abstract Art Attack on LISA-CNN

The left-hand side is a video of a perturbed Stop sign; the right-hand side is a video of a clean Stop sign. The classifier (LISA-CNN) classifies the perturbed sign as Speed Limit 45 until the car is very close to the sign, at which point it is too late for the car to reliably stop. The subtitles show the LISA-CNN classifier output.

Subtle Poster Attack on LISA-CNN

The left-hand side is a video of a true-sized Stop sign printout (poster paper) with perturbations covering the entire surface of the sign. The classifier (LISA-CNN) classifies this perturbed sign as a Speed Limit 45 sign in all tested frames. The right-hand side is the baseline (a clean poster-printed Stop sign). The subtitles show the LISA-CNN output.

Research Paper

Download PDF

When referring to our work, please cite it as:

Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, Dawn Song 
Robust Physical-World Attacks on Deep Learning Visual Classification 
Computer Vision and Pattern Recognition (CVPR 2018) (supersedes arXiv preprint 1707.08945, August 2017)

or, use BibTeX for citation:

@InProceedings{roadsigns17,
     author = {Kevin Eykholt and Ivan Evtimov and Earlence Fernandes and Bo Li and Amir Rahmati and Chaowei Xiao and Atul Prakash and Tadayoshi Kohno and Dawn Song},
     title = {{Robust Physical-World Attacks on Deep Learning Visual Classification}},
     booktitle = {Computer Vision and Pattern Recognition (CVPR)},
     month = jun,
     year = {2018}
}
                

Resources: Experimental Attack Images

We have made a sampling of our experimental attack images available as a zip file (around 25 MB); click here to download. A Google Drive link to the datasets we used in our attacks (the validation set for the coffee mug attack, the victim set for the coffee mug attack, U.S. Stop signs for validation, etc.) is also provided. Permission is granted to use and reproduce the images for publications or for research, with acknowledgement of the CVPR 2018 paper.

Code & Tools

Attack Code on GitHub 


Team

Kevin Eykholt, Ph.D. Candidate, University of Michigan
Ivan Evtimov, Ph.D. Candidate, University of Washington
Earlence Fernandes, Postdoctoral Researcher, University of Washington
Bo Li, Postdoctoral Researcher, University of California Berkeley
Amir Rahmati, Professor, Stony Brook University
Chaowei Xiao, Ph.D. Candidate, University of Michigan
Atul Prakash, Professor, University of Michigan
Tadayoshi Kohno, Professor, University of Washington
Dawn Song, Professor, University of California Berkeley

Acknowledgements