New Research in Adversarial Machine Learning
Last fall, a team of researchers including the Lab’s Ivan Evtimov, Earlence Fernandes, and Co-Director Yoshi Kohno shared research on arXiv showing that malicious alterations to real-world objects could cause devices to “misread” an image. Specifically, the team tricked an object classifier, like those used in self-driving cars, into misidentifying a stop sign as a speed limit sign.
Now, a team of researchers from Samsung Research America, Stanford University, Stony Brook University, the University of California, Berkeley, the University of Michigan, and the University of Washington has presented two papers updating this research – one at Computer Vision and Pattern Recognition (CVPR) 2018 and another at the 12th USENIX Workshop on Offensive Technologies (WOOT) 2018.
Robust Physical-World Attacks on Deep Learning Visual Classification
At CVPR 2018, the team presented an updated version of the paper we shared last fall, “Robust Physical-World Attacks on Deep Learning Visual Classification”:
Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including different viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier.
Robust Physical-World Attacks on Deep Learning Visual Classification. Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, Dawn Song. Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT (supersedes arXiv:1707.08945), June 2018.
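Targeted attacks like the one the abstract describes start from a standard gradient-based recipe: perturb the input to minimize the classification loss for an attacker-chosen target class. The sketch below illustrates that core idea on a toy linear classifier; it is not the authors’ RP2 algorithm (which adds terms for physical robustness across viewpoints and lighting), and the model and all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "classifier" standing in for a DNN (hypothetical).
n_classes, n_features = 3, 16
W = rng.normal(size=(n_classes, n_features))
b = np.zeros(n_classes)

def predict(x):
    return int(np.argmax(W @ x + b))

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def targeted_step(x, target, eps=0.05):
    """One signed-gradient step that decreases the loss for the target class.

    For cross-entropy loss -log p[target] with linear logits z = Wx + b,
    the input gradient is W^T (p - onehot(target)); stepping against it
    pushes the input toward the target class (iterative FGSM-style attack).
    """
    p = softmax(W @ x + b)
    onehot = np.eye(n_classes)[target]
    grad = W.T @ (p - onehot)
    return x - eps * np.sign(grad)

# Craft an adversarial example: iterate until the prediction flips.
x = rng.normal(size=n_features)
source = predict(x)
target = (source + 1) % n_classes   # any class other than the original
x_adv = x.copy()
for _ in range(500):
    if predict(x_adv) == target:
        break
    x_adv = targeted_step(x_adv, target)
```

Physical attacks like RP2 add constraints on top of this loop, e.g. restricting the perturbation to sticker-shaped regions and averaging the loss over many simulated viewing conditions so the perturbation survives printing and camera capture.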
Physical Adversarial Examples for Object Detectors
At WOOT 2018, the researchers presented new research in their paper “Physical Adversarial Examples for Object Detectors.” In this paper, they expand on their previous work by examining whether real-world alterations to objects can fool object detection models, a broader class of deep learning algorithms than the image classifiers their earlier work targeted.
Deep neural networks (DNNs) are vulnerable to adversarial examples—maliciously crafted inputs that cause DNNs to make incorrect predictions. Recent work has shown that these attacks generalize to the physical domain, to create perturbations on physical objects that fool image classifiers under a variety of real-world conditions. Such attacks pose a risk to deep learning models used in safety-critical cyber-physical systems.
In this work, we extend physical attacks to more challenging object detection models, a broader class of deep learning algorithms widely used to detect and label multiple objects within a scene. Improving upon a previous physical attack on image classifiers, we create perturbed physical objects that are either ignored or mislabeled by object detection models. We implement a Disappearance Attack, in which we cause a Stop sign to “disappear” according to the detector—either by covering the sign with an adversarial Stop sign poster, or by adding adversarial stickers onto the sign. In a video recorded in a controlled lab environment, the state-of-the-art YOLO v2 detector failed to recognize these adversarial Stop signs in over 85% of the video frames. In an outdoor experiment, YOLO was fooled by the poster and sticker attacks in 72.5% and 63.5% of the video frames respectively. We also use Faster R-CNN, a different object detection model, to demonstrate the transferability of our adversarial perturbations. The created poster perturbation is able to fool Faster R-CNN in 85.9% of the video frames in a controlled lab environment, and 40.2% of the video frames in an outdoor environment. Finally, we present preliminary results with a new Creation Attack, wherein innocuous physical stickers fool a model into detecting nonexistent objects.
Physical Adversarial Examples for Object Detectors. Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Florian Tramer, Atul Prakash, Tadayoshi Kohno, Dawn Song. 12th USENIX Workshop on Offensive Technologies (WOOT 2018), Baltimore, MD (arXiv:1807.07769) (supersedes arXiv:1712.08062), August 2018.
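The Disappearance Attack described in the abstract can be framed as minimizing the detector’s confidence that an object is present, with the perturbation confined to a sticker-shaped region of the input. The toy sketch below captures that framing with a single “objectness” score standing in for a detector head like YOLO’s; it is not the authors’ implementation, and the model, mask, and threshold are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy detection head: objectness = sigmoid(w . x), a hypothetical stand-in
# for the confidence score a detector assigns to a candidate box.
n = 64
w = rng.normal(size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objectness(x):
    return sigmoid(w @ x)

# Only a small "sticker" region may be perturbed, mimicking the physical
# constraint of a sticker attack (here: the first 8 of 64 input dimensions).
mask = np.zeros(n)
mask[:8] = 1.0

# Construct an input the detector initially fires on (logit = 2.0).
x = rng.normal(size=n)
x += (2.0 - w @ x) / (w @ w) * w

# Disappearance attack sketch: signed-gradient descent on the objectness
# score, restricted to the sticker region, until the score falls below
# the detection threshold.
threshold = 0.5
delta = np.zeros(n)
for _ in range(200):
    s = objectness(x + delta)
    if s < threshold:
        break
    grad = s * (1.0 - s) * w        # d sigmoid(w.x) / dx
    delta -= 0.1 * np.sign(grad) * mask
```

A real disappearance attack optimizes this kind of objective summed over all boxes the detector proposes, and averages it over simulated distances, angles, and lighting so the printed poster or stickers remain effective in the physical world.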