Machine Learning Algorithms: Adversarial Robust...
As machine learning is applied to increasingly sensitive tasks and to increasingly noisy data, it has become important that the algorithms we develop are robust to potentially worst-case noise. In this class, we will survey a number of recent developments in the study of robust machine learning, from both a theoretical and an empirical perspective. Tentatively, we will cover a number of related topics, both theoretical and applied.
Adversarial robustness was initially studied solely through the lens of machine learning security, but a recent line of work has studied the effect of imposing adversarial robustness as a prior on learned feature representations. These works have found that although adversarially robust models tend to attain lower accuracy than their standardly trained counterparts, their learned feature representations carry several advantages over those of standard models. These advantages include better-behaved gradients (see Figure 3), representation invertibility, and more specialized features. These desirable properties suggest that robust neural networks may be learning better feature representations than standard networks, which could improve the transferability of those features.
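To make the notion of robustness as a prior concrete, the sketch below shows one common way such models are trained: projected gradient descent (PGD) adversarial training, where each batch is replaced by worst-case perturbations found inside a small L-infinity ball. This is a minimal PyTorch illustration, not the exact training recipe of the works cited above; `model`, `optimizer`, `eps`, and `alpha` are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Search for a worst-case perturbation of x inside an L-infinity ball of radius eps."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # take a step that increases the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball around x
            x_adv = x_adv.clamp(0, 1)                  # keep pixel values valid
    return x_adv.detach()

def robust_training_step(model, optimizer, x, y):
    """One adversarial-training step: fit the model on the perturbed batch instead of the clean one."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training against these worst-case perturbations is what imposes robustness as a prior on the learned representation, typically at some cost in clean accuracy.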
Overall, we have seen that adversarially robust models, although less accurate on the source task than standard-trained models, can improve transfer learning on a wide range of downstream tasks. In our paper, we study this phenomenon in more detail: we analyze the effects of model width and robustness level on transfer performance, and we compare adversarial robustness to other notions of robustness. We also uncover a few somewhat mysterious properties; for example, resizing images seems to have a non-trivial effect on the relationship between robustness and downstream accuracy.
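As a rough illustration of the transfer setting, the sketch below uses a robustly pretrained backbone as a fixed feature extractor and trains only a new linear head on the downstream task. It assumes the backbone outputs flat feature vectors of dimension `feature_dim`; all names are placeholders rather than the exact setup from the paper.

```python
import torch
import torch.nn as nn

def build_transfer_model(backbone, feature_dim, num_target_classes):
    """Fixed-feature transfer: freeze the pretrained backbone, train only a new linear head."""
    for p in backbone.parameters():
        p.requires_grad = False        # keep the (robust) source-task features unchanged
    head = nn.Linear(feature_dim, num_target_classes)
    model = nn.Sequential(backbone, head)
    # Only the new head's parameters are optimized on the downstream task.
    optimizer = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
    return model, optimizer
```

The alternative, full-network fine-tuning, instead updates all parameters starting from the pretrained weights; both settings are common in transfer-learning studies.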
Finally, our work provides evidence that adversarially robust perception models transfer better, yet understanding precisely what causes this remains an open question. More broadly, the results we observe indicate that we do not yet fully understand, even empirically, the ingredients that make transfer learning successful. We hope that our work paves the way for further research into what makes transfer learning work well.
Machine learning has advanced radically over the past 10 years, and machine learning algorithms now achieve human-level performance or better on a number of tasks, including face recognition [31], optical character recognition [8], object recognition [29], and playing the game Go [26]. Yet machine learning algorithms that exceed human performance in naturally occurring scenarios can fail dramatically when an adversary is able to modify their input data even subtly. Machine learning is already used for many highly important applications and will be used in even more applications of even greater importance in the near future. Search algorithms, automated financial trading algorithms, data analytics, autonomous vehicles, and malware detection are all critically dependent on the underlying machine learning algorithms that interpret their respective domain inputs to provide intelligent outputs that facilitate the decision-making process of users or automated systems. As machine learning is used in more contexts where malicious adversaries have an incentive to interfere with the operation of a given machine learning system, it is increasingly important to provide protections, or "robustness guarantees," against adversarial manipulation.
The modern generation of machine learning services is a result of nearly 50 years of research and development in artificial intelligence: the study of computational algorithms and systems that reason about their environment to make predictions [25]. Most modern machine learning, a subfield of artificial intelligence, as used in production, can essentially be understood as applied function approximation: when there is some mapping from an input x to an output y that is difficult for a programmer to describe through explicit code, a machine learning algorithm can learn an approximation of the mapping by analyzing a dataset containing several examples of inputs and their corresponding outputs. The learning proceeds by defining a "model," a parametric function describing the mapping from inputs to outputs. Google's image-classification system, Inception, has been trained with millions of labeled images [28]. It can classify images as cats, dogs, airplanes, boats, or more complex concepts with accuracy on par with or better than that of humans. Increases in the size and accuracy of machine learning models are the result of recent advances in machine learning algorithms [17], particularly advances in deep learning [7].
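As a toy illustration of machine learning as function approximation (an assumption-laden sketch, not Inception), the following code fits a small parametric model to example input-output pairs by gradient descent:

```python
import numpy as np

# Toy "unknown mapping": the labeling rule is never written into the model explicitly;
# the learning algorithm only sees example (input, output) pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                     # inputs x
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)    # corresponding outputs y

# The "model": a parametric function (logistic regression) mapping inputs to outputs.
w, b = np.zeros(2), 0.0
for _ in range(500):                               # adjust parameters to fit the examples
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))         # model predictions
    w -= 0.5 * (X.T @ (p - y)) / len(y)            # gradient step on the logistic loss
    b -= 0.5 * np.mean(p - y)

print("training accuracy:", np.mean((p > 0.5) == y))
```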
One focus of the machine learning research community has been on developing models that make accurate predictions, as progress was in part measured by results on benchmark datasets. In this context, accuracy denotes the fraction of test inputs that a model processes correctly: the proportion of images that an object-recognition algorithm recognizes as belonging to the correct class, and the proportion of executables that a malware detector correctly designates as benign or malicious. The estimate of a model's accuracy varies greatly with the choice of the dataset used to compute it. The model's accuracy is generally evaluated on test inputs that were not used during the training process. The accuracy is usually higher if the test inputs resemble the training inputs more closely. For example, an object-recognition system trained on carefully curated photos may obtain high accuracy when tested on other carefully curated photos but low accuracy on photos captured more informally by mobile phone users.
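In code, the accuracy estimate is simply the fraction of correct predictions, and it depends entirely on which dataset it is computed over; the dataset names below are hypothetical placeholders.

```python
import numpy as np

def accuracy(predict, X, y):
    """Fraction of inputs that the model classifies correctly."""
    return np.mean(predict(X) == y)

# The same model can score very differently depending on the evaluation set:
#   accuracy(predict, X_train, y_train)   # inputs seen during training (optimistic)
#   accuracy(predict, X_test,  y_test)    # held-out inputs from the same distribution
#   accuracy(predict, X_phone, y_phone)   # informally captured photos (often much lower)
```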
Machine learning has traditionally been developed following the assumption that the environment is benign during both training and evaluation of the model. Specifically, the inputs x are usually assumed to all be drawn independently from the same probability distribution at both training and test time. This means that while test inputs x are new and previously unseen during the training process, they at least have the same statistical properties as the inputs used for training. Such assumptions have been useful for designing effective machine learning algorithms but implicitly rule out the possibility that an adversary could alter the distribution at either training time or test time. In this article, we focus on a scenario where an adversary chooses a distribution at test time that is designed to be exceptionally difficult for the model to process accurately. For example, an adversary might modify an image (slightly) to cause it to be recognized incorrectly or alter the code of an executable file to enable it to bypass a malware detector. Such inputs are called "adversarial examples" [30] because they are generated by an adversary.
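One standard way to construct such an adversarial example is to perturb each input feature slightly in the direction that increases the model's loss (the fast gradient sign method). A minimal PyTorch sketch, assuming an image classifier `model` with pixel values in [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_example(model, x, y, eps=0.03):
    """Perturb each pixel by +/- eps in the direction that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()     # small, often imperceptible perturbation
    return x_adv.clamp(0, 1).detach()   # keep the result a valid image
```

Despite looking nearly identical to the original input, such a perturbed image can be confidently misclassified.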
To simplify our presentation in this article, we focus on machine learning algorithms that perform "classification," learning a mapping from an input x to a discrete variable y, where y represents the identity of a class. As a unifying example, we discuss road-sign image recognition; the different values of y correspond to different types of road signs (such as stop signs, yield signs, and speed limit signs). Examples of input images and expected outputs are shown in Figure 1. Though we focus on image classification, the principles of adversarial machine learning apply to much more general artificial intelligence paradigms (such as reinforcement learning) [12].
Anatomy of a machine learning task. A machine learning algorithm is expected to produce a model capable of predicting the correct class of a given input. For instance, when presented with an image of a STOP sign, the model should output the class designating "STOP." The generic strategy adopted to produce such a model is twofold: first, a family of parameterized representations, the model's architecture, is selected; then, the parameter values are fixed by fitting the model to the training data.
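A minimal sketch of these two steps for the road-sign example, using PyTorch; the architecture, input size (32x32 RGB crops), and number of classes are illustrative assumptions, not the specific system discussed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Step 1: select an architecture, a family of functions indexed by its parameters.
class SignClassifier(nn.Module):
    def __init__(self, num_classes=43):             # 43 sign classes is a placeholder choice
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):                            # x: (batch, 3, 32, 32) sign images
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(x.flatten(1))

# Step 2: fix the parameter values by fitting the model to labeled training examples.
def train(model, loader, epochs=10):
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            F.cross_entropy(model(images), labels).backward()
            opt.step()
```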
The machine learning pipeline. Machine learning models are frequently deployed as part of a data pipeline; inputs to the model are derived from a set of preprocessing stages, and outputs of the model are used to determine the next states of the overall system [23]. For example, our running example of a traffic-sign classifier could be deployed in an autonomous vehicle, as illustrated in Figure 3. The model would take as input images captured by a camera monitoring the side of the road, coupled with a detection mechanism for traffic signs. The class predicted by the machine learning model could then be used to decide what action should be taken by the vehicle (such as coming to a stop if the traffic sign is classified as a "STOP" sign).
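The following sketch shows how such a pipeline might be organized around the classifier; `detect_signs`, `classify_sign`, and `vehicle` are hypothetical placeholder components, not a real API.

```python
def control_step(frame, detect_signs, classify_sign, vehicle):
    """One pipeline cycle: preprocessing -> model prediction -> downstream action."""
    for crop in detect_signs(frame):    # preprocessing: locate candidate traffic signs
        label = classify_sign(crop)     # the machine learning model's prediction
        if label == "STOP":
            vehicle.brake()             # the prediction determines the system's next state
```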
Attacking the system. As outlined earlier in this article, most machine learning models are designed, at least partially, based on the assumption that the data at test time is drawn from the same distribution as the training data, an assumption that is often violated. It is common for the accuracy of the model to degrade because of some relatively benign change in the distribution of test data. For example, if a different camera is used at test time from the one used to collect training images, the trained model might not work well on the test images. More important, this phenomenon can be exploited by adversaries capable of manipulating inputs before they are presented to the machine learning model. The adversary's motivation for "controlling" the model's behavior this way stems from the implications of machine learning predictions on consequent steps of the data pipeline [23]. In our running example of an autonomous vehicle, an adversary capable of crafting STOP signs that are classified as yield signs may cause the autonomous vehicle to disobey traffic laws and potentially cause an accident. Machine learning is also applied to other sensitive domains (such as financial fraud [3] and malware detection [1]) where adversarial incentives to have the model mis-predict are evident.