Hello, and welcome to Paper of the Day (Po’D): Intriguing properties of neural networks edition. Today’s paper is: C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow and R. Fergus, “Intriguing properties of neural networks”, in Proc. Int. Conf. Learning Representations, 2014. Today’s paper is very exciting for me because I see “horses” nearly being called “horses” in a machine learning research domain outside music information retrieval. Furthermore, the arguments that this work is apparently causing resemble those I have received in peer review of my own work. For instance, see the comments on this post. Or the reviews here. Some amount of press is also resulting, e.g., ZDnet, Slashdot; and the results of the paper are also being used to bolster the argument that the hottest topic in machine learning is over-hyped.
The one-line précis of this paper is: The deep neural network: as uninterpretable as it ever was; and now acting in ways that contradict notions of generalization.
The authors take several neural net architectures, and create several systems by training them using different image datasets, from handwritten digits, to 10 million images from YouTube. With these systems, they perform two experiments. One experiment addresses the claim that the output of a unit in a layer in a NN provides a “meaningful feature”. For instance, one can find the subset of images in a test dataset where the output of one unit in a layer is exceptionally large, and then infer from these images the properties that they share — such as a diagonal line denoting “2” and “7”. Szegedy et al. look instead for images that produce large values in a random combination of the outputs of all units in a layer, and find the same results. This challenges a claim that units themselves are responsible for recognizing characteristics that are higher-level than their inputs.
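The contrast between the two probes can be sketched in a few lines. This is a toy illustration with random stand-in activations, not the authors' trained networks: a "natural basis" probe ranks images by the output of a single unit, while the random-direction probe ranks them by a random linear combination of all units in the layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a hidden layer: activations of 1000 "images"
# across 50 units (in the paper these come from a trained network).
activations = rng.random((1000, 50))

# Natural-basis probe: the images that maximally activate unit e_i.
unit = 7
top_by_unit = np.argsort(activations[:, unit])[-5:]

# Random-direction probe (Szegedy et al.): the images that maximize
# a random linear combination v of all units in the layer.
v = rng.standard_normal(50)
top_by_direction = np.argsort(activations @ v)[-5:]

print(top_by_unit, top_by_direction)
```

The paper's observation is that the images retrieved by the second probe appear just as semantically coherent as those retrieved by the first, which undercuts the idea that individual units are the meaningful coordinates.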
The second experiment of the authors is essentially the method of irrelevant transformations.
They take an image correctly classified by the system,
and find a small perturbation of it (essentially additive distortion)
that causes the system to mislabel it.
They call the resulting image an “adversarial example.”
For all their systems, they are able to inflate their error rate to 100%.
(Several of these systems have test error rates of less than 2%.)
What is more, these perturbations are done on the training dataset.
For instance, the figure below shows one example.
The picture on the left is correctly labeled by a system. The picture in the middle is incorrectly labeled by the same system. The magnitude of the difference between the two pictures is shown on the right.
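The idea behind constructing such a perturbation can be sketched on a toy linear classifier. The authors actually use box-constrained L-BFGS on deep networks to find a minimal perturbation; the sketch below (with made-up data and a plain gradient step on the input) only illustrates the principle of nudging an input until the label flips.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear softmax classifier standing in for a trained network:
# 3 classes, 10-dimensional "images". All data here is synthetic.
W = rng.standard_normal((3, 10))
x = rng.standard_normal(10)
orig = int(np.argmax(W @ x))
target = (orig + 1) % 3            # an arbitrary wrong label

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Descend the cross-entropy loss toward the *wrong* target label,
# perturbing the input rather than the weights; stop as soon as
# the predicted label flips. (This does not minimize the size of
# the perturbation, as the paper's L-BFGS formulation does.)
x_adv = x.copy()
for _ in range(200):
    if int(np.argmax(W @ x_adv)) == target:
        break
    p = softmax(W @ x_adv)
    grad = W.T @ (p - np.eye(3)[target])   # d loss / d input
    x_adv -= 0.05 * grad

print(orig, int(np.argmax(W @ x_adv)), np.linalg.norm(x_adv - x))
```

The striking finding in the paper is that for deep networks on real images, the perturbation needed to flip the label is so small as to be imperceptible to a human.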
Clearly at stake now is the sanity of maintaining the position that these systems have learned to generalize objects in images. However, are these systems “horses”? In other words, does their performance in these tasks, measured in these datasets, come from using criteria that are irrelevant to the tasks? Is their good performance due to confounds, and to a lack of control over independent variables in the experiments? I think Szegedy et al. have yet to test this, but the notions are certainly there. If a system with 7% error can be so easily fooled, then what is the meaning of that 7% error? This, to me, signals a problem with the validity of the evaluation with respect to the basic questions to which answers are sought, e.g., how well has my system learned to recognize “X”?