Hello, and welcome to Paper of the Day (Po’D): Deep Neural Networks are Easily Fooled Edition. Today’s paper is a recent one: A. Nguyen, J. Yosinski and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” arXiv:1412.1897. The work in this paper is accompanied by a nicely produced video:
This article takes state-of-the-art image content recognition systems based on deep neural networks (DNNs) and shows how easy it is to fool them with nonsense images. Essentially, the authors submit high-performing systems to a simple task: process a set of images, and select the ones you are most confident represent the classes on which you have been trained. So, for a system trained to recognize the handwritten digits zero to nine, random (irregular) images are generated, and any image labeled zero with a confidence of at least 99.99% is selected as a zero, and so on. Below are the resulting images from five different runs.
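To make that selection procedure concrete, here is a minimal sketch in Python. The classifier is a stand-in (a random linear model with a softmax output) since the paper's actual DNNs are not reproduced here; the function and parameter names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained digit classifier: a random linear
# model with a softmax output over 10 classes. (The paper uses real DNNs;
# this toy model only illustrates the selection loop.)
W = rng.normal(size=(10, 28 * 28))

def predict_proba(image):
    """Return softmax 'confidence' scores for a flattened 28x28 image."""
    logits = W @ image.ravel()
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def find_fooling_image(target_class, threshold=0.9999, max_tries=100_000):
    """Sample random noise images until one is labeled `target_class`
    with confidence >= threshold, mirroring the selection step described
    above. Returns (image, confidence) or (None, 0.0) on failure."""
    for _ in range(max_tries):
        img = rng.random((28, 28))  # salt-and-pepper-like noise
        p = predict_proba(img)
        if p.argmax() == target_class and p[target_class] >= threshold:
            return img, p[target_class]
    return None, 0.0
```

Even this toy model will confidently label pure noise for at least some classes, which is the heart of the phenomenon: the threshold filters for extreme posteriors, not for recognizable content.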
Of course, this salt-and-pepper noise is unlike any of the training data of the system. So the authors generate more “regular” images and submit the system to the same procedure. Below are the results of five different runs.
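The regular images come from evolving an indirect, CPPN-style encoding that maps pixel coordinates to intensities through smooth functions, which biases the search toward patterned images. A toy sketch of the idea, again with a stand-in linear softmax classifier as the fitness function (the four-number genome and its renderer are my own simplifications, not the authors' encoding):

```python
import numpy as np

rng = np.random.default_rng(1)

# Coordinate grid: each pixel's (x, y) position. An indirect encoding maps
# coordinates to intensity through smooth functions, yielding regular patterns.
xs, ys = np.meshgrid(np.linspace(-1, 1, 28), np.linspace(-1, 1, 28))

def render(genome):
    """Toy indirect encoding: intensity = a sine/cosine mixture of (x, y).
    Real CPPNs compose many function types; this is a minimal stand-in."""
    a, b, c, d = genome
    img = np.sin(a * xs + b) * np.cos(c * ys + d)
    return (img + 1) / 2            # scale to [0, 1]

# Hypothetical fitness: the confidence of a stand-in linear softmax classifier.
W = rng.normal(size=(10, 28 * 28))

def confidence(img, target):
    z = W @ img.ravel()
    z -= z.max()
    p = np.exp(z)
    return (p / p.sum())[target]

def evolve(target, generations=500, sigma=0.3):
    """(1+1) evolution strategy: mutate the genome and keep the mutant
    whenever the classifier is at least as confident it shows `target`."""
    genome = rng.normal(size=4)
    best = confidence(render(genome), target)
    for _ in range(generations):
        mutant = genome + sigma * rng.normal(size=4)
        f = confidence(render(mutant), target)
        if f >= best:
            genome, best = mutant, f
    return render(genome), best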
When the authors train a new DNN-based system with these fooling images labeled as a new class, “fooling images,” the resulting system can still be easily fooled.
The theme is very familiar to me: content recognition systems can behave in ways that contradict conclusions based on their high figures of merit. In other words, classification accuracy is not enough. For DNN research, the work of Nguyen et al. complements the work done by Szegedy et al. (Po’D here). What are these systems really recognizing?
It is not just DNNs that can be fooled this way. I performed essentially the same experiment as Nguyen et al. on machine music listening systems built using AdaBoost and sparse representation classification: B. L. Sturm, “Two systems for automatic music genre recognition: What are they really recognizing?”, in Proc. MIRUM ’12 (2nd Int. ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies), pp. 69–74, Nov. 2012. I took two state-of-the-art music genre recognition systems and had them find, among many randomly composed music excerpts, the ones they label most confidently (e.g., for AdaBoost, only if the posterior probability is at least 0.999). I then performed a listening experiment to see whether people could identify the supposed classes of the excerpts. Performance was consistent with chance.
While not as fancy as the video above, here is an edited video of my presentation at MIRUM 2012. The particular experiment above begins around minute 10, but the second experiment is also about fooling the systems, this time with real music (à la Szegedy et al.).
One obvious problem with all of the above is the interpretation of “confidence.” One cannot equate a system's high posteriors with “confidence” as we mean it in the human sense. Saying “a system has high confidence in its prediction” thus risks misinterpretation. Instead, as Nguyen et al. point out, “confidence” as embodied in these systems has to do with the distances of observations from decision boundaries, not a “feeling” or “belief” on the part of the system. Our work together suggests that distances in these spaces may not be as meaningful as the figures of merit imply.
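A one-dimensional logistic model makes the point concrete: the posterior grows monotonically with distance from the decision boundary, so an input far from all plausible training data can still receive a near-certain score. The weights below are chosen by hand for illustration; nothing is trained.

```python
import numpy as np

# Logistic model with a decision boundary at x = 0 (weights picked by
# hand to make the point; no training involved).
w, b = 2.0, 0.0

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# "Confidence" grows monotonically with distance from the boundary, so a
# point like x = 50, far outside any plausible training data, still gets
# a posterior approaching 1.0: the score reflects boundary distance, not
# familiarity with the input.
for x in [0.0, 1.0, 5.0, 50.0]:
    print(x, sigmoid(w * x + b))
```

This is why a high posterior on a nonsense image (or a randomly composed excerpt) says nothing about whether the input resembles the training data.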