Paper of the Day (Po’D): “Why should I trust you?” Edition

Hello, and welcome to the Paper of the Day (Po’D): “Why should I trust you?” Edition. Today’s paper is M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’: Explaining the predictions of any classifier,” in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2016 (http://arxiv.org/abs/1602.04938). Ribeiro et al. is in line with the Po’D from a few days ago, Chakarov et al., “Debugging machine learning tasks,” CoRR, vol. abs/1603.07292, 2016.

Whereas Chakarov et al. address the problem of finding errors in training data via misclassifications, Ribeiro et al. address the problem of demystifying the cause of a classifier making a specific decision. This paper is timely and strong. My one-line précis of it is: We propose a method that helps one understand and trust the decisions of a classifier, as well as improve the system.

Ribeiro et al. advance a new way (at least to me) of understanding classification systems. They propose “explanations” that qualitatively link the output of the system and the contents of the input. One illustration they provide is in medical diagnostics. A classification system has decided from a patient’s history and symptoms that they have the flu. Ribeiro et al.’s proposed approach identifies the information leading to that classification, e.g., sneezing, aches, patient ID, etc. A doctor working with such information can then make an informed diagnosis. Another example they give is an image classification system. The explanation for a particular decision highlights those parts of an image (superpixels) that are closely tied to the classification. When it highlights irrelevant regions, one then knows something is amiss.

The approach of Ribeiro et al. is called LIME: Local Interpretable Model-agnostic Explanations. (It has a github page! https://github.com/marcotcr/lime) An explanation for the classification of a specific instance comes from building an interpretable model (e.g., decision tree) around that instance. What does that mean?

Take an observation x in some feature domain, which the classifier f labels f(x). For that observation x, LIME forms an “interpretable representation” x’, which is a vector of binary elements related to some meaningful vocabulary, e.g., a bag of words. (The way LIME does this is not completely clear to me… and when it does become clear I will update this.) Then, LIME aims to build a new classifier g that maps the domain of the interpretable representation x’ to the range of f such that: 1) it is a good approximation to f around the neighbourhood of x; 2) it is not too complex (a stand-in for “interpretability”). Ribeiro et al. pose this problem in terms of optimising complexity (like the number of regressors) and approximation error over a family of “interpretable” functions, e.g., decision trees of varying depth.
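To make the “interpretable representation” idea concrete, here is a toy sketch of my own (not the authors’ code): for text, x’ can simply be a binary presence vector over a small bag-of-words vocabulary. The vocabulary and the instance below are made up for illustration.

```python
# Toy sketch (mine, not the authors') of an interpretable representation x'
# for a text instance x: a binary presence vector over a bag-of-words vocabulary.
import numpy as np

vocabulary = ["sneeze", "headache", "fatigue", "weight", "age"]   # made-up vocabulary
x = "patient reports sneeze and headache but no fatigue"          # made-up instance

tokens = set(x.split())
x_prime = np.array([1 if word in tokens else 0 for word in vocabulary])
print(dict(zip(vocabulary, x_prime.tolist())))
# {'sneeze': 1, 'headache': 1, 'fatigue': 1, 'weight': 0, 'age': 0}
```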

The toy example shown on the github page is illustrative. A classifier f might produce a complex decision boundary from a training dataset; LIME seeks only a boundary that closely approximates f near the point of interest, and that is interpretable in a domain less abstract than that of f.
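For a feel of what this looks like in practice, here is roughly how the package is used on a text classifier, going from its README at the time of writing; treat the exact class and argument names as approximate, since the API may well change, and note that `some_document` and `pipeline` are placeholders I have assumed.

```python
# Rough usage sketch of the lime package for a text classifier, based on its
# README at the time of writing; exact names and arguments may differ by version.
from lime.lime_text import LimeTextExplainer

# 'pipeline' is assumed to be something like a scikit-learn vectoriser + classifier
# whose predict_proba maps a list of raw texts to class probabilities.
explainer = LimeTextExplainer(class_names=["atheism", "christian"])
explanation = explainer.explain_instance(some_document,          # a raw text string
                                         pipeline.predict_proba,
                                         num_features=6)         # size of the explanation
print(explanation.as_list())   # [(word, weight), ...] from the local linear model
```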

To make this computationally feasible, LIME approaches it by sampling. It samples an instance around x’ (randomly turning off its non-zero elements), and projects that point z’ to z in the feature domain (how this happens is entirely unclear to me … and when it does become clear I will update this). Then it forms an approximation error, e.g., (f(z) – g(z’))^2. For a whole set of such sampled points {z’}, LIME computes the weighted errors of a collection of interpretable functions {g}. Simultaneously, LIME computes the complexity of each of these functions as well, e.g., its number of non-zero weights.
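Here is how I read that sampling step, as a hedged sketch continuing the toy text example above. The mapping from z’ back to z, the proximity kernel, and its width are my guesses at plausible choices, not necessarily the paper’s; f stands for the black-box classifier.

```python
# Hedged sketch (mine) of LIME's neighbourhood sampling around x', continuing
# the toy text example above; f is the black-box classifier, returning, say,
# the probability of one class for a raw text input.
import numpy as np

def sample_perturbations(x_prime, f, vocabulary, n_samples=500, sigma=0.75, seed=0):
    """Sample binary neighbours z' of x', map each back to a text z, query f(z),
    and weight each sample by its proximity to x'."""
    rng = np.random.RandomState(seed)
    on_idx = np.flatnonzero(x_prime)              # elements that can be switched off
    Z_prime, y, weights = [], [], []
    for _ in range(n_samples):
        z_prime = x_prime.copy()
        n_off = rng.randint(0, len(on_idx) + 1)   # how many present words to drop
        off = rng.choice(on_idx, size=n_off, replace=False)
        z_prime[off] = 0
        # "Project" z' back to the feature domain: here, a pseudo-document made of
        # the surviving words (one plausible reading of the paper, not a quote).
        z = " ".join(w for w, keep in zip(vocabulary, z_prime) if keep)
        Z_prime.append(z_prime)
        y.append(f(z))
        # Exponential proximity kernel on the fraction of elements changed
        # (the paper uses an exponential kernel on a distance D; this D is my choice).
        d = np.sum(x_prime != z_prime) / max(len(on_idx), 1)
        weights.append(np.exp(-d ** 2 / sigma ** 2))
    return np.array(Z_prime), np.array(y), np.array(weights)
```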

LIME picks the best model by minimising a linear combination of the error and complexity over the set of interpretable functions. When g is a sparse linear model, its few non-zero weights point to particularly significant elements in the interpretable domain with respect to the classification f(x).
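Continuing the sketch, picking the “best” g could then amount to fitting a weighted sparse linear model on the sampled neighbourhood and reading off its non-zero weights. The paper’s experiments use K-LASSO; scikit-learn’s plain Lasso is my stand-in here, and passing sample_weight to Lasso.fit requires a reasonably recent scikit-learn.

```python
# Hedged sketch (mine) of the model-selection step: a weighted sparse linear
# model g fit on the sampled neighbourhood stands in for the error-plus-complexity
# trade-off; the paper's experiments use K-LASSO, plain Lasso is my substitute.
from sklearn.linear_model import Lasso

Z_prime, y, weights = sample_perturbations(x_prime, f, vocabulary)

g = Lasso(alpha=0.01)                       # alpha trades local fit against sparsity
g.fit(Z_prime, y, sample_weight=weights)    # weighted least squares + L1 penalty

explanation = [(word, coef) for word, coef in zip(vocabulary, g.coef_)
               if abs(coef) > 1e-6]
print(explanation)   # e.g., [('sneeze', 0.41), ('headache', 0.27)] -- made-up numbers
```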

Ribeiro et al. offer great examples, which are detailed in a blog post by the lead author. Even more, they show how this approach to understanding classification systems can lead to improving their generalisation.

I find this work exciting for several reasons. The first is that it supports much of what I have discovered to be the case in a significant amount of research in music informatics (MIR). Many music content analysis systems are unknowingly exploiting aspects of data that are not meaningful with respect to the problems they are designed to solve. And so we don’t know which are solutions and which are not. My current favourite example is the music genre classification system that uses sub-20 Hz information. (Our working hypothesis is that some of the observations in the benchmark dataset were collected using recording equipment having identifiable low-frequency characteristics, like the modulated DC of a PC microphone in front of a radio playing country music.)

The second reason I like Ribeiro et al. is that it provides a new approach to uncovering “horses”. Ribeiro et al. is not using interventions or system analysis, but instead a proxy system specialised for a specific point under consideration. To me, that is a brilliant and novel approach.

The third reason I like Ribeiro et al. is that they clearly show how understanding the behaviour of a system really does lead to improving its generalisation. It also shows that classification accuracy and cross-validation can be quite unreliable indicators of generalisation.

With papers like Ribeiro et al., the case grows stronger that your machine learnings may not be learning what you think. We cannot take on faith that a particular dataset has good provenance, or is large enough, or that high accuracy probably reflects good generalisation. Everyone must be wary of “horses”! Keep an eye on the first event of its kind, HORSE 2016!


2 thoughts on “Paper of the Day (Po’D): “Why should I trust you?” Edition”

  1. Neat application. I always wondered about the use of sparsity in classifiers; here it is a strong feature. Though I wonder how well this will actually do for interpreting deep nets, or for finding weird “album effect” frequency-domain effects in MIR.
