Hello, and welcome to my Po’D (Paper of the day): Explaining explanations edition. Today’s paper contributes a survey of explainable AI (XAI): L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, L. Kagal, “Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning”, eprint arXiv:1806.00069, May 2018. My interest in this area stems from: 1) horses; 2) ethics; and 3) the development of new methods of research and development in my field.
My one-line précis of this paper: To address mistrust of AI, XAI should focus on interpretability and explainability.
Explainable AI (XAI) is exactly what it says: building systems that “explain” their learned behaviors, or that are at least “transparent” enough for us to “interpret”, and thereby addressing a lack of public trust in AI. A major impediment here, as in much of engineering, is one of definition: what is meant by “transparent”, “to interpret”, and “to explain”? And to whom: an experienced engineer or a lay person? In a sense, we know these things when we see them in the context of some use case; but there is a need to standardise these criteria and to systematise their evaluation when developing XAI.
Gilpin et al. argue that there is a need for both interpretation and explanation. For Gilpin et al., the difference between the two terms seems to lie in who is acting. “Interpretation” involves someone or something trying to link the behavior of a system to its input (e.g., activation maximisation in a deep image content recognition system). “Explanation” involves the system itself giving reasons for its behavior (e.g., a medical diagnostic system presenting a diagnosis and highlighting the reasons for it).
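To make “interpretation” a bit more concrete, here is a toy sketch of activation maximisation. It assumes a single hypothetical tanh unit with made-up weights `W` and bias `B` standing in for a neuron deep inside an image network. Gradient ascent on the input finds the input that most excites the unit, and an L2 penalty keeps that input bounded. This is my own illustration, not code from Gilpin et al.

```python
import math

# Hypothetical toy "unit": one tanh neuron with made-up weights W and bias B.
W = [0.5, -1.0, 2.0]
B = 0.1


def activation(x):
    """Activation of the toy unit for input x."""
    return math.tanh(sum(wi * xi for wi, xi in zip(W, x)) + B)


def activation_grad(x):
    """Gradient of the activation with respect to the input: (1 - tanh^2) * W."""
    a = activation(x)
    return [(1.0 - a * a) * wi for wi in W]


def maximise_activation(steps=500, lr=0.1, l2=0.05):
    """Gradient ascent on activation(x) - l2 * ||x||^2, starting from x = 0."""
    x = [0.0] * len(W)
    for _ in range(steps):
        g = activation_grad(x)
        # The L2 term keeps the optimised input from growing without bound.
        x = [xi + lr * (gi - 2.0 * l2 * xi) for xi, gi in zip(x, g)]
    return x


x_star = maximise_activation()
print([round(v, 2) for v in x_star])
```

For this unit the optimised input points in the same direction as `W`. In a real image network one would look at the optimised input as a picture of what the unit “looks for”, which is interpretation done from outside the system.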
Gilpin et al. do not explicitly define “an explanation”, but they propose two qualities of one: its “interpretability” and its “completeness”. The first is measured in terms of human comprehension; the second in terms of how well one can infer the future behavior of the system. An example they provide is an explanation that consists of all the parameters of a deep neural network. It is complete, because one can infer how the system will behave for any other input, but it is not interpretable, because one cannot quickly comprehend how the behavior arises from the parameters. Gilpin et al. suggest that the challenge of XAI is to create explanations that are both interpretable and complete (which they see as a tradeoff), and that increase public trust in the system.
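Here is a toy sketch of that tradeoff, under assumptions that are mine and not the paper’s: the “system” is a hypothetical linear classifier with 50 random weights; an “explanation” keeps only its `k` largest-magnitude weights; completeness is measured as how often the explanation anticipates the model’s decision on fresh random inputs; and `k` stands in, crudely, for how hard the explanation is to comprehend.

```python
import random

# Hypothetical "system": a linear classifier with 50 random weights (fixed seed).
N_FEATURES = 50
_rng = random.Random(0)
WEIGHTS = [_rng.gauss(0, 1) for _ in range(N_FEATURES)]


def model(x):
    """The full model's decision: 1 if the weighted sum is positive."""
    return 1 if sum(w * xi for w, xi in zip(WEIGHTS, x)) > 0 else 0


def explanation(k):
    """A crude 'explanation': the k largest-magnitude weights, keyed by feature index."""
    top = sorted(range(N_FEATURES), key=lambda i: abs(WEIGHTS[i]), reverse=True)[:k]
    return {i: WEIGHTS[i] for i in top}


def predict_from_explanation(expl, x):
    """What the explanation alone says the model will decide for input x."""
    # Summing in feature-index order makes k = N_FEATURES reproduce model(x) exactly.
    return 1 if sum(expl[i] * x[i] for i in sorted(expl)) > 0 else 0


def completeness(expl, n_trials=2000, seed=1):
    """Fraction of fresh random inputs on which the explanation anticipates the model."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n_trials):
        x = [rng.gauss(0, 1) for _ in range(N_FEATURES)]
        agree += predict_from_explanation(expl, x) == model(x)
    return agree / n_trials


for k in (1, 5, 20, N_FEATURES):
    print(f"k={k:2d} weights kept, completeness={completeness(explanation(k)):.3f}")
```

With `k = 50` the explanation is the full parameter list and agrees with the model every time; with `k = 1` it is easy to read but often wrong about what the model will do.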
Gilpin et al. identify three different approaches to XAI in research using deep neural networks (which they propose as a taxonomy of XAI): 1) explaining the processing of data by the network (e.g., LIME, salience mapping); 2) explaining the representation of data in the network (e.g., looking at the filters in each layer, testing layers in other tasks); and 3) making the system explainable from the start (e.g., disentangled representations, generated explanations). They also review some past work on interpretability and XAI.
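As an example of the first category, here is a minimal sketch of gradient-based saliency, assuming a hypothetical function `black_box` in place of a trained network. Because the function is treated as a black box, the gradient is approximated with central finite differences rather than backpropagation. (LIME is a different method: it fits a local linear surrogate to the model’s outputs on perturbed samples.)

```python
import math


def saliency(f, x, eps=1e-4):
    """Approximate |df/dx_i| at x for each input i by central differences."""
    scores = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        scores.append(abs(f(up) - f(down)) / (2 * eps))
    return scores


# Hypothetical black box: strongly sensitive to x[0], weakly to x[2], blind to x[1].
def black_box(x):
    return math.tanh(3.0 * x[0] + 0.2 * x[2])


print(saliency(black_box, [0.1, 0.5, -0.3]))
```

Larger scores mark the inputs to which the output is most sensitive near the given point; `x[1]` scores zero because `black_box` ignores it.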
Gilpin et al. conclude with a look at types of evaluation within each of the three approaches to XAI they identify. These include completeness for explanations of data processing; detecting biases for explanations of data representation; and human grading of the explanations produced by explanation-generating systems.
There are lots of good references in this article, some of which I would like to read now:
- B. Herman, “The promise and peril of human evaluation for model interpretability,” arXiv preprint arXiv:1711.07414, 2017.
- Q.-s. Zhang and S.-C. Zhu, “Visual interpretability for deep learning: a survey,” Frontiers of Information Technology & Electronic Engineering 19(1):27–39, 2018.
- A. S. Ross, M. C. Hughes, and F. Doshi-Velez, “Right for the right reasons: Training differentiable models by constraining their explanations,” arXiv preprint arXiv:1703.03717, 2017.
- F. Doshi-Velez and B. Kim, “Towards a rigorous science of interpretable machine learning,” arXiv preprint arXiv:1702.08608, 2017.
- A. Abdul, J. Vermeulen, D. Wang, B. Y. Lim, and M. Kankanhalli, “Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda,” in Proc. CHI Conf. on Human Factors in Computing Systems, 2018.
- R. Guidotti, A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti, “A survey of methods for explaining black box models,” arXiv preprint arXiv:1802.01933, 2018.
I only have three criticisms of this article. First, its title could better reflect its contents, e.g., “A survey of XAI”: it doesn’t propose an approach to evaluating the interpretability of machine learning. (Or maybe its “approach” is the taxonomy?) Second, the suitcase words “explanation” and “interpretability” remain just as vague by the end, and are even joined by another suitcase word, “completeness”. Definitions of these things are difficult to pin down. It’s nice to see Gilpin et al. turn to some philosophical works to help with the discussion, but by the end I’m not sure we have anything more solid than “I know it when I see it.” Finally, the discussion of evaluation could be made more concrete by appealing to specific use cases of AI. I think notions of “explainability” make the most sense when put in a context of use, whether the user is a machine learning researcher or a bank loan officer working with a system. Anyhow, this article was well worth the time I spent reading it.
=== Update: Nov. 1, 2018 (after reading group discussion)
In their review of past work explaining deep network representations, Gilpin et al. suggest transfer learning as a way of understanding what layers are doing. I don’t agree with this. It is like answering the question, “Why did my barber do a good job cutting my hair?” with “Look, he also cooks good food.” Inferring what a layer must be doing by looking at its performance on a different task is just guessing.
No attention is paid to the potential harms of interpretability. For instance, spammers can use the explanations of an explainable spam detection system to adapt their content and subvert it.
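A toy sketch of that spam scenario, assuming a hypothetical explainable linear spam filter that shows each word’s contribution to the spam score: the spammer reads the explanation and disguises the most incriminating word until the message slips under the threshold. The weights, threshold, and obfuscation trick are all made up for illustration.

```python
# Hypothetical word weights of an "explainable" linear spam filter.
WEIGHTS = {"free": 2.0, "winner": 1.5, "click": 1.2, "meeting": -1.0, "report": -0.8}
THRESHOLD = 1.0  # messages scoring above this are flagged as spam


def spam_score(words):
    """Sum of the weights of the words the filter knows; unknown words count 0."""
    return sum(WEIGHTS.get(w, 0.0) for w in words)


def explain(words):
    """The explanation shown to users: per-word contributions, largest first."""
    contribs = [(w, WEIGHTS[w]) for w in words if w in WEIGHTS]
    return sorted(contribs, key=lambda pair: pair[1], reverse=True)


def evade(words):
    """Obfuscate the word the explanation blames most until the message passes."""
    words = list(words)
    while spam_score(words) > THRESHOLD:
        contribs = explain(words)
        if not contribs or contribs[0][1] <= 0:
            break  # nothing left that the explanation blames
        top_word = contribs[0][0]
        # e.g. "free" -> "f.ree", a token the filter has never seen
        words[words.index(top_word)] = top_word[0] + "." + top_word[1:]
    return words


msg = ["click", "here", "to", "claim", "your", "free", "prize", "winner"]
print(spam_score(msg), spam_score(evade(msg)), evade(msg))
```

The filter itself is unchanged; it is the explanation that tells the spammer exactly which words to disguise.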