Hello, and welcome to Paper of the Day (Po’D): Kiki-Bouba edition. Today’s paper is my own: B. L. Sturm and N. Collins, “THE KIKI-BOUBA CHALLENGE: ALGORITHMIC COMPOSITION FOR CONTENT-BASED MIR RESEARCH & DEVELOPMENT,” in Proc. Int. Symp. Music Info. Retrieval, Oct. 2014. Below is the video of my presentation from a few days ago (PowerPoint slides here).
The one-line precis of our paper is:
The Kiki-Bouba Challenge (KBC) attempts to change the incentive in content-based MIR research from reproducing ground truth in a dataset to solving problems.
By “content-based MIR,” we mean the “intelligent, automated processing of music [intended to make] music, or information about music, easier to find” (Casey et al., 2008). Within this domain, we are specifically concerned with research on genre recognition, mood recognition, and autotagging.
We first point out six major problems that inhibit research in content-based MIR:
- Research problems can lack formal and explicit definitions.
- The amount of data available is quite small.
- “Ground truth” can be difficult to define and collect.
- Intellectual property law conflicts with the needs of the data scientist, machine learning algorithms, and so on.
- The design of a valid experiment to evaluate a “solution” is difficult when a problem is not explicitly defined.
- Reproducible research is not widely practiced.
Most research in music genre recognition, mood recognition, and autotagging suffers from all of these problems. Even though a massive amount of work has been performed in these important research areas, whatever problems are being addressed remain informally and implicitly defined by labeled datasets, which are typically small and have questionably defined “ground truth.” The vast majority of this work attempts to create systems that exhibit some kind of musical intelligence (the research discipline is called “music information retrieval” after all), and evaluates solutions by measuring the amount of “ground truth” they reproduce in some dataset. Unfortunately, this approach does not control for the independent variables in the experiment, and considers only the simplest measurement model. Whatever the result, no one can claim it is evidence that a system is considering music content relevant to the task (genre, mood, etc.). In other words, musical “intelligence” can be an illusion. A system can produce “correct” answers for the wrong reasons.
Clearly, this is a significant problem. In content-based MIR research, too much importance has been and continues to be given to reproducing labels in datasets instead of verifiably solving fundamental problems. What is more, the problems with validity in experimental design are exceptionally subtle. So, how do we attempt to illustrate these problems, and provide a way to address them?
The Kiki-Bouba Challenge (KBC) poses a content-based MIR research problem that 1) can be defined explicitly and formally, 2) has limitless data with perfect ground truth unencumbered by intellectual property law, 3) requires valid evaluation to solve, and 4) facilitates reproducibility. The essence of KBC is to design a system that can discriminate (task 1), identify (task 2), recognize (task 3), and imitate (task 4) Aristotelian categories of “music.” The tasks are as follows:
- Discrimination task: Given an unlabeled set of music recordings from the Kiki-Bouba music universe, build a system that determines: 1) there exist two categories in this music universe; 2) what high-level criteria (content) discriminate them. (This can be seen as unsupervised learning, but ensuring discrimination is caused by content and not criteria irrelevant to the task.)
- Identification task: Build a system that identifies, using high-level criteria (content), recordings of music (either from the Kiki-Bouba music universe or from another) as being Kiki, Bouba, or neither. (This can be seen as supervised learning, but ensuring identification is caused by content and not criteria irrelevant to the task.)
- Recognition task: Build a system that recognizes high-level content in real-world music recordings as being similar to content in music from Kiki, Bouba, both, or neither. (This can be seen as relevance ranking, but ensuring recognition and ranking are caused by content and not criteria irrelevant to the task.)
- Imitation task: Build a system that composes music having high-level content similar to that in music from Kiki and/or Bouba. (This can be seen as reverse engineering the compositional rules of the Kiki-Bouba music universe by listening.)
In the “Kiki-Bouba universe,” the only “music” that exists belongs to one of two categories: “Kiki” and “Bouba.” Each category consists of high-level characteristics (content) that are well-defined and indisputable (Aristotelian). Music from each of these categories is generated using algorithmic composition. In this way, we can create a limitless amount of “music” data that is free of copyright and has a perfect ground truth, i.e., all “music” generated by the “Kiki” algorithm will be “Kiki.”
KBC attempts to simplify the problem of music genre recognition such that it can be explicitly and formally defined. Its use of algorithmic composition neutralizes the complex, culturally negotiated nature of genre in the real world (which is incompatible with Aristotelian categorization) and replaces it with a property that is compatible with the formal nature of algorithms. What is more, algorithmic composition can create unlimited data with a perfect ground truth without the impediments erected by intellectual property law. If one can’t solve such a simplified problem, then why hope to solve the real-world problem of genre recognition (or autotagging)? KBC is an abstract challenge, from which one generates a realisation and solves it. As its four tasks (above) show, to solve KBC is not to reproduce the most “ground truth” in a dataset of Kiki and Bouba music. Central to solving each task is the verification of the internal models of a system: is its behavior caused by music content, or by something else?
In the paper, we provide an example realisation of KBC, along with a table of the high-level content of each of its categories. The SuperCollider code, and some sound examples, can be found here: http://composerprogrammer.com/kikibouba.html.
For the identification task of this realisation, we present an “unacceptable” solution.
As is typical in music genre recognition research, we build a bag-of-frames-of-features classifier. The features we use are short-time zero crossings, aggregated over several seconds into a mean and variance. We classify by majority vote over the nearest neighbor classifications of several consecutive bags. We evaluate this simple system and find it reproduces the class labels of a test dataset with perfect accuracy. Yet, we say, this is not a solution to the identification task because it is not using the high-level criteria or content specified in the task.
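To make the pipeline concrete, here is a minimal Python sketch of this kind of classifier. It is not the code from the paper: the frame length, the bag size, the sample rate, and the toy “kiki” and “bouba” signals (a high and a low noisy sine tone) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def zero_crossing_counts(signal, frame_len=512):
    """Count sign changes in each non-overlapping frame (frame_len is an assumption)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    signs = np.signbit(frames)
    return np.sum(signs[:, 1:] != signs[:, :-1], axis=1)

def bag_features(signal, frame_len=512, frames_per_bag=10):
    """Aggregate frame-level counts into one (mean, variance) pair per bag."""
    counts = zero_crossing_counts(signal, frame_len)
    n_bags = len(counts) // frames_per_bag
    bags = counts[: n_bags * frames_per_bag].reshape(n_bags, frames_per_bag)
    return np.column_stack([bags.mean(axis=1), bags.var(axis=1)])

def classify(signal, train_X, train_y):
    """Label each bag by its nearest training bag, then take a majority vote."""
    X = bag_features(signal)
    dists = np.linalg.norm(X[:, None, :] - train_X[None, :, :], axis=2)
    votes = train_y[np.argmin(dists, axis=1)]
    labels, counts = np.unique(votes, return_counts=True)
    return str(labels[np.argmax(counts)])

def toy_signal(freq_hz, seed, sr=8000, seconds=4.0):
    """Hypothetical stand-in for a recording: a sine tone plus a little noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * seconds)) / sr
    tone = np.sin(2 * np.pi * freq_hz * t + rng.uniform(0, 2 * np.pi))
    return tone + 0.01 * rng.standard_normal(t.size)

# Toy training set: "kiki" is the high tone, "bouba" the low one (an assumption).
train_kiki = bag_features(toy_signal(1900, seed=0))
train_bouba = bag_features(toy_signal(210, seed=1))
train_X = np.vstack([train_kiki, train_bouba])
train_y = np.array(["kiki"] * len(train_kiki) + ["bouba"] * len(train_bouba))

print(classify(toy_signal(1900, seed=2), train_X, train_y))
print(classify(toy_signal(210, seed=3), train_X, train_y))
```

On this toy data the two classes are separated by zero-crossing statistics alone, so the classifier labels unseen signals correctly without representing anything one would call musical content.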
In my presentation, I went a bit further. I showed that we can make the performance of this same system go to random just by adding a small bit of DC offset to the test signals. In this way, the statistics of the zero crossings are no longer correlated with the ground truth as they are in the training dataset. Thus, I claimed in my presentation, the system is not and never was listening, and has only learned to exploit a confound in the training dataset.
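The mechanism can be illustrated with a small Python sketch. It is not the experiment from my presentation: the signals, their amplitude, and the offset size are assumptions, deliberately chosen as an extreme case in which the offset exceeds the peak amplitude of both toy signals.

```python
import numpy as np

def zero_crossings(x):
    """Number of sign changes between consecutive samples."""
    s = np.signbit(x)
    return int(np.count_nonzero(s[1:] != s[:-1]))

sr = 8000                      # sample rate in Hz (assumption)
t = np.arange(sr) / sr         # one second of samples

# Two quiet toy tones standing in for the two categories (hypothetical).
quiet_high = 0.02 * np.sin(2 * np.pi * 1900 * t + 0.3)
quiet_low = 0.02 * np.sin(2 * np.pi * 210 * t + 0.3)

dc = 0.05                      # offset larger than either tone's peak amplitude

before = (zero_crossings(quiet_high), zero_crossings(quiet_low))
after = (zero_crossings(quiet_high + dc), zero_crossings(quiet_low + dc))
print("before offset:", before)
print("after offset: ", after)
```

Before the offset, the two tones have very different zero-crossing counts. After it, neither signal crosses zero at all, so the feature carries no information about the category even though the two signals sound the same as before.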
This motivated an important question after my presentation: “How do you define listening?”
In KBC, the definition is simple: attending to the high-level criteria (content) constituting each category of the Kiki-Bouba universe in an acquired acoustic signal.
The simple system above attends only to a low-level characteristic, which, while leading to the correct identification for signals drawn from the Kiki-Bouba universe, will be unsuccessful in other universes, e.g., most trivially the “Kiki-Bouba with small DC offset” universe. To solve KBC requires developing a system that can listen at the content level.
This, I think, is quite far away.
M. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, “Content-based music information retrieval: Current directions and future challenges,” Proc. IEEE, vol. 96, pp. 668-696, Apr. 2008.