Tomorrow, at the ninth edition of the Digital Music Research Network at QMUL, I will be presenting work done with Nick Collins at Durham University: “Four challenges for music information retrieval researchers.”
We propose five challenges that turn on its head the problem of music description as it has been pursued in MIR research for some time: take a labeled dataset of recorded music, combine features with machine learning, and reproduce as many labels as possible by any means. We claim this “engineering approach” is deficient in many regards: formally and explicitly defining problems and use cases; identifying and testing underlying assumptions; and designing evaluations with the validity to address relevant hypotheses. We thus propose some challenges to encourage new approaches that address real-world problems having to do with music content, however that is defined.
A brief aside:
Looking through the MIR literature, we find the term “music content” is often used, but very rarely defined. In one work from almost 20 years ago (E. Wold, T. Blum, D. Keislar, and J. Wheaton, “Content-based classification, search and retrieval of audio,” IEEE Multimedia, vol. 3, pp. 27-36, Fall 1996), we find a user-centered almost-definition: “… properties that users might wish to specify” for retrieving sound and music. This is echoed in a more recent work (M. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, “Content-based music information retrieval: Current directions and future challenges,” Proc. IEEE, vol. 96, pp. 668-696, Apr. 2008):
“Content-based MIR is engaged in intelligent, automated processing of music. The goal is to make music, or information about music, easier to find.” Hence, it seems that “music content” can be anything from the sample values themselves (an example given in Wold et al.) to whatever high-level concepts “users might wish to specify” in their search.
Back to the program:
So, Nick and I propose some challenges to motivate problem-based thinking in the description of music by listening machines. We express these challenges using the formalism developed in a recent ISMIR 2014 paper: Formalizing the Problem of Music Description.
Briefly, the problem of music description is defined by a use case, which specifies four components: a music universe $\Omega$, a music recording universe $\mathcal{R}_\Omega$, a semantic universe $\mathcal{S}_{\mathcal{V},A}$, and success criteria (a set of Boolean predicates). $\mathcal{S}_{\mathcal{V},A}$ is a set of sequences built from the vocabulary $\mathcal{V}$ (a set of tokens, e.g., instrument names) according to the semantic rule $A$ (a Boolean predicate on sequences, e.g., permitting only unary sequences). A music description system is a map from $\mathcal{R}_\Omega$ to $\mathcal{S}_{\mathcal{V},A}$. This is done by two intermediate maps: one from $\mathcal{R}_\Omega$ to the semantic feature universe $\mathcal{S}_{\mathcal{F},A'}$ (a set of sequences built from the feature vocabulary $\mathcal{F}$ according to the semantic rule $A'$), and then one from $\mathcal{S}_{\mathcal{F},A'}$ to $\mathcal{S}_{\mathcal{V},A}$. The problem of music description is to make the map from $\mathcal{R}_\Omega$ to $\mathcal{S}_{\mathcal{V},A}$ (the music description system) “acceptable”, i.e., such that the success criteria of the use case are satisfied.
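Written out compactly, these definitions look roughly as follows. This is my own restatement in the notation above, not a quotation from the ISMIR paper; $E$ and $C$ name the two intermediate maps (feature extraction and classification), and $p$ ranges over the success criteria of the use case.

```latex
\begin{aligned}
\mathcal{S}_{\mathcal{V},A} &= \{\, s = (v_1,\dots,v_n) : n \in \mathbb{N},\ v_i \in \mathcal{V}\ (1 \le i \le n),\ A(s) \text{ is true} \,\} \\
E &: \mathcal{R}_\Omega \to \mathcal{S}_{\mathcal{F},A'}, \qquad C : \mathcal{S}_{\mathcal{F},A'} \to \mathcal{S}_{\mathcal{V},A} \\
\mathrm{MDS} &= C \circ E : \mathcal{R}_\Omega \to \mathcal{S}_{\mathcal{V},A} \\
\mathrm{MDS}\ \text{is acceptable} &\iff p(\mathrm{MDS}) \text{ is true for every success criterion } p
\end{aligned}
```

The semantic feature universe $\mathcal{S}_{\mathcal{F},A'}$ is built in exactly the same way as $\mathcal{S}_{\mathcal{V},A}$, with $\mathcal{F}$ and $A'$ in place of $\mathcal{V}$ and $A$.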
The image above gives an overview of the problem of music description. First, on the far left, we have the music universe $\Omega$. This is the intangible stuff where “works” of music reside, whatever those are (see L. Goehr, “The Imaginary Museum of Musical Works: An Essay in the Philosophy of Music,” Oxford University Press, 1992). Anyhow, a specification of $\Omega$ in the use case indicates the problem domain, e.g., music for solo clarinet. An element of $\Omega$ is mapped in some way, e.g., by recorded human performance or by transcription, to an element of $\mathcal{R}_\Omega$. This is the universe of tangible observations, such as the bits of a CD track, the grooves on a record, the pressure fluctuations at an ear, or the notes on a printed score; which of these it contains depends on the use case. Unlike the elements of $\Omega$, the elements of $\mathcal{R}_\Omega$ are “sensed” by a description system: they are input to the system by its operator. A specification of $\mathcal{R}_\Omega$ describes this material, e.g., 30-second excerpts of CD-quality recordings of live performance. Finally, a music description system maps $\mathcal{R}_\Omega$ to $\mathcal{S}_{\mathcal{F},A'}$ by a feature extraction algorithm $E$, and then maps $\mathcal{S}_{\mathcal{F},A'}$ to $\mathcal{S}_{\mathcal{V},A}$ by a classification algorithm $C$.
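To make the chain from recording to description concrete, here is a minimal Python sketch of a toy use case. Everything in it is hypothetical and chosen only for illustration. Recordings are tuples of sample values in [-1, 1]. The feature vocabulary is a quantized RMS level (0 to 9). The vocabulary is {"soft", "loud"}, and both universes use a unary semantic rule. The classifier is a simple threshold. The success criteria (outputs lie in the semantic universe, and at least 90% of a tiny labeled test set is reproduced) are assumptions for the sketch, not criteria from our paper. The names `Universe`, `extract`, `classify`, `mds` and `acceptable` are my own.

```python
import math
from dataclasses import dataclass
from typing import Callable, Hashable, Tuple

# A sequence of tokens, e.g. ("loud",) or (5,).
Seq = Tuple[Hashable, ...]


@dataclass(frozen=True)
class Universe:
    """S_{V,A}: the sequences built from vocabulary V that satisfy semantic rule A."""
    vocabulary: frozenset
    rule: Callable[[Seq], bool]

    def contains(self, s: Seq) -> bool:
        return all(v in self.vocabulary for v in s) and self.rule(s)


def unary(s: Seq) -> bool:
    """Semantic rule permitting only sequences of length one."""
    return len(s) == 1


# Hypothetical semantic universe: one dynamics label per recording.
SEMANTIC = Universe(frozenset({"soft", "loud"}), unary)
# Hypothetical semantic feature universe: one quantized RMS level (0-9).
FEATURES = Universe(frozenset(range(10)), unary)


def extract(recording: Tuple[float, ...]) -> Seq:
    """E: R_Omega -> S_{F,A'}. Quantizes the RMS of the samples to 0..9."""
    if not recording:
        raise ValueError("empty recording")
    rms = math.sqrt(sum(x * x for x in recording) / len(recording))
    return (min(9, int(rms * 10)),)


def classify(features: Seq) -> Seq:
    """C: S_{F,A'} -> S_{V,A}. A hypothetical threshold rule."""
    return ("loud",) if features[0] >= 3 else ("soft",)


def mds(recording: Tuple[float, ...]) -> Seq:
    """The music description system: the composition C o E."""
    features = extract(recording)
    if not FEATURES.contains(features):
        raise ValueError(f"features {features} not in the feature universe")
    return classify(features)


def acceptable(system: Callable, criteria) -> bool:
    """A system is acceptable for a use case iff every success criterion holds."""
    return all(criterion(system) for criterion in criteria)


def sine(amplitude: float, n: int = 1000, period: int = 50) -> Tuple[float, ...]:
    """A synthetic stand-in for an element of R_Omega."""
    return tuple(amplitude * math.sin(2 * math.pi * i / period) for i in range(n))


# Hypothetical labeled test set and success criteria for the toy use case.
TEST_SET = [(sine(0.05), ("soft",)), (sine(0.8), ("loud",))]
CRITERIA = [
    # Every output must lie in the semantic universe.
    lambda f: all(SEMANTIC.contains(f(r)) for r, _ in TEST_SET),
    # At least 90% of the test labels must be reproduced (an assumed threshold).
    lambda f: sum(f(r) == y for r, y in TEST_SET) / len(TEST_SET) >= 0.9,
]

print(mds(sine(0.05)), mds(sine(0.8)), acceptable(mds, CRITERIA))
# prints: ('soft',) ('loud',) True
```

Note that $\Omega$ never appears in the code. The system only ever senses elements of $\mathcal{R}_\Omega$ (here, synthetic sine-wave "recordings"), which mirrors the distinction drawn above between intangible works and the observations a system is given.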
So, that all sets the stage for our five challenges.