How music recommendation works — and doesn’t work | Brian Whitman @

A nice blog post summarising many important points.

Can a computer really listen to music? A lot of people have promised it can over the years, but I’ve (personally) never heard a fully automated recommendation based purely on acoustic analysis that made any sense – and I’ve heard them all, from academic papers to startups to our own technology to big-company efforts. And that has a lot to do with the expectations of the listener.


PhD Studentship in Intelligent Machine Music Listening

Please have a look here:

Applications are invited for a fully-funded PhD studentship, to seek ways to exploit novel and holistic approaches to evaluation for building machine music listening systems (and constituent parts). A major emphasis will be on answering “how” systems work and “what” they have learned to do, in relation to the success criteria of real-world use cases. The research will involve working at the intersection of digital signal processing, machine learning, and the design and analysis of experiments.

All nationalities are eligible to apply for this studentship, which will start in Autumn 2015. The studentship is for three years, and covers student fees as well as a tax-free stipend of £15,863 per annum.

Candidates must have a first-class honours degree or equivalent, or a good MSc Degree in Computer Science, Electronic Engineering, or Mathematics. Candidates should be confident in digital signal processing or machine learning, and have programming experience in, e.g., R, MATLAB, or Python. Experience in research and a track record of publications are very advantageous. Formal music training is also advantageous.

The PhD supervisors will be Dr. Bob L. Sturm (Machine Listening) and Dr. Hugo Maruri-Aguilar (Statistics). Please see for background. The project will be based in the School of EECS, and the student will become a member of the interdisciplinary Centre for Digital Music. Informal enquiries can be made by email to Dr. Sturm.

To apply, please follow the on-line process by selecting ‘Electronic Engineering’ in the ‘A-Z list of research opportunities’ and following the instructions on the right-hand side of the web page.

Please note that instead of the ‘Research Proposal’ we request a ‘Statement of Research Interests’. Your statement should answer two questions: (i) Why are you interested in the topic described above? (ii) What relevant experience do you have? Your statement should be brief: no more than 500 words or one side of A4 paper. In addition we would also like you to send a sample of your written work. This might be a chapter of your final year dissertation, or a published conference or journal paper. More details can be found at:

The closing date for the applications is 1/05/15.

Interviews are expected to take place during the week of 15/06/15.

Troll detection

Yesterday, I pointed out suspicious behavior at, and questioned the need for “troll detection.” Thinking about the problem further, I think has it right, and can actually provide a mechanism that could be exploited in other contexts. Each sound uploaded to is analysed by an objective algorithm that computes five features correlating with “good” sound production, as well as being “upvoted” or “downvoted” by real humans. When the two scores conflict, a flag can be raised denoting that the algorithm has missed something important that the benevolent voters did not, or the algorithm has acted objectively while the malevolent voters did not. In the first case, a moderator can seek the reasons for the algorithm’s failure, and improve it. In the second case, a moderator can seek the reasons for the voters’ malevolence, and “improve” them. Either way, it is a win-win for the system as an evolving community-driven resource!
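The flagging mechanism described above could be sketched roughly as follows. This is only a minimal illustration, not the site's actual method: the `Sound` record, the `algorithm_score` scale of 0 to 1, the vote-fraction score, and the `threshold` value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sound:
    name: str
    algorithm_score: float  # hypothetical: objective quality score in [0, 1]
    upvotes: int
    downvotes: int


def vote_score(sound: Sound) -> Optional[float]:
    """Fraction of votes that are upvotes, or None when nobody has voted."""
    total = sound.upvotes + sound.downvotes
    return sound.upvotes / total if total else None


def flag_conflict(sound: Sound, threshold: float = 0.5) -> Optional[str]:
    """Raise a flag when the voters and the algorithm disagree strongly.

    The threshold is an assumed tuning parameter, not a value from the post.
    """
    votes = vote_score(sound)
    if votes is None:
        return None  # no human judgement to compare against
    gap = votes - sound.algorithm_score
    if gap > threshold:
        return "review_algorithm"  # voters approve, algorithm does not
    if gap < -threshold:
        return "review_voters"     # algorithm approves, voters do not
    return None
```

A moderator would then inspect only the flagged sounds: `review_algorithm` suggests the algorithm missed something, while `review_voters` suggests the votes deserve a closer look.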

Five challenges for music information retrieval researchers

Tomorrow, at the ninth edition of the Digital Music Research Network at QMUL, I will be presenting work done with Nick Collins at Durham University: Five challenges for music information retrieval researchers.

We propose five challenges that turn on its head the problem of music description as has been pursued in MIR research for some time, i.e., take a labeled dataset of recorded music, then combine features with machine learning and reproduce as many labels as possible by any means. This “engineering approach” we claim is deficient in many regards: formally and explicitly defining problems and use cases; identifying and testing underlying assumptions; and using evaluation with the validity to address relevant hypotheses. We thus propose some challenges to encourage new approaches that address real-world problems having to do with music content, however that is defined.

A brief aside:
Looking through the MIR literature, we find the term “music content” is often used, but very rarely defined. In one work from almost 20 years ago (E. Wold, T. Blum, D. Keislar, and J. Wheaton, “Content-based classification, search and retrieval of audio,” IEEE Multimedia, vol. 3, pp. 27-36, Fall 1996), we find a user-centered almost-definition: “… properties that users might wish to specify” for retrieving sound and music. This is echoed in a more recent work (M. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, “Content-based music information retrieval: Current directions and future challenges,” Proc. IEEE, vol. 96, pp. 668-696, Apr. 2008):
“Content-based MIR is engaged in intelligent, automated processing of music. The goal is to make music, or information about music, easier to find.” Hence, “music content” it seems can be anything from the sample values themselves (an example given in Wold et al.), to whatever high-level concepts “users might wish to specify” in their search.

Back to the program:
So, Nick and I propose some challenges to motivate problem-based thinking in the description of music by listening machines. We express these challenges using the formalism developed in a recent ISMIR 2014 paper: Formalizing the Problem of Music Description.

Briefly, the problem of music description is defined by a use case, which consists of specifications of the following four components: a music universe {\Omega}, a music recording universe {\mathcal{R}_\Omega}, a semantic universe {\mathcal{S}_{\mathcal{V},A}}, and success criteria {\{P_i\}} (a set of Boolean predicates). {\mathcal{S}_{\mathcal{V},A}} is a set of sequences built from the vocabulary {\mathcal{V}} (a set of tokens, e.g., instrument names) according to the semantic rule {A} (a Boolean predicate on sequences, e.g., permitting only unary sequences). A music description system is a map from {\mathcal{R}_\Omega} to {\mathcal{S}_{\mathcal{V},A}}. This is done by two intermediate maps: one from {\mathcal{R}_\Omega} to the semantic feature universe {\mathcal{S}_{\mathbb{F},A'}} (a set of sequences built from the feature vocabulary {\mathbb{F}} according to the semantic rule {A'}), and then one from {\mathcal{S}_{\mathbb{F},A'}} to {\mathcal{S}_{\mathcal{V},A}}. The problem of music description is to make the map from {\mathcal{R}_\Omega} to {\mathcal{S}_{\mathcal{V},A}} (the music description system) “acceptable”, i.e., the success criteria of the use case are satisfied.
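To make the semantic universe concrete, here is a small sketch of testing whether a sequence belongs to {\mathcal{S}_{\mathcal{V},A}}. The vocabulary of instrument names and the unary rule follow the examples in the text, but the names `VOCABULARY`, `unary_rule`, and `in_semantic_universe` are hypothetical and not taken from the paper.

```python
from typing import Callable, Sequence, Set

# Assumed toy vocabulary V: a set of tokens (instrument names).
VOCABULARY: Set[str] = {"clarinet", "piano", "violin"}


def unary_rule(sequence: Sequence[str]) -> bool:
    """Toy semantic rule A: permit only sequences of exactly one token."""
    return len(sequence) == 1


def in_semantic_universe(
    sequence: Sequence[str],
    vocabulary: Set[str],
    rule: Callable[[Sequence[str]], bool],
) -> bool:
    """True when every token is in the vocabulary and the rule accepts the sequence."""
    return all(token in vocabulary for token in sequence) and rule(sequence)
```

Under these assumptions, `("clarinet",)` is in the semantic universe, while `("clarinet", "piano")` breaks the rule and `("banjo",)` uses a token outside the vocabulary.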


The image above gives an overview of the problem of music description. First, on the far left, we have the music universe {\Omega}. This is the intangible stuff where “works” of music reside, whatever those are (see L. Goehr, “The Imaginary Museum of Musical Works: An Essay in the Philosophy of Music,” Oxford University Press, 1992). Anyhow, a specification of {\Omega} in the use case provides an indication of the problem domain, e.g., music for solo clarinet. An element of {\Omega} is mapped in some way, e.g., recorded human performance or transcription, to an element in {\mathcal{R}_\Omega}. This is the universe of tangible observations, such as the bits of a CD track, the grooves on a record, the pressure fluctuations at an ear, or the notes on a printed score. Which of these applies depends on the use case. Unlike the elements of {\Omega}, the elements of {\mathcal{R}_\Omega} are “sensed” by a description system. That means they are input to the system by its operator. A specification of {\mathcal{R}_\Omega} describes this material, e.g., 30 second excerpts of CD-quality live recorded performance. Finally, a music description system maps {\mathcal{R}_\Omega} to {\mathcal{S}_{\mathbb{F},A'}} by a feature extraction algorithm {\mathscr{E}}; and then maps {\mathcal{S}_{\mathbb{F},A'}} to {\mathcal{S}_{\mathcal{V},A}} by a classification algorithm {\mathscr{C}}.
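The two-stage structure can be illustrated with a toy sketch: a description system as the composition of {\mathscr{C}} with {\mathscr{E}}, checked against a set of success criteria. Everything concrete here is an assumption made for illustration: the energy feature, the instrument labels, and the `gain_invariant` criterion are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Recording = Sequence[float]    # toy element of R_Omega: raw samples
Features = Tuple[str, ...]     # toy element of S_{F,A'}
Description = Tuple[str, ...]  # toy element of S_{V,A}
System = Callable[[Recording], Description]


def extract_features(recording: Recording) -> Features:
    """Toy stand-in for E: summarise a recording by one energy token."""
    energy = sum(x * x for x in recording) / len(recording)
    return ("high_energy",) if energy > 0.25 else ("low_energy",)


def classify(features: Features) -> Description:
    """Toy stand-in for C: map the feature token to an instrument name."""
    return ("trumpet",) if features == ("high_energy",) else ("clarinet",)


def describe(recording: Recording) -> Description:
    """The music description system: C composed with E."""
    return classify(extract_features(recording))


def gain_invariant(system: System, recordings: List[Recording]) -> bool:
    """Hypothetical success criterion: halving the volume must not change the label."""
    return all(system(r) == system([0.5 * x for x in r]) for r in recordings)


def is_acceptable(system: System, recordings: List[Recording], criteria) -> bool:
    """A system is acceptable when every success criterion P_i holds."""
    return all(criterion(system, recordings) for criterion in criteria)
```

With this toy system, a loud excerpt such as `[0.8, -0.8]` is labelled `("trumpet",)` but becomes `("clarinet",)` once its gain is halved, so `is_acceptable(describe, [[0.8, -0.8]], [gain_invariant])` is false. The point of the sketch is that acceptability is decided by the success criteria of the use case, not by the mere existence of the map.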

So, that all sets the stage for our five challenges.