¿El Caballo Viejo?

This Wednesday, I am presenting our work: B. L. Sturm, C. Kereliuk, and J. Larsen, “¿El Caballo Viejo? Latin genre recognition with deep learning and spectral periodicity,” in Proc. Int. Conf. on Mathematics and Computation in Music, 2015.
For a limited time, all papers from this conference are freely available from Springer. (Well done, Tom Collins and Elaine Chew!)

My slides are here. The outline of the talk is simple:

  1. What do these numbers mean?
  2. What do these numbers mean?
  3. What do these numbers mean?
  4. What do these numbers MEAN?
  5. What do these numbers mean?

Our work here is a direct extension of our previous work: B. L. Sturm, C. Kereliuk, and A. Pikrakis, “A closer look at deep learning neural networks with low-level spectral periodicity features,” in Proc. Int. Workshop on Cognitive Info. Process., 2014. Here is my blog post about that paper. Essentially, we took the algorithms used to build the music content analysis systems that reproduced the most ground truth in the 2013 MIREX Audio Latin Music Genre Classification Train/Test task, trained a system on a different benchmark dataset (we did not have access to the private dataset used in the MIREX task), and found it extremely susceptible to minor changes in tempo. By changing only the tempo of the test-set excerpts by at most ±6%, we could move its mean classification accuracy from 0.882 to 0.093 or to 0.971. This led to our re-discovery of the extremely close relationship between tempo and label in that dataset. But what of the systems tested in MIREX?
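The deformation procedure can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code used in the paper: the experiments use pitch-preserving time stretching, whereas the naive linear-interpolation resampling below changes both tempo and pitch. It only shows the shape of the attack: make copies of each test excerpt at every tempo factor in the ±6% range and feed each copy to the system.

```python
# Hypothetical sketch of tempo deformation via naive resampling.
# NB: this shifts pitch as well as tempo; the paper's deformations
# are pitch-preserving (e.g., phase-vocoder time stretching).

def resample(signal, r):
    """Return signal played back r times faster (r = 1.06 -> +6% tempo)."""
    n_out = int(len(signal) / r)
    out = []
    for i in range(n_out):
        t = i * r                      # fractional read position
        j = int(t)
        frac = t - j
        a = signal[j]
        b = signal[min(j + 1, len(signal) - 1)]
        out.append(a + frac * (b - a)) # linear interpolation
    return out

# Deform a toy "excerpt" at tempo factors spanning the +/-6% range tested.
excerpt = [float(i % 10) for i in range(1000)]
for r in (0.94, 0.97, 1.00, 1.03, 1.06):
    deformed = resample(excerpt, r)
    print(r, len(deformed))
```

Each deformed copy would then be classified by the trained system, and the labels compared across tempo factors.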

We finally acquired a copy of the private dataset used in the MIREX task (3229 full-length music recordings, at least 6.5% of which are replicas) and ran nearly the same experiment. Would the system show the same unreasonable sensitivity to changes in tempo? Would it matter that tempo and label appear not to be so correlated in this dataset?

[Figure: F-score of each class in the test set as a function of the imposed tempo change]

We find we are able to move the normalised classification accuracy of the system from its “original” test value of 0.62 down to 0.22 or up to 0.91 with tempo changes of at most ±6%. The image above shows how the F-score of each class of the test set deflates or inflates as we impose larger changes to tempo. Note that a ±6% change of a 120 BPM tempo produces tempi in the range [113, 127] BPM. The F-score of all but one class falls below 0.5 at this point. Furthermore, we can make the system classify the “same” music from the test set in many different ways just by making tempo changes. You can listen to many results here.
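To make the size of these deformations concrete, the arithmetic behind the quoted BPM range can be sketched as follows (`tempo_range` is a hypothetical helper for illustration, not from the paper):

```python
# Extremes of a nominal tempo after a deformation of at most +/-6%.
def tempo_range(bpm, max_change=0.06):
    """Return (slowest, fastest) tempo after a +/-max_change deformation."""
    return bpm * (1 - max_change), bpm * (1 + max_change)

lo, hi = tempo_range(120)
print(round(lo), round(hi))  # a 120 BPM track spans roughly 113 to 127 BPM
```

A ±6% change is thus well within the tempo variation one hears between performances of the same piece.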

So, what do these numbers mean? Certainly, the winning system is not picking labels randomly.
So, what do these numbers mean? Certainly, the winning system has learned something about this dataset.
So, what do these numbers mean? Nothing with regard to whether that something learned has anything to do with music.
So, what do these numbers MEAN? Nothing with regard to whether this system addresses one of the principal goals of content-based music information retrieval: connecting users with music and information about music.
So, what do these numbers mean? We need experimental designs that validly test systems against the problem they are meant to solve.

NB: The title refers to “horses”, of course. It is also the title of a great song in the private MIREX dataset, labeled Salsa; but slow it down a little and it becomes Tango, and speed it up a little and it becomes Merengue.
