Over the past year I have been working with my student Pardis on automatic music genre recognition, a problem that I feel has too often been approached in a way that is at odds with reality. The aim of recognizing the genre thought to be embodied by an excerpt of a recording of a piece of music is by nature not well defined — a fact reflected in the observation that humans often disagree on genre labels. There is not much argument over whether an ‘8’ is an ‘8’ in a dataset of hand-written digits; but whether an excerpt in a music genre dataset is Jazz or Classical often depends on the viewpoint of whoever created that dataset (as I show below).
Since about 2001, many studies in this area have used the 1.2 GB dataset assembled by George Tzanetakis, who was one of the first to study the problem. The typical approach has been to design and test a set of acoustic features with a classifier, then report the mean accuracies from cross-validation, and perhaps a confusion table. Below is a confusion table I just created from this dataset using one classification method and one set of acoustic features.
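For readers unfamiliar with this workflow, here is a minimal sketch of the typical pipeline in Python with scikit-learn. Everything here is hypothetical: the features are random stand-ins for whatever acoustic features a study extracts (e.g. mean MFCCs), and the SVM is just one possible classifier, not the method behind my confusion table above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in for the dataset: 100 excerpts per genre, each
# represented by a made-up 13-dimensional feature vector. In a real
# study these features would be computed from the audio itself.
rng = np.random.default_rng(0)
genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]
X = np.vstack([rng.normal(loc=i, scale=2.0, size=(100, 13))
               for i in range(len(genres))])
y = np.repeat(np.arange(len(genres)), 100)

# Predict each excerpt from folds it was not trained on, then
# tabulate the confusions.
pred = cross_val_predict(
    SVC(), X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
cm = confusion_matrix(y, pred)   # rows: true genre, cols: predicted
print(cm.diagonal())             # correct counts per genre (out of 100)
print(f"accuracy: {(pred == y).mean():.2f}")
```

The key detail is `cross_val_predict`: every excerpt is classified by a model that never saw it during training, so the resulting confusion table reflects generalization rather than memorization.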
At first glance, we might be quite satisfied. Metal is classified correctly 100% of the time! Classical and Blues are way up there too. Then we see Rock has some troubles, but based on my experience with Rock, I can see how one can confuse it with Metal, Blues, Country, Disco, Pop, and Reggae… but not Classical. Hiphop can also be seen as close to Reggae, Disco, and Pop, but not Metal. With a classification accuracy of 79±17%, based on purely acoustic features without any feature integration or weak classifiers, these results are publishable — only 4% below the state of the art, yet involving much simpler features and classification. But let’s dig a little deeper and see where things are going wrong.
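For context on where a figure like 79±17% comes from: it is a mean and standard deviation over the per-genre accuracies on the diagonal of the confusion table. A minimal sketch, using a made-up confusion matrix whose counts are purely illustrative and not my actual results:

```python
import numpy as np

# Hypothetical 10-genre confusion matrix, 100 excerpts per genre;
# rows are true labels, columns are predictions. The diagonal holds
# the correct counts (these numbers are invented for illustration).
diag = np.array([86, 96, 79, 67, 78, 77, 100, 74, 77, 56])
cm = np.diag(diag)
for i in range(10):                       # park each genre's errors in
    cm[i, (i + 1) % 10] += 100 - diag[i]  # some off-diagonal cell

recalls = cm.diagonal() / cm.sum(axis=1)  # per-genre accuracy
print(f"{recalls.mean():.2f} +/- {recalls.std():.2f}")
```

A large standard deviation like ±17% is itself a warning sign: the classifier is very good at some genres (here Metal) and quite poor at others, which is exactly what invites a closer listen to the individual mistakes.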
First, we listen to the misclassified Classical excerpts, of which we see there are four out of the one hundred tested. The excerpt classical.00045 was classified as Jazz:
I would say that this confusion is acceptable because of its use of subject and variations, and the general lack of tonal center; and probably most of that passage was notated with figured bass and only later written out. (The pops are present in the original file.)
Next is classical.00039, misclassified as Disco:
Unlike the previous mistake, I say that though this confusion is enormously amusing, it is completely unacceptable. First, no part of that excerpt reminds me of Gloria Gaynor. Second, it lacks all of the defining characteristics of Disco: a bouncy driving beat, 16th-note hi-hats, and of course sequins. (I would also say this excerpt is not Classical, but oh well.)
The excerpt classical.00049 is misclassified as Rock:
and I am sure Mozart would be happy with that, but I’m not. Also, here we hear some distortion present in the dataset. It sounds as if the excerpt was amplified beyond quantization limits, which makes me wonder whether it is classified as Rock because of its lack of dynamic range. Still, genre is a quality that transcends distortion, as with AM radio.
Finally, classical.00051 is misclassified as Metal:
This is going to annoy my metal friends, but I think that confusion is acceptable considering the presence of dramatic dynamic shifts, tutti power chords, rumbling bass, and sforzando kettledrums. Mussorgsky always was a bad boy, like a Glenn Branca of an earlier day.
Now, what about things misclassified as Classical? From the confusion table we see that excerpts from all but two genres (Hiphop and Metal) are misclassified as Classical. First, we have jazz.00000 and jazz.00001.
One of the first things you might notice is that NEITHER OF THESE IS JAZZ.
Have you ever heard an entire orchestra improvise, and at the same time quote Stravinsky? (Maybe inadvertently John Cage, but not Stravinsky.) The musical from which these excerpts come, West Side Story, is not Jazz. I think this is a problem in this dataset.
The Stan Getz in jazz.00057, which I would label Jazz, is misclassified as Classical:
A similar thing happens with country.00069, which is also apparently Classical:
It is as if this classifier misses the front-and-center saxophone, and Willie Nelson’s voice — so rare are these instruments in Classical music — and focuses instead on the narrow definition “uses strings.” But probably the classifier is not listening at all; it is sensitive to particular methods of mastering typical of certain genres — which explains the above misclassification of Mozart as Rock. (Another Willie Nelson tune, Uncloudy Day, is also misclassified as Classical.)
Then there is blues.00004:
which is audibly not Classical, and has no bowed strings.
Blues, Jazz and Country can’t have all the fun; disco.00020 is thought to be Classical as well:
which is Clarence Carter’s “Patches”. Though it is certainly not Classical, I think we can all agree that if this were played at a disco, everyone would stop dancing and start crying. Just as Bernstein’s West Side Story above is mislabeled Jazz, Carter’s “Patches” is mislabeled Disco.
And on top of that, disco.00047, misclassified as Classical,
is in my opinion mislabeled itself, even though it comes from a Disco piece. This excerpt comes from the most unDisco-like portion of Barbra Streisand and Donna Summer singing No More Tears (Enough is Enough). Listen to when the real Disco portion breaks loose at 1:50! Still, this excerpt has no business being used to represent Disco.