Faults in the Tzanetakis Music Genre Dataset

In a previous post, I discussed some classification results obtained with the Tzanetakis music genre dataset. My observations, or unsupported justifications, should be taken with a grain of salt because they assume the classifier is looking at and comparing the same things I am. Then, in the last post, I noted that there are several problems in the training and testing data. I have finally completed a thorough study of this dataset, and present a detailed list of faults here.

This finding is not good news for the many studies, both new and from the past decade, that rely only on the Tzanetakis dataset for testing and comparing results. Confirming results with other datasets is always a good idea, but I don’t have enough experience with other datasets yet, and I don’t know whether their integrity has been validated.

However, in this paper I argue that the many faults in the Tzanetakis dataset present new and interesting challenges. Since our datasets have grown past the point where human validation is possible, we need tools that can automatically find problems such as distortions, versions, and possible mislabelings. Furthermore, when we only have access to features and not to the audio data itself, we have to build tools that do the same in the feature space. To these ends, my large catalog of faults provides a ground truth for testing such tools. Given my limited memory, I am sure I missed some versions. But I am confident that all replicas have been found (using a simplified version of the Shazam fingerprint method).
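
A fingerprint-based replica search of this general kind could be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the catalog: it keeps a single spectral peak per frame, hashes pairs of nearby peaks, and scores two excerpts by the overlap of their hash sets. The function names (`peak_track`, `fingerprint`, `similarity`, `tone_track`) and all parameter values are hypothetical, and the real Shazam method uses many peaks per region plus time-offset alignment.

```python
import numpy as np

def peak_track(x, n_fft=1024, hop=512):
    """Return the index of the strongest frequency bin in each STFT frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return mag.argmax(axis=1)

def fingerprint(x, fan_out=5):
    """Hash pairs of nearby peaks as (bin1, bin2, frame gap) tuples."""
    peaks = peak_track(x)
    hashes = set()
    for t, f1 in enumerate(peaks):
        for dt in range(1, fan_out + 1):
            if t + dt < len(peaks):
                hashes.add((int(f1), int(peaks[t + dt]), dt))
    return hashes

def similarity(a, b):
    """Fraction of the smaller fingerprint that also occurs in the other."""
    fa, fb = fingerprint(a), fingerprint(b)
    return len(fa & fb) / max(1, min(len(fa), len(fb)))

def tone_track(seed, n_tones=30, sr=8000, n_fft=1024):
    """Synthetic stand-in for an excerpt: random bin-centred sine tones."""
    rng = np.random.default_rng(seed)
    t = np.arange(2000) / sr  # each tone lasts 2000 samples (assumed)
    bins = rng.integers(30, 380, size=n_tones)
    return np.concatenate(
        [np.sin(2 * np.pi * (b * sr / n_fft) * t) for b in bins]
    )

# A noisy copy should score far higher than an unrelated excerpt.
original = tone_track(0)
noise = np.random.default_rng(1).standard_normal(len(original))
replica = original + 0.05 * noise
unrelated = tone_track(2)
print(similarity(original, replica), similarity(original, unrelated))
```

In a sketch like this, pairs of excerpts whose score exceeds some threshold would be flagged for a listening check. The threshold would have to be tuned on real audio, and one peak per frame is likely too crude for dense recordings.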


2 thoughts on “Faults in the Tzanetakis Music Genre Dataset”

  1. Hey, great work on such a thorough analysis of the problems with the dataset. I was a bit suspicious when randomly listening to some of the samples and found lots of repeated artists.
    I think the format of the dataset is very good though – 10 genres, with 100 30-second samples of each. What would it take to make a new accepted dataset, much more representative of the genres? Would copyright be an issue?


  2. Hello Fionntan,
    Copyright could be an issue, but I say screw them. This is for research. And using 30-second excerpts is less infringing than using the entire song. Anyhow, since that duration appears to be much more than what people need to classify genre, we might even go shorter.
    To create a new “accepted” dataset, I would start with the Tzanetakis dataset, and replace the replicas, the questionably labeled excerpts, and that one extremely corrupted element with excerpts more typical of each genre. I would also aim to make the excerpts within each genre more homogeneous; right now, for example, the Popular and Rock excerpts span several decades and several styles.

