Optimum path forest at ISMIR 2011

Appearing at ISMIR 2011 was the following intriguing paper:
C. Marques, I. R. Guilherme, R. Y. M. Nakamura, and J. P. Papa, “New Trends in Musical Genre Classification Using Optimum-Path Forest”, Proc. ISMIR, 2011.
As it reports classification accuracies on GTZAN above 98.8%,
it certainly caught my attention.
The image below places that result, as reference [55], among the classification accuracies on GTZAN reported in 94 other works:

So, with the great help of the fourth author Joao Papa, and their excellent Optimum Path Forest library, I was quickly on my way to reproducing the results.

Joao has filled in a critical detail missing from the paper.
Their results come from classifying every feature (computed from a 23 ms window)
instead of the 30 s excerpts.
This is even more curious to me, since experience shows such classification
should be very poor … unless the partitioning of the dataset into training and test sets spreads the features of a single excerpt across both sets instead of keeping each excerpt entirely in one.
Looking at the code behind the “opf_split” program confirms that it takes no care to avoid a biased partition.
Another curious detail in the paper is that they write they have 33,618 MFCC vectors from the 1000 excerpts in GTZAN.
I get 1,291,628 MFCC vectors.
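
As a rough sanity check on these two counts, here is a small sketch of the arithmetic. The sampling rate (22,050 Hz), the excerpt length (exactly 30 s), and the non-overlapping 512-sample window are my assumptions for illustration; the paper and the exchange with Joao only give "23 ms".

```python
# Assumed values (not given in the paper): 22,050 Hz audio, 30 s excerpts,
# and non-overlapping windows of 512 samples (about 23.2 ms each).
SAMPLE_RATE_HZ = 22050
EXCERPT_SECONDS = 30
WINDOW_SAMPLES = 512
NUM_EXCERPTS = 1000

window_ms = 1000 * WINDOW_SAMPLES / SAMPLE_RATE_HZ
frames_per_excerpt = (SAMPLE_RATE_HZ * EXCERPT_SECONDS) // WINDOW_SAMPLES
expected_total = frames_per_excerpt * NUM_EXCERPTS

my_count = 1_291_628      # number of MFCC vectors I get
paper_count = 33_618      # number of MFCC vectors stated in the paper

print(f"window length: {window_ms:.1f} ms")
print(f"expected vectors under these assumptions: {expected_total:,}")
print(f"my vectors per excerpt: {my_count / NUM_EXCERPTS:.1f}")
print(f"paper's vectors per excerpt: {paper_count / NUM_EXCERPTS:.1f}")
print(f"seconds per vector in the paper: {EXCERPT_SECONDS * NUM_EXCERPTS / paper_count:.2f}")
```

Under these assumptions one expects roughly 1.29 million vectors, which is in line with my count. The paper's figure works out to about 34 vectors per excerpt, or roughly one vector every 0.9 s, which does not match a 23 ms window.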

So, I decided to run this evaluation as I think they did:

./runOPF.sh alldata.bin 0.5 0.5 1 1

where “alldata.bin” is an OPF-formatted file of the features I compute in MATLAB,
the first two numbers specify the train/test split, the last two numbers denote whether feature normalization is used, and how many independent trials to run.
Here is some of the output:

Training time: 23248.525391 seconds
Testing time: 30824.958984 seconds
Supervised OPF mean accuracy 74.323967

We see that after about 15 hours of computation,
we don’t get anywhere near the 98.8% accuracy.
And without feature normalization, the accuracy rises only to about 76.3%.
The paper reports that the training and testing times for OPF in GTZAN
are 9 and 4 seconds, respectively.
Respectfully, my computer is not so slow as to cause a 7000-fold
increase in computation time.
I tried several other things to increase the accuracy,
but nothing was working.
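
For reference, the slowdown can be worked out directly from the times in my output above and the 9 s and 4 s reported in the paper. This is only arithmetic on those four numbers; the variable names are mine.

```python
# Times from my run (seconds) and the times reported in the paper (seconds).
my_train_s, my_test_s = 23248.525391, 30824.958984
paper_train_s, paper_test_s = 9, 4

total_hours = (my_train_s + my_test_s) / 3600
train_ratio = my_train_s / paper_train_s
test_ratio = my_test_s / paper_test_s
overall_ratio = (my_train_s + my_test_s) / (paper_train_s + paper_test_s)

print(f"total time: {total_hours:.2f} hours")
print(f"training slowdown: {train_ratio:.0f}x")
print(f"testing slowdown: {test_ratio:.0f}x")
print(f"overall slowdown: {overall_ratio:.0f}x")
```

Training comes out about 2,600 times slower and testing about 7,700 times slower than the paper's figures.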

Then I tried testing and training on the same fold, and got an accuracy of 99.97%.
Joao confirms that this appears to be at least part of what happened.

Now, I am going to run the same experiment, but using a proper partitioning,
and the fault filtering necessary for evaluating systems with GTZAN. I predict that we should see the classification accuracy drop from about 74% to 55% or lower.
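
To make "proper partitioning" concrete, here is a minimal sketch of the difference between splitting individual feature vectors and splitting whole excerpts. It is not the code of “opf_split”; the function names and the toy data are hypothetical, and it handles neither class stratification nor fault filtering.

```python
import random

def frame_level_split(excerpt_ids, train_fraction=0.5, seed=0):
    """Shuffle individual frames, ignoring which excerpt each one came from."""
    rng = random.Random(seed)
    indices = list(range(len(excerpt_ids)))
    rng.shuffle(indices)
    cut = int(train_fraction * len(indices))
    return indices[:cut], indices[cut:]

def excerpt_level_split(excerpt_ids, train_fraction=0.5, seed=0):
    """Assign whole excerpts to train or test, so no excerpt is in both."""
    rng = random.Random(seed)
    excerpts = sorted(set(excerpt_ids))
    rng.shuffle(excerpts)
    cut = int(train_fraction * len(excerpts))
    train_excerpts = set(excerpts[:cut])
    train = [i for i, e in enumerate(excerpt_ids) if e in train_excerpts]
    test = [i for i, e in enumerate(excerpt_ids) if e not in train_excerpts]
    return train, test

def shared_excerpts(excerpt_ids, train, test):
    """Excerpts that contribute frames to both the training and test sets."""
    return {excerpt_ids[i] for i in train} & {excerpt_ids[i] for i in test}

# Toy data: 20 excerpts with 50 frames each; entry i is the excerpt of frame i.
excerpt_ids = [e for e in range(20) for _ in range(50)]

train, test = frame_level_split(excerpt_ids)
print("frame-level split, shared excerpts:", len(shared_excerpts(excerpt_ids, train, test)))

train, test = excerpt_level_split(excerpt_ids)
print("excerpt-level split, shared excerpts:", len(shared_excerpts(excerpt_ids, train, test)))
```

With the frame-level split, the test set contains frames whose close neighbours from the same recording sit in the training set, which is the kind of bias I suspect inflates the accuracy. The excerpt-level split rules that out by construction.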

