During the weekend, I experimented with
the scattering coefficient features for music genre recognition.
At first, I used AdaBoost with 1000 decision stumps, which gave me just above 80% accuracy.
Because the features have 469 dimensions, training is very slow,
so I decided to test a much quicker approach: Bayesian classification under a Gaussian assumption.
So, I learned the mean and covariance of each class from the training data,
as well as the covariance matrix of all the training data.
I then implemented the Mahalanobis distance classifier (MDC) and the full quadratic classifier (FQC),
the benefits of which include their simple implementation, and quick training and testing.
Furthermore, within a Bayesian framework, we can naturally introduce concepts of confidence, risk, and rejection. But I started simple: equal priors and uniform risk.
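To make the two classifiers concrete, here is a minimal NumPy sketch of how they could be written under equal priors. The function names (`fit_gaussians`, `log_posteriors`, `predict`) are my own for illustration and not from any library, and I assume the shared matrix is simply the covariance of all the training data. With 469 dimensions, a real implementation would probably also need to regularize the covariance estimates, which this sketch does not do.

```python
import numpy as np

def fit_gaussians(X, y):
    """Estimate per-class means and covariances, plus one shared covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    covs = [np.cov(X[y == c], rowvar=False) for c in classes]
    # Assumption: the shared matrix is the covariance of all training data.
    shared = np.cov(X, rowvar=False)
    return classes, means, covs, shared

def log_posteriors(X, means, covs):
    """Log posteriors under equal priors; one row per frame, one column per class."""
    scores = []
    for mu, S in zip(means, covs):
        d = X - mu
        # Squared Mahalanobis distance of every row of X to this class mean.
        maha = np.einsum("ij,ij->i", d, np.linalg.solve(S, d.T).T)
        _, logdet = np.linalg.slogdet(S)
        scores.append(-0.5 * (maha + logdet))
    scores = np.column_stack(scores)
    # Normalize across classes in a numerically stable way.
    scores -= scores.max(axis=1, keepdims=True)
    return scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

def predict(X, classes, means, covs):
    """Pick the class with the largest posterior for each row of X."""
    return classes[np.argmax(log_posteriors(X, means, covs), axis=1)]
```

Calling `predict(X, classes, means, covs)` gives the quadratic classifier, while `predict(X, classes, means, [shared] * len(classes))` gives the Mahalanobis one: with the same covariance for every class, the log-determinant term cancels and the decision reduces to the smallest Mahalanobis distance.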
Below we see the mean classification results from 10 independent trials of 10-fold stratified cross validation.
The lines labeled “frame” show the mean classification accuracies of each 1.5 second frame of data (hopped 0.75 seconds). The lines labeled “excerpt” show the mean classification accuracies of each 30 second music excerpt, classified by taking the class associated with the maximum sum of the posteriors.
“Maha” denotes the results of MDC (class-dependent means and identical covariances); and “Quad” denotes the results of FQC (class-dependent means and covariances).
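The excerpt-level decision described above (the class with the maximum sum of frame posteriors) can be sketched in a few lines. The function name `classify_excerpt` and the toy numbers are hypothetical, just to show the rule.

```python
def classify_excerpt(frame_posteriors):
    """Return the index of the class with the largest summed posterior.

    frame_posteriors: one list of class posteriors per frame of the excerpt.
    """
    n_classes = len(frame_posteriors[0])
    totals = [sum(frame[k] for frame in frame_posteriors) for k in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)

# Toy excerpt with three frames and two classes (made-up numbers).
frames = [[0.60, 0.40], [0.10, 0.90], [0.55, 0.45]]
label = classify_excerpt(frames)
```

In this toy case two of the three frames favor class 0, but the summed posteriors favor class 1, so summing posteriors is not the same as a majority vote over frame decisions: a few confident frames can outweigh several uncertain ones.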
In the case of MDC, I find that the mean classification accuracy of excerpts is 83.0 ± 2.1% (95% confidence interval), and of frames is 71.9 ± 1.0%; and that of FQC is 75.0 ± 2.3% for excerpts, and 70.0 ± 0.7% for frames.
In both cases, we see that the accuracy of MDC is significantly better than that of FQC.
It is at first surprising that the FQC performs on average worse than the MDC;
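As a rough sketch of how a "mean ± half-width" figure like those above can be obtained from a set of per-trial accuracies, here is one common recipe. It assumes a normal approximation (the factor 1.96) over the trial means, which may not be exactly the procedure behind the numbers reported here; the function name `mean_ci95` and the input values are hypothetical.

```python
import math
import statistics

def mean_ci95(values):
    """Return (mean, half-width) of a 95% interval under a normal approximation."""
    m = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return m, 1.96 * sem

# Made-up per-trial accuracies, for illustration only.
mean_acc, half_width = mean_ci95([0.80, 0.82, 0.84])
```

With only 10 trials, a Student t critical value would give a somewhat wider interval than 1.96 does.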
but this could be because of problems with the composition of particular classes in the Tzanetakis data set.
The Classical excerpts are by and large uniformly Classical and Baroque, with no mislabelings. And we see that using this class's own covariance matrix gives FQC an advantage over using the average covariance matrix of all classes.
The FQC achieves 100% accuracy in all folds of all trials for this class for excerpts, and nearly so for frames.
Other classes, like Disco and Hip Hop, contain mislabeled excerpts; and using these in training could swivel their covariance matrices so that the clusters overlap more. So, I think using an average covariance matrix helps in general with the problems of the dataset.
I am surprised to see that Rock excerpt classification with MDC achieves over 70% mean accuracy, which I have never seen before.
Now it is time to take a closer look: first to see how much these results depend on the features; and then to see what is going on under the hood.