Here is our late-breaking demo for ISMIR 2015: we re-evaluate the published results in J. Andén and S. Mallat, “Deep scattering spectrum,” IEEE Trans. Signal Process., vol. 62, pp. 4114–4128, Aug. 2014. We do so because the original experiments do not account for the faults in the GTZAN dataset. How will the results change?
We build and test scattering feature-based classification systems in two conditions: 1) random stratified partitioning of GTZAN; 2) fault-filtered partitioning. We use the same code as Andén and Mallat, which is provided in ScatNet. Our code is available here. The table below shows major differences between the two conditions for all features.
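To make the first condition concrete, here is a minimal sketch of random stratified partitioning, where each “genre” keeps the same proportion of excerpts in the train and test sets. This is an illustrative stand-in, not the ScatNet code; the GTZAN-like label layout (10 genres × 100 excerpts) and the `stratified_split` helper are assumptions for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Randomly partition indices so that each class keeps roughly
    the same train/test proportion (a stratified random split)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

# Hypothetical GTZAN-like labels: 10 genres x 100 excerpts each
labels = [g for g in range(10) for _ in range(100)]
train, test = stratified_split(labels, test_frac=0.2)
```

A fault-filtered partition instead removes known replicas and mislabelings and keeps all excerpts by one artist on the same side of the split, so stratification alone is not enough for that condition.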
The figure below shows a non-trivial interaction between “genre” and partition condition. (Here, “normalised accuracy” is recall.)
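Since “normalised accuracy” here means per-class recall, a small sketch may help pin that down: for each class, it is the fraction of that class's excerpts that the classifier labels correctly. The toy labels below are made up for illustration.

```python
def per_class_recall(y_true, y_pred):
    """Per-class recall: for each class, the fraction of its items
    that the classifier labels correctly ('normalised accuracy')."""
    recalls = {}
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls[c] = sum(1 for i in idx if y_pred[i] == c) / len(idx)
    return recalls

# Toy example: 4 blues and 4 metal excerpts, 3 of each correct
y_true = ["blues"] * 4 + ["metal"] * 4
y_pred = ["blues", "blues", "metal", "blues",
          "metal", "metal", "metal", "blues"]
r = per_class_recall(y_true, y_pred)  # {'blues': 0.75, 'metal': 0.75}
```

Plotting these per-class recalls in both partition conditions is what exposes the interaction the figure shows.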
We emphasize that these new numbers are no more relevant or reliable for measuring the “musical intelligence” of these classification systems, simply because we do not know what that means; however, these results come from experimental conditions that are more controlled than before (e.g., removal of replicas and “artist filtering”). Regardless, look at this:
That figure shows how the accuracies of six systems we have re-evaluated in these conditions drop (bracketed numbers refer to publications cited in this article). The SVM+scattering features (order > 1) systems reproduce much more ground truth than the others in the fault-filtered condition. Why? What characteristics are these scattering features capturing? Do they have anything to do with music?