Formalizing Evaluation in Music Information Retrieval: A Look at the MIREX Automatic Mood Classification Task

One thing I have come to appreciate over the past two years is the necessity of employing formalism. Formalism is a way to see and work with things without ambiguity, to circumvent semantics, to find flaws and avoid them, and to make clear the assumptions being made and how they qualify conclusions. Two years ago I might have looked at such a sentence and thought it a senseless piece of self-serving gibberish, irrelevant to the way I was working (which was quite formal, if I may say so!). I was using standardized datasets and accepted approaches to systematically evaluate algorithms for music genre recognition, not to mention algorithms for recovery from compressive sampling.
I was even testing for statistical significance!

And things were good. But I began to see the necessity for something deeper than the standardized and systematic ways in which I was working. I then developed a deep appreciation for analysis, and learned firsthand how a little analysis can prevent bad ideas and wasted effort.
And then the edifice of my standardized and systematic ways of working was cracked to its foundation.

And things were bad. But this summer I realized the core problem.


Formalism is a way to see and work with things without ambiguity, to circumvent semantics, to find flaws and avoid them, and to make clear the assumptions being made and how they qualify conclusions. Formalism is not the same as a standard (which is a thing), or working systematically (which is a way of doing something), or using mathematics and applying the tools of statistics, although all of these involve formalism, either overtly or covertly. Designing a formalism involves rising to the meta-level and higher, abstracting to the point where nothing meaningful can be said at all (what is “meaning”?), and then coming back down a little to where it is safer to operate. When it is acknowledged, formalism provides perspective; when it is unacknowledged, there is a high risk of shock.

My paper, “Formalizing Evaluation in Music Information Retrieval: A Look at the MIREX Automatic Mood Classification Task” (to be presented at CMMR 2013), makes clear the unacknowledged formalism underlying much work in music informatics research. It pinpoints numerous serious problems arising from assumptions made unknowingly and experimental designs employed unknowingly, and it identifies the qualifications that must accompany any conclusion drawn from such work but that are by and large missing. From the altitude of this formalism, then, we see that the extraordinary amount of systematic and standardized evaluation effort, thought by many explorers to be solid and fertile land, is but a mirage.
