Hello, and welcome to Paper of the Day (Po’D): Semantic gap?? Schemantic Schmap!! Edition. Today’s paper provides an interesting argument for what is necessary to push forward the field of “Music Information Retrieval”: G. A. Wiggins, “Semantic gap?? Schemantic Schmap!! Methodological Considerations in the Scientific Study of Music,” Proc. IEEE Int. Symp. Multimedia, pp. 477-482, San Diego, CA, Dec. 2009.
My one line summary of this work is:
The sampled audio signal is only half of half of half of the story.
This paper argues that MIR researchers are looking in the wrong (and non-existent) place to explain the apparent “performance ceiling” faced by the state-of-the-art approaches to, e.g., transcription, genre recognition, similarity, and so on. First, music is not just sound, but something much more. It does not, the paper argues, reside in the physical real world, but in the domain between one’s ears; and as such, we cannot realistically hope to capture its characteristics just by acquiring and processing its (necessarily incomplete) physical manifestation in the channel between transducers. Furthermore, one’s perception of music is altered by things, real and imaginary, conscious and unconscious, things like memory and experience, that come before, during, and after the experience. And our experience is entirely mutable, altered by reflection and time.
With all this in mind then, it is no wonder there is a glass ceiling to the current approaches. Or maybe it isn’t even a glass ceiling, but we are actually at the top floor of what we can hope to extract from the audio signal alone. (Which for music genre recognition appears to be a penthouse that looks solid from the ground level, but up close we can clearly see it’s an affordable IKEA flat pack unable to withstand use in the real world.)
Ok, and now wearing the critical hat (curiously crumpled from someone sitting on it). I think the paper provides some good discussion and reflection, and in general I agree with what the author is saying, but the arguments have a few weaknesses. First, the paper says music is “an invisible and intangible entity” that “leaves traces” such as acoustic signals, printed music, etc. In my opinion, one cannot scientifically adopt such a premise unless one also stipulates a mechanism by which something not of the physical world can affect the physical world. This is the old mind-matter duality problem. Are humans somehow plugged into the non-physical realm through an organ? Where along the communication channel does the physical reach the non-physical?
The paper compares music to light, which “cannot itself be seen” without having reflected from some surface. I think this analogy could be better, and I feel the description of light is unjustified. First, light, the thing, is demonstrably physical. Second, this requires that light possess a sort of intelligence to know when it has been reflected so that it will reveal itself to an observer; or that an observer’s eye knows which photons have been reflected and accepts only those. I think it is better to say perceived characteristics of music are like perceived characteristics of light, e.g., color and brightness.
This premise of music not being physical precedes the paper’s prescription for MIR: include more study of music in (physical) humans, i.e., minds in the physical world. I wholeheartedly agree, but the prescription can’t cure our physical malady (poor algorithms) when the carrier (music) is locked away in a non-physical realm incommunicado. If we can just say the acoustic realm carries only part of the information we need to create effective algorithms, then we can avoid the dualism.
Throughout the paper, the author argues that human factors must be included throughout the design of MIR algorithms, and that biologically-inspired methods often work better than other approaches. Some arguments take an evolutionary viewpoint, e.g., “it is unlikely that such complexity [that ears of mammals transmit important information about sound phase and pitch to the brain] would evolve unless it were important.” This makes an unjustified assumption that evolved things have important functions, or that some sort of teleology is involved. It becomes even more complicated to say something, then, about music since its relationship to the diversity of species is completely unclear. And humans have plenty of limitations. For instance,
it is due to the shortcomings of the human auditory system that it is even possible to have transparent lossy audio coding. (We do miss a lot of stuff that machines can reliably capture.)
Finally, the paper argues that, “Because music is first and foremost a psychological construct, there can be no externally defined truth, and systems which aim to encode musical similarity must, by definition, do so in a human-like way.” I am not sure about the “first and foremost” part; why can’t it be first and foremost a societal construct, i.e., a construct of many minds? In fact, earlier in the paper it says, “[the kinds of organized sound appropriate to be called music are] entirely a sociocultural construct.” So, I think it is fair to call music a product of minds. Moreover, I believe this position on music similarity is limiting because it only considers human definitions of similarity. Why can’t there be other ways of perceiving music, and relationships between pieces, of which we might not be aware because of our 6-second window of memory?