In his recent Visions and Views column for the IEEE Multimedia Magazine, Malcolm Slaney asks, “Does content analysis matter?” when it comes to judging the similarity between media, or making recommendations on media, at the scale of the Internet. He discusses experimental results showing that user ratings, and/or the context of media on the WWW, appear to be much more powerful than automatic content analysis (i.e., careful feature design, segmentation, feature extraction, and classification) for predicting the similarity between two pieces of music, building a playlist of music, recommending a movie, or determining the content of a given image (as done, for instance, by Google Image Search, my favorite tool when building a lecture). This is not at all unbelievable, since we humans operate in a non-random and time-varying way; and it shows how far we engineers must go before we can extract from complex media “actionable intelligence” for making decisions on media within particular complex contexts (e.g., pornography from 100 years ago is not pornography today, “rock” from the 1960s is now “classic rock”, etc.). Thus, Slaney opines, the way forward is to “take advantage of human signals” for aiding content analysis of multimedia, and all things related.
While I agree with his thesis, Slaney says something with which I do not agree: “The [music] audio waveform is rich in information — it tells us everything we need to know about the music.” The first part is ok; the second part, not so much. Setting aside any argument over what belongs in the set of “everything we need to know” about the music in a recording, among the things I need to know as a modern classical music lover are the composer and the year of composition. A real connoisseur might also like to know the conductor, orchestra, soloists, and the place and time of the recording. Such things are not present in the audio waveform. In a few cases the composer may be inferred, or a soloist named, but only with expert experience (or expert systems like Shazam) can such judgements be reliable. (There are also a lot of imitators, e.g., Bach’s children, David Cope’s EMI, counterpoint students, etc.)