Paper of the Day (Po’D): Music Cover Song Identification Edition, pt. 2

Hello, and welcome to Paper of the Day (Po’D): Music Cover Song Identification Edition, pt. 2. We continue today with a paper mentioned in the post from yesterday: J. Serrà, E. Gómez, P. Herrera, and X. Serra, “Chroma binary similarity and local alignment applied to cover song identification,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, pp. 1138-1151, Aug. 2008.


The authors explore how cover song identification is affected by feature types, alignment methods in both time and pitch, beat tracking, and comparison methods using dynamic time warping (DTW). They also propose a new system using sequences of harmonic pitch class profiles (HPCP), which are essentially smoothed chroma features computed from spectral peaks (and their first 8 harmonics) in the 40 to 5000 Hz band. Two parts of the paper stand out:

First, the authors make comparisons with two baseline systems: 1) the correlation of beat-synchronized HPCPs (Ellis et al. 2007, also a Po’D); and 2) DTW with HPCPs from E. Gómez’s Ph.D. thesis. The authors find that HPCPs computed with smaller pitch divisions (i.e., 1/3 semitone) provide higher average F-measures and recalls; that a cross-correlation measure is better than the cosine distance (i.e., stay out of the Euclidean space); that a key finding algorithm hurts performance (i.e., avoid weak links and assumptions); that averaging features over frames instead of beats appears to increase performance (i.e., beat tracking could be a weak link); and that global DTW path constraints do not help (i.e., use local path constraints).

Second, the authors propose a new approach to cover song identification that builds on these findings. It computes a binary similarity matrix between the features of two songs and inspects subsequences to account for possible structural differences. With this similarity matrix their algorithm performs dynamic programming local alignment using the Smith-Waterman algorithm (originally developed for comparing molecular biological sequences), picks off the largest value, and finally, to create a distance, multiplies its reciprocal by the summed lengths of the two songs.
This system scored the best (significantly better than the others) in MIREX 2007.
The authors observe that misclassifications often occur when a song has a simple and/or popular tonal progression. Nothing surprising here, as most Western popular music shares these common themes.
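To make the pipeline described above a bit more concrete, here is a minimal Python sketch of its three steps: a binary similarity matrix, Smith-Waterman local alignment, and the final distance. It is only an illustration under my own assumptions, not the authors' implementation. In particular, the binarization rule (two frames count as similar when no circular pitch shift matches them better than a shift of zero), the `match`, `mismatch`, and `gap` values, and all function names are hypothetical choices.

```python
import numpy as np


def binary_similarity(a, b, match=1.0, mismatch=-0.9):
    """Binary similarity between two chroma-like sequences.

    a has shape (n, bins) and b has shape (m, bins).
    ASSUMPTION: frames i and j are "similar" when the circular shift of
    b[j] that maximizes its dot product with a[i] is zero. The match and
    mismatch values are illustrative, not taken from the paper.
    """
    bins = a.shape[1]
    # scores[k, i, j] = dot(a[i], b[j] circularly shifted by k bins)
    scores = np.stack([a @ np.roll(b, k, axis=1).T for k in range(bins)])
    best_shift = scores.argmax(axis=0)
    return np.where(best_shift == 0, match, mismatch)


def smith_waterman(sim, gap=0.5):
    """Best local alignment score over a similarity matrix.

    Textbook Smith-Waterman with a linear gap penalty (an assumption;
    the paper's variant may use different penalties).
    """
    n, m = sim.shape
    h = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i, j] = max(
                0.0,                                # start a new local alignment
                h[i - 1, j - 1] + sim[i - 1, j - 1],  # align frame i with frame j
                h[i - 1, j] - gap,                  # skip a frame of the first song
                h[i, j - 1] - gap,                  # skip a frame of the second song
            )
    return h.max()


def cover_distance(a, b):
    """Summed song lengths times the reciprocal of the best local score."""
    score = smith_waterman(binary_similarity(a, b))
    total_len = len(a) + len(b)
    return total_len / score if score > 0 else float("inf")
```

With toy one-hot "chroma" frames such as `np.eye(12)[[0, 4, 7, 0, 5, 9, 2, 7]]`, a sequence compared with itself gets the smallest possible distance, a sequence that only shares a subsequence gets a larger one, and two sequences with no frames in common get an infinite distance. Because the alignment is local, the shared passage can sit anywhere in either song, which is how the approach tolerates structural differences between a cover and the original.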

The same authors brutally lay waste to other algorithms in the same competition in MIREX 2008 and MIREX 2009 (with modified systems).
They did not participate in MIREX 2010, but it appears that it would have again been a game of cat vs. mice — which makes me think cover song identification is nearly solved (at least when it comes to popular Western music).
And look! J. Serrà, E. Gómez, and P. Herrera, “Audio cover song identification and similarity: Background, approaches, evaluation, and beyond,” in Advances in Music Information Retrieval (Z. Ras and A. Wieczorkowska, eds.), vol. 274 of Studies in Computational Intelligence, pp. 307-332, Springer Berlin / Heidelberg, 2010.

Yes, what is the “beyond” part?
