Hello, and welcome to Paper of the Day (Po’D): Music Cover Song Identification Edition, pt. 2. We continue today with a paper mentioned in the post from yesterday: J. Serrà, E. Gómez, P. Herrera, and X. Serra, “Chroma binary similarity and local alignment applied to cover song identification,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, pp. 1138-1151, Aug. 2008.
The authors explore how cover song identification is affected by feature type, alignment in both time and pitch, beat tracking, and comparison methods using dynamic time warping (DTW). They also propose a new system using sequences of harmonic pitch class profiles (HPCP), which are essentially smoothed chroma features computed from spectral peaks (and their first 8 harmonics) in the band from 40 to 5000 Hz. Related to this work are the following:
- E. Gómez, B. S. Ong, and P. Herrera, “Automatic tonal analysis from music summaries for version identification,” in Proc. Conv. AES, Oct. 2006;
- E. Gómez, “Tonal description of music audio signals,” Ph. D. thesis, Music Technology Group, Univ. Pompeu Fabra, Barcelona, Spain, 2006;
- E. Gómez and P. Herrera, “Estimating the tonality of polyphonic audio files: Cognitive versus machine learning modelling strategies,” in Proc. Int. Symp. Music Info. Retrieval, pp. 92-95, 2004;
- E. Gómez and P. Herrera, “The song remains the same: Identifying versions of the same song using tonal descriptors,” in Proc. Int. Symp. Music Info. Retrieval, pp. 180-185, 2006.
- HPCP are compared with chroma in: B. S. Ong, E. Gómez, and S. Streich, “Automatic extraction of musical structure using pitch class distribution features,” in Proc. Workshop Learning the Semantics of Audio Signals, pp. 53-65, 2006.
- And appendix material for the current Po’D is here
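To make the idea of a pitch class profile concrete, here is a minimal sketch of folding spectral peaks into pitch classes. It is not the authors' HPCP: it leaves out the harmonic weighting and the smoothing, and the function name, the 440 Hz reference, the squared-magnitude weighting, and the normalization by the maximum are all assumptions of mine. Only the 40 to 5000 Hz band and the 1/3 semitone resolution come from the text.

```python
import math


def pitch_class_profile(peaks, bins_per_semitone=3, f_ref=440.0,
                        f_min=40.0, f_max=5000.0):
    """Fold (frequency_hz, magnitude) peaks into one octave of pitch classes.

    Hypothetical helper: a simplified stand-in for HPCP without harmonic
    weighting or smoothing.
    """
    n_bins = 12 * bins_per_semitone
    profile = [0.0] * n_bins
    for freq, mag in peaks:
        # Keep only peaks inside the analysis band.
        if not (f_min <= freq <= f_max):
            continue
        # Distance from the reference in bins, wrapped to a single octave.
        position = n_bins * math.log2(freq / f_ref)
        index = int(round(position)) % n_bins
        # Assumption: each peak contributes its squared magnitude.
        profile[index] += mag ** 2
    # Assumption: normalize so the strongest pitch class equals 1.
    strongest = max(profile)
    if strongest > 0:
        profile = [value / strongest for value in profile]
    return profile


# Peaks at A4, A5, a semitone above A4, and one below the band.
profile = pitch_class_profile(
    [(440.0, 1.0), (880.0, 0.5), (466.1637615, 1.0), (30.0, 1.0)])
```

With 3 bins per semitone the profile has 36 bins. The two A peaks land in the same bin, and the 30 Hz peak is dropped.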
First, the authors make comparisons with two baseline systems: 1) the correlation of beat-synchronized HPCPs (Ellis et al. 2007 Po’D here); and 2) DTW with HPCPs from E. Gómez’s Ph. D. thesis. The authors find that HPCPs computed with smaller pitch divisions (i.e., 1/3 semitone) provide higher average F-measures and recalls; that a cross-correlation measure is better than the cosine distance (i.e., stay out of the Euclidean space); that a key finding algorithm hurts performance (i.e., avoid weak links and assumptions); that averaging features over frames instead of beats appears to increase performance (i.e., beat tracking could be a weak link); and that global DTW path constraints do not help (i.e., use local path constraints).
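One way to align two versions in pitch without estimating a key is to try every circular shift of one pitch class profile against the other and keep the shift with the largest correlation. The sketch below illustrates that idea only. The function name and the use of a plain dot product are my assumptions, and I am not claiming this is exactly what the authors do.

```python
def best_circular_shift(profile_a, profile_b):
    """Return the shift s maximizing sum_k a[k] * b[(k + s) % n].

    Hypothetical helper: a circular cross-correlation over pitch class
    bins, used here as a key-free way to find a transposition.
    """
    n = len(profile_a)
    if n != len(profile_b):
        raise ValueError("profiles must have the same number of bins")

    def correlation_at(shift):
        # Correlation of a with b rotated back by `shift` bins.
        return sum(profile_a[k] * profile_b[(k + shift) % n]
                   for k in range(n))

    return max(range(n), key=correlation_at)


# A 12-bin profile and the same profile transposed up by 2 bins.
original = [1.0, 0, 0, 0, 0.5, 0, 0, 0.8, 0, 0, 0, 0]
transposed = [original[(k - 2) % 12] for k in range(12)]
shift = best_circular_shift(original, transposed)  # recovers the 2-bin shift
```

No key label is ever produced here, so there is no key estimate to get wrong. The only decision is which of the n shifts correlates best.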
Second, the authors propose a new approach to cover song identification that builds on these findings: it computes a binary similarity matrix between the feature sequences of two songs, and inspects subsequences to account for possible structural differences. On this similarity matrix, their algorithm performs local alignment by dynamic programming with the Smith-Waterman algorithm (borrowed from the comparison of biological molecular sequences), picks off the largest value, and finally, to create a distance, multiplies its reciprocal by the total lengths of the songs.
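The pipeline just described (binary similarity matrix, Smith-Waterman local alignment, reciprocal of the best score scaled by song length) can be sketched in a few lines. Everything specific here is my assumption rather than the authors' setting: the equality test that fills the binary matrix, the match, mismatch, and gap scores, the linear gap penalty, and reading "total lengths" as the sum of the two sequence lengths.

```python
def binary_similarity(seq_a, seq_b):
    """1 where two frames count as similar, 0 otherwise.

    Assumption: plain equality stands in for the real frame comparison.
    """
    return [[1 if x == y else 0 for y in seq_b] for x in seq_a]


def smith_waterman_score(sim, match=1.0, mismatch=-1.0, gap=-0.5):
    """Largest local alignment score over a binary similarity matrix."""
    n = len(sim)
    m = len(sim[0]) if n else 0
    # H[i][j] is the best score of a local alignment ending at (i, j).
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if sim[i - 1][j - 1] else mismatch
            H[i][j] = max(0.0,                      # start a new alignment
                          H[i - 1][j - 1] + step,   # extend diagonally
                          H[i - 1][j] + gap,        # skip a frame of A
                          H[i][j - 1] + gap)        # skip a frame of B
            best = max(best, H[i][j])
    return best


def cover_distance(seq_a, seq_b):
    """Total length divided by the best local alignment score."""
    score = smith_waterman_score(binary_similarity(seq_a, seq_b))
    if score == 0:
        return float("inf")  # nothing aligned at all
    return (len(seq_a) + len(seq_b)) / score


# Toy "songs": the cover shares the section ABCDE but is framed differently.
distance = cover_distance(list("xxABCDEyy"), list("ABCDEzz"))
```

Because the alignment is local, the shared section is found even though it sits at different positions in the two sequences. That is what makes the approach tolerant of structural changes such as a dropped intro or an added outro.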
This system scored the best (significantly better than the others) in MIREX 2007.
The authors observe that misclassifications often occur when a song has a simple and/or popular tonal progression. Nothing surprising here, as most Western popular music shares these common themes.
The same authors brutally laid waste to the other algorithms in the same competition at MIREX 2008 and MIREX 2009 (with modified systems).
They did not participate in MIREX 2010, but it appears that it would have again been a game of cat vs. mice — which makes me think cover song identification is nearly solved (at least when it comes to popular Western music).
And look! J. Serrà, E. Gómez, and P. Herrera, “Audio cover song identification and similarity: Background, approaches, evaluation, and beyond,” in Advances in Music Information Retrieval (Z. Ras and A. Wieczorkowska, eds.), vol. 274 of Studies in Computational Intelligence, pp. 307-332, Springer Berlin / Heidelberg, 2010.
- Identifying covers of Merzbow?
- Identifying covers made by dogs?
- Identifying pathological covers?
- Automatic cover song correction and generation?
- Identifying William Shatner interpretations?
- Assisted song completion?
- Identifying music performed by the Portsmouth Sinfonia, or the delightful Florence Foster Jenkins?
- Audio forensics to test this guy’s claim to be the singer of that cover?