Hello, and welcome to Paper of the Day (Po’D): Music Cover Song Identification Edition, pt. 4 (of many more). We continue today with a novel approach: J. Serrà, H. Kantz, and R. G. Andrzejak, “Model-based cover song detection via threshold autoregressive forecasts,” in Proc. ACM Multimedia Int. Workshop on Machine Learning and Music, Firenze, Italy, Oct. 2010.
The authors describe a novel approach to cover song identification based on prediction. Whereas other approaches measure distances between collections of descriptors, the authors first build a model from one song's descriptors, and then test how well the descriptors of another song fit that model.

They use three descriptors computed from frames of 116.1 ms with a hop size of 104.5 ms: harmonic pitch class profiles (PCP), tonal centroids (obtained from the PCPs), and harmonic change (the Euclidean distance between pairs of consecutive tonal centroids). From these they create delay-coordinate state-space vectors by concatenating the descriptors of delayed frames, a standard tool in nonlinear time series analysis. They then find the best threshold autoregressive (TAR) models, which are predictors fit separately to clusters of the time series; for each song, the best model parameters are found by a grid search.

Presented with a new song, the authors first transpose its descriptors to the key of the modeled song, and then compute the mean squared prediction error. A small error implies that the song matches the model.

In their experiments, the authors find that model orders higher than 7 (spanning more than 731 ms) do not gain more accuracy. Their system performs with a mean average precision of 0.4, which is less than the best result of 0.66 reported in the same work. However, the prediction-based approach is more computationally tractable, and does not depend on user-assisted optimization of parameters: each model (and the parameters therein) is adapted as well as possible, in the least-squares sense, to each song.
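To make the harmonic change descriptor concrete, here is a minimal sketch. It assumes only what the paper states: one tonal centroid vector per frame, with harmonic change defined as the Euclidean distance between consecutive tonal centroids (the toy 6-dimensional centroids below are placeholder values, not real audio features):

```python
import numpy as np

def harmonic_change(tonal_centroids):
    """Harmonic change: Euclidean distance between each pair of
    consecutive tonal centroid vectors (one row per frame)."""
    diffs = np.diff(tonal_centroids, axis=0)
    return np.linalg.norm(diffs, axis=1)

# Four frames of hypothetical 6-D tonal centroids
tc = np.array([[1, 0, 0, 0, 0, 0],
               [0, 1, 0, 0, 0, 0],
               [0, 1, 0, 0, 0, 0],
               [0, 0, 1, 0, 0, 0]], dtype=float)
print(harmonic_change(tc))  # [1.414..., 0.0, 1.414...]
```

Frames where the tonality is stable yield values near zero, while chord changes produce peaks, which is what makes this descriptor useful for segmenting harmonic content.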
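The key transposition step can be illustrated too. One common way to transpose 12-bin pitch class descriptors (used in Serrà's earlier cover-song work as the "optimal transposition index") is to circularly rotate the query's PCPs by the shift that best aligns the two songs' average profiles; the paper does not spell out its exact procedure, so treat this as an assumed implementation:

```python
import numpy as np

def transpose_pcp(pcp, shift):
    """Circularly rotate 12-bin pitch class profiles by `shift` semitones."""
    return np.roll(pcp, shift, axis=1)

def best_transposition(pcp_ref, pcp_query):
    """Semitone shift of the query that best aligns its mean (global)
    PCP with the reference song's, by maximizing the dot product."""
    g_ref = pcp_ref.mean(axis=0)
    g_query = pcp_query.mean(axis=0)
    scores = [np.dot(g_ref, np.roll(g_query, k)) for k in range(12)]
    return int(np.argmax(scores))

rng = np.random.default_rng(1)
pcp_a = np.abs(rng.normal(size=(50, 12)))   # placeholder PCP sequence
pcp_b = transpose_pcp(pcp_a, 3)             # same "song", 3 semitones up
k = best_transposition(pcp_a, pcp_b)
aligned = transpose_pcp(pcp_b, k)           # back in the reference key
```

Because covers are often performed in a different key, skipping this alignment would inflate the prediction error even for true matches.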