Paper of the Day (Po’D): Clustering Beat-chroma Patterns in Music Databases Edition

Hello, and welcome to Paper of the Day (Po’D): Clustering Beat-chroma Patterns in Music Databases Edition. Today’s paper offers a break from the cover-song-centric Po’Ds we have been having lately; but it is still related: T. Bertin-Mahieux, R. J. Weiss, and D. P. W. Ellis, “Clustering beat-chroma patterns in a large music database,” in Proc. Int. Symp. Music Info. Retrieval, (Utrecht, Netherlands), Aug. 2010.


The authors take the beat-chroma features used in the paper by Ellis and Poliner and use them to compare general musical aspects across pieces of music. They use the Echo Nest API to segment a song into beats and then into bars of four beats (when a song has an odd meter, they resample to four). They then compute the chroma of each beat of the audio signal to create “patches” of four or eight beats of features. (Is there overlap of the patches?) Next, they circularly shift the rows of each patch until the first row contains the most energy. They perform vector quantization of each patch against a codebook of size K, updating the centroid of the selected cluster each time. They define the distance between patches as the Euclidean distance. (This should make it sensitive to shifts, then.) The resulting codebook of patches provides K significant tonal musical structures that are somewhat independent of tempo and timbre.
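
The pipeline described above can be sketched roughly as follows. This is only my reading of it, not the authors' code: the non-overlapping patches, the use of row sums as "energy", the learning rate, the random initialization, and every function name here are assumptions made for illustration, and random numbers stand in for real beat-chroma features.

```python
import numpy as np

def make_patches(beat_chroma, beats_per_patch=4):
    """Cut a 12 x n_beats matrix into patches (assumed non-overlapping)."""
    n_patches = beat_chroma.shape[1] // beats_per_patch
    return [beat_chroma[:, i * beats_per_patch:(i + 1) * beats_per_patch]
            for i in range(n_patches)]

def normalize_transposition(patch):
    """Circularly shift the rows so the most energetic row comes first."""
    # Assumption: "energy" of a row is the sum of its chroma values.
    strongest = int(np.argmax(patch.sum(axis=1)))
    return np.roll(patch, -strongest, axis=0)

def online_vq(patches, k, learning_rate=0.05, seed=0):
    """Online vector quantization: nudge the nearest centroid per patch."""
    rng = np.random.default_rng(seed)
    data = np.stack([p.ravel() for p in patches]).astype(float)
    # Assumption: initialize the codebook with k randomly chosen patches.
    codebook = data[rng.choice(len(data), size=k, replace=False)].copy()
    for x in data:
        # Euclidean distance from this patch to every code.
        nearest = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
        codebook[nearest] += learning_rate * (x - codebook[nearest])
    return codebook

def encode(patch, codebook):
    """Index of the code closest (Euclidean) to the patch."""
    return int(np.argmin(np.linalg.norm(codebook - patch.ravel(), axis=1)))

# Toy usage: random values stand in for 32 beats of chroma features.
rng = np.random.default_rng(42)
beat_chroma = rng.random((12, 32))
patches = [normalize_transposition(p) for p in make_patches(beat_chroma)]
codebook = online_vq(patches, k=3)
codes = [encode(p, codebook) for p in patches]
```

Note that the row shift only handles transposition in pitch; nothing in this sketch aligns patches in time, which is why the Euclidean distance would remain sensitive to shifts along the beat axis.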

In their experiments, the authors learn a codebook using 43,300 songs (and segments) uploaded by people at http://morecowbell.dj (nice crowd-sourcing!). (Did they keep the cowbell?)

[Embedded audio player: a clip made at MoreCowbell.dj]

(The beat tracker gets confused with four part harmony; but I love the roll.)

To test their codebook of K=200 codes, they use a low-bandwidth set of 8,651 songs from the uspop2002 collection, as well as artist20, which has 1,402 songs by 20 popular artists. They find that the most often used codes correspond to sustained notes, perfect fourths, triads and inversions, and I-V and V-I transitions. Finally, the authors discuss applying their approach to downbeat detection (this part is confusing) and to artist recognition. The authors also provide access to their code!
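
To make "most often used codes" concrete, here is a minimal sketch of how one might count code usage over a test set, assuming each patch is assigned to its nearest code by Euclidean distance. The function name and the tiny two-dimensional data are hypothetical; real patches would be flattened 12 x 4 or 12 x 8 chroma blocks.

```python
import numpy as np

def code_usage(data, codebook):
    """Count how often each code is the nearest one, and rank the codes."""
    # Pairwise Euclidean distances, shape (n_patches, K).
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    counts = np.bincount(assignments, minlength=len(codebook))
    ranking = np.argsort(counts)[::-1]  # most used code first
    return counts, ranking

# Toy data: three codes and four "patches" in two dimensions.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
data = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [1.2, 1.0]])
counts, ranking = code_usage(data, codebook)
```

In this toy case the second code wins three of the four patches and the third code is never used.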
