Continuing my reproduction of the work in M. Casey, C. Rhodes, and M. Slaney, “Analysis of minimum distances in high-dimensional musical spaces,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, pp. 1015-1028, July 2008, today I inspected the squared distances between shingles of related and unrelated songs. Before I proceed, however, I must say that I have had to make many guesses about how Casey et al. calculate their features:

- I calculate log-frequency cepstral coefficients (LFCCs) using rectangular (or triangular) bandpass filters, each weighted inversely to its area. (Casey et al. do not mention which weighting they use, so I adopted the one in Slaney’s Auditory Toolbox for calculating MFCCs.)
- Among the 20 coefficients, I include the zeroth DCT coefficient.
- To calculate the “power” of an LFCC shingle (which Casey et al. use to discard shingles from quiet moments), I sum the magnitudes of all the zeroth coefficients in the shingle. (Or do Casey et al. use the squared norm of a shingle?)
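Here is a minimal sketch of those three guesses, in plain Python so it runs without extra packages. Everything beyond the bullets is an assumption for illustration: the helper names (`log_freq_filterbank`, `lfcc`, `make_shingles`, `shingle_power`) are hypothetical, as are the log-spaced filter edges between `f_min` and `f_max`, the number of filters, the natural log with a small `eps`, the unnormalized DCT-II, and a hop of one frame between shingles. "Weighted inversely to area" is read here as dividing each filter by the sum of its weights over the FFT bins; this is not code copied from the Auditory Toolbox.

```python
import math


def log_freq_filterbank(n_bins, sample_rate, n_filters, f_min, f_max, shape="triangular"):
    """Hypothetical helper: bandpass filters with log-spaced edges.

    Filter i spans edges[i]..edges[i+2] and peaks at edges[i+1] (triangular),
    or is flat over that span (rectangular). Each filter is divided by its
    area, taken here (an assumption) as the sum of its weights over the bins.
    """
    ratio = (f_max / f_min) ** (1.0 / (n_filters + 1))
    edges = [f_min * ratio ** k for k in range(n_filters + 2)]
    # Centre frequency of each FFT bin, for an n_fft = 2 * (n_bins - 1) transform.
    bin_freqs = [k * sample_rate / (2.0 * (n_bins - 1)) for k in range(n_bins)]
    bank = []
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_freqs:
            if not lo <= f <= hi:
                row.append(0.0)
            elif shape == "rectangular":
                row.append(1.0)
            elif f <= mid:
                row.append((f - lo) / (mid - lo))
            else:
                row.append((hi - f) / (hi - mid))
        area = sum(row)
        # Weight inversely to area; a filter that catches no bins stays all-zero.
        bank.append([w / area for w in row] if area > 0 else row)
    return bank


def lfcc(power_spectrum, bank, n_coeffs=20, eps=1e-10):
    """Log filterbank energies followed by an unnormalized DCT-II.

    Coefficient 0 (the sum of the log energies) is kept, per the second guess.
    len(bank) should be at least n_coeffs.
    """
    logs = [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps) for row in bank]
    n = len(logs)
    return [sum(logs[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for j in range(n))
            for k in range(n_coeffs)]


def make_shingles(frames, length):
    """Concatenate `length` consecutive frames into one vector; hop of one frame (assumed)."""
    return [[v for frame in frames[i:i + length] for v in frame]
            for i in range(len(frames) - length + 1)]


def shingle_power(shingle, n_coeffs):
    """Third guess: sum of |c0| over the frames in a flattened shingle."""
    return sum(abs(shingle[j]) for j in range(0, len(shingle), n_coeffs))
```

A quick sanity check of the area weighting: with a flat unit power spectrum every filter output is 1, the logs are (nearly) 0, and so all coefficients come out (nearly) zero.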

Above we can see histograms of the squared distances between LFCC and PCP feature shingles from Marty Robbins’ rendition of Tumbling Tumbleweeds (1678 shingles) and those from another Marty Robbins tune (2769 shingles), a different version of Tumbling Tumbleweeds (1664 shingles), High Noon (1444 shingles), and Purple Rain (1862 shingles).
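The quantity being histogrammed is the squared Euclidean distance between every shingle of the reference song and every shingle of the song it is compared with. A minimal sketch, assuming each shingle is a flat list of numbers of the same length; the function name is hypothetical:

```python
def squared_distances(shingles_a, shingles_b):
    """Squared Euclidean distance for every (a, b) pair, a from song A, b from song B.

    Hypothetical helper. Results are ordered row-major: all pairs for the
    first shingle of A, then all pairs for the second, and so on.
    """
    out = []
    for a in shingles_a:
        for b in shingles_b:
            if len(a) != len(b):
                raise ValueError("shingles must have the same dimension")
            out.append(sum((x - y) ** 2 for x, y in zip(a, b)))
    return out
```

Even one comparison is large: Tumbling Tumbleweeds (1678 shingles) against the other Marty Robbins tune (2769 shingles) gives 1678 × 2769 = 4,646,382 pairs. At that scale a vectorized version built on ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b is the usual choice; the loop above just spells out the definition.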

The LFCC histograms are created by binning the distances of all shingle pairs on the intervals [0:0.01:2], and the PCP histograms on [0:0.002:2] (which I selected to give both distributions approximately the same height). It is nice to see that they all have a chi-squared look. It is unusual, though, that the PCP distances have a much smaller variance than the LFCC distances. Is this because those shingles live in a much lower-dimensional space, and because all of these songs are tonal and in the key of sad? Or maybe it is because I am including the zeroth DCT coefficient. (When I rerun without the zeroth coefficient, the LFCC distribution gets wider and shifts to higher values.) I wonder if that bump in the Purple Rain PCP distances is due to the modulation in the song …
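The binning can be sketched as follows. `matlab_range` mimics MATLAB's inclusive `start:step:stop`, and treating [0:0.01:2] as bin edges rather than bin centers is an assumption. The rest is a hedged check on the "chi-squared look" under an idealized model, not a claim about real LFCC or PCP shingles (whose dimensions are correlated and non-Gaussian): if the d entries of two unrelated shingles are independent with variance σ², each difference is N(0, 2σ²), so the squared distance is 2σ² times a chi-squared variable with d degrees of freedom, with mean 2σ²d and variance 8σ⁴d. In that model, both a smaller dimension d and a smaller per-dimension spread σ shrink the variance. All helper names are hypothetical.

```python
import bisect
import math
import random


def matlab_range(start, step, stop):
    """Mimic MATLAB's inclusive start:step:stop, computing each value as start + k*step."""
    n = int(math.floor((stop - start) / step + 1e-9)) + 1
    return [start + k * step for k in range(n)]


def histogram(values, edges):
    """Count values in [edges[k], edges[k+1]); values outside [edges[0], edges[-1]) are dropped.

    Assumption: the range vector is used as bin edges, not bin centers.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect.bisect_right(edges, v) - 1] += 1
    return counts


def chi2_moments(d, sigma):
    """Mean and variance of 2*sigma^2 * chi2_d (idealized iid Gaussian model)."""
    scale = 2.0 * sigma ** 2
    return scale * d, scale ** 2 * 2 * d


def simulate_gaussian_distances(d, sigma, n_pairs, rng):
    """Squared distances between pairs of iid N(0, sigma^2) vectors of length d."""
    out = []
    for _ in range(n_pairs):
        a = [rng.gauss(0.0, sigma) for _ in range(d)]
        b = [rng.gauss(0.0, sigma) for _ in range(d)]
        out.append(sum((x - y) ** 2 for x, y in zip(a, b)))
    return out


lfcc_edges = matlab_range(0.0, 0.01, 2.0)   # 201 edges -> 200 bins
pcp_edges = matlab_range(0.0, 0.002, 2.0)   # 1001 edges -> 1000 bins
sims = simulate_gaussian_distances(20, 0.1, 2000, random.Random(0))
model_mean, model_var = chi2_moments(20, 0.1)
```

The simulated distances should have a sample mean and variance close to `model_mean` and `model_var`, and their histogram on `lfcc_edges` has the same right-skewed chi-squared shape.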


Here we see Frankie Laine’s High Noon compared with the others. The distributions are quite different from those above, but they still look chi-squared.

Now it is time for me to incorporate the fruits of yesterday and then tweak.
