Continuing my reproduction of the work in M. Casey, C. Rhodes, and M. Slaney, “Analysis of minimum distances in high-dimensional musical spaces,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, pp. 1015-1028, July 2008, today I inspected the squared distances between shingles drawn from related and unrelated songs. Before I proceed, however, I must say that I have had to guess at several details of how Casey et al. calculate their features:
- I calculate log-frequency cepstral coefficients (LFCCs) using rectangular (or triangular) bandpass filters weighted inversely to their area. (Casey et al. do not mention what weighting they use. I just adopted what is done in Slaney’s Auditory Toolbox for calculating MFCCs.)
- In the 20 coefficients, I include the zeroth DCT coefficient.
- To calculate the “power” of an LFCC shingle (which Casey et al. use to discard those that come from quiet moments), I sum the magnitudes of the zeroth coefficients over all frames in the shingle. (Do Casey et al. use the squared norm of a shingle?)
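To make these guesses concrete, here is a minimal Python sketch of the three choices above. It is my own illustration, not Casey et al.'s code or the Auditory Toolbox: the function names, the 40 filters, and the 62.5 Hz to 8 kHz band edges are all hypothetical placeholders, and I use rectangular filters only.

```python
import numpy as np

def log_filterbank(n_bins, sample_rate, n_filters=40, f_lo=62.5, f_hi=8000.0):
    """Rectangular bandpass filters with log-spaced edges, each scaled by 1/area.

    n_filters, f_lo and f_hi are assumed values, not taken from the paper.
    """
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = np.geomspace(f_lo, f_hi, n_filters + 1)
    bank = np.zeros((n_filters, n_bins))
    for k in range(n_filters):
        in_band = (freqs >= edges[k]) & (freqs < edges[k + 1])
        if in_band.any():
            # Weight inversely to area: a rectangle's area is its width in bins.
            bank[k, in_band] = 1.0 / in_band.sum()
        # A band narrower than one FFT bin is left as an all-zero row.
    return bank

def dct_matrix(n_out, n_in):
    """Unnormalized DCT-II basis; row 0 is the constant (zeroth) coefficient."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def lfcc(power_spectra, sample_rate, n_coeffs=20):
    """power_spectra: (n_frames, n_bins) -> (n_frames, n_coeffs), zeroth included."""
    bank = log_filterbank(power_spectra.shape[1], sample_rate)
    log_energy = np.log(power_spectra @ bank.T + 1e-10)  # small floor avoids log(0)
    return log_energy @ dct_matrix(n_coeffs, bank.shape[0]).T

def shingle_power(shingle):
    """shingle: (n_frames, n_coeffs); sum of zeroth-coefficient magnitudes."""
    return np.abs(shingle[:, 0]).sum()
```

Dropping the zeroth coefficient would amount to slicing `[:, 1:]` from the output of `lfcc`, and the squared-norm alternative for power would be `np.sum(shingle ** 2)`.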
The LFCC histograms are created by binning the squared distances of all shingle pairs on the intervals [0:0.01:2]; and for PCP, on [0:0.002:2] (which I selected to give both distributions approximately the same height). It is nice to see that they all have a chi-squared look. It is unusual that the PCP distances have so much smaller a variance than those of the LFCCs. Is this due to those shingles living in a much lower-dimensional space, and the fact that all of these songs are tonal, and in the key of sad? Or maybe it is because I am including that zeroth DCT coefficient. (When I rerun without the zeroth coefficient, the LFCC distribution gets wider and moves higher.) I wonder if that bump in Purple Rain PCPs is due to the modulation in the song …
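For reference, the binning described above can be sketched as follows. This is an illustration under two assumptions of mine: that each shingle is flattened to a vector and scaled to unit norm before comparison, and that the grid [0:0.01:2] is treated as bin edges rather than bin centers. The function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """a: (n, d), b: (m, d), rows are flattened shingles -> (n, m) squared distances."""
    # Assumption: shingles are scaled to unit norm before comparison.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    # For unit vectors, ||x - y||^2 = 2 - 2 <x, y>; clip tiny negative round-off.
    return np.clip(2.0 - 2.0 * (a @ b.T), 0.0, None)

def distance_histogram(a, b, step=0.01, upper=2.0):
    """Count squared distances in bins of width `step` on [0, upper]."""
    edges = np.arange(0.0, upper + step / 2, step)
    # np.histogram silently ignores values outside [edges[0], edges[-1]].
    counts, _ = np.histogram(pairwise_sq_dists(a, b).ravel(), bins=edges)
    return counts, edges
```

Calling `distance_histogram(lfcc_a, lfcc_b, step=0.01)` and `distance_histogram(pcp_a, pcp_b, step=0.002)` would mirror the two grids above. Note that squared distances between unit vectors can reach 4, so anything above 2 is dropped from the counts.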
Now it is time for me to incorporate the fruits of yesterday and then tweak.