Strange behavior in sparse representation classification?

While putting the finishing touches on our paper for CMMR 2012 about music genre classification with sparse representation classification,
we noticed something funny going on with the classifier.
In our experiments, we are measuring classifier performance using features that have undergone some dimensionality reduction.
One standard way to do this is to project the dataset onto its most significant principal directions,
thereby forming linear combinations of the original features that preserve most of the variability of the data in a lower-dimensional space. Another way is to project the dataset onto the span of a set of non-negative basis vectors found through non-negative matrix factorization.
A non-adaptive approach is to simply project the dataset randomly onto a lower-dimensional subspace.
We can also downsample the features by lowpass filtering and decimation.
So we coded these up and, after much debugging, are quite sure things are working as expected. I made a mistake, though, when specifying the downsample factors,
and ended up running lots of experiments with features that were ideally interpolated, higher-dimensional versions of their original low-dimensional selves.
This interpolation appears to provide something of a boost to the accuracy.
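
For concreteness, here is a minimal sketch of the four reduction schemes, assuming the features sit in the rows of a matrix X and that we reduce by an integer factor; the choice of scikit-learn and SciPy here is purely illustrative and not the code behind our experiments.

```python
# Minimal sketch of the four dimension-reduction schemes, assuming the
# features are the rows of X (n_examples x n_dims) and factor is an integer.
# Library choices (scikit-learn, SciPy) are illustrative only.
import numpy as np
from scipy.signal import decimate
from sklearn.decomposition import PCA, NMF
from sklearn.random_projection import GaussianRandomProjection

def reduce_pca(X, factor):
    # Project onto the most significant principal directions.
    return PCA(n_components=X.shape[1] // factor).fit_transform(X)

def reduce_nmf(X, factor):
    # Project onto the span of non-negative basis vectors (requires X >= 0).
    return NMF(n_components=X.shape[1] // factor, init='nndsvd', max_iter=500).fit_transform(X)

def reduce_random(X, factor):
    # Non-adaptive: random linear projection onto a lower-dimensional subspace.
    return GaussianRandomProjection(n_components=X.shape[1] // factor).fit_transform(X)

def reduce_downsample(X, factor):
    # Lowpass filter and decimate each feature vector along its length.
    return decimate(X, factor, axis=1)
```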

In the figure below, we compare the classification accuracy
for four reduction methods and several reduction factors.
(You can see my mistake has shifted the “Downsample” line a bit.)
At a factor of 4, the feature dimension is 1/4 that of the original.
At 1, we are just using the original feature.
And at a factor of 0.5, the dimension is twice that of the original,
created by putting a zero between each feature element and then
lowpass filtering to remove the alias.
I expect there to be a dimensionality that is just right for maximizing the accuracy,
and for there to be some benefit in reducing the dimensionality given that the amount of training data we have does not change.
So the dip at no reduction (1) makes sense.
But why the boost of nearly 8% in mean accuracy
with an ideal interpolation of the features?
(We have seen this happen repeatedly with other features as well.)
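
To make the factor-0.5 case concrete, here is a small sketch of the zero-stuffing-plus-lowpass interpolation described above; the particular filter (a 31-tap windowed-sinc half-band FIR) is an assumption for illustration, not necessarily the filter we used.

```python
# Sketch of ideal interpolation by a factor of 2: insert a zero between each
# feature element, then lowpass filter to remove the spectral image (alias).
# The 31-tap Hamming-windowed FIR at half-band is an illustrative choice.
import numpy as np
from scipy.signal import firwin, lfilter

def upsample_by_two(x):
    up = np.zeros(2 * len(x))
    up[::2] = x                      # zero-stuffing: doubles the dimension
    h = firwin(31, 0.5)              # half-band lowpass (cutoff at half Nyquist)
    return 2 * lfilter(h, 1.0, up)   # remove the alias; the gain of 2 restores amplitude
```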

[Figure: mean classification accuracy versus dimension reduction factor for the four reduction methods.]
Is the lowpass filtering of the ideal interpolation making things more discriminable for the sparse representation classifier?
This is something we will have to explore in order to isolate its cause.

PS: Sorry for the long delay in posts! Happy new year too!
Much more will come in a few weeks after submission, exams, and semester start.

5 thoughts on “Strange behavior in sparse representation classification?”

  1. Hello Bob!
    Is the plot of Classification Accuracy some sort of generalization error, meaning are you testing on different examples than you are training on, i.e., either a separate test set or n-fold cross-validation? If not, then the behavior would not surprise me. Classification Accuracy on the training set can trivially go to 100% depending on the model parameters (complete memorization).
    Perhaps in this case the “smoother” upsampled data gets a bit further in remembering the training examples because they are more regular (being bandlimited) than the original data.
    Though the training statistics can be informative, some test statistics are also necessary.

  2. I am doing 10-fold cross-validation.
    I think I know what is going on. I am using BP (with equality constraints) for the sparse representation classification (because that is what the original work says it uses). When upsampling the set of features, the resulting dictionary is no longer full rank, so to cope I include an identity matrix in the dictionary. Including these elements may make the equality constraints less problematic for the classifier. I will do the same for the other dictionaries and see if it increases performance. (A rough sketch of this augmented-dictionary setup appears after the thread.)

  3. Sorry for jumping to conclusions about your Accuracy statistics. That indeed sounds like a reasonable explanation and a way to test it.

  4. No worries! I am discomforted by the amount of variation too, since in the original publication we are reproducing, the variances are much smaller (and the means are much, much higher).

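
For anyone curious, here is a rough sketch of the augmented-dictionary idea from my comment above: equality-constrained basis pursuit over [D, I], followed by the usual class-wise residual rule of sparse representation classification. The use of cvxpy and the exact residual computation are my own framing for illustration, not the code behind the experiments.

```python
# Rough sketch of SRC with equality-constrained basis pursuit (BP) over an
# identity-augmented dictionary [D, I]. Illustrative only (cvxpy-based).
import numpy as np
import cvxpy as cp

def src_predict(D, labels, y):
    """D: (m x n) dictionary with training features as columns,
    labels: length-n array of class labels for the columns,
    y: length-m test feature. Returns the predicted class label."""
    labels = np.asarray(labels)
    m, n = D.shape
    A = np.hstack([D, np.eye(m)])    # augmented dictionary [D, I]
    x = cp.Variable(n + m)
    cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
    s, e = x.value[:n], x.value[n:]
    # Assign the class whose training columns best explain y, treating the
    # coefficients on the identity part as a catch-all error term e.
    residuals = {c: np.linalg.norm(y - D[:, labels == c] @ s[labels == c] - e)
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)
```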