Test results for inter-genre similarity, part 2

Yesterday, I posted some initial results with the music genre recognition system proposed by Bagci and Erzin.
Since I am not too confident that I understand what PRTools is doing, I have decided to implement the process with the Statistics Toolbox of MATLAB, and get it working on a standard machine learning dataset: the handwritten digits of the US Postal Service.


Changing the code wasn’t too difficult. Essentially, it amounts to a little of this:

% get data from the CV fold (traindata/trainlabels, testdata/testlabels)
% train a GMM for each digit class
for jj = 1:numclasses
    idx = (trainlabels == jj);
    obj{jj} = gmdistribution.fit(traindata(idx,:), numGaussians, ...
        'Options', options, 'CovType', 'diagonal');
    % conditional densities of all data under this class model
    Ptrain(:,jj) = pdf(obj{jj}, traindata);
    Ptest(:,jj) = pdf(obj{jj}, testdata);
end
% classify training data by maximum likelihood (equal priors)
[~, predictedLabelsTrain] = max(Ptrain, [], 2);
% for a number of times, cycle the "inter-digit similarity" (IDS) approach
for kk = 1:numIDStimes
    % find misclassified training instances
    idx_wrong = ~(predictedLabelsTrain == trainlabels);
    % refit the digit class models on correctly classified instances only,
    % and recompute the conditional densities
    clear obj Ptrain Ptest;
    for jj = 1:numclasses
        idx = (trainlabels == jj);
        obj{jj} = gmdistribution.fit(traindata(idx & ~idx_wrong,:), numGaussians, ...
            'Options', options, 'CovType', 'diagonal');
        Ptrain(:,jj) = pdf(obj{jj}, traindata);
        Ptest(:,jj) = pdf(obj{jj}, testdata);
    end
    % create the model of IDS from all misclassified instances
    obj{numclasses+1} = gmdistribution.fit(traindata(idx_wrong,:), numGaussians, ...
        'Options', options, 'CovType', 'diagonal');
    Ptrain(:,numclasses+1) = pdf(obj{numclasses+1}, traindata);
    Ptest(:,numclasses+1) = pdf(obj{numclasses+1}, testdata);
    % classify test data over the digit classes only
    % (the IDS model is fit but not used for classification here)
    [~, predictedLabelsTest] = max(Ptest(:,1:numclasses), [], 2);
    % update the training predictions so the next iteration refines on them
    [~, predictedLabelsTrain] = max(Ptrain(:,1:numclasses), [], 2);
end
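The error percentages reported below fall straight out of the predicted labels; here is a minimal sketch of that bookkeeping (the variable kfold and the exact print format are my reconstruction from the output further down):

% percentage classification errors on the current fold
errTrain = 100*mean(predictedLabelsTrain ~= trainlabels);
errTest = 100*mean(predictedLabelsTest ~= testlabels);
fprintf('Fold %2d: GMMC after  IDS Error Train = %.2f, Test = %.2f\n', ...
    kfold, errTrain, errTest);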

And that is it really. What is different with respect to the music data is that each digit I am trying to classify consists of one feature vector, whereas with the music data I am considering windows. So if I really want to make it equivalent, I am going to have to break the digits into smaller frames, and then classify each digit from its collection of frames (sketched below).
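If I do go that route, it would look something like the following sketch, which chops each 16-by-16 USPS digit (stored as a 256-dimensional vector) into horizontal bands and treats each band as a frame; the band height of 4 and the summing of log-densities over frames are my own choices here:

% chop each digit into 4 horizontal bands of 4 rows each,
% giving 4 frames of 64 dimensions per digit
frameheight = 4;
numframes = 16/frameheight;
frames = zeros(size(traindata,1)*numframes, 16*frameheight);
for ii = 1:size(traindata,1)
    digit = reshape(traindata(ii,:), 16, 16);
    for ff = 1:numframes
        rows = (ff-1)*frameheight + (1:frameheight);
        frames((ii-1)*numframes + ff, :) = reshape(digit(rows,:), 1, []);
    end
end
% at classification time, sum the log-densities of a digit's frames
% under each class model and pick the class with the largest total:
%   logP(jj) = sum(log(pdf(obj{jj}, digitframes)));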
Regardless, let’s see what happens with this process, where I am not even considering the IDS model.
Here, I am just building models (mixtures of 3 Gaussians with diagonal covariance matrices) from the digits that are always classified correctly.
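The folds themselves come from cvpartition, which stratifies by the class labels; here is a sketch of the outer loop (the names data, labels, and kfold are mine):

% 2-fold stratified cross-validation over the whole USPS set
cv = cvpartition(labels, 'KFold', 2);
for kfold = 1:cv.NumTestSets
    traindata = data(training(cv,kfold),:);
    trainlabels = labels(training(cv,kfold));
    testdata = data(test(cv,kfold),:);
    testlabels = labels(test(cv,kfold));
    % ... run the GMM + IDS procedure above ...
end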

Running the above for 2-fold stratified CV, and five refinement iterations, I get the following classification errors on the training and testing sets.

Fold  1: GMMC before IDS Error Train = 28.89, Test = 31.04
Fold  1: GMMC after  IDS Error Train = 22.82, Test = 25.47
Fold  1: GMMC after  IDS Error Train = 24.15, Test = 26.56
Fold  1: GMMC after  IDS Error Train = 25.16, Test = 26.73
Fold  1: GMMC after  IDS Error Train = 25.65, Test = 28.00
Fold  1: GMMC after  IDS Error Train = 22.51, Test = 24.80
Fold  2: GMMC before IDS Error Train = 31.16, Test = 32.89
Fold  2: GMMC after  IDS Error Train = 29.67, Test = 32.98
Fold  2: GMMC after  IDS Error Train = 30.98, Test = 34.45
Fold  2: GMMC after  IDS Error Train = 29.29, Test = 32.73
Fold  2: GMMC after  IDS Error Train = 25.93, Test = 29.27
Fold  2: GMMC after  IDS Error Train = 23.24, Test = 26.95

As expected, the classification error on the training set ends up lower than where it started, and it appears that the classification error on the test set decreases as well.
Let’s check for statistical significance.
Below are the contingency tables for three pairs of algorithms.
[Figure: contingency tables for the three pairs of algorithms (digicontable.png)]
The elements in the first row count the digits that the once-tuned system (I1) classifies correctly, split by whether the untuned system (G) classifies them correctly (first column) or incorrectly (second column), and whether the system tuned five times (I5) classifies them correctly (third column) or incorrectly (last column).
By a Chi-squared test, we find with statistical significance (\(p<10^{-11}\)) that I5
performs better (with respect to accuracy) than G or I1.
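For reference, each pairwise table and test can be computed with crosstab from the Statistics Toolbox; a sketch, where predictedLabelsG and predictedLabelsI5 are hypothetical names for the two systems' predictions on the test set:

% correctness of each system on the same test digits
correctG = (predictedLabelsG == testlabels);
correctI5 = (predictedLabelsI5 == testlabels);
% 2x2 contingency table and Chi-squared test of independence
[tbl, chi2, p] = crosstab(correctG, correctI5);

(For paired classifiers like these, McNemar's test on the discordant counts would be the textbook choice, but a Chi-squared test is what I am using.)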

That was fun. Let’s run it again.

Fold  1: GMMC before IDS Error Train = 26.71, Test = 27.42
Fold  1: GMMC after  IDS Error Train = 26.76, Test = 27.33
Fold  1: GMMC after  IDS Error Train = 33.24, Test = 34.36
Fold  1: GMMC after  IDS Error Train = 26.00, Test = 27.05
Fold  1: GMMC after  IDS Error Train = 26.09, Test = 26.69
Fold  1: GMMC after  IDS Error Train = 24.64, Test = 25.02
Fold  2: GMMC before IDS Error Train = 25.27, Test = 27.18
Fold  2: GMMC after  IDS Error Train = 24.95, Test = 27.33
Fold  2: GMMC after  IDS Error Train = 26.64, Test = 28.22
Fold  2: GMMC after  IDS Error Train = 25.82, Test = 27.24
Fold  2: GMMC after  IDS Error Train = 25.93, Test = 26.60
Fold  2: GMMC after  IDS Error Train = 30.64, Test = 33.07

And here are the contingency tables.
[Figure: contingency tables for the second run (digicontable2.png)]
Well crap!
By a Chi-squared test, we find that we cannot reject the null hypothesis that G and I1 perform the same; but with statistical significance (\(p<10^{-3}\)) we can say that I5
performs worse than G and I1.
From the top to the bottom, just like that —
cross-validation can be a real bitch!

Now that I am satisfied I have some learning going on,
it is time to break up the digits and use the IDS mechanism.
