Paper of the Day (Po’D): Weighted Voting of Sparse Representation Classifiers for Recognizing Face Emotion Edition

Hello, and welcome to Paper of the Day (Po’D): Weighted Voting of Sparse Representation Classifiers for Recognizing Face Emotion Edition. Today’s paper is: S. F. Cotter, “Weighted voting of sparse representation classifiers for facial expression recognition,” in Proc. European Signal Process. Conf., (Aalborg, Denmark), pp. 1164-1168, Aug. 2010. This work is an outgrowth of related work:

  • S. F. Cotter, “Sparse representation for accurate classification of corrupted and occluded facial expressions,” Proc. Int. Conf. Acoustics, Speech, Signal Process., (Dallas, TX, USA), pp. 838-841, Mar. 2010.
  • J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, pp. 210-227, Feb. 2009.
  • J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proc. IEEE, Mar. 2009.

Also of interest are the following related papers:

  • P. J. Phillips, “Matching pursuit filters applied to face identification,” IEEE Trans. Image Process., vol. 7, pp. 1150-1164, Aug. 1998.
  • T. V. Pham and A. Smeulders, “Sparse representation for coarse and fine object recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, pp. 555-567, Apr. 2006.

And with regards to sparse weighted voting classifiers:

  • N. Goldberg and J. Eckstein, “Sparse weighted voting classifier selection and its LP relaxations,” RUTCOR Research Report 9-2010 (Rutgers University, Piscataway, NJ), May 2010.

In the interest of extreme brevity, here is my one-line description of the work in this paper:

With stunning accuracy, we can segment a face with and without occlusions into regions, sparsely approximate each region with a dictionary of other faces similarly segmented and labeled with an emotion, and use a simple voting approach to recognize the emotion.

In this paper the author builds and tests a system for recognizing the emotion expressed in an image of a human face. The classification is built on the method of sparse representation classification (discussed here previously: Paper of the Day (Po’D): Music Genre Classification via Compressive Sampling Edition, Paper of the Day (Po’D): Music Genre Classification this time with Sparse Representations Edition, and Paper of the Day (Po’D): Speech Recognition by Sparse Approximation Edition). Sparse representation classification is nothing more than finding the minimum distortion in a set of codebooks using a sparsity constraint. Here the author uses \(\ell_1\) minimization with an equality constraint, hence Basis Pursuit rather than Basis Pursuit denoising: the equality constraint assumes the signal can be represented exactly by the dictionary (e.g., because the dictionary is complete), whereas Basis Pursuit denoising tolerates some approximation error.
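To make "minimum distortion with a sparsity constraint" concrete, here is a minimal sketch of sparse representation classification in the style of Wright et al., assuming the training vectors are stacked as unit-norm columns of a matrix `A` with one class label per column. Basis Pursuit is cast as a linear program and solved with SciPy's `linprog`; the class whose coefficients alone best reconstruct the test vector wins. The function names `basis_pursuit` and `src_classify`, and the random toy data with made-up emotion labels, are my own illustration, not the paper's code.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to A x = y as a linear program.

    Split x = u - v with u, v >= 0; at the optimum ||x||_1 = sum(u + v).
    """
    n = A.shape[1]
    c = np.ones(2 * n)                  # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])           # equality constraint: A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError("equality constraint infeasible: y not in span of A")
    return res.x[:n] - res.x[n:]


def src_classify(A, labels, y):
    """Pick the class whose coefficients alone reconstruct y with the
    smallest l2 residual; also return all class residuals."""
    x = basis_pursuit(A, y)
    labels = np.asarray(labels)
    residuals = {}
    for k in np.unique(labels):
        x_k = np.where(labels == k, x, 0.0)   # keep only class-k coefficients
        residuals[k] = np.linalg.norm(y - A @ x_k)
    return min(residuals, key=residuals.get), residuals


# Toy usage: 30 random unit-norm "training" columns in 20 dimensions.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30))
A /= np.linalg.norm(A, axis=0)
labels = np.repeat(["happy", "sad", "angry"], 10)
y = A[:, 5]                                   # column 5 is labeled "happy"
pred, res = src_classify(A, labels, y)
print(pred)                                   # happy
```

Because `y` is a single dictionary column, \(\ell_1\) minimization recovers it exactly (any two distinct unit-norm columns have coherence below 1), so the "happy" residual is essentially zero and every other class residual equals the norm of `y`.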
In this case, the author creates each dictionary by segmenting a set of face images, each labeled with an emotional expression, into 9 zones, as shown in Fig. 3(a).
Each one of these segments is then vectorized and normalized.
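The segmentation step can be sketched as below. This assumes the 9 zones form a uniform 3×3 grid over a grayscale image, which is a simplification: the paper's Fig. 3(a) defines the actual zones, and they need not be uniform. The helpers `segment_into_zones` and `build_zone_dictionaries` are hypothetical names; each zone is flattened and scaled to unit \(\ell_2\) norm, and zone \(z\) of every training image becomes one column of dictionary \(z\).

```python
import numpy as np


def segment_into_zones(image, rows=3, cols=3):
    """Split a 2-D image into rows*cols rectangular zones (row-major order),
    flatten each zone into a vector and scale it to unit l2 norm.
    ASSUMPTION: a uniform grid stands in for the paper's zone layout."""
    h, w = image.shape
    r_edges = np.linspace(0, h, rows + 1).astype(int)
    c_edges = np.linspace(0, w, cols + 1).astype(int)
    zones = []
    for i in range(rows):
        for j in range(cols):
            patch = image[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            v = patch.astype(float).ravel()
            norm = np.linalg.norm(v)
            zones.append(v / norm if norm > 0 else v)
    return zones


def build_zone_dictionaries(images, labels, rows=3, cols=3):
    """Return one dictionary per zone: column n of dictionary z is zone z of
    training image n, so all dictionaries share the same label vector."""
    per_image = [segment_into_zones(img, rows, cols) for img in images]
    dictionaries = [np.column_stack([zs[z] for zs in per_image])
                    for z in range(rows * cols)]
    return dictionaries, np.asarray(labels)


# Toy usage with random stand-ins for 60x48 face images.
rng = np.random.default_rng(1)
train = [rng.random((60, 48)) for _ in range(12)]
labels = ["happy", "sad", "surprise"] * 4
dicts, lab = build_zone_dictionaries(train, labels)
print(len(dicts), dicts[0].shape)             # 9 (320, 12)
```

A test face is segmented the same way, and each of its zones is coded against the matching zone dictionary.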
Where the face in the unlabeled image is occluded, a subdictionary is added to the regular dictionary to account for the occlusion, which is then removed from the model in the classification step.
(Details are sparse, however, on how to build this occlusion dictionary, and on how to determine when and where an occlusion occurs.)
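Since the paper is light on details here, the following sketch borrows the approach of Wright et al. (2009) rather than anything confirmed for this paper: append the identity matrix to the training dictionary, so that a sparse occlusion shows up as a few large coefficients on identity columns, and subtract that estimated occlusion before comparing class residuals. The name `src_with_occlusion` is hypothetical. The toy data uses orthonormal Hadamard columns as stand-ins for face zones, chosen only because they keep the coherence of `[A, I]` at 1/8, which guarantees exact \(\ell_1\) recovery of a one-column signal plus a three-pixel occlusion.

```python
import numpy as np
from scipy.linalg import hadamard
from scipy.optimize import linprog


def basis_pursuit(D, y):
    """min ||w||_1 s.t. D w = y, solved as a linear program with w = u - v."""
    n = D.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([D, -D]), b_eq=y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError("y is not in the span of D")
    return res.x[:n] - res.x[n:]


def src_with_occlusion(A, labels, y):
    """Code y over [A, I]. ASSUMPTION (Wright et al. style): the identity
    block absorbs a sparse occlusion e, which is subtracted from y before
    the class-wise residuals are computed."""
    m, n = A.shape
    w = basis_pursuit(np.hstack([A, np.eye(m)]), y)
    x, e = w[:n], w[n:]                       # face coefficients, occlusion
    labels = np.asarray(labels)
    residuals = {k: np.linalg.norm(y - e - A @ np.where(labels == k, x, 0.0))
                 for k in np.unique(labels)}
    return min(residuals, key=residuals.get), residuals, e


# Toy check: 12 unit-norm Hadamard columns in 64 dimensions.
H = hadamard(64) / 8.0
A = H[:, 1:13]
labels = np.repeat(["happy", "sad", "angry"], 4)
y = A[:, 5].copy()                            # column 5 is labeled "sad"
y[[0, 1, 2]] += 5.0                           # simulated occlusion on 3 pixels
pred, res, e = src_with_occlusion(A, labels, y)
print(pred)                                   # sad
```

Note that without the identity block the equality constraint would be infeasible here, since the occluded `y` does not lie in the span of the 12 training columns.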
The author considers three strategies for combining the per-segment coding results into an overall classification: 1) the emotion class with the highest number of votes; 2) a weighted voting approach; and 3) a weighted voting approach that incorporates the coding error.
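The three fusion rules can be sketched as follows, assuming each zone has already been classified by sparse representation classification and reports a pair (predicted class, coding error). The per-zone weights `zone_weights` and the inverse-error form of rule 3 are hypothetical stand-ins of my own; the post does not say how the paper sets the weights or how it folds in the coding error.

```python
from collections import defaultdict


def majority_vote(zone_preds):
    """Rule 1: every zone casts one vote for its predicted class."""
    tally = defaultdict(float)
    for cls, _err in zone_preds:
        tally[cls] += 1.0
    return max(tally, key=tally.get)


def weighted_vote(zone_preds, zone_weights):
    """Rule 2: every zone's vote counts with a fixed per-zone weight."""
    tally = defaultdict(float)
    for (cls, _err), w in zip(zone_preds, zone_weights):
        tally[cls] += w
    return max(tally, key=tally.get)


def error_weighted_vote(zone_preds, zone_weights, eps=1e-12):
    """Rule 3 (hypothetical form): divide each zone's weight by its coding
    error, so poorly coded zones, e.g. occluded ones, count for less."""
    tally = defaultdict(float)
    for (cls, err), w in zip(zone_preds, zone_weights):
        tally[cls] += w / (err + eps)
    return max(tally, key=tally.get)


# Nine zones as (predicted class, coding error); the four "sad" zones have
# large coding errors, as occluded zones might.
zone_preds = [("happy", 0.10), ("happy", 0.12), ("happy", 0.15),
              ("sad", 0.90), ("sad", 0.85), ("sad", 0.95), ("sad", 0.80),
              ("neutral", 0.30), ("neutral", 0.40)]
# Hypothetical weights: the first three zones are trusted slightly more.
zone_weights = [1.2, 1.2, 1.2, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

print(majority_vote(zone_preds))                       # sad
print(weighted_vote(zone_preds, zone_weights))         # sad
print(error_weighted_vote(zone_preds, zone_weights))   # happy
```

In this toy example the four high-error zones outvote the three well-coded ones under plain and weighted voting, while dividing by the coding error lets the well-coded zones win, which is the kind of behavior one would want when part of the face is occluded.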
The author compares these three approaches against two approaches that use the entire image (straight sparse classification, and Gabor filters with nearest neighbors) and finds that with no occlusion, they all perform about the same with 95% accuracy.
As the occluded area grows, however, the weighted voting approach that incorporates coding error outperforms the others, with an accuracy of a little less than 70% even when over half the image is occluded (accuracy is 77% with the lower half occluded but 64% with the upper half occluded, so it's in the eyes).
The accuracy of the other approaches is between 24% and 50%.
The startling thing overall is that even though none of the codebooks is generalized in any way (for instance, by a Lloyd algorithm), the classification performs spectacularly well.

