Paper of the Day (Po’D): Sound Edition

Hello, and welcome to the Paper of the Day (Po’D): Sound Edition. Today’s paper comes from ICASSP 2009. Since ICASSP 2010 is next week, I figure I had better get started on the papers I printed from last year:

H. D. Tran and H. Li, Sound event classification based on feature integration, recursive feature elimination and structured classification, in Proc. ICASSP, pp. 177-180, Taipei, Taiwan, Apr. 2009.

The automatic classification of sound is useful not just for surveillance, but also for automatically describing any large amount of sound data, such as the collection of sound data that exists on the Internet. In this paper, the authors explore how well some feature selection algorithms improve the classification of several kinds of environmental audio signals, such as cry, scream, door slam, clapping, speech, etc.

An oft-used approach to sound classification is to build feature vectors containing large numbers of different features, encompassing all sorts of signal characteristics in the time and frequency domains, as well as higher-level descriptors such as tempo, pitch, voiced/unvoiced, etc. These ever-expanding feature vectors come at a price, however: the more dimensions there are, the more training samples are needed to build a classifier that is general enough to be useful. This is colorfully known as “the curse of dimensionality.” Instead of throwing in every descriptor one can think of, a much better approach is to find those descriptors among the many that give a large separation between groups and a small separation within groups. The price paid, however, is the work of finding those relevant subsets of features.

The authors show in this paper that a feature selection algorithm designed for analyzing genetic material can reduce their 226-dimensional feature vectors to ones containing only dozens of features, while increasing the performance of an SVM classifier. They further augment this technique with a multi-level classifier that becomes more specific with each level. It is unclear from this paper, however, how the authors labeled their dataset, and what criteria they used for doing so.
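
To make the idea of "large separation between groups, small separation within groups" concrete, here is a minimal sketch of a Fisher-style score computed per feature. To be clear, this is not the recursive feature elimination the authors use; it is a much simpler filter criterion that I am swapping in for illustration. The function names (`fisher_scores`, `top_features`) and the toy data are my own inventions and do not come from the paper.

```python
from statistics import mean, pvariance


def fisher_scores(samples, labels):
    """Return, for each feature, between-class scatter / within-class scatter."""
    classes = sorted(set(labels))
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        overall_mean = mean(column)
        between = 0.0  # how far the class means sit from the overall mean
        within = 0.0   # how spread out each class is around its own mean
        for c in classes:
            values = [row[j] for row, y in zip(samples, labels) if y == c]
            between += len(values) * (mean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        # A feature with zero within-class spread separates perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def top_features(samples, labels, k):
    """Indices of the k features with the highest Fisher-style score."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: three features, two classes, three examples each.
# Feature 0 separates the classes well; features 1 and 2 do not.
samples = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.1],
    [0.0, 3.0, 0.9],
    [0.9, 4.0, 1.0],
    [1.0, 2.0, 1.2],
    [1.1, 6.0, 0.8],
]
labels = ["speech", "speech", "speech", "scream", "scream", "scream"]

print(top_features(samples, labels, 2))
```

A filter like this scores each feature on its own, so it can miss features that are only useful in combination. As I understand it, that is part of the appeal of a wrapper method like recursive feature elimination, which repeatedly retrains the classifier and discards the features it relies on least.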
