So many papers, so little time.
“Music Structure Analysis by Subspace Modeling” by Y. Panagakis and C. Kotropoulos
This paper applies subspace clustering of beat-aligned auditory temporal modulation features to extract the structure of musical signals. This is an interesting unsupervised method for discovering structure. Subspace clustering is reviewed in R. Vidal, “Subspace clustering,” IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 52-68, 2011. Related ideas appear earlier in B. V. Gowreesunker and A. H. Tewfik, “A novel subspace clustering method for dictionary design,” in Proc. ICA, 2009, vol. 5441, pp. 34-41; B. V. Gowreesunker and A. H. Tewfik, “A shift tolerant dictionary training method,” presented at Signal Processing with Adaptive Sparse Structured Representations (SPARS), Saint Malo, France, 2009, INRIA Rennes–Bretagne Atlantique; and B. V. Gowreesunker and A. H. Tewfik, “Learning sparse representation using iterative subspace identification,” IEEE Transactions on Signal Processing, vol. 58, no. 6, June 2010.
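I have not implemented the paper's method, but the core idea of subspace clustering — points living near a union of low-dimensional subspaces rather than around centroids — can be illustrated with a toy K-subspaces iteration (my own simplification for illustration, not the authors' algorithm):

```python
import numpy as np

def k_subspaces(X, n_clusters, dim, n_iter=30, seed=0):
    """Toy K-subspaces clustering: alternate between fitting a dim-dimensional
    subspace to each cluster (via SVD) and reassigning each point to the
    subspace with the smallest residual. X is (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    labels = rng.integers(n_clusters, size=n)
    for _ in range(n_iter):
        bases = []
        for k in range(n_clusters):
            Xk = X[labels == k]
            if len(Xk) < dim:  # re-seed an empty cluster from random points
                Xk = X[rng.choice(n, size=dim, replace=False)]
            _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
            bases.append(Vt[:dim])
        # distance of every point to every subspace = norm of projection residual
        dists = np.stack([np.linalg.norm(X - (X @ B.T) @ B, axis=1) for B in bases])
        labels = dists.argmin(axis=0)
    return labels

# Demo: points drawn from two lines (1-D subspaces) in 3-D
rng = np.random.default_rng(1)
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
X = np.vstack([np.outer(rng.normal(size=100), u),
               np.outer(rng.normal(size=100), v)])
labels = k_subspaces(X, n_clusters=2, dim=1)
```

With noiseless points drawn from two lines in 3-D, the alternation typically recovers the two groups exactly; real features, of course, are noisier and need the heavier machinery reviewed by Vidal.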
“A Framework for Fingerprint-Based Detection of Repeating Objects in Multimedia Streams” by S. Fenet, M. Moussallam, Y. Grenier, G. Richard and L. Daudet
Take the Shazam fingerprint method, but instead of finding peaks in a spectrogram, generate the anchors by a matching pursuit that emphasizes atom diversity.
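For reference, the Shazam baseline (A. Wang's landmark method) picks local maxima of a spectrogram and hashes pairs of them. Here is a rough sketch of that baseline in NumPy — window sizes and fan-out are my own arbitrary choices — which is the stage the paper replaces with matching-pursuit anchors:

```python
import numpy as np

def spectrogram_peaks(x, n_fft=512, hop=256, neighborhood=5):
    """Toy Shazam-style landmark extraction: local maxima of an STFT
    magnitude spectrogram become anchor points (time frame, frequency bin)."""
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    S = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    peaks = []
    T, F = S.shape
    for t in range(T):
        for f in range(F):
            t0, t1 = max(0, t - neighborhood), t + neighborhood + 1
            f0, f1 = max(0, f - neighborhood), f + neighborhood + 1
            if S[t, f] > 0 and S[t, f] == S[t0:t1, f0:f1].max():
                peaks.append((t, f))
    return peaks

def landmark_hashes(peaks, fan_out=3):
    """Pair each anchor with a few nearby later peaks; each hash encodes
    (f1, f2, time difference), stored with the anchor time t1."""
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for (t2, f2) in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes

# Demo on a synthetic two-tone signal
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
hashes = landmark_hashes(spectrogram_peaks(x))
```

The appeal of the paper's variant is clear from this sketch: peak picking is brittle under distortion, while a pursuit that enforces atom diversity chooses its anchors by approximation quality.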
“Evolutionary Feature Generation for Content-based Audio Classification and Retrieval” by
T. Mäkinen, S. Kiranyaz, J. Pulkkinen and M. Gabbouj
An approach to optimizing features using particle swarms? I like it already.
“Robust Retina-based Person Authentication Using the Sparse Classifier” by A. Condurache, J. Kotzerke and A. Mertins
Sparse representation classification goes CSI. The paper does not mention the computational approach used to find the sparse representations.
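For context, sparse representation classification (SRC) codes a test vector over a dictionary of labeled training atoms, then picks the class whose atoms give the smallest reconstruction residual. Since the paper leaves the solver unspecified, here is a sketch using plain ISTA — one arbitrary choice among many, with a made-up toy dataset:

```python
import numpy as np

def ista(D, y, lam=0.05, n_iter=500):
    """Plain ISTA for min_a 0.5*||y - D a||^2 + lam*||a||_1; just one of
    many possible sparse coders (the paper does not say which it uses)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a + D.T @ (y - D @ a) / L      # gradient step
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0)  # soft threshold
    return a

def src_classify(D, labels, y):
    """SRC decision: code y over the whole dictionary, then pick the class
    whose own atoms best reconstruct y."""
    a = ista(D, y)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ a[labels == c])
                 for c in classes]
    return classes[np.argmin(residuals)]

# Toy demo: two classes of atoms clustered around two random directions
rng = np.random.default_rng(0)
d = 20
base0, base1 = rng.normal(size=d), rng.normal(size=d)
D = np.column_stack([base0[:, None] + 0.1 * rng.normal(size=(d, 10)),
                     base1[:, None] + 0.1 * rng.normal(size=(d, 10))])
D /= np.linalg.norm(D, axis=0)
labels = np.array([0] * 10 + [1] * 10)
y = base0 / np.linalg.norm(base0)
```

The choice of solver matters in practice (speed, and how sparse the codes really are), which is why I would have liked the paper to name it.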
“Gammatone Wavelet Features for Sound Classification in Surveillance Applications” by
X. Valero and F. Alías
This paper proposes new features for discriminating between different sounds, such as dogs barking, people talking or screaming, guns shooting, feet stepping, and thunder clapping. The features are perceptually motivated, but why handicap a classification or estimation system with human limitations?
“Daily Sound Recognition Using a Combination of GMM and SVM for Home Automation”
by M. A. Sehili, D. Istrate, B. Dorizzi and J. Boudy
Somehow GMMs and SVMs are combined for sequence discrimination. I need to read this more closely, since it looks really interesting.
“Enhancing Timbre Model Using MFCC and Its Time Derivatives for Music Similarity Estimation” by F. de Leon and K. Martinez
In past work, MFCC features have often been concatenated with delta MFCCs and delta-delta MFCCs. This paper looks at the effect on classification of treating these feature sets separately, using bags of frames of features (BFFs). Genre classification of musical signals shows differences between the approaches. But shouldn't a proper scaling of the dimensions produce the same results?
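To make the scaling question concrete, here is a minimal sketch (not the paper's code) that computes HTK-style deltas by linear regression over a few frames and standardizes each block before concatenation; the MFCC matrix here is a random stand-in:

```python
import numpy as np

def deltas(F, width=2):
    """HTK-style regression deltas along the time axis. F: (n_frames, n_coeffs).
    d_t = sum_w w*(c_{t+w} - c_{t-w}) / (2*sum_w w^2), with edge padding."""
    pad = np.pad(F, ((width, width), (0, 0)), mode="edge")
    num = sum(w * (pad[width + w:len(F) + width + w] -
                   pad[width - w:len(F) + width - w])
              for w in range(1, width + 1))
    return num / (2 * sum(w * w for w in range(1, width + 1)))

def zscore(F):
    """Standardize each feature dimension; after this, the MFCC, delta, and
    delta-delta blocks share a common scale."""
    return (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)

# Stand-in for real MFCCs: a random (n_frames, 13) matrix
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))
d1, d2 = deltas(mfcc), deltas(deltas(mfcc))
features = np.hstack([zscore(mfcc), zscore(d1), zscore(d2)])  # (100, 39)
```

After standardization like this, each block contributes on a common scale, which is exactly the crux of my question above.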
“Classification of Audio Scenes Using Narrow-Band Autocorrelation Features” by
X. Valero and F. Alías
This paper proposes treating separately the bands of a multiband decomposition of music signals. The four low-level features extracted come from the autocorrelation of each separate band. These low-level features are tested in discriminating between acoustic environments (such as classroom and library).
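Here is a rough sketch of the general recipe as I understand it — the brick-wall filterbank, the band edges, and the three summary statistics are my stand-ins, not the paper's four features:

```python
import numpy as np

def band_split(x, sr, edges):
    """Crude brick-wall filterbank via FFT masking (a stand-in for the
    paper's multiband decomposition)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(X * mask, n=len(x)))
    return bands

def autocorr_features(b, min_lag=20, max_lag=400):
    """A few scalar summaries of one band's normalized autocorrelation:
    peak value, peak lag, mean magnitude (illustrative, not the paper's)."""
    r = np.correlate(b, b, mode="full")[len(b) - 1:]
    r = r / (r[0] + 1e-12)
    peak = min_lag + np.argmax(r[min_lag:max_lag])
    return np.array([r[peak], float(peak), np.abs(r[1:max_lag]).mean()])

# Demo: a 200 Hz tone in noise; the low band should show strong periodicity
sr, n = 8000, 2000
t = np.arange(n) / sr
x = np.sin(2 * np.pi * 200 * t) + 0.3 * np.random.default_rng(0).normal(size=n)
feats = np.vstack([autocorr_features(b)
                   for b in band_split(x, sr, [0, 500, 2000, 4000])])
```

Treating each band separately means a periodic source confined to one band (here, the 200 Hz tone, whose autocorrelation peaks at a lag of 40 samples) is not masked by broadband noise elsewhere.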
“Large Scale Polyphonic Music Transcription Using Randomized Matrix Decompositions” by I. Ari, U. Simsekli, A. T. Cemgil and L. Akarun
This looks like very fine work employing randomized factorizations that can handle large datasets. The paper points to P. Smaragdis, “Polyphonic Pitch Tracking by Example,” in Proc. IEEE WASPAA, pp. 125-128, 2011.
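The randomized factorization idea (in the spirit of Halko, Martinsson, and Tropp) is simple enough to sketch: compress the matrix onto a random low-dimensional range, then do an exact SVD on the small sketch. A minimal version — my illustration, not the authors' implementation:

```python
import numpy as np

def randomized_svd(A, rank, n_oversample=10, n_iter=2, seed=0):
    """Minimal randomized SVD: sketch the range of A with a Gaussian test
    matrix, optionally sharpen it with power iterations, then solve a
    small exact SVD in the sketched range."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Q = A @ rng.normal(size=(n, rank + n_oversample))   # range sketch
    for _ in range(n_iter):                             # power iterations
        Q, _ = np.linalg.qr(A @ (A.T @ Q))
    Q, _ = np.linalg.qr(Q)
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

# Demo: recover an exactly rank-5 matrix almost perfectly
rng = np.random.default_rng(1)
A = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 200))
U, s, Vt = randomized_svd(A, rank=5)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

The point for large-scale transcription is that the expensive operations are just matrix products with a thin random matrix, which scale to data that a full factorization cannot touch.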
“Searching for Dominant High-Level Features for Music Information Retrieval” by
M. Zanoni, D. Ciminieri, A. Sarti and S. Tubaro
This paper attacks the problem of making features more discriminable by clustering,
and tests the new methods in the context of genre recognition.
Groups of music excerpts associated with features that are near cluster centroids are evaluated by humans in semantic terms, which shows some interesting high-level properties, e.g., music that is “Groovy” or “Classic.”
“AM-FM Modulation Features for Music Instrument Signal Analysis and Recognition” by
A. Zlatintsi and P. Maragos
This paper describes applying perceptually motivated features to identifying musical instruments. Tests on monophonic instrument recordings (is the IOWA dataset composed of isolated notes, or musical phrases?) give good results.
“Analysis of Speaker Similarity in the Statistical Speech Synthesis Systems Using a Hybrid Approach” by E. Guner, A. Mohammadi and C. Demiroglu
This work will be useful for the iPad game I will create someday.
“A Geometrical Stopping Criterion for the LAR Algorithm” by C. Valdman, M. Campos and J. Apolinário Jr.
The paper proposes a geometrical criterion for deciding when to stop the least angle regression (LAR) algorithm, applied to identifying Volterra filters. I have heard LARS is related to subspace pursuit, so this paper could provide some interesting ideas.
“Signal Compression Using the Discrete Linear Chirp Transform (DLCT)” by O. Alkishriwo and L. Chaparro
This paper proposes using a chirp transform for audio compression. Essentially, the algorithm estimates chirp parameters and an amplitude for each frame.
The authors apply this algorithm to speech and bird sounds, and compare its performance to that of compressed sensing of the audio. This makes no sense to me. That is like racing a red Ferrari against a red tomato, which makes no sense to me either. The paper says, “[bird song] is sparser in the time domain than in the frequency domain.” What??
The figures are useless, but the description claims the chirp transform method is better.
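Setting the evaluation aside, the per-frame chirp estimation itself is easy to prototype: dechirp the frame with each candidate chirp rate, take an FFT, and keep the candidate that concentrates the most energy in one bin. A toy grid search — my sketch, not the authors' DLCT:

```python
import numpy as np

def fit_chirp(frame, sr, rates):
    """Toy per-frame linear-chirp fit: for each candidate chirp rate c
    (Hz/s), demodulate by exp(-j*pi*c*t^2) and take the strongest FFT bin.
    Returns (chirp rate, start frequency in Hz, complex amplitude)."""
    n = len(frame)
    t = np.arange(n) / sr
    best = None
    for c in rates:
        dechirped = frame * np.exp(-1j * np.pi * c * t ** 2)
        spec = np.fft.fft(dechirped) / n
        k = np.argmax(np.abs(spec))
        if best is None or np.abs(spec[k]) > np.abs(best[2]):
            f0 = np.fft.fftfreq(n, 1 / sr)[k]
            best = (c, f0, spec[k])
    return best

# Synthetic frame: one linear chirp starting at 500 Hz, sweeping at 2000 Hz/s
sr, n = 8000, 1024
t = np.arange(n) / sr
x = np.exp(1j * 2 * np.pi * (500 * t + 0.5 * 2000 * t ** 2))
c, f0, a = fit_chirp(x, sr, rates=np.arange(0, 4001, 500))
```

When the dechirping rate matches the signal's, the residual is a pure tone and the FFT peak is maximal, which is the intuition behind fitting one chirp plus amplitude per frame.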