Paper of the Day (Po’D): Even More Sparse Separation Edition

Hello and welcome to Paper of the Day (Po’D): Even More Sparse Separation Edition.
Today’s paper comes from ICASSP 2008:
B. V. Gowreesunker and A. H. Tewfik, “Blind source separation using monochannel overcomplete dictionaries,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., (Las Vegas, NV), pp. 33-36, Mar. 2008.

The authors show how sources can be separated from mixtures by performing greedy sparse approximation, with a single-channel dictionary of time-localized atoms, on linear combinations of the mixtures. Consider the model \(\MY = \MX \MA\), where the columns of \(\MX\) are the sources and \(\MA\) is the mixing matrix. Assume further that each source can be described in a sparse way over a dictionary \(\MD\), i.e., \(\MX = [ \MD\vc_1 | \MD\vc_2 | \MD\vc_3 | \ldots ]\), where most of the weights in each \(\vc_i\) are zero. The model of the mixtures (considering three sources and two mixtures) then becomes $$\MY = [ \MD\vc_1 | \MD\vc_2 | \MD\vc_3 ] \MA = [ \MD\vc_1a_{1,1} + \MD\vc_2 a_{2,1} + \MD\vc_3 a_{3,1} | \MD\vc_1a_{1,2} + \MD\vc_2 a_{2,2} + \MD\vc_3 a_{3,2} ].$$
Now, if we then create a linear combination of the mixtures, $$\vm :=\MY \left [ \begin{array}{c}
1 \\
\lambda
\end{array}\right ] = \MD \left [ \vc_1a_{1,1} + \vc_2 a_{2,1} + \vc_3 a_{3,1}+ \lambda (\vc_1a_{1,2} + \vc_2 a_{2,2} + \vc_3 a_{3,2}) \right ]$$ we see that a proper choice of \(\lambda\) can cancel one of the sources; e.g., \(\lambda = -a_{2,1}/a_{2,2}\) removes the second source. A sparse approximation of this mixture of mixtures can then be used to recover the first and third sources up to a scale factor. Without knowledge of the mixing matrix, however, we can only guess at \(\lambda\). The authors thus propose to decompose several such mixtures of the mixtures, and to look across the decompositions for frequently occurring atoms; but here my first confusion arises, because they appear to use knowledge of the mixing matrix, which isn’t really “blind” source separation.
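To make the cancellation concrete, here is a minimal numerical sketch (the dictionary, sizes, and sparsity levels are my own toy choices, not the paper’s):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a dictionary D of unit-norm atoms, three K-sparse
# sources, and two mixtures.
n_samples, n_atoms, K = 256, 512, 5
D = rng.standard_normal((n_samples, n_atoms))
D /= np.linalg.norm(D, axis=0)

C = np.zeros((n_atoms, 3))                       # sparse synthesis weights
for i in range(3):
    C[rng.choice(n_atoms, K, replace=False), i] = rng.standard_normal(K)
X = D @ C                                        # sources, one per column
A = rng.standard_normal((3, 2))                  # three sources, two mixtures
Y = X @ A

# Choosing lambda = -a_{2,1}/a_{2,2} cancels the second source:
lam = -A[1, 0] / A[1, 1]
m = Y @ np.array([1.0, lam])

# m is synthesized by the weights of the first and third sources alone:
c_m = C[:, 0] * (A[0, 0] + lam * A[0, 1]) + C[:, 2] * (A[2, 0] + lam * A[2, 1])
print(np.allclose(m, D @ c_m))                   # True
```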

Their algorithm works as follows (in the two-channel case; a rough sketch in code follows the list):

  1. Choose \(\lambda\) and create a linear combination of mixtures.
  2. Perform a sparse approximation with some dictionary on this mixture of mixtures (to what order?).
  3. For the set of atoms with non-zero weights, find the weights of the orthogonal projection of each of the two mixtures onto their span (thus wisely ignoring the optimism inherent in the weights of a greedy pursuit).
  4. Assign each atom to one of the two sources according to the magnitude of the inner product between its pair of weights and each row of the mixing matrix.
  5. Reconstruct sources.
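A minimal sketch of those five steps, assuming OMP as the greedy pursuit and picking a stopping order myself, since the paper specifies neither:

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Plain orthogonal matching pursuit; returns the selected atom indices."""
    residual, support = y.copy(), []
    for _ in range(n_nonzero):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        sub = D[:, support]
        w, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ w
    return np.array(support)

def separate(Y, D, A, lam, n_nonzero=10):
    """Sketch of the two-channel algorithm; n_nonzero is my own guess."""
    m = Y @ np.array([1.0, lam])                 # 1. mixture of mixtures
    support = omp(D, m, n_nonzero)               # 2. greedy sparse approximation
    sub = D[:, support]
    W, *_ = np.linalg.lstsq(sub, Y, rcond=None)  # 3. project both mixtures
    labels = np.argmax(np.abs(W @ A.T), axis=1)  # 4. match weight pairs to rows of A
    sources = []
    for s in range(A.shape[0]):                  # 5. reconstruct each source
        sel = labels == s
        # recover atom weights (up to scale) by projecting onto row s of A
        c = (W[sel] @ A[s]) / (A[s] @ A[s])
        sources.append(D[:, support[sel]] @ c)
    return sources
```

Note that step 5 recovers each source only up to the scale of the corresponding mixing-matrix row, consistent with the scale ambiguity mentioned above.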

The authors test their algorithm on a mixture of three artificially sparse source signals, each created as a linear combination of dictionary atoms (I wonder what kind of dictionary was used, what its coherence is, and how many atoms make up each source). The authors know the mixing matrix, and choose three values of \(\lambda\) based on it, each cancelling one of the sources from a mixture of mixtures. They compare the number of “correct” (and “incorrect”) atoms in the sparse decompositions with those in the synthesis of each source, and find that their algorithm appears to recover a higher percentage of correct atoms than does stereo MP (reviewed Monday) as a function of approximation error (which I assume is the \(\ell_2\)-norm of the residual). (It is a bit unfair to make such a comparison with MP, since its error will always be greater than that after an orthogonal projection, unless the sources are trivially sparse.)
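I read the “correct atoms” count as a comparison of recovered and true supports; a guess at the figure of merit:

```python
# My reading of the figure of merit: the fraction of the atoms used to
# synthesize a source that also appear in its sparse decomposition.
def correct_atom_fraction(recovered_support, true_support):
    recovered, true = set(recovered_support), set(true_support)
    return len(recovered & true) / len(true)
```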

The authors also test their algorithm on the mixtures of three speech signals, again with knowledge of the mixing matrix, and find that it produces results comparable or superior to stereo MP with respect to the signal-to-interference and signal-to-artifact ratios. Here my second confusion arises. The authors mention that the dictionary they use in this case (for their algorithm, but maybe not for stereo MP) is generated by K-SVD (M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, pp. 4311-4322, Nov. 2006). I remember from the talk by Vikrham at ICASSP that he said the dictionary was trained on the three sources. Here, though, I think we completely leave the “blind source separation” problem when we use a dictionary trained on the exact sources in the mixtures. I would argue, though, that the results in this paper show how well we can expect source separation by sparse approximation to perform when we know the mixing matrix and have a very good dictionary.
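For reference, a bare-bones rendering of the K-SVD alternation (my own minimal sketch of Aharon et al.’s algorithm, not the authors’ training setup; scikit-learn’s orthogonal_mp stands in for the sparse coding stage):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, n_nonzero, n_iter=20, seed=0):
    """Alternate sparse coding (OMP) with per-atom rank-1 SVD updates.
    Y holds one training signal per column; sizes are my own choices."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        X = orthogonal_mp(D, Y, n_nonzero_coefs=n_nonzero)   # sparse coding
        for k in range(n_atoms):
            used = np.flatnonzero(X[k])
            if used.size == 0:
                continue
            # residual of the signals using atom k, with atom k removed
            E = Y[:, used] - D @ X[:, used] + np.outer(D[:, k], X[k, used])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                                # updated atom
            X[k, used] = s[0] * Vt[0]                        # updated weights
    return D
```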
