Hello, and welcome to the Paper of the Day (Po’D): Dual Matching Pursuit Edition. Today’s interesting paper comes from ICASSP 2004: P. Sugden and N. Canagarajah, “Underdetermined noisy blind separation using dual matching pursuits,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., (Montreal, Quebec, Canada), pp. 557-560, May 2004.

The authors propose an alternative approach to stereo matching pursuit (reviewed yesterday) for separating sources at the atomic level. Instead of performing MP on both mixtures using a dictionary of two-channel time-frequency atoms, the authors here perform MP with an overcomplete (single channel) time-frequency dictionary \(\mathcal{D} = \{ \vg \in \MR^K : ||\vg||_2 = 1\}\) on each mixture separately. So here, the \(i\)th channel of the mixture \(\MY\) can be modeled as $$\lim_{L_i \to |\mathcal{D}|} \left\| \vy_i - \sum_{l=1}^{L_i} \beta_{l}^{(i)} \vg_l^{(i)} \right\| = 0$$ and thus $$ \MY \approx \left [ \sum_{l=1}^{L_1} \beta_{l}^{(1)} \vg_{l}^{(1)} \Biggl | \sum_{l=1}^{L_2} \beta_{l}^{(2)} \vg_{l}^{(2)} \Biggl | \cdots \right ]$$ bordering on notation overload.
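To make the "MP on each mixture separately" idea concrete, here is a minimal sketch in Python. It is plain textbook matching pursuit applied channel by channel, not the authors' implementation. The random dictionary, the function names `matching_pursuit` and `dual_matching_pursuit`, and the toy mixing angle of 0.3 radians are all my own assumptions for illustration.

```python
import numpy as np

def matching_pursuit(y, D, n_atoms):
    """Plain greedy MP. D holds unit-norm atoms as columns (K x N).

    Returns a list of (atom index, weight) pairs and the final residual.
    """
    residual = y.astype(float).copy()
    picks = []
    for _ in range(n_atoms):
        corr = D.T @ residual              # inner products with every atom
        k = int(np.argmax(np.abs(corr)))   # greedy choice: largest magnitude
        beta = corr[k]
        residual -= beta * D[:, k]         # remove the atom's contribution
        picks.append((k, beta))
    return picks, residual

def dual_matching_pursuit(Y, D, n_atoms):
    """Decompose each column (channel) of Y independently over the same D."""
    return [matching_pursuit(Y[:, i], D, n_atoms) for i in range(Y.shape[1])]

# Toy example (assumed): a random overcomplete dictionary stands in for a
# time-frequency dictionary, and one source made of a single atom is panned
# into two channels with gains cos(0.3) and sin(0.3).
rng = np.random.default_rng(0)
K = 64
D = rng.standard_normal((K, 256))
D /= np.linalg.norm(D, axis=0)             # enforce ||g||_2 = 1
s = 2.0 * D[:, 10]
Y = np.column_stack([np.cos(0.3) * s, np.sin(0.3) * s])

models = dual_matching_pursuit(Y, D, n_atoms=5)
(picks_1, res_1), (picks_2, res_2) = models
```

In this contrived case both channels select the same atom first, which is exactly the kind of repetition across channels that the method relies on.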

Since “uncorrelated sources have a low probability of sharing elements localized in both time and frequency,” the authors assume that atoms repeated across channels belong to the same source, i.e., “two atoms one source,” instead of Gribonval’s “one (stereo) atom one source” (made justifiable by the atom selection across the mixtures). Thus, if an atom is repeated across the mixtures (which I assume means the atoms will have the same support and modulation frequency, but not necessarily the same shift), then an estimate of the “panpot parameter” for the source associated with the atom, which pertains to a \(\theta\) in the mixing matrix \(\mathbf{\Theta}\), is given in the stereo case by the arctangent of the ratio of the atom's weights in the two channels. As done by Gribonval, finding the distribution of these individual estimates can reveal the number of sources and their (static) locations in the stereo field, after which the separated sources can be generated up to a scale factor. The authors also show how using complete dictionaries learned for the different classes of source signals expected can greatly enhance the separability of the sources from their mixtures within the underdetermined and sparse context.
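The panpot estimate can be sketched as follows. This is my reading of the idea, not the authors' code: I assume a mixing model in which a source at angle \(\theta\) appears with gain \(\cos\theta\) in channel 1 and \(\sin\theta\) in channel 2, and I represent each channel's MP model as a hypothetical dictionary mapping an atom key (here a made-up `(scale, frequency)` tuple) to its weight. The energy weighting of the histogram is also my assumption.

```python
import numpy as np

def panpot_estimates(model_1, model_2):
    """Estimate theta for every atom key present in both channel models.

    model_1, model_2: dicts mapping an atom key to its MP weight (assumed
    representation). Returns arrays of angles and of per-atom energies.
    """
    shared = sorted(model_1.keys() & model_2.keys())
    thetas, energies = [], []
    for key in shared:
        b1, b2 = model_1[key], model_2[key]
        # arctangent of the ratio of the two channel weights
        theta = np.arctan(b2 / b1) if b1 != 0 else np.pi / 2
        thetas.append(theta)
        energies.append(b1**2 + b2**2)
    return np.array(thetas), np.array(energies)

# Toy data (assumed): two sources at 0.3 and 1.1 radians, two atoms each.
true_thetas = {"A": 0.3, "B": 1.1}
atoms = {(64, 10): ("A", 2.0), (64, 25): ("A", -1.5),
         (256, 40): ("B", 1.0), (256, 7): ("B", 0.8)}
model_1 = {k: a * np.cos(true_thetas[s]) for k, (s, a) in atoms.items()}
model_2 = {k: a * np.sin(true_thetas[s]) for k, (s, a) in atoms.items()}
model_1[(1024, 3)] = 0.2   # appears in channel 1 only, so it is ignored

thetas, energies = panpot_estimates(model_1, model_2)

# Energy-weighted histogram of the angle estimates; its peaks suggest the
# number of sources and their positions in the stereo field.
hist, edges = np.histogram(thetas, bins=16,
                           range=(-np.pi / 2, np.pi / 2), weights=energies)
```

With clean weights like these the estimates cluster exactly at the two source angles; with real MP decompositions the clusters would be smeared by whatever errors the greedy weights carry.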

While it is true that uncorrelated sources can be considered non-interacting (and thus well-separable) in the time-frequency domain, this begins to break down with multiple musical instruments playing together, and certainly so for short-time wideband phenomena like transients. Beyond that, I predict that this atomic approach (using MP and a time-frequency dictionary, as in this paper and Gribonval’s) will fail miserably even for a mixture of constant amplitude sinusoids that are completely non-overlapping in the time-frequency domain. It is dangerous to assume that an atom (and its attendant parameters) of a time-frequency model of a signal built by MP is a real aspect of the signal and not an artifact of the decomposition process. Atoms will be given amplitudes that are overestimated, and this will degrade the estimation of the mixing matrix in the method proposed in this paper (and Gribonval’s). Even though here an atom in the model of one channel is not considered a real aspect of a source unless there is a similar atom in the other channels, this does not take into consideration the problem of greed in MP. But who cares about separating stationary signals composed of sinusoids? Perhaps the breakdown of greedy iterative descent approximation methods looks bad on paper and in contrived examples (except for the well-known pre-echo artifacts), but in the end, above the level of the atom, the artifacts may average themselves into irrelevance.