# Paper of the Day (Po’D): The Other Other Probabilistic Matching Pursuits Edition

Hello, and welcome to Paper of the Day (Po’D): The Other Other Probabilistic Matching Pursuits Edition. Today’s paper is: M. Elad and I. Yavneh, “A plurality of sparse representations is better than the sparsest one alone,” IEEE Trans. Info. Theory, vol. 55, pp. 4701-4714, Oct. 2009. I have broached this subject before, namely in the following:

1. A. Divekar and O. Ersoy, “Probabilistic matching pursuit for compressive sensing,” tech. rep., Purdue University, West Lafayette, IN, USA, May 2010. (Po’D here.)
2. P. J. Durka, D. Ircha, and K. J. Blinowska, “Stochastic time-frequency dictionaries for matching pursuit,” IEEE Trans. Signal Process., vol. 49, pp. 507-510, Mar. 2001. (Related Po’D here.)
3. S. E. Ferrando, E. J. Doolittle, A. J. Bernal, and L. J. Bernal, “Probabilistic matching pursuit with gabor dictionaries,” Signal Process., vol. 80, pp. 2099-2120, Oct. 2000. (Po’D here.)

In this paper, the authors look at the capacity for a greedy method to denoise a signal by it capacity to find many representations. Given a signal corrupted by noise $$\vy = \vx + \vn$$, we want to recover $$\vx$$ from our measurement $$\vy$$. Considering an overcomplete dictionary $$\MD$$ of $$N$$ atoms, i.e., the dictionary spans the space in which $$\vx$$ exists, we can find many solutions
$$\mathcal{S}(\vy; \MD, \epsilon) = \{\vs_i : || \vy – \MD\vs_i ||_2^2 \le \epsilon\}.$$
If we know the expected variance of the noise signal, then we can set $$\epsilon$$ appropriately.
Based on the greedy approach we take, e.g., matching pursuit (MP), orthogonal matching pursuit (OMP), orthogonal least squares (OLS), we will obtain one member of $$\mathcal{S}(\vy; \MD, \epsilon)$$.
However, that is not to say such a solution is “the best.”
MP, OMP, OLS have all been shown to fail in many circumstances, related in complex ways to the dictionary used and the signal decomposed.
We might also be convinced by the “song of the sparsity siren” that the sparsest member of $$\mathcal{S}(\vy; \MD, \epsilon)$$ will be “the best” in terms of minimizing, e.g., $$|| \vx – \MD\vs_i||_2$$.
However, we should not forget the well-known adage:

“There’s value in them thar non-unique solutions to underdeterMINED problems.”

With that, and their presentation of “Randomized OMP (RandOMP)”, we see that Elad et al. do nearly the same thing as done in PrOMP by Divekar et al. (See that Po’D for the details.) The only differences are: 1) their probability distributions for the atoms (RandOMP uses a Gaussian-like distribution, and PrOMP uses uniform distributions); and 2) stopping criterion (RandOMP stops when the residual is less than the noise variance, and PrOMP stops only when the error is zero).
Since ROMP was presented by Elad et al. before April 2008, and PrOMP was presented by Divekar et al. around April 2010 with no reference to this article by Elad et al., I consider the latter as a rediscovery of the former, except with a slight adjustment in terminology, e.g., “compressive sensing” — but both works are really about sparse representation.
With a plethora of solutions $$\mathcal{S}(\vy; \MD, \epsilon)$$,
Elad et al. find the average solution
$$\bar \vs := \frac{1}{|\mathcal{S}(\vy; \MD, \epsilon)|}\sum_{\vs_i \in \mathcal{S}(\vy; \MD, \epsilon)} \vs_i$$
and thereby estimate $$\hat \vx = \MD\bar \vs$$.
(In this respect, this approach is much like that by Durka et al., but there the authors attempt to deal with the annoying behaviors of MP to inject “decomposition noise” into the representations.)

Above is an image of some of the results by Elad et al. In subfigure (a) we see that for their 10-sparse length-100 signal (composed from a 200 atom dictionary of iid normal random variables — each atom normalized to unity), the distribution of 1000 different representations. (The noise signal is also iid normal, thus $$\epsilon := 1$$.)
OMP returns a 2-sparse solution, while RandOMP returns a plethora of other models of varying orders.
In (b) and (c) we see that had we paid attention to the song of the sparsity sirens, we would have paid the price in decreased noise attenuation. The OMP model is among those with an better than aveage denoising capacity (smaller attenuation here is better), but many other representations recover the original signal with much less error. Thus, as emphasized in (d), model order, sparsity, solution cardinality, are not synonymous with accuracy, good recovery, etc.

The image above shows the effect of averaging all of the representations together. The blue stems show the true sparse signal; the red stems is the 2-sparse signal obtained by OMP; and the black line is the solution $$\bar \vs$$… which is no longer sparse. In this application, that is fine because the name of the game is recovery of a noise-corrupted signal. In this case the noise attenuation is 2.8 times better than that of OMP. But I can’t help but stare at the incredible disconnect between the “true” representation and the “recovered” one.

In the second part of this article, the authors show theoretically and empirically how RandOMP approaches the minimum mean squared error estimator with respect to $$||\bar \vx – \vx||_2$$, while OMP approaches the maximum a posteriori estimate. The former, being the minimum, is obviously “the best” when it comes to denoising. This relationship makes sense as RandOMP involves a mean representation, and OMP approximates once the MAP estimator by roughly assigning projection magnitudes to a posteriori probability.

To complete my collection of pursuit algorithms incorporating some aspects of probability theory (Collect them all!), I look forward to reviewing the following papers in the coming days (which I will spend in Paris with some fresh bread and comté)

1. H. Zayyani, M. Babaie-Zadeh, and C. Jutten, “Bayesian pursuit algorithm for sparse representation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., (Taipei, Taiwan), pp. 1549-1552, Apr. 2009.
2. P. Schniter and L. C. Potter and J. Ziniel, “Fast Bayesian matching pursuit: model uncertainty and parameter estimation for sparse linear models,” IEEE Trans. Signal Process. (submitted still??)