Last night I ran some experiments to compare the recovery behavior of greedy approaches and \(\ell_1\) minimization (basis pursuit, BP) for sparse vectors whose non-zero elements follow different distributions. First, elements distributed standard normal.

Then, elements distributed Laplacian.

Third, elements with magnitudes distributed uniformly in \([1,2]\).

And finally, elements from \(\{-1,1\}\), with equal probability.

(I normalize every sparse vector to unit norm.)
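The four sparse-vector types above can be sketched as follows. The dimensions, sparsity, and helper name here are my own choices for illustration; the post does not state the sizes used in the experiments.

```python
import numpy as np

def sparse_vector(n, k, dist, rng):
    """Generate a unit-norm k-sparse vector of length n.

    dist is one of 'normal', 'laplace', 'uniform' (magnitudes in [1,2]
    with random signs), or 'bernoulli' (entries in {-1, 1}).
    (Hypothetical helper; parameters are not taken from the post.)"""
    s = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    if dist == "normal":
        vals = rng.standard_normal(k)
    elif dist == "laplace":
        vals = rng.laplace(size=k)
    elif dist == "uniform":
        vals = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1, 2, size=k)
    elif dist == "bernoulli":
        vals = rng.choice([-1.0, 1.0], size=k)
    else:
        raise ValueError(f"unknown distribution: {dist}")
    s[support] = vals
    return s / np.linalg.norm(s)  # normalize to unit norm

rng = np.random.default_rng(0)
x = sparse_vector(256, 20, "laplace", rng)
```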

Below we see how the average performance (200 trials) of three greedy approaches compares with that of BP, which appears to stay quite consistent over all four distributions.

The greedy approaches are wildly inconsistent, ranging from over 90% recovery for vectors with sparsity above 0.25 down to only 16%.

The mean errors also vary wildly among the different approaches, and here we see that BP is not so consistent either.
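For concreteness, here is a minimal sketch of one common greedy approach, orthogonal matching pursuit. The post does not say which three greedy algorithms were tested, so treating OMP as representative is an assumption, and the problem sizes below are mine.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: greedily select the atom most
    correlated with the residual, then re-fit the coefficients on the
    chosen support by least squares. (Minimal sketch, not the post's code.)"""
    residual = x.copy()
    support = []
    s = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    s[support] = coef
    return s

# Toy setup: a k-sparse unit-norm vector observed through random projections
rng = np.random.default_rng(0)
n, m, k = 128, 64, 8
D = rng.standard_normal((m, n)) / np.sqrt(m)
s_true = np.zeros(n)
s_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
s_true /= np.linalg.norm(s_true)
x = D @ s_true
s_hat = omp(D, x, k)
```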

I am reading A. Barron, A. Cohen, W. Dahmen, and R. DeVore, “Approximation and learning by greedy algorithms,” Ann. Statist., vol. 36, no. 1, pp. 64–94, 2008.

In it they summarize and extend past work on measuring the performance of sparse approximation by OMP, relaxed greedy algorithms (which I believe is exactly what my PhD was about), and others.

They show, for instance, that the norm (in the Hilbert space) of the \(n\)th-order residual \(\vr(n)\) from an approximation of a signal \(\vx\) using a dictionary \(\MD\) is bounded from above by a power law,

$$|| \vr(n) || \le C (n + 1)^{-1/2}$$

where \(C\) depends on the signal \(\vx\)

$$C = \inf \left \{ || \vs ||_1 : \vx = \MD \vs \right \}.$$

That is, this constant is the greatest lower bound (infimum) of the \(\ell_1\)-norm over all coefficient vectors \(\vs\) that synthesize \(\vx\).
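As a concrete check, \(C\) can be computed by the standard linear-programming reformulation of \(\ell_1\) minimization, splitting \(\vs\) into its positive and negative parts. This is a sketch; the solver, sizes, and seed are my own choices, not the post's.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(D, x):
    """Compute C = min ||s||_1 subject to D s = x via the LP
    min 1'(u + v) s.t. D(u - v) = x, u >= 0, v >= 0, with s = u - v.
    (Illustrative helper; not taken from the post.)"""
    m, n = D.shape
    c = np.ones(2 * n)
    A_eq = np.hstack([D, -D])
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None), method="highs")
    uv = res.x
    return uv[:n] - uv[n:], res.fun

rng = np.random.default_rng(0)
m, n, k = 64, 128, 5
D = rng.standard_normal((m, n)) / np.sqrt(m)
s = np.zeros(n)
s[rng.choice(n, size=k, replace=False)] = rng.laplace(size=k)
s /= np.linalg.norm(s)
s_hat, C = l1_min(D, D @ s)
```

Since the true sparse \(\vs\) is itself feasible for the constraint \(\vx = \MD\vs\), the computed \(C\) can never exceed its \(\ell_1\)-norm.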

Now, I am not sure how loose this bound is, but \(C\) should change depending on how the non-zero coefficients of the sparse vector are distributed.

With reference to the above, when those values are distributed Bernoulli in \(\{-1,1\}\), then \(C\) should be the largest (for a unit-norm vector with fixed support size, equal magnitudes maximize the \(\ell_1\)-norm); and when those values are distributed Laplacian, then \(C\) will be smaller.
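A quick numerical sanity check of this intuition (the sparsity \(k\) below is a hypothetical choice): for a unit-norm \(k\)-sparse vector, Cauchy–Schwarz gives \(||\vs||_1 \le \sqrt{k}\), with equality exactly when all non-zero magnitudes are equal, so the \(\{-1,1\}\) case attains the maximum.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 20  # hypothetical sparsity; the post does not state the sizes used

# {-1,1} entries: after unit-norm scaling every magnitude equals 1/sqrt(k),
# so the l1 norm is exactly sqrt(k), the Cauchy-Schwarz maximum.
s_bern = rng.choice([-1.0, 1.0], k) / np.sqrt(k)

# Laplacian entries: heavy tails concentrate energy in a few entries,
# pushing the l1/l2 ratio below sqrt(k).
s_lap = rng.laplace(size=k)
s_lap /= np.linalg.norm(s_lap)

l1_bern = np.abs(s_bern).sum()
l1_lap = np.abs(s_lap).sum()
```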

But this error refers to the residual \(\vx - \MD\vs\), and not to what we are interested in, i.e., \(\vs - \hat\vs\).

I am sure the answer is out there somewhere, but what my experiments are telling me is, in the noise-free case, to use a greedy recovery approach when I know the non-zero elements of my compressively sensed sparse vector are distributed Gaussian or Laplacian, which I would assume happens in many non-trivial cases.

Shouldn’t we be looking at signals with a power law distribution (or at least add this to the list of distributions to test)? For instance in

E. J. Candès and T. Tao, “Near-optimal signal recovery from random projections: universal encoding strategies?” (http://arxiv.org/abs/math/0410542),

they say:

“In general, signals of practical interest may not be supported in space or in a transform domain on a set of relatively small size. Instead, the coefficients of elements taken from a signal class decay rapidly, typically like a power law”.


This might (or might not) be relevant to some of the issues you are trying to investigate: Lianlin Li's post from a while back (the beginning is in Chinese):

http://to-cs.blog.sohu.com/142625002.html

which points back to a paper by Volkan that I featured here:

http://nuit-blanche.blogspot.com/2009/10/cs-learning-with-compressible-priors.html

Hope this helps,

Igor.


Thank you Igor. It appears that the paper by Volkan is quite relevant.


Thank you for the link! I think this paper is relevant and will read it soon.
