Greedy recovery depends on the sparse vector distribution

Last night I ran some experiments comparing the recovery behavior of greedy approaches and \(\ell_1\) minimization (basis pursuit, BP) for sparse vectors whose non-zero elements are distributed in four different ways:

1. elements distributed standard normal;
2. elements distributed Laplacian;
3. elements with magnitudes distributed uniformly in \([1,2]\);
4. elements drawn from \(\{-1,1\}\) with equal probability.

(In every case I normalize the sparse vector to unit norm.)
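For concreteness, here is a minimal sketch of how such test vectors can be generated. The distribution labels and the random-sign convention for the uniform case are my own assumptions, not taken from the original experiment code:

```python
import numpy as np

def sparse_vector(n, k, dist, rng):
    """Draw a k-sparse length-n vector with non-zeros from `dist`,
    then normalize to unit l2 norm, as described above."""
    s = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    if dist == "normal":
        vals = rng.standard_normal(k)
    elif dist == "laplacian":
        vals = rng.laplace(size=k)
    elif dist == "uniform":
        # magnitudes uniform in [1, 2]; I assume random signs here
        vals = rng.uniform(1, 2, size=k) * rng.choice([-1.0, 1.0], size=k)
    elif dist == "bernoulli":
        vals = rng.choice([-1.0, 1.0], size=k)
    else:
        raise ValueError(dist)
    s[support] = vals
    return s / np.linalg.norm(s)

rng = np.random.default_rng(0)
x = sparse_vector(256, 20, "laplacian", rng)
```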
Below we see how the average performance (over 200 trials) of three greedy approaches compares with that of BP, which stays quite consistent across all four distributions. The greedy approaches are wildly inconsistent, ranging from over 90% recovery for vectors with sparsity above 0.25 down to only 16%.

The mean errors also vary wildly among the different approaches, and here even BP is not so consistent.
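To make the comparison concrete, here is a one-trial sketch of the kind of greedy recovery involved, using a hand-rolled orthogonal matching pursuit. This is an illustration of the technique, not the exact code behind the plots; the matrix sizes and seed are my own choices:

```python
import numpy as np

def omp(Phi, y, k):
    """Plain orthogonal matching pursuit: greedily pick the atom most
    correlated with the residual, then re-fit on the current support."""
    m, n = Phi.shape
    support = []
    r = y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ r)))
        if j not in support:
            support.append(j)
        # least-squares fit of y on the selected columns
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef
    s_hat = np.zeros(n)
    s_hat[support] = coef
    return s_hat

rng = np.random.default_rng(0)
m, n, k = 80, 128, 4
Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # random Gaussian sensing matrix
s = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
s[idx] = rng.uniform(1, 2, size=k) * rng.choice([-1.0, 1.0], size=k)
s /= np.linalg.norm(s)
y = Phi @ s          # noise-free measurements
s_hat = omp(Phi, y, k)
```

With these generous dimensions (only 4 non-zeros against 80 measurements), OMP recovers the vector exactly; the interesting regime in the experiments above is where sparsity is pushed much higher.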


Seeking an explanation for this behavior, I am reading A. Barron, A. Cohen, W. Dahmen, and R. DeVore, “Approximation and learning by greedy algorithms,” Ann. Statist., vol. 36, no. 1, pp. 64-94, 2008.
There they summarize and extend past work on measuring the performance of sparse approximation by OMP, relaxed greedy algorithms (which I believe is exactly what my PhD was about), and others.
They show, for instance, that the norm (in the Hilbert space) of the \(n\)th-order residual \(\vr(n)\) from an approximation of a signal \(\vx\) using a dictionary \(\MD\) is bounded from above as
$$|| \vr(n) || \le C (n + 1)^{-1/2}$$
where \(C\) depends on the signal \(\vx\)
$$C = \inf \left \{ || \vs ||_1 : \vx = \MD \vs \right \}.$$
That is, this constant depends on the greatest lower bound of the \(\ell_1\)-norm of the solution coefficients.
Now, I am not sure how loose this bound is, but \(C\) should change depending on how the non-zero coefficients of the sparse vector are distributed.
With reference to the above, when those values are distributed Bernoulli in \(\{-1,1\}\), \(C\) should be the largest, since for a unit-norm vector the \(\ell_1\) norm is maximized when all non-zero magnitudes are equal; and when those values are distributed Laplacian, \(C\) will be smaller.
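A quick numerical check of this ordering. Here I use the \(\ell_1\) norm of the unit-normalized sparse vector itself as a proxy for \(C\); strictly, \(C\) is an infimum over all representations \(\vx = \MD\vs\), so this is only an upper bound on it, but it illustrates the Bernoulli-vs-Laplacian comparison:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 32          # number of non-zeros
trials = 1000

def l1_of_unit(vals):
    """l1 norm after l2 normalization: proxy for the constant C
    when the sparse vector itself is the minimizing representation."""
    vals = vals / np.linalg.norm(vals)
    return np.abs(vals).sum()

# Bernoulli +/-1 non-zeros: l1/l2 is exactly sqrt(k)
bern = np.mean([l1_of_unit(rng.choice([-1.0, 1.0], k)) for _ in range(trials)])

# Laplacian non-zeros: heavier tails concentrate the mass, so l1/l2 is smaller
lap = np.mean([l1_of_unit(rng.laplace(size=k)) for _ in range(trials)])
```

For equal-magnitude (Bernoulli) coefficients the ratio hits its maximum possible value \(\sqrt{k}\), while the Laplacian draws come out noticeably below it, consistent with the claim that \(C\) is largest in the Bernoulli case.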
But this error refers to the residual \(\vx - \MD\vs\), and not to what we are interested in, i.e., \(\vs - \hat \vs\).

I am sure the answer is out there somewhere, but what my experiments are telling me is this: in the noise-free case, use a greedy recovery approach when I know the non-zero elements of my compressively sensed sparse vector are distributed Gaussian or Laplacian, which I would assume happens in many non-trivial cases.


4 thoughts on “Greedy recovery depends on the sparse vector distribution”

  1. Shouldn’t we be looking at signals with a power-law distribution (or at least add this to the list of distributions to test)? For instance, in
    E. J. Candès and T. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?,” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406-5425, 2006,
    they say:
    “In general, signals of practical interest may not be supported in space or in a transform domain on a set of relatively small size. Instead, the coefficients of elements taken from a signal class decay rapidly, typically like a power law”.

