So, \(\ell_1\) minimization works fine, but greedy methods perform much better for certain sparse signals?

I have spent the past week studying the performance of various greedy algorithms in recovering sparse signals from compressive measurements, paying most attention to cyclic matching pursuit (CMP) and cyclic orthogonal least squares. I then looked at the performance obtained by relaxing the strict sparsity criterion to the \(\ell_1\) norm, the principle known as Basis Pursuit (BP). It appeared that BP performs much worse than all the others, which was strange to me: with a \(50 \times 250\) sensing matrix, I would have thought its coherence would prevent the greedy methods from performing well at all. Thinking this was an artifact of using CVX to solve the convex problem, I solved it instead with MATLAB's linear programming routine (linprog), and found that BP still underperforms the greedy methods.
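
For anyone wanting to reproduce the \(\ell_1\) step, here is a minimal sketch of how BP can be posed as a linear program and handed to linprog. The dimensions, the Gaussian Phi, and the test signal below are placeholders, not necessarily what I used.

% Basis Pursuit, min ||x||_1 s.t. Phi*x = y, recast as a linear
% program via the standard split x = u - v with u, v >= 0.
M = 50; N = 250; K = 5;                 % placeholder dimensions
Phi = randn(M, N);
Phi = Phi ./ sqrt(sum(Phi.^2, 1));      % unit-norm columns (an assumption)
x = zeros(N, 1);
p = randperm(N);
x(p(1:K)) = randn(K, 1);                % a K-sparse test signal
y = Phi * x;

f    = ones(2*N, 1);                    % minimize sum(u) + sum(v), which equals ||u - v||_1 at the optimum
Aeq  = [Phi, -Phi];                     % equality constraint Phi*(u - v) = y
lb   = zeros(2*N, 1);                   % u, v >= 0
z    = linprog(f, [], [], Aeq, y, lb, []);
xhat = z(1:N) - z(N+1:end);             % BP estimate of x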

Then a commenter (thank you, Alejandro!) took my code, substituted Elad's sparse signal construction for mine, and found the expected result: BP outperforms the greedy approaches. We can see below that reconstruction by BP only begins to fail once the sparsity of the sensed signal exceeds 8/50, while all the greedy approaches begin to fail beyond 5/50. Comparing the BP performance below to that of my previous study, its performance has not changed; rather, the performance of the greedy approaches has become worse.
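
For context, curves like these come from a Monte Carlo experiment of roughly the following shape (a sketch only, with assumed settings, not my exact script): for each sparsity K, draw many problem instances, run a recovery algorithm, and record the fraction of trials in which the estimate matches the true signal to within a small tolerance. Here recover() is a hypothetical stand-in for whichever algorithm is under test (BP, CMP, cyclic OLS, ...).

M = 50; N = 250; ntrials = 100; tol = 1e-4;     % assumed settings
Ks = 1:15;                                      % sparsity levels to test
prob = zeros(size(Ks));
for ik = 1:numel(Ks)
    K = Ks(ik); success = 0;
    for t = 1:ntrials
        Phi = randn(M, N);
        Phi = Phi ./ sqrt(sum(Phi.^2, 1));      % unit-norm columns
        x = zeros(N, 1); p = randperm(N);
        x(p(1:K)) = randn(K, 1);
        x = x / norm(x);
        y = Phi * x;
        xhat = recover(y, Phi);                 % hypothetical stand-in for the algorithm under test
        success = success + (norm(xhat - x) / norm(x) < tol);
    end
    prob(ik) = success / ntrials;
end
plot(Ks / M, prob); xlabel('K / M'); ylabel('probability of exact recovery');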

[Figure: BP recovery probability versus sparsity]

[Figure: CMP recovery probability versus sparsity]
So what is it about the sparse signals that causes this difference?
Apparently, it is a magnitude thing. Before, I generated the signals like so:

x = zeros(N,1);                      % N-dimensional signal, all zeros
activelements = randperm(N);         % choose K random support positions
x(activelements(1:K)) = randn(K,1);  % standard normal non-zero values
x = x./sqrt(x'*x);                   % normalize to unit norm
y = Phi*x;                           % compressive measurements

where N is 250, K is the sparsity, and Phi is the \(50 \times 250\) sensing matrix.
These sparse signals apparently make the greedy recovery methods far outshine the more complex and sophisticated convex relaxation approach.
Elad instead makes sparse signals like so:

x = zeros(N,1);                                           % N-dimensional signal, all zeros
activelements = randperm(N);                              % choose K random support positions
x(activelements(1:K)) = sign(randn(K,1)).*(1+rand(K,1));  % random signs, magnitudes uniform in [1,2]
x = x./sqrt(x'*x);                                        % normalize to unit norm
y = Phi*x;                                                % compressive measurements

The difference is small but crucial: it lies entirely in how the non-zero values are drawn. All the non-zero elements of Elad's sparse signals have magnitudes uniformly distributed in \([1,2]\) (before normalization), whereas my original approach gives the non-zero elements amplitudes drawn from a standard normal distribution (before normalization).
This is entirely confusing to me.
Why should greedy methods work better than \(\ell_1\) minimization when the non-zero entries are normally distributed?
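
To make the contrast concrete, here is a small sketch (with assumed settings, not from my original experiments) that draws the non-zero magnitudes under both constructions and compares their spread:

K = 5; ntrials = 10000;                     % assumed settings
g = abs(randn(K, ntrials));                 % my model: magnitudes of standard normal entries
u = 1 + rand(K, ntrials);                   % Elad's model: magnitudes uniform in [1,2]
fprintf('normal model:  median max/min magnitude ratio = %.1f\n', median(max(g) ./ min(g)));
fprintf('uniform model: median max/min magnitude ratio = %.1f\n', median(max(u) ./ min(u)));

The normalization by \(\sqrt{x^T x}\) scales all the non-zero values equally, so it does not change these ratios.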

One thought on “So, \(\ell_1\) minimization works fine, but greedy methods perform much better for certain sparse signals?”

  1. Here are a couple of answers to your question:
    1) Greedy methods are said to work better when there is a large dynamic range among the (active) coefficient magnitudes. Intuitively, if the largest coefficient is easy to spot, then a greedy method can identify it and subtract its contribution from the measurements. Then, if the largest coefficient in the residual is easy to spot, the greedy method will next identify it, and so on. (A bare-bones matching pursuit sketch after this comment illustrates this peel-off behavior.) But if, instead, there are multiple large coefficients with similar magnitudes, a greedy method can get confused. This behavior has been rigorously analyzed in the context of multiuser communications under the guise of “successive interference cancellation for CDMA”.
    2) Another way to look at this problem is via Bayesian estimation theory. This approach makes more sense if you consider the presence of noise, which is usually present in practical problems. (I’m not sure if your experiments included noise.) Anyway, the sparse reconstruction problem can be thought of as a non-Gaussian estimation problem, where the non-Gaussianity is a result of sparsity. For example, a signal generated from mostly zeros and a few Gaussian entries has a Bernoulli-Gaussian distribution. Although the exact MMSE-optimal estimator for this compressive sensing problem is too complex to implement (the problem is NP-hard), one can construct approximately MMSE-optimal estimates using several Bayesian strategies, including MCMC, variational Bayes, and belief propagation (or message passing). The latter option appears to be the cheapest, as a consequence of the “approximate message passing” (AMP) algorithm proposed (for general signal priors) by Donoho/Maleki/Montanari at the 2010 Information Theory Workshop. In fact, AMP manifests as a simple iterative thresholding algorithm that is, perhaps surprisingly, a lot cheaper than OMP! The details of the Bernoulli-(real)Gaussian AMP algorithm can be found in one of our recent Asilomar papers on compressive imaging (http://www2.ece.ohio-state.edu/~schniter/pdf/asil10_image.pdf).

    Now, continuing the discussion, the LASSO or BPDN algorithms can also be interpreted in a Bayesian estimation framework. However, they are not optimized for the Bernoulli-Gaussian signal prior, but rather for a different family of priors that includes the Laplacian prior. Thus, we have found that the normalized MSE of LASSO/BPDN estimates is usually several dB worse than that of Bernoulli-Gaussian AMP when applied to Bernoulli-Gaussian signal estimation problems. However, if you generated a Laplacian signal, you might find LASSO/BPDN performing well relative to other algorithms. As a final note, I should mention that the original AMP [from PNAS 2009] was designed around the Laplacian prior and has been found to yield (with iid mixing matrices) essentially LASSO/BPDN estimates but with a small fraction of the complexity.
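
As a rough illustration of the successive-cancellation intuition in point 1, here is a bare-bones matching pursuit loop (a sketch with assumed settings, plain MP rather than the cyclic variants from the post): at each step it picks the column most correlated with the residual and cancels that column's contribution.

M = 50; N = 250; K = 5;                 % assumed dimensions
Phi = randn(M, N);
Phi = Phi ./ sqrt(sum(Phi.^2, 1));      % unit-norm columns
x = zeros(N, 1); p = randperm(N);
x(p(1:K)) = randn(K, 1);                % normally distributed non-zeros: large dynamic range
y = Phi * x;

xhat = zeros(N, 1);
r = y;                                  % residual
for it = 1:2*K                          % a fixed, generous number of iterations
    c = Phi' * r;                       % correlate the residual with every column
    [~, j] = max(abs(c));               % the easiest coefficient to spot
    xhat(j) = xhat(j) + c(j);           % update that coefficient
    r = r - Phi(:, j) * c(j);           % cancel its contribution from the residual
end

With magnitudes uniform in \([1,2]\) there is no single dominant column to latch onto, which, per point 1, is exactly the regime where this greedy selection is said to struggle.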
