The other day I presented some preliminary results. I have now rerun the experiments using the same algorithms, but with a correction to iteratively reweighted \(\ell_1\) minimization (IRl1). I am still using up to four reweightings, with the initialization \(\epsilon = 0.1\) and \(\MW = \MI\); but for the noisy case it now solves the following convex optimization problem: $$\min || \MW \vx ||_1 \; \textrm{subject to} \; || \vy - \MA\vx ||_2 \le ||\MA\vx||_2 10^{-\textrm{SNR}/20}.$$ (The noise signal has a standard deviation \(||\MA\vx||_2 10^{-\textrm{SNR}/20}\). At first this constraint made me uncomfortable, thinking it could result in no solution; but a solution always exists as long as \(\MA\) is full-rank.)
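For reference, here is a minimal sketch of the reweighting loop, simplified to the noiseless equality-constrained case (\(\min || \MW \vx ||_1\) subject to \(\MA\vx = \vy\)) so that each iteration can be posed as a linear program with SciPy. The solver choice, dimensions, and function name are my own illustration, not the exact setup used in my experiments.

```python
import numpy as np
from scipy.optimize import linprog

def irl1(A, y, reweightings=4, eps=0.1):
    """Iteratively reweighted l1 minimization (noiseless case):
    min ||W x||_1 subject to A x = y, with W updated from the
    previous solution as in Candes, Wakin & Boyd (2008)."""
    M, N = A.shape
    w = np.ones(N)                    # W = I on the first pass
    x = np.zeros(N)
    for _ in range(1 + reweightings):
        # Split x = u - v with u, v >= 0; then ||W x||_1 = w . (u + v).
        c = np.concatenate([w, w])
        A_eq = np.hstack([A, -A])
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None),
                      method="highs")
        uv = res.x
        x = uv[:N] - uv[N:]
        w = 1.0 / (np.abs(x) + eps)   # re-weight toward small entries
    return x
```

The weight update \(w_n = 1/(|x_n| + \epsilon)\) is what pushes small entries toward zero on the next pass.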

In the animated plots below, we see how the phase transitions (averaged over 50 independent trials) of these algorithms change for sparse vectors distributed Bernoulli (top) and Normal (bottom), with additive noise in the compressive measurements (SNR of 60 dB) and without. The dashed lines are the phase transitions of selecting solutions from the “ensemble”. The red one shows the best we can hope for; and the black one is from selecting the solution that satisfies $$\vx = \arg \min_{\vx_i \in \mathcal{X}} \frac{|| \vy - \MA \vx_i ||_2}{|| \vy ||_2} + \frac{|| \vx_i ||_0}{N}$$

where \(N\) is the dimensionality of the solution,

and I define in an ad hoc manner

$$|| \vx_i ||_0 := \#\{ n \in \{1, 2, \ldots, N\} : | [\vx_i]_n | \ge 10^{-4} \}.$$
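As a sanity check, the selection criterion above takes only a few lines to code; the function names and the toy candidate set here are illustrative.

```python
import numpy as np

def numerical_l0(x, thresh=1e-4):
    """Ad hoc l0 'norm': count entries with magnitude >= thresh."""
    return int(np.sum(np.abs(x) >= thresh))

def select_from_ensemble(A, y, candidates):
    """Pick the candidate minimizing the normalized residual
    plus the normalized (thresholded) l0 count."""
    N = A.shape[1]
    costs = [np.linalg.norm(y - A @ x) / np.linalg.norm(y)
             + numerical_l0(x) / N
             for x in candidates]
    return candidates[int(np.argmin(costs))]
```

Note the trade-off built into the cost: a denser candidate must buy its extra non-zeros with a proportionally smaller residual.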

In both of these plots, we see that, for each algorithm individually, the phase transitions barely change with this little bit of noise.

In the Bernoulli case, IRl1 does surprisingly better than SL0.

We also see that we reach the best phase transition through the ensemble selection criterion above.

In the Normal case below, when there is no noise, SL0 exceeds all our expectations at high problem indeterminacies. IRl1 flounders for some reason.

And our ensemble selection method coincides with the best expected.

When we add a little bit of noise, however, SL0 takes a dramatic plunge, while the other algorithms retain their performance. (For no noise, I set \(\sigma_J = 0.00004\); with noise I set \(\sigma_J = 4 ||\MA\vx||_2 10^{-\textrm{SNR}/20}\). For both cases, I set the decrease factor to 0.95.) Also, we see a difference in the best ensemble performance between the two noise cases.
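For context, here is a bare-bones sketch of SL0 (the smoothed-\(\ell_0\) algorithm of Mohimani, Babaie-Zadeh and Jutten) in the noiseless setting. The step size, inner-loop count, and starting \(\sigma\) are common defaults, not the exact values from my experiments, apart from the 0.95 decrease factor.

```python
import numpy as np

def sl0(A, y, sigma_min=1e-4, sigma_decrease=0.95, L=3, mu=2.0):
    """Smoothed-l0 minimization: maximize sum(exp(-x_n^2 / (2 sigma^2)))
    over {x : A x = y} while shrinking sigma toward sigma_min."""
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ y                        # minimum-norm feasible start
    sigma = 2.0 * np.max(np.abs(x))
    while sigma > sigma_min:
        for _ in range(L):
            delta = x * np.exp(-x**2 / (2 * sigma**2))
            x = x - mu * delta            # gradient step on smoothed l0
            x = x - A_pinv @ (A @ x - y)  # project back onto A x = y
        sigma *= sigma_decrease
    return x
```

The final \(\sigma\) acts like the \(\sigma_J\) parameter above: it sets the scale below which entries are treated as zero, which is why the noisy case needs a \(\sigma_J\) tied to the noise level.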

Finally, our ensemble selection criterion fails, clearly due to my definition above of \(|| \vx_i ||_0\).

When the non-zero elements of the sensed vector are drawn from a distribution concentrated about zero (as in the Normal case), this definition of sparsity becomes very sensitive.

Thus, I propose instead defining it

$$|| \vx_i ||_0 := \#\{ n \in \{1, 2, \ldots, N\} : | [\vx_i]_n | \ge \sigma_\vn \}$$

for no other reason than it feels right.

I think I can arrive at that by considering that \(\MA\) satisfies the RIP with constant \(\delta_s \ll 1\).

This means

$$(1 - \delta_s) || \vx ||_2^2 \le ||\vy||_2^2 - \sigma_\vn^2 \le (1 + \delta_s) || \vx ||_2^2$$

where \(\sigma_\vn^2\) is the energy of the additive white Gaussian noise, which we assume is approximately orthogonal to the clean measurements \(\MA\vx\), so that \(||\vy||_2^2 \approx ||\MA\vx||_2^2 + \sigma_\vn^2\).

If a non-zero element in \(\vx\) has a magnitude less than \(\sigma_\vn\), then the corresponding unit-norm column of \(\MA\) will contribute less than \(\sigma_\vn^2\) energy to \(||\MA\vx||_2^2\), i.e., it sits below the noise floor. So maybe instead the definition should be

$$|| \vx_i ||_0 := \#\{ n \in \{1, 2, \ldots, N\} : | [\vx_i]_n | \ge \sigma_\vn/\sqrt{N} \}.$$ I will have to think about this more, and hack my way to the right solution.

After some hacking, I am finding a very good definition (at least for an SNR of 60 dB) is

$$|| \vx_i ||_0 := \#\{ n \in \{1, 2, \ldots, N\} : | [\vx_i]_n | \ge 10 \sigma_\vn/\sqrt{N} \}.$$
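To make the effect of the threshold concrete, here is a toy comparison of the three candidate thresholds on a hypothetical noisy reconstruction; the vector, noise scale, and dimensions are made up purely for illustration.

```python
import numpy as np

def count_above(x, thresh):
    """Thresholded l0 count: entries with magnitude >= thresh."""
    return int(np.sum(np.abs(x) >= thresh))

# A hypothetical reconstruction: two true non-zeros, one mid-size
# spurious entry, and small noise-induced residue elsewhere.
N = 25
sigma_n = 0.01                       # assumed noise energy scale
x_hat = np.zeros(N)
x_hat[[3, 22]] = [1.0, -0.5]         # true support
x_hat[7] = 0.015                     # spurious entry near sigma_n
x_hat[[10, 15, 20]] = 2e-4           # residue above the fixed 1e-4 cutoff

print(count_above(x_hat, 1e-4))                       # -> 6 (over-counts)
print(count_above(x_hat, sigma_n))                    # -> 3
print(count_above(x_hat, 10 * sigma_n / np.sqrt(N)))  # -> 2 (true sparsity)
```

The fixed \(10^{-4}\) cutoff counts every bit of noise residue, while the noise-scaled thresholds progressively recover the true support size.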

I will start simulations at 30 dB SNR today and have them run through Kristi Himmelfartsdag (Ascension Day).

I also see that there exists a “Robust SL0”: A. Eftekhari, M. Babaie-Zadeh, C. Jutten, and H. Abrishami Moghaddam, “Robust-SL0 for Stable Sparse Representation in Noisy Settings,” in Proc. ICASSP, 2009. I have updated my Po’D on SL0 with this other paper.