Over the past few days, Igor of Nuit Blanche has been asking about this “ensemble” line on my phase plots here, which I also discuss in my article here. I explained in a comment that

We choose the [solution] that produces the smallest l_2 error given the measurements \(|| y – A x_i ||_2\) to be the sensed vector. I do not expect that OMP, BP, IHT, CMP, etc. fail for the same vectors. So using an ensemble, I am guaranteed to have at least the recovery performance of the best performing algorithm. When the recovery algorithms fail for different vectors, then I will have better performance.

After thinking about this and looking at my code, I see that is not correct. In order to clear up any and all confusion, I want to formalize what this “ensemble” line means.

Given an underdetermined system, \(\vy = \MA\vx\), I wish to recover \(\vx\).

Many algorithms exist to do this, but those we focus on in compressed sensing emphasize error and sparsity in the solution.

The approaches I have been studying in my own work are: \(\ell_1\) minimization (BP and BP denoising), and iterative re-weighted \(\ell_1\) minimization (IRl1); Smoothed \(\ell_0\) (SL0); cyclic matching pursuit (CMP), orthogonal matching pursuit (OMP), and probabilistic OMP (PrOMP); and the recommended versions of iterative hard and soft thresholding (IHT, IST), and two-stage thresholding (TST), which is essentially subspace pursuit.

Each one of these algorithms gives a different solution to the underdetermined system above.

Thus, given a set of solutions \(\mathcal{X} =\{\vx_i : i = 1, 2, \ldots \}\) produced by an “ensemble” E of recovery algorithms (OMP, BP, SL0, etc.), I consider it a success for E if any of these solutions satisfy \(|| \vx – \vx_i ||_2 \le 10^{-2} || \vx ||_2\).

This, of course, requires knowing \(\vx\), or at least knowing how to select the best solution from the set.

The dotted line in the figure below thus shows the phase transition of the best we can expect using E, given we know exactly how to select the right solution from \(\mathcal{X}\).

So it appears that this approach is echoed in part of Igor’s description of his “DUMBEST” algorithm, where he says,

One could also perform these reconstructions using different solvers and use a voting mechanism to single out the actual reconstructed signal.

The rest of his description about adding more measurements however, confuses me.

This brings up the interesting question of devising a “voting mechanism” to select the real solution from all the others.

Given the measurements \(\vy\) and the sensing matrix \(\MA\), can we pick a solution \(\mathcal{X} = \{\vx_i : i = 1, 2, \ldots \}\) to reach the dotted line of E?

Perhaps, we can select the solution that appears (approximately) more than once;

but this won’t help when only one of the algorithms is capable of finding the solution at a given sparsity.

Or maybe we can mix the solutions based on their error, as in Elad’s plurality approach.

Or, we can select a solution based on an error criterion:

$$\vx = \arg \min_{\vx_i \in \mathcal{X}} || \vy – \MA \vx_i ||_2$$

which stresses modeling the measurements precisely.

This is a problem considering that greedy iterative descent algorithms

can run until this error becomes zero (as long as \(\MA\) is full rank).

Another selection criterion is

$$\vx = \arg \min_{\vx_i \in \mathcal{X}} || \vx_i ||_0$$

but it is a mistake to not consider the errors of the solutions

since we cannot assume all algorithms have converged onto a satisfactory solution.

If, instead, we use a criterion like

$$\vx = \arg \min_{\vx_i \in \mathcal{X}} || \vy – \MA \vx_i ||_2 + \lambda || \vx_i ||_0$$

then we are getting closer to what I think will give the performance of E.

Except I think it should be

$$\vx = \arg \min_{\vx_i \in \mathcal{X}} \frac{|| \vy – \MA \vx_i ||_2}{|| \vy ||_2} + \frac{|| \vx_i ||_0}{N}$$

where \(N\) is the dimensionality of the solution.

To test this, I have run the same experiments as here, but now with SL0 and IRl1. Still, no noise in the measurements.

Below, we see the phase transitions for nine algorithms for sparse vectors with non-zero elements distributed Normal, averaged over three trials for each problem sparsity and indeterminacy pair.

The dashed line is the phase transition of the solutions selected from set of solutions using the last selection criterion above, where I define $$|| \vx_i ||_0 := \#\{ | [\vx_i]_n | \ge 10^{-4} : n = 1, 2, \ldots, N\}$$

which I arrived at by experimentation.

We see that with this selection criterion, the phase transition of the recovery from this ensemble is equal to the best or above all the others.

My code is here: ensemble.zip. I am running this for a larger number of signals. Now, what is going on with IRl1???

Bob,

This is very nice ensemble definition and the voting mechanism seems to work very well.

Here is another suggestion, we could also put all the solutions as columns of a new matrix called B and look for a factorization of that matrix as:

B = A + E + Z , where A is a low rank matrix (rank 1), E is a sparse matrix and Z is a “noisy” matrix ( http://perception.csl.uiuc.edu/matrix-rank/sample_code.html ) or B = A + E (Robust PCA, http://www-stat.stanford.edu/~candes/papers/RobustPCA.pdf ). Of note the comparison between different algos: http://perception.csl.uiuc.edu/matrix-rank/sample_code.html#Comparison

Cheers,

Igor.

PS: As a side note, I am still amazed at the performance of SL0.

LikeLike