# Even more experiments with IHT, now with more stages!

In the Maleki and Donoho paper, they generalize CoSaMP (D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from
incomplete and inaccurate samples,” Appl. Comput. Harmonic Anal.,
vol. 26, no. 3, pp. 301-321, 2008), and subspace pursuit (W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing
signal reconstruction,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp.
2230-2249, May 2009), as “composite iterative algorithms,” which they call
“two-stage thresholding” (TST).

Say we have sensed some vector $$\vx \in \mathcal{R}^N$$
by the sensing matrix $$\mathbf{\Phi} \in \mathcal{R}^{M\times N}$$
with $$N > M$$ columns of the same norm that is less than 1.
This produces the measurement $$\vy = \mathbf{\Phi}\vx$$,
from which we want to recover $$\vx$$.
TST does this first by thresholding as IHT, but to find a linearly independent set of columns from $$\mathbf{\Phi}$$. (Will the first thresholding step and subsequent union of supports always yield such a set? I don’t think so. Must read the original sources.)
It then thresholds the projection of the measurement onto the dual basis of this subset.
As before, we initialize the algorithm with $$\hat \vx(0) = \mathbf{0}$$, and $$\vr(0) = \vx$$.
At step $$n$$ we have the $$n$$th-order solution $$\hat\vx(n) \in \mathcal{R}^N$$,
producing the residual $$\vr(n) = \vy – \mathbf{\Phi}\hat \vx(n)$$.
$$\hat \vx(n+1) = T_2 \left ( [\mathbf{\Phi}_n^T\mathbf{\Phi}_n]^{-1}\mathbf{\Phi}_n^T \vy; t_2(n) \right )$$
where $$\mathbf{\Phi}_n$$ is a matrix of the union of significant columns in $$\mathbf{\Phi}$$.
These are found from the non-zero entries in $$\vv(n) \cup \vx(n)$$ (by an abuse of notation, I am just saying combine their supports) where
$$\vv(n) = T_1\left ( \hat \vx(n) + \kappa \mathbf{\Phi}^T\vr(n); t_1(n) \right )$$
where $$0 < \kappa < 1$$ is a relaxation parameter,
and $$T_i( \cdot; t_i)$$ are thresholding functions.
As it has been noticed by me and many others, these thresholds are non-trivial;
and for TST, Maleki and Donoho recommend setting them based on an oracle sparsity (!).
If we know our sensed signal is k-sparse, then we set $$t_1(n)$$ such that we keep the $$\alpha k$$ values with the largest magnitudes, and $$t_2(n)$$ such that we keep the $$\beta k$$ values with the largest magnitudes, where $$\alpha, \beta$$ are positive integers.
Maleki and Donoho say CoSaMP is where $$\alpha = 1, \beta = 2$$, and subspace pursuit is where $$\alpha = \beta = 1$$. Maleki and Donoho helpfully provide their tuned algorithms. It was just a matter of seconds for me to take their code and integrate it with mine before I could spend hours waiting for the simulation results. Such nice work make my days pleasant!

Below we see the recovery performance of their recommended algorithms averaged over 1000 trials, for a variety of problem sparsities (with no noise added to the measurements) using constant amplitude random signs (CARS) sparse signals (the same used by Maleki and Donoho).
The algorithms are: iterative hard thresholding (IHTR), iterative soft thresholding ISTR, two-stage threshoding (TSTR). We also see the performance of $$\ell_1$$-minimization (BP), and orthogonal matching pursuit (OMP).
I am surprised by a few things, though further reflection provides some illumination.

First, IHT fails even for the sparsest signals!
But since it is built on thresholding projections of the initial measurement onto the sensing matrix, two columns that are more non-orthogonal than expected could ruin it all.
This might be cured by selecting only one element at a time,
but then IHT becomes a variant of MP,
and we all know MP acts like the brother no one is proud of (greedy).
The sensing matrix (50×250 sampled from the spherical ensemble) changes only for each sparsity… which might not be ideal since I could happen on a matrix with a bad RIP. But then the sensing matrix would be poor for all algorithms.
Soft thresholding appears to do better in this domain, but with larger sparsity it begins to act like MP crumbling under the massive weight of correctly decoding a handful plus one of non-zero elements.
The TST approach does much better than these, and even outperforms OMP,
but boy does it have a fast decline. BP just takes the cake for these types of sparse vectors.

Below we see the recovery performance of their recommended algorithms averaged over 2000 trials, for a variety of problem sparsities (with no noise added to the measurements) using sparse signals distribute Normal.
Now the story gets twisted.
IHT is still failing, dare I say even for 0-sparse signals.
TST performs better for these types of signals,
but it drops off even more precipitously than before.
BP is consistent.
But OMP really shines, if you can call it that.

Now, I think, the crucial questions to be answered before I select an algorithm with which to decode a compressively sampled signal are:

1. What is/are the distribution/s of the underlying sparse signal/s?
2. How can I combine the results of several of these algorithms to aid my recovery efforts?

I think it is time to write!
Now, how to find the relationships between the underlying distribution, and the performance of these algorithms…