# An Improvement to CoSaMP but not to Supspace Pursuit?

Noting the similarity between the two,
Maleki and Donoho generalize CoSaMP and subspace pursuit (SP) into Two-stage thresholding (TST).
Given $$\vx_k$$, $$\vr_k$$, and $$s$$, TST with parameters $$(\alpha,\beta)$$ finds
$$J := \mathcal{S} \left [T_{\alpha s} ( \vx_k + \kappa \mathbf{\Phi}^*\vr_k ) \right ]$$
where $$0 < \kappa < 1$$ is a relaxation parameter,
and $$T_{\alpha s}(\vy)$$ nulls all elements of $$\vy$$
except for the $$\alpha s$$ ones with the largest magnitudes,
and
$$\mathcal{S}(\vy)$$ returns the support set of a vector $$\vy$$.
TST then thresholds again to find the new support
$$\Omega_{k+1} = \mathcal{S}\left [ T_{\beta s} ( \arg \min_{\vx} || \vu – \MPhi \MI_{\mathcal{S}(\vx)\cup J} \vx ||_2) \right ]$$
where $$T_{\beta s}(\vy)$$ nulls all elements of $$\vy$$
except for the $$\beta s$$ ones with the largest magnitudes,
and the $$N$$ square matrix $$[\MI_{J}]_{jj} = 1 \; \forall j \in J$$, and zero elsewhere.
The new solution $$\vx_{k+1}$$ is then computed by
$$\vx_k = \arg \min_\vx || \vu – \MPhi\MI_{\Omega_{k}}\vx||_2.$$
Obviously, when $$(\alpha,\beta) = (2,1)$$,
TST becomes similar to CoSaMP;
and to become similar to SP, $$\alpha = \beta = 1$$.
However, it appears similarities can be deceiving.

Needell and Tropp present CoSaMP in the following way.
First, given a sparsity $$s$$ CoSaMP finds the support
$$J := \mathcal{S} \left [T_{2s} ( \mathbf{\Phi}^*\vr_k) \right ].$$
CoSaMP then thresholds again to find the new support
$$\Omega_{k+1} = \mathcal{S}\left [ T_s ( \arg \min_{\vx} || \vu – \MPhi \MI_{\mathcal{S}(\vx)\cup J} \vx ||_2) \right ].$$
(In my own twice corrected implementation of CoSaMP, I make sure a magnitude is above numerical precision before I include its index in the support.)
Dai and Milenkovic present SP in nearly the same way, except in the first thresholding step they select $$s$$ indexes instead of $$2s$$.

One iteration of CoSaMP and one iteration of TST are only equivalent when
$$\mathcal{S} \left [T_{2 s} ( \vx_k + \kappa \mathbf{\Phi}^*\vr_k ) \right ] = \mathcal{S}(\vx_k) \bigcup \mathcal{S} \left [T_{2s} ( \mathbf{\Phi}^*\vr_k ) \right ].$$
And similarly for SP and TST,
$$\mathcal{S} \left [T_{s} ( \vx_k + \kappa \mathbf{\Phi}^*\vr_k ) \right ] = \mathcal{S}(\vx_k) \bigcup \mathcal{S} \left [T_{s} ( \mathbf{\Phi}^*\vr_k ) \right ].$$
As long as thresholding is non-linear, TST will not produce the same results as these.
So what happens if we redefine the first thresholding step of CoSaMP and SP?

Below we see for five indeterminacies the probability of exact recovery of CoSaMP using each of the thresholding definitions as a function of sparsity for compressively sensed sparse vectors with non-zero elements distributed normally. (I selected $$\kappa=0.6$$ since Maleki and Donoho find that produces quite good results.)
The extra term appears to really help the performance of CoSaMP as envisioned by Needell and Tropp (red is original). Keep in mind that CoSaMP knows the true sparsity of my signals, so these are upper limits of its performance.

For the same types of vectors, however,
this extra term appears to really hurt SP.
The version envisioned by Dai and Milenkovic
perform the best by far (red is original).
Again, SP knows the true sparsity of my signals.

Finally, let’s look at the performance of TST and SP for these same signals. Below we see that, for SP as envisioned by Dai and Milenkovic. Though TST does not know the real sparsity of the signal, I am surprised that its probability of exact recovery decays so much faster than SP. And it looks to me, TST here is acting like CoSaMP above without the extra term in the first thresholding step.

So the puzzler is: why does this extra term really help CoSaMP, and why does it really hurt SP?