# Experiments with Iterative Hard Thresholding

Over the past few days, I have been running some experiments with iterative hard thresholding (IHT). I have described this approach to sparse signal recovery from compressive measurements here.
This method is presented and analyzed in T. Blumensath, M. E. Davies, “Iterative Hard Thresholding for Compressed Sensing,” Applied and Computational Harmonic Analysis, vol. 27, no. 3, pp. 265-274, 2009.

Say we have sensed some vector $$\vx \in \mathcal{R}^N$$
by the sensing matrix $$\mathbf{\Phi} \in \mathcal{R}^{M\times N}$$
with $$N > M$$ and unit-norm columns.
This produces the measurement $$\vy = \mathbf{\Phi}\vx$$,
from which we want to recover $$\vx$$.
IHT attempts this as follows.
We initialize the algorithm with $$\hat \vx(0) = \mathbf{0}$$, and $$\vr(0) = \vx$$.
At step $$n$$ we have the $$n$$th-order solution $$\hat\vx(n) \in \mathcal{R}^N$$,
producing the residual $$\vr(n) = \vy – \mathbf{\Phi}\hat \vx(n)$$.
We update this solution by
$$\hat \vx(n+1) = T \left ( \hat \vx(n) + \kappa \mathbf{\Phi}^T\vr(n); t(n) \right )$$
where $$0 < \kappa 0$$.
For “hard” thresholding, this is defined simply
$$T(x; t) := \begin{cases} x, & |x| > t \\ 0, & \textrm{else}. \end{cases}$$
Blumensath et al. show that this algorithm will converge to a local minimum of the problem
$$\min_\vx || \vy – \mathbf{\Phi} \vx ||_2^2 + \lambda || \vx ||_0$$
with $\lambda \ge 0$.

Though it is quite simple in design, I have several problems getting it to work.
The first problem is the choices for $$\kappa$$, of $$\lambda$$, and of the threshold $$t$$.
$$\kappa$$ will control the step size of convergence onto the solution.
$$\lambda$$ will weigh the importance of sparsity in the solution relative to the error.
And $$t$$ will determine which elements are kept in the thresholding step.
Looking into the helpful MATLAB software provided by Blumensath — Sparsify_0.5
I see he sets $$\kappa = 1$$, and makes $$t = \lambda$$.
So, what of $$\lambda$$?
I am finding this parameter has a significant impact on the performance of IHT;
and no matter what I do it performs miserably for a problem sparsity even as small as $$\rho := k/M = 5/50 = 0.1$$
The results below are averaged from 2000 independent trials of compressively sampled sparse signals sampled from five distributions. I add no noise to the measurements.

Each distribution comes from the following code:

x = zeros(dimensions,1);
activelements = randperm(dimensions);
switch distributiontype{kk}
case 'normal'
x(activelements(1:sparsity)) = randn(sparsity,1);
case 'bimodalgaussian'
x(activelements(1:sparsity)) = ...
sign(randn(sparsity,1)).*(randn(sparsity,1)+3);
case 'laplacian'
x(activelements(1:sparsity)) = ...
randlap([sparsity,1],10); % lambda = 10
case 'uniform'
x(activelements(1:sparsity)) = ...
sign(randn(sparsity,1)).*rand(sparsity,1);
case 'bimodaluniform'
x(activelements(1:sparsity)) = ...
sign(randn(sparsity,1)).*(1+rand(sparsity,1));
end


After much trial and error, I set $$\lambda = 0.4$$ for all these, which is probably not optimal over all distributions.

Yes, that is IHT in diamonds suffering far below OMP and $$\ell_1$$-minimization for five different sparse vector distributions (‘N’ is Normal, ‘L’ is zero mean Laplacian, ‘U’ is zero mean Uniform, ‘BG’ is bimodal Gaussian, and ‘BU’ is bimodal uniform).
It occurs to me here that because of its hard thresholding step, IHT can only recover sparse vectors that have all its active elements outside this very critical threshold value. So it makes sense to me that IHT only has some success with the bimodal distributions. Furthermore, I see that, from Fig. 1 of the preprint to the article I point to above, IHT does very poorly above a problem sparsity of 0.05.

Lo and behold! I believe I see an improved version here: T. Blumensath, M. E. Davies; “Normalised Iterative Hard Thresholding; guaranteed stability and performance” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 298-309, 2010.