Algorithm Power Hour: Gradient Projection for Sparse Reconstruction (GPSR)

This algorithm power hour is devoted to Gradient Projection for Sparse Reconstruction (GPSR), presented in the paper: M. Figueiredo, R. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE J. Selected Topics in Signal Process., vol. 1, no. 4, pp. 586-597, 2007.


Given an \(s\)-sparse vector \(\vx \in \mathcal{R}^N\),
we sense it by a matrix \(\mathbf{\Phi} \in \mathcal{R}^{m\times N}\) with \(N > m\),
which produces the measurement vector \(\vu = \mathbf{\Phi}\vx\).
GPSR thus aims to solve the convex relaxation problem
$$
\min_{\vx} \frac{1}{2} || \vu - \mathbf{\Phi}\vx||_2^2 + \lambda ||\vx||_1.
$$
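To make the setup concrete, here is a small NumPy sketch that draws an \(s\)-sparse signal, senses it with a random Gaussian matrix, and evaluates this objective; the dimensions, the value of \(\lambda\), and the Gaussian sensing matrix are my own illustrative choices, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, s = 400, 100, 10      # ambient dimension, number of measurements, sparsity (my choices)
lam = 0.1                   # regularization weight lambda (my choice)

# an s-sparse signal with Normally distributed non-zero elements
x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)

Phi = rng.standard_normal((m, N)) / np.sqrt(m)    # random Gaussian sensing matrix
u = Phi @ x                                       # measurement vector

def objective(x_hat):
    """0.5 * ||u - Phi x||_2^2 + lambda * ||x||_1"""
    return 0.5 * np.sum((u - Phi @ x_hat) ** 2) + lam * np.sum(np.abs(x_hat))
```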
First, we turn this into a quadratic program
by separating the solution into its positive and negative parts,
$$
\vx = \vw - \vv, \quad \vw \succeq 0, \; \vv \succeq 0.
$$
Then we rewrite the problem as
$$
\min_{\vw,\vv} \frac{1}{2} || \vu - \mathbf{\Phi}(\vw - \vv)||_2^2 + \lambda \mathbf{1}^T (\vw + \vv) \; \text{subject to} \; \vw, \vv \succeq 0.
$$
Stacking \(\vz = [\vw^T \; \vv^T]^T\), we can expand this (dropping the constant \(\frac{1}{2}||\vu||_2^2\), which does not change the minimizer) and put it in the following very dapper form
$$
\min_{\vz} F(\vz) := \frac{1}{2} \vz^T\MB\vz + \vc^T \vz \; \text{subject to} \; \vz \succeq 0
$$
where
$$
\MB = \left [
\begin{array}{cc}
\MG & -\MG \\
-\MG & \MG
\end{array} \right ]
$$
with \(\MG = \mathbf{\Phi}^T\mathbf{\Phi}\) the Gramian of the dictionary,
and
$$
\vc = \lambda \mathbf{1}_{2N} +
\left [ \begin{array}{c}
-\mathbf{\Phi}^T\vu \\
\mathbf{\Phi}^T\vu
\end{array} \right ].
$$
This then is a “bound-constrained quadratic program”
solvable by iterative gradient projection algorithms, among others.
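Continuing the sketch above, one way to form \(\MB\), \(\vc\), and \(F(\vz)\) explicitly is below. Building the \(2N \times 2N\) matrix \(\MB\) is only reasonable for small problems like this one; a serious implementation would instead apply \(\mathbf{\Phi}\) and \(\mathbf{\Phi}^T\) on the fly, which is all GPSR actually requires.

```python
G = Phi.T @ Phi                                   # Gramian of the dictionary (N x N)
B = np.block([[G, -G], [-G, G]])                  # 2N x 2N; only sensible for small N
c = lam * np.ones(2 * N) + np.concatenate([-Phi.T @ u, Phi.T @ u])

def split_pos_neg(x_hat):
    """x = w - v with w, v >= 0 (positive and negative parts)."""
    return np.maximum(x_hat, 0), np.maximum(-x_hat, 0)

def F(z):
    """The QP cost; equals the original objective minus the constant 0.5*||u||_2^2."""
    return 0.5 * z @ (B @ z) + c @ z

def gradF(z):
    return B @ z + c

# sanity check: the two objectives agree up to the dropped constant
w, v = split_pos_neg(x)
z0 = np.concatenate([w, v])
assert np.isclose(F(z0) + 0.5 * np.sum(u ** 2), objective(x))
```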

Given a solution at iteration \(k\), \(\vz_k\),
GPSR chooses a stepsize parameter \(\alpha_k > 0\) and forms
$$
\vq = T_+(\vz_k - \alpha_k \nabla F(\vz_k) )
$$
where \(T_+(\vy)\) is the projection onto the non-negative orthant, i.e., it passes all non-negative values of \(\vy\) and sets the rest to zero,
and \(\nabla F(\vz_k)\) is the vector of partial derivatives of the cost function, i.e., its gradient, which points in the direction of steepest ascent.
Thus, \(- \nabla F(\vz_k) \) points in the direction of steepest descent,
and adding a bit of it to our current solution pushes it in that direction.
GPSR chooses a weight \(\mu_k \in [0,1] \) and updates the solution
$$
\vz_{k+1} = \vz_k + \mu_k (\vq - \vz_k).
$$
Now we see why we need the threshold above: without it, some elements of \(\vz_{k+1}\) could go negative;
with it, \(\vz_{k+1}\) is a convex combination of the two non-negative vectors \(\vz_k\) and \(\vq\), and so stays feasible.
GPSR continues this until some convergence criterion is satisfied.
The secret is how to choose the stepsize \(\alpha_k\) and weight \(\mu_k\) at each iteration.
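Here is a minimal sketch of this generic projected-gradient update, reusing F and gradF from the sketch above; the particular stepsize and weight in the usage line are arbitrary placeholders of mine.

```python
def T_plus(y):
    """Projection onto the non-negative orthant: keep non-negative entries, zero out the rest."""
    return np.maximum(y, 0.0)

def gp_iteration(z, alpha, mu):
    """One generic gradient-projection update:
    q = T_+(z - alpha * gradF(z)),  then  z <- z + mu * (q - z)."""
    q = T_plus(z - alpha * gradF(z))
    return z + mu * (q - z)

# e.g. one step from the all-zero starting point (alpha and mu chosen arbitrarily)
z = gp_iteration(np.zeros(2 * N), alpha=1.0, mu=1.0)
```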

In the basic formulation of GPSR,
an initial stepsize is found by solving
$$
\alpha' = \arg \min_\alpha F(\vz_k - \alpha \vg)
$$
where for \(i = 1, 2, \ldots, 2N\)
$$
[\vg]_i = \begin{cases}
[\nabla F(\vz_k)]_i, & [\vz_k]_i > 0 \;\textrm{or} \; [\nabla F(\vz_k)]_i < 0 \\
0, & \textrm{else}.
\end{cases}
$$
a vector that keeps only the components of the gradient along which the solution can actually move: when \([\vz_k]_i = 0\) and \([\nabla F(\vz_k)]_i \ge 0\), the gradient step cannot move that element off the constraint boundary, so the corresponding component is zeroed out. Since \(F\) is quadratic, this line search has the closed-form solution \(\alpha' = \vg^T\vg / (\vg^T\MB\vg)\).
To keep the algorithm stable,
basic GPSR (another flavor is presented in their paper, and available in their code) reins in potentially unproductive or egregious stepsizes
by clipping \(\alpha'\) to an interval \([\alpha_{\min}, \alpha_{\max}]\), i.e., by selecting the middle value of the set \(\{\alpha_{\min}, \alpha', \alpha_{\max}\}\).
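Continuing the sketch, computing \(\vg\) and this clipped initial stepsize might look as follows; because \(F\) is quadratic, the exact minimizer along \(-\vg\) is \(\vg^T\vg/(\vg^T\MB\vg)\). The values of \(\alpha_{\min}\) and \(\alpha_{\max}\) here are placeholders of mine, not the defaults of the authors' code.

```python
def feasible_gradient(z):
    """Keep gradient components only where the element can actually move:
    [g]_i = [gradF(z)]_i if z_i > 0 or [gradF(z)]_i < 0, else 0."""
    grad = gradF(z)
    g = np.where((z > 0) | (grad < 0), grad, 0.0)
    return grad, g

def initial_stepsize(z, alpha_min=1e-30, alpha_max=1e30):
    """Exact minimizer of the quadratic F along -g, clipped to [alpha_min, alpha_max]."""
    _, g = feasible_gradient(z)
    curvature = g @ (B @ g)
    alpha = (g @ g) / curvature if curvature > 0 else alpha_max
    return float(np.clip(alpha, alpha_min, alpha_max))
```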
Denoting that middle value \(\alpha\), GPSR selects some \(\beta \in (0,1)\), and constructs a series \(\mathcal{A} = \{\alpha\beta^i \}_{i = 0, 1, \ldots }\).
Then, GPSR selects a weight \(\mu \in (0,0.5)\),
and produces \(\alpha_k\) by
$$
\begin{multline}
\alpha_k = \max \alpha \in \mathcal{A} \; \textrm{such that} \\
F\left (T_+[\vz_k - \alpha\nabla F(\vz_k)] \right ) \le
F(\vz_k) - \mu [\nabla F(\vz_k)]^T\left( \vz_k - T_+[ \vz_k - \alpha\nabla F(\vz_k) ] \right).
\end{multline}
$$
Finally, GPSR updates the solution by
$$
\vz_{k+1} = T_+(\vz_k - \alpha_k \nabla F(\vz_k)).
$$
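Putting the pieces together, a rough sketch of one GPSR-Basic iteration and a small driver loop is below; the values of \(\beta\), \(\mu\), the backtracking limit, the iteration count, and the all-zero initialization are my own choices, and this is only an illustration of the recipe above, not the authors' implementation.

```python
def gpsr_basic_iteration(z, beta=0.5, mu=0.1, max_backtracks=50):
    """Backtracking search over A = {alpha * beta^i}, then z <- T_+(z - alpha_k * gradF(z))."""
    grad = gradF(z)
    Fz = F(z)
    alpha = initial_stepsize(z)
    for _ in range(max_backtracks):
        z_new = T_plus(z - alpha * grad)
        # sufficient-decrease (Armijo-like) condition from the text
        if F(z_new) <= Fz - mu * (grad @ (z - z_new)):
            break
        alpha *= beta
    return z_new

# a small driver: start from zero, iterate, then recombine x = w - v
z = np.zeros(2 * N)
for _ in range(200):
    z = gpsr_basic_iteration(z)
x_hat = z[:N] - z[N:]
print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```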
So, in the end, we see that GPSR is nothing but a computationally light iterative approach built on the same \(\ell_1\) convex relaxation principle that underlies basis pursuit, here in its penalized (basis pursuit denoising) form.

Below we see the empirical phase transition of GPSR compared with fourteen other methods.
I average the results over 50 independent trials, for sparse vectors whose non-zero elements are Normally distributed. The ambient dimension is \(N=400\).
I expect GPSR to equal the performance of BP, but with less computation time.
Instead, it is acting curiously.
Taking more measurements of these sparse signals is not helping.
This could be due to my choice of settings — better to stick with the defaults?
And look at CoSaMP!
What is going on there?
SP doesn’t appear to suffer from this problem,
and it is beating Maleki and Donoho’s Recommended Two-stage thresholding.
I am looking into these items now.

[Figure: empirical phase transitions of the tested algorithms for sparse vectors with Normally distributed non-zero elements (normal_phasevsdistSupp.png).]


It doesn’t appear to be caused by the variables that I pass to their code.
Turning “verbose” on, I see
it is finding some solution where the error in the measurements is on the order of 1e-5,
but for this 1-sparse signal, GPSR finds a 59-sparse signal. (And that is with “debiasing”.) If I don’t debias, GPSR fails.
When I try the “BB” (Barzilai-Borwein) flavor instead, it does even worse.
Hmmm…


One thought on “Algorithm Power Hour: Gradient Projection for Sparse Reconstruction (GPSR)”

  1. I’ve had very similar problems with both flavors of the GPSR code. To date, I haven’t found much of a way around it without having to sit down and dig through the internals, which I generally try not to do unless I’m developing some kind of extension.
    So, I mostly just use GPSR when I need a quick back of the envelope/scratch pad recovery.

