This algorithm power hour is devoted to Gradient Projection for Sparse Reconstruction (GPSR), presented in the paper: M. Figueiredo, R. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE J. Selected Topics in Signal Process., vol. 1, no. 4, pp. 586-597, 2007.

Given an \(s\)-sparse vector \(\vx \in \mathcal{R}^N\),

we sense it by a matrix \(\mathbf{\Phi} \in \mathcal{R}^{m\times N}\) with \(N > m\),

which produces the measurement vector \(\vu = \mathbf{\Phi}\vx\).
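In numpy, this sensing setup might look like the following sketch (the dimensions and the Gaussian choice of \(\mathbf{\Phi}\) are my own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, s = 400, 100, 10                # ambient dimension, measurements, sparsity

# s-sparse vector with Normally distributed non-zero entries
x = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x[support] = rng.standard_normal(s)

# Gaussian sensing matrix and the measurement vector
Phi = rng.standard_normal((m, N)) / np.sqrt(m)
u = Phi @ x
```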

GPSR thus aims to solve the convex relaxation problem

$$

\min_{\vx} \frac{1}{2} || \vu - \mathbf{\Phi}\vx||_2^2 + \lambda ||\vx||_1.

$$

First, we turn this into the quadratic program

by separating the solution into its positive and negative parts,

$$

\vx = \vw - \vv, \quad \vw, \vv \succeq 0.

$$
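The split is mechanical, and note that it also turns the \(\ell_1\) norm into the linear term \(\mathbf{1}^T(\vw + \vv)\); a quick numpy check:

```python
import numpy as np

x = np.array([1.5, -2.0, 0.0, 0.3])
w = np.maximum(x, 0.0)    # positive part
v = np.maximum(-x, 0.0)   # negative part

# x is recovered as w - v, and ||x||_1 equals 1^T (w + v)
assert np.allclose(x, w - v)
assert np.isclose(np.abs(x).sum(), (w + v).sum())
```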

Then we rewrite the problem as

$$

\min_{\vw,\vv} \frac{1}{2} || \vu - \mathbf{\Phi}(\vw - \vv)||_2^2 + \lambda \mathbf{1}^T (\vw + \vv) \; \text{subject to} \; \vw, \vv \succeq 0.

$$

We can expand this and put it in the following very dapper form

$$

\min_{\vz} F(\vz) := \frac{1}{2} \vz^T\MB\vz + \vc^T \vz \; \text{subject to} \; \vz \succeq 0

$$

where (dropping the constant term \(\frac{1}{2}||\vu||_2^2\))

$$

\MB = \left [

\begin{array}{cc}

\MG & -\MG \\

-\MG & \MG

\end{array} \right ]

$$

with \(\MG = \mathbf{\Phi}^T\mathbf{\Phi}\) the Gramian of the dictionary,

and

$$

\vc = \lambda \mathbf{1}_{2N} +

\left [ \begin{array}{c}

-\mathbf{\Phi}^T\vu \\

\mathbf{\Phi}^T\vu

\end{array} \right ]

$$
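Assembling \(\MB\) and \(\vc\) is direct; here is a small sketch (the sizes, \(\lambda\), and random data are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, lam = 40, 20, 0.1
Phi = rng.standard_normal((m, N)) / np.sqrt(m)
u = rng.standard_normal(m)

G = Phi.T @ Phi                                   # Gramian of the dictionary, N x N
B = np.block([[G, -G], [-G, G]])                  # 2N x 2N
c = lam * np.ones(2 * N) + np.concatenate([-Phi.T @ u, Phi.T @ u])

def F(z):
    # F(z) = 0.5 z^T B z + c^T z, the objective minus the constant 0.5 ||u||^2
    return 0.5 * z @ B @ z + c @ z
```

As a sanity check, for any feasible \(\vz = [\vw; \vv]\), \(F(\vz) + \frac{1}{2}||\vu||_2^2\) should equal the original objective \(\frac{1}{2}||\vu - \mathbf{\Phi}(\vw - \vv)||_2^2 + \lambda\mathbf{1}^T(\vw + \vv)\).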

This then is a “bound-constrained quadratic program”

solvable by iterative gradient projection algorithms, among others.

Given a solution at iteration \(k\), \(\vz_k\),

GPSR chooses a stepsize parameter \(\alpha_k > 0\) and forms

$$

\vq = T_+(\vz_k - \alpha_k \nabla F(\vz_k) )

$$

where \(T_+(\vy)\) is the projection onto the non-negative orthant, which lets pass all non-negative values

and sets the rest to zero,

and \(\nabla F(\vz_k)\) is the vector of partial derivatives of the cost function, i.e., its gradient, which points in the direction of steepest ascent.

Thus, \(- \nabla F(\vz_k) \) points in the direction of steepest descent,

and adding a bit of it to our current solution pushes it in that direction.

GPSR chooses a weight \(\mu_k \in [0,1] \) and updates the solution

$$

\vz_{k+1} = \vz_k + \mu_k (\vq - \vz_k).

$$

Now we see why we need the threshold above: without it,

some elements of \(\vz_{k+1}\) could become negative, violating the constraint.

GPSR continues this until some convergence criterion is satisfied.
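A toy end-to-end run of this iteration, with the stepsize and weight held fixed for simplicity (all sizes and constants here are my own illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, lam = 40, 20, 0.05
Phi = rng.standard_normal((m, N)) / np.sqrt(m)
x_true = np.zeros(N)
x_true[[3, 17]] = [1.0, -0.5]                 # a 2-sparse test signal
u = Phi @ x_true

G = Phi.T @ Phi
B = np.block([[G, -G], [-G, G]])
c = lam * np.ones(2 * N) + np.concatenate([-Phi.T @ u, Phi.T @ u])
F = lambda z: 0.5 * z @ B @ z + c @ z
gradF = lambda z: B @ z + c

z = np.zeros(2 * N)
alpha = 1.0 / np.linalg.norm(B, 2)            # safely below 2/L for this quadratic
mu = 1.0
for _ in range(500):
    q = np.maximum(z - alpha * gradF(z), 0.0)  # T_+ clips negative entries
    z = z + mu * (q - z)

x_hat = z[:N] - z[N:]                          # recover x = w - v
```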

The secret is how to choose the stepsize \(\alpha_k\) and weight \(\mu_k\) at each iteration.

In the basic formulation of GPSR,

the stepsize is guessed by

solving

$$

\alpha' = \arg \min_\alpha F(\vz_k - \alpha \vg)

$$

where for \(i = 1, 2, \ldots, 2N\)

$$

[\vg]_i = \begin{cases}

[\nabla F(\vz_k)]_i, & [\vz_k]_i > 0 \;\textrm{or} \; [\nabla F(\vz_k)]_i < 0 \\

0, & \textrm{else}.

\end{cases}

$$

a vector that indicates the favorable directions, those where there is some hope of zeroing out elements of the solution.
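Because \(F\) is quadratic, this one-dimensional minimization has a closed form, \(\alpha' = (\vg^T\vg)/(\vg^T\MB\vg)\); a sketch (the function name and fallback value are my own):

```python
import numpy as np

def initial_stepsize(z, B, c):
    # Closed-form minimizer of the quadratic F(z - alpha * g):
    #   alpha' = (g^T g) / (g^T B g)
    grad = B @ z + c
    # mask the gradient to the favorable directions described above
    g = np.where((z > 0) | (grad < 0), grad, 0.0)
    curvature = g @ B @ g
    if curvature <= 0.0:          # degenerate direction: fall back arbitrarily
        return 1.0
    return (g @ g) / curvature
```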

To keep the algorithm stable,

basic GPSR (another flavor is presented in their paper, and available in their code) reins in potentially unproductive or egregious stepsizes

by selecting the middle value of the set \(\{\alpha_{\min}, \alpha', \alpha_{\max}\}\).

Denoting that middle value \(\alpha\), GPSR selects some \(\beta \in (0,1)\), and constructs the sequence \(\mathcal{A} = \{\alpha\beta^i \}_{i = 0, 1, \ldots }\).

Then, GPSR selects a weight \(\mu \in (0,0.5)\),

and produces \(\alpha_k\) by

$$

\begin{multline}

\alpha_k = \max \alpha \in \mathcal{A} \; \textrm{such that} \\

F\left (T_+[\vz_k - \alpha\nabla F(\vz_k)] \right ) \le

F(\vz_k) - \mu [\nabla F(\vz_k)]^T\left( \vz_k - T_+[ \vz_k - \alpha\nabla F(\vz_k) ] \right).

\end{multline}

$$
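This backtracking search can be sketched as follows (the function name and default parameter values are my own):

```python
import numpy as np

def backtracking_stepsize(F, gradF, z, alpha, beta=0.5, mu=0.1, max_tries=50):
    # Try alpha, alpha*beta, alpha*beta**2, ... and accept the first
    # (hence largest) value satisfying the sufficient-decrease condition.
    g = gradF(z)
    Fz = F(z)
    for _ in range(max_tries):
        q = np.maximum(z - alpha * g, 0.0)        # T_+ projection
        if F(q) <= Fz - mu * g @ (z - q):
            return alpha
        alpha *= beta
    return alpha
```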

Finally, GPSR updates the solution by

$$

\vz_{k+1} = T_+(\vz_k - \alpha_k \nabla F(\vz_k)).

$$

So, in the end, we see that GPSR is nothing but a computationally light iterative approach to solving the convex relaxation at the heart of basis pursuit, here in its penalized (basis pursuit denoising) form.

Below we see the empirical phase transition of GPSR compared with fourteen other methods.

I average the results over 50 independent trials, for sparse vectors with Normally distributed non-zero elements. The ambient dimension is \(N=400\).

I expect GPSR to equal the performance of BP, but with less computation time.

Instead, it is acting curiously.

Taking more measurements of the sparse signals is not helping.

This could be due to my choice of settings — better to stick with the defaults?

And look at CoSaMP!

What is going on there?

SP doesn’t appear to suffer from this problem,

and it is beating Maleki and Donoho’s Recommended Two-stage thresholding.

I am looking into these items now.

It doesn’t appear to be anything to do with the variables that I pass to their code.

Turning “verbose” on, I see

it is finding some solution where the error in the measurements is on the order of 1e-5,

but for this 1-sparse signal, GPSR finds a 59-sparse signal. (And that is with “debiasing”.) If I don’t debias, GPSR fails.

When I try instead the “BB” flavor, it does even worse.

Hmmm…

I’ve had very similar problems with both flavors of the GPSR code. To date, I haven’t found much of a way around it without having to sit down and dig through the internals, which I generally try not to do unless I’m developing some kind of extension.

So, I mostly just use GPSR when I need a quick back of the envelope/scratch pad recovery.
