Yesterday, I looked at Tropp’s exact recovery condition (ERC) for orthogonal matching pursuit (OMP). Today, I continue with his extrapolation to basis pursuit (BP), otherwise known as the principle of relaxing the definition of strict sparsity by the \(\ell_1\)-norm.

Assume, as yesterday, we have a \(m\)-dimensional signal \(\vx\) with a unique and \(s\)-sparse representation in a dictionary \(\mathcal{D} = \{ \vd_i : i \in \Omega\} \) of \(N\) atoms, notated in matrix form as \(\MD\).

Also, all \(s\) atoms participating in this representation are the columns of the matrix \(\mathbf{\Phi}_\vx\);

and I notate the indices of the atoms not participating as \(\Psi\).

Assume for this signal \(s\) is as small as possible.

Theorem 3.3 in J. Tropp, “Greed is good: Algorithmic results for sparse approximation,” IEEE Trans. Info. Theory, vol. 50, pp. 2231-2242, Oct. 2004, states

The ERC

$$

\max_{i \in \Psi} ||\mathbf{\Phi}_\vx^\dagger \vd_i ||_1 < 1

$$

is a sufficient condition for recovering the sparsest representation of \(\vx\) from a dictionary \(\mathcal{D}\) by solving the problem

$$

\min_{\vs \in \mathcal{C}^N} || \vs ||_1 \; \textrm{subject to} \; \vx = \MD \vs.

$$

It is amazing the same condition that applies to OMP also applies to BP — but I bet one is sharper than the other. Now, let’s look at the proof.

We start by assuming \(\vx\) has another representation \(\vx = \mathbf{\Phi}_\vx’ \vs_\vx’\) different from the ideal one \(\vx = \mathbf{\Phi}_\vx \vs_\vx\) — which is not an assumption if \(\MD\) is full rank. And no trivial solutions here: all elements in \(\vs_\vx’\) are non-zero.

As long as the coefficients in \(\vs_\vx’\) are different from \(\vs_\vx\) (the vectors cannot even be the same dimension with our assumption of an optimal \(s\) sparse representation of \(\vx\)),

then \(\mathbf{\Phi}_\vx’\) has at least one column (atom), call it \(\vh\) that is not in \(\mathbf{\Phi}_\vx\) (for instance, one that is a linear combination of two in \(\mathbf{\Phi}_\vx\)).

Now, looking at the \(\ell_1\)-norm of the optimal solution,

and using the fact that \(\mathbf{\Phi}_\vx^\dagger \mathbf{\Phi}_\vx \vs_\vx = \vs_\vx\),

we see that

$$

\begin{align}

|| \vs_\vx ||_1 & = || \mathbf{\Phi}_\vx^\dagger \mathbf{\Phi}_\vx \vs_\vx ||_1 \\

& = || \mathbf{\Phi}_\vx^\dagger \vx ||_1 \\

& = || \mathbf{\Phi}_\vx^\dagger \mathbf{\Phi}_\vx’ \vs_\vx’ ||_1.

\end{align}

$$

We can bound this last expression. Assuming that all elements of the vector \(\vb\) are non-zero, and that each column of the matrix \(\MA\) has different \(\ell_1\) norms.

Then, we can say \(|| \MA\vb||_1 < ||\MA||_1 || \vb||_1\).

There is no equality here because we are assuming that all elements of \(\vb\) are non-zero,

and the columns of \(\MA\) are all different.

Applying this above, we find

$$

\begin{align}

|| \vs_\vx ||_1 & = || \mathbf{\Phi}_\vx^\dagger \mathbf{\Phi}_\vx' \vs_\vx' ||_1 \\

& < || \mathbf{\Phi}_\vx^\dagger \mathbf{\Phi}_\vx' ||_1 ||\vs_\vx' ||_1.

\end{align}

$$

As long as the ERC holds, we can bound \(|| \mathbf{\Phi}_\vx^\dagger \mathbf{\Phi}_\vx' ||_1 < 1\),

and finally \( || \vs_\vx ||_1 < ||\vs_\vx' ||_1\).

In other words, by minimizing the \(\ell_1\)-norm of the solution, BP will skip past any other solution as long as the ERC holds.