Today, I am looking at Theorem 3.1 in J. Tropp, “Greed is good: Algorithmic results for sparse approximation,” IEEE Trans. Info. Theory, vol. 50, pp. 2231-2242, Oct. 2004. (I see he did this when he was a student member of IEEE!)

Consider we have some signal \(\vx \in \mathcal{C}^N\),

and a dictionary of functions \(\mathcal{D} = \{ \vd_i \in \mathcal{C}^N : i \in \Omega \}\).

Assume that \(\mathcal{D}\) contains a subset of functions indexed by \(\Lambda_\vx\) such that \(\vx\) has an exact \(m\)-sparse solution

$$

\vx = \sum_{\lambda \in \Lambda_\vx} s_\lambda \vd_\lambda = \mathbf{\Phi}_\vx \vs

$$

where all \(m\) coefficients of \(\vs\) are non-zero, and the set of functions \(\{\vd_\lambda : \lambda \in \Lambda_\vx\}\) are linearly independent, i.e., \(\mathbf{\Phi}_\vx\) has full rank.

Furthermore, assume \(m\) is as small as it can go for this pair of signal and dictionary.

From this, define the indices of the unused atoms as \( \Psi := \Omega \backslash \Lambda_\vx\), and make those atoms columns in the matrix \(\mathbf{\Phi}_\Psi\).

Tropp’s theorem states the following:

A sufficient condition for orthogonal matching pursuit (OMP) to recover the sparsest representation of \(\vx\) in \(\mathcal{D}\) is that

$$

\max_{i \in \Psi} ||\mathbf{\Phi}_\vx^\dagger \vd_i ||_1 < 1

$$

where \(\mathbf{\Phi}_\vx^\dagger := ( \mathbf{\Phi}_\vx^H \mathbf{\Phi}_\vx)^{-1} \mathbf{\Phi}_\vx^H\) is the pseudoinverse, and \(^H\) denotes conjugate transpose. (Here there is a typo in Tropp's text.)

This is to say, if the atoms not used in the sparsest representation of a signal do not project much on span of the atoms that are used, then OMP will recover its sparsest representation. Tropp calls this the “Exact Recovery Condition (ERC)”, and colorfully says, “it guarantees that no spurious atom can masquerade as part of the signal well enough to fool OMP.”

Now, let’s look at the proof.

Assume OMP finds as its first atom one of those with an index in \(\Lambda_\vx\).

By definition (of the OMP selection criterion), this is the atom that points most in the direction of \(\vx\), and leaves a residual \(\vr_1\) that is perpendicular to the span of that atom.

We can then evaluate the magnitudes of \(\mathbf{\Phi}_\vx^H \vr_1\), which are the lengths of the projections of \(\vr_1\) onto the \(m\) best atoms.

Since OMP selects the atom that has the largest length projection onto the current residual,

we want the largest magnitude element of \(\mathbf{\Phi}_\vx^H \vr_1\) to be larger than the largest magnitude element of \(\mathbf{\Phi}_\Psi^H \vr_1\).

Generalizing this,

assume OMP has selected \(k < m\) atoms all with indices in \(\Lambda_\vx\),

and thus producing a residual \(\vr_k\) perpendicular to the span of this set of atoms,

the index of the next atom OMP selects will surely be in \(\Lambda_\vx\) if and only if

$$

\frac{|| \mathbf{\Phi}_\Psi^H \vr_k ||_\infty}{|| \mathbf{\Phi}_\vx^H \vr_k ||_\infty} < 1.

$$

We could stop there, but what is the relationship between the atoms indexed by \(\Lambda_\vx\) and those by \(\Psi\)?

Considering that \(\vx\) has an exact \(m\)-sparse representation in the atoms indexed by \(\Lambda_\vx\), and the indices of the first \(k < m\) atoms found by OMP are in \(\Lambda_\vx\), this means by definition that \(\vr_k\), and every other residual created by OMP, is in the span of the set of atoms indexed by \(\Lambda_\vx\).

Thus, we can say \(\vr_k = \mathbf{\Phi}_\vx (\mathbf{\Phi}_\vx^H\mathbf{\Phi}_\vx )^{-1} \mathbf{\Phi}_\vx^H \vr_k = (\mathbf{\Phi}_\vx^\dagger)^H\mathbf{\Phi}_\vx^H \vr_k\),

since the Gramian and its inverse are Hermitian.

Defining \(\vy := \mathbf{\Phi}_\vx^H \vr_k\), the quotient above becomes

$$

\frac{|| \mathbf{\Phi}_\Psi^H (\mathbf{\Phi}_\vx^\dagger)^H \vy ||_\infty}{|| \vy ||_\infty}

$$

which Tropp recognizes as similar to an induced matrix norm, and applies a norm bound.

The induced \(p\)-norm of a matrix \(\MA \in \mathcal{C}^{m \times N}\) is defined

$$

|| \MA ||_p := \max_{\vx \in \mathcal{C}^N : || \vx || > 0} \frac{|| \MA \vx ||_p}{||\vx||_p}.

$$

The \(\infty\)-norm of a matrix is its largest absolute sum of its columns;

the \(1\)-norm is its largest absolute sum of its rows.

From the definition, for any vector in \(\mathcal{C}^N\) acted on by \(\MA\), the following relationship must hold:

$$

|| \MA \vx ||_p \le || \MA ||_p ||\vx||_p

$$

i.e., the \(p\)-norm is bounded from above.

With this, we can say

$$

\frac{|| \mathbf{\Phi}_\Psi^H (\mathbf{\Phi}_\vx^\dagger)^H \vy ||_\infty}{|| \vy ||_\infty} \le

\frac{|| \mathbf{\Phi}_\Psi^H (\mathbf{\Phi}_\vx^\dagger)^H ||_\infty || \vy ||_\infty}{|| \vy ||_\infty} = || \mathbf{\Phi}_\Psi^H (\mathbf{\Phi}_\vx^\dagger)^H ||_\infty

$$

With the correspondence between the infinity- and 1-norm of a matrix being just a conjugate transpose different (columns become rows and vice versa), the ERC becomes

$$

\frac{|| \mathbf{\Phi}_\Psi^H \vr_k ||_\infty}{|| \mathbf{\Phi}_\vx^H \vr_k ||_\infty} \le

|| \mathbf{\Phi}_\vx^\dagger \mathbf{\Phi}_\Psi ||_1 = \max_{i \in \Psi} || \mathbf{\Phi}_\vx^\dagger \vd_i ||_1 < 1.

$$

If this relationship holds, along with the exact \(m\)-sparse assumption, then OMP is guaranteed to recover all atoms from the signal,

and the norm residual at the end will be zero.

The ERC could also be put in terms of the 2-norm

using the inequality \(|| \vx ||_1 \le \sqrt{N} || \vx ||_2\):

$$

\max_{i \in \Psi} || \mathbf{\Phi}_\vx^\dagger \vd_i ||_2 < 1/\sqrt{N}

$$

but this probably even further reduces the space of signal representations that can be recovered by OMP.

Still, the ERC is a sufficient condition, but not necessary.

I am finding OMP is often quite good at recovering all the atoms in a representation, but I wonder to what degree the ERC is broken in these cases.

Because of its orthogonal projection step, OMP can and does correct bad choices — but certainly not always.

This observation makes me wonder if there is an “ERC-\(1\) condition,” meaning 1 of the atoms so far selected by OMP is wrong…