Tropp’s More Useful Exact Recovery Condition

I have thus far reviewed two theorems from Tropp’s pièce de résistance “Greed is Good” (GiG)
one showing a sufficient condition for exact recovery from OMP, and the other showing the same sufficient condition for exact recovery from BP.
We cannot really use these theorems, however, to determine whether or not a given signal $$\vx$$ is recoverable using OMP or BP in some dictionary $$\mathcal{D} = \{ \vd_i : i \in \Omega\}$$.
To do so requires we already know the best set of functions in the dictionary for the signal.
However, Tropp finds the maximum signal sparsity $$s$$ to which we can push a dictionary
to ensure the “exact recovery condition” (ERC) is met — that is, OMP will recover the correct $$s$$ atoms in its first $$s$$ iterations.
Theorem 3.5 of GiG states,

OMP and BP will recover any signal $$s$$ sparse in the dictionary
$$\mathcal{D} = \{ \vd_i : i \in \Omega \}$$ if
$$\mu_\Sigma(s-1) + \mu_\Sigma(s) < 1$$
where
$$\mu_\Sigma(s) := \max_{| \Gamma | = s} \max_{\lambda \in \Omega \backslash \Gamma} \sum_{\gamma \in \Gamma} | \langle \vd_\lambda, \vd_\gamma \rangle |$$
is the cumulative coherence of the dictionary.

The cumulative coherence is a generalization of the coherence of a dictionary,
which is defined
$$\mu := \max_{j \ne k} | \langle \vd_{\omega_j}, \vd_{\omega_k} \rangle |$$
which tells us the worst case confusion in a dictionary.
When $$\mu \approx 1$$, then there are $$1$$-sparse signals
that may not be properly decomposed by OMP
because a wrong atom will be confused with the right atom.
Cumulative coherence generalizes this to $$s$$-sparse signals
by finding the $$s$$ atoms in the dictionary that are most similar to
all the others, and thinking pessimistically.

The proof of Theorem 3.5 is quite nice.
Though it starts from the optimal set of $$s$$ atoms — expressed in matrix form by $$\mathbf{\Phi}_\vx$$ —
it ends with only values dependent on the dictionary, i.e., the cumulative coherence.
Letting $$\Psi := \Omega \backslash \Lambda_\vx$$ be the indices of the atoms
not in $$\mathbf{\Phi}_\vx$$, we can bound the ERC norm:
\begin{align} \max_{i \in \Psi} ||\mathbf{\Phi}_\vx^\dagger \vd_i ||_1 & = \max_{i \in \Psi} ||( \mathbf{\Phi}_\vx^H \mathbf{\Phi}_\vx)^{-1} \mathbf{\Phi}_\vx^H \vd_i ||_1 \\ & \le ||( \mathbf{\Phi}_\vx^H \mathbf{\Phi}_\vx)^{-1} ||_1 \max_{i \in \Psi} || \mathbf{\Phi}_\vx^H \vd_i ||_1 \end{align}
where we can move the max because the first norm doesn’t depend on it.
From the definition of the cumulative coherence, we know
$$\max_{i \in \Psi} || \mathbf{\Phi}_\vx^H \vd_i ||_1 = \max_{i \in \Psi} \sum_{\lambda \in \Lambda_\vx} | \langle \vd_i, \vd_\lambda \rangle | \le \max_{| \Gamma | = s} \max_{i \in \Psi} \sum_{\gamma \in \Gamma} | \langle \vd_i, \vd_\gamma \rangle | = \mu_\Sigma(s).$$
That much is easy.

Now, notice that the Gramian $$\mathbf{\Phi}_\vx^H\mathbf{\Phi}_\vx$$ has ones on its diagonal. We can express this matrix then as $$\mathbf{\Phi}_\vx^H\mathbf{\Phi}_\vx = \MI + \MA$$, where $$\MI$$ is an $$s \times s$$ identity matrix.
A column of the second matrix is made of the inner products between one atom and $$s-1$$ others.
And if we take its induced $$\ell_1$$-norm,
we will have the largest absolute column sum.
In other words, we look for the column of $$s-1$$ atom inner products whose magnitudes sum to the largest value of all the columns.
The $$s-1$$ cumulative coherence puts a limit on that value, and so:
$$|| \MA ||_1 \le \mu_\Sigma(s-1).$$
If we assume, $$\mu_\Sigma(s-1) < 1$$, magic happens.
We know from geometric series that if $$|a| < 1$$ then
$$\sum_{k=0}^\infty a^k = (1 – a)^{-1}.$$
Neumann series generalizes that to matrices.
If $$||\MA||_1 < 1$$, then
$$\sum_{k=0}^\infty \MA^k = (\MI – \MA)^{-1}.$$
Thus,
\begin{align} ||( \mathbf{\Phi}_\vx^H \mathbf{\Phi}_\vx)^{-1} ||_1 & = || (\MI + \MA )^{-1} ||_1 \\ & = \left | \left | \sum_{k=0}^\infty (-\MA)^k \right | \right |_1. \end{align}
Since the norm of a sum is at most the sum of the norms
\begin{align} \left | \left | \sum_{k=0}^\infty (-\MA)^k \right | \right |_1 & \le \sum_{k=0}^\infty || \MA ||_1^k \\ & = \frac{1}{1 – || \MA ||_1} \\ & \le \frac{1}{1 – \mu_\Sigma(s-1)} \end{align}
from the inequality above.

Putting all this together with the ERC
$$\max_{i \in \Psi} ||\mathbf{\Phi}_\vx^\dagger \vd_i ||_1 \le ||( \mathbf{\Phi}_\vx^H \mathbf{\Phi}_\vx)^{-1} ||_1 \max_{i \in \Psi} || \mathbf{\Phi}_\vx^H \vd_i ||_1 \le \frac{\mu_\Sigma(s)}{1 – \mu_\Sigma(s-1)} < 1$$
and thus we have reached Tropp's excellent result,
which depends completely on the dictionary, and with the assumption that $$\mu_\Sigma(s-1) < 1$$.
Computing cumulative coherences, however, might be harder than determining the sparsest representation of a signal.