Hello, and welcome to Paper of the Day (Po’D): Complementary Matching Pursuit Edition. Today’s papers are: G. Rath and C. Guillemot, “A complementary matching pursuit algorithm for sparse approximation,” in Proc. European Signal Process. Conf., (Lausanne, Switzerland), Aug. 2008; and G. Rath and C. Guillemot, “On a simple derivation of the complementary matching pursuit,” Signal Processing, vol. 90, pp. 702-706, Feb. 2010.

There also appears to be a paper submitted: “Complementary Matching Pursuit Algorithms for Sparse Approximation.”

On Friday we reviewed the problem of finding the sparsest solution to an underdetermined system of equations, and saw that it can be cast in a subtly different form: instead of looking for the sparsest solution starting from nothing, we can look for it starting from the solution with the smallest \(\ell_2\)-norm of all possible solutions, i.e., $$\vs^* := \MD^T(\MD\MD^T)^{-1}\vx$$ i.i.e., the solution that points in no way into the nullspace of the dictionary, i.i.i.e., the Jean-Claude Van Damme of solutions to $$\vx = \MD\vs$$ where the dictionary is fat yet full rank.

As described earlier, \(\vs^*\) will probably not be sparse.
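To make the Jean-Claude Van Damme solution concrete, here is a minimal numpy sketch (the dimensions and sparsity pattern are my own toy choices, not from the papers) showing that \(\vs^*\) solves \(\vx = \MD\vs\) exactly and has the smallest \(\ell_2\)-norm among all solutions, yet is generically fully dense:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 8, 32                       # K < N: fat, full-rank dictionary
D = rng.standard_normal((K, N))

# a genuinely sparse coefficient vector and the signal it generates
s_true = np.zeros(N)
s_true[[3, 17, 29]] = rng.standard_normal(3)
x = D @ s_true

# minimum l2-norm solution: s* = D^T (D D^T)^{-1} x
s_star = D.T @ np.linalg.solve(D @ D.T, x)

print(np.allclose(D @ s_star, x))                        # True: s* solves x = D s
print(np.linalg.norm(s_star) <= np.linalg.norm(s_true))  # True: smallest l2-norm
print(np.count_nonzero(np.abs(s_star) > 1e-10))          # typically all N entries nonzero
```

The last line is the point: even though a 3-sparse solution exists, the minimum-norm solution spreads its energy over essentially every coefficient.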

Thus, we will attempt to sparsify it using a greedy approach.

The complementary matching pursuit (CMP) proposes to sparsify it using MP and a special dictionary.

Thus, we move from performing MP in a \(K\)-dimensional space, to an \(N\)-dimensional space.

Considering our original model \(\vx = \MD\vs\), if we premultiply everything by the transformation \(\MD^T(\MD\MD^T)^{-1}\),

$$\MD^T(\MD\MD^T)^{-1}\vx = \vs^* = \MD^T(\MD\MD^T)^{-1} \MD\vs,$$

and define \(\MD' := \MD^T(\MD\MD^T)^{-1} \MD\), then we end up with something familiar:

$$\vs^* = \MD' \vs.$$
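As a quick sanity check (again with my own toy dimensions), \(\MD'\) is just the orthogonal projector onto the row space of \(\MD\), and \(\vs^* = \MD'\vs\) holds for any solution \(\vs\):

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 8, 32
D = rng.standard_normal((K, N))

s = np.zeros(N)
s[[2, 11, 25]] = rng.standard_normal(3)
x = D @ s

# D' = D^T (D D^T)^{-1} D
Dp = D.T @ np.linalg.solve(D @ D.T, D)

# the minimum l2-norm solution from before
s_star = D.T @ np.linalg.solve(D @ D.T, x)

print(np.allclose(Dp @ s, s_star))   # True: s* = D' s
print(np.allclose(Dp @ Dp, Dp))      # True: D' is idempotent, i.e., a projector
```

Since \(\MD'\) is a rank-\(K\) projector, its \(N\) columns are not unit norm, which is why the atom selection in CMP normalizes by \(||\vd_n'||\).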

Instead of sparsifying \(\vx\), we now want to sparsify \(\vs^*\).

CMP thus proceeds as MP using the non-unit-norm dictionary of \(N\) atoms in \(\MD'\), starting with \(\vs^*\), i.e.,

- Set \(\vr(0) := \vs^*\).
- Find the atom according to \(\max_{\vd_n' \in \MD'} \frac{|\langle \vr(0), \vd_n'\rangle|}{||\vd_n'||}\).
- Set \(\vr(1) := \vr(0) - \frac{\langle \vr(0), \vd_n'\rangle}{||\vd_n'||^2}\vd_n'\).
- Continue with \(\vr(1)\) until some stopping criterion is met.

(I think, but that does not sit well. Considering these three papers together has me befuddled.)
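For what it's worth, here is my attempt to run those steps as plain MP on the columns of \(\MD'\). This is a sketch only, with made-up dimensions, a fixed number of iterations instead of a proper stopping criterion, and no claim that it matches the authors' exact formulation:

```python
import numpy as np

def cmp_sketch(D, x, n_iter=3):
    """MP on the columns of D' = D^T (D D^T)^{-1} D, started from s*."""
    Dp = D.T @ np.linalg.solve(D @ D.T, D)    # the N non-unit-norm atoms
    norms = np.linalg.norm(Dp, axis=0)
    r = D.T @ np.linalg.solve(D @ D.T, x)     # r(0) = s*
    s_hat = np.zeros(D.shape[1])
    for _ in range(n_iter):
        c = Dp.T @ r                          # correlations <r(k), d'_n>
        n = int(np.argmax(np.abs(c) / norms)) # largest normalized correlation
        coef = c[n] / norms[n] ** 2
        s_hat[n] += coef
        r = r - coef * Dp[:, n]               # r(k+1) = r(k) - (<r,d'>/||d'||^2) d'
    return s_hat, r

rng = np.random.default_rng(2)
K, N = 8, 32
D = rng.standard_normal((K, N))
s_true = np.zeros(N)
s_true[[4, 9, 21]] = rng.standard_normal(3)
x = D @ s_true

s_hat, r = cmp_sketch(D, x)
print(np.count_nonzero(s_hat) <= 3)   # True: at most n_iter atoms selected
```

Note that each column of \(\MD'\) lies in the row space of \(\MD\), so the residual never leaves that space; whether that resolves my confusion above, I am not sure.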

Above is a figure summarizing MP and CMP, which I grabbed from their submitted article. Here, \(\MA\) is the dictionary, which acts as a projection from \(\mathcal{R}^N\) onto the signal space \(\mathcal{R}^K\); \(\vx^2\) is the Jean-Claude Van Damme solution; \(\vx_0\) is the sparsest solution; and \(\vb\) is the signal. MP starts from all zeros and adds atoms until we converge onto the signal. CMP starts with \(\vx^2\) added to the nullspace (I think), and removes elements until we arrive at a sparser solution (though I am confused about how we ever leave the solution space). Anyhow, the authors nicely summarize the difference between MP and CMP as follows:

- MP looks for a solution vector among the sparse vectors.
- CMP looks for a sparse solution vector among the solution vectors.

In their computer simulations, the authors show that CMP outperforms MP in a number of ways, though only using synthetically generated sparse vectors and small dictionaries. Also, we can see that their orthogonal extensions outperform orthogonal MP in these tests as well.

While CMP is an interesting idea, it has limited applicability to any sort of processing with massive dictionaries. First of all, just computing the Jean-Claude Van Damme solution for an audio signal and a dictionary of 100 million atoms is expensive; the problem then becomes finding any solution in the first place. Second, audio signals can be sparse, but they are often embedded in noise, so any solution we start with will be badly corrupted by its transformation through the pseudoinverse of the dictionary. This can completely ruin the excellent noise robustness of greedy pursuits. In such a case, we don’t want to start with a solution vector!