Consider the usual showbiz story of a short vector wanting to be much, much taller.
Our characters are: 1) a real \(K\)-dimensional signal \(\vx \in \mathcal{R}^K\);
2) a full rank fat dictionary of \(N > K\) real \(K\)-dimensional atoms \(\MD \in \mathcal{R}^{K\times N}\);
and 3) and some \(\vs \in \mathcal{R}^N\) such that $$\vx = \MD\vs.$$
To produce this show, there are an infinite number of possibilities to cast a solution \(\vs\) since \(N > K\);
but among these is “the one,” \(\vs_0\), the sparsest solution,
i.e., the one with the largest number of elements that are zero,
i.i.e., the solution that points into the fewest dimensions,
i.i.i.e., the Jack Nicholson of solutions.
We want to hire Jack Nicholson of course, but he is prohibitively expensive.
We cannot find “the one” directly because the computational complexity is combinatorial.
This is due to having to look at all \(n\)-tuples of the dictionary of \(N\) atoms.
An interesting question is:
Can we hire Jean-Claude Van Damme and transform him into Jack Nicholson?
Jack Nicholson is the sparsest solution to \(\vx = \MD\vs\). |
In other words, can we start with some non-sparse but low-cost solution,
and then whittle it down to the sparsest solution while
avoiding the combinatorial complexity of the direct approach?
Considering that we have found a low-cost but non-sparse solution,
we know there exists some \(\vp \in \mathcal{R}^N\) such that \( \vs_0 = \vs – \vp\).
So now we have to find a \(\vp\) that removes the bad elements
of Jean-Claude Van Damme, magnifies his good elements,
and leaves us with Jack Nicholson.
If \(||\vp|| = 0\), then obviously \(\vs = \vs_0\),
and we can safely remove the rubber face mask from Mr. Nicholson
as it is just a clever disguise.
Let us start with one special solution.
Since our dictionary has full-row rank,
any solution can be expressed as a linear combination of
the rows of the dictionary, i.e., for any \(\vs\) satisfying \(\vx = \MD \vs\),
we can always find a \(\vb \in \mathcal{R}^K\) such that \(\vs = \MD^T\vb\).
Substituting this into our model equation, \(\vx = \MD\vs = \MD\MD^T\vb\),
we see that we can solve for \(\vb\) by applying from the left
the inverse of the full-rank matrix product \(\MD \MD^T\),
and thus \(\vb = (\MD\MD^T)^{-1}\vx\), and finally,
$$\vs^* = \MD^T(\MD\MD^T)^{-1}\vx.$$
In fact, for any \(\vs\) satisfying \(\vx = \MD \vs\), when \(N > K\) there exists
an infinite number of \(\vb \in \mathcal{R}^K\) such that \(\vs = \MD^T\vb\);
but the particular \(\vb\) given by \((\MD\MD^T)^{-1}\vx\)
does not point at all in the left nullspace of \(\MD\),
and consequently has the smallest \(\ell_2\)-norm of all,
which thus means \(\vs^* \) — Jean-Claude Van Damme — has the
smallest norm of all possible solutions to \(\vx = \MD\vs\).
Jean-Claude Van Damme has the smallest norm of all possible solutions to \(\vx = \MD\vs\). |
Now, even though we have found the solution with the smallest \(\ell_2\)-norm,
Jean-Claude Van Damme is not always Jack Nicholson — and we might
safely say that Jean-Claude Van Damme is hardly ever Jack Nicholson.
This means, for those of you not versed in showbiz,
that \(\vs^*\) is not often the sparsest solution \(\vs_0\) as well.
(Such actors are trivial anyhow.)
As above, we know there exists some non-zero \(\vp \in \mathcal{R}^N\)
such that \( \vs_0 = \vs^* – \vp\);
and because of this, we see that \(\vx = \MD\vs_0 = \MD\vs^* – \MD\vp\).
This implies that \(\vp\) points entirely in the \((N-K)\)-dimensional nullspace of the dictionary
since \(\MD\vp = \mathbf{0}\).
Thus, using the minimum \(\ell_2\)-norm solution \(\vs^* \)
(which does not point one iota in the nullspace of \(\MD\)),
our task is now to find the vector in \(\text{span}\{ \vp \in \mathcal{R}^N : \MD\vp = \mathbf{0}, ||\vp|| > 0\}\),
that cancels the most elements from \(\vs^* \).
If we construct a new dictionary, \(\MJ\), of \((N-K)\) orthonormal vectors from
the nullspace of \(\MD\), then our problem can be stated as
$$\min_{\vp} || \vs^* – \vp ||_0 = \min_{\va} || \vs^* – \MJ\va ||_0.$$
Now, since \(\va \in \mathcal{R}^{(N-K)}\), we can find a solution
directly by testing \(n\)-tuples of the \((N-K)\) elements of \(\MJ\) that create
equivalent values in \(\vs^*\) for cancellation;
and while this appears less combinatorially complex than the direct approach with \(\MD\),
it is still prohibitively costly, especially considering that kids these days prefer \(N \gg K\).
Instead, we can take a greedy approach to solving the above problem,
which is the subject of two papers that I will review next week as Po’Ds:
- G. Rath and C. Guillemot, “A complementary matching pursuit algorithm for sparse approximation,” in Proc. European Signal Process. Conf., (Lausanne, Switzerland), Aug. 2008.
- G. Rath and C. Guillemot, “On a simple derivation of the complementary matching pursuit,” Signal Processing, vol. 90, pp. 702-706, Feb. 2010.