# Quick Cyclic Matching Pursuit, but not Practical

A long time ago, I noted a well-cited trick to making matching pursuit (MP) fast in theory, but not in practice when it comes to the large dictionaries common in audio signal processing.
The trick is calculating all inner products of the dictionary atoms $$\mathcal{D} = \{ \vd_i \}$$ with the original signal $$\vx$$, and then using simple recursive update formula to choose the next atom by the criterion:
$$\vh_{n+1} = \arg \max_{\vd \in \mathcal{D}} | \langle \vr(n+1), \vd \rangle |$$
where
\begin{align} \langle \vr(n+1), \vd \rangle & = \langle \vr(n), \vd \rangle – a_n \langle \vh_n, \vd \rangle \\ a_n & = \langle \vr(n), \vh_n \rangle \end{align}
where $$\vh_n \in \mathcal{D}$$ is the ($$n+1$$)th atom selected by MP (starting with $$\vh_0$$.)
Starting from the set $$\{\langle \vr(0), \vd_i \rangle : \vd_i \in \mathcal{D} \}$$, we can essentially read off the atoms and their weights from an MP decomposition without ever having to compute a residual again. The updates require $$| \mathcal{D} |$$ multiplications, and a search for the single largest magnitude value, which has complexity order of at most $$| \mathcal{D} |$$.
Thus, for $$s$$ iterations of MP, we are looking at a minimum complexity of order $$s | \mathcal{D} |$$
MPTK, the fastest implementation of MP in the world, has a complexity of $$s | \mathcal{D} | \log | \mathcal{D} |$$, because it does not use this impractical approach.

Now, I wonder, can we find such a theoretically fast but practically useless recursive update for cyclic MP (Po’D here and here.)
After all, CMP is just MP within MP, which is almost but not quite exactly like sparse approximation meets Inception, but without all the drama.
Consider that we have an $$n$$-th order model of $$\vx$$, making use of the $$n$$ atoms in $$\mathcal{H} = \{\vh_0, \vh_1, \ldots, \vh_{n-1} : \vh_i \in \mathcal{D} \}$$, each selected by MP.
By the recursive relations above, we also have the weights $$\{a_i : i = 0, 1, \ldots, n-1 \}$$,
and the sets of inner products $$\{\langle \vr(n), \vd_i \rangle : \vd_i \in \mathcal{D} \}$$,
and $$\{\langle \vd_i, \vd_j \rangle : \vd_i, \vd_j \in \mathcal{D} \}$$.

In the model refinement step,
CMP cycles through the atoms to refine each one.
Let’s do it in the order that they were added to the model, starting with $$\vh_0$$.
Thus, we replace this first atom using the criterion:
\begin{align} \vh_{0}^{(1)} &= \arg \max_{\vd \in \mathcal{D}} | \langle \vr(n) + a_0\vh_0, \vd \rangle | \\ & = \arg \max_{\vd \in \mathcal{D}} | \langle \vr(n), \vd \rangle + a_0 \langle \vh_0, \vd \rangle | \end{align}
All those values have been computed, and so we just need a multiply, and a search for the maximum.
Then we adjust the weight according to
$$a_0^{(1)} = \langle \vr(n) + a_0\vh_0, \vh_{0}^{(1)} \rangle = \langle \vr(n), \vh_{0}^{(1)} \rangle + a_0 \langle \vh_0, \vh_{0}^{(1)} \rangle$$
All those values have been computed, and we just need a single multiply.

Now we refine the second atom.
The criterion is now
\begin{align} \vh_{1}^{(1)} &= \arg \max_{\vd \in \mathcal{D}} | \langle \vr^{(1)}(n) + a_1\vh_1, \vd \rangle | \\ & = \arg \max_{\vd \in \mathcal{D}} | \langle \vr^{(1)}(n), \vd \rangle + a_1 \langle \vh_1, \vd \rangle |. \end{align}
The last two values have been computed, and so we just need to determine the first.
Since $$\vr^{(1)}(n) = \vr(n) + a_0\vh_0 – a_0^{(1)}\vh_{0}^{(1)}$$, then
\begin{align} \langle \vr^{(1)}(n), \vd \rangle & = \langle \vr(n) + a_0\vh_0 – a_0^{(1)}\vh_{0}^{(1)}, \vd \rangle \\ & = \langle \vr(n), \vd \rangle+ a_0 \langle \vh_0,\vd \rangle – a_0^{(1)} \langle \vh_{0}^{(1)}, \vd \rangle \end{align}
and all those values have been computed, and we just need one more multiply.

Now we refine the third atom.
The criterion is now
\begin{align} \vh_{2}^{(1)} &= \arg \max_{\vd \in \mathcal{D}} | \langle \vr^{(2)}(n) + a_2\vh_2, \vd \rangle | \\ & = \arg \max_{\vd \in \mathcal{D}} | \langle \vr^{(2)}(n), \vd \rangle + a_2 \langle \vh_2, \vd \rangle |. \end{align}
The last two values have been computed, and so we just need to determine the first.
Since $$\vr^{(2)}(n) = \vr^{(1)}(n) + a_1\vh_1 – a_1^{(1)}\vh_{1}^{(1)}$$, then
\begin{align} \langle \vr^{(2)}(n), \vd \rangle & = \langle \vr^{(1)}(n) + a_1\vh_1 – a_1^{(1)}\vh_{1}^{(1)}, \vd \rangle \\ & = \langle \vr^{(1)}(n), \vd \rangle+ a_1 \langle \vh_1,\vd \rangle – a_1^{(1)} \langle \vh_{1}^{(1)}, \vd \rangle \end{align}
and all those values have been computed, and we just need one more multiply.

We continue in this way until we have refined all atoms,
and then perhaps cycle through them a total of $$L$$ times.
After this, we want to augment the representation,
which is very simple because we just choose the next atom by the criterion above, but using the updated inner products with the residual.
From the above, we can distill the recursive update procedure as follows.
After CMP has augmented the representation, and given us the updated set of inner products $$\{\langle \vr(n), \vd_i \rangle : \vd_i \in \mathcal{D} \}$$,
CMP sets to refining its ($$i+1$$)th atom by updating this set
$$\langle \vr'(n), \vd \rangle = \langle \vr(n) + a_i\vh_i – a_i’\vh_{i}’, \vd \rangle = \langle \vr(n), \vd \rangle+ a_i \langle \vh_i,\vd \rangle – a_i’ \langle \vh_{i}’, \vd \rangle$$
which only requires one multiply, and then evaluating
\begin{align} \vh_{i+1}’ &= \arg \max_{\vd \in \mathcal{D}} | \langle \vr'(n) + a_{i+1}\vh_{i+1}, \vd \rangle | \\ & = \arg \max_{\vd \in \mathcal{D}} | \langle \vr'(n), \vd \rangle + a_{i+1} \langle \vh_{i+1}, \vd \rangle | \end{align}
which is only a search.

With this procedure, and with CMP refining its atoms a total of $$L$$ times at each of $$s$$ augmentation stages, I estimate the computational complexity to be of order
$$\mathcal{O} \left (s | \mathcal{D} | + \sum_{k=2}^s k L | \mathcal{D} | \right) = \mathcal{O} \left ( [ s + L s(s+1)/2 – L ] | \mathcal{D} | \right)$$
which I think can be reduced to
$$\mathcal{O} \left ( [s^2 L + s] | \mathcal{D} | \right ).$$
That $$s^2$$ term scares me.