Hello, and welcome to Paper of the Day (Po’D): Low-Complexity Orthogonal Matching Pursuit Edition. Today’s papers are exciting:

- B. Mailhé, R. Gribonval, F. Bimbot, and P. Vandergheynst, “Local orthogonal greedy pursuits for scalable sparse approximation of large signals with shift-invariant dictionaries,” in Proc. Signal Process. Adaptive Sparse Structured Representations, (St. Malo, France), Apr. 2009.
- B. Mailhé, R. Gribonval, F. Bimbot, and P. Vandergheynst, “A low complexity orthogonal matching pursuit for sparse signal approximation with shift-invariant dictionaries,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., (Taipei, Taiwan), pp. 3445-3448, Apr. 2009.

In a world where representing audio signals in efficient and meaningful ways can mean the difference between useful and useless descriptions, we have seen the sheer power of approximating a function by a linear combination of functions selected from a very redundant set. However, as the redundancy of this set increases, the price paid can become too great to justify reaching for the large fruits of optimal efficiency and meaningfulness. What all this means, in an overly dramatic and poetic way, is that the turtle that represents our dreams of a sparse paradise is smashed under the spry feet of the elephant-roadrunner chimera that represents frame-based orthogonal transforms with post-processing to fit parametric signal models.

The sparse approximation of high-dimensional signals (e.g., audio data) in terms drawn from large dictionaries is so computationally complex that it was not until about 2006 that the simplest greedy form became nearly practical, in the form of an excellent cross-platform library called the Matching Pursuit Toolkit (MPTK). (A tip of my hat to LastWave as well.) However, it has been clear almost since MP was proposed within a signal processing context (1992-1993) that it can run askew and completely miss good representations, even for simple, non-pathological functions and dictionaries. One must thus be wary of claiming that a signal model produced by MP is good in any sense.

Orthogonal MP (OMP) was proposed as a natural extension of MP: the atom selection criterion is preserved, but the residual is made orthogonal to the subspace spanned by the vectors selected so far. Though this change makes the signal decomposition proceed much faster in terms of residual energy decay, it makes the computational overhead grow quadratically with the iteration count. So what all this means is that OMP can provide a signal model much more efficient than that provided by MP (assuming the dictionary is overcomplete and the signal is non-trivial), but at a computational price much higher than the already high price paid in using MP. What I need then, in part, is the computationally lightest implementation of OMP with highly redundant dictionaries, to enable experimentation with sparse approximation on real-world high-dimensional data like audio signals. These papers describe the first attempts at doing so in a locally optimal manner, and within the framework of MPTK.

Before we proceed, let’s review MP and OMP. The \(n\)th-order representation of the signal \(\vx \in \mathcal{C}^K\) is denoted by \(\mathcal{X}_n = \bigl \{ \MH(n), \va(n), \vr(n) \bigr \}\), which produces the \(n\)th-order model \(\vx = \MH(n)\va(n)+\vr(n)\). The columns of the representation basis matrix \(\MH(n)\) are atoms selected from the dictionary of \(N\) atoms \(\mathcal{D}_N\). MP updates \(\mathcal{X}_n\) according to:

$$
\mathcal{X}_{n+1} = \left \{
\begin{array}{@{}r@{\;}l@{}}
\MH(n+1) = & [ \MH(n) | \vh_{n} ], \\
\va(n+1) = & [\va^T(n), \langle \vr(n), \vh_n \rangle ]^T, \\
\vr(n+1) = & \vx - \MH(n+1)\va(n+1)
\end{array}
\right \}
$$

using the atom selection criterion

$$
\vh_{n} = \arg \max_{\vd \in \mathcal{D}_N} | \langle \vr(n), \vd \rangle |
$$

where \(||\vd||_2 = 1\) is implicit.
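To make these updates concrete, here is a minimal NumPy sketch of MP over a generic matrix dictionary (my own illustration, not the MPTK implementation, which exploits shift-invariance and fast correlation updates):

```python
import numpy as np

def matching_pursuit(x, D, n_iter):
    """Plain MP: D is a K-by-N matrix whose columns are unit-norm atoms."""
    r = x.astype(float).copy()
    idx, a = [], []
    for _ in range(n_iter):
        c = D.T @ r                    # correlations <r(n), d> for every atom
        k = int(np.argmax(np.abs(c)))  # atom selection criterion
        idx.append(k)
        a.append(c[k])                 # the weight is the raw correlation
        r = r - c[k] * D[:, k]         # subtract the new atom's contribution
    return idx, np.array(a), r
```

Each iteration is dominated by the full correlation `D.T @ r`; this is exactly the step a shift-invariant implementation accelerates.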

OMP uses the same atom selection criterion, but updates the representation according to:

$$
\mathcal{X}_{n+1} = \left \{
\begin{array}{@{}r@{\;}l@{}}
\MH(n+1) = & [ \MH(n) | \vh_{n} ], \\
\va(n+1) = & [\MH^H(n+1) \MH(n+1)]^{-1}\MH^H(n+1)\vx, \\
\vr(n+1) = & \vx - \MH(n+1)\va(n+1)
\end{array}
\right \}.
$$

It is precisely the computation of the inverse Gramian that makes OMP so computationally heavy, even using the matrix inversion lemma.
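In code, the difference from MP is essentially one line: instead of appending the raw correlation as the new weight, we re-solve the least-squares problem over all selected atoms (a sketch of mine, with `np.linalg.lstsq` standing in for the inverse-Gramian computation):

```python
import numpy as np

def omp(x, D, n_iter):
    """OMP sketch: MP's selection rule, but the residual is made
    orthogonal to the span of all atoms selected so far."""
    r = x.astype(float).copy()
    idx, a = [], np.array([])
    for _ in range(n_iter):
        c = D.T @ r
        k = int(np.argmax(np.abs(c)))
        idx.append(k)
        H = D[:, idx]                               # H(n+1)
        a, *_ = np.linalg.lstsq(H, x, rcond=None)   # a = (H^H H)^{-1} H^H x
        r = x - H @ a                               # orthogonal to span(H)
    return idx, a, r
```

The least-squares solve grows with the number of selected atoms, which is the quadratic overhead these papers attack.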

In this paper, Mailhé et al. propose LoCOMP: a low-cost variation of OMP using a shift-invariant dictionary (every atom present at all integer shifts) of time-localized atoms. With these constraints, they are able to implement LoCOMP nicely within the existing framework of MPTK. The key modification they make is the assumption that each new atom affects only the weights of the atoms in its time-domain neighborhood. Given the above framework, LoCOMP selects an atom with the same criterion as MP, but updates the representation by

$$
\mathcal{X}_{n+1} = \left \{
\begin{array}{@{}r@{\;}l@{}}
\MH(n+1) = & [ \MH(n) | \vh_{n} ], \\
\va(n+1) = & \va_{\Psi(n)}, \\
\vr(n+1) = & \vx - \MH(n+1)\va(n+1)
\end{array}
\right \}
$$

where \(\va_{\Psi(n)}\) is a vector of length \(n+1\) computed in the following way. Define the set \(\Psi(n) = \{ 0 \le i \le n : \text{supp} [\MH(n)]_i \cap \text{supp}\,\vh_{n} \ne \emptyset \}\), i.e., the set of column indices in \(\MH(n)\) of the atoms that overlap in time with the new atom \(\vh_n\). This is a “subdictionary,” onto the span of which we can orthogonally project a length-\(K\) vector by the matrix \(\MP_{\Psi(n)}\). The weights of the other \(n - |\Psi(n)|\) atoms that do not overlap the new atom are left alone; the new weights computed by the projection replace the relevant values in the old weights \(\va(n)\), and a weight for the new atom is appended. (The authors make a mistake in their algorithm formulation by saying that the new weights are added to the old weights.)
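Here is a rough NumPy sketch of that local update, under simplifying assumptions of my own: `supports` is a hypothetical list mapping each atom index to the set of time samples it occupies, and a dense least-squares solve stands in for the local orthogonal projection that MPTK would compute efficiently.

```python
import numpy as np

def locomp(x, D, supports, n_iter):
    """LoCOMP sketch: MP-style selection, but the orthogonal projection is
    restricted to the sub-dictionary Psi(n) of previously selected atoms
    whose time supports overlap the new atom's."""
    r = x.astype(float).copy()
    idx, a = [], {}
    for _ in range(n_iter):
        c = D.T @ r
        k = int(np.argmax(np.abs(c)))
        # Psi(n): previously selected atoms overlapping the new atom, plus k
        psi = [i for i in idx if i != k and supports[i] & supports[k]] + [k]
        idx.append(k)
        # Local signal: residual plus the old contributions of Psi(n) atoms
        x_loc = r + sum(a[i] * D[:, i] for i in psi if i in a)
        w, *_ = np.linalg.lstsq(D[:, psi], x_loc, rcond=None)
        for i, wi in zip(psi, w):
            a[i] = wi                  # replace (not add to) the old weights
        r = x - sum(a[i] * D[:, i] for i in a)
    return idx, a, r
```

When no previously selected atom overlaps the new one, this reduces to an MP update; when all of them overlap, it reduces to full OMP.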

With this suboptimal formulation, the authors show that the computational overhead of OMP can be greatly diminished to something like that of MP — though they only show this for dictionaries built from translation-invariant MDCT bases of up to two scales (roughly corresponding to 2 ms and 23 ms, i.e., the scales used in advanced audio coding). Compared with OMP and gradient pursuits (GP, which will be a Po’D soon), the authors observed speed-ups of 800 to 1500 times! What took 5 days with OMP and GP took LoCOMP 15 minutes (an 8 kHz mono-channel signal, though the authors do not state what kind of signal it is (music? greasy?) or how spectrally rich it is). And on top of this, the price paid in terms of reduced signal-to-residual-energy ratio relative to OMP and GP was only about 0.01 dB!!! This is very exciting news.

Returning briefly to the problem mentioned above — that OMP solves some problems with model efficiency in MP — plenty of work has shown that even though one sparse model is more efficient than another with respect to the signal-to-residual-energy ratio, this does not mean that its features better describe the contents of the signal. See, for instance, ahem, my work: B. L. Sturm and J. J. Shynk, “Sparse approximation and the pursuit of meaningful signal models with interference adaptation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, pp. 461-472, Mar. 2010. Thus, I am very curious to see the results of changing LoCOMP’s atom selection criterion to the one I study in that article, i.e.,

$$
\begin{array}{@{}r@{\;}l@{}}
\vh_{n} = & \arg \max_{\vd \in \mathcal{D}_N} | \langle \vr(n), \vd \rangle |^2 + \gamma(n) \langle \vr(n), \vd \rangle \langle \vx - \vr(n), \vd \rangle \\
= & \arg \max_{\vd \in \mathcal{D}_N} (1-\gamma(n)) | \langle \vr(n), \vd \rangle |^2 + \gamma(n)\langle \vr(n), \vd \rangle\langle \vx, \vd \rangle
\end{array}
$$

where \(\gamma(n)\) is an interference weighting that is currently poorly understood, except for the fact that we usually want \(\gamma(n) \ge 0\) (I am working on it!).

With this new atom selection, there is only minimal additional overhead (a maximum of \(N\) more multiplies per iteration), so it is completely implementable within the LoCOMP framework. My summer research docket grows one item more.
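To see why the overhead is so small, note that the second form of the criterion needs only the dictionary correlations with \(\vx\), which can be precomputed once before iterating. A sketch (my own, for real-valued signals):

```python
import numpy as np

def select_atom(r, x, D, gamma):
    """Interference-adapted selection: maximize
    (1 - gamma)|<r,d>|^2 + gamma <r,d><x,d> over the atoms (real case)."""
    cr = D.T @ r   # <r(n), d> for every atom; MP computes this anyway
    cx = D.T @ x   # <x, d>; precomputable once for the whole decomposition
    return int(np.argmax((1 - gamma) * cr**2 + gamma * cr * cx))
```

With `gamma = 0` this reduces to the MP/LoCOMP criterion; the extra per-iteration work beyond plain MP is on the order of \(N\) multiplies for the `cr * cx` term.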