Hello, and welcome to the Paper of the Day (Po’D): Sparse Approximation Edition. Today’s paper comes from Vladimir N. Temlyakov and Pavel Zheltov, “On performance of greedy algorithms,” submitted Jan. 27, 2010 to Journal of Approximation Theory. (Technical report available here.)

This paper looks at the performance of “orthogonal greedy algorithms” (e.g., orthogonal matching pursuit) and “pure greedy algorithms” (e.g., matching pursuit) in recovering sparse signals from linear combinations of elements from low coherence dictionaries. In other words, let us assume some discrete signal \(\mathbf{x}\) of finite support \(N\) is a linear combination of \(m \le N\) atoms (of same dimension as \(\mathbf{x}\)) selected from the dictionary \(\mathbf{D}\) such that $$\mathbf{x} = \mathbf{D} \mathbf{a}$$ where \(\mathbf{a}\) has only \(m\) non-zero entries. Now, what conditions must this system meet so that we can recover \(\mathbf{a}\) from \(\mathbf{x}\) using an orthogonal greedy algorithm? Clearly, if the dictionary is an orthonormal basis, then there will be no problem no matter what \(m\) is. This is because there is no interaction between atoms in building \(\mathbf{x}\), i.e., every atom pair has a zero inner product. However, when the dictionary becomes a frame, e.g., a union of two bases, then we have ourselves a more difficult problem.
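To make the recovery setup concrete, here is a minimal sketch of an orthogonal greedy algorithm (orthogonal matching pursuit) in NumPy. This is my own illustration, not code from the paper; it assumes the atoms are the columns of a real matrix `D`:

```python
import numpy as np

def omp(D, x, m):
    """Attempt to recover an m-sparse coefficient vector a with x ≈ D a
    by orthogonal matching pursuit: greedily select atoms, then re-fit
    all selected coefficients by least squares at every step."""
    r = x.copy()                 # current residual r(k)
    support = []                 # indices of atoms selected so far
    a = np.zeros(D.shape[1])
    for _ in range(m):
        # select the atom most correlated with the residual
        k = int(np.argmax(np.abs(D.T @ r)))
        support.append(k)
        # least-squares fit over all atoms selected so far
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        r = x - D[:, support] @ coeffs
    a[support] = coeffs
    return a
```

With an orthonormal dictionary (e.g., `D = np.eye(N)`) this recovers any \(m\)-sparse \(\mathbf{a}\) exactly, matching the observation above that orthonormal bases pose no problem.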

One of the figures of merit of a dictionary is its *coherency*, defined: $$M = \sup_{\mathbf{d}_i,\mathbf{d}_j} \frac{| \langle \mathbf{d}_i,\mathbf{d}_j \rangle|}{||\mathbf{d}_i|| \, ||\mathbf{d}_j||}, \; \mathbf{d}_i,\mathbf{d}_j \in \mathbf{D}, \; \mathbf{d}_i \ne \mathbf{d}_j.$$ In words, this is the largest absolute cosine similarity over all pairs of distinct atoms in the dictionary, or equivalently the smallest angle between any atom pair. As \(M \to 1\), the dictionary becomes increasingly redundant, and there begin to exist signals \(\mathbf{x}\) in some subspace whose sparse representations \(\mathbf{a}\) orthogonal greedy algorithms cannot recover. This paper bounds the approximation error of a signal with sparsity \(m\) in terms of the coherency \(M\) of the dictionary \(\mathbf{D}\) when using an orthogonal greedy algorithm.
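The coherency is straightforward to compute for a dictionary stored column-wise; a small sketch of my own (the function name is mine, not the paper's):

```python
import numpy as np

def coherence(D):
    """Coherency M of a dictionary whose atoms are the columns of D:
    the largest |<d_i, d_j>| / (||d_i|| ||d_j||) over distinct pairs."""
    Dn = D / np.linalg.norm(D, axis=0)   # normalize each atom
    G = np.abs(Dn.conj().T @ Dn)         # matrix of |cosines| between atoms
    np.fill_diagonal(G, 0.0)             # exclude the i = j pairs
    return G.max()
```

For example, the union of the standard basis and the (normalized) DFT basis in dimension \(N\) has coherency \(1/\sqrt{N}\), a classic low-coherency frame.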

Defining the \(k\)th-order approximation error of \(\mathbf{x}\) as \(\mathbf{r}(k)\), and \(\sigma_k(\mathbf{x})\) as the best \(k\)-term approximation error, the authors show that the \((k+s)\)th-order approximation error of an orthogonal greedy algorithm with a dictionary of coherency \(M\) has the upper bound

(Theorem 5): $$||\mathbf{r}(k+s) ||_2 \le 7 M s || \mathbf{r}(k) ||_2 + \sigma_s^2(\mathbf{r}(k)), \; \textrm{if} \; k+s \le \frac{1}{2M}, k \le s.$$

The difficult matter is evaluating the term \(\sigma_s^2(\mathbf{r}(k))\); but regardless, this expression shows how far beyond the minimum error an orthogonal greedy algorithm can take us.
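To get a feel for how \(||\mathbf{r}(k)||_2\) behaves in practice, one can simply log the residual norm at each step of an orthogonal greedy algorithm. A sketch of my own, assuming the atoms are the columns of a real matrix `D`:

```python
import numpy as np

def omp_residuals(D, x, steps):
    """Run orthogonal matching pursuit for `steps` iterations and
    return the residual norms ||r(k)||_2 for k = 0, ..., steps
    (with r(0) = x)."""
    r, support = x.copy(), []
    norms = [np.linalg.norm(r)]
    for _ in range(steps):
        support.append(int(np.argmax(np.abs(D.T @ r))))
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        r = x - D[:, support] @ coeffs
        norms.append(np.linalg.norm(r))
    return norms
```

Because each step re-projects \(\mathbf{x}\) onto the span of all selected atoms, the sequence \(||\mathbf{r}(k)||_2\) is non-increasing; how quickly it falls relative to \(\sigma_k(\mathbf{x})\) is exactly what bounds like Theorem 5 quantify.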

Another interesting aspect of this paper is its demonstration of a particular low-coherency dictionary for which an orthogonal greedy algorithm fails to recover any sparse signal, even at the sparsity limit $$m \le \frac{1}{2} \left ( \frac{1}{M} + 1\right ).$$ In fact, they show that the orthogonal greedy algorithm does not find a single one of the original atoms used to build \(\mathbf{x}\).

From my dissertation work, I am particularly enticed by papers offering new glimpses of greedy algorithms for sparse approximation. However, I have yet to see how I can apply such work to audio signal decomposition. We cannot know a priori the sparsity of a given audio signal; but I do know from experience that what works best to produce signal models that are useful for my purposes (low order, small error, and a good reflection of the short-time signal statistics) are dictionaries with high coherency. I don’t ever start with the assumption that there exists a “true” or “correct” model; but I do know that some atoms are less correct than others.