# Papers of the Day (Po’D): Comparative Studies of Greedy Sparse Decompositions

Hello, and welcome to the Papers of the Day (Po’D): Comparative Studies of Greedy Sparse Decompositions. Today’s two papers are an interesting lot, looking at the performance of some greedy algorithms for the recovery of sparse signals:

1. P. Dymarski, N. Moreau and G. Richard, “Greedy sparse decompositions: a comparative study”, EURASIP Journal on Advances in Signal Processing, Aug. 2011.
2. G. Rath and A. Sahoo, “A COMPARATIVE STUDY OF SOME GREEDY PURSUIT ALGORITHMS FOR SPARSE APPROXIMATION,” in Proc. EUSIPCO, Glasgow, Scotland, Aug. 2009.

(I attended a seminar by Nicolas Moreau on this subject during my postdoc in Paris in 2009, where he was persuaded to publish such a comprehensive review. It is nice to see the final paper!)

In their article, Dymarski et al. first make explicit the equivalence between shape gain vector quantization methods developed in speech coding in the eighties (specifically code-excited linear prediction), and the family of matching pursuit (MP) algorithms developed from 1993. All one needs to do is change terminology, and expand the definition of the dictionary. MP in 1993 becomes the “standard iterative algorithm” from 1984 proposed for multipulse excitation linear predictive coding (LPC). (Of course, MP is also the “projection pursuit” developed in statistics in 1985; and I am sure it appeared sometime much earlier in the development of approximation theory.) In the same work from 1984, the authors propose changing the gains at each iteration to orthogonalize the error to the vector space spanned by the selected codewords, which is equivalent to orthogonal MP (OMP). And again in the same year, this approach was further refined by orthogonalizing at each step the error and the remaining codewords to those codewords already selected. This is equivalent to optimized OMP (OOMP), also known as orthogonal least squares (OLS). On top of this, the variety of approaches making these algorithms fast have already been known for some time before their rediscovery in sparse representation.

Dymarski et al. test these algorithms along with least angle regression (LARS), complementary MP (the other CMP, Po’D here), and its orthogonal variant (OOCMP). (Their code is available here.) Their test suite consists of two dictionaries (size 40 by 128) containing either realizations of a Gaussian white noise process, or a second order autoregressive process. They test decomposition performance of two types of signals: ones that are of the same process as the atoms in the dictionary; and ones that are 10-sparse in the dictionary. (In the latter case, the authors mention the amplitudes are random, and that the amplitudes can be small, but they do not state how they are distributed. I assume they are not Bernoulli in $$\{-\alpha, \alpha\}$$. Dymarski responds that they are Gaussian distributed.) We see for these signals with respect to the signal to residual ratios, an advantage in using OOMP and OOCMP. The least achieving algorithms in these cases are LARS and MP.
In the latter case, they debias the solution by finding the orthogonal projection of the signal onto the atoms selected.

When it comes to exact recovery in terms of the support, OOCMP succeeds for both cases above 95% of the signals tested, and OMP and OOMP succeed for over 90% of the signals for the white noise case only. The authors provide a nice discussion of why the complementary approaches perform better for the autoregressive dictionary: the cross correlation between dictionary atoms in the two spaces are different. In the higher-dimensional space, they are more orthogonal.

Finally, Dymarski et al. combine the cyclic approaches of Christensen and myself (Po’D here and here) — which I am sure has been invented in another field — with OOMP and OOCMP. In the same experiments, they find these approaches produce better approximations than others (including BP, subspace pursuit, and CoSaMP), and the cyclic OOCMP recovers more sparse signals in the two dictionaries. Thus, we see that it can pay if we start in the neighborhood of a solution, rather than from the origin. (If the sparse signal amplitudes were distributed Bernoulli, they would underperform BP.)

In their comparative study paper, Rath and Sahoo also propose new versions of complementary MP (CMP) and orthogonal CMP (OCMP). They also mix the selection step of CMP with the update steps of MP or OMP, and the selection step of MP with the update step of CMP or OCMP. In their experiments, which use dictionaries of Gaussian white noise (dimension 16 by 25), and sparse vectors made of linear combinations of 1-8 atoms with amplitudes distributed normally. It appears they stop the decomposition process when the residual error bound is “0.001 per component” — which means what I have no idea. Is it $$||\vx – \hat\vx||_2/s$$ where $$s$$ is the sparsity? Is it the average error only over the support set? We see the OCMP algorithm produces on average better approximations with respect to residual error (per component) (?), even when there is additive white Gaussian noise (at some SNR).

Looking at the implementations by Dymarski et al., I see they have used an erroneous implementation of CoSaMP,
which I corrected here. So that is probably the reason why CoSaMP performs so poorly and erratically.
It also appears they had some trouble with subspace pursuit after a certain number of iterations. Looking through their code here also shows there could be a linear dependence issue as the iterations increase.
Anyhow, I am now going to augment the set of 15 algorithms
I study in this work, with OOMP, CMP, OOCMP, and cyclic CMP and OOCMP. That will bring the total to 20!