The problems associated with Matching Pursuit using an overcomplete dictionary (the pure greedy algorithm in approximation theory) have long been known: poor convergence, and bad choices. I provide a short synopsis below of work that I know of within the discipline of signal processing that attempts to deal with these problems; but this work is relatively audio-centered, and I am sure there exists work in the image processing domain.

- Pati et al. note that it is quite easy to bake a non-trivial example where MP will take an infinite number of iterations to converge to zero error, even though only a few are required with the given dictionary of non-orthogonal atoms. They propose using orthogonal projections: Y. Pati, R. Rezaiifar, and P. Krishnaprasad, “Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition,” in Proc. Asilomar Conf. Signals, Syst., Comput., vol. 1, (Pacific Grove, CA), pp. 40-44, Nov. 1993.
- Durka is one of the first to notice the negative impact of the decomposition artifacts: P. J. Durka and K. J. Blinowska, “Analysis of EEG transients by means of matching pursuit,” Ann. Biomed. Eng., vol. 23, pp. 608-611, Sep. 1995. A few years later, Durka et al. attempt to reduce these errors by averaging together wivigrams (summed Wigner-Ville distributions of atoms) from decompositions of the same signal made over dictionaries of randomly selected modulated windows: P. J. Durka, D. Ircha, and K. J. Blinowska, “Stochastic time-frequency dictionaries for matching pursuit,” IEEE Trans. Signal Process., vol. 49, no. 3, pp. 507-510, Mar. 2001.
- Jaggi et al. look at changing the MP atom selection criteria with one based on minimizing the worst fit of smaller scale atoms (here splines) that combine linearly to form larger scale atoms: S. Jaggi, W. C. Karl, S. Mallat, and A. S. Willsky, “High resolution pursuit for feature extraction,” tech. rep., Laboratory for Information and Decision Systems, MIT, Cambridge, MA, Nov. 1995; and S. Jaggi, W. C. Karl, S. Mallat, and A. S. Willsky, “High resolution pursuit for feature extraction,” Applied and Computational Harmonic Analysis, vol. 5, no. 4, pp. 428-449, Oct. 1998.
- Gribonval et al., are one of the first to apply MP to musical signals, and adapt the approach of Jaggi et al. for that purpose. We see a reduction in artifacts preceding transients, but at a higher computational cost, and with a loss of freedom in dictionary selection: R. Gribonval, E. Bacry, S. Mallat, P. Depalle, and X. Rodet, “Analysis of sound signals with high resolution matching pursuit,” in Proc. IEEE-SP Int. Symp. Time-Freq. Time-Scale Anal., Paris, France, pp. 125-128, June 1996.
- Goodwin et al. also notice the presence of bad atoms, and attribute this to a lack of fit between the dictionary and signal structures, e.g., asymmetric structures modeled with symmetric atoms. So, for speech signals, and music with transients, he proposes to use a dictionary of damped sinusoids: M. M. Goodwin and M. Vetterli, “Matching pursuit with damped sinusoids,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3, (Munich, Germany), pp. 2037-2040, Apr. 1997; and M. M. Goodwin and M. Vetterli, “Matching pursuit and atomic signal models based on recursive ﬁlter banks,” IEEE Trans. Signal Process., vol. 47, no. 7, pp. 1890-1902, July 1999.
- Nearly simultaneously, OMP is “optimized” so that atoms are selected based on minimizing the residual error after projection: L. Rebollo-Neira and D. Lowe, “Optimized orthogonal matching pursuit approach,” IEEE Signal Process. Lett., vol. 9, pp. 137-140, Apr. 2002; P. Vincent and Y. Bengio, “Kernel matching pursuit,” Machine Learning, vol. 48, pp. 165-187, July 2002. It has been pointed out that “Optimized OMP” is nothing more than just orthogonal least squares: T. Blumensath and M. E. Davies, “On the difference between orthogonal matching pursuit and orthogonal least squares,” tech. rep., University of Edinburgh, Edinburgh, Scotland, U.K., Mar. 2007.
- Christensen et al. notice that these bad atoms cause audible artifacts, and proposes a cyclic approach that can correct for them by replacement cycles: M. G. Christensen and S. H. Jensen, “The cyclic matching pursuit and its application to audio modeling and coding,” in Proc. Asilomar Conf. Signals, Syst., Comput., (Pacific Grove, CA), Nov. 2007. This is extended to time frequency dictionaries in: B. L. Sturm and M. Christensen, “Cyclic matching pursuit with multiscale time-frequency dictionaries,” in Proc. Asilomar Conf. Signals, Systems, and Computers, (Pacific Grove, CA), Nov. 2010.
- Ravelli et al. explore audio coding using MP and a dictionary of a union of MDCT bases at 8 scales. They attempt to avoid the pre-echo artifacts created by MP with a rather ad hoc method of looking at correlations on smaller scales, akin to the approach of Jaggi et al: E. Ravelli, G. Richard, and L. Daudet, “Union of MDCT bases for audio coding,” IEEE Trans. Audio, Speech, Lang. Proc., vol. 16, pp. 1361-1372, Nov. 2008.
- Sturm et al. give these artifacts a name, and begin to formally study their origins: B. L. Sturm, J. J. Shynk, L. Daudet, and C. Roads, “Dark energy in sparse atomic estimations,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 3, pp. 671-676, Mar. 2008. Finally, aiming to remove the influence of bad atom selection in MP, but to keep its simplicity and freedom to choose the dictionary, Sturm et al. propose and study a simple change to the pure greedy atom selection called “interference adaptation”: B. L. Sturm and J. J. Shynk, “Sparse approximation and the pursuit of meaningful signal models with interference adaptation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, pp. 461-472, Mar. 2010.

Greedy approaches to sparse approximation of data are competitive with convex optimization approaches (depending on the underlying distributions apparently) because they are the only ones that are computationally feasible for large-scale problems. If we can maintain simplicity, improve convergence, and avoid bad selections in a greedy decomposition method, then the analysis of high-dimensional data, such as audio signals, can benefit. In my own realm of research, I still have yet to fully analyze the “interference adaptation” I propose. I believe it is nothing more than the “relaxed greedy algorithm” described in: A. Barron, A. Cohen, W. Dahmen, and R. A. DeVore, “Approximation and learning by greedy algorithms,” Annals of Statistics, vol. 36, no. 1, pp. 64-94, 2008. In that work, they point to this one from a long time ago: L. K. Jones, “A simple lemma on greedy approximation in Hilbert spaces and convergence rates for projection pursuit regression and neural network training.” Ann. Statist. 20, 608-613, 1992.