# On Matching Pursuit and Order Statistics

The other day, I analyzed the use of order statistics to study matching pursuit (MP) by Manuel Moussallam et al. in their paper “Matching Pursuits with Random Sequential Subdictionaries” (Po’D here). Manuel has returned from his vacation and kindly responds to my comments below. Thank you Manuel!

Dear Bob,

I’ve read your comments which I found interesting even though I think they essentially point a lack of clarity of the article rather than a flaw.

First of all, in your first post you write: “The important question to answer now is whether MP using randomly selected subsets of a larger dictionary is guaranteed to produce signal models that are superior to MP using the larger dictionary.”

Sadly, the answer to that question is no. And I would totally agree that it is not stated clearly enough in the article.

In fact it may even be a bad idea to use varying subsets instead of a fixed one.

Indeed if x is sparse in a subdictionary D_0 \subset D and we know D_0, not only will RSSMP be weaker than MP over D, but also than MP over D_0. Thus, there is no way to guarantee that RSSMP will behave better than MP in the general case because it’s basically a false statement.

Having said that, in a number of situations, RSSMP will provide very interesting decompositions and the article tries to give an intuition of why it does so using the order statistics framework and some simple synthetic signals before applying it to the real wold of audio signals.
These situations are basically the ones in which x is sparse in D, but D is simply too big to perform MP on it.

The confusion comes, I think from the example used in section 3.2. The uniformly distributed samples that are taken in a decreasing order with or without reshuffling. This example is only provided to illustrate the spirit of the method, and not to be used as an effective model for the projections as you do in assuming that elements of W_0 are uniformly distributed over [0,1]. In fact, I can hardly think of a unit normed signal and a dictionary of pairwise orthogonal atoms that would behave like that!

A second confusion arises when you write that the random dictionaries are taken smaller and smaller. In fact all subsets have the same size. Contrary to what you write, MP is given a constant number of choices at each iteration, the key point is to have these choices vary so as to browse the biggest possible space during the overall process!

In section 3.4, we make strong assumptions (I guess some of them may be questionable) about the structure of the signal and most especially the pairwise orthogonality of it’s components and the fact that the statistical model of the inner products remains unchanged from one iteration to another. Figure 3.5 is an illustration of such case where you have non-overlapping components.

With your notations in this case, K elements of W_0 are non-negligible and drawn from a certain distribution and we may calculate the expectation of the ultimate value.
Then at iteration 1, only K-1 elements of W_1 are non-negligible and since the values have not changed, we derive equation (21) to represent convergence of MP with the fixed dictionary.
However, when changing the subdictionary (here by randomly translating all atoms in time) we have the set W’_1 of the same size as W_1. Only K-1 elements of W’_1 are non-negligible just as for W_1, but these K-1 values are brand new, so chances are that we will be able to pick a better atom.

So to summarize:

• Better convergence is not guaranteed
• Uniform distribution of inner products seems like a problematic situation
• Equation (21) fairly describes convergence of MP using a fixed dictionary of pairwise orthogonal atoms using order statistics
• Equation (26) may be used to describe convergence of RSSMP under some (strong) assumptions on both the signal and the random subdictionaries
• On real signals anyway, statistical modelling of inner products is often not available, but one may easily get the intuition of why RSSMP still has advantages.