To accompany this interesting discussion about the behavior of the magnitude inner products as matching pursuit (MP) proceeds, I decided to run some experiments in the same way as those of Dymarski et al.

I am studying four types of dictionaries:

- realizations of Normal white noise process
- localized realizations of Normal white noise process
- realizations of 2nd order autoregressive process
- localized realizations of 2nd order autoregressive process

and two types of signals:

- realizations of the same process as the dictionary
- exactly sparse constructions of sparsities 5, 10, and 15, with amplitudes distributed Bernoulli in \(\{-1, 1\}\).
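For concreteness, here is a sketch of how one might generate a white-noise dictionary and an exactly sparse test signal in NumPy (the variable names and seed are mine, not from my actual experiment code):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 128, 10000                      # signal dimension, dictionary size

# Dictionary: unit-norm realizations of a Normal white noise process.
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)

# Exactly sparse signal: s atoms with amplitudes Bernoulli in {-1, +1}.
s = 5
idx = rng.choice(K, size=s, replace=False)
x = D[:, idx] @ rng.choice([-1.0, 1.0], size=s)
```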

My dictionaries all have 10,000 unit norm atoms. My signal dimension is 128.

All dictionary subsets have a size of 5% of the full dictionary, which here is 500 atoms. I draw each one randomly with replacement at every decomposition iteration of MP.

I run MP for a maximum of 64 iterations in all cases, and do this for 1000 trials.
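One trial of this procedure might look like the following sketch (the function name and defaults are mine; it assumes a unit-norm dictionary `D` as above):

```python
import numpy as np

def mp_random_subsets(x, D, n_iter=64, frac=0.05, rng=None):
    """Matching pursuit where each iteration searches a fresh random
    dictionary subset drawn with replacement (sketch, not my exact code)."""
    if rng is None:
        rng = np.random.default_rng()
    K = D.shape[1]
    m = int(frac * K)                                  # 5% of 10,000 = 500
    r = x.copy()
    for _ in range(n_iter):
        W = D[:, rng.choice(K, size=m, replace=True)]  # random subset
        c = W.T @ r                                    # inner products
        j = np.argmax(np.abs(c))                       # best atom in subset
        r = r - c[j] * W[:, j]                         # MP residual update
    return r
```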

For each trial, I compute a new dictionary and signal.

I am looking specifically at the following quantities:

$$
x_2 = \max_{\vd \in W'} \frac{ | \langle \vr_n, \vd \rangle|^2}{||\vr_{n}||_2^2}, \;
x_1 = \max_{\vd \in W\backslash \vd^*} \frac{ | \langle \vr_n, \vd \rangle|^2}{||\vr_{n}||_2^2}
$$

where \(W\) and \(W'\) are two randomly selected dictionary subsets, and \(W\backslash \vd^*\) is the dictionary subset without the most absolutely correlated atom used to create \(\vr_n\), i.e., \(\vr_n = \vr_{n-1} - \langle \vr_{n-1}, \vd^*\rangle \vd^* \).

The value \(x_2\) is the largest normalized magnitude inner product of the residual with atoms of a new dictionary subset; and the value \(x_1\) is the largest normalized magnitude inner product of the residual with the remaining atoms of the old dictionary subset.

With reference to my previous post on this matter, \(x_2 \equiv U(N)\), and \(x_1 \equiv P(N)\).

The question is: how often can we expect \(U(N) > P(N)\)?

Now, over the 1000 runs of MP, I accumulate counts of the value \(y = x_2 - x_1\), and estimate a probability distribution function of the random variable \(Y = X_2 - X_1\).
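In code, one iteration's sample of \(y\) might be computed like this (a sketch; the helper name is mine, and note that \(\vd^*\) is chosen from \(W\) using the previous residual, per the definitions above):

```python
import numpy as np

def y_sample(r_prev, D, m, rng):
    """One sample of y = x2 - x1. The best atom d* of subset W (for the
    previous residual) produces r_n; x1 searches W without d*, and x2
    searches a fresh subset W' (hypothetical helper, sketch only)."""
    K = D.shape[1]
    W = D[:, rng.choice(K, size=m, replace=True)]
    c = W.T @ r_prev
    j = np.argmax(np.abs(c))                        # index of d* in W
    r = r_prev - c[j] * W[:, j]                     # r_n
    cn = np.abs(np.delete(W, j, axis=1).T @ r)**2 / (r @ r)
    x1 = np.max(cn)                                 # best remaining old atom
    Wp = D[:, rng.choice(K, size=m, replace=True)]  # fresh subset W'
    x2 = np.max(np.abs(Wp.T @ r)**2 / (r @ r))
    return x2 - x1, r
```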

A positive value says that the best atom from the new dictionary is better than the best atom remaining in the old dictionary. A negative value says we should have stuck with the old dictionary for the next best atom.

For a dictionary of realizations of a Normal white noise process, and the signals above, we find the following probability distribution functions of the random variable \(Y\).

In the graph below, we can see plenty of opportunity for success (when it is measured so bleakly) by keeping the dictionary we have.

In other words, the best remaining atom in the old dictionary subset can often be better than the best atom in a new dictionary subset.

As our signal becomes less and less sparse, it approaches just another realization of a white noise process.

When we look at a dictionary of localized atoms of a Normal white noise process (the atoms are Hann windowed with scale 32 samples, and with 50% overlap), we see the same picture, though it is a bit more spread out.
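My construction of a localized atom, sketched in code (the placement grid implied by the 50% overlap is my reading of the setup; the helper name is mine):

```python
import numpy as np

N, scale, hop = 128, 32, 16              # 50% overlap -> hop of 16 samples
w = np.hanning(scale)                    # Hann window of scale 32

def localized_atom(rng):
    """A Hann-windowed white-noise atom at a random 50%-overlapping
    position in the 128-sample signal space (sketch)."""
    atom = np.zeros(N)
    start = hop * rng.integers(0, (N - scale) // hop + 1)
    atom[start:start + scale] = w * rng.standard_normal(scale)
    return atom / np.linalg.norm(atom)
```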

For a dictionary of realizations from a 2nd order AR process (here the filter coefficients \(\{a_k\}\) are \([1,\ -2(0.9)\cos(\pi/4),\ 0.9^2]\), the same as those used by Dymarski et al.), and the same kinds of signals, we don’t see much change with the sparsity.
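Such AR(2) atoms can be generated by filtering white noise with those coefficients; a sketch (the burn-in length for discarding the filter transient is my choice):

```python
import numpy as np

def ar2_atoms(N, K, rng, burn=100):
    """Unit-norm realizations of the 2nd order AR process with
    denominator coefficients a = [1, -2*0.9*cos(pi/4), 0.9^2] (sketch)."""
    a1, a2 = -2 * 0.9 * np.cos(np.pi / 4), 0.9**2
    e = rng.standard_normal((burn + N, K))
    y = np.zeros_like(e)
    y[0] = e[0]
    y[1] = e[1] - a1 * y[0]
    for n in range(2, burn + N):
        y[n] = e[n] - a1 * y[n - 1] - a2 * y[n - 2]   # AR(2) recursion
    y = y[burn:]                                      # discard transient
    return y / np.linalg.norm(y, axis=0)
```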

This dictionary is now highly coherent.

When we make a dictionary of localized AR processes (again, Hann windowed to size 32 samples, and with 50% overlap), we decrease the coherence, and see the bell shrink or dilate some, depending on the signal sparsity.

All of this goes to show that the atom having the penultimate magnitude inner product in the old dictionary subset (\(x_1\)) is, about half the time, better than the atom with the ultimate magnitude inner product in a new dictionary subset (\(x_2\)). So, what about the pen-penultimate value? Or the pen-pen-penultimate value?
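That question needs only a one-line change to the experiment: take the \(k\)-th largest remaining value in the old subset instead of the largest. A sketch (\(k = 1\) reproduces the penultimate case above; the helper name is mine):

```python
import numpy as np

def y_kth(r_prev, D, m, k, rng):
    """Sample of y where x1 uses the k-th best atom remaining in the old
    subset: k = 1 is penultimate, k = 2 pen-penultimate, etc. (sketch)."""
    K = D.shape[1]
    W = D[:, rng.choice(K, size=m, replace=True)]
    c = W.T @ r_prev
    j = np.argmax(np.abs(c))                        # d*, used to form r_n
    r = r_prev - c[j] * W[:, j]
    cn = np.abs(np.delete(W, j, axis=1).T @ r)**2 / (r @ r)
    x1 = np.sort(cn)[-k]                            # k-th largest remaining
    Wp = D[:, rng.choice(K, size=m, replace=True)]
    x2 = np.max(np.abs(Wp.T @ r)**2 / (r @ r))
    return x2 - x1
```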

Here is my code if you wish to explore further.