A missing step from the Method of Optimal Directions?

In theory, things can look really good;
but then when it comes to implementation, wrinkles arise.
Only after some hair pulling did I realize that there may be a step missing in the Method of Optimal Directions (MOD), one that I cannot find in the original paper: K. Engan, S. O. Aase, and J. H. Husøy, “Method of optimal directions for frame design”, Proc. ICASSP, Vol. 5, pp. 2443–2446, 1999.


Let’s run the algorithm as described, which is summarized at the end of this post.
The graph below shows the normalized coding errors as a function of iteration for each of 1000 different trials.
In each trial I create dictionaries of \(N=5\) atoms drawn from the uniform spherical ensemble of dimension \(m = 3\), and create \(L=10\) signals with sparsity \(s=1\) where the non-zero element is sampled from \(\{-1,1\}\) equiprobable.
Then I create the measurements simply by \(\MU := \MPhi_\textrm{true}\MX\),
where \(\MX\) is the sparse matrix — which I make sure is full rank so that all atoms of the dictionary are used.
I use OMP (the QR-1 implementation if you must know) as the sparse coding algorithm,
and do it a favor by running only 1 iteration, i.e., “OMP-QR-1, please find me the best atom.”
And I tell MOD to find a dictionary of 5 atoms.
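To make this concrete, here is a minimal sketch of the problem generation in Python/NumPy — my own reconstruction, not the original code; everything beyond the symbols \(m\), \(N\), \(L\), and \(s\) defined above is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, L = 3, 5, 10   # ambient dimension, number of atoms, number of signals

# True dictionary: atoms from the uniform spherical ensemble, i.e.,
# Gaussian vectors normalized to unit length.
Phi_true = rng.standard_normal((m, N))
Phi_true /= np.linalg.norm(Phi_true, axis=0)

# 1-sparse representations (s = 1): each signal uses one atom with a
# weight drawn equiprobably from {-1, +1}. Redraw until X is full rank,
# so that every atom of the dictionary is used.
while True:
    X = np.zeros((N, L))
    X[rng.integers(N, size=L), np.arange(L)] = rng.choice([-1.0, 1.0], size=L)
    if np.linalg.matrix_rank(X) == N:
        break

U = Phi_true @ X   # noiseless measurements
```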
I define the normalized coding error as
$$
J(i) := \frac{ \|\MU - \MPhi^{(i)}\MX\|_\textrm{FRO}}{\|\MU - \MPhi^{(2)}\MX\|_\textrm{FRO}}.
$$
In other words, I compare each coding error to that achieved with the first refined dictionary.
On the right is a histogram of the normalized coding error at the final iteration,
with the cumulative frequency distribution shown as well.
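For reference, a single OMP iteration reduces to a matched filter plus a scalar least-squares fit. Here is a sketch (assuming unit-norm atoms; it is not the QR implementation I actually use), together with the Frobenius coding error that \(J(i)\) normalizes:

```python
import numpy as np

def omp_one_iteration(Phi, Umeas):
    """One OMP pass: pick the best atom per signal, fit its weight."""
    N, L = Phi.shape[1], Umeas.shape[1]
    X = np.zeros((N, L))
    best = np.argmax(np.abs(Phi.T @ Umeas), axis=0)  # best atom per signal
    for j, k in enumerate(best):
        a = Phi[:, k]
        X[k, j] = (a @ Umeas[:, j]) / (a @ a)        # scalar least squares
    return X

def coding_error(Phi, X, Umeas):
    return np.linalg.norm(Umeas - Phi @ X, 'fro')    # Frobenius norm

# J(i) then divides each iteration's error by that achieved with the
# first refined dictionary, Phi^(2).
```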

MOD Type 1

[Figure: normalized coding error vs. iteration over 1000 trials, MOD Type 1 (normal_error_type1.png)]
Now we notice some things.
First, we see some oscillations.
Second, nearly 70% of the problems
show no coding error improvement greater than 1 dB.
There are only a few lucky ones that improve 10 dB,
but they are quite exceptional.
So what is going on? Is MOD really that poor? Or is that as good as it gets?

The answer is “nej”, which is Danish for “no.”
Recall steps 4 and 5 of MOD here:

4. Compute the SVD of the sparse representation weights: \(\MX = \MG \Sigma \MV^T\)
5. Adjust the dictionary: \(\MPhi^{(i+1)} \leftarrow \MU \MV\Sigma^\dagger \MG^T\)
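In NumPy, these two steps might look like the following sketch, where \(\Sigma^\dagger\) inverts only the non-zero singular values (the tolerance is my choice):

```python
import numpy as np

def mod_update(Umeas, X, tol=1e-12):
    # Steps 4 and 5: X = G Sigma V^T, then Phi^(i+1) = U V Sigma^+ G^T.
    G, sig, Vt = np.linalg.svd(X, full_matrices=False)
    # Pseudo-inverse of Sigma: invert non-zero singular values, keep zeros.
    sig_pinv = np.where(sig > tol, 1.0 / np.maximum(sig, tol), 0.0)
    return Umeas @ Vt.T @ np.diag(sig_pinv) @ G.T
```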

After this update, we compute the new sparse representations of the measurements with the new dictionary \(\MPhi^{(i+1)}\).
HOWEVER, if \(\Sigma\) is not full rank, then at least one column of the new dictionary is guaranteed to be all zeros. (In this 1-sparse setting, rank deficiency of \(\MX\) means some atom is never used; its row of \(\MX\) is zero, so the corresponding column of \(\MX^\dagger\), and thus of \(\MU\MX^\dagger = \MPhi^{(i+1)}\), is zero.)
This means that though our original measurements might be composed of \(N\) atoms,
and we know for sure we want to learn \(N\) atoms,
the sparse representation step after this dictionary update will be using a smaller dictionary, since some of the atoms have been set to zero.
In effect, our atoms are evaporating from our dictionary, and there is no way to go back. The arrow of time and all.
This is bad news.
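Here is a toy demonstration of the evaporation under the 1-sparse assumption: only two atoms are ever used, so \(\MX\) has three all-zero rows, and the pseudo-inverse update zeroes the corresponding dictionary columns.

```python
import numpy as np

rng = np.random.default_rng(1)
X_bad = np.zeros((5, 10))
X_bad[0, :5] = 1.0
X_bad[1, 5:] = 1.0                       # only atoms 0 and 1 are ever used
Umeas = rng.standard_normal((3, 10))     # arbitrary stand-in measurements
Phi_new = Umeas @ np.linalg.pinv(X_bad)  # equals U V Sigma^+ G^T
print(np.linalg.norm(Phi_new, axis=0))   # columns 2, 3, 4 come out zero
```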

My colleague Boris informs me that the initialization of the dictionary in MOD is extremely important. One possible way to defeat evaporating dictionaries is to replace the zeros on the diagonal of \(\Sigma\) with an epsilon. Rerunning the experiment above with that change (rather than creating replacement atoms), we find the results below for 1000 iterations.
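In code, a sketch of this fix as I read it (the epsilon value is my choice):

```python
import numpy as np

def mod_update_eps(Umeas, X, eps=1e-7):
    # As mod_update() above, but clamp the zero singular values to a small
    # epsilon before inverting, so no atom is zeroed outright.
    G, sig, Vt = np.linalg.svd(X, full_matrices=False)  # X = G Sigma V^T
    sig = np.maximum(sig, eps)
    return Umeas @ Vt.T @ np.diag(1.0 / sig) @ G.T      # Phi^(i+1)
```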

MOD Type 2

[Figure: normalized coding error vs. iteration, MOD Type 2 (normal_error_type3_1000.png)]
There is some improvement, but not very much.
So, let’s amend the MOD dictionary update with this simple step:
if the dictionary cardinality decreases by \(n\),
draw \(n\) fresh atoms from the uniform spherical ensemble.
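As a sketch, that amendment might look like this (the zero-norm threshold is my choice):

```python
import numpy as np

def replace_evaporated_atoms(Phi, rng, tol=1e-10):
    # Redraw any evaporated (zero-norm) atoms from the uniform spherical
    # ensemble: Gaussian vectors normalized to unit length.
    dead = np.linalg.norm(Phi, axis=0) < tol
    n = int(dead.sum())
    if n:
        fresh = rng.standard_normal((Phi.shape[0], n))
        Phi[:, dead] = fresh / np.linalg.norm(fresh, axis=0)
    return Phi
```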
Now look at these results!

MOD Type 3

[Figure: normalized coding error vs. iteration, MOD Type 3 (normal_error_type2_1000.png)]
While there are still about 3% of the problems that experience less than 15 dB of coding error improvement, about 97% of them eventually go straight to hell, or, “numerical precision.” How interesting that there exists, in nearly all cases, a tipping point.

Boris also suggests that the atoms could be replaced by averaged training data instead of realizations of noise. This makes sense, because the measurements should tell us about the atoms. So now, for each evaporated atom, I create a weighted combination of all our measurements, sampling the weights from a Normal distribution. We see the results below.
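A sketch of this variant (normalizing the rebuilt atoms to unit length is my own assumption):

```python
import numpy as np

def replace_with_weighted_data(Phi, Umeas, rng, tol=1e-10):
    # Rebuild each evaporated atom as a random weighted combination of the
    # measurements, with weights drawn from a Normal distribution.
    dead = np.linalg.norm(Phi, axis=0) < tol
    n = int(dead.sum())
    if n:
        W = rng.standard_normal((Umeas.shape[1], n))  # one weight per signal
        fresh = Umeas @ W
        Phi[:, dead] = fresh / np.linalg.norm(fresh, axis=0)
    return Phi
```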

MOD Type 4

[Figure: normalized coding error vs. iteration, MOD Type 4 (normal_error_type4_1000.png)]
I would say that the results of Type 3 and Type 4 are quite similar, except that in the latter case the convergence is perhaps quicker.
Thus, at least in these 1-sparse cases, “resetting” the “evaporated” atoms “helps.”

This initialization sensitivity and atom evaporation ought to be discussed somewhere, then. Here is a possible source of illumination: S. O. Aase, J. H. Husøy, K. Skretting, and K. Engan, “Optimized signal expansions for sparse representation”, IEEE Trans. Signal Processing, Vol. 49, No. 5, pp. 1087–1096, May 2001.

Now it is time to ramp up the problem dimensions!
