Consider a dictionary of two atoms

Say we have some signal \(\vx \in \mathcal{C}^N\),
and a dictionary of two atoms \(\mathcal{D} = \{\vd_1, \vd_2 \in \mathcal{C}^N : || \vd_i ||_2 = 1 \}\).
And assume that the signal is not in the span of the dictionary, so there is some residual with non-zero norm.
We can minimize this norm by an orthogonal projection:
\(\vx_\mathcal{D} := a_1^o \vd_1 + a_2^o \vd_2\),
where
$$
\begin{align}
a_1^o & = \frac{\langle \vx, \vd_1\rangle – \langle \vx, \vd_2\rangle \langle \vd_1, \vd_2 \rangle }{1 – |\langle \vd_1, \vd_2 \rangle |^2} \\
a_2^o & = \frac{\langle \vx, \vd_2\rangle – \langle \vx, \vd_1\rangle \langle \vd_2, \vd_1 \rangle }{1 – |\langle \vd_1, \vd_2 \rangle |^2}
\end{align}
$$
Now, consider that we run matching pursuit on this signal with this dictionary.
Assume that \(\vd_1\) is the first atom selected; and then necessarily \(\vd_2\).
The resulting approximation of this signal gives the weights
$$
\begin{align}
a_1 & = \langle \vx, \vd_1\rangle \\
a_2 & = \langle \vx – a_1\vd_1, \vd_2\rangle
\end{align}
$$
and the residual \(\vr = \vx – a_1 \vd_1 – a_2\vd_2\).
What if we add to the residual \(a_1 \vd_1\) and recompute the weight of the first atom?
And then after that, what if we add to the new residual \(a_2 \vd_2\), and recompute the weight of the second atom?
And what if we do this an infinite number of times?
Do the resulting weights, \(\{a_1^{(\infty)}, a_2^{(\infty)}\}\) converge to those of the orthogonal projection above?
And if so, how fast do they converge?


Starting from the residual,
and defining \(c := \langle \vd_2, \vd_1 \rangle\),
the new first weight will be
$$
\begin{align}
a_1^{(1)} & = \langle \vx – a_2\vd_2, \vd_1 \rangle = \langle \vx, \vd_1 \rangle – a_2 \langle \vd_2, \vd_1\rangle \\
& = a_1 – a_2 c
\end{align}
$$
The new second weight is then
$$
\begin{align}
a_2^{(1)} & = \langle \vx – a_1^{(1)}\vd_1, \vd_2 \rangle = \langle \vx, \vd_2 \rangle – a_1^{(1)} \langle \vd_1, \vd_2\rangle \\
& = a_2 + a_1 c^* – a_1^{(1)} c^* \\
& = a_2 – c^*(a_1^{(1)} – a_1)
\end{align}
$$
since \(\langle \vx, \vd_2 \rangle = a_2 + a_1 \langle \vd_1, \vd_2\rangle \) from above.

Continuing once more,
the new first weight will be
$$
\begin{align}
a_1^{(2)} & = \langle \vx – a_2^{(1)}\vd_2, \vd_1 \rangle = \langle \vx, \vd_1 \rangle – a_2^{(1)} \langle \vd_2, \vd_1\rangle \\
& = a_1 – a_2^{(1)} c
\end{align}
$$
and the new second weight will be
$$
\begin{align}
a_2^{(2)} & = \langle \vx – a_1^{(2)}\vd_1, \vd_2 \rangle = \langle \vx, \vd_2 \rangle – a_1^{(2)} \langle \vd_1, \vd_2\rangle \\
& = a_2 + a_1 c^* – a_1^{(2)} c^* \\
& = a_2 – c^*(a_1^{(2)} – a_1)
\end{align}
$$
From these, we see the following pattern for the weights
$$
\begin{align}
a_1^{(n)} & = a_1 – a_2^{(n-1)} c \\
a_2^{(n)} & = a_2 – (a_1^{(n)} – a_1) c^*.
\end{align}
$$
Substituting the latter expression into the former we get the following system
$$
a_1^{(n)} = |c|^2 a_1^{(n-1)} + a_1(1 – |c|^2) – a_2 c
$$
for \(n > 1\).
We can express this as a finite difference equation
$$
y(n) – |c|^2 y(n-1) = x(n)
$$
and explore its properties as a digital filter.
Taking the z-transform, we find the relationship between output and input:
$$
H(z) := \frac{Y(z)}{X(z)} = \frac{1}{1-|c|^2z^{-1}}
$$
which shows the system has a pole at \(|c|^2\).
Since the dictionary has unit norm atoms we know \(|c|^2 \le 1\).
Furthermore, if \(|c|^2 = 1\) then the second weight found by MP will be zero.
So, we can safely assume \(|c|^2 < 1\), i.e., the causal system is stable.

When \(x(n) = p \mu(n)\),
where \(p := a_1(1 – |c|^2) – a_2 c\) and \(\mu(n) \) is the step function,
we are essentially finding the step response of the system.
In this case,
Since the z-transform of this is
$$
X(z) = \frac{p}{1-z^{-1}}
$$
our output has the z-transform
$$
Y(z) = \frac{p}{1-z^{-1}}H(z) = \frac{p}{(1-|c|^2z^{-1})(1-z^{-1})} = \frac{pz^2}{(z – |c|^2)(z-1)}.
$$
Using partial fractions, we find
$$
Y(z) = \frac{pz^2}{(z – |c|^2)(z-1)} = pz^2\left [ \frac{(1 – |c|^2)^{-1}}{z-1} – \frac{(1 – |c|^2)^{-1}}{z – |c|^2}\right ].
$$
Taking this z-transform back to the time-domain, this signal splits this into a steady-state and transient response
$$
y(n) = \frac{p}{1-|c|^2}\left [ 1 + |c|^{2n} \right ] \mu(n)
$$
from which we see
$$
\begin{align}
\lim_{n\to \infty} y(n) = a_1^{(\infty)} & = \frac{a_1(1 – |c|^2) – a_2 c}{1-|c|^2} \\
& = \frac{\langle \vx, \vd_1 \rangle (1 – |c|^2) – \langle \vx, \vd_2 \rangle c + \langle \vx, \vd_1 \rangle |c|^2}{1-|c|^2} \\
& = \frac{\langle \vx, \vd_1 \rangle – \langle \vx, \vd_2 \rangle \langle \vd_1, \vd_2 \rangle}{1-|\langle \vd_1, \vd_2 \rangle|^2} \\
& = a_1^{o}.
\end{align}
$$
Voilà (except for a conjugate error somewhere, and lazy dealing with an extra z term). So the first weight converges to that given by least squares! (I leave it to the reader to prove the convergence of the second weight to the optimum.)
Furthermore, from this analysis we see that the convergence depends on \(|c|^2 = |\langle \vd_2, \vd_1 \rangle |^2\).
When the two atoms are pointed in nearly the same direction up to a sign, \(|c|^2 \approx 1\), the convergence will be quite slow.

In the figure below we see a slow rate of convergence.
The atom \(\vd_2\) is made to point in the direction of \(\vd_1\) with just a slight perturbation; and \(|\langle \vd_2, \vd_1\rangle| \approx 0.97\).
The dashed lines are the weights found by least squares.
We see the weights converge correctly, but slowly.

weightconvergence.jpg
In the figure below, however, we see a fast rate of convergence.
In this case, \(|\langle \vd_2, \vd_1\rangle| \approx 0.52\).
Of course, if \(|\langle \vd_2, \vd_1\rangle| = 0\), then as expected the weights converge to the optimum immediately.

weightconvergence2.jpg
Now, the interesting thing to ask is whether this happens when we increase the number of atoms, and adapt each one’s weight while holding the others constant.
The figure below shows what happens with four atoms.

weightconvergence3.jpg
Clearly we have convergence; but now the convergence of each weight is tied to the other weights in a very complex manner.
I predict that the convergence of all the weights is dominated by
the pair of atoms that are most similar, which is the coherence of the dictionary.

Just to make sure, I have run a script looking at 95 atoms selected from the uniform spherical ensemble in a 100-dimensional space.
Running the above procedure to 2000 iterations produces the weight convergence below.

weightconvergence4.jpg
The SNR at the end (comparing the difference between the sorted optimal weights and the sorted iterated weights to the optimal weights) is nearly 100 dB.
If we increase the number of atoms to 99, we come dangerously close to breaking our assumption that \(|c| < 1\).

Advertisements

2 thoughts on “Consider a dictionary of two atoms

  1. Bob,
    This is really interesting and actually particularly useful for thinking about some things I’ve actually only started looking at this past week :P You have great timing!
    I’m going to try to spend more time understanding your train of thought more thoroughly :)
    Thanks,
    — Eric

    Like

  2. Thank you Eric. It is a pleasure to know you are reading.
    I am thinking it shouldn’t be too much trouble to solve for all weights at once. Just need to spend time formulating the matrices.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s