Paper of the Olden Days (Po’OD): Hierarchical and Parallel Matching Pursuit Edition

Hello, and welcome to the Paper of the Olden Days (Po’OD): Hierarchical and Parallel Matching Pursuit Edition. H. Feichtinger, A. Türk, and T. Strohmer, “Hierarchical parallel matching pursuit,” in Proc. SPIE Int. Soc. Opt. Eng., vol. 2302, pp. 222-232, 1994. <footnote>This paper begins in an interesting way: “The by now well-known matching pursuit method of S. Mallat as well as the recently proposed orthogonal matching pursuit …” I think it is always dangerous to begin a research exposition by stating the subject of study is popular, or well-known … and thus an attempt to indirectly infer one’s relevance. And with the context of the Po’OD, I am assuming that “well-known” does not mean matching pursuit was front page news in the NY Times; or that it hosted Saturday Night Live. But this sentence gives me historical pause to think, “At that time I was just a stupid kid 18 years old and incapable of matching my socks, let alone think about decomposing signals into atoms, which aren’t really atoms. </footnote>


This paper claims to propose a “parallel” implementation
of Matching Pursuit (MP) based on a hierarchical framework.
At some point I think it is alluded that the algorithm
first considers large atoms, and then successively smaller ones,
based on orthogonalized subdictionaries and their dual frames.
The order of these operations obviously affect the outcome,
and because of this I argue that this algorithm is not
actually a parallelization of MP.
Instead, I think this paper proposes a way of finding
an orthogonal projection in a large unstructured space
by using pseudo-inverses of orthogonal subspaces.
It is not entirely clear what that has to do with MP;
even their “algorithms” make no mention of it.

Consider a Hilbert Space \(E\),
and a closed subspace \(U\) created by the sum of
\(N\) closed subspaces of \(E: U = V_1 + \ldots + V_N\),
where each \(V_j\) is not necessarily orthogonal to the others.
(In any finite-dimensional normed linear space,
every possible subspace is closed.)
Define the dictionary \(\mathcal{D} = \bigcup_{j=1}^N \mathcal{D}_j =
\{\rho_i\}_{i\in I}\),
where each sub-dictionary has \(\text{span}\{\mathcal{D}_j\} = V_j\),
and thus \(\text{span}\{\mathcal{D}\} = U\).
Now consider the closed orthogonal complement subspaces
$$\begin{align}
W_1 & = V_1 \\
W_2 & = (V_1 + V_2) \ominus V_1 = (V_1 + V_2) \ominus W_1 \\
W_3 & = (V_1 + V_2 + V_3) \ominus (V_1 + V_2) =
(V_1 + V_2 + V_3) \ominus (W_1 \oplus W_2)\\
& \vdots \\
W_N & = (V_1 + \ldots + V_N) \ominus (V_1 + \ldots + V_{N-1}) = (V_1
+ \ldots + V_N) \ominus (W_1 \oplus \ldots \oplus W_{N-1})
\end{align}$$
from which we can clearly see \(U = W_1 \oplus \ldots \oplus W_N\).

We want to approximate some \(f \in E\) by the minimization criterion
$$
\min || f – f_a ||_2 = \min \Norm f – \sum_{i\in I} c_i \rho_i \Norm_2
$$
which is satisfied when \(f_a\) is
the orthogonal projection of \(f\) onto \(U\), or \(f_U\).
Calculating the orthogonal projection operator for \(U\) is, however,
unfeasible
since it will in general be very large and unstructured.
But this can be simplified by using instead the projection operators
associated with
each closed orthogonal subspace in the set \(\{W_j\}_{j=1}^N\).
Then \(f_U\) is built from the superposition of the projections of \(f\)
onto each of the orthogonal subspaces.
These still might be difficult to calculate directly,
so we can use the process of MP or OMP to iteratively approximate each
one.
Instead of finding the orthogonal projection of \(f\) onto \(U\)
by MP or OMP and the large \(\mathcal{D}\),
we will find the orthogonal projections of \(f\)
onto each \(W_j\) by MP or OMP and
a set of orthogonalized dictionaries \(\{\mathcal{C}_j\}_{j=1}^N\),
where \(\text{span}\{\mathcal{C}_j\} = W_j\).
This way the decomposition problem becomes separable, and thus
parallelizable,
and provides a speed-up of the algorithm
since each sub-dictionary is smaller than \(\mathcal{D}\).
The problem now is determining \(\{\mathcal{C}_j\}_{j=1}^N\)
using a Gram-Schmidt orthogonalization procedure starting with
\(\mathcal{D}_1\).

Let \(E = L^2(\mathcal{R}^m)\), the space of all
absolutely square integrable \(m\)-dimensional real functions.
Consider the closed linear subspace \(V_j\) which is the span
of all possible linear combinations of the translates \(h\in H\)
of some function \(\varphi_j \in E\), i.e.,
$$
V_j := \text{span} \left \{ v_j = \sum_{h \in H} a_h \varphi_j(x-h) \;
\forall \{a_h \in \mathcal{R}\}_{h \in H} \right \}.
$$
Thus, in terms of the dictionary \(\mathcal{D}_j := \{\varphi_j(x-h) : h
\in H\}\),
every vector \(v_j \in V_j\) has an exact linear expansion by
definition.
When \(\mathcal{D}_j\) constitutes a frame for \(V_j\), i.e.,
$$A || v_j ||_2^2 \le \sum_{h\in H} |\langle v_j, \varphi_j(x-h) \rangle
|^2 \le B || v_j ||_2^2
\forall v_j \in V_j$$ \(\textrm{such that} ||v_j||_2 > 0, A > 0, B
< \infty\)
then any \(v_j \in V_j\) has an expansion in terms of the dual frame
translated with \(H\) as well, \(\{\widetilde\varphi_j(x-h) : h \in
H\}\), i.e.,
$$
v_j = \sum_{h \in H} \langle v_j, \widetilde{\varphi}_j(x-h)\rangle
\varphi_j(x-h) =
\sum_{h \in H} \langle v_j,{\varphi}_j(x-h)\rangle
\widetilde{\varphi}_j(x-h)
$$
since the original frame is constructed by shifts of a single function
(proven elsewhere).

Now we will construct the dictionaries associated with the
orthogonal set of linear closed subspaces \(\{W_j\}_{j=1}^N\).
First, set \(\psi_1 := \varphi_1\),
and consider the set of functions \(\{\psi_j\}_{j=2}^N\) defined by
$$
\begin{align}
\psi_2 & := \varphi_2 – Q_1 \varphi_2 \\
\psi_3 & := \varphi_3 – (Q_1 \varphi_3 + Q_2 \varphi_3) \\
& \vdots \\
\psi_N & := \varphi_N – (Q_1 \varphi_N + Q_2 \varphi_N + \ldots +
Q_{N-1} \varphi_N)
\end{align}$$
where \(Q_j\) is the orthogonal projection operator onto \(W_j\).
The set of translates \(H\) of these functions
span their respective orthogonal subspace,
just as the set of \(\varphi_j\),
which is proven in the paper.

For the discrete case, \(E = \ell^2(\mathcal{Z}_n)\), the space of all
absolutely square integrable \(n\)-dimensional complex functions.
We consider the necessarily closed linear subspace \(V_j\),
which is the span of all possible linear combinations of
some function \(\varphi_j \in E\), as in (\ref{eq:Vjdefinition}).
Here however, we consider the translates \(h \in H_a\)
built as \(d\) multiples of a fundamental translation \(a\),
such that \(n=da\).
As above, first set \(\psi_1 := \varphi_1\),
and construct the following dictionary matrix:
$$
\MD_1 := \left [
\psi_1(x) | \psi_1(x-a) | \psi_1(x-2a) | \cdots | \psi_1(x-da) \right ].
$$
This dictionary spans, and is a frame for, the closed subspace \(V_1 =
W_1\).
Now, to get the dictionary for each orthogonal subspace \(W_j\)
we iteratively calculate the following functions
$$
\begin{align}
\psi_2 & := [ \MI – \MD_1 \MD_1^ \dagger ] \varphi_2 \\
\psi_3 & := [ \MI – ( \MD_1 \MD_1^ \dagger + \MD_2 \MD_2^ \dagger) ]
\varphi_3 \\
& \vdots \\
\psi_N & := [ \MI – ( \MD_1 \MD_1^ \dagger + \MD_2 \MD_2^ \dagger +
\ldots + \MD_{N-1} \MD_{N-1}^ \dagger) ] \varphi_N
\end{align}$$
where \(\MD_j^\dagger := (\MD_j^H \MD_j)^{-1}\MD_j^H\),
and each dictionary is formed iteratively by
$$
\MD_j := \left [
\psi_j(x) | \psi_j(x-a) | \psi_j(x-2a) | \cdots | \psi_j(x-da) \right ].
$$

Phew! But in the end I don’t see how this is applicable to any of the dictionaries we use in audio signal processing.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s