Consider that we have the matrix above, where \( \{ \vh_i \in \mathcal{R}^K : || \vh_i || = 1 \}\) is a set of \(n\) atoms from a real dictionary, and each real \(a\) is a positive weight. We form a matrix of the \(n^2\) inner products of these \(2n\) atoms \(\MG_{iq}\), and the outer-product of the two length-\(n\) vectors of weights, \(\MA_{iq}\). The Schur product of these matrices creates a matrix where the sum of its entries gives the inner product between the two vectors \(\langle \vv_{i}, \vv_{q} \rangle \), where

$$\vv_{i} = \sum_{m=1}^n a_{im}\vh_{im}$$

$$\vv_{q} = \sum_{m=1}^n a_{qm}\vh_{qm}.$$

We may not know all of the weights, but we do know that they obey

$$0 0\).

Now, the problem:

Estimate \(\langle \vv_{i}, \vv_{q} \rangle \) to within some precision by summing only a few elements of \(\MA_{iq} \bullet \MG_{iq}\).

Why don’t we just directly sum the entries? Because evaluating all the products involves too much computation, especially when the dimension of \(n\) is large. Essentially, the number of products is \((K+2)n^2\), so when we have atoms of large support, and a large number of atoms in each model, we are in trouble.

What we want then is to use only a few atoms from each model in order to assess the projection of one vector onto another.

Obviously, since the outer product of the weights, \(\MA_{iq}\), has the largest values in its lowest indices, we should start summing there in hopes of getting in the neighborhood of the projection quickly. So we build the following recursive sum for \(M > 1\)

$$\begin{align}

S(M) &= S(M-1) + \sum_{m=1}^M [\MA_{iq} \bullet \MG_{iq} ]_{m(M-m+1)} \\

& = S(M-1) + \sum_{m=1}^M a_{im} a_{q,M-m+1} \langle \vh_{im}, \vh_{q,M-m+1} \rangle

\end{align}$$

with \(S(1) = a_{i1}a_{q1}\langle \vh_{i1}, \vh_{q1} \rangle\).

At step \(M\) we are summing \(M\) entries along an anti-diagonal of the matrix, and adding it to the previous sum.

The triangles in the picture at top show this.

At step \(M\) then we have summed \(M(M+1)/2\) entries in total.

Now, the new problem:

How small can we make \(M\) to estimate \(\langle \vv_{i}, \vv_{q} \rangle \) to within some precision?

If we make an assumption about how the elements of \(\MG_{iq}\) are distributed, e.g., uniform in \([-1,1]\), then we can generate the following upper bound on the magnitude error for large \(M\):

$$|\langle \vv_{i}, \vv_{q} \rangle – S(M)| \le C^2 \sqrt{ 2/3 } \mathrm{Erf}^{-1}(p) \left ( || \vc_M^\gamma ||^2 + || \vd^\gamma ||^2 \right )^{1/2}$$

where \(p\) is the probability of being below the bound (given this distribution), and we define the following vectors:

$$\begin{align}

\vc_M^\gamma & = \left \{ [l(m-l+1) ]^{-\gamma} : m = M+1, \ldots, n; l = 1, \ldots, m \right \} \\

\vd^ \gamma & = \left \{ [ l ( n-m+1 ) ]^{-\gamma} : m = 1, \ldots, n-1; l = m+1, \ldots, n \right \}

\end{align}$$

by an excusable abuse of notation (conflating sets and vectors).

This much has been said in the following works:

- P. Jost, Algorithmic aspects of sparse approximations. PhD thesis, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, June 2007.
- P. Jost and P. Vandergheynst, “On finding approximate nearest neighbours in a set of compressible signals,” in Proc. European Signal Process. Conf., (Lausanne, Switzerland), pp. 1-5, Aug. 2008.

In our (Sturm and L. Daudet) article currently in revision,

we put Jost et al.’s work in the form above.

Problem solved! Theoretically. Aside from the fact that this bound is still extremely loose for the matrix \(\MG_{iq}\), which has elements that are more accurately described as being distributed Laplacian with a large Dirac mass of probability at 0 — that is a topic for another day.

A new problem arises, however, in evaluating the bound above in a way that will contribute less to the heat death of the Universe:

How can I compute \(\left ( || \vc_M^\gamma ||^2 + || \vd^\gamma ||^2 \right )^{1/2}\) efficiently for \(n\) large? In other words, how can I compute the following double sums efficiently for \(n\) large?

$$\begin{align}

|| \vc_M^\gamma ||^2 & = \sum_{m=M+1}^{n}\sum_{l=1}^m \frac{1}{[l(m-l+1)]^{2\gamma}} \\

|| \vd^{\gamma} ||^2 & = \sum_{m=1}^{n-1}\sum_{l=m+1}^n \frac{1}{[l(n-m+1)]^{2\gamma}}

\end{align}.$$

Evaluating these directly (e.g., creating the vectors) requires many operations:

for \(|| \vc_M^\gamma ||^2\) it is \(n(n+1)/2 – M(M+1)/2\) multiplies and powers each;

and for \(|| \vd^\gamma ||^2\) it is \(n(n-1)/2\) multiplies and powers each).

With \(n > 300\) it takes more than 10 minutes to evaluate the Euclidean norm on my extraordinarily fast computer — the “Sparse Approximation Station (SpASt)”. I have finally grown tired enough to find a better way.

And at least for the case when \(\gamma = 0.5\) (which is not unrealistic for the signals with which I am working) I have found a nice way to reduce the complexity of both double sums to single sums with just a smidgen of error. I leave it as an exercise to the reader until tomorrow. :)