Continuing from yesterday, I want to look more closely at the computational complexities of greedy sparse approximation algorithms. Considering that we have a signal of dimension \(m\) with assumed sparsity \(s \ll m\), and a dictionary of \(N \gg m\) atoms, Mailhé et al. (Metal) arrive at orders of computational costs for pursuits using general and fast dictionaries. (I have modified their notation to fit that often used in compressed sensing, where \(m\) stands for measurements, \(s\) stands for sparsity; and it doesn’t make sense to me to make \(D\) be the number of atoms in a dictionary \(\MD\) — so I set it to \(N\) because that is what Tropp and Wright do in J. A. Tropp and S. J. Wright, “Computational methods for sparse solution of linear inverse problems,” Proc. IEEE, vol. 98, pp. 948-958, June 2010.) Metal describe a little bit in their paper about how they arrive at these costs, but I need to fill in some details. I reproduce below their costs for general dictionaries, and in parenthesis for fast dictionaries where different.

table.tableizer-table {border: 0px solid #CCC; font-family: Arial, Helvetica, sans-serif; font-size: 12px;} .tableizer-table td {padding: 0px; margin: 0px; border: 1px solid #ccc;}

.tableizer-table th {background-color: #104E8B; color: #FFF; font-weight: bold;}

Action | HT | MP | OMP | GP |
---|---|---|---|---|

compute inner products | \(mN\) ( \(N\log m\) ) | \(mN\) ( \(N\log m\) ) | \(mN\) ( \(N\log m\) ) | \(mN\) ( \(N\log m\) ) |

find max | \(N \log N\) | \(N\) | \(N\) | \(N\) |

compute Gramian | \(s^2m\) ( \(s\log m\) ) | – | \(im\) ( \(i\log m\) ) | \(im\) ( \(i\log m\) ) |

find weights | \(s^2\) | – | \(i^2\) | \(i\) |

update residual | \(sm\) ( \(N\log m\) ) | \(m\) | \(im\) ( \(N\log m\) ) | \(im\) ( \(N\log m\) ) |

Cost/iter. | \(mN + s^2m\) ( \(N\log N + s^2\) ) | \(mN\) ( \(N\log m\) ) | \(mN\) ( \(N\log m + i^2\) ) | \(mN\) ( \(N\log m\) ) |

# iterations | 1 | \(\ge s\) | \(s\) | \(\ge s\) |

Total | \(mN + s^2m\) ( \(N\log N + s^2\) ) | \(smN\) ( \(sN\log N\) ) | \(smN\) ( \(sN\log N + s^3\) ) | \(smN\) ( \(sN\log N\) ) |

where \(i\) is the iteration of the greedy algorithm. (Table made with Tableizer.)

In the hard thresholding (HT) case, which is not the iterative version, the costs are clear for the general dictionary case. After finding the set of \(N\) inner products with the length-\(m\) signal, we sort them at a typical cost of \( N \log N \). With the prior knowledge that our signal is \(s\)-sparse, we select \(s\) atoms with the largest magnitude products, and compute their \(s \times s\) Gramian at a cost of \(s^2 N\) if they are not orthogonal. (The cost is actually lower because the diagonal is all ones.) We must find the least squares weights, which involves at worst an inversion of the positive definite Gramian at a cost of \(s^2\) …. which I think should be \(s^3 + s^2\), but it doesn’t matter here. Finally, producing the orthogonal projection of the signal onto the basis has a cost \(s m\). This gives a total cost of HT on the order of \(Nm + s^2m\).

For HT with the fast dictionary, which consists of a union of bases with fast transforms, the inner product step can be done with less computation. For instance, the Fourier transform of a length-\(m\) signal has cost \(m \log m\). That is the cost of finding all projections of the signal onto the Fourier basis of size-\(m\). Thus, for a dictionary of \(b\) size-\(m\) bases, all projections can be computed at a cost of \(b m\log m = N \log m\).

Metal say computing the Gramian of the selected atoms costs cheaper at \(s \log m\), which may come from the fact that projections onto an arbitrary set of atoms from fast bases brings with it the lower computational complexity of the transforms… (Best case scenario is when all atoms come from the same basis, with cost 0. Worst case is when they come from different bases, with cost \(s^2m\). Is \(s \log m\) then the expected cost?) Finally, Metal state the linear combination of these \(s\) atoms has a cost \(N\log m\), which I don’t get. I think it should be \(s \log m\) at most.

Anyhow, since \(N \gg m \gg s\), the \(N\log N\) terms win, and the total cost has an order of that with a little more.

Now, for matching pursuit (MP), the PGA, using a general dictionary, computing the inner products has a cost that is the same as all the others. Since we only need to find the maximum magnitude, the cost is at most \(N\). MP does not perform any orthogonal projection; but updating the residual is at most \(m\). This makes the cost for each iteration \(mN\). And so for \(s\) iterations, we have a total cost of \( s m N\). In a fast dictionary, computing the inner products has the same cost \(N\log m\) as all the others.

Searching for the largest magnitude remains \(N\).

Updating the residual is just a multiplication of a length-\(m\) atom, and a subtraction.

Thus, the \(N \log m\) terms win; and with \(s\) iteration we arrive at a cost of \(N s\log m\). MPTK actually reduces the complexity to \( [ N+s l]\log l\) where \(l\) is the size of the largest atom in the dictionary.

For orthogonal MP, the OGA, we now have projections to do as in HT. Unlike HT however, the cost of these projections increase with the number of iterations. After selecting the \(i\)th atom, we must find the orthogonal projection of the signal on the span of those \(i\) atoms. This should involve a cost of \(i^2 m\) to compute the Gramian, and \(i^3\) to compute its inverse, and then \(i^2\) to compute the weights; but Metal claim it costs \(i m\) to compute the Gramian, and \(i^2\) to compute its inverse and the weights — a trick by Pati et al. 1993. These things don’t matter with big O notation, as \(Nm\) takes the complexity cake. (Metal also state that OMP takes only \(s\) iterations, which I don’t accept. There is no guarantee it will have converged in \(s\) iterations even when the signal is \(s\)-sparse.) Considering OMP with a fast dictionary, Metal claim computing the Gramian has a cost \(i \log m\) for some reason, its inverse and linear combination with \(i\) inner products has complexity \(i^2\) for some reason, and finally the computation of the orthogonal projection of the signal onto the span of the \(i\) atoms has a cost of \(N \log m\), which I don’t see. Thus, Metal show a total cost of \(Ns\log m + s^3\). Since \(s \ll N\), OMP is in the same complexity ballpark as MP — but this is hard to believe. Computing the orthogonal projection is costly.

Any and all elucidation is greatly appreciated.