After writing about probabilistic orthogonal matching pursuit (PrOMP) a few days ago, I decided to implement and test it starting from my old dissertation code. The MATLAB code I used to generate the plots below is here, for those interested in reproducing my research.

The approach I took is described in: A. Divekar and O. K. Ersoy, “Probabilistic Matching Pursuit for Compressive Sensing,” Electrical and Computer Engineering Technical Report, Purdue University, May 2010. The report omits a crucial implementation detail, but a look at their MATLAB code revealed it: each iteration selects an atom at random from all those with a non-zero projection onto the residual. The probability distribution over these atoms is defined as in their paper:

$$P[ \vd \in \mathcal{H}^* | \vr(n) ] := \begin{cases}
0, & | \langle \vr(n), \vd \rangle | = 0 \\
(1-\epsilon)/m, & | \langle \vr(n), \vd \rangle | \ge f_m \\
\epsilon/(N-n-m), & 0 < | \langle \vr(n), \vd \rangle | < f_m
\end{cases}$$

where \(f_m\) denotes the \(m\)th-largest projection magnitude, so that \(m\) atoms fall into the high-probability set. For my signals of length 1,024 samples and a dictionary of 16,364 complex Gabor atoms, I have set \(\epsilon = 0.01\) and defined \(m\) to be \(0.1\%\) of the number of atoms with non-zero projection. (Note that since OMP guarantees that the residual will be orthogonal to all atoms already selected, we need not worry about selecting those atoms again: they have zero probability mass.) The signals and dictionary are the same as those I used in my experiments with low complexity OMP (LoCOMP).
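To make my reading of this selection rule concrete, here is a Python/NumPy sketch of it (my actual experiments use MATLAB; the function and parameter names are mine, and I normalize by the actual counts in each set rather than by \(m\) and \(N-n-m\), so the probabilities still sum to one when ties push more than \(m\) atoms above \(f_m\)):

```python
import numpy as np

rng = np.random.default_rng(0)

def promp_select(D, r, eps=0.01, frac=0.001):
    """Sample one atom index per the PrOMP-style distribution.
    D: (n_samples, N) dictionary, r: current residual.
    `eps` and `frac` (the fraction defining m) are my parameter choices."""
    c = np.abs(D.conj().T @ r)           # |<r, d>| for every atom
    nz = c > 0                           # atoms with non-zero projection
    m = max(1, int(frac * nz.sum()))     # number of "most likely" candidates
    f_m = np.sort(c)[::-1][m - 1]        # m-th largest projection magnitude
    top = c >= f_m                       # high-probability set (ties may exceed m)
    rest = nz & ~top                     # remaining non-zero projections
    p = np.zeros_like(c)
    p[top] = (1 - eps) / top.sum()
    p[rest] = eps / rest.sum() if rest.any() else 0.0
    p /= p.sum()                         # guard against rounding
    return rng.choice(len(c), p=p)
```

Atoms with zero projection (including all previously selected atoms, by OMP's orthogonality) get zero probability mass and can never be drawn.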

Below we see the residual energy decay for three signals with structures that are difficult to model by greedy decomposition approaches: an asymmetric attack, two modes critically spaced in time, and a stationary sinusoid. (None of the signals are embedded in noise.) The residual energy decays of 100 runs of PrOMP are shown in gray against the results of MP and OMP. Only a few runs of PrOMP outperform OMP for Attack with respect to residual energy decay, and only one run of PrOMP for Bimodal is equivalent to that of OMP. For Sine, all of the runs perform dismally, with some strange behavior not unlike that caused by bugs in my code: the residual energy should be monotonically decreasing for a complete dictionary no matter which atom is selected (as long as it has a non-zero projection), so perhaps some linear-dependence issues are upsetting the inversion of the Gramian. Regardless, it appears that PrOMP for the most part does not find good signal models; and even with the reprojection of OMP, these models perform more poorly than those of MP.
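On the monotonicity point: OMP's reprojection re-solves a least-squares problem over all selected atoms at every iteration, which leaves the residual orthogonal to their span, so the residual energy cannot increase. A Python/NumPy sketch of that step (again, my experiments use MATLAB, and the names here are mine):

```python
import numpy as np

def omp_reproject(D, x, support):
    """Re-solve the coefficients over the selected atoms by least squares.
    D: (n, N) dictionary with unit-norm columns, x: the signal,
    support: indices of atoms selected so far. The normal equations
    implicitly invert the Gramian D_S^H D_S -- the step that
    near-linearly-dependent atoms can upset."""
    Ds = D[:, support]
    a, *_ = np.linalg.lstsq(Ds, x, rcond=None)  # coefficients
    r = x - Ds @ a                              # residual after reprojection
    return a, r
```

The orthogonality of `r` to the columns of `Ds` is also what zeroes out the probability mass on already-selected atoms in the distribution above.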

Let’s now look below at the time-domain distribution of the atoms in the models that perform best, in the sense of minimizing the number of atoms needed to reach -60 dB, or 100 atoms, whichever comes first. Here we can see that PrOMP found models of Attack and Bimodal that look better than the models offered by OMP: the structures of the signals are represented by the atoms in a more visible manner. For Attack, we see fewer atoms preceding the onset. For Bimodal, the two modes are each represented by several atoms rather than by large-scale atoms spanning twice the support of either mode. The model of Sine apparently does not benefit. (Note that these results depend in some complex way on the number of atoms \(m\) I consider at each iteration to be the most likely candidates. The authors used 8 for a dictionary of \(\mathcal{O}(1000)\) atoms; I use 0.1% of the total number of atoms with non-zero projection out of a dictionary of \(\mathcal{O}(10000)\) atoms, which comes to dozens of atoms. I am rerunning these simulations using 0.01% just to make sure.)

So, in conclusion, I would not let PrOMP watch over my children. In the instances covered here, PrOMP needs a long time to discover signal models that are more efficient (with respect to minimizing the residual energy at a given model order) than those of OMP (or even plain MP, ouch), and with no guarantee that all elements of the signal model fit well the content in which I am interested. It is my opinion, then, that when it comes to the problem of sparse representation, it is more often than not a bad idea to select atoms at random from a set over which we distribute probability mass uniformly according to inner-product magnitude. To create a more efficient signal model, much more must be taken into consideration than just the projection of an atom onto the residual!