Hello, and welcome to Paper of the Day (Po’D): Consensus matching pursuit for multi-trial EEG signals Edition. Today’s paper is: C. G. Bénar, T. Papadopoulo, B. Torrésani, and M. Clerc, “Consensus matching pursuit for multi-trial eeg signals,” J. Neuroscience Methods, vol. 180, pp. 161-170, Mar. 2009. With this paper, we add yet another greedy sparse approximation method with the acronym CMP: First, there was **Cyclic MP** (M. G. Christensen and S. H. Jensen, “The cyclic matching pursuit and its application to audio modeling and coding,” in Proc. Asilomar Conf. Signals, Syst., Comput., (Pacific Grove, CA), Nov. 2007); then there was **Complementary MP** (G. Rath and C. Guillemot, “A complementary matching pursuit algorithm for sparse approximation,” in Proc. European Signal Process. Conf., (Lausanne, Switzerland), Aug. 2008.) Now we have **Consensus MP**. Are there any takers for Conjugate MP? Or Confabulatory MP? Or Circumlocutive MP? (which one may argue should be the MP decompositions that always go awry.) Or perhaps Cholangiocholecystocholedochectomy MP? (First person to put that in an abstract gets an award from me.)

In fact, today’s paper has more to do with cholangiocholecystocholedochectomies than the other papers I have reviewed here, because the paper deals with the time-frequency analysis of electroencephalogram (EEG) data (true story: I once played Pictionary where I had to draw an electroencephalogram in under 30 seconds. Though I spent too much time drawing the cerebellum and medulla oblongata, my team got it.) Anyhow, in EEG data the components of interest are knee deep in noise, and so there is hope that sparse approximation with a suitable dictionary can provide a time-frequency analysis of the data that is both efficient and “meaningful.” In particular, since 1995 Durka et al. (P. J. Durka, Matching Pursuit and Unifiction in EEG analysis. Artech House Engineering in Medicine and Biology Series, Boston, MA: Artech House, 2007) have looked at ways of using MP to parameterize and analyze these difficult signals.

Not surprisingly, the simplest greedy method is not also the best at generating useful signal descriptions. Sometime between 1995 and 1999, Durka et al. began to notice that MP decomposition produces spurious artifacts, i.e., atoms in the model that had no business being there. In P. J. Durka, D. Ircha, and K. J. Blinowska, “Stochastic time-frequency dictionaries for matching pursuit,” IEEE Trans. Signal Process., vol. 49, pp. 507-510, Mar. 2001, Durka et al. mentions in the abstract,

Analyzing large amounts of sleep electroencephalogram (EEG) data by means of the matching pursuit (MP) algorithm, we encountered a statistical bias of the decomposition, resulting from the structure of the applied dictionary.

Well, these problems are not due to the dictionary, but are really due to the decomposition process MP. (I show that MP cannot avoid producing these artifacts no matter the dictionary and non-trivial signal in B. L. Sturm and J. J. Shynk, “Sparse approximation and the pursuit of meaningful signal models with interference adaptation,” IEEE Trans. Acoustics, Speech, Lang. Process., vol. 18, pp. 461-472, Mar. 2010). Anyhow, it is a real problem when your processed data begins to acquire biases from the processing and provides no way to remove them. So Durka et al. decides to randomize the parameters of a dictionary of Gabor atoms, perform several decompositions of the same signal, and then average the time-frequency energy distributions of each atomic representation to obtain a fuzzy picture hopefully void of spurious artifacts.

Now the question comes, what do you do when you have thousands of tests of the same stimuli being administered to different people, and you wish to study the variability across test subjects? And with that we arrive at the subject of today’s Po’D.

The authors propose an approach to the decomposition of the signals over several trials. They assume that their signals of interest have components that repeat consistently will some subtle variations in shape and lag over the trials (and these differences are what they want to explore), and that the background noise is stationary and uncorrelated between trials. So now we have a collection of data \(\mathcal{X} := \{\vx_i \in \mathcal{R}^M : i = 1, 2, \ldots, I \}\) for all the trials, and we wish to decompose it all over a dictionary \(\mathcal{D}\), or \(\MD\) in matrix form, of unit norm time-frequency atoms such that, for instance,

$$\min_{\vs} \sum_{i=1}^I || \vx_i – \MD\vs ||_2^2$$

i.e., to describe all the signals (which I assume are phase aligned somehow) using a single linear combination of dictionary elements.

One approach, the authors call “Evoked Activity MP” (EMP) (P. J. Durka, A. Matysiak, E. M. Montes, P. V. Sosa, and K. J. Blinowska, “Multichannel matching pursuit and eeg inverse solutions,” Journal of Neuroscience Methods, vol. 148, pp. 49-59, Oct. 2005.

) selects the \(n\)th atom by the criterion

$$\vh_n = \arg \max_{\vd \in \mathcal{D}} \left | \sum_{i=1}^I \langle \vr_i(n), \vd \rangle \right |$$

where \(\vr_i(n)\) is the \(n\)th order residual signal of the \(i\)th trial.

Another approach, which the authors call “Induced activity MP” (IMP) (R. Gribonval, “Piecewise linear source separation,” in Proc. Soc. Photographic Instrumentation Eng., vol. 5207, (San Diego, CA, USA), pp. 297-310, Dec. 2003.

), selects the \(n\)th atom by the criterion

$$\vh_n = \arg \max_{\vd \in \mathcal{D}} \sum_{i=1}^I \left | \langle \vr_i(n), \vd \rangle \right |^2.$$

Contrary to these, the authors here propose the following algorithm:

- For each \(n\)-order residual of the trials, find the atom parameters at which there are local maxima in the set \( \mathcal{M}_i := \{ | \langle \vr_i(n), \vd \rangle | : \vd \in \mathcal{D}\}\). (This now assumes our projections onto the dictionary is a sampling of some parametric and smooth manifold, and thus it makes sense to talk about local maxima. Or that our dictionary is parametric at all.) We call this set of parameters at which there exist local maxima \( \mathcal{P}_i \).
- From the set \(\{\mathcal{P}_i : i=1,2\cdots, I\}\) we find the parameters of a “consensus” atom that maximizes a “vote map,” which is defined by $$ V(p) := \sum_{i=1}^I \sum_{p_i \in \mathcal{P}_i} | \langle \vr_i(n), \vd(p_i) \rangle | S(p,p_i)$$ where \(\vd(p_i)\) is that atom in the dictionary defined with parameter \(p_i\), and \(S(p,p_i)\) is a similarity measure between two parameters, which the authors define for Gabor atoms having translation \(u\), modulation frequency \(f\), and number of oscillations \(\xi\), $$S(p_1,p_2) := \exp\left ( \frac{1}{2} \left [ \left ( \frac{u_1 – u_2}{\sigma_u(u_1,f_1,\xi_1)} \right )^2 + \left ( \frac{f_1 – f_2}{\sigma_f(u_1,f_1,\xi_1)} \right )^2 + \left ( \frac{\xi_1 – \xi_2}{\sigma_\xi(u_1,f_1,\xi_1)} \right )^2 \right ] \right )$$ where each term in the denominator, e.g., \(\sigma_u(u_1,f_1,\xi_1)\), denotes something about the width of the atom in the time-frequency domain. (I think. The authors are not entirely clear on this aspect.)
- Finally, from the consensus atom, for each trial we select the best atom that is close enough to the consensus atom by maximizing the trial-dependent vote map defined by $$\arg \max_{p_i \in \mathcal{P}_i} | \langle \vr_i(n), \vd(p_i) \rangle | V(p_i).$$ This atom is then subtracted from the \(n\)th order residual of the \(i\)th trial.
- Now wipe the sweat from your brow, and repeat.

So, in summary, at each iteration we select the atom that has the longest projection on all the trial residuals, and then fine tune it for each of the trials individually.

In their simulations, the authors find CMP does a good job recovering the underlying (synthetic) signals from noise (at an SNR of “1”, whether it is dB or what I don’t know).

The figure below compares for three trials the results of using EMP, IMP, and their proposed approach. The blue lines are the original data; the red lines are the reconstructions (using 2 atoms), and the black lines are the true signals. At least for these, we can see a greater resemblance for CMP.

Another interesting figure from their paper is seen below. Here they look at the atoms in six-order models of two synthetic signals (or maybe it was several realizations of the same signals to simulate trials): one signal being the “original data”, and one signal created by jittering the parameters of the original data in time and frequency (or maybe just time, the authors are not clear). (Or maybe they use both the original and jittered signal as two trials… why aren’t all publications clear?) But we see that CMP finds a consistent set of atoms in the two, in contrast to the horrible mess of EMP and IMP.

Anyhow, this approach really reminds me of Stereo MP, and I am glad that the authors cite Gribonval — which was missing from a preprint I read.