I am now on a two-week research visit at the Centre for Mathematical Sciences at Lund University, Sweden. I let slip at lunch that “sparsity” is now out and “co-sparsity” is in — yet I couldn’t exactly state why. (My mind is all up in that music genre.) So, to learn myself, I volunteered to present to the research group on Thursday, “Co-sparsity: what’s it good for?” And what better way to start than to read M. Elad, P. Milanfar, and R. Rubinstein, “Analysis versus synthesis in signal priors,” Inverse Problems, vol. 23, pp. 947-968, 2007. I start from the beginning: what is meant by “analysis” and “synthesis”?

We wish to recover \(\vx \in \mathcal{R}^N\) given observation \(\vy \in \mathcal{R}^M\).

Assume the model \(\vy = \MT\vx + \vv\),

where \(\MT : \mathcal{R}^N \to \mathcal{R}^M\) is known,

and \(\vv\) is noise.

Assuming \(\vv\) is distributed \(\mathcal{N}(\zerob,\sigma_\vv^2\MI)\),

we can estimate \(\vx\) by maximum a posteriori (MAP):

$$

\begin{align}

\hat \vx_{MAP}(\vy|\MT,\vv) & = \arg \max_{\vx \in \mathcal{R}^N} P[\vy|\vx]P[\vx] \\
& = \arg \max_{\vx \in \mathcal{R}^N} \exp\left ( -\frac{\|\vy - \MT\vx\|_2^2}{2\sigma_\vv^2}\right )P[\vx] \\
& = \arg \min_{\vx \in \mathcal{R}^N} \frac{\|\vy - \MT\vx\|_2^2}{2\sigma_\vv^2} - \log P[\vx].

\end{align}

$$
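To make the MAP estimate concrete before looking at the two priors, here is a minimal numpy sketch of my own (not from the paper), using a Gaussian prior \(P[\vx] \propto \exp(-\|\vx\|_2^2/2\sigma_x^2)\), for which the minimization is quadratic and has a closed form:

```python
# A Gaussian-prior instance of the MAP estimate (a sanity check of my own,
# not from the paper). With P[x] proportional to exp(-||x||_2^2 / (2 sigma_x^2)),
# the objective is quadratic, and setting its gradient to zero gives
#   (T'T + (sigma_v^2 / sigma_x^2) I) x_MAP = T'y.
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 10
T = rng.standard_normal((M, N))      # known operator T
x_true = rng.standard_normal(N)
sigma_v, sigma_x = 0.1, 1.0
y = T @ x_true + sigma_v * rng.standard_normal(M)

def objective(x):
    # ||y - Tx||_2^2 / (2 sigma_v^2) - log P[x], up to an additive constant
    return (np.sum((y - T @ x) ** 2) / (2 * sigma_v ** 2)
            + np.sum(x ** 2) / (2 * sigma_x ** 2))

x_map = np.linalg.solve(T.T @ T + (sigma_v ** 2 / sigma_x ** 2) * np.eye(N),
                        T.T @ y)

# The closed form should beat any nearby perturbation of itself.
for _ in range(100):
    assert objective(x_map) <= objective(x_map + 1e-3 * rng.standard_normal(N))
```

The same structure reappears below: the data term from the Gaussian noise, plus a penalty that is the negative log of the prior.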

- If we assume \(P[\vx]\) uniform, then this becomes the maximum likelihood estimate.
- Assume for \(\alpha > 0\) and \(p \ge 0\) the prior is defined

$$

P[\vx] \propto \exp (-\alpha \|\Omega\vx\|_p^p), \quad \vx \in \mathcal{R}^N

$$

where the “analysis operator” \(\Omega : \mathcal{R}^N \to \mathcal{R}^L\).

With this we see the most probable \(\vx\) lies in the null space of \(\Omega\), if there is one. If \(p \le 1\), the most probable \(\vx\) are the ones most “sparsified” by \(\Omega\). If \(p = 2\), then among \(\vx\) of a fixed norm, the most probable are those pointing along the eigenvector of \(\Omega^T\Omega\) having the smallest eigenvalue. (Absent a norm constraint, shrinking \(\vx\) toward \(\zerob\) only increases the prior.)
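The \(p = 2\) claim is easy to check numerically; here is a small numpy verification of my own (not from the paper):

```python
# Checking the p = 2 claim numerically (my own verification, not from the
# paper): over unit-norm x, ||Omega x||_2^2 = x' (Omega'Omega) x is minimized
# by the eigenvector of Omega'Omega with the smallest eigenvalue, making that
# direction the most probable under the prior exp(-alpha ||Omega x||_2^2).
import numpy as np

rng = np.random.default_rng(1)
L, N = 15, 8
Omega = rng.standard_normal((L, N))             # analysis operator

evals, evecs = np.linalg.eigh(Omega.T @ Omega)  # eigenvalues in ascending order
x_min = evecs[:, 0]                             # unit eigenvector, smallest eigenvalue

# Every other unit vector gives ||Omega x||_2^2 at least as large.
for _ in range(1000):
    x = rng.standard_normal(N)
    x /= np.linalg.norm(x)
    assert np.sum((Omega @ x_min) ** 2) <= np.sum((Omega @ x) ** 2) + 1e-12
```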

The “analysis MAP” estimate is thus

$$

\hat\vx_{MAP,A}(\vy|\MT,\vv) = \arg \min_{\vx \in \mathcal{R}^N} \|\vy - \MT\vx\|_2^2 + 2\sigma_\vv^2\alpha \|\Omega\vx\|_p^p.

$$

- Assume for \(\alpha > 0\) and \(p \ge 0\) the prior is defined

$$

P[\vx] \propto \begin{cases}

0 & \lnot \exists \vs \in \mathcal{R}^K : \vx = \MD\vs \\

\exp (-\alpha \|\vs\|_p^p) & \text{otherwise}

\end{cases}

$$

where the “dictionary” \(\MD \in \mathcal{R}^{N\times K}\), and \(\vs\) is taken as a representation minimizing \(\|\vs\|_p^p\) when \(\MD\) is overcomplete and the representation of \(\vx\) is not unique.

We see that only \(\vx\) in the column space of \(\MD\)

has non-zero probability density.

If \(p \le 1\), then the \(\vx\) with sparse representations in \(\MD\)

are the most probable.

If \(p = 2\), then the \(\vx\) having small “energies” in \(\MD\)

are the most probable.

The “synthesis MAP” estimate is thus

$$

\hat \vx_{MAP,S}(\vy|\MT,\MD,\vv) = \MD \hat \vs

$$

where

$$

\hat \vs = \arg \min_{\vs \in \mathcal{R}^K} \|\vy - \MT\MD\vs\|_2^2 + 2\sigma_\vv^2\alpha \|\vs\|_p^p.

$$

- If \(\MD\) is square and invertible, with \(\Omega = \MD^{-1}\), then \(\hat \vx_{MAP,S}(\vy|\MT,\MD,\vv) = \hat\vx_{MAP,A}(\vy|\MT,\vv)\).
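For \(p = 2\) this equivalence can be verified directly, since both estimates have closed forms. A short numpy check of my own (not code from the paper), writing \(\lambda\) for \(2\sigma_\vv^2\alpha\):

```python
# Verifying the analysis/synthesis equivalence for p = 2 (my own numerical
# check, not code from the paper): with D square and invertible and
# Omega = D^{-1}, both MAP estimates have closed forms, and they coincide.
import numpy as np

rng = np.random.default_rng(2)
M, N = 12, 6
T = rng.standard_normal((M, N))
D = rng.standard_normal((N, N))       # square dictionary, invertible w.h.p.
Omega = np.linalg.inv(D)              # matching analysis operator
y = rng.standard_normal(M)
lam = 0.5                             # stands in for 2 sigma_v^2 alpha

# Analysis MAP, p = 2: minimize ||y - Tx||^2 + lam ||Omega x||^2.
x_A = np.linalg.solve(T.T @ T + lam * Omega.T @ Omega, T.T @ y)

# Synthesis MAP, p = 2: minimize ||y - TDs||^2 + lam ||s||^2, then x = D s.
TD = T @ D
s_hat = np.linalg.solve(TD.T @ TD + lam * np.eye(N), TD.T @ y)
x_S = D @ s_hat

assert np.allclose(x_A, x_S)
```

The substitution \(\vx = \MD\vs\) with \(\|\Omega\MD\vs\|_2^2 = \|\vs\|_2^2\) maps one objective exactly onto the other, which is what the assertion confirms.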

Now we are set to explore these two approaches

for sparse approximation, and thus unveil the relationships between sparsity and co-sparsity.