The general inverse problem with MAP estimation

I am now on a two-week research visit at the Centre for Mathematical Sciences at
Lund University, Sweden. I let slip at lunch that “sparsity” is now out and “co-sparsity” is in, yet I couldn’t exactly state why. (My mind is all up in that music genre.) So, to teach myself, I volunteered to present to the research group on Thursday: “Co-sparsity: what’s it good for?” And what better way to start than to read M. Elad, P. Milanfar, and R. Rubinstein, “Analysis versus synthesis in signal priors,” Inverse Problems, vol. 23, pp. 947-968, 2007. I start from the beginning: what is meant by “analysis” and “synthesis”?

We wish to recover \(\vx \in \mathcal{R}^N\) given observation \(\vy \in \mathcal{R}^M\).
Assume the model \(\vy = \MT\vx + \vv\),
where \(\MT : \mathcal{R}^N \to \mathcal{R}^M\) is known,
and \(\vv\) is noise.
Assuming \(\vv\) is distributed \(\mathcal{N}(\zerob,\sigma_\vv^2\MI)\),
we can estimate \(\vx\) by maximum a posteriori (MAP):
$$
\begin{align}
\hat \vx_{MAP}(\vy|\MT,\vv) & = \arg \max_{\vx \in \mathcal{R}^N} P[\vy|\vx]P[\vx] \\
& = \arg \max_{\vx \in \mathcal{R}^N} \exp\left ( -\frac{\|\vy - \MT\vx\|_2^2}{2\sigma_\vv^2}\right )P[\vx] \\
& = \arg \min_{\vx \in \mathcal{R}^N} \frac{\|\vy - \MT\vx\|_2^2}{2\sigma_\vv^2} - \log P[\vx].
\end{align}
$$
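
To make this concrete, here is a minimal numerical sketch (mine, not the paper’s): it simulates \(\vy = \MT\vx + \vv\) with a random \(\MT\) and minimizes the negative log-posterior for a placeholder Gaussian prior. All names, sizes, and parameter values are illustrative.

```python
# A minimal sketch of the MAP estimate above (illustrative, not from the paper):
# simulate y = T x + v, then minimize the negative log-posterior numerically,
# here with a placeholder Gaussian prior, -log P[x] = alpha * ||x||_2^2.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, M = 20, 15
T = rng.standard_normal((M, N))                    # known operator T : R^N -> R^M
x_true = rng.standard_normal(N)
sigma_v = 0.1
y = T @ x_true + sigma_v * rng.standard_normal(M)  # y = T x + v

alpha = 1.0

def neg_log_posterior(x):
    # ||y - T x||^2 / (2 sigma_v^2) - log P[x], up to an additive constant
    r = y - T @ x
    return r @ r / (2 * sigma_v**2) + alpha * np.sum(x**2)

x_map = minimize(neg_log_posterior, x0=np.zeros(N)).x
print("estimation error:", np.linalg.norm(x_map - x_true))
```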

  • If we assume \(P[\vx]\) uniform, then this becomes the maximum likelihood estimate.
  • Assume for \(\alpha > 0\) and \(p \ge 0\) the prior is defined
    $$
    P[\vx] \propto \exp (-\alpha \|\Omega\vx\|_p^p), \vx \in \mathcal{R}^N
    $$
    where the “analysis operator” \(\Omega : \mathcal{R}^N \to \mathcal{R}^L\).
    With this prior, the most probable \(\vx\) lie in the null space of \(\Omega\),
    if it is non-trivial.
    If \(p \le 1\), the most probable \(\vx\) are the ones most
    “sparsified” by \(\Omega\), i.e., those for which \(\Omega\vx\) has few significant entries.
    If \(p = 2\), then among \(\vx\) of equal norm the most probable are those pointing
    along the eigenvector of \(\Omega^T\Omega\) with the smallest eigenvalue.
    The “analysis MAP” estimate is thus
    $$
    \hat\vx_{MAP,A}(\vy|\MT,\vv) = \arg \min_{\vx \in \mathcal{R}^N} \|\vy - \MT\vx\|_2^2 + 2\sigma_\vv^2\alpha \|\Omega\vx\|_p^p.
    $$
    (A small numerical sketch of this estimate for \(p = 2\) appears after this list.)
  • Assume for \(\alpha > 0\) and \(p \ge 0\) the prior is defined
    $$
    P[\vx] \propto \begin{cases}
    \exp (-\alpha \|\vs\|_p^p) & \text{if } \vx = \MD\vs \text{ for some } \vs \in \mathcal{R}^K, \\
    0 & \text{otherwise,}
    \end{cases}
    $$
    where the “dictionary” \(\MD \in \mathcal{R}^{N\times K}\)
    (when the representation \(\vs\) is not unique, take one of smallest \(\|\vs\|_p^p\)).
    We see that only those \(\vx\) in the column space of \(\MD\)
    have non-zero probability density.
    If \(p \le 1\), then the \(\vx\) with sparse representations in \(\MD\)
    are the most probable.
    If \(p = 2\), then the \(\vx\) whose representations in \(\MD\) have small energy
    are the most probable.
    The “synthesis MAP” estimate is thus
    $$
    \hat \vx_{MAP,S}(\vy|\MT,\MD,\vv) = \MD \hat \vs
    $$
    where
    $$
    \hat \vs = \arg \min_{\vs \in \mathcal{R}^K} \|\vy - \MT\MD\vs\|_2^2 + 2\sigma_\vv^2\alpha \|\vs\|_p^p.
    $$
    (A sketch solving this for \(p = 1\) with ISTA also appears after this list.)
  • If \(\Omega\) is square and invertible and \(\MD = \Omega^{-1}\),
    then \(\hat \vx_{MAP,S}(\vy|\MT,\MD,\vv) = \hat\vx_{MAP,A}(\vy|\MT,\vv)\),
    since the change of variables \(\vx = \MD\vs\) maps one problem onto the other.
    (One of the sketches after this list checks this numerically for \(p = 2\).)
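
For \(p = 2\) the analysis MAP objective is quadratic, so the estimate has the closed form \(\hat\vx_{MAP,A} = (\MT^T\MT + 2\sigma_\vv^2\alpha\,\Omega^T\Omega)^{-1}\MT^T\vy\). Below is a minimal sketch of that case, assuming a first-difference operator for \(\Omega\); the sizes and \(\alpha\) are illustrative choices, not anything prescribed by the paper.

```python
# Analysis MAP for p = 2 (Tikhonov-type closed form). Omega is an illustrative
# first-difference operator; all sizes and parameters are arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
N, M = 20, 15
T = rng.standard_normal((M, N))
Omega = np.diff(np.eye(N), axis=0)                 # (N-1) x N first differences
x_true = np.cumsum(rng.standard_normal(N))         # random walk: Omega x_true is white noise
sigma_v, alpha = 0.1, 1.0
y = T @ x_true + sigma_v * rng.standard_normal(M)

# The objective ||y - T x||^2 + 2 sigma^2 alpha ||Omega x||_2^2 is quadratic,
# so its minimizer solves the normal equations below.
lam = 2 * sigma_v**2 * alpha
x_hat_A = np.linalg.solve(T.T @ T + lam * Omega.T @ Omega, T.T @ y)
print("analysis MAP error:", np.linalg.norm(x_hat_A - x_true))
```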
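
For \(p = 1\) the synthesis MAP problem in \(\vs\) is a lasso-type problem. The sketch below solves it with plain ISTA (iterative soft thresholding); that is one standard solver, not necessarily the paper’s, and the dictionary, step size, and iteration count are all illustrative.

```python
# Synthesis MAP for p = 1, solved with plain ISTA (proximal gradient).
# The overcomplete dictionary D and all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)
N, M, K = 20, 15, 40
T = rng.standard_normal((M, N))
D = rng.standard_normal((N, K))                    # dictionary, N x K (overcomplete)
s_true = np.zeros(K)
s_true[rng.choice(K, size=3, replace=False)] = rng.standard_normal(3)
x_true = D @ s_true                                # x has a 3-sparse representation in D
sigma_v, alpha = 0.1, 1.0
y = T @ x_true + sigma_v * rng.standard_normal(M)

A = T @ D
lam = 2 * sigma_v**2 * alpha                       # weight on ||s||_1
step = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)       # 1 / Lipschitz constant of the gradient

s = np.zeros(K)
for _ in range(1000):
    z = s - step * 2 * A.T @ (A @ s - y)           # gradient step on ||y - A s||_2^2
    s = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold

x_hat_S = D @ s                                    # synthesis estimate: x = D s_hat
print("synthesis MAP error:", np.linalg.norm(x_hat_S - x_true))
```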
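
And a quick numerical check of the last bullet (my own sanity check, not an experiment from the paper): with \(p = 2\), a square invertible \(\Omega\), and \(\MD = \Omega^{-1}\), both estimates have closed forms and agree to machine precision.

```python
# Check that analysis and synthesis MAP coincide when D = Omega^{-1},
# shown for p = 2 so both estimates have closed forms. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(3)
N, M = 12, 8
T = rng.standard_normal((M, N))
Omega = np.linalg.qr(rng.standard_normal((N, N)))[0]   # random orthogonal => invertible
D = np.linalg.inv(Omega)
lam = 2 * 0.1**2 * 1.0                                 # 2 sigma^2 alpha
y = rng.standard_normal(M)

# analysis: min_x ||y - T x||^2 + lam ||Omega x||_2^2
x_A = np.linalg.solve(T.T @ T + lam * Omega.T @ Omega, T.T @ y)

# synthesis: min_s ||y - T D s||^2 + lam ||s||_2^2, then x = D s
A = T @ D
s_hat = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)
x_S = D @ s_hat

print(np.allclose(x_A, x_S))                           # True
```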

Now we are set to explore these two approaches
to sparse approximation, and thus unveil the relationship between sparsity and co-sparsity.
