# The general inverse problem with MAP estimation

I am now on a two-week research visit at the Centre for Mathematical Sciences at
Lund University, Sweden. I let slip at lunch that “sparsity” is now out and “co-sparsity” is in — yet I couldn’t exactly state why. (My mind is all up in that music genre.) So, to learn myself, I volunteered to present to the research group on Thursday, “Co-sparsity: what’s it good for?” And what better way to start than to read M. Elad, P. Milanfar, and R. Rubinstein, “Analysis versus synthesis in signal priors,” Inverse Problems, vol. 23, pp. 947-968, 2007. I start from the beginning: what is meant by “analysis” and “synthesis”?

We wish to recover $$\vx \in \mathcal{R}^N$$ given observation $$\vy \in \mathcal{R}^M$$.
Assume the model $$\vy = \MT\vx + \vv$$,
where $$\MT : \mathcal{R}^N \to \mathcal{R}^M$$ is known,
and $$\vv$$ is noise.
Assuming $$\vv$$ is distributed $$\mathcal{N}(\zerob,\sigma_\vv^2\MI)$$,
we can estimate $$\vx$$ by maximum a posteriori (MAP):
\begin{align} \hat \vx_{MAP}(\vy|\MT,\vv) & = \arg \max_{\vx \in \mathcal{R}^N} P[\vx|\vy] = \arg \max_{\vx \in \mathcal{R}^N} P[\vy|\vx]P[\vx] \\ & = \arg \max_{\vx \in \mathcal{R}^N} \exp\left ( -\frac{\|\vy - \MT\vx\|_2^2}{2\sigma_\vv^2}\right )P[\vx] \\ & = \arg \min_{\vx \in \mathcal{R}^N} \frac{\|\vy - \MT\vx\|_2^2}{2\sigma_\vv^2} - \log P[\vx]. \end{align}
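As a sanity check on this derivation, here is a small numerical sketch for the simplest case, a Gaussian prior $$P[\vx] \propto \exp(-\alpha\|\vx\|_2^2)$$ (i.e., $$\Omega = \MI$$, $$p = 2$$). Setting the gradient of the objective to zero gives the closed form $$\hat\vx = (\MT^T\MT + 2\sigma_\vv^2\alpha\MI)^{-1}\MT^T\vy$$. All sizes and values below are made-up illustrations, not anything from the paper.

```python
import numpy as np

# Hypothetical small problem instance (all values illustrative).
rng = np.random.default_rng(0)
M, N = 20, 10
T = rng.standard_normal((M, N))
x_true = rng.standard_normal(N)
sigma, alpha = 0.1, 1.0
y = T @ x_true + sigma * rng.standard_normal(M)

def neg_log_posterior(x):
    # ||y - Tx||_2^2 / (2 sigma^2) + alpha ||x||_2^2  (Gaussian prior, p = 2)
    return np.sum((y - T @ x) ** 2) / (2 * sigma**2) + alpha * np.sum(x**2)

# Setting the gradient to zero: (T^T T + 2 sigma^2 alpha I) x = T^T y.
x_map = np.linalg.solve(T.T @ T + 2 * sigma**2 * alpha * np.eye(N), T.T @ y)

# The objective is strictly convex, so any perturbation can only increase it.
for _ in range(100):
    perturbed = x_map + 0.01 * rng.standard_normal(N)
    assert neg_log_posterior(x_map) <= neg_log_posterior(perturbed)
```

Note how the noise variance $$\sigma_\vv^2$$ ends up scaling the prior's weight in the closed form, exactly as in the $$2\sigma_\vv^2\alpha$$ factor appearing in the estimates below.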

• If we assume $$P[\vx]$$ uniform, then this becomes the maximum likelihood estimate.
• Assume for $$\alpha > 0$$ and $$p \ge 0$$ the prior is defined
$$P[\vx] \propto \exp (-\alpha \|\Omega\vx\|_p^p), \vx \in \mathcal{R}^N$$
where the “analysis operator” $$\Omega : \mathcal{R}^N \to \mathcal{R}^L$$.
With this we see the most probable $$\vx$$ lies in the null space of $$\Omega$$,
if there is one.
If $$p \le 1$$, the most probable $$\vx$$ are the ones most
“sparsified” by $$\Omega$$.
If $$p = 2$$, then among $$\vx$$ of equal norm, the most probable are those pointing
along the eigenvector of $$\Omega^T\Omega$$ having the smallest eigenvalue. (Is that right?)
The “analysis MAP” estimate is thus
$$\hat\vx_{MAP,A}(\vy|\MT,\vv) = \arg \min_{\vx \in \mathcal{R}^N} \|\vy - \MT\vx\|_2^2 + 2\sigma_\vv^2\alpha \|\Omega\vx\|_p^p.$$
• Assume for $$\alpha > 0$$ and $$p \ge 0$$ the prior is defined
$$P[\vx] \propto \begin{cases} 0 & \lnot \exists \vs \in \mathcal{R}^K(\vx = \MD\vs) \\ \exp (-\alpha \|\vs\|_p^p) & \vx = \MD\vs \end{cases}$$
where the “dictionary” $$\MD \in \mathcal{R}^{N\times K}$$.
We see that only $$\vx$$ in the column space of $$\MD$$
has non-zero probability density.
If $$p \le 1$$, then the $$\vx$$ with sparse representations in $$\MD$$
are the most probable.
If $$p = 2$$, then the $$\vx$$ having small “energies” in $$\MD$$
are the most probable.
The “synthesis MAP” estimate is thus
$$\hat \vx_{MAP,S}(\vy|\MT,\MD,\vv) = \MD \hat \vs$$
where
$$\hat \vs = \arg \min_{\vs \in \mathcal{R}^K} \|\vy - \MT\MD\vs\|_2^2 + 2\sigma_\vv^2\alpha \|\vs\|_p^p.$$
• If $$\MD$$ is square and invertible, and $$\Omega = \MD^{-1}$$,
then $$\hat \vx_{MAP,S}(\vy|\MT,\MD,\vv) = \hat\vx_{MAP,A}(\vy|\MT,\vv)$$,
since substituting $$\vx = \MD\vs$$ gives $$\|\vs\|_p^p = \|\MD^{-1}\vx\|_p^p = \|\Omega\vx\|_p^p$$.
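For $$p = 2$$ both estimates have closed forms, so the equivalence in the invertible case can be checked numerically. The sketch below uses made-up sizes and a made-up regularization weight `lam` standing in for $$2\sigma_\vv^2\alpha$$; the shift by $$3\MI$$ is just a cheap way to make a random $$\MD$$ well-conditioned.

```python
import numpy as np

# Illustrative problem: square invertible D, with Omega = D^{-1}.
rng = np.random.default_rng(1)
M, N = 15, 10
T = rng.standard_normal((M, N))
y = rng.standard_normal(M)
D = rng.standard_normal((N, N)) + 3 * np.eye(N)  # square, well-conditioned
Omega = np.linalg.inv(D)                         # Omega = D^{-1}
lam = 0.5                                        # stands in for 2*sigma_v^2*alpha

# Analysis MAP (p = 2): min_x ||y - Tx||^2 + lam ||Omega x||^2
x_A = np.linalg.solve(T.T @ T + lam * Omega.T @ Omega, T.T @ y)

# Synthesis MAP (p = 2): min_s ||y - T D s||^2 + lam ||s||^2, then x = D s
TD = T @ D
s_hat = np.linalg.solve(TD.T @ TD + lam * np.eye(N), TD.T @ y)
x_S = D @ s_hat

# The two estimates coincide up to numerical precision.
assert np.allclose(x_A, x_S)
```

In the $$p = 2$$ case the equivalence is just a change of variables $$\vx = \MD\vs$$ in a quadratic; the interesting divergences between analysis and synthesis arise for $$p \le 1$$ and for overcomplete $$\MD$$ or tall $$\Omega$$, where no such invertible change of variables exists.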

Now we are set to explore these two approaches
for sparse approximation, and thus unveil the relationships between sparsity and co-sparsity.