# MMSE estimators of mean and variance

I am at that point in preparing my lecture notes where I am confusing myself, i.e., done with the elementary things, and into the material I haven’t studied for many years. Let’s say we have a set of $$N$$ observations $$\{z_n\}$$ of some random variable $$Z$$. We don’t know how $$Z$$ is truly distributed, but we can estimate the mean and the variance of the observations, so that we might, e.g., model it by a Gaussian distribution function.

The minimum mean squared error (MMSE) estimate of the mean is given by
$$\mu_\textrm{MMSE} = E[\mu | z] = \int_{\Omega} \mu f(\mu|z)d\mu$$
where $$f(\mu|z)$$ is the pdf of the mean parameter given the observations,
and $$\Omega$$ defines the parameter space.
If we define
$$f(\mu|z) = \sum_{n=1}^N \frac{1}{N} \delta(\mu - z_n)$$
then we are saying that the mean is equally likely to be any one of our observations.
Plugging this into the above we see
$$\mu_\textrm{MMSE} = E[\mu | z] = \int_{\Omega} \mu \sum_{n=1}^N \frac{1}{N} \delta(\mu - z_n) d\mu = \frac{1}{N} \sum_{n=1}^N z_n$$
which is the usual sample mean.

Now, we wish to find the minimum mean squared error estimate of the variance of the sample:
$$\sigma^2_\textrm{MMSE} = E[\sigma^2 | z] = \int_{\Omega} \sigma^2 f(\sigma^2|z)d\sigma^2.$$
Let’s create the set of all unique pairwise squared differences divided by 2:
$$\left \{y_k = \frac{(z_i - z_j)^2}{2}: 1 \le i < j \le N \right \}$$
(Note that I am dividing by two. Note it.)
This set has $$K = N(N-1)/2$$ members.
As above, we can consider each one of these equally likely to be the variance
$$f(\sigma^2|z) = \sum_{k=1}^K \frac{2}{N(N-1)} \delta(\sigma^2 - y_k).$$
Substituting this above we find that the MMSE estimator of the variance is
$$\sigma^2_\textrm{MMSE} = E[\sigma^2 | z] = \int_{\Omega} \sigma^2 \sum_{k=1}^K \frac{2}{N(N-1)} \delta(\sigma^2 - y_k) d\sigma^2 = \frac{2}{N(N-1)} \sum_{k=1}^K y_k.$$
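
To make the construction concrete, here is a minimal Python sketch (standard library only) that builds the set $$\{y_k\}$$ and averages it. The function name `pairwise_variance` and the three data values are my own illustrative choices, not anything from a particular library or data set.

```python
from itertools import combinations


def pairwise_variance(z):
    """Return the list of y_k = (z_i - z_j)^2 / 2 over all pairs i < j,
    together with their plain average (the estimate derived above)."""
    y = [(a - b) ** 2 / 2 for a, b in combinations(z, 2)]
    return y, sum(y) / len(y)


# Illustrative observations (an assumption, chosen to keep the arithmetic easy).
z = [1.0, 2.0, 4.0]
y, var_mmse = pairwise_variance(z)

print(len(y))    # K = N(N-1)/2 = 3 pairs for N = 3
print(y)         # [0.5, 4.5, 2.0]
print(var_mmse)  # the average of the three y_k
```

For these three points the average of the $$y_k$$ is $$7/3$$, which can be checked by hand.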

We know that an unbiased estimate of the variance of these observations is given by
$$\sigma^2_\textrm{UB} = \frac{1}{N-1} \sum_{n=1}^N (z_n - \mu)^2$$
where $$\mu = \mu_\textrm{MMSE}$$.
When I compare $$\sigma^2_\textrm{UB}$$ and $$\sigma^2_\textrm{MMSE}$$ in MATLAB,
I find that they agree to within floating-point precision,
which suggests that this amusing relationship holds:
$$\frac{2}{N(N-1)} \mathop{\sum_{i,j = 1}}_{j > i}^N \frac{(z_i - z_j)^2}{2} = \frac{1}{N-1} \sum_{n=1}^N (z_n - \mu)^2.$$
Make sure you note that one in your copybooks!
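
The same kind of numerical comparison can be sketched in Python with only the standard library. This is a rough check rather than a proof: the helper name `var_pairwise`, the seed, and the sample sizes are all assumptions of mine. The second half repeats the comparison in exact rational arithmetic, so that rounding cannot hide a small discrepancy.

```python
import random
import statistics
from fractions import Fraction
from itertools import combinations


def var_pairwise(z):
    """Left-hand side of the identity: scaled sum of half squared differences."""
    n = len(z)
    total = sum((a - b) ** 2 / 2 for a, b in combinations(z, 2))
    return 2 * total / (n * (n - 1))


random.seed(0)  # arbitrary seed, only for repeatability

# Floating-point check: statistics.variance is the (N-1)-normalized estimator.
z = [random.gauss(0.0, 1.0) for _ in range(50)]
diff = abs(var_pairwise(z) - statistics.variance(z))
print(diff)  # expected to be at the level of rounding error

# Exact check with rational numbers, where equality is not blurred by rounding.
z_exact = [Fraction(random.randint(-100, 100)) for _ in range(20)]
exact_match = var_pairwise(z_exact) == statistics.variance(z_exact)
print(exact_match)
```

Agreement on one random draw does not establish the identity, but exact equality in rational arithmetic is good evidence that it is not just a rounding coincidence.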

So the two questions I have are:

1. What is the interpretation of defining each $$y_k$$ as the squared difference between two samples divided by 2?
2. How does one reduce the expression for $$\sigma^2_\textrm{MMSE}$$ to that for $$\sigma^2_\textrm{UB}$$?

On my bike ride home, I figured out the answer to the first question. The unbiased sample variance of just two points is exactly half their squared separation, so each $$y_k$$ is the variance estimate obtained from a single pair of observations.
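
To check this for a single pair $$z_1, z_2$$, set $$N = 2$$ in the unbiased estimator above, with $$\mu = (z_1 + z_2)/2$$:

$$\frac{1}{2-1} \sum_{n=1}^2 (z_n - \mu)^2 = \left( \frac{z_1 - z_2}{2} \right)^2 + \left( \frac{z_2 - z_1}{2} \right)^2 = \frac{(z_1 - z_2)^2}{2}.$$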

And now my colleague (M. Christensen, for anonymity) has shown me that the two expressions are the same if one just applies the expectation operator to both sides. Interesting.
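
For the second question there is also a direct algebraic route. This is a sketch, and not necessarily the argument referred to above. Write each difference as $$z_i - z_j = (z_i - \mu) - (z_j - \mu)$$ and sum over all ordered pairs. The $$i = j$$ terms are zero and every unordered pair is counted twice:

$$\sum_{i=1}^N \sum_{j=1}^N (z_i - z_j)^2 = 2N \sum_{n=1}^N (z_n - \mu)^2 - 2 \left( \sum_{i=1}^N (z_i - \mu) \right) \left( \sum_{j=1}^N (z_j - \mu) \right) = 2N \sum_{n=1}^N (z_n - \mu)^2$$

where the cross term vanishes because deviations from the sample mean sum to zero. The sum over $$i < j$$ is half of the double sum, so

$$\frac{2}{N(N-1)} \sum_{1 \le i < j \le N} \frac{(z_i - z_j)^2}{2} = \frac{2}{N(N-1)} \cdot \frac{1}{4} \cdot 2N \sum_{n=1}^N (z_n - \mu)^2 = \frac{1}{N-1} \sum_{n=1}^N (z_n - \mu)^2.$$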