This summer I have the opportunity to read R. A. Bailey, *Design of Comparative Experiments*, Cambridge University Press, 2008, more closely. One thing I really like about her approach is its grounding in linear algebra and probability theory, which is essentially estimation theory. This provides an unambiguous picture of what is going on in an experiment, the assumptions that are in play, and the relevance and meaning of particular statistical tests. Below, I explicate some of the fundamental subspaces of an experiment.

**Preliminaries**

We have a set of \(t\) *treatments* \(\mathcal{T}\), a set of \(N\) *observational units (plots)* \(\Omega\), and a *design* \(T:\Omega \to \mathcal{T}\). Define the length-\(N\) vector $$[\vu_i]_n := \begin{cases}
1, & T(n) = i \\
0, & \textrm{else}
\end{cases}$$ which denotes which of the \(N\) plots are treated with treatment \(i\). The matrix \(\MX := [\vu_1 \; \vu_2 \; \cdots \; \vu_t]\) thus encompasses the design of the experiment. Assume the design is *correct*, i.e., \(\MX \mathbf{1}_t = \vu_0 := \mathbf{1}_N\) and \(\textrm{rank}(\MX)=t\). A correct design means no plot goes untreated, all treatments are used, and each plot receives exactly one treatment. Notice that for a correct design, $$\MX^T\MX = \textrm{diag}(r_1, r_2, \ldots, r_t),$$ where \(r_i\) is the number of replications of treatment \(i\). (We assume a correct design from here on.)
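As a concrete numerical check of these properties, here is a minimal numpy sketch. The six-plot, three-treatment design below is a hypothetical toy example of my own, not one from the book:

```python
import numpy as np

# Hypothetical toy design: N = 6 plots, t = 3 treatments,
# with T(n) given as a list of treatment labels.
design = [0, 1, 1, 2, 0, 2]
N, t = len(design), 3

# Column u_i of X indicates which plots receive treatment i.
X = np.zeros((N, t))
X[np.arange(N), design] = 1.0

# Correctness: every plot is treated exactly once, all treatments used.
assert np.allclose(X @ np.ones(t), np.ones(N))   # X 1_t = u_0 = 1_N
assert np.linalg.matrix_rank(X) == t

# X^T X is diagonal, holding the replication counts r_i.
print(np.diag(X.T @ X))   # here each treatment is replicated twice
```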

Motivating this experiment are questions about the *responses* caused by the treatments, independent of all other non-treatment effects due to the plots, measurement noise, and many, many other factors. Executing this experiment results in \(N\) measurements, one from each treated plot, which we denote as the vector \(\vy\). We assume a linear model relating these to the responses, and thus model \(\vy\) as a random vector \(\MY\): $$\MY = \underline\tau + \MZ,$$ where \(\underline\tau\) is a length-\(N\) deterministic vector whose \(n\)th entry is the response \(\tau_{T(n)}\) of the treatment applied to plot \(n\), i.e., \(\underline\tau = \sum_{i=1}^t \tau_i\vu_i\), and \(\MZ\) is a random vector modeling all non-treatment contributions to the measurements. The problem is to estimate the response vector \(\underline\tau\).

**Measurement subspaces**

Define the \(N\)-dimensional *measurement space* \(V := \mathbb{R}^N\) (the set of length-\(N\) real vectors). A subspace of this is the \(t\)-dimensional *treatment subspace* \(V_T := \textrm{span}\{\vu_i : i \in \{1, 2, \ldots, t\}\}\). Its orthogonal complement, the \((N-t)\)-dimensional \(V_T^\perp = V \ominus V_T\), is the *residual subspace*. Within \(V_T\) sits the 1-dimensional *mean subspace* \(V_0 = \textrm{span}\{\vu_0\} \subseteq V_T\). Finally, there is the \((t-1)\)-dimensional *treatment effects subspace* \(W_T = V_T \ominus V_0\).

**Decomposing measurements**

The analysis of an experiment essentially involves the decomposition of the measurements \(\vy \in V\) into contributions from all of these subspaces. To this end, we form the following projection matrices: $$\begin{align}
\MP_{V_T} &= \MX\MX^\dagger = \MX(\MX^T\MX)^{-1}\MX^T = \MX \left[ \frac{\vu_1}{r_1} \; \frac{\vu_2}{r_2} \; \cdots \; \frac{\vu_t}{r_t} \right]^T \\
\MP_{V_0} &= \vu_0\vu_0^T/N \\
\MP_{W_T} &= \MP_{V_T} - \MP_{V_0} \\
\MP_{V_T^\perp} &= \MI_N - \MP_{V_T}.
\end{align}$$
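The defining properties of these projectors (idempotence, mutual orthogonality, and summing to the identity) can be verified numerically. This sketch uses a hypothetical six-plot, three-treatment toy design of my own choosing:

```python
import numpy as np

# Hypothetical toy design: N = 6 plots, t = 3 treatments.
design = [0, 1, 1, 2, 0, 2]
N, t = len(design), 3
X = np.zeros((N, t))
X[np.arange(N), design] = 1.0

# The four projectors defined above.
P_VT = X @ np.linalg.inv(X.T @ X) @ X.T   # onto V_T
u0 = np.ones(N)
P_V0 = np.outer(u0, u0) / N               # onto V_0
P_WT = P_VT - P_V0                        # onto W_T
P_perp = np.eye(N) - P_VT                 # onto the residual subspace

# Each is idempotent, and the three pieces are mutually orthogonal.
for P in (P_VT, P_V0, P_WT, P_perp):
    assert np.allclose(P @ P, P)
assert np.allclose(P_V0 @ P_WT, np.zeros((N, N)))
assert np.allclose(P_VT @ P_perp, np.zeros((N, N)))
```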

Applying these to the measurements gives $$\begin{align}
\MP_{V_T}\MY &= \MX \left[ \frac{\vu_1}{r_1} \; \frac{\vu_2}{r_2} \; \cdots \; \frac{\vu_t}{r_t} \right]^T \MY = \sum_{i=1}^t \bar Y_i\vu_i \\
\MP_{V_0}\MY &= \bar Y \vu_0 \\
\MP_{W_T}\MY &= \sum_{i=1}^t \bar Y_i\vu_i - \bar Y \vu_0 \\
\MP_{V_T^\perp}\MY &= \MY - \sum_{i=1}^t \bar Y_i\vu_i,
\end{align}$$ where \(\bar Y_i\) is the mean measurement for treatment \(i\), and \(\bar Y := \vu_0^T\MY/N = \frac{1}{N}\sum_{i=1}^t r_i \bar Y_i\) is the grand mean of the measurements. In terms of our measurement model \(\MY = \underline\tau + \MZ\), these become $$\begin{align}
\MP_{V_T}\MY &= \sum_{i=1}^t \bar Y_i\vu_i = \sum_{i=1}^t (\tau_i + \bar Z_i)\vu_i \\
\MP_{V_0}\MY &= \bar Y \vu_0 = \left(\bar{\underline\tau} + \bar Z\right)\vu_0 \\
\MP_{W_T}\MY &= \sum_{i=1}^t \bar Y_i\vu_i - \bar Y \vu_0 = \left(\underline\tau - \bar{\underline\tau}\vu_0\right) + \left(\sum_{i=1}^t \bar Z_i\vu_i - \bar Z\vu_0\right) \\
\MP_{V_T^\perp}\MY &= \MY - \sum_{i=1}^t \bar Y_i\vu_i = \MZ - \sum_{i=1}^t \bar Z_i\vu_i,
\end{align}$$ where \(\bar Z_i\) is the mean noise contribution to the \(r_i\) measurements of treatment \(i\), \(\bar Z := \frac{1}{N}\sum_{i=1}^t r_i \bar Z_i\) is the grand mean of the noise, and \(\bar{\underline\tau} := \vu_0^T\underline\tau/N = \frac{1}{N}\sum_{i=1}^t r_i \tau_i\) is the grand mean response.

Now we can begin to appreciate the meaning of these subspaces. \(V_T\) is the space in which the responses live for our design: it is in this subspace that we search for the \(\underline\tau\) characterizing the responses of the treatments of the experiment. In other words, we seek a \(\underline\tau\) that is a linear combination of the columns of \(\MX\). Our estimate of \(\underline\tau\) from the measurements, however, is corrupted by noise and other effects in the directions of the treatment vectors \(\{\vu_i\}\); we hope that \(\bar Z_i = 0\) for all \(i\). \(V_0\) is the subspace in which the mean response lives, and \(W_T\) is the subspace containing the deviation of the response from the mean response. Finally, \(V_T^\perp\) contains all the noise contributions orthogonal to the treatment subspace. These are contributions to the measurements that cannot be due to the treatments, which is why \(V_T^\perp\) is the residual subspace.

Since \(V = V_0\oplus W_T\oplus V_T^\perp\) and these subspaces are mutually orthogonal, we have $$\|\MY\|_2^2 = \|\MP_{V_T}\MY\|_2^2 + \|\MP_{V_T^\perp}\MY\|_2^2 = \|\MP_{V_0}\MY\|_2^2+ \|\MP_{W_T}\MY\|_2^2+ \|\MP_{V_T^\perp}\MY\|_2^2.$$
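This Pythagorean decomposition holds for any measurement vector. A quick numerical sanity check, again with a hypothetical toy design and a randomly drawn \(\vy\):

```python
import numpy as np

# Hypothetical toy design and an arbitrary measurement vector.
rng = np.random.default_rng(0)
design = [0, 1, 1, 2, 0, 2]
N, t = len(design), 3
X = np.zeros((N, t))
X[np.arange(N), design] = 1.0

P_VT = X @ np.linalg.inv(X.T @ X) @ X.T
P_V0 = np.ones((N, N)) / N
P_WT = P_VT - P_V0
P_perp = np.eye(N) - P_VT

y = rng.normal(size=N)
pieces = [np.sum((P @ y) ** 2) for P in (P_V0, P_WT, P_perp)]

# Squared norms of the orthogonal pieces sum to the total squared norm.
assert np.isclose(np.sum(y ** 2), sum(pieces))
```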

**Hypothesis testing**

From the above, we can compute for each space and subspace the mean squared length of the projected measurements, also known as the *mean squares* of each space: $$\begin{align}
\textrm{MS}(V) &:= \frac{\|\MY\|_2^2}{N} \\
\textrm{MS}(V_T) &:= \frac{\|\MP_{V_T}\MY\|_2^2}{t} \\
\textrm{MS}(\textrm{mean}) &:= \frac{\|\MP_{V_0}\MY\|_2^2}{1} \\
\textrm{MS}(\textrm{treatments}) &:= \frac{\|\MP_{W_T}\MY\|_2^2}{t-1} \\
\textrm{MS}(\textrm{residual}) &:= \frac{\|\MP_{V_T^\perp}\MY\|_2^2}{N-t}.
\end{align}$$

These mean squares are all random variables since they involve \(\MZ\); in particular, they all involve sums of squared values of the projected \(\MZ\). The denominators are the dimensions of the subspaces, known in statistics as *degrees of freedom*.

Assume the standard textbook model (STM) for \(\MZ\), i.e., \(\underline \mu_{\MZ} := E[\MZ] = \mathbf{0}_N\) and \(\MC_\MZ := \textrm{Cov}[\MZ] = \sigma^2\MI_N\). The *expected* mean squares are then $$\begin{align}
E[\textrm{MS}(V)] &= \frac{\|\underline \tau \|_2^2 + N\sigma^2}{N} \\
E[\textrm{MS}(V_T)] &= \frac{\|\underline \tau \|_2^2}{t} + \sigma^2 \\
E[\textrm{MS}(\textrm{mean})] &= N\bar{\underline \tau}^2 + \sigma^2 \\
E[\textrm{MS}(\textrm{treatments})] &= \frac{\|\underline\tau - \MP_{V_0}\underline\tau \|_2^2}{t-1} + \sigma^2 \\
E[\textrm{MS}(\textrm{residual})] &= \sigma^2,
\end{align}$$

where \(\bar{\underline \tau} := \vu_0^T\underline\tau/N\). We can estimate these quantities from a realization of the measurements \(\vy\).
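These expectations can be checked by Monte Carlo simulation. The sketch below assumes a hypothetical toy design, responses \(\tau = (1, 2, 3)\), and \(\sigma = 1\), all values of my own choosing, and compares the empirical averages of the mean squares against the formulas above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy design: N = 6 plots, t = 3 treatments.
design = [0, 1, 1, 2, 0, 2]
N, t = len(design), 3
X = np.zeros((N, t))
X[np.arange(N), design] = 1.0

P_VT = X @ np.linalg.inv(X.T @ X) @ X.T
P_V0 = np.ones((N, N)) / N
P_WT = P_VT - P_V0
P_perp = np.eye(N) - P_VT

tau = np.array([1.0, 2.0, 3.0])   # assumed treatment responses
tau_vec = X @ tau                 # the vector underline-tau
sigma = 1.0

# Monte Carlo over STM realizations Y = tau_vec + Z, Z ~ N(0, sigma^2 I).
trials = 20000
Y = tau_vec + sigma * rng.normal(size=(trials, N))
ms_mean = np.sum((Y @ P_V0) ** 2, axis=1) / 1
ms_trt = np.sum((Y @ P_WT) ** 2, axis=1) / (t - 1)
ms_res = np.sum((Y @ P_perp) ** 2, axis=1) / (N - t)

# Empirical averages vs. the expected mean squares derived above.
tau_bar = tau_vec.mean()
print(ms_mean.mean(), N * tau_bar**2 + sigma**2)
print(ms_trt.mean(), np.sum((tau_vec - tau_bar) ** 2) / (t - 1) + sigma**2)
print(ms_res.mean(), sigma**2)
```

Each printed pair should agree up to Monte Carlo error.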

Notice that $$\begin{align}
\frac{E[\textrm{MS}(\textrm{mean})]}{E[\textrm{MS}(\textrm{residual})]} &= N\frac{\bar{\underline \tau}^2}{\sigma^2} + 1 \\
\frac{E[\textrm{MS}(\textrm{treatments})]}{E[\textrm{MS}(\textrm{residual})]} &= \frac{\|\underline\tau - \MP_{V_0}\underline\tau \|_2^2}{\sigma^2(t-1)} + 1.
\end{align}$$

From these, we can pose and test several null hypotheses. One is that the mean response of the treatments is zero, i.e., \(H_0: \bar{\underline \tau} = 0\); we then check whether the ratio \(\textrm{MS}(\textrm{mean})/\textrm{MS}(\textrm{residual}) \approx 1\). Another is that the responses are all the same, i.e., \(H_0: \tau_1 = \tau_2 = \cdots = \tau_t\); we then check whether the ratio \(\textrm{MS}(\textrm{treatments})/\textrm{MS}(\textrm{residual}) \approx 1\). These are the ratios reported in ANOVA tables, with the STM in play.
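The behavior of the treatments ratio under the null and under an alternative can be illustrated by simulation. In this sketch (a hypothetical toy design and response values of my own choosing, not from the book), the expected mean squares are approximated by averaging over many STM realizations, so the ratio of averages is near 1 under \(H_0\) and inflated when the responses differ:

```python
import numpy as np

# Hypothetical toy design: N = 6 plots, t = 3 treatments, sigma = 1.
rng = np.random.default_rng(2)
design = [0, 1, 1, 2, 0, 2]
N, t = len(design), 3
X = np.zeros((N, t))
X[np.arange(N), design] = 1.0
P_VT = X @ np.linalg.inv(X.T @ X) @ X.T
P_WT = P_VT - np.ones((N, N)) / N
P_perp = np.eye(N) - P_VT

def avg_mean_squares(tau, trials=20000):
    # Average MS(treatments) and MS(residual) over STM realizations.
    Y = X @ tau + rng.normal(size=(trials, N))
    ms_trt = np.sum((Y @ P_WT) ** 2, axis=1) / (t - 1)
    ms_res = np.sum((Y @ P_perp) ** 2, axis=1) / (N - t)
    return ms_trt.mean(), ms_res.mean()

# Under H0 (all responses equal), the two mean squares agree on average.
trt0, res0 = avg_mean_squares(np.array([2.0, 2.0, 2.0]))
# Unequal responses inflate MS(treatments) but not MS(residual).
trt1, res1 = avg_mean_squares(np.array([1.0, 2.0, 3.0]))
print(trt0 / res0)   # near 1
print(trt1 / res1)   # well above 1
```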

That is just the second chapter of Bailey’s book, and I already feel so much clarity. Twelve more to go!