I found some formalized experimental design

Thanks to Dan Stowell, I have found an interesting book:

R. A. Bailey, Design of comparative experiments. Cambridge University Press, 2008.

The pre-print is here,
but I am definitely buying a copy.

Let’s have a first look at some of the formalization, as I consider whether it is equipped for describing evaluation in machine learning.
Here are some basic definitions and notation:

  • experimental unit: “the smallest unit to which a treatment is applied”
  • observational unit (plot): “the smallest unit on which a response will be measured”
  • The entire set of plots is notated \(\Omega\). The number of plots \(N := |\Omega|\).
  • treatment: “the entire description of what can be applied to an experimental unit”
  • The entire set of treatments is notated \(\mathcal{T}\). The number of treatments \(t := |\mathcal{T}|\).
  • plot structure: “meaningful ways of dividing up the set of plots”
  • treatment structure: “meaningful ways of dividing up the set of treatments”
  • design: “allocation of treatments to plots”.
  • The design is specified by a map of plots to treatments, \(T : \Omega \to \mathcal{T}\)
  • plan or layout: “the design translated into actual plots”
  • response on a plot: realization of a random variable.
  • “The response on plot \(\omega\) is the rv \(Y_\omega = Z_\omega + \tau_{T(\omega)}\) where \(\tau_{T(\omega)}\) is a constant, and \(Z_\omega\) is a rv.”
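To make the notation concrete, here is a minimal Python sketch of a design as a map \(T : \Omega \to \mathcal{T}\). The plot labels and treatment names are hypothetical, chosen only for illustration. A dict fits naturally, because it assigns exactly one value to each key, which is exactly what a function from plots to treatments requires.

```python
from collections import Counter

# Hypothetical plots (Omega) and treatments (mathcal{T}), for illustration only.
plots = ["p1", "p2", "p3", "p4", "p5", "p6"]
treatments = {"A", "B", "C"}

# The design T : Omega -> mathcal{T}. A dict gives each plot exactly one treatment.
design = {"p1": "A", "p2": "B", "p3": "C", "p4": "A", "p5": "B", "p6": "C"}

N = len(plots)       # N := |Omega|
t = len(treatments)  # t := |mathcal{T}|

# Every plot is allocated, and only known treatments are used.
assert set(design) == set(plots)
assert set(design.values()) <= treatments

# Replication: how many plots receive each treatment.
replication = Counter(design.values())
print(N, t, dict(replication))
```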

In the last item, a linear model,
we want to recover \(\tau_{T(\omega)}\), the response of the unit to its treatment,
from our observation \(Y_\omega\).
The rv \(Z_\omega\) includes measurement noise,
effects of the plot itself, and so on.
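A small simulation may help show what "recovering \(\tau\)" means in practice. This is a sketch under stated assumptions, not Bailey's procedure verbatim: the treatment constants, the number of plots, and the choice \(Z_\omega \sim N(0,1)\) are all hypothetical. Treatments are allocated to plots at random, and the difference of treatment means serves as an estimate of \(\tau_A - \tau_B\), which works provided every \(Z_\omega\) has the same expectation.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical treatment constants tau (unknown to the experimenter in practice).
tau = {"A": 2.0, "B": 0.5}

# Hypothetical design: 200 plots, 100 per treatment, allocated at random.
labels = ["A", "B"] * 100
random.shuffle(labels)
design = {f"p{k}": label for k, label in enumerate(labels)}

def response(plot):
    """Y_omega = Z_omega + tau_{T(omega)}, with Z_omega ~ N(0, 1) assumed here."""
    z = random.gauss(0.0, 1.0)
    return z + tau[design[plot]]

Y = {plot: response(plot) for plot in design}

def treatment_mean(i):
    """Average response over the plots allocated treatment i."""
    return mean(Y[p] for p in design if design[p] == i)

# The difference of treatment means estimates tau_A - tau_B,
# provided the Z_omega all have the same expectation.
estimate = treatment_mean("A") - treatment_mean("B")
print(round(estimate, 2))
```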

Now, let’s try to apply these to a real virtual example from pattern recognition.
We may wish to answer the following question: “how well does system \(i\) detect the presence of human voice in my collection of \(N\) digital audio recordings?”

  • experimental unit: collection
  • observational unit: digital audio recording
  • treatments: system \(i\), random system
  • treatment and plot structure: N/A since we have digital samples
  • design: each \(\omega \in \Omega\) mapped to both \(i\) and random system
  • response on a plot: say \(i\) gives a number in \([0,1]\) denoting its confidence in its detection of human voice. If it is 1, then it is absolutely sure. The random system gives a number in \(\{0,1\}\) with some probability.
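The two responses in this example can be sketched in Python. Everything here is hypothetical: `system_i` is a stand-in that makes up a confidence score rather than analysing audio, the recording names are invented, and \(p = 0.5\) for the random system is an assumption, since the example only says "with some probability". The point is the shape of the data: every recording ends up with a response under both treatments.

```python
import random

rng = random.Random(42)

# Hypothetical collection of N recordings (the observational units).
recordings = [f"rec_{k:03d}.wav" for k in range(5)]

def system_i(recording):
    """Hypothetical stand-in for system i: a confidence in [0, 1] that voice
    is present. A real system would analyse the audio; this score is made up."""
    return rng.random()

def random_system(recording, p=0.5):
    """Random baseline: outputs 1 ("voice") with probability p, else 0.
    p = 0.5 is an assumed value."""
    return 1 if rng.random() < p else 0

# Each recording receives BOTH treatments, giving two responses per recording.
responses = {
    rec: {"system_i": system_i(rec), "random": random_system(rec)}
    for rec in recordings
}
for rec, r in responses.items():
    print(rec, r)
```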

This reveals one major shortcoming: since we have assigned each unit to both treatments, we no longer have a function. Bailey writes,

Although we speak of allocating treatments to plots, mathematically the design is a function … Thus plot \(\omega\) is allocated treatment \(T(\omega)\). The function has to be this way round, because each plot can receive only one treatment.
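Bailey's requirement can be checked mechanically. In this minimal sketch, an allocation is written as a set of (plot, treatment) pairs, and the hypothetical helper `is_design_function` tests whether those pairs define a function \(T : \Omega \to \mathcal{T}\), that is, whether every plot receives exactly one treatment. The allocation from the voice-detection example fails the test.

```python
from collections import Counter

def is_design_function(pairs, plots):
    """True if the allocation (a set of (plot, treatment) pairs) is a function
    Omega -> mathcal{T}: every plot receives exactly one treatment."""
    counts = Counter(plot for plot, _ in pairs)
    return all(counts[p] == 1 for p in plots)

recordings = ["rec_000", "rec_001", "rec_002"]

# The allocation from the example: every recording gets both treatments.
both = {(rec, trt) for rec in recordings for trt in ("system_i", "random")}

# An allocation in which each recording gets a single treatment.
single = {("rec_000", "system_i"), ("rec_001", "random"), ("rec_002", "system_i")}

print(is_design_function(both, recordings))    # False
print(is_design_function(single, recordings))  # True
```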

Unlike in agriculture, digital signals can be replicated exactly. Treating one copy does not affect another copy. We think.

So, big question:
Does this mean the entire apparatus in Bailey’s book is inapplicable to evaluation in machine learning?


2 thoughts on “I found some formalized experimental design”

  1. I don’t know this stuff inside out, but to put it in the framework I’d imagine it’s more like this: a “plot” is a place where you conduct a single atomic unit of the trial, and so a digital signal is not a plot. Rather, the digital signal is part of the treatment, to be fully crossed with the systems tested. Each “plot” is a piece of memory in your computer, perhaps, in which you will place a signal and then apply an algorithm.
    You would need some way to specify that the digital signal is perfectly replicated in each of the plots where it’s used, i.e. there may be variance among signals but zero variance among copies of the signal.


  2. Yes, I think I figured it out in the shower this morning. So what if we have two plots that are identical in every relevant way? That does not mean we must give them the same identifier. So I can define the design function as long as I specify as many copies of each plot as there are treatments. In my case above, the design maps one plot to one treatment, and its copy to the other.

