Modeling the 2020 AI Music Generation Challenge ratings, pt. 4

In part 3, we looked at the decomposition of our responses into (near-)orthogonal subspaces related to the factors:

\mathbf{y} = \mathbf{P}_0\mathbf{y} + \mathbf{P}_{W_T}\mathbf{y} + \mathbf{P}_{W_J}\mathbf{y} + \mathbf{P}_{W_Q}\mathbf{y} + \mathbf{P}_{W_{JQ}}\mathbf{y} + \mathbf{P}_\perp\mathbf{y} = \hat{\mathbf{y}}_\mathbf{X} + \mathbf{P}_\perp\mathbf{y}

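where \mathbf{P}_{W_T} = \mathbf{P}_T-\mathbf{P}_0, \mathbf{P}_{W_J} = \mathbf{P}_J-\mathbf{P}_0, \mathbf{P}_{W_Q} = \mathbf{P}_Q-\mathbf{P}_0 and \mathbf{P}_{W_{JQ}} = \mathbf{P}_{JQ}-\mathbf{P}_J-\mathbf{P}_Q+\mathbf{P}_0. (The sign of the last \mathbf{P}_0 is positive because the mean is removed twice, once with \mathbf{P}_J and once with \mathbf{P}_Q, and so must be added back once.) Also, \hat{\mathbf{y}}_\mathbf{X} is the projection of our response vector onto \mathrm{C}(\mathbf{X}).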

Now that we have decomposed the response vector into orthogonal pieces, we can say the following:

\|\mathbf{y}\|^2 = \|\mathbf{P}_0\mathbf{y}\|^2 + \|\mathbf{P}_{W_T}\mathbf{y}\|^2 + \|\mathbf{P}_{W_J}\mathbf{y}\|^2 + \|\mathbf{P}_{W_Q}\mathbf{y}\|^2 + \|\mathbf{P}_{W_{JQ}}\mathbf{y}\|^2 + \|\mathbf{P}_\perp\mathbf{y}\|^2

using the Euclidean norm. What is the expected value of each of these terms? Remember we model each of our responses as a random variable R_{tjq} = \tau_{tjq} + \epsilon where \epsilon \sim \mathcal{N}(0,\sigma^2), and \tau_{tjq} is the “true” response. Hence our random vector of responses is modeled as \mathbf{Y} = \boldsymbol{\tau} + \mathbf{n} where \mathbf{n} \sim \mathcal{N}(\mathbf{0},\sigma^2\mathbf{I}_N). What is the expected squared norm of a projection of \mathbf{Y}?

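E[\|\mathbf{P}\mathbf{Y}\|^2] = E[\|\mathbf{P}\boldsymbol{\tau}+\mathbf{P}\mathbf{n}\|^2] = E[(\mathbf{P}\boldsymbol{\tau}+\mathbf{P}\mathbf{n})^T(\mathbf{P}\boldsymbol{\tau}+\mathbf{P}\mathbf{n})] = \|\mathbf{P}\boldsymbol{\tau}\|^2 + E[\|\mathbf{P}\mathbf{n}\|^2] = \|\mathbf{P}\boldsymbol{\tau}\|^2 + d\sigma^2

where d=\mathrm{dim}\,\mathrm{C}(\mathbf{P}). The cross term 2\boldsymbol{\tau}^T\mathbf{P}^T\mathbf{P}E[\mathbf{n}] vanishes because the noise has zero mean. The last step comes from the fact that \mathbf{P} is an orthogonal projection onto a d-dimensional space, so E[\|\mathbf{P}\mathbf{n}\|^2] = E[\mathbf{n}^T\mathbf{P}\mathbf{n}] = \sigma^2\mathrm{tr}(\mathbf{P}) = d\sigma^2.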
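The identity above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the subspace, the vector standing in for \boldsymbol{\tau}, and the values of N, d and \sigma are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 40, 7, 1.5           # illustrative sizes, not the challenge data

# Orthonormal basis Q of a random d-dimensional subspace; P = Q Q^T is the
# orthogonal projection onto it.
Q, _ = np.linalg.qr(rng.standard_normal((N, d)))
P = Q @ Q.T

tau = rng.standard_normal(N)        # hypothetical stand-in for the true responses
n_draws = 100_000
Y = tau + sigma * rng.standard_normal((n_draws, N))   # each row is one draw of Y

# Since Q has orthonormal columns, ||P y||^2 = ||Q^T y||^2.
empirical = np.mean(np.sum((Y @ Q) ** 2, axis=1))
expected = np.sum((P @ tau) ** 2) + d * sigma**2

print("trace of P:", np.trace(P))
print("Monte Carlo estimate:", empirical)
print("||P tau||^2 + d sigma^2:", expected)
```

The two printed values should agree up to Monte Carlo error, and the trace of P should equal d.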

An important detail now is that while \mathbf{P}_T projects onto \mathrm{C}(\mathbf{X}_T), which is a 25-dimensional subspace of \mathbb{R}^N, the projection matrix \mathbf{P}_{W_T} projects onto a 24-dimensional subspace since we have removed one dimension by subtracting out \mathrm{C}(\mathbf{1}_N). The same goes for \mathbf{P}_{W_J} (projecting onto a three-dimensional subspace), \mathbf{P}_{W_Q} (projecting onto a four-dimensional subspace), and \mathbf{P}_{W_{JQ}} (projecting onto a 12-dimensional subspace). Together with the one dimension of \mathrm{C}(\mathbf{1}_N), these account for 1+24+3+4+12=44 dimensions. That means the residual projection matrix \mathbf{P}_\perp is projecting onto an N-44=430-44=386-dimensional subspace.
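The dimension counting can be reproduced by taking traces of the projection matrices. The sketch below assumes a hypothetical complete design in which every one of 25 transcriptions is rated by 4 judges on 5 qualities (N=500, not the actual N=430 dataset), and takes the interaction projection to be \mathbf{P}_{JQ}-\mathbf{P}_J-\mathbf{P}_Q+\mathbf{P}_0, whose trace is 20-4-5+1=12. The helper names `proj` and `indicators` are made up for illustration.

```python
import itertools
import numpy as np

def proj(X):
    """Orthogonal projection onto the column space of X."""
    return X @ np.linalg.pinv(X)

def indicators(labels):
    """One 0/1 column per distinct label."""
    labels = np.asarray(labels)
    return (labels[:, None] == np.unique(labels)[None, :]).astype(float)

# Hypothetical complete design: every (transcription, judge, quality) cell once.
nT, nJ, nQ = 25, 4, 5
cells = np.array(list(itertools.product(range(nT), range(nJ), range(nQ))))
t, j, q = cells.T
N = len(cells)

P0 = proj(np.ones((N, 1)))
PT, PJ, PQ = proj(indicators(t)), proj(indicators(j)), proj(indicators(q))
PJQ = proj(indicators(j * nQ + q))      # one level per judge-quality pair

PWT, PWJ, PWQ = PT - P0, PJ - P0, PQ - P0
PWJQ = PJQ - PJ - PQ + P0
Pperp = np.eye(N) - (P0 + PWT + PWJ + PWQ + PWJQ)

pieces = {"0": P0, "W_T": PWT, "W_J": PWJ, "W_Q": PWQ, "W_JQ": PWJQ, "perp": Pperp}
dims = {name: int(round(np.trace(M))) for name, M in pieces.items()}
print(dims)

# With orthogonal pieces the squared norms add up to ||y||^2.
y = np.random.default_rng(1).standard_normal(N)   # synthetic responses
total = sum(np.sum((M @ y) ** 2) for M in pieces.values())
print(np.isclose(total, y @ y))
```

In this complete design the residual dimension is 500-44=456; with the actual N=430 responses it is the 386 stated above.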

All of this means that for our model:

E\|\mathbf{P}_0\mathbf{Y}\|^2 = \|\mathbf{P}_0\boldsymbol{\tau}\|^2 + \sigma^2
E\|\mathbf{P}_{W_T}\mathbf{Y}\|^2 = \|\mathbf{P}_{W_T}\boldsymbol{\tau}\|^2 + 24\sigma^2
E\|\mathbf{P}_{W_J}\mathbf{Y}\|^2 = \|\mathbf{P}_{W_J}\boldsymbol{\tau}\|^2 + 3\sigma^2
E\|\mathbf{P}_{W_Q}\mathbf{Y}\|^2 = \|\mathbf{P}_{W_Q}\boldsymbol{\tau}\|^2 + 4\sigma^2
E\|\mathbf{P}_{W_{JQ}}\mathbf{Y}\|^2 = \|\mathbf{P}_{W_{JQ}}\boldsymbol{\tau}\|^2 + 12\sigma^2
E\|\mathbf{P}_\perp\mathbf{Y}\|^2 = (N-44)\sigma^2

where the last one comes from the fact that \boldsymbol{\tau} is orthogonal to the residual subspace. The left-hand side of each of these is an expected sum of squared random values. On the right-hand side we have two terms: the first due to the deterministic effects of the levels in a factor, and the second due to iid noise in the measurements. If there are no differences between the effects of the levels in a factor, then the projection of \boldsymbol{\tau} onto that factor’s subspace is zero and only the noise term remains. Hence, to test for significant differences between the effects in a factor, all we need to do is compare the empirical sum of squares of the projection of the responses onto the relevant subspace, divided by its dimension, with that of the projection onto the residual subspace, e.g., for the treatments we look at the ratio

F = [\|\mathbf{P}_{W_T}\mathbf{y}\|^2/24]/[\|\mathbf{P}_\perp\mathbf{y}\|^2/(N-44)].

Under the assumptions of our model, and under the null hypothesis that the treatment effects do not differ (so that \mathbf{P}_{W_T}\boldsymbol{\tau} = \mathbf{0}), this statistic is F-distributed with parameters (24, N-44). We can thus compute the probability of observing that statistic or larger. This is all presented in the ANOVA table. The first column shows what subspace we are looking at. The “df” column shows its dimensionality. The “sum_sq” column shows the squared Euclidean norm of the orthogonal projection. The “mean_sq” column shows the squared Euclidean norm divided by the number of dimensions. The “F” column shows the ratio of the mean_sq of the factor to the mean_sq of the residual. Finally, the “PR(>F)” or “p” column shows the probability of observing a statistic at least as extreme as the one computed.

Let us look at the ANOVA table for the balanced dataset (keeping only the 19 transcriptions rated by all judges, so N=19\cdot 4\cdot 5=380 and the residual subspace has 380-(1+18+3+4+12)=342 dimensions) and compare with the squared norms of our projections:

          df      sum_sq    mean_sq          F        PR(>F)
T       18.0  159.347368   8.852632  13.054739  2.611261e-29
J        3.0   16.934211   5.644737   8.324142  2.337900e-05
Q        4.0  150.326316  37.581579  55.420547  5.317403e-36
J  Q   12.0   48.473684   4.039474   5.956904  1.902534e-09
E      342.0  231.915789   0.678116        ---           ---

And then computing the squared Euclidean norms of each projected response vector:

||PWT y||^2 = 159.34736842105423
||PWJ y||^2 = 16.934210526315855
||PWQ y||^2 = 150.32631578947368
||PWJQ y||^2 = 48.47368421052631
||PP y||^2 = 231.91578947368413

Perfect agreement! The ANOVA table thus indicates that the levels in each factor differ significantly. However, the meaning of the statistics for the individual factors J and Q is actually in doubt. We see there is a significant difference between the levels of the interaction of the two factors. Hence, we cannot say for each individual factor whether there is a significant difference in its levels because the computation of its mean square involves averaging over the interaction terms. If the interaction terms were not significantly different from each other, then the mean square computation would involve only the levels of the single factor. So, for this particular plot and treatment structure we can only make the following conclusions:

  1. There is a significant difference between transcriptions.
  2. There is a significant difference between judge-quality combinations.
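As a cross-check, the “mean_sq”, “F” and “PR(>F)” columns of the balanced table can be recomputed from the squared norms and dimensions alone. This is a minimal sketch assuming SciPy is available; it uses the survival function of the F distribution, `scipy.stats.f.sf`, for the upper-tail probability, and the dictionary names are made up for illustration.

```python
from scipy import stats

# Squared norms of the projections for the balanced dataset (from the text).
sum_sq = {
    "T": 159.34736842105423,
    "J": 16.934210526315855,
    "Q": 150.32631578947368,
    "JQ": 48.47368421052631,
}
df = {"T": 18, "J": 3, "Q": 4, "JQ": 12}
ss_res, df_res = 231.91578947368413, 342

ms_res = ss_res / df_res              # residual mean square
results = {}
for name in sum_sq:
    mean_sq = sum_sq[name] / df[name]
    F = mean_sq / ms_res
    p = stats.f.sf(F, df[name], df_res)   # P(F' >= F) under the null
    results[name] = (mean_sq, F, p)
    print(f"{name:3s} df={df[name]:3d} mean_sq={mean_sq:10.6f} F={F:10.6f} p={p:.6e}")
```

The printed rows should match the corresponding rows of the ANOVA table up to rounding.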

Now what about the unbalanced design, where the orthogonality of factor subspaces breaks? Here are the ANOVA table and projection results for the model specified as Y ~ C(T) + C(J)*C(Q):

          df      sum_sq    mean_sq          F        PR(>F)
T       24.0  202.144574   8.422691  12.590690  2.096389e-35
J        3.0   16.675775   5.558592   8.309282  2.283125e-05
Q        4.0  170.465116  42.616279  63.705103  2.522982e-41
J  Q   12.0   51.623195   4.301933   6.430760  1.871955e-10
E      386.0  258.219246   0.668962        ---           ---

||PWT y||^2 = 202.14457364340936
||PWJ y||^2 = 18.589378838216078
||PWQ y||^2 = 170.4651162790689
||PWJQ y||^2 = 51.623195409242236
||PP y||^2 = 260.4063341051713

Our statistical conclusions are identical, but the table and the projections now differ slightly for the judge factor (16.68 vs. 18.59) and the residual E (258.22 vs. 260.41). Now here are the results for the same model, but specified as Y ~ C(J)*C(Q)+C(T):

          df      sum_sq    mean_sq          F        PR(>F)
J        3.0   18.589379   6.196460   9.262801  6.273992e-06
Q        4.0  170.465116  42.616279  63.705103  2.522982e-41
T       24.0  200.230970   8.342957  12.471500  4.419656e-35
J  Q   12.0   51.623195   4.301933   6.430760  1.871955e-10
E      386.0  258.219246   0.668962        ---           ---

||PWJ y||^2 = 18.589378838216078
||PWQ y||^2 = 170.4651162790689
||PWT y||^2 = 202.14457364340936
||PWJQ y||^2 = 51.623195409242236
||PP y||^2 = 260.4063341051713

Now the numbers for the judge factor agree (18.59), but those for the transcription factor (200.23 vs. 202.14) and the residual (258.22 vs. 260.41) differ slightly. This difference comes from how the ANOVA table is computed: it decomposes the response vector iteratively, in the order the terms appear in the model specification, removing from each subspace the components already accounted for by the earlier terms. We saw last time that there exists some overlap between the judge and transcription subspaces, so whichever of the two comes second is credited with less. Nonetheless, it appears that for this particular model, our statistical conclusions do not change between the balanced and unbalanced designs. Furthermore, the interaction between judge and quality makes the differences in the statistics for the individual factors moot.
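The order dependence can be reproduced on a toy example. The sketch below assumes the table is built from sequential sums of squares, where each term is credited with the increase in the fitted sum of squares when it is added to the terms before it. It uses a made-up unbalanced two-factor design with synthetic responses, not the challenge data, and the helper names are hypothetical.

```python
import numpy as np

def proj(X):
    """Orthogonal projection onto the column space of X."""
    return X @ np.linalg.pinv(X)

def indicators(labels):
    """One 0/1 column per distinct label."""
    labels = np.asarray(labels)
    return (labels[:, None] == np.unique(labels)[None, :]).astype(float)

def sequential_ss(y, blocks):
    """Sum of squares credited to each block, in the order given."""
    X = np.ones((len(y), 1))              # start from the grand mean
    fitted_prev = proj(X) @ y
    out = []
    for B in blocks:
        X = np.hstack([X, B])
        fitted = proj(X) @ y
        out.append(float(np.sum((fitted - fitted_prev) ** 2)))
        fitted_prev = fitted
    return out

# Toy unbalanced design: unequal, non-proportional cell counts.
a = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
b = np.array([0, 0, 1, 0, 1, 1, 1, 0, 1])
rng = np.random.default_rng(2)
y = 1.0 * a + 2.0 * b + rng.standard_normal(len(a))   # synthetic responses

A, B = indicators(a), indicators(b)
P0 = proj(np.ones((len(y), 1)))

ss_a_first, ss_b_second = sequential_ss(y, [A, B])
ss_b_first, ss_a_second = sequential_ss(y, [B, A])
direct_a = float(np.sum(((proj(A) - P0) @ y) ** 2))   # ||P_{W_a} y||^2
direct_b = float(np.sum(((proj(B) - P0) @ y) ** 2))   # ||P_{W_b} y||^2

print("a first :", ss_a_first, " a second:", ss_a_second, " direct:", direct_a)
print("b first :", ss_b_first, " b second:", ss_b_second, " direct:", direct_b)
```

Whichever factor is entered first matches its direct projection, the other one does not, and the total credited to the two factors is the same in both orders. This mirrors the tables above, where T and J together account for 218.82 in either ordering.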

Next time we will look at other designs and their implications for our statistical conclusions.
