Modeling the 2020 AI Music Generation Challenge ratings, pt. 3

In the last part, we looked at the unabbreviated form of the linear response model that results from considering plots as the cross of the judge, quality and replicate factors, with transcriptions mapped onto the plots as treatments:

r_{tjq} = (\mu + \beta_{t4101} + \beta_A + \beta_{mel} + \beta_{Amel}) + I_{t,t7983}(\beta_{t7983}-\beta_{t4101}) + \ldots + I_{t,t482}(\beta_{t482}-\beta_{t4101}) + I_{j,B}(\beta_B-\beta_A+\beta_{Bmel}-\beta_{Amel}) + \ldots + I_{j,D}(\beta_D-\beta_A+\beta_{Dmel}-\beta_{Amel}) + I_{q,str}(\beta_{str}-\beta_{mel}+\beta_{Astr}-\beta_{Amel}) + \ldots + I_{q,int}(\beta_{int}-\beta_{mel}+\beta_{Aint}-\beta_{Amel}) + I_{j,B} I_{q,str} (\beta_{Bstr}-\beta_{Bmel}-[\beta_{Astr}-\beta_{Amel}]) + \ldots + I_{j,D} I_{q,int} (\beta_{Dint}-\beta_{Dmel} - [\beta_{Aint}- \beta_{Amel}]) + \epsilon_{tjq}

This odd form comes from the fact that we are regressing on categorical variables: judges, qualities, their interaction, and transcriptions. The reference level we choose is arbitrary: we use transcription 4101, judge A, and the melody quality. This expression becomes more succinct in matrix notation:

\mathbf{y} = \mathbf{X}\boldsymbol{\beta}' + \mathbf{n}

where \boldsymbol{\beta}' collects the 44 parameters, \mathbf{y} is our N responses stacked in order of the numbering of the plots, and \mathbf{n}\sim\mathcal{N}(\mathbf{0},\sigma^2\mathbf{I}_N). The design matrix \mathbf{X} is an N \times 44 matrix of zeros and ones, with the ith row indicating which factors are active for the ith plot. In our case, N=430 – though we have 500 responses, only 430 are numerical.
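
For concreteness, here is how such a design matrix can be constructed with patsy. This is only a sketch: it assumes the ratings live in the file judgescores.csv used further below, with a numerical rating column Y and categorical columns T (transcription), J (judge) and Q (quality), and it lets patsy pick the reference levels (which does not matter for what follows).

import pandas as pd
from patsy import dmatrices

# read the numerical responses, one row per plot
df = pd.read_csv('judgescores.csv')

# treatment-coded design matrix: intercept, 24 transcription columns,
# 3 judge columns, 4 quality columns, 12 judge:quality columns
y, X = dmatrices('Y ~ C(T) + C(J) + C(Q) + C(J):C(Q)', data=df)
print(X.shape)   # (430, 44) if the CSV holds the 430 numerical responses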

Let’s partition the design matrix in the following way:

\mathbf{X} = [\mathbf{1}_{N} | \mathbf{X}_{T'} | \mathbf{X}_{J'} | \mathbf{X}_{Q'} | \mathbf{X}_{J'Q'}]

The first column is all ones. The next 24 columns are \mathbf{X}_{T'}, which identify which plots are treated by which transcriptions, starting with t7983 (since t4101 is the reference). The next three columns are \mathbf{X}_{J'}, which identify which plots involve which judges, the first of these columns indicating all plots rated by judge B (since judge A is the reference). Then \mathbf{X}_{Q'} gives four columns identifying which plots involve which quality, starting with structure (since the melody quality is the reference). Finally, the twelve columns of \mathbf{X}_{J'Q'} identify which plots involve a given judge (B, C or D) rating a given quality (structure, playable, memorable or interesting).
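
This partition can be read off the design matrix built in the sketch above: patsy records which columns belong to which term, so (assuming that sketch) the blocks are

# design_info.term_name_slices maps each term to its block of columns
slices = X.design_info.term_name_slices
X_Tp  = X[:, slices['C(T)']]        # X_T': 24 transcription columns
X_Jp  = X[:, slices['C(J)']]        # X_J': 3 judge columns
X_Qp  = X[:, slices['C(Q)']]        # X_Q': 4 quality columns
X_JQp = X[:, slices['C(J):C(Q)']]   # X_J'Q': 12 interaction columns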

Now we can rewrite the model with these terms:

\mathbf{y} = \mathbf{1}_{N} \mu' + \mathbf{X}_{T'}\boldsymbol{\beta}'_{T'} + \mathbf{X}_{J'}\boldsymbol{\beta}'_{J'} + \mathbf{X}_{Q'}\boldsymbol{\beta}'_{Q'} + \mathbf{X}_{J'Q'}\boldsymbol{\beta}'_{J'Q'} + \mathbf{n}

where we have partitioned the parameter vector \boldsymbol{\beta}' according to the factors. Since \mathbf{y} is a vector in \mathbb{R}^N, so are all the terms on the right. The first term points in a one-dimensional subspace of \mathbb{R}^N, namely \textrm{C}(\mathbf{1}_{N}) (the column space of \mathbf{1}_{N}). Any combination of the first two terms, \mathbf{1}_{N}a + \mathbf{X}_{T'}\mathbf{b}, is a vector in the 25-dimensional subspace \textrm{C}([\mathbf{1}_{N} | \mathbf{X}_{T'}]). (We know it is 25-dimensional because no linear combination of the 24 columns of \mathbf{X}_{T'} can produce \mathbf{1}_{N}.) Likewise, any combination \mathbf{1}_{N} a + \mathbf{X}_{J'}\mathbf{c} is a vector in the 4-dimensional subspace \textrm{C}([\mathbf{1}_{N} | \mathbf{X}_{J'}]). And so on.
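
These dimensions are easy to check numerically by computing matrix ranks, continuing the sketch above:

import numpy as np

# the rank of [1_N | X_T'] should be 25, and of [1_N | X_J'] should be 4
ones = np.ones((len(df), 1))
print(np.linalg.matrix_rank(np.hstack([ones, np.asarray(X_Tp)])))   # 25
print(np.linalg.matrix_rank(np.hstack([ones, np.asarray(X_Jp)])))   # 4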

Now notice the following: \textrm{C}([\mathbf{1}_{N} | \mathbf{X}_{T'}]) = \textrm{C}(\mathbf{X}_{T}) where \mathbf{X}_{T} is the matrix of design vectors relating all 25 transcriptions (columns) to all plots (rows). Similarly, \textrm{C}([\mathbf{1}_{N} | \mathbf{X}_{J'}]) = \textrm{C}(\mathbf{X}_{J}), where \mathbf{X}_{J} is the matrix of design vectors relating all 4 judges to all plots. And \textrm{C}([\mathbf{1}_{N} | \mathbf{X}_{Q'}]) = \textrm{C}(\mathbf{X}_{Q}). We can thus attempt to decompose our response vector into orthogonal components, i.e.,
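
These equalities of column spaces can also be checked numerically: if two matrices have the same rank as their side-by-side concatenation, their column spaces coincide. A sketch, continuing the code above, for the transcription factor:

# indicator matrix over all 25 transcriptions (no reference level dropped)
X_T_all = pd.get_dummies(df['T']).to_numpy(dtype=float)

# both matrices have rank 25, and stacking them does not raise the rank,
# so C([1_N | X_T']) and C(X_T) are the same subspace
A = np.hstack([ones, np.asarray(X_Tp)])
print(np.linalg.matrix_rank(A),
      np.linalg.matrix_rank(X_T_all),
      np.linalg.matrix_rank(np.hstack([A, X_T_all])))   # 25 25 25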

\mathbf{y} = \mathbf{P}_0\mathbf{y} + (\mathbf{P}_T-\mathbf{P}_0)\mathbf{y} + (\mathbf{P}_J-\mathbf{P}_0)\mathbf{y} + (\mathbf{P}_Q-\mathbf{P}_0)\mathbf{y} + (\mathbf{P}_{JQ}-\mathbf{P}_J-\mathbf{P}_Q+\mathbf{P}_0)\mathbf{y} + \mathbf{P}_\perp\mathbf{y}

where \mathbf{P}_0 is the projection matrix onto \textrm{C}(\mathbf{1}_{N}), \mathbf{P}_T is the projection matrix onto \textrm{C}(\mathbf{X}_{T}), and so on. The last term involves \mathbf{P}_\perp, the projection matrix onto \textrm{N}(\mathbf{X}^T), i.e., the left nullspace of the design matrix, that is, the subspace orthogonal to \textrm{C}(\mathbf{X}). This subspace contains all the variation lying outside the factors considered in our experiment.
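
Recall that for a matrix \mathbf{A} with full column rank, the orthogonal projection matrix onto \textrm{C}(\mathbf{A}) is

\mathbf{P}_{\mathbf{A}} = \mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T

which is exactly what we compute below for \mathbf{1}_{N}, \mathbf{X}_{T}, and so on.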

If we can decompose \mathbf{y} in such a way, then we have isolated the effects of the factors from one another, i.e., made the components uncorrelated. Whether this orthogonal decomposition is possible depends on our experimental design. If for each factor we have an equal number of plots at each of its levels (for instance, all judges rate the same number of transcriptions, and all transcriptions are rated in the same number of qualities) then we have a balanced design. This permits decomposing the response vector into components lying in mutually orthogonal subspaces related to the factors.
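
A quick way to check balance in the data itself is to count plots per level of each factor; a sketch, assuming the data frame and column names used in the code below:

# in a balanced design every level of every factor covers the same number of
# plots: each transcription has 4 judges x 5 qualities = 20 ratings, and each
# judge and each quality appears equally often
print(df.groupby('T').size().unique())     # a single value, 20, if balanced
print(df['J'].value_counts().unique())     # a single value if balanced
print(df['Q'].value_counts().unique())     # a single value if balanced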

For the 430 responses we have, the result is an unbalanced design. However, we can limit our analysis to the 380 responses belonging to the 19 transcriptions that were rated by every judge in every quality. Then we will have a balanced design, and all subspaces will be orthogonal. Let’s verify this:

import pandas as pd
import numpy as np

df = pd.read_csv('judgescores.csv')

# remove plots of transcriptions that do not have all ratings
remove = [4589, 6951, 3745, 5102, 897, 7151]
df = df[~df['T'].isin(remove)]

# build relevant design matrices
# (levels_T, levels_J and levels_Q were presumably defined elsewhere in the
# original; here we take the levels from the data, since only the column
# spaces matter below and the ordering of levels does not affect those)
from patsy import dmatrices
levels_T = sorted(df['T'].unique())
levels_J = sorted(df['J'].unique())
levels_Q = sorted(df['Q'].unique())
y, X_J = dmatrices('Y ~ C(J, levels=levels_J)', data=df)
y, X_T = dmatrices('Y ~ C(T, levels=levels_T)', data=df)
y, X_Q = dmatrices('Y ~ C(Q, levels=levels_Q)', data=df)
y, X_JQ = dmatrices('Y ~ C(J, levels=levels_J):C(Q, levels=levels_Q)', data=df)

# build projection matrices
def proj(A):
    # orthogonal projection matrix onto the column space of A
    return A @ np.linalg.inv(A.T @ A) @ A.T

ones = np.ones((X_T.shape[0], 1))
PV0 = proj(ones)                      # projection onto C(1_N)
PWT = proj(X_T) - PV0                 # transcription subspace
PWJ = proj(X_J) - PV0                 # judge subspace
PWQ = proj(X_Q) - PV0                 # quality subspace
PWJQ = proj(X_JQ) - PWJ - PWQ - PV0   # judge-quality interaction subspace
PP = np.eye(len(df)) - PWT - PWJ - PWQ - PWJQ - PV0   # residual subspace

# Frobenius norms of each design matrix projected onto the other subspaces
print("||PWT X_J|| =", np.linalg.norm(np.matmul(PWT,X_J)))
print("||PWT X_Q|| =", np.linalg.norm(np.matmul(PWT,X_Q)))
print("||PWT X_JQ|| =", np.linalg.norm(np.matmul(PWT,X_JQ)))
print("||PWJ X_Q|| =", np.linalg.norm(np.matmul(PWJ,X_Q)))
print("||PWJQ X_J|| =", np.linalg.norm(np.matmul(PWJQ,X_J)))
print("||PWJQ X_Q|| =", np.linalg.norm(np.matmul(PWJQ,X_Q)))
print("||PP X_T|| =", np.linalg.norm(np.matmul(PP,X_T)))
print("||PP X_J|| =", np.linalg.norm(np.matmul(PP,X_J)))
print("||PP X_Q|| =", np.linalg.norm(np.matmul(PP,X_Q)))
print("||PP X_JQ|| =", np.linalg.norm(np.matmul(PP,X_JQ)))
||PWT X_J|| = 4.839134180639518e-14
||PWT X_Q|| = 4.7810995665843586e-14
||PWT X_JQ|| = 4.862522204521831e-14
||PWJ X_Q|| = 1.203164985943001e-14
||PWJQ X_J|| = 2.2633931374399254e-14
||PWJQ X_Q|| = 2.842004379718575e-14
||PP X_T|| = 5.1357913858050114e-14
||PP X_J|| = 5.165031062645677e-14
||PP X_Q|| = 5.309231218189656e-14
||PP X_JQ|| = 5.46095478984234e-14

So all those norms are effectively zero, confirming that we can decompose the response vector into orthogonal components related to each factor, plus a residual.
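
As a further sanity check, continuing the code above, the six components should add back up to the response vector, and because they are mutually orthogonal their squared norms should partition the total sum of squares:

# the orthogonal components reconstruct the responses ...
yv = np.asarray(y).ravel()
parts = [PV0 @ yv, PWT @ yv, PWJ @ yv, PWQ @ yv, PWJQ @ yv, PP @ yv]
print(np.linalg.norm(yv - sum(parts)))        # effectively zero
# ... and their squared norms add up to the total sum of squares
print(yv @ yv, sum(p @ p for p in parts))     # equal up to rounding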

What happens if we try to do this for the unbalanced design?

||PWT X_J|| = 3.4396795415906425
||PWT X_Q|| = 6.760966066567385e-14
||PWT X_JQ|| = 1.538271455162399
||PWJ X_Q|| = 4.059442325587025e-14
||PWJQ X_J|| = 2.0204242927922506e-14
||PWJQ X_Q|| = 2.627683108776917e-14
||PP X_T|| = 1.0780943024977099
||PP X_J|| = 3.4396795415906443
||PP X_Q|| = 6.396470901927262e-14
||PP X_JQ|| = 1.5382714551623993

We see that the transcription subspace is not orthogonal to the judge subspace or to the judge-quality interaction subspace, and there is some overlap of the residual subspace with the transcription, judge, and judge-quality interaction subspaces. However, these norms in \mathbb{R}^{430} are still small enough that we might ignore them safely.

Next time we will see what all this means when it comes to testing for significant effects among the factors.
