Making sense of the folk-rnn v2 model, part 10

This is part 10 of my loose and varied analyses of the folk-rnn v2 model, which have included parts 1, 2, 3, 4, 5, 6, 7, 8 and 9. In the last part we looked at the similarities between the activations inside folk-rnn v2 as it generated a particular transcription. Today we look at how our observations change for a different generated transcription.

Here’s a strange transcription generated by folkrnn v2:

[Image: the generated transcription]

The second and third parts have a counting error, which can be fixed easily enough. The Scotch snap in bar 9 is unexpected. I like the raised leading tone in the last bar. Otherwise this is a pretty boring tune.

Here’s the Gramian of the matrix of one-hot encoded inputs/outputs:

[Image: Gramian of the one-hot encoded inputs/outputs]

We see a lot of repetitions, forwards and backwards, which come from the stepwise motion up and down the minor scale.
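To see why repeated material shows up as lines in such a Gramian, here is a minimal sketch. It assumes the one-hot vectors are stacked as the columns of a matrix X, so the Gramian is X^T X. The token sequence is a made-up toy run up and down a scale, not the tune above, and `one_hot` is a hypothetical helper.

```python
import numpy as np

def one_hot(tokens, vocab):
    """Stack one-hot column vectors, so X has shape (len(vocab), len(tokens))."""
    index = {tok: i for i, tok in enumerate(vocab)}
    X = np.zeros((len(vocab), len(tokens)))
    for t, tok in enumerate(tokens):
        X[index[tok], t] = 1.0
    return X

# Hypothetical toy melody: a stepwise run up, the same run back down, then up again.
phrase = ["A", "B", "c", "d", "e"]
tokens = phrase + phrase[::-1] + phrase
vocab = sorted(set(tokens))

X = one_hot(tokens, vocab)
G = X.T @ X  # G[i, j] == 1 exactly when steps i and j carry the same token

# Forward repetition (steps 0-4 recur at steps 10-14): a line parallel to the main diagonal.
forward = [int(G[i, i + 10]) for i in range(5)]
# Backward repetition (steps 5-9 reverse steps 0-4): a line perpendicular to it.
backward = [int(G[i, 9 - i]) for i in range(10)]
print(forward, backward)  # [1, 1, 1, 1, 1] [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

A repeat played forwards lies along a line parallel to the main diagonal, offset by the distance between the repeats; a passage played in reverse lies along a line running the other way.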

Here’s the Gramian of the softmax output vectors:

[Image: Gramian of the softmax output vectors]

First, we again see that a large number of pairs are far from each other: of the 12,720 unique pairs of different vectors, 5,745 have a distance greater than 1.98. Second, only some of the probability distributions at the steps generating measure tokens are close to identical, whereas before nearly all of them appeared quite similar. Still, at each point where a measure token is produced, the distribution is very different from all the others (the crisscrossing powder-blue lines). Third, many of the structures seen in $\mathbf{X}^T \mathbf{X}$, the Gramian of the one-hot matrix, are here as well, including some of the backward-slash diagonals. Fourth, the distributions produced during the first and penultimate bars of each part do not overlap much with other distributions. The distributions produced in the third bar seem to be the most dissimilar to all others, which is curious given how similar that bar is to the first bar.
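Counting far-apart pairs is a short computation. The post does not say which distance it uses; this sketch assumes the L1 distance, which for two probability distributions lies between 0 and 2, so a threshold of 1.98 picks out pairs that put almost all their mass on different tokens. The helper `far_pairs` and the random stand-in data are hypothetical.

```python
import numpy as np

def far_pairs(P, threshold=1.98):
    """Count unique pairs of different steps whose distributions are far apart.

    P holds one probability distribution per column (vocabulary x steps).
    ASSUMPTION: the distance is L1, which lies in [0, 2] for distributions.
    Returns (number of unique pairs, number of pairs beyond the threshold).
    """
    n = P.shape[1]
    # D[i, j] = sum_k |P[k, i] - P[k, j]|
    D = np.abs(P[:, :, None] - P[:, None, :]).sum(axis=0)
    i, j = np.triu_indices(n, k=1)  # each unordered pair i < j exactly once
    return len(i), int((D[i, j] > threshold).sum())

# Hypothetical softmax outputs: 160 steps over a 50-token vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(scale=5.0, size=(50, 160))
P = np.exp(logits - logits.max(axis=0))  # subtract the column max for numerical stability
P /= P.sum(axis=0)

n_pairs, n_far = far_pairs(P)
print(n_pairs, n_far)
```

For n steps there are n(n - 1)/2 unique pairs, and 160 × 159 / 2 = 12,720, so the figure quoted above presumably corresponds to a transcription generated in 160 steps.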

Looking at the Gramian of the normalised hidden-state activations of the three layers shows the same kinds of structures we saw before:

[Animation: Gramians of the normalised hidden-state activations in each of the three layers]

Again, the Gramian of the normalised layer-two hidden-state activations appears most similar to the Gramian of the one-hot encoded input. The diagonal lines in the Gramian of the layer-three hidden-state activations are not as strong as before, and several shorter diagonal lines now appear between the stronger ones.
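A normalised Gramian is simply a matrix of cosine similarities between time steps. The sketch below assumes one layer’s activations are stored with one column per generation step. Both helpers are hypothetical, and averaging each off-diagonal is only one crude way to put a number on how strong a diagonal line is, not necessarily what was done for the figures above.

```python
import numpy as np

def normalised_gramian(H, eps=1e-12):
    """H holds one activation vector per column (units x steps).

    Scaling every column to unit L2 norm turns entry (i, j) of the Gramian
    into the cosine similarity of the activations at steps i and j.
    """
    U = H / np.maximum(np.linalg.norm(H, axis=0, keepdims=True), eps)
    return U.T @ U

def diagonal_profile(G):
    """Mean of each off-diagonal k = 1, ..., n-1 of a Gramian.

    Material that repeats k steps later raises the value at lag k.
    """
    n = G.shape[0]
    return np.array([np.diagonal(G, offset=k).mean() for k in range(1, n)])

# Hypothetical activations: 8 units, a 5-step pattern played twice.
rng = np.random.default_rng(1)
H = np.tile(rng.normal(size=(8, 5)), 2)
G = normalised_gramian(H)
profile = diagonal_profile(G)
print(profile.argmax() + 1)  # lag with the strongest diagonal
```

Because the pattern repeats after five steps, the off-diagonal at lag 5 consists entirely of ones, which is what a strong diagonal line in the plots looks like numerically.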

Here is an animation showing the Gramians of the out-gate activations in each layer:

[Animation: Gramians of the out-gate activations in each layer]

These still resemble the Gramians of the hidden-state activations. The grid patterns are interesting: in the first layer’s out-gate activations they demarcate the three parts of the tune, while in the third layer the grid marks the bars.
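For reference, a standard LSTM layer without peephole connections computes its gates at step t as below, with σ the logistic sigmoid and ⊙ the elementwise product. This is the textbook formulation, not necessarily folk-rnn v2’s exact parameterisation. It also assumes x_t is the network input for layer one and the hidden state of the layer below for layers two and three.

```latex
\begin{aligned}
\mathbf{i}_t &= \sigma(\mathbf{W}_i \mathbf{x}_t + \mathbf{U}_i \mathbf{h}_{t-1} + \mathbf{b}_i) && \text{(in gate)}\\
\mathbf{f}_t &= \sigma(\mathbf{W}_f \mathbf{x}_t + \mathbf{U}_f \mathbf{h}_{t-1} + \mathbf{b}_f) && \text{(forget gate)}\\
\mathbf{o}_t &= \sigma(\mathbf{W}_o \mathbf{x}_t + \mathbf{U}_o \mathbf{h}_{t-1} + \mathbf{b}_o) && \text{(out gate)}\\
\mathbf{g}_t &= \tanh(\mathbf{W}_g \mathbf{x}_t + \mathbf{U}_g \mathbf{h}_{t-1} + \mathbf{b}_g) && \text{(cell gate)}\\
\mathbf{c}_t &= \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \mathbf{g}_t\\
\mathbf{h}_t &= \mathbf{o}_t \odot \tanh(\mathbf{c}_t)
\end{aligned}
```

Under this formulation the hidden state is the out gate multiplied elementwise by the squashed cell state, which would make some resemblance between the out-gate and hidden-state Gramians unsurprising. The cell-gate activations "without the nonlinearity" would then be the argument of tanh in the expression for g_t.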

Here are the Gramians of the cell-gate activations with the hyperbolic tangent:


And here they are without the nonlinearity:


Again we see that the cell-gate activations of each layer saturate more and more in the same direction as generation proceeds. This saturation is weakest in the first layer; in the second layer it appears to persist throughout the second and third parts of the transcription. The cell-gate activations in the third layer are curiously calm.
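One way to make "saturates in the same direction" concrete is to work from the pre-activations (the plots without the nonlinearity). For each step, count how many units sit near ±1 after tanh, then check whether their signs stay the same from one step to the next. This sketch assumes one column per step; the 0.99 threshold, the helper names and the synthetic data are all hypothetical choices, not taken from the analysis.

```python
import numpy as np

def saturated_fraction(Z, level=0.99):
    """Z holds cell-gate pre-activations, one column per step (units x steps).

    Returns, per step, the fraction of units with |tanh(z)| > level,
    i.e. pushed close to +1 or -1.
    """
    return (np.abs(np.tanh(Z)) > level).mean(axis=0)

def sign_agreement(Z):
    """Per pair of consecutive steps, the fraction of units whose tanh output
    keeps the same sign. Together with a high saturated fraction, values near 1
    mean the units saturate in the same direction from step to step."""
    S = np.sign(Z)  # tanh preserves sign, so sign(tanh(z)) == sign(z)
    return (S[:, 1:] == S[:, :-1]).mean(axis=0)

# Hypothetical pre-activations: 16 units with fixed signs whose magnitude grows over 20 steps.
rng = np.random.default_rng(2)
direction = rng.choice([-1.0, 1.0], size=(16, 1))
Z = direction * np.linspace(0.1, 6.0, 20) + rng.normal(scale=0.1, size=(16, 20))
print(saturated_fraction(Z))
print(sign_agreement(Z))
```

In this synthetic case the saturated fraction climbs from 0 towards 1 as the magnitudes grow, while the sign agreement stays at 1 once the units are clear of zero.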

Here are the Gramians of the in-gate activations of each layer (pausing at the last layer):


There is not much going on here that we don’t see in the other gates. And here is the Gramian of the unit-norm forget-gate activations of all layers:

[Animation: Gramians of the unit-norm forget-gate activations]

The three sections are clearly visible.

As before, comparing the activations between gates in each layer does not show any of these structures.

So it seems many of our observations hold!



