We’ve got more interesting candidates this month. Go vote!
Here’s my contribution for the August 2018 tune.
At some point every morning I have a listen to a random set of seven tunes on the Endless folk-rnn Traditional Music Session, and nearly every time I hear one or two things that are worth keeping. I transfer these to The Machine Folk Session website, which as a result now has 200 machine folk tunes (many of which have been uploaded by users of folkrnn.org). I put three of my favorites below.
Here’s my setting of “Jump at the Sun” generated by the v1 system (which also generated the title):
Ideally, the musicians who are not playing should jump where it’s obvious.
Here’s my setting of tune 189 by the v3 folk-rnn system:
The system has created an AABBCC structure, with a very different B part and a 4-bar C part.
Here’s my setting of tune 5292 by the v2 folk-rnn system:
Quite some harmonic adventurousness from this little system!
Here’s my contribution to The Machine Folk Session Tune of the Month for August 2018:
I think it turned out to be a nice tune with just a few adjustments. Here’s the original tune, produced by folkrnn:
The weakest part of this tune to me is bars 11-12. I felt that repeating the idea of bar 10 over a descending harmony would be just perfect for describing how I feel at the close of summer. A further adjustment was adding first and second endings to each part. Here’s my tune:
A fundamental problem we face with a system that can generate an endless number of folk tunes is how to find the interesting ones. In this direction, we have started to look at simple measures of similarity. Since we are dealing with sequences of symbols, a natural first step is to compare those sequences according to their compressibility: how much does knowing one sequence help us describe another sequence with parsimony? In this post we briefly look at the normalised compression distance (NCD) as a measure of similarity (it is at the core of the evaluation approach proposed in J. Ens and P. Pasquier, “A Cross-Domain Analytic Evaluation Methodology for Style Imitation”, in Proc. Int. Conf. Computational Creativity, 2018).
Given two strings $x$ and $y$, the NCD between them is defined by

$$\mathrm{NCD}(x, y) = \frac{K(xy) - \min\{K(x), K(y)\}}{\max\{K(x), K(y)\}},$$

where $K(x)$ is the Kolmogorov complexity of the string $x$, and $xy$ is the concatenation of $x$ and $y$. The Kolmogorov complexity of any string is not computable, but an upper bound can be estimated using a compression algorithm, e.g., Lempel-Ziv coding followed by Huffman coding (as done in gzip). $K(x)$ expresses the degree to which $x$ can be compressed. So the numerator $K(xy) - \min\{K(x), K(y)\}$ expresses the degree to which either of the two strings helps in compressing the other. In the limit that the two strings are the same, this value will be zero because $K(xx) \approx K(x)$. In the limit that the two strings are completely different, $K(xy)$ will grow to about $K(x) + K(y)$, which is no larger than twice $\max\{K(x), K(y)\}$. Normalising the numerator by $\max\{K(x), K(y)\}$ then gives a value close to 1 (though in practice it can exceed 1).
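To make this concrete, here is a minimal Python sketch of the NCD computation, using the standard-library zlib as a stand-in compressor (the experiments below use LZW via python-lzw, but any compressor that upper-bounds $K$ will do):

```python
import zlib

def compressed_len(s: bytes) -> int:
    # Upper bound on the Kolmogorov complexity K(s):
    # the length of s after compression (zlib as a stand-in here).
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    # Normalised compression distance between strings x and y.
    kx, ky, kxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (kxy - min(kx, ky)) / max(kx, ky)
```

For identical inputs the numerator stays near zero (up to compressor overhead), while for unrelated inputs the ratio approaches 1, matching the limits described above.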
In our work, we are looking at symbolic transcriptions of folk tunes. Here’s an example transcription of length 109 tokens from our training data:
M:4/4 K:Cmaj G 3 F E F G A | _B G c B A F F 2 | G A G F E G c f | e c d B G c c 2 :| e 2 c e d e c d | e 2 c e d B G 2 | e 2 c e d e c d | e c d B G c c 2 | e 2 c e d e c d | e 2 c e d B G ^F | G A B c d e f d | g e f d e c c 2 |
Let’s see how this is compressed by a text compression algorithm. LZW (we are using python-lzw) iteratively builds a codebook from an input sequence: it starts with codewords consisting of single symbols, and then adds longer sequences as it first encounters them. For the example above, the codebook looks like this (each pair of square brackets delimits a codeword):
['M:4/4'] ['K:Cmaj'] ['G'] ['3'] ['F'] ['E'] ['A'] ['|'] ['_B'] ... ['M:4/4', 'K:Cmaj'] ['K:Cmaj', 'G'] ['G', '3'] ['3', 'F'] ['F', 'E'] ['E', 'F'] ['F', 'G'] ['G', 'A'] ['A', '|'] ['|', '_B'] ['_B', 'G'] ['G', 'c'] ['c', 'B'] ['B', 'A'] ['A', 'F'] ['F', 'F'] ['F', '2'] ['2', '|'] ['|', 'G'] ['G', 'A', 'G'] ['G', 'F'] ['F', 'E', 'G'] ['G', 'c', 'f'] ... ['g', 'e'] ['e', 'f', 'd'] ['d', 'e', 'c', 'c']
The codewords are listed in the order in which they were added to the codebook. We see the unique individual symbols added first, followed by the pairs “M:4/4 K:Cmaj” and “K:Cmaj G”, …, and then the triple “G A G”, since the pair “G A” had already been seen. Towards the end of building the codebook we see longer sequences, e.g., “d e c c”.
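For reference, here is a compact sketch of this codebook construction, written to operate directly on token sequences (python-lzw itself works on bytes; this tokenwise version is an illustrative assumption):

```python
def lzw_encode(tokens):
    # The codebook maps tuples of tokens to integer codes.
    # Start with the single tokens, in order of first appearance.
    codebook = {}
    for t in tokens:
        codebook.setdefault((t,), len(codebook))
    encoded = []
    w = ()  # the current match
    for t in tokens:
        if w + (t,) in codebook:
            w = w + (t,)  # extend the match
        else:
            encoded.append(codebook[w])
            codebook[w + (t,)] = len(codebook)  # add the new, longer sequence
            w = (t,)
    if w:
        encoded.append(codebook[w])
    return encoded, codebook
```

Applied to the space-separated tokens of the transcription above, e.g., lzw_encode(transcription.split()), it builds a codebook like the one shown and returns a shorter sequence of integer codes.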
The LZW encoding of the length-109 transcription above results in a sequence of length 83, which is decoded according to:
['M:4/4'] ['K:Cmaj'] ['G'] ['3'] ['F'] ['E'] ['F'] ['G'] ['A'] ['|'] ['_B'] ['G'] ['c'] ['B'] ['A'] ['F'] ['F'] ['2'] ['|'] ['G', 'A'] ['G'] ['F', 'E'] ['G', 'c'] ['f'] ['|'] ['e'] ['c'] ['d'] ['B'] ['G', 'c'] ['c'] ['2'] [':|'] ... ['e'] ['f'] ['d', '|'] ['g'] ['e', 'f'] ['d', 'e', 'c'] ['c', '2', '|']
This stream of codes can be further compressed by allocating bits to codewords according to their frequency, i.e., via Huffman coding. For the transcription above, the symbols “G”, “F”, and “E” will be given the fewest bits since they are the most frequent, while “M:4/4”, “K:Cmaj” and “d e c” will be given the most since they are the least frequent. For now we will just stick to computing the NCD using LZW compression.
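As an aside, such a frequency-based code can be derived in a few lines; here is a hypothetical sketch using Python’s heapq (not the encoder used in our experiments):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    # Build a Huffman code: the most frequent symbols get the shortest bit strings.
    counts = Counter(symbols)
    # Heap entries: (total frequency, tiebreaker, {symbol: bit string}).
    heap = [(n, i, {s: ''}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)  # pop the two least frequent subtrees...
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + b for s, b in c0.items()}
        merged.update({s: '1' + b for s, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))  # ...and merge them
        tie += 1
    return heap[0][2]
```

Run on the codeword stream of the transcription above, it assigns the most frequent entries, like “G”, the shortest bit strings.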
Now that we have an idea of how to estimate the Kolmogorov complexity of strings, let’s look at the NCD within and across two corpora: the 23,636 tunes in the training data of the folk-rnn v2 model, and the 60,000 tunes generated by v2 (available in the folk-rnn (v2) Session Books Vols 1-10). We draw a tune from the training corpus, a tune from the generated corpus, and sample a random token sequence whose length is the maximum of the lengths of the two tunes. We then compute the NCD of each tune with itself (setting $y = x$), and paired with the others (where $x$ and $y$ come from the two corpora, or $y$ is a random sequence). We repeat this 10,000 times and then plot the histogram of the resulting NCD.
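Here is a sketch of one plausible reading of this sampling loop (function and variable names are hypothetical; it assumes the ncd helper from the earlier sketch, with each tune stored as a list of tokens):

```python
import random

def to_bytes(tokens):
    return ' '.join(tokens).encode()

def sample_ncd_distributions(training, generated, vocab, n_samples=10000):
    # training/generated: lists of tokenised transcriptions; vocab: all tokens.
    names = ['train-train', 'gen-gen', 'train-gen',
             'rand-rand', 'train-rand', 'gen-rand']
    dists = {name: [] for name in names}
    for _ in range(n_samples):
        t, t2 = random.choice(training), random.choice(training)
        g, g2 = random.choice(generated), random.choice(generated)
        k = max(len(t), len(g))  # random sequences match the tunes' lengths
        r, r2 = random.choices(vocab, k=k), random.choices(vocab, k=k)
        for name, (x, y) in zip(names, [(t, t2), (g, g2), (t, g),
                                        (r, r2), (t, r), (g, r)]):
            dists[name].append(ncd(to_bytes(x), to_bytes(y)))
    return dists
```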
By way of examples, here’s a generated tune:
M:3/4 K:Cmaj G, 2 | C 2 E /2 G /2 B /2 d /2 c 2 | B > A G E /2 C /2 D 2 | E 2 (3 F D B, C > C | C 4 | G A /2 B /2 | c > B c d e < c | d > B G 3 B /2 d /2 | c > A B G (3 A G E | G 4 G A /2 B /2 | c > B c d e < c | d > B G 3 B /2 d /2 | c > A B G (3 A G E | G 4 E > D | C 2 E /2 G /2 B /2 d /2 c 2 | B > A G E /2 C /2 D 2 | E 2 (3 F D B, C > C | C 4 |
And here’s a random tune:
B, ^f' ^a M:2/4 | E F a' 12 M:3/2 _C ^c E, =D |1 _B M:6/8 C K:Cmin d' ^G /4 _e _B e M:6/8 /3 4 =g _D 12 ^A, G =d > 3/4 =C, D _d =E K:Cmaj _E, ^D ^F, C, =G =g 3 5 b 9 f' /2< C /3 D =B _E, /2> K:Cmix =a _G ^c' ^f G, ^C, =A, M:3/4 b' =a | A, d' ^A, 3/4 =G M:9/8 7 ^F =b _b e M:2/4 |1 _d' /2 e' 7 c ^A _D K:Cmaj a' =E (6 3/4 G, > _E, _e' =E, :| 3/4 _D b ^f ^C K:Cmix ^F /2> =C, 16 F C =G, C ^C, =F, (2 _e' =F K:Cdor /2 e' (3 7/2 5/2 F, _E _e =F, ^A c 2> |2 =b 4 (5 /2> 7 M:9/8 f _B /2 7/2 K:Cmaj |2 [ B =f' _E, _E, _e _c ^c M:3/2 K:Cmin
Below are the distributions of the NCD resulting from the six pairings of the three types of sequences: training to training, generated to generated, training to generated, random to random, training to random, and generated to random. For each pairing, we sampled 50,000 times.
In sky blue is the NCD distribution for pairs of tunes drawn from the training corpus; in coral, that for pairs of generated tunes; in khaki, that for pairs of random sequences; in mauve, that of real tunes paired with generated tunes; and in matcha and light periwinkle, those of the real and generated tunes paired with the random sequences, respectively. We see markedly different distributions. The three NCD distributions involving random sequences (alone, and paired with real and generated tunes) are predictably high. The three NCD distributions among the real and generated tunes have smaller means.
LZW compression works better as strings become longer; for instance, compressing a novel rather than a paragraph will result in a greater percentage of compression. So let’s run the same experiment as above, but using multiple transcriptions. Here are the results when we build each string from a pair of tunes:
The mean of each distribution has increased, but their ordering remains the same. Here’s the same for strings built from 5 tunes:
Note the difference in the y-scale. It seems that the distributions are converging. Here are the results using strings composed of 100 transcriptions.
Several questions now arise:
The machine folk tune of the month, with three votes, is good old Folk RNN Tune №1931… which needs a catchier title. Maybe something August-themed:
The Corn is Nearly Ready
Summer’s Almost Done and Gone
Looking Forward to Cooler Weather
Anyhow, let’s get learning!
Every time I open up The Endless folk-rnn Traditional Music Session, I find something interesting. Here’s a 9/8 tune generated by the first version of the folk-rnn system, which it has titled “Sailing Through The Hoples Mac”:
T: Sailing Through The Hoples Mac
M: 9/8
K: Gmaj
|:Bdd d2B A2 A|Bdd d2 B G2A|Bdd d2B d2B|ged BAB G3||
dBd g2g b2d|de2 f2g a2a|bgd B2d e2d|BBB B3 dBA|
dz f3 b3|afd ~a3 b2a|g2B B2g B2c|d2B A2D EFG:||
In this case, folk-rnn has done an ok job with a time signature that is not too common in the training data (only 847 examples out of 23,636).
I have added a setting of this transcription with a few minor changes to The Machine Folk Session.