Deep learning for assisting the process of music composition (part 3)

This is part 3 of my explorations of using deep learning for assisting the process of music composition. In this part, I look at creating things other than strict folk tunes with a model trained by deep learning methods on over 23,000 folk tunes. Part 1 is here. Part 2 is here.

Does the world really need tens of thousands of new reels and jigs? Maybe or maybe not; but my main motivations for composition are to create musical experiences, solve puzzles, learn, and be funny/dramatic. Toward these ends, I am finding that this music generation system can provide a wealth of materials and ideas. Here are some examples.

The system under study generated this curious little output:

Not a tiptop imitation when it comes to Western folk music, but it immediately brought to my mind drum and fife music, as well as music like that performed by Indian brass bands like the great Jaipur Kawa Brass Band. So, with a little reorchestrating, editing, and effects, we transform it into something like a passing marching band:

The system generated another failed emulation of Western folk music:

and it gave me the idea to create an antiphonal duet. I enjoy the improvisatory feeling of the playing.

Our model generated a piece it calls, “A Fhsoilah Kilnie”, which is made a right mess by the guitarist and flutist.

I don’t know what the system was “thinking.” However, after administering some major changes under my certified artistic license, we now have a serious piece with integrity. Bonus: it’s danceable for very agile penguins and the occasional grumpy elephant seal.

Finally, when something tells me to listen to a piece titled, “A Bump Of Howled Sho The fetch”, I have the expectation of something dramatic. Our system generated such a piece, to which our session performers do no justice.

Instead, I layer all of my favorite sounds, and then layer them again but amplified, to make a real big bump of howled shoing all fetches everywhere.

No doubt those fetches are now shoed by a massive bump of howled.


Deep learning for assisting the process of music composition (part 2)

This is part 2 of my explorations of using deep learning for assisting the process of music composition. In this part, I look at some “failures” of a model trained by deep learning methods on over 23,000 folk tunes. Part 1 is here.

Here is a little bit of extended tension composed by the system, which it has titled “Barch Beach”:

I am reminded of the beach scene from “There Will Be Blood.”

The system sometimes composes “sitcom style segues”. For instance, here is a segue perfect for cutting from a festive birthday scene (the system has titled it “The Birthday”) to a tender conversation between two characters that viewers want to see fall in love:

And here is a segue it has titled “The Last Night’s Fanky”, which can cut from said two characters realising what has happened to their humorous admission of it to the rest of the group:

These will probably not be favourites to play in a session (although it could be fun to try “Barch Beach”). The same probably goes for the incredible failure that the system has titled, “Le The Chredor”:

T: Le The Chredor
M: 4/4
L: 1/8
K: Gmaj
EGBc BGAG|EAcA C2 BG|DBdB Bcde|fdad bdad|
fBgB fBeB|fBgB adfd|gBag bdbg|eeae (3ggg bg|
|:ecce dBcF|EFGf e2 e/f/g/f/|eBcB AGAB|1^dbcd BGED:|2dfcB A2 zd||
c2 cA C2 CA|GFCD E=FDC|DCDE FCB,d|cFFD EDCA,|
G,2 c,D G,2 G,c,|[A,4D4] [C3:|
[K: Dmaj]
|::|E|"Dm"F2 FD C2 GE|FDFA F4|"C7"ECCE "Dm"FEDF|"Dm"BGFG A2 "A"Ac|
d2 df "Am"eAdc|[dA]fgf e/f/g fe|dBAd/A/ Bcde|A2 "Dm"dc d2 DE:|
|:ECCA _BcBG|F2 AD FAde|fffe f/g/a gf|eccB AGG/G/G|
"Dm"AAFb f2 ef|gdce "G7"dcBA|GEEE {F}EDCE|"F"F2 ^F/G/A FDCD|
|A,CDE F2 Ac|dfed c2 B2|"Bb"AFDF "A7"C3 D| Ec^cd B3 A|
"^Am"AcAF E3 D|"A"A,=FEC A,2 ag|"Dm"fdd(d3-|d)dcd|
f2 fa g2 ab|"D7"agfd "Gm"fedB|"Gm"G2 gf gfdB|"Am"c}"C"Bde (3dcB =A/B/c|
"Dm"f/g/a gf "Em"egag|"Dm"fdef "Gm"d2 de|"Am"fdcd "A7"edBG|1 "Am"D EA^C D2 DA:|2 "Dm"DDDE "Dm"D4|
"Am"A,DEF "C"GECE|"G"D2 A,B, C2 D^C|"Em"D2 GF EDCB,|"Dm"A,2 DF "Dm"A,4:|


I can’t see where in the ABC it says to turn everything up to 11.

Anyhow, I find some of the “failures” of this system very compelling. Take this curious piece the system has composed and titled, “The Castle Star”:

T: Castle Star, The
M: 4/4
L: 1/8
K: Gmaj
d2 de dBGB | e4 d4 | e2 ed d2 B2 | d4 BABd | e2 ed e2 B2 | d2 ed BdBA | GEDE GDBA | G6 :|G3 G G2 GA | B2 Bd efgd | B2 BG A2 GA | BA A2 BGED |G2 G2 G2 GF | EGGE D2 GA | B2 B2 ABcB | AGFG A2 cB ||
| gfgf e2 dg | B2 BA B2 dB | ABAG E2 D2 | GA BA G2 Bd |
efge d2 BG | EGAG A4 | efge dBBd | BABG A2 GE|


That first section sounds oriental because it uses a major pentatonic scale (you can play it on just the black keys of the piano, starting with F#). The first section has a really nice contour in two parts: the first three and a half measures pose a question, and the rest answers it with confidence. The cadence is fun too, though the end of the first section is missing a beat from the last measure. The second section consists of a strange rambling line that doesn’t really go anywhere or have any identifiable musical ideas. It also breaks with the pentatonic scale; but this returns in the third section, which seems a good complement to the A section.
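The pentatonic observation is easy to check mechanically. Here is a minimal sketch (the `is_pentatonic` helper is hypothetical, and looks only at note letters, ignoring octave marks and durations):

```python
# G major pentatonic: G, A, B, D, E (no C, no F).
GMAJ_PENTATONIC = {"G", "A", "B", "D", "E"}

def is_pentatonic(note_letters):
    """True if every note letter (case-insensitive) is a scale tone."""
    return all(n.upper() in GMAJ_PENTATONIC for n in note_letters)

# The opening figure of "The Castle Star" ("d2 de dBGB") uses only scale tones:
print(is_pentatonic("ddedBGB"))  # → True
```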

From where did this pentatonic inspiration come? In fact, pentatonic scales are common in Scottish and Irish music. There are tunes within our dataset that stay strictly within a pentatonic scale, such as “The Chinese”:

(The tune is not an Irish tune, but apparently French Canadian.)

The first figure of “The Castle Star” exactly matches that appearing in the C section of “The Fifth Legion March”, which is almost pentatonic through and through except for the use of the leading tone (F#):

We find no tune in our collection that shares the same first two measures as “The Castle Star”, but what about its second measure?

It plays an important role in “Tom Tully’s”:

It appears twice in “Jearoid” (which is also in the same pentatonic scale as “The Castle Star”):

It appears once in “Dragon’s Teeth”:

It starts off the tune of “Michael Coleman’s“:

and the funny reel, “Good morning to your nightcap”:

as well as “Siuil A Ruin”; and it starts off the turn of “Ado Barker’s”:

But it ends “The Weavers’ March” … and wouldn’t you know it! It appears in the “Tibetan National Anthem” which is in our dataset??? (Read the comments.)

So, back to “The Castle Star.” This can be made into a lovely tune. The entire A section is great; but to give it more of the feeling of a human conversation, let’s add a few ties to lengthen the notes. Let’s cut section B completely, and make a few adjustments to section C to provide a second part of the conversation begun in section A — but this time with the topic augmented with the leading tone (F#). We also fix the ending, and copy the nice cadence that ends section A but an octave higher. Og voilà!
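Copying a cadence an octave higher is mechanical in ABC: uppercase note letters become lowercase, and lowercase ones gain an apostrophe. A rough sketch (the `octave_up` helper is hypothetical, and ignores chord symbols in quotes and other ABC subtleties):

```python
import re

def octave_up(abc_fragment):
    """Shift each note letter up one octave in ABC:
    uppercase -> lowercase, lowercase -> lowercase + apostrophe."""
    def shift(match):
        note = match.group(0)
        return note.lower() if note.isupper() else note + "'"
    return re.sub(r"[A-Ga-g]", shift, abc_fragment)

print(octave_up("GEDE GDBA"))  # → "gede gdba"
```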

Synthesized with some appropriate portamento on the violin (to imitate the er-hu, poorly), and adding a sound like a pipa, we have a nice non-Irish tune!

All in all, just a few minor modifications to the output of this model, and we now have a tune that nicely mimics a music style of a completely different part of the world from the West. Now: can I put my name on it as the composer? Or have I merely edited the output of a nameless model?

In the next parts, I will be taking a look at composing things other than folk tunes from the output of the system.

UPDATE: Here I am performing the piece.

Deep learning for assisting the process of music composition (part 1)

This is part 1 of my explorations of using deep learning for assisting the process of music composition. In this part, I look at some almost-winning output of a model trained by deep learning methods on over 23,000 folk tunes, and make improvements to produce a session-ready piece.

There have been several recent explorations of music generation using statistical models learned by “deep learning” methods:

These are exciting contributions to the well-studied domain of algorithmic music composition. The prospect of developing a system that can learn to emulate characteristics of a collection of music data is one of the aims of “music metacreation”. Thus far, the most convincing demonstrations of music style emulation, in my opinion, are the EMI system of David Cope and the Continuator of François Pachet. Historically, the Illiac Suite for String Quartet (1956) is a monumental piece in this direction, but its approach is based on expert knowledge rather than learning from a corpus of exemplary music.

In The Infinite Irish Trad Session, we have used a recurrent neural network with long short-term memory to build a model from 23,962 traditional Irish tunes (well, a large number of the tunes are Irish). We then sample from that model to generate a new tune, and have our performance system realise it in a way imitating a real Irish trad session. So far, we have produced over 32,000 recordings, which amounts to 491 hours of audio (and a surprised administrator asking why my home directory has ballooned to over 80 GB). From this collection, a system randomly selects seven recordings every 5 minutes and serves them as a set. To see all of the MP3s we have produced so far, look here.
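The generation step amounts to sampling one character of ABC at a time from the model's predicted next-character distribution. Here is a minimal toy sketch, not the actual system: `next_char_probs` is a hypothetical stand-in for the trained LSTM, stubbed with a uniform distribution over an illustrative vocabulary.

```python
import random

# Illustrative character vocabulary (a real model uses the full ABC alphabet).
VOCAB = list("ABCDEFGabcdefg|:2 \n")

def next_char_probs(history):
    # Stand-in for the trained network; a real model conditions on `history`.
    return {c: 1.0 / len(VOCAB) for c in VOCAB}

def sample_tune(seed="K: Gmaj\n", length=40, rng=None):
    """Grow a tune character by character from the model's distribution."""
    rng = rng or random.Random(0)
    out = seed
    for _ in range(length):
        probs = next_char_probs(out)
        chars, weights = zip(*probs.items())
        out += rng.choices(chars, weights=weights)[0]
    return out

print(sample_tune())
```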

After spending several hours listening to these sets (with my wife feigning incredible enthusiasm, but also surprise at how good the results can sound! Thanks honey!!), it is clear that the learned model does encapsulate important characteristics of the music style. Many of the pieces have an Irish feel, and I am surprised how many are close to being ready for a session.

For instance, here is the ABC produced by the system that it has titled “The Doutlace”:

T: Doutlace, The
M: 4/4
L: 1/8
K: Gmaj
A2 eA cAdA|BAGA BG ~G2|A2 eA cAeA|decd BA A2|A2 eA cAeA|GABG AGEG|AGEG FGAB|c2 BG AFDF:||:EAAB cABc|dBGA BdeB|cAAB cded|eAAB cedB|ecAB cAAB|cADA BAGF|EFGE FGAB|1 cABG A2 AB:|2 cABG A4||cAGA EA A2|cdef gedB| ABAG EA A2|dcde fdAG| |:cAGE FGAB|c2 cd efaa|gefa gedc|BABG FAE<G|EFGd EAGE|c2 AG EGGA|Ec ~c2 cdeg|~f3e decd||

Here it is converted to staff notation, with the audio of its realisation by our performance system.

I think this is a promising tune (it is a kind of “reel” since it is in 4/4). The system has produced two sensible melodies of 8 measures, each one repeated. (A “standard” Irish tune is two 8-bar sections, each repeated.) The first melody (the A section, or “tune”) has a nice little figure appearing in the first measure, which returns in the 3rd and 5th measures, but varied with the D raised to an E. The second melody (the B section, or “turn”) has its own little figure, which is repeated and varied. The contours of the two melodies are good. The melody of the B section spends more of its time in the higher notes than that of the A section, which is also typical. The intervals in the cadence in the fourth measure of the tune provide contrary motion to the cadence at the end of the turn. The A tone dominates throughout both sections and makes the tune and the turn sound like they belong together. The last section is odd, however, and doesn’t really fit. It feels as if the model has begun a second piece that it doesn’t finish.
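The two 8-bar sections can be verified mechanically by splitting the ABC on barlines. A minimal sketch (the `count_bars` helper is hypothetical, and ignores variant endings and other barline subtleties), applied to the A section of “The Doutlace”:

```python
# Count the bars in an ABC section: strip repeat colons, split on barlines,
# and keep only the non-empty bars.
def count_bars(section):
    return len([b for b in section.replace(":", "").split("|") if b.strip()])

a_section = ("A2 eA cAdA|BAGA BG ~G2|A2 eA cAeA|decd BA A2|"
             "A2 eA cAeA|GABG AGEG|AGEG FGAB|c2 BG AFDF")
print(count_bars(a_section))  # → 8
```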

“The Doutlace” is almost too good (and I spent less than a minute to find it among the 32,000 recordings), so I wonder if the system has plagiarised. Searching for the main figure of its tune among the 23,962 ABC tunes of our collection turns up nothing (“A2\s*e\s*A\s*c\s*A\s*d\s*A”); however, I find its variation occurs in the tune of a reel called “The Bag of Spuds”:


It also appears in the tune of the reel “Matt Peoples'”:

and the turn of the funny reel “Clais An Adhmid”:

and the tune of the reel “The New Copperplate”

and that’s it! These five reels are not all that similar to each other, even though the same figure appears in all.

So, it seems that if our system has been copying its learning materials, it is not so easy to detect.
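The figure search described above can be sketched with Python’s `re` module. This is illustrative only: the two-tune dictionary below is a stand-in for the real collection of 23,962 ABC transcriptions.

```python
import re

# The opening figure "A2 eA cAdA", with arbitrary whitespace between tokens.
figure = re.compile(r"A2\s*e\s*A\s*c\s*A\s*d\s*A")

# Illustrative stand-in corpus; note the "Bag of Spuds" variant has the
# D raised to an E (cAeA), so the exact figure does not match it.
tunes = {
    "The Doutlace": "A2 eA cAdA|BAGA BG ~G2|...",
    "The Bag of Spuds": "A2 eA cAeA|...",
}

matches = [title for title, abc in tunes.items() if figure.search(abc)]
print(matches)  # → ['The Doutlace']
```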

Still, there are a few ways I want to improve “The Doutlace,” in terms of sound, music, and play. Everything after the turn should be removed. The ending of the turn is good, but that of the tune is lacking. I change the third appearance of the figure in the tune such that the E is dropped to the D, which echoes its initial appearance. I also make G major more prominent in the last two measures of the tune. In the turn, I think the figures EAA and cAA appear too many times, so I vary their third and fifth appearances by raising the A notes a whole step to B. This clashes with the root (A) to create tension, and strengthens the downward resolution. The red boxes in the score below show my changes, with a recording of its realisation:

T: The Doutlace (v2)
M: 4/4
L: 1/8
K: Ador
A2 eA cAdA|BAGA BG ~G2|A2 eA cAeA|decd BA A2|A2 eA cAdA|GABG AGEF|G2 EG FGAB| c2 BG AGEF:||:EAAB cABc|dBGA BcdB|cAAB cdec|dBBc dedB|ecAB cAAB|dBBc BAGF|EFGE FGAB|1 cABG A2 GF:|2 cABG A4||


All in all, just a few minor modifications to the output, and we now have a tune ready for the session. Now: can I put my name on it as the composer? Or have I merely edited the output of a nameless model? (Typical for folk music, the composer is lost to the sands of time.)

In the next parts, I will be taking a look at the “failures” of the model, and the opportunities they bring.

UPDATE: Here I am playing the piece.

Weak contracts and machine learning: A presentation by Léon Bottou

These ICML2015 slides of Léon Bottou, Two high stakes challenges in machine learning, make several great points. The first is that the train/test paradigm in machine learning/artificial intelligence actually embodies creating systems having a weak “contract”. An example Bottou gives is of an object recognition system that is advertised with some accuracy. If one submits to that system data differing from the test set distribution, nonsense will result, and the system no longer works. Compare this to a sorting algorithm, which can sort any numerical data no matter its composition; the sorting algorithm thus does not have a weak contract. The answer of “more data” to improve the performance of a system with a weak contract rings hollow, given the bias that seems necessarily to result.

The second point Bottou makes is that machine learning/artificial intelligence is all three: exact science, experimental science, and engineering. It is necessary that it is all three; however, trouble can arise when the “genres” are mixed: for instance, claiming that a system with some estimated test error proves something of an exact nature. Third, Bottou points out that the experimental science of machine learning/artificial intelligence has been “dominated” by the train/test experimental paradigm … and this is challenging “the speed of our scientific progress.”

Bottou motivates increasing the ambitions of machine learning/artificial intelligence from building systems with weak contracts (reproducing X amount of ground truth of a dataset), to building systems that learn concepts: “In fact, a system that recognizes a “concept” fulfils a stronger contract than a classifier that works well under a certain distribution.” Bottou also recognizes that such an increased ambition necessarily leads to evaluation that is not as convenient as comparing labels to ground truth.

Bottou’s presentation encompasses much of what I am saying in machine music listening. I think we all want systems to learn concepts. Measuring the amount of ground truth reproduced by a system is not a relevant measure of that. The train/test paradigm must be replaced.