This is part 1 of my explorations of using deep learning for assisting the process of music composition. In this part, I look at some almost-winning output of a model trained by deep learning methods on over 23,000 folk tunes, and make improvements to produce a session-ready piece.
There have been several recent explorations of music generation using statistical models learned by “deep learning” methods:
- “Lisl’s Stis”: Recurrent Neural Networks for Folk Music Generation, in which the system is trained on 1180 tunes in ABC format from a collection published in 1778
- Irish folk music generation, in which the system is trained on 23,962 tunes in ABC format from a growing online collection
- Modeling and generating sequences of polyphonic music with the RNN-RBM, in which the systems are trained on various collections of MIDI music
- Composing music with Recurrent Neural Networks, in which the system is trained on 8-measure fragments extracted from an online collection of classical MIDI music
- Synthesizing digital audio with recurrent neural networks, in which a system is trained on digital music audio of the group Madeon. (Code available here.)
These are exciting contributions to the well-studied domain of algorithmic music composition. The prospect of developing a system that can learn to emulate characteristics of a collection of music data is one of the aims of “music metacreation”. Thus far, the most convincing demonstrations of music style emulation in my opinion are the EMI system of David Cope, and Continuator of François Pachet. Historically, the Illiac Suite for String Quartet (1956) is a monumental piece in this direction, though its approach is based on expert knowledge rather than learning from a corpus of exemplary music.
In The Infinite Irish Trad Session, we have used a recurrent neural network with long short-term memory to build a model from 23,962 traditional Irish tunes (well, a large number of the tunes are Irish). We then sample from that model to generate a new tune, and have our performance system realise it in a way imitating a real Irish trad session. So far, we have produced over 32,000 recordings, which amounts to 491 hours of audio (and a surprised administrator asking why my home directory has ballooned to over 80 GB). From this collection, a system randomly selects seven recordings every 5 minutes and serves them as a set. To see all of the MP3s we have produced so far, look here.
After spending several hours listening to these sets (with my wife feigning incredible enthusiasm, but also surprise at how good the results can sound! Thanks honey!!), it is clear that the model learned does encapsulate important characteristics of the music style. Many of the pieces have an Irish feel, and I am surprised how many are close to being ready for a session.
For instance, here is the ABC produced by the system that it has titled “The Doutlace”:
T: Doutlace, The
A2 eA cAdA|BAGA BG ~G2|A2 eA cAeA|decd BA A2|A2 eA cAeA|GABG AGEG|AGEG FGAB|c2 BG AFDF:||:EAAB cABc|dBGA BdeB|cAAB cded|eAAB cedB|ecAB cAAB|cADA BAGF|EFGE FGAB|1 cABG A2 AB:|2 cABG A4||cAGA EA A2|cdef gedB| ABAG EA A2|dcde fdAG| |:cAGE FGAB|c2 cd efaa|gefa gedc|BABG FAE<G|EFGd EAGE|c2 AG EGGA|Ec ~c2 cdeg|~f3e decd||
Here it is converted to staff notation, with the audio of its realisation by our performance system.
I think this is a promising tune (it is a kind of “reel” since it is in 4/4). The system has produced two sensible melodies of 8 measures, each one repeated. (A “standard” Irish tune is two 8-bar sections, each repeated.) The first melody (the A section, or “tune”) has a nice little figure appearing in the first measure, which returns in the 3rd and 5th measures, but varied with the D raised to an E. The second melody (the B section, or “turn”) has its own little figure, which is repeated and varied. The contours of the two melodies are good. The melody of the B section spends more of its time in the higher notes than that of the A section, which is also typical. The intervals in the cadence in the fourth measure of the tune provide contrary motion to the cadence at the end of the turn. The A tone dominates throughout both sections and makes the tune and the turn sound like they belong together. The last section is odd, however, and doesn’t really fit. It feels as if the model has begun a second piece that it doesn’t finish.
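The observation about the figure in the A section can be checked mechanically. Below is a minimal sketch, assuming the A section is simply the first eight bars of the ABC above split on bar lines; the names `A_SECTION`, `FIGURE`, and `figure_bars` are mine, not part of our system.

```python
import re

# A section ("tune") of "The Doutlace", copied from the ABC above.
A_SECTION = (
    "A2 eA cAdA|BAGA BG ~G2|A2 eA cAeA|decd BA A2|"
    "A2 eA cAeA|GABG AGEG|AGEG FGAB|c2 BG AFDF"
)

# The opening figure, with optional whitespace between notes and the
# sixth note allowed to be either d (original) or e (variation).
FIGURE = re.compile(r"A2\s*e\s*A\s*c\s*A\s*([de])\s*A")

def figure_bars(section):
    """Return (bar number, d-or-e) for every bar that contains the figure."""
    hits = []
    for number, bar in enumerate(section.split("|"), start=1):
        match = FIGURE.search(bar)
        if match:
            hits.append((number, match.group(1)))
    return hits

print(len(A_SECTION.split("|")))  # number of bars in the section
print(figure_bars(A_SECTION))     # where the figure occurs, and in which form
```

Run on the A section, this reports eight bars, with the figure in bars 1, 3, and 5: the first with a d, the other two with an e.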
“The Doutlace” is almost too good (and I spent less than a minute finding it among the 32,000 recordings), so I wonder if the system has plagiarised. Searching for the main figure of its tune among the 23,962 ABC tunes of our collection turns up nothing (“A2\s*e\s*A\s*c\s*A\s*d\s*A”); however, I find its variation occurs in the tune of a reel called “The Bag of Spuds”:
It also appears in the tune of the reel “Matt Peoples'”:
and the turn of the funny reel “Clais An Adhmid”:
and the tune of the reel “The New Copperplate”:
and that’s it! These five reels are not all that similar to each other, even though the same figure appears in all.
So, it seems that if our system has been copying its learning materials, it is not so easy to detect.
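The kind of search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the collection is represented as a dictionary mapping titles to ABC bodies, the two toy tunes are invented placeholders rather than real transcriptions, and the helper names `figure_pattern` and `tunes_containing` are hypothetical.

```python
import re

def figure_pattern(notes):
    """Build a regex matching the notes in order, with optional whitespace between them."""
    return re.compile(r"\s*".join(re.escape(note) for note in notes))

def tunes_containing(tunes, pattern):
    """Return the titles of all tunes whose ABC body contains the pattern."""
    return [title for title, abc in tunes.items() if pattern.search(abc)]

# Invented placeholder tunes standing in for the real collection.
toy_collection = {
    "Hypothetical tune 1": "A2 eA cAeA|B2 GB dBGB",
    "Hypothetical tune 2": "GABc dedB|cAAB cded",
}

original = figure_pattern(["A2", "e", "A", "c", "A", "d", "A"])
variation = figure_pattern(["A2", "e", "A", "c", "A", "e", "A"])

print(tunes_containing(toy_collection, original))   # tunes with the figure as first stated
print(tunes_containing(toy_collection, variation))  # tunes with the varied figure
```

An exact-figure search like this only catches literal copies in the same key and rhythm. A transposed or ornamented borrowing would slip through, which is one reason copying is hard to detect this way.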
Still, there are a few ways I want to improve “The Doutlace,” in terms of sound, music, and play. Everything after the turn should be removed. The ending of the turn is good, but that of the tune is lacking. I change the third appearance of the figure in the tune such that the E is dropped to the D, which echoes its initial appearance. I also make G major more prominent in the last two measures of the tune. In the turn, I think the figures EAA and cAA appear too many times, so I vary their third and fifth appearances by raising the A notes a whole step to B. This clashes with the root (A) to create tension, and strengthens the downward resolution. The red boxes in the score below show my changes, with a recording of its realisation:
T: The Doutlace (v2)
A2 eA cAdA|BAGA BG ~G2|A2 eA cAeA|decd BA A2|A2 eA cAdA|GABG AGEF|G2 EG FGAB| c2 BG AGEF:||:EAAB cABc|dBGA BcdB|cAAB cdec|dBBc dedB|ecAB cAAB|dBBc BAGF|EFGE FGAB|1 cABG A2 GF:|2 cABG A4||
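To see exactly which measures differ between the model's output and my revision, the two versions can be compared bar by bar. The following is a rough sketch, assuming each section is split on bar lines with repeat marks and ending labels removed by hand; for the turn I compare only the first ending. The function names are mine.

```python
def bars(section):
    """Split an ABC section on bar lines, dropping empty pieces and stray spaces."""
    return [bar.strip() for bar in section.split("|") if bar.strip()]

def changed_bars(before, after):
    """Return the 1-based numbers of the bars that differ between two versions."""
    pairs = zip(bars(before), bars(after))
    return [number for number, (old, new) in enumerate(pairs, start=1) if old != new]

# Tune (A section): model output versus revision, copied from the ABC in this post.
V1_TUNE = "A2 eA cAdA|BAGA BG ~G2|A2 eA cAeA|decd BA A2|A2 eA cAeA|GABG AGEG|AGEG FGAB|c2 BG AFDF"
V2_TUNE = "A2 eA cAdA|BAGA BG ~G2|A2 eA cAeA|decd BA A2|A2 eA cAdA|GABG AGEF|G2 EG FGAB| c2 BG AGEF"

# Turn (B section), first ending only, with the "1" ending label removed.
V1_TURN = "EAAB cABc|dBGA BdeB|cAAB cded|eAAB cedB|ecAB cAAB|cADA BAGF|EFGE FGAB|cABG A2 AB"
V2_TURN = "EAAB cABc|dBGA BcdB|cAAB cdec|dBBc dedB|ecAB cAAB|dBBc BAGF|EFGE FGAB|cABG A2 GF"

print(changed_bars(V1_TUNE, V2_TUNE))  # bars of the tune that I edited
print(changed_bars(V1_TURN, V2_TURN))  # bars of the turn that I edited
```

The tune changes only in its second half (bars 5 to 8), while the edits to the turn are spread across bars 2, 3, 4, 6, and the first ending.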
All in all, just a few minor modifications to the output, and we now have a tune ready for the session. Now: can I put my name on it as the composer? Or have I merely edited the output of a nameless model? (As is typical for folk music, the composer is lost to the sands of time.)
In the next parts, I will be taking a look at the “failures” of the model, and the opportunities they bring.
UPDATE: Here I am playing the piece.