Chicken Bits and Bits and Bobs

I have finished my newest composition, “Chicken Bits and Bits and Bobs”, co-composed with folk-rnn. Here’s the score!

This is my largest “traditional” composition to date (nearly all of my works are electroacoustic), so I am very much looking forward to hearing how it sounds. Will it be a clucking failure? Computer says, “No”.

“Chicken Bits and Bits and Bobs” will be premiered by Ensemble x.y on May 23, 2017. Get your tickets now!

(In fact, the place where “Chicken Bits and Bits and Bobs” will be premiered is a stone’s throw from Stepney City Farm, where I have made many visits to observe their chickens and gather inspiration while composing this piece. Some of their chickens look like they are wearing pants. Of course, those pants-wearing chickens make an appearance in my piece. I couldn’t resist their swagger!)

Machine Folk in the Wild!

Following on from our article in The Conversation, ‘Machine folk’ music composed by AI shows technology’s creative side, we have uploaded to our YouTube channel, The Bottomless Tune Box, two recordings of machine folk music “in the wild”. Here, the fearless Daren Banarsë leads a group of musicians in performing two sets in public at The Harrison pub in London, on March 26 2017. One tune in each of the sets is computer generated. (To be clear: Set #1 features three tunes, and one of these is computer generated. Set #2 features two tunes, and one of these is computer generated.)

Set #1 (live)

Set #2 (live)

Is this the first public session featuring computer generated tunes???

Lots of tickets are still available for the upcoming May 23 2017 concert! Here are some highlights:

  1. The world premiere of Oded Ben-Tal’s “Bastard Tunes” (a work arising from his co-creating with folk-rnn), written for and performed by Ensemble x.y!
  2. The world premiere of Bob L. Sturm’s “Chicken Bits and Bits and Bobs” (a work arising from output that folk-rnn generated and titled “Chicken”; we think it has gained a sense of humour as well), written for and performed by Ensemble x.y!
  3. The world premiere of a work by Nick Collins, written for and performed by Ensemble x.y!
  4. The world premiere of some works generated by folk-rnn and harmonised by the Deep Bach system in the style of Bach chorales (to be performed on the organ at St. Dunstan’s)!
  5. The world premiere of a work by the MorpheuS system, performed by QMUL’s Prof. Elaine Chew!
  6. Sets #1-3 performed by Daren Banarsë and friends!
  7. Three pieces featuring the Millennial Whoop!
  8. A performance by Cambridge Fellow, composer and Irish harper Uná Monaghan!
  9. A memorable experience in a beautiful and old cathedral!
  10. A wine reception featuring WINE and RECEPTIONING!

Get your tickets quick!

Benchmarking “music generation systems”?

A recent conversation on the Google magenta project is about benchmarking “music generation systems” like David Cope’s Experiments in Music Intelligence, magenta’s own recurrent neural networks, our folk-rnn system, the Iamus system, Pachet et al.’s flow machines, etc. This is a very good topic to raise, because those of us who design and use these systems want to know whether some change has brought some benefit, or in what ways our system is better or worse than others.

A Google search of “Evaluating music generation systems” returns a lot of links. One of my past Paper of the Day blog posts discusses the article, C. Ariza, “The Interrogator as Critic: The Turing Test and the Evaluation of Generative Music Systems”, Computer Music Journal 33(2): 48–70, 2009. It’s a good article: easy to read and hard to forget. A recent article in a special journal issue devoted to music metacreation focuses on evaluating such systems: K. Agres, J. Forth, and G. A. Wiggins, “Evaluation of Musical Creativity and Musical Metacreation Systems”, ACM Computers in Entertainment, Fall 2016. My collaborator Oded Ben-Tal and I will have an article appearing in the second issue of the Journal of Creative Music Systems titled, “Back to music practice: The evaluation of deep learning approaches to music transcription modelling and generation.” In this post, I summarise our upcoming article, and add some new material.

There are many approaches to evaluating “music generation systems”, but the first step is to understand that music is a human-centred activity steeped in rules and conventions that change with function, use, time and place. We try to stay consistent in our description of our folk-rnn system as a music transcription generation system. It is merely generating something that one can use to make music happen. Such a sentiment is echoed in Simon Colton’s 2016 NIPS critique of style transfer (here is a description). One cannot mistake the artifact for the art. Music is a process centred upon humans operating within culture(s).

This means evaluating “music generation systems” in any meaningful way must take into account human behaviour and modern culture. No doubt this can be seen as inconvenient because it doesn’t immediately lead to computer-based experiments, cross-validation and the like, which produce a lot of numbers with little effort that can be averaged and compared in objective ways. The phenomenon of music is not so poor that it can be described in such a reduced way. So, when it comes to evaluating “music generation systems”, human or artificial, we need to use a set of methods that are richer and more relevant than comparing training loss curves, log probabilities of sequences, average test errors, and so on.

The second step to evaluating “music generation systems” is to be clear about what “evaluation” or “benchmarking” means. What is the question one is trying to ask through an evaluation? What is the latent quality one is trying to gauge or compare? How do we know a significant result in the lab is a practical result in the real world? As Kiri Wagstaff so nicely argues in her provocative 2012 keynote to ICML, doing machine learning that matters requires taking these tools back to practitioners to measure their real world impacts and failings.

Along these lines, we have thought of a variety of approaches to evaluate our folk-rnn system, and compare it with other generative approaches, each motivated by a different question and application.

  1. First-order sanity check: How do basic descriptive statistics compare between the training material and generated material? This is not about music at this level, but seeing the ways two datasets are similar or different.
  2. Music analysis: How well does a generated transcription work as a composition? How well does it exhibit structure? How coherent is it? How well do its functional elements work, such as melodic ideas, tension and resolution, repetition and variation, etc.? What should be changed, and how, to improve the piece?
  3. Performance analysis: How well does a generated transcription play? Are there awkward transitions, or unconventional figures? Something may “sound right” but not “play right”.
  4. Listening analysis: How plausible does the generated material sound when played? Listening to synthesized recordings is one thing, but hearing it played is another.
  5. Nefarious testing: How does the system behave when we push it outside its “comfort zone”? How fragile/general is its “music knowledge”? Our folk-rnn system seems able to count in order to correctly place measure lines, but this ability evaporates with minor modifications of an initialisation seed.
  6. Assisted composition: How well does the system contribute to the music composition “pipeline”, both within the conventions of the training data and outside them? Is it useful for composition? In what ways does it frustrate composition?
  7. Cherry picking: How hard is it to find something of interest, something good, something really good, in a bunch of material generated by a system? Is one in every 20 generated transcriptions good?
  8. Inspiration and stimulation: How does the generated material inspire and stimulate a composer/musician to create music?
  9. Paradigm shifting: To what extent does the output, or thought of the machine generating such material, provoke anxiety in a practitioner? To what extent does it challenge assumptions of creativity and the music making process?
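As a rough illustration of the first-order sanity check (item 1 above), here is a minimal Python sketch that compares basic descriptive statistics of two sets of transcriptions. It is not the folk-rnn code. The function names `token_stats` and `total_variation`, the whitespace tokenisation, and the toy transcriptions are all assumptions made for illustration; a real comparison would use the actual training and generated material and whatever tokenisation the system uses.

```python
from collections import Counter
from statistics import mean


def token_stats(transcriptions):
    """Return (mean length in tokens, relative token frequencies).

    Assumes each transcription is a string of whitespace-separated tokens.
    """
    token_lists = [t.split() for t in transcriptions]
    counts = Counter(tok for toks in token_lists for tok in toks)
    total = sum(counts.values())
    freqs = {tok: n / total for tok, n in counts.items()}
    return mean(len(toks) for toks in token_lists), freqs


def total_variation(p, q):
    """Total variation distance between two token distributions.

    0 means the distributions are identical, 1 means they share no tokens.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy stand-ins for the training and generated material (hypothetical data).
training = ["M:4/4 K:Cmaj C D E F | G A B c |", "M:6/8 K:Dmaj D E F | A B c |"]
generated = ["M:4/4 K:Cmaj C E G c | c G E C |"]

train_len, train_freqs = token_stats(training)
gen_len, gen_freqs = token_stats(generated)

print("mean length (training): ", train_len)
print("mean length (generated):", gen_len)
print("token distribution distance:", total_variation(train_freqs, gen_freqs))
```

Numbers like these say nothing about music. They only show in which ways two datasets are similar or different, which is all this first step is meant to do.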

The organisation of our coming workshop and concert provides excellent opportunities for evaluation by bringing our system back to the practitioners. In a very real sense, the workshop and concert are just what’s visible to the public. Each event is a vehicle for working together with experts and professional music makers to evaluate dimensions of folk-rnn and other generative systems (we have been testing the magenta basic-rnn model recently, as well as simple Markov chains). In the end, we are not going to have numbers and statistics, but qualitative observations that we will somehow have to translate into engineering decision making!
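For readers wondering what a “simple Markov chain” baseline might look like, here is a minimal first-order sketch in Python. It is an illustration only, not the code we use. The names `train_markov` and `generate`, the `<s>` and `</s>` boundary markers, and the whitespace tokenisation are all assumptions.

```python
import random
from collections import defaultdict


def train_markov(sequences):
    """Collect, for each token, the list of tokens observed to follow it."""
    transitions = defaultdict(list)
    for seq in sequences:
        # Hypothetical start/end markers so the chain knows where tunes begin and end.
        tokens = ["<s>"] + seq.split() + ["</s>"]
        for current, following in zip(tokens, tokens[1:]):
            transitions[current].append(following)
    return transitions


def generate(transitions, rng, max_len=64):
    """Sample tokens until the end marker is drawn or max_len tokens are produced."""
    output, current = [], "<s>"
    while len(output) < max_len:
        current = rng.choice(transitions[current])
        if current == "</s>":
            break
        output.append(current)
    return " ".join(output)


# Toy training material (hypothetical data).
tunes = ["C D E F | G A B c |", "C E G c | c G E C |"]
model = train_markov(tunes)
print(generate(model, random.Random(0)))
```

Such a model only ever looks one token back, so it makes a useful point of comparison for systems that claim to have learned longer-range structure.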