Submit to this Special Issue on Data Science: Machine Learning for Audio Signal Processing

IEEE Journal of Selected Topics in Signal Processing (J-STSP)

Special Issue on Data Science: Machine Learning for Audio Signal Processing

Important Dates:

  • Submission deadline: October 1, 2018
  • 1st review completed: December 1, 2018
  • Revised manuscript due: February 1, 2019
  • 2nd review completed: March 1, 2019
  • Final Manuscript due: April 1, 2019
  • Publication: May 2019

Slides from 2018 ICML Workshop: Machine Learning for Music

At the 2018 Joint Workshop on Machine Learning for Music at ICML, I delivered my talk “How Stuff Works: LSTM Model of Folk Music Transcriptions.” Here are my slides: Sturm_ICML2018

While I don’t yet have a complete picture of how this folk-rnn model works, my talk illuminated a bit more about the workings of its first LSTM layer. In particular, we see that the gates of this layer (before the nonlinearities) map the input into different subspaces depending on the type of token (made possible by choosing the number of LSTM units in this layer to be approximately four times the number of vocabulary elements). We also find some neat emergent behaviours at the first layer, such as its similar treatment of octave pitches and enharmonics. And we see that each gate uses information fused from the other three gates via the layer’s hidden state from the previous time step. The next challenge is figuring out how to talk about these things once the nonlinearities (each a one-to-one mapping) are taken into account. Then we can move on to interpreting the second and third LSTM layers, and finally link this with our understanding of the softmax layer, described in my paper “What do these 5,599,881 parameters mean? An analysis of a specific LSTM music transcription model, starting with the 70,281 parameters of the softmax layer”, presented at MuMe 2018. (The slides for that talk are here: Sturm_MuMe2018.)
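To make the gate arithmetic concrete, here is a minimal numpy sketch of a single LSTM step. The sizes are illustrative only (a 128-token vocabulary and 512 units, the roughly four-to-one ratio mentioned above), and the weights are random; this is not the actual folk-rnn v2 code or its trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 512 units is four times a 128-token vocabulary,
# mirroring (not reproducing) the ratio discussed in the talk.
vocab_size, hidden_size = 128, 512

# One-hot token input, and hidden/cell state from the previous time step
x = np.zeros(vocab_size); x[5] = 1.0
h_prev = rng.standard_normal(hidden_size) * 0.1
c_prev = rng.standard_normal(hidden_size) * 0.1

# Each of the four gates (input, forget, output, cell candidate) has its
# own input and recurrent weights, so every gate sees both the current
# token and h_prev, which itself fused all four gates at the last step.
W = {g: rng.standard_normal((hidden_size, vocab_size)) * 0.01 for g in "ifoc"}
U = {g: rng.standard_normal((hidden_size, hidden_size)) * 0.01 for g in "ifoc"}
b = {g: np.zeros(hidden_size) for g in "ifoc"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pre-activations (before the nonlinearities): for a one-hot x, W[g] @ x
# just selects the column of W[g] for that token, which is one way to see
# how each token type gets mapped into its own region of each gate.
pre = {g: W[g] @ x + U[g] @ h_prev + b[g] for g in "ifoc"}

i = sigmoid(pre["i"])        # input gate
f = sigmoid(pre["f"])        # forget gate
o = sigmoid(pre["o"])        # output gate
g = np.tanh(pre["c"])        # cell candidate

c = f * c_prev + i * g       # new cell state
h = o * np.tanh(c)           # new hidden state
```

The one-to-one nonlinearities (sigmoid and tanh) are exactly the maps the paragraph above says still need to be folded into the subspace story.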

In general, this workshop was excellent! There was a nice selection of talks and posters addressing problems with music data in the symbolic domain, the acoustic domain, or both tied together. The results of the deep singing-voice synthesis work of Gómez et al. and Cheng-Wei et al. are very impressive, and Gómez also took time to highlight ethical issues surrounding such work, as well as the pursuit of research in general. The generated piano music samples of Huang et al. (symbolic) are simply amazing. Have a listen here. The happy hour concluding the workshop was as intellectually stimulating as the rest of the day! Thanks to Pandora for facilitating the event, and big kudos to the organisers, especially Erik Schmidt and José Iñesta.


“Dialogues with folk-rnn” at NIME 2018

Luca Turchet will be performing his composition, “Dialogues with folk-rnn”, at the June 5 concert of NIME 2018. A version of this work was premiered at the Nov. 20, 2017 concert in London. He constructed the work from 19 transcriptions generated by folk-rnn, all available in the “folk-rnn v2 Session Book, volume 1 (of 10)”.

Related: the folk-rnn composition competition is currently running! Spread the word. Start generating and composing!

Softmax Polka, a folk-rnn (v2) original

The Tune of the Month for May is the Tip Top Polka, so I decided to learn the tune and see what folk-rnn v2 would generate when primed with its first four bars (transposed to minor, to contrast with the happy polka). After many trials (folk-rnn v2 doesn’t seem to like duple meter), I finally got something worth learning (using “thesession_with_repeats”, seed 687677, temperature 1, and prime tokens M:2/4 K:Cmin c /2 d /2 |: e e e f | c c c A /2 B /2 | c c c e | B 3):

B/2c/2|:dd de |BB BG/2A/2 |BB Bd |
A3c |dB de |fB BA |GA G>F |
G4 :||:d2 df|df d/2e/2d/2c/2 |B3c |
f2 ef/2e/2 |d2 df |df df |ec Ac |

[Image: staff notation of the generated tune]

Here I play it together with the Tip Top Polka on the Mean Green Machine Folk Machine.
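For readers curious about the seed and temperature parameters mentioned above: generation follows the standard pattern of pushing the prime tokens through the network and then repeatedly sampling the next token from a temperature-scaled softmax over the vocabulary. Here is a minimal sketch of that sampling step (an illustration with assumed names and a toy vocabulary, not the actual folk-rnn code):

```python
import numpy as np

def sample_next(logits, temperature=1.0, rng=None):
    """Sample one token index from temperature-scaled logits.

    Temperature 1 samples from the model's own distribution; lower
    values sharpen it towards the argmax, higher values flatten it.
    """
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                 # subtract max for numerical stability
    p = np.exp(z)
    p /= p.sum()                 # softmax probabilities
    return int(rng.choice(len(p), p=p))

# Toy 5-token vocabulary with one dominant logit; seeding the generator
# makes the draws repeatable, as seed 687677 did in the trial above.
rng = np.random.default_rng(687677)
logits = [2.0, 0.1, 0.1, 0.1, 0.1]
tokens = [sample_next(logits, temperature=1.0, rng=rng) for _ in range(10)]
```

In an actual generation loop the logits would come from the model’s softmax layer at each time step, conditioned first on the prime tokens (M:2/4, K:Cmin, and so on), with sampling stopping at the end-of-tune token.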

folk-rnn composition competition!

Spread the news far and wide!

The aim of this competition is to explore an application of machine learning to music — in particular the online tool at

This model is an example of artificial intelligence trained on traditional tunes, mainly Irish and English. The web interface allows users to generate new melodies using a few parameters (there is a video on the website explaining how it works). We are seeking works that make creative use of this tool to compose new pieces, which do not need to adhere to the idiom of the training material.

Submissions will be judged on their musical quality and their utilisation of outputs from the tool. The winning piece will be performed by a professional ensemble at a public concert in London, UK, in early October 2018. Professional recordings of the performance will be provided to composers and, with the permission of musicians and composers, made available on the project YouTube channel:

We welcome submissions from any composer without restriction of age or nationality. Attendance at the concert is not mandatory. There is no cost for submitting a work.

Rules for submitted works:

1. scored for any combination of the following instruments: flute, clarinet, violin, cello, and piano (only one of each); no use of amplification or electronic instruments is allowed;

2. no longer than 10 minutes in duration;

3. must be derived in some way from material generated by the application;

4. must be accompanied by a written explanation of how the work arises from the use of artificial intelligence through the website (composers may also accompany the text with illustrations, e.g. staff notation);

5. no restrictions on style, or on the way outputs from the tool are used.

Important dates:
– August 31 2018: Submission of PDF score and required accompanying material by email to
– September 15 2018: Notification
– September 25 2018: Performance materials due
– October 9 2018: Concert (London UK)

For more information about the technology, see the following:

If you have questions or comments, contact Dr. Oded Ben-Tal:

Recording of the April machine folk session


As part of an outreach day at my university, QMUL, I organised a group of musicians to play 7 sets of machine folk music. We played 14 tunes (5 of which are real traditional tunes). Here’s the set list:

  1. March to the mainframe (X:488, folk-rnn v2) with The Glas Herry Comment (folk-rnn v1)
  2. The Mal’s Copporim (folk-rnn v1) with Off to California (traditional)
  3. X:1166 (folk-rnn v2) with Rochdale Coconut Dance (traditional)
  4. Oats and Beans (traditional) with Optoly Louden (folk-rnn v1)
  5. Why are you and your 5,599,881 parameters so hard to understand? (folk-rnn v2) with Why are you still singing even when reduced to a 30-dimensional subspace? (folk-rnn v2 with dimension reduction of softmax layer parameters)
  6. The Portobello Hornpipe (traditional) with The 2714 Hornpipe (folk-rnn v2)
  7. Two Burner Brew No. 1 (folk-rnn v2) with The Hairpin Bend (traditional)

The musicians: Bob Sturm (button accordion), Sandy Rogers (fiddle), Luca Turchet (mandolin), Emmanouil Benetos (piano accordion), Michael Mcloughlin (tin whistle), Dan Stowell (melodica and bones), and Cornelia Metzig (guitar).

Other sounds courtesy of East London Ambulance service and the QMUL clock tower (donging at 16h).