At the 2018 Joint Workshop on Machine Learning for Music at ICML, I delivered my talk “How Stuff Works: LSTM Model of Folk Music Transcriptions.” Here are my slides: Sturm_ICML2018
While I don’t yet have a complete picture of how this folk-rnn model works, my talk illuminated a bit more about its first LSTM layer. In particular, we see that the gates of this layer (before the nonlinearities) map the input into different subspaces depending on the type of token (made possible by choosing the number of LSTM units in this layer to be approximately four times the number of vocabulary elements). We also find some neat emergent behaviours at the first layer, such as its similar treatment of octave pitches and enharmonics. In addition, each gate uses information fused from the other three gates in the layer’s hidden state (from the previous time step). The next challenge is figuring out how to talk about these things once the nonlinearities (each a one-to-one mapping) are taken into account. Then we can move on to interpreting the second and third LSTM layers. And finally we can link this with our understanding of the softmax layer, described in my paper, “What do these 5,599,881 parameters mean? An analysis of a specific LSTM music transcription model, starting with the 70,281 parameters of the softmax layer” presented at MuMe 2018. (The slides for that talk are here: Sturm_MuMe2018.)
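To make "the gates before nonlinearities" concrete, here is a minimal NumPy sketch of the pre-activations of a generic LSTM layer fed a one-hot token. This is not the folk-rnn code: the toy sizes `V` and `H`, the random weights, the helper name `lstm_gate_preactivations`, and the gate ordering (input, forget, cell, output) are all assumptions made for illustration.

```python
import numpy as np

def lstm_gate_preactivations(x, h_prev, W, U, b):
    """Return the four gate activations before any nonlinearity.

    W (4H x V) acts on the input, U (4H x H) acts on the previous
    hidden state, and b (4H) is the bias. The stacking order of the
    gates is an assumption; frameworks differ on this.
    """
    z = W @ x + U @ h_prev + b
    H = h_prev.shape[0]
    names = ("input", "forget", "cell", "output")
    return {name: z[k * H:(k + 1) * H] for k, name in enumerate(names)}

# Toy, hypothetical sizes: H is four times V, echoing the ratio in the text.
rng = np.random.default_rng(0)
V, H = 5, 20
W = rng.standard_normal((4 * H, V))
U = rng.standard_normal((4 * H, H))
b = rng.standard_normal(4 * H)

token = 3                      # index of the current token (hypothetical)
x = np.zeros(V)
x[token] = 1.0                 # one-hot encoding of the token
h_prev = np.zeros(H)           # hidden state from the previous time step

gates = lstm_gate_preactivations(x, h_prev, W, U, b)

# With a one-hot input and a zero hidden state, each gate's pre-activation
# is just that token's column of the gate's input weights plus its bias.
print(np.allclose(gates["input"], W[:H, token] + b[:H]))
```

Because the input is one-hot, the input term simply selects one column of each gate's weight matrix, so comparing those columns across tokens is one way to look at where different token types land. When `h_prev` is nonzero, the `U @ h_prev` term mixes the previous hidden state into every gate.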
In general, this workshop was excellent! There was a nice selection of talks and posters addressing problems with music data in the symbolic domain, the acoustic domain, or both tied together. The results of the deep singing voice synthesis work of Gómez et al. and Cheng-Wei et al. are very impressive. Plus Gómez took time to highlight ethical issues surrounding such work, as well as the pursuit of research in general. Also, the generated piano music samples of Huang et al. (symbolic) are simply amazing. Have a listen here. The happy hour concluding the workshop was as intellectually stimulating as the rest of the day! Thanks to Pandora for facilitating this event, and big kudos to the organisers, especially Erik Schmidt and José Iñesta, for running the event.