PhD position open!

Via Emmanuel Vincent:

We are offering a fully funded PhD position on deep learning for musical structure analysis and generation. Please note the tight application deadline (July 22). We will review applications on a continuous basis until then.

TITLE: Deep learning for musical structure analysis and generation
LAB: Inria Rennes & Inria Nancy, France
SUPERVISORS: Frédéric Bimbot & Emmanuel Vincent
STARTING DATE: October 2016 or later (until January 2017)
TO APPLY: send a CV, a motivation letter, a list of publications, and one or more recommendation letters to as soon as possible and no later than July 22, 2016

Inria is the biggest European public research institute dedicated to computer science. The PANAMA team and the MULTISPEECH team each gather 20+ scientists with a focus on machine learning and signal processing for music, speech, and general audio.

Despite numerous studies on automatic music transcription and composition, the temporal structure of music pieces at various time scales remains difficult to model. Automatic music improvisation systems such as OMax [1] and ImproteK [2] assume that the structure is either predetermined (chord chart) or completely free, which limits their use to specific musical styles. The concepts of semiotic structure [3] and System & Contrast [4] that we recently introduced helped define musical structure in a more general way. Yet they do not easily translate into a computational model, because of the large temporal horizon required and the semantic gap with the observed musical signal or score. In the last few years, deep learning [5] has emerged as the new state of the art in natural language processing (NLP), and it has already demonstrated its potential for modeling short-term musical structure [6, 7].

The goal of this PhD is to exploit and adapt deep recurrent neural networks (RNNs) for modeling medium- and long-term musical structure. This involves the following tasks in particular:
– designing new RNN architectures for jointly modeling music at several time scales: tatum, beat, bar, structural block (e.g., chorus or verse), whole piece,
– training them on smaller amounts of data than are typically available in NLP,
– evaluating their performance for musical structure estimation and automatic music improvisation.

This position is part of a funded project with Ircam, in which the successful candidate will have the opportunity to take part.

REQUIREMENTS:
– MSc in computer science, machine learning, or a related field,
– programming experience in Python or C/C++,
– previous experience with music and deep learning is not required but would be an asset.

[3] F. Bimbot, G. Sargent, E. Deruty, C. Guichaoua, E. Vincent, «Semiotic description of music structure: an introduction to the Quaero/Metiss structural annotations», in Proc. AES 53rd International Conference on Semantic Audio, 2014.
[4] F. Bimbot, E. Deruty, G. Sargent, E. Vincent, «System & Contrast: A polymorphous model of the inner organization of structural segments within music pieces», Music Perception, 2016.
[5] L. Deng, D. Yu, Deep Learning: Methods and Applications, Now Publishers, 2014.
[6] N. Boulanger-Lewandowski, Y. Bengio, P. Vincent, «Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription», in Proc. International Conference on Machine Learning (ICML), 2012.
[7] I.-T. Liu, B. Ramakrishnan, «Bach in 2014: Music composition with recurrent neural network», arXiv:1412.3191, 2014.

