CFP: 7th International Workshop on Musical Metacreation


((( MUME 2019 )))
The 7th International Workshop on Musical Metacreation
June 17-18, 2019, Charlotte, North Carolina

MUME 2019 is to be held at the University of North Carolina Charlotte in conjunction with the 10th International Conference on Computational Creativity, ICCC 2019.

=== Important Dates ===
Workshop submission deadline: February 24, 2019
Notification date: April 28, 2019
Camera-ready version: May 19, 2019
Workshop dates: June 17-18, 2019

Metacreation applies tools and techniques from artificial intelligence, artificial life, and machine learning, themselves often inspired by cognitive and natural science, to creative tasks. Musical Metacreation studies the design and use of these generative tools and theories for music making: discovery and exploration of novel musical styles and content, collaboration between human performers and creative software “partners”, and design of systems in gaming and entertainment that dynamically generate or modify music.

MUME intends to bring together artists, practitioners, and researchers interested in developing systems that autonomously (or interactively) recognize, learn, represent, compose, generate, complete, accompany, or interpret music. As such, we welcome contributions to the theory or practice of generative music systems and their applications in new media, digital art, and entertainment at large.

We encourage paper and demo submissions on MUME-related topics, including the following:
— Models, Representation and Algorithms for MUME
—- Novel representations of musical information
—- Advances or applications of AI, machine learning, and statistical techniques for generative music
—- Advances of A-Life, evolutionary computing or agent and multi-agent based systems for generative music
—- Computational models of human musical creativity
— Systems and Applications of MUME
—- Systems for autonomous or interactive music composition
—- Systems for automatic generation of expressive musical interpretation
—- Systems for learning or modeling music style and structure
—- Systems for intelligently remixing or recombining musical material
—- Online musical systems (i.e. systems with a real-time element)
—- Adaptive and generative music in video games
—- Generative systems in sound synthesis, or automatic synthesizer design
—- Techniques and systems for supporting human musical creativity
—- Emerging musical styles and approaches to music production and performance involving the use of AI systems
—- Applications of musical metacreation for digital entertainment: sound design, soundtracks, interactive art, etc.
— Evaluation of MUME
—- Methodologies for qualitative or quantitative evaluation of MUME systems
—- Studies reporting on the evaluation of MUME
—- Socio-economical Impact of MUME
—- Philosophical implication of MUME
—- Authorship and legal implications of MUME

Submission Format and Requirements
Please make submissions via the EasyChair system at:

The workshop is a day-and-a-half event that includes:
-Presentations of FULL TECHNICAL PAPERS (8 pages maximum)
-Presentations of POSITION PAPERS and WORK-IN-PROGRESS PAPERS (5 pages maximum)
-Presentations of DEMONSTRATIONS (3 pages maximum) which present outputs of systems (working live or offline).

All papers should be submitted as complete works. Demo systems should be tested and working by the time of submission, rather than be speculative. We encourage audio and video material to accompany and illustrate the papers (especially for demos). We ask that authors arrange their own web hosting of audio and video files, and give URL links to all such files within the text of the submitted paper.

Submissions do not have to be anonymized, as we use single-blind reviewing. Each submission will be reviewed by at least three program committee members.

Workshop papers will be published as MUME 2019 Proceedings and will be archived with an ISBN number. Please use the updated MUME paper template to format your paper. Also, please feel free to edit the licence entry (at the bottom left of the first page of the new template). We created a new MUME 2019 template based on the AAAI template. The MUME 2019 LaTeX and Word templates are available at:

Submissions should be uploaded using the MUME 2019 EasyChair portal:

For complete details on attendance, submissions and formatting, please visit the workshop website:

Presentation and Multimedia Equipment:
We will provide a video projection system as well as a stereo audio system for use by presenters at the venue. Additional equipment required for presentations and demonstrations should be supplied by the presenters. Contact the Workshop Chair to discuss any special equipment and setup needs/concerns.

It is expected that at least one author of each accepted submission will attend the workshop to present their contribution. We also welcome those who would like to attend the workshop without presenting. Workshop registration will be available through the ICCC 2019 conference system.

MUME 2019 builds on the enthusiastic response and participation we received for the past occurrences of the MUME series:
MUME 2012 (held in conjunction with AIIDE 2012 at Stanford):
MUME 2013 (held in conjunction with AIIDE 2013 at Northeastern):
MUME 2014 (held in conjunction with AIIDE 2014 at North Carolina):
MUME 2016 (held in conjunction with ICCC 2016 at Université Pierre et Marie Curie):
MUME 2017 (held in conjunction with ICCC 2017 at Georgia Institute of Technology):
MUME 2018 (held in conjunction with ICCC 2018 at Salamanca University):

Questions & Requests
Please direct any inquiries/suggestions/special requests to one of the Workshop Chairs, Bob Keller or Bob Sturm.

Workshop Organizers

Program Co-Chair
Robert M. Keller, Professor
Computer Science Department
Harvey Mudd College
301 Platt Blvd
Claremont, CA 91711 USA

Program Co-Chair
Bob L. Sturm, Associate Professor
Tal, Musik och Hörsel (Speech, Music and Hearing)
Lindstedtsvägen 24
School of Electronic Engineering and Computer Science
Royal Institute of Technology KTH, Sweden

Concert Chair
Gus Xia, Assistant Professor
Computer Science
NYU Shanghai

Publicity Chair
Dr. Oliver Bown
Senior Lecturer
Faculty of Art & Design, The University of New South Wales
Room AG12, Cnr Oxford St & Greens Rd,
Paddington, NSW, 2021, Australia



MUME Steering Committee

Andrew Brown, Griffith University, Australia
Michael Casey, Dartmouth College, US
Arne Eigenfeldt, Simon Fraser University, Canada
Anna Jordanous, University of Kent, UK
Bob Keller, Harvey Mudd College, US
Róisín Loughran, University College Dublin, Ireland
Philippe Pasquier, Simon Fraser University, Canada
Benjamin Smith, Purdue University Indianapolis, USA


Machine Folk is Spreading!

Here’s a wonderful performance of three machine folk tunes played by The Continental Ceili Combo (Tijn Berends and Bas Nieraeth). This mini-concert happened at the IMPAKT “Robots Love Music” exhibition at the wonderful Museum Speelklok in Utrecht, Oct. 3 2018.

The first two tunes they play are “The Hills Of Dorrectance” and “The Temples Of The Warvon”, both by folk-rnn v1 (and also titled by the model). The third tune they play is #3809 by folk-rnn v3.

Thanks to Luba Elliott and the Impakt Festival for making this event happen!

Horses in Umeå!

I’m happy to be speaking at the 6th Swedish Workshop on Data Science at Umeå University, Nov. 20-21 2018.

Title: Be a responsible data scientist: Identify and tame your “horses”

Abstract: A “horse” is a system that is not actually addressing the problem it appears to be solving. The inspiration for the metaphor is the real-life example of Clever Hans, a horse that appeared to have great skill in mathematics but had actually learned to respond to a prosaic cue confounded with the correct answer. Similarly, a model created through the statistical treatment of a large dataset and wielded by a data scientist can also appear successful at solving a complex problem, but actually not be. In this talk, I take a critical look at past applications of data science, exemplifying contemporary practices, and identify where issues arise that affect the validity of conclusions. I argue that the onus is on the data scientist to not stop at describing how well a model performs on a given dataset (no matter how big it may be), but to go further and explain what they and their models are actually doing. I provide some examples of how researchers have identified and tamed “horses” in my research domain, music informatics.
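
To make the metaphor concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the two features (`signal` and `cue`), the way the dataset is built, and the one-feature "model". It is not drawn from any of the studies discussed in the talk. The model picks whichever feature best predicts the labels in the collected data, and we then look at what happens when the confound is broken.

```python
# Toy illustration of a "horse": a classifier that scores well on its
# dataset by exploiting a confound rather than the intended signal.
# All data, names and numbers here are hypothetical.

def make_dataset(confounded):
    """Return a list of ((signal, cue), label) items.

    `signal` is the feature we want the model to use; it matches the
    label 90% of the time. `cue` is an irrelevant artefact. When
    `confounded` is True the cue happens to equal the label.
    """
    data = []
    for i in range(100):
        label = i % 2
        signal = label if i % 10 else 1 - label
        cue = label if confounded else (i // 2) % 2
        data.append(((signal, cue), label))
    return data


def accuracy(feature_index, data):
    """Fraction of items whose chosen feature equals the label."""
    hits = sum(1 for x, y in data if x[feature_index] == y)
    return hits / len(data)


def fit_one_feature_model(data):
    """'Train' by choosing the single feature that best predicts the label."""
    return max(range(2), key=lambda k: accuracy(k, data))


collected = make_dataset(confounded=True)
intervened = make_dataset(confounded=False)
chosen = fit_one_feature_model(collected)

print("chosen feature:", ["signal", "cue"][chosen])
print("accuracy on collected data:", accuracy(chosen, collected))
print("accuracy after breaking the confound:", accuracy(chosen, intervened))
print("accuracy of the intended feature:", accuracy(0, intervened))
```

On the collected data the chosen feature is `cue`, with perfect accuracy. Once the cue is decoupled from the label, its accuracy drops to chance, while `signal` would still be right 90% of the time. Reporting only the first number would hide the horse.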

Po’D (Paper of the day): Explaining explanations edition

Hello, and welcome to my Po’D (Paper of the day): Explaining explanations edition. Today’s paper contributes a survey of explainable AI (XAI): L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, L. Kagal, “Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning”, eprint arXiv:1806.00069, May 2018. My interest in this area stems from: 1) horses; 2) ethics; and 3) the development of new methods of research and development in my field.

My one line precis of this paper: To address mistrust of AI, XAI should focus on interpretability and explainability.

Explainable AI (XAI) is exactly what it says: building systems that “explain” their learned behaviors, or at least are “transparent” enough to “interpret”, and thereby address a lack of public trust in AI. A major impediment here, as in much of engineering, is one of definition: what is meant by “transparent”, “to interpret”, and “to explain”? And to whom: an experienced engineer or a lay person? In a sense, we know these things when we see them in the context of some use case; but there is a need for standardising these criteria and systematising their evaluation in developing XAI.

Gilpin et al. argue that there is a need for interpretation and explanation. For Gilpin et al., the difference between the two terms seems to be in who is acting. “Interpretation” involves someone/thing trying to link the behavior of a system with its input (e.g., activation maximisation in a deep image content recognition system). “Explanation” involves the system giving reasons for its behavior (e.g., a medical diagnostics system presenting a diagnosis and highlighting reasons for it).

Gilpin et al. do not explicitly define “an explanation”, but they propose two qualities of one: its “interpretability” and its “completeness”. The first is measured along lines of human comprehension. The second is measured along lines of inferring future behaviours of the system. An example they provide is an explanation that consists of all parameters in a deep neural network. It is complete because one can infer how the system will behave under any other input, but it is not interpretable because one cannot quickly comprehend how the behavior arises from the parameters. Gilpin et al. suggest that the challenge of XAI is to create explanations that are both interpretable and complete (which they see as a tradeoff), and which increase the public trust in that system.

Gilpin et al. identify three different approaches to XAI in research using deep neural networks (which they propose as a taxonomy of XAI): 1) explaining the processing of data by the network (e.g., LIME, salience mapping); 2) explaining the representation of data in the network (e.g., looking at the filters in each layer, testing layers in other tasks); and 3) making the system explainable from the start (e.g., disentangled representations, generated explanations). They also review some past work on interpretability and XAI.
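
As a rough illustration of the first category, here is a minimal sketch of occlusion-style salience, one simple relative of salience mapping. The `score` function and its weights are hypothetical stand-ins for a trained network, not anything taken from Gilpin et al. Each input feature is replaced in turn by a baseline value, and the resulting change in the output is recorded.

```python
# Occlusion-style salience for a toy scorer.
# The weights and the input are hypothetical.

def score(x, weights=(0.5, -2.0, 0.0, 1.5)):
    """Stand-in for a trained model: a fixed linear scorer."""
    return sum(w * v for w, v in zip(weights, x))


def occlusion_salience(model, x, baseline=0.0):
    """Salience of feature i is the change in the model output when
    feature i is replaced by the baseline value."""
    full = model(x)
    salience = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        salience.append(full - model(occluded))
    return salience


x = [1.0, 1.0, 1.0, 1.0]
print(occlusion_salience(score, x))  # [0.5, -2.0, 0.0, 1.5]
```

For a linear scorer the salience simply recovers weight times input, so the third feature is seen to play no role. For a deep network the same procedure only describes behaviour around one input, which is part of why such explanations are not "complete" in the sense above.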

Gilpin et al. conclude with a look at types of evaluation within each of the three approaches to XAI they identify. These include completeness for data processing; detecting biases for data representation; and human grading of produced explanations.

There are lots of good references in this article, some of which I would like to read now:

  1. B. Herman, “The promise and peril of human evaluation for model interpretability,” arXiv preprint arXiv:1711.07414, 2017.
  2. Q.-s. Zhang and S.-C. Zhu, “Visual interpretability for deep learning: a survey,” Frontiers of Information Technology & Electronic Engineering 19(1):27–39, 2018.
  3. A. S. Ross, M. C. Hughes, and F. Doshi-Velez, “Right for the right reasons: Training differentiable models by constraining their explanations,” arXiv preprint arXiv:1703.03717, 2017.
  4. F. Doshi-Velez and B. Kim, “Towards a rigorous science of interpretable machine learning,” arXiv preprint arXiv:1702.08608, 2017.
  5. A. Abdul, J. Vermeulen, D. Wang, B. Y. Lim, and M. Kankanhalli, “Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda,” in Proc. CHI Conf. on Human Factors in Computing Systems, 2018.
  6. R. Guidotti, A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti, “A survey of methods for explaining black box models,” arXiv preprint arXiv:1802.01933, 2018.

I have only three criticisms of this article. First, its title could better reflect its contents, e.g., “A survey of XAI”. It doesn’t propose an approach to evaluating interpretability of machine learning. (Or maybe its “approach” is via the taxonomy?) Second, the suitcase words “explanation” and “interpretability” remain just as vague by the end, and are even accompanied by another suitcase word, “completeness”. Definitions of these things are difficult to pin down. It’s nice to see Gilpin et al. go to some philosophical works to help with the discussion, but by the end I’m not sure we have anything more solid than, “I know it when I see it.” Finally, the discussion of the evaluation could be made more concrete by appealing to specific use cases of AI. I think notions of “explainability” make the most sense when put in a context of use, whether it’s by a machine learning researcher or a bank loan officer working with a system. Anyhow, this article was well worth the time I spent reading it.

=== Update: Nov. 1 2018 (after reading group discussion) ===

In their review of past work explaining deep network representation, Gilpin et al. suggest transfer learning as a way of understanding what layers are doing. I don’t agree with this. This is like answering the question, “Why did my barber do a good job cutting my hair? Look, he also cooks good food.” Inferring what a layer must be doing by looking at its performance on a different task is just guessing.

No attention is paid to the harms of interpretability. For instance, spammers can adjust their strategy by adapting content to subvert explainable spam detection systems.

Going to use the Nottingham Music Database?

The “Nottingham Music Database” (NMD) has been appearing more and more in applied machine learning research and teaching over the past few years. It’s been used in tutorials on machine learning, and even educational books on deep learning projects. It’s been fun to generate music with computers for a very long time.

The music generation start-up company Jukedeck put some effort into cleaning an ABC-converted version of the database, offering it on GitHub. Most recently, the NMD appears in this submission to ICLR 2019: HAPPIER: Hierarchical Polyphonic Music Generative RNN. Seeing how that paper uses the NMD, and the conclusions it draws from the music generated by the models it creates, I am motivated to look more closely at the NMD, and to propose some guidelines for using it in machine learning research.

Here is the source page of the “Nottingham Folk Music Database” by Eric Foxley, which “contains about 1200 folk melodies, mostly British & American. They mostly come from the repertoire over the years of Fred Folks Ceilidh Band, and are intended as music for dancing.” It is a very personal collection, as Foxley describes: “Most tunes have been collected over a lifetime of playing (which started when I sat in at the back of many bands in the London area and elsewhere from the age of 12 onwards), and the sources from whom I learnt the tunes are acknowledged. These are all collected “by ear”, and details change over time. The arrangements, harmonies, simplifications are entirely mine. Where there is a known printed source, that is included. I apologise for any unknowing omissions of sources, and would be happy to add them.” Based on the date of Foxley’s website, this collection seems to have been assembled before 2001.

Foxley provides a description of the contents here:

  • “Jigs. This directory contains about 350 6/8 single (mostly “crochet-quaver” per half bar) and double jigs (mostly quavers).
  • Reels. 2/4 and 4/4. This includes about 460 marches, polkas, rants etc.
  • Hornpipes. These are played (but not written) dotted. We include about 70 hornpipes, schottisches and strathspeys. See the “Playing for Dancing” document for the distinction.
  • Waltzes. About 50 tunes with 3/4 time signature.
  • Slip jigs. These are jigs in 9/8 time.
  • Miscellaneous. This directory contains just a few tunes, which we play mainly for listening to, when dancers need a breather.
  • Morris. Just a sample few, about 30. They include some chosen for listening to, and some from the Foresters Morris Men’s repertoire.
  • Some Christmas ones (15).
  • About 45 tunes from the Ashover collection, provided by Mick Peat.”

Not listed there, but included in The Tunes, are tunes taken from Playford’s 1651 book, The Dancing Master.

Foxley provides a note on the distribution of the database: “We are happy for others to use tunes from our repertoire; after all, the tunes [we] use were picked up from others, and the traditional tunes are best! We just hope that you play them properly and carefully, not as streams of notes but as phrased music making folks want to dance.”

Foxley also provides a warning: “The melodies as stored are my interpretation of the essence of the tune. Obviously no respectable folk musician actually plays anything remotely like what is written; it is the ornamentation and variation that gives the tune its lilt and style.”

Foxley appears to have assembled his collection for a few different purposes: 1) as repertoire for his group’s own practice of playing for dances and other events (see this page of tunes for specific weddings); and 2) as material for researching music analysis and search and retrieval by computers.

The NMD is thus a personal collection of an English folk music enthusiast and computer scientist with decades of experience in playing and dancing to this kind of music. Much of the collection is focused on dance music (jigs, reels, hornpipes, waltzes, slip jigs, Morris, Playford’s), but some of it is specialised (Miscellaneous, Christmas). A small portion of the collection comes from another person (Mick Peat). While it is an extensive collection for a single person, it is not extensive for a tradition (compare to the Morris music collection at The Morris Ring). What Foxley says should be emphasised: the NMD is his collection of rough transcriptions of tunes that should never be performed as written, but when performed well should make “folks want to dance”.

Here are the first three guidelines for using the NMD:

1. Do not believe that when you train a model on sequences from the NMD, your model is learning about music. Your trained model may show a good fit to held-out sequences in the NMD. Do not believe that this means it has learned about the music represented by the NMD. Your model is learning about sequences in the NMD. Those sequences are not music, but impoverished, coarse and arbitrary representations of what one experiences when this particular kind of music is performed. Also, the music represented in the NMD is not “polyphonic”. Each sequence in the NMD provides a sketch of the melody (which all melody instruments play), and harmonic accompaniment (which is not always present).

2. If you are working with a generative model, your trained model may produce sequences that appear to you like the sequences in NMD. Do not convert those sequences to MIDI and then listen to an artificial performance of them to judge their success. Do not submit those synthetic examples together with synthetic examples of tunes from NMD to a listening test and ask people to rate how pleasant each is. Do not assume that someone with a high degree of musical training knows about the kind of music represented in the NMD.

3. Find an expert in the kind of music represented in the NMD and work with them to determine the success of your model. That means you should submit sequences generated by your model trained on the NMD to these experts so that they can evaluate them according to performability and danceability.

Let’s have a look at a real example from NMD. I choose one at random among those I have experience playing. Here’s Foxley’s transcription of “Princess Royal” from what he says is the Abingdon Morris tradition:

title = "\f3Princess Royal\fP";
ctitle = "AABCBCB";
rtitle = "\f2Abingdon\fP";
timesig = 4 4;
key = g;
autobeam = 2;
bars = 33.

d^<'A' c^< |
b"G" a"D" g"G" d^< c^< |
b"G" a"D" g"G" g^ |
e^."C" d^< c^ e^ |
d^."G" c^< b d^ |
c^ "Am" b "g" a "f+" g "e" |
f<"D7" g< a< "c+" f< d "b" d^< "a" c^< |
b<"G" a< b< g< a"D7" f | g>"G" g :| \endstave.

e^.'B'"C" e^< e^ d^ | e^"C" f^"d" g^>"e" |
g^"C/e" f^"d" e^"c" d^"b" |
b<"G/d" a< g< b< a >"D7" |
g"G" g a."D7" a< |
b<"G" a< g g^. f^< | g^"G" d^ e^>"C" |
d^"G" b c^>"C" | \endstave.
\5,8 |! \continue.

d^ 'C' c^ |
b>"G" a>"D" |
g>"Em" d^"D7" c^ |
\-2 |
g>"Em" g^> | \endstave.
e^>."C" d^ |
c^>"C" e^> |
d^>."G" c^ |
\timesig = 2 4. b."G" d^< |
\timesig = 4 4. c^"Am" b "g" a "f+" g "e"|
\6,8 |! \endstave.


Here’s the ABC conversion from the Sourceforge NMD:

X: 20
T:Princess Royal
% Nottingham Music Database
d/2c/2|"G"B"D"A "G"Gd/2c/2|"G"B"D"A "G"Gg|"C"e3/2d/2 ce|"G"d3/2c/2 Bd|
"Am"c"g"B "f#"A"e"G|"D7"F/2G/2"c#"A/2F/2 "b"D"a"d/2c/2|\
"G"B/2A/2B/2G/2 "D7"AF|"G"G2 G:|
"C"e3/2e/2 ed|"C"e"d"f "e"g2|"C/e"g"d"f "c"e"b"d|"G/d"B/2A/2G/2B/2 "D7"A2|\
"G"GG "D7"A3/2A/2|"G"B/2A/2G g3/2f/2|
"G"gd "C"e2|"G"dB "C"c2|"Am"c"g"B "f#"A"e"G|\
"D7"F/2G/2"c#"A/2F/2 "b"D"a"d/2c/2|"G"B/2A/2B/2G/2 "D7"AF|"G"G2 G||
dc |"G"B2 "D"A2|"Em"G2 "D7"dc|"G"B2 "D"A2|"Em"G2 g2|"C"e3d|"C"c2 e2|"G"d3c|
"Am"c"g"B "f#"A"e"G|"D7"F/2G/2"c#"A/2F/2 "b"D"a"d/2c/2|\
"G"B/2A/2B/2G/2 "D7"AF|"G"G2 G||

Here’s the ABC from the Jukedeck NMD cleaned collection:

X: 20
T:Princess Royal
% Nottingham Music Database
d/2c/2|"G"B"D"A "G"Gd/2c/2|"G"B"D"A "G"Gg|"C"e3/2d/2 ce|"G"d3/2c/2 Bd|
"Am"cB AG|"D7"F/2G/2A/2F/2 Dd/2c/2|\
"G"B/2A/2B/2G/2 "D7"AF|"G"G2 G:|
"C"e3/2e/2 ed|"C"ef g2|"C/e"gf ed|"G/d"B/2A/2G/2B/2 "D7"A2|\
"G"GG "D7"A3/2A/2|"G"B/2A/2G g3/2f/2|
"G"gd "C"e2|"G"dB "C"c2|"Am"cB AG|\
"D7"F/2G/2A/2F/2 Dd/2c/2|"G"B/2A/2B/2G/2 "D7"AF|"G"G4||
zz dc |"G"B2 "D"A2|"Em"G2 "D7"dc|"G"B2 "D"A2|"Em"G2 g2|"C"e3d|"C"c2 e2|"G"d3c|
"Am"cB AG|"D7"F/2G/2A/2F/2 Dd/2c/2|\
"G"B/2A/2B/2G/2 "D7"AF|"G"G4||

There’s something unusual in the Jukedeck processing. First, there is an F section that does not appear in the others, but just acts to balance the 3-beat bar before. Second, many of the bass notes (specified by a lower case letter) have been stripped out. Anyhow, by and large Foxley’s version and the Sourceforge NMD appear the same.
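
Two of the differences between the Sourceforge and Jukedeck excerpts above can be checked mechanically. The following is only a sketch: it assumes the default note length is the same in both files, handles only the ABC constructs that appear in this tune, and the function names are my own. It strips the lower-case bass annotations from two Sourceforge bars and compares the result with the Jukedeck bars, then computes bar durations around the join between the B and C parts.

```python
import re
from fractions import Fraction

# Two bars of the A part, copied from the two ABC versions above.
SOURCEFORGE_BARS = '"Am"c"g"B "f#"A"e"G|"D7"F/2G/2"c#"A/2F/2 "b"D"a"d/2c/2|'
JUKEDECK_BARS = '"Am"cB AG|"D7"F/2G/2A/2F/2 Dd/2c/2|'

# The last bar before the C part, the pickup, and the first bar of C.
SOURCEFORGE_SEAM = '"G"G2 G||dc |"G"B2 "D"A2|'
JUKEDECK_SEAM = '"G"G4||zz dc |"G"B2 "D"A2|'

# Optional accidentals, a note or rest letter, octave marks, then a length.
NOTE = re.compile(r"[\^_=]*[A-Ga-gz][,']*(\d*)(/?)(\d*)")


def strip_bass_annotations(abc):
    """Drop quoted annotations that start with a lower-case letter (the
    bass notes), keeping chord symbols such as "Am" and "G/d"."""
    def keep_or_drop(match):
        return '' if match.group(1)[:1].islower() else match.group(0)
    return re.sub(r'"([^"]*)"', keep_or_drop, abc)


def bar_durations(abc):
    """Duration of each bar, in units of the default note length."""
    body = re.sub(r'"[^"]*"', '', abc).replace('\\', '').replace(':', '')
    durations = []
    for bar in body.split('|'):
        if not bar.strip():
            continue
        total = Fraction(0)
        for num, slash, den in NOTE.findall(bar):
            n = int(num) if num else 1
            d = int(den) if den else (2 if slash else 1)
            total += Fraction(n, d)
        durations.append(total)
    return durations


print(strip_bass_annotations(SOURCEFORGE_BARS) == JUKEDECK_BARS)  # True
print([str(d) for d in bar_durations(SOURCEFORGE_SEAM)])  # ['3', '2', '4']
print([str(d) for d in bar_durations(JUKEDECK_SEAM)])     # ['4', '4', '4']
```

In these excerpts, the Jukedeck processing has removed the bass annotations and padded both the final bar of the part and the pickup to the C part to full four-beat bars, where the Sourceforge version has a three-beat bar followed by a two-beat pickup.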

Let’s get a feeling for how this sequence becomes music, and how that functions together with a dancer. Below is the staff notation of the Abingdon version of Princess Royal (Foxley’s PDF resulting from his transcription) along with a video of a performance.

[Figure: staff notation of the Abingdon version of Princess Royal, from Foxley’s PDF]

There are several important things to notice here. 1) The written and performed melodies deviate in many places, just as Foxley says they should; 2) The accompanying harmony here is sometimes not what is notated; 3) The musician closely follows the dancer, allowing enough time for them to complete the steps (hops and such).

When it comes to the notated version of the sequence, look at how the parts are structured and how they relate to one another. In the A part, bars 5-8 relate to bars 1-4. Patterns in bars 3 and 4 mimic those in bars 2 and 3. The B part contrasts with A, but its conclusion echoes that of A. The first seven bars of part C are the first four bars of part A with doubled note lengths; and its last four bars are the last four bars of part A. There’s a lot of structure there! And these kinds of structures and patterns exist throughout the sequences in the NMD.
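
Some of these relationships can be checked directly on the ABC. Here is a minimal sketch, assuming the A and C parts as they appear in the Sourceforge ABC above, and using a small parser of my own that only handles the constructs in this tune. It looks for bars shared between the two parts, and tests whether the opening of C is the opening of A with doubled note lengths (counted in notes, pickup included).

```python
import re
from fractions import Fraction

# Parts A and C of "Princess Royal", copied from the Sourceforge ABC above.
PART_A = ('d/2c/2|"G"B"D"A "G"Gd/2c/2|"G"B"D"A "G"Gg|"C"e3/2d/2 ce|'
          '"G"d3/2c/2 Bd|"Am"c"g"B "f#"A"e"G|'
          '"D7"F/2G/2"c#"A/2F/2 "b"D"a"d/2c/2|'
          '"G"B/2A/2B/2G/2 "D7"AF|"G"G2 G:|')
PART_C = ('dc |"G"B2 "D"A2|"Em"G2 "D7"dc|"G"B2 "D"A2|"Em"G2 g2|"C"e3d|'
          '"C"c2 e2|"G"d3c|"Am"c"g"B "f#"A"e"G|'
          '"D7"F/2G/2"c#"A/2F/2 "b"D"a"d/2c/2|'
          '"G"B/2A/2B/2G/2 "D7"AF|"G"G2 G||')

NOTE = re.compile(r"([\^_=]*[A-Ga-gz][,']*)(\d*)(/?)(\d*)")


def bars(abc):
    """Split an ABC body into bars of (pitch, duration) pairs, ignoring
    chord annotations, repeat marks and spacing."""
    body = re.sub(r'"[^"]*"', '', abc).replace(':', '')
    parsed = []
    for bar in body.split('|'):
        notes = []
        for pitch, num, slash, den in NOTE.findall(bar):
            n = int(num) if num else 1
            d = int(den) if den else (2 if slash else 1)
            notes.append((pitch, Fraction(n, d)))
        if notes:
            parsed.append(notes)
    return parsed


def shared_bars(part_x, part_y):
    """Index pairs (i, j) where bar i of part_x equals bar j of part_y."""
    bx, by = bars(part_x), bars(part_y)
    return [(i, j) for i, x in enumerate(bx)
            for j, y in enumerate(by) if x == y]


def is_augmentation(part_x, part_y, n_notes):
    """True if the first n_notes of part_y are the first n_notes of
    part_x with every duration doubled."""
    fx = [note for bar in bars(part_x) for note in bar][:n_notes]
    fy = [note for bar in bars(part_y) for note in bar][:n_notes]
    return [(p, 2 * d) for p, d in fx] == fy


print(shared_bars(PART_A, PART_C))  # [(5, 8), (6, 9), (7, 10), (8, 11)]
print(is_augmentation(PART_A, PART_C, 17))  # True
print(is_augmentation(PART_A, PART_C, 18))  # False
```

Bar index 0 is the pickup in both parts. The four shared bars are the endings of A and C, and the doubling holds for the first 17 notes, which is where C rejoins the ending of A. A check like this is no substitute for an expert's judgement, but it shows what kind of regularity to look for.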

Here are some more guidelines.

4. Check whether the sequences generated by your model trained on the NMD exhibit the same kinds of structures and patterns as the sequences in the NMD. Are there similar kinds of repetitions and variations? How do the sections relate to one another? If you don’t see any of these kinds of things, your model is not working. If you don’t know what to look for, see guideline 3.

5. Do not train your sequence model on a MIDI conversion of the NMD. They are not the same. (The MIDI file created by Jukedeck from the tune above also has the wrong structure: AAAABCBCB instead of AABCBCB. Other MIDI files there are sure to have similar problems.) Training on MIDI conversions of the NMD will also add a lot more complexity to your model, and make training less effective. The ABC notation makes sequences that are quite terse, so why not take advantage of that?
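
One rough way to see the terseness is to count symbols. The sketch below makes several simplifying assumptions that are mine: it looks at the melody of the A part only, counts each note and each barline as one ABC symbol, and assumes a MIDI rendering stores one note-on and one note-off event per note and unrolls the repeat. Real tokenisations and real MIDI files will differ in the details.

```python
import re

# Part A of "Princess Royal", copied from the Sourceforge ABC above.
PART_A = ('d/2c/2|"G"B"D"A "G"Gd/2c/2|"G"B"D"A "G"Gg|"C"e3/2d/2 ce|'
          '"G"d3/2c/2 Bd|"Am"c"g"B "f#"A"e"G|'
          '"D7"F/2G/2"c#"A/2F/2 "b"D"a"d/2c/2|'
          '"G"B/2A/2B/2G/2 "D7"AF|"G"G2 G:|')

NOTE = re.compile(r"[\^_=]*[A-Ga-gz][,']*\d*/?\d*")
BARLINE = re.compile(r":?\|+:?")


def melody(abc):
    """Remove chord and bass annotations, leaving notes and barlines."""
    return re.sub(r'"[^"]*"', '', abc)


def abc_symbol_count(abc):
    """Number of notes plus barlines as written (repeats not unrolled)."""
    body = melody(abc)
    return len(NOTE.findall(body)) + len(BARLINE.findall(body))


def midi_note_event_count(abc, times_played):
    """Note-on plus note-off events when the part is played
    `times_played` times; rests produce no events."""
    sounding = [n for n in NOTE.findall(melody(abc)) if not n.startswith('z')]
    return 2 * len(sounding) * times_played


print(abc_symbol_count(PART_A))          # 47
print(midi_note_event_count(PART_A, 2))  # 152
```

Under these assumptions the written A part takes 47 symbols, with the repeat costing a single `:|`, while the unrolled note events alone come to 152, before any timing or tempo information is added.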

Now let’s have a look at one of the examples generated by the HAPPIER model:

[Figure: an example generated by the HAPPIER model]

The very first event shows something is very wrong. Overall, the chord progression makes no sense, the melody is very strange, and the two do not relate. There is none of the repetition and variation we would expect given the NMD. None of the four examples presented in the HAPPIER paper look anything like music from the NMD. There is some stepwise motion, so the HAPPIER model has that going for it; but it is clearly not working as claimed.

The HAPPIER paper claims the new model “generates polyphonic music with long-term dependencies compared to the state-of-the-art methods.” The paper says the HAPPIER models “perform better for melody track generation than the LSTM Baseline in the prediction setting” because their negative log likelihoods on sequences from the NMD are lower. The paper also claims that the HAPPIER model “performs better in listening tests compared to the state-of-the-art methods”, and that “the generated samples from HAPPIER can be hardly distinguished from samples from the Nottingham dataset.” None of these claims is supported by the evidence.
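
To see why a likelihood figure alone says little about musical structure, consider a deliberately extreme sketch. It has nothing to do with the HAPPIER model or its baselines: it assumes a hypothetical add-one smoothed unigram model over ABC characters, and made-up training and held-out strings. Such a model assigns exactly the same mean negative log likelihood to a held-out sequence and to the same symbols in reverse order.

```python
import math
from collections import Counter


def fit_unigram(tokens, vocab):
    """Add-one smoothed symbol probabilities."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {s: (counts[s] + 1) / total for s in vocab}


def mean_nll(tokens, model):
    """Average negative log likelihood per symbol, in nats."""
    return -math.fsum(math.log(model[t]) for t in tokens) / len(tokens)


# Hypothetical training and held-out symbol sequences.
train = list('cBAG|FGAFDdc|BABGAF|G2G|')
held_out = list('BAGdc|BAGg|')
vocab = sorted(set(train) | set(held_out))
model = fit_unigram(train, vocab)

scrambled = held_out[::-1]  # same symbols, no musical sense

print(mean_nll(held_out, model))
print(mean_nll(scrambled, model))
```

Stronger models are of course sensitive to order, but the number still summarises an average per-symbol fit to the sequences. Whether the outputs have the repetition, variation and danceability of the music is a separate question that needs separate evidence.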

That brings up the final guideline.

6. If you are going to train a model on the NMD, or on this kind of melody-focused music, compare your results with folk-rnn. The code is freely available, it’s easy to train, and it works exceptionally well on this kind of music (when it is represented compactly, and not as MIDI). I have yet to see any model produce results that are better than folk-rnn in the context of this kind of music.