“Ej Självklar”

# SweDS19: Second call for presentations, posters and sponsors

http://www.kth.se/sweds19

The Swedish Workshop on Data Science (SweDS) is a national event aiming to maintain and develop data science research and its application in Sweden by fostering the exchange of ideas and promoting collaboration within and across disciplines. SweDS brings together researchers and practitioners working in a variety of academic, commercial or other sectors, and in the past has included presentations from a variety of domains, e.g., computer science, linguistics, economics, archaeology, environmental science, education, journalism, medicine, healthcare, biology, sociology, psychology, history, physics, chemistry, geography, forestry, design, and music.

SweDS19 is organised by the School of Electrical Engineering and Computer Science, KTH.

October 15–16, KTH, Stockholm, Sweden

# Some observations from my week at the 2019 Joe Mooney Summer School

I arrived at the 2019 Joe Mooney Summer School knowing how to play about 70 tunes, but I left a week later knowing how to play three. That’s a good thing.

This was my first “music camp” – at 43 years old! I didn’t know what to expect, other than lots of music. I signed up for the courses in button accordion, with my D/G box – quite a strange tuning in Ireland but nonetheless not entirely incompatible with the music (more on that below).

The concert to open the week featured the group “Buttons & Bows”. Among the players is the superstar accordion player Jackie Daly. I later learned that Daly’s playing style is quite different from that of the accordion tutors, who all seemed to be students of Joe Burke, who was greatly influenced by the playing of Paddy O’Brien. At a few points in the concert Daly made comments to the effect that polkas aren’t given the respect they deserve. Then he would play a set of polkas. He related one funny story about a friend of his slagging polkas. So Daly wrote a polka and named it after his friend. I will be revisiting the way I play the Ballydesmond polkas, and will model them on Daly’s style. He will also soon be publishing a book collecting his compositions from his many years of playing.

On the first day of classes I found myself in a small room of about 60 accordion students. The average age was surely below 15. The youngest was probably 6 or 7. I was one of 10 adults, at least four of whom had traveled from outside Ireland (including Australia, Canada, England and Sweden). We each had to play a tune individually to be assigned to one of the five tutors – including two All Ireland Champions! When my turn came I started to play “Pigeon on the Gate”, but I wasn’t far into it before I was assigned to level 3, Nuala Hehir. Some students played a scale, a jig or a polka for their grading, but the tutors asked if they knew any reels. Reels are the most technically demanding to play.

There were 15 students in my class, including 6 adults. We each had to play a tune solo again for the tutor to hear. I played a bit of “Drowsy Maggie.” It wasn’t long before the tutor recognised several non-traditional characteristics – which is entirely to be expected since I haven’t had proper lessons in Irish accordion. More on this below.

In the six days of the course, we learned to play four tunes: two reels and two jigs. The first reel we learned is called “Crossing the Shannon” (called “The Funny Reel” here: https://youtu.be/FXqlUCOZBcc?t=50). The tutor played the entire tune for us to give us an idea of what it sounds like. Then she wrote up a textual ABC-like notation on the white board:

The circles denote crotchets. The ticks on a letter denote an octave above the middle. Numbers denote fingering for B/C accordions. And the slur underneath two pitches denotes sliding a finger across two buttons on B/C accordions.

The course proceeded with the tutor playing a few bars at a time with the ornamentation, and then the students playing along several times. In this tune, the important ornaments are cuts and a roll. Every second D’ can be rolled: D’-E’-D’-C#’-D’. In this case the roll happens in the duration of a crotchet. The E’ is a cut on the D’. A cut should be a nearly imperceptible blip. It doesn’t have any tonal value, but subtly changes the attack of a note. Cuts are often used by accordion, fiddle, flute and whistle players when a note repeats. My D/G box can play a D roll only with a change of bellows direction to catch the E and the C#. A roll has to be smooth, so all pitches of the roll have to be played with the same bellows direction. Since we could not find any alternative, I must live with just cutting the D’ with an F#’.
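The bellows constraint can be checked mechanically. The sketch below assumes the push/pull directions of a common two-row D/G layout, where pushing gives the notes of each row's major triad. The `DIRECTIONS` table and the `shared_directions` helper are illustrative only: they ignore octaves and may not match every instrument.

```python
# Assumed bellows directions for four pitch classes on a two-row D/G box.
# This is a simplification for illustration, not a full button chart.
DIRECTIONS = {
    "D": {"push"},           # push on both rows
    "E": {"pull"},           # pull on both rows
    "C#": {"pull"},          # pull on the D row only
    "F#": {"push", "pull"},  # push on the D row, pull on the G row
}


def shared_directions(notes):
    """Return the bellows directions in which every given note is available."""
    common = {"push", "pull"}
    for note in notes:
        common &= DIRECTIONS[note]
    return common


roll = ["D", "E", "D", "C#", "D"]  # the D roll taught in class
cut = ["F#", "D"]                  # the substitute: cutting D with F#

print(shared_directions(roll))  # empty: the roll needs a bellows change
print(shared_directions(cut))   # one shared direction: the cut stays smooth
```

Under these assumptions the roll has no single bellows direction that covers all of its notes, while the F# cut does.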

The tutor had each student individually reproduce bars of the tune and coached them into improving it. Then she continued through these steps until we had a whole part. In the first day we made it through the first part of the tune, and recorded the tutor playing the second part at a slow speed so we can individually work on it for the next day.

On the second day we work-shopped the first part of “Crossing the Shannon” and moved on to learning the second part in the same way. Learning the second part wasn’t too hard because it mostly repeats material we already learned in the first part. By the end of the first half we had our first tune! The tutor had each student individually play the entire tune with repeats and helped them improve rolls and cuts, etc. She encouraged the students to not read the notation on the board.

In the second part of the day, the tutor gave us a single reel, “Glentown Reel”:

In this tune we have cuts, rolls, and triplets – all of which are possible on my accordion. The lines over the B’s remind the B/C student to play the outside row B. Some of the cuts are also made explicit. Learning this tune took most of day three. Before the end of the session, the tutor gave us a part of a jig (the second ending would be given the following day):

The tutor didn’t remember what it was called, but remembered she learned it from a particular teacher. She had us play a part of the first section, and then played the entire tune solo so we could record it and learn by ourselves at home. With the help of a friend I learned that the jig is similar to one called “The Road to Granard”.

On day four we went through both “Crossing the Shannon” and “Glentown Reel”, and finished learning the jig with the two endings (not pictured in the notation above). This jig has no rolls, but does involve cuts and triplets. Also, the tutor varied the use of triplets and showed how not every note needs to be ornamented in the same way. She also showed how a tune can be played beautifully without ornamentation.

On day five we went through all our tunes. Then the tutor asked whether any of us had another jig we wanted to work on. I suggested “Scatter the Mud”, but it wasn’t until I played it that she recognized it. Apparently, the version I played was not what she had learned. She confirmed with another tutor that the version she plays is closer to the right one, but she would have to do some research to make sure:

The sources of tunes are very important. The way a tune goes is not to be found on the internet, but in historical sources, like O’Neill’s collections, or the way particular masters play it and have recorded it. She cautioned us to consider the sources of our tunes carefully.

The class on day six consisted of playing through all our tunes again, with some individual work, and then meeting with all the accordion students to play one or two tunes we learned. All five groups learned different tunes, none of which I had ever heard. Tutors deliberately choose rare tunes so that everyone can experience learning them fresh.

The week was also filled with many sessions happening around the high street of Drumshanbo, starting early in the day and ending very late at night. In any one of the four pubs, there could be four sessions going on. The high street also featured many children playing music together, some dancing, with hats out for money. It was great to see such enthusiasm from these young kids, many of whom are playing very well! I attended sessions every night for the first four nights, and played in three, but by my third class I realised that although I can play many tunes at speed without too many mistakes and can lead sets, I’m not playing tunes in the “proper” traditional way.

Early on my tutor recognized some of my untraditional characteristics. One is my use of “Sharon Shannon” rolls, which are like triplets on the same note without any cuts. Another characteristic is my use of bass. B/C accordions have a much more limited bass side than my accordion, so the things I was doing didn’t sound right to her. Another characteristic I have is a general lack of rolls, cuts, and proper triplets. These ornaments, along with the rhythms, are what bring these tunes to life and give them a dynamism. A bad habit I have developed is playing staccato. This means that when I play the accordion it doesn’t sound like an accordion. Now, in some contexts that could be called masterful, but this is not one of those contexts. So I decided that I would benefit more from going over the tunes and ornaments I was learning at slow speeds than from repeatedly playing all my tunes at speed in non-traditional ways.

I look forward to next year when I can audition with “Pigeon on the Gate” played in a traditional style!

# Making sense of the folk-rnn v2 model, part 10

This is part 10 of my loose and varied analyses of the folk-rnn v2 model, which have included parts 1, 2, 3, 4, 5, 6, 7, 8 and 9. In the last part we looked at the similarities of the activations inside folkrnn v2 as it generated a particular transcription. Today we are looking at how our observations change for a different generated transcription.

Here’s a strange transcription generated by folkrnn v2:

The second and third parts have a counting error, which can be fixed easily enough. The Scotch snap in bar 9 is unexpected. I like the raised leading tone in the last bar. Otherwise this is a pretty boring tune.

Here’s the Gramian of the matrix of one-hot encoded input/outputs:

We see a lot of repetitions, forwards and backwards, which come from the stepwise motion up and down the minor scale.

Here’s the Gramian of the softmax output vectors:

We again see that a large number of pairs are far from each other. Of the 12,720 unique pairs of different vectors, 5,745 have a distance greater than 1.98. Second, the probability distributions in some of the steps generating measure tokens are close to identical, which is different from before, where nearly all of them appeared quite similar. However, we see at each point when a measure token is produced that the distributions are very different from all others (the crisscrossing powder blue lines). Third, many of the structures seen in $\mathbf{X}^T \mathbf{X}$ are here as well, including some of the backward-slash diagonals. Fourth, we do not see the distributions produced during the first and penultimate bars of each part as overlapping much with other distributions. The distributions produced in the third bar seem to have the greatest dissimilarity to all others, which is curious because of its similarity to the first bar.
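As a quick sanity check on these counts: if every generation step is compared with every other, then $n$ steps give $n(n-1)/2$ unordered pairs. The sketch below inverts that formula for the pair counts quoted in this post and in part 9. The helper name `steps_from_pairs` is mine, and reading the counts this way assumes all steps were included in the comparison.

```python
import math


def steps_from_pairs(n_pairs):
    """Solve n * (n - 1) / 2 == n_pairs for a positive integer n."""
    n = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
    if n * (n - 1) // 2 != n_pairs:
        raise ValueError("not a valid count of unordered pairs")
    return n


# Pair counts and "far" counts (distance > 1.98) quoted in the two posts.
for label, n_pairs, n_far in [("part 10", 12720, 5745), ("part 9", 5671, 2642)]:
    n_steps = steps_from_pairs(n_pairs)
    print(label, n_steps, round(n_far / n_pairs, 3))
```

This gives 160 steps here and 107 steps for the transcription in part 9, with a little under half of all pairs being nearly disjoint in both cases.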

Looking at the Gramian of the normalised hidden-state activations of the three layers shows the same kinds of structures we saw before:

Again, the Gramian of the normalised layer-two hidden-state activations appears most similar to the Gramian of the one-hot encoded input. The diagonal lines in the Gramian of the layer-3 hidden state activations are not as strong as before. And there now appear several shorter diagonal lines between the stronger ones.

Here is an animation showing the Gramian from the out gate activations in each layer:

There’s still similarity with the Gramians of the hidden state activations. The grid patterns are interesting. From the first layer output activations they demarcate the three parts of the tune. In the third layer the grid shows the bars.

Here’re the Gramians of the cell gate activations with the hyperbolic tangent:

And here they are without the nonlinearity:

Again we see the cell gate activation of each layer saturates more and more in the same direction as the generation process runs. The extent of this saturation is least present in the first layer, and appears to exist in all of the second and third parts of the transcription in the second layer. The cell gate activations in the third layer are curiously calm.

Here are the Gramians of the in-gate activations of each layer (pausing at the last layer):

Not much going on here that we don’t see in the other gates. And here is the Gramian of the unit norm forget gate activations of all layers:

The three sections are clearly visible.

As before, comparing the activations between gates in each layer does not show any of these structures.

So it seems many of our observations hold!

# Making sense of the folk-rnn v2 model, part 9

This is part 9 of my loose and varied analyses of the folk-rnn v2 model, which have included parts 1, 2, 3, 4, 5, 6, 7, and 8. As a brief review, the folkrnn v2 model maps elements of the standard basis of $\mathbb{R}^{137}$ onto the positive surface of the unit L1-ball in $\mathbb{R}^{137}$ by a series of nonlinear transformations. Denote the input at step $t$ by ${\bf x}_t$. The first layer transforms this by the following algorithm:

${\bf i}_t^{(1)} \leftarrow \sigma({\bf W}_{xi}^{(1)}{\bf x}_t + {\bf W}_{hi}^{(1)}{\bf h}_{t-1}^{(1)} +{\bf b}_i^{(1)})$
${\bf f}_t^{(1)} \leftarrow \sigma({\bf W}_{xf}^{(1)}{\bf x}_t + {\bf W}_{hf}^{(1)}{\bf h}_{t-1}^{(1)} +{\bf b}_f^{(1)})$
${\bf c}_t^{(1)} \leftarrow {\bf f}_t^{(1)}\odot{\bf c}_{t-1}^{(1)} + {\bf i}_t^{(1)} \odot \tanh({\bf W}_{xc}^{(1)}{\bf x}_t + {\bf W}_{hc}^{(1)}{\bf h}_{t-1}^{(1)} +{\bf b}_c^{(1)})$
${\bf o}_t^{(1)} \leftarrow \sigma({\bf W}_{xo}^{(1)}{\bf x}_t + {\bf W}_{ho}^{(1)}{\bf h}_{t-1}^{(1)} +{\bf b}_o^{(1)})$
${\bf h}_t^{(1)} \leftarrow {\bf o}_t^{(1)}\odot \tanh({\bf c}_{t}^{(1)})$

The second layer does the same but with different parameters, and acting on the first-layer hidden state activation ${\bf h}_t^{(1)}$ to produce ${\bf h}_t^{(2)}$. The third layer does the same but with different parameters, and acting on ${\bf h}_t^{(2)}$ to produce ${\bf h}_t^{(3)}$. The in-gate activation of layer $n$ in step $t$ is ${\bf i}_t^{(n)}$. That of the out-gate is ${\bf o}_t^{(n)}$. That of the forget gate is ${\bf f}_t^{(n)}$. And that of the cell gate is ${\bf c}_t^{(n)}$. The final softmax layer maps ${\bf h}_t^{(3)}$ to a point on the positive surface of the L1 unit ball, which is also the probability mass distribution over the vocabulary $\mathcal{V}$:

${\bf p}_t \leftarrow \textrm{softmax}({\bf V}{\bf h}_t^{(3)} + {\bf v})$
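For readers who prefer code, here is a minimal NumPy sketch of one generation step through the three layers and the softmax, following the equations above term by term. The weights are random placeholders, not the trained folkrnn v2 parameters. The dictionary keys (`W_xi`, `b_i`, and so on) simply mirror the symbols, and I assume all three layers have 512 units.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a layer, written to match the five update equations."""
    i = sigmoid(p["W_xi"] @ x + p["W_hi"] @ h_prev + p["b_i"])
    f = sigmoid(p["W_xf"] @ x + p["W_hf"] @ h_prev + p["b_f"])
    c = f * c_prev + i * np.tanh(p["W_xc"] @ x + p["W_hc"] @ h_prev + p["b_c"])
    o = sigmoid(p["W_xo"] @ x + p["W_ho"] @ h_prev + p["b_o"])
    h = o * np.tanh(c)
    return h, c


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def random_layer(n_in, n_hid, rng):
    """Placeholder parameters for one layer (random, purely illustrative)."""
    p = {}
    for g in "ifco":
        p[f"W_x{g}"] = 0.1 * rng.standard_normal((n_hid, n_in))
        p[f"W_h{g}"] = 0.1 * rng.standard_normal((n_hid, n_hid))
        p[f"b_{g}"] = np.zeros(n_hid)
    return p


rng = np.random.default_rng(0)
VOCAB, HIDDEN = 137, 512
layers = [random_layer(VOCAB, HIDDEN, rng),
          random_layer(HIDDEN, HIDDEN, rng),
          random_layer(HIDDEN, HIDDEN, rng)]
V = 0.1 * rng.standard_normal((VOCAB, HIDDEN))
v = np.zeros(VOCAB)

x_t = np.zeros(VOCAB)
x_t[5] = 1.0  # a one-hot input; the token index is arbitrary here
states = [(np.zeros(HIDDEN), np.zeros(HIDDEN)) for _ in layers]

layer_input = x_t
for n, params in enumerate(layers):
    h, c = lstm_step(layer_input, *states[n], params)
    states[n] = (h, c)
    layer_input = h  # each layer feeds its hidden state to the next

p_t = softmax(V @ layer_input + v)
print(p_t.shape, p_t.sum())
```

The output `p_t` is strictly positive and sums to one, i.e., it lies on the positive surface of the unit L1-ball.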

In previous parts of this series, we analyzed the parameters, e.g., ${\bf W}_{xi}^{(1)}$. In this post we look at the activations in these layers during the generation of a transcription. Let’s consider this one from my MuMe 2018 paper and part 8 of this endless series:

Let’s first look at the one-hot encoded inputs. Denote $\mathbf{X} := [\mathbf{x}_1 \mathbf{x}_2 \ldots \mathbf{x}_t]$ as the matrix of concatenated one-hot encoded vectors. The following shows the Gramian matrix $\mathbf{X}^T \mathbf{X}$:

The axes are labeled with the tokens of the transcription. I draw a line at each measure token so we can more easily relate the structures we see in the picture to the bars in the notated transcription above. We clearly see a few different structures. Forward-slash diagonals show repetitions. We see material in bar 1 repeated in bars 2, 3, 5, 6, 9, and 13. The largest repetitions are in bars 5-6 and 13-14. We also see some short backward-slash diagonals. These are also repetitions but reversed, e.g., “A, 2 G,” and “G, 2 A,” in bars 1 and 2. We can see all of this from the notated transcription. How will the various nonlinear transformations performed within folkrnn v2 relate to each other for this generation?
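To make the structures in $\mathbf{X}^T \mathbf{X}$ concrete: for one-hot columns, entry $(i,j)$ of the Gramian is 1 exactly when steps $i$ and $j$ carry the same token, and 0 otherwise. Here is a toy sketch using only the reversed fragment quoted above. The token list and the tiny vocabulary are made up for illustration and are not the real 137-token vocabulary.

```python
import numpy as np

# Toy token sequence: "A, 2 G," followed by its reversal "G, 2 A,".
tokens = ["A,", "2", "G,", "|", "G,", "2", "A,", "|"]
vocab = sorted(set(tokens))  # illustrative vocabulary of 4 tokens

# One column per step, with a single 1 in the row of that step's token.
X = np.zeros((len(vocab), len(tokens)))
for step, tok in enumerate(tokens):
    X[vocab.index(tok), step] = 1.0

G = X.T @ X
print(G.astype(int))
```

The reversed repetition shows up as the run of ones at positions (0, 6), (1, 5) and (2, 4), where the row and column indices sum to a constant. A forward repetition would instead give ones at a constant offset from the main diagonal.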

Let’s first consider the softmax output of the last layer, which is also a probability distribution over the vocabulary. We take all these points output by folkrnn v2 and find their pairwise distances on the positive surface of the L1-unit ball. If two of these vectors have a distance of 2, then their supports do not overlap, or equivalently the two distributions place their mass on different tokens.

Pairs of points colored sanguine have a distance in [0,0.02], and those colored powder blue have a distance in [1.98, 2]. All others in [0.02,1.98] are colored in gray scale. A few things become clear. First, a large number of pairs are far from each other. Of the 5,671 unique pairs of different vectors, 2,642 have a distance greater than 1.98. Second, the probability distributions in nearly all steps generating measure tokens are close to identical. Third, the distributions co-occurring with the repetitions in the first part are highly similar, reflecting the structures in $\mathbf{X}^T \mathbf{X}$ seen above. We do not, however, see the short backward-slash diagonals. Fourth, the distributions produced during the first and penultimate bars of each part overlap with nearly all other distributions except for those producing measure tokens; and the distributions produced after the middle of each section (bars 5 and 13) seem to have the greatest dissimilarity to all others. (Does the latter arise from the model having learned how parts of a tune define other parts?)
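The distance computation and colour binning described above can be sketched as follows. The three-token distributions are invented for illustration, and `pairwise_l1` and `colour_bin` are hypothetical helper names, not code from the folk-rnn repository.

```python
import numpy as np


def pairwise_l1(P):
    """P holds one probability vector per column; return all pairwise L1 distances."""
    return np.abs(P[:, :, None] - P[:, None, :]).sum(axis=0)


def colour_bin(d):
    """Map a distance in [0, 2] to the colour scheme used in the figure."""
    if d <= 0.02:
        return "sanguine"
    if d >= 1.98:
        return "powder blue"
    return "gray"


# Four made-up distributions over a 3-token vocabulary (columns sum to 1).
P = np.array([[1.0, 0.0, 1.0, 0.5],
              [0.0, 1.0, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]])
D = pairwise_l1(P)

print(D[0, 1], colour_bin(D[0, 1]))  # disjoint supports
print(D[0, 2], colour_bin(D[0, 2]))  # identical distributions
print(D[0, 3], colour_bin(D[0, 3]))  # partial overlap
```

Columns 0 and 1 have disjoint supports and so sit at the maximum distance of 2, columns 0 and 2 are identical and sit at 0, and the partially overlapping pair falls in the gray range.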

Now let’s look at the similarities of activations within the network during the generation of the transcription above. First, let’s look at the hidden state activations in the third layer. We construct the matrix $\widehat{\mathbf{H}}^{(3)} := [{\bf h}_1^{(3)}/\|{\bf h}_1^{(3)}\|, {\bf h}_2^{(3)}/\|{\bf h}_2^{(3)}\|, \ldots ]$, and look at the magnitude of the Gramian $(\widehat{\mathbf{H}}^{(3)})^T \widehat{\mathbf{H}}^{(3)}$.

In this case sanguine shows activations that point in the same direction (ignoring polarity), powder blue shows activations that are orthogonal, and all others are colored gray scale, with darker shades showing smaller angles between pairs of activations. These patterns are intriguing! It seems that more or less as the model steps through the generation, the hidden state activations of the third layer point in similar directions relative to the bar lines. That is, the hidden state activations near to step $\tau$ after each barline point in similar directions, but different from those in the other steps after each barline.
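Here is a small sketch of how such a magnitude Gramian can be computed: normalise each column to unit length, take the Gramian, then take absolute values so that polarity is ignored. The activation matrix below is a made-up placeholder with one column per step, and `abs_gramian` is a name I chose for illustration.

```python
import numpy as np


def abs_gramian(H):
    """Magnitude of the Gramian of the unit-normalised columns of H."""
    H_hat = H / np.linalg.norm(H, axis=0, keepdims=True)
    return np.abs(H_hat.T @ H_hat)


# Placeholder activations: 4 steps of a 3-dimensional hidden state.
H = np.array([[1.0, 2.0, 0.0, -3.0],
              [0.0, 0.0, 5.0,  0.0],
              [0.0, 0.0, 0.0,  0.0]])
G = abs_gramian(H)
print(G)
```

Steps 0, 1 and 3 point along the same axis (step 3 with opposite polarity), so their entries are 1. Step 2 is orthogonal to all of them, so its off-diagonal entries are 0.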

This feature becomes less present in the hidden state activations of the shallower layers above. Here is the magnitude of $(\widehat{\mathbf{H}}^{(2)})^T \widehat{\mathbf{H}}^{(2)}$:

And here is that of the first layer $(\widehat{\mathbf{H}}^{(1)})^T \widehat{\mathbf{H}}^{(1)}$:

Let’s compare all of these to $\mathbf{X}^T \mathbf{X}$. The following animation cycles through each pair starting with the first layer (and pausing at the last layer).

First we see a shift of one pixel to the top right, which comes from the one-step delay between the input and output fed back into the network, i.e., each hidden state activation comes from processing the output generated in the previous step. Second, we see that the magnitude Gramian of the activations in the second layer bears the most resemblance to $\mathbf{X}^T \mathbf{X}$. I don’t know why. Could it be that the second layer decides what to put and the third layer decides where to put it?

When we look at the activations of the out gate in each layer, we see a high similarity with the hidden state activations of the same layer – expected given how the hidden state activation is a function of the output gate activations. As before, we assemble the matrix $\widehat{\mathbf{O}}^{(n)} := [{\bf o}_1^{(n)}/\|{\bf o}_1^{(n)}\|, {\bf o}_2^{(n)}/\|{\bf o}_2^{(n)}\|, \ldots ]$, and look at the Gramian $(\widehat{\mathbf{O}}^{(n)})^T \widehat{\mathbf{O}}^{(n)}$. Here is an animation cycling through the Gramian of these out-gate activations for each layer (pausing at the last layer):

Let’s have a look at the cell gate activations. First we look at the Gramian of $\tanh\widehat{\mathbf{C}}^{(n)} := [\tanh{\bf c}_1^{(n)}/\|\tanh{\bf c}_1^{(n)}\|, \tanh{\bf c}_2^{(n)}/\|\tanh{\bf c}_2^{(n)}\|, \ldots ]$. Here is an animation showing these from the first to the third layer (pausing at the last layer):

We can see faint echoes of the structures in the Gramian of the activations of the out gate and hidden state of each layer. Now let’s have a look at the Gramians without the hyperbolic tangent, i.e., of $\widehat{\mathbf{C}}^{(n)} := [{\bf c}_1^{(n)}/\|{\bf c}_1^{(n)}\|, {\bf c}_2^{(n)}/\|{\bf c}_2^{(n)}\|, \ldots ]$ (pausing at the last layer):

This shows that the cell gate activation of a layer saturates more and more in the same direction as the generation process runs. The extent of this saturation is least present in the first layer, and appears to exist in all of the second part of the transcription in the second and third layers.

Here are the Gramians of the in-gate activations of each layer (pausing at the last layer):

And that leaves the activations produced by the forget gate:

These activations appear to be nearly saturated across the entire generation, but we do see the same structures as in the Gramians of the in- and out-gate activations.

Now, in the work I presented at the 2018 ICML Workshop: Machine Learning for Music, I show how each one of the four gates in the first layer seems to store information about token types in different subspaces of $(0,1)^{512}$ (or $\mathbb{R}^{512}$ for the cell gate). Here’s the relevant slide:

Let’s look at the following matrix: $(\widehat{\mathbf{I}}^{(1)})^T \widehat{\mathbf{O}}^{(1)}$, that is, the set of inner products of all unit-norm activations of the first-layer in gate with those of the first-layer out gate. Here they are with out-gate activations along the vertical axis (pausing on the last layer):

These show none of the structures above. The activations of these two gates are thus pointing in ways that do not strongly relate over the steps. It is the same when we compare the forget-gate activations to those of both the in- and out-gate activations. These structures do not appear either in the comparison of the hidden state activations and in-, out- and forget-gate activations. So, taken with my above theoretical observation of the parameters in the gates of the first layer, it seems the same holds true of the deeper layers: each gate of a layer is projecting information in ways that are not directly related.
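For completeness, here is a sketch of how such a between-gate comparison can be computed. Unlike the Gramians above, the result is not symmetric and its diagonal need not be 1. The activations below are uniform random placeholders in $(0,1)^{512}$, not values recorded from folkrnn v2, and the helper names are mine.

```python
import numpy as np


def unit_columns(A):
    """Scale each column of A to unit Euclidean norm."""
    return A / np.linalg.norm(A, axis=0, keepdims=True)


def cross_gramian(I_act, O_act):
    """Entry (s, t): inner product of unit-norm in-gate step s and out-gate step t."""
    return unit_columns(I_act).T @ unit_columns(O_act)


rng = np.random.default_rng(0)
n_steps = 6  # arbitrary number of generation steps for the sketch
I_act = rng.uniform(size=(512, n_steps))  # placeholder in-gate activations
O_act = rng.uniform(size=(512, n_steps))  # placeholder out-gate activations

C = cross_gramian(I_act, O_act)
print(C.shape, float(C.min()), float(C.max()))
```

Because sigmoid activations are positive, every entry lies between 0 and 1. What matters in the plots is whether the entries vary over steps in any structured way.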

Now, how does all of the above change with a different transcription?

# Unintended Uses

When we created http://themachinefolksession.org in 2018, we intended it to be a venue for crowd-sourcing “Machine Folk”: music created by or with artificial intelligence. So far, over 700 tunes have been added, most of them by anonymous users of http://folkrnn.org. And over 60 recordings have also been added.

However, outside of my own use of the website, by far the biggest use has been strange automatic registrations like the below. There have been over 800 of these created since we opened the website. With a two-step registration process, none of them have become fully registered.

Are these automatic registrations performed by bots? Are these bots coming for the music? Or are they spammers trying to infiltrate the bustling world of machine folk enthusiasts (current population in the low single digits)?

# SweDS19: FIRST CALL FOR PRESENTATIONS AND POSTERS

http://www.kth.se/sweds19

## October 15–16, KTH, Stockholm, Sweden

The Swedish Workshop on Data Science (SweDS) is a national event aiming to maintain and develop data science research and its application in Sweden by fostering the exchange of ideas and promoting collaboration within and across disciplines. SweDS brings together researchers and practitioners working in a variety of academic, commercial or other sectors, and in the past has included presentations from a variety of domains, e.g., computer science, linguistics, economics, archaeology, environmental science, education, journalism, medicine, healthcare, biology, sociology, psychology, history, physics, chemistry, geography, forestry, design, and music.

SweDS19 is organised by the School of Electrical Engineering and Computer Science, KTH.

## Topics

SweDS focuses on theoretical and applied aspects of data science in many disciplines, e.g., Computer Science, Linguistics, Economics, Education, Medicine, Healthcare, Biology, Sociology, Psychology, Physics, Chemistry, Geography, Forestry, Art and Music, Design, etc. Topics include, but are not limited to:
• Text & Web Mining
• Classification, Clustering, and Regression
• Probabilistic & Statistical Methods
• Graphical Models
• Spatial & Temporal Mining
• Data Stream Mining
• Feature Extraction, Selection and Dimension Reduction
• Data Cleaning, Transformation & Preprocessing
• Multi-Task, Multi-label, and Multi-output Learning
• Big Data, Scalable & High-Performance Computing Techniques
• Mining Semi-Structured or Unstructured Data
• Data privacy

## Invited Speakers

• Professor Virginia Dignum (Department of Computing Science, Umeå University)
• Professor Michael Höhle (Department of Mathematics, Stockholm University)
• Professor Sven Ahlbäck (Kungliga Musikhögskolan, CEO of Doremir Music Research AB)
• Anders Arpteg (Principal Data Scientist, peltarion.com)
• Dr. Josephine Sullivan (Division of robotics, perception and learning, KTH)
• Dr. Margaret Schedel (Stony Brook University, NY, USA)

## Submission Guidelines

We invite academic researchers as well as industrial researchers and practitioners to present their work either as a contributed talk or in the poster/demo track. 1) Contributed talk (15-20 minutes incl. questions/discussion): submit an abstract (400-500 words) in one of the following categories: original research, new/relevant challenge, or status report of ongoing work; 2) poster/demo track: submit a short abstract (200-300 words), and mark “submit for poster” or “submit for demo” in the abstract. Abstracts will be screened and selected based on relevance and quality.

Submissions should be made here: https://easychair.org/conferences/?conf=sweds19

## Important Dates:

• SEPTEMBER 15 2019: Submission deadline
• OCTOBER 4 2019: Registration deadline (https://www.eventbrite.com/e/2019-swedish-workshop-on-data-science-tickets-63426090143)
• OCTOBER 15-16 2019: Workshop!

## Organizing committee

• Dr. Bob L. Sturm, workshop chair (Division of Speech, Music and Hearing, KTH)
• Ronald Cumbal Guerron, registration (Division of Speech, Music and Hearing)
• others TBD

## Program Committee

• Professor Hedvig Kjellström (Division of robotics, perception and learning, KTH)
• Professor Danica Kragic Jensfelt (Division of robotics, perception and learning, KTH)
• Professor Sten Ternström (Division of Speech, Music and Hearing, KTH)
• Dr. Hossein Azizpour (Division of robotics, perception and learning, KTH)
• Dr. André Holzapfel (Division of media technology and interaction design, KTH)
• others TBA