At left is The Black Box. It has been perfect! My design of the keyboard has worked out so well. The left hand side, however, is a bit limited. I really miss the F/F# that is on The Mean Green Machine Folk Machine (now sold to a new loving family). So, enter the box at right (so far unnamed). The treble side has two more buttons, but the bass side has four. It’s now in the shop being converted to my new design! Details coming soon…

# An analysis of the 365 double jigs in O’Neill’s, pt. 10

This is part 10 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. In the last part, I revise and tune the procedure by which I extract time-pitch series from the collection, and then analyze several examples. The part before that reviews where I have been.

While reading O’Neill’s “Irish minstrels and musicians: with numerous dissertations on related subjects” (1918), I found the following quoted from “A History of Music in England” by English composer Ernest Walker (1907). I believe it really encapsulates implicit and explicit properties of Irish traditional music:

Few musicians have been found to question the assertion that Irish folk-music is, on the whole, the finest that exists; it ranges with wonderful ease over the whole gamut of human emotion from the cradle to the battlefield, and is unsurpassed in poetical and artistic charm. If musical composition meant nothing more than tunes sixteen bars long, Ireland could claim some of the very greatest composers that have ever lived; for in their miniature form the best Irish folk-tunes are gems of absolutely flawless lustre, and though of course some of them are relatively undistinctive, it is very rare to meet with one entirely lacking in character. (pg. 335)

I wonder if the convention of Irish tunes being sixteen bars long relates to physical limits of human memory, and the aural transmission of tunes?

Anyhow, in this part I investigate extracting a feature complementary to time-pitch series: one describing rhythmic aspects of a transcription. Let’s consider jig #201 (“Biddy’s wedding”):

We see notes of four different durations. From shortest to longest: semiquaver, quaver, dotted quaver, and crotchet. In the entire collection, there are notes of three other durations: triplet semiquaver, triplet quaver, and dotted crotchet. One way to describe the rhythm of any transcription in this collection is to express it as a sequence of encoded durations. Let’s try a simple encoding: 1 means a triplet semiquaver, 2 means a semiquaver, …, and 7 means a dotted crotchet. So the series describing “Biddy’s wedding” starts: 5, 2, 6, 4, 4, 4, 4, 4, 4, …

Even better would be an encoding that directly describes time. Dividing each quaver of a 6/8 measure into 6 segments (Fs = 6 segments/quaver) results in a triplet semiquaver lasting 2 segments, a semiquaver lasting 3, a triplet quaver lasting 4, a quaver lasting 6, a dotted quaver lasting 9, a crotchet lasting 12, and a dotted crotchet lasting 18. Hence, such a series extracted from “Biddy’s wedding” starts: 9, 3, 12, 6, 6, 6, 6, 6, 6, … In this way we can easily isolate measures, and accumulate the values of the series to create a series of onset times, i.e., for “Biddy’s wedding”: 0, 9, 12, 24, 30, 36, 42, 48, 54, 60, … Call this indexed series $\{t_n\}$.

The problem with both of these approaches is that the length of a series is equal to the number of notes in a transcription. I want a feature that facilitates the comparison of transcriptions and their parts. This can be done by making all series the same length. Hence, for each 8-measure section I create the *onset-time series* of length Fs × 6 × 8 = 288 according to the following:

$$o[k] = \mathbb{1}_{\{t_n\}}(k), \qquad k = 0, 1, \ldots, 287,$$

using the indicator function $\mathbb{1}_A(k)$, which is 1 if $k \in A$ and 0 otherwise, with the onset times $t_n$ counted from the start of the section. So the onset-time series is just a series of 288 ones and zeros. For “Biddy’s wedding” the onset-time series looks like:

Each spike shows an onset. Those of the A part are shown in blue and orange, and those of the B part are shown in green and red. It’s hard to differentiate between the series, so let’s view these in an alternative way:

Series 1 and 2 are from part A, and series 3 and 4 are from part B. Time within each series runs along the x-axis: six steps for each quaver, six quavers for each measure, and 8 measures for each series. The series of each tune are stacked along the y-axis, with each part contributing two series since it is repeated. The pixel value changes wherever an onset occurs.
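A minimal Python sketch of the onset-time series described above, assuming the note durations of one 8-measure section are already expressed in segments (Fs = 6). The names `SEGMENTS`, `onset_times` and `onset_time_series` are hypothetical, chosen for illustration; the sketch skips parsing the ABC transcription entirely and is not meant to reproduce the figures exactly.

```python
# Duration of each note value in segments, assuming Fs = 6 segments per quaver.
SEGMENTS = {
    "triplet semiquaver": 2,
    "semiquaver": 3,
    "triplet quaver": 4,
    "quaver": 6,
    "dotted quaver": 9,
    "crotchet": 12,
    "dotted crotchet": 18,
}

FS = 6                        # segments per quaver
SECTION_LEN = FS * 6 * 8      # 6 quavers per 6/8 measure, 8 measures -> 288


def onset_times(durations):
    """Accumulate note durations (in segments) into onset times starting at 0."""
    onsets, t = [], 0
    for d in durations:
        onsets.append(t)
        t += d
    return onsets


def onset_time_series(durations, length=SECTION_LEN):
    """Indicator series: 1 at each segment where a note starts, 0 elsewhere."""
    series = [0] * length
    for t in onset_times(durations):
        if t < length:          # ignore anything spilling past the section
            series[t] = 1
    return series


# Opening durations of "Biddy's wedding", as given in the text.
biddy = [9, 3, 12, 6, 6, 6, 6, 6, 6]
print(onset_times(biddy))   # [0, 9, 12, 24, 30, 36, 42, 48, 54]
```

Working in segments rather than note names keeps every onset on an integer grid, so triplets and dotted values can sit in the same series without rounding.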

Let’s have a look at some others. Below is the onset-time series for jig #24 (“The maid at the well”):

This shows each part is built from two measures with the same rhythm: 10 quavers and a crotchet. Here’s the dots:

The onset-time series below shows a sequence of 22 quavers and a crotchet:

This is from jig #32 (“The basket of Turf”):

And here we see quavers all the way:

Those are the onset-time series of jig #125 (“Wasn’t she fond of me?”):

The pickup doesn’t appear in the onset-time series (or any of the other series) because it does not occur in any repetitions. Its existence is seen in the time-interval series, however (start of blue line):

Here’s a strange pattern of onset times:

That’s from jig #357 (“The Hibernian jig”). The dots show what is going on:

For some reason, O’Neill has explicitly notated an exaggerated jig rhythm. The transcription could also be notated with straight quavers and interpreted in the manner above.

How many jigs in this collection have such exaggerated jig rhythms? Only 14 of the 365 notate a dotted quaver followed by a semiquaver at least 8 times: #8, 76, 101, 148, 181, 201, 212, 222, 229, 256, 257, 294, 322, and 357. Jig #101 (“The idle road”), which I looked at in part 5, is one of these; Joe Burke plays it with an unbroken rhythm.
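Counting that figure is simple once a transcription is a series of durations in segments (dotted quaver = 9, semiquaver = 3). Here is a sketch in Python with hypothetical function names. It counts every adjacent dotted quaver + semiquaver pair wherever it falls in the bar; that is an assumption, since the criterion above doesn't say whether the figure must start on a beat.

```python
def count_dotted_pairs(durations):
    """Count adjacent (dotted quaver, semiquaver) pairs, i.e. 9 then 3 segments."""
    return sum(1 for a, b in zip(durations, durations[1:]) if (a, b) == (9, 3))


def has_exaggerated_jig_rhythm(durations, threshold=8):
    """True if the dotted quaver + semiquaver figure appears at least `threshold` times."""
    return count_dotted_pairs(durations) >= threshold


# One 6/8 measure of the "B>cB/A/"-style figure followed by a quaver: 9 + 3 + 6 = 18 segments.
print(count_dotted_pairs([9, 3, 6, 9, 3, 6]))   # 2
```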

Here’s another interesting one:

This is of jig #95 (“The sheep on the mountains”). This is the only jig in the collection with a structure different from all the others: ABAB, where each part is 16 measures long:

Here’s another, of jig #200 (“Daniel of the sun”):

The B part of this tune looks to be more syncopated than the other two parts. The transcription shows plenty of broken rhythms, including Scotch snaps, in these parts:

Below are the onset-time series for jig #226 (“Tim Hogan’s jig”):

It appears that the parts become more and more dense with notes. Here’s the transcription:

We see the crotchets of the A part are notated as trilled, or rolled. The B part features mostly quavers. Then the C part has semiquaver passing tones.

Here’s another unusual one:

That is for jig #361 (“The Drogheda weavers”). There appears to be a flourish of notes starting each four-measure section of the B part. Here’s the transcription showing what is happening in that part.

By way of summary, let’s look at the ways in which we can describe the characteristics of a given transcription from O’Neill’s 1001. Let’s consider one of my favorite jigs, “Scatter the Mud” (#187). Here’s the ABC notation:

M:6/8
K:Am
d|eAA B>(cB/A/)|eAA ABd|eAA B>(cB/A/)|dBG GBd|
eAA B>(cB/A/)|eAA AGE|GAB Bge|dBA A2:|
|:d|eaa egg|dBA ABd|eaa egg|dBG GBd|
ea^f ({a}g2)e|dBA AGE|GAB Bge|dBA A2:|

And here’s the dots:

Here’s the onset-time series of “Scatter the Mud”:

Here’s the time-pitch series:

As is conventional, the B part goes higher in pitch than the A part. And the two parts echo each other in the middle and end. Here’s the time-interval series:

And finally, here’s the circular autocorrelation of the time-interval series:

We see the A part has high self-similarity at lags of one measure, and the greatest value at four measures (other than zero lag). The B part has more self-similarity at a lag of two measures.
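For reference, here is the circular autocorrelation written out as a short Python sketch (NumPy assumed; the function names are hypothetical). With 36 segments per 6/8 measure, a lag of 36 corresponds to one measure and a lag of 144 to four measures.

```python
import numpy as np

SEGMENTS_PER_MEASURE = 36   # Fs = 6 segments per quaver, 6 quavers per 6/8 measure


def circular_autocorrelation(x):
    """r[l] = sum_k x[k] * x[(k + l) mod N], for lags l = 0, ..., N - 1."""
    x = np.asarray(x, dtype=float)
    # np.roll(x, -l)[k] equals x[(k + l) % N]
    return np.array([np.dot(x, np.roll(x, -l)) for l in range(len(x))])


def lag_in_measures(lag):
    """Convert a lag in segments to a lag in measures."""
    return lag / SEGMENTS_PER_MEASURE
```

For a real-valued series the result is symmetric, r[l] = r[N − l], so only lags 0 to N/2 carry distinct values; for N = 288 that is 145 values. A series that repeats every measure gives r[36] equal to r[0], which is the kind of peak read off above.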

In the next part, I think I will look at clustering transcriptions according to their onset-time series.

# Seán Ó Riada on the Accordion in Irish Traditional Music

Seán Ó Riada is one of the most important Irish composers of the 20th century, and a key figure in the revival of Irish traditional music. In 1960, he assembled a group of traditional Irish musicians, named “Ceoltóirí Chualann”, to present traditional music in a classical music concert setting. They gave several influential concerts, and the group is considered a precursor to one of the greatest modern Irish music groups, The Chieftains, who have had 18 Grammy Award nominations.

In 1963, Ó Riada recorded a series for Raidió Éireann called “Our Musical Heritage”, in which he introduces and discusses Irish traditional music and its elements. In one of these he discussed the button accordion. I can’t find any transcription of his commentary online, but I love it so much I will transcribe it here.

Ó Riada prefaces his commentary with the following:

First of all, it needs to be emphasized over and over again, that Irish traditional instrumental music is a very close relation of Irish vocal music; that is, sean-nós [old-style] singing. The instruments which suit Irish music best are therefore those that most closely approach the personal expression of the human voice.

The fiddle is ideal. The player is in contact – in complete contact – with his instrument. The notes do not exist until he makes them; and his tone is a completely individual thing, differing from another fiddle player’s tone as much as one voice differs from another. This is also true to varying extents of the uilleann pipes, the flute, and the whistle.

Irish music is entirely a matter of solo expression, and not of group activity. It is the direct expression of the individual musician or singer. It is again very much a matter of personality. Whether that personality exists or not outside the music. That is to say, a singer, a piper, or a fiddler may be quite an unpleasant person when not performing but when performing it is his music personality which counts, which impresses us – the direct expression of his musical personality. Everything that comes in the way of that direct expression beclouds and confuses it.

Now, the most direct means of expression in music is the human voice. Next, in varying degrees, as I said, come the uilleann pipes, fiddle, flute, and whistle. In each of these the player makes the notes himself. He is in control. The notes do not exist until he makes them. The fiddle player and the piper make the notes with their hands. The flute player and whistle player, with their mouth and hands. They are at all times directly in contact with the actual notes they make. And as a result, they are the masters of the notes. They control them. Varying their loudness and their softness. Their tone quality, and even their intonation.

Then Ó Riada is ready to render his judgement:

This, the accordion player cannot do. He does not make the notes – they are already there before him. Ready to sound at the pressing of a button, produced in an almost entirely mechanical fashion. Thus, he has not the control over his instrument that the others have. He has only to press a button and pull or push the bellows and the note sounds for him. The tone and even the intonation have already been decided for him by the maker. Because of this, individual musical expression becomes extremely difficult, if not impossible for him. For this reason, if not for any other, the use of the accordion as a solo instrument in Irish traditional music is to be greatly deplored.

Most accordion players are so hampered by their choice of instrument as to be unable to produce anything but a faint, wheezy imitation of what Irish music should be. And the most unfortunate part of it is, that this instrument, designed by foreigners for the use of peasants who had neither the time, inclination or application to learn a more worthy instrument – this instrument is not just losing favor, but gaining vast popularity throughout the country. The reason for this is mainly, I think, the laziness which afflicts us as a nation at the moment.

We would all like to be musicians, but we don’t want to take the trouble. It is easier to play notes which are already made for us, than to make our own notes. Accordions, bigger and better accordions, and eventually the greatest abomination of all – the piano accordion – nothing could be farther from the spirit of Irish traditional music.

However, I’m afraid this has been a rather long digression. As I said, very few accordion players in this country can surmount the difficulties inherent in their instrument. Most feel on the other hand that something must be done to enable them to produce more expression on the accordion. As this can’t be done by means of varying the tone, and so forth, they have turned to the one thing which it is possible to exploit, namely ornamentation. And it is precisely with regard to ornamentation that accordion players have committed their greatest crimes. In recent years, a technique and style of chromatic ornamentation, utterly alien from the spirit of Irish music, has grown up.

But before I describe it, let me mention briefly the two basic principles of ornamentation. And incidentally, I did not invent these principles. These principles are based on practice – the practice of the best players under the best circumstances. They are not invented principles, they are merely observed principles.

And the first is: generally speaking, no ornament should go outside the mode of the song or tune in which it occurs. And the second is: no ornament should, by its position, draw attention to an irrelevant note in the phrase in which it occurs. As by doing so it destroys the basic shape of the phrase.

At this point, Ó Riada uses the piano to illustrate permissible and impermissible ornamentation. He then caricatures the chromatic ornamentation he was hearing performed by the very influential Irish accordion players of the time, i.e., Paddy O’Brien and Joe Burke (though he does not name names). These players “throw in as many semitones” as they can. Eventually, Ó Riada renders a simple tonal phrase in the key of G into an unrecognizable chromatic mess [QED]. He continues:

The worst feature of it, to my mind, is not so much the incidental semitones, as is the dreadful habit they’ve got of using the downward semitone-inflected mordent, where you begin on a note, go to the semitone below and back to the note. Funnily enough, it is far more common than the upward-inflected mordent, where you begin on the note and go to the next note above it.

So the main downfall of the present day accordion players is the downward-semitone inflected mordent. This kind of thing is of course complete and utter rubbish; and it is up to the musical public to make their disapproval felt.

As I said, there are very few accordion players in this country who can sufficiently overcome the disabilities and limitations of their instrument. So as to make what they play sound like Irish music. But one of these few players is Sonny Brogan of Dublin. He is a man who understands the limitations of his instrument, but who strives to counteract these not in a mishmash of wrongly placed ornamentation, but by emphasizing the most traditional elements in the tunes he plays. His ornamentation is simple, usually confined to the single cut, or grace note, and the roll.

Ó Riada then plays recordings of Brogan playing the reels “Repeal of the Union”, “The hut in the bog” and “Gordon’s reel”, and finally the jig “Morrison’s”. He highlights Brogan’s use of variation.

To sum up then, the accordion has been played in this country – the two row button accordion, that is – for upwards of 40 years. And I’m afraid that it has come to stay. However, while I have emphasized its unsuitability for solo playing, it can be a most useful instrument in a band – something about which I am going to talk next week. As a proverb says, it’s an ill wind. If only most Irish accordion players would try to fit in with the tradition instead of flying in the face of it, something would be achieved.

And one last word about the accordion: I wish, and indeed I wish again, that all Irish accordion players would drown, muffle, destroy, subdue or in some other fashion, silence the bass of their instrument. I haven’t yet heard an accordion player who knew the right bass to play, and it’s far better to play no bass anyway. It only interferes with the tune and confuses it.

Ó Riada continues his programme by talking about the concertina, which he finds to be superior to the accordion for Irish traditional music (e.g., “it’s not one tenth as unwieldy as the accordion”), and laments its decline.

One repercussion of my research in applying AI to model transcriptions of Irish traditional dance music is that I have become a dedicated student of Irish accordion. But I take no offense at any of Ó Riada’s verdicts and criticisms. Some of them are clearly laughable, such as peasants too busy to learn a “more worthy” instrument, and his nation “afflicted” with laziness. Some are uncomfortably nationalistic, such as those instrument-making foreigners. Some are contradictory, such as when he lauds the concertina over the accordion while overlooking that concertinas and accordions were being made by the same foreigners, and that the concertina involves the exact same mechanics as the accordion. And some are curiously unfair, such as overlooking the great expression that can be accomplished with the bellows. At least the accordion can produce dynamics like the human voice, which is not possible on the uilleann pipes – a more “worthy” instrument for Ó Riada. I am, however, persuaded by his opinion on some approaches to playing bass on the accordion. I think sparse is the best approach, and only if it fits harmonically.

Ó Riada’s main argument with the fashion of accordion playing at his time is focused on music theory: the “great crime” of downward semitone-inflected mordents. Therein lies Ó Riada’s great crime: using a music theory that is in and of itself foreign to Irish traditional music to castigate contemporary practices of Irish traditional musicians.

I see Ó Riada’s programme on the accordion as a wonderful time capsule from just before Irish traditional music began its transformation into a major economic resource for Ireland – something that is due in large part to Ó Riada. The accordion would soon become a principal instrument of Irish traditional music. Controversy around the accordion would be replaced with controversy around the guitar and the bodhran, group playing, and eventually commercialization – the latter of which was as vigorously denounced by more modern “gate keepers” as Ó Riada denounces the accordion, e.g., Tony MacMahon in his wonderful 1996 essay, “The Language of Passion”.

# An analysis of the 365 double jigs in O’Neill’s, pt. 9

This is part 9 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. The last part reviews where I have been. In this part I look at the time-pitch series of the collection. I create these series by single nearest neighbor regression on tuples of pitch and time observations extracted from a transcription. As an example, here is jig #201 (“Biddy’s wedding”):

Its four time-pitch series appear like so:

This feature has a clear relationship to the transcription because it shows which pitch occurs at what time over 8-measure segments. I can extract a time-interval series from these series by stepping along time, computing the difference between each new pitch and the one before it, and holding that difference until the pitch changes again. The time-interval series for “Biddy’s wedding” appears like so:

The step of 5 semitones for series 2-4 comes from the G at the end of each line. I always set the first interval of the first series to zero.
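The time-pitch series themselves come from single nearest neighbor regression on (time, pitch) observations. The post does not spell out which observations are used, so the following Python sketch makes two assumptions: each note contributes an observation at its first and at its last segment (so every segment inside a note has that note as its nearest neighbor), and any rest has already been folded into the preceding note. The function name is hypothetical; scikit-learn's `KNeighborsRegressor(n_neighbors=1)` could do the same job.

```python
import numpy as np


def time_pitch_series(notes, length=288):
    """Single-nearest-neighbor regression of pitch on time.

    notes: list of (onset, duration, midi_pitch), with onsets and durations in
    segments, and any rest already folded into the preceding note.
    """
    obs_t, obs_p = [], []
    for onset, dur, pitch in notes:
        # Two observations per note, at its first and last segment, so that
        # every segment inside the note is nearest to that note.
        obs_t += [onset, onset + dur - 1]
        obs_p += [pitch, pitch]
    obs_t = np.asarray(obs_t)
    obs_p = np.asarray(obs_p)
    grid = np.arange(length)
    # Index of the nearest observation for every segment on the grid.
    nearest = np.abs(grid[:, None] - obs_t[None, :]).argmin(axis=1)
    return obs_p[nearest]
```

With only onset observations, a nearest-neighbor fit would move each note boundary to the midpoint between onsets; the extra end-of-note observation keeps the pitch held for the full notated duration.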

Here is the transcription I found at the center of a multidimensional scaling of the collection of transcriptions, jig #134 (“Young Tim Murphy”):

And here is its time-pitch series:

The two parts appear quite different save for the last two measures. Here is the time-interval series I extract from this:

In terms of intervals, we see the two parts are similar in measure 4 as well.

One difference between the two jigs above is the anacrusis. This in effect shifts to the right each series of #134 with respect to those of #201. If I am comparing only the series extracted from one transcription, there’s no problem since they all have the same shift. But if I want to compare series across transcriptions, some with an anacrusis and others without, I need to account for the shifts, i.e., align the measures. This will be important to consider when looking at tunes as sequences of measures.

The music21 library provides an easy way to detect an anacrusis, so I have rewritten my feature extraction code such that all series are aligned by measure. Let’s continue looking at the time-pitch series of the collection.

The dots of jig #17 (“The eavesdropper”) are:

and result in the following time-pitch series:

Note that middle C is pitch 60, but I have transposed all jigs in this collection to have a root of C. Here is the time-interval series I extract from the time-pitch series:

One major difference between this way of extracting the time-interval series from the time-pitch series and how I was doing it before is that this new approach treats repeated pitches as one. So the run of B quavers in the first measure is grouped together into an interval of 4 semitones held over 3 quavers. I think this is preferable from the standpoint of considering melody: playing 3 quavers in place of a dotted crotchet changes only the rhythm, not the melody.
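A Python sketch of this newer time-interval extraction, working directly on a time-pitch series (one MIDI pitch per segment). Because a run of equal pitches is a single constant stretch in the time-pitch series, repeated notes are merged automatically. The `previous_pitch` argument is a hypothetical way to carry over the last pitch of the preceding series, which is presumably where the 5-semitone step in series 2-4 of “Biddy’s wedding” comes from; leaving it as `None` gives the zero first interval used for series 1.

```python
def time_interval_series(pitch_series, previous_pitch=None):
    """Hold, at each time step, the interval (semitones) into the pitch now sounding.

    Runs of a repeated pitch are treated as one note, so three B quavers give
    the same intervals as a dotted crotchet B. If previous_pitch is None (the
    first series of a tune) the first interval is zero; otherwise it is measured
    from the last pitch of the preceding series.
    """
    out = []
    current = None
    interval = 0
    for p in pitch_series:
        if current is None:
            interval = 0 if previous_pitch is None else p - previous_pitch
            current = p
        elif p != current:
            interval = p - current
            current = p
        out.append(interval)
    return out


print(time_interval_series([60, 60, 64, 64, 64, 67, 60]))  # [0, 0, 4, 4, 4, 3, -7]
```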

(This motivates extracting a “time-duration” series from a transcription to describe its rhythmic characteristics. Instead of looking at what pitch is playing when, look at what duration is playing when. Ignoring graces, rolls, and trills, the collection has notes of only seven durations. From shortest to longest these are triplet semiquaver, semiquaver, triplet quaver, quaver, dotted quaver, crotchet, and dotted crotchet. I will explore this additional feature at a later time… but keep in mind that the features I am extracting are not representative of how these tunes are experienced in performance. These are just the bones of the tune as it was in someone’s hand in the early 20th century, without any meat, flesh or movement.)

In the time-pitch series for “The eavesdropper”, we also see how its B part departs from the A part by going higher in pitch, and then descends back to join it. A typical feature of two-part jigs in this collection is that the B part sits above the A part in pitch. To get an idea of how typical it is, let us sum the set of differences between time-pitch series 3 and 1, and of 4 and 2 for each two-part jig in the collection (N=291), and make a histogram of them:

A positive difference means part B of a tune spends more time at pitches higher than part A. I find 268 of the 291 two-part jigs (>92%) have a positive difference. The two-part jig that has the largest difference is #190 (“O’Mahony’s frolics”):

Here are its time-pitch series:

Notice how the first ending of the B part stays high, and the second ending takes the melody down back home.
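The B-above-A measure used for the histogram can be sketched in a few lines of Python (NumPy assumed, function names hypothetical), taking the four 288-step time-pitch series of a two-part jig in the order A, A, B, B:

```python
import numpy as np


def part_b_minus_part_a(series):
    """Sum of (series 3 - series 1) plus sum of (series 4 - series 2).

    series: the four 288-step time-pitch series of a two-part jig, in the
    order A, A (repeat), B, B (repeat). A positive result means part B spends
    more time at higher pitches than part A.
    """
    s1, s2, s3, s4 = (np.asarray(s, dtype=float) for s in series)
    return float(np.sum(s3 - s1) + np.sum(s4 - s2))


def fraction_positive(differences):
    """Fraction of tunes whose B part sits above their A part."""
    d = np.asarray(differences, dtype=float)
    return float(np.mean(d > 0))
```

Applied to the 291 two-part jigs, `fraction_positive` would give the share with a positive difference, the 268 of 291 reported above.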

Of the 23 two-part jigs with a negative difference, the most negative one is #57 (“The blazing turf fire”):

Here are its time-pitch series:

What happens in jigs with more than two parts? Here’s the time-pitch series of the four-part jig #286 (“Strop the razor (2nd setting)”):

We see the melody goes highest in the penultimate part (series 5&6). We see the same in the three-part jig #320 (“The piper’s welcome”):

This is not the case in the three-part jig #344 (“The stolen purse”):

Another interesting feature I see in some tunes is contrary motion of the parts, e.g., jig #223 (“The rambler from Clare”):

The time-pitch series show this “mirror image” effect:

This is probably not an accidental feature, but a conscious choice made in composition. Jig #237 (“The Fardown farmer”) has the same kind of construction:

Here are its dots:

The A part of this jig and the A part of “The rambler from Clare” are so similar it makes me wonder if the Fardown farmer was that rambler from Clare.

Other tunes have similar intervalic motion in their parts. Here’s jig #249 (“The flitch of Bacon”):

And here’s the corresponding time-pitch series:

This also shows how I disregard rests in my extraction of the time-pitch series, just extending the duration of the pitch preceding it.

# An analysis of the 365 double jigs in O’Neill’s, pt. 8

This is part 8 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. It’s time for a breather. Let’s have a review!

- Part 1 discusses O’Neill’s collection of jigs, and how I have normalized the transcriptions expressed with ABC notation. I use the normalized Damerau-Levenshtein distance (DL distance) to compare the transcriptions as strings, which locates some “duplicates” and variations, as well as several errors in the transcriptions. I find that the normalized DL distance provides sensible results.
- Part 2 looks at the similarity matrix created from the normalized DL distance between all pairs of transcriptions. I analyze some of the pairs that have very large distances. I also perform some multidimensional scaling of the collection with the similarity matrix and look at the transcriptions that are at the center of the cluster. Finally, I observe that applying string edit distances to ABC notation is musically naive, e.g., “DEFG2G” in C major and “DEFG2G” in C minor are identical strings but different melodies.
- Part 3 reduces the transcriptions to sequences of measure tokens and looks at the different measure structures present in the collection. This uncovers more errors in the transcriptions, and leads to further normalization of the collection. Performing multidimensional scaling on the reduced sequences creates sensible clusters.
- Part 4 converts each transcription into “time-interval series”, which describes the intervalic “profile” of the melody. I explore other series derived from this representation by integration, circular autocorrelation, and marginalization (integrating out time). It is clear that the transcriptions in this collection have a well-defined structure having sections of eight measures, which motivates comparisons of features extracted from these sections, and smaller subsections of 1, 2 and 4 measures.
- Part 5 inspects several 8-measure time-interval series in the collection, and gives a broad sense of the intervalic structures of the collection. I also find more transcription errors. I look at transcriptions with time-interval series that have specific statistical characteristics. I also look at the collection as a whole and find some interesting trends, e.g., time spent at pitches arrived at by a perfect fourth up is longer than vice versa.
- Part 6 looks at clustering the 1,712 8-measure time-interval series of the collection. I analyze the centroids, and the distributions of distances to these. I transform some centroids to transcription sequences, which do not resemble any of the tunes in the collection. I also begin to inspect the circular autocorrelation of the time-interval series, which I believe are more indicative of the melodic structure in a transcription, e.g., revealing repetitions within a series.
- Part 7 looks at clustering the 1,712 circular autocorrelations of the 8-measure time-interval series. I analyze the centroids, which make more musical sense to me than the centroids created from the time-interval series. The structure of a melody is more apparent in these representations, but there are some details that need to be worked out.
- Part 8 reviews where we have been, and some questions that remain open. I also look at the sensitivity of a time-pitch series to subtle transformations of the originating transcription.
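Since the normalized DL distance runs through parts 1 to 3, here is a Python sketch of one common form of it. The recap doesn't say which variant or normalization is used, so this assumes the restricted (optimal string alignment) Damerau-Levenshtein distance divided by the length of the longer string; the function names are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[n][m]


def normalized_dl(a: str, b: str) -> float:
    """Distance scaled to [0, 1] by the length of the longer string."""
    if not a and not b:
        return 0.0
    return osa_distance(a, b) / max(len(a), len(b))
```

Note that `normalized_dl("DEFG2G", "DEFG2G")` is 0.0 whatever key the two transcriptions are in, which is the musical naivety noted in part 2.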

I have a growing list of open questions:

- A multidimensional scaling of the transcriptions according to their normalized DL distances places a few transcriptions closest to the center of the cluster: jigs #134 (“Young Tim Murphy”) and #296 (“Barney O’Neill”). How stable is that position? What is the significance of those transcriptions in that position? What does that position mean musically speaking, if anything? (Perhaps this is not worth investigating given the lack of musical meaning of a string edit distance between ABC transcriptions.)
- A number of features have been proposed that express the musical content of a transcription in eight-measure sections: 1) (mean-centered) time-interval series; 2) (normalized) circular autocorrelation of time-interval series; 3) integral of time-interval series; 4) time-marginalization of time-interval series; 5) histogram of time-marginalization of time-interval series. What about expressing the normalized melody (transposed to root C) as a time-pitch series? What is the musical significance of each of these features?
- K-means clustering of the circular autocorrelation of time-interval series shows some sensible results, e.g., finding eight-measure series that are structurally similar. What changes when we perform K-means clustering on normalized circular autocorrelations (that is, dividing each by the value at zero lag)?
- If we break the time-interval series into units of one-measure duration, how many unique units are there? How do they relate? Are there “prototype” measures? Might we see each eight-measure series as a concatenation of these “codebook” units?
- My explorations so far show how we can analyze a collection of transcriptions. Can these approaches be used to compare two collections of transcriptions? Say, O’Neill’s collection with another collection of supposed jigs, say computer-generated, hmm? Hmmmm?
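Two of the features in the list above are easy to state in code. The Python sketch below (NumPy assumed, names hypothetical) computes a time-marginalization, taken here to mean the number of time steps spent at each interval value, and the normalized circular autocorrelation from the third question, i.e. dividing by the zero-lag value. Both readings are assumptions about definitions given in earlier parts.

```python
from collections import Counter

import numpy as np


def time_marginal(time_interval):
    """Integrate out time: number of time steps spent at each interval value."""
    return dict(Counter(int(v) for v in time_interval))


def normalized_circular_autocorrelation(time_interval):
    """Circular autocorrelation divided by its zero-lag value."""
    x = np.asarray(time_interval, dtype=float)
    r = np.array([np.dot(x, np.roll(x, -l)) for l in range(len(x))])
    # An all-zero series has no energy to normalize by; return it unchanged.
    return r / r[0] if r[0] != 0 else r
```

Dividing by the zero-lag value puts every series on the same scale, so clustering would respond to the shape of the autocorrelation rather than to how large the intervals are.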

Near the conclusion of the last part, I noticed something that needs more thought. Let’s look at jig #201 (“Biddy’s Wedding”):

This is a very simple tune. Harmonically both parts are: I-I-I-V-I-I-IV-V. The A part is built from two-measure bits like so: abac. The B part is just a variation: a’b’a’c. “Filling in” crotchet-quaver pairs with passing tones or chord tones, or removing those, does not change the melody. But the time-interval series show these as major changes:

The c part in measures 7&8 is clearly identical. The b and b’ parts appear quite close as well, except for the big long-duration jump of 7 semitones in b’. However, the relationship between the a and a’ parts is not clear. Performing a correlation of these parts of the series would involve a multiplication of a string of zeros, which would reduce its value.

The circular autocorrelation of the time-interval series of this tune suggests its two parts are not closely related:

From looking at the transcription, I expect both parts of this tune to produce large peaks at a lag of 2 and 4 measures, which we see. But the half-measure peaks in the B part (lines 3&4) are curious, as are the small peaks for the A part at some other fraction of a measure.

Let’s do an experiment to see how robust these features are. I will slightly modify the transcription as below and recompute the time-interval series and its circular autocorrelation:

I have added an anacrusis to each part, and have filled in the crotchet-quaver pairs. Here’s the time-interval series for these parts:

The circular autocorrelation of these are:

The differences with the original features do not appear to be that great, which is a good sign. I still see that curious structure in the A part.

If we make the arpeggiation of I in measures 2&6 of the A part go downward, the circular autocorrelations of the time-interval series become more similar:

I don’t think such a minor change to the transcription should result in a major change of high-level features extracted from it. This points to the fact that the time-interval series are too detailed to make meaningful comparisons of melodic structure.

I think I have to return to basics and look at representing the melody as a time-pitch series, and how this might be transformed into a feature that more clearly expresses structure.

# An analysis of the 365 double jigs in O’Neill’s, pt. 7

This is part 7 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here, part 2 is here, part 3 is here, part 4 is here, part 5 is here, and here is part 6.

Now let us look at the results of k-means clustering of the circular autocorrelations of the 1,712 time-interval series. I start with a single cluster and look at the centroid and the distribution of distances to it. Here is the 145-dimensional centroid:

That looks pretty good. The high value at zero lag suggests this is a sequence with some large time-intervals. The peak at a lag of four suggests that half of the series strongly resembles the other half. The peak at two suggests that the series is built from a two-measure bit. And so on. Let’s look at the distribution of Euclidean distances to this centroid:

The median of this distribution is around a distance of 104. The largest Euclidean distance we see is about 996, and the smallest is 65. The series furthest from this centroid is in jig #257 (“The Morgan Rattler”), which we keep seeing is a very unique jig in this collection. The series closest to this centroid comes from jig #155 (“Jackson’s rambles”). Here’s its circular autocorrelation:

It looks like part A of this tune contributes the matching series. The dots below show this part has a four-measure structure, and some repetition of intervals at the two-measure level:

Here are the centroids from k-means with two clusters:

And here are the distance distributions:

There are 1479 series in cluster 2, but only 233 in cluster 1. The Euclidean distance between the two centroids is about 200.
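The post doesn't say which k-means implementation was used. As a rough illustration, here is a plain NumPy sketch of Lloyd's algorithm with random initialization. The names `kmeans`, `assign` and `distances_to_own_centroid` are hypothetical. With k=1 it returns the mean of all rows, which is the single centroid discussed above. Lloyd's algorithm only reaches a local optimum, so a library implementation with k-means++ seeding and several restarts would probably be a better choice on the real 1,712 x 145 data.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (Euclidean) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update.

    With k=1 the returned centroid is the mean of all rows.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k distinct rows
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # move each centroid to the mean of its members (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(X, centroids)

def distances_to_own_centroid(X, centroids, labels):
    """Euclidean distance of each row to the centroid of its own cluster."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X - centroids[labels], axis=1)
```

Cluster populations, like the 1479 and 233 above, would then come from `np.bincount(labels, minlength=k)`. The per-cluster distance distributions come from grouping the output of `distances_to_own_centroid` by label.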

Let’s try four clusters. Here are the centroids (x-offset is just for display):

Now we can see centroid 1 (population is 209) has to do with time-interval series with similarities at the half-measure-level, centroid 4 (pop. = 98) has to do with time-interval series with similarities at the measure-level, centroid 2 (pop. = 202) has to do with time-interval series with similarities at the two-measure-level, and centroid 3 (pop. = 1102) is perhaps something to do with similarities at the four-measure level.

Here’s eight centroids:

And the distances within each cluster.

Cluster 8 is the most populated, with 697 series; but cluster 6 has only 4. Let me guess: those come from “The Morgan Rattler”… Indeed, I see series from #257. But also #154 (“The Antrim lasses”):

Here’s the autocorrelation of its time-interval series:

The B part of this tune shows the same structure we see in centroid 6.

There are 48 series in the cluster described by centroid 5, coming from 22 jigs: #6, 18, 23, 30, 56, 71, 82, 117, 125, 126, 127, 172, 178, 183, 186, 201, 204, 258, 274, 287, 291, 343. These should have sections with repetition at the half-measure level. Let’s look at two. Jig #18 (“Saddle the pony”):

and jig #201 (“Biddy’s wedding”):

Looking at the autocorrelations of their time-interval series shows their similarity in this domain (the first is “Saddle the pony”):

Even the other two parts look related! So time-intervalically speaking, we can see why these sections would be grouped together. However, the melodies of these jigs are not very similar.

I have searched the web for people playing these tunes, but there appear to be none! All the performances I can find of “Saddle the pony” are actually the jigs “The Priest’s Leap” (#59) and “The Draught of Ale” (#156) in O’Neill’s 1001 (identical tunes). And “Biddy’s wedding” doesn’t appear to have been recorded anywhere. So learn to play them I have:

Here’s one time through “Saddle the pony” in O’Neill’s 1001 on The Black Box:

Here’s one time through “Biddy’s wedding” from O’Neill’s 1001 (but played in G):

And now I find something curious! “Saddle the pony” appears in O’Neill’s 1850 as two settings, both in A major. The second setting is the one appearing in O’Neill’s 1001, but with a dropped seventh (A mixolydian):

Why didn’t O’Neill include both settings in his 1001? And where did the G sharp go? I do think the flattened seventh sounds more Irish.

*Update: 20200402*

My teacher Paudie O’Connor says the G sharps might occur in Donegal, but that the 1001 version plays well as written. There is a four-part jig called “Langstrom’s Pony” that has as its first two parts this version. Here’s De Danann playing the tune:

# An analysis of the 365 double jigs in O’Neill’s, pt. 6

This is part 6 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here, part 2 is here, part 3 is here, part 4 is here, and part 5 is here.

Let’s do some clustering of the time-interval profiles in the dataset. I compute these profiles with a sampling rate of 6 samples per quaver over 8-measure sections, which gives them length 6*6*8 = 288. Ima take all 1,712 series in and cluster them. All of the time-interval series take values in the range [-17, 21] (and they are all integers, since the semitone is the smallest division of the octave in equal temperament). The jump up of 21 semitones (an octave and a major sixth) occurs only in jig #330 (“The queen of the fair”):

The largest leap down of -17 semitones occurs in three jigs, one of which is #36 (“Father Dollard’s favorite”):

In fact, 362 of the jigs in this collection of 365 have a section where all intervals lie within . The only three jigs that don’t are #36 above, #13 (“The humors of Bantry”):

and jig #355 (“The lasses of Dunse”):

If we were to treat these series as one cluster in ℝ^288, which one lies closest to the centroid? Here’s what the centroid looks like (after projecting it to ℤ^288 by rounding each dimension):

This would turn into a rather boring melody, but it is interesting to note that the beginning of the series consists of ascending intervals, and the conclusion is descending intervals to unison.

What is the distribution of distances between all series to this centroid? Below is a histogram of the Manhattan distances of all the series to this centroid:

The numbers are so large because we are computing differences between intervals six times for every quaver, and there are 48 quavers in an 8-measure section. The series closest to this centroid, with a distance of 395 semitones, is jig #97 (“The straw seat”):

I can see the resemblance of the first two sections to the centroid.

The jig with the section furthest away is jig #257 (“The Morgan Rattler”) which we saw last time has the largest variance in its time-interval series.
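Here is a sketch of this single-cluster computation for the 288-dimensional series. The centroid is the mean, rounding projects it onto integer semitones, and the Manhattan (L1) distance sums absolute interval differences over all samples. The sketch assumes the distances are measured to the rounded centroid. The integer distance of 395 hints at this, but the text doesn't state it. The function name is hypothetical. Note that `np.rint` rounds exact halves to the nearest even integer.

```python
import numpy as np

def nearest_to_rounded_centroid(X):
    """Return the rounded centroid, each row's L1 distance to it, and the closest row.

    X: integer time-interval series stacked as rows, e.g. shape (1712, 288).
    """
    X = np.asarray(X)
    centroid = X.mean(axis=0)
    rounded = np.rint(centroid).astype(int)   # project onto integer semitones
    l1 = np.abs(X - rounded).sum(axis=1)      # Manhattan distance, in semitones
    return rounded, l1, int(l1.argmin())
```

`int(l1.argmax())` would likewise pick out the section furthest away.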

What if we perform K-means with four clusters? Here are the resulting centroids:

These are more interesting in terms of intervalic content. (The numbering of the centroids is not important.) Let’s convert them into notation to get a better feeling of the melodic content:

None of these resembles a jig in the least. But nothing to fear: below we see the distributions of distances within each of these clusters.

All the tunes in the collection have time-interval series that are relatively far away from the centroids.

I can increase the number of clusters and see how the centroids and distributions of distances change, but what should I expect? Not “prototype” series that are musically meaningful. If I increase the number of clusters to 40, I begin to see clusters with only a few series in them. Increasing beyond that, the number of clusters consisting of only two series increases. At around 800 centroids, clusters of one series begin appearing.

It doesn’t make sense to cluster 8-measure time-interval series. Breaking the series into single measures and then clustering those smaller units makes more sense to me. As does clustering the circular autocorrelation of the 8-measure time-interval series. Then in some sense we are clustering series based on their time-interval structure, e.g., structures of 2, 4 or 8 measures.

Let’s have a look at some of the 145-dimensional autocorrelations of time-interval series. Here they are for jig #333 (“Miss Downing’s fancy”):

By far the largest peak is at zero. Some smaller peaks around 2 and 3 measures suggest some repetition of features of that length, but no direct repetition. Here’s the dots to see what is going on:

We see both parts feature repetitions of some measures, but with variations that make the structure of each more complex.

By contrast, here is the circular autocorrelation of the time-interval series for jig #344 (“The stolen purse”):

This suggests parts A and B in this tune are built from four-measure sections, but part C is a single eight-measure section. The dots show this to be the case:

Here’s another for jig #17 (“The eavesdropper”):

I predict that the first part is built from an intervalic structure one measure long, but the second part has a structure four measures long. Here’s the dots confirming that prediction:

Here’s the autocorrelation for jig #56 (“The humors of Cappa”):

Both parts of this jig seem to be built from half-measure intervalic structures, but the first part more strongly so. The dots show this to be the case, taking into account the anacrusis:

The autocorrelation for jig #71 (“Courtney’s favorite”) shows quite a difference in the structures of its two parts:

As with the previous jig, this A part is built from repeating a structure of half a measure, and since the values are so large, I predict the intervals will be large. The B part has much smaller values, and a structure of two and four measures. Since its values are smaller, I predict the B part has smaller intervals. The dots confirm these predictions:

Jig #73 (“Con Casey’s jig”) shows another interesting structure:

The A part seems to have repetitions of material of one third of a measure. The dots show what is going on.

The first measure shows a repetition of two quavers, occurring again in measures three and five.

Having looked through all of these time-interval series autocorrelations, I have a better sense of what the values mean. The value at zero will always be positive, and its value grows with the size and durations in the time-interval series. The series with the largest value (~363) at lag zero is in jig #257 (“The Morgan Rattler”), which we continue to see is quite a unique one in the collection. The jig with the smallest value (~26) at zero lag is #313 (“The frost is all over”):

The B part consists of a lot of stepwise motion and unisons.

A comparison of these autocorrelations will thus look at both the structures of the series and the sizes of the intervals. If I normalize each autocorrelation by its value at zero lag, then I will in some sense be comparing structures independent of the size of the intervals. Let’s try clustering by k-means with the autocorrelation and the normalized autocorrelation and see what comes about…
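The circular autocorrelation of a real series is symmetric about lag zero. A 288-sample series therefore needs only lags 0 through 144, which gives the 145 dimensions used in this part. The sketch below computes that half via the FFT and optionally divides by the zero-lag value, which in this unscaled form equals the sum of squares of the series. The function name and the guard against an all-zero series are illustrative additions, not necessarily how the author computed it.

```python
import numpy as np

def autocorrelation_feature(x, normalize=False):
    """Lags 0..N//2 of the circular autocorrelation of x.

    With normalize=True the result is divided by its zero-lag value, so that
    only the shape of the structure remains, independent of interval size.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.fft(x)
    r = np.real(np.fft.ifft(spectrum * np.conj(spectrum)))[: len(x) // 2 + 1]
    if normalize and r[0] > 0:   # r[0] is zero only for an all-zero series
        r = r / r[0]
    return r
```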

# An analysis of the 365 double jigs in O’Neill’s, pt. 5

This here is part 5 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here, part 2 is here, part 3 is here, and part 4 is here. Today I will begin to look more closely at the time-interval series of the tunes in the collection.

I first plot all 1,712 8-measure time-interval series from this collection and just look at them to get a sense of what kinds of structures appear. I see some that look like that of jig #89 (“The boys of the town”):

The legend refers to the sections: 1 and 2 are the first and second repeats of the A part, and 3 and 4 are the first and second repeats of the B part. To help with readability I have added some slight offsets in x and y.

The first thing that comes to my mind is this:

I loved that gum when I was a kid. The first minute of each piece was glorious! That picture makes my mouth water.

Anyhow, the second thing that comes to my mind is the curious delay between the last two sections (red and green lines). Peeking at the underlying transcription shows how this delay arises:

All sections have an anacrusis, but the last measure of the first ending of the B part is a full measure. So the delay we see in the time-interval series comes from a counting mistake. We can correct it simply by removing the B quaver in that last measure. I find about 15 more of these counting mistakes, and so correct them as best I can, reprocess the data, recreate all the features, and plot again.

Let’s have a look at some of the interesting time-interval patterns I see. Here’s the time-interval plot for jig #56 (“The humors of Cappa”):

This shows both parts of the tune share the same intervals in measures 3&4 and 7&8, but do something different in measures 1&2 and 5&6. Here’s the dots confirming that observation:

This kind of repetition results in a clear tune structure, and a strong coherence between the parts. If I were to render this as a poem, it would be:

*research blogging, at home, on-line*

*research blogging, COVID-19*

*facebook tweeting, at home, on-line*

*facebook tweeting, COVID-19*

Here’s the time-interval series for jig #69 (“Philip O’Neill”):

The two parts to this tune echo the same final two measures, and share a bit of the middle section, but otherwise do different things. Here’s the dots to confirm:

Here’s the time-interval series for jig #101 (“The idle road”):

Both parts share the last half, but are otherwise different. Look at all that bouncing up and down! Here’s the dots:

I imagine a fiddle player in a horse-drawn cart on a bumpy road. It’s curious that O’Neill has notated broken rhythms explicitly. Perhaps the player from whom he transcribed this exaggerated the jig rhythm there. In this classic recording of the tune, Joe Burke (accordion) ignores that and plays the jig quite evenly, with the others following suit:

Jig #148 (“The Kinnegad slashers”) is a three part jig with the following time-interval series:

We see a strong relationship between parts 1&2 (A) and 5&6 (C). The B part does something different until its last four measures. The B part also appears more constrained in its use of large intervals, except for the octave leap in its fourth measure. Here’s the dots to confirm:

My perusal of these time-interval series inspires a few questions.

What tune features a time-interval series that spends most of the time at zero? Apparently there are two: the A part of jig #69 (“Philip O’Neill”):

and the B part of jig #331 (“The foot of the mountain”):

Sorting the series according to the time spent on a zero interval results in the following graph:

I think the height of the stair steps comes from using a sampling rate of 6 samples per quaver. There are apparently several tunes that spend no time at zero intervals. One of these is jig #82 (“Doherty’s fancy”):

Another question to ask is which tune has a time-interval profile with the most positive mean. In other words, which tune spends most of its time at pitches arrived to by positive intervals? It appears to be jig #96 (“Our own little isle”):

The leap from the D quaver to the g dotted crotchet (an interval of 17 semitones) seems to be contributing a lot to this, even though most of the tune is going downwards.

I find 268 of the jigs in the collection feature a section with a positive mean time-interval profile, and 246 have a section with a negative mean time-interval profile. 110 jigs have a section with a mean time-interval profile exactly equal to zero. One is jig #17 (“The eavesdropper”):

Another question to ask is which tune has a time-interval profile with the smallest variance. That prize goes to the A part of jig #84 (“Wellington’s advance”):

There are several semitone intervals in the A part. The jig with the largest variance is #257 (“The Morgan Rattler”):

It’s easy to see why that’s the case.
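The three statistics behind these questions (fraction of time at a zero interval, mean, and variance) come straight from a sampled series. Every sample covers the same duration, so the fraction of samples equal to zero is the fraction of time spent at pitches reached by unison. Here is a sketch with a hypothetical function name:

```python
import numpy as np

def interval_statistics(series):
    """Summary statistics of one sampled time-interval series."""
    x = np.asarray(series, dtype=float)
    return {
        "fraction_at_zero": float(np.mean(x == 0)),  # share of time after unisons
        "mean": float(x.mean()),                     # > 0: more time after upward intervals
        "variance": float(x.var()),                  # spread of the intervals, time-weighted
    }
```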

Let’s picture all 1,712 time-interval series in the collection:

Here’s a plot showing this collapsed across the series:

We can see that the time spent at pitches arrived to by steps of ±2 semitones (major second) is greater than the time spent at pitches arrived to by ±1 semitone (which makes sense because most of the intervals in a scale are 2 semitones, and much of the melodic motion in these melodies is stepwise). We also see that the time spent after steps of -3 (minor third) and -4 (major third) semitones is greater than the time spent after steps of +3 and +4. However, more time is spent after an interval of +5 (perfect fourth) than -5 semitones. Spending time at pitches arrived to by intervals greater than a perfect fifth is rare, but if you find yourself at a pitch after an octave leap, expect to spend more time there after a leap up than after a leap down.

This look at the collection raises an interesting question: What happens when we break the series into smaller pieces, e.g., units of one-measure length? In that case, we would have at most 13,696 time-interval series of dimension 36. How many unique units are there? How do they relate? Are there “prototype” measures? Might we see each series as a concatenation of these units?
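Splitting an 8-measure series into one-measure units is a reshape. At 6 samples per quaver and 6 quavers per measure, each unit has 36 samples, and 1,712 sections times 8 measures gives the 13,696 units mentioned. Here is a sketch, assuming each series is a NumPy array of length 288 (the function names are hypothetical):

```python
import numpy as np

def measure_units(series, fs=6, quavers_per_measure=6):
    """Reshape an 8-measure series into rows of one measure each."""
    x = np.asarray(series)
    return x.reshape(-1, fs * quavers_per_measure)

def count_unique_units(all_series, fs=6):
    """Number of distinct one-measure units across many series."""
    units = np.vstack([measure_units(s, fs) for s in all_series])
    return len(np.unique(units, axis=0))
```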

# An analysis of the 365 double jigs in O’Neill’s, pt. 4

This here is part 4 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here, part 2 is here, and part 3 is here.

I want to start looking at this collection of transcriptions in terms of intervalic content. I first convert each transcription to a sequence of numbers: the first element is the pitch relative to middle C (60), and the following elements are the successive intervals between pitches. Let me illustrate this with one example. Consider jig #218 (“The Connaughtman’s rambles”), transposed to C minor:

I convert each pitch in the sequence to the pitch space, where middle C is 60, the C# above that is 61, etc. I start each sequence with middle C, or 60. So the first four numbers describing the triplet in the anacrusis are (60, 72, 70, 68). Then I compute the difference between each successive element of the sequence. The beginning of the sequence describing this jig is (12, -2, -2, …). I think the first element of the sequence should describe the first pitch in relation to the pitch class C. So, I map its value to the range [-6,6] by subtracting or adding 12 a number of times. The sequence above then becomes (0, -2, -2, …). Its length is a total of 96*2=192 elements after I make the repeats explicit.
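The pitch-to-interval conversion can be sketched as follows, assuming the pitches are already MIDI note numbers (middle C = 60) with the repeats written out. The function name is hypothetical. The fold `(v + 6) % 12 - 6` maps the opening interval into [-6, 5], so an opening tritone comes out as -6 rather than +6.

```python
import numpy as np

def interval_sequence(midi_pitches):
    """First element: opening pitch relative to C, folded into [-6, 5].
    Remaining elements: successive intervals in semitones."""
    # prepend middle C (60) and take first differences
    seq = np.diff(np.concatenate(([60], np.asarray(midi_pitches))))
    seq[0] = (seq[0] + 6) % 12 - 6   # add/subtract octaves until in range
    return seq.tolist()

# The opening triplet of jig #218 in C minor, pitches as given in the text
print(interval_sequence([72, 70, 68]))   # [0, -2, -2]
```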

I also want to consider the duration of the intervals in quavers. The first triplet squeezes three semiquavers into the duration of two. Since two semiquavers make one quaver (a duration of 1), each note in this triplet has a duration of 1/3 = 0.3333. Hence the sequence of durations begins (0.3333, 0.3333, 0.3333, …). I take the cumulative sum of this sequence, prepending it with 0, in order to get the times at which interval changes occur. So this resulting sequence begins (0, 0.3333, 0.6667, …).
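Exact fractions avoid the rounding in 0.3333 when triplet durations are accumulated. Here is a minimal standard-library sketch (the function name is hypothetical):

```python
from fractions import Fraction
from itertools import accumulate

def onset_times(durations):
    """Times (in quavers) at which each interval takes effect, plus the end time."""
    return [Fraction(0)] + list(accumulate(durations))

triplet = [Fraction(1, 3)] * 3          # three semiquavers in the time of two
times = onset_times(triplet + [Fraction(1)])
```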

To create a piecewise constant function of this data, I perform nearest neighbor regression on a uniformly sampled domain stretching the length of the transcription (sample rate of 9 samples per quaver). Six quavers make a full measure in 6/8 time. The resulting function (which we will call the “time-interval plot”) of “The Connaughtman’s Rambles” appears as below:

Measure 16 divides the first part of this jig from the second part. Negative intervals show the melody moving down in pitch. Positive values show upward motion. Zero shows where the melody is static.
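Here is one way to produce such a sampled series. This sketch uses sample-and-hold, assigning each sample the interval of the most recent note onset, to match the reading "time spent at pitches arrived to by an interval". Nearest-neighbor regression, as described above, may place the step boundaries differently (for example, halfway between onsets). Durations are in quavers, and `fs` is samples per quaver. Onsets are rounded to whole samples, which is exact for triplets whenever `fs` is a multiple of 3 (as 9 and 6 are). The function name is hypothetical.

```python
import numpy as np

def time_interval_series(intervals, durations, fs=9):
    """Sample-and-hold rendering of an interval sequence.

    intervals[k]: interval (semitones) arriving at note k
    durations[k]: duration of note k in quavers
    fs: samples per quaver
    """
    intervals = np.asarray(intervals)
    # onset of each note (and the end time), in whole samples
    onsets = np.rint(np.concatenate(([0.0], np.cumsum(durations))) * fs).astype(int)
    samples = np.arange(onsets[-1])
    # index of the most recent onset at or before each sample
    idx = np.searchsorted(onsets, samples, side="right") - 1
    return intervals[idx]
```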

Now, what can we do with the time-interval plot? Integrating it should reflect the pitch profile of the melody. The graph below shows the result of this. The second part of “The Connaughtman’s Rambles” is in a higher register than the first part.
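Numerically, the integral is a running sum scaled by the sample period 1/fs. Each interval is weighted by how long the resulting pitch is held, so the curve is a duration-weighted relative of the pitch contour rather than the contour itself. Keep this in mind when reading the trends. Here is a sketch (the function name is hypothetical):

```python
import numpy as np

def integrate_series(series, fs):
    """Running integral of a sampled time-interval series, in semitone-quavers."""
    return np.cumsum(np.asarray(series, dtype=float)) / fs
```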

Let’s compute the circular autocorrelation of the time-interval plot. This can be done simply by point-wise multiplying the Fourier transform of the sequence with its conjugate, and then returning to the time domain by an inverse Fourier transform. Here’s the resulting plot:

We see positive peaks at lags of 2, 4 and 8 measures, which makes complete sense given how each part of this tune is built from a two-measure core phrase. We see a strong negative peak at a lag of 15 measures. This means that after shifting the tune by 15 measures (or 7 without considering the repeats), or almost a whole part, it resembles a flipped version of itself… i.e., when the melody features positive intervals, the 7-measure shifted version features negative intervals. To see this more clearly, below I show a 7-measure shift of the jig (in its original key) against itself (top). The contrary motion in the intervals at this lag is clear.
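The FFT route described above can be sketched in NumPy as follows. A slow direct sum over circular shifts is included alongside it as a check (the function names are hypothetical). A lag of k samples corresponds to k / (6 * fs) measures, so at 9 samples per quaver one measure is 54 samples.

```python
import numpy as np

def circular_autocorrelation(x):
    """r[k] = sum_n x[n] * x[(n + k) mod N], computed via the FFT."""
    spectrum = np.fft.fft(np.asarray(x, dtype=float))
    return np.real(np.fft.ifft(spectrum * np.conj(spectrum)))

def circular_autocorrelation_direct(x):
    """Same quantity by explicit circular shifts (slow, for checking)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x, np.roll(x, -k)) for k in range(len(x))])
```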

We can also marginalize out the time domain of the time-interval plot, creating a histogram that describes the distribution of intervals weighted by the duration of the pitch arrived to by that interval:

We see that much of the time in this jig is spent at pitches arrived to by unison (interval of zero semitones). The next most time is spent at pitches arrived to by an interval of -2 semitones, i.e., a descending major second. Some time is spent at pitches arrived to by ±5 semitones, which is a perfect fourth up or down. We can also break this histogram over the two parts of this tune:

We might also create a cumulative plot of the intervals in each section:

Each of the above describes the intervalic content of a tune, and how the tune is structured. How could we use this to compare tunes in a collection?
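When working from the sampled series, the duration-weighted interval histogram needs no explicit weighting step. Each sample stands for 1/fs of a quaver, so counting samples per interval value gives the time spent after each interval. Here is a sketch with a hypothetical function name:

```python
import numpy as np

def interval_durations(series, fs):
    """Total time (in quavers) spent at pitches reached by each interval."""
    values, counts = np.unique(np.asarray(series), return_counts=True)
    return {int(v): c / fs for v, c in zip(values, counts)}
```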

Let’s look at another tune that is quite different from “The Connaughtman’s Rambles”. Below is jig #284 (“Kitty of Oulart”), notated in the original key:

Here is its time-interval plot:

Here’s the integration of the intervals:

The downward trend in the first part of the tune shows more time is spent in downward melodic motion, while the upward trend in the other two parts shows that the melody pauses more after upward than downward melodic motion.

Here’s the circular autocorrelation of the time-interval plot:

Due to the conventional structure of this kind of dance music, i.e., repeated 8-measure parts, we expect nearly always to see a strong peak at a lag of 8 measures. Furthermore, it is typical to see each part built from simpler ideas developed over 2 or 4 measures. Here we see a strong peak at 4 measures, showing that the core idea of each part has a length of 4 measures. This is clear from the score. We don’t see a strong peak at a lag of 2 measures, which is unlike “The Connaughtman’s Rambles”.

Finally, here’s the histogram for the entire tune:

And here it is for each part (A, B, C) individually:

Here’s the cumulative plot:

Now that we have considered two jigs from the collection that are quite different, several possibilities are clear.

First, we can take advantage of the fact that every jig in O’Neill’s collection has the conventional structure of multiple sections of 8 measures in 6/8 time. Only one jig in this collection has a single section of 8+8 measures (#95, “The sheep on the mountains”), but this can be treated as two 8-measure sections. This means we can express each tune as a set of time-interval plots computed over each 8-measure section. Each one of these plots will have the same length, so comparisons of them are simple. That length is the sampling rate (Fs samples per quaver) times 6 quavers per measure times 8 measures. So for Fs=9, the dimensionality is 9*6*8=432.

Second, we can look at intervalic relationships within sections of tunes by performing a circular autocorrelation of their time-interval plots, and across sections of tunes by a circular cross-correlation of their time-interval plots. For a single section this can determine whether it is built from a 1-, 2-, or 4-measure core idea. Across sections this can show how they relate, e.g., material returning in subsequent parts.
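The circular cross-correlation uses the same FFT recipe as the autocorrelation, with one spectrum conjugated. With the convention below, a peak at lag k means section y resembles section x delayed by k samples. Every section has the same length (432 samples at Fs=9), so any pair can be compared. Here is a sketch (the function name is hypothetical):

```python
import numpy as np

def circular_cross_correlation(x, y):
    """r[k] = sum_n x[n] * y[(n + k) mod N] for equal-length sections x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("sections must have the same length")
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(y)))
```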

In the next part of this research live blogging, I will explore these ideas, and start to cluster tunes in the collection based on these characteristics.

# An analysis of the 365 double jigs in O’Neill’s, pt. 3

This is part 3 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here and part 2 is here.

Today I am looking at the measure structure of the transcriptions. This means I will reduce each transcription to its measure lines (“|”, “|:”, “:|”, “|1” and “|2”) and analyze the resulting strings. As an example:

```abc
X:1
T:Shandon bells
O:Ireland
B:Francis O'Neill: "The Dance Music of Ireland" (1907) no. 1
R:Double jig
Z:Transcribed by Frank Nordberg - http://www.musicaviva.com
F:http://www.musicaviva.com/abc/tunes/ireland/oneill-1001/oneill-1001-0001.abc
M:6/8
L:1/8
K:D
B|:AFD DFA|ded cBA|BGE EFA|B2A Bcd|\
AFD DFA|ded cBA|Bcd ecA|1d3 dcB:|2 d3d2||
g|f2 d- ded|faa afd|cAA eAA|cBc efg|f2 d- ded|faa afd|Bcd ecA|d3 d2:|
```

The measure structure of this is “|:|||||||1:||2|+||||||||:|”, where I have added the “+” to demarcate the two parts.
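A tokenizer for this step might look like the sketch below. Its rules are an assumption about how these bar-line strings were produced, reverse-engineered from the example above:

- header fields are dropped;
- a double bar "||" counts as a single bar line;
- ":|2" yields both an end-repeat token and a second-ending token.

It does not insert the "+" part separator. The function and pattern names are hypothetical.

```python
import re

# Bar-line tokens, two-character ones first: start repeat, end repeat,
# first/second ending, then the plain bar line.
BAR_TOKEN = re.compile(r"\|:|:\||\|[12]|\|")

def measure_structure(abc):
    """Reduce an ABC tune to its sequence of bar-line tokens."""
    # keep only the tune body: drop header fields such as "X:1" or "K:D"
    body = " ".join(line for line in abc.splitlines()
                    if not re.match(r"[A-Za-z]:", line))
    body = body.replace("||", "|")      # count a double bar as one bar line
    body = body.replace(":|2", ":||2")  # ":|2" ends a repeat and opens ending 2
    return "".join(BAR_TOKEN.findall(body))
```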

We find a total of 70 unique measure structures, which is curiously large. I think we are going to find some transcription errors!

The most common measure structure (140 of the 365 jigs) is: “||||||||:|+|:||||||||:|”. This describes the conventional structure of a repeated eight-measure part (8+8) followed by another repeated eight-measure part, where the second part has an explicit start repeat sign. An example is jig #51 (“O’Sullivan’s march”):

Note the presence of an anacrusis (pickup bar) for each part.

The next most prevalent structure (56) is “||||||||:|+||||||||:|”, where the start repetition on both parts is implicit. An example is jig #2 (“The piper’s picnic”):

The third most prevalent structure (22) is “||||||||:|+|:|||||1|||:||2|||”, which now shows tunes whose second parts have two endings, each more than one measure long. An example is jig #67 (“Connie the soldier”):

Curiously, this tune appears differently in O’Neill’s 1905 collection of 1850 melodies (#794):

The first part is missing an anacrusis, and so the last measure of the A part is short by a quaver.

The fourth most prevalent structure (20) is “|||||||:|+|:|||||||:|”, which is like the most prevalent structure we see, but here neither part has an anacrusis. An example is jig #69 (“Philip O’Neill”):

The fifth most prevalent structure (13) is “||||||||:|+||||||||:|+||||||||:|”, which is a jig with three parts having an anacrusis for each. An example is jig #4 (“The yellow flail”):

The sixth most prevalent structure (12) is also a three part jig with an anacrusis, but with beginning repeat signs made explicit: “||||||||:|+|:||||||||:|+|:||||||||:|”. An example is jig #98 (“The flaxdresser”):

Next most prevalent (9) is a three part jig without an anacrusis: “|||||||:|+|:|||||||:|+|:|||||||:|”. An example is jig #191 (“The Limerick tinker”):

The seven most prevalent measure structures account for 272 of the 365 jigs, nearly 75%. Let’s look at the rest.

In eight transcriptions we see: “|||||||:|+|||||||:|”, which is 8+8 without an anacrusis, and implicit start repeat signs. In 8 transcriptions we see “||||||||:|+|||||1|||:||2|||”, like the third most prevalent structure but with an implicit start repeat sign on the second part. An example is jig #13 (“The humors of Bantry”):

There are five transcriptions with four parts specified like so: “||||||||:|+|:||||||||:|+|:||||||||:|+|:||||||||:|”, and three specified as “||||||||:|+||||||||:|+||||||||:|+||||||||:|”, where the start repeat signs are implicit for each part.

The eleventh most prevalent measure structure (3) is “|:|||||||:|+|||||||:|”, an example of which is jig #18 (“Saddle the pony”):

These cases have an explicit start repeat sign on the first part, but not on the second. There are only two transcriptions that specify a start repeat sign on both parts, i.e., “|:|||||||:|+|:|||||||:|”.

Only two transcriptions specify 8+16: “||||||||:|+||||||||||||||||”, e.g., jig #136 (“Father Jack Walsh”):

One transcription specifies a repeat after the second ending: jig #305 (“Delaney’s drummers”):

This is in the printed version of O’Neill’s 1001, but I wonder if it should be there. It’s not played like that by the brilliant Martin O’Connor (his playing shows nice improvisation around the bones of the tune). I think that repeat sign should be removed.

Only one transcription specifies first and second endings for all three parts: jig #329 (“Humors of Clare”):

Jig #313 (“The frost is all over”) is printed like so:

I think the start repeat sign on the second part is wrong because that would make it a very unconventional structure. Instead, I would notate this tune like so:

One transcription appears to not have any repeat measure lines: jig #95 (“The sheep on the mountains”):

That symbol occurring at the beginning and end is a “segno”. The last occurrence should really say “D.S.” or “dal segno”, which instructs the player to return to the segno and play on from there. This jig doesn’t appear to have ever been recorded.

After all of the above corrections and a few others, I reran my tokenization script and recomputed the unique measure structures, which now number 67. Many of them are equivalent, since there are a variety of ways to specify the 8+8 structure: with and without anacrusis, with two different endings, etc. This calls for normalizing the transcriptions to bring their representations closer together, but first we will replace the tokens consisting of two characters with a single character. So we will make “|:” be “S”, “:|” be “E”, “|1” be “1”, and “|2” be “2”.

Since “||||||||ES||||||||E” is most prevalent, and “||||||||E||||||||E” is the second most prevalent, I will rewrite the 56 expressed in the latter form as the former. I will also change the five transcriptions with the structure “|||||||E|||||||E” to “|||||||ES|||||||E”, and the three with “||||||||E||||||||E||||||||E||||||||E” to “||||||||ES||||||||ES||||||||ES||||||||E”, etc. After this normalization, I end up with 48 unique measure structures. Of the 365 transcriptions in this collection, 293 (>80%) are expressed by 5 measure structures.
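The single-character substitution is safest when done on the list of bar-line tokens rather than on the joined string. In the joined string, a plain bar followed by an end repeat ("|" then ":|") contains the substring "|:" and would be misread. Below is a sketch of the substitution and of one normalization rule consistent with the examples given. The rule inserts "S" after an end repeat that is directly followed by a plain bar line, i.e., where a new part starts without an explicit start repeat. That rule and the function names are assumptions, not necessarily the author's script.

```python
import re

TWO_CHAR = {"|:": "S", ":|": "E", "|1": "1", "|2": "2"}

def compact(tokens):
    """Map each bar-line token to one character (a plain "|" stays "|")."""
    return "".join(TWO_CHAR.get(t, t) for t in tokens)

def normalize(structure):
    """Make implicit start repeats explicit: "E|" becomes "ES|" (assumed rule).

    An "E" followed by "S", by "2", or by the end of the string is left alone.
    """
    return re.sub(r"E(?=\|)", "ES", structure)
```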

Now we will look at computing the normalized DL similarity of these normalized measure sequences. Here’s the resulting similarity matrix:

Collapsing this matrix along an axis gives the mean normalized DL similarity for each transcription.
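This sketch assumes that "DL" stands for Damerau-Levenshtein distance. The post doesn't give the variant or the normalization. The sketch uses the optimal-string-alignment (restricted) variant and the similarity 1 - d / max(len(a), len(b)), which lies in [0, 1]. Both choices, and all function names, are assumptions. Averaging each row of the matrix, as described above, gives the mean similarity per transcription.

```python
import numpy as np

def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions, each costing 1."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

def normalized_similarity(a, b):
    """1 minus the distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest

def similarity_matrix(structures):
    """Symmetric matrix of pairwise normalized similarities."""
    n = len(structures)
    S = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = normalized_similarity(structures[i], structures[j])
    return S
```

The per-transcription means are then `similarity_matrix(structures).mean(axis=1)`.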

Now we have a nice inverse relationship between the length of a tune and its mean normalized DL similarity:

Performing multidimensional scaling on the dissimilarity matrix, we find clusters of transcriptions based on their structures. (The numbering in the image below has a shift of one. So #258 is actually jig #257.)

There appear to be a few major clusters. Below the x-axis lie 282 tunes, about 77% of the collection. The three major clusters below the x-axis from left to right are jigs with measure structures “||||||||ES||||1|||E2|||”, then “||||||||ES||||||||E”, and then “|||||||ES|||||||E”. The two clusters around y=0.2 are of jigs with structures “||||||||ES||||||||ES||||||||E” and “|||||||ES|||||||ES|||||||E”. The cluster at (-0.2,0.4) is of jigs with structures “||||||||ES||||||||ES||||||||ES||||||||E”. So it seems here that the jigs with more than two parts are above the x-axis.
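The post doesn't say which MDS variant was used. As a stand-in, here is classical (Torgerson) MDS in NumPy, applied to a dissimilarity matrix such as D = 1 - S. Metric MDS by stress minimization (e.g. SMACOF) may give a different layout. The function name is hypothetical, and the axes of any MDS embedding are only defined up to rotation and reflection.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n items in k dimensions from an n x n dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]          # keep the k largest
    scale = np.sqrt(np.clip(eigvals[top], 0, None))
    return eigvecs[:, top] * scale               # n x k coordinates
```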

Tomorrow we will start looking at more musically meaningful comparisons, e.g., comparing pitch and intervallic content.