# An analysis of the 365 double jigs in O’Neill’s, pt. 9

This is part 9 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. The last part reviews where I have been. In this part I look at the time-pitch series of the collection. I create these series by single nearest neighbor regression on tuples of pitch and time observations extracted from a transcription. As an example, here is jig #201 (“Biddy’s wedding”):

Its four pitch-time series appear like so:

This feature has a clear relationship to the transcription because it shows which pitch occurs at what time over 8-measure segments. I can extract a time-interval series from these series by moving stepwise along time and finding and holding subsequent differences. The time-interval series for “Biddy’s wedding” appears like so:

The step of 5 semitones for series 2-4 comes from the G pitch at the end of each line. I always set the first interval of the first series to zero.

Here is the transcription I found at the center of a multidimensional scaling of the collection of transcriptions, jig #134 (“Young Tim Murphy”):

And here is its time-pitch series:

The two parts appear quite different save for the last two measures. Here is the time-interval series I extract from this:

In terms of intervals, we see the two parts are similar in measure 4 as well.

One difference between the two jigs above is the anacrusis. This in effect shifts to the right each series of #134 with respect to those of #201. If I am comparing only the series extracted from one transcription, there’s no problem since they all have the same shift. But if I want to compare series across transcriptions, some with an anacrusis and others without, I need to account for the shifts, i.e., align the measures. This will be important to consider when looking at tunes as sequences of measures.

The music21 library provides an easy way to detect an anacrusis, so I have rewritten my feature extraction code such that all series are aligned by measure. Let’s continue looking at the time-pitch series of the collection.

The dots of jig #17 (“The eavesdropper”) are:

and result in the following time-pitch series:

Note that middle C is pitch 60, but I have transposed all jigs in this collection to have a root of C. Here is the time-interval series I extract from the time-pitch series:

One major difference between extracting the time-interval series from the time-pitch series and how I was doing it before is that this new approach considers repeated pitches as one. So the run of B quavers in the first measure is grouped together in an interval of 4 semitones over 3 quavers. I think this is preferable from the standpoint of considering melody. Playing 3 quavers in place of a dotted crotchet does not change the melody other than its rhythmic characteristic.
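The difference-and-hold extraction can be sketched in a few lines of Python. This is a minimal sketch and not my actual extraction code: the function name `time_interval_series` and the toy pitches are made up for illustration, and it assumes the time-pitch series is already sampled on a uniform grid.

```python
def time_interval_series(pitch_series, start_interval=0):
    """Turn a sampled time-pitch series into a time-interval series.

    Walk along the samples; whenever the pitch changes, take the difference
    to the previous pitch and hold it until the next change. Repeated pitches
    therefore extend the current interval instead of producing a unison.
    """
    intervals = []
    held = start_interval        # assumed value held before the first change
    previous = pitch_series[0]
    for pitch in pitch_series:
        if pitch != previous:
            held = pitch - previous
            previous = pitch
        intervals.append(held)
    return intervals


# Hypothetical fragment with one sample per quaver:
# a G (67), three repeated B quavers (71), then two G quavers (67).
example = time_interval_series([67, 71, 71, 71, 67, 67])
print(example)
```

Because the held interval is only updated when the pitch changes, the three repeated quavers carry a single interval of 4 semitones for three quavers.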

(This motivates extracting a “time-duration” series from a transcription to describe its rhythmic characteristics. Instead of looking at what pitch is playing when, look at what duration is playing when. Ignoring graces, rolls, and trills, the collection has only pitches of seven durations. From shortest to longest these are triplet semiquaver, semiquaver, triplet quaver, quaver, dotted quaver, crotchet, and dotted crotchet. I will explore this additional feature at a later time… but keep in mind that the features I am extracting are not exemplary of how these tunes are experienced in performance. These are just the bones of the tune as it was in someone’s hand in the early 20th century, without any meat, flesh or movement.)

In the time-pitch series for “The eavesdropper”, we also see how its B part departs from the A part by going higher in pitch, and then descends back to join it. A typical feature of two-part jigs in this collection is that the B part sits above the A part in pitch. To get an idea of how typical it is, let us sum the set of differences between time-pitch series 3 and 1, and of 4 and 2 for each two-part jig in the collection (N=291), and make a histogram of them:

A positive difference means part B of a tune spends more time at pitches higher than part A. I find 268 of the 291 two-part jigs (>92%) have a positive difference. The two-part jig that has the largest difference is #190 (“O’Mahony’s frolics”):

Here are its time-pitch series:

Notice how the first ending of the B part stays high, and the second ending takes the melody down back home.
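The register comparison behind the histogram above can be sketched as follows. This is a minimal sketch under assumptions: the names `register_difference` and `count_positive` are hypothetical, the four series of a two-part jig are assumed to be equal-length lists in the order A, A, B, B, and the toy pitches are invented.

```python
def register_difference(series):
    """Sum the sample-wise differences of series 3 minus 1 and 4 minus 2.

    `series` holds the four time-pitch series of a two-part jig:
    the two repeats of part A followed by the two repeats of part B.
    """
    a1, a2, b1, b2 = series
    first = sum(b - a for a, b in zip(a1, b1))
    second = sum(b - a for a, b in zip(a2, b2))
    return first + second


def count_positive(collection):
    """Count the jigs whose part B sits above part A overall."""
    return sum(1 for series in collection if register_difference(series) > 0)


# Invented toy jig with three samples per series.
toy = [[60, 62, 64], [60, 62, 60], [67, 69, 71], [67, 65, 60]]
print(register_difference(toy))
```

Since the series are uniformly sampled, the sum weights each pitch difference by how long it lasts, which is why a positive value means part B spends more time above part A.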

Of the 23 two-part jigs with a negative difference, the most negative one is #57 (“The blazing turf fire”):

Here are its time-pitch series:

What happens in jigs with more than two parts? Here’s the time-pitch series of the four-part jig #286 (“Strop the razor (2nd setting)”):

We see the melody goes highest in the penultimate part (series 5&6). We see the same in the three-part jig #320 (“The piper’s welcome”):

This is not the case in the three-part jig #344 (“The stolen purse”):

Another interesting feature I see in some tunes is contrary motion of the parts, e.g., jig #223 (“The rambler from Clare”):

The time-pitch series show this “mirror image” effect:

This is probably not an accidental feature, but done consciously or planned in composition. Jig #237 (“The Fardown farmer”) has the same kind of construction:

Here are its dots:

The A part of this jig and the A part of “The rambler from Clare” are so similar it makes me wonder if the Fardown farmer was that rambler from Clare.

Other tunes have similar intervalic motion in their parts. Here’s jig #249 (“The flitch of Bacon”):

And here’s the corresponding time-pitch series:

This also shows how I disregard rests in my extraction of the time-pitch series, just extending the duration of the pitch preceding it.

# An analysis of the 365 double jigs in O’Neill’s, pt. 8

This is part 8 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. It’s time for a breather. Let’s have a review!

1. Part 1 discusses O’Neill’s collection of jigs, and how I have normalized the transcriptions expressed with ABC notation. I use the normalized Damerau-Levenshtein distance (DL distance) to compare the transcriptions as strings, which locates some “duplicates” and variations, as well as several errors in the transcriptions. I find that the normalized DL distance provides sensible results.
2. Part 2 looks at the similarity matrix created from the normalized DL distance between all pairs of transcriptions. I analyze some of the pairs that have very large distances. I also perform some multidimensional scaling of the collection with the similarity matrix and look at the transcriptions that are at the center of the cluster. Finally, I observe that applying string edit distances to ABC notation is musically naive, e.g., “DEFG2G” in C major and “DEFG2G” in C minor are different.
3. Part 3 reduces the transcriptions to sequences of measure tokens and looks at the different measure structures present in the collection. This uncovers more errors in the transcriptions, and leads to further normalization of the collection. Performing multidimensional scaling on the reduced sequences creates sensible clusters.
4. Part 4 converts each transcription into “time-interval series”, which describes the intervalic “profile” of the melody. I explore other series derived from this representation by integration, circular autocorrelation, and marginalization (integrating out time). It is clear that the transcriptions in this collection have a well-defined structure having sections of eight measures, which motivates comparisons of features extracted from these sections, and smaller subsections of 1, 2 and 4 measures.
5. Part 5 inspects several 8-measure time-interval series in the collection, and gives a broad sense of the intervalic structures of the collection. I also find more transcription errors. I look at transcriptions with time-interval series that have specific statistical characteristics. I also look at the collection as a whole and find some interesting trends, e.g., time spent at pitches arrived to by a perfect fourth up is longer than vice versa.
6. Part 6 looks at clustering the 1,712 8-measure time-interval series of the collection. I analyze the centroids, and the distributions of distances to these. I transform some centroids to transcription sequences, which do not resemble any of the tunes in the collection. I also begin to inspect the circular autocorrelation of the time-interval series, which I believe are more indicative of the melodic structure in a transcription, e.g., revealing repetitions within a series.
7. Part 7 looks at clustering the 1,712 circular autocorrelations of the 8-measure time-interval series. I analyze the centroids, which make more musical sense to me than the centroids created from the time-interval series. The structure of a melody is more apparent in these representations, but there are some details that need to be worked out.
8. Part 8 reviews where we have been, and some questions that remain open. I also look at the sensitivity of a time-pitch series to subtle transformations of the originating transcription.
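For reference, the string comparison from Part 1 can be sketched like this. It is a minimal sketch under assumptions: I do not state above which Damerau-Levenshtein variant or normalization is used, so this uses the restricted (optimal string alignment) variant and divides by the length of the longer string. The function names are hypothetical.

```python
def osa_distance(s, t):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and transpositions of
    adjacent characters, each with cost 1.
    """
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def normalized_distance(s, t):
    """Distance divided by the longer length (assumed normalization)."""
    longest = max(len(s), len(t))
    return osa_distance(s, t) / longest if longest else 0.0


print(normalized_distance("DEFG2G", "DEGF2G"))
```

The example also illustrates the musical naivety noted in Part 2: the distance only sees characters, so swapping two notes costs the same regardless of key or mode.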

I have a growing list of open questions:

1. A multidimensional scaling of the transcriptions according to their normalized DL distances places a few transcriptions closest to the center of the cluster: jigs #134 (“Young Tim Murphy”) and #296 (“Barney O’Neill”). How stable is that position? What is the significance of those transcriptions in that position? What does that position mean musically speaking, if anything? (Perhaps this is not worth investigating given the lack of musical meaning of a string edit distance between ABC transcriptions.)
2. A number of features have been proposed that express the musical content of a transcription in eight-measure sections: 1) (mean-centered) time-interval series; 2) (normalized) circular autocorrelation of time-interval series; 3) integral of time-interval series; 4) time-marginalization of time-interval series; 5) histogram of time-marginalization of time-interval series. What about expressing the normalized melody (transposed to root C) as a time-pitch series? What is the musical significance of each of these features?
3. K-means clustering of the circular autocorrelation of time-interval series shows some sensible results, e.g., finding eight-measure series that are structurally similar. What changes when we perform K-means clustering on normalized circular autocorrelations (that is, dividing each by the value at zero lag)?
4. If we break the time-interval series into units of one-measure duration, how many unique units are there? How do they relate? Are there “prototype” measures? Might we see each eight-measure series as a concatenation of these “codebook” units?
5. My explorations so far show how we can analyze a collection of transcriptions. Can these approaches be used to compare two collections of transcriptions? Say, O’Neill’s collection with another collection of supposed jigs, say computer-generated, hmm? Hmmmm?

Near the conclusion of the last part, I noticed something that needs more thought. Let’s look at jig #201 (“Biddy’s Wedding”):

This is a very simple tune. Harmonically both parts are: I-I-I-V-I-I-IV-V. The A part is built from two-measure bits like so: abac. The B part is just a variation: a’b’a’c. “Filling in” crotchet-quaver pairs with passing tones or chord tones, or removing those, does not change the melody. But the time-interval series show these as major changes:

The c part in measures 7&8 is clearly identical. The b and b’ parts appear quite close as well, except for the big long-duration jump of 7 semitones in b’. However, the relationship between the a and a’ parts is not clear. Performing a correlation of these parts of the series would involve a multiplication by a string of zeros, which would reduce its value.

The circular autocorrelation of the time-interval series of this tune suggests its two parts are not closely related:

From looking at the transcription, I expect both parts of this tune to produce large peaks at a lag of 2 and 4 measures, which we see. But the half-measure peaks in the B part (lines 3&4) are curious, as are the small peaks for the A part at some other fraction of a measure.

Let’s do an experiment to see how robust these features are. I will slightly modify the transcription as below and recompute the time-interval series and its circular autocorrelation:

I have added an anacrusis to each part, and have filled in the crotchet-quaver pairs. Here’s the time-interval series for these parts:

The circular autocorrelation of these are:

The differences with the original features do not appear to be that great, which is a good sign. I still see that curious structure in the A part.

If we make the arpeggiation of I in measures 2&6 of the A part go downward like so:

the circular autocorrelations of the time-interval series become more similar:

I don’t think such a minor change to the transcription should result in a major change of high-level features extracted from it. This points to the fact that the time-interval series are too detailed to make meaningful comparisons of melodic structure.

I think I have to return to basics and look at representing the melody as a time-pitch series, and how this might be transformed into a feature that more clearly expresses structure.

# An analysis of the 365 double jigs in O’Neill’s, pt. 7

This is part 7 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here, part 2 is here, part 3 is here, part 4 is here, part 5 is here, and here is part 6.

Now let us look at the results of k-means clustering of the circular autocorrelations of the 1,712 time-interval series. I start with a single cluster and look at the centroid and the distribution of distances to it. Here is the 145-dimensional centroid:

That looks pretty good. The high value at zero lag suggests this is a sequence with some large time-intervals. The peak at a lag of four suggests that half of the series strongly resembles the other half. The peak at two suggests that the series is built from a two-measure bit. And so on. Let’s look at the distribution of Euclidean distances to this centroid:

The median of this distribution is around a distance of 104. The largest Euclidean distance we see is about 996, and the smallest is 65. The series furthest from this centroid is in jig #257 (“The Morgan Rattler”), which we keep seeing is a very unique jig in this collection. The series closest to this centroid comes from jig #155 (“Jackson’s rambles”). Here’s its circular autocorrelation:

It looks like part A of this tune contributes the matching series. The dots below show this part has a four-measure structure, and some repetition of intervals at the two-measure level:

Here’re the centroids coming from K-means with two clusters:

And here are the distance distributions:

There are 1479 series in cluster 2, but only 233 in cluster 1. The Euclidean distance between the two centroids is about 200.
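For readers who want to try this kind of clustering, here is a minimal sketch of K-means (Lloyd's algorithm) with Euclidean distance in plain Python. It is an illustration under assumptions and not the code behind the plots: the function names, the random initialization with a fixed seed, and the toy two-dimensional data are all made up, and a library implementation would be the more usual choice for 1,712 series of 145 dimensions.

```python
import math
import random


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def k_means(vectors, k, iterations=100, seed=0):
    """Alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:      # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:               # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return centroids, labels


# Invented toy data: two well-separated groups of three points.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = k_means(toy, k=2)
sizes = [labels.count(c) for c in range(2)]
gap = euclidean(centroids[0], centroids[1])
print(sizes, gap)
```

The cluster populations and the distance between centroids reported above correspond to `sizes` and `gap` in this sketch.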

Let’s try four clusters. Here are the centroids (x-offset is just for display):

Now we can see centroid 1 (population is 209) has to do with time-interval series with similarities at the half-measure-level, centroid 4 (pop. = 98) has to do with time-interval series with similarities at the measure-level, centroid 2 (pop. = 202) has to do with time-interval series with similarities at the two-measure-level, and centroid 3 (pop. = 1102) is perhaps something to do with similarities at the four-measure level.

Here’s eight centroids:

And the distances within each cluster.

Cluster 8 is the most populated, with 697 series; but cluster 6 has only 4. Let me guess: those come from “The Morgan Rattler”… Indeed, I see series from #257. But also #154 (“The Antrim lasses”):

Here’s the autocorrelation of its time-interval series:

The B part of this tune shows the same structure we see in centroid 6.

There are 48 series in the cluster described by centroid 5 coming from 22 jigs: #6, 18, 23, 30, 56, 71, 82, 117, 125, 126, 127, 172, 178, 183, 186, 201, 204, 258, 274, 287, 291, 343. These should have sequences with repetition at the half measure. Let’s look at two. Jig #18 (“Saddle the pony”):

and jig #201 (“Biddy’s wedding”):

Looking at the autocorrelation of their time-interval series shows their similarity in this domain (the first is “Saddle the pony”):

Even the other two parts look related! So time-intervalically speaking, we can see why these sections would be grouped together. However, the melodies of these jigs are not very similar.

I have searched the web for people playing these tunes, but there appear to be none! All the performances I can find of “Saddle the pony” are actually the jigs “The Priest’s Leap” (#59) and “The Draught of Ale” (#156) in O’Neill’s 1001 (identical tunes). And “Biddy’s wedding” doesn’t appear to have been recorded anywhere. So learn to play them I have:

Here’s one time through “Saddle the pony” in O’Neill’s 1001 on The Black Box:

Here’s one time through “Biddy’s wedding” from O’Neill’s 1001 (but played in G):

And now I find something curious! “Saddle the pony” appears in O’Neill’s 1850 as two settings, both in A major. The second setting is the one appearing in O’Neill’s 1001, but with a dropped seventh (A mixolydian):

Why didn’t O’Neill include both settings in his 1001? And where did the G sharp go? I do think the flattened seventh sounds more Irish.

Update: 20200402

My teacher Paudie O’Connor says the G sharps might occur in Donegal, but that the 1001 version plays well as written. There is a four-part jig called “Langstrom’s Pony” that has as its first two parts this version. Here’s De Danann playing the tune:

# An analysis of the 365 double jigs in O’Neill’s, pt. 6

This is part 6 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here, part 2 is here, part 3 is here, part 4 is here, and part 5 is here.

Let’s do some clustering of the time-interval profiles in the dataset. I compute these profiles with a sampling rate of 6 samples per quaver, over 8-measure sections, which gives them length 6*6*8 = 288. Ima take all 1,712 series in $\{-17,\ldots,21\}^{288}$ and cluster them. All of the time-interval series have this range (and they are all integers since the semitone is the smallest division of the octave in equal temperament). The jump up of 21 semitones (an octave and a major sixth) occurs only in jig #330 (“The queen of the fair”):

The largest leap down of -17 semitones occurs in three jigs, one of which is #36 (“Father Dollard’s favorite”):

In fact, 362 of the jigs in this collection of 365 have a section where all intervals lie within $[-12,12]$. The only three jigs that don’t are #36 above, #13 (“The humors of Bantry”):

and jig #355 (“The lasses of Dunse”):

If we were to treat these series as one cluster in $\{-17,\ldots,21\}^{288}$, which one lies closest to the centroid? Here’s what the centroid looks like (after projecting it to $\{-17,\ldots,21\}^{288}$ by rounding each dimension):

This would turn into a rather boring melody, but it is interesting to note that the beginning of the series consists of ascending intervals, and the conclusion is descending intervals to unison.

What is the distribution of distances between all series to this centroid? Below is a histogram of the Manhattan distances of all the series to this centroid:

The numbers are so large because we are computing differences between intervals six times for every quaver, and there are 48 quavers in an 8-measure section. The series closest to this centroid with a distance of 395 semitones is jig #97 (“The straw seat”):

I can see the resemblance of the first two sections to the centroid.
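The single-cluster computation can be sketched as below: average all series sample by sample, round back to integers, and measure Manhattan distances to the result. This is a minimal sketch with hypothetical function names and invented three-sample series standing in for the 288-sample ones. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding used for the plot.

```python
def rounded_centroid(series_list):
    """Sample-wise mean of all series, rounded back to whole semitones."""
    n = len(series_list)
    return [round(sum(column) / n) for column in zip(*series_list)]


def manhattan(u, v):
    """Sum of absolute sample-wise differences, in semitones."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Invented toy "collection" of three very short series.
collection = [[0, 2, -2], [2, 2, -4], [1, 5, -3]]
centroid = rounded_centroid(collection)
distances = [manhattan(series, centroid) for series in collection]
closest = min(range(len(collection)), key=distances.__getitem__)
print(centroid, distances, closest)
```

With 288 samples per series, a series that differs from the centroid by just one semitone everywhere already has a distance of 288, which puts the 395 above into perspective.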

The jig with the section furthest away is jig #257 (“The Morgan Rattler”) which we saw last time has the largest variance in its time-interval series.

What if we perform K-means with four clusters? Here are the resulting centroids:

These are more interesting in terms of intervalic content. (The numbering of the centroids is not important.) Let’s convert them into notation to get a better feeling of the melodic content:

None of these resemble in the least a jig. But nothing to fear: below we see the distributions of distances within each of these clusters.

All the tunes in the collection have time-interval series that are relatively far away from the centroids.

I can increase the number of clusters and see how the centroids and distributions of distances change, but what should I expect? Not “prototype” series that are musically meaningful. If I increase the number of clusters to 40, I begin to see clusters with only a few series in them. Increasing beyond that, the number of clusters consisting of only two series increases. At around 800 centroids, clusters of one series begin appearing.

It doesn’t make sense to cluster 8-measure time-interval series. Breaking the series into single measures and then clustering those smaller units makes more sense to me. As does clustering the circular autocorrelation of the 8-measure time-interval series. Then in some sense we are clustering series based on their time-interval structure, e.g., structures of 2, 4 or 8 measures.

Let’s have a look at some of the 145-dimensional autocorrelations of time-interval series. Here they are for jig #333 (“Miss Downing’s fancy”):

By far the largest peak is at zero. Some smaller peaks around 2 and 3 measures suggest some repetition of features of that length, but no direct repetition. Here’s the dots to see what is going on:

We see both parts feature repetitions of some measures, but with variations that make the structure of each more complex.

On the contrary, here is the circular autocorrelation of the time-interval series for jig #344 (“The stolen purse”):

This suggests parts A and B in this tune are built from four-measure sections, but part C is an eight-measure section. The dots show this to be the case:

Here’s another for jig #17 (“The eavesdropper”):

I predict that the first part is built from an intervalic structure of a single-measure length, but the second part has a structure that is four measures in length. Here’s the dots confirming that prediction:

Here’s the autocorrelation for jig #56 (“The humors of Cappa”):

Both parts of this jig seem to be built from half-measure intervalic structures, but the first part more strongly so. The dots show this to be the case, taking into account the anacrusis:

The autocorrelation for jig #71 (“Courtney’s favorite”) shows quite a difference in the structures of its two parts:

As for the previous jig, this A part is built from repeating a structure of half a measure, and since the values of the autocorrelation are so large, I predict the intervals will be large. The B part has much smaller values, and a structure of two and four measures. Since its values are smaller, I predict the B part has smaller intervals. The dots confirm these predictions:

Jig #73 (“Con Casey’s jig”) shows another interesting structure:

The A part seems to have repetitions of material of one third of a measure. The dots show what is going on.

The first measure shows a repetition of two quavers, occurring again in measures three and five.

Having looked through all of these time-interval series autocorrelations, I have a better sense of what the values mean. The value at zero will always be positive, and it grows with the sizes and durations of the intervals in the time-interval series. The series with the largest value (~363) at lag zero is in jig #257 (“The Morgan Rattler”), which we continue to see is quite a unique one in the collection. The jig with the smallest value (~26) at zero lag is #313 (“The frost is all over”):

The B part consists of a lot of stepwise motion and unisons.

A comparison of these autocorrelations will thus be looking at both the structures of the series and the sizes of the intervals. If I normalize each autocorrelation by the value at zero lag, then I will in some sense be comparing structures independent of the size of the intervals. Let’s try clustering by k-means with the autocorrelation and the normalized autocorrelation and see what comes about…
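The effect of normalizing by the zero-lag value can be seen in a small sketch. It assumes a direct (non-FFT) circular autocorrelation for brevity, the function names are hypothetical, and the four-sample series are invented.

```python
def circular_autocorrelation(x):
    """Direct circular autocorrelation: sum of x[i] * x[i + lag] with wrap-around."""
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]


def normalized_autocorrelation(x):
    """Divide every lag by the zero-lag value so that lag 0 equals 1."""
    r = circular_autocorrelation(x)
    if r[0] == 0:
        return [0.0] * len(r)   # an all-zero series has nothing to normalize
    return [value / r[0] for value in r]


small = [2, -2, 2, -2]          # invented series with small intervals
large = [6, -6, 6, -6]          # same shape, intervals three times as big
print(circular_autocorrelation(small)[0], circular_autocorrelation(large)[0])
print(normalized_autocorrelation(small) == normalized_autocorrelation(large))
```

The raw zero-lag values differ by a factor of nine, while the normalized versions coincide, which is the sense in which normalization compares structure independent of interval size.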

# An analysis of the 365 double jigs in O’Neill’s, pt. 5

This here is part 5 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here, part 2 is here, part 3 is here, and part 4 is here. Today I will begin to look more closely at the time-interval series of the tunes in the collection.

I first plot all 1,712 8-measure time-interval series from this collection and just look at them to get a sense of what kinds of structures appear. I see some that look like that of jig #89 (“The boys of the town”):

The legend refers to the sections: 1 and 2 are the first and second repeats of the A part, and 3 and 4 are the first and second repeats of the B part. To help with readability I have added some slight offsets in x and y.

The first thing that comes to my mind is this:

I loved that gum when I was a kid. The first minute of each piece was glorious! That picture makes my mouth water.

Anyhow, the second thing that comes to my mind is the curious delay between the last two sections (red and green lines). Peeking at the underlying transcription shows how this delay arises:

All sections have an anacrusis, but the last measure of the first ending of the B part is a full measure. So the delay we see in the time-interval series comes from a counting mistake. We can correct it simply by removing the B quaver in that last measure. I find about 15 more of these counting mistakes, and so correct them as best I can, reprocess the data, recreate all the features, and plot again.
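A check for this kind of counting mistake might look like the sketch below. It is only a sketch under assumptions: the name `section_is_balanced` is hypothetical, each section is assumed to be given as a list of notated measure lengths in quavers with the anacrusis first, and the only rule tested is that a pickup and the final measure should together fill one 6/8 measure.

```python
FULL_MEASURE = 6  # quavers per measure in 6/8


def section_is_balanced(measure_lengths):
    """Check that the pickup and the final measure add up to one full measure.

    `measure_lengths` lists the duration in quavers of each notated measure
    of a section, starting with the anacrusis if there is one.
    """
    first, last = measure_lengths[0], measure_lengths[-1]
    if first == FULL_MEASURE:          # no anacrusis: last measure must be full
        return last == FULL_MEASURE
    return first + last == FULL_MEASURE


# One-quaver pickup followed by eight measures, the last one wrongly full.
mistaken = [1, 6, 6, 6, 6, 6, 6, 6, 6]
# Same section after removing one quaver from the last measure.
corrected = [1, 6, 6, 6, 6, 6, 6, 6, 5]
print(section_is_balanced(mistaken), section_is_balanced(corrected))
```

A section that fails the check is one quaver too long or too short, which shows up as the kind of delay seen in the plot.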

Let’s have a look at some of the interesting time-interval patterns I see. Here’s the time-interval plot for jig #56 (“The humors of Cappa”):

This shows both parts of the tune share the same intervals in measures 3&4 and 7&8, but do something different in measures 1&2 and 5&6. Here’s the dots confirming that observation:

This kind of repetition results in a clear tune structure, and a strong coherence between the parts. If I were to render this as a poem, it would be:

research blogging, at home, on-line
research blogging, COVID-19

Here’s the time-interval series for jig #69 (“Philip O’Neill”):

The two parts to this tune echo the same final two measures, and share a bit of the middle section, but otherwise do different things. Here’s the dots to confirm:

Here’s the time-interval series for jig #101 (“The idle road”):

Both parts share the last half, but are otherwise different. Look at all that bouncing up and down! Here’s the dots:

I imagine a fiddle player in a horse-drawn cart on a bumpy road. It’s curious that O’Neill has notated broken rhythms explicitly. Perhaps the player from whom he transcribed this exaggerated the jig rhythm there. In this classic recording of the tune, Joe Burke (accordion) ignores that and plays the jig quite evenly with the others following suit:

Jig #148 (“The Kinnegad slashers”) is a three part jig with the following time-interval series:

We see a strong relationship between parts 1&2 (A) and 5&6 (C). The B part does something different until its last four measures. The B part also appears more constrained in its use of large intervals, except for the octave leap in its fourth measure. Here’s the dots to confirm:

My perusal of these time-interval series inspires a few questions.

What tune features a time-interval series that spends most of the time at zero? Apparently there are two: the A part of jig #69 (“Philip O’Neill”):

and the B part of jig #331 (“The foot of the mountain”):

Sorting the series according to the time spent on a zero interval results in the following graph:

I think the height of the stair steps comes from using a sampling rate of 6 samples per quaver. There are apparently several tunes that spend no time at zero intervals. One of these is jig #82 (“Doherty’s fancy”):

Another question to ask is what tune has a time-interval profile with the most positive mean? In other words, which tune spends most of its time at pitches arrived to by positive intervals? It appears to be jig #96 (“Our own little isle”):

The leap from the D quaver to the g dotted crotchet (an interval of 17 semitones) seems to be contributing a lot to this, even though most of the tune is going downwards.

I find 268 of the jigs in the collection feature a section with a positive mean time-interval profile, and 246 have a section with a negative mean time-interval profile. 110 jigs have a section with a mean time-interval profile exactly equal to zero. One is jig #17 (“The eavesdropper”):

Another question to ask is which tune has a time-interval profile with the smallest variance? That prize goes to the A part of jig #84 (“Wellington’s advance”):

There are several semitone intervals in the A part. The jig with the largest variance is #257 (“The Morgan Rattler”):

It’s easy to see why that’s the case.
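The three statistics used in these questions (time at zero, mean, and variance of a time-interval series) are easy to compute from a sampled series. Here is a minimal sketch, where the function name `profile_statistics` and the toy series are made up, and the variance is taken as the population variance, which is an assumption on my part here.

```python
from statistics import mean, pvariance


def profile_statistics(interval_series):
    """Summarize one sampled time-interval series."""
    return {
        # fraction of samples (and hence of time) spent on a zero interval
        "zero_fraction": interval_series.count(0) / len(interval_series),
        # positive when more time is spent after upward intervals
        "mean": mean(interval_series),
        # grows with the size and duration of the intervals
        "variance": pvariance(interval_series),
    }


# Invented toy series: half the time at zero, then a step up and a step down.
stats = profile_statistics([0, 0, 2, -2])
print(stats)
```

Sorting the collection by any of these keys gives rankings like the ones discussed above.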

Let’s picture all 1,712 time-interval series in the collection:

Here’s a plot showing this collapsed across the series:

We can see that the time spent at pitches arrived to by steps of ±2 semitones (major second) is greater than the time spent at pitches arrived to by ±1 semitone (which makes sense because most of the intervals in a scale are 2 semitones, and much of the melodic motion in these melodies is stepwise). We also see that the time spent after steps of -3 (minor third) and -4 (major third) semitones is greater than the time spent after steps of +3 and +4. However, more time is spent after an interval of +5 (perfect fourth) than -5 semitones. Spending time at pitches arrived to by intervals greater than a perfect fifth is rare, but if one is to find themselves at a pitch after an octave leap, expect to spend more time resting after a leap up than down.
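Collapsing the collection across time amounts to counting how many samples sit at each interval value. A minimal sketch, with a hypothetical function name and two invented four-sample series:

```python
from collections import Counter


def interval_histogram(series_list):
    """Count samples per interval value over all series.

    With uniform sampling the counts are proportional to time spent; divide
    by the number of samples per quaver to express them in quavers.
    """
    counts = Counter()
    for series in series_list:
        counts.update(series)
    return counts


# Invented toy collection.
histogram = interval_histogram([[0, 2, 2, -2], [5, 5, 0, -5]])
print(histogram[5], histogram[-5])
```

Comparing `histogram[k]` with `histogram[-k]` for each k gives the up-versus-down asymmetries described above.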

This look at the collection raises an interesting question: What happens when we break the series into smaller pieces, e.g., units of one-measure length? In that case, we would have at most 13,696 time-interval series of dimension 36. How many unique units are there? How do they relate? Are there “prototype” measures? Might we see each series as a concatenation of these units?
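Counting unique one-measure units is straightforward once each series is cut into fixed-length pieces. The sketch below is an illustration with hypothetical names. It uses 36 samples per measure (6 samples per quaver times 6 quavers) as the default, but the toy example uses 2 samples per "measure" to stay readable.

```python
from collections import Counter


def measure_units(series, samples_per_measure=36):
    """Cut a series into consecutive one-measure units (as hashable tuples)."""
    return [tuple(series[i:i + samples_per_measure])
            for i in range(0, len(series), samples_per_measure)]


def unit_counts(series_list, samples_per_measure=36):
    """Count how often each distinct one-measure unit occurs in a collection."""
    return Counter(unit
                   for series in series_list
                   for unit in measure_units(series, samples_per_measure))


# Invented toy collection: two series of two "measures" each.
counts = unit_counts([[0, 2, 0, 2], [0, 2, 1, 1]], samples_per_measure=2)
print(len(counts), counts.most_common(1))

# Upper bound on the number of units in the collection: 1,712 series of 8 measures.
print(1712 * 8)
```

The number of keys in `counts` is the number of unique units, and the most common keys are candidates for “prototype” measures.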

# An analysis of the 365 double jigs in O’Neill’s, pt. 4

This here is part 4 of my live blogging analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here, part 2 is here, and part 3 is here.

I want to start looking at this collection of transcriptions in terms of intervalic content. I first convert each transcription to a sequence of numbers: the first element is the pitch relative to middle C (60), and the following elements are the successive intervals between pitches. Let me illustrate this with one example. Consider jig #218 (“The Connaughtman’s rambles”), transposed to C minor:

I convert each pitch in the sequence to the pitch space, where middle C is 60, the C# above that is 61, etc. I start each sequence with middle C, or 60. So the first four numbers describing the triplet in the anacrusis are (60, 72, 70, 68). Then I compute the difference between each successive element of the sequence. The beginning of the sequence describing this jig is (12, -2, -2, …). I think the first element of the sequence should describe the first pitch in relation to the pitch class C. So, I map its value to the range [-6,6] by subtracting or adding 12 a number of times. The sequence above then becomes (0, -2, -2, …). Its length is a total of 96*2=192 elements after I make the repeats explicit.
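The conversion just described can be sketched as follows. The function names are hypothetical, and the sketch leaves a tritone (±6) as it is, since the range [-6,6] contains both.

```python
def wrap_to_pitch_class(interval):
    """Map an interval into [-6, 6] by adding or subtracting octaves (12)."""
    while interval > 6:
        interval -= 12
    while interval < -6:
        interval += 12
    return interval


def interval_sequence(pitches, reference=60):
    """Successive differences, starting from middle C (60).

    The first element describes the first pitch relative to pitch class C,
    so it is wrapped into [-6, 6]; the rest are plain semitone differences.
    """
    if not pitches:
        return []
    sequence = [reference] + list(pitches)
    differences = [b - a for a, b in zip(sequence, sequence[1:])]
    differences[0] = wrap_to_pitch_class(differences[0])
    return differences


# The opening triplet of the anacrusis from the text: pitches 72, 70, 68.
print(interval_sequence([72, 70, 68]))
```

For the opening triplet this gives 0, -2, -2, matching the sequence above.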

I also want to consider the duration of the intervals in terms of quavers. The first triplet squeezes three semiquavers into the duration of two. Since two semiquavers have a quaver duration of 1, this makes each one in this triplet have a duration of 1/3 = 0.3333. Hence the sequence of durations begins (0.3333, 0.3333, 0.3333, …). I make a cumulative sum of this sequence, prepending it with 0, in order to get the times at which interval changes occur. So this resulting sequence begins (0, 0.3333, 0.6667, …).
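The cumulative sum of durations can be sketched with exact fractions, which sidesteps the rounding in 0.3333. This is an illustrative sketch; the function name `change_times` is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def change_times(durations):
    """Times (in quavers) at which interval changes occur, starting at 0."""
    return [Fraction(0)] + list(accumulate(durations))


# The opening triplet: three semiquavers in the time of two, i.e. 1/3 quaver each.
triplet = [Fraction(1, 3)] * 3
times = change_times(triplet)
print([float(t) for t in times])
```

Using `Fraction` means the triplet ends exactly on the quaver boundary, so later sample times that fall on a boundary are not misplaced by floating-point drift.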

To create a piecewise constant function of this data, I perform nearest neighbor regression on a uniformly sampled domain stretching the length of the transcription (sample rate of 9 samples per quaver). Six quavers make a full measure in 6/8 time. The resulting function (which we will call the “time-interval plot”) of “The Connaughtman’s Rambles” appears as below:
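Here is a minimal pure-Python sketch of that sampling step. It assumes each interval is observed at the time its pitch begins, and that ties between two equally near observations go to the earlier one. Neither detail is stated above, so treat both as assumptions, and the name `time_interval_plot` as hypothetical.

```python
def time_interval_plot(intervals, durations, fs=9):
    """Sample interval values on a uniform grid by 1-nearest-neighbour lookup.

    intervals: interval (semitones) arrived at by each note
    durations: duration of each note in quavers
    fs:        samples per quaver
    """
    if len(intervals) != len(durations):
        raise ValueError("intervals and durations must have the same length")
    # Onset times: cumulative sum of durations, starting at 0.
    onsets, t = [], 0.0
    for d in durations:
        onsets.append(t)
        t += d
    n_samples = round(t * fs)
    samples = []
    for n in range(n_samples):
        time = n / fs
        # Index of the observation closest in time (earliest wins ties).
        nearest = min(range(len(onsets)), key=lambda i: abs(onsets[i] - time))
        samples.append(intervals[nearest])
    return samples


print(time_interval_plot([0, -2, 3], [1, 1, 1], fs=2))
```

With observations placed at onsets, the sampled value changes halfway between two onsets rather than at the onset itself. A sample-and-hold scheme would be a different choice.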

Measure 16 divides the first part of this jig from the second part. Negative intervals show the melody moving down in pitch. Positive values show upward motion. Zero shows where the melody is static.

Now, what can we do with the time-interval plot? Integrating it should reflect the pitch profile of the melody. The graph below shows the result of this. The second part of “The Connaughtman’s Rambles” is in a higher register than the first part.

Let’s compute the circular autocorrelation of the time-interval plot. This can be done simply by point-wise multiplying the Fourier transform of the sequence with its conjugate, and then returning to the time domain by an inverse Fourier transform. Here’s the resulting plot:
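As a sketch of that recipe, assuming NumPy is available (the function names and the helper converting lags to measures are mine, not part of any library):

```python
import numpy as np


def circular_autocorrelation(x):
    """Circular autocorrelation via the FFT: ifft(X * conj(X))."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.fft(x)
    # The product is real and non-negative, so the inverse transform is real
    # up to rounding error; keep only the real part.
    return np.fft.ifft(spectrum * np.conj(spectrum)).real


def lag_in_measures(lag_samples, fs=9, quavers_per_measure=6):
    """Convert a lag in samples to a lag in measures (54 samples per measure for fs=9)."""
    return lag_samples / (fs * quavers_per_measure)


acf = circular_autocorrelation([1, 2, 3, 4])
print(acf)  # approximately [30, 24, 22, 24]
```

The value at lag 0 is the energy of the sequence, so dividing by it gives a version that peaks at 1.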

We see positive peaks at lags of 2, 4 and 8 measures, which makes complete sense given how each part of this tune is built from a two-measure core phrase. We see a strong negative peak at a lag of 15 measures. This means that after shifting the tune by 15 measures (or 7 without considering the repeats), or almost a whole part, it resembles a flipped version of itself… i.e., when the melody features positive intervals, the 7-measure shifted version features negative intervals. To see this more clearly, below I show a 7-measure shift of the jig (in its original key) against itself (top). The contrary motion in the intervals at this lag is clear.

We can also marginalize out the time domain of the time-interval plot, creating a histogram that describes the distribution of intervals weighted by the duration of the pitch arrived at by that interval:

We see that much of the time in this jig is spent at pitches arrived at by unison (interval of zero semitones). The next most time is spent at pitches arrived at by an interval of -2 semitones, or a descending major second. Some time is spent at pitches arrived at by ±5 semitones, which is a perfect fourth up or down. We can also break this histogram over the two parts of this tune:

We might also create a cumulative plot of the intervals in each section:

Each of the above describes the intervallic content of a tune, and how the tune is structured. How could we use this to compare tunes in a collection?
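Before moving on, here is a rough sketch of the duration-weighted histogram described above. It works directly from the interval and duration sequences rather than from the sampled plot, which is my own shortcut, and the function name is hypothetical.

```python
from collections import defaultdict


def duration_weighted_histogram(intervals, durations):
    """Total time (in quavers) spent at pitches arrived at by each interval."""
    hist = defaultdict(float)
    for interval, duration in zip(intervals, durations):
        hist[interval] += duration
    return dict(hist)


# Toy input: two unisons and two falling major seconds.
print(duration_weighted_histogram([0, -2, -2, 0], [1, 1, 2, 0.5]))
# {0: 1.5, -2: 3.0}
```

Calling it once per 8-measure part gives the per-part histograms.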

Let’s look at another tune that is quite different from “The Connaughtman’s Rambles”. Below is jig #284 (“Kitty of Oulart”), notated in the original key:

Here is its time-interval plot:

Here’s the integration of the intervals:

The downward trend in the first part of the tune shows that more time is spent in downward melodic motion, while the upward trend in the other two parts shows that the melody pauses more after upward than downward melodic motion.

Here’s the circular autocorrelation of the time-interval plot:

Due to the conventional structure of this kind of dance music, i.e., repeated 8-measure parts, we expect to nearly always see a strong peak at a lag of 8 measures. Furthermore, it is typical to see each part built from simpler ideas developed over 2 or 4 measures. Here we see a strong peak at 4 measures, showing that the core idea of each part has a length of 4 measures. This is clear from the score. We don’t see a strong peak at a lag of 2 measures, which is unlike “The Connaughtman’s Rambles”.

Finally, here’s the histogram for the entire tune:

And here it is for each part (A, B, C) individually:

Here’s the cumulative plot:

Now that we have considered two jigs from the collection that are quite different, several possibilities are clear.

First, we can take advantage of the fact that every jig in O’Neill’s collection has the conventional structure of multiple sections of 8 measures in 6/8 time. Only one jig in this collection has a single section of 8+8 measures (#95, “The sheep on the mountains”), but this can be treated as two 8-measure sections. This means we can express each tune as a set of time-interval plots computed over each 8-measure section. Each one of these plots will have the same length, so comparisons of them are simple. That length is the sampling rate (Fs samples per quaver) times 6 quavers per measure times 8 measures. So for Fs=9, the dimensionality is 9*6*8=432.
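A small sketch of that slicing, assuming the sampled plot starts on a measure boundary (so any anacrusis has already been handled) and that its length is a whole number of sections. The function name is hypothetical.

```python
def split_into_sections(samples, fs=9, quavers_per_measure=6, measures_per_section=8):
    """Cut a sampled time-interval plot into equal 8-measure sections."""
    section_len = fs * quavers_per_measure * measures_per_section  # 432 for fs=9
    if len(samples) % section_len != 0:
        raise ValueError("plot length is not a whole number of sections")
    return [samples[i:i + section_len] for i in range(0, len(samples), section_len)]


sections = split_into_sections([0] * 864)
print(len(sections), len(sections[0]))  # 2 432
```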

Second, we can look at intervallic relationships within sections of tunes by performing a circular autocorrelation of their time-interval plots, and across sections of tunes by a circular cross-correlation of their time-interval plots. For a single section this can determine whether it is built from a 1-, 2-, or 4-measure core idea. Across sections this can show how they relate, e.g., material returning in subsequent parts.
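For the cross-section case, a circular cross-correlation can be sketched the same way as the autocorrelation, again assuming NumPy and equal-length sections. The sign convention here (a peak at lag k means the second sequence is the first delayed by k samples) is my choice.

```python
import numpy as np


def circular_cross_correlation(x, y):
    """r[k] = sum_n x[n] * y[(n + k) mod N], computed via the FFT."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("sections must have the same length")
    return np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(y)).real


# y is x delayed by two samples, so the peak sits at lag 2.
r = circular_cross_correlation([1, 0, 0, 0], [0, 0, 1, 0])
print(int(np.argmax(r)))  # 2
```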

In the next part of this research live blogging, I will explore these ideas, and start to cluster tunes in the collection based on these characteristics.

# An analysis of the 365 double jigs in O’Neill’s, pt. 3

This is part 3 of my live blogging an analysis of the 365 double jigs in O’Neill’s 1001. Part 1 is here and part 2 is here.

Today I am looking at the measure structure of the transcriptions. This means I will reduce each transcription to its measure lines (“|”, “|:”, “:|”, “|1” and “|2”) and analyze the resulting strings. As an example:

```
X:1
T:Shandon bells
O:Ireland
B:Francis O'Neill: "The Dance Music of Ireland" (1907) no. 1
R:Double jig
Z:Transcribed by Frank Nordberg - http://www.musicaviva.com
F:http://www.musicaviva.com/abc/tunes/ireland/oneill-1001/oneill-1001-0001.abc
M:6/8
L:1/8
K:D
B|:AFD DFA|ded cBA|BGE EFA|B2A Bcd|\
AFD DFA|ded cBA|Bcd ecA|1d3 dcB:|2 d3d2||
g|f2 d- ded|faa afd|cAA eAA|cBc efg|f2 d- ded|faa afd|Bcd ecA|d3 d2:|
```

The measure structure of this is “|:|||||||1:||2|+||||||||:|”, where I have added the “+” to demarcate the two parts.
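A rough sketch of how such a string could be extracted with a regular expression follows. It is not my actual tokenization script. It rests on several assumptions that I chose so the output agrees with the example: header lines are those starting with a letter and a colon, a line ending in a backslash continues the same part, each remaining line is one part, a double bar “||” counts as a single “|”, and “:|2” is read as “:|” followed by “|2”.

```python
import re

# Bar-line tokens; two-character tokens are tried before the plain bar line.
TOKEN = re.compile(r"\|:|:\||\|[12]|\|")
HEADER = re.compile(r"^[A-Za-z]:")  # ABC header fields such as "T:" or "K:"


def measure_structure(abc_text):
    lines = [ln.strip() for ln in abc_text.splitlines()]
    body = [ln for ln in lines if ln and not HEADER.match(ln)]
    # Join continued lines so that each entry of `parts` is one part.
    parts, current = [], ""
    for ln in body:
        if ln.endswith("\\"):
            current += ln[:-1]
        else:
            parts.append(current + ln)
            current = ""
    if current:
        parts.append(current)
    structures = []
    for part in parts:
        part = part.replace("||", "|")               # assumption: "||" counts once
        part = re.sub(r":\|([12])", r":||\1", part)  # assumption: ":|2" is ":|" + "|2"
        structures.append("".join(TOKEN.findall(part)))
    return "+".join(structures)


SHANDON_BELLS = r"""X:1
T:Shandon bells
M:6/8
L:1/8
K:D
B|:AFD DFA|ded cBA|BGE EFA|B2A Bcd|\
AFD DFA|ded cBA|Bcd ecA|1d3 dcB:|2 d3d2||
g|f2 d- ded|faa afd|cAA eAA|cBc efg|f2 d- ded|faa afd|Bcd ecA|d3 d2:|
"""

print(measure_structure(SHANDON_BELLS))
```

Note that the concatenated string is ambiguous to re-parse (“|:|” could be “|:” then “|”, or “|” then “:|”), so any later processing is safer on the token list than on the string.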

We find a total of 70 unique measure structures, which is curiously large. I think we are going to find some transcription errors!

The most common measure structure (140 of the 365 jigs) is: “||||||||:|+|:||||||||:|”. This describes the conventional structure of a repeated eight measure part (8+8) followed by another repeated eight measure part, where the second part has an explicit start repeat sign. An example is jig #51 (“O’Sullivan’s march”):

Note the presence of an anacrusis (pickup bar) for each part.

The next most prevalent structure (56) is “||||||||:|+||||||||:|”, where the start repetition on both parts is implicit. An example is jig #2 (“The piper’s picnic”):

The third most prevalent structure (22) is “||||||||:|+|:|||||1|||:||2|||”, which now shows tunes with second parts that have two endings, each having more than one measure. An example is jig #67 (“Connie the soldier”):

Curiously, this tune appears differently in O’Neill’s 1905 collection of 1850 melodies (#794):

The first part is missing an anacrusis, and so the last measure of the A part is short by a quaver.

The fourth most prevalent structure (20) is “|||||||:|+|:|||||||:|”, which is like the most prevalent structure, but here neither part has an anacrusis. An example is jig #69 (“Philip O’Neill”):

The fifth most prevalent structure (13) is “||||||||:|+||||||||:|+||||||||:|”, which is a jig with three parts having an anacrusis for each. An example is jig #4 (“The yellow flail”):

The sixth most prevalent structure (12) is also a three part jig with an anacrusis, but with beginning repeat signs made explicit: “||||||||:|+|:||||||||:|+|:||||||||:|”. An example is jig #98 (“The flaxdresser”):

Next most prevalent (9) is a three part jig without an anacrusis: “|||||||:|+|:|||||||:|+|:|||||||:|”. An example is jig #191 (“The Limerick tinker”):

The seven most prevalent measure structures account for nearly 75% of the 365 jigs. Let’s look at the rest.

In eight transcriptions we see: “|||||||:|+|||||||:|”, which is 8+8 without an anacrusis, and implicit start repeat signs. In another eight transcriptions we see “||||||||:|+|||||1|||:||2|||”, like the third most prevalent structure but with an implicit start repeat sign on the second part. An example is jig #13 (“The humors of Bantry”):

There are five transcriptions with four parts specified like so: “||||||||:|+|:||||||||:|+|:||||||||:|+|:||||||||:|”, and three specified as “||||||||:|+||||||||:|+||||||||:|+||||||||:|”, where the start repeat signs are implicit for each part.

The eleventh most prevalent measure structure (3) is “|:|||||||:|+|||||||:|”, an example of which is jig #18 (“Saddle the pony”):

These cases have an explicit start repeat sign on the first part, but not on the second. There are only two transcriptions that specify a start repeat sign on both parts, i.e., “|:|||||||:|+|:|||||||:|”.

Only two transcriptions specify 8+16: “||||||||:|+||||||||||||||||”, e.g., jig #136 (“Father Jack Walsh”):

One transcription specifies a repeat after the second ending, namely jig #305 (“Delaney’s drummers”):

This is in the printed version of O’Neill’s 1001, but I wonder if it should be there. It’s not played like that by the brilliant Martin O’Connor (his playing shows nice improvisation around the bones of the tune). I think that repeat sign should be removed.

Only one transcription specifies first and second endings for all three parts, namely jig #329 (“Humors of Clare”):

Jig #313 (“The frost is all over”) is printed like so:

I think the start repeat sign on the second part is wrong because that would make it a very unconventional structure. Instead, I would notate this tune like so:

One transcription appears to not have any repeat measure lines: jig #95 (“The sheep on the mountains”):

That symbol occurring at the beginning and end is a “segno”. The last occurrence should really say “D.S.” or “dal segno”, which states to play starting at the segno. This jig doesn’t appear to have ever been recorded.

After all of the above corrections and a few others, I reran my tokenization script and recomputed the unique measure structures and got 67 unique ones now. Many of them are equivalent since there are a variety of ways to specify the 8+8 structure, with and without anacrusis, with two different endings, etc. This calls for normalizing the transcriptions to bring things closer in terms of the representation, but first we will replace the tokens consisting of two characters with a single character. So we will make “|:” be “S”, “:|” be “E”, “|1” be “1”, and “|2” be “2”.

Since “||||||||ES||||||||E” is most prevalent, and “||||||||E||||||||E” is the second most prevalent, I will rewrite the 56 transcriptions expressed in the latter form to use the former. I will also change the five transcriptions with the structure “|||||||E|||||||E” to “|||||||ES|||||||E”, and the three with “||||||||E||||||||E||||||||E||||||||E” to “||||||||ES||||||||ES||||||||ES||||||||E”, and so on. After this normalization, I end up with 48 unique measure structures. Of the 365 transcriptions in this collection, 293 (>80%) are expressed by 5 measure structures.
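The token replacement and the one normalization rule spelled out above can be sketched as follows. The function names are hypothetical, and the rule implemented is only the one I state explicitly: make an implicit start repeat explicit wherever an end repeat is followed directly by a plain measure line. The other normalizations covered by “and so on” are not reproduced.

```python
import re

# Single-character stand-ins for the two-character tokens.
SINGLE = {"|:": "S", ":|": "E", "|1": "1", "|2": "2", "|": "|"}


def to_single_chars(tokens):
    """Map a list of measure-line tokens to a single-character string."""
    return "".join(SINGLE[t] for t in tokens)


def add_implicit_starts(structure):
    """Insert "S" after every "E" that is directly followed by "|"."""
    return re.sub(r"E(?=\|)", "ES", structure)


tokens = ["|"] * 8 + [":|"] + ["|"] * 8 + [":|"]
compact = to_single_chars(tokens)
print(compact)                       # ||||||||E||||||||E
print(add_implicit_starts(compact))  # ||||||||ES||||||||E
```

Structures that already carry the start repeat, or whose end repeat is followed by a second ending, pass through unchanged.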

Now we will look at computing the normalized DL similarity of these normalized measure sequences. Here’s the resulting similarity matrix:
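For reference, here is one way such a similarity could be computed. I am assuming here that DL stands for Damerau-Levenshtein, using its restricted (optimal string alignment) variant, and that the similarity is one minus the distance divided by the length of the longer string. That normalization is an assumption for this sketch and may differ from the one behind the matrix.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def normalized_similarity(a, b):
    """1 - distance / length of the longer string (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


with_anacrusis = "||||||||ES||||||||E"
without_anacrusis = "|||||||ES|||||||E"
print(normalized_similarity(with_anacrusis, without_anacrusis))
```

The two structures in the example differ by two deleted measure lines, so under this normalization their similarity is 17/19.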

Collapsing this matrix along an axis gives the mean normalized DL similarity for each transcription.

Now we have a nice inverse relationship between the length of a tune and its mean normalized DL similarity:

Performing multidimensional scaling on the dissimilarity matrix, we find clusters of transcriptions based on their structures. (The numbering in the image below has a shift of one. So #258 is actually jig #257.)

There appear to be a few major clusters. Below the x-axis lie 282 tunes, about 77% of the collection. The three major clusters below the x-axis from left to right are jigs with measure structures “||||||||ES||||1|||E2|||”, then “||||||||ES||||||||E”, and then “|||||||ES|||||||E”. The two clusters around y=0.2 are of jigs with structures “||||||||ES||||||||ES||||||||E” and “|||||||ES|||||||ES|||||||E”. The cluster at (-0.2,0.4) is of jigs with structures “||||||||ES||||||||ES||||||||ES||||||||E”. So it seems here that the jigs with more than two parts are above the x-axis.

Tomorrow we will start looking at more musically meaningful comparisons, e.g., comparing pitch and intervallic content.