Humdrum Lab 3

You may want to work through Humdrum Lab 2 before this lab.

This lab demonstrates how to use the tindex/themax/theloc set of Humdrum Extras programs to search for melodic/rhythmic patterns in **kern scores.

First download the Ludwig Erk subcollection of folk songs from Humdrum Lab 2 and the Bach chorales from Humdrum Lab 1.

Split songs into major and minor groups

Similar to the last lab, split the Erk songs into those that are in major and those that are minor:

   mkdir major
   mkdir minor
   cp `egrep -l '^\*[A-G][#-]?:' *.krn` major
   cp `egrep -l '^\*[a-g][#-]?:' *.krn` minor

The regular expression:

    ^\*[A-G][#-]?:

means search for a line that starts (^) with an asterisk (\*) followed by one character in the range from A to G ([A-G]) followed optionally by the character '#' or '-' ([#-]?) followed by a colon (:).
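
As a quick check of these expressions, the following sketch runs them against a few interpretation lines. The sample lines are made up for illustration rather than taken from the Erk files, the function names are hypothetical, and grep -E is used as the equivalent of egrep. Setting LC_ALL=C is an added precaution, since some locales let a range such as [A-G] match lowercase letters as well.

```shell
# Made-up sample of Humdrum interpretation lines (not from the Erk files).
sample_lines() {
    printf '%s\n' '**kern' '*C:' '*F#:' '*B-:' '*a:' '*c#:' '*M4/4' '*clefG2'
}

# Lines matched by the major-key expression.
# LC_ALL=C keeps the A-G range strictly uppercase in every locale.
major_keys() {
    sample_lines | LC_ALL=C grep -E '^\*[A-G][#-]?:'
}

# Lines matched by the minor-key expression.
minor_keys() {
    sample_lines | LC_ALL=C grep -E '^\*[a-g][#-]?:'
}

echo "major:"; major_keys
echo "minor:"; minor_keys
```

The line *clefG2 is rejected by the minor expression even though it starts with an asterisk and a lowercase letter in range, because the colon is required.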

Notice the back-quotes in the two cp commands. They tell the shell to run the command inside the back-quotes first, then use its output as arguments to the rest of the command.

The `-l` option for egrep lists the files that contain a match, not the matches themselves.

egrep is used instead of grep because '?' is an extended regular expression metacharacter. In the basic set of regular expression metacharacters, '?' is a normal character with no special meaning.

Aside on regular expressions

Here are the basic regular-expression operators:

.	Any single character
*	zero or more occurrences of the previous character
[]	one of the characters in the enclosed list
^	anchor match to start of line
$	anchor match to the end of line

After the basic set was developed, people wanted more operators, so an extended set was added:

+	one or more occurrences of the previous character
?	zero or one occurrence of the previous character
()	grouping, such as (cat)+ which would match to cat, catcat, catcatcat, etc.
|	logical or operator, such as (cat|mouse) which would match to cat or mouse.
{}	generalized counter, such as {3,5} meaning between three and five occurrences of the previous character or parentheses grouping.

The confusing thing is that grep understands the basic set, and egrep understands both sets. If you use a "?" character in grep, it will be treated as a regular character, while in egrep it will be a special character. However, you can use "?" as a special character in grep by adding a backslash in front of the question mark (\?). This forces it to become a metacharacter. Oddly, doing the same thing in egrep will cause the question mark to become a regular character.
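
The four combinations described above can be tried on a few test strings. This is a sketch that assumes GNU grep, where \? is accepted in basic expressions; other grep implementations may behave differently. The sample lines are invented for the demonstration, and grep -E stands in for egrep.

```shell
# Invented test lines: a plain key, a sharped key, and one holding a literal "?".
lines() {
    printf '%s\n' '*C:' '*C#:' '*C#?:'
}

echo 'grep, unescaped ?  (ordinary character):'
lines | grep '^\*C#?:'

echo 'grep -E, unescaped ?  (metacharacter, "#" becomes optional):'
lines | grep -E '^\*C#?:'

echo 'grep, escaped \?  (metacharacter in GNU grep):'
lines | grep '^\*C#\?:'

echo 'grep -E, escaped \?  (ordinary character):'
lines | grep -E '^\*C#\?:'
```

The first and last commands match only the line containing a literal question mark, while the middle two match the plain and sharped keys.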

Aside on shell scripts

Suppose that you often want to split scores into major and minor mode groups. You can store the commands to do that in a shell script. Create a file called "splitbymode" with these lines in it:

 #!/bin/bash
 mkdir major
 mkdir minor
 cp $(egrep -l '^\*[A-G][#-]?:' *.krn) major
 cp $(egrep -l '^\*[a-g][#-]?:' *.krn) minor

After saving it to the disk, change the permissions of the file to allow it to be run as a program:

   chmod 0755 splitbymode

Then run it by typing:

   ./splitbymode

The "./" prefix will probably be needed so that the shell knows where to look for the command. You can run the command

    echo $PATH | tr : '\n'

to see a list of the directories in which the shell will look for commands. If "." (which means the current directory/folder) is not in the list, then you must add "./" to tell the shell to look in the current directory for the command.
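
The same check can be scripted rather than read off by eye. In this sketch, path_has_dot is a hypothetical helper name; it only looks for a literal "." entry in the colon-separated list it is given.

```shell
# Succeeds when the colon-separated list in $1 contains a "." entry.
path_has_dot() {
    case ":$1:" in
        *:.:*) return 0 ;;
        *)     return 1 ;;
    esac
}

if path_has_dot "$PATH"; then
    echo 'The current directory is searched: typing "splitbymode" should work.'
else
    echo 'The current directory is not searched: type "./splitbymode".'
fi
```

Wrapping the list in extra colons lets a single pattern handle an entry at the start, middle, or end of the list.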

The first line of the shell script starts with a shebang. It specifies the location of the interpreter for the programming language used in the script. In this case the syntax is bash shell commands, so the bash interpreter is used.

Other languages and interpreters can be used. For example, try this Perl script:

#!/usr/bin/perl
system("mkdir -p major");
system("mkdir -p minor");
system("cp \$(egrep -l '^\\*[A-G][#-]?:' *.krn) major");
system("cp \$(egrep -l '^\\*[a-g][#-]?:' *.krn) minor");

Perl can run a shell command that is placed between back-quotes. Alternatively, "system()" can be used to run a shell command, as in this case; this is needed when the shell command itself uses back-quotes. The $() syntax is equivalent to the back-quote syntax, but it can be nested directly, while nested back-quotes have to be escaped with backslashes.

Note that these two shell commands are the same:

     $(egrep -l '^\*[A-G][#-]?:' *.krn)
     `egrep -l '^\*[A-G][#-]?:' *.krn`
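
A small sketch of the two substitution forms and of nesting with $(), using only standard commands. The path is a made-up example; in the nested case the inner substitution runs first and its output becomes the argument of the outer command.

```shell
# Made-up path for illustration.
path=/data/erk/major/deut1920.krn

# The two substitution forms give the same result.
with_dollar=$(basename "$path")
with_quotes=`basename "$path"`
echo "$with_dollar"
echo "$with_quotes"

# Nested $(): dirname runs first, then basename names the enclosing directory.
nested=$(basename "$(dirname "$path")")
echo "$nested"
```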

A python script to do the same thing:

 #!/usr/bin/env python
 from os import *
 system("mkdir -p major")
 system("mkdir -p minor")
 system("cp $(egrep -l '^\\*[A-G][#-]?:' *.krn) major")
 system("cp $(egrep -l '^\\*[a-g][#-]?:' *.krn) minor")

This is nearly the same as the Perl script, but it adds the line "from os import *", which allows use of the system() function. Semicolons are optional at the ends of statements in Python.

As a Ruby program (semicolons at the ends of statements are also optional):

#!/usr/bin/env ruby
system("mkdir -p major")
system("mkdir -p minor")
system("cp $(egrep -l '^\\*[A-G][#-]?:' *.krn) major")
system("cp $(egrep -l '^\\*[a-g][#-]?:' *.krn) minor")

Most common scale-degree pattern in each mode

Here is a demonstration of how to extract a list of the most common five-note scale-degree sequences in the major and minor datasets:

   cd major
   cat *.krn | deg -a | grep -v ^= | grep -v r | context -n 5 | ridx -H \
                | sortcount | head -n 10
   470	5 4 3 2 1
   361	5 6 5 4 3
   275	6 5 4 3 2
   225	3 5 4 3 2
   209	1 2 3 4 5
   198	2 1 7 6 5
   182	5 5 4 3 2
   180	3 3 3 3 3
   179	2 3 4 3 2
   174	3 3 2 2 1

   cd ../minor
   cat *.krn | deg -a | grep -v ^= | grep -v r | context -n 5 | ridx -H \
               | sortcount | head -n 10
   44	5 4 3 2 1
   31	1 2 3 4 5
   30	3 2 1 7 1
   29	1 2 3 2 1
   27	5 5 5 5 5
   25	3 2 1 2 3 
   22	4 5 3 2 1
   21	5 5 4 3 2
   21	4 3 2 1 2
   19	4 3 2 1 7

The command:

    cat *.krn | deg -a

is used (for the major files) instead of:

    deg -a *.krn

because there are too many *.krn files for the AWK interpreter to handle all of them as command-line arguments.
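
To see what the windowing and counting stages of these pipelines are doing, here is a rough imitation using only standard tools. It is a sketch under assumptions: ngrams is a hypothetical stand-in for the sliding window produced by context -n, count_patterns assumes that sortcount behaves like sort | uniq -c | sort -nr, and the input is an invented list of scale degrees with one degree per line.

```shell
# Print every run of $1 consecutive input tokens (one token per input line).
ngrams() {
    awk -v n="$1" '
        { tok[NR] = $1 }
        END {
            for (i = 1; i + n - 1 <= NR; i++) {
                line = tok[i]
                for (j = 1; j < n; j++) line = line " " tok[i + j]
                print line
            }
        }'
}

# Count identical lines and list the most frequent first
# (assumed to approximate what sortcount does).
count_patterns() {
    sort | uniq -c | sort -nr
}

# Invented melody: two descending 5-4-3-2-1 runs, counted as 3-note patterns.
printf '%s\n' 5 4 3 2 1 5 4 3 2 1 | ngrams 3 | count_patterns
```

Ten notes give eight overlapping 3-note windows, three of which occur twice. The real pipelines also strip barlines and rests first (the two grep -v stages), which this sketch skips.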

Uncommon patterns

Here is how to list all of the 5-note patterns that occur only twice in all of the major works:

   cat *.krn | deg -a | grep -v ^= | grep -v r | context -n 5 | ridx -H \
             | sortcount | grep '^2\b' | less

Here are some of the patterns:

2       5 1 7 7 5
2       7 2 1 2 2
2       2 5 3 6 5
2       5 5 6 6 2
2       2 2 1+ 2 2
2       2 1 7 5 2
2       5 4 5 3 2-

The \b in the regular expression "^2\b" means that there must be a word boundary at that position. This means that '^2\b' will match 2 but not 28, since the 8 is part of the word "28".
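
The effect of the word boundary can be seen on a few count lines. This sketch assumes a grep that supports \b (GNU grep does); the lines are invented in the count, tab, pattern layout shown above.

```shell
# Invented lines in "count<TAB>pattern" layout.
counts() {
    printf '2\t5 1 7 7 5\n28\t3 2 1 2 3\n12\t1 2 3 2 1\n2\t7 2 1 2 2\n'
}

echo 'Without \b (the count 28 is matched as well):'
counts | grep '^2'

echo 'With \b (only the count 2 is matched):'
counts | grep '^2\b'
```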

To count the number of patterns that only occur twice:

  cat *.krn | deg -a | grep -v ^= | grep -v r | context -n 5 | ridx -H \
       | sortcount | grep '^2\b' | wc -l
  1098

There are 7064 unique 5-note patterns, so 1098 (15.5%) occur exactly twice and 2094 (29.6%) occur exactly once.
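
The percentages can be reproduced on the command line with awk, since the shell itself only does integer arithmetic. The helper name percent is hypothetical; the numbers are the ones quoted above.

```shell
# Print part/total as a percentage with one decimal place.
percent() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

percent 1098 7064   # patterns occurring twice: prints 15.5
percent 2094 7064   # patterns occurring once: prints 29.6
```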

Exercise: Use the frequencies of 5-note scale patterns to calculate the likely key of melodies.

Locating patterns in music

Each of these scale-degree patterns, such as "7 2 1 2 2", occurs twice in 1591 songs, but where? Does the pattern occur twice in one song, or once in each of two songs?

To answer these questions, use the Humdrum Extras tools related to Themefinder searching: tindex, themax, and theloc.

"tindex" (thema indexer) creates a feature database (index) of melodic lines. "themax" searches the index, and "theloc" links the search results back to the original scores.

First, run "tindex" to create the feature database for the major songs:

   tindex *.krn > index.txt

Now you can use it to search for patterns in the songs. Here is an example of how to search for the scale degrees "7 2 1 2 2":

   themax -d "7 2 1 2 2" --count index.txt
   deut1920.krn::1	1
   deut2106.krn::1	1

The pattern occurs once in two separate files. The ::1 at the end of the filename means that the match occurred in the first spine of the file (which only has one spine since the music is monophonic). The 1 at the end of the lines indicates how many times the pattern occurs in the file.

You can add the --total option to count the occurrences in all data files. The total count is given at the end of the output:

   themax -d "7 2 1 2 2" --count --total index.txt
   deut1920.krn::1	1
   deut2106.krn::1	1
   2

To determine where the match occurs within the music, use the --loc option:

   themax -d "7 2 1 2 2" --loc index.txt
   deut1920.krn::1	28-32
   deut2106.krn::1	24-28

In the first file the pattern occurs at notes 28 to 32, counting from the start of the music.

The theloc program can mark the notes in the original score by using this sort of command pipeline:

    themax -d "7 2 1 2 2" --loc index.txt | head -n 1 | theloc --mark

Try copying this output and pasting it into VHV. On macOS, you can use pbcopy:

    themax -d "7 2 1 2 2" --loc index.txt | head -n 1 | theloc --mark | pbcopy

Results of searching for the scale degree sequence "7 2 1 2 2".

Displaying the location in the second file:

   themax -d "7 2 1 2 2" --loc index.txt | head -n 2 | tail -n 1  | theloc --mark | pbcopy

Results of searching for the scale degree sequence "7 2 1 2 2" in another file.

The command pipeline:

    head -n 2 | tail -n 1

is one way to extract the second line of content: head -n 2 extracts the first two lines of text and tail -n 1 extracts the last line of some text.
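
The same idea generalizes to any line number. In this sketch, nth_line is a hypothetical helper name, and the sample input is the themax --loc output shown earlier; sed is included as a one-step alternative.

```shell
# Print line number $1 of standard input using head and tail.
nth_line() {
    head -n "$1" | tail -n 1
}

# The two result lines from the themax --loc example.
matches() {
    printf 'deut1920.krn::1\t28-32\ndeut2106.krn::1\t24-28\n'
}

matches | nth_line 2

# sed selects the same line directly.
matches | sed -n '2p'
```

One difference to keep in mind: when the input has fewer lines than requested, the head/tail version prints the last available line, while the sed version prints nothing.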

Showing matched music only

The myank command can be used with the --mark option to extract all measures containing marked (matched) notes:

    themax -d "7 2 1 2 2" --loc index.txt | head -n 2 | tail -n 1 \
          | theloc --mark | myank --mark | pbcopy

Results showing only the matched measures after searching for the scale degree sequence "7 2 1 2 2".

Aside: creating a musical figure with VHV

You can type alt-t to download a PDF of the currently viewed music. This can be dragged and dropped into MS Word, for example, to become a figure:

Resulting PDF dragged and dropped into MS Word:

Drag-and-drop of PDF extract in MS Word.

Note that the bounding box on the PDF will probably cause extra whitespace in MS Word, so you will have to format the picture so that text placed above or below it can fill in the empty spaces of the figure.

Searching for multiple musical features in parallel

The themax command can search for multiple features occurring at the same time. Here are some examples.

Go back to the most common 5-note scale degree sequence: "5 4 3 2 1". How many times does this pattern start on G (i.e., occur in C major)?

themax -p g -d "5 4 3 2 1" index.txt  --count --total
deut0588.krn::1	3
deut0704.krn::1	3
deut0948.krn::1	1
deut0950.krn::1	1
deut0951.krn::1	4
deut1055.krn::1	1
deut1459.krn::1	6
deut1533.krn::1	1
deut1618.krn::1	1
deut1623.krn::1	1
deut1636.krn::1	1
deut1650.krn::1	2
deut1669.krn::1	1
deut1888.krn::1	1
deut2127.krn::1	1
deut2128.krn::1	1
deut2140.krn::1	1
30

It occurs 30 times, in 17 of the 158 songs in C major:

    grep -l '^\*C:'  *.krn | wc -l
    158
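
The total of 30 and the number of matching songs can be double-checked from the per-file counts. This sketch copies the themax output listed above into a here-document, so it runs without the index file, and sums the second column with awk. The function names themax_counts and summarize are hypothetical.

```shell
# Per-file counts copied from the themax --count output above.
themax_counts() {
    cat <<'EOF'
deut0588.krn::1	3
deut0704.krn::1	3
deut0948.krn::1	1
deut0950.krn::1	1
deut0951.krn::1	4
deut1055.krn::1	1
deut1459.krn::1	6
deut1533.krn::1	1
deut1618.krn::1	1
deut1623.krn::1	1
deut1636.krn::1	1
deut1650.krn::1	2
deut1669.krn::1	1
deut1888.krn::1	1
deut2127.krn::1	1
deut2128.krn::1	1
deut2140.krn::1	1
EOF
}

# Sum the counts (second column) and count the files (number of lines).
summarize() {
    awk '{ total += $2 } END { printf "%d occurrences in %d files\n", total, NR }'
}

themax_counts | summarize
```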

The song with the most occurrences of the pattern is deut1459.krn, so mark the matched pattern and display in VHV:

   themax -p g -d "5 4 3 2 1" index.txt  --loc | grep deut1459.krn  | theloc --mark | pbcopy

Matches for the scale degree sequence "5 4 3 2 1" starting on G.

Now match only to cases where the last note is a half note:

  themax -p g -d "5 4 3 2 1" -u "x x x x 2" index.txt  --loc | grep deut1459.krn  | theloc --mark | pbcopy

Matches for the scale degree sequence "5 4 3 2 1" starting on G and ending on a half note.

BACH motif

J.S. Bach liked to sign his music musically. Search the Bach chorale data set for the melodic pattern B-A-C-H. These are German pitch names, which in English are B♭-A-C-B♮.

Go to the directory where you have the chorales stored, or download like this:

    mkdir chorales
    cd chorales
    humsplit h://chorales

Then run tindex:

    tindex *.krn > index.txt

Then search for the pitch sequence, which occurs 8 times in the chorales, always in the tenor part (second spine):

themax -p "b- a c b" index.txt  --count --total
chor166.krn::2	1
chor174.krn::2	1
chor190.krn::2	1
chor203.krn::2	1
chor218.krn::2	1
chor235.krn::2	1
chor268.krn::2	1
chor319.krn::2	1 
8

The msearch filter can also search by melodic pitch name:

B-A-C-H motive at the end of chorale #166.

Links to the works with B-A-C-H highlighted in VHV:

chorale 166 chorale 174 chorale 190 chorale 203 chorale 218 chorale 235 chorale 268 chorale 319

What about cases where the B-A-C-H motive could be transposed? The transposable version of the motive occurs 22 times. Here is an example of searching by 12-tone (MIDI note) interval, which is "-1 3 -1" for the motive. Also, the --loc2 option only marks the start of the match (not the end), the -B option on theloc shows the bar number (prefixed by an "=" sign), and -N suppresses the enumerated note position:

themax -i "-1 3 -1" index.txt  --loc2 | theloc -BN
chor018.krn::3	=6
chor021.krn::3	=2
chor052.krn::3	=6
chor075.krn::2	=10
chor116.krn::2	=25
chor132.krn::2	=20
chor166.krn::2	=13
chor174.krn::2	=10
chor190.krn::2	=11
chor203.krn::2	=20
chor218.krn::2	=15
chor227.krn::3	=5
chor235.krn::2	=1
chor239.krn::3	=2
chor258.krn::3	=12
chor268.krn::2	=13
chor276.krn::3	=1
chor290.krn::2	=10
chor319.krn::2	=1
chor343.krn::2	=7
chor348.krn::2	=9
chor351.krn::2	=2
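
The interval string used in this search can be derived from MIDI note numbers. In this sketch, intervals is a hypothetical helper, and the note numbers are an assumption about octave placement (B♭4 = 70, A4 = 69, C5 = 72, B♮4 = 71). Transposing the whole motive leaves the differences unchanged, which is why the interval search also finds transposed statements.

```shell
# Print the semitone differences between successive MIDI note numbers.
intervals() {
    printf '%s\n' "$@" | awk '
        NR > 1 { printf "%s%d", sep, $1 - prev; sep = " " }
               { prev = $1 }
        END    { print "" }'
}

intervals 70 69 72 71   # B-flat A C B-natural: prints -1 3 -1
intervals 72 71 74 73   # same motive a whole tone higher: prints -1 3 -1
```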

Exercise: choose one of the matched files and mark the pattern with themax/theloc.

The next lab presents the JRP dataset and some tools for analyzing it.
