Difference between revisions of "Humdrum Lab 3"

From CCARH Wiki

Revision as of 18:21, 20 April 2018

Perhaps view Humdrum Lab 2 before this lab.

This lab demonstrates using tindex/themax/theloc to search for melodic/rhythmic patterns in **kern scores.

First download the Erk songs from Humdrum Lab 2 and the Bach chorales from Humdrum Lab 1.


Split songs into major and minor groups

As in the last lab, split the Erk songs into those that are in major keys and those that are in minor keys:

   mkdir major
   mkdir minor
   cp `egrep -l '^\*[A-G][#-]?:' *.krn` major
   cp `egrep -l '^\*[a-g][#-]?:' *.krn` minor

The regular expression:

    ^\*[A-G][#-]?:

means search for a line that starts (^) with an asterisk (\*) followed by one character in the range from A to G ([A-G]) followed optionally by the character '#' or '-' ([#-]?) followed by a colon (:).
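
The following is a small sketch for checking the two key patterns against sample lines before running them on real files. The sample lines are made up for illustration and are not taken from the Erk collection; `grep -E` is used as the equivalent of egrep, and `LC_ALL=C` is set on the assumption that the ranges should be compared byte by byte, so that [a-g] cannot match upper-case letters in some locales.

```shell
# Compare character ranges byte by byte (assumption: avoids locale surprises).
export LC_ALL=C

# Hypothetical interpretation lines such as might appear in a **kern file.
sample='*C:
*B-:
*F#:
*a:
*M4/4
*clefG2'

# Count the lines that look like a major-key designation.
major_count=$(printf '%s\n' "$sample" | grep -E -c '^\*[A-G][#-]?:')
# Count the lines that look like a minor-key designation.
minor_count=$(printf '%s\n' "$sample" | grep -E -c '^\*[a-g][#-]?:')

echo "major: $major_count  minor: $minor_count"
```

With this sample, three lines match the major pattern and one matches the minor pattern; `*M4/4` and `*clefG2` match neither because no colon follows the letter.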

Notice the back-quotes in the last two commands. The shell runs the command inside the back-quotes first, then passes its output as arguments to the rest of the command.

The `-l` option for egrep lists the files that contain a match, not the matching lines themselves.

egrep is used instead of grep because '?' is an extended regular expression metacharacter. In the basic set of regular expression metacharacters, the '?' character is a normal character with no special meaning.
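
As a rough illustration of how the back-quotes and the `-l` option work together, the sketch below repeats the split in a scratch directory. The file names and contents are hypothetical stand-ins; `$(...)` is the modern spelling of back-quotes, and `grep -E` stands in for egrep.

```shell
export LC_ALL=C

# Work in a scratch directory so that no real files are touched.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir major minor

# Two tiny stand-in files (hypothetical names and content).
printf '**kern\n*G:\n4g\n*-\n' > song1.krn
printf '**kern\n*e:\n4e\n*-\n' > song2.krn

# The inner grep runs first; the file names it prints become
# the source arguments of cp.
cp $(grep -E -l '^\*[A-G][#-]?:' *.krn) major
cp $(grep -E -l '^\*[a-g][#-]?:' *.krn) minor

major_files=$(ls major)
minor_files=$(ls minor)
echo "major: $major_files"
echo "minor: $minor_files"
```

Because the substituted output is split on whitespace, this approach assumes file names without spaces.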

Side-track on regular expressions

Here are the basic regular-expression operators:

. Any single character
* zero or more occurrences of the previous character
[] one of the characters in the enclosed list
^ anchor match to start of line
$ anchor match to the end of line


After the basic set was developed, people wanted more operators, so an extended set was added:


+ one or more occurrences of the previous character
? zero or one occurrence of the previous character
() grouping, such as (cat)+ which would match to cat, catcat, catcatcat, etc.
| logical or operator, such as (cat|mouse) which would match to cat or mouse.
{} generalized counter, such as {3,5} meaning between three and five occurrences of the previous character or parentheses grouping.

The confusing thing is that grep understands the basic set, and egrep understands both sets. If you use a "?" character in grep, it will be treated as a regular character, while in egrep it will be a special character. However, you can use "?" as a special character in grep by adding a backslash in front of the question mark (\?). This forces it to become a metacharacter. Oddly, doing the same thing in egrep will cause the question mark to become a regular character.
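
The difference can be seen with a three-line test input. This is only a sketch: the sample lines are invented, and `grep -E` is used in place of egrep. The backslashed `\?` in plain grep is left out of the example because it is an extension (GNU grep supports it) rather than part of the basic syntax.

```shell
lines='ab
b
a?b'

# Basic syntax: "?" is an ordinary character, so only the literal "a?b" matches.
basic=$(printf '%s\n' "$lines" | grep -c 'a?b')
# Extended syntax: "?" makes the "a" optional, so every line containing "b" matches.
extended=$(printf '%s\n' "$lines" | grep -E -c 'a?b')
# Extended syntax with a backslash: "?" is a literal character again.
escaped=$(printf '%s\n' "$lines" | grep -E -c 'a\?b')

echo "basic=$basic extended=$extended escaped=$escaped"
# prints: basic=1 extended=3 escaped=1
```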


Most common scale-degree pattern in each mode

Here is a demonstration of how to extract a list of the most common scale-degree sequences in the major and minor datasets:

   cd major
   cat *.krn | deg -a | grep -v ^= | grep -v r | context -n 5 | ridx -H \
               | sortcount | head -n 10
   470	5 4 3 2 1
   361	5 6 5 4 3
   275	6 5 4 3 2
   225	3 5 4 3 2
   209	1 2 3 4 5
   198	2 1 7 6 5
   182	5 5 4 3 2
   180	3 3 3 3 3
   179	2 3 4 3 2
   174	3 3 2 2 1
   cd ../minor
   cat *.krn | deg -a | grep -v ^= | grep -v r | context -n 5 | ridx -H \
               | sortcount | head -n 10
   44	5 4 3 2 1
   31	1 2 3 4 5
   30	3 2 1 7 1
   29	1 2 3 2 1
   27	5 5 5 5 5
   25	3 2 1 2 3 
   22	4 5 3 2 1
   21	5 5 4 3 2
   21	4 3 2 1 2
   19	4 3 2 1 7
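
The counting stage of this pipeline can be approximated with standard tools, which may help when the Humdrum commands are not at hand. The sketch below is not how context or sortcount are implemented: it assumes that the windowing step emits every run of n consecutive tokens and that sortcount behaves roughly like `sort | uniq -c | sort -rn`. The input stream and the window size of 3 are made up for illustration.

```shell
# Hypothetical scale-degree stream standing in for the filtered output
# of "deg -a" (barlines and rests already removed).
degrees='5 4 3 2 1 5 4 3 2 1 5 4 3'

# Emit every window of n consecutive tokens, one window per line.
ngrams=$(printf '%s\n' $degrees |
  awk -v n=3 '
    { buf[NR] = $1 }                      # remember every token
    NR >= n {                             # once n tokens are available
      line = buf[NR - n + 1]
      for (i = NR - n + 2; i <= NR; i++) line = line " " buf[i]
      print line                          # emit the window ending here
    }')

# Count identical windows, most frequent first, and trim the padding
# that uniq -c puts in front of each count.
counts=$(printf '%s\n' "$ngrams" | sort | uniq -c | sort -rn | awk '{ $1 = $1; print }')
top=$(printf '%s\n' "$counts" | head -n 1)

echo "$counts"
```

For this input the most frequent window is `5 4 3`, which occurs three times; the other four windows occur twice each.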


The command:

    cat *.krn | deg -a 

is used (for the major files) instead of:

    deg -a *.krn

because there are too many *.krn files for the AWK interpreter to handle all of them as command-line arguments.

Uncommon patterns

Here is how to list all of the 5-note patterns that occur exactly twice in all of the major works:

   cat *.krn | deg -a | grep -v ^= | grep -v r | context -n 5 | ridx -H \
             | sortcount | grep '^2\b' | less

Here are some of the resulting patterns:

   2	5 1 7 7 5
   2	7 2 1 2 2
   2	2 5 3 6 5
   2	5 5 6 6 2
   2	2 2 1+ 2 2
   2	2 1 7 5 2
   2	5 4 5 3 2-

The \b in the regular expression means that there must be a word boundary at that position. This means that '^2\b' will match 2 but not 28, since 8 is part of the word "28".
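
The effect of the word boundary can be checked on a few invented count lines. This sketch assumes a grep that supports `\b`, such as GNU grep, and the sample lines only imitate the count, tab, pattern layout shown earlier.

```shell
# Hypothetical sortcount-style lines: a count, a tab, then the pattern.
counts=$(printf '2\t5 1 7 7 5\n28\t3 3 2 2 1\n12\t1 2 3 4 5\n2\t7 2 1 2 2\n')

# With \b only counts that are exactly 2 match; "28" fails because
# no word boundary separates the 2 from the 8.
with_boundary=$(printf '%s\n' "$counts" | grep -c '^2\b')
# Without \b every count that merely starts with 2 matches as well.
without_boundary=$(printf '%s\n' "$counts" | grep -c '^2')

echo "with \\b: $with_boundary  without: $without_boundary"
```

Here the boundary version counts two lines, while the version without `\b` also counts the line beginning with 28.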

To count the number of patterns that occur exactly twice:


  cat *.krn | deg -a | grep -v ^= | grep -v r | context -n 5 | ridx -H \
       | sortcount | grep '^2\b' | wc -l
  1098

There are 7064 unique 5-note patterns, so 1098 (15.5%) occur exactly twice and 2094 (29.6%) occur exactly once.
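
The percentages follow directly from the three counts, and the arithmetic can be reproduced with awk. The variable names below are chosen for illustration; `LC_ALL=C` is set on the assumption that a period is wanted as the decimal separator.

```shell
export LC_ALL=C

total=7064   # unique 5-note patterns
twice=1098   # patterns that occur exactly twice
once=2094    # patterns that occur exactly once

# Percentage of the unique patterns in each group, to one decimal place.
pct_twice=$(awk -v a="$twice" -v t="$total" 'BEGIN { printf "%.1f", 100 * a / t }')
pct_once=$(awk -v a="$once" -v t="$total" 'BEGIN { printf "%.1f", 100 * a / t }')

echo "twice: $pct_twice%  once: $pct_once%"
# prints: twice: 15.5%  once: 29.6%
```

The count of 2094 is presumably obtained in the same way as 1098, by replacing '^2\b' with '^1\b' in the counting command.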

  • Exercise: Use the frequencies of 5-note scale patterns to calculate the likely key of melodies.