Humdrum Lab 2

From CCARH Wiki
Revision as of 22:21, 20 April 2018 by Craig (talk | contribs) (→‎Context)

It may help to work through Humdrum Lab 1 before doing this lab.

This lab introduces the Essen Folksong Collection and some tools for analysis and data extraction from the collection.

Downloading a repertory from the collection

The data files for the Essen Folksong Collection can be browsed here:

    http://kern.ccarh.org/browse?l=/essen

The primary collection consists of about 5,000 German songs and 3,000 Chinese songs.

To download a particular sub-collection try:

     mkdir erk
     cd erk
     humsplit  h://essen/europa/deutschl/erk

This should download 1701 song files, which can be checked by counting them:

    $ ls *.krn | wc -l
    1701

scaletype tool

A simple scale categorizing tool can list what sort of musical content the songs have:

   scaletype *.krn

Here is how to count each basic category of scaletype:

    scaletype -F *.krn | sortcount
    817	heptatonic
    583	hexatonic
    159	pentatonic
    108	chromatic
    33	toofew

The meaning of each category: `heptatonic` is 7 pitch classes, `hexatonic` is 6 pitch classes, `pentatonic` is 5 pitch classes, `chromatic` is more than 7 pitch classes, and `toofew` is fewer than 5 pitch classes.
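
The category boundaries above can be written out as a small shell function. This is only a sketch of the mapping from a pitch-class count to a label; `classify_scale` is a hypothetical helper and is not how the `scaletype` tool itself is implemented.

```shell
# Hypothetical helper (not part of scaletype): map a count of
# distinct pitch classes to the category names listed above.
classify_scale() {
    n=$1
    if [ "$n" -lt 5 ]; then
        echo toofew
    elif [ "$n" -eq 5 ]; then
        echo pentatonic
    elif [ "$n" -eq 6 ]; then
        echo hexatonic
    elif [ "$n" -eq 7 ]; then
        echo heptatonic
    else
        echo chromatic
    fi
}

# Print the label for a few sample counts.
for n in 3 5 6 7 9; do
    printf '%s\t%s\n' "$n" "$(classify_scale "$n")"
done
```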


  • Exercise: What scale degrees are missing in songs classified as `pentatonic`?
  • Exercise: How many pitch classes are there in the songs classified as `toofew` (0, 1, 2, 3, or 4)?

Meter

To list the time signatures of each song in the subcollection, try using grep:

    grep ^\*M[0-9]  *.krn

The string `^\*M[0-9]` is a "regular expression" which means: at the start of a line (`^`), find an asterisk, followed by the capital letter M, followed by a character in the range from 0 to 9 (a digit).

It is often safer to place the regular expression in single quotes to prevent the shell from interpreting any of its characters:

    grep '^\*M[0-9]' *.krn

The backslash is needed in this case because an asterisk by itself is a special character in regular expressions: it means to match zero or more of the previous character. Here we want to find a literal asterisk, so the backslash removes its special meaning.
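
To see the difference between the escaped and the unescaped asterisk without touching the song files, the pattern can be tried on a few sample lines. The sample lines below are made up for illustration; only standard `grep` is used.

```shell
# Sample interpretation lines: a time signature, a key signature, a tempo marking.
sample=$(printf '%s\n' '*M3/4' '*k[f#]' '*MM120')

# \* matches a literal asterisk; [0-9] then requires a digit right after the M,
# which rules out the tempo marking *MM120.
matched=$(printf '%s\n' "$sample" | grep '^\*M[0-9]')
echo "$matched"

# Unescaped, the asterisk is a quantifier: M* means "zero or more M characters",
# so the lines 3, M3 and MM3 all match, but X3 does not.
quantified=$(printf '%s\n' '3' 'M3' 'MM3' 'X3' | grep -c '^M*3$')
echo "$quantified"
```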

Count how many times each meter occurs with this command:

    grep -h '^\*M[0-9]' *.krn | sortcount

The `-h` option suppresses the name of the file in the output from grep. Type `man grep` to see more documentation about grep and its options. Alternatively, search online for more information about grep. For example, here is a YouTube tutorial on grep: https://www.youtube.com/watch?v=3w7xrQWRYrU

`sortcount` is a Humdrum Extras script that basically combines several standard Unix command-line tools. Try the equivalent pipeline:

    grep -h '^\*M[0-9]' *.krn | sort | uniq -c | sort -nr

Try `man sort` and `man uniq` to learn more about those standard unix command-line tools.
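
The `sort | uniq -c | sort -nr` idiom can be tried on a handful of made-up meter lines. In this sketch `count_sorted` is a hypothetical stand-in for `sortcount`, built only from the standard tools named above; it does not reproduce any of the real script's options.

```shell
# Hypothetical stand-in for sortcount: group identical lines,
# count each group, then list the largest counts first.
count_sorted() {
    sort | uniq -c | sort -nr
}

# Six sample time-signature lines (invented data).
meters=$(printf '%s\n' '*M3/4' '*M4/4' '*M3/4' '*M6/8' '*M3/4' '*M4/4')

# uniq -c pads the count with leading spaces; awk normalizes the spacing.
top=$(printf '%s\n' "$meters" | count_sorted | head -n 1 | awk '{print $1, $2}')
echo "$top"
```

The first `sort` is required because `uniq` only merges identical lines that are adjacent.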

The meter counts that I get for the erk subcollection are:

     581	*M4/4
     522	*M3/4
     473	*M2/4
     193	*M6/8
     119	*M3/8
     27	*M6/4
     8	*M3/2
     5	*M2/2
     1	*M4/2
     1	*M5/4


Selecting by scaletype and meter

Often when you do analysis on music, you need to select music with common forms or features (such as input for David Cope's sort of analysis and music generation). Here is an example of selecting files by a particular meter and scaletype:

     scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic

The text in the backquotes is a command which is run first, then the result is passed as arguments to the scaletype program. In this case `-l` (lower-case L) means that grep should display all filenames for files that have a match for the regular expression.

Note that this method has problems if a song changes meter after the start of the music: if any time signature in the file matches 3/4, the whole file is selected.

The scaletype command needs an equivalent of `-l`, but for now, if you need a list of the files without the heptatonic label, strip the label with sed:
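
One possible way around the meter-change problem is to keep only files in which every time signature is the same. The following is a minimal sketch using standard tools on two invented sample files; it assumes single-spine files (one token per line), which is an assumption about the data rather than something the tools guarantee.

```shell
# Build two small sample files in a scratch directory (invented content).
workdir=$(mktemp -d)
printf '%s\n' '**kern' '*M3/4' '4c' '*-' > "$workdir/steady.krn"
printf '%s\n' '**kern' '*M3/4' '4c' '*M4/4' '4d' '*-' > "$workdir/changing.krn"

# Keep only files whose distinct time signatures reduce to exactly *M3/4.
only_34=""
for f in "$workdir"/*.krn; do
    distinct=$(grep -h '^\*M[0-9]' "$f" | sort -u)
    if [ "$distinct" = '*M3/4' ]; then
        only_34="$only_34${only_34:+ }$(basename "$f")"
    fi
done
echo "$only_34"

rm -r "$workdir"
```

Here `sort -u` collapses repeated time signatures, so a file that restates 3/4 several times still passes, while a file that also contains 4/4 does not.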


     scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic | sed 's/:.*//'
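
The `sed` expression `s/:.*//` deletes everything from the first colon to the end of the line. A quick check on one sample line is sketched here; the filename and the exact `name: label` spacing are assumptions, and the real `scaletype` output may be spaced differently.

```shell
# Assumed line shape "name: label"; the filename is invented for illustration.
line='deut0567.krn: heptatonic'

# Remove the first colon and everything after it, leaving only the filename.
name=$(printf '%s\n' "$line" | sed 's/:.*//')
echo "$name"
```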


Save a copy of the 3/4 heptatonic files to a separate directory (262 files according to my count):

     mkdir 347
     scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic | sed 's/:.*//'  > list.txt
     cp `cat list.txt` 347
     cd 347
     scaletype *.krn
     grep -h '^\*M3/4' *.krn | sortcount


beat tool

Try the Humdrum Extras `beat` tool on the heptatonic 3/4-meter songs.

A basic listing of the tool's options can be displayed on the command line with the `--options` option:

    beat --options

More detailed information about the tool can be found at http://extras.humdrum.org/man/beat

The `beat` tool calculates and extracts rhythmic, duration and metric information from Humdrum `**kern` files.

Use the -b option to extract beat positions in the meter:

    beat *.krn | ridx -H | sortcount -p

What is the most common beat position in 3/4? What is the least common one that is present in the data? How common is each of beats 1, 2 and 3?

There are complications since the above analysis includes rests and secondary tied notes. To avoid these problems in the analysis:

    $ beat *.krn -p | hgrep -vkd '[]_r]' | cut -f 1 | sortcount -p

    29.44	1
    24.35	3
    16.93	2
    11.67	3.5
    7.01	2.5
    6.15	1.5
    1.84	1.75
    0.8	3.75
    0.48	4
    0.38	2.75
    0.31	4.5
    0.2	1.25
    0.19	3.25
    0.08	3.66667
    0.08	3.33333
    0.05	2.25
    0.01	1.33333
    0.01	1.66667
    0.01	2.33333
    0.01	4.25
    0.01	2.66667


`beat -p` means to prepend the beat analysis spine to the start of the line.

hgrep is a Humdrum Extras tool that emulates the functionality of grep but is aware of Humdrum file structure: http://extras.humdrum.org/man/hgrep/

`hgrep -v` means show the lines of data which do not match the regular expression.

`hgrep -k` means only search in `**kern` data.

`hgrep -d` means only process data tokens.

Note that `hgrep -kd` is shorthand for `hgrep -k -d`.
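
The regular expression `[]_r]` is a bracket expression that matches any one of the characters `]`, `_` or `r`; the `]` has to come first inside the brackets to be taken literally. Its effect can be checked with plain `grep -v` on a few sample `**kern` tokens. This is only a stand-in for `hgrep`, which additionally understands spines and data records.

```shell
# Sample tokens: a plain note, a tie start, a tie continuation,
# a tie end, a rest, and another plain note.
tokens=$(printf '%s\n' '4c' '[2d' '2d_' '4d]' '4r' '8e')

# Drop tokens containing ], _ or r, i.e. secondary tied notes and rests.
# The tie start ([2d) is kept because it is a real note attack.
kept=$(printf '%s\n' "$tokens" | grep -v '[]_r]' | paste -s -d ' ' -)
echo "$kept"
```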

A simpler method for doing the same analysis is to restrict the beat analysis to lines with at least one note attack by using the `beat -A1` option:

    beat *.krn  -A1 |  ridx -H | sortcount -p

  • Exercise: Try the same analysis on the 4/4 meter pieces from the erk collection.

Context

The classic Humdrum tool called `context` is useful for extracting sequences of n-grams for use in Markov chain analyses. See the webpage http://www.humdrum.org/man/context for more documentation about the context tool.

Try context on the metric data extracted from the erk songs.

Here is an example of extracting a context of 2 adjacent notes:

    $ beat *.krn  -A1 | context -n 2 | ridx -H | grep -v = | sortcount | head -n 20
    1026	2 3
    991	1 2
    981	3 3.5
    685	2.5 3
    639	1 3
    621	1 1.5
    567	1.5 2
    370	2 2.5
    330	1 2.5
    191	1.75 2
    169	2 3.5
    143	1 1.75
    48	1.5 1.75
    47	3 3.75
    37	2.75 3
    36	3.5 3.75
    25	2 2.75
    24	4 4.5
    23	3 4
    21	1 1.25

So the most common metric pattern in the data is going from beat 2 to beat 3.
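
The idea of a 2-note context can be illustrated with a few lines of `awk` that pair each value with the one before it. This is a sketch of the n-gram idea only, on an invented sequence of beat positions; it is not the `context` tool and it ignores barlines.

```shell
# An invented sequence of beat positions, one per line.
beats=$(printf '%s\n' 1 2 3 1 2 3 1 3)

# Pair every value with its predecessor: the first line has no predecessor,
# so 8 values produce 7 overlapping pairs.
pairs=$(printf '%s\n' "$beats" | awk 'NR > 1 { print prev, $0 } { prev = $0 }')
echo "$pairs"

# Count how often the pair "2 3" occurs.
count_2_3=$(printf '%s\n' "$pairs" | grep -c '^2 3$')
echo "$count_2_3"
```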

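Here is an example of how to extract the metric pattern of each complete measure in the songs. The two grep commands keep only contexts that start exactly on beat 1, and sed removes everything from the next barline onward: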

    $ beat *.krn  -A1 | context -n 20 | ridx -H | grep ^1 | grep -v '^1\.' | sed 's/=.*//' | sortcount -p | head -n 20
    19.59	1 3
    15.86	1 2 3
    7.69	1 2 3 3.5
    5.97	1 2.5 3 3.5
    5.26	1 2.5 3
    4.25	1
    4.2	1 1.5 2 3 3.5
    3.92	1 1.5 2 2.5 3 3.5
    3.82	1 1.5 2 3
    3.68	1 3 3.5
    3.11	1 1.5 2 3.5
    2.82	1 2 2.5 3 3.5
    1.96	1 2
    1.67	1 2 3.5
    1.39	1 1.75 2 2.5 3 3.5
    1.34	1 2 2.5 3
    1.15	1 1.5 2
    0.96	1 1.75 2 3.5
    0.91	1 1.75 2 3
    0.86	1 1.5 2 2.5

About 20% of the measures have the pattern of a half note followed by a quarter note (attacks on beats 1 and 3).
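
The filtering steps in the measure-pattern pipeline can be tried on a few invented lines. The input below only imitates what a long context window might look like, with barlines shown as tokens such as `=2`; the actual output format of `context` is assumed here, not verified. The sketch also uses `s/ *=.*//` rather than `s/=.*//` so that no trailing space is left behind.

```shell
# Invented context windows; tokens starting with "=" stand for barlines.
windows=$(printf '%s\n' \
    '1 3 =2 1 2 3 =3 1 3' \
    '3 =2 1 2 3 =3 1 3' \
    '1 2 3 =3 1 3' \
    '2 3 =3 1 3' \
    '1.5 2 3 =4 1' \
    '1 3')

# Keep windows starting on beat 1 (but not 1.5, 1.75, ...), cut each one
# at the first barline, then count identical patterns, largest first.
top=$(printf '%s\n' "$windows" \
    | grep '^1' | grep -v '^1\.' \
    | sed 's/ *=.*//' \
    | sort | uniq -c | sort -nr \
    | head -n 1 | sed 's/^ *//')
echo "$top"
```

Of the three windows that survive the filters, two reduce to the pattern `1 3` and one to `1 2 3`, so the top line reports a count of 2 for `1 3`.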

  • Exercise: Notice that beat 1.5 is fairly rare in these 3/4 songs. How does this compare to the songs in 4/4?

Next lab

Next, you can view Humdrum Lab 3, which covers searching for musical features.