Humdrum Lab 2: Difference between revisions

Revision as of 19:29, 17 April 2018

You may want to work through Humdrum Lab 1 before doing this lab.

This lab introduces the Essen Folksong Collection and some tools for analysis and data extraction from the collection.

Downloading a repertory from the collection

The data files for the Essen Folksong Collection can be browsed here:

    http://kern.ccarh.org/browse?l=/essen

The primary collection consists of about 5,000 German songs and 3,000 Chinese songs.

To download a particular sub-collection try:

     humsplit  h://essen/europa/deutschl/erk

This should download 1701 song files:

    $ ls *.krn | wc -l
    1701

To keep all of the files in one location, first create a new directory with `mkdir` and `cd` into it. Alternatively, download all songs as a single file with:

     humcat -s h://essen/europa/deutschl/erk

scaletype tool

The `scaletype` tool is a simple scale classifier that lists what kind of pitch content each song has:

   scaletype *.krn

To count how many songs fall into each basic scaletype category:

    scaletype -F *.krn | sortcount
    817	heptatonic
    583	hexatonic
    159	pentatonic
    108	chromatic
    33	toofew

The categories are defined by the number of distinct pitch classes in a song: `heptatonic` has 7, `hexatonic` has 6, `pentatonic` has 5, `chromatic` has more than 7, and `toofew` has fewer than 5.
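The category definitions amount to a simple lookup on the number of distinct pitch classes. The sketch below restates that rule in shell. `scale_category` and `classify_pitches` are hypothetical helpers, not part of Humdrum Extras, and the sketch assumes the pitch classes have already been extracted and spelled consistently; it does not read `**kern` files.

```shell
#!/bin/sh
# scale_category: hypothetical helper (not part of Humdrum Extras) that
# maps a number of distinct pitch classes to the scaletype category names.
scale_category() {
    n=$1
    if [ "$n" -lt 5 ]; then
        echo toofew
    elif [ "$n" -eq 5 ]; then
        echo pentatonic
    elif [ "$n" -eq 6 ]; then
        echo hexatonic
    elif [ "$n" -eq 7 ]; then
        echo heptatonic
    else
        echo chromatic      # more than 7 pitch classes
    fi
}

# classify_pitches: count the distinct pitch-class names given as
# arguments (e.g. C D E G A) and print the matching category.
classify_pitches() {
    n=$(printf '%s\n' "$@" | sort -u | wc -l | tr -d ' ')
    scale_category "$n"
}

classify_pitches C D E G A C D    # prints: pentatonic
```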

Exercise: What scale degrees are missing in songs classified as `pentatonic`?

Exercise: How many pitch classes do the songs in the `toofew` category have (0, 1, 2, 3, or 4)?

Meter

To list the time signatures of each song in the subcollection, try using grep:

    grep ^\*M[0-9]  *.krn

The string `^\*M[0-9]` is a "regular expression" meaning: at the start of a line (`^`), find an asterisk, followed by the capital letter M, followed by one character in the range 0 to 9 (a digit).

It is safer to place the regular expression in single quotes so that the shell passes the backslash, asterisk and square brackets to grep unchanged:

    grep '^\*M[0-9]' *.krn

The backslash is needed because an unescaped asterisk is a special character in regular expressions, meaning "zero or more of the preceding character". Here we want to match a literal asterisk, so the backslash removes its special meaning.
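To see which lines the regular expression accepts, you can run it over a few hand-written lines without any Humdrum files. The sample text below is made up for illustration:

```shell
#!/bin/sh
# A few made-up lines resembling part of a **kern file.
sample='**kern
*M3/4
*MM120
!! *M4/4 inside a comment
4c
*M6/8'

# Prints only "*M3/4" and "*M6/8": *MM120 (a tempo marking) fails because
# the character after "*M" is not a digit, and the comment line does not
# start with an asterisk.
printf '%s\n' "$sample" | grep '^\*M[0-9]'
```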

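Count how many times each meter occurs with this command (a song that changes meter contributes one line per time signature, so the total can exceed the number of songs):

    grep -h '^\*M[0-9]' *.krn | sortcount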

The `-h` option suppresses the filename in grep's output. Type `man grep` to see more documentation about grep and its options, or search online for more information. For example, here is a YouTube tutorial on grep: https://www.youtube.com/watch?v=3w7xrQWRYrU

`sortcount` is a Humdrum Extras script that combines the following standard Unix command-line tools. Try:

     grep -h '^\*M[0-9]' *.krn | sort | uniq -c | sort -nr

Try `man sort` and `man uniq` to learn more about those standard unix command-line tools.
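The same pipeline can be wrapped in a small shell function. `count_lines` below is a hypothetical name, not part of Humdrum Extras; the final `awk` step only reshapes the `uniq -c` output into the count-TAB-value layout shown in the listings on this page, and the made-up input stands in for the output of grep.

```shell
#!/bin/sh
# count_lines: hypothetical stand-in for sortcount (not the real script).
# Reads lines on stdin and prints "count<TAB>line", most frequent first.
count_lines() {
    sort |          # bring identical lines next to each other
        uniq -c |   # collapse each run into "   count line"
        sort -nr |  # highest counts first
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}

# Made-up time signatures standing in for the output of grep -h.
printf '%s\n' '*M3/4' '*M2/4' '*M3/4' '*M6/8' '*M3/4' '*M2/4' | count_lines
```

For this input the first output line is `3` followed by a tab and `*M3/4`.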

The meter counts I get for the erk subcollection are:

     581	*M4/4
     522	*M3/4
     473	*M2/4
     193	*M6/8
     119	*M3/8
     27	*M6/4
     8	*M3/2
     5	*M2/2
     1	*M4/2
     1	*M5/4

selecting by scaletype and meter

When analyzing music, you often need to select pieces that share a common form or feature (for example, as input for the kind of analysis and music generation done by David Cope). Here is an example of selecting files by a particular meter and scaletype:

     scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic

The command inside the backquotes runs first, and its output is passed as arguments to the scaletype program. The `-l` (lower-case L) option makes grep print only the names of the files that contain a match for the regular expression.

Note that this method has problems with songs that change meter after the start of the music: if any time signature in a file matches 3/4, the whole file is selected.
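One way to work around the meter-change problem is to look only at the first time signature in each file. The `first_meter_is` function below is a hypothetical sketch, not a Humdrum tool. It uses `$(...)`, the modern equivalent of backquotes, creates two tiny made-up files so that it can run on its own, and compares only the first tab-separated field, assuming the time signature of interest appears in the first spine.

```shell
#!/bin/sh
# first_meter_is: hypothetical helper (not a Humdrum tool). Prints the
# names of the files whose FIRST time signature equals the given meter,
# so a song that only switches to that meter later is not selected.
first_meter_is() {
    meter=$1
    shift
    for f in "$@"; do
        # First *M<digit> line of the file, first spine (tab field) only.
        first=$(grep '^\*M[0-9]' "$f" | head -n 1 | cut -f 1)
        [ "$first" = "*M$meter" ] && printf '%s\n' "$f"
    done
    return 0
}

# Two tiny made-up files in a temporary directory (delete it afterwards).
dir=$(mktemp -d)
printf '**kern\n*M3/4\n4c\n=1\n*M2/4\n4d\n*-\n' > "$dir/a.krn"
printf '**kern\n*M2/4\n4c\n=1\n*M3/4\n4d\n*-\n' > "$dir/b.krn"

# Both files contain *M3/4, but only a.krn starts in 3/4.
first_meter_is 3/4 "$dir"/*.krn
```

With the function defined in your shell, the selection step could become `scaletype $(first_meter_is 3/4 *.krn) | grep heptatonic`.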

`scaletype` currently lacks an equivalent of grep's `-l` option, so for now, to get a plain list of the filenames without the heptatonic label, remove everything from the colon onward with sed:

     scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic | sed 's/:.*//'
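The `sed` expression replaces everything from the first colon to the end of each line with nothing, leaving just the filename. The sketch below runs the last two steps on made-up lines; it assumes scaletype prints its results as `filename: label`, so check the real layout on your system before relying on it.

```shell
#!/bin/sh
# Made-up lines in an ASSUMED "filename: label" layout; check the real
# scaletype output first, since its exact layout may differ.
printf '%s\n' 'erk001.krn: heptatonic' 'erk002.krn: hexatonic' 'erk003.krn: heptatonic' |
    grep heptatonic |   # keep only the heptatonic songs
    sed 's/:.*//'       # delete from the first colon to the end of the line
```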

Save copies of the 3/4 heptatonic files in a separate directory (262 files by my count), then check the scaletype and meter of the copies:

     mkdir 347
     scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic | sed 's/:.*//'  > list.txt
     cp `cat list.txt` 347
     cd 347
     scaletype *.krn
     grep -h '^\*M3/4' *.krn | sortcount
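The backquote form of the copy command works as long as no filename contains spaces, because the shell splits the list on any whitespace. A more defensive alternative is to read the list one line at a time. `copy_listed` is a hypothetical helper, and the files it copies in this demonstration are empty placeholders in a temporary directory.

```shell
#!/bin/sh
# copy_listed: hypothetical helper. Copies each file named in a list
# (one name per line) into a destination directory, reading the list
# line by line so that names containing spaces stay intact.
copy_listed() {
    list=$1
    dest=$2
    mkdir -p "$dest"
    while IFS= read -r name; do
        [ -n "$name" ] && cp -- "$name" "$dest"/
    done < "$list"
    return 0
}

# Demonstration with empty placeholder files in a temporary directory.
work=$(mktemp -d)
cd "$work" || exit 1
touch a.krn 'b c.krn' d.krn
printf '%s\n' a.krn 'b c.krn' > list.txt
copy_listed list.txt 347
ls 347     # a.krn and "b c.krn", but not d.krn
```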

beat tool

Try the Humdrum Extras `beat` tool on the heptatonic 3/4-meter songs.

To list the options available in the tool on the command line, use the `--options` option:

    beat --options

More detailed information about the tool can be found at http://extras.humdrum.org/man/beat

The `beat` tool calculates and extracts rhythmic, duration and metric information from Humdrum `**kern` files.

Use the -b option to extract beat positions in the meter:

    beat *.krn | ridx -H | sortcount -p

Which beat position is the most common in 3/4, and which is the least common among those present in the data? How common is each of beats 1, 2 and 3?
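The percentages printed by `sortcount -p` appear to be each value's count divided by the total number of values, times 100 (the listing below sums to about 100). The `percent_table` function is a hypothetical sketch of that calculation using sort, uniq and awk; it rounds to two decimals, so its formatting may differ from sortcount's, and the beat values fed to it are made up.

```shell
#!/bin/sh
# percent_table: hypothetical sketch, not the sortcount script itself.
# Reads one value per line and prints "percent<TAB>value", most
# frequent first, where percent = 100 * count / total number of lines.
percent_table() {
    sort | uniq -c | sort -nr |
        awk '{
                 count[NR] = $1
                 sub(/^ *[0-9]+ /, "")      # strip the uniq -c count
                 value[NR] = $0
                 total += count[NR]
             }
             END {
                 for (i = 1; i <= NR; i++)
                     printf "%.2f\t%s\n", 100 * count[i] / total, value[i]
             }'
}

# Made-up beat positions: 1 occurs 3 times out of 8, i.e. 37.50 percent.
printf '%s\n' 1 2 3 1 3 1 2.5 3 | percent_table
```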

This analysis is complicated by the fact that it includes rests and the secondary (continuation) notes of ties. To exclude them from the analysis:

    beat *.krn -p | hgrep -vkd '[]_r]' | cut -f 1 | sortcount -p

    29.44	1
    24.35	3
    16.93	2
    11.67	3.5
    7.01	2.5
    6.15	1.5
    1.84	1.75
    0.8	3.75
    0.48	4
    0.38	2.75
    0.31	4.5
    0.2	1.25
    0.19	3.25
    0.08	3.66667
    0.08	3.33333
    0.05	2.25
    0.01	1.33333
    0.01	1.66667
    0.01	2.33333
    0.01	4.25
    0.01	2.66667

The `beat -p` option prepends the beat-analysis spine to the start of each line, which is why `cut -f 1` can then extract the beat positions.

hgrep is a Humdrum Extras tool that emulates the functionality of grep but is aware of Humdrum file structure: http://extras.humdrum.org/man/hgrep/

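`hgrep -v` prints the lines of data that do not match the regular expression.

`hgrep -k` searches only in `**kern` data.

`hgrep -d` processes only data tokens.

Note that `hgrep -kd` is shorthand for `hgrep -k -d`. The regular expression `[]_r]` is a bracket expression that matches any one of `]` (the end of a tie), `_` (a tie continuation) or `r` (a rest); the `]` is placed first so that it is read as a literal character rather than as the end of the brackets.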
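The bracket expression can be tried on its own with plain grep. The tokens below are made up, and a single column of tokens stands in for one `**kern` spine, which is roughly what `hgrep -kd` searches. Note that the tie-start token is kept, since it is the actual note attack.

```shell
#!/bin/sh
# Made-up **kern data tokens, one per line: a plain note, a tie start,
# a tie continuation, a tie end, a rest and another plain note.
tokens='4c
[4d
4d_
4d]
4r
8e'

# Drop every token containing "]", "_" or "r", keeping only note attacks.
# Inside the brackets, "]" must come first to be read literally.
printf '%s\n' "$tokens" | grep -v '[]_r]'
```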

A simpler way to do the same analysis is the `beat -A1` option, which restricts the beat analysis to the metric positions of lines with at least one note attack:

    beat *.krn  -A1 |  ridx -H | sortcount -p

Exercise: Try the same analysis on the 4/4-meter pieces from the erk collection.

Context

The classic Humdrum Toolkit command `context` is useful for extracting n-grams (sequences of adjacent items) for use in Markov-chain analyses.

Try context on the metric data extracted from the erk songs.

Here is an example that extracts a context of 2 adjacent notes, i.e. pairs of consecutive beat positions:

    beat *.krn -A1 | context -n 2 | ridx -H | grep -v = | sortcount | head -n 20
    1026	2 3
    991	1 2
    981	3 3.5
    685	2.5 3
    639	1 3
    621	1 1.5
    567	1.5 2
    370	2 2.5
    330	1 2.5
    191	1.75 2
    169	2 3.5
    143	1 1.75
    48	1.5 1.75
    47	3 3.75
    37	2.75 3
    36	3.5 3.75
    25	2 2.75
    24	4 4.5
    23	3 4
    21	1 1.25
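The idea behind 2-gram (bigram) extraction, and the first step of a Markov-chain analysis built on it, can be sketched with awk. This is not how the Humdrum `context` tool is implemented: `bigrams` and `transition_probs` are hypothetical helpers, the beat positions are made up, and, unlike the pipeline above, the sketch keeps pairs that span a barline.

```shell
#!/bin/sh
# bigrams: hypothetical sketch of 2-gram extraction (not how the Humdrum
# context tool works): print each input line joined with the next one.
bigrams() {
    awk 'NR > 1 { print prev " " $0 } { prev = $0 }'
}

# transition_probs: from "a b" pairs, estimate the first-order Markov
# transition probability P(b | a) = count(a b) / count(a as first item).
transition_probs() {
    awk '{ pair[$1 " " $2]++; from[$1]++ }
         END {
             for (p in pair) {
                 split(p, ab, " ")
                 printf "%s\t%.3f\n", p, pair[p] / from[ab[1]]
             }
         }' | sort
}

# Made-up beat positions for two bars of 3/4; the "3 1" pair spans a
# barline, which this sketch does not filter out.
printf '%s\n' 1 2 3 1 2 2.5 3 | bigrams | sort | uniq -c | sort -nr
printf '%s\n' 1 2 3 1 2 2.5 3 | bigrams | transition_probs
```

Here beat 2 is followed once by 3 and once by 2.5, so each of those transitions gets probability 0.500.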
