Humdrum Lab 2
This lab introduces the Essen Folksong Collection and some tools for analysis and data extraction from the collection.
Downloading a repertory from the collection
The data files for the Essen Folksong Collection can be browsed here:
http://kern.ccarh.org/browse?l=/essen
The primary collection consists of about 5,000 German songs and 3,000 Chinese songs.
To download a particular subcollection, try:
humsplit h://essen/europa/deutschl/erk
This should download 1701 song files:
$ ls *.krn | wc -l
1701
Create a new directory with `mkdir` first to keep them all in one location, or download all songs to a single file with:
humcat -s h://essen/europa/deutschl/erk
scaletype tool
The `scaletype` tool categorizes each song by the number of pitch classes it uses:
scaletype *.krn
Here is how to count each basic category of scaletype:
scaletype -F *.krn | sortcount
817 heptatonic
583 hexatonic
159 pentatonic
108 chromatic
33 toofew
The meaning of each category: `heptatonic` is 7 pitch classes, `hexatonic` is 6 pitch classes, `pentatonic` is 5 pitch classes, `chromatic` is more than 7 pitch classes, and `toofew` is fewer than 5 pitch classes.
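The thresholds above can be restated as a small shell function. This is only a sketch of the category boundaries as described in this lab; `classify_scale` is a hypothetical helper name, not part of Humdrum, and it is not how `scaletype` itself is implemented.

```shell
# Hypothetical helper (not a Humdrum tool): map a pitch-class count
# to the category names described in this lab.
classify_scale() {
    count=$1
    if [ "$count" -lt 5 ]; then
        echo toofew
    elif [ "$count" -eq 5 ]; then
        echo pentatonic
    elif [ "$count" -eq 6 ]; then
        echo hexatonic
    elif [ "$count" -eq 7 ]; then
        echo heptatonic
    else
        echo chromatic
    fi
}

# Show the category for a few example counts.
for n in 3 5 6 7 9; do
    printf '%s pitch classes: %s\n' "$n" "$(classify_scale "$n")"
done
```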
- Exercise: What scale degrees are missing in songs classified as `pentatonic`?
- Exercise: In the `toofew` category, how many pitch classes does each song have (0, 1, 2, 3, or 4)?
Meter
To list the time signatures of each song in the subcollection, try using grep:
grep ^\*M[0-9] *.krn
The string `^\*M[0-9]` is a "regular expression" which means: at the start of a line (`^`), find an asterisk, followed by the capital letter M, followed by a character in the range from 0 to 9 (a digit).
It is often safer to place the regular expression in single quotes to prevent the shell from interpreting its special characters:
grep '^\*M[0-9]' *.krn
The backslash is needed because an unescaped asterisk is a special character in regular expressions: it matches zero or more repetitions of the preceding character. Here we want to match a literal asterisk, so the backslash removes its special meaning.
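To see what the pattern does and does not match, it can be tried on a few lines without any song files. The sample lines below are made up for illustration in the style of a Humdrum file; they are not taken from the collection.

```shell
# Made-up sample lines in the style of a Humdrum file.
sample='**kern
*M4/4
*MM120
*k[f#]
=1
4c
*M3/4'

# Print only the lines that begin with "*M" followed by a digit.
matches=$(printf '%s\n' "$sample" | grep '^\*M[0-9]')
printf '%s\n' "$matches"
```

This prints `*M4/4` and `*M3/4`. The line `*MM120` is skipped because the character after `*M` is a letter rather than a digit, and `**kern` is skipped because its second character is not `M`.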
Count the number of songs in each meter with this command:
grep -h '^\*M[0-9]' *.krn | sortcount
The `-h` option suppresses the file name in the output from grep. Type `man grep` to see the documentation for grep and its options, or search online for more information. For example, here is a YouTube tutorial on grep: https://www.youtube.com/watch?v=3w7xrQWRYrU
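The effect of `-h` can be checked on two tiny throwaway files. This is a minimal sketch: the file names `a.krn` and `b.krn` and their contents are invented for the demonstration and are written to a temporary directory.

```shell
# Create two small made-up files in a temporary directory.
tmpdir=$(mktemp -d)
printf '**kern\n*M4/4\n4c\n' > "$tmpdir/a.krn"
printf '**kern\n*M3/4\n4d\n' > "$tmpdir/b.krn"

# With several input files, grep prefixes each match with its file name.
with_names=$(grep '^\*M[0-9]' "$tmpdir"/*.krn)

# With -h, only the matching lines themselves are printed.
without_names=$(grep -h '^\*M[0-9]' "$tmpdir"/*.krn)

printf 'without -h:\n%s\n' "$with_names"
printf 'with -h:\n%s\n' "$without_names"

# Clean up the temporary files.
rm -rf "$tmpdir"
```

Dropping the file names matters for counting: with the names attached, every line would be unique and identical meters could not be grouped.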
`sortcount` is a Humdrum Extras script that essentially combines several standard Unix command-line tools. Try the equivalent pipeline:
grep -h '^\*M[0-9]' *.krn | sort | uniq -c | sort -nr
Try `man sort` and `man uniq` to learn more about those standard unix command-line tools.
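The `sort | uniq -c | sort -nr` pipeline can be tried on a handful of lines to see what each stage contributes. The meter lines below are made up for illustration and stand in for the output of `grep -h`.

```shell
# Made-up meter lines standing in for the output of grep -h.
meters='*M4/4
*M3/4
*M4/4
*M6/8
*M4/4
*M3/4'

# sort places identical lines next to each other, uniq -c collapses
# each run into one line prefixed by its count, and sort -nr orders
# the result numerically from the largest count to the smallest.
counts=$(printf '%s\n' "$meters" | sort | uniq -c | sort -nr)
printf '%s\n' "$counts"
```

The first `sort` is required because `uniq` only merges adjacent duplicate lines. The result lists `*M4/4` with a count of 3, then `*M3/4` with 2, then `*M6/8` with 1; the amount of leading whitespace before each count varies between systems.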
The count of meters that I am getting for the erk subcollection is:
581 *M4/4
522 *M3/4
473 *M2/4
193 *M6/8
119 *M3/8
27 *M6/4
8 *M3/2
5 *M2/2
1 *M4/2
1 *M5/4