Humdrum Lab 2


This lab introduces the Essen Folksong Collection and some tools for analysis and data extraction from the collection.

Downloading a repertory from the collection

The data files for the Essen Folksong Collection can be browsed here:

    http://kern.ccarh.org/browse?l=/essen

The primary collection consists of about 5,000 German songs and 3,000 Chinese songs.

To download a particular subcollection, try:

     humsplit  h://essen/europa/deutschl/erk

This should download 1701 song files:

    $ ls *.krn | wc -l
    1701

Create a new directory with `mkdir` and change into it before downloading to keep the files in one location. Alternatively, download all songs to a single file with:

     humcat -s h://essen/europa/deutschl/erk


scaletype tool

A simple scale-categorizing tool lists the type of scale used in each song:

   scaletype *.krn

Here is how to count each basic category of scaletype:

    scaletype -F *.krn | sortcount
    817	heptatonic
    583	hexatonic
    159	pentatonic
    108	chromatic
    33	toofew

The categories are defined by the number of pitch classes in a song: `heptatonic` has 7, `hexatonic` has 6, `pentatonic` has 5, `chromatic` has more than 7, and `toofew` has fewer than 5.
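
The category boundaries can be sketched as a small shell function. This is only an illustration of the definitions above: `classify_scale` is a hypothetical helper, not part of scaletype, and it assumes the number of distinct pitch classes has already been counted by some other means.

```shell
# Hypothetical helper (not part of scaletype): map a count of distinct
# pitch classes to the category names described above.
classify_scale() {
    count=$1
    if [ "$count" -lt 5 ]; then
        echo toofew          # fewer than 5 pitch classes
    elif [ "$count" -eq 5 ]; then
        echo pentatonic
    elif [ "$count" -eq 6 ]; then
        echo hexatonic
    elif [ "$count" -eq 7 ]; then
        echo heptatonic
    else
        echo chromatic       # more than 7 pitch classes
    fi
}

# Print the category for a few example counts.
for n in 3 5 6 7 9; do
    printf '%s\t%s\n' "$n" "$(classify_scale "$n")"
done
```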


  • Exercise: What scale degrees are missing in songs classified as `pentatonic`?
  • Exercise: How many pitch classes (0, 1, 2, 3, or 4) are there in each song classified as `toofew`?

Meter

To list the time signatures of each song in the subcollection, try using grep:

    grep ^\*M[0-9]  *.krn

The string `^\*M[0-9]` is a "regular expression" which means: at the start of a line (^), find an asterisk followed by the capital letter M followed by a character in the range from 0 to 9 (a digit).

It is often safer to place the regular expression in single quotes to prevent the shell from interpreting any of its characters:

    grep '^\*M[0-9]' *.krn

The backslash is needed in this case because an unescaped asterisk is a special character in regular expressions: it means to match 0 or more of the previous character. Here we want to find a literal asterisk character, so the backslash removes its special meaning.
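
To see which lines the pattern selects without downloading anything, it can be tried on a few made-up lines. The sample lines below are invented for illustration (they only resemble Humdrum records), and `sample_lines` is a hypothetical helper name.

```shell
# Made-up sample lines that resemble Humdrum interpretation and data records.
sample_lines() {
    printf '%s\n' '**kern' '*M3/4' '*MM120' '*k[f#]' '4c' 'M4/4' '*M6/8'
}

# Keep only lines that start with a literal asterisk, then M, then a digit.
# '*MM120' is rejected (M is followed by M, not a digit), and
# 'M4/4' is rejected (no leading asterisk).
sample_lines | grep '^\*M[0-9]'
```

Only `*M3/4` and `*M6/8` should be printed.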

Count the number of songs in each meter with this command:

         grep  -h '^\*M[0-9]' *.krn | sortcount

The -h option suppresses the name of the file in the output from grep. Type `man grep` to see more documentation about grep and its options. Alternatively, search online for more information about grep. For example, here is a YouTube tutorial on grep: https://www.youtube.com/watch?v=3w7xrQWRYrU

Sortcount is a Humdrum Extras script that basically combines several standard unix command-line tools. Try the equivalent pipeline:

     grep -h '^\*M[0-9]' *.krn | sort | uniq -c | sort -nr

Try `man sort` and `man uniq` to learn more about those standard unix command-line tools.
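
The three-stage pipeline can be tried on a handful of invented lines to see what each stage contributes. This is a sketch only: `count_lines` is a hypothetical function name, and it approximates sortcount rather than reproducing it exactly (for example, `uniq -c` pads its counts with spaces, so the spacing may differ from sortcount's output).

```shell
# Hypothetical stand-in for sortcount, built from standard tools:
#   sort      groups identical lines next to each other
#   uniq -c   collapses each group to one line prefixed by its count
#   sort -nr  orders the result numerically, largest count first
count_lines() {
    sort | uniq -c | sort -nr
}

# Made-up input: three songs in 4/4, two in 3/4, one in 6/8.
printf '%s\n' '*M4/4' '*M3/4' '*M4/4' '*M6/8' '*M4/4' '*M3/4' | count_lines
```

The most frequent line (`*M4/4`, with a count of 3) should appear first.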

The meter counts that I get for the erk subcollection are:

     581	*M4/4
     522	*M3/4
     473	*M2/4
     193	*M6/8
     119	*M3/8
     27	*M6/4
     8	*M3/2
     5	*M2/2
     1	*M4/2
     1	*M5/4