Difference between revisions of "Humdrum Lab 2"

From CCARH Wiki

Revision as of 18:16, 17 April 2018

Consider working through Humdrum Lab 1 before doing this lab.

This lab introduces the Essen Folksong Collection and some tools for analysis and data extraction from the collection.

Downloading a repertory from the collection

The data files for the Essen Folksong Collection can be browsed here:

    http://kern.ccarh.org/browse?l=/essen

The primary collection consists of about 5,000 German songs and 3,000 Chinese songs.

To download a particular sub-collection try:

     humsplit  h://essen/europa/deutschl/erk

This should download 1701 song files, which you can confirm by counting them:

    $ ls *.krn | wc -l
    1701

Create a new directory with `mkdir` first to keep them all in one location, or download all songs to a single file with:

     humcat -s h://essen/europa/deutschl/erk


scaletype tool

The scaletype tool is a simple scale categorizer that reports what kind of pitch content each song has:

   scaletype *.krn

Here is how to count each basic category of scaletype:

    scaletype -F *.krn | sortcount
    817	heptatonic
    583	hexatonic
    159	pentatonic
    108	chromatic
    33	toofew

The meaning of each category: `heptatonic` is 7 pitch classes, `hexatonic` is 6 pitch classes, `pentatonic` is 5 pitch classes, `chromatic` is more than 7 pitch classes, and `toofew` is fewer than 5 pitch classes.
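
The category definitions amount to a lookup on the number of distinct pitch classes. The sketch below restates them as a shell function; the name `scale_category` is hypothetical and this is not how the scaletype program itself is written, since scaletype has to count the pitch classes in the music first.

```shell
# Hypothetical helper that mirrors the category definitions in the text.
# Input: a number of distinct pitch classes. Output: the category name.
scale_category() {
    n=$1
    if   [ "$n" -lt 5 ]; then echo toofew
    elif [ "$n" -eq 5 ]; then echo pentatonic
    elif [ "$n" -eq 6 ]; then echo hexatonic
    elif [ "$n" -eq 7 ]; then echo heptatonic
    else                      echo chromatic
    fi
}

scale_category 5
scale_category 9
```

The two calls at the end print `pentatonic` and `chromatic`.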


  • Exercise: What scale degrees are missing in songs classified as `pentatonic`?
  • Exercise: How many pitch classes do the songs in the `toofew` category have (0, 1, 2, 3, or 4)?

Meter

To list the time signatures of each song in the subcollection, try using grep:

    grep ^\*M[0-9]  *.krn

The string `^\*M[0-9]` is a "regular expression" which means: at the start of a line (`^`), find an asterisk followed by the capital letter M followed by a character in the range 0 to 9 (a digit).

It is often safer to place the regular expression in single quotes so that the shell does not interpret any of its special characters before grep sees them:

    grep '^\*M[0-9]' *.krn

The backslash is needed because an unescaped asterisk is a special character in regular expressions: it is a repetition operator that means "match zero or more of the preceding character". Here we want to find a literal asterisk, so the backslash removes its special meaning.
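
One way to check what the expression matches is to feed grep a few hand-written lines instead of real files. The sample lines below are made up for illustration and the function name `sample` is hypothetical. Only the lines that begin with `*M` followed by a digit should survive, so a tempo record such as `*MM120` is skipped.

```shell
# Made-up lines resembling records in a song file.
sample() {
    printf '%s\n' '*M3/4' '*MM120' '**kern' '4c' '!! *M4/4 in a comment' '*M6/8'
}

# \* matches a literal asterisk at the start of the line;
# [0-9] requires a digit right after the M.
sample | grep '^\*M[0-9]'
```

This prints `*M3/4` and `*M6/8`. The comment line is rejected because it does not start with an asterisk.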

Count the number of songs in each meter with this command:

     grep -h '^\*M[0-9]' *.krn | sortcount

The `-h` option suppresses the filename in the output from grep. Type `man grep` to see more documentation about grep and its options, or search online for more information. For example, here is a YouTube tutorial on grep: https://www.youtube.com/watch?v=3w7xrQWRYrU

Sortcount is a Humdrum Extras script that is essentially shorthand for a pipeline of standard unix command-line tools. Try:

     grep -h '^\*M[0-9]' *.krn | sort | uniq -c | sort -nr

Try `man sort` and `man uniq` to learn more about those standard unix command-line tools.
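
To see what each stage of that pipeline contributes, here is a sketch that wraps the three commands in a shell function and runs it on made-up input. The name `sortcount_sketch` is hypothetical; it is a stand-in built from standard tools, not the Humdrum Extras script, and its column spacing may differ from what sortcount prints.

```shell
# Hypothetical stand-in for sortcount, using only standard tools.
sortcount_sketch() {
    sort |        # bring identical lines next to each other
    uniq -c |     # collapse each run into "count line"
    sort -nr      # order by count, largest first
}

# Made-up input: three songs in 4/4, two in 3/4, one in 6/8.
printf '%s\n' '*M4/4' '*M3/4' '*M4/4' '*M6/8' '*M3/4' '*M4/4' | sortcount_sketch
```

The first `sort` matters because `uniq` only merges adjacent duplicate lines. With this input the meters come out in the order 4/4, 3/4, 6/8 with counts 3, 2, and 1.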

The meter counts that I get for the erk subcollection are:

     581	*M4/4
     522	*M3/4
     473	*M2/4
     193	*M6/8
     119	*M3/8
     27	*M6/4
     8	*M3/2
     5	*M2/2
     1	*M4/2
     1	*M5/4


Selecting by scaletype and meter

When analyzing music, you often need to select pieces with common forms or features (for example, as input for David Cope's sort of analysis and music generation). Here is an example of selecting files by a particular meter and scaletype:

     scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic

The text in the backquotes is a command that is run first; its output is then passed as arguments to the scaletype program. In this case `-l` (lower-case L) means that grep should print only the names of the files that contain a match for the regular expression, rather than the matching lines.
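
The backquote mechanism can be tried without scaletype. The sketch below builds two made-up files in a scratch directory (the filenames and contents are assumptions for illustration) and captures the output of the inner grep in a variable, once with backquotes and once with the equivalent `$( )` form, which is easier to read and to nest.

```shell
# Made-up files in a scratch directory, so nothing in the collection is touched.
dir=$(mktemp -d)
printf '%s\n' '**kern' '*M3/4' '4c' '*-' > "$dir/waltz.krn"
printf '%s\n' '**kern' '*M4/4' '4c' '*-' > "$dir/march.krn"

# The inner grep runs first; its output (a list of filenames) is what
# the surrounding command receives.
old_style=`grep -l '^\*M3/4' "$dir"/*.krn`
new_style=$(grep -l '^\*M3/4' "$dir"/*.krn)

echo "$old_style"
echo "$new_style"
```

Both `echo` lines print the path of `waltz.krn` only, since `march.krn` has no 3/4 record.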

Note that this method has a limitation when a song changes meter after the start of the music: if any time signature in the file matches 3/4, the whole file is selected, even if other parts of the song are in a different meter.
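
One possible way around the meter-change problem is to keep only files in which every time signature is 3/4. The function below is a sketch, and its name `only_three_four` is hypothetical. It assumes that the time signature occupies a whole line by itself, as in the `grep -h` output shown earlier for these single-melody songs; files with several spines would need a looser pattern.

```shell
# Hypothetical helper: print the names of files whose time signatures
# are all 3/4. Assumes the meter record is the only thing on its line.
only_three_four() {
    for f in "$@"; do
        # Skip files with no 3/4 record at all.
        grep -q '^\*M3/4$' "$f" || continue
        # Keep the file only if no time signature differs from 3/4.
        if ! grep '^\*M[0-9]' "$f" | grep -qv '^\*M3/4$'; then
            printf '%s\n' "$f"
        fi
    done
}
```

It could then replace the inner grep, for example `scaletype $(only_three_four *.krn) | grep heptatonic`.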

The scaletype command needs an equivalent of `-l`, but for now, if you need a list of the filenames without the heptatonic label, use sed to strip everything from the colon onward:


     scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic | sed 's/:.*//'


Save a copy of the 3/4 heptatonic files to a separate directory (262 files according to my count):

     mkdir 347
     scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic | sed 's/:.*//'  > list.txt
     cp `cat list.txt` 347
     cd 347
     scaletype *.krn
     grep -h '^\*M3/4' *.krn | sortcount
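
The `cp` line above relies on the shell splitting the contents of list.txt into separate words, which works when no filename contains spaces. As a sketch of a more defensive alternative, the function below reads the list one line at a time; the name `copy_listed` is hypothetical and not part of any Humdrum tool.

```shell
# Hypothetical helper: copy every file named in a list (one name per line)
# into a destination directory, creating the directory if needed.
copy_listed() {
    list=$1
    dest=$2
    mkdir -p "$dest"
    while IFS= read -r name; do
        # Ignore blank lines in the list.
        if [ -n "$name" ]; then
            cp -- "$name" "$dest"/
        fi
    done < "$list"
}
```

With the files from this section, it would be called as `copy_listed list.txt 347`.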