Humdrum Lab 2

From CCARH Wiki

Revision as of 18:01, 17 April 2018

This lab introduces the Essen Folksong Collection and some tools for analysis and data extraction from the collection.

Downloading a repertory from the collection

The data files for the Essen Folksong Collection can be browsed here:

    http://kern.ccarh.org/browse?l=/essen

The primary collection consists of about 5,000 German songs and 3,000 Chinese songs.

To download a particular sub-collection try:

     humsplit  h://essen/europa/deutschl/erk

This should download 1701 song files:

    $ ls *.krn | wc -l
    1701
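
`wc -l` counts lines, not files. Given `*.krn`, it prints the line count of each file followed by a total. To count the files themselves, count the names in a directory listing instead. Here is a minimal sketch that uses three made-up placeholder files in a scratch directory (the file names and contents are hypothetical):

```shell
# Scratch directory so the *.krn glob sees only the sample files.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Three hypothetical placeholder files standing in for downloaded songs.
printf '%s\n' '**kern' '4c' '*-' > song1.krn
printf '%s\n' '**kern' '4d' '4e' '*-' > song2.krn
printf '%s\n' '**kern' '4f' '*-' > song3.krn

# Line counts per file, plus a "total" line (3 + 4 + 3 = 10).
wc -l *.krn

# Number of files: when piped, ls prints one name per line.
# tr strips the padding that some versions of wc add.
nfiles=$(ls *.krn | wc -l | tr -d ' ')
echo "$nfiles"
```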

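Because the song files are written into the current directory, first create a new directory with `mkdir` (and `cd` into it) to keep them all in one location. Alternatively, download all of the songs into a single file with: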

     humcat -s h://essen/europa/deutschl/erk


scaletype tool

The scaletype tool gives a simple classification of each song by the kind of scale it uses, based on how many distinct pitch classes the song contains:

   scaletype *.krn

To count how many songs fall into each basic scaletype category:

    scaletype -F *.krn | sortcount
    817	heptatonic
    583	hexatonic
    159	pentatonic
    108	chromatic
    33	toofew

The categories are defined by the number of distinct pitch classes in a song: `heptatonic` has 7, `hexatonic` has 6, `pentatonic` has 5, `chromatic` has more than 7, and `toofew` has fewer than 5.
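
The category definitions above amount to a simple lookup on the pitch-class count. The sketch below restates them as a shell function. `classify_pcs` is a hypothetical name, and this is not scaletype's actual source code. scaletype may differ in details such as how it treats enharmonic spellings.

```shell
# Hypothetical helper (not part of scaletype): map a count of distinct
# pitch classes onto the scaletype category names.
classify_pcs() {
    if [ "$1" -gt 7 ]; then
        echo chromatic
    elif [ "$1" -eq 7 ]; then
        echo heptatonic
    elif [ "$1" -eq 6 ]; then
        echo hexatonic
    elif [ "$1" -eq 5 ]; then
        echo pentatonic
    else
        echo toofew        # fewer than 5 pitch classes
    fi
}

# Print the category for a few example counts.
for n in 3 5 6 7 9; do
    printf '%s\t%s\n' "$n" "$(classify_pcs "$n")"
done
```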


  • Exercise: Which scale degrees are missing in songs classified as `pentatonic`?
  • Exercise: How many pitch classes do the songs classified as `toofew` contain? (0, 1, 2, 3, or 4)
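
For the exercises above, you can also count pitch classes directly from the data as a rough cross-check. The sketch below assumes a monophonic, single-spine **kern file. Extra spines, such as lyrics, would add stray letters. It counts *spelled* pitch classes, so c# and d- are counted separately. The function name `count_pcs` and the sample song are hypothetical.

```shell
# Hypothetical helper (not a Humdrum tool): count distinct spelled pitch
# classes in a monophonic, single-spine **kern file.
#   1. drop comment (!), interpretation (*), and barline (=) records
#   2. extract pitch letters plus any accidentals (#, -, n)
#   3. fold case and collapse octave doubling (C, c, CC, cc all become c)
#   4. strip natural signs and count the unique spellings
count_pcs() {
    grep -v '^[!*=]' "$1" |
        grep -oE '[A-Ga-g]+[#n-]*' |
        tr 'A-G' 'a-g' |
        sed 's/^\(.\)\1*/\1/; s/n//' |
        sort -u |
        wc -l |
        tr -d ' '
}

# Invented example song: c, d, e, and g, some in more than one octave.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '**kern' '*M4/4' '=1' '4c' '4d' '4e' '4cc' '=2' '2GG' '2cn' '==' '*-' > "$sample"

count_pcs "$sample"
```

To apply the helper to a whole directory of songs, loop over the files, for example `for f in *.krn; do printf '%s\t%s\n' "$(count_pcs "$f")" "$f"; done`.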

Meter

To list the time signatures of each song in the subcollection, try using grep:

    grep ^\*M[0-9]  *.krn

The string `^\*M[0-9]` is a "regular expression" that means: at the start of a line (^), find an asterisk followed by the capital letter M, followed by a character in the range 0 to 9 (a digit).
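
To see which lines the pattern accepts, here is a small sketch on invented lines. In Humdrum, a tempo marking such as `*MM120` also begins with `*M`. Because the next character must be a digit, that line is rejected.

```shell
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

# Invented lines: two time signatures, a tempo marking, and some data.
printf '%s\n' '**kern' '*M4/4' '*MM120' '*M6/8' '4c' '*-' > "$sample"

# Prints only *M4/4 and *M6/8: "*M" must be followed by a digit.
grep '^\*M[0-9]' "$sample"
```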

It is often safer to place the regular expression in single quotes. This prevents the shell from interpreting the backslash or the square brackets before it passes the pattern to grep:

    grep '^\*M[0-9]' *.krn
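
You can check what the shell does to an unquoted pattern by printing the argument it actually passes on. The sketch below assumes bash or another POSIX-style shell. It runs in an empty scratch directory so that the `[0-9]` glob cannot match a file name. Without quotes, the backslash is gone before the command ever sees the pattern.

```shell
# Empty scratch directory, so the [0-9] glob cannot match any file name.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Unquoted: the shell removes the backslash before the command runs.
unquoted=$(printf '%s' ^\*M[0-9])
# Single-quoted: the pattern is passed on exactly as typed.
quoted=$(printf '%s' '^\*M[0-9]')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```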

The backslash is needed because an asterisk by itself is a special character in regular expressions, meaning "match zero or more of the preceding character." Here we want to match a literal asterisk, so the backslash removes its special meaning.
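
You can see the difference between an escaped and an unescaped asterisk on three made-up lines:

```shell
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '*M4/4' 'M4/4' '4/4' > "$sample"

# \* is a literal asterisk: only the first line contains "*M4".
literal=$(grep -c '\*M4' "$sample")

# M* means "zero or more M", so every line containing a 4 matches.
wildcard=$(grep -c 'M*4' "$sample")

echo "literal: $literal  wildcard: $wildcard"
```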

Count how many times each time signature occurs with this command:

    grep -h '^\*M[0-9]' *.krn | sortcount

When grep searches several files, it normally prints the file name at the start of each matching line. The -h option suppresses those file names. Type `man grep` to see more documentation about grep and its options, or search online for more information. For example, here is a YouTube tutorial on grep: https://www.youtube.com/watch?v=3w7xrQWRYrU
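
Here is a small sketch of what `-h` changes. It uses two hypothetical files in a scratch directory:

```shell
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf '%s\n' '**kern' '*M3/4' '*-' > "$demo/a.krn"
printf '%s\n' '**kern' '*M6/8' '*-' > "$demo/b.krn"

# With several files, grep prefixes each match with its file name...
grep '^\*M[0-9]' "$demo"/*.krn

# ...and -h drops that prefix, leaving only the matching lines.
grep -h '^\*M[0-9]' "$demo"/*.krn
```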

`sortcount` is a Humdrum Extras script that essentially combines the following standard Unix command-line tools into a single command. Try:

    grep -h '^\*M[0-9]' *.krn | sort | uniq -c | sort -nr

Try `man sort` and `man uniq` to learn more about those standard unix command-line tools.
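
To show what each stage of the pipeline contributes, here it is on a hypothetical four-song mini-collection. The songs are made up: three are in 2/4 and one is in 3/4.

```shell
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# Hypothetical mini-collection: three songs in 2/4, one in 3/4.
for i in 1 2 3; do
    printf '%s\n' '**kern' '*M2/4' '4c' '*-' > "$demo/song$i.krn"
done
printf '%s\n' '**kern' '*M3/4' '4c' '*-' > "$demo/song4.krn"

# sort groups identical lines, uniq -c counts each group,
# and sort -nr lists the most frequent time signature first.
grep -h '^\*M[0-9]' "$demo"/*.krn | sort | uniq -c | sort -nr
```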

The count of meters that I am getting for the erk subcollection is:

     581	*M4/4
     522	*M3/4
     473	*M2/4
     193	*M6/8
     119	*M3/8
     27	*M6/4
     8	*M3/2
     5	*M2/2
     1	*M4/2
     1	*M5/4
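
These counts add up to 1930, which is more than the 1701 song files. grep reports every matching line, so this is presumably because songs that change meter contain more than one `*M` line. To count each song only once, by its opening time signature, you can use GNU grep's `-m 1` option. This option stops reading each file after its first match. A sketch on two made-up files:

```shell
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# Hypothetical songs: one stays in 3/4, one switches from 2/4 to 3/4.
printf '%s\n' '**kern' '*M3/4' '4c' '*-' > "$demo/a.krn"
printf '%s\n' '**kern' '*M2/4' '4c' '*M3/4' '4d' '*-' > "$demo/b.krn"

# Every time-signature line: 3 lines from 2 songs.
all=$(grep -h '^\*M[0-9]' "$demo"/*.krn | wc -l | tr -d ' ')

# Only the first time signature of each file (GNU grep: -m counts per file).
first=$(grep -h -m 1 '^\*M[0-9]' "$demo"/*.krn | wc -l | tr -d ' ')

echo "all lines: $all, one per song: $first"

# Songs per opening meter.
grep -h -m 1 '^\*M[0-9]' "$demo"/*.krn | sort | uniq -c | sort -nr
```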