Humdrum Lab 2
You may want to work through Humdrum Lab 1 before doing this lab.
This lab introduces the Essen Folksong Collection and some tools for analysis and data extraction from the collection.
Downloading a repertory from the collection
The data files for the Essen Folksong Collection can be browsed here:
http://kern.ccarh.org/browse?l=/essen
The primary collection consists of about 5,000 German songs and 3,000 Chinese songs.
To download a particular sub-collection try:
mkdir erk
cd erk
humsplit h://essen/europa/deutschl/erk
This should download 1701 song files:
ls *.krn | wc -l
1701
Also download one of the Chinese repertories for comparison:
mkdir ../han
cd ../han
humsplit h://essen/asia/china/han
There should be 1223 songs in this dataset:
ls *.krn | wc -l
1223
scaletype tool
A simple scale categorizing tool can list what sort of musical content the songs have:
scaletype *.krn
Here is how to count each basic category of scaletype:
scaletype -F *.krn | sortcount
817 heptatonic
583 hexatonic
159 pentatonic
108 chromatic
33 toofew
The meaning of each category: `heptatonic` is 7 pitch classes, `hexatonic` is 6 pitch classes, `pentatonic` is 5 pitch classes, `chromatic` is more than 7 pitch classes, and `toofew` is fewer than 5 pitch classes.
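The category rule above can be written out as a small shell function. This is only a sketch of the prose description: `classify_scale` is a hypothetical helper name, and nothing here is taken from the scaletype source code.

```shell
# Hypothetical helper: map a count of distinct pitch classes to the
# category names listed above. It mirrors the prose description only.
classify_scale() {
    n=$1
    if [ "$n" -gt 7 ]; then
        echo chromatic
    elif [ "$n" -eq 7 ]; then
        echo heptatonic
    elif [ "$n" -eq 6 ]; then
        echo hexatonic
    elif [ "$n" -eq 5 ]; then
        echo pentatonic
    else
        echo toofew
    fi
}

# Print the category for a few example counts.
for n in 3 5 6 7 9; do
    printf '%s pitch classes: %s\n' "$n" "$(classify_scale "$n")"
done
```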
- Exercise: What scale degrees are missing in songs classified as `pentatonic`?
- Exercise: In the songs classified as `toofew`, how many pitch classes are there (0, 1, 2, 3, or 4)?
- Exercise: Compare these results to the han dataset.
Meter
To list the time signatures of each song in the subcollection, try using grep:
grep ^\*M[0-9] *.krn
The string `^\*M[0-9]` is a "regular expression" which means: at the start of a line (^), find an asterisk followed by the capital letter M followed by a character in the range from 0 to 9 (a digit).
It is safer to place the regular expression in single quotes so that the shell does not interpret any of its characters:
grep '^\*M[0-9]' *.krn
The backslash is needed because an asterisk by itself is a special character in regular expressions: it matches zero or more repetitions of the preceding character. Here we want to find a literal asterisk, so the backslash removes its special meaning.
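To see what the expression keeps and what it rejects, it can be tried on a few lines typed by hand. The sample below is made up for illustration; `*MM120` is a tempo marking, which the `[0-9]` part of the expression excludes.

```shell
# Made-up interpretation and data lines, one per line.
sample='**kern
*M3/4
*MM120
=1
4c
*M6/8
*-'

# Only lines that start with "*M" followed by a digit are kept,
# so the two time signatures pass and the tempo line does not.
meters=$(printf '%s\n' "$sample" | grep '^\*M[0-9]')
printf '%s\n' "$meters"
```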
Count the number of songs in each meter with this command:
grep -h '^\*M[0-9]' *.krn | sortcount
The `-h` option suppresses the filename in the output from grep. Type `man grep` to see more documentation about grep and its options, or search online for more information. For example, here is a YouTube tutorial on grep: https://www.youtube.com/watch?v=3w7xrQWRYrU
`sortcount` is a Humdrum Extras script that essentially combines several standard Unix command-line tools. Try:
grep -h '^\*M[0-9]' *.krn | sort | uniq -c | sort -nr
Try `man sort` and `man uniq` to learn more about those standard Unix command-line tools.
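The three-stage pipeline can be tried without any song files. The list of time signatures below is made up; the point is only to show what each stage contributes.

```shell
# Made-up list of time signatures, one per song.
meters='*M3/4
*M4/4
*M3/4
*M6/8
*M3/4
*M4/4'

# sort groups identical lines together, uniq -c counts each group,
# and sort -nr puts the largest count first.
counts=$(printf '%s\n' "$meters" | sort | uniq -c | sort -nr)
printf '%s\n' "$counts"
```

The counts printed by `uniq -c` are padded with leading spaces, which is harmless for reading but worth knowing if the output is processed further.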
The count of meters that I am getting for the erk subcollection is:
581 *M4/4
522 *M3/4
473 *M2/4
193 *M6/8
119 *M3/8
27 *M6/4
8 *M3/2
5 *M2/2
1 *M4/2
1 *M5/4
Selecting by scaletype and meter
Often when you do analysis on music, you need to select music with common forms or features (such as input for David Cope's sort of analysis and music generation). Here is an example of selecting files by a particular meter and scaletype:
scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic
or
scaletype $(grep -l '^\*M3/4' *.krn) | grep heptatonic
The text inside the backquotes or `$()` is a command that is run first; its output is then passed as arguments to the scaletype program. Here `-l` (lower-case L) tells grep to print only the names of the files that contain a match for the regular expression.
Note that this method has problems when a song changes meter after the start of the music: if any time signature in the file matches 3/4, the whole file is selected.
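One possible way around the meter-change problem is to keep only files in which every time signature is the requested one. The sketch below uses a hypothetical helper, `only_meter`, built from standard tools. It looks at the first spine only, which is an assumption that suits single-spine songs, and the two test files are made up.

```shell
# Hypothetical helper: print the names of files in which every time
# signature (first spine only) equals the requested one, e.g. '*M3/4'.
only_meter() {
    meter=$1
    shift
    for f in "$@"; do
        # Distinct time signatures found anywhere in the file.
        all=$(grep '^\*M[0-9]' "$f" | cut -f 1 | sort -u)
        if [ "$all" = "$meter" ]; then
            echo "$f"
        fi
    done
}

# Two made-up files: one stays in 3/4, one starts in 4/4 and changes.
tmp=$(mktemp -d)
printf '%s\n' '**kern' '*M3/4' '=1' '4c' '4d' '4e' '*-' > "$tmp/steady.krn"
printf '%s\n' '**kern' '*M4/4' '=1' '1c' '*M3/4' '=2' '2.d' '*-' > "$tmp/changes.krn"

plain=$(cd "$tmp" && grep -l '^\*M3/4' *.krn)     # lists both files
strict=$(cd "$tmp" && only_meter '*M3/4' *.krn)   # lists steady.krn only
printf 'grep -l:\n%s\nonly_meter:\n%s\n' "$plain" "$strict"
rm -rf "$tmp"
```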
The scaletype command needs an equivalent of `-l`, but for now if you need a list of the files without the heptatonic label:
scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic | sed 's/:.*//'
Save a copy of the 3/4 heptatonic files to a separate directory (262 files according to my count):
mkdir 347
scaletype `grep -l '^\*M3/4' *.krn` | grep heptatonic | sed 's/:.*//' > list.txt
cp `cat list.txt` 347
cd 347
scaletype *.krn
grep -h '^\*M3/4' *.krn | sortcount
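The `cp` step relies on the shell splitting the contents of list.txt into separate arguments. That works for filenames like the ones in this collection, but it breaks if a name contains a space. A more defensive variant, sketched here with made-up stand-in files in a temporary directory, reads the list one line at a time:

```shell
# Made-up stand-in files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '**kern' '*-' > "$tmp/song1.krn"
printf '%s\n' '**kern' '*-' > "$tmp/song 2.krn"
printf '%s\n' '**kern' '*-' > "$tmp/song3.krn"
printf '%s\n' 'song1.krn' 'song 2.krn' > "$tmp/list.txt"

(
    cd "$tmp" || exit 1
    mkdir -p 347
    # Copy each listed file; quoting keeps names with spaces intact.
    while IFS= read -r f; do
        cp "$f" 347/
    done < list.txt
)

copied=$(ls "$tmp/347" | wc -l | tr -d ' ')
echo "copied $copied files"
rm -rf "$tmp"
```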
Beat tool
Try the Humdrum Extras `beat` tool on the heptatonic 3/4-meter songs.
A basic listing of the tool's options can be printed on the command line with the `--options` option:
beat --options
More detailed information about the tool can be found at http://extras.humdrum.org/man/beat
The `beat` tool calculates and extracts rhythmic, duration and metric information from Humdrum `**kern` files.
Use the -b option to extract beat positions in the meter:
beat *.krn | ridx -H | sortcount -p
What is the most common beat position in 3/4, and what is the least common one present in the data? How frequent are beats 1, 2 and 3 relative to each other?
There are complications since the above analysis includes rests and secondary tied notes. To avoid these problems in the analysis:
$ beat *.krn -p | hgrep -vkd '[]_r]' | cut -f 1 | sortcount
29.44 1
24.35 3
16.93 2
11.67 3.5
7.01 2.5
6.15 1.5
1.84 1.75
0.8 3.75
0.48 4
0.38 2.75
0.31 4.5
0.2 1.25
0.19 3.25
0.08 3.66667
0.08 3.33333
0.05 2.25
0.01 1.33333
0.01 1.66667
0.01 2.33333
0.01 4.25
0.01 2.66667
`beat -p` means to prepend the beat analysis spine to the start of the line.
hgrep is a Humdrum Extras tool that emulates the functionality of grep but is aware of Humdrum file structure: http://extras.humdrum.org/man/hgrep/
`hgrep -v` means show the lines of data which do not match the regular expression.
`hgrep -k` means only search in `**kern` data.
`hgrep -d` means only process data tokens.
Note that `hgrep -kd` is shorthand for `hgrep -k -d`.
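The bracket expression `[]_r]` matches any token containing `]` (end of a tie), `_` (continuation of a tie) or `r` (rest). Its effect can be previewed with plain grep on a few made-up `**kern` tokens. This is only a stand-in: plain grep knows nothing about spines, so it does not reproduce what the `-k` and `-d` options do.

```shell
# Made-up **kern tokens, one per line: a plain note, a tie start,
# a tie continuation, a tie end, and a rest.
tokens='4c
[4d
4d_
4d]
4r'

# -v keeps the lines that do NOT match: the plain note and the
# first note of the tie (the actual attack).
kept=$(printf '%s\n' "$tokens" | grep -v '[]_r]')
printf '%s\n' "$kept"
```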
A simpler method for doing the same analysis is to require the beat analysis to only analyze metric positions of lines with at least one note attack by using the `beat -A1` option:
beat *.krn -A1 | ridx -H | sortcount -p
- Exercise: try the same analysis on the 4/4 meter pieces from the erk collection.
Context
The classic Humdrum tool called `context` is useful for extracting n-grams for use in Markov chain analyses. See the webpage http://www.humdrum.org/man/context for more documentation about the context tool.
Try context on the metric data extracted from the erk songs.
Here is an example of extracting a context of 2 adjacent notes:
$ beat *.krn | context -n 2 | ridx -H | grep -v = | sortcount | head -n 20
1026 2 3
991 1 2
981 3 3.5
685 2.5 3
639 1 3
621 1 1.5
567 1.5 2
370 2 2.5
330 1 2.5
191 1.75 2
169 2 3.5
143 1 1.75
48 1.5 1.75
47 3 3.75
37 2.75 3
36 3.5 3.75
25 2 2.75
24 4 4.5
23 3 4
21 1 1.25
So the most common metric pattern in the data is going from beat 2 to beat 3.
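The idea of a 2-note context can be illustrated with standard tools alone. The awk one-liner below is a rough stand-in for `context -n 2`, not the tool itself: it pairs each value with the one before it. The beat positions are made up.

```shell
# Made-up beat positions, one note per line.
beats='1
2
3
1
2
3'

# Pair each line with the previous line to form 2-grams.
pairs=$(printf '%s\n' "$beats" | awk 'NR > 1 { print prev, $0 } { prev = $0 }')

# Count how often each pair occurs, most frequent first.
printf '%s\n' "$pairs" | sort | uniq -c | sort -nr
```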
Here is an example of how to extract all metric patterns in the songs:
$ beat *.krn | context -n 20 | ridx -H | grep ^1 | grep -v '^1\.' | sed 's/=.*//' | sortcount -p | head -n 20
19.59 1 3
15.86 1 2 3
7.69 1 2 3 3.5
5.97 1 2.5 3 3.5
5.26 1 2.5 3
4.25 1
4.2 1 1.5 2 3 3.5
3.92 1 1.5 2 2.5 3 3.5
3.82 1 1.5 2 3
3.68 1 3 3.5
3.11 1 1.5 2 3.5
2.82 1 2 2.5 3 3.5
1.96 1 2
1.67 1 2 3.5
1.39 1 1.75 2 2.5 3 3.5
1.34 1 2 2.5 3
1.15 1 1.5 2
0.96 1 1.75 2 3.5
0.91 1 1.75 2 3
0.86 1 1.5 2 2.5
About 20% of the measures have the pattern of a half note followed by a quarter note (beats one and three).
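The filtering stages in that pipeline can be read as: keep windows that begin on beat 1 (`grep ^1`), discard those that begin on a fractional position such as 1.5 (`grep -v '^1\.'`), and cut each window off at the first barline (`sed 's/=.*//'`). The sketch below applies the same three stages to made-up, shortened lines. Their layout is an assumption for illustration, not captured output from `context`.

```shell
# Made-up windows of beat positions with barline tokens mixed in.
lines='1 2 3 =2 1 3 =3
2 3 =2 1 3 =3 1
1.5 2 3 =2 1 3
1 3 =3 1 2 3 =4'

# Keep windows starting on beat 1, then truncate at the first barline.
# Each surviving line keeps a trailing space left over from the cut.
patterns=$(printf '%s\n' "$lines" | grep '^1' | grep -v '^1\.' | sed 's/=.*//')
printf '%s\n' "$patterns"
```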
- Exercise: Notice that beat 1.5 is fairly rare in these 3/4 songs. How does this compare to the songs in 4/4?
mint
"Melodic Interval" extraction
Try:
cat deut*.krn | kern -x | mint | ridx -H | grep -v r | sortcount -v --sort interval -p --min 0.5 > analysis.html
- Exercise: What does each command in this pipeline do?
cat deut*.krn | kern -x | mint | grep -v = | context -n 2 | ridx -H | grep -v r | sortcount -v -p --min 0.5 > index.html
- Exercise: How can P1 (unison/repeated notes) be removed from the results?
Next lab
Next, you can view Humdrum Lab 3 which demonstrates searching for musical features in the Essen Folksong Collection.