Difference between revisions of "Humdrum lab 5"

From CCARH Wiki

Revision as of 01:18, 30 June 2018

This lab is about the wikifonia data.

Update Humdrum tools

The beat command has some new options (-C and -U), so you will need to update humdrum extras:

   cd $(which beat | sed 's/humdrum-tools.*/humdrum-tools/')
   make update
   make

Basic information

How many files are there:

   ls *.krn | wc -l
   6710

How many have lyrics:

    grep -l "\*\*text" *.krn | wc -l
    5460

How many have chords:

    grep -l "\*\*mxhm" *.krn | wc -l
    6282

How many have two or more verses:

    grep -l "\*\*text.*\*\*text" *.krn | wc -l
    2006
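
The counts above all follow one pattern: `grep -l` prints the name of each file that contains a match, and `wc -l` counts those names. As a minimal sketch, the pattern can be wrapped in a function. The name `count_files_matching` is a hypothetical helper, not a Humdrum tool, and the two demo files are made up for illustration rather than taken from the Wikifonia data.

```shell
# Hypothetical helper: count how many of the given files contain a line
# matching a basic regular expression.
count_files_matching() {
    pattern=$1
    shift
    # grep -l prints each matching filename once; wc -l counts those names.
    # tr strips the padding that some wc implementations add.
    grep -l -- "$pattern" "$@" | wc -l | tr -d ' '
}

# Two made-up files, for illustration only.
demo_dir=$(mktemp -d)
printf '**kern\t**text\t**text\n4c\tla\tla\n*-\t*-\t*-\n' > "$demo_dir/two-verses.krn"
printf '**kern\t**mxhm\n4c\tC major\n*-\t*-\n' > "$demo_dir/chords-only.krn"

count_files_matching '\*\*text' "$demo_dir"/*.krn            # files with lyrics
count_files_matching '\*\*text.*\*\*text' "$demo_dir"/*.krn  # two or more verses
```

The second pattern only matches when two `**text` markers occur on the same line, which is why it picks out files with at least two lyric spines.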

Bibliographic information

Who are the 10 most represented composers in the data:

  grep -h COM *.krn | sortcount | head -n 10
  132	!!!COM:	Unknown
  121	!!!COM:	Hungarian folk song
  119	!!!COM:	Traditional
  91	!!!COM:	Richard Rodgers
  75	!!!COM:	Irving Berlin
  67	!!!COM:	Hungarian song
  65	!!!COM:	Cole Porter
  46	!!!COM:	Harry Warren
  45	!!!COM:	George Gershwin
  40	!!!COM:	Harold Arlen
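
If sortcount is not installed, a rough stand-in can be assembled from standard tools. This is only a sketch: `count_lines` is a hypothetical name, the sample input is made up, and the real sortcount may format its output differently (for example in how it separates the count from the text).

```shell
# Hypothetical stand-in for sortcount, built from standard tools:
# count identical input lines and list the most frequent first.
count_lines() {
    # uniq -c needs sorted input; the final sed removes uniq's left padding.
    sort | uniq -c | sort -k1,1nr | sed 's/^ *//'
}

# Made-up reference records, for illustration only.
printf '!!!COM: Traditional\n!!!COM: Cole Porter\n!!!COM: Traditional\n' | count_lines
```

With the lab data the equivalent call would be `grep -h COM *.krn | count_lines | head -n 10`.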

List the titles of all pieces where George Gershwin is the composer:

  grep OTL  $(grep -li  COM.*Gershwin *.krn) | sort -k2
  WF2190.krn:!!!OTL:	'S Wonderful!
  WF2191.krn:!!!OTL:	A FOGGY DAY
  WF2267.krn:!!!OTL:	A Foggy Day
  WF2186.krn:!!!OTL:	A Woman Is A Sometime Thing
  WF2192.krn:!!!OTL:	Bidin' My Time
  WF2178.krn:!!!OTL:	Blues
  WF2193.krn:!!!OTL:	But Not For Me
  WF2194.krn:!!!OTL:	By Strauss
  WF2195.krn:!!!OTL:	Clap yo' hands
  WF2185.krn:!!!OTL:	Do It Again!
  WF2196.krn:!!!OTL:	Embraceable You
  WF2197.krn:!!!OTL:	Fascinating Rhythm
  WF2268.krn:!!!OTL:	For You, For Me, For Evermore
  WF2198.krn:!!!OTL:	How Long Has This Been Going On
  WF2199.krn:!!!OTL:	I Got Plenty o' Nuttin'
  WF2103.krn:!!!OTL:	I Got Rhythm
  WF2200.krn:!!!OTL:	I Got Rhythm
  WF2222.krn:!!!OTL:	I Loves You Porgy
  WF2201.krn:!!!OTL:	I Was Doing All Right
  WF2202.krn:!!!OTL:	I Was Doing All Right
  WF2179.krn:!!!OTL:	I loves you Porgy
  WF2184.krn:!!!OTL:	I'll Build A Stairway To Paradise
  WF2203.krn:!!!OTL:	I've Got A Crush On You
  WF2204.krn:!!!OTL:	Isn't It A Pity
  WF2221.krn:!!!OTL:	It Ain't Necessarily So
  WF2205.krn:!!!OTL:	Let's Call the Whole Thing Off
  WF2220.krn:!!!OTL:	Liza
  WF2219.krn:!!!OTL:	Liza (All the clouds'll roll away)
  WF2269.krn:!!!OTL:	Love Is Here To Stay
  WF2206.krn:!!!OTL:	Love Walked In
  WF2223.krn:!!!OTL:	My Man's Gone Now
  WF2207.krn:!!!OTL:	Nice Work If You Can Get It
  WF2208.krn:!!!OTL:	Oh Lady Be Good
  WF2189.krn:!!!OTL:	SUMMERTIME
  WF2995.krn:!!!OTL:	Shoes With Wings On
  WF2183.krn:!!!OTL:	Somebody Loves Me
  WF2209.krn:!!!OTL:	Someone To Watch Over Me
  WF2210.krn:!!!OTL:	Soon
  WF2211.krn:!!!OTL:	Strike Up The Band
  WF2180.krn:!!!OTL:	Summertime
  WF2181.krn:!!!OTL:	Summertime
  WF2187.krn:!!!OTL:	Summertime
  WF2212.krn:!!!OTL:	Summertime
  WF2224.krn:!!!OTL:	Swanee
  WF2213.krn:!!!OTL:	That Certain Feeling
  WF2214.krn:!!!OTL:	The Man I Love
  WF2225.krn:!!!OTL:	The Simple Life
  WF2188.krn:!!!OTL:	There's A Boat Dat's Leaving Soon For New York
  WF2215.krn:!!!OTL:	They All Laughed
  WF2216.krn:!!!OTL:	They Can't Take That Away From Me
  WF2217.krn:!!!OTL:	They Can't Take That Away From Me
  WF2218.krn:!!!OTL:	Who Cares

Texture

How many files contain more than one **kern spine (i.e., are polyphonic, probably piano):

   grep -l "\*\*kern.*\*\*kern" *.krn | wc -l
   58

WF5118.krn is an example:

Wf5118.png

This one is interesting because it has invisible chords in the top staff which are realizing the harmonic chords above the staff.

How many songs have chords in the **kern data, that is, at least one token containing several notes separated by spaces (this takes a long time to calculate -- 70 songs per second = 95 seconds):

  for i in *.krn 
  do 
     extractx -i kern $i | serialize | ridx -H | grep " " | wc -l
  done | grep -v "^ *0$" | wc -l
  365

How many songs do not have chords:

  for i in *.krn 
  do 
     extractx -i kern $i | serialize | ridx -H | grep " " | wc -l
  done | grep "^ *0$" | wc -l
  6345
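
Since the loop is slow, both numbers can be obtained from one pass by tallying the per-file counts in awk. This is a sketch under stated assumptions: `tally_counts` is a hypothetical helper, and the input below is a made-up list of per-file counts rather than real output.

```shell
# Hypothetical helper: read one count per line (leading spaces allowed, as
# some wc implementations print them) and report how many lines are
# non-zero and how many are zero.
tally_counts() {
    awk '{ if ($1 + 0 > 0) with++; else without++ }
         END { printf "with: %d\nwithout: %d\n", with, without }'
}

# Made-up per-file counts, for illustration only.
printf '       0\n      12\n       0\n       3\n' | tally_counts
```

In the lab, the `done` line of the loop could pipe into `tally_counts` in place of the final `grep` and `wc -l`, so the files only need to be processed once.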

Duration

What is the duration of all songs if played back-to-back and at the specified tempo without repeats?

   gettime -T *.krn | tail -n 1
   286:50:23.1354 hours

What are the longest songs:

    gettime --simple -T *.krn | sort -k2 -nr | head -n 10
    WF6618.krn:	3120
    WF0181.krn:	3120
    WF0182.krn:	1864
    WF3616.krn:	1420
    WF6336.krn:	1134
    WF5131.krn:	909
    WF6068.krn:	785
    WF5004.krn:	696
    WF3226.krn:	671
    WF1249.krn:	664

The -k2 option means to sort by the second column of data. -n means to sort numerically rather than alphabetically, and -r means to sort by highest first.
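
The effect of each sort option is easiest to see on a few lines. The following sketch uses made-up filenames and durations in the same two-column layout as the gettime output.

```shell
# Made-up filename/duration pairs, for illustration only.
sample=$(printf 'A.krn:\t90\nB.krn:\t1200\nC.krn:\t664\n')

# -k2 sorts on the second column, -n compares numerically, -r reverses,
# so the largest duration comes first.
printf '%s\n' "$sample" | sort -k2 -nr

# Without -n the comparison is character by character, so "90" is placed
# ahead of "1200" in the reversed order.
printf '%s\n' "$sample" | sort -k2 -r
```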

What are the shortest songs:

    gettime --simple -T *.krn | sort -k2 -nr | tail -n 10
    WF2814.krn:	16
    WF2806.krn:	16
    WF2795.krn:	16
    WF2785.krn:	16
    WF2856.krn:	14
    WF2852.krn:	12
    WF2799.krn:	12
    WF6338.krn:	8
    WF5609.krn:	8

To view the shortest song in VHV, copy it to the clipboard and paste it into VHV:

    cat WF5609.krn | pbcopy

Wf-shortest.png

Meter

What sort of meters are in the database and how much of each type?

    beat -Ca *.krn | beat -Ua  | extractx -s '$1-$'  | ridx -H | sortcount -p
    65.89	4	4
    15.44	3	4
    11.67	2	2
    3.24	2	4
    2.78	6	8
    0.52	12	8
    0.16	5	4
    0.14	9	8
    0.09	6	4
    0.06	3	8
    0.01	3	2
    0.01	7	4
    0     	2	8
    0     	7	8
    0     	9	4
    0     	10	8
    0     	17	16
    0     	1	2
    0     	5	8
    0     	1	4
    0     	4	8

The most common meter is 4/4, which accounts for about 66% of the measures (the counts are per measure, not per song).

-C means extract the count of the meter (the top number).

-U means extract the duration unit from the meter (the bottom number).

-C and -U are output once for each measure, so using them is a simple way of counting the number of measures in the scores. If you add the -F option to these two options, every data line will display the metrical information.

-a means to append the analysis to the end of the lines (keeping the original input score).

The extractx option:

    -s '$1-$'

means to extract from one before the last spine to the last spine. $1 is one before the last spine, $2 is two before the last spine, and so on.
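
On a single tab-separated data line, selecting the last two spines amounts to keeping the last two columns. The awk sketch below imitates only that part. `last_two_columns` is a hypothetical name, the input lines are made up, and the real extractx also deals with interpretation records and other Humdrum structure that this ignores.

```shell
# Hypothetical illustration: keep the final two tab-separated columns.
last_two_columns() {
    awk -F '\t' 'NF >= 2 { print $(NF-1) "\t" $NF }'
}

# Made-up data lines: one **kern token followed by the appended
# meter count and meter unit columns.
printf '4c\t4\t4\n8d\t6\t8\n' | last_two_columns
```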

Chord labels

How many unique chord labels are there?

   extractx -i mxhm  * | ridx -H | sortcount | wc -l
   1399

What are the most common ones:

   extractx -i mxhm  * | ridx -H | sortcount -p | head -n 10
   7.21	C major
   6.14	F major
   4.94	G major
   4.83	G dominant
   4.07	C dominant
   3.55	D dominant
   3.28	B- major
   2.81	E- major
   2.3	D major
   2.25	F dominant

How many chord qualities:

    extractx -i mxhm  * | ridx -H | sed 's/[^ ]* //; s/\/.*//' | sortcount | wc -l
    80
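
The sed program in this command does two things: it deletes everything up to and including the first space (the chord root), then deletes a slash and whatever follows it. Here is a small sketch of its effect, using made-up labels. The slash form in the third label is an assumed spelling of a bass note, and `strip_to_quality` is a hypothetical wrapper name.

```shell
# Same sed program as in the lab command: drop everything up to and
# including the first space, then drop a slash and whatever follows it.
strip_to_quality() {
    sed 's/[^ ]* //; s/\/.*//'
}

# Made-up labels, for illustration only.
printf 'C major\nB- minor-seventh\nG dominant/B\nC\n' | strip_to_quality
```

A label with no space, such as the bare `C` in the last line, passes through unchanged. That may be why root names such as C and F show up among the qualities.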

Here are the 80 qualities:

  93061	major
  64155	dominant
  31402	minor
  27490	minor-seventh
  8594	major-seventh
  5861	dominant-ninth
  5733	major-sixth
  4138	diminished
  2943	min
  2912	7
  2816	minor-sixth
  2159	half-diminished
  2154	suspended-fourth
  1884	diminished-seventh
  1738	augmented-seventh
  1408	augmented
  1208	C
  1102	dominant-13th
  1082	min7
  1008	F
  1008	maj7
  967	dominant-seventh
  892	G
  878	minor-ninth
  705	D
  650	B-
  592	major-ninth
  355	E-
  352	A
  280	E
  249	power
  237	dominant-11th
  222	suspended-second
  202	minor-11th
  165	minor-major
  157	dim
  129	maj
  128	A-
  96	augmented-ninth
  84	9
  71	other
  66	B
  62	6
  62	major-minor
  58	sus47
  46	aug
  46	D-
  46	min9
  36	G-
  29	m7b5
  23	major-13th
  21	maj9
  19	min6
  19	none
  17	pedal
  16	dim7
  16	maj69
  15	F#
  12	major	B- major
  8	major	F major
  7	C#
  6	major	.
  5	minor	D minor	D minor
  4	minor-13th
  3	minor	G minor
  3	C-
  3	minMaj7
  3	D#
  2	minor	.
  2	major	F major	F major
  2	dominant	C dominant
  2	5b
  2	major	C major	C major
  1	major	E- major
  1	ma
  1	major	.	.
  1	major	G major	G major
  1	minor	.	.
  1	7sus
  1


How often each chord root occurs (everything after the first space is removed):

  extractx -i mxhm  * | ridx -H | sed 's/ .*//' | sortcount
  48078	C
  45001	G
  38459	F
  33577	D
  24361	A
  23237	B-
  16334	E-
  15587	E
  8279	A-
  7869	B
  3487	D-
  3365	F#
  1701	C#
  1316	G-
  824	G#
  187	D#
  168	C-
  52	A#
  28	F-
  6	B--
  4	B#
  4	.	F
  3	B/D#
  3	E#
  3	C/G
  1	.	B-
  1	A--


What are the most common 3-chord sequences:

    extractx -i mxhm  * | grep -v ^= | serialize | context -n 3 | ridx -H | sortcount | head -n 10
    1779	C major G dominant C major	(I V7 I)
    1334	F major C dominant F major	(I V7 I)
    1301	C major F major C major     	(I IV I)
    1062	D minor-seventh G dominant C major	(ii7 V7 I)
    994  	G major D dominant G major	(I V7 I)
    939  	G dominant C major G dominant	(V7 I V7)
    863  	G major C major G major     	(V I V)
    857 	G dominant C major F major	(V7 I IV)
    812  	F major B- major F major	(V I V)
    781  	G minor-seventh C dominant F major	(ii7 V7 I)
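
The windowing step joins each chord label with the labels that follow it. As a rough sketch of that idea, the awk function below builds overlapping windows of n lines. `sliding_window` is a hypothetical name, and its treatment of the end of the input (incomplete windows are dropped) is an assumption that may differ from the context tool.

```shell
# Hypothetical stand-in for the windowing step: join each line with the
# n-1 lines that follow it. Windows that would run past the end of the
# input are not printed.
sliding_window() {
    awk -v n="$1" '
        { line[NR] = $0 }
        END {
            for (i = 1; i + n - 1 <= NR; i++) {
                out = line[i]
                for (j = 1; j < n; j++) out = out " " line[i + j]
                print out
            }
        }'
}

# Made-up chord labels, for illustration only.
printf 'C major\nG dominant\nC major\nF major\n' | sliding_window 3
```

Four input lines give two windows of three, and counting identical windows then gives the ranking shown above.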


Scale degrees

The key information is not present in the files. They need to be processed further for that. MusicXML input has key information, but it is often incorrect since people use it more for key signature information, and the "mode" part is usually left at "major". The finalis-tonic script can be used to add an approximate key.


  for i in *.krn 
  do 
     finalis-tonic $i | extractx -i '**kern' | deg -at | serialize | ridx -H | grep -v r
  done | sortcount

Output includes chords and some other junk, but basic counts are:

161963	1
135309	5
102726	2
102275	3 
86931	6
85740	4
50631	7
38827	7-
38765	3-
24511	6-
12031	4+
11188	2-
6041	5+
5822	2+
5803	1+
4907	5-
2822	1-
2669	6+
1874	4-
1222	7+
1126	3+

Looking at 5-note sequences:

  for i in *.krn
  do 
     finalis-tonic $i | extractx -i '**kern' | deg -at | serialize -f | grep -v '^[r=]' \
           | context -n 5 | ridx -H
  done | sortcount > /tmp/analysis-data.txt
  head -n 25 /tmp/analysis-data.txt
  6053 1 1 1 1 1
  4417 5 5 5 5 5
  2186 3 3 3 3 3
  1872 4 4 4 4 4
  1573 2 2 2 2 2
  1293 3 2 1 2 3
  1282 5 4 3 2 1
  1189 6 6 6 6 6
  1087 1 2 3 2 1
  915  2 1 1 1 1
  894  3 4 3 2 1
  890  3- 3- 3- 3- 3-
  883  3 3 3 2 1
  855  3 2 1 1 1
  852  1 1 1 1 2
  825  6 5 4 3 2
  806  5 5 5 5 4
  786  3 2 1 2 1
  785  5 5 5 5 6
  761  3 3 3 3 2
  761  7 7 7 7 7
  753  7- 7- 7- 7- 7-
  752  5 1 1 1 1
  726  1 2 3 4 5
  723  4 3 2 1 1


Key

Using the entire contents of each file to determine the key:

  keycor *.krn | sed 's/.*: //' | sortcount -p
  22.33	C Major
  15.05	F Major
  11.87	G Major
  9.21	E- Major
  6.87	B- Major
  5.82	A Minor
  4.51	D Minor
  4.39	D Major
  4.16	C Minor
  3.44	G Minor
  3.03	E Minor
  2.11	A- Major
  1.78	A Major
  1.41	F Minor
  1.02	E Major
  0.64	B- Minor
  0.53	D- Major
  0.39	E- Minor
  0.39	B Minor
  0.29	B Major
  0.18	G- Major
  0.16	F# Minor
  0.16	C# Minor
  0.08	F# Major
  0.06	C# Major
  0.04	D- Minor
  0.04	A- Minor
  0.04	G- Minor
  0.02	G# Minor
  0.02	G# Major


Using only the first 8 bars of the music to determine the key:

  for i in *.krn
  do
     myank -m 1-8 $i | keycor
  done | sed 's/.*: //' | sortcount -p
  19.46	C Major
  12.32	F Major
  10.2	G Major
  6.89	E- Major
  6.7	A Minor
  6.65	D Minor
  5.64	B- Major
  5.36	C Minor
  5.21	D Major
  4.3	G Minor
  4.1	E Minor
  2.41	F Minor
  1.72	A Major
  1.65	A- Major
  1.31	B Minor
  1.22	B Major
  1.04	E Major
  0.89	B- Minor
  0.56	E- Minor
  0.56	D- Major
  0.48	F# Minor
  0.45	C# Minor
  0.3	A- Minor
  0.23	G- Major
  0.12	F# Major
  0.08	C# Major
  0.07	G# Minor
  0.05	G- Minor
  0.03	G# Major
  0.02	D- Minor


