Humdrum lab 5

From CCARH Wiki
Latest revision as of 19:11, 13 April 2021

This lab is about the wikifonia data.

Update Humdrum tools

In case there have been changes to the humdrum tools programs, you can update them with this command:

   cd $(which beat | sed 's/humdrum-tools.*/humdrum-tools/')
   make update
   make

Basic information

How many files are there:

   ls *.krn | wc -l
   6710

How many have lyrics:

    grep -l "\*\*text" *.krn | wc -l
    5460

How many have chords:

    grep -l "\*\*mxhm" *.krn | wc -l
    6282

How many have two or more verses:

    grep -l "\*\*text.*\*\*text" *.krn | wc -l
    2006
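
The counts above work because grep -l prints each matching file name once, so wc -l counts files rather than matching lines. A minimal sketch of the same idea on two made-up files (the directory and file names are hypothetical, not part of the Wikifonia data):

```shell
# Sketch: count files that have lyrics, and files with two or more **text spines.
# The directory and both sample files are invented for illustration.
demo=$(mktemp -d)
printf '**kern\t**text\n4c\tla\n*-\t*-\n' > "$demo/one-verse.krn"
printf '**kern\t**text\t**text\n4c\tla\tlo\n*-\t*-\t*-\n' > "$demo/two-verses.krn"

# grep -l prints each matching file name once; tr strips the padding
# that some wc implementations add in front of the number.
lyrics=$(grep -l '\*\*text' "$demo"/*.krn | wc -l | tr -d ' ')
verses=$(grep -l '\*\*text.*\*\*text' "$demo"/*.krn | wc -l | tr -d ' ')
echo "with lyrics: $lyrics, with two or more verses: $verses"
rm -r "$demo"
```

The second pattern only matches when both **text interpretations are on the same line, which is why it selects files with at least two lyric spines.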

Bibliographic and basic information

Who are the 10 most represented composers in the data:

  grep -h COM *.krn | sortcount | head -n 10
  132	!!!COM:	Unknown
  121	!!!COM:	Hungarian folk song
  119	!!!COM:	Traditional
  91	!!!COM:	Richard Rodgers
  75	!!!COM:	Irving Berlin
  67	!!!COM:	Hungarian song
  65	!!!COM:	Cole Porter
  46	!!!COM:	Harry Warren
  45	!!!COM:	George Gershwin
  40	!!!COM:	Harold Arlen

Most repeated titles:

  grep OTL * -h | sortcount | head -n 10
  7	!!!OTL:	A Daisy A Day
  6	!!!OTL:	Amazing Grace
  5	!!!OTL:	Cabaret
  5	!!!OTL:	Take Five
  4	!!!OTL:	test
  4	!!!OTL:	Unforgettable
  4	!!!OTL:	You'll Never Walk Alone
  4	!!!OTL:	Nuages
  4	!!!OTL:	Birk's Works
  4	!!!OTL:	This Is My Song

Note that "sortcount" is a Humdrum Extras script which is equivalent (in this case) to the unix command "sort | uniq -c | sort -nr".

Which files contain the title "A Daisy A Day":

   grep "OTL.*A Daisy A Day" *.krn -l
   WF3959.krn
   WF3960.krn
   WF3961.krn
   WF3962.krn
   WF3963.krn
   WF3964.krn
   WF3967.krn
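
The equivalence between sortcount and "sort | uniq -c | sort -nr" noted above can be tried without the Humdrum tools installed. This is only a sketch on made-up records; the helper name count_lines is hypothetical:

```shell
# Sketch of the standard-tools pipeline that sortcount is described as matching here.
count_lines() {
    # Group identical lines, count each group, then list the largest counts first.
    sort | uniq -c | sort -nr
}

# Made-up reference records, not taken from the Wikifonia files.
printf '%s\n' '!!!OTL: Song B' '!!!OTL: Song A' '!!!OTL: Song B' | count_lines
```

Note that uniq -c pads the counts with leading spaces, so the layout differs slightly from the sortcount listings shown in this lab.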

To view the files in VHV from MacOS:

  cat WF3959.krn | pbcopy

Then paste into the text region of https://verovio.humdrum.org (command-A to select all of the old text, then command-V to paste the new score).

Wf3959.png

List the titles of all pieces where George Gershwin is the composer:

  grep OTL  $(grep -li  COM.*Gershwin *.krn) | sort -k2
  WF2190.krn:!!!OTL:	'S Wonderful!
  WF2191.krn:!!!OTL:	A FOGGY DAY
  WF2267.krn:!!!OTL:	A Foggy Day
  WF2186.krn:!!!OTL:	A Woman Is A Sometime Thing
  WF2192.krn:!!!OTL:	Bidin' My Time
  WF2178.krn:!!!OTL:	Blues
  WF2193.krn:!!!OTL:	But Not For Me
  WF2194.krn:!!!OTL:	By Strauss
  WF2195.krn:!!!OTL:	Clap yo' hands
  WF2185.krn:!!!OTL:	Do It Again!
  WF2196.krn:!!!OTL:	Embraceable You
  WF2197.krn:!!!OTL:	Fascinating Rhythm
  WF2268.krn:!!!OTL:	For You, For Me, For Evermore
  WF2198.krn:!!!OTL:	How Long Has This Been Going On
  WF2199.krn:!!!OTL:	I Got Plenty o' Nuttin'
  WF2103.krn:!!!OTL:	I Got Rhythm
  WF2200.krn:!!!OTL:	I Got Rhythm
  WF2222.krn:!!!OTL:	I Loves You Porgy
  WF2201.krn:!!!OTL:	I Was Doing All Right
  WF2202.krn:!!!OTL:	I Was Doing All Right
  WF2179.krn:!!!OTL:	I loves you Porgy
  WF2184.krn:!!!OTL:	I'll Build A Stairway To Paradise
  WF2203.krn:!!!OTL:	I've Got A Crush On You
  WF2204.krn:!!!OTL:	Isn't It A Pity
  WF2221.krn:!!!OTL:	It Ain't Necessarily So
  WF2205.krn:!!!OTL:	Let's Call the Whole Thing Off
  WF2220.krn:!!!OTL:	Liza
  WF2219.krn:!!!OTL:	Liza (All the clouds'll roll away)
  WF2269.krn:!!!OTL:	Love Is Here To Stay
  WF2206.krn:!!!OTL:	Love Walked In
  WF2223.krn:!!!OTL:	My Man's Gone Now
  WF2207.krn:!!!OTL:	Nice Work If You Can Get It
  WF2208.krn:!!!OTL:	Oh Lady Be Good
  WF2189.krn:!!!OTL:	SUMMERTIME
  WF2995.krn:!!!OTL:	Shoes With Wings On
  WF2183.krn:!!!OTL:	Somebody Loves Me
  WF2209.krn:!!!OTL:	Someone To Watch Over Me
  WF2210.krn:!!!OTL:	Soon
  WF2211.krn:!!!OTL:	Strike Up The Band
  WF2180.krn:!!!OTL:	Summertime
  WF2181.krn:!!!OTL:	Summertime
  WF2187.krn:!!!OTL:	Summertime
  WF2212.krn:!!!OTL:	Summertime
  WF2224.krn:!!!OTL:	Swanee
  WF2213.krn:!!!OTL:	That Certain Feeling
  WF2214.krn:!!!OTL:	The Man I Love
  WF2225.krn:!!!OTL:	The Simple Life
  WF2188.krn:!!!OTL:	There's A Boat Dat's Leaving Soon For New York
  WF2215.krn:!!!OTL:	They All Laughed
  WF2216.krn:!!!OTL:	They Can't Take That Away From Me
  WF2217.krn:!!!OTL:	They Can't Take That Away From Me
  WF2218.krn:!!!OTL:	Who Cares
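
The command above nests one grep inside another: the inner grep -li lists the files whose composer record matches, and the outer grep prints the title records of just those files. A minimal sketch on three invented files (the helper name titles_by_composer and the sample records are hypothetical):

```shell
# Sketch: list titles for one composer using the nested-grep pattern from above.
titles_by_composer() {
    # $1 = directory holding .krn files, $2 = composer pattern
    ( cd "$1" && grep OTL $(grep -li "COM.*$2" *.krn) | sort -k2 )
}

demo=$(mktemp -d)
printf '!!!COM: George Gershwin\n!!!OTL: Summertime\n' > "$demo/a.krn"
printf '!!!COM: Cole Porter\n!!!OTL: Night and Day\n' > "$demo/b.krn"
printf '!!!COM: George Gershwin\n!!!OTL: Liza\n' > "$demo/c.krn"

titles_by_composer "$demo" Gershwin
rm -r "$demo"
```

Because the unquoted command substitution is split on whitespace, this pattern assumes file names without spaces, which holds for names such as WF2190.krn.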

Texture

How many contain more than one **kern spine (i.e., are polyphonic, probably piano):

   grep -l "\*\*kern.*\*\*kern" *.krn | wc -l
   58

WF5118.krn is an example:

Wf5118.png

This one is interesting because it has invisible chords in the top staff that realize the chord symbols above the staff.

How many songs have chords in the **kern data, i.e., tokens containing more than one note (this takes a long time to calculate -- 70 songs per second = 95 seconds):

  for i in *.krn 
  do 
     extractx -i kern $i | serialize | ridx -H | grep " " | wc -l
  done | grep -v "^ *0$" | wc -l
  365

How many songs do not have chords:

  for i in *.krn 
  do 
     extractx -i kern $i | serialize | ridx -H | grep " " | wc -l
  done | grep "^ *0$" | wc -l
  6345
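
Both loops above depend on recognizing a count of zero in the output of wc -l, which some systems (macOS among them) pad with leading spaces. A minimal sketch of a pattern that handles padded and unpadded counts alike, run on made-up numbers (the helper name nonzero is hypothetical):

```shell
# Sketch: drop zero counts from wc -l style output.
# "^ *0$" matches a line holding only 0, with or without leading padding.
nonzero() {
    grep -v '^ *0$'
}

# Four made-up per-file counts; two of them are zero.
printf '       0\n      12\n0\n       3\n' | nonzero | wc -l | tr -d ' '
```

The anchor ^ has to come first in the pattern; placed after a space it is treated as a literal character and the filter no longer matches the zero lines.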

Duration

What is the duration of all songs if played back-to-back and at the specified tempo without repeats?

   gettime -T *.krn | tail -n 1
   286:50:23.1354 hours

What are the longest songs:

    gettime --simple -T *.krn | sort -k2 -nr | head -n 10
    WF6618.krn:	3120
    WF0181.krn:	3120
    WF0182.krn:	1864
    WF3616.krn:	1420
    WF6336.krn:	1134
    WF5131.krn:	909
    WF6068.krn:	785
    WF5004.krn:	696
    WF3226.krn:	671
    WF1249.krn:	664

The -k2 option means to sort by the second column of data. -n means to sort numerically rather than alphabetically, and -r means to sort by highest first.
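
A small sketch of those sort options on made-up file names and durations, laid out like the gettime listing above:

```shell
# Three invented entries in "name:<TAB>duration" form.
sample() {
    printf 'short.krn:\t16\nlong.krn:\t3120\nmedium.krn:\t664\n'
}

# Numeric sort on the second column, highest value first.
sample | sort -k2 -nr

# Without -n the comparison is character by character,
# so 664 is placed ahead of 3120.
sample | sort -k2 -r
```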

What are the shortest songs:

    gettime --simple -T *.krn | sort -k2 -nr | tail -n 10
    WF2814.krn:	16
    WF2806.krn:	16
    WF2795.krn:	16
    WF2785.krn:	16
    WF2856.krn:	14
    WF2852.krn:	12
    WF2799.krn:	12
    WF6338.krn:	8
    WF5609.krn:	8

The shortest song in VHV:

    cat WF5609.krn | pbcopy

Wf-shortest.png

Meter

What sort of meters are in the database and how much of each type?

    beat -Ca *.krn | beat -Ua  | extractx -s '$1-$'  | ridx -H | sortcount -p
    65.89	4	4
    15.44	3	4
    11.67	2	2
    3.24	2	4
    2.78	6	8
    0.52	12	8
    0.16	5	4
    0.14	9	8
    0.09	6	4
    0.06	3	8
    0.01	3	2
    0.01	7	4
    0     	2	8
    0     	7	8
    0     	9	4
    0     	10	8
    0     	17	16
    0     	1	2
    0     	5	8
    0     	1	4
    0     	4	8

The most common meter is 4/4, which accounts for about 66% of the measures.
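
Percentages like those in the listing above can be approximated with standard tools. This is a sketch, not the sortcount implementation; the helper name percent and the sample data are made up:

```shell
# Sketch: count identical lines and report each count as a percentage of the total.
percent() {
    sort | uniq -c | sort -nr |
    awk '{ n[NR] = $1; $1 = ""; label[NR] = $0; total += n[NR] }
         END { for (i = 1; i <= NR; i++) printf "%.2f%s\n", 100 * n[i] / total, label[i] }'
}

# Four made-up measures: three in 4/4 and one in 3/4.
printf '4\t4\n4\t4\n3\t4\n4\t4\n' | percent
```

With this input the 4/4 line comes out at 75.00 and the 3/4 line at 25.00.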

-C means extract the count of the meter (the top number).

-U means extract the duration unit from the meter (the bottom number).

-C and -U are output once for each measure, so using them is a simple way of counting the number of measures in the scores. If you add the -F option to these two options, every data line will display the metrical information.

-a means to append the analysis to the end of the lines (keeping the original input score).

The extractx option:

    -s '$1-$'

means to extract from one before the last spine to the last spine. $1 is one before the last spine, $2 is two before the last spine, and so on.
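
As a rough analogy only, the same "last two columns" selection can be written with awk on plain tab-separated lines. This sketch ignores Humdrum specifics such as spine splits, which extractx handles and awk does not; the helper name last_two is hypothetical:

```shell
# Analogy for -s '$1-$': print the next-to-last and the last column of each line.
last_two() {
    awk -F'\t' 'NF >= 2 { print $(NF-1) "\t" $NF }'
}

# Two made-up data lines: a note, a syllable, then the appended meter count and unit.
printf '4c\tla\t4\t4\n4d\tlo\t3\t4\n' | last_two
```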

Chord labels

How many unique chord labels are there?

   extractx -i mxhm  * | ridx -H | sortcount | wc -l
   1399

What are the most common ones:

   extractx -i mxhm  * | ridx -H | sortcount -p | head -n 10
   7.21	C major
   6.14	F major
   4.94	G major
   4.83	G dominant
   4.07	C dominant
   3.55	D dominant
   3.28	B- major
   2.81	E- major
   2.3	D major
   2.25	F dominant

How many chord qualities:

    extractx -i mxhm  * | ridx -H | sed 's/[^ ]* //; s/\/.*//' | sortcount | wc -l
    80
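
The sed expression in that command does two things: it removes the first word (the chord root) and then removes a slash and everything after it. A sketch on a few made-up labels (the helper name quality is hypothetical):

```shell
# Sketch: reduce a chord label to its quality, using the sed expression from above.
quality() {
    sed 's/[^ ]* //; s/\/.*//'
}

# Made-up labels: plain, with a slash suffix, hyphenated, and root-only.
printf '%s\n' 'C major' 'G dominant/B' 'D minor-seventh' 'F' | quality
```

A label with no space, such as the lone F here, passes through unchanged, which would explain why bare pitch names show up among the qualities.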

Here are the 80 qualities:

  93061	major
  64155	dominant
  31402	minor
  27490	minor-seventh
  8594	major-seventh
  5861	dominant-ninth
  5733	major-sixth
  4138	diminished
  2943	min
  2912	7
  2816	minor-sixth
  2159	half-diminished
  2154	suspended-fourth
  1884	diminished-seventh
  1738	augmented-seventh
  1408	augmented
  1208	C
  1102	dominant-13th
  1082	min7
  1008	F
  1008	maj7
  967	dominant-seventh
  892	G
  878	minor-ninth
  705	D
  650	B-
  592	major-ninth
  355	E-
  352	A
  280	E
  249	power
  237	dominant-11th
  222	suspended-second
  202	minor-11th
  165	minor-major
  157	dim
  129	maj
  128	A-
  96	augmented-ninth
  84	9
  71	other
  66	B
  62	6
  62	major-minor
  58	sus47
  46	aug
  46	D-
  46	min9
  36	G-
  29	m7b5
  23	major-13th
  21	maj9
  19	min6
  19	none
  17	pedal
  16	dim7
  16	maj69
  15	F#
  12	major	B- major
  8	major	F major
  7	C#
  6	major	.
  5	minor	D minor	D minor
  4	minor-13th
  3	minor	G minor
  3	C-
  3	minMaj7
  3	D#
  2	minor	.
  2	major	F major	F major
  2	dominant	C dominant
  2	5b
  2	major	C major	C major
  1	major	E- major
  1	ma
  1	major	.	.
  1	major	G major	G major
  1	minor	.	.
  1	7sus
  1


How often is each pitch used as a chord root:

  extractx -i mxhm  * | ridx -H | sed 's/ .*//' | sortcount
  48078	C
  45001	G
  38459	F
  33577	D
  24361	A
  23237	B-
  16334	E-
  15587	E
  8279	A-
  7869	B
  3487	D-
  3365	F#
  1701	C#
  1316	G-
  824	G#
  187	D#
  168	C-
  52	A#
  28	F-
  6	B--
  4	B#
  4	.	F
  3	B/D#
  3	E#
  3	C/G
  1	.	B-
  1	A--
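
The sed expression in the command above keeps only the text before the first space. A sketch on made-up labels (the helper name root is hypothetical):

```shell
# Sketch: reduce a chord label to its root by deleting everything from the first space on.
root() {
    sed 's/ .*//'
}

# Made-up labels, including one with a slash and no space.
printf '%s\n' 'C major' 'B- minor-seventh' 'C/G' | root
```

A label such as C/G has no space to cut at, so it is kept whole, which matches the slash entries near the bottom of the listing.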


What are the most common three-chord sequences:

    extractx -i mxhm  * | grep -v ^= | serialize | context -n 3 | ridx -H | sortcount | head -n 10
    1779	C major G dominant C major	(I V7 I)
    1334	F major C dominant F major	(I V7 I)
    1301	C major F major C major     	(I IV I)
    1062	D minor-seventh G dominant C major	(ii7 V7 I)
    994  	G major D dominant G major	(I V7 I)
    939  	G dominant C major G dominant	(V7 I V7)
    863  	G major C major G major     	(V I V)
    857 	G dominant C major F major	(V7 I IV)
    812  	F major B- major F major	(V I V)
    781  	G minor-seventh C dominant F major	(ii7 V7 I)
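
The sliding-window grouping that feeds these counts can be sketched with awk. This only approximates what context -n 3 does on a single stream of tokens and is not the Humdrum implementation; the helper name ngrams3 is hypothetical:

```shell
# Sketch: emit every run of three consecutive lines as one line.
ngrams3() {
    awk '{ tok[NR] = $0 }
         END { for (i = 1; i + 2 <= NR; i++) print tok[i], tok[i+1], tok[i+2] }'
}

# Four made-up chord labels give two overlapping three-chord sequences.
printf '%s\n' 'C major' 'G dominant' 'C major' 'F major' | ngrams3
```

Piping the result through "sort | uniq -c | sort -nr" then gives a frequency table of the sequences.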


Scale degrees

Key information is not present in the files, so they need to be processed further. The MusicXML input has key information, but it is often incorrect: people mostly use it for the key signature, and the "mode" part is usually left at "major". The finalis-tonic script can be used to add an approximate key.


  for i in *.krn 
  do 
     finalis-tonic $i | extractx -i '**kern' | deg -at | serialize | ridx -H | grep -v r
  done | sortcount

Output includes chords and some other junk, but basic counts are:

161963	1
135309	5
102726	2
102275	3 
86931	6
85740	4
50631	7
38827	7-
38765	3-
24511	6-
12031	4+
11188	2-
6041	5+
5822	2+
5803	1+
4907	5-
2822	1-
2669	6+
1874	4-
1222	7+
1126	3+

Looking at 5-note sequences:

  for i in *.krn
  do 
     finalis-tonic $i | extractx -i '**kern' | deg -at | serialize -f | grep -v '^[r=]' \
           | context -n 5 | ridx -H
  done | sortcount > /tmp/analysis-data.txt
  head -n 25 /tmp/analysis-data.txt
  6053 1 1 1 1 1
  4417 5 5 5 5 5
  2186 3 3 3 3 3
  1872 4 4 4 4 4
  1573 2 2 2 2 2
  1293 3 2 1 2 3
  1282 5 4 3 2 1
  1189 6 6 6 6 6
  1087 1 2 3 2 1
  915  2 1 1 1 1
  894  3 4 3 2 1
  890  3- 3- 3- 3- 3-
  883  3 3 3 2 1
  855  3 2 1 1 1
  852  1 1 1 1 2
  825  6 5 4 3 2
  806  5 5 5 5 4
  786  3 2 1 2 1
  785  5 5 5 5 6
  761  3 3 3 3 2
  761  7 7 7 7 7
  753  7- 7- 7- 7- 7-
  752  5 1 1 1 1
  726  1 2 3 4 5
  723  4 3 2 1 1

Key

Using the entire contents of each file to determine the key:

  keycor *.krn | sed 's/.*: //' | sortcount -p
  22.33	C Major
  15.05	F Major
  11.87	G Major
  9.21	E- Major
  6.87	B- Major
  5.82	A Minor
  4.51	D Minor
  4.39	D Major
  4.16	C Minor
  3.44	G Minor
  3.03	E Minor
  2.11	A- Major
  1.78	A Major
  1.41	F Minor
  1.02	E Major
  0.64	B- Minor
  0.53	D- Major
  0.39	E- Minor
  0.39	B Minor
  0.29	B Major
  0.18	G- Major
  0.16	F# Minor
  0.16	C# Minor
  0.08	F# Major
  0.06	C# Major
  0.04	D- Minor
  0.04	A- Minor
  0.04	G- Minor
  0.02	G# Minor
  0.02	G# Major


Using only the first 8 bars of the music for analysis of key:

  for i in *.krn
  do
     myank -m 1-8 $i | keycor
  done | sed 's/.*: //' | sortcount -p
  19.46	C Major
  12.32	F Major
  10.2	G Major
  6.89	E- Major
  6.7	A Minor
  6.65	D Minor
  5.64	B- Major
  5.36	C Minor
  5.21	D Major
  4.3	G Minor
  4.1	E Minor
  2.41	F Minor
  1.72	A Major
  1.65	A- Major
  1.31	B Minor
  1.22	B Major
  1.04	E Major
  0.89	B- Minor
  0.56	E- Minor
  0.56	D- Major
  0.48	F# Minor
  0.45	C# Minor
  0.3	A- Minor
  0.23	G- Major
  0.12	F# Major
  0.08	C# Major
  0.07	G# Minor
  0.05	G- Minor
  0.03	G# Major
  0.02	D- Minor


