Humdrum lab 5

From CCARH Wiki
Latest revision as of 19:11, 13 April 2021

This lab is about the wikifonia data.

Update Humdrum tools

In case the Humdrum tools programs have changed, you can update them with these commands:

   cd $(which beat | sed 's/humdrum-tools.*/humdrum-tools/')
   make update
   make

Basic information

How many files:

   ls *.krn | wc -l
   6710

How many have lyrics:

    grep -l "\*\*text" *.krn | wc -l
    5460

How many have chords:

    grep -l "\*\*mxhm" *.krn | wc -l
    6282

How many have two or more verses:

    grep -l "\*\*text.*\*\*text" *.krn | wc -l
    2006
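
The counts above all follow one pattern: grep -l prints the name of each matching file once, and wc -l counts the names. Below is a minimal sketch of that pattern on throwaway files. The helper name count_files_matching and the file contents are made up for illustration and are not part of the Humdrum tools.

```shell
# count_files_matching PATTERN FILE...
# Hypothetical helper: prints how many of the given files contain PATTERN.
count_files_matching() {
    pattern=$1
    shift
    # grep -l lists each matching file once; wc -l counts the names.
    # tr strips the padding that some wc implementations add.
    grep -l -- "$pattern" "$@" | wc -l | tr -d ' '
}

# Throwaway files standing in for Humdrum scores (made-up content).
demo=$(mktemp -d)
printf '**kern\t**text\n'         > "$demo/a.krn"
printf '**kern\n'                 > "$demo/b.krn"
printf '**kern\t**text\t**text\n' > "$demo/c.krn"

count_files_matching '\*\*text' "$demo"/*.krn            # files with lyrics: 2
count_files_matching '\*\*text.*\*\*text' "$demo"/*.krn  # files with 2+ verses: 1
rm -r "$demo"
```

The asterisks are escaped because an unescaped * is a repetition operator in grep patterns.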

Bibliographic and basic information

Who are the 10 most represented composers in the data:

  grep -h COM *.krn | sortcount | head -n 10
  132	!!!COM:	Unknown
  121	!!!COM:	Hungarian folk song
  119	!!!COM:	Traditional
  91	!!!COM:	Richard Rodgers
  75	!!!COM:	Irving Berlin
  67	!!!COM:	Hungarian song
  65	!!!COM:	Cole Porter
  46	!!!COM:	Harry Warren
  45	!!!COM:	George Gershwin
  40	!!!COM:	Harold Arlen

Most repeated titles:

  grep OTL * -h | sortcount | head -n 10
  7	!!!OTL:	A Daisy A Day
  6	!!!OTL:	Amazing Grace
  5	!!!OTL:	Cabaret
  5	!!!OTL:	Take Five
  4	!!!OTL:	test
  4	!!!OTL:	Unforgettable
  4	!!!OTL:	You'll Never Walk Alone
  4	!!!OTL:	Nuages
  4	!!!OTL:	Birk's Works
  4	!!!OTL:	This Is My Song

Note that "sortcount" is a Humdrum Extras script which is equivalent (in this case) to the unix command "sort | uniq -c | sort -nr".

Which files contain the title "A Daisy A Day":

   grep "OTL.*A Daisy A Day" *.krn -l
   WF3959.krn
   WF3960.krn
   WF3961.krn
   WF3962.krn
   WF3963.krn
   WF3964.krn
   WF3967.krn
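
The equivalence between sortcount and "sort | uniq -c | sort -nr" noted above can be tried without any Humdrum tools. This is a minimal sketch: count_lines is a hypothetical stand-in for sortcount, and the input records are made up.

```shell
# Stand-in for sortcount built from standard tools only
# (this is not the Humdrum Extras script itself).
count_lines() {
    sort | uniq -c | sort -nr
}

# Made-up reference records for illustration.
printf '%s\n' '!!!OTL: Cabaret' '!!!OTL: Nuages' '!!!OTL: Cabaret' | count_lines
```

The first sort groups identical lines together so that uniq -c can count them, and the final sort -nr puts the largest counts first. The column layout differs slightly from sortcount, since uniq -c pads the count with spaces.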

To view the files in VHV from MacOS:

  cat WF3959.krn | pbcopy

Then paste into the text region at https://verovio.humdrum.org (command-A to select all of the old text, then command-V to paste the new score).

[Figure: score of WF3959.krn]

List the titles of all pieces where George Gershwin is the composer:

  grep OTL $(grep -li "COM.*Gershwin" *.krn) | sort -k2
  WF2190.krn:!!!OTL:	'S Wonderful!
  WF2191.krn:!!!OTL:	A FOGGY DAY
  WF2267.krn:!!!OTL:	A Foggy Day
  WF2186.krn:!!!OTL:	A Woman Is A Sometime Thing
  WF2192.krn:!!!OTL:	Bidin' My Time
  WF2178.krn:!!!OTL:	Blues
  WF2193.krn:!!!OTL:	But Not For Me
  WF2194.krn:!!!OTL:	By Strauss
  WF2195.krn:!!!OTL:	Clap yo' hands
  WF2185.krn:!!!OTL:	Do It Again!
  WF2196.krn:!!!OTL:	Embraceable You
  WF2197.krn:!!!OTL:	Fascinating Rhythm
  WF2268.krn:!!!OTL:	For You, For Me, For Evermore
  WF2198.krn:!!!OTL:	How Long Has This Been Going On
  WF2199.krn:!!!OTL:	I Got Plenty o' Nuttin'
  WF2103.krn:!!!OTL:	I Got Rhythm
  WF2200.krn:!!!OTL:	I Got Rhythm
  WF2222.krn:!!!OTL:	I Loves You Porgy
  WF2201.krn:!!!OTL:	I Was Doing All Right
  WF2202.krn:!!!OTL:	I Was Doing All Right
  WF2179.krn:!!!OTL:	I loves you Porgy
  WF2184.krn:!!!OTL:	I'll Build A Stairway To Paradise
  WF2203.krn:!!!OTL:	I've Got A Crush On You
  WF2204.krn:!!!OTL:	Isn't It A Pity
  WF2221.krn:!!!OTL:	It Ain't Necessarily So
  WF2205.krn:!!!OTL:	Let's Call the Whole Thing Off
  WF2220.krn:!!!OTL:	Liza
  WF2219.krn:!!!OTL:	Liza (All the clouds'll roll away)
  WF2269.krn:!!!OTL:	Love Is Here To Stay
  WF2206.krn:!!!OTL:	Love Walked In
  WF2223.krn:!!!OTL:	My Man's Gone Now
  WF2207.krn:!!!OTL:	Nice Work If You Can Get It
  WF2208.krn:!!!OTL:	Oh Lady Be Good
  WF2189.krn:!!!OTL:	SUMMERTIME
  WF2995.krn:!!!OTL:	Shoes With Wings On
  WF2183.krn:!!!OTL:	Somebody Loves Me
  WF2209.krn:!!!OTL:	Someone To Watch Over Me
  WF2210.krn:!!!OTL:	Soon
  WF2211.krn:!!!OTL:	Strike Up The Band
  WF2180.krn:!!!OTL:	Summertime
  WF2181.krn:!!!OTL:	Summertime
  WF2187.krn:!!!OTL:	Summertime
  WF2212.krn:!!!OTL:	Summertime
  WF2224.krn:!!!OTL:	Swanee
  WF2213.krn:!!!OTL:	That Certain Feeling
  WF2214.krn:!!!OTL:	The Man I Love
  WF2225.krn:!!!OTL:	The Simple Life
  WF2188.krn:!!!OTL:	There's A Boat Dat's Leaving Soon For New York
  WF2215.krn:!!!OTL:	They All Laughed
  WF2216.krn:!!!OTL:	They Can't Take That Away From Me
  WF2217.krn:!!!OTL:	They Can't Take That Away From Me
  WF2218.krn:!!!OTL:	Who Cares

Texture

How many contain more than one **kern spine (i.e., are polyphonic, probably piano):

   grep -l "\*\*kern.*\*\*kern" *.krn | wc -l
   58

WF5118.krn is an example:

[Figure: score of WF5118.krn]

This one is interesting because the top staff contains invisible chords that realize the chord symbols above the staff.

How many songs contain chords in the **kern data, that is, tokens with several notes separated by spaces (this takes a long time to calculate -- at about 70 songs per second, roughly 95 seconds):

  for i in *.krn 
  do 
     extractx -i kern "$i" | serialize | ridx -H | grep " " | wc -l
  done | grep -v "^ *0$" | wc -l
  365

How many songs do not contain chords in the **kern data:

  for i in *.krn 
  do 
     extractx -i kern "$i" | serialize | ridx -H | grep " " | wc -l
  done | grep "^ *0$" | wc -l
  6345
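
The per-file counting loop can be tried on plain text files. This is a minimal sketch that leaves out the extractx, serialize and ridx steps: count_files_with_spaces is a hypothetical helper and the file contents are made up. It uses grep -c in place of the "grep | wc -l" pair, which avoids the leading padding that some wc implementations print.

```shell
# count_files_with_spaces FILE...
# Hypothetical helper: prints how many files contain at least one line with a space.
count_files_with_spaces() {
    for f in "$@"
    do
        grep -c ' ' "$f"      # one count per file, printed without padding
    done | grep -vc '^0$'     # count the files whose count is not zero
}

# Made-up token lists: one token per line, a space marks a chord.
demo=$(mktemp -d)
printf '4c\n4e 4g\n' > "$demo/chordal.txt"   # second token is a two-note chord
printf '4c\n4d\n'    > "$demo/melody.txt"    # single notes only
count_files_with_spaces "$demo"/*.txt        # prints 1
rm -r "$demo"
```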

Duration

What is the duration of all songs if played back-to-back and at the specified tempo without repeats?

   gettime -T *.krn | tail -n 1
   286:50:23.1354 hours

What are the longest songs:

    gettime --simple -T *.krn | sort -k2 -nr | head -n 10
    WF6618.krn:	3120
    WF0181.krn:	3120
    WF0182.krn:	1864
    WF3616.krn:	1420
    WF6336.krn:	1134
    WF5131.krn:	909
    WF6068.krn:	785
    WF5004.krn:	696
    WF3226.krn:	671
    WF1249.krn:	664

The -k2 option means to sort by the second column of data. -n means to sort numerically rather than alphabetically, and -r means to sort by highest first.
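
The three sort options can be checked on a few made-up lines in the same "name, tab, number" layout. The function name longest_first is hypothetical.

```shell
# Sort "name<TAB>number" lines by the numeric second column, largest first.
longest_first() {
    sort -k2 -nr
}

# Made-up file names and values for illustration.
printf 'a.krn:\t16\nb.krn:\t3120\nc.krn:\t664\n' | longest_first
```

Without -n the values would be compared as text, so 664 would sort ahead of 3120.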

What are the shortest songs:

    gettime --simple -T *.krn | sort -k2 -nr | tail -n 10
    WF2814.krn:	16
    WF2806.krn:	16
    WF2795.krn:	16
    WF2785.krn:	16
    WF2856.krn:	14
    WF2852.krn:	12
    WF2799.krn:	12
    WF6338.krn:	8
    WF5609.krn:	8

The shortest song in VHV:

    cat WF5609.krn | pbcopy

[Figure: score of WF5609.krn, the shortest song]

Meter

Which meters occur in the database, and what share does each one have?

    beat -Ca *.krn | beat -Ua  | extractx -s '$1-$'  | ridx -H | sortcount -p
    65.89	4	4
    15.44	3	4
    11.67	2	2
    3.24	2	4
    2.78	6	8
    0.52	12	8
    0.16	5	4
    0.14	9	8
    0.09	6	4
    0.06	3	8
    0.01	3	2
    0.01	7	4
    0     	2	8
    0     	7	8
    0     	9	4
    0     	10	8
    0     	17	16
    0     	1	2
    0     	5	8
    0     	1	4
    0     	4	8

The most common meter is 4/4, which accounts for about 66% of the measures (the counts are per measure, not per song).
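
The percentages in the listing are each value's share of all counted lines. Below is a rough sketch of that calculation with awk, assuming one meter per input line. percent_counts is a hypothetical stand-in for "sortcount -p", and its number formatting differs slightly (it always prints two decimals).

```shell
# Hypothetical stand-in for "sortcount -p":
# prints the percentage share of each distinct input line, largest first.
percent_counts() {
    awk '{ count[$0]++; total++ }
         END { for (k in count) printf "%.2f\t%s\n", 100 * count[k] / total, k }' |
    sort -nr
}

# Four made-up measures: three in 4/4 and one in 3/4.
printf '4\t4\n4\t4\n4\t4\n3\t4\n' | percent_counts
```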

-C means extract the count of the meter (the top number).

-U means extract the duration unit from the meter (the bottom number).

-C and -U are output once for each measure, so using them is a simple way of counting the number of measures in the scores. If you add the -F option to these two options, every data line will display the metrical information.

-a means to append the analysis to the end of the lines (keeping the original input score).

The extract option:

    -s '$1-$'

means to extract from one before the last spine to the last spine. $1 is one before the last spine, $2 is two before the last spine, and so on.
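
For plain tab-separated lines with a fixed number of columns, the same "last two columns" selection can be imitated with awk. This is only an analogy, under the assumption that every line has at least two columns. It does not follow Humdrum spine structure the way extractx does, and last_two_columns is a hypothetical name.

```shell
# Keep only the second-to-last and last tab-separated columns of each line.
last_two_columns() {
    awk -F '\t' -v OFS='\t' 'NF >= 2 { print $(NF - 1), $NF }'
}

# Made-up lines: note, lyric, meter count, meter unit.
printf '4c\tla\t4\t4\n8d\tdi\t3\t4\n' | last_two_columns
```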

Chord labels

How many unique chord labels are there?

   extractx -i mxhm  * | ridx -H | sortcount | wc -l
   1399

What are the most common ones:

   extractx -i mxhm  * | ridx -H | sortcount -p | head -n 10
   7.21	C major
   6.14	F major
   4.94	G major
   4.83	G dominant
   4.07	C dominant
   3.55	D dominant
   3.28	B- major
   2.81	E- major
   2.3	D major
   2.25	F dominant

How many chord qualities:

    extractx -i mxhm  * | ridx -H | sed 's/[^ ]* //; s/\/.*//' | sortcount | wc -l
    80

Here are the 80 qualities:

  93061	major
  64155	dominant
  31402	minor
  27490	minor-seventh
  8594	major-seventh
  5861	dominant-ninth
  5733	major-sixth
  4138	diminished
  2943	min
  2912	7
  2816	minor-sixth
  2159	half-diminished
  2154	suspended-fourth
  1884	diminished-seventh
  1738	augmented-seventh
  1408	augmented
  1208	C
  1102	dominant-13th
  1082	min7
  1008	F
  1008	maj7
  967	dominant-seventh
  892	G
  878	minor-ninth
  705	D
  650	B-
  592	major-ninth
  355	E-
  352	A
  280	E
  249	power
  237	dominant-11th
  222	suspended-second
  202	minor-11th
  165	minor-major
  157	dim
  129	maj
  128	A-
  96	augmented-ninth
  84	9
  71	other
  66	B
  62	6
  62	major-minor
  58	sus47
  46	aug
  46	D-
  46	min9
  36	G-
  29	m7b5
  23	major-13th
  21	maj9
  19	min6
  19	none
  17	pedal
  16	dim7
  16	maj69
  15	F#
  12	major	B- major
  8	major	F major
  7	C#
  6	major	.
  5	minor	D minor	D minor
  4	minor-13th
  3	minor	G minor
  3	C-
  3	minMaj7
  3	D#
  2	minor	.
  2	major	F major	F major
  2	dominant	C dominant
  2	5b
  2	major	C major	C major
  1	major	E- major
  1	ma
  1	major	.	.
  1	major	G major	G major
  1	minor	.	.
  1	7sus
  1
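
The sed expression in the command above does two things: the first substitution deletes everything up to and including the first space (the chord root), and the second deletes a slash and everything after it. Below is a small demonstration on made-up labels. The exact label spellings in the data are an assumption here, and chord_quality is a hypothetical name.

```shell
# Strip the root (text up to the first space) and anything after a slash.
chord_quality() {
    sed 's/[^ ]* //; s/\/.*//'
}

# Made-up chord labels for illustration.
printf '%s\n' 'C major' 'G dominant/B' 'D minor-seventh' 'C' | chord_quality
```

A label with no space, such as the last one, passes through unchanged. That may be why bare root names such as C and F show up in the list of qualities.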


Counts of chord roots (everything after the first space is removed):

  extractx -i mxhm  * | ridx -H | sed 's/ .*//' | sortcount
  48078	C
  45001	G
  38459	F
  33577	D
  24361	A
  23237	B-
  16334	E-
  15587	E
  8279	A-
  7869	B
  3487	D-
  3365	F#
  1701	C#
  1316	G-
  824	G#
  187	D#
  168	C-
  52	A#
  28	F-
  6	B--
  4	B#
  4	.	F
  3	B/D#
  3	E#
  3	C/G
  1	.	B-
  1	A--


What are the most common 3-chord sequences:

    extractx -i mxhm  * | grep -v ^= | serialize | context -n 3 | ridx -H | sortcount | head -n 10
    1779	C major G dominant C major	(I V7 I)
    1334	F major C dominant F major	(I V7 I)
    1301	C major F major C major     	(I IV I)
    1062	D minor-seventh G dominant C major	(ii7 V7 I)
    994  	G major D dominant G major	(I V7 I)
    939  	G dominant C major G dominant	(V7 I V7)
    863  	G major C major G major     	(V I V)
    857 	G dominant C major F major	(V7 I IV)
    812  	F major B- major F major	(V I V)
    781  	G minor-seventh C dominant F major	(ii7 V7 I)
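
The sliding-window step can be sketched with awk for readers who want to see what "context -n 3" contributes to the pipeline. This is a rough approximation, assuming one token per input line. ngrams is a hypothetical helper, not a Humdrum command, and it treats all of its input as a single stream.

```shell
# ngrams [N]
# Hypothetical helper: joins each input line with the N-1 lines that follow it.
ngrams() {
    awk -v n="${1:-3}" '
        { tok[NR] = $0 }                          # remember every token
        END {
            for (i = 1; i + n - 1 <= NR; i++) {   # one window per start position
                line = tok[i]
                for (j = 1; j < n; j++) line = line " " tok[i + j]
                print line
            }
        }'
}

# Made-up chord stream for illustration.
printf '%s\n' 'C major' 'G dominant' 'C major' 'F major' | ngrams 3
```

Four tokens give two overlapping windows of three. Piping the windows through a counting step then gives a frequency table like the one above.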


Scale degrees

Key information is not present in the files, so they need further processing before scale degrees can be extracted. MusicXML input has key information, but it is often incorrect, since people use it mainly for the key signature and usually leave the "mode" part at "major". The finalis-tonic script can be used to add an approximate key.


  for i in *.krn 
  do 
     finalis-tonic "$i" | extractx -i '**kern' | deg -at | serialize | ridx -H | grep -v r
  done | sortcount

Output includes chords and some other junk, but basic counts are:

161963	1
135309	5
102726	2
102275	3 
86931	6
85740	4
50631	7
38827	7-
38765	3-
24511	6-
12031	4+
11188	2-
6041	5+
5822	2+
5803	1+
4907	5-
2822	1-
2669	6+
1874	4-
1222	7+
1126	3+

Looking at 5-note sequences:

  for i in *.krn
  do 
     finalis-tonic "$i" | extractx -i '**kern' | deg -at | serialize -f | grep -v '^[r=]' \
           | context -n 5 | ridx -H
  done | sortcount > /tmp/analysis-data.txt
  head -n 25 /tmp/analysis-data.txt
  6053 1 1 1 1 1
  4417 5 5 5 5 5
  2186 3 3 3 3 3
  1872 4 4 4 4 4
  1573 2 2 2 2 2
  1293 3 2 1 2 3
  1282 5 4 3 2 1
  1189 6 6 6 6 6
  1087 1 2 3 2 1
  915  2 1 1 1 1
  894  3 4 3 2 1
  890  3- 3- 3- 3- 3-
  883  3 3 3 2 1
  855  3 2 1 1 1
  852  1 1 1 1 2
  825  6 5 4 3 2
  806  5 5 5 5 4
  786  3 2 1 2 1
  785  5 5 5 5 6
  761  3 3 3 3 2
  761  7 7 7 7 7
  753  7- 7- 7- 7- 7-
  752  5 1 1 1 1
  726  1 2 3 4 5
  723  4 3 2 1 1

Key

Using the entire contents of each file to determine the key:

  keycor *.krn | sed 's/.*: //' | sortcount -p
  22.33	C Major
  15.05	F Major
  11.87	G Major
  9.21	E- Major
  6.87	B- Major
  5.82	A Minor
  4.51	D Minor
  4.39	D Major
  4.16	C Minor
  3.44	G Minor
  3.03	E Minor
  2.11	A- Major
  1.78	A Major
  1.41	F Minor
  1.02	E Major
  0.64	B- Minor
  0.53	D- Major
  0.39	E- Minor
  0.39	B Minor
  0.29	B Major
  0.18	G- Major
  0.16	F# Minor
  0.16	C# Minor
  0.08	F# Major
  0.06	C# Major
  0.04	D- Minor
  0.04	A- Minor
  0.04	G- Minor
  0.02	G# Minor
  0.02	G# Major


Using only the first 8 bars of each song to determine the key:

  for i in *.krn
  do
     myank -m 1-8 $i | keycor
  done | sed 's/.*: //' | sortcount -p
  19.46	C Major
  12.32	F Major
  10.2	G Major
  6.89	E- Major
  6.7	A Minor
  6.65	D Minor
  5.64	B- Major
  5.36	C Minor
  5.21	D Major
  4.3	G Minor
  4.1	E Minor
  2.41	F Minor
  1.72	A Major
  1.65	A- Major
  1.31	B Minor
  1.22	B Major
  1.04	E Major
  0.89	B- Minor
  0.56	E- Minor
  0.56	D- Major
  0.48	F# Minor
  0.45	C# Minor
  0.3	A- Minor
  0.23	G- Major
  0.12	F# Major
  0.08	C# Major
  0.07	G# Minor
  0.05	G- Minor
  0.03	G# Major
  0.02	D- Minor


