Difference between revisions of "Humdrum Extras"

From CCARH Wiki

Revision as of 01:59, 8 December 2012

Humdrum Extras is a set of command-line programs and C++ parser library for processing Humdrum files. The programs can be compiled for Linux, Apple OS X, or Windows (primarily within cygwin, but also in Visual C++). The Humdrum Extras library can be used to parse Humdrum files independent of the example programs provided with the package.


Example Programs

The primary intent of the Humdrum Extras package is for user-based processing of Humdrum files as an auxiliary to the Humdrum Toolkit. Since the programs are compiled from C++, they process data much faster than programs written in interpreted languages, such as AWK which is the main development language for the Humdrum Toolkit.

Documentation for example programs can be found on the web at extras.humdrum.org/man. The source code for these programs is found in the download file, within the src-programs directory, or they can be viewed online.




Programming Examples

Basic data access

humecho.cpp

Here is a very simple C++ program called humecho.cpp that uses the Humdrum file parser in the Humdrum Extras library:

#include "humdrum.h"
#include <iostream>

int main(int argc, char** argv) {
   HumdrumFile hfile;
   if (argc > 1) hfile.read(argv[1]);
   else hfile.read(std::cin);
   std::cout << hfile;
   return 0;
}

This program takes one Humdrum file as an argument (or reads standard input) and echoes the contents of the Humdrum file to standard output. To compile this program using the Humdrum Extras makefiles, place humecho.cpp in the directory humextra/src-programs, and then type "make humecho". The humecho program can read its input in several ways, including downloading from the web or using a humdrum:// URI (or its hum:// or h:// abbreviations):

   cat file.krn | bin/humecho           | less     # standard input
   bin/humecho file.krn                 | less     # command-line argument
   bin/humecho h://wtc/wtc1f01.krn      | less     # humdrum:// URI
   bin/humecho http://y.z.com/file.krn  | less     # URL




humecho2.cpp (Accessing individual lines)

The humecho program shows how to access the data file in its entirety. The following source code for humecho2.cpp demonstrates how to access lines in the file individually. A HumdrumFile object essentially consists of an array of HumdrumRecord objects, and each HumdrumRecord is essentially a set of character strings which print tab-delimited with cout:

#include "humdrum.h"

int main(int argc, char** argv) {
   HumdrumFile hfile;
   if (argc > 1) hfile.read(argv[1]);
   else hfile.read(std::cin);
   for (int i=0; i<hfile.getNumLines(); i++) {
      std::cout << hfile[i] << std::endl;
   }
   return 0;
}

hfile.getNumLines() returns the number of text lines in the Humdrum file stored in the hfile variable. So the for loop iterates through each line in the file and prints it to standard output.




humecho3.cpp (Accessing spine data)

An even more verbose version of humecho is given below. The humecho3 program implements the << operator as a second for-loop. Each HumdrumRecord representing a line of music can be thought of as an array of strings, with each string being one token in the Humdrum File structure.

#include "humdrum.h"

int main(int argc, char** argv) {
   HumdrumFile hfile;
   if (argc > 1) hfile.read(argv[1]);
   else hfile.read(std::cin);
   for (int i=0; i<hfile.getNumLines(); i++) {
      std::cout << hfile[i][0];
      for (int j=1; j<hfile[i].getFieldCount(); j++) {
         std::cout << "\t" << hfile[i][j];
      }
      std::cout << std::endl;
   }
   return 0;
}


HumdrumRecords always contain at least one field, so the code "cout << hfile[i][0];" will not cause an invalid array access in any situation. Both [] operators used on the hfile variable (first to access a HumdrumRecord, and the second for a const char*) are checked for a valid range, and the program will exit with an error if an out-of-range value is requested.

The code hfile[i].getFieldCount() returns the number of "fields" on the line. This is a non-standard term for Humdrum files, used because "spines" and "tokens" can have somewhat ambiguous meanings. The field count is a count of the spines, but if the spines split, the count includes the subspines as well. Global comments and reference records are always element 0 in a HumdrumRecord line. Empty lines, which are technically not allowed in Humdrum files, are also accessed as an empty string at element 0.

Note that hfile[i][j] is a const char* and not a char*. If you want to change the contents of a field, you would have to use hfile[i].changeField(j, "new string").




HumdrumRecord line types

Each HumdrumRecord is a certain enumerated type.

E_humrec_empty            empty line (technically invalid, but allowed in Humdrum Extras parsing)
E_humrec_bibliography     of the form “!!!key: value”
E_humrec_global_comment   starts with “!!”
E_humrec_local_comment    local comment (!)
E_humrec_data_measure     line starting with “=”
E_humrec_interpretation   line starting with “*”
E_humrec_data             data lines other than measure

Use the HumdrumRecord::getType() function to access the type of a line. But for better code readability, the following helper HumdrumRecord functions interface with these enumerations:

.isData()             true if data (other than barline).
.isMeasure()          true if barline (line starts with “=”).
.isInterpretation()   true if line starts with “*”.
.isBibliographic()    true if in the form of “!!!key: value”.
.isGlobalComment()    true if line starts with “!!” and is not a bibliographic record.
.isLocalComment()     true if line starts with one “!”.
.isEmpty()            true if nothing on line.

In addition there are a few composite tests for line types:

.isComment()   isBibliographic() or isGlobalComment() or isLocalComment()
.isTandem()    interpretation lines which contain no spine manipulators (*+, *-, *^, *v, *x) or exclusive interpretations (starting with **).
.isNull()      isData() and all fields are "." (null token).




"rid -GLI" (Remove all lines except for data lines)

The Humdrum Tool rid with the -GLI options can be implemented using the following C++ code:

#include "humdrum.h"
int main(int argc, char** argv) {
   HumdrumFile hfile(argv[1]);
   for (int i=0; i<hfile.getNumLines(); i++) {
      if (!(hfile[i].isData() || hfile[i].isMeasure())) continue;
      std::cout << hfile[i] << std::endl;
   }
   return 0;
}

The above code will only print lines which are data or barlines. The official Humdrum file specification does not technically distinguish between barlines and data, but in practice and from a logical point of view they must be separated. So when using the Humdrum Extras C++ parser for Humdrum files, a line of data should not contain a mixture of data (or null tokens) and barlines.




"rid -GLId" (Remove comments, interpretations and null data)

#include "humdrum.h"
int main(int argc, char** argv) {
   HumdrumFile hfile(argv[1]);
   for (int i=0; i<hfile.getNumLines(); i++) {
      if (!(hfile[i].isData() || hfile[i].isMeasure())) continue;
      if (hfile[i].isNull()) continue;
      std::cout << hfile[i] << std::endl;
   }
   return 0;
}


The HumdrumRecord::isNull() function returns true if all fields in the record are equal to the string "." (called a null record in Humdrum terminology; this is unrelated to a NULL pointer in C).




User-specified Options

"myrid -M -C -I" (Handling command-line options)

The Humdrum Extras code contains a helper class called Options which can be used to manage command-line options. The following example program implements the options -M (suppress measure lines), -C (suppress comments), -I (suppress interpretations) in a C++ implementation of the Humdrum Toolkit rid program.

The Options class can be used to define multiple aliases for the same option, such as a short abbreviation and a long form. The options are formulated on the command line according to POSIX rules for options: single-letter options are preceded by a single dash, and multiple-letter options are preceded by two dashes. When single-letter options do not require their own arguments, they can be bundled together into a list of options preceded by a single dash. Here are various program usages for the code below:


myrid -M file.krn
   Remove measure lines when echoing file.krn to standard output.
myrid -M -I -C file.krn
   Remove measure lines, interpretations and comments (global, local and reference).
myrid -MIC file.krn
   Same as above. Shorthand for bundling multiple single-letter boolean options.
myrid --no-measures file.krn
   Long form of "myrid -M".
myrid --options
   Secret built-in option for the Options class which will force a list of defined options to be printed to standard output.
myrid -A file.krn
   The option list will also be displayed when an undefined or misspelled option is used. Use "--" to disable options processing for unusual cases such as a filename starting with a dash.
myrid -MM file.krn
   Duplicate options are ignored, so only the last -M is used. Note that this is not the option "MM", which would be formulated as "myrid --MM".
myrid -M file.krn -IC
   Options can occur in any order, and can come before or after any command arguments which are not options.
myrid -M -- -file.krn -C
   Process the poorly named file "-file.krn" and the even more poorly named file called "-C" (which is not an option if it comes after the -- marker).


Note in the following source code, an extra include directive does not need to be added, since the declaration of the Options class is included in humdrum.h. If you want to use the Options class independent of the HumdrumFile parser, you can instead include the file "Options.h".

#include "humdrum.h"
int main(int argc, char** argv) {
   Options opts;
   opts.define("M|no-measures=b", "remove measures");
   opts.define("C|no-comments=b", "remove comments");
   opts.define("I|no-interpretations=b", "remove interpretations");
   opts.process(argc, argv);
   int measuresQ = !opts.getBoolean("no-measures");
   int commentsQ = !opts.getBoolean("no-comments");
   int interpQ = !opts.getBoolean("no-interpretations");
   HumdrumFile hfile(opts.getArg(1));
   for (int i=0; i<hfile.getNumLines(); i++) {
      if (hfile[i].isMeasure() && !measuresQ) continue;
      if (hfile[i].isComment() && !commentsQ) continue;
      if (hfile[i].isInterpretation() && !interpQ) continue;
      std::cout << hfile[i] << std::endl;
   }
   return 0;
}

The code "HumdrumFile hfile(opts.getArg(1));" reads data from the first argument on the command line. Note that arguments are indexed from 1 rather than 0. This is intended to mirror command-line string arrays in C, where the name of the command is stored in array element 0 and the first argument (or option) is stored in array element 1. To access the name of the command, use the Options::getCommand() function.




Option definitions

Notice the Options::define() function calls in the above program. These are used to define the options that an Options variable will search for when the Options::process() function is called. The .define() function takes two arguments (the second one optional). The first argument is the definition string, and the second is a human-readable description of the option.

The option definition string has the basic format:

"OptionName=OptionType:DefaultValue"

The OptionName can include aliases which are added to the Option name, separated by a pipe (|) character:

"OptionName|OptionAlias1|OptionAlias2=OptionType:DefaultValue"

For example:

"M|no-measures=b"

This is the definition of the option "M", or equivalently "no-measures", which is a boolean type (meaning that it sets a true/false switch for the option). Boolean options have no default value: they are "false" if not given as an argument to the program, and "true" when given.

There are four Options data types:

Option type   Description                       Options value access
b             boolean (true or false)           .getBoolean("OptionName")
i             integer                           .getInteger("OptionName")
d             double (floating-point number)    .getDouble("OptionName")
s             string                            .getString("OptionName")

In terms of implementation, there are really only two types: booleans (without parameters) and non-booleans (with parameters). Within a C++ program you can access the original string form of the option's parameter, or you can convert it into an int or a double at runtime. For example, if an option "number" is defined, you can get the integer version of the number with .getInteger("number"), or the double version of the number with .getDouble("number"), or you can check to see if the option was set from the input arguments to the program with .getBoolean("number").

Here are some example option definitions with option names, option aliases, and option types:

Option definition    Command-line examples
"r=b"                command -r
"m=i"                command -m 10 or command -m10
"v|value=d"          command -v 5.23 or command -v5.23
                     command --value 5.23
                     command --value=5.23
"t=s"                command -t string or command -tstring
                     command -t "string with spaces"
                     command -t 'funny $tring'

When option names (or option aliases) are a single character, the space between the option name and its parameter is optional, as in "command -m 10" or "command -m10". When an option name has multiple characters, the space is not optional, although an equals sign can be substituted for the space: "command --value 5.23" and "command --value=5.23". When a string option contains spaces or other characters reserved for shell syntax (such as [;&$|?*\]), it must be enclosed in quotes. To insert a double quote into the string option, place a backslash before it: \". To prevent the shell from interpreting the contents of the string, use single quotes: "command -t 'funny $tring'". In this case the final input will be "funny $tring". If double quotes were used, $tring would be interpreted as a shell variable and its value would be substituted, usually resulting in "funny ", since you are not likely to have the shell variable $tring defined.




Default option values

The final component of the option definition is a default value to use if no input is given for that option on the command-line. If no default value is given in the definition, the default value will be zero. For example, if this option definition is given:

   options.define("v|val|value=i:10", "an integer value");

Then here are different behaviors when accessing that option's value in C++:

User-set option:

  program -v 20
     options.getInteger("value")      → 20
     options.getInteger("val")        → 20
     options.getInteger("v")          → 20

Default option:

  program
     options.getInteger("value")      → 10
     options.getInteger("val")        → 10
     options.getInteger("v")          → 10




Accessing option values

As mentioned previously, the .getBoolean, .getInteger, .getDouble and .getString accessor functions are used to extract an option value from the Options database after .process() has been called on the argc and argv input parameters to main(). All of the get functions can be applied to any option type. For example, the option definition:

   .define("t|temperature=d:80.6 Fahrenheit", "temperature setting")

can be read back as any of the four option types in C++:

  .getBoolean("temperature")           → 1 (true) if set via the command-line.
                                       → 0 (false) if not set via the command-line.
  .getInteger("temperature")           → 80
  .getDouble("temperature")            → 80.6
  .getString("temperature")            → "80.6 Fahrenheit"




Input from piped data or file(s)

Most of the previous program examples expect a single filename as input for processing. The following program example (humecho4) is more flexible, allowing for multiple input files. If no filenames are given, then standard input will be read as the input data:

#include "humdrum.h"
int main(int argc, char** argv) {
   Options options(argc, argv);
   options.process();
   HumdrumFile hfile;
   int numinputs = options.getArgCount();
   for (int i=0; i<numinputs || i==0; i++) {
      if (numinputs < 1) {
         hfile.read(std::cin); // read from standard input
      } else {
         hfile.read(options.getArg(i+1)); // arguments are indexed from 1
      }
      // do something with the Humdrum data here:
      std::cout << hfile;
   }
   return 0;
}

This program has an identical function to humecho.cpp, but now multiple files can be read in and processed at the same time. For example if there are two input files with these contents:

         file 1                   file 2
         ========                 =========
         **kern                   **kern
         1c                       2cc
         2d                       4b
         4e                       2a
         *-                       *-

The final output from the above program will be:

**kern
1c
2d
4e
*-
**kern
2cc
4b
2a
*-

Here are some possible command-line realizations for the above program:

  humecho4 file.krn
  humecho4 file1.krn file2.krn file3.krn
  cat file.krn | humecho4
  humecho4

The last command will cause the shell to wait while you type in the input to humecho4, followed by control-D to indicate the end of input data.

Note that the number of command-line arguments (other than options) can be queried from an Options variable by using the .getArgCount() function. If there are three filenames as in "humecho4 file1.krn file2.krn file3.krn", then .getArgCount() will return 3. The .getArg() function will return a string for the specified argument, starting with argument 1: .getArg(1) == file1.krn, .getArg(2) == file2.krn, .getArg(3) == file3.krn. Note that the first argument is not .getArg(0). If you want to access the command name, then use .getCommand(), which would return "humecho4" in this case.

When reading from standard input use HumdrumFile::read(istream) rather than HumdrumFile::read(const char*). For example, reading from standard input is done with hfile.read(cin) in the above code.




C string comparison functions

Here are three of the string comparison functions available in the C (or C++) language:

strcmp("string1", "string2")
   returns 0 if the strings are equivalent
   returns a negative value if string1 sorts before string2 (by character code)
   returns a positive value if string1 sorts after string2
strncmp("string1", "string2", n)
   compares only the first n characters of the two strings
strchr("string", 'character')
   returns a pointer to the first occurrence of the character within the string. If the character is not found in the string, returns a NULL pointer.

Other useful string processing functions in the C language are strstr, which is similar to strchr but searches for a substring within a string, and strrchr, which is similar to strchr but searches from the end of the string and so returns the last occurrence of the character (or NULL if the character is not in the string). For more information about strrchr (or any other standard C function), type "man strrchr" in a terminal.




Third dimension of data access (Note-level access)

Accessing individual notes in **kern data spines requires three dimensions of indexing: (1) the data line of the note, (2) the data field on the line for the note, and (3) the note number within a chord for the note. Previous program examples demonstrated how to access lines and line-fields. The following program (noteloc) goes one step further to access individual **kern notes. The program takes any sort of Humdrum file, and then outputs a list of all notes found in all **kern spines:

#include "humdrum.h"
#include <cstring>
using namespace std;
int main(int argc, char** argv) {
   Options options(argc, argv);
   options.process();
   HumdrumFile hfile;
   hfile.read(options.getArg(1));
   char buffer[1024] = {0};
   for (int i=0; i<hfile.getNumLines(); i++) {
      if (!hfile[i].isData()) continue; // ignore non-data lines
      for (int j=0; j<hfile[i].getFieldCount(); j++) {
         if (strcmp("**kern", hfile[i].getExInterp(j)) != 0) continue;
         if (strcmp(".", hfile[i][j]) == 0) continue; // ignore null tokens
         int count = hfile[i].getTokenCount(j);
         for (int k=0; k<count; k++) {
            cout << "(" << i+1 <<"," << j+1 << "," << k+1 << ")\t"
                 << hfile[i].getToken(buffer, j, k) << endl;
         } 
      }
   }
   return 0;
}

The line:

if (strcmp("**kern", hfile[i].getExInterp(j)) != 0) continue;

is used to skip over all spines which do not have **kern data. The function .getExInterp() returns a const char* string for the name of the exclusive interpretation. The strcmp() function compares the returned exclusive interpretation name with the string "**kern", and if it does not match, the next data field on the line will be examined. The exclusive interpretation can also be tested with the .isExInterp() function. The following line of code is equivalent to the one above:

if (!hfile[i].isExInterp(j, "**kern")) continue;

If the input to the program is the following:

**kern	**text	**kern
4C	ig-	4c
4D 4E	-no-	.
4F	-red	.
.	.	4d 4e
4r	.	.
4G 4A 4B	text	.
*-	*-	*-


Then the output from the noteloc program will be:


  (2,1,1) 4C
  (2,3,1) 4c
  (3,1,1) 4D
  (3,1,2) 4E
  (4,1,1) 4F
  (5,3,1) 4d
  (5,3,2) 4e
  (6,1,1) 4r
  (7,1,1) 4G
  (7,1,2) 4A
  (7,1,3) 4B

Each of the three numbers before the note indicates the address within the file for the note, with the first number being the line on which the note occurs, the second number the field on the line which contains the note, and the last number is the note number within the (possible) chord for the note.




kerninfo.cpp (Count **kern notes in data)

Here is an example program which somewhat emulates the "census -k" command from the Humdrum Toolkit. The program will count the number of note attacks, rests and tied notes in one or more Humdrum files.

#include "humdrum.h"
#include <cstring>
using namespace std;
int main(int argc, char** argv) {
   Options options(argc, argv);
   options.process();
   HumdrumFile hfile;
   int restcount   = 0;
   int nullcount   = 0;
   int attackcount = 0;
   int tiedcount   = 0;
   int chordcount  = 0;
   for (int arg=0; arg < options.getArgCount() || arg == 0; arg++) {
      if (options.getArgCount() == 0) {  hfile.read(cin); }
      else { hfile.read(options.getArg(arg+1)); }
      char buffer[1024] = {0};
      for (int i=0; i<hfile.getNumLines(); i++) {
         if (!hfile[i].isData()) continue;
         for (int j=0; j<hfile[i].getFieldCount(); j++) {
            if (!hfile[i].isExInterp(j, "**kern")) continue;
            int count = hfile[i].getTokenCount(j);
            if (count > 1) chordcount++;
            for (int k=0; k<count; k++) {
               hfile[i].getToken(buffer, j, k);
               if (strchr(buffer, 'r') != NULL)   { restcount++; } 
               else if (strcmp(buffer, ".") == 0) { nullcount++; } 
               else if (strchr(buffer, '_') != NULL) { /* ignore */ }
               else if (strchr(buffer, ']') != NULL) { tiedcount++; } 
               else { attackcount++; }
            }
         }   
      }
   }
   cout << "Note attacks: " << attackcount << endl;   
   cout << "Tied notes  : " << tiedcount   << endl;
   cout << "Chords      : " << chordcount  << endl;   
   cout << "Rests       : " << restcount   << endl;
   cout << "Null Tokens : " << nullcount   << endl;   
   return 0;
}

Trying out the kerninfo program on this input data:

**kern	**text	**kern
4C	ig-	4c
4D 4E	-no-	.
4F	-red	.
.	.	4d 4e
4r	.	.
4G 4A 4B	text	.
*-	*-	*-

Results in these statistics:

  Note attacks: 10
  Tied notes  : 0
  Chords      : 3
  Rests       : 1
  Null Tokens : 5

Trying out the kerninfo program on a real piece of music:

  kerninfo h://wtc/wtc1p04.krn
  Note attacks: 675
  Tied notes  : 85
  Chords      : 14
  Rests       : 69
  Null Tokens : 967

Convert class

In addition to the Options class, an important helper class in Humdrum Extras is the Convert class. This class handles most conversions between data types. The HumdrumFile class essentially stores a two-dimensional array of strings. The **kern notes in a HumdrumFile variable are extracted as strings, but will need to be interpreted further depending on the information about the note which you need. For example, to convert a **kern note into a MIDI note number, use the following Convert function:

  Convert::kernToMidiNoteNumber("4d-")          →  61

Likewise, the MIDI note 61 can be converted back into a **kern note:

  Convert::midiNoteNumberToKern(buffer, 61)     →  "c#"

All access to Convert class functions is done statically, so you can shorten the code by using a typedef to give Convert a shorter name:

   typedef Convert C;
   C::kernToMidiNoteNumber("4d-");

Convert **kern note name to MIDI

The following program will convert the first note of every chord into a MIDI note number. As an exercise, adjust the code so that it prints a MIDI note number for every note in the chords.

#include "humdrum.h"
#include <cstring>
using namespace std;
int main(int argc, char** argv) {
   Options options(argc, argv);
   options.process();
   HumdrumFile hfile(options.getArg(1));
   for (int i=0; i<hfile.getNumLines(); i++) {
      if (!hfile[i].isData()) continue;
      for (int j=0; j<hfile[i].getFieldCount(); j++) {
         if (!hfile[i].isExInterp(j, "**kern")) continue; // skip non-kern spines
         if (strcmp(".", hfile[i][j]) == 0) continue; // ignore null tokens
         if (strchr(hfile[i][j], 'r') != NULL) continue; // ignore rests
         cout << hfile[i][j] << "\t" << Convert::kernToMidiNoteNumber(hfile[i][j]) << endl;
      }
   }
   return 0;
}

Example input:

**kern	**text	**kern
4C	ig-	4c
4D 4E	-no-	.
4F	-red	.
.	.	4d 4e
4r	.	.
4G 4A 4B	text	.
*-	*-	*-

Output:

  4C        48
  4c        60
  4D 4E     50
  4F        53
  4d 4e     62
  4G 4A 4B  55