Josquin Humdrum encoding standards

From CCARH Wiki

1 Reference records

Reference records are lines in Humdrum data which start with three exclamation marks followed by a reference key, then a colon, and then the value for the reference key. The standard Humdrum reference record keys are listed in Appendix 1 of the Humdrum Users' Guide.
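
The record layout described above can be recognized with a short Python sketch. The function name and the key pattern (anything up to the first colon) are assumptions made for illustration; they are not part of the Humdrum toolkit.

```python
import re

# Three exclamation marks, a key, a colon, then the value.
# Assumption: the key is everything up to the first colon and contains no "!".
REFERENCE_RE = re.compile(r"^!!!([^:!]+):\s*(.*)$")

def parse_reference_record(line):
    """Return (key, value) for a reference record, or None for any other line."""
    match = REFERENCE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group(1), match.group(2).strip()
```

Lines that start with only two exclamation marks (global comments such as "!!attribution-level:") are deliberately not matched, so they can be handled separately.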

1.1 Header reference records

Reference records which occur before the start of the data.

1.1.1 Composer Information

Composer-specific reference records in Humdrum typically start with the letter "C". Here are the standard Humdrum reference records related to composers which are used in the JRP Humdrum encodings of the music:

Abbreviation Description
COM Composer's name
COA Attributed composer
CDT Composer's dates (birth-death)

COM is used when the identity of the composer is reasonably certain; otherwise, COA is used to indicate uncertainty in the assignment of the composer. Works can contain multiple COM and/or COA records. Typically there will be only one COM record, but it is possible to have more than one when the work has multiple composers (such as a collaboration). Usually when there is more than one COM/COA entry, there will be one COM record indicating the true or most likely composer, and a list of COA records naming other composers to whom the work has been attributed in some original edition but who are no longer thought to be the composer of the work. The first COM/COA record in the file is intended to be the most reasonable attribution of a composer for a file (so COM records are typically placed before COA records). For clarity, if there is more than one COM or COA record, the secondary COM/COA records may have an optional numeral appended after the COM/COA tag.

A CDT (composer's dates) record should immediately follow each COM/COA record. If the COM/COA record has an optional numeral attached to it, the corresponding CDT tag may be given the same numeral.

For works composed by or attributed to Josquin des Prez, there is a comment in the file trailer called "!!attribution-level:" which categorizes the attribution quality for the work (see the reference record trailer section below for more details). Other composers in the JRP database do not possess such an attribution comment.

Example composer-related reference records:

  !!!COM: Josquin des Prez

COM is the composer of the work. COM is used when the work is securely attributed to Josquin as the composer.

  !!!COA: Josquin des Prez

COA is the attributed composer of the work. When the authorship is in question, COA is used to give the suspected composer. When there is both a COM and COA entry in the file, then the COA is the spurious composer.

  !!!CDT: ~1450-1521/08/27

CDT is the Composer's birth and death dates. Josquin was born in approximately 1450 and died 27 August 1521. Note all dates in Humdrum reference records are in the form year/month/day[/hour:minute:second]. A tilde character before a number such as the year means circa, or an approximate date.
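
As an illustration of the date format, the following Python sketch splits a CDT value into birth and death dates. The function names and the returned dictionary layout are assumptions for this example only, and the sketch handles a tilde only at the start of a date and ignores the optional time-of-day field.

```python
def parse_date(text):
    """Split one year/month/day date; a leading "~" marks an approximate date."""
    text = text.strip()
    approximate = text.startswith("~")
    # Drop empty fields so that a trailing slash ("2011/04/01/") is tolerated.
    fields = [f for f in text.lstrip("~").split("/") if f]
    numbers = [int(f) for f in fields[:3]]
    numbers += [None] * (3 - len(numbers))
    year, month, day = numbers
    return {"year": year, "month": month, "day": day, "approximate": approximate}

def parse_cdt(value):
    """Return (birth, death) for a CDT value such as "~1450-1521/08/27"."""
    birth, _, death = value.partition("-")
    return parse_date(birth), parse_date(death)
```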

1.1.2 Title Information

Work-related reference records in Humdrum typically start with the letter "O" which stands for Opus. Here are the work-related reference records used in JRP Humdrum scores:

Abbreviation Description
OTL Work title
OPR Parent work title
OMD Movement designation
ONM Movement number

In the JRP database, the titles of motets and songs are stored in an OTL record. The titles of Masses are stored in OPR records, while the names of the mass sections are stored in OTL records. For motets and other multi-section works, there may be multiple OTL records followed by numerals which indicate the title (text incipit) for each section. The language of the title may be explicitly indicated by using a @@ modifier after a bibliographic key. For example "!!!OTL@@FRE" means that the title is in French, and that French is the original language of the title (as opposed to a translation, which is indicated by a single @ marker). Other language codes currently used in the JRP music database are LAT for Latin and GER for German. The genre typically dictates the language within the JRP database: masses and motets are in Latin; secular songs are in French. Ideally all languages would be indicated so that automatic analysis by original text language can be done.

The ONM record is used to indicate the section sequence for mass movements. For example, the standard mass sequence is (1) Kyrie, (2) Gloria, (3) Credo, (4) Sanctus, and (5) Agnus Dei.

Example reference records related to work titles:

  !!!OPR: Missa Ave maris stella

When the data file contains a Mass section, it is treated as an independent work, with one section to a file. The OPR (opus parent) record will give the name of the mass, while the OTL record will give the name of the Mass section (Kyrie, Gloria, Credo, Sanctus, or Agnus Dei).

  !!!ONM: 1

For Masses, ONM (opus number) is the order of the main segmentation of the mass, which is typically:

  1. Kyrie
  2. Gloria
  3. Credo
  4. Sanctus
  5. Agnus Dei

  !!!OTL: Kyrie

OTL (opus title) stores the title of the work/movement represented in the file. The title for Mass sections will be stored in an OTL record. For Motets, the first line(s) of the vocal text will be used as the title.

  !!!OTL@@LAT: Sancta mater istud agas

When the key of an OTL record (or of any other reference record) is followed by two at signs (@@), a 3-letter ISO 639-2 language code will follow. This language is either the primary language to use for the reference record, or it may also mean the original language in the case where translations are given in other reference records. The language code should be in upper case. In this case LAT means Latin.

  !!!OTL@ENG: Holy Mother! pierce me through

When a reference key is followed by only a single at sign (@), the language code which follows is the language into which the reference value has been translated. In this case the title is translated into English, which also implies that English was not the original language of the reference record.
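
The two language markers can be separated from the reference key with a few lines of Python. This is a minimal sketch; the function name and the (key, language, is_original) return shape are assumptions for illustration.

```python
def split_language(key):
    """Return (base_key, language_code, is_original) for a reference key.

    "@@" marks the original (or primary) language, a single "@" a translation.
    Keys without a language marker return None for the last two fields.
    """
    if "@@" in key:
        base, _, language = key.partition("@@")
        return base, language, True
    if "@" in key:
        base, _, language = key.partition("@")
        return base, language, False
    return key, None, None
```

With such a helper, works could for example be grouped by original text language by collecting every OTL key for which is_original is true.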

1.1.3 Other Header Information

Other reference records included in the header include the genre category, scholarly catalog information and the number of voices in the music. Here are examples of each type of record:

  !!!AGN: Mass; Kyrie

AGN (analytic genre) is used to encode the genre within the data. This is useful for selecting or grouping works by genre for analysis. Multiple analysis genres are separated by semicolons, such as in this case where the work is a part of a mass, but is more specifically a kyrie.

Example list of genres and subgenres which are typically found in the data:

  • Masses. Below are typical examples of genre and subgenre entries for masses in the JRP database. When a full mass is not extant, the JRP database file for the work will have the primary genre designation "Mass section" rather than "Mass". This is used to separate mass sections from full masses on the JRP website's work listings. A mass-section file may also contain only part of a mass section, such as the Crucifixus of a Credo section. Missa Brevis is indicated as a subgenre for masses; this indication typically comes after the subgenre label for the mass section.
  !!!AGN:	Mass section; Credo
  !!!AGN:	Mass section; Credo; Crucifixus
  !!!AGN:	Mass section; Gloria
  !!!AGN:	Mass section; Kyrie
  !!!AGN:	Mass section; Sanctus
  !!!AGN:	Mass; Agnus Dei
  !!!AGN:	Mass; Agnus Dei; Missa Brevis
  !!!AGN:	Mass; Communion
  !!!AGN:	Mass; Credo
  !!!AGN:	Mass; Gloria
  !!!AGN:	Mass; Gradual
  !!!AGN:	Mass; Introit
  !!!AGN:	Mass; Kyrie
  !!!AGN:	Mass; Kyrie; Agnus Dei
  !!!AGN:	Mass; Offertory
  !!!AGN:	Mass; Requiem; Gradual
  !!!AGN:	Mass; Requiem; Introit
  !!!AGN:	Mass; Requiem; Kyrie
  !!!AGN:	Mass; Requiem; Offertory
  !!!AGN:	Mass; Requiem; Tract
  !!!AGN:	Mass; Sanctus

  • Motets. Here are typical examples of genre and subgenre entries for motets in the JRP database:
  !!!AGN:	Motet
  !!!AGN:	Motet; Hymn
  !!!AGN:	Motet; Motet cycle
  !!!AGN: Chanson; Motet-Chanson

  • Songs. The genre-designation is currently "Chanson" for songs. This may be changed in the future to "Song; Chanson" for Chansons, if there is a need to distinguish French songs from songs in other languages (such as Italian or German).
  !!!AGN:	Chanson
  !!!AGN:	Chanson; Ballade
  !!!AGN: Chanson; Motet-Chanson
  !!!AGN:	Chanson; Rondeau
  !!!AGN:	Chanson; Virelai

  • Fragments. Although not strictly a genre, "Fragment" is listed as the primary genre if the work is significantly incomplete. This is useful to filter out incomplete works which may disturb the analysis results when complete works are expected as input to some analytic algorithm. The Fragment marker in the AGN record is also used to control the grouping of fragments in the works list on the JRP website.
  !!!AGN:	Fragment; Motet
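
Selecting works by genre, as described for the AGN record above, amounts to splitting the value at semicolons. The following Python sketch shows one way to do this; the function names are hypothetical and the fragment test simply checks the primary (first) genre.

```python
def parse_agn(value):
    """Split an AGN value into [primary genre, subgenre, ...]."""
    return [part.strip() for part in value.split(";") if part.strip()]

def is_fragment(value):
    """True when "Fragment" is listed as the primary genre."""
    genres = parse_agn(value)
    return bool(genres) and genres[0] == "Fragment"
```

An analysis that expects complete works could skip every file for which is_fragment() returns True.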

SCT reference records are used to indicate "scholarly catalog numbers" in Humdrum, such as KV numbers for Mozart, or Hoboken numbers for Haydn. For the music of Josquin des Prez, the volume number and work enumeration within the volume are used in reference to the New Josquin Edition. For other composers, the JRP catalog number is given within the SCT record. SCA records contain the unabbreviated scholarly catalog number (useful for cases where the abbreviation would not be clear outside of a specialized field).

  !!!SCT: NJE 3.1

SCT is a scholarly catalog number, in abbreviated form. In this case "NJE" stands for the "New Josquin Edition", "3" stands for the third volume of the series, and "1" indicates that the work is the first one (or a section of the first one in this case) in the volume.

  !!!SCA: New Josquin Edition 3.1

SCA is an unabbreviated scholarly catalog number.

The !!!voices: record is used to indicate the number of voices in the original score. For the most part, this will match the number of parts found in the Humdrum data, but may be different if there is a missing part. In addition, the !!!voices: record is used to describe the number of voices for the complete mass when given in mass-section files, which may contain sections with different numbers of voices. This record is used to list the number of voices in the work lists on the JRP website. More complex examples include:

   !!!voices: 4-5

The (parent) work contains sections for four voices and for five voices.

   !!!voices: 4(-5)

The work is for four voices, but there are some subsections where a part splits into two, making the total texture have five voices.

   !!!voices: 5?

The work was probably for 5 voices, but one (or more) parts are lost.

   !!!voices: ?

The original number of voices for the composition is unknown (usually for a fragment from a single part).

  !!!voices: 4

voices is the number of voices in the work/section. This value is usually the same as the number of spines (columns) in the Humdrum data, but is occasionally different, such as in the case where one of the original parts has been lost.
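
The forms of the voices value shown above ("4", "4-5", "4(-5)", "5?" and "?") can be told apart with one regular expression. The Python sketch below is only an illustration: the function name and the dictionary keys are assumptions, and any other form of the value is rejected.

```python
import re

# Optional count, optionally followed by "-N" (range) or "(-N)" (a part
# that splits), optionally followed by "?" (uncertain).
VOICES_RE = re.compile(r"^(?:(\d+)(?:-(\d+)|\(-(\d+)\))?)?(\?)?$")

def parse_voices(value):
    """Return a description of a voices value, or None if it is not recognized."""
    value = value.strip()
    match = VOICES_RE.match(value)
    if not value or match is None:
        return None
    low, high, divided, uncertain = match.groups()
    base = int(low) if low else None
    top = high or divided
    return {
        "base": base,                          # basic number of voices
        "max": int(top) if top else base,      # largest number of voices
        "divided": divided is not None,        # "(-N)": a part splits in places
        "uncertain": uncertain is not None,    # "?": parts lost or count unknown
    }
```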

1.2 Trailer reference records

Reference records which occur after the end of the data:

  !!!RDF**kern: l=long note in original notation

RDF**kern is an encoding emendation for the **kern data type. In this case the character "l" when found in **kern data is used to represent a terminal long note, which should be displayed as a long note regardless of it

  !!!RDF**kern: i=musica ficta

This encoding emendation indicates that the character "i" found in **kern data is used to indicate a musica ficta accidental which is not found in the original score, but was implicitly applied by performers in Josquin's time (as assumed by the modern editor).

  !!!RDF**kern: %=rational rhythm

When this encoding emendation is used, it indicates the presence of a non-backwards compatible **kern rhythm code. There are two rhythmic values which cannot be represented in standard **kern:

  1. triplet whole notes (semibreves) (2/3rds of a whole note) represented in the data as "3%2" which is the inverse of 2/3.
  2. triplet breves (double whole notes) (4/3rds of a whole note) represented in the data as "3%4" which is the inverse of 4/3.

This record can be used to ascertain whether the rhythmic data can be correctly processed with the Humdrum toolkit without pre-processing.

  !!!rscale-alt: 1/2

When this reference record is present (always together with an !!!RDF**kern: %=rational rhythm emendation), it gives the rhythmic scaling which is necessary in order to allow the note rhythms to be fully understood by the standard Humdrum Toolkit commands. You may use the Humdrum Extras program rscale to alter the rhythms in the file to make them compatible with the standard Humdrum Toolkit commands:

   rscale -a file.krn | census -k

To convert data back into the original rhythmic values, use the -o option:

   rscale -a file.krn | rscale -o

When the data contains a triplet whole note ("3%2"), the rscale alternate entry should be at least 1/2 to shift the triplet whole note to the triplet half-note rhythmic level. If the data contains a triplet breve ("3%4"), the rscale alternate entry should be at least 1/4 to shift the triplet breve to the triplet half-note rhythmic level.
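
The arithmetic behind these scaling factors can be checked with exact fractions. The Python sketch below is not the rscale program; it only illustrates the calculation for undotted rhythms up to a breve, and both function names are assumptions.

```python
from fractions import Fraction

def duration(rhythm):
    """Duration in whole notes of an undotted rhythm such as "4", "0" or "3%2"."""
    if "%" in rhythm:
        top, bottom = rhythm.split("%")
        return Fraction(int(bottom), int(top))   # "3%2" is the inverse of 2/3
    if rhythm == "0":
        return Fraction(2)                       # breve (double whole note)
    return Fraction(1, int(rhythm))

def to_rhythm(dur):
    """Write a duration as a rhythm, using "%" only when it cannot be avoided."""
    if dur == 2:
        return "0"
    if dur.numerator == 1:
        return str(dur.denominator)
    return f"{dur.denominator}%{dur.numerator}"

# A triplet whole note scaled by 1/2 becomes a triplet half note ("3").
scaled = to_rhythm(duration("3%2") * Fraction(1, 2))
```

Scaling "3%4" by only 1/2 gives 2/3 of a whole note, which is still written "3%2"; this is why a file containing triplet breves needs a scaling of at least 1/4.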

  !!!RDF**kern: V=start of tuplet marker
  !!!RDF**kern: Z=end of tuplet marker

Triplet markings are explicitly indicated in the data with V for the start of a tuplet marking and Z for the end of the tuplet marking. These specific characters may change in the future, but the rest of the emendation line will remain the same. The explicit tuplet markings are needed to differentiate isolated triplets (such as might originally be written in red notation in mensural notation) from notes governed by the mensural sign "3". The "3" sign is essentially a shorthand for a longer section of triplets, and the triplet brackets are not displayed when a "3" mensural sign is in effect.

  !!!ENC: Jesse Rodin; Victoria Chang 2011/04/01/

ENC is the encoder (transcriber) of the work in Finale from a printed score using a MIDI keyboard. If there is more than one encoder who generated the data, their names will be separated by semicolons (;).

  !!!END: 2011/04/01/

END is the primary encoding date, typically the date on which the encoding was finished. May be estimated if the original encoding date was not recorded.

  !!!EED: Jesse Rodin

EED is the electronic editor. This is the person to complain to about the content of the data, such as a wrong note or questionable editorial decision.

  !!!EEV: 2011/04/06/

EEV is the electronic edition version. When the data is changed (such as to fix an erratum), this date should be changed to the current date.

  !!!ONB: Translated from MusicXML and edited on 2011/04/06/ by Craig Sapp

ONB stands for opus nota bene, which is a free-form note. In this case there will be a reference record in this form in the bottom reference records which indicates when the musical data was extracted from the MusicXML data exported from Finale (Finale versions 2009, 2010, or 2011). If the MusicXML data file representing the same data has an encoding date later than this date, then there may be differences between the two versions of the music. However, if the EEV date is later than the encoding date within the MusicXML file, it is likely that any fix made to the MusicXML data was also manually made to the Humdrum file.

 !!onb: tie removed from Tenor voice G in m97 [2011/04/06/]

Occasionally there will be !!onb: comments which are used to store errata fixes made to the data. These are primarily used to coordinate with the MusicXML files created in Finale. When the coordination between the two files is secure, these errata records may be deleted.

1.3 Intermediate reference records


The OMD reference record (opus movement designation) may be found within the data as well as at the top of the data. When it is found within the data, it is used to print the section name in the graphical music notation. The first OMD record, which may be found either in the header record section or within the data interpretations, will be displayed at the start of the music.

2 Mensural signs

3 Musica ficta

4 Terminal longs

5 Augmentation and diminution

6 Metrical considerations

Mensuration signs of the original music are encoded in the scores. These mensuration signs are displayed instead of time signature in the graphical music notation generated from the data. The following table lists the mensuration signs used in the first 150 Josquin works that have been encoded.

Humdrum name  CamelCase name      Relative frequency  Modern equivalents
*met(C|)      menCutC             61.0%               2/1
*met(O)       menCircle           14.4%               3/1
*met(3)       men3                7.5%                2/1 (when used as triplet shorthand), 3/1 (when simulating triplets)
*             (null mensuration)  6.5%
*met(C3)      menC3               1.67%               3/1, found in masses (Sanctus) only
*met(O/3)     menOover3           1.61%               2/1, 3/1, functions like *met(3)
*met(C)       menC                1.50%               2/1
*met(O2)      menCircle2          1.16%               2/1
*met(O|)      menCutCircle        1.05%               3/1, masses only
*met(C|3)     menCutC3            1.00%               3/1 (in 0402d, 1202d, 2313, 2803)
*met()        (empty mensuration used to suppress new time signature display)  1.16%
*met(O.)      menCircleDot        0.61%               3/1, masses only
*met(C2)      menC2               0.50%               2/1 (in 1401, 2901)
*met(C|2)     menCutC2            0.28%               2/1 (in 0802b, 2403)
*met(2)       men2                0.28%               4/1, 6/1 (in 0402e, 1307)
*met(C.)      menCDot             0.11%               3/1, masses only
*met(3/2)     men3over2           0.055%              3/1 (in 9001e, 1903)

Time signatures must always be encoded along with the mensuration signs. The time signature should occur in the data file on the line immediately above the mensuration sign. The bottom of the time signature should contain the tactus rhythmic value, most commonly "1" for a semibreve (whole note) or "0" for a breve (double whole note). The top of the time signature should contain the number of tactuses within the measure (which is usually the duration of a long).
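
As a rough illustration of how the two halves of the time signature combine, the following Python sketch computes the length of a measure in whole notes from a time signature interpretation such as "*M2/1". The function name is hypothetical, and the sketch assumes the bottom value is either "0" (breve) or a plain number such as "1" (semibreve).

```python
from fractions import Fraction

def measure_duration(token):
    """Length of a measure, in whole notes, for a token such as "*M3/1"."""
    if not token.startswith("*M"):
        raise ValueError("not a time signature interpretation: " + token)
    top, bottom = token[2:].split("/")
    # "0" stands for the breve (two whole notes); otherwise 1/bottom.
    tactus = Fraction(2) if bottom == "0" else Fraction(1, int(bottom))
    return int(top) * tactus
```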

6.1 Primary mensuration

When different parts have different mensurations at the same time, a global comment can be added at that point in the score to indicate the primary mensuration for performance tempo determination. If, say, three out of four parts are in Cut-C and one is in C, then a global record starting with "!!primary-mensuration:" is added, followed by the main mensuration used to determine the tempo of the following music. For example:

  *M2/1      *M2/1     *M2/1      *M2/1
  *met(C|)   *met(C)   *met(C|)   *met(C|)
  !!primary-mensuration: met(C|)
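
A program reading such a passage could pick out the mensuration of each part and the declared primary mensuration as in the Python sketch below. The function names are assumptions for illustration, and fields are split on any whitespace so that both tab-separated data and the space-aligned example above are accepted.

```python
PREFIX = "!!primary-mensuration:"

def mensurations(line):
    """Return the mensuration sign of each spine, or None where there is none."""
    signs = []
    for token in line.split():
        if token.startswith("*met(") and token.endswith(")"):
            signs.append(token[5:-1])
        else:
            signs.append(None)
    return signs

def primary_mensuration(lines):
    """Return the sign named in the first primary-mensuration comment, if any."""
    for line in lines:
        if line.startswith(PREFIX):
            value = line[len(PREFIX):].strip()
            if value.startswith("met(") and value.endswith(")"):
                return value[4:-1]
            return value
    return None

example = [
    "*M2/1      *M2/1     *M2/1      *M2/1",
    "*met(C|)   *met(C)   *met(C|)   *met(C|)",
    "!!primary-mensuration: met(C|)",
]
```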

7 Segmentation

8 Conversion Notes

Outline of the conversion process from MusicXML into Humdrum:

  1. Data files are received in three formats (1) MusicXML file, (2) PDF file showing intended graphical notation of music, (3) original data file from music editor used to generate MusicXML file (such as a SCORE/Finale .mus file or a Sibelius .sib file).
  2. MusicXML data is converted into Humdrum with the xml2hum program
  3. The raw conversion is processed with the jrpize.cpp program
  4. The previous two steps are typically handled automatically by this PERL script which also inserts reference records.
  5. The Humdrum file is manually checked for errors in the automatic translation.
  6. The Humdrum file is converted to graphical music notation with muse2ps and manually checked for expected notation rendering.
  7. A !!muse2ps: record is added to indicate the formatting information needed to produce decent automatic formatting of the music (spacing between staves, density of the music, title, composer information at start of music).
  8. The Humdrum file is stored in the database (in the kern directory for a particular composer or collection).
  9. The work-info page is displayed. This automatically generates an incipit image. If the incipit is too short or too long, an !!!incipit: direction is added, the previous incipit is deleted and the work-info page is reloaded.
  10. The entry for the work in the main work list is checked to make sure it has the correct composer, genre and voice information. For Josquin compositions, the attribution-level is checked (displayed as a color code in the work list).