Meta-information about MARC: an XML framework for validation, explanation and help systems

This page supports the paper "Meta-information about MARC: an XML framework for validation, explanation and help systems", by JOAQUIM DE CARVALHO (BookMARC/University of Coimbra, Coimbra, Portugal), MARIA INÊS CORDEIRO (Art Library, Calouste Gulbenkian Foundation, Lisbon, Portugal), ANTÓNIO LOPES (BookMARC, Coimbra, Portugal) and MIGUEL VIEIRA (BookMARC, Coimbra, Portugal).
The paper was published in the journal Library Hi Tech.

A diagram of the flow of transformations

Diagram

An XML schema for MARC21DOC rules

The purpose of the XML schema for MARC is to provide a formalism for representing MARC rules and the human-oriented information currently held in MARC manuals. The schema supports not only HTML, PDF and Windows Help versions of the MARC manual, all generated automatically, but also the production of stylesheets that transform records for validation or decoding purposes.

See MARC21DOC schema.

Samples

Download the samples described in the following sections.

Generation of validation stylesheets

The MARC21Validation.xsl stylesheet is generated automatically by the MARC21ValidationGenerator.xslt stylesheet. The generator reads the information contained in the MARC21DOC.xml file and builds from it the validation rules that make up the validation stylesheet.
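The idea of a stylesheet that writes another stylesheet can be pictured with a small Python sketch. It is not the generator shipped in the samples (that one is written in XSLT); the function name mandatory_field_test, the marc: prefix in the XPath test and the shape of the emitted fragment are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL)


def mandatory_field_test(tag):
    """Build an <xsl:if> fragment that reports a missing control field.

    Hypothetical helper: it turns one rule ("this tag is mandatory") into
    one test of a validation stylesheet. The marc: prefix is assumed to be
    bound to the MARC 21 slim namespace by the enclosing stylesheet.
    """
    test = ET.Element("{%s}if" % XSL,
                      {"test": "not(marc:controlfield[@tag='%s'])" % tag})
    ET.SubElement(test, "error", {"type": "MandatoryControlfield", "tag": tag})
    return test


if __name__ == "__main__":
    # Print the generated fragment for a rule saying that 001 is mandatory.
    print(ET.tostring(mandatory_field_test("001"), encoding="unicode"))
```

A full generator would emit one such fragment per rule found in the manual and wrap them in a complete stylesheet.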

The MARC21Validation.xsl has the same structure and functionality as the LoC validation stylesheet, with the following differences and additional features:

  • It's automatically generated from an XML version of the MARC 21 manual.
  • Handles control fields.
  • Handles positional fields.
  • Checks for mandatory fields, both control and data fields.
  • Creates variables to hold the content of the current field and the value of the leader. These variables are used for tests to decide which rules to apply.
  • Handles ranges of legal values in positional field tests and in indicator values.
  • Uses vocabularies of acceptable values in positional field testing.
  • Adds templates for positional fields validation:
    • validatePSubfield
    • checkPSubfieldValue
    • checkRangeCode
    • generatePSubfieldError
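
The positional-field checks listed above can be approximated in a few lines of Python. This is a sketch, not the generated XSLT: the function names only echo the template names, the rule layout (start, end, vocabulary, ranges) is an assumption, and the TYPE_OF_DATE vocabulary is copied from the 008/06 report shown later on this page.

```python
def check_range_code(value, ranges):
    """Return True if value lies inside any inclusive (low, high) range."""
    return any(low <= value <= high for low, high in ranges)


def validate_positional(tag, content, rule):
    """Check one positional slice of a control field against a rule.

    rule is a hypothetical dict: start and end are 0-based inclusive
    positions, vocabulary is a set of legal codes and ranges is a list of
    (low, high) string pairs. Returns None when the slice is legal,
    otherwise a dict describing the error.
    """
    value = content[rule["start"]:rule["end"] + 1]
    if value in rule.get("vocabulary", ()):
        return None
    if check_range_code(value, rule.get("ranges", ())):
        return None
    return {"type": "InvalidPSubfield", "tag": tag,
            "start": rule["start"], "end": rule["end"], "invalid": value}


# 008/06, Type of date/Publication status (codes taken from the report below).
TYPE_OF_DATE = {"start": 6, "end": 6, "vocabulary": set("bcdeikmnpqrstu|")}

if __name__ == "__main__":
    good = "920219s1993    caua   j      000 0 eng  "
    bad = "920219X1993    caua   j      000 0 eng  "
    print(validate_positional("008", good, TYPE_OF_DATE))
    print(validate_positional("008", bad, TYPE_OF_DATE))
```

The first call finds nothing to report; the second returns an error description whose invalid entry is X, which corresponds to the InvalidPSubfield report in the additional examples.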

Generation of decoding stylesheets

The EnglishFormat.xsl stylesheet is generated automatically by the EnglishFormatGenerator.xslt stylesheet. The generator uses the information contained in the MARC21DOC.xml file to create the decoding stylesheet.

Our EnglishFormat.xsl has the same structure and functionality as the LoC one, and some additional features:

  • It's automatically generated from an XML version of the MARC 21 manual.
  • Handles control fields.

HTML formatting of rules

The MARC21DOCtoHTML.xsl stylesheet transforms the XML version of the MARC 21 manual into an HTML document for reference purposes.

How to use the samples

Requires Java VM 1.3 or later.

At the DOS prompt or Linux/Unix terminal:

  1. Decompress the archive into a directory. This will be called the MARCDOC_DIR.
  2. Change directory to MARCDOC_DIR.
  3. Three sub-directories have been created:
    • bin: Java libraries and two short scripts for testing: transform.sh (Unix) and transform.bat (Windows)
    • doc: documentation, notably these notes and a representation of the schema in the file xsd.html
    • src: sample MARC21 manual in XML and various stylesheets:
      • EnglishFormatGenerator.xslt --> generates a slim-to-English stylesheet
      • MARC21DOC.dtd --> DTD for the MARC manual
      • MARC21DOC.xml --> Sample of MARC manual
      • MARC21DOC.xsd --> Schema for the MARC manual
      • MARC21DOCtoHTML.xsl --> generates an HTML version of the manual
      • MARC21ValidationGenerator.xslt --> generates a stylesheet for validation
      • sandburg.xml --> sample record
  4. Examples (run from MARCDOC_DIR):

    The transform.sh and transform.bat files are simple scripts that call the command-line interface of the Saxon Java XSLT processor. They take three arguments: the XML document, the XSL stylesheet and the output file name. They apply the stylesheet to the XML document and save the result in the output file.

    1.    Generate an HTML version of the MARC manual:
    ./bin/transform.sh src/MARC21DOC.xml src/MARC21DOCtoHTML.xsl MARC21DOC.html
    2.    Generate a stylesheet for decoding a record in English:
    ./bin/transform.sh src/MARC21DOC.xml src/EnglishFormatGenerator.xslt englishFormater.xsl
    3.    Now englishFormater.xsl is able to render a record in HTML:
    ./bin/transform.sh src/sandburg.xml englishFormater.xsl english.html
    At this point you can change something in MARC21DOC.xml and repeat steps 2 and 3 to see how a change in the documentation is reflected in the record rendered as HTML.
    4.    Generate a validation stylesheet from the MARC manual:
    ./bin/transform.sh src/MARC21DOC.xml src/MARC21ValidationGenerator.xslt validator.xsl
    5.    Now validator.xsl is capable of validating specific records:
    ./bin/transform.sh src/sandburg.xml validator.xsl errors.xml
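
For readers who prefer to script the five steps, the sequence above can be driven from Python. This is a sketch under assumptions: the helper names build_commands and run_all are hypothetical, transform.sh is assumed to be executable, and the file names are the ones used in the examples. The order matters, because steps 3 and 5 consume the stylesheets produced by steps 2 and 4.

```python
import os
import subprocess

# (XML document, stylesheet, output file) for steps 1 to 5 above, in order.
STEPS = [
    ("src/MARC21DOC.xml", "src/MARC21DOCtoHTML.xsl", "MARC21DOC.html"),
    ("src/MARC21DOC.xml", "src/EnglishFormatGenerator.xslt", "englishFormater.xsl"),
    ("src/sandburg.xml", "englishFormater.xsl", "english.html"),
    ("src/MARC21DOC.xml", "src/MARC21ValidationGenerator.xslt", "validator.xsl"),
    ("src/sandburg.xml", "validator.xsl", "errors.xml"),
]


def build_commands(marcdoc_dir, windows=None):
    """Return one argument list per step; nothing is executed here."""
    if windows is None:
        windows = os.name == "nt"
    script = os.path.join(marcdoc_dir, "bin",
                          "transform.bat" if windows else "transform.sh")
    return [[script, xml, xsl, out] for xml, xsl, out in STEPS]


def run_all(marcdoc_dir):
    """Run every step from MARCDOC_DIR, stopping at the first failure."""
    for command in build_commands(marcdoc_dir):
        subprocess.run(command, cwd=marcdoc_dir, check=True)
```

Calling run_all with the path of MARCDOC_DIR runs the steps in the same order as the manual commands and should leave the same output files in that directory.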

Additional examples

This set of additional examples demonstrates the error detection capabilities of validator.xsl. To run them, insert the errors described below into the sandburg.xml file.

Use the command: ./bin/transform.sh src/sandburg.xml validator.xsl errors.xml to validate the XML file.

  1. Test mandatory control fields and unknown fields
    Replace the 001 control field with:
    <controlfield tag="999">   92005291 </controlfield>
    Report after replacing the 001 control field tag with 999:
    <error type="MandatoryControlfield" tag="001"/>
    <warning type="UnknownControlfieldTag">
      <controlfield xmlns="http://www.loc.gov/MARC21/slim" tag="999">92005291</controlfield>
    </warning>
  2. Test subfield codes
    Replace the 040 data field with:
    <datafield tag="040" ind1=" " ind2=" ">
      <subfield code="a">DLC</subfield>
      <subfield code="c">DLC</subfield>
      <subfield code="x">DLC</subfield>
    </datafield>
    Report after changing the subfield code d to x in data field 040:
    <error type="InvalidSubfieldCode" tagID="d0e43">
      <code>x</code>
    </error>
  3. Test positional contents
    Replace the 008 control field with:
    <controlfield tag="008">920219X1993    caua   j      000 0 eng  </controlfield>
    Report after changing the Type of date/Publication status (field 008, position 06) to X. Besides the error information, the report also lists the acceptable values for the positional field content:
    <error type="InvalidPSubfield">
      <field tag="008" start="6" end="6">
        <invalid>X</invalid>
        <content>920219X1993    caua   j      000 0 eng</content>
        <vocabulary>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="b" name="No dates given; B.C. date involved">
            <DESCRIPTION>
              Each character position in fields 008/07-10 and 008/11-14 contains a blank (#).
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="c" name="Serial item currently published">
            <DESCRIPTION>
              008/07-10 contain the beginning date of publication; 008/11-14 contain 9999.
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="d" name="Serial item ceased publication">
            <DESCRIPTION>
              008/07-10 contain the beginning date of publication; 
              008/11-14 contain the ending date of publication.
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="e" name="Detailed date">
            <DESCRIPTION>
              008/07-10 contain the year and 008/11-14 contain the month and day, 
              recorded in the pattern mmdd.
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="i" name="Inclusive dates of collection">
            <DESCRIPTION/>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="k" name="Range of years of bulk of collection">
            <DESCRIPTION/>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="m" name="Multiple dates">
            <DESCRIPTION>
              008/07-10 usually contain the beginning date and 008/11-14 the ending date.
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="n" name="Dates unknown">
            <DESCRIPTION>
              Indicates that the dates appropriate for 008/07-10 and 
              008/11-14 are unknown (e.g., when no dates are given in field 260).
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="p" 
            name="Date of distribution/release/issue and 
              production/recording session when different">
            <DESCRIPTION/>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="q" name="Questionable date">
            <DESCRIPTION>
              008/07-10 contain the earliest possible date; 
              008/11-14 contain the latest possible date.
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="r" name="Reprint/reissue date and original date">
            <DESCRIPTION>
              008/07-10 contain the date of reproduction or reissue; 
              008/11-14 contain the date of the original, if known; 
              008/11-14 contain code u ("uuuu"), if unknown.
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            code="s" name="Single known date/probable date">
            <DESCRIPTION>
              008/07-10 contain the date; 008/11-14 contain blanks (####).
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="t" name="Publication date and copyright date">
            <DESCRIPTION/>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="u" name="Serial status unknown">
            <DESCRIPTION>
              008/07-10 contain the beginning date of publication; 
              008/11-14 contain code u ("uuuu").
            </DESCRIPTION>
          </ITEM>
          <ITEM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            code="|" name="No attempt to code">
            <DESCRIPTION/>
          </ITEM>
        </vocabulary>
      </field>
    </error>
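
The first two checks above (mandatory and unknown control fields, and subfield codes) can be approximated outside XSLT as well. The Python sketch below is an illustration only: the rule tables are hypothetical stand-ins for what the generator derives from MARC21DOC.xml, and the validate function is not part of the samples.

```python
import xml.etree.ElementTree as ET

SLIM = "{http://www.loc.gov/MARC21/slim}"

# Hypothetical rule tables; the real rules come from MARC21DOC.xml.
MANDATORY_CONTROLFIELDS = {"001"}
KNOWN_CONTROLFIELDS = {"001", "003", "005", "006", "007", "008"}
SUBFIELD_CODES = {"040": set("abcde68")}


def validate(record):
    """Yield (severity, type, detail) tuples for one slim record element."""
    control_tags = [cf.get("tag") for cf in record.iter(SLIM + "controlfield")]
    for tag in sorted(MANDATORY_CONTROLFIELDS - set(control_tags)):
        yield ("error", "MandatoryControlfield", tag)
    for tag in control_tags:
        if tag not in KNOWN_CONTROLFIELDS:
            yield ("warning", "UnknownControlfieldTag", tag)
    for df in record.iter(SLIM + "datafield"):
        allowed = SUBFIELD_CODES.get(df.get("tag"))
        if allowed is None:
            continue  # the sketch has no rule for this tag
        for sf in df.iter(SLIM + "subfield"):
            if sf.get("code") not in allowed:
                yield ("error", "InvalidSubfieldCode", sf.get("code"))


# A cut-down record carrying the errors from examples 1 and 2.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="999">   92005291 </controlfield>
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">DLC</subfield>
    <subfield code="c">DLC</subfield>
    <subfield code="x">DLC</subfield>
  </datafield>
</record>"""

if __name__ == "__main__":
    for finding in validate(ET.fromstring(SAMPLE)):
        print(finding)
```

On this cut-down record the sketch reports a missing 001, an unknown control field 999 and an invalid subfield code x, mirroring the first two reports shown above.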