The ZeeRex DTD Reference Guide

22nd March 2002

Author: Rob Sanderson <azaroth@liverpool.ac.uk>
Editor: Mike Taylor <mike@miketaylor.org.uk>

[NOTE: the Reference Guide is out of date and may be deleted in the future, as the commentary provides a more digestible and up-to-date form of essentially the same information.]


ZeeRex records are contained within <explain>...</explain> tags. This document briefly discusses all of the elements and attributes that may be contained within that top-level element.

1. The authoritative attribute
2. The id attribute
3. The <serverInfo> element
        3.1. The protocol attribute
        3.2. The <host> element
        3.3. The <port> element
        3.4. The <database> element
                3.4.1. The numRecs attribute
                3.4.2. The lastUpdate attribute
        3.5. The <authentication> element
                3.5.1. The <user> element
                3.5.2. The <group> element
                3.5.3. The <password> element
4. The <databaseInfo> element
        4.1. The <title> element
                4.1.1. The primary attribute
                4.1.2. The lang attribute
        4.2. The <description> element
        4.3. The <author> element
        4.4. The <contact> element
        4.5. The <extent> element
        4.6. The <history> element
        4.7. The <langUsage> element
                4.7.1. The codes attribute
        4.8. The <restrictions> element
        4.9. The <subjects> element
                4.9.1. The <subject> element
5. The <metaInfo> element
        5.1. The <dateModified> element
        5.2. The <aggregatedFrom> element
        5.3. The <dateAggregated> element
6. The <indexInfo> element
        6.1. The <index> element
                6.1.1. The id attribute
                6.1.2. The search attribute
                6.1.3. The scan attribute
                6.1.4. The sort attribute
                6.1.5. The <title> element
                6.1.6. The <map> element
                        6.1.6.1. The primary attribute
                        6.1.6.2. The <attr> element
                                6.1.6.2.1. The attributeSet attribute
                                6.1.6.2.2. The type attribute
                        6.1.6.3. The <name> element
                                6.1.6.3.1. The attributeSet attribute
                                6.1.6.3.2. The type attribute
        6.2. The <sortKeyword> element
7. The <recordInfo> element
        7.1. The <recordSyntax> element
                7.1.1. The name attribute
                7.1.2. The <elementSet> element
                        7.1.2.1. The name attribute
                        7.1.2.2. The <title> element

1. The authoritative attribute

Used to specify whether or not this record should be treated as the canonical version of the description of this database. This may only be set to true if the author of the record created it with full knowledge of the database which it describes. If the record is later aggregated into another collection, then this attribute must be reset to false.

2. The id attribute

3. The <serverInfo> element

Contains the basic information required for any connection to the database such as protocol, host, port and database name.

3.1. The protocol attribute

Contains the name of the protocol to be used with the server described in the record. The default value is Z39.50, so if this attribute is absent the client may assume that Z39.50 is the protocol to use. The other legal values are SRW and SRU.
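
The defaulting rule can be sketched in a few lines of Python. This is only an illustration using the standard library's ElementTree; the helper name protocol_of and the host example.org are invented for the example.

```python
import xml.etree.ElementTree as ET

DEFAULT_PROTOCOL = "Z39.50"  # the default value named in this guide

def protocol_of(server_info):
    """Return the protocol attribute of <serverInfo>, or the default if absent."""
    return server_info.get("protocol", DEFAULT_PROTOCOL)

# Two hypothetical <serverInfo> fragments: one explicit, one relying on the default.
with_attr = ET.fromstring('<serverInfo protocol="SRW"><host>example.org</host></serverInfo>')
without_attr = ET.fromstring('<serverInfo><host>example.org</host></serverInfo>')

print(protocol_of(with_attr))
print(protocol_of(without_attr))
```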

3.2. The <host> element

Should contain the primary symbolic name of the server where the database is located. If the server does not have a name which will resolve via DNS, then the numeric IP address should be given.

3.3. The <port> element

Should contain the port on which to connect to the server.

3.4. The <database> element

The name of the database should be specified in this element. If the protocol is not Z39.50, then this element may be used to record the remainder of the URL to identify the server. For example, an SRW service available at http://redbelly.kb.nl/cgi-zoek/srw.pl would be specified with
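
As an illustrative sketch only, one way such a URL might be divided among <host>, <port> and <database> is shown below in Python. The particular split (host name, the default HTTP port 80 when none is given, and the URL path without its leading slash) is an assumption made for this sketch, not a rule taken from this guide, and server_info_from_url is a hypothetical helper.

```python
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET

# Assumed default ports, used only when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}

def server_info_from_url(url, protocol):
    """Build a <serverInfo> element from a service URL (assumed mapping)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    info = ET.Element("serverInfo", {"protocol": protocol})
    ET.SubElement(info, "host").text = parts.hostname
    ET.SubElement(info, "port").text = str(port)
    # Assumption: the remainder of the URL is its path, minus the leading slash.
    ET.SubElement(info, "database").text = parts.path.lstrip("/")
    return info

info = server_info_from_url("http://redbelly.kb.nl/cgi-zoek/srw.pl", "SRW")
print(ET.tostring(info, encoding="unicode"))
```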

3.4.1. The numRecs attribute

If present, should contain the number of records in the database.

3.4.2. The lastUpdate attribute

Should contain the date at which the database was last updated.

All dates in ZeeRex records should use the ISO standard date format, specified in ISO 8601. If you've got money to burn, you could buy a paper copy at approximately 1 per page; or you could download a free copy of the final draft before the published version. But everything you really need to know about ISO date format can be found in Markus Kuhn's very helpful Summary of the International Standard Date and Time Notation. (Thanks to Barbara Shuh <barbara.shuh@nlc-bnc.ca> for these references.)

(Actually, all you really need to know is contained in this one example: 1998-03-18 15:02:34)
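
A client reading such dates might parse them as in the following Python sketch. It accepts the form shown in the example above; accepting a bare date without a time is an assumption of the sketch, and parse_zeerex_date is a hypothetical helper name.

```python
from datetime import datetime

# Formats tried in order: full timestamp first, then (by assumption) a bare date.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")

def parse_zeerex_date(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' or 'YYYY-MM-DD' into a datetime."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"not a recognised date: {text!r}")

stamp = parse_zeerex_date("1998-03-18 15:02:34")
print(stamp.isoformat(sep=" "))
```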

3.5. The <authentication> element

Used to specify authentication information for the database. This may seem a stupid thing to do: you might think that a database which is prepared to make its authentication information public might just as well not require authentication at all. Nevertheless, there are databases which require authentication tokens to be sent but make those tokens public.

The <authentication> element may contain a simple ``open authentication token'' or zero or more of the following subelements:

3.5.1. The <user> element

3.5.2. The <group> element

3.5.3. The <password> element
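
As a rough illustration of the two forms of <authentication> described in section 3.5, the Python sketch below reads either the subelements or a single open token. The credentials are placeholders, and the helper name read_authentication and the "open" key are inventions of this example.

```python
import xml.etree.ElementTree as ET

def read_authentication(auth):
    """Return the subelements as a dict, or the open token under the key 'open'."""
    children = {child.tag: (child.text or "").strip() for child in auth}
    if children:
        return children
    return {"open": (auth.text or "").strip()}

# Placeholder values, not real credentials.
structured = ET.fromstring(
    "<authentication><user>guest</user><password>guest</password></authentication>"
)
open_token = ET.fromstring("<authentication>public</authentication>")

print(read_authentication(structured))
print(read_authentication(open_token))
```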

4. The <databaseInfo> element

4.1. The <title> element

Title is used in several different contexts. In the <databaseInfo> element it is the human-readable title of the database.

4.1.1. The primary attribute

Should be set to true if there is one version of the title which should be used unless the client has a reason not to. For example, the record creator might wish for the English version of the database title to be displayed unless the client requests a specific language.

4.1.2. The lang attribute

Should contain the two-letter code for the language of the element's content, as defined in RFC 1766 (H. Alvestrand. RFC 1766: Tags for the Identification of Languages. March 1995, available at ftp://ftp.uu.net/inet/rfc/rfc1766.Z).
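
A client might combine the primary and lang attributes when choosing which title to display, roughly as in this Python sketch. The order of preference (requested language, then the primary title, then the first title) is one reasonable reading rather than a rule stated in this guide; the titles and the helper name choose_title are invented.

```python
import xml.etree.ElementTree as ET

def choose_title(parent, lang=None):
    """Pick a <title>: requested language first, then primary, then the first one."""
    titles = parent.findall("title")
    if lang is not None:
        for title in titles:
            if title.get("lang") == lang:
                return title.text
    for title in titles:
        if title.get("primary") == "true":
            return title.text
    return titles[0].text if titles else None

# Hypothetical record fragment with an English primary title and a French one.
info = ET.fromstring(
    "<databaseInfo>"
    '<title lang="en" primary="true">Example Library Catalogue</title>'
    '<title lang="fr">Catalogue de la bibliotheque exemple</title>'
    "</databaseInfo>"
)

print(choose_title(info, "fr"))
print(choose_title(info, "de"))
```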

4.2. The <description> element

Should contain a description of why this database might be of interest. Anything which does not fit under the other fields in <databaseInfo> may also be put into this element.

4.3. The <author> element

Should contain the name of the person or organisation to be credited with the creation of the database.

4.4. The <contact> element

Contact, as opposed to author, should contain information on a contact person for the database. This should include at least a name and some form of address, either electronic or postal.

4.5. The <extent> element

Used to describe the completeness of the database, or the range of material that is included in it. For example, a database which contained all the emails sent to a mailing list would be considered complete. If the database held only a subset of the emails, that should be noted in this element.

4.6. The <history> element

Used to record any information which is considered useful regarding the history of the database. This might include the sponsors for its creation, or significant moments in its history.

4.7. The <langUsage> element

Used to record the languages used in the database records (as opposed to those used in the ZeeRex records). If this information is to be searchable, the codes attribute should contain the two-letter language codes, separated by spaces if there is more than one.

4.7.1. The codes attribute
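
Reading the space-separated codes is straightforward, as this Python sketch suggests. The fragment and the helper name language_codes are invented for the example.

```python
import xml.etree.ElementTree as ET

def language_codes(database_info):
    """Return the language codes listed in <langUsage codes="...">, if any."""
    usage = database_info.find("langUsage")
    if usage is None:
        return []
    # split() with no argument tolerates repeated or surrounding spaces.
    return usage.get("codes", "").split()

# Hypothetical fragment describing a database with English and French records.
db_info = ET.fromstring(
    "<databaseInfo>"
    '<langUsage codes="en fr">Records are in English and French.</langUsage>'
    "</databaseInfo>"
)

print(language_codes(db_info))
```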

4.8. The <restrictions> element

Used to record any usage or availability restrictions concerning the database or its contents. For example it might contain information regarding the copyright status of the records, or an indication that the database is only available between certain hours.

4.9. The <subjects> element

This is a wrapper element for the <subject> element, which may be used to record controlled vocabulary subjects for the database. Such subjects might be drawn from the Library of Congress Subject Headings or another appropriate thesaurus.

4.9.1. The <subject> element

5. The <metaInfo> element

The elements within <metaInfo> are metadata concerned with the record itself, rather than about the database that it describes.

5.1. The <dateModified> element

Contains the date at which the record was created or last modified. This should be updated every time the record is changed by the owner. (An aggregator changing the authoritative attribute does not constitute a change which should be recorded in this element.)

5.2. The <aggregatedFrom> element

Should contain enough information for a third party to retrieve the original, authoritative record. The contents should be in the form of a URL, using the z39.50r specification for Z39.50 servers or the appropriate form for other protocols.

For information about z39.50r URLs, see RFC 2056 (Denenberg, Kunze and Lynch. RFC 2056: Uniform Resource Locators for Z39.50. November 1996, available at lcweb.loc.gov/z3950/agency/defns/rfc2056.html).

5.3. The <dateAggregated> element

Should contain the date at which the record was aggregated from the source recorded in the above element.
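
The aggregation rules (reset authoritative to false, record the source and the aggregation date, and leave <dateModified> alone) might be applied as in the following Python sketch. The helper name aggregate, the sample record and the z39.50r URL are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from datetime import datetime

def aggregate(record, source_url, when):
    """Return a copy of an <explain> record marked as aggregated from source_url."""
    clone = copy.deepcopy(record)
    clone.set("authoritative", "false")  # an aggregated copy is never authoritative
    meta = clone.find("metaInfo")
    if meta is None:
        meta = ET.SubElement(clone, "metaInfo")
    values = {
        "aggregatedFrom": source_url,
        "dateAggregated": when.strftime("%Y-%m-%d %H:%M:%S"),
    }
    for tag, value in values.items():
        node = meta.find(tag)
        if node is None:
            node = ET.SubElement(meta, tag)
        node.text = value
    # <dateModified> is deliberately left untouched.
    return clone

record = ET.fromstring(
    '<explain authoritative="true"><metaInfo>'
    "<dateModified>2002-03-22 10:00:00</dateModified>"
    "</metaInfo></explain>"
)
# Hypothetical source location, for illustration only.
aggregated = aggregate(record, "z39.50r://example.org:210/zeerex?1", datetime(2002, 4, 1, 9, 30, 0))
print(ET.tostring(aggregated, encoding="unicode"))
```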

6. The <indexInfo> element

The <indexInfo> section is where all of the ways in which the server may be interrogated are recorded. These are recorded using <index> and <sortKeyword> elements.

6.1. The <index> element

An index is an abstract concept which represents a single type of search. For example an author keyword search is one index, whereas an exact author search is a second index. The sort, scan and search attributes on this element record which of the functions are available using it.
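
A client could summarise what each index supports along the lines of this Python sketch. It assumes that the three attributes hold the strings true and false and that a missing attribute means the function is unavailable; neither point is stated in this guide. The index shown and the helper name capabilities are invented.

```python
import xml.etree.ElementTree as ET

def capabilities(index):
    """Map each of search, scan and sort to True or False for one <index>."""
    # Assumption: only the literal string "true" enables a function.
    return {name: index.get(name) == "true" for name in ("search", "scan", "sort")}

# Hypothetical index that can be searched and scanned but not sorted.
index = ET.fromstring(
    '<index id="author-exact" search="true" scan="true" sort="false">'
    "<title>Exact Author</title>"
    "</index>"
)

print(capabilities(index))
```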

6.1.1. The id attribute

6.1.2. The search attribute

6.1.3. The scan attribute

6.1.4. The sort attribute

6.1.5. The <title> element

<title> within the <index> element should record the title of the index, for example ``Exact Author''. As with all textual fields, it may be repeated and hence has the lang and primary attributes.

6.1.6. The <map> element

Contains the protocol-specific information on how to request a search of a particular index. If the primary attribute is set to true, the client should use this map unless it has a reason to use a different map of the same index. Only one map may be set as primary. If no map of an index is marked as primary, then the client should decide for itself which map to use.
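
The selection rule might look like this in Python. Falling back to the first map when none is primary is just one possible client policy, and the sample index (including the placement of the attribute value as the text of <attr>) is an assumption made for the example; choose_map is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def choose_map(index):
    """Return the primary <map> of an index, else the first one, else None."""
    maps = index.findall("map")
    for candidate in maps:
        if candidate.get("primary") == "true":
            return candidate
    # No primary map: this sketch simply takes the first (a client policy choice).
    return maps[0] if maps else None

# Hypothetical index with a name-based map and a primary attribute-based map.
index = ET.fromstring(
    "<index>"
    "<map><name>author</name></map>"
    '<map primary="true"><attr type="1">1003</attr></map>'
    "</index>"
)

chosen = choose_map(index)
print(ET.tostring(chosen, encoding="unicode"))
```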

6.1.6.1. The primary attribute

6.1.6.2. The <attr> element

Represents a single Z39.50 attribute used to search the index. The type should be specified as an integer: this attribute must be set explicitly on all <attr> tags.

6.1.6.2.1. The attributeSet attribute

The attributeSet attribute should contain the name of the attribute set to be used for the search, as defined in the Maintenance Agency's list at http://lcweb.loc.gov/z3950/agency/defns/oids.html#3. It defaults to BIB-1 if not set explicitly.

All object names - attribute sets, record syntaxes, etc. - are treated by ZeeRex as case- and hyphen-insensitive. So, for example, the attribute set names BIB-1, bib-1, BIB1 and bib1 all refer to the BIB-1 attribute set, 1.2.840.10003.3.1.
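
The case- and hyphen-insensitive comparison, together with the BIB-1 default and the mandatory integer type, can be sketched in Python as follows. The helper names normalise_name and read_attr are hypothetical, and treating the text of <attr> as the attribute value is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

def normalise_name(name):
    """Fold case and drop hyphens so that BIB-1, bib-1, BIB1 and bib1 compare equal."""
    return name.replace("-", "").lower()

def read_attr(attr):
    """Return (normalised attribute set, integer type, value text) for one <attr>."""
    if attr.get("type") is None:
        raise ValueError("<attr> must carry an explicit type attribute")
    attribute_set = attr.get("attributeSet", "BIB-1")  # default named in this guide
    return normalise_name(attribute_set), int(attr.get("type")), (attr.text or "").strip()

names = ["BIB-1", "bib-1", "BIB1", "bib1"]
print({normalise_name(name) for name in names})
print(read_attr(ET.fromstring('<attr type="1">1003</attr>')))
```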

6.1.6.2.2. The type attribute

6.1.6.3. The <name> element

The other possibility for the contents of <map> is the <name> element. This contains a name string to use in the search. For example this might represent a complex attribute value in Z39.50, or the name of the index in SRW.

6.1.6.3.1. The attributeSet attribute

6.1.6.3.2. The type attribute

Should specify what sort of string the name represents if it is not obvious from context. In Z39.50, the name can be used to represent complex attribute values where the search field is indicated by a string rather than by an attribute list; in SRW, it can be the name of an index to search. The type says which of these it is when there is more than one way of doing things.

6.2. The <sortKeyword> element

Contains a keyword which may be given to the server as the so-called ``sort-field-designator'' part of a Sort Request to request sorting on this index - see section 3.2.7.1.3 (Sort-sequence) of the standard, and the definition of sortKey's sortfield in the ASN.1.

7. The <recordInfo> element

The <recordInfo> section contains information about the ways in which records may be retrieved from the server.

7.1. The <recordSyntax> element

7.1.1. The name attribute

Specifies a record syntax supported by the database, by means of its standard name as specified in the Maintenance Agency's list at http://lcweb.loc.gov/z3950/agency/defns/oids.html#5.

7.1.2. The <elementSet> element

Each <recordSyntax> element may contain any number of <elementSet> tags indicating that records may be fetched using the specified element set in that record syntax; the name of each is recorded in its name attribute. For example, b and f are the standard element set names.

7.1.2.1. The name attribute

7.1.2.2. The <title> element

<title> inside <elementSet> specifies a title to be presented to the user for describing that element set. For example, the title corresponding to the element set name f might be ``full record''.
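
To show how the pieces of <recordInfo> fit together, the Python sketch below collects the element sets offered for each record syntax. The fragment is hypothetical: the choice of SUTRS as the record syntax and the title ``brief record'' are made up for the example, as is the helper name element_sets.

```python
import xml.etree.ElementTree as ET

def element_sets(record_info):
    """Map each record syntax name to a dict of element set name -> title."""
    result = {}
    for syntax in record_info.findall("recordSyntax"):
        result[syntax.get("name")] = {
            element_set.get("name"): element_set.findtext("title")
            for element_set in syntax.findall("elementSet")
        }
    return result

# Hypothetical <recordInfo> offering brief and full records in one syntax.
record_info = ET.fromstring("""
<recordInfo>
  <recordSyntax name="SUTRS">
    <elementSet name="b"><title>brief record</title></elementSet>
    <elementSet name="f"><title>full record</title></elementSet>
  </recordSyntax>
</recordInfo>
""")

print(element_sets(record_info))
```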

Feedback to <mike@indexdata.com> is welcome!