Author: Rob Sanderson <azaroth@liverpool.ac.uk>
1. Structure
2. serverInfo
3. databaseInfo
4. metaInfo
5. indexInfo
5.1. index
5.2. indexInfo for Z39.50
5.3. indexInfo for SRW
6. recordInfo
7. schemaInfo
8. configInfo
The ZeeRex DTD has a very simple structure. The main tag, explain, has seven sections within it, only the first of which is required. The sections are serverInfo, databaseInfo, metaInfo, indexInfo, recordInfo, schemaInfo and configInfo.
The following describes each of the above divisions in turn, but first there are two attributes on the explain tag that bear discussion.
The authoritative attribute contains either true or false, defaulting to false if not present, and describes whether or not the record should be treated as the final authority on the described database. This may only be set to true if the following conditions are met:
The second attribute, id, allows a unique identifier to be assigned to the record. This identifier may then be used for retrieval purposes. It may be changed when the record is aggregated from one server to another.
The serverInfo element contains the information necessary to start a connection to the described server. It has four attributes (protocol, version, transport and wsdl) and four possible subelements: host, port, database and authentication.
The protocol attribute on serverInfo records the protocol that should be used to connect to the server. The default value is "Z39.50", but "SRW", "SRU" and "SRW/U" are also possible values; "SRW/U" means that both versions of the web service are available at the same URL endpoint. The version attribute may be used to further specify the highest version of the protocol supported. The transport attribute records the protocol used to transport SRW or SRU messages. Normally this will be http, the default, but https or another protocol may be given instead. Finally, the wsdl attribute can be included to give the URL of a WSDL file which describes the server.
The host element contains the address of the server described by the record. This should be a hostname which resolves to the correct IP address; the numeric IP address should be given only if the server does not have a symbolic name.
The port on the server to connect to should be given in the port element, in numeric form.
The database element should contain the name of the Z39.50 database which the record describes. For other protocols such as SRW and SRU, this element may be used to contain the remainder of the URL to the server, without any preceding '/' character.
The database element may also have two attributes: numRecs, giving the number of records in the database, and lastUpdate, giving the time at which the database was last updated. As with all dates in the ZeeRex record, this should be given in ISO 8601 format (YYYY-MM-DD hh:mm:ss).
The last element in serverInfo is the optional authentication element. If this element is present but empty, authentication is required to connect to the server, but there is no publicly available login. If it contains a string, then this is the token to give in order to authenticate. Otherwise it may contain three elements:
An example serverInfo section might be:
<serverInfo protocol="Z39.50">
<host>gondolin.hist.liv.ac.uk</host>
<port>210</port>
<database>IR-Explain---1</database>
<authentication>
<user>azaroth</user>
<password>squirrelfish</password>
</authentication>
</serverInfo>
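As a sketch of how a client might read this section, the Python below parses a serverInfo element with the standard library's xml.etree.ElementTree, applies the defaults described above (protocol "Z39.50", transport http) and sorts the authentication element into its cases. The function names and the returned dictionary layout are hypothetical, the sketch assumes unqualified element names as in this document's examples, and building an SRW/SRU URL from transport, host, port and database is one reading of the database element's description rather than a rule from the DTD.

```python
import xml.etree.ElementTree as ET

SERVER_INFO = """
<serverInfo protocol="Z39.50">
  <host>gondolin.hist.liv.ac.uk</host>
  <port>210</port>
  <database>IR-Explain---1</database>
  <authentication>
    <user>azaroth</user>
    <password>squirrelfish</password>
  </authentication>
</serverInfo>
"""


def read_authentication(elem):
    """Classify an <authentication> element (hypothetical helper)."""
    if elem is None:
        return {"required": False}  # element absent
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not text:
        # Present but empty: a login is needed, but none is publicly available.
        return {"required": True, "public": False}
    if not children:
        # A bare string is the token to send.
        return {"required": True, "token": text}
    # Otherwise keep each sub-element (e.g. user, password) by tag name.
    return {"required": True,
            "credentials": {c.tag: (c.text or "").strip() for c in children}}


def read_server_info(xml_text):
    """Collect connection details from serverInfo, applying the defaults."""
    root = ET.fromstring(xml_text)
    port_text = (root.findtext("port") or "").strip()
    info = {
        "protocol": root.get("protocol", "Z39.50"),
        "transport": root.get("transport", "http"),
        "host": (root.findtext("host") or "").strip(),
        "port": int(port_text) if port_text else None,
        "database": (root.findtext("database") or "").strip(),
        "authentication": read_authentication(root.find("authentication")),
    }
    if info["protocol"] != "Z39.50":
        # For SRW/SRU, database holds the rest of the URL without a leading '/'.
        netloc = info["host"]
        if info["port"] is not None:
            netloc += ":%d" % info["port"]
        info["url"] = "%s://%s/%s" % (info["transport"], netloc, info["database"])
    return info
```

Calling read_server_info(SERVER_INFO) yields host gondolin.hist.liv.ac.uk, port 210 and database IR-Explain---1, with the user and password collected under credentials. A fuller client would also consult the version and wsdl attributes.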
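The databaseInfo section contains the full text descriptions of various aspects of the database. All of the elements in this section are repeatable and may have the lang attribute, giving the language of the text, and the primary attribute, marking the element to prefer when there are several.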
The title element in this context represents the title of the database.
The description element should explain why this database might be of interest. Anything which does not fit under the other fields in this section may also be recorded here.
The author element should contain the name of the person or organisation to be credited with the creation of the database. The contact element, on the other hand, is used to record information on a contact person for the database. This should include at least a name and some form of address, either electronic or postal.
extent is used to describe the completeness of the database, or the range of material that is included in it. For example a database which contained all the emails sent to a mailing list would be considered complete. If this database only maintained a smaller subset of the emails, then it should be noted in this element.
Any information which is considered useful regarding the history of the database may be recorded in the history element. This might include the sponsors for its creation, or significant moments in its history.
The langUsage element is used to record the languages used in the database records (as opposed to the ZeeRex record). To make this searchable, the codes attribute should contain the two-letter language codes, separated by spaces.
If there are any restrictions on the usage or availability of the database or its contents then these should be recorded in the restrictions element. For example it might contain information regarding the copyright status of the records, or an indication that the database is only available between certain hours.
If the database concerns particular subjects from a controlled vocabulary then these may be recorded using subject elements within the subjects wrapper. These subjects might be drawn from the Library of Congress Subject Headings or another appropriate thesaurus.
The implementation element contains information concerning the underlying software. It has version and identifier attributes which may be used to identify particular releases. It may contain one or more title elements containing a human readable title to describe the server.
Finally, one or more links to other resources can be recorded, each in a link element within a links wrapper. The type of link is given in the type attribute, for which several values have been defined though more would be welcomed.
The list of types below may be extended at any time; additions are not limited to new versions. Types will be removed only at version boundaries. The same applies to the configInfo types further on.
An example databaseInfo section might look something like:
This section is quite short, containing a maximum of three elements. These elements describe the essential pieces of information concerning the ZeeRex record itself, as opposed to the database.
The dateModified element contains the date at which the record was created or last modified. This should be updated every time the record is changed by the owner. (An aggregator changing the authoritative attribute does not constitute a change which should be recorded in this element.)
The aggregatedFrom and dateAggregated elements should be present if the record has been harvested from another source. The latter, as the name implies, should contain the date on which the aggregation last took place. The aggregatedFrom element should contain enough information for a third party to retrieve the original, authoritative record. The contents should be in the form of a URL, using the z39.50r specification for Z39.50 servers or the appropriate form for other protocols.
An example:
This element is where the features of the database are recorded. In order not to repeat the confusion of 'term lists', interacting with a database is done via 'indexes' which are represented by attribute combinations for Z39.50. The indexInfo element may contain three different elements, set, index and sortKeyword, repeated as many times as needed.
An index represents a single type of search, scan or non-keyword sort that can be performed. This allows for different searches to be given specific titles, using the title element, and for multiple maps to be assigned to a single type of search if there is more than one way to do the same search.
The element has four attributes, the first three being search, scan and sort. These are true/false flags which record whether the corresponding request is allowed on the index described. If a flag is not present, the implication is that the creator of the record did not know whether the function was available, as might be the case for a remotely discovered server. The last attribute, id, is similar in function to the attribute of the same name on the top-level explain tag, but applies to the index.
Within the index element may appear one or more titles, which should be used when presenting the index as an option to a user. The protocol-level information occurs within one or more map elements. If more than one is given, they are to be considered alternative ways of accessing exactly the same information. For example, in Z39.50 one might wish to have both BIB1 and BIB2 attribute combinations available, for clients which support one or the other. The map element may also have the primary attribute, with the same semantics as elsewhere: the primary mapping should be used unless the client has a particular reason not to.
Each index can also have its own configInfo section, as described below. In this case, the information applies only to this index. This would be used, for example, to say that a particular index supports something which the others do not.
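The index semantics above can be sketched in Python: each of search, scan and sort is read as a three-valued flag (true, false, or unknown when absent), and the map marked primary="true" is chosen, falling back to the first map. The helper names are hypothetical, and the fallback to the first map when none is marked primary is an assumption, not something the DTD specifies.

```python
import xml.etree.ElementTree as ET

INDEX = """
<index search="true" scan="true" sort="true" id="ghlau-mail-1">
  <title lang="en" primary="true">Author (keyword)</title>
  <map primary="true">
    <attr type="1">1003</attr>
  </map>
  <map>
    <attr type="1" set="xd">3</attr>
    <attr type="2" set="bib2">3</attr>
    <attr type="12" set="bib2">aut</attr>
  </map>
</index>
"""


def flag(elem, name):
    """True or False if the attribute is present; None means 'unknown'."""
    value = elem.get(name)
    if value is None:
        return None
    return value.strip().lower() == "true"


def preferred_map(index):
    """Return the map marked primary="true", else the first map (assumption)."""
    maps = index.findall("map")
    for m in maps:
        if flag(m, "primary"):
            return m
    return maps[0] if maps else None


def describe_index(xml_text):
    """Summarise one index element (hypothetical layout)."""
    index = ET.fromstring(xml_text)
    return {
        "id": index.get("id"),
        "titles": [t.text for t in index.findall("title")],
        "search": flag(index, "search"),
        "scan": flag(index, "scan"),
        "sort": flag(index, "sort"),
        "map": preferred_map(index),
    }
```

Keeping None distinct from False lets a client tell "the server said no" apart from "the record's author did not know", which matters for remotely discovered servers.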
It is worth describing the rest of the section separately for both Z39.50 and SRW.
One or more attr elements may occur within each map.
This would represent BIB1's USE attribute 1003, i.e. Personal Author.
This might then be represented in BIB2 as:
indexInfo may also contain the set element. This may contain zero or more title elements, and has two required attributes, name and identifier. For Z39.50, this can be used to declare a short name for an attribute set to be then later used in the set attributes of attr as described above rather than the full OID every time. For example:
One may also record that the server will accept specific keywords on which to sort result sets. This is the 'sort-field-designator' part of a Sort Request, as opposed to a set of attributes which may be described in an index with the sort attribute set to true. Each sortKeyword element should contain a single keyword which the server will accept to sort upon.
SRW has two essential components for searching, context sets and indexes. Each searchable access point has one or more names associated with a set. As more than one name can be associated with the same set, the sets are declared at the beginning of the section.
The set element may contain zero or more title elements, and declares the short form used in CQL in the name attribute and the identifying URI in the identifier attribute.
The index element is used in much the same way as for Z39.50, except that the sort attribute is not used, as sorting in SRW is done via XPath. Each name and set combination is listed in a name element within a map element. The name element has a set attribute, which contains the short name of the context set. Each index can have more than one mapping, if there are multiple ways to reach it.
Like Z39.50, an index may have one or more titles, distinguished by language.
This section concerns how records may be retrieved from a Z39.50 database. It consists of a list of one or more recordSyntax elements, which may contain any number of elementSet elements.
Each recordSyntax has an identifier attribute which should contain the OID for the record syntax as defined at http://lcweb.loc.gov/z3950/agency/defns/oids.html#5.
Inside a recordSyntax element may appear any number of elementSet elements, each representing a particular element set that the server supports for that record syntax. The name to use for the element set is recorded in the name attribute.
Titles may be given to element sets by using the title element within the elementSet element. As for all text intended for users, this may be repeatable and has the lang and primary attributes.
An example recordInfo section:
This element records the XML schemas in use by an SRW server, both for sorting and retrieval.
Each schema is recorded in a schema element. Its attributes record how the schema can be used and how it is identified. If the sort attribute is true, then it can be used for sorting; likewise, the retrieve attribute governs whether it can be used for retrieval. Like the set element, it also has an identifier attribute which contains an identifying URI. The final attribute is location, a URL to a copy of the schema itself, for validation and information purposes.
Inside the schema element may be any number of language differentiated title tags.
This section contains configuration information about how the server is set up. It has three possible tags within it, each being repeatable as many times as required. Each has a type attribute to say what sort of configuration option it is.
The default tag contains a default value set in the server that may be overridden by a specific request. SRW, for example, has several such defaults, including the index, the context set for indexes and the record schema.
If the configuration option is not something that can be changed, then the setting tag is used. An example from SRW or Z39.50 is the maximum number of records that can be retrieved at once.
Finally if the information is just that the server supports a particular feature of the protocol, then the supports element is used. Some examples of this are proximity, sort requests, or record element selection.
Type     Description
www      URL to a native web interface to the database
z39.50   URL to a Z39.50 interface
srw      URL to an SRW interface
sru      URL to an SRU interface
oai      URL to an OAI interface
rss      URL to an RSS news feed for the database
icon     URL to a graphical icon for the server
<databaseInfo>
<title lang="en" primary="true">The Science Fiction Foundation Collection</title>
<description lang="en" primary="true">
A database containing bibliographic records describing the books
and articles in the Science Fiction Foundation's collection held
at the University of Liverpool.
</description>
<author> Andy Sawyer </author>
<contact> Rob Sanderson, azaroth@liv.ac.uk</contact>
<langUsage codes="en fr ru">
The records are in English, French and Russian.
</langUsage>
...
</databaseInfo>
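Because every databaseInfo element is repeatable and may carry the lang and primary attributes seen in the example, a client has to choose which copy to display. The Python below is one possible policy, sketched under assumptions: narrow to the requested language if any copy is in it, then prefer primary="true", then take the first. It also splits the langUsage codes attribute. Function names are hypothetical, and this ordering is not mandated by ZeeRex.

```python
import xml.etree.ElementTree as ET

DATABASE_INFO = """
<databaseInfo>
  <title lang="en" primary="true">The Science Fiction Foundation Collection</title>
  <author> Andy Sawyer </author>
  <langUsage codes="en fr ru">
    The records are in English, French and Russian.
  </langUsage>
</databaseInfo>
"""


def pick_text(parent, tag, lang=None):
    """Choose one of several repeated elements: lang match, then primary, then first."""
    candidates = parent.findall(tag)
    if lang is not None:
        # Narrow to the requested language only if some element is in it.
        candidates = [c for c in candidates if c.get("lang") == lang] or candidates
    for c in candidates:
        if c.get("primary") == "true":
            return (c.text or "").strip()
    return (candidates[0].text or "").strip() if candidates else None


def language_codes(database_info):
    """Return the space-separated codes from langUsage, or an empty list."""
    lang_usage = database_info.find("langUsage")
    if lang_usage is None:
        return []
    return lang_usage.get("codes", "").split()
```

For the example record, the title lookup returns "The Science Fiction Foundation Collection" and language_codes returns ["en", "fr", "ru"].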
4. metaInfo
For information about z39.50r URLs, see RFC 2056 (Denenberg, Kunze and Lynch, "Uniform Resource Locators for Z39.50", November 1996), available at http://lcweb.loc.gov/z3950/agency/defns/rfc2056.html.
<metaInfo>
<dateModified>2002-03-29 19:00:00</dateModified>
<aggregatedFrom> z39.50r://gondolin.hist.liv.ac.uk:210/IR-Explain---1?id=ghlau-1;esn=F;rs=XML </aggregatedFrom>
<dateAggregated>2002-03-30 06:30:00</dateAggregated>
</metaInfo>
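A harvester could read this section as sketched below: the dates are parsed in the "YYYY-MM-DD hh:mm:ss" form used by the examples, and aggregatedFrom is split with urllib.parse.urlsplit. This is a simplified sketch, not an RFC 2056 implementation: it assumes the query is a list of ';'-separated key=value pairs, as in this example, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from urllib.parse import urlsplit

META_INFO = """
<metaInfo>
  <dateModified>2002-03-29 19:00:00</dateModified>
  <aggregatedFrom> z39.50r://gondolin.hist.liv.ac.uk:210/IR-Explain---1?id=ghlau-1;esn=F;rs=XML </aggregatedFrom>
  <dateAggregated>2002-03-30 06:30:00</dateAggregated>
</metaInfo>
"""

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # the date form used in the examples


def parse_date(text):
    """Parse a ZeeRex date string; None if the element was absent."""
    return datetime.strptime(text.strip(), DATE_FORMAT) if text else None


def parse_z3950r(url):
    """Split a z39.50r URL like the one above (simplified, not full RFC 2056)."""
    parts = urlsplit(url.strip())
    # Assumption: the query is ';'-separated key=value pairs, as in the example.
    params = dict(p.split("=", 1) for p in parts.query.split(";") if "=" in p)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "params": params,
    }


def read_meta_info(xml_text):
    """Read dateModified and, for harvested records, the aggregation details."""
    root = ET.fromstring(xml_text)
    info = {"dateModified": parse_date(root.findtext("dateModified"))}
    source = root.findtext("aggregatedFrom")
    if source:
        info["aggregatedFrom"] = parse_z3950r(source)
        info["dateAggregated"] = parse_date(root.findtext("dateAggregated"))
    return info
```

For the example above, the original record would be fetched from gondolin.hist.liv.ac.uk port 210, database IR-Explain---1, with esn F and record syntax XML.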
5. indexInfo
5.1. index
5.2. indexInfo for Z39.50
The attr element is used to record a single attribute. Its type is given in numeric form in the type attribute of the element, and its value is given as the element's content. If the attribute set is not BIB1, then it should be given in the set attribute. An example or two is in order.
<map>
<attr type="1">1003</attr>
</map>
A single index might contain both of these mappings to specify that either may be used.
<map>
<attr type="1" set="1.2.840.10003.3.12">3</attr>
<attr type="2" set="1.2.840.10003.3.18">3</attr>
<attr type="12" set="1.2.840.10003.3.18">aut</attr>
</map>
<set name="xd" identifier="1.2.840.10003.3.12"/>
<set name="bib2" identifier="1.2.840.10003.3.18"/>
...
<map>
<attr type="1" set="xd">3</attr>
<attr type="2" set="bib2">3</attr>
<attr type="12" set="bib2">aut</attr>
</map>
<indexInfo>
<set name="xd" identifier="1.2.840.10003.3.12"/>
<set name="bib2" identifier="1.2.840.10003.3.18"/>
<index search="true" scan="true" sort="true" id="ghlau-mail-1">
<title lang="en" primary="true">Author (keyword)</title>
<map primary="true">
<attr type="1">1003</attr>
</map>
<map>
<attr type="1" set="xd">3</attr>
<attr type="2" set="bib2">3</attr>
<attr type="12" set="bib2">aut</attr>
</map>
</index>
<sortKeyword> private </sortKeyword>
</indexInfo>
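To turn such a record into attribute lists for a Z39.50 search, a client must expand the short set names declared in set elements and supply BIB1 when an attr has no set attribute. The sketch below does this with ElementTree, assuming that an unrecognised set value is already a full OID; BIB1_OID is the registered OID of the BIB-1 attribute set, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

BIB1_OID = "1.2.840.10003.3.1"  # registered OID of the BIB-1 attribute set

INDEX_INFO = """
<indexInfo>
  <set name="xd" identifier="1.2.840.10003.3.12"/>
  <set name="bib2" identifier="1.2.840.10003.3.18"/>
  <index search="true" scan="true" sort="true" id="ghlau-mail-1">
    <title lang="en" primary="true">Author (keyword)</title>
    <map primary="true">
      <attr type="1">1003</attr>
    </map>
    <map>
      <attr type="1" set="xd">3</attr>
      <attr type="2" set="bib2">3</attr>
      <attr type="12" set="bib2">aut</attr>
    </map>
  </index>
  <sortKeyword> private </sortKeyword>
</indexInfo>
"""


def resolve_maps(xml_text):
    """Map each index id to its maps, each a list of (set OID, type, value)."""
    root = ET.fromstring(xml_text)
    short_names = {s.get("name"): s.get("identifier") for s in root.findall("set")}
    result = {}
    for index in root.findall("index"):
        maps = result.setdefault(index.get("id"), [])
        for m in index.findall("map"):
            attrs = []
            for a in m.findall("attr"):
                set_ref = a.get("set")
                # No set attribute means BIB-1; otherwise expand a declared
                # short name, or keep the value if it is already a full OID.
                oid = BIB1_OID if set_ref is None else short_names.get(set_ref, set_ref)
                attrs.append((oid, int(a.get("type")), (a.text or "").strip()))
            maps.append(attrs)
    return result


def sort_keywords(xml_text):
    """Return the keywords listed in sortKeyword elements."""
    root = ET.fromstring(xml_text)
    return [(k.text or "").strip() for k in root.findall("sortKeyword")]
```

Both maps of ghlau-mail-1 come out as fully qualified attribute lists, so a client that supports only BIB1, or only the BIB2 combination, can pick the one it understands.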
5.3. indexInfo for SRW
<indexInfo>
<set identifier="http://www.loc.gov/zing/cql/dc-indexes/v1.0/" name="dc"/>
<set identifier="http://www.loc.gov/zing/cql/bath-indexes/v1.0/" name="bath"/>
<index>
<title lang="en">Book Title</title>
<map><name set="dc">title</name></map>
<map><name set="bath">title</name></map>
</index>
</indexInfo>
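For SRW, each name/set pair corresponds to a CQL index written as prefix.name, for example dc.title. The sketch below lists those names per index and rejects a set attribute that was not declared in a set element. The function name is hypothetical, and raising an error is just one choice for handling an undeclared context set.

```python
import xml.etree.ElementTree as ET

INDEX_INFO = """
<indexInfo>
  <set identifier="http://www.loc.gov/zing/cql/dc-indexes/v1.0/" name="dc"/>
  <set identifier="http://www.loc.gov/zing/cql/bath-indexes/v1.0/" name="bath"/>
  <index>
    <title lang="en">Book Title</title>
    <map><name set="dc">title</name></map>
    <map><name set="bath">title</name></map>
  </index>
</indexInfo>
"""


def cql_index_names(xml_text):
    """List (title, 'prefix.name') pairs, checking each prefix was declared."""
    root = ET.fromstring(xml_text)
    declared = {s.get("name"): s.get("identifier") for s in root.findall("set")}
    names = []
    for index in root.findall("index"):
        title = index.findtext("title")
        for name in index.findall("map/name"):
            prefix = name.get("set")
            if prefix not in declared:
                raise ValueError("undeclared context set: %r" % prefix)
            names.append((title, "%s.%s" % (prefix, name.text.strip())))
    return names
```

For the example, the "Book Title" index can be queried in CQL as either dc.title or bath.title.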
6. recordInfo
<recordInfo>
<recordSyntax identifier="1.2.840.10003.5.109.10">
<elementSet name="F">
<title lang="en" primary="true">Full XML Record</title>
</elementSet>
</recordSyntax>
</recordInfo>
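A client can flatten recordInfo into the (record syntax OID, element set name) combinations it may request. The sketch below does so and keeps the element set title for display. The function name is hypothetical, and reporting a recordSyntax with no elementSet as a pair with None is an assumption about how to represent "no element set advertised".

```python
import xml.etree.ElementTree as ET

RECORD_INFO = """
<recordInfo>
  <recordSyntax identifier="1.2.840.10003.5.109.10">
    <elementSet name="F">
      <title lang="en" primary="true">Full XML Record</title>
    </elementSet>
  </recordSyntax>
</recordInfo>
"""


def retrieval_options(xml_text):
    """List (syntax OID, element set name, title) for each advertised combination."""
    root = ET.fromstring(xml_text)
    options = []
    for syntax in root.findall("recordSyntax"):
        oid = syntax.get("identifier")
        element_sets = syntax.findall("elementSet")
        if not element_sets:
            # Assumption: syntax advertised without any element set.
            options.append((oid, None, None))
        for es in element_sets:
            options.append((oid, es.get("name"), es.findtext("title")))
    return options
```

Here the only option is element set F in the XML record syntax, titled "Full XML Record".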
7. schemaInfo
<schemaInfo>
<schema identifier="http://www.loc.gov/zing/srw/dcschema/v1.0/"
location="http://www.loc.gov/zing/srw/dc.xsd"
sort="false" retrieve="true">
<title lang="en">Dublin Core</title>
</schema>
</schemaInfo>
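The sort and retrieve flags can be used to select schemas for a given purpose, as in the sketch below. The function name is hypothetical, and treating a missing flag as "not usable" is a conservative assumption; the text does not say how an absent flag on schema should be read.

```python
import xml.etree.ElementTree as ET

SCHEMA_INFO = """
<schemaInfo>
  <schema identifier="http://www.loc.gov/zing/srw/dcschema/v1.0/"
          location="http://www.loc.gov/zing/srw/dc.xsd"
          sort="false" retrieve="true">
    <title lang="en">Dublin Core</title>
  </schema>
</schemaInfo>
"""


def schemas_for(xml_text, purpose):
    """Return identifiers of schemas whose 'retrieve' or 'sort' flag is true."""
    if purpose not in ("retrieve", "sort"):
        raise ValueError("purpose must be 'retrieve' or 'sort'")
    root = ET.fromstring(xml_text)
    return [
        s.get("identifier")
        for s in root.findall("schema")
        # Assumption: an absent flag is treated as false.
        if s.get(purpose, "").strip().lower() == "true"
    ]
```

For the example, the Dublin Core schema is offered for retrieval but not for sorting.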
8. configInfo
<configInfo>
<default type="numberOfRecords">1</default>
<setting type="maximumRecords">10</setting>
<supports type="proximity"/>
<supports type="relationModifier">stem</supports>
</configInfo>
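The three configInfo element kinds can be gathered into separate tables keyed by their type attribute, as sketched below. Values are kept in lists because every element is repeatable. The records_to_request helper is a hypothetical illustration of honouring a maximumRecords setting; the function names and dictionary layout are assumptions.

```python
import xml.etree.ElementTree as ET

CONFIG_INFO = """
<configInfo>
  <default type="numberOfRecords">1</default>
  <setting type="maximumRecords">10</setting>
  <supports type="proximity"/>
  <supports type="relationModifier">stem</supports>
</configInfo>
"""


def read_config(xml_text):
    """Group default/setting/supports values by their type attribute."""
    root = ET.fromstring(xml_text)
    config = {"default": {}, "setting": {}, "supports": {}}
    for elem in root:
        if elem.tag not in config:
            continue  # ignore anything else
        # An empty <supports/> just flags the feature, so its value is "".
        value = (elem.text or "").strip()
        config[elem.tag].setdefault(elem.get("type"), []).append(value)
    return config


def records_to_request(config, wanted):
    """Hypothetical helper: cap a request at maximumRecords if it is declared."""
    limit = config["setting"].get("maximumRecords")
    return min(wanted, int(limit[0])) if limit else wanted
```

With the example configuration, a request for 50 records would be reduced to 10, and the server is recorded as supporting proximity and the stem relation modifier.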