An Overview of ZeeRex

28th August 2002

1. Introduction
        1.1. What's in this document?
        1.2. A note on the name ``ZeeRex''
2. Historical context
        2.1. Friends and Neighbours
        2.2. ``ZeeRex Proper''
                2.2.1. Explain Classic
                2.2.2. Explain Lite
3. ZeeRex records
        3.1. Friends and Neighbours records
        3.2. Full ZeeRex records
4. ZeeRex databases
        4.1. Searching ZeeRex databases
        4.2. Aggregation databases
5. Support for other protocols
        5.1. Representing non-Z39.50 databases
        5.2. Non-Z39.50 interfaces to ZeeRex
6. Summary

1. Introduction

The ZeeRex specifications at explain.z3950.org attempt to solve two separate but related problems in the Z39.50 world. The first is that of finding Z39.50 resources - servers and their databases - for a client to use. The second is that once a resource is known, it's necessary to figure out exactly what it's capable of doing. Our solution to the first problem is informally known as ``Friends and Neighbours'', or ``F&N'' for short; our solution to the second is ``ZeeRex Proper''.

1.1. What's in this document?

ZeeRex is not the first attempt at solving these problems. In section 2, we'll briefly consider the history of each problem, and why we think ZeeRex provides the necessary silver bullets.

Fundamentally, the ZeeRex facility is based on exchanging records which describe databases. Section 3 describes the structure of these records - both the brief records required for F&N functionality and the detailed records that can be used to describe a database more fully.

ZeeRex records are found in ZeeRex databases (where else?) which are generally accessed via Z39.50. Section 4 discusses those databases, including the mechanisms prescribed for searching them and ways of creating aggregated databases.

ZeeRex is unashamedly a Z39.50-o-centric specification, aimed at solving a Z39.50-specific problem, rather than attempting to address the broader problem of general resource discovery. However, it does contain some facilities for locating databases accessible by related protocols - particularly the conceptually similar Information Retrieval protocols SRW and SRU. Similarly, while ZeeRex records are generally found and transmitted via Z39.50, space is left for other protocols to be used for transport where appropriate. These elements of support for protocols other than Z39.50 are described in section 5.

Finally, section 6 summarises the reasons why ZeeRex will succeed where previous initiatives have fallen short.

1.2. A note on the name ``ZeeRex''

For the first two months of its life, the specification now known as ZeeRex went by the name of ``Explain--''. This name, in the spirit of ``C++'', was intended to convey the sense that this is Explain Classic with something taken away - favouring simplicity over expressive power where these two qualities are in conflict. While there was some support for this name, it was not universally loved. Aside from the fact that it's uncomfortably easy to read the ``--'' part of the name as punctuation, there are various mechanical difficulties: for example, the name may not appear in an XML comment, as the ``--'' string ends the comment. Worse (for me, anyway), the name can confuse emacs's HTML mode.

So on Sunday April 28th, the name was changed to ZeeRex, which rhymes with T. rex and stands for Z39.50 Explain, Explained and Re-Engineered in XML.

The working group spent a lot of time on the issue of the name. Let posterity note some of the other names that we considered, very roughly in the order they were suggested:

Explain NG (``Explain: the Next Generation'')
eXplainML (``Explain'' meets ``XML'')
XplainML (same thing)
Plain-X (another ``Explain'' pun)
Splain (a colloquial contraction of ``Explain'')
Explain<<<< (``Explain left-shift left-shift'')
Explain&lt;&lt;&lt;&lt; (an XML joke)
Explainable Explain
TESYWBA2I2YM (``The Explain Service You Wouldn't Be Ashamed To Introduce To Your Mother'')
WIZARD (``Web/Internet/Z39.50 Application Resource Discovery'')
E4M (``Explain For Mortals'')
E=mc2 (``Explain Made Completely Comprehensible'')
XYZ (eXplain Your Z39.50 databases)
Explain Revisited
Explain Redux
ExplainRZ (I never understood that one)
Explain Rite (a pun on ``Explain Lite'' q.v.)
HenDriX (Half-Decent eXplain)

2. Historical context

2.1. Friends and Neighbours

The problem of finding Z39.50 servers and databases has traditionally been addressed simply by having poor, overworked humans build lists of servers: for example, one such list is published by UKOLN at www.ukoln.ac.uk/distributed-systems/zdir and another by Index Data at www.indexdata.dk/targettest.

The difficulties with this approach are twofold:

First, it's difficult to achieve good coverage of all available servers. No-one knows how many Z39.50 servers there really are out there, but it's a good bet that neither the UKOLN nor the Index Data list contains more than a fraction of the totality.
Second, maintaining such lists is awkward, error-prone, and - worst of all - dull. Which means that it doesn't get done. It's possible to automate the process of pruning dead servers from a list, but not that of adding new servers, since there is no way to find out about them.

At the June 2000 ZIG meeting in San Antonio, Texas, a distributed directory of Z39.50 Servers was proposed: see http://lcweb.loc.gov/z3950/agency/zig/meetings/texas/zweb-report.html#directory for notes on this discussion. This directory became known as the ``Friends and Neighbours'' facility. As originally conceived, servers would unilaterally include in their Init Responses an otherInfo structure offering the hostnames and port numbers of other Z39.50 servers known to it - its friends and neighbours, in fact. Clients which didn't recognise the otherInfo's OID would simply ignore it; but those which understood it would then be able automatically to add the F&N servers to their lists. (For more details, see http://lists.w3.org/Archives/Public/www-zig/2000Jun/0002.html)

That effort did not take wing; but it has borne progeny in the form of ZeeRex's F&N facility. This works by allowing clients to search for and retrieve very brief XML records describing other Z39.50 databases.

It's possible to construct an F&N ``crawler'', similar in principle to the web crawlers that build web indexes such as www.google.com. Such a crawler would maintain a database of connection information about Z39.50 databases, visiting each database recommended by a friend or neighbour of a friend or neighbour of a friend or neighbour of ... Well, you get the idea.

Such a database could be automatically maintained, not only to remove the entries for servers which are no longer maintained, but also to add new entries for servers which become available and which are listed as friends or neighbours of any already-known database. This distributed arrangement removes the requirement for anyone to maintain a single monolithic list.
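The crawl described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the specification: the fetch_neighbours callback is hypothetical and stands in for whatever Z39.50 client code actually searches a server's ZeeRex database, and the hostnames in the demonstration are invented.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple

# (host, port, database): the connection details carried by an F&N record.
Target = Tuple[str, int, str]


def crawl(seeds: Iterable[Target],
          fetch_neighbours: Callable[[Target], Iterable[Target]]) -> Set[Target]:
    """Breadth-first walk over the friends-and-neighbours graph.

    fetch_neighbours is a hypothetical callback: given one database, it
    returns the databases recommended by that server, and raises OSError
    if the server cannot be reached.
    """
    known: Set[Target] = set()
    queue = deque(seeds)
    while queue:
        target = queue.popleft()
        if target in known:
            continue  # already visited; this also stops cycles
        known.add(target)
        try:
            neighbours = list(fetch_neighbours(target))
        except OSError:
            continue  # unreachable; a real crawler might prune it instead
        queue.extend(n for n in neighbours if n not in known)
    return known


# Demonstration with an in-memory graph of invented servers.
GRAPH = {
    ("a.example.org", 210, "books"): [("b.example.org", 210, "maps")],
    ("b.example.org", 210, "maps"): [("a.example.org", 210, "books"),
                                     ("c.example.org", 2100, "art")],
    ("c.example.org", 2100, "art"): [],
}

found = crawl([("a.example.org", 210, "books")], lambda t: GRAPH.get(t, []))
print(sorted(found))
```

Starting from a single seed, the walk reaches all three databases even though the first two recommend each other, because each target is visited at most once.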

2.2. ``ZeeRex Proper''

The more complex problem of discovering detailed information about a Z39.50 database has a longer and more involved history. ZeeRex is the third major attempt to solve this problem.

2.2.1. Explain Classic

The first approach was the Explain facility enshrined in Z39.50-1995, now referred to as Explain Classic (lcweb.loc.gov/z3950/agency/markup/07.html). This certainly had all the necessary functionality, but has been widely perceived as difficult to implement for several reasons:

Explain Classic records are encoded using the Explain Record Syntax (described in Appendix REC.1 of the standard) rather than GRS-1 or XML with a suitable schema. Accordingly, implementing Explain requires messing about at the level of ASN.1 and BER - not generally a popular area. And it doesn't help that the ASN.1 for the Explain record syntax is rather longer than that for all the core services of the standard put together.
There is a perception that Explain Classic sometimes offers more than one way of saying the same thing, so that a server might advertise a facility that a client wants, but the client might not know it because it was expecting the facility to be advertised in a different way. For example, no-one seems to be sure whether to represent supported attributes and combinations using AttributeDetails or TermList.
All of this is exacerbated by the problem that there is very little commentary available, and virtually none in the standard: potential implementers are faced with an edifice of ASN.1 and nothing to help them interpret it.

Pretty much without exception, everyone loves Explain in theory. In practice, seven years after it was enshrined in a national standard, it has still only been implemented very rarely, and generally in little interoperable islands rather than in a way that promotes general interoperability as originally conceived.

2.2.2. Explain Lite

The second major tilt at the Explain windmill was the Explain Lite facility (www.one-2.org/exp-lite) evolved as a part of the ONE-2 project. Like the original F&N proposal, this worked by having servers unilaterally return additional information in their Init Responses; but this time, the information was an XML record describing the server being initialised.

Its crucial role within ONE-2 aside, we regard Explain Lite as a valuable prototyping exercise, but too tightly focussed on the specific needs of ONE-2 to serve as a general solution. In particular:

Returning all the information in an Init Response is not practical when dealing with lists of hundreds or thousands of databases, as we hope to do.
There is no facility for specifying the attribute set of an attribute - BIB-1 is assumed - so that Explain Lite can't describe the searching facilities of CIMI, Zthes, BIB-2 and other servers.
There is no way to specify titles and other human-readable text in multiple languages.
While the XML record defined in Explain Lite is fine for the kind of bibliographic databases that ONE-2 concerned itself with, we do not feel that it is sufficiently expressive to describe general non-bibliographic databases.

ZeeRex attempts to learn from both previous schemes. From Explain Classic it inherits the notion of an Explain database, which can be searched just like any other Z39.50 database. From Explain Lite it inherits the approach of using XML records to represent the data (although the format of the ZeeRex XML is rather different). It also adds its own new functionality, particularly in the areas of one server explaining another's databases, and aggregation of explain information from multiple sources.

ZeeRex can be thought of as a further development of, and generalisation of, Explain Lite.

3. ZeeRex records

Each ZeeRex record represents a single database that is available on a Z39.50 server somewhere. There may be multiple databases on a single server, of course; in this case, there will be multiple ZeeRex records, each of them describing a single database.

It is central to ZeeRex's ``Explain by stealth'' approach that F&N records are a strict (and tiny) subset of full ZeeRex records. The strategy is that we sucker server implementers into providing a full ZeeRex service in two easy steps:

First, they read the F&N specification and think, ``That's easy, I could do that'', and hack it into their servers one lunch-break.
Then they read the full ZeeRex specification and think, ``I've already done F&N, this is just the same thing with fatter records'', and hack it into their servers over lunch the next day.

We'll see how well this strategy works out in practice. At the very least, it should lower the perceived barrier because there is something tangible to show for a first-cut implementation.

But of course, the beauty of ZeeRex (and F&N in particular) is that not everyone has to implement it. So long as you have a friend or neighbour whose server implements ZeeRex, and who is willing to list your databases for you, you're OK.

3.1. Friends and Neighbours records

When a Z39.50 client retrieves a ZeeRex record using the element set b (that is, the brief record), an F&N record is returned. It is very simple: an XML document with an <explain> element at the top level, a <serverInfo> element within it, and <host>, <port> and <database> elements within that, containing the hostname, IP port number and database name respectively of a Z39.50 database which may be on the same server as the ZeeRex record or a different one.

For example:

	
	<explain>
	  <serverInfo>
	    <host>gondolin.hist.liv.ac.uk</host>
	    <port>210</port>
	    <database>l5r</database>
	  </serverInfo>
	</explain>


You could think of these records as corresponding to the following trivial DTD:

	
	<!ELEMENT explain (serverInfo)>
	<!ELEMENT serverInfo (host, port, database)>
	<!ELEMENT host (#PCDATA)>
	<!ELEMENT port (#PCDATA)>
	<!ELEMENT database (#PCDATA)>


(Although that's not the whole truth: these elements may carry a few optional attributes, details of which can be found in the real DTD.)
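To show how little a client needs in order to consume such a record, here is a minimal Python sketch that pulls the three values out of the example above using only the standard library. The Target type and the parse_fn_record function are names invented for this illustration, and the sketch assumes that the elements carry no XML namespace; any optional attributes are simply ignored.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Target(NamedTuple):
    """Connection details for one Z39.50 database (hypothetical helper type)."""
    host: str
    port: int
    database: str


def parse_fn_record(xml_text: str) -> Target:
    """Extract host, port and database from a Friends and Neighbours record."""
    root = ET.fromstring(xml_text)
    if root.tag != "explain":
        raise ValueError(f"expected <explain> at top level, got <{root.tag}>")
    server_info = root.find("serverInfo")
    if server_info is None:
        raise ValueError("missing <serverInfo>")
    fields = {}
    for name in ("host", "port", "database"):
        text = server_info.findtext(name)
        if text is None or not text.strip():
            raise ValueError(f"missing or empty <{name}>")
        fields[name] = text.strip()
    # int() raises ValueError if the port is not a number.
    return Target(fields["host"], int(fields["port"]), fields["database"])


SAMPLE = """<explain>
  <serverInfo>
    <host>gondolin.hist.liv.ac.uk</host>
    <port>210</port>
    <database>l5r</database>
  </serverInfo>
</explain>"""

target = parse_fn_record(SAMPLE)
print(target.host, target.port, target.database)
```

Because F&N records are a subset of full records, the same function should also work on a full ZeeRex record: it just ignores every section other than <serverInfo>.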

3.2. Full ZeeRex records

When a Z39.50 client retrieves a ZeeRex record using the element set f (that is, the full record), a full ZeeRex record is returned. In addition to the <serverInfo> section also found in F&N records, full records may also include the following sections:

<databaseInfo>: contains human-readable information about the database: its title, a description, the address of a contact person, etc.
<metaInfo>: information about the ZeeRex record itself: when it was created or last modified, when it was aggregated (see section 4.2) if at all, etc.
<indexInfo>: information about how to search in the database: which indexes exist and what combinations of attributes may be used to search against them, which indexes can be used for sorting, scan, etc.
<recordInfo>: information about which record syntaxes the database can serve records in, and which element sets are supported.
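As a rough illustration of how a client might tell a brief record from a fuller one, the following Python sketch reports which of the top-level sections named above are present. The sections_present function is a name invented here, the sample record is deliberately skeletal (the contents of each section are defined by the real DTD and are not reproduced), and the sketch assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET
from typing import List

# Top-level sections named in this overview, in the order discussed.
SECTIONS = ("serverInfo", "databaseInfo", "metaInfo", "indexInfo", "recordInfo")


def sections_present(xml_text: str) -> List[str]:
    """Return the names of the known sections found directly under <explain>."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [name for name in SECTIONS if name in present]


# Skeletal sample: section contents are omitted on purpose.
SKELETON = """<explain>
  <serverInfo>
    <host>gondolin.hist.liv.ac.uk</host>
    <port>210</port>
    <database>l5r</database>
  </serverInfo>
  <databaseInfo/>
  <recordInfo/>
</explain>"""

print(sections_present(SKELETON))
```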

For much more about the full record, see The ZeeRex DTD, especially:

These documents also explain how the ZeeRex record may be extended to carry additional information, and why this is nearly always a bad idea.

4. ZeeRex databases

ZeeRex records are found in - get ready for a shock - ZeeRex databases. These are completely ordinary Z39.50 databases held on ordinary Z39.50 servers; the usual Z39.50 search and retrieval facilities may be used on them in the usual way.

As the Explain facility's database is called IR-Explain-1, so the Explain-- database, back when ZeeRex was so named, was called IR-Explain---1. We are particularly proud of that name, so we've kept it despite the specification's name-change: don't even think about trying to make us change it :-)

4.1. Searching ZeeRex databases

ZeeRex databases are searched in accordance with the Z39.50 Attribute Architecture, using a combination of attributes from the Z39.50 Utility Attribute Set, the Cross-Domain Attribute Set and the new ZeeRex Attribute Set.

The available searches are described in detail in Searching ZeeRex Databases, but in broad outline, clients are able to search for records representing databases on a particular host or port, with a particular name, supporting particular record syntaxes and access points, etc.

4.2. Aggregation databases

We envisage that some ZeeRex databases will be built by aggregating records found in other ZeeRex databases. These will in general be found by a ``ZeeRex crawler'' as hinted at in section 2.1. Such a crawler may choose to gather only F&N information, so that the resulting ZeeRex database is only a list of servers with connection details; but there is no reason why it should not gather full ZeeRex records and build a richer database that can provide more sophisticated searching facilities.

One of the motivations for the whole ZeeRex initiative was the idea of a Google-like website providing a web interface to a powerful facility for searching a large, aggregated database of ZeeRex information. We imagine a scenario where someone can just go to (say) http://find.z3950.org/ and search for Z39.50 databases containing cultural heritage information which are in either .org or .org.uk domains, understand the CIMI-1 attribute set and are able to return XML records.

(Such a database should be pretty big, running at least into the thousands of records, and maybe tens or hundreds of thousands. That's part of the reason that we felt it necessary to have the ZeeRex database searchable, rather than simply returning the whole lot in an Init Response as Explain Lite does.)

The ZeeRex record includes some information about aggregation, including the date that the record was aggregated, and the source that it was aggregated from. This information can be used to ensure that aggregated records older than some given limit are discarded and re-fetched from the original source.
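The expiry rule can be sketched as follows. This is an assumption-laden illustration rather than anything the specification prescribes: each aggregated record is modelled as a plain dictionary, and the keys aggregated_at and source are names invented here to stand for the aggregation date and origin carried in the record.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def split_stale(records: List[Dict], max_age: timedelta,
                now: datetime) -> Tuple[List[Dict], List[Dict]]:
    """Partition records into (fresh, stale) by their aggregation date.

    Stale records are the ones an aggregator would discard and re-fetch
    from the source they were originally aggregated from.
    """
    fresh, stale = [], []
    for record in records:
        if now - record["aggregated_at"] > max_age:
            stale.append(record)
        else:
            fresh.append(record)
    return fresh, stale


# Invented sample data; "aggregated_at" and "source" are hypothetical keys.
NOW = datetime(2002, 8, 28, tzinfo=timezone.utc)
RECORDS = [
    {"database": "books", "source": "a.example.org",
     "aggregated_at": datetime(2002, 8, 1, tzinfo=timezone.utc)},
    {"database": "maps", "source": "b.example.org",
     "aggregated_at": datetime(2002, 1, 15, tzinfo=timezone.utc)},
]

fresh, stale = split_stale(RECORDS, timedelta(days=90), NOW)
print([r["database"] for r in fresh], [r["database"] for r in stale])
```

With a 90-day limit, the record aggregated in January is flagged for re-fetching while the one from early August is kept.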

5. Support for other protocols

ZeeRex is firmly focussed on Z39.50 rather than trying to solve the general problem of discovering information resources on the internet. There are plenty of existing initiatives working on that, including:

ebXML at www.ebxml.org
The Global Information Location Service at www.gils.net
UDDI at www.uddi.org
The WebClarity Resource Registry, described at www.webclarity.info/whitepapers/registry_white.pdf
The Semantic Web initiative at www.w3.org/2001/sw

Our goal is a more modest, and more immediate, one: enhancing interoperability today between the many Z39.50 servers and clients that are already at work out there in the world.

Nevertheless, we do recognise that Z39.50 exists in the context of a broader world of information retrieval; and that the database model captured by the ZeeRex DTD is of some relevance to other IR protocols. Similarly, we recognise the possibility that some Z39.50 server providers, unable to add ZeeRex functionality to their server software, may wish to publish records describing their databases by some other means. Where it's possible to support these non-Z39.50 uses of ZeeRex without doing violence to its model, we are keen to accommodate alternative protocols.

5.1. Representing non-Z39.50 databases

At the moment, the only kind of non-Z39.50 databases explicitly catered for in ZeeRex are those provided by SRW and the closely related SRU. (These protocols are attempts to provide a Z39.50-like IR model using XML/SOAP technology.) Specifications for describing SRW/SRU databases are provided in Using ZeeRex with SRW.

We remain open to the possibility of extending or generalising ZeeRex to allow the description of databases available via other protocols in which the IR notions of ``database'', ``index'' etc. make sense. (So that counts out websites, for example.)

Suggestions to <protocols@explain.z3950.org> are welcome.

5.2. Non-Z39.50 interfaces to ZeeRex

The preferred method for obtaining ZeeRex records is by searching in, and retrieving from, the IR-Explain---1 database of a Z39.50 server that supports ZeeRex. Owners of Z39.50 servers which can't be modified to support ZeeRex directly - perhaps because they use binary-only server software - are encouraged to find a friendly ZeeRex server maintainer, and have records describing their databases added to that server's ZeeRex database. We envisage providing such a service at z39.50s://proxy.explain.z3950.org:210/IR-Explain---1 shortly.
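Addresses written in the z39.50s:// form used above split naturally into the same host, port and database triple that an F&N record carries. The Python sketch below shows one way a client might take such an address apart with the standard library's generic URL parser. The split_z3950_url function is a name invented here, and falling back to port 210 when none is given is an assumption of this sketch.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_z3950_url(url: str, default_port: int = 210) -> Tuple[str, int, str]:
    """Split a z39.50s:// address into (host, port, database).

    default_port is an assumption of this sketch, used only when the
    address does not name a port explicitly.
    """
    parts = urlsplit(url)
    if parts.scheme != "z39.50s":
        raise ValueError(f"not a z39.50s address: {url!r}")
    if not parts.hostname:
        raise ValueError(f"no host in address: {url!r}")
    port = parts.port if parts.port is not None else default_port
    database = parts.path.lstrip("/")
    return parts.hostname, port, database


address = "z39.50s://proxy.explain.z3950.org:210/IR-Explain---1"
print(split_z3950_url(address))
```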

However, if for some reason this is not possible, ZeeRex records may be made available by other means. We do not plan to recommend any such alternative protocol over any other: ZeeRex records may be obtained by HTTP, FTP, NFS or carrier-pigeon if necessary. But remember that all these approaches are inferior to placing them in a ZeeRex database, if only because they can never be harvested by a ZeeRex crawler.

6. Summary

ZeeRex is:

Soundly Engineered because it builds on lessons learned from not one but two previous initiatives towards solving similar problems.
Expressive because the record structure has been carefully worked through to provide all the key information about a Z39.50 database's capabilities.
Scalable: the distributed nature of the catalogue means that no one agency is responsible for maintaining lists.
Powerful: the full machinery of the proven Z39.50 searching mechanisms is used to locate relevant databases.
Extensible where necessary, by the use of XML namespaces.
Easy to implement since it re-uses Z39.50 facilities already available in most servers, together with minimal XML functionality which is readily available in many free implementations.
Here today: three weeks after the working group was first convened, work had already begun on four separate ZeeRex server implementations; two can already be accessed on z39.50s://gondolin.hist.liv.ac.uk:210/IR-Explain---1 and z39.50s://z3950.simdb.com:210/IR-Explain---1

Feedback to <mike@indexdata.com> is welcome!