
The IMesh Toolkit


Extensible Markup Language (XML)

Overall Purpose

XML stands for Extensible Markup Language. It began as a project to address HTML's limitations in representing structured documents, by selecting a simple-to-implement yet extensible subset of SGML for use on the Web [1]. In fact, most explanations of XML [2] [3] [10] begin by comparing and contrasting it with HTML and SGML. XML aims to bring rich structure to documents by letting authors define their own flexible and extensible markup (unlike HTML, whose tag set is fixed), without the full complexity of SGML, whilst remaining compatible with it. XML was designed for ease of implementation and for interoperability with both SGML and HTML [4]. It was developed by the XML Working Group of the World Wide Web Consortium (W3C).

Brief Overview of Functionality

XML is an open, text-based markup language that adds structural and shallow semantic information to data. It provides a means of structuring data whilst making it available for manipulation and display by different applications. Markup consists of codes (or tags) which, added to plain text, give that text special formatting (look) or meaning. Unlike HTML, XML has no predefined tags. Using XML, it is possible to define tags and the structural relationships between them. XML users can create their own tags or reuse those created by others. Tag names can be chosen so that they describe the content of the element. At the simplest level, XML is just a way of marking up data so that it is self-describing. It provides a standardised means for the storage and transport of data.

XML documents look similar to HTML or SGML documents: they consist of markup and content. Six kinds of markup can be used in an XML document: elements, entity references, comments, processing instructions, marked sections and document type declarations [2]. Elements are the most common form of markup. All the data in an XML document is contained within a single document element, which is the topmost (or root) element. This single element can contain any number of nested sub-elements.

An element is normally delimited by two tags, an opening tag and a closing tag. Text contained between the two tags is considered part of the element. Empty elements, which have no content, are also supported. Elements can have attributes which further refine or modify element behaviour. Most elements identify the nature of the contents they surround. The logical structure of an XML document indicates how it is built, what elements can be included and in what order. The official XML specification [6] describes an XML document and what it should contain.
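The rules about matching opening and closing tags, attributes and empty elements can be tried out with any XML parser. The following is a minimal sketch using the ElementTree module from the Python standard library; the element names (record, title, checked) and the helper is_well_formed are invented for illustration and are not part of any IMesh schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One root element with an attribute, a nested element holding text,
# and an empty element written with the <name/> shorthand.
good = '<record id="r1"><title>Example</title><checked/></record>'

# Here <title> is never closed before </record>, so parsing fails.
bad = '<record id="r1"><title>Example</record>'

root = ET.fromstring(good)
print(root.tag, root.attrib["id"])    # root element name and attribute value
print(root.find("title").text)        # text between the <title> tags
print(root.find("checked").text)      # an empty element has no text
print(is_well_formed(good), is_well_formed(bad))
```

A parser rejects the second string outright instead of guessing at the intended structure, which is one of the ways XML is stricter than HTML.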

The following provides an example of how information about the UKOLN homepage could be structured using XML. The root element is <Description>. Sub-elements like <Creator> and <Date> contain information that describes the page. Each element is named appropriately for the kind of information it supplies, making the XML description human-readable as well as machine-readable. The Description element also has an attribute 'about', which tells us which web page this fragment of XML is describing. The value of the attribute 'about' is the URL of the web page. Note that element names in XML are case-sensitive.
<Description about="http://www.ukoln.ac.uk/">
    <Title> UKOLN: UK Office for Library and Information Networking </Title>
    <Creator> UKOLN Information Services Group </Creator>
    <Subject>
      national centre; network information support; library
      community; awareness; research; information services;
      public library networking; bibliographic management;
      distributed library systems; metadata; resource discovery;
      conferences; lectures; workshops
    </Subject>
    <Summary>
      UKOLN is a national centre for support in network
      information management in the library and information
      communities. It provides awareness, research and
      information services
    </Summary>
    <Date>2000-02-17</Date>
    <Format> text/html</Format>
</Description>
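To show how an application might consume such a description, here is a minimal sketch using ElementTree from the Python standard library. It works on a trimmed, well-formed copy of the description; the variable names are illustrative only, and a real gateway would probably read the XML from a file or a network response.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the UKOLN description, embedded as a string.
DESCRIPTION = """\
<Description about="http://www.ukoln.ac.uk/">
    <Title>UKOLN: UK Office for Library and Information Networking</Title>
    <Creator>UKOLN Information Services Group</Creator>
    <Date>2000-02-17</Date>
    <Format>text/html</Format>
</Description>
"""

root = ET.fromstring(DESCRIPTION)

# The 'about' attribute identifies the web page being described.
about = root.get("about")

# Each sub-element becomes one field, keyed by its element name.
fields = {child.tag: (child.text or "").strip() for child in root}

print(about)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Because the element names describe their content, the program needs no knowledge of the layout beyond the names themselves.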

Deployment

The rapid take-up of XML is reflected in the ongoing work at W3C on XML-related standards. An article surveying recent work and current efforts on XML-related topics at W3C described this activity thus:
"Submissions from Member organizations, and draft specifications in varying stages of completion define applications of XML, rules for using XML in particular contexts, extensions to XML, languages for processing XML documents, languages for declaring XML-based languages, languages for querying collections of XML documents -- all in ever-greater speed and profusion." [7]

Describing all the XML-based applications is beyond the scope of this review. XML-related Software [8] provides a compilation of XML software, ranging from parsers (available for almost any programming language) and conversion tools to database systems and APIs. Brief descriptions and links to web pages and suppliers are given.

XML: Libraries' Strategic Opportunity [9] argues for the advantages of using XML over other library standards and provides examples of how XML is being used in several (mostly bibliographic data) settings.

Related Standards

(Most of the standards mentioned here are covered in other sections of this technical review, and the reader is referred to those sections for details. Here we simply indicate their relation to XML.)

SGML - XML is a simplified subset of SGML, being both simpler and stricter.

XSLT - Extensible Stylesheet Language Transformations - a rules-based language used to transform XML documents.

DOM - the Document Object Model.
The DOM defines interfaces, properties and methods to manipulate XML documents. It is a W3C specification and is designed to be used with any programming language and any operating system. With the XML DOM, a programmer can create an XML document, navigate its structure, and add, modify, or delete its elements.
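The operations listed above (create, navigate, add, modify, delete) can be sketched with xml.dom.minidom, a lightweight DOM implementation in the Python standard library. The element names reuse the UKOLN example; the shortened creator name and the changed date are hypothetical values chosen only to show a modification.

```python
from xml.dom.minidom import Document

# Create a new document with a root element and an attribute.
doc = Document()
root = doc.createElement("Description")
root.setAttribute("about", "http://www.ukoln.ac.uk/")
doc.appendChild(root)

# Add child elements, each holding a single text node.
for name, value in [("Creator", "UKOLN"),
                    ("Date", "2000-02-17"),
                    ("Format", "text/html")]:
    element = doc.createElement(name)
    element.appendChild(doc.createTextNode(value))
    root.appendChild(element)

# Navigate to the Date element and modify its text (hypothetical new value).
date = root.getElementsByTagName("Date")[0]
date.firstChild.data = "2001-01-01"

# Delete the Format element from the tree.
fmt = root.getElementsByTagName("Format")[0]
root.removeChild(fmt)

xml_text = root.toxml()
print(xml_text)
```

The whole document is held in memory as a tree of nodes, which makes random access and editing straightforward at the cost of memory for large documents.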

SAX - Simple API for XML
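SAX is an event-based interface for reading XML documents: the parser reports elements and text to the application as it encounters them, rather than building a tree in memory. It was developed by members of the XML-DEV mailing list as a standard set of interfaces that parsers from different vendors can implement.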
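As a rough illustration of the SAX style of processing, the sketch below uses the xml.sax module from the Python standard library. The handler class TextCollector and the sample document are invented for this example; the handler simply records the text of each element that contains any.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collect the text content of each element, keyed by element name."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._buffer = []

    def startElement(self, name, attrs):
        # A new element begins: start with an empty text buffer.
        self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        self._buffer.append(content)

    def endElement(self, name):
        text = "".join(self._buffer).strip()
        if text:
            self.values[name] = text
        self._buffer = []


SAMPLE = (b"<Description><Creator>UKOLN</Creator>"
          b"<Date>2000-02-17</Date></Description>")

handler = TextCollector()
xml.sax.parseString(SAMPLE, handler)
print(handler.values)
```

The application sees a stream of start, text and end events and keeps only what it needs, so documents far larger than available memory can be processed.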

SOAP - Simple Object Access Protocol [UKOLN SOAP review]
SOAP is intended as a simple application-to-application protocol over the Internet. Messages sent via SOAP are encoded as XML.
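To give a feel for what an XML-encoded SOAP message looks like, the following sketch builds a SOAP 1.1 envelope with ElementTree from the Python standard library. The envelope namespace is the one defined for SOAP 1.1; the application namespace, the SearchRequest operation and the query element are hypothetical and stand in for whatever a real service would define.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "http://example.org/gateway"  # hypothetical application namespace

# Ask the serialiser to use readable prefixes for the two namespaces.
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("gw", APP_NS)

# Envelope > Body > application-specific payload.
envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
request = ET.SubElement(body, f"{{{APP_NS}}}SearchRequest")  # hypothetical
ET.SubElement(request, f"{{{APP_NS}}}query").text = "metadata"

message = ET.tostring(envelope, encoding="unicode")
print(message)
```

The resulting string would typically be sent as the body of an HTTP POST; the transport step is omitted here because it depends on the service being called.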

Relevance to IMesh context

XML is envisioned as a language that enables richly structured documents to be made available on the web. It provides a powerful means of data encoding, storage and transmission, and could thus fulfil all three of these functions for the metadata handled by subject gateways, which is by nature structured information.

As a data interchange format, XML is suitable for exchanging data between systems that are not ordinarily compatible, by providing a common format. This could be useful for exchanging information, for cross-searching and for transforming "legacy" metadata into an interoperable form. SOAP is an example of using XML as a common format for data transport between distributed systems.
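The transformation of legacy metadata can be sketched as follows, using ElementTree from the Python standard library. The pipe-delimited record layout and the helper record_to_xml are assumptions made for illustration; real legacy formats will differ, but the principle of mapping each field to a named element is the same.

```python
import xml.etree.ElementTree as ET


def record_to_xml(url, record):
    """Serialise a dict of metadata fields as a Description element."""
    root = ET.Element("Description", {"about": url})
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


# A hypothetical legacy record: creator, date and format separated by '|'.
legacy = "UKOLN Information Services Group|2000-02-17|text/html"
creator, date, fmt = legacy.split("|")

xml_text = record_to_xml(
    "http://www.ukoln.ac.uk/",
    {"Creator": creator, "Date": date, "Format": fmt},
)
print(xml_text)
```

Once the record is in XML, any system with an XML parser can read it without knowing the original field order or delimiter.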

The ease of transformation and manipulation of data encoded as XML gives flexibility in creating, storing and presenting metadata. Tools and APIs are widely available.

References

[1] Web Architecture: Describing and Exchanging Data, W3C Note, 7 June 1999
http://www.w3.org/1999/04/WebData

[2] What is XML?
Norman Walsh, Senior Application Analyst, ArborText Inc., October 1998
http://www.xml.com/pub/a/98/10/guide1.html#AEN63

[3] XML: What is it?
Architag International, January 1998
http://www.architag.com/solutions/980106-01.html

[4] Annotated XML Specification
http://www.xml.com/axml/testaxml.htm

[5] XML in Action - Web Technology, W.J. Pardi, Microsoft Press
Paperback March 1999 ISBN 0735605629

[6] XML Specification, Extensible Markup Language (XML) 1.0 (Second Edition),
W3C Recommendation, 6 October 2000
http://www.w3.org/TR/2000/REC-xml-20001006

[7] XML-related Activities at the W3C
C. M. Sperberg-McQueen, W3C, January 2001
http://www.xml.com/pub/a/2001/01/03/w3c.html

[8] XML-related software
http://www.xmlsoftware.com/

[9] XML: Libraries' Strategic Opportunity
Dick R. Miller, Stanford University, 2000
http://www.libraryjournal.com/xml.asp

[10] XML for the absolute beginner
Mark Johnson, April 1999
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml.html
