The IMesh Toolkit
Extensible Markup Language (XML)
|
Overall Purpose
|
XML stands for Extensible Markup Language.
It began as a project to address HTML's limitations on structured
documents, by selecting a simple-to-implement yet extensible
subset of SGML for use on the Web. [1] In fact, most explanations
[2] [3] [10] of XML begin by comparing and contrasting it with HTML and
SGML. XML aims to bring rich structure to documents by providing
flexible and extensible semantics to the structure (unlike HTML
which is inflexible), without the full complexity of SGML (whilst
still being compatible with SGML). XML was designed for ease of
implementation and for interoperability with both SGML and HTML
[4]. It was developed by the XML Working Group of the World Wide
Web Consortium (W3C).
Brief Overview of Functionality
|
XML is an open, text-based markup language
that provides structural and shallow semantic information to data. It
provides a means for structuring data whilst making it available
for manipulation and display by different applications. Markup
consists of codes (or tags) which, added to plain text, change
that text to give it special formatting (look) or meaning. Unlike
HTML, XML has no predefined tags. Using XML, it is possible to
define tags and the structural relationships between them. XML
users can create their own tags or reuse those created by others.
Tags can be named so that they describe the content of the
element. At the simplest level, XML is just a way of marking up
data such that it is self-describing. It provides a standardised
means for storage and transport of data.
XML documents look similar to HTML or SGML documents; they
consist of markup and content. There are six kinds of markup that
can be used in an XML document: elements, entity references,
comments, processing instructions, marked sections and document
type declarations. The most common form of
markup is used for elements. All the data in an XML document is
contained within a Document element, which is the topmost (or
root) element. This single element can comprise any number of
nested sub-elements.
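The single-root rule can be checked with any XML parser. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree module as the parser; the helper name is_well_formed and the two sample strings are illustrative and do not come from the XML specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One root element containing nested sub-elements: accepted.
single_root = (
    "<Description><Creator>UKOLN</Creator>"
    "<Date>2000-02-17</Date></Description>"
)

# Two top-level elements with no enclosing root: rejected by the parser.
two_roots = "<Creator>UKOLN</Creator><Date>2000-02-17</Date>"

print(is_well_formed(single_root))
print(is_well_formed(two_roots))
```

The second string contains the same data as the first, but it is not an XML document because nothing encloses the two elements.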
Elements consist of two tags, an opening and a closing tag.
Text can be contained between the element tags and is considered
part of the element. Empty elements are also supported. Elements
can have attributes which further refine or modify element
behaviour. Most elements identify the nature of the contents they
surround. The logical structure of an XML document indicates how
it is built, what elements can be included and in what order. The
official XML specification [6] describes an XML document and what
it should contain.
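As a rough illustration of the element features just described, the sketch below builds a small element tree in Python with the standard xml.etree.ElementTree module and serialises it. The element name Reviewed is hypothetical and is used only to show an empty element; the other names are borrowed from the UKOLN description used in this section.

```python
import xml.etree.ElementTree as ET

# An element carrying an attribute.
description = ET.Element("Description", {"about": "http://www.ukoln.ac.uk/"})

# A sub-element with text between its opening and closing tags.
creator = ET.SubElement(description, "Creator")
creator.text = "UKOLN Information Services Group"

# An empty element: no text and no children (hypothetical name).
ET.SubElement(description, "Reviewed")

serialized = ET.tostring(description, encoding="unicode")
print(serialized)
```

In the serialised output the empty element is written as a single self-closing tag rather than as an opening and closing pair.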
The following provides an example of how information about the
UKOLN homepage could be structured using XML. The root element is
<Description>. Sub-elements like <Creator> and
<Date> contain information that describes the page. Each
element is named appropriately for the kind of information it
supplies, making the XML description human-readable as well as
machine-readable. The description element also has an attribute
'about', which tells us which web page this selection of XML is
describing. The value of the attribute 'about' is the URL of the
webpage.
<Description about="http://www.ukoln.ac.uk/">
<Title> UKOLN: UK Office for Library and Information Networking </Title>
<Creator> UKOLN Information Services Group </Creator>
<Subject>
national centre; network information support; library
community; awareness; research; information services;
public library networking; bibliographic management;
distributed library systems; metadata; resource discovery;
conferences; lectures; workshops
</Subject>
<Summary>
UKOLN is a national centre for support in network
information management in the library and information
communities. It provides awareness, research and
information services
</Summary>
<Date>2000-02-17</Date>
<Format> text/html</Format>
</Description>
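To show how an application might consume such a description, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The embedded document is an abridged copy of the description above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged, well-formed copy of the UKOLN description.
document = """
<Description about="http://www.ukoln.ac.uk/">
  <Title>UKOLN: UK Office for Library and Information Networking</Title>
  <Creator>UKOLN Information Services Group</Creator>
  <Date>2000-02-17</Date>
  <Format>text/html</Format>
</Description>
""".strip()

root = ET.fromstring(document)

print(root.tag)           # name of the root element
print(root.get("about"))  # value of the 'about' attribute

# Collect the sub-elements, in document order, as name/text pairs.
record = {child.tag: child.text.strip() for child in root}
print(record["Creator"])

# Element names are case-sensitive: <creator> is not <Creator>.
print(root.find("creator") is None)
```

The last line is a reminder that XML names are case-sensitive, so a lookup must use the same capitalisation as the markup.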
|
Deployment
|
The rapid take-up of XML is reflected in the ongoing work at W3C
on XML-related standards. An article surveying recent and current
W3C work on XML-related topics described this activity thus:
"Submissions from Member organizations, and draft specifications
in varying stages of completion define applications of XML, rules
for using XML in particular contexts, extensions to XML,
languages for processing XML documents, languages for declaring
XML-based languages, languages for querying collections of XML
documents -- all in ever-greater speed and profusion." [7]
Describing all the XML-based applications is beyond the scope
of this review. XML-related Software [8] provides a compilation
of XML software, ranging from parsers (available for almost any
programming language) to conversion tools, database systems to
APIs. Brief descriptions and links to web pages and suppliers are
given.
XML: Libraries' Strategic Opportunity [9] argues for the
advantages of using XML over other library standards and provides
examples of how XML is being used in several (mostly
bibliographic data) settings.
|
Related Standards
|
(Most of the standards mentioned here are
tackled in other sections within this technical review, thus the
reader is referred to those sections for details. Here we simply
indicate their relation to XML).
SGML - XML is a restricted subset (profile) of SGML, being both
simpler and stricter.
XSLT - Extensible Stylesheet Language Transformations - a
rule-based language used to transform XML documents into other
XML, HTML or text documents.
DOM - the Document Object Model.
The DOM defines interfaces, properties and methods to manipulate
XML documents. It is a W3C specification and is designed to be
used with any programming language and any operating system. With
the XML DOM, a programmer can create an XML document, navigate
its structure, and add, modify, or delete its elements.
SAX - Simple API for XML
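SAX is an event-driven interface for reading XML documents: the
parser reports each element and run of text to the application as
it is encountered, rather than building a tree in memory. It was
developed by members of the XML-DEV mailing list as a standard set
of interfaces that different vendors' parsers can implement.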
SOAP - Simple Object Access Protocol
[UKOLN SOAP review]
SOAP is intended as a simple application-to-application protocol
over the Internet. Messages sent via the SOAP protocol are
encoded as XML.
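The DOM and SAX entries above differ mainly in how the application sees the document. The sketch below contrasts the two in Python, assuming the standard library's xml.dom.minidom and xml.sax modules as one possible implementation of each interface. The sample document, the replacement date and the class name ElementNameCollector are illustrative.

```python
import xml.sax
from xml.dom import minidom

SOURCE = (
    "<Description><Creator>UKOLN</Creator>"
    "<Date>2000-02-17</Date></Description>"
)

# --- DOM: the whole document is held in memory as an editable tree.
dom = minidom.parseString(SOURCE)
root = dom.documentElement

# Add a new element with a text node.
fmt = dom.createElement("Format")
fmt.appendChild(dom.createTextNode("text/html"))
root.appendChild(fmt)

# Modify the text of an existing element (illustrative value).
date = root.getElementsByTagName("Date")[0]
date.firstChild.data = "2001-01-03"

# Delete an element.
creator = root.getElementsByTagName("Creator")[0]
root.removeChild(creator)

dom_result = root.toxml()
print(dom_result)


# --- SAX: the parser reports events; the application keeps what it needs.
class ElementNameCollector(xml.sax.ContentHandler):
    """Record the name of every element as its start tag is reported."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


collector = ElementNameCollector()
xml.sax.parseString(SOURCE.encode("utf-8"), collector)
print(collector.names)
```

A tree interface is convenient when a document has to be edited, while an event interface suits large documents that only need to be read once, since nothing is retained unless the handler stores it.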
|
Relevance to IMesh context
|
XML is envisioned as a language that
enables richly-structured documents to be made available on the
web. It provides a powerful means of data encoding, storage and
transmission and thus could fulfil all three of these functions for
the metadata handled by subject gateways, which is by nature
structured information.
As a data interchange format, XML is suitable for exchanging
data between systems that ordinarily are not compatible, by
providing a common format. This could be useful for exchanging
information, for cross-searching and for transforming "legacy"
metadata into an interoperable form. The SOAP protocol is an
example of using XML as a common format for data transport
between distributed systems.
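As a rough sketch of XML acting as a common format between two systems, the Python example below serialises a record held in one system's internal form and reads it back on the receiving side. The field names, the Description/about layout (borrowed from the example earlier in this review) and the function names record_to_xml and xml_to_record are assumptions made for illustration; they do not represent an IMesh or subject-gateway schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(url: str, fields: dict) -> str:
    """Encode one metadata record as a <Description> element."""
    root = ET.Element("Description", {"about": url})
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text: str):
    """Decode a <Description> element back into (url, fields)."""
    root = ET.fromstring(text)
    return root.get("about"), {child.tag: child.text for child in root}


# Hypothetical record held in a gateway's internal ("legacy") form.
legacy = {
    "Title": "UKOLN",
    "Creator": "UKOLN Information Services Group",
    "Date": "2000-02-17",
}

payload = record_to_xml("http://www.ukoln.ac.uk/", legacy)
url, received = xml_to_record(payload)

print(payload)
print(received == legacy)
```

The sending and receiving systems need only agree on the element names; how each stores the record internally is left open.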
The ease of transformation and manipulation of data encoded as
XML gives flexibility in creating, storing and presenting
metadata. Tools and APIs are widely available.
|
References
|
[1] Web Architecture: Describing and
Exchanging Data, W3C Note, 7 June 1999
http://www.w3.org/1999/04/WebData
[2] What is XML?
Norman Walsh, Senior Application Analyst, ArborText Inc.,
October 1998
http://www.xml.com/pub/a/98/10/guide1.html#AEN63
[3] XML: What is it?
Architag International, January 1998
http://www.architag.com/solutions/980106-01.html
[4] Annotated XML Specification
http://www.xml.com/axml/testaxml.htm
[5] XML in Action - Web Technology, W.J. Pardi, Microsoft
Press
Paperback March 1999 ISBN 0735605629
[6] XML Specification, Extensible Markup Language (XML) 1.0
(Second Edition),
W3C Recommendation, 6 October 2000
http://www.w3.org/TR/2000/REC-xml-20001006
[7] XML-related Activities at the W3C
C. M. Sperberg-McQueen, W3C, January 2001
http://www.xml.com/pub/a/2001/01/03/w3c.html
[8] XML-related software
http://www.xmlsoftware.com/
[9] XML: Libraries' Strategic Opportunity
Dick R. Miller, Stanford University, 2000
http://www.libraryjournal.com/xml.asp
[10] XML for the absolute beginner
Mark Johnson, April 1999
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml.html