The IMesh Toolkit
XHTML
|
Overall Purpose
|
Having gone through several stages of evolution, HTML reached
version 4.0, which became a W3C Recommendation in December 1997.
By 1999 HTML 4 had been recast in XML, and the resultant XHTML
1.0 is now also a W3C Recommendation. Work on modularising XHTML
was then carried forward. XHTML can be used in existing browsers
with a small number of changes from HTML 4.0 syntax. W3C's
intention in championing XHTML is to promote stricter standards
and cleaner code, so as to provide richer web pages over an
ever-widening variety of browser platforms [2].
XHTML is in effect a reformulation of HTML 4.0 as an
application of XML, its elements and attributes being almost
identical to those of HTML [1]. XHTML is therefore XML-based and
is ultimately designed to work with XML-based user agents. In
terms of its origins, XHTML may be viewed as a steadying
influence upon the increasingly uncontrolled HTML, which itself
evolved as a simpler subset of structural and semantic tags drawn
from the rich and flexible, but for many over-complex, Standard
Generalised Markup Language (SGML). Because HTML has developed so
rapidly since 1990 to meet needs far beyond its original purpose,
the resulting plethora of new elements has caused compatibility
difficulties [3].
XHTML 1.0 can be seen as a solution to the growing problems
surrounding HTML, which is becoming less appropriate for use on
new platforms. The rationale for XHTML is its ability to
accommodate the new elements and attributes that XML permits.
XHTML accommodates such extensions through modularisation: XHTML
modules permit developers to combine new feature sets with
existing ones [3].
Bearing in mind the expectation of a far greater variety of
platforms for viewing Internet documents than at present, the
XHTML "family" will be able to offer user agent interoperability
using features such as user agent and document profiling
mechanisms [3]. So XHTML 1.0 may be regarded as a bridge provided
for web developers who are keen to promote a modular and
extensible web based on XML whilst maintaining compatibility with
HTML 4.0 browsers [1].
|
Brief Overview of Functionality
|
The chief benefit of XHTML is its extensibility. For HTML to
accept a new group of elements, its DTD (Document Type
Definition) would have to be altered. XHTML, as an application of
XML, itself a simplified subset of SGML, facilitates the
development and integration of new elements. Portability is also
an increasingly important issue for HTML. Up until now, the
increased computing power of desktops has compensated for the
overloaded and ill-structured nature of much of the HTML code
that abounds. However, it is predicted that in the next few years
the overwhelming majority of devices accessing the Internet, for
example palmtops and WAP phones, will lack the power that allows
modern desktop PCs to accommodate HTML's failings [1].
The modular design of XHTML reflects the realisation that a
one-size-fits-all approach will no longer work on the Web where
browsers vary enormously in their capabilities. For example, a
browser in a PDA (Personal Digital Assistant) cannot offer the
same experience as a high-end multimedia desktop computer because
of the differences in the screens and memory [12].
It is possible for anyone to extend XHTML; authors can define
new tags and attributes and are able to embed content and
programming in a Web page, combining HTML 4.0 elements with those
from other XML languages [1]. The criteria for conformance or the
rules for creating one's own XHTML Family have been published
recently [5][6].
One tool that is already proving popular is HTML Tidy by Dave
Raggett [8], which cleans up HTML and converts it to XHTML; it is
hosted by the W3C HTML Activity [2]. A validation tool [9] is
also available from W3C. It allows users to validate a document
against its document type definition, offers a number of options
and may be considered one of the more reliable validators
available. It is also possible to link web pages to it via
http://validator.w3.org/check/referer [7]. Another possibility is
Mozquito Factory [10].
HTML-Kit [11] is a freely available program for Windows 9x/NT,
designed to help HTML authors to edit, format, validate, preview
and publish documents on the Web. It includes a customizable GUI
to HTML Tidy for converting documents from HTML to XHTML 1.0.
Among its viewing options is a split-window view, with the
original markup in one window and the markup after transformation
in the other. Errors and suggestions for improving the markup are
reported in a separate window [12].
With regard to document conformance, it is possible to use
XHTML with other namespaces, for example in order to include
metadata expressed in, say, RDF in an XHTML document. Whilst such
documents are not strictly conformant with the XHTML 1.0
specification, W3C is undertaking work to specify conformance for
documents employing multiple namespaces, bringing this
functionality into compliance with strict XHTML [7]. XHTML
document authors are advised to use an XML declaration in all
their documents [3].
A key feature of XHTML will be modularisation. XHTML can be
arranged into a series of modules organised by different areas of
markup such as paragraphs, hypertext links, images and so forth.
Each module contains a number of elements; for example, the Text
Module will include h1 to h6, pre, em, etc. In this way modules
provide a means of sub-setting and extending HTML, and hence of
widening its reach to non-PC platforms [2].
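The idea of sub-setting by module can be sketched in a few lines of Python. This is only an illustration: the module table below is abridged, and its names and contents (beyond the Text Module elements mentioned above) are assumptions for the sketch rather than the normative lists in the W3C specification. The HANDHELD and DESKTOP profiles are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged, illustrative module table (an assumption for this sketch,
# not the normative module definitions).
MODULES = {
    "structure": {"html", "head", "title", "body"},
    "text": {"h1", "h2", "h3", "h4", "h5", "h6", "p", "pre", "em"},
    "hypertext": {"a"},
    "image": {"img"},
}

def supported_elements(module_names):
    """Union of the element names offered by the chosen modules."""
    elements = set()
    for name in module_names:
        elements |= MODULES[name]
    return elements

def unsupported_elements(document, module_names):
    """Return the element names used in `document` that the profile lacks."""
    allowed = supported_elements(module_names)
    root = ET.fromstring(document)
    used = set()
    for node in root.iter():
        # ElementTree reports names as "{namespace}local"; keep the local part.
        used.add(node.tag.rsplit("}", 1)[-1])
    return sorted(used - allowed)

PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><h1>Title</h1><p>Text with <em>emphasis</em> and "
    '<img src="logo.png" alt="logo"/></p></body></html>'
)

# Hypothetical device profiles built from the modules above.
HANDHELD = ["structure", "text", "hypertext"]
DESKTOP = ["structure", "text", "hypertext", "image"]

if __name__ == "__main__":
    print(unsupported_elements(PAGE, HANDHELD))
    print(unsupported_elements(PAGE, DESKTOP))
```

A device that supports only some modules can thus tell in advance which parts of a page it cannot handle, instead of failing on unknown markup.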
XHTML 1.0 specifies three XML document types which correspond
to the three HTML 4.0 document type definitions, namely Strict,
Transitional and Frameset. Strict promotes really tidy markup,
unhindered by presentational clutter, and is to be used with
Cascading Style Sheets (CSS). Transitional, by contrast, allows
the author to take advantage of HTML's presentational features
for the benefit of readers whose browsers do not support CSS.
Finally, Frameset permits the partition of the browser window
into two or more frames. W3C intends to deprecate DTDs in XML in
favour of XML Schemas [1].
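As a rough illustration, the three document types are selected by the DOCTYPE declaration at the top of a page. The Python sketch below records the public and system identifiers given in the XHTML 1.0 Recommendation [3] and builds a minimal Strict page. The helper names doctype and strict_page, and the placeholder body text, are assumptions made for this example. The standard-library parser used here checks well-formedness only; it does not validate against the DTD.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Public and system identifiers for the three XHTML 1.0 document types.
DOCTYPES = {
    "strict": ("-//W3C//DTD XHTML 1.0 Strict//EN",
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    "transitional": ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    "frameset": ("-//W3C//DTD XHTML 1.0 Frameset//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"),
}

def doctype(flavour):
    """Return the DOCTYPE declaration for one of the three document types."""
    public_id, system_id = DOCTYPES[flavour]
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">'

def strict_page(title_text):
    """Build a minimal Strict page; the body is placeholder content."""
    return "\n".join([
        '<?xml version="1.0" encoding="UTF-8"?>',
        doctype("strict"),
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        f"<head><title>{escape(title_text)}</title></head>",
        "<body><p>Placeholder paragraph.</p></body>",
        "</html>",
    ])

if __name__ == "__main__":
    for flavour in DOCTYPES:
        print(doctype(flavour))
    # Parsing confirms the skeleton is well-formed XML.
    root = ET.fromstring(strict_page("Demo").encode("utf-8"))
    print(root.tag)
```

Note that a Frameset document would use a frameset element in place of body, so the skeleton above suits Strict (and Transitional) pages only.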
|
Deployment
|
Despite its ability to accommodate HTML 4.0, it should be
remembered that XHTML is nonetheless an XML application and that
it will not tolerate certain loose practices such as overlapping
markup. XML insists on well-formed markup, which basically means
properly nested and closed markup [3]. However, the demands this
makes are not too onerous: elements must be nested properly, and
element and attribute names must be in lower case. Every element
must have an end tag unless it is declared EMPTY in the document
type definition. All attribute values must be quoted, even
numerical ones, and all attribute-value pairs must be written in
full, i.e. without minimisation: <dl compact> is not allowed, and
<dl compact="compact"> is required instead. Finally, empty
elements require either an end tag or a terminated start tag,
e.g. <br/> [3].
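The rules above can be tried out with any XML parser. The following Python sketch, using the standard-library xml.etree.ElementTree module, pairs HTML-style fragments that break each rule with an XHTML-style correction and reports which ones parse. The helper name is_well_formed and the sample fragments are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if `fragment` parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True

# Each entry: rule -> (HTML-style markup, XHTML-style correction).
EXAMPLES = {
    "proper nesting": (
        "<p><b>bold <i>both</b> italic</i></p>",
        "<p><b>bold <i>both</i></b><i> italic</i></p>",
    ),
    "end tags required": (
        "<ul><li>one<li>two</ul>",
        "<ul><li>one</li><li>two</li></ul>",
    ),
    "quoted attribute values": (
        "<td colspan=2>x</td>",
        '<td colspan="2">x</td>',
    ),
    "no attribute minimisation": (
        "<dl compact><dt>t</dt><dd>d</dd></dl>",
        '<dl compact="compact"><dt>t</dt><dd>d</dd></dl>',
    ),
    "terminated empty elements": (
        "<p>one<br>two</p>",
        "<p>one<br/>two</p>",
    ),
}

if __name__ == "__main__":
    for rule, (html_style, xhtml_style) in EXAMPLES.items():
        print(rule, is_well_formed(html_style), is_well_formed(xhtml_style))
```

One caveat: the lower-case rule is a matter of validity against the XHTML DTD, not of well-formedness, so a fragment such as <P>x</P> passes this check even though it is not valid XHTML.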
Provided the XHTML code is well-formed, it is not complicated
to add a new set of elements to an existing document type
definition, so long as they are internally consistent. As a
consequence the development and integration of new element
collections is far easier in XHTML [1]. Moreover, the benefit of
extensibility comes without any obligation to support the entire
language [7]. With XHTML 1.0, elements from other XML
vocabularies can be added via XML Namespaces without altering the
entire DTD on which the document is based. As XHTML is modular,
it can be used in conjunction with other XML applications such as
Mathematical Markup Language (MathML), Scalable Vector Graphics
(SVG) and Resource Description Framework (RDF), among others
[12]. Note that whereas in SGML a DTD is able to exclude specific
elements from being contained within an element, no such
exclusions are possible in XML [3].
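To make the namespace mechanism concrete, the Python sketch below parses a small XHTML page that embeds a MathML fragment and groups its element names by namespace. The sample document and the helper name elements_by_namespace are assumptions made for this illustration; the two namespace URIs are the ones W3C assigns to XHTML and MathML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample page mixing two XML vocabularies.
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Mixed namespaces</title></head>
  <body>
    <p>The area of a circle:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>A</mi><mo>=</mo><mi>&#960;</mi><msup><mi>r</mi><mn>2</mn></msup>
    </math>
  </body>
</html>
"""

def elements_by_namespace(document):
    """Group element names, in document order, by namespace URI."""
    root = ET.fromstring(document)
    groups = {}
    for node in root.iter():
        if node.tag.startswith("{"):
            # ElementTree writes qualified names as "{namespace}local".
            namespace, _, local = node.tag[1:].partition("}")
        else:
            namespace, local = "", node.tag
        groups.setdefault(namespace, []).append(local)
    return groups

if __name__ == "__main__":
    for namespace, names in elements_by_namespace(DOCUMENT).items():
        print(namespace, names)
```

Because each element carries its namespace, a user agent can hand the math element to a MathML processor while treating the rest as ordinary XHTML, without any change to the XHTML DTD itself.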
XHTML support for document profiles is also a benefit. A
document profile permits the specification of syntax and
semantics for a set of documents and also the facilities
necessary to process documents of that set, facilities such as
levels of scripting, style sheet support, etc. Conformance to
such profiles promotes interoperability as well as allowing
groups with common aims to build a profile comprising standard
HTML elements plus those elements useful and specific to that
group as an extension [3].
However, one difficulty identified relates to fragment
identifiers: the XHTML 1.0 Specification admits that the
constraint attached to the attribute type NMTOKEN cannot be
expressed in the XHTML 1.0 DTDs. As a consequence there may be
problems when converting existing HTML documents. A second
difficulty concerns Cascading Style Sheets, which define style
properties and are applied to the parse tree of XHTML and XML
documents. Differences in parsing will produce different results
according to the selectors used, though the specification does
offer suggestions to mitigate these effects [3].
A further possible disadvantage may arise with certain
editors: XHTML's insistence on quoting attribute values, even
numeric ones, may prove irritating for those whose HTML editors
remove quotes from numeric values, whatever their documentation
might claim to the contrary [7].
There is a body of positive industry support for XHTML,
including testimonials [13] published by W3C as well as the
endorsement of W3C's Director, Tim Berners-Lee: "XHTML
1.0 connects the present Web to the future Web. It provides the
bridge to page and site authors for entering the structured data,
XML world, while still being able to maintain operability with
user agents that support HTML 4." [14]
|
Related Standards
|
SGML (Standard Generalised Markup Language)
is a meta-language for describing markup languages, particularly
those used in electronic document exchange, document management,
and document publishing. It is both feature-rich and flexible.
This flexibility, however, comes with a level of complexity that
has inhibited its adoption on the Web.
HTML is an SGML application, and is widely regarded as the
standard publishing language of the Web. HTML was originally
conceived as a language for the exchange of scientific and
technical documents, suitable for use by non-document
specialists. HTML addressed the problem of SGML complexity by
specifying a small set of structural and semantic tags suitable
for authoring relatively simple documents. In addition to
simplifying the document structure, HTML added support for
hypertext. Later, multimedia capabilities were added [12].
Whilst there can be no denying the success of HTML since its
inception in 1990, it no longer proves a reliable basis for the
deployment of complex web-based applications on the Internet.
Indeed it has long outgrown its original rationale and developed
into something of an unwieldy monster that some browsers and
spiders find increasingly difficult to handle. HTML lacks the
structured set of rules inherent in XML with which to define
properly all data placed on the Web. Unlike XML, HTML does not
allow a new set of markup to be introduced to suit a particular
purpose [1].
XML, at base, is so important because it delivers a
standardised markup capable of separating layout code from
content, thereby making document creation and parsing much easier
[7]. The difficulty remained, however, of what was to be done
about the very large body of HTML Web documents that ought
ideally to migrate to XML; such a transition would pose problems.
XHTML can be seen as the next-generation language for Web
documents, one that does not render such a large body of HTML
documents obsolete [12].
See also: XML [UKOLN XML review]
|
Relevance to IMesh context
|
In the future the definition of modules and a mechanism for
combining them will permit the extension and sub-setting of XHTML
in a uniform fashion. The use of sub-sets of XHTML through
modularisation will prove important as a significant number of
user agents no longer operate on PCs. For example, hand-held
devices and cell phones will most likely support only subsets of
XHTML elements. Such modularisation confers a number of benefits:
not only does it provide a formal mechanism for sub-setting and
modularising XHTML, but it also simplifies transformation between
document types and promotes the re-use of modules in new document
types [3].
It is anticipated that W3C work on new modules for XHTML, e.g.
XHTML events and also work on better ways to handle compound
documents will result in a specification for XHTML 2.0.
Development of a CC/PP (Composite Capability/Preference Profiles,
[4]) vocabulary for XHTML is also anticipated, thus making it
practical to use CC/PP to specify which XHTML modules are
supported by a device [2].
Work on modularisation of XHTML will be boosted by activity on
XML Schemas for XHTML. W3C envisages new features, related to
areas such as synchronised multimedia, privacy preferences, etc.,
which will give XHTML greater appeal [2].
XHTML may be viewed as a crossover between the point where
HTML 4.0 "runs out of road" (particularly for some current, and
indeed future, web-based applications) and the point where XML
has made its mark with many developers. Its backward
compatibility with HTML, combined with its conformance with XML,
may represent for some its single most attractive quality. XHTML
provides a bridge, serving as an application of XML for the
purpose of expressing web pages [1].
|
References
|
[1] Introduction to XHTML, with eXamples,
Alan Richmond, Web Developers' Virtual Library, February,
2000
http://wdvl.com/Authoring/Languages/XML/XHTML/
[2] HyperText Markup Language Activity Statement, W3C,
2000
http://web4.w3.org/MarkUp/Activity.html#future
[3] XHTML 1.0 W3C Recommendation, January 2000
http://www.w3.org/TR/xhtml1/
[4] Composite Capability/Preference Profiles (CC/PP): A user
side framework for content negotiation, W3C Note, July 1999
http://www.w3.org/TR/NOTE-CCPP/
[5] Extending XHTML: Using Modularization and Schemas to Good
Use in Current and Future XHTML/XML Documents, August 2000, Sean
B. Palmer
http://xhtml.waptechinfo.com/extxhtml/#future
[6] Modularization of XHTML, W3C Candidate Recommendation,
October 2000
http://www.w3.org/TR/xhtml-modularization/conformance.html#s_conform_naming_rules
[7] XHTML: The Clean Code Solution, Peter Wiggin, Web
Development Manager, O'Reilly Network, April, 2000
http://www.oreillynet.com/pub/a/network/2000/04/28/feature/xhtml_rev.html?page=1
[8] HTML Tidy, Dave Raggett
http://www.w3.org/People/Raggett/tidy/
[9] W3C HTML Validation Service
http://validator.w3.org/
[10] Mozquito Factory
http://www.mozquito.org/factory/index.html
[11] HTML-Kit
http://www.chami.com/html-kit/
[12] The Emperor Has New Clothes : HTML Recast As An XML
Application, Pankaj Kamthan, Department of Computer Science,
Concordia University, Montreal
http://indy.cs.concordia.ca/kamthan/publ/html-xml/index.html
[13] W3C Testimonials for XHTML 1.0
http://www.w3.org/2000/01/xhtml-test.html
[14] W3C Press Release, 26 January 2000
http://www.w3.org/2000/01/xhtml-pressrelease.html.en2