The IMesh Toolkit

XHTML

Overall Purpose

Having gone through several stages of evolution, HTML reached version 4.0 as a W3C Recommendation in December 1997. By 1999 HTML 4 had been recast in XML, and the resulting XHTML 1.0 is now also a W3C Recommendation; work on modularising XHTML followed. XHTML can be used in existing browsers with only a small number of changes from HTML 4.0 syntax. W3C's intention in championing XHTML is to promote stricter standards and cleaner code, and so to provide richer web pages across an ever-widening variety of browser platforms [2].

XHTML is in effect a reformulation of HTML 4.0 as an application of XML, its elements and attributes being almost identical to those of HTML [1]. XHTML is therefore XML-based and is ultimately designed to work with XML-based user agents. In terms of its origins, XHTML may be viewed as a steadying influence upon an increasingly uncontrolled HTML. HTML itself began as a small set of structural and semantic tags defined as an application of the Standard Generalised Markup Language (SGML), which is rich and flexible but, for many, over-complex. Because HTML has developed so rapidly since 1990 to meet needs well beyond its original purpose, the resulting plethora of new elements has caused compatibility difficulties [3].

XHTML 1.0 can be seen as a response to the growing problems surrounding HTML, which is becoming less appropriate for use on new platforms. The rationale for XHTML is its ability to accommodate the new elements or additional element attributes that XML permits, and it does so through modularisation. XHTML modules will therefore permit developers to combine new feature sets with existing ones [3].

Given the expectation that Internet documents will be viewed on a far greater variety of platforms than at present, the XHTML "family" will be able to offer user agent interoperability through features such as user agent and document profiling mechanisms [3]. XHTML 1.0 may therefore be regarded as a bridge for web developers who are keen to promote a modular and extensible web based on XML whilst maintaining compatibility with HTML 4.0 browsers [1].

Brief Overview of Functionality

The chief benefit that derives from using XHTML is its extensibility. For HTML to accept the addition of a new group of elements, its DTD (Document Type Definition) would have to be altered. XHTML, as an application of XML (itself a simplified subset of SGML), facilitates the development and integration of new elements.

Portability is an increasingly important issue for HTML. Up until now, the increased computing power of desktops has compensated for the overloaded and ill-structured nature of much of the HTML code that abounds. However, it is predicted that in the next few years the overwhelming majority of devices accessing the Internet, for example palmtops and WAP phones, will not have the power of modern desktop PCs and so will not be able to accommodate HTML's failings [1]. The modular design of XHTML reflects the realisation that a one-size-fits-all approach will no longer work on the Web, where browsers vary enormously in their capabilities. For example, a browser in a PDA (Personal Digital Assistant) cannot offer the same experience as a high-end multimedia desktop computer because of the differences in screen and memory [12].

It is possible for anyone to extend XHTML; authors can define new tags and attributes and are able to embed content and programming in a Web page, combining HTML 4.0 elements with those from other XML languages [1]. The criteria for conformance or the rules for creating one's own XHTML Family have been published recently [5][6].

One tool already proving popular is HTML Tidy by Dave Raggett [8], hosted by the W3C HTML Activity [2], which cleans up HTML and converts it to XHTML. A validation tool [9] is also available from W3C. It allows users to validate a document against its document type definition, offers a number of options and may be considered one of the more reliable validators available. It is also possible to link web pages to it via http://validator.w3.org/check/referer [7]. Another possibility is Mozquito Factory [10].

HTML-Kit [11] is a freely available program for Windows 9x/NT, designed to help HTML authors edit, format, validate, preview and publish documents on the Web. It includes a customizable GUI for HTML Tidy for converting documents from HTML to XHTML 1.0. Among its viewing options is a split-window view, with the original markup in one window and the transformed markup in the other. Errors and suggestions for improving the markup are reported in a separate window [12].

With regard to document conformance, it is possible to use XHTML with other namespaces, for example in order to include metadata expressed in, say, RDF within an XHTML document. Whilst such documents are not strictly conformant with the XHTML 1.0 specification, W3C is undertaking work to specify conformance for documents employing multiple namespaces, which would bring this functionality into compliance with strict XHTML [7]. XHTML document authors are advised to use XML declarations in all their documents [3].
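As an illustration of mixing namespaces, the following Python sketch parses a small XHTML document that carries RDF metadata in its head and reads the metadata back by namespace. The sample document, the example.org identifier, the choice of Dublin Core properties and the helper name extract_metadata are all assumptions made for this illustration. The document is well-formed XML, but, as noted above, it would not validate against the XHTML 1.0 DTDs.

```python
import xml.etree.ElementTree as ET

# Namespace names for XHTML, RDF and Dublin Core.
NAMESPACES = {
    "x": "http://www.w3.org/1999/xhtml",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical page: XHTML with an RDF description embedded in <head>.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Example gateway record</title>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.org/record/1">
        <dc:title>Example gateway record</dc:title>
        <dc:creator>A. N. Author</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </head>
  <body><p>Record body.</p></body>
</html>"""


def extract_metadata(markup):
    """Return (XHTML title, dict of Dublin Core properties) from the markup."""
    root = ET.fromstring(markup.encode("utf-8"))
    title = root.find("x:head/x:title", NAMESPACES).text
    dc_fields = {
        element.tag.split("}")[1]: element.text  # strip the "{namespace}" prefix
        for element in root.iterfind(".//rdf:Description/*", NAMESPACES)
    }
    return title, dc_fields


title, dc_fields = extract_metadata(DOCUMENT)
print(title)
print(dc_fields)
```

Because every element is qualified by its namespace, an XML-aware user agent can tell the XHTML title from the Dublin Core title even though both use the local name "title".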

A key feature of XHTML will be modularisation. XHTML can be arranged into a series of modules, each covering a different area of markup such as paragraphs, hypertext links or images. Each module contains a number of tags; for example, the Text Module will include the tags h1 to h6, pre, em, etc. In this way modules provide a means of sub-setting and extending HTML, and hence of widening its reach to non-PC platforms [2].
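The idea of sub-setting by module can be sketched in Python as follows. The module table is a deliberately abbreviated, illustrative subset rather than the full definition in the Modularization specification, and the function name unsupported_elements, the sample page and the "device profiles" are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative module table (not the complete W3C definitions).
MODULES = {
    "structure": {"html", "head", "title", "body"},
    "text": {"h1", "h2", "h3", "h4", "h5", "h6", "p", "pre", "em",
             "strong", "div", "span", "br"},
    "hypertext": {"a"},
    "list": {"ul", "ol", "li", "dl", "dt", "dd"},
    "image": {"img"},
}

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Sample</title></head>
<body>
  <h1>Heading</h1>
  <p>Text with <em>emphasis</em> and <a href="#top">a link</a>.</p>
  <img src="logo.png" alt="logo"/>
</body>
</html>"""


def unsupported_elements(markup, supported_modules):
    """List element names used in the markup that no supported module provides."""
    allowed = set().union(*(MODULES[name] for name in supported_modules))
    # Element tags look like "{namespace}local"; keep only the local name.
    used = {element.tag.split("}")[-1] for element in ET.fromstring(markup).iter()}
    return sorted(used - allowed)


# A hypothetical small device supporting only structure and text markup.
print(unsupported_elements(PAGE, ["structure", "text"]))
# A hypothetical desktop browser supporting every module in the table.
print(unsupported_elements(PAGE, list(MODULES)))
```

A check of this kind suggests how a server or authoring tool might decide whether a page fits the subset of XHTML that a given device supports.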

XHTML 1.0 specifies three XML document types which correspond to the three HTML 4.0 document type definitions, namely Strict, Transitional and Frameset. Strict promotes really tidy markup, unhindered by presentational clutter, and is to be used with Cascading Style Sheets (CSS). Transitional, by contrast, allows the author to take advantage of HTML's presentational features for the benefit of readers whose browsers do not support CSS. Finally, Frameset permits the partition of the browser window into two or more frames. W3C intends to deprecate DTDs in XML in favour of XML Schemas [1].
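The three document types are selected by the DOCTYPE declaration at the top of a document. The Python sketch below records the public and system identifiers given in the XHTML 1.0 Recommendation and builds a minimal document around them. The helper names doctype_declaration and minimal_document are assumptions for this illustration, and the document builder is limited to Strict and Transitional because a Frameset document uses a frameset element in place of body.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Public and system identifiers from the XHTML 1.0 Recommendation.
DOCTYPES = {
    "strict": ("-//W3C//DTD XHTML 1.0 Strict//EN",
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    "transitional": ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    "frameset": ("-//W3C//DTD XHTML 1.0 Frameset//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"),
}


def doctype_declaration(flavour):
    """Return the DOCTYPE declaration for 'strict', 'transitional' or 'frameset'."""
    public_id, system_id = DOCTYPES[flavour]
    return f'<!DOCTYPE html PUBLIC "{public_id}"\n    "{system_id}">'


def minimal_document(flavour, title, body_markup):
    """Build a minimal Strict or Transitional document (hypothetical helper)."""
    if flavour == "frameset":
        raise ValueError("a Frameset document uses <frameset>, not <body>")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        + doctype_declaration(flavour) + "\n"
        + '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n'
        + f"<head><title>{escape(title)}</title></head>\n"
        + f"<body>{body_markup}</body>\n"
        + "</html>\n"
    )


document = minimal_document("strict", "Hello & welcome", "<p>First paragraph.</p>")
print(document)

# The result parses as XML; this checks well-formedness only, not DTD validity.
root = ET.fromstring(document.encode("utf-8"))
print(root.tag)
```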

Deployment

Despite its ability to accommodate HTML 4.0, it should be remembered that XHTML is nonetheless an XML application and will not tolerate certain loose practices such as overlapping markup. XML insists on well-formed markup, which essentially means markup that is properly nested and closed [3]. The demands this makes are not too onerous, however. Elements must be nested properly, and element and attribute names must be in lower case. Every element must have an end tag unless it is declared EMPTY in the document type definition. All attribute values must be quoted, even numerical ones, and all attribute-value pairs must be written in full, i.e. without minimisation: <dl compact> is not allowed and <dl compact="compact"> is required. Finally, empty elements require either an end tag or a terminated start tag, e.g. <br/> [3].
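Most of the rules above can be demonstrated with any XML parser. The Python sketch below pairs a loose HTML 4 habit with its XHTML equivalent and reports whether each fragment is well-formed. The fragments and the helper name is_well_formed are assumptions chosen for illustration. The check covers well-formedness only: it does not enforce lower-case names or validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# rule: (loose HTML 4 habit, XHTML 1.0 equivalent)
EXAMPLES = {
    "overlapping elements": ("<p>some <b>bold <i>text</b></i></p>",
                             "<p>some <b>bold <i>text</i></b></p>"),
    "missing end tag": ("<ul><li>one<li>two</ul>",
                        "<ul><li>one</li><li>two</li></ul>"),
    "unquoted attribute value": ("<td rowspan=3>cell</td>",
                                 '<td rowspan="3">cell</td>'),
    "minimised attribute": ("<dl compact><dt>term</dt></dl>",
                            '<dl compact="compact"><dt>term</dt></dl>'),
    "unterminated empty element": ("<p>line<br>break</p>",
                                   "<p>line<br/>break</p>"),
}

results = {
    rule: (is_well_formed(loose), is_well_formed(strict))
    for rule, (loose, strict) in EXAMPLES.items()
}

for rule, (loose_ok, strict_ok) in results.items():
    print(f"{rule}: HTML habit well-formed={loose_ok}, XHTML form well-formed={strict_ok}")
```

In each pair the HTML habit is rejected by the parser while the XHTML form is accepted, which is the practical difference an author meets when moving existing pages to XHTML.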

Nonetheless, provided the XHTML code is well-formed, it is not complicated to add a new, internally consistent set of elements to an existing document type definition. As a consequence the development and integration of new element collections in XHTML is far easier [1]. Moreover the benefit of extensibility comes without any obligation to support the entire language [7]. With XHTML 1.0, via XML Namespaces, elements from other XML vocabularies can be added without altering the entire DTD on which the document is based. As XHTML is modular, it can be used in conjunction with other XML applications such as Mathematical Markup Language (MathML), Scalable Vector Graphics (SVG) and Resource Description Framework (RDF), among others [12]. Note that whereas in SGML a DTD is able to exclude specific elements from being contained within an element, no such exclusions are permissible in XML [3].

XHTML support for document profiles is also a benefit. A document profile specifies the syntax and semantics of a set of documents, together with the facilities necessary to process documents of that set, such as levels of scripting, style sheet support, etc. Conformance to such profiles promotes interoperability. It also allows groups with common aims to build a profile comprising standard HTML elements plus, as an extension, those elements useful and specific to that group [3].

One difficulty identified, however, relates to fragment identifiers: the XHTML 1.0 Specification admits that the constraint attached to the attribute type NMTOKEN cannot be expressed in the XHTML 1.0 DTDs. As a consequence there may be problems when converting existing HTML documents. Similarly, Cascading Style Sheets, which define style properties, are applied to the parse tree of XHTML and XML documents, and differences in parsing will produce different results according to the selectors used, though the specification does offer suggestions to mitigate these effects [3].

A further possible disadvantage may arise with certain editors: XHTML's insistence on developers quoting attribute values, even numeric ones, may prove irritating for those whose HTML editors remove quotes from numeric values, despite whatever their documentation might claim to the contrary [7].

There is a body of positive industry support for XHTML, including testimonials published by W3C [13] and the endorsement of W3C's Director, Tim Berners-Lee: "XHTML 1.0 connects the present Web to the future Web. It provides the bridge to page and site authors for entering the structured data, XML world, while still being able to maintain operability with user agents that support HTML 4." [14]

Related Standards

SGML (Standard Generalised Markup Language) is a meta-language for describing markup languages, particularly those used in electronic document exchange, document management, and document publishing. It is both feature-rich and flexible. This flexibility, however, comes with a level of complexity that has inhibited its adoption on the Web.

HTML is an SGML application, and is widely regarded as the standard publishing language of the Web. HTML was originally conceived as a language for the exchange of scientific and technical documents, suitable for use by non-document specialists. HTML addressed the problem of SGML complexity by specifying a small set of structural and semantic tags suitable for authoring relatively simple documents. In addition to simplifying the document structure, HTML added support for hypertext. Later, multimedia capabilities were added [12].

Whilst there can be no denying the success of HTML since its inception in 1990, it no longer provides a reliable basis for the deployment of complex web-based applications on the Internet. Indeed it has long outgrown its original rationale and developed into something of an unwieldy monster that some browsers and spiders find increasingly difficult to handle. HTML lacks the structured set of rules inherent in XML with which to define properly all data placed on the Web. Unlike XML, HTML does not make it possible to introduce a new set of markup to suit a particular purpose [1].

XML is important, at base, because it delivers a standardised markup capable of separating layout code from syntax, thereby making document creation and parsing much easier [7]. The difficulty remained, however, of what was to be done about the very large body of HTML Web documents that ought ideally to migrate to XML; such a transition would pose problems. XHTML can be seen as the next-generation language for Web documents, one that does not render that large body of HTML documents obsolete [12]. XML [UKOLN XML review]

Relevance to IMesh context

In the future the definition of modules and a mechanism for combining them will permit the extension and sub-setting of XHTML in a uniform fashion. The use of subsets of XHTML through modularisation will prove important because a significant number of user agents no longer run on PCs. For example, hand-held devices and mobile phones will most likely support only subsets of XHTML elements. Such modularisation confers a number of benefits: it provides a formal mechanism for sub-setting and modularising XHTML, simplifies transformation between document types and promotes the re-use of modules in new document types [3].

It is anticipated that W3C work on new modules for XHTML, e.g. XHTML events and also work on better ways to handle compound documents will result in a specification for XHTML 2.0. Development of a CC/PP (Composite Capability/Preference Profiles, [4]) vocabulary for XHTML is also anticipated, thus making it practical to use CC/PP to specify which XHTML modules are supported by a device. [2]

Work on modularisation of XHTML will be boosted by activity on XML Schemas for XHTML. W3C envisages new features, related to areas such as synchronised multimedia and privacy preferences, which will give XHTML greater appeal [2].

XHTML may be viewed as a crossover between the point where HTML 4.0 "runs out of road" (particularly where some current, and indeed future, web-based applications are concerned) and the point where XML has made its mark with many developers. Its combination of backward compatibility with HTML and conformance with XML may represent, for some, its single most attractive quality. XHTML provides a bridge, serving as an application of XML for the purpose of expressing web pages [1].

References

[1] Introduction to XHTML, with eXamples, Alan Richmond, Web Developers' Virtual Library, February, 2000
http://wdvl.com/Authoring/Languages/XML/XHTML/

[2] HyperText Markup Language Activity Statement, W3C, 2000
http://web4.w3.org/MarkUp/Activity.html#future

[3] XHTML 1.0 W3C Recommendation, January 2000
http://www.w3.org/TR/xhtml1/

[4] Composite Capability/Preference Profiles (CC/PP): A user side framework for content negotiation, W3C Note, July 1999
http://www.w3.org/TR/NOTE-CCPP/

[5] Extending XHTML: Using Modularization and Schemas to Good Use in Current and Future XHTML/XML Documents, August 2000, Sean B. Palmer
http://xhtml.waptechinfo.com/extxhtml/#future

[6] Modularization of XHTML, W3C Candidate Recommendation, October 2000
http://www.w3.org/TR/xhtml-modularization/conformance.html#s_conform_naming_rules

[7] XHTML: The Clean Code Solution, Peter Wiggin, Web Development Manager, O'Reilly Network, April, 2000
http://www.oreillynet.com/pub/a/network/2000/04/28/feature/xhtml_rev.html?page=1

[8] HTML Tidy, Dave Raggett
http://www.w3.org/People/Raggett/tidy/

[9] W3C HTML Validation Service
http://validator.w3.org/

[10] Mozquito Factory
http://www.mozquito.org/factory/index.html

[11] HTML-Kit
http://www.chami.com/html-kit/

[12] The Emperor Has New Clothes : HTML Recast As An XML Application, Pankaj Kamthan, Department of Computer Science, Concordia University, Montreal
http://indy.cs.concordia.ca/kamthan/publ/html-xml/index.html

[13] W3C Testimonials for XHTML 1.0
http://www.w3.org/2000/01/xhtml-test.html

[14] W3C Press Release, 26 January 2000
http://www.w3.org/2000/01/xhtml-pressrelease.html.en
