The IMesh Toolkit

Rich Site Summary (RSS)

Overall Purpose

There is an increasing trend towards interdependency between websites and towards integrating content that originates elsewhere on the Internet. Effective integration often demands considerable effort on the part of both the information provider and the recipient. There is a business model for such syndication on the Web which is not dissimilar to that of the traditional media syndicates, reflected in specifications such as the Information and Content Exchange (ICE) protocol [1] and in companies like iSyndicate.com. Whilst commentators point to the current paucity of XML applications [2], RSS is proving very useful in furnishing an open-ended syndication model that is quite different. [3] RSS is in essence an XML file format that is easily created. But surrounding it are emerging tools and services that will allow it to compete with commercial content-sharing systems such as those mentioned above. [2]

RSS, known either as Rich Site Summary or RDF Rich Site Summary, depending on the version you are reading, is a lightweight multipurpose extensible metadata description and syndication format.[4] It is becoming a widely known XML application conformant with W3C's RDF specification. As a minimum definition, an RSS summary may be termed a document that describes a channel comprising items that may be retrieved by their URL. In turn each item decomposes into title, link and description. In the short time that RSS has been evolving, its purposes have begun to alter and for that reason it may be of interest to the project.
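The minimum definition above can be illustrated with a short sketch. It assumes Python's standard xml.etree.ElementTree module and a simple 0.91-style file; the feed content, the URLs and the function name read_channel are invented for illustration and come from no specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; titles and URLs are invented for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Gateway News</title>
    <link>http://www.example.org/</link>
    <description>Headlines from a hypothetical subject gateway</description>
    <item>
      <title>New records added</title>
      <link>http://www.example.org/news/1</link>
      <description>Fifty new resource descriptions this week.</description>
    </item>
    <item>
      <title>Service downtime</title>
      <link>http://www.example.org/news/2</link>
      <description>Maintenance is planned for Sunday.</description>
    </item>
  </channel>
</rss>
"""


def read_channel(xml_text):
    """Return the channel title and its items as plain dictionaries."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    # Each item decomposes into title, link and description.
    items = [
        {field: item.findtext(field, default="")
         for field in ("title", "link", "description")}
        for item in channel.findall("item")
    ]
    return {"title": channel.findtext("title", default=""), "items": items}
```

Calling read_channel(SAMPLE_RSS) yields one dictionary per item, each retrievable by the URL held in its link field.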

Netscape was responsible for bringing version 0.9 into the world with the purpose of providing website producers with a wider audience by placing their content on the My Netscape Network (MNN), albeit in an abbreviated or summarised form. As a by-product of the MNN effort there evolved the use of RSS as an XML-based lightweight syndication format, often more practical than comparatively heavyweight standards like the Information and Content Exchange (ICE) protocol. RSS has found favour carrying a variety of other content types, for example, discussion fora, software announcements and even proprietary data. With the introduction of an item-level description element in version 0.91, RSS moved firmly into the field of lightweight content syndication. RSS derives much of its current success from the fact that it is a simple XML document. Netscape MNN has issued a statement to the effect that, whilst it initially and deliberately restricted the complexity of the RSS format, it nonetheless planned enhancements.[5] It hoped to work with other bodies to enhance the format by adding further tags, at the same time rendering it more compliant with W3C standards on XML and RDF. It envisaged a richer metadata description, greater search and personalisation capabilities, sub-channels, related channel information, keyword-based data collection and PICS ratings, to name some of its ideas. However, despite being the originator of RSS, MNN has reduced its effort on RSS, whereas others like David Winer of Userland.com have picked up the baton.

The aim of RSS version 1.0 is to provide a lightweight extensible syndication and metadata description format, based on XML, and capable of use in a variety of areas, whilst being backward compatible with version 0.9. (Backward compatibility operates on the basis of a stipulation that parsers, modules and libraries ignore any elements that they were not designed to recognise). Version 1.0 achieves greater extensibility than version 0.91 through compliance with XML Namespaces and RDF. The use of XML Namespace-based modules permits RSS to be extended without repeated modifications to the core RSS specification, and without needless consultation, naming collisions or the danger of overloading RSS with a large number of elements that are unlikely to gain wide acceptance.
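The stipulation that parsers ignore unrecognised elements can be sketched as follows. This is a minimal illustration rather than a conformant RSS 1.0 parser: it assumes Python's xml.etree.ElementTree, and the "ex" namespace, its icon element and the function name read_items are hypothetical.

```python
import xml.etree.ElementTree as ET

RSS_NS = "http://purl.org/rss/1.0/"
# Core item elements this sketch was "designed to recognise".
KNOWN = {f"{{{RSS_NS}}}{name}": name for name in ("title", "link", "description")}

# Hypothetical feed: the ex: namespace stands in for any unknown module.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:ex="http://www.example.org/hypothetical-module/">
  <channel rdf:about="http://www.example.org/news.rss">
    <title>Example Channel</title>
    <link>http://www.example.org/</link>
    <description>A hypothetical channel</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.example.org/news/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.example.org/news/1">
    <title>First headline</title>
    <link>http://www.example.org/news/1</link>
    <ex:icon>http://www.example.org/icons/star.png</ex:icon>
  </item>
</rdf:RDF>
"""


def read_items(xml_text):
    """Collect recognised core elements of each item, skipping all others."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.findall(f"{{{RSS_NS}}}item"):
        record = {}
        for child in item:
            name = KNOWN.get(child.tag)
            if name is not None:  # recognised core element
                record[name] = (child.text or "").strip()
            # Anything else (here ex:icon) is skipped without error.
        results.append(record)
    return results
```

Because the unknown element lives in its own namespace, it cannot collide with a core element name, and an older consumer simply passes over it.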

Brief Overview of Functionality

Like any XML file, the RSS file holds in succession the standard XML declaration, a document type declaration indicating the document type definition, and then the root element. The latter contains the <channel> element, which holds four main types of information: channel information, an optional channel image, a maximum of 15 items (it is possible to exceed this number but this would not be compliant with the My Netscape Network specification) [3], and an optional form input box (which allows user feedback through any HTML form action, limited to a single text field and submit button). Channel information defines such elements as title, description, channel URL, contact details, etc., and the optional small channel image gives the image URL, title, link and dimension parameters. [2]
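A rough check of this channel structure might look like the sketch below. It assumes Python's xml.etree.ElementTree and the 0.91 element names item, image and textinput; treating title, link and description as the required channel information is a simplifying assumption, and check_channel is a hypothetical name.

```python
import xml.etree.ElementTree as ET

MAX_ITEMS = 15  # limit described above for My Netscape Network compliance


def check_channel(xml_text):
    """Return a list of problems found in an RSS 0.91-style document."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        return ["missing <channel> element"]
    problems = []
    # Assumption: these three elements form the minimum channel information.
    for required in ("title", "link", "description"):
        if not channel.findtext(required):
            problems.append(f"channel is missing <{required}>")
    item_count = len(channel.findall("item"))
    if item_count > MAX_ITEMS:
        problems.append(f"{item_count} items exceeds the limit of {MAX_ITEMS}")
    # The image and the form input box are optional but may appear only once.
    if len(channel.findall("image")) > 1:
        problems.append("more than one <image>")
    if len(channel.findall("textinput")) > 1:
        problems.append("more than one <textinput>")
    return problems
```

An empty list means the document passed these few checks; it does not amount to validation against the document type definition.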

Content providers like Slashdot, Wired News and Linux Today are now using RSS to promote their material. [3] Aggregators such as my.userland.com use their own software (in this case Frontier [6]) to gather multiple RSS channels into one location called an aggregation. They also aggregate feeds; these send new content to partners by means of XML-RPC (XML Remote Procedure Call) function calls and through static XML files generated hourly [6]. Aggregators are useful not only for the number of feeds they bring together; they are also likely to offer tools that allow partners to customise feeds and reduce their integration effort, and that permit content providers to syndicate their information more easily.[3]

New uses continued to be found for RSS, and the title and description elements at channel and item level became overloaded with metadata and HTML. Some producers were ignoring any official constraints and creating their own elements outside any standard in order to improve upon RSS's rather thin metadata facilities. [4] As it has proven useful and practical, RSS has been employed in ways for which it was not designed, and users wishing to employ it in content metadata propagation have begun to recognise its current limitations. For example, version 0.91, confined by the restrictions of its document type definition, can provide neither a small symbol for each headline nor the author of an article, matters that some website maintainers would regard as standard. In the absence of Netscape development of the format, the aim of 1.0 is to provide solutions to some of the flaws as well as an extensible framework. [7] The re-inclusion of RDF in RSS 1.0 therefore means that headline writers ought to be able to include symbols to highlight their news, which parsers that do not recognise them can ignore without mishap, something that was not possible with straightforward XML parsing.

The solutions available to deal with this problem came down to two quite differing approaches. One possibility was just to add more simple elements to the core RSS specification. However, although this represented an apparently straightforward solution, there were possible drawbacks since alterations would have to be made to the core format and, in the long run, this approach would create problems in assuring scalability.

The decision by authors of the RSS 1.0 specification was to opt for the modularisation of RSS through the use of XML Namespaces to partition vocabularies. The task in hand could then be matched to the set of modules deemed most appropriate to its execution. This location of specific functionality into pluggable modules meant that no alteration to the core format would be required. Version 1.0 offers two modules, Dublin Core and Syndication [8].

Nonetheless the pressure is still on for RSS to furnish a richer representation of the relationships between elements, for example threaded discussions. But at least RSS v.0.9 provided a simple RDF base upon which a more complex framework could be layered. Version 1.0, having re-adopted the fuller RDF framework, is expected to provide for richer metadata modelling, i.e. a more detailed representation of the relationships existing between and within channel elements that some advanced RSS applications will require.

The use of XML Namespace-based modules permits RSS to be extended without repeated modifications to the core RSS specification. Modules should reside in their own XML Namespaces, compartmentalising functionality by the task that functionality must perform. The version 1.0 specification offers Dublin Core and Syndication modules. The latter's chief elements are those that give instructions about updating, i.e. how frequently the RSS feed should be revisited by aggregators. The Syndication module can therefore, in theory, set this frequency at anything from hourly to yearly (!) intervals, as opposed to v.0.91's hour and day intervals.[9] There is no obligation on other modules to adopt the full RDF framework, but it is sensible to develop modules in such a way that they do not obstruct RDF parsers. (A practical precaution therefore is to place parseType="Literal" in any element containing XML which is not meant to be interpreted).
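As a hedged sketch of how an aggregator might act on the Syndication module, the following reads the sy:updatePeriod and sy:updateFrequency elements and derives a revisit interval. It assumes Python's xml.etree.ElementTree; the defaults of "daily" and 1 when the elements are absent, the rounded month and year lengths, and the names load_channel and revisit_interval are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta

RSS_NS = "http://purl.org/rss/1.0/"
SY_NS = "http://purl.org/rss/1.0/modules/syndication/"

# Approximate period lengths; months and years are rounded for illustration.
PERIOD = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}

# Hypothetical channel using the Dublin Core and Syndication modules.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
  <channel rdf:about="http://www.example.org/news.rss">
    <title>Example Channel</title>
    <link>http://www.example.org/</link>
    <description>A hypothetical channel</description>
    <dc:publisher>Example Gateway</dc:publisher>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>2</sy:updateFrequency>
  </channel>
</rdf:RDF>
"""


def load_channel(xml_text):
    """Return the RSS 1.0 channel element of a document."""
    return ET.fromstring(xml_text).find(f"{{{RSS_NS}}}channel")


def revisit_interval(channel):
    """Time between visits: the period divided by updates per period."""
    # Assumed defaults when the module elements are absent: daily, once.
    period = channel.findtext(f"{{{SY_NS}}}updatePeriod", default="daily").strip()
    frequency = int(channel.findtext(f"{{{SY_NS}}}updateFrequency", default="1"))
    return PERIOD[period] / frequency
```

For the sample above, two updates per hour give a revisit interval of thirty minutes. The module elements sit in their own namespace, so the core channel elements are untouched.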

A reassuring feature for those deciding between a rich and a simple content model is that RSS 1.0 allows a simple model to be improved upon later, either by adding a secondary module or by turning a flat structure of elements into a taxonomy module.

With regard to tools for RSS, several are in development or newly developed, such as Eisenzopf's XML::RSS module which underpins many applications generating or processing RSS. [7] This module simplifies the maintenance of RSS files by abstracting XML syntax into a number of class methods and by applying a method for each of the RSS elements, for example, passing the channel method an associative array in which the names and values of each channel sub-element can be stored.[3] The advantage of using XML::RSS is that it creates valid and well-formed data, an important consideration with XML.[10] Validators and version converters are anticipated. [7] (Older RSS formats can be converted to RSS 1.0 via Extensible Stylesheet Language Transformations (XSLT) stylesheets.) [11] Portal toolkits such as Apache Jetspeed now incorporate support for RSS; it is easy to import headline material into a portal. [7] RSSMaker, recently renamed RSS Channel Editor, is a Web-based tool that makes it easy to create and maintain RSS files. The program visits sites every couple of hours, obtains the latest headlines and produces an RSS file for viewing. It can also simply copy RSS files from other sites. It includes most of the RSS 0.91 channel elements. It can also save the RSS to the Web server's file system instead of prompting one to download it via a Web browser. [11]
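The idea of hiding XML syntax behind class methods can be sketched in Python. This is a loose analogue written for illustration only: the SimpleRSS class and its method names are hypothetical and are not the XML::RSS interface. It relies on Python's xml.etree.ElementTree to escape text, which is one way of ensuring well-formed output.

```python
import xml.etree.ElementTree as ET


class SimpleRSS:
    """Hypothetical helper for building a 0.91-style file; not XML::RSS."""

    def __init__(self, version="0.91"):
        self._root = ET.Element("rss", {"version": version})
        self._channel = ET.SubElement(self._root, "channel")

    def channel(self, **fields):
        """Store channel sub-elements given as name=value pairs."""
        for name, value in fields.items():
            ET.SubElement(self._channel, name).text = value

    def add_item(self, title, link, description=None):
        """Append one item with its title, link and optional description."""
        item = ET.SubElement(self._channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        if description is not None:
            ET.SubElement(item, "description").text = description

    def as_string(self):
        """Serialise the feed; special characters are escaped on output."""
        return ET.tostring(self._root, encoding="unicode")
```

A caller never writes angle brackets by hand, so a title such as "Tom & Jerry" is emitted with the ampersand escaped and the file remains well-formed.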

Deployment

For the purposes of metadata propagation, RSS has begun to show its mettle when compared with more common approaches such as proprietary APIs, fetching and parsing HTML, and database dumps. Whilst superior to grabbing and parsing HTML, APIs can prove problematic even while granting partners data access. They are language-dependent and are not extensible, which means users are limited to the data and functionality they originally offer. In organisational terms they also present staffing difficulties, as they require competencies and expertise long after their original authors may have departed.

Collecting and parsing HTML from sites remains the most common method of sharing data. However such a cut-and-paste method requires the development and maintenance of an application for each source of data; this will prove increasingly impractical as sites and indeed types of HTML presentation proliferate. It is possible for websites to exchange data using database dumps. However this requires the conversion of data at both ends and does not necessarily solve the problem of multiple data formats. Whilst content providers fail to adopt a common data model for such information exchange, database dumps do not represent a straightforward approach.

In the RSS approach, each site publishes an RSS file which describes the contents of its "channel". Other sites can subscribe to and acquire its contents. The RSS file can be converted to HTML and displayed on a subscriber's site. Unlike approaches such as HTML parsing and proprietary APIs, once the system is developed for one channel, one can subscribe to as many as one wishes.[3]
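The point that one conversion routine serves any number of channels can be sketched as follows. The example assumes Python's xml.etree.ElementTree and html.escape, and simple 0.9x-style files without namespaces; the function names and the choice of HTML markup are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape


def channel_to_html(xml_text):
    """Render one channel as an HTML heading plus a list of linked headlines."""
    channel = ET.fromstring(xml_text).find("channel")
    title = escape(channel.findtext("title", default=""))
    lines = [f"<h2>{title}</h2>", "<ul>"]
    for item in channel.findall("item"):
        link = escape(item.findtext("link", default=""), quote=True)
        text = escape(item.findtext("title", default=""))
        lines.append(f'  <li><a href="{link}">{text}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


def render_subscriptions(feeds):
    """Apply the same conversion to every subscribed channel."""
    return "\n".join(channel_to_html(feed) for feed in feeds)
```

Adding a further subscription means adding one more document to the list passed to render_subscriptions; no per-source parsing code is needed, in contrast with scraping each site's HTML.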

The RSS 1.0 proposal appears to promote widespread application of a successful metadata format. It does not require new and specialised tags to be authorised centrally. It can be expected to appeal to a wider community than at present. It provides a solid framework on which to build metadata distribution. [7] One commentator regards RSS as currently benefiting from a virtuous circle: the more infrastructure developers build around it, the more of publishers' material is read, and the more material publishers will post. [12] Whilst Netscape remains the de facto maintainer of the RSS specification, one can anticipate developers seeking to extend it beyond current applications. [6] RSS is easy to use as a generic format for content exchange over the Web. One of its perceived strengths is its simplicity, which renders it useful for practically any type of material.[2] It will attract more websites even where webmasters are not steeped in, or convinced by, XML.[3] RSS can be recommended as a format for exchanging content between sites since it is already a standard and is far superior to screen-scraping programs.[6] Nonetheless, in syndication terms RSS is limited to content headlines. ICE and XMLNews are more appropriate for larger syndication operations. [3]

Related Standards

The principal alternatives appear to be the Information and Content Exchange (ICE) protocol and XMLNews.

Relevance to IMesh context

Whilst there has not been unanimous optimism about the future of the RSS 1.0 specification [13], it is evident that the intention of the authors to bring RSS back to the use of metadata after its de-prioritisation in version 0.91 can only be a welcome development for communities promoting metadata propagation [14]. Given the inclusion in the IMesh Toolkit architecture of the option of news alerts and news channels, RSS is very worthy of consideration. Provided the volume involved does not go beyond the bounds of its capabilities, RSS may well prove a simple and practical option for the project. The degree of interest being generated in RSS is promising. Most of all, the extensibility of RSS through modularisation is a key benefit which may well avoid the fate of many technologies, namely "running out of road" when widespread development and adoption begin to occur.

References

[1] The Information and Content Exchange (ICE) Protocol, W3C Note 26 October 1998
http://www.w3.org/TR/1998/NOTE-ice-19981026

[2] "RSS Delivers the XML Promise", Peter Wiggin, Songline Studios, publishers of Web Review, October 1999
http://www.webreview.com/1999/10_29/webauthors/10_29_99_2a.shtml

[3] Making Headlines with RSS : Using Rich Site Summaries To Draw New Visitors, Jonathan Eisenzopf, Webtechniques, February, 2000
http://www.webtechniques.com/archives/2000/02/eisenzopf/

[4] RSS 1.0 Specification, Release Candidate 1, Authors: The members of the RSS-DEV Working Group: November, 2000, draft 1.2
http://www.egroups.com/files/rss-dev/specification.html#

[5] MNN Future Directions, Netscape Netcenter My Netscape Network,1999
http://my.netscape.com/publish/help/futures.html

[6] "Why Would You Use RSS?" by Peter Wiggin, Oct. 29, 1999
http://www.webreview.com/1999/10_29/webauthors/10_29_99_2b.shtml

[7] "RSS Moves Forward", Edd Dumbill, Edd Dumbill's Weblog, August 2000
http://edd.oreillynet.com/discuss/msgReader$100

[8] RSS 1.0 Modules, Authors: The members of the RSS-DEV Working Group, November 2000, draft 1.2
http://www.egroups.com/files/rss-dev/Modules/modules.html

[9] RSS 1.0 Modules: Syndication, Authors: The members of the RSS-DEV Working Group, November 2000, draft 1.2
http://www.egroups.com/files/rss-dev/Modules/Standard/mod_syndication.html

[10] "RSS and You", Chris Nandor, Jan., 2000
http://www.perl.com/pub/2000/01/rss.html

[11] OASIS: The XML Cover Pages: "RDF Rich Site Summary (RSS)", Robin Cover, October 2000
http://www.oasis-open.org/cover/rss.html

[12] "What is RSS?" James Carlyle, XML Tree RSS Resources,1999
http://www.xmltree.com/rss/

[13] 2000 XML News, 22 August 2000, Elliotte Rusty Harold
http://www.ibiblio.org/xml/news2000.html

[14] "Next Generation of RSS Metadata Format", Edd Dumbill, in "xmlhack", August 2000
http://xmlhack.com/read.php?item=708
