The IMesh Toolkit
Rich Site Summary (RSS)
Overall Purpose
There is an increasing trend towards
interdependency between websites and towards integrating content
originating elsewhere on the Internet. Effective integration
often demands considerable effort from both the information
provider and the recipient. There is a business model for such
syndication on the Web which is not dissimilar to that of the
traditional media syndicates, reflected in specifications such as
the Information and Content Exchange (ICE) protocol [1] and in
companies like iSyndicate.com. Whilst commentators point to the
current paucity of XML applications [2], RSS is proving very
useful in furnishing an open-ended syndication model that is
quite different from these commercial arrangements. [3] As an
application of XML, RSS is in essence a file format that is easily
created, but around it are emerging tools and services that will
allow it to compete with commercial content-sharing systems such
as those mentioned above. [2]
RSS, known either as Rich Site Summary or RDF Rich Site
Summary depending on the version you are reading, is a
lightweight, multipurpose, extensible metadata description and
syndication format. [4] It is becoming a widely known XML
application conformant with W3C's RDF specification. At its
simplest, an RSS summary is a document that describes a channel
comprising items, each of which may be retrieved by its URL. Each
item in turn decomposes into a title, a link and a description.
In the short time that RSS has been evolving, its purposes have
begun to change, and for that reason it may be of interest to the
project.
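To make this minimal definition concrete, the following Python sketch parses a small RSS 0.91-style document with the standard library's xml.etree.ElementTree and extracts each item's title, link and description. The sample document, its example.com URLs and the function name parse_items are hypothetical, chosen for illustration; the layout assumed is that of version 0.91, where the items sit inside the <channel> element.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 0.91 document: one channel holding two items.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.com/</link>
    <description>Headlines from a hypothetical site</description>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/news/1</link>
      <description>Summary of the first story</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/news/2</link>
      <description>Summary of the second story</description>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return a (title, link, description) tuple for each item in the channel."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    items = []
    for item in channel.findall("item"):
        # findtext returns the default when an element is missing, rather than raising
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ))
    return items


for title, link, description in parse_items(SAMPLE_RSS):
    print(f"{title} <{link}>: {description}")
```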
Netscape introduced version 0.9 to give website producers a
wider audience by placing their content, albeit in an abbreviated
or summarised form, on the My Netscape Network (MNN). As a
by-product of the MNN effort, RSS came into use as an XML-based
lightweight syndication format, often more practical than
comparatively heavyweight standards such as the Information and
Content Exchange (ICE) protocol. RSS has found favour carrying a
variety of other content types, for example discussion fora,
software announcements and even proprietary data. With the
introduction of an item-level description element in version
0.91, RSS moved firmly into the field of lightweight content
syndication. RSS derives much of its current success from the
fact that it is a simple XML document. Netscape MNN issued a
statement to the effect that, whilst it had initially and
deliberately restricted the complexity of the RSS format, it
nonetheless planned enhancements. [5] It hoped to work with other
bodies to enhance the format by adding further tags, while making
it more compliant with W3C standards on XML and RDF. It envisaged
richer metadata description, greater search and personalisation
capabilities, sub-channels, related channel information,
keyword-based data collection and PICS ratings, to name some of
its ideas. However, despite being the originator of RSS, MNN has
reduced its effort on RSS, whereas others such as David Winer of
Userland.com have picked up the baton.
The aim of RSS version 1.0 is to provide a lightweight,
extensible syndication and metadata description format, based on
XML and usable in a variety of areas, while remaining backward
compatible with version 0.9. (Backward compatibility rests on a
stipulation that parsers, modules and libraries ignore any
elements that they were not designed to recognise.) Version 1.0
achieves greater extensibility than version 0.91 through
compliance with XML Namespaces and RDF. The use of XML
Namespace-based modules permits RSS to be extended without
repeated modifications to the core RSS specification, without
needless consultation, and without naming collisions or the
danger of overloading RSS with a large number of elements that
are unlikely to gain wide acceptance.
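The backward-compatibility rule (ignore whatever you were not designed to recognise) can be sketched as follows. This is a simplified illustration rather than code from any RSS library: the hypothetical read_item function knows only the three core item elements, and an extra Dublin Core element from a namespaced module is set aside instead of causing an error. For simplicity the core elements here carry no namespace, as in RSS 0.91, and the sample item is invented.

```python
import xml.etree.ElementTree as ET

# Elements this hypothetical older parser was designed to recognise.
KNOWN_ITEM_ELEMENTS = {"title", "link", "description"}

# A hypothetical item carrying an extra element from a namespaced module (Dublin Core).
ITEM_XML = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Headline</title>
  <link>http://www.example.com/news/1</link>
  <dc:creator>A. Author</dc:creator>
  <description>Summary</description>
</item>"""


def read_item(item_xml):
    """Collect recognised child elements and set everything else aside."""
    item = ET.fromstring(item_xml)
    fields, ignored = {}, []
    for child in item:
        if child.tag in KNOWN_ITEM_ELEMENTS:
            fields[child.tag] = (child.text or "").strip()
        else:
            # Namespaced tags appear in ElementTree as "{namespace-uri}localname"
            ignored.append(child.tag)
    return fields, ignored


fields, ignored = read_item(ITEM_XML)
print(fields)
print(ignored)
```

The unrecognised element is reported only so that the sketch can show it was skipped; a real consumer would typically just drop it.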
Brief Overview of Functionality
An RSS 0.91 file holds, in succession, the standard XML
declaration, a document type declaration referring to the
document type definition, and the root element. The root element
contains the <channel> element, which holds four main types of
information: channel information, an optional channel image, a
maximum of 15 items (it is possible to exceed this number, but
the file would then not comply with the My Netscape Network
specification) [3], and an optional form input box (a single
text field and a submit button through which the user can send
feedback to any HTML form action). Channel information covers
elements such as the title, description, channel URL and contact
details, and the optional small channel image gives the image
URL, title, link and dimensions. [2]
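The four kinds of channel information described above can be checked mechanically. The sketch below, using a hypothetical sample channel and a hypothetical summarise_channel function, reports the channel information, the optional image, the item count against the 15-item My Netscape Network limit, and whether a text input box is present. Element names follow RSS 0.91 (image, item, textinput); it is a sketch, not a validator for the full 0.91 document type definition.

```python
import xml.etree.ElementTree as ET

MNN_MAX_ITEMS = 15  # the My Netscape Network item limit described in the text


def summarise_channel(rss_text):
    """Report the four kinds of information held by an RSS 0.91 <channel>."""
    channel = ET.fromstring(rss_text).find("channel")
    image = channel.find("image")
    items = channel.findall("item")
    return {
        # 1. channel information
        "title": channel.findtext("title", default=""),
        "link": channel.findtext("link", default=""),
        # 2. optional channel image
        "image_url": image.findtext("url") if image is not None else None,
        # 3. items, checked against the 15-item limit
        "item_count": len(items),
        "within_mnn_limit": len(items) <= MNN_MAX_ITEMS,
        # 4. optional text input box
        "has_textinput": channel.find("textinput") is not None,
    }


# Hypothetical sample channel with all four parts present.
SAMPLE = """<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical headlines</description>
    <language>en-gb</language>
    <image>
      <title>Example Channel</title>
      <url>http://www.example.com/logo.gif</url>
      <link>http://www.example.com/</link>
      <width>88</width>
      <height>31</height>
    </image>
    <item><title>Headline</title><link>http://www.example.com/news/1</link></item>
    <textinput>
      <title>Search</title>
      <description>Search this site</description>
      <name>q</name>
      <link>http://www.example.com/search</link>
    </textinput>
  </channel>
</rss>"""

print(summarise_channel(SAMPLE))
```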
Content providers such as Slashdot, Wired News and Linux Today
are now using RSS to promote their material. [3] Aggregators such
as my.userland.com use software (in this case UserLand's Frontier
[6]) to gather multiple RSS channels into one location, called an
aggregation. They also provide aggregated feeds, which send new
content to partners through XML-RPC (XML Remote Procedure Call)
function calls and through static XML files generated hourly.
[6] Aggregators are useful not only for the number of feeds they
bring together; they are also likely to offer tools that let
partners customise feeds and reduce their integration effort, and
that let content providers syndicate their information more
easily. [3]
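The core of an aggregation, gathering the items of several channels into one place, might look like the sketch below. It does not reproduce Frontier, XML-RPC delivery or hourly file generation; the make_feed and aggregate helpers and the feed contents are hypothetical, and each merged entry simply records which channel it came from.

```python
import xml.etree.ElementTree as ET


def make_feed(channel_title, headlines):
    """Build a small hypothetical RSS 0.91 document for the demonstration."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for title, link in headlines:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


def aggregate(feeds):
    """Merge the items of several RSS 0.91 channels into one list.

    `feeds` maps a source name to the RSS text fetched from that source.
    """
    aggregation = []
    for source, rss_text in feeds.items():
        channel = ET.fromstring(rss_text).find("channel")
        channel_title = channel.findtext("title", default=source)
        for item in channel.findall("item"):
            aggregation.append({
                "source": source,
                "channel": channel_title,
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            })
    return aggregation


feeds = {
    "site-a": make_feed("Site A", [("A1", "http://a.example.com/1")]),
    "site-b": make_feed("Site B", [("B1", "http://b.example.com/1"),
                                   ("B2", "http://b.example.com/2")]),
}
for entry in aggregate(feeds):
    print(entry["channel"], entry["title"], entry["link"])
```

Once the parsing step exists, adding another channel means adding one more entry to the mapping, which is the economy the aggregators offer their partners.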
New uses continued to be found for RSS, and the title and
description elements at channel and item level came to be
overloaded with metadata and HTML. Some producers ignored any
official constraints and created their own elements outside any
standard in order to improve upon RSS's rather thin metadata
facilities. [4] Because RSS has proven useful and practical, it
has been employed in ways for which it was not designed, and
users wishing to employ it for content metadata propagation have
begun to recognise its current limitations. For example, version
0.91, confined by the restrictions of its document type
definition, cannot provide even a small symbol for each headline
or the author of an article, features that some website
maintainers would regard as standard. In the absence of further
development of the format by Netscape, the aim of 1.0 is to remedy
some of these flaws as well as to provide an extensible
framework. [7] The re-inclusion of RDF in RSS 1.0 should
therefore allow headline writers to include symbols to highlight
their news, which parsers that do not recognise such symbols can
ignore without mishap, something which was not possible with
straightforward XML parsing.
The solutions available to deal with this problem came down to
two quite different approaches. One possibility was simply to add
more elements to the core RSS specification. However, although
this represented an apparently straightforward solution, it had
drawbacks: alterations would have to be made to the core format
and, in the long run, this approach would make scalability
difficult to assure.
The authors of the RSS 1.0 specification decided instead to
modularise RSS, using XML Namespaces to partition vocabularies. A
given task could then be matched to the set of modules deemed
most appropriate to it. Placing specific functionality in
pluggable modules meant that no alteration to the core format
would be required. Version 1.0 offers two modules, Dublin Core
and Syndication. [8]
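A module's elements are distinguished from the core vocabulary by their namespace. The sketch below reads an RSS 1.0 item that uses the Dublin Core module; the namespace URIs are those used for RDF, RSS 1.0 and Dublin Core, while the sample document and the item_metadata function are hypothetical. In RSS 1.0 the items are siblings of the channel under rdf:RDF, and the channel lists them in an rdf:Seq.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, the RSS 1.0 core and the Dublin Core module.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A small, hypothetical RSS 1.0 document using the Dublin Core module.
RSS10 = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://www.example.com/">
    <title>Example Channel</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical headlines</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.example.com/news/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.example.com/news/1">
    <title>Headline</title>
    <link>http://www.example.com/news/1</link>
    <dc:creator>A. Author</dc:creator>
  </item>
</rdf:RDF>"""


def item_metadata(rss_text):
    """Read core RSS 1.0 fields plus the optional dc:creator of each item."""
    root = ET.fromstring(rss_text)
    results = []
    # Items are direct children of rdf:RDF, alongside the channel
    for item in root.findall("rss:item", NS):
        results.append({
            "about": item.get("{%s}about" % NS["rdf"]),
            "title": item.findtext("rss:title", namespaces=NS),
            # A consumer unaware of Dublin Core would simply never look this up
            "creator": item.findtext("dc:creator", default=None, namespaces=NS),
        })
    return results


print(item_metadata(RSS10))
```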
Nonetheless, there is still pressure for RSS to furnish a
richer representation of the relationships between elements, for
example in threaded discussions. But at least RSS 0.9 provided a
simple RDF base upon which a more complex framework could be
layered. Version 1.0, having re-adopted the fuller RDF framework,
is expected to provide for richer metadata modelling, that is, a
more detailed representation of the relationships between and
within channel elements that some advanced RSS applications will
require.
Modules should reside in their own XML Namespaces,
compartmentalising functionality by the task that functionality
must perform. Of the two modules offered by the version 1.0
specification, the Syndication module's chief elements are those
that give instructions about updating, that is, how frequently
aggregators should revisit the RSS feed. The Syndication module
can therefore, in theory, set the update period anywhere from
hourly to yearly (!), as opposed to v.0.91's hour and day
intervals. [9] There is no obligation on other modules to adopt
the full RDF framework, but it is sensible to develop modules in
such a way that they do not obstruct RDF parsers. (A practical
precaution is therefore to add the attribute
rdf:parseType="Literal" to any element containing XML which is
not meant to be interpreted.)
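As a worked example of the Syndication module, an aggregator could derive a revisit interval by dividing the update period by the update frequency, so that an hourly period with a frequency of 2 means roughly every 30 minutes. The sketch assumes the sy:updatePeriod and sy:updateFrequency elements with defaults of daily and 1, which is our reading of the module draft [9]; the 30-day month, the 365-day year, the refresh_interval name and the sample channel are simplifying assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta

SY = "http://purl.org/rss/1.0/modules/syndication/"

# Length of each updatePeriod value; month and year lengths are approximations.
PERIODS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}


def refresh_interval(channel):
    """Suggested gap between visits: the update period divided by its frequency."""
    # Assumed defaults when the elements are absent: daily, once per period
    period = channel.findtext("{%s}updatePeriod" % SY, default="daily").strip()
    frequency = int(channel.findtext("{%s}updateFrequency" % SY, default="1"))
    if period not in PERIODS or frequency < 1:
        raise ValueError(f"unsupported schedule: {period!r} x {frequency}")
    return PERIODS[period] / frequency


# Hypothetical channel asking to be revisited twice an hour.
CHANNEL = """<channel xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
  <title>Example Channel</title>
  <sy:updatePeriod>hourly</sy:updatePeriod>
  <sy:updateFrequency>2</sy:updateFrequency>
</channel>"""

print(refresh_interval(ET.fromstring(CHANNEL)))  # prints 0:30:00
```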
A reassuring feature, when deciding between a rich and a simple
content model, is the opportunity RSS 1.0 provides to improve on
a simple model later, either with a secondary module or by
turning a flat structure of elements into a taxonomy module.
With regard to tools for RSS, several are in development or
newly developed, such as Jonathan Eisenzopf's XML::RSS Perl
module, which underpins many applications generating or
processing RSS. [7] This module simplifies the maintenance of RSS
files by abstracting XML syntax into a number of class methods,
one for each of the RSS elements; for example, the channel method
is passed an associative array holding the names and values of
each channel sub-element. [3] The advantage of using XML::RSS is
that it creates valid and well-formed data, an important
consideration with XML. [10] Validators and version converters
are anticipated. [7] (Older RSS formats can be converted to RSS
1.0 using Extensible Stylesheet Language Transformations (XSLT)
stylesheets.) [11] Portal toolkits such as Apache Jetspeed now
incorporate support for RSS, making it easy to import headline
material into a portal. [7] RSSMaker, recently renamed RSS Channel
Editor, is a Web-based tool that makes it easy to create and
maintain RSS files. The program visits sites every couple of
hours, obtains the latest headlines and produces an RSS file for
viewing. It can also simply copy RSS files from other sites. It
includes most of the RSS 0.91 channel elements, and it can save
the RSS file to the Web server's file system instead of prompting
the user to download it via a Web browser. [11]
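The XML::RSS approach of one method per RSS element, with the XML syntax hidden behind them, can be imitated in Python. The SimpleRSS class below is a hypothetical analogue written for illustration, not a port of XML::RSS or its API; keyword arguments play the role of the Perl associative array. Because ElementTree escapes text when serialising, awkward characters in headlines still yield well-formed output.

```python
import xml.etree.ElementTree as ET


class SimpleRSS:
    """Hypothetical analogue of the XML::RSS idea: one method per RSS element."""

    def __init__(self, version="0.91"):
        self._rss = ET.Element("rss", version=version)
        self._channel = ET.SubElement(self._rss, "channel")

    @staticmethod
    def _fill(parent, values):
        # Each name/value pair becomes a child element with text content
        for name, value in values.items():
            ET.SubElement(parent, name).text = str(value)

    def channel(self, **fields):
        # Call this first so that channel information precedes the items
        self._fill(self._channel, fields)

    def image(self, **fields):
        self._fill(ET.SubElement(self._channel, "image"), fields)

    def add_item(self, **fields):
        self._fill(ET.SubElement(self._channel, "item"), fields)

    def textinput(self, **fields):
        self._fill(ET.SubElement(self._channel, "textinput"), fields)

    def as_string(self):
        # ElementTree escapes &, < and > in text, so the output stays well-formed
        return ET.tostring(self._rss, encoding="unicode")


rss = SimpleRSS()
rss.channel(title="Example Channel", link="http://www.example.com/",
            description="Hypothetical headlines", language="en-gb")
rss.add_item(title="Headline <1> & more", link="http://www.example.com/news/1")
xml_text = rss.as_string()
print(xml_text)
```

Parsing the result back with ET.fromstring recovers the original headline text, which is the practical meaning of "well-formed" here.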
Deployment
For the purposes of metadata propagation,
RSS has begun to show its mettle when compared with more common
approaches such as proprietary APIs, fetching and parsing HTML,
and database dumps. Although superior to grabbing and parsing
HTML, APIs can prove problematic even while granting partners
data access. They are language-dependent and are not extensible,
which means users are limited to the data and functionality
originally offered. In organisational terms they also present
staffing difficulties, as they require competencies and expertise
long after their original authors may have departed. Collecting
and parsing HTML from sites remains the most common method of
sharing data. However, such a cut-and-paste method requires the
development and maintenance of an application for each source of
data; this will prove increasingly impractical as sites, and
indeed types of HTML presentation, proliferate. It is possible
for websites to exchange data using database dumps. However, this
requires the conversion of data at both ends and does not
necessarily solve the problem of multiple data formats. As long
as content providers fail to adopt a common data model for such
information exchange, database dumps do not represent a
straightforward approach.
In the RSS approach, each site publishes an RSS file which
describes the contents of its "channel". Other sites can
subscribe to and acquire its contents. The RSS file can be
converted to HTML and displayed on a subscriber's site. Unlike
approaches such as HTML parsing and proprietary APIs, once the
system is developed for one channel, one can subscribe to as many
as one wishes. [3]
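The conversion step, turning a subscribed channel into HTML for display, might be sketched as below. The rss_to_html function, the markup it emits and the sample feed are hypothetical; the point of the design is that one function serves every channel a site subscribes to, and that text from the remote feed is HTML-escaped before it is placed in the page.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical subscribed feed; note the escaped ampersand in the headline.
FEED = """<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical headlines</description>
    <item>
      <title>Fish &amp; Chips</title>
      <link>http://www.example.com/news/1</link>
    </item>
  </channel>
</rss>"""


def rss_to_html(rss_text):
    """Render an RSS 0.91 channel as an HTML fragment of headline links."""
    channel = ET.fromstring(rss_text).find("channel")
    heading = html.escape(channel.findtext("title", default=""))
    lines = [f"<h3>{heading}</h3>", "<ul>"]
    for item in channel.findall("item"):
        # Escape text taken from the remote feed before inserting it into our page
        title = html.escape(item.findtext("title", default=""))
        link = html.escape(item.findtext("link", default=""), quote=True)
        lines.append(f'  <li><a href="{link}">{title}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


print(rss_to_html(FEED))
```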
The RSS 1.0 proposal appears to promote widespread application
of a successful metadata format. It does not require
authorisation of new and specialised tags. It can be expected to
appeal to a wider community than at present, and it provides a
solid framework on which to build metadata distribution. [7] One
commentator regards RSS as currently benefiting from a virtuous
circle: the more infrastructure developers build around it, the
more material publishers will post, since RSS means more of their
material is read. [12] Whilst Netscape remains the de facto
maintainer of the RSS specification, one can anticipate
developers seeking to extend it beyond current applications. [6]
RSS is easy to use as a generic format for content exchange over
the Web. One of its perceived strengths is its simplicity, which
renders it useful for practically any type of material. [2] It
will attract more websites even where Web masters are not steeped
in, or convinced by, XML. [3] RSS can be recommended as a format
for exchanging content between sites since it is already a de
facto standard and is far superior to screen-scraping programs.
[6] Nonetheless, in syndication terms RSS is limited to content
headlines; ICE and XMLNews are more appropriate for larger
syndication operations. [3]
Related Standards
The principal options appear to be the Information and Content
Exchange (ICE) protocol and XMLNews.
Relevance to IMesh context
Whilst there has not been unanimous optimism about the future
of the RSS 1.0 specification [13], the intention of its authors
to bring RSS back to the use of metadata, after its
de-prioritisation in version 0.91, can only be a welcome
development for communities promoting metadata propagation. [14]
Given that the IMesh Toolkit architecture includes the option of
news alerts and news channels, RSS is well worth considering.
Provided the volume involved does not exceed its capabilities,
RSS may well prove a simple and practical option for the project.
The degree of interest being generated in RSS is promising. Above
all, the extensibility of RSS through modularisation is a key
benefit which may well help it avoid the fate of many
technologies, namely "running out of road" once widespread
development and adoption begin.
References
[1] The Information and Content Exchange (ICE) Protocol, W3C
Note, 26 October 1998
http://www.w3.org/TR/1998/NOTE-ice-19981026
[2] "RSS Delivers the XML Promise", Peter Wiggin, Web Review
(Songline Studios), October 1999
http://www.webreview.com/1999/10_29/webauthors/10_29_99_2a.shtml
[3] "Making Headlines with RSS: Using Rich Site Summaries to Draw
New Visitors", Jonathan Eisenzopf, Web Techniques, February 2000
http://www.webtechniques.com/archives/2000/02/eisenzopf/
[4] RSS 1.0 Specification, Release Candidate 1, draft 1.2, the
members of the RSS-DEV Working Group, November 2000
http://www.egroups.com/files/rss-dev/specification.html
[5] "MNN Future Directions", Netscape Netcenter, My Netscape
Network, 1999
http://my.netscape.com/publish/help/futures.html
[6] "Why Would You Use RSS?", Peter Wiggin, Web Review, 29 October
1999
http://www.webreview.com/1999/10_29/webauthors/10_29_99_2b.shtml
[7] "RSS Moves Forward", Edd Dumbill, Edd Dumbill's Weblog,
August 2000
http://edd.oreillynet.com/discuss/msgReader$100
[8] RSS 1.0 Modules, draft 1.2, the members of the RSS-DEV
Working Group, November 2000
http://www.egroups.com/files/rss-dev/Modules/modules.html
[9] RSS 1.0 Modules: Syndication, draft 1.2, the members of the
RSS-DEV Working Group, November 2000
http://www.egroups.com/files/rss-dev/Modules/Standard/mod_syndication.html
[10] "RSS and You", Chris Nandor, January 2000
http://www.perl.com/pub/2000/01/rss.html
[11] "RDF Rich Site Summary (RSS)", Robin Cover, OASIS: The XML
Cover Pages, October 2000
http://www.oasis-open.org/cover/rss.html
[12] "What is RSS?", James Carlyle, XML Tree RSS Resources, 1999
http://www.xmltree.com/rss/
[13] "2000 XML News", Elliotte Rusty Harold, 22 August 2000
http://www.ibiblio.org/xml/news2000.html
[14] "Next Generation of RSS Metadata Format", Edd Dumbill,
xmlhack, August 2000
http://xmlhack.com/read.php?item=708