The IMesh Toolkit

General Architectural Overview of the IMesh Toolkit

[Comments within square brackets]

1.0 Introduction and Approach

1.1 Introduction to the IMesh Toolkit Project

....... and place of this deliverable within the project
This deliverable follows on from the Technical Review and the Subject Gateway Requirements Work Package.

This section (1.0 Introduction and Approach) sets the scene by describing the design philosophy used in the project, introduces some concepts such as components, architectural design and interoperability and exposes some of the issues dealt with in the project. Section 2.0 (General Architectural Overview) first describes the high level architecture and functionality that is being proposed. In Section 2.1 .......

1.2 Purpose and Rationale

The general architectural overview.....
shows how the architecture is subdivided into components
provides a common understanding of what services the architecture and its components deliver (for the partners)
assigns responsibilities to components
indicates how the components are expected to interact
explains how this architecture achieves interoperability
guides the reference implementation.

1.2.1 Scope

This deliverable provides a general architectural overview. Some sections of the overview will be elaborated, providing more details of the architecture and the functionality, in this or other work packages.

[Explain iterative approach, refinement/understanding of requirements]

The areas chosen for elaboration will be decided on the basis of:

  1. The outcome of the requirements work package, so that tools delivered by the project will meet the needs of subject gateways. The requirements will indicate not only which components are perceived as being most useful, but also what their functionality should be.
  2. Areas considered to provide core functionality (as opposed to added-value components). There seems to be consensus among partners that these central functions include at least the search-browse components.
  3. The areas of expertise of project partners. Emphasis will be placed on those categories of tools which the partners have experience of delivering, based on their participation in other projects (e.g. ROADS), as well as on staff skill sets and experience (e.g. involvement in RDF development).
  4. The project timescales. The project will deliver reference implementations. It is desired that any such implementations are usable, workable tools rather than underdeveloped demonstrator services [REF.]. Placing the emphasis on quality rather than quantity, more importance will be given to investigating smaller areas of functionality in greater depth, leading to a specification that can be used for software development. The functionality of the whole architecture will be covered in less detail, leaving enough time to develop working implementations of those components which have been specified more fully.

1.3 Architectural Design Principles

[This section describes our component-based approach to building an architecture and how this architecture can provide interoperability - provides some explanation (mostly via definitions) of what components/architecture/interoperability are as background for the reader but also for us to help us understand what we want to achieve in this deliverable!]

General Architecture

Distributed vs. heterogeneous (or federated) systems

Component-based design
Components evolve independently but rely on each other to accomplish larger tasks. To achieve interoperability, the goal is for components to be able to call on one another efficiently and conveniently. [Paepcke - ACM 41(4)]

A software component can be defined as a nontrivial piece of software, a module, a package, or a subsystem that fulfills a clear function, has a clear boundary, and can be integrated into a well-defined architecture.

Component-based frameworks
Component-based framework solutions are partial implementations specifying the nature and way to extend the framework with pluggable components. [Larsen, G, Component-Based Enterprise Frameworks, Communications of the ACM, 43 (10), October 2000.]

Each framework should provide extension-point specifications, the specifications with which components should comply and extend the fundamental functional scenarios. [Larsen, G, Component-Based Enterprise Frameworks, Communications of the ACM, 43 (10), October 2000.]

Designing, building, and testing component-based frameworks require considering several things including static models that illustrate component structure, dynamic models that illustrate component collaboration, as well as the technology to implement the component framework.....component-based frameworks build upon what you know - frameworks build upon experiences and technologies both past and present. [Larsen, G, Component-Based Enterprise Frameworks, Communications of the ACM, 43 (10), October 2000.]

Component-based architectures
A system's architecture encompasses the set of significant decisions about

  1. The organisation of a software system
  2. The selection of the structural elements and their interfaces by which the system is composed
  3. Their behaviour, as specified by collaborations among those elements
  4. The composition of these structural and behavioural elements into progressively larger subsystems
  5. The architectural style that guides this organisation: these elements and their interfaces, their collaborations, and their composition.

1.3.2 Designing for Interoperability

The ultimate goal for systems that are collections of independently developed components, that rely on each other to accomplish larger tasks, is for the components to evolve independently yet be able to call on one another efficiently and conveniently (interoperability). [Paepcke et al ACM 41(4)].

Achieving interoperability: Interoperability solutions and solution classes. [from article by Paepcke et al.]

Strong Standards
This approach relies on the establishment of a standard that achieves a limited amount of homogeneity among heterogeneous components. These standards are established in different ways:

  • A large and diverse enough community agrees that a standard is needed, e.g. Z39.50 for information retrieval.
  • One product gains enough market share that it becomes a de facto standard by virtue of its broad deployment. De facto standards may arise spontaneously if a standard is compelling, easy to deploy, and fills an important need at the right time, e.g. HTML, HTTP, MIME.
  • Government organisations help a standard gain wide acceptance.
Advantages:
Disadvantages:

Families of Standards

External Mediation
Locate the interoperability machinery outside the participating local systems. Mediation machinery mediates between components. Its primary function is the translation of formats and interaction modes. Examples: network gateways, mapping global schemas to local ones.
Advantages:

Specification-based interaction
Thorough description of the semantics and structure of all data and operations.

Mobile Functionality

Comparing interoperability solutions - relationship to design goals
Component Autonomy
Cost of infrastructure and entry.
Ease of Contributing Components
Ease of Use

1.3.3 Heterogeneity in subject-based information gateways

[Sources for this section: ......] This section explains where the differences are in subject gateways; will probably refer to studies/reports from other projects, published literature etc. or our own analysis of software or systems and feedback from the requirements review

1.3.4 Discussion of how to achieve interoperability for the IMesh toolkit

[from the mailing list postings]
[This is my summary of the postings. I have attributed the contributions to the discussion to the person who made them, within the square brackets, the date and time of the posting are given so you can go back to the original posting. Hope I haven't misrepresented anyone. Give a shout if you think I have and feel free to contradict yourself if your thinking has changed since the posting!!]

1.3.4.1 Query Language (in other words, how to express the terms of the search; examples are attribute-value pairs and SQL) [Discussed by Dan Bri on 17/10/00, 20:28]
IDEAL: The query language that we use for expressing searches is protocol-neutral, and IMesh clients can express queries in this language regardless of the native/favourite protocol of the server at which the request is directed.
Options
Option 1: To achieve the ideal we could "invent a query language".
[Dan provides an example in his email]
The requirement for the query language in this option is that it needs to abstract across various protocol types, i.e. a query needs to be constructed in a way that is still expressible in LDAP, WHOIS++ etc. [Issue: what kinds of questions would this query language need to be able to express? If we follow this option we need to make a list of these]
Option 2: Pick a protocol and adopt its language.
Option 3: Avoid specifying query-level interoperability in the IMesh toolkit.
[Dan gives thumbs-down to both options 2 and 3]
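
[Illustration of Option 1. This is a minimal sketch, not the example from Dan's email: it assumes a neutral query built from attribute-value terms combined with AND/OR, and shows the same query rendered once as an LDAP search filter string and once in a simplified "attribute=value and ..." text form. The class and function names (Term, And, Or, to_ldap_filter, to_text_syntax) and the attribute names are hypothetical, and the text form is only a stand-in for a second protocol, not an exact WHOIS++ rendering.]

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    """A single attribute-value constraint, e.g. title = "chemistry"."""
    attribute: str
    value: str


@dataclass
class And:
    """All of the nested clauses must match."""
    clauses: List["Query"]


@dataclass
class Or:
    """At least one of the nested clauses must match."""
    clauses: List["Query"]


Query = Union[Term, And, Or]

# Characters that must be escaped inside an LDAP filter value.
_LDAP_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def to_ldap_filter(query: Query) -> str:
    """Render the neutral query as an LDAP search filter string."""
    if isinstance(query, Term):
        value = "".join(_LDAP_ESCAPES.get(ch, ch) for ch in query.value)
        return f"({query.attribute}={value})"
    operator = "&" if isinstance(query, And) else "|"
    return f"({operator}{''.join(to_ldap_filter(c) for c in query.clauses)})"


def to_text_syntax(query: Query) -> str:
    """Render the same query in a simplified 'attr=value and ...' text form."""
    if isinstance(query, Term):
        return f'{query.attribute}="{query.value}"'
    joiner = " and " if isinstance(query, And) else " or "
    return "(" + joiner.join(to_text_syntax(c) for c in query.clauses) + ")"
```

[The point of the sketch is that the client only ever builds Term/And/Or objects; anything the neutral form can express must have a rendering in every target protocol, which is why the list of question types mentioned in the issue above matters.]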

1.3.4.2 Data Structures within the data stores
The IMesh toolkit infrastructure will have to be neutral with respect to the data structures that gateways want to represent.
Dan [29/10/00, 17:22] proposes that we need to allow records with arbitrarily unpredictable content.
This gives rise to the requirement for a database system with very freeform schemas. [Dan refers us to a book which provides some implementation options: Data on the Web: From Relations to Semistructured Data and XML by Serge Abiteboul, Peter Buneman and Dan Suciu. It would be good to list these options. I've [MB] ordered the book via inter-library loan and am waiting to receive it, but maybe someone else could do the list of options]
[Dan] The alternative is to have gateways agree on rigid vocabularies/schemas.
[Rachel, 29/10/00, 23:52] Agrees that there will be more than one such datastore and on the need to support varied schemas; however, thinks 'unpredictable content' is too strong, and expects re-use of schemas through the use of schema registries.
[Dan also asks] What are the likely characteristics of IMesh tk datastores ?

[Dan, 29/10/00, 17:22] proposes this as a work-item:
The data stored in subject gateways will be complex, and unpredictable. In keeping with the IMesh tk goal of offering content-neutral tools and infrastructure, IMesh tk will explore options for exposing and querying databases that allow subject gateway applications to employ a complex mix of metadata vocabularies. We will explore technology to support search and query of such content, and evaluate existing tools (e.g. RDFdb, Lore, highly generic RDBMS schemata) that might facilitate this.
[I've put this here since it seems to propose finding a solution for the above two issues]
[Rachel, 29/10/00, 23:52] agrees this is within scope.
[Martin, 30/10/00, 17:54] warns about the difficulties of internationalisation/ localisation. The issue could be avoided by keeping this a UK/US project only. On the other hand, to cope with non-English/ISO-8859-1 locales, working with Unicode may be a *partial* solution.
Martin proposes that there should be a single _database_ schema. It would be a 'maintenance nightmare' to support multiple, almost-identical database schemas derived from differing _metadata_ schemas. E.g. coping with different repeatable attributes; different spellings. [In his email Martin further discusses issue of hardcoding metadata schemas as database schemas].
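
[Illustration of how a single database schema could still hold records with varied content. This is a minimal sketch of one "highly generic RDBMS schema", assuming a single table of (record, attribute, value) statements in SQLite; it is not a design agreed by the partners. The table name, function names and the attribute names such as dc:subject are hypothetical.]

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one generic attribute-value table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statement ("
        " record_id TEXT NOT NULL,"
        " attribute TEXT NOT NULL,"
        " value TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_attr_value ON statement (attribute, value)")
    return conn


def add_record(conn, record_id, fields):
    """Store a record; `fields` is a list of (attribute, value) pairs.

    Repeatable attributes are simply listed more than once.
    """
    conn.executemany(
        "INSERT INTO statement VALUES (?, ?, ?)",
        [(record_id, attr, val) for attr, val in fields],
    )


def find(conn, attribute, value):
    """Return the ids of records having the given attribute-value pair."""
    rows = conn.execute(
        "SELECT DISTINCT record_id FROM statement"
        " WHERE attribute = ? AND value = ? ORDER BY record_id",
        (attribute, value),
    )
    return [row[0] for row in rows]
```

[Because the metadata vocabulary lives in the attribute column rather than in the table definition, adding a new metadata schema does not require a new database schema. The cost is that type checking and per-schema constraints move out of the database and into the application.]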

1.3.4.3 APIs for processing queries, searching and retrieving search results i.e. how to make the datastores of resource descriptions available
[Martin, 3/11/2000, 21:46]
2 approaches described.
Approach 1: Mandate a protocol
Assuming we decide to mandate a single protocol [which I will refer to as the IMesh protocol, but it could be, say, LDAP] which the repository of resource descriptions supports for direct access, then any other protocols [I will refer to these as foreign protocols] will get access indirectly. [Martin, 3/11/2000, 21:46] describes 2 models of providing indirect access via other protocols.
Indirect access [method 1a]: use application-level gateways.
Comments on approach 1a: From ROADS experience this is difficult because of conversion between protocols (e.g. converting between different query language formats). These issues were not solved in ROADS.
[Martin does not recommend this option]
Indirect access [method 1b]: run separate servers serving up an imported copy of the resource descriptions
Comments on 1b: [Martin favours this approach.] Can re-use existing servers for the protocols. The problem still remains of loss of information when converting between internal structures.
One other advantage Martin sees for approach 1b is that we are not hampered by having to suspend other work which depends on the core of the toolkit until it is ready.

Elsewhere [Martin, 30/10/00, 17:54] had also divided the approaches into 2.
Approach 1: Decide on adopting a standard API (e.g. the LDAP API)
Martin asks whether it is possible to use the API in an abstract capacity, or whether we would be bound to use the LDAP protocol.
Notes that there are already bindings in several languages for this API.
If there is a proper API underlying the server implementation used, we'd still be in a position to use this for our own servers which speak other protocols -- rather than going indirectly using an application-level gateway.
[Martin, not sure how this fits in with the options in 1 a and 1b, or is it yet another option ? I'm trying to fuse your discussions from 2 emails here...]
[Approach 2 was called multi-protocol approach and was not elaborated in [Martin, 30/10/00, 17:54] ]

Approach 2: Adopt a multi-protocol approach, i.e. multiple protocols are supported directly for access.
[Martin, 3/11/2000, 21:46] discusses approach 2 further.
Multiprotocol approach [Method 2a]: Monolithic server capable of speaking multiple protocols.
Comments on multi-protocol approach 2a:
Martin feels it would be an extremely complicated project. The monolithic multi-protocol-speaking server could be based on existing open-source code; however, this code is developed in different styles and languages, so integrating it into one package could be very difficult.
Multiprotocol approach [Method 2b]: Multiserver co-operative, with common backend
Comments: This is an option that Martin calls conceptually appealing. However, it consists of doing all the work in 1a (running each of the packages, doing the format conversions) PLUS the code for the backend. (Implications for effort in installation and support).
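
[Illustration of method 2b. This is a minimal sketch, assuming each protocol server is reduced to a front end that parses its own request form and then calls one shared backend. The class names and both request forms are hypothetical: the first is loosely modelled on a single LDAP filter term, the second is a made-up stand-in for another protocol.]

```python
class CommonBackend:
    """Holds resource descriptions as dictionaries of attribute -> list of values."""

    def __init__(self, records):
        self._records = records

    def search(self, attribute, value):
        """Return the records in which `attribute` contains `value`."""
        return [r for r in self._records if value in r.get(attribute, [])]


class FilterFrontEnd:
    """Accepts a single '(attribute=value)' term."""

    def __init__(self, backend):
        self._backend = backend

    def handle(self, request):
        # Drop the surrounding parentheses, then split on the first '='.
        attribute, _, value = request.strip()[1:-1].partition("=")
        return self._backend.search(attribute, value)


class TextFrontEnd:
    """Accepts 'attribute value' lines."""

    def __init__(self, backend):
        self._backend = backend

    def handle(self, request):
        # Split on the first space: attribute name, then the value.
        attribute, _, value = request.strip().partition(" ")
        return self._backend.search(attribute, value)
```

[Even in this toy form, each front end needs its own request parsing and (not shown) its own result formatting, which is the format-conversion work from 1a that Martin points out is still required on top of the backend code.]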

[Rachel, 3/11/00, 15:04] proposed as a work-item - to think about search/browse/ retrieve in context of:
user search via Z39.50 to item descriptions database (DC metadata)
user search via LDAP to same (DC metadata)
user search via LDAP to directory of people (v-card or agent-core metadata)
The third could be a directory of locations of collections, or anything; the point is that it introduces some realistic complexity.
Then one could figure out how to provide a framework that enables the 'toolkit' to accommodate

  • ROADS
  • an LDAP based bit of software (ISAAC?)
  • PRIDE directory software
1.3.4.4 Query routing
[Dan Bri, 17/10/00, 18:41]
Dan contrasts the models for a mesh of queryable nodes that are offered by the Gnutella (and Freenet) approach vs. ROADS/WHOIS++/CIP approach. Dan wonders if a Gnutella like system could flourish on top of a client-centric distributed search mesh.

1.3.4.5 Return of search results
[Martin, 30/10/00, 17:54]
Search results should be returned in a form (e.g. as an object) which can be further manipulated to extract desired subsets, rather than as (say) HTML.
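
[Illustration only. A minimal sketch of what "results as an object" could look like, assuming each record is a dictionary mapping attribute names to lists of values. The ResultSet class and its method names are hypothetical, not an agreed toolkit API.]

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

Record = Dict[str, List[str]]


@dataclass
class ResultSet:
    """Search results held as structured records rather than rendered HTML."""
    records: List[Record] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.records)

    def __iter__(self) -> Iterator[Record]:
        return iter(self.records)

    def filter(self, predicate: Callable[[Record], bool]) -> "ResultSet":
        """Return a new result set containing only the matching records."""
        return ResultSet([r for r in self.records if predicate(r)])

    def project(self, *attributes: str) -> "ResultSet":
        """Return a new result set keeping only the named attributes."""
        return ResultSet(
            [{a: r[a] for a in attributes if a in r} for r in self.records]
        )
```

[Rendering to HTML then becomes one consumer of the result set among several, applied after any subsetting has been done.]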

1.3.4.X Other issues
[Martin, 30/10/00, 17:54] feels there is not much benefit in building our own software when there is plenty around that is perfectly capable. He says this of databases or search engines, of protocols that can be used for search and retrieval, and of client/server side implementations.

[Martin, ] stresses the importance of providing "pedal-to-the-metal" access to core services, i.e. support for local queries that can bypass the generic/cross-querying approach, e.g. the user-facing code can talk directly to the datastore. The purpose is to provide faster services. Martin, however, thinks it undesirable to end up with two separate sets of code. To keep the pedal-to-the-metal option open (if, say, we go for 1b) it is important to keep the parts of the user-facing code which make ongoing network connections in separate modules with a clearly defined interface to the rest of the code. This would facilitate removal or replacement without changing the other code.
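
[Illustration of keeping the network-facing code in a separate module. This is a minimal sketch, assuming the user-facing code depends only on a small search interface, with one implementation that reads the datastore in-process and one that delegates to a network transport. All names (DatastoreAccess, DirectAccess, NetworkAccess, titles_for_subject) are hypothetical, and the transport is passed in as a plain function so no real protocol is implied.]

```python
from typing import Callable, Dict, List, Protocol

Record = Dict[str, List[str]]


class DatastoreAccess(Protocol):
    """The only interface the user-facing code is allowed to depend on."""

    def search(self, attribute: str, value: str) -> List[Record]: ...


class DirectAccess:
    """'Pedal-to-the-metal' path: reads the local datastore in-process."""

    def __init__(self, records: List[Record]) -> None:
        self._records = records

    def search(self, attribute: str, value: str) -> List[Record]:
        return [r for r in self._records if value in r.get(attribute, [])]


class NetworkAccess:
    """Generic path: delegates to a transport function that talks to a server."""

    def __init__(self, send: Callable[[str, str], List[Record]]) -> None:
        self._send = send

    def search(self, attribute: str, value: str) -> List[Record]:
        return self._send(attribute, value)


def titles_for_subject(store: DatastoreAccess, subject: str) -> List[str]:
    """User-facing code: identical whichever access module is plugged in."""
    return sorted(
        title
        for record in store.search("subject", subject)
        for title in record.get("title", [])
    )
```

[With this split there is one set of user-facing code, and the choice between direct and networked access is made where the store object is constructed.]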

1.3.5 Areas to be elaborated

Elaboration involves looking at the lines between components and identifying what travels along them.

This elaboration will tackle:

  • APIs
  • Protocols
  • Data
    • Structure of data
    • Data encoding
    • How data is transported
An effective means of representation needs to be used, most probably a combination of text and diagrams.
Open Issue: What are the available methods of representing functionality (e.g. UML use cases)? How does this choice impinge on later development (such as the choice of programming languages)?

Suggested areas for elaboration

  • Search
  • Browse
    The project will need to identify what browse means, investigating the types of browse (basis and structure of hierarchies), the presentation of browse, and technical solutions to different kinds of browse.
  • Metadata Editor
  • Database
  • User Interface

The relation between search and browse is of particular interest. This would have an impact on the data model and on the design of other components, such as a metadata editor. These relationships need to be explored.

Non-elaborated sections

For sections of the architecture that are not described in detail, for example entities external to the system, one option is to specify how they would be expected to interact with the "elaborated system". For example, an IMesh component for implementing cross-browsing might require that a system be asked to identify the structure and/or style of the browse hierarchy employed (classification based on subject, material type, MIME headings). One example of how this is implemented is the approach taken in Renardus, which uses the scan facility of Z39.50 to interrogate the headings of databases outside their system.

2.0 General Architecture Overview

These Sections to be developed according to above discussion.....

Say which of the options from those outlined above were chosen (and why) Express the functionality for those areas chosen for elaboration (using the representation decided upon).

2.1 Introduction

2.2 Overview Diagram-High Level Architecture

General System description

The personalised user interface is the means by which information is displayed to the user, and it handles direct interaction with the user, such as input of requests. The system offers services to the user. The core of these services is the searching and display of information about web resources. This information is available via local (internal) datastores, or other external entities. Add-on services are built around the core service, consisting of personalisation facilities, authorisation facilities, annotation......

2.3 Brief description of functionality

Personalized User Interface
This is what the end-user (researcher/student) sees.
Knows about range of services and customisation options
Informs user of range of services and customisation options
Informs user of terms of use (depending on user status)
Authenticates user with authorisation component
Checks user status
Allows user to enter identification details
Allows user to enter search requests
Allows user to enter preferences
Allows user to enter annotations
For a special category of user (cataloguer/maintainer/datastore owner), allows access to the metadata editor component.
Open Issues:
Web-based ?
Language choices ?
Format for search requests ?
What sort of customisation to be offered ?
(preferences defined by the user and/or service controlled) ?
(preferences for one session only or "use-always")?
Are the above services optional ? e.g. one can choose not to have any news feeds (etc.)
Does the interface have to be broken down into smaller modules, e.g. a sub-component called customization interface for handling customisation options? Seems to me this should be a "mediator" component that mediates between other modules and the actual interface that the user sees (that could be, say, a browser, or run on different platforms). Is there a missing layer ?

Service Status
Reports on availability of services
Open Issues:
Reports to what ?
Finds out from what ?
Reports regularly or on request ?

Collection Service Description Server
Knows details about collections available
Receives requests for collection details
Reports collection details
Open Issues:
Which collection details does it need to know (e.g. what authorisation is required for this collection)?
What format should the details be stored in ?
How does it obtain details of collections ?

Search
Receives search requests passed on from the user interface
Passes on request to the Datastore/s
Receives result from datastore/s
Returns search results
Open Issues:
What formats for search requests does it accept (one format/multiple formats)?
Does it do format conversions of the search requests?
Is there interaction with the browse component ? What is the interaction ?
What format does it return results in ?
Does it interact with one or multiple datastores ? How ?

Browse
Presents content of datastores in a structured manner
Open Issues:
How is the structure derived ?
What is the connection with search ?
Does it give options of what structures can be viewed ? i.e. does it have the ability of showing different views (structures) of the datastores ?
Only for local datastores ?

Record Conversion
Converts records from one format to another
Receives requests from ?
Returns results to ?

Metadata editor
Gives access to datastores to a special category of user
Open Issues:
Does it need to retrieve data from datastores, or does it access it directly?

News Import
Receives news alerts from local datastores or external entities
Passes on news to User Interface (on request?)

News Editor
????

Alert for export
Receives news from local datastores
Passes on news to News Import Component
Passes on news to News Editor Component

User profiles
Knows user profiles by accessing local datastores
Receives requests for user profiles
Returns User Profile Information to the User Interface
Augments user profile information by querying Authentication component

Authentication
Knows User authentication details
Receives requests for User Authentication
Authenticates users
Reports on authentication process to User Interface
Communicates authentication details to the User Profile Component
?? Connection to external entity

Annotation Server
Relates annotations to annotated document
Stores annotations in local datastore
Receives requests for annotations
Returns annotations to User Interface component
Accepts new annotations
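
[Illustration only. A minimal sketch of the responsibilities listed above, assuming annotations are related to a document by its URL and held in memory in place of the local datastore. The Annotation and AnnotationStore names and fields are hypothetical.]

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    """One annotation, related to the annotated document by its URL."""
    document_url: str
    author: str
    text: str


class AnnotationStore:
    """Accepts new annotations and returns those attached to a document."""

    def __init__(self) -> None:
        self._by_document: Dict[str, List[Annotation]] = defaultdict(list)

    def accept(self, annotation: Annotation) -> None:
        """Store a new annotation under the document it annotates."""
        self._by_document[annotation.document_url].append(annotation)

    def for_document(self, document_url: str) -> List[Annotation]:
        """Return a copy of the annotations for one document, oldest first."""
        return list(self._by_document.get(document_url, []))
```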

Reading List Management
??

Community Building

Import/Export

Datastores
Local datastores - Open Issues
What is meant by local ?
Are they functionally different from external entities ? What are the differences ?

External Entities

The iterative approach allows developers to progressively identify components and decide which ones to develop, which ones to reuse, and which ones to buy.

Concepts such as packages, subsystems, and layers are used during analysis and design to organize components and specify interfaces.

3.0 Definitions and Quotes

Interoperability

In a computer science context the term interoperability is used to refer to the transparent management of different applications and software. In a resource discovery context it means the transparent searching of and retrieval of data from diverse systems and in different metadata formats. (Providing solutions to interoperability problems will be an important factor in helping integrate the wide range of information services available in a distributed and heterogeneous network environment.)

ROADS Interoperability Guidelines, Michael Day (UKOLN), 18 January 1999 http://www.ukoln.ac.uk/metadata/roads/interoperability-guidelines

Components

A physical and replaceable part of a system that conforms to and provides the realization of a set of interfaces.

Booch, G,

Components provide the packaging of frameworks.

Larsen, G, Component-Based Enterprise Frameworks, Communications of the ACM, 43 (10), October 2000.

Framework

An architectural pattern that provides an extensible template for applications within a domain.

Booch, G.,

[Component-based frameworks] In general, a component framework is a collaboration in which all the components are specified with type models; some of them may come with their own implementations. To use the framework, you plug in components that fulfill the specifications.

D'Souza, quoted in Larsen, G, Component-Based Enterprise Frameworks, Communications of the ACM, 43 (10), October 2000.

Frameworks provide the fundamental elements, relationships, structural and dynamic integrity, as well as the extension points for modifying the framework for a given application.

Larsen, G, Component-Based Enterprise Frameworks, Communications of the ACM, 43 (10), October 2000.

Component-based framework solutions are partial implementations specifying the nature and way to extend the framework with pluggable components.

Larsen, G, Component-Based Enterprise Frameworks, Communications of the ACM, 43 (10), October 2000.

Heterogeneous (federated) systems

The terms "heterogeneous" or "federated" are often used to describe cooperating systems in which individual components are designed or operated autonomously. Such co-operation is in contrast to the more general term "distributed systems", which also includes collections of components that are deployed at different sites and are carefully designed to work with each other.

Paepcke et al. Interoperability for Digital Libraries Worldwide, Communications of the ACM, 41 (4), April 1998

Architecture

A set of software architecture definitions collected by Carnegie Mellon Software Engineering Institute.

Architecture provides the decomposition, justification, and design for given constraints and requirements.

Larsen, G, Component-Based Enterprise Frameworks, Communications of the ACM, 43 (10), October 2000.

Component Model

References

ROADS Interoperability Guidelines, Michael Day (UKOLN), 18 January 1999 http://www.ukoln.ac.uk/metadata/roads/interoperability-guidelines

Paepcke et al. Interoperability for Digital Libraries Worldwide, Communications of the ACM, 41 (4), April 1998

Larsen, G, Component-Based Enterprise Frameworks, Communications of the ACM, 43 (10), October 2000.

Booch, G.,

D'Souza