The IMesh Toolkit
General Architectural Overview of the IMesh Toolkit
[Comments within square brackets]
1.0 Introduction and Approach
1.1 Introduction to the IMesh Toolkit Project
....... and place of this deliverable within the project
This deliverable follows on from the Technical Review
and the Subject
Gateway Requirements Work Package.
This section (1.0 Introduction and Approach) sets the scene by
describing the design philosophy used in the project, introduces
some concepts such as components, architectural design and
interoperability and exposes some of the issues dealt with in the
project. Section 2.0 (General Architectural
Overview) first describes the high level architecture and
functionality that is being proposed. In Section 2.1 .......
1.2 Purpose and Rationale
The general architectural overview.....
shows how the architecture is subdivided into components
provides a common understanding of what services the architecture
and its components deliver (for the partners)
assigns responsibilities to components
indicates how the components are expected to interact
explains how this architecture achieves interoperability
guides the reference implementation.
1.2.1 Scope
This deliverable provides a general architectural overview.
Some sections of the overview will be elaborated, providing more
details of the architecture and the functionality, in this or
other work packages.
[Explain iterative approach, refinement/understanding of
requirements]
The areas chosen for elaboration will be decided on the basis
of:
- The outcome of the requirements workpackage, so that tools
delivered by the project will meet the needs of subject gateways.
The requirements will indicate not only which components are
perceived as being most useful, but also what their functionality
should be.
- Areas considered to provide core functionality (as opposed to
added-value components). There seems to be consensus among
partners that these central functions include at least the
search-browse components.
- The areas of expertise of project partners. Emphasis will be
placed on those categories of tools which the partners have
experience of delivering, based on their participation in other
projects (e.g. ROADS), as well as on staff skill sets and
experience (e.g. involvement in RDF development).
- The project timescales. The project will deliver reference
implementations. It is desired that any such implementations are
usable, workable tools rather than underdeveloped demonstrator
services [REF.]. Placing the emphasis on quality rather than
quantity, more importance will be given to investigating smaller
areas of functionality in greater depth, leading to a
specification that can be used for software development. The
functionality of the whole architecture will be covered in less
detail leaving enough time to develop working implementations of
those components which have been specified more fully.
1.3 Architectural Design Principles
[This section describes our component-based approach to building
an architecture and how this architecture can provide
interoperability - provides some explanation (mostly via
definitions) of what components/architecture/interoperability are
as background for the reader but also for us to help us
understand what we want to achieve in this deliverable!]
General Architecture
Distributed vs. heterogeneous (or federated)
systems
Component-based design
Components evolve independently but rely on each other to
accomplish larger tasks. To achieve interoperability, the goal is
for components to be able to call on one another efficiently and
conveniently. [Paepcke - ACM 41(4)]
A software component can be defined as a nontrivial piece of
software, a module, a package, or a subsystem that fulfills a
clear function, has a clear boundary, and can be integrated into
a well-defined architecture.
Component-based frameworks
Component-based framework solutions are partial
implementations specifying the nature and way to extend the
framework with pluggable components. [Larsen, G, Component-Based
Enterprise Frameworks, Communications of the ACM, 43 (10),
October 2000.]
Each framework should provide extension-point specifications,
the specifications with which components should comply and extend
the fundamental functional scenarios. [Larsen, G, Component-Based
Enterprise Frameworks, Communications of the ACM, 43 (10),
October 2000.]
Designing, building, and testing component-based frameworks
require considering several things including static models that
illustrate component structure, dynamic models that illustrate
component collaboration, as well as the technology to implement
the component framework.....component-based frameworks build upon
what you know - frameworks build upon experiences and
technologies both past and present. [Larsen, G, Component-Based
Enterprise Frameworks, Communications of the ACM, 43 (10),
October 2000.]
Component-based architectures
A system's architecture encompasses the set of significant
decisions about
- The organisation of a software system
- The selection of the structural elements and their interfaces
by which the system is composed
- Their behaviour, as specified by collaborations among those
elements
- The composition of these structural and behavioural elements
into progressively larger subsystems
- The architectural style that guides this organisation: these
elements and their interfaces, their collaborations, and their
composition.
1.3.2 Designing for Interoperability
The ultimate goal for systems that are collections of
independently developed components, that rely on each other to
accomplish larger tasks, is for the components to evolve
independently yet be able to call on one another efficiently and
conveniently (interoperability). [Paepcke et al ACM 41(4)].
Achieving interoperability: Interoperability solutions and
solution classes. [from article by Paepcke et al.]
Strong Standards
This approach relies on the establishment of a standard that
achieves a limited amount of homogeneity among heterogeneous
components. These standards are established in different
ways:
- A large and diverse enough community agrees that a standard is
needed, e.g. Z39.50 for information retrieval.
- One product gains enough market share that it becomes a de
facto standard by virtue of its broad deployment. De facto
standards may also arise spontaneously if a standard is
compelling and easy to deploy and fills an important need at the
right time, e.g. HTML, HTTP, MIME.
- Government organisations help a standard gain wide
acceptance.
Advantages:
Disadvantages:
Families of Standards
External Mediation
Locate the interoperability machinery outside the participating
local systems. Mediation machinery mediates between components.
Primary function is translation of formats and interaction modes.
Examples: network gateways, mapping global schemas to
local ones.
Advantages:
Specification-based interaction
Thorough description of semantics and structure of all data and
operations.
Mobile Functionality
Comparing interoperability solutions - relationship to
design goals
Component Autonomy
Cost of infrastructure and entry.
Ease of Contributing Components
Ease of Use
1.3.3 Heterogeneity in subject-based information
gateways
[Sources for this section: ......] This section explains where
the differences are in subject gateways; will probably refer to
studies/reports from other projects, published literature etc. or
our own analysis of software or systems and feedback from the
requirements review
1.3.4 Discussion of how to achieve interoperability for the
IMesh toolkit
[from the mailing list postings]
[This is my summary of the postings. I have attributed the
contributions to the discussion to the person who made them,
within the square brackets, the date and time of the posting are
given so you can go back to the original posting. Hope I haven't
misrepresented anyone. Give a shout if you think I have and feel
free to contradict yourself if your thinking has changed since
the posting!!]
1.3.4.1 Query Language (Or in other words how to
express the terms of the search - examples of how to express
search terms are attribute-value pairs, SQL) [Discussed by Dan
Bri on 17/10/00, 20:28]
IDEAL: The query language that we use for expressing searches is
protocol-neutral, and IMesh clients can express queries in this
language regardless of the native/favourite protocol of the
server at which the request is directed.
Options
Option 1: to achieve the ideal we could "invent a query
language"
[Dan provides an example in his email]
Requirement for the query language in this option is that it
needs to abstract across various protocol types, i.e. query needs
to be concocted in a way that is still expressible in LDAP,
WHOIS++ etc. [Issue: what kinds of questions would this query
language need to be able to express? If we follow this option we
need to make a list of these]
Option 2: Pick a protocol and adopt its language
Option 3: Avoid specifying query-level interoperability in the
IMesh toolkit
[Dan gives thumbs-down to both options 2 and 3]
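To make Option 1 more concrete, the following is a minimal sketch of a protocol-neutral query held as attribute-value pairs and rendered into one protocol's native syntax. It is an illustration only, not the language Dan proposed: the `Term` class and the function names are hypothetical, and only a conjunction of equality terms is covered. The escaping follows the LDAP string filter rules (RFC 4515).

```python
from dataclasses import dataclass

# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_LDAP_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


@dataclass(frozen=True)
class Term:
    """One attribute-value pair of a protocol-neutral query (hypothetical)."""
    attribute: str
    value: str


def escape_ldap(value: str) -> str:
    """Escape a value for use in an LDAP search filter."""
    return "".join(_LDAP_ESCAPES.get(ch, ch) for ch in value)


def to_ldap_filter(terms: list) -> str:
    """Render a conjunction of terms as an LDAP filter string."""
    if not terms:
        raise ValueError("a query needs at least one term")
    parts = [f"({t.attribute}={escape_ldap(t.value)})" for t in terms]
    return parts[0] if len(parts) == 1 else "(&" + "".join(parts) + ")"


# The same neutral query could be handed to other renderers (e.g. for WHOIS++).
query = [Term("title", "geology"), Term("subject", "maps (UK)")]
ldap_filter = to_ldap_filter(query)
```

A renderer per target protocol would sit alongside `to_ldap_filter`; the list of question types the neutral language must express (the issue raised above) determines how rich `Term` needs to become.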
1.3.4.2 Data Structures within the data stores
The IMesh toolkit infrastructure will have to be neutral with
respect to the data structures that gateways want to represent.
Dan [29/10/00, 17:22] proposes that we need to allow records with
arbitrarily unpredictable content.
This gives rise to the requirement for a database system with
very freeform schemas. [Dan refers us to a book which provides
some implementation options: Data on the Web: From Relations to
Semistructured Data and XML by Serge Abiteboul, Peter Buneman,
Dan Suciu. It would be good to list these options. I've [MB]
ordered the book via inter-library loan and am waiting to receive
it but maybe someone else could do the list of options]
[Dan] The alternative is to have gateways agree on rigid
vocabularies/schemas.
[Rachel, 29/10/00, 23:52] Agrees that there will be more than one
such datastore and on the need to support varied schemas;
however thinks 'unpredictable content' is too strong, and expects
re-use of schemas through the use of schema registries.
[Dan also asks] What are the likely characteristics of IMesh tk
datastores ?
[Dan, 29/10/00, 17:22] proposes this as a work-item:
The data stored in subject gateways will be complex, and
unpredictable. In keeping with the IMesh tk goal of offering
content-neutral tools and infrastructure, IMesh tk will explore
options for exposing and querying databases that allow subject
gateway applications to employ a complex mix of metadata
vocabularies. We will explore technology to support search and
query of such content, and evaluate existing tools (e.g. RDFdb,
Lore, highly generic RDBMS schemata) that might facilitate
this.
[I've put this here since it seems to propose finding a solution
for the above two issues]
[Rachel, 29/10/00, 23:52] agrees this is within scope.
[Martin, 30/10/00, 17:54] warns about the difficulties of
internationalisation/ localisation. The issue could be avoided by
keeping this a UK/US project only. On the other hand, to cope
with non-English/ISO-8859-1 locales, working with Unicode may be
a *partial* solution.
Martin proposes that there should be a single _database_ schema.
It would be a 'maintenance nightmare' to support multiple,
almost-identical database schemas derived from differing
_metadata_ schemas. E.g. coping with different repeatable
attributes; different spellings. [In his email Martin further
discusses issue of hardcoding metadata schemas as database
schemas].
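One way to picture a single database schema that still copes with varied metadata schemas and repeatable attributes is a generic table of (record, attribute, value) statements. The sketch below uses Python's built-in `sqlite3` module; the table name, the helper functions and the sample records are all assumptions made for illustration, and this is only one of the "highly generic RDBMS schemata" options, not a chosen design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table: each row is a single (record, attribute, value) statement.
# A repeatable attribute is simply several rows with the same record and attribute.
conn.execute("CREATE TABLE statement (record TEXT, attribute TEXT, value TEXT)")


def add_record(record_id, fields):
    """Store a record given as {attribute: value or list of values}."""
    for attribute, values in fields.items():
        if isinstance(values, str):
            values = [values]
        conn.executemany(
            "INSERT INTO statement VALUES (?, ?, ?)",
            [(record_id, attribute, v) for v in values],
        )


def find(attribute, value):
    """Return the ids of records holding the given attribute-value pair."""
    rows = conn.execute(
        "SELECT DISTINCT record FROM statement "
        "WHERE attribute = ? AND value = ? ORDER BY record",
        (attribute, value),
    )
    return [r[0] for r in rows]


# Two records using different vocabularies share the same table.
add_record("r1", {"dc:title": "Soil maps", "dc:subject": ["geology", "soil"]})
add_record("r2", {"vcard:fn": "A. Person", "dc:subject": "geology"})
matches = find("dc:subject", "geology")
```

The trade-off is that the database no longer enforces any one metadata schema; validation against a schema registry, if wanted, would have to happen in the code that calls `add_record`.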
1.3.4.3 APIs for processing queries, searching and
retrieving search results i.e. how to make the datastores of
resource descriptions available
[Martin, 3/11/2000, 21:46]
2 approaches described.
Approach 1: Mandate a protocol
Assuming we decide to mandate a single protocol [which I will
refer to as the IMesh protocol, but it could be, say, LDAP] which
the repository of resource descriptions supports for direct
access, then any other protocols [I will refer to these as
foreign protocols] will get access indirectly. [Martin, 3/11/2000,
21:46] describes 2 models of providing indirect access via other
protocols.
Indirect access [method 1a]: use application level gateways.
Comments on approach 1a:
From ROADS experience this is difficult because of conversion
between protocols (e.g. converting between different query
language formats). These issues were not solved in ROADS.
[Martin does not recommend this option]
Indirect access [method 1b]: run separate servers serving up an
imported copy of the resource descriptions
Comments on 1b: [Martin favours this
approach.] Can re-use existing servers for the protocols. The
problem still remains of loss of information when converting
between internal structures.
One other advantage Martin sees for approach 1b is that we are
not hampered by having to suspend other work which depends on the
core of the toolkit until it is ready.
Elsewhere [Martin, 30/10/00, 17:54] had also divided the
approaches into 2.
Approach 1: Decide on adopting a standard API (e.g. the LDAP API)
Martin asks whether it is possible to use the API in an abstract
capacity, or whether we would be bound to use the LDAP protocol.
Notes that there are already bindings in several languages for
this API.
If there is a proper API underlying the server implementation
used, we'd still be in a position to use this for our own servers
which speak other protocols -- rather than going indirectly using
an application-level gateway.
[Martin, not sure how this fits in with the options in 1 a and
1b, or is it yet another option ? I'm trying to fuse your
discussions from 2 emails here...]
[Approach 2 was called multi-protocol approach and was not
elaborated in [Martin, 30/10/00, 17:54] ]
Approach 2: Adopt a multi-protocol approach. i.e. multiple
protocols are supported directly for access.
[Martin, 3/11/2000, 21:46] discusses approach 2 further.
Multiprotocol approach [Method 2a]: Monolithic server capable of
speaking multiple protocols.
Comments on multi-protocol approach 2a:
Martin feels it would be an extremely complicated project. The
monolithic multi-protocol-speaking server could be based on
existing open-source code, however this code is developed in
different styles and languages so integrating it into one package
could be very difficult.
Multiprotocol approach [Method 2b]: Multiserver co-operative,
with common backend
Comments: This is an option that Martin calls conceptually
appealing. However, it consists of
doing all the work in 1a (running each of the packages, doing the
format conversions) PLUS the code for the backend. (Implications
for effort in installation and support).
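The shape of method 2b can be sketched as several protocol front ends sharing one backend. Everything below is hypothetical: the class names are invented, the first front end accepts only a single LDAP-style equality filter, and the second speaks a toy line format standing in for some other protocol. The point is to show where the per-protocol format conversion sits, not to model any real server.

```python
class CommonBackend:
    """Shared store of resource descriptions (in-memory stand-in)."""

    def __init__(self, records):
        self._records = list(records)

    def search(self, attribute, value):
        return [r for r in self._records if r.get(attribute) == value]


class SimpleFilterFrontEnd:
    """Accepts a single equality filter such as (title=Maps)."""

    def __init__(self, backend):
        self.backend = backend

    def handle(self, request):
        # Convert the protocol-specific request into a backend call.
        attribute, _, value = request.strip("()").partition("=")
        return self.backend.search(attribute, value)


class LineFrontEnd:
    """Accepts a toy 'attribute value' line; stands in for a second protocol."""

    def __init__(self, backend):
        self.backend = backend

    def handle(self, request):
        attribute, _, value = request.partition(" ")
        hits = self.backend.search(attribute, value)
        # Each front end also converts results into its own output format.
        return "\n".join(f"{k}: {v}" for r in hits for k, v in sorted(r.items()))


backend = CommonBackend([{"title": "Maps", "subject": "geology"}])
via_filter = SimpleFilterFrontEnd(backend).handle("(title=Maps)")
via_line = LineFrontEnd(backend).handle("subject geology")
```

Even in this toy form, each front end carries its own request parsing and result formatting, which is the extra effort noted in the comments on 2b.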
[Rachel, 3/11/00, 15:04] proposed as a work-item - to think
about search/browse/ retrieve in context of:
user search via Z39.50 to item descriptions database (DC
metadata)
user search via LDAP to same (DC metadata)
user search via LDAP to directory of people (v-card or agent-core
metadata)
The third could be a directory of locations of collections, or
anything; it simply introduces some realistic complexity to
this.
Then one could figure out how to provide a framework to
enable the 'toolkit' to accommodate
- ROADS
- an LDAP based bit of software (ISAAC?)
- PRIDE directory software
1.3.4.4 Query routing
[Dan Bri, 17/10/00, 18:41]
Dan contrasts the models for a mesh of queryable nodes that are
offered by the Gnutella (and Freenet) approach vs.
ROADS/WHOIS++/CIP approach. Dan wonders if a Gnutella-like system
could flourish on top of a client-centric distributed search
mesh.
1.3.4.5 Return of search results
[Martin, 30/10/00, 17:54]
Search results returned in a form (e.g. as an object) which can
be further manipulated to extract desired subsets, rather than as
(say) HTML
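A minimal sketch of what "results as an object" might look like follows. The `ResultSet` class and its `where` and `project` methods are hypothetical names chosen for illustration; the idea is only that the caller receives data it can filter and trim before any rendering to HTML takes place.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSet:
    """Search results held as data rather than as rendered HTML (hypothetical)."""
    records: list = field(default_factory=list)

    def where(self, attribute, value):
        """Return the subset of records with the given attribute value."""
        return ResultSet([r for r in self.records if r.get(attribute) == value])

    def project(self, *attributes):
        """Keep only the named attributes of every record."""
        return ResultSet(
            [{a: r[a] for a in attributes if a in r} for r in self.records]
        )

    def __len__(self):
        return len(self.records)


results = ResultSet([
    {"title": "Soil maps", "language": "en", "format": "text/html"},
    {"title": "Cartes des sols", "language": "fr", "format": "application/pdf"},
])
# Extract a desired subset: English-language records, titles only.
english_titles = results.where("language", "en").project("title")
```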
1.3.4.X Other issues
[Martin, 30/10/00, 17:54] feels there is not much benefit in
building our own software when there are plenty around that are
perfectly capable - says this for database or search engine,
protocols that can be used for search and retrieval and
client/server side implementations.
[Martin, ] stresses the importance of providing
"pedal-to-the-metal" access to core services, i.e. support for
local queries that can bypass the generic/cross-querying
approach, e.g. the user-facing code can talk directly to the
datastore. The purpose is to provide faster services. Martin,
however, thinks it undesirable to end up with two separate sets of
code. To keep the pedal-to-the-metal option open (if, say, we go
for 1b) it is important to keep the parts of the user-facing code
which make ongoing network connections in separate modules with a
clearly defined interface to the rest of the code. This would
facilitate removal or replacement without changing the other
code.
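The separation Martin describes can be sketched as one interface with a direct implementation and a networked one. The names below (`DataStore`, `LocalStore`, `NetworkedStore`, `titles_matching`) are assumptions made for illustration, and the "network" is faked with a loop-back callable so the example stays self-contained.

```python
from typing import Protocol


class DataStore(Protocol):
    """Interface the user-facing code depends on (hypothetical)."""

    def search(self, attribute: str, value: str) -> list: ...


class LocalStore:
    """Direct, in-process access to the datastore."""

    def __init__(self, records):
        self._records = records

    def search(self, attribute, value):
        return [r for r in self._records if r.get(attribute) == value]


class NetworkedStore:
    """Same interface, but the work goes through a transport callable.

    `send` stands in for the module that makes the network connection; it is
    the only part that would change if the protocol were swapped or removed.
    """

    def __init__(self, send):
        self._send = send

    def search(self, attribute, value):
        return self._send({"attribute": attribute, "value": value})


def titles_matching(store: DataStore, subject: str) -> list:
    """User-facing code: unaware of whether the store is local or remote."""
    return [r["title"] for r in store.search("subject", subject)]


records = [{"title": "Soil maps", "subject": "geology"}]
local = LocalStore(records)
# A fake transport that loops back to the local store, for demonstration only.
remote = NetworkedStore(lambda req: local.search(req["attribute"], req["value"]))
local_titles = titles_matching(local, "geology")
remote_titles = titles_matching(remote, "geology")
```

Because `titles_matching` is written once against the interface, the direct path does not require a second copy of the user-facing code.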
1.3.5 Areas to be elaborated
Elaboration involves looking at the lines between components and
identifying what travels along them.
This elaboration will tackle:
- APIs
- Protocols
- Data
- Structure of data
- Data encoding
- How data is transported
An effective means of representation needs to be used, most
probably a combination of text and diagrams. Open Issue: What are
the available methods of representing functionality ? e.g. UML
use cases; How does this choice impinge on later development
(such as choice of programming languages ?)
Suggested areas for elaboration
- Search
- Browse
The project will need to identify what browse means,
investigating the types of browse (basis and structure of
hierarchies), the presentation of browse, and technical solutions
to different kinds of browse.
- Metadata Editor
- Database
- User Interface
The relation between search and browse is of particular
interest. This would have an impact on the data model and on the
design of other components, such as a metadata editor. These
relationships need to be explored.
Non-elaborated sections
For sections of the architecture that are not described in
detail, for example entities external to the system, one option
is to specify how they would be expected to interact with the
"elaborated system". For example, an IMesh component for
implementing cross-browsing might require that a system be asked
to identify the structure and/or style of the browse hierarchy
employed (classification based on subject, material type, MIME
headings). One example of how this is implemented is the approach
taken in Renardus, which uses the scan facility of Z39.50 to
interrogate the headings of databases outside their system.
2.0 General Architecture Overview
These Sections to be developed according to above
discussion.....
Say which of the options from those outlined above were chosen
(and why) Express the functionality for those areas chosen for
elaboration (using the representation decided upon).
2.1 Introduction
2.2 Overview Diagram-High Level Architecture
General System description
The personalised user interface is the means by which information
is displayed to the user; it also handles direct interaction with
the user, such as input of requests. The system offers services to
the user. The core of these services is the searching and
display of information about web resources. This information is
available via local (internal) datastores, or other external
entities. Add-on services are built around the core service,
consisting of personalisation facilities, authorisation
facilities, annotation......
2.3 Brief description of functionality
Personalized User Interface
This is what the end-user (researcher/student) sees.
Knows about range of services and customisation options
Informs user of range of services and customisation options
Informs user of terms of use (depending on user status)
Authenticates user with authorisation component
Checks user status
Allows user to enter identification details
Allows user to enter search requests
Allows user to enter preferences
Allows user to enter annotations
For a special category of user (cataloguer/maintainer/datastore
owner), allows access to the metadata editor component.
Open Issues:
Web-based ?
Language choices ?
Format for search requests ?
What sort of customisation to be offered ?
(preferences defined by the user and/or service controlled) ?
(preferences for one session only or "use-always")?
Are the above services optional ? e.g. one can choose not to have
any news feeds (etc.)
Does the interface have to be broken down into smaller models
e.g. sub-component called customization interface for handling
customisation options? Seems to me this should be a "mediator"
component that mediates between other modules and the actual
interface that the user sees (which could be, say, a browser, or
run on different platforms). Is there a missing layer ?
Service Status
Reports on availability of services
Open Issues:
Reports to what ?
Finds out from what ?
Reports regularly or on request ?
Collection Service Description Server
Knows details about collections available
Receives requests for collection details
Reports collection details
Open Issues:
Which collection details does it need to know (e.g. what
authorisation is required for this collection)?
What format should the details be stored in ?
How does it obtain details of collections ?
Search
Receives search requests passed on from the user interface
Passes on request to the Datastore/s
Receives result from datastore/s
Returns search results
Open Issues:
What formats for search requests does it accept (one
format/multiple formats)?
Does it do format conversions of the search requests?
Is there interaction with the browse component ? What is the
interaction ?
What format does it return results in ?
Does it interact with one or multiple datastores ? How ?
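As a way of thinking about the open issues above, here is a minimal sketch of a Search component that passes one request to several datastores and returns merged results. It assumes, purely for illustration, that a request is a plain string, that each datastore is a callable returning a list of records, and that results are tagged with the datastore they came from; none of these choices has been made in the project.

```python
class SearchComponent:
    """Receives a request, passes it to each datastore, returns merged results."""

    def __init__(self, datastores):
        # Mapping of datastore name -> callable taking a request, returning records.
        self._datastores = dict(datastores)

    def search(self, request):
        results = []
        for name, query in self._datastores.items():
            for record in query(request):
                # Tag each record with its origin so the interface can show it.
                results.append({"source": name, **record})
        return results


def make_store(records):
    """Build a toy datastore that matches the request against record titles."""
    def query(request):
        return [r for r in records if request.lower() in r["title"].lower()]
    return query


component = SearchComponent({
    "local": make_store([{"title": "Soil maps"}]),
    "external": make_store([{"title": "Road maps"}, {"title": "Poetry"}]),
})
hits = component.search("maps")
```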
Browse
Presents content of datastores in a structured manner
Open Issues:
How is the structure derived ?
What is the connection with search ?
Does it give options of what structures can be viewed ? i.e. does
it have the ability of showing different views (structures) of
the datastores ?
Only for local datastores ?
Record Conversion
Converts records from one format to another
Receives requests from ?
Returns results to ?
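One possible shape for Record Conversion is a registry of converter functions keyed by source and target format. The sketch below is hypothetical throughout: the registry, the decorator, and the 'simple' target format are invented for illustration; only `dc:title`, `dc:creator` and `dc:date` correspond to real Dublin Core element names.

```python
CONVERTERS = {}


def converter(source, target):
    """Register a function converting records from `source` to `target` format."""
    def register(func):
        CONVERTERS[(source, target)] = func
        return func
    return register


@converter("dc", "simple")
def dc_to_simple(record):
    # Map Dublin Core element names onto a made-up 'simple' format.
    mapping = {"dc:title": "name", "dc:creator": "author"}
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def convert(record, source, target):
    """Convert a record, failing clearly when no converter is registered."""
    try:
        func = CONVERTERS[(source, target)]
    except KeyError:
        raise ValueError(f"no converter from {source} to {target}") from None
    return func(record)


converted = convert(
    {"dc:title": "Soil maps", "dc:creator": "A. Person", "dc:date": "2000"},
    "dc",
    "simple",
)
```

Note that `dc:date` has no counterpart in the target format and is silently dropped, which is the loss of information on conversion raised in section 1.3.4.3.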
Metadata editor
Gives access to datastores to a special category of user
Open Issues:
Does it need to retrieve data from datastores, or does it access
them directly?
News Import
Receives news alerts from local datastores or external
entities
Passes on news to User Interface (on request?)
News Editor
????
Alert for export
Receives news from local datastores
Passes on news to News Import Component
Passes on news to News Editor Component
User profiles
Knows user profiles by accessing local datastores
Receives requests for user profiles
Returns User Profile Information to the User Interface
Augments user profile information by querying Authentication
component
Authentication
Knows User authentication details
Receives requests for User Authentication
Authenticates users
Reports on authentication process to User Interface
Communicates authentication details to the User Profile
Component
?? Connection to external entity
Annotation Server
Relates annotations to annotated document
Stores annotations in local datastore
Receives requests for annotations
Returns annotations to User Interface component
Accepts new annotations
Reading List Management
??
Community Building
Import/Export
Datastores
Local datastores - Open Issues
What is meant by local ?
Are they functionally different from external entities ? What are
the differences ?
External Entities
The iterative approach allows developers to progressively
identify components and decide which ones to develop, which ones
to reuse, and which ones to buy.
Concepts such as packages, subsystems, and layers are used
during analysis and design to organize components and specify
interfaces.
3.0 Definitions and Quotes
Interoperability
In a computer science context the term interoperability is used
to refer to the transparent management of different applications
and software. In a resource discovery context it means the
transparent searching of and retrieval of data from diverse
systems and in different metadata formats. (Providing solutions
to interoperability problems will be an important factor in
helping integrate the wide range of information services
available in a distributed and heterogeneous network environment.)
ROADS Interoperability Guidelines, Michael Day (UKOLN), 18
January 1999
http://www.ukoln.ac.uk/metadata/roads/interoperability-guidelines
Components
A physical and replaceable part of a system that conforms to and
provides the realization of a set of interfaces.
Booch, G,
Components provide the packaging of frameworks.
Larsen, G, Component-Based Enterprise Frameworks,
Communications of the ACM, 43 (10), October 2000.
Framework
An architectural pattern that provides an extensible template for
applications within a domain.
Booch, G.,
[Component-based frameworks] In general, a component framework
is a collaboration in which all the components are specified with
type models; some of them may come with their own
implementations. To use the framework, you plug in components
that fulfill the specifications.
D'Souza, quoted in Larsen, G, Component-Based Enterprise
Frameworks, Communications of the ACM, 43 (10), October 2000.
Frameworks provide the fundamental elements, relationships,
structural and dynamic integrity, as well as the extension points
for modifying the framework for a given application.
Larsen, G, Component-Based Enterprise Frameworks,
Communications of the ACM, 43 (10), October 2000.
Component-based framework solutions are partial
implementations specifying the nature and way to extend the
framework with pluggable components.
Larsen, G, Component-Based Enterprise Frameworks,
Communications of the ACM, 43 (10), October 2000.
Heterogeneous (federated) systems
The terms "heterogeneous" or "federated" are often used to
describe cooperating systems in which individual components are
designed or operated autonomously. Such co-operation is in
contrast to the more general term "distributed systems", which
also includes collections of components deployed at different
sites and are carefully designed to work with each other.
Paepcke et al. Interoperability for Digital Libraries
Worldwide, Communications of the ACM, 41 (4), April 1998
Architecture
A set of
software architecture definitions collected by Carnegie
Mellon Software Engineering Institute.
Architecture provides the decomposition, justification, and
design for given constraints and requirements.
Larsen, G, Component-Based Enterprise Frameworks,
Communications of the ACM, 43 (10), October 2000.
Component Model
References
ROADS Interoperability Guidelines, Michael Day (UKOLN), 18
January 1999
http://www.ukoln.ac.uk/metadata/roads/interoperability-guidelines
Paepcke et al. Interoperability for Digital Libraries
Worldwide, Communications of the ACM, 41 (4), April 1998
Larsen, G, Component-Based Enterprise Frameworks,
Communications of the ACM, 43 (10), October 2000.
Booch, G.,
D'Souza