Middleware for Distributed Cognition - Final Report

 

1. Introduction

Background

The Middleware for Distributed Cognition (MDC) project was funded by the JISC under the Frameworks Programme.

Project Team

Colin Tatham - Technical lead

David Gilks - Technical consultant: JAFER, Database, OpenURL

Howard Noble - Project manager

Jeff Kahn - Technical consultant: VUE

Katherine Ferguson - Technical consultant: Cocoon

Matthew Dovey - Technical consultant: JAFER, SRW and UDDI

Robert Gilks - Technical consultant: JSP

Tom Coppeto - Technical consultant: VUE

Summary

Conducting an effective search over the internet, discovering material and building resource lists requires many groups to collaborate in their implementations of technology. The JISC E-Learning Framework supports this collaboration by providing a framework for software development: code should be released under an open source license and/or comply with relevant interoperability specifications (as managed by international bodies such as IMS, NISO and OASIS).

The MDC project team has built a learner interface to the distributed search software called JAFER. The interface allows learners to search across many repositories simultaneously and bring back information that describes a resource (the metadata). The learner is then able to inspect the records returned by the search and move relevant references into a resource list.

The MDC software implements the Z39.50 and SRW search protocols, the OpenURL v.0.1 discovery specification and the IMS Resource List Interoperability Data Model. All the components that form the solution are available under the LGPL, MySQL or Apache licenses and have been developed using the Java programming language.

The MDC software is still being developed (4/11/04) - we are ironing out the last few bugs which will take us to a beta release. We are in the process of forming collaborations with other groups with the goal of integrating the functionality within e-learning environments such as VLEs, Portals and learning design systems.

The project web site will provide guidelines on how to get the code and try out the software.

2. The Learner's perspective

Before discussing the technical aspects of the project we will outline what the software provides the learner.

The focus of this project has been to implement a solution that enables groups of learners to collaborate in the process of recommending resources.

The process of a learner using the MDC software has been described as a UML Activity Diagram.

Here we provide links to a live demonstration of the MDC software. Please note that the code is still being developed and we know it will crash with some searches. We are nevertheless interested in your comments on usability!

You can try the MDC software on our test server: just type a search term and follow the steps below.

Searching for resources

1. The learner would begin by typing a search term into the search box in the 'Your Search' section of the MDC interface.

2. This would return a number of records that are displayed in a tabular format. The learner can then use visual inspection, scrolling, sorting and local searching (coming soon!) to compile their resource list.

3. The learner can then select references they want to save to the 'Your List' section of the interface by marking the checkboxes to the left of the records.

4. The learner is also able to hyperlink from specific fields within the returned records to execute another search. This will open another tab within the window and display the results of the new search.

5. The learner is able to take more control of their search through configuring an 'Advanced search'.

6. The learner can specify the conditions or logic that determine how their search terms are interpreted by a target database, for example by requiring an exact match for a specific word.

7. They can also select which databases a search should be conducted against from a pre-defined list.

8. The learner can also select a subject search term from a list that is used to search a global registry of databases. This constrains the learner's search to databases that are catalogued against the corresponding keyword.
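
The 'Advanced search' options in steps 5 and 6 can be illustrated with a minimal sketch of how a form might be turned into a query string. The sketch assumes the CQL query language used with SRW; the index names (dc.title, dc.creator), the helper names and the sample terms are illustrative assumptions, and this is not the code used by JAFER or the MDC interface.

```python
def cql_clause(index: str, relation: str, term: str) -> str:
    """Build one CQL search clause, e.g. dc.title exact "some words"."""
    allowed = {"=", "exact", "any", "all"}
    if relation not in allowed:
        raise ValueError(f"unsupported relation: {relation}")
    # Escape backslashes and double quotes before quoting the term.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


def cql_and(*clauses: str) -> str:
    """Combine clauses so that every one of them must match."""
    return " and ".join(f"({clause})" for clause in clauses)


# Hypothetical advanced search: exact title match, any of the listed names.
query = cql_and(
    cql_clause("dc.title", "exact", "distributed cognition"),
    cql_clause("dc.creator", "any", "Rogers Scaife"),
)
print(query)
```

Keeping the relation as an explicit parameter mirrors the choice offered to the learner: the same term can be sent as an exact match or as a looser 'any word' match.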

Discovering resources

9. The learner may also be able to preview a resource. This can be achieved through pressing the OpenURL link that takes them to functionality provided by an OpenURL Resolver.

Building a resource list

10. The learner will continue to search using the functionality described above until they have compiled a satisfactory list.

11. They can then begin to compile a resource list by pressing the Create Resource List button.

12. The learner is presented with a form that allows them to add annotations and metadata to the list as a whole and further annotations to each list item.

13. Once this is completed the learner can export the list as XML compliant with the IMS Resource List Interoperability (RLI) specification.

14. The learner is also able to export the list as XHTML for embedding within a web site.

15. Or as a PDF to print and take to a classroom.

16. Finally the learner can import the IMS RLI XML into the VUE application to re-represent the list as a mind map.

3. Technical brief

The MDC project has integrated software components to provide functionality within the search, discover and resource list component definitions within the JISC E-Learning Framework.

The search interface calls methods within the JAFER software that produce queries compliant with the Z39.50 and SRW specifications. The Z39.50 search method is used to manage the corresponding present method call to the repositories that can return references (represented diagrammatically here). The returned metadata is then cached in a database (see Entity Relationship diagram here).
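
To make the caching step concrete, here is a minimal sketch of storing returned metadata in a database and reading it back for a given query. The table layout, column names and function names are assumptions made for illustration (the project's own schema is the one shown in its Entity Relationship diagram), and SQLite is used only because it ships with Python.

```python
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open the cache and create the (hypothetical) record table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS record ("
        " query TEXT NOT NULL, target TEXT NOT NULL,"
        " title TEXT, author TEXT, year TEXT)"
    )
    return conn


def cache_records(conn, query, target, records):
    """Store the records one target returned for one query."""
    conn.executemany(
        "INSERT INTO record (query, target, title, author, year)"
        " VALUES (?, ?, ?, ?, ?)",
        [(query, target, r.get("title"), r.get("author"), r.get("year"))
         for r in records],
    )
    conn.commit()


def cached(conn, query):
    """Return every cached record for a query, in insertion order."""
    rows = conn.execute(
        "SELECT target, title, author, year FROM record"
        " WHERE query = ? ORDER BY rowid",
        (query,),
    )
    return [dict(zip(("target", "title", "author", "year"), row)) for row in rows]


# Invented sample data, for illustration only.
conn = open_cache()
cache_records(conn, "rogers", "library_a",
              [{"title": "Sample title", "author": "Rogers", "year": "2004"}])
hits = cached(conn, "rogers")
print(hits)
```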

JAFER returns metadata to the interface to allow the learner to compile a selection of records.

The compiled list can then be annotated and marked-up using a standard web form template. This annotated form can then be saved in a format compliant with the IMS RLI XML specification.
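
The step from an annotated form to an XML document can be sketched as follows. The element names used here (resourceList, item, reference, annotation and so on) are placeholders invented for this example and are not the IMS RLI schema; a real export would follow the element names and structure defined in that specification.

```python
import xml.etree.ElementTree as ET


def build_list_xml(title, creator, items):
    """Serialise an annotated list using placeholder (non-IMS) element names."""
    root = ET.Element("resourceList")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "creator").text = creator
    for item in items:
        node = ET.SubElement(root, "item")
        ET.SubElement(node, "reference").text = item["reference"]
        # Each list item carries its own annotation, as in the web form.
        ET.SubElement(node, "annotation").text = item.get("annotation", "")
    return ET.tostring(root, encoding="unicode")


# Invented sample list, for illustration only.
xml_text = build_list_xml(
    "Sample reading list",
    "A. Learner",
    [
        {"reference": "First sample reference", "annotation": "Start here."},
        {"reference": "Second sample reference", "annotation": "Read chapter 2."},
    ],
)
print(xml_text)
```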

The RLI XML can then be output for storage within a repository after being packaged according to the IMS Content Packaging specification (and in turn be searched) or uploaded into the VUE application for re-representation as a mind map.

The XML can also be parsed by Cocoon which can output the resource list as XHTML or PDF. For further details regarding the MDC use of Cocoon please see the documentation here.

4. Discussion

While building the MDC software a number of topics for discussion have emerged.

Usability

It is essential to minimize the cognitive load required to use any tool. The MDC software is designed to enable the learner to maintain as much of their attention as possible on the task-at-hand: engaging in the process of recommendation.

Creating resource lists enables learners (students and academics) to represent their understanding of a subject. A resource list allows a learner to share their view of a subject by providing references to resources and brief notes (annotations).

A distributed search brings back information from many sources, which should be presented in a simple and consistent manner that allows the learner to inspect the information about a resource (the reference or metadata) and find the relevant items.

We have facilitated this process by presenting the metadata in a tabular format and allowing the learner to scroll, sort and search through all the returned information. We have also implemented tabs that allow the learner to conduct many searches with a single view (browser window). The learner can also execute new searches by hyperlinking from individual items of metadata.

Typically references are represented in highly structured formats, the purpose of these formats is:

Much of the information contained in a traditional reference does not need to be displayed to a learner when they are using digital technologies because the computer can automatically perform the discovery process. The OpenURL mechanism allows metadata to be hidden from the learner by employing the parsing and URL creation functions of an OpenURL Resolver.

Besides the title, author and date there is very little information provided within a reference that allows a learner to decide whether a reference pertains to a suitable text. In short digital references could cut out much of the traditional metadata and add more information that supports learners in the process of recommendation.

A reference could contain a formal abstract or various types of secondary metadata such as:

The MDC interface presents the metadata in a tabular format and implements the sort, search, hyperlink and scroll functions. The same data could be represented according to other visual metaphors that provide new functions:

Hyperbolic trees

Cone trees

Perspective walls

For more information see: Gary Geisler Making Information More Accessible: A Survey of Information Visualization Applications and Techniques (January 31, 1998)

To preview a resource a learner is often forced to follow a large number of links and gets lost in the process. This is because content providers are not implementing the access required to make open linking a seamless process for the learner. As a result learners may avoid academic search tools in preference to simpler search interfaces such as Google. The main problem with this is that Google will often not give them access to resources or their metadata, which means the search strategy is skewed (towards resources that can be discovered through Google).

Navigating information landscapes

Is it still feasible to attempt to store all academic reference metadata in a central repository? Surely there is too much information and this centralised approach is neither robust nor scalable enough.

Alternatively, is it possible to search all databases that hold metadata for each search? A distributed search tool that attempted to search all repositories would swamp the network and retrieve too many results. Scaling the search down to a select group of repositories requires either the user to configure where their search should go or some kind of machine intelligence to calculate it. Having a large number of repositories also requires effective de-duplication and the ability to conduct a meaningful search over what will inevitably be a heterogeneous set of metadata schemas. (See Matthew Dovey's article for a full discussion of these points.)
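
As an illustration of the de-duplication problem, the sketch below treats two records as duplicates when their normalised title and year agree. This matching rule, the field names and the sample records are assumptions chosen for brevity; real matching algorithms need to cope with far messier metadata.

```python
import re


def dedup_key(record):
    """Normalise the title (case, punctuation, spacing) and pair it with the year."""
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return (title, record.get("year", ""))


def deduplicate(records):
    """Keep the first record seen for each key, preserving the original order."""
    seen = {}
    for record in records:
        seen.setdefault(dedup_key(record), record)
    return list(seen.values())


# Invented sample records: the first two differ only in case and punctuation.
records = [
    {"title": "Sample Title: Part One", "year": "1997", "target": "library_a"},
    {"title": "sample title - part one", "year": "1997", "target": "library_b"},
    {"title": "A Different Title", "year": "2004", "target": "library_b"},
]
unique = deduplicate(records)
print(len(unique))
```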

xxx general searches (keyword) vs specific search (complete references)

xxx The tricks: de-duplication, matching algorithms, caching, cataloguing (as in RDN), regular expressions

How do we create a tool that does not flood the network with searches that use up bandwidth and suffer from network problems (looping, loss of service etc):

Taken from OmWiki Peach Bridge web site.

While keeping the number of repositories manageable enough for diversity and robustness to exist on the network as a whole:

Cooperative Association for Internet Data Analysis (CAIDA), taken from CAIDA Walrus project.

The Open Archives Initiative has produced specifications that detail how repository owners can expose their metadata to web robots and in turn harvest information from other providers. The result of this is that the number of nodes that could support a full search is minimised. (xxx does an OAI repository mandate a query mechanism?)

Distributed searches can then be conducted against a manageable number of repositories, so they do not consume too much bandwidth. OAI repositories should also adhere to common metadata standards, which means searches can be more precise.

A distributed search interface such as the one built by the MDC project is designed to simplify what learners need to do to search large information landscapes. It is crucial that solutions in this functional domain implement interoperability specifications because so many groups are required to collaborate. Interoperability specifications are needed that describe:

By bolting together software over the interoperability points as above the MDC project has encountered some of the problems that arise from such a component-based approach to software integration:

The problems that currently exist at these interoperability points directly affect the usability of the interfaces that can be provided to the learner. To create a system that will be popular with learners it is essential that all the groups involved in the process of search and discover comply with interoperability specifications.

Most of these complexities are created by the need to assert ownership over a resource. Without the need for authenticated access all content could be accessed through simple protocols such as HTTP and encoded in formats with viewers that are freely available such as HTML. Full text search engines (such as Google) could provide basic cataloguing and communities of practice could collaborate to add more structured metadata to resources (e.g. RDN or Merlot).

When ownership is asserted over a resource the owner may attempt to brand the resources and the repository as a whole. The owner is interested in providing their services in preference to a competitor so will not necessarily be motivated to provide an open interface for searching or a seamless mechanism for allowing learners to discover or preview a resource.

The result is that interoperability cannot be achieved and the services that learners need cannot be provided. Making content more openly available is being championed by groups such as:

The MDC project has bolted together software components that implement interoperability specifications in this domain:

The project has attempted to support a third approach towards allowing improved searching within the academic community. The distributed search and metadata harvesting approaches are a means for creating an open landscape for all resources, a means for ensuring learners can find any resource.

By creating resource lists and uploading them into repositories that can be searched, the process of recommendation is potentially extended to any internet user. Learners will be able to search for lists that have a good match to keywords or articles that they are interested in. Finding these lists could also put them in contact with networks of learners with similar interests. These learning networks could spend time searching for resources and when they find one they feel is worth telling the community about they could trigger an alert (emails, RSS, postings to blogs/wikis etc).

By supporting collaboration through the sharing of resource lists we are enabling learners to support each other in searching the vast information landscape that is emerging on the internet. Facilitating this kind of collaboration is the focus of research into distributed cognition, which is the subject of the next section.

What is Distributed Cognition?

The purpose of software must be to serve a human need. This project has used research termed distributed cognition to develop functionality that does not simply model traditional processes. Instead the software we are bolting together facilitates learning activities that could support more effective ways for networks of learners to collaborate in the process of recommendation.

A succinct summary of the field is:

"Distributed Cognition is a hybrid approach to studying all aspects of cognition, from a cognitive, social and organizational perspective. The most well known level of analysis is to account for complex socially distributed cognitive activities, of which a diversity of technological artifacts and other tools and representations are an indispensable part." - Yvonne Rogers and Mike Scaife 1997.

The research emphasizes the importance of taking a holistic view of problem solving where large social networks collaborate in the use of tools to complete activities. This approach is contrasted with seeing problem solving as being performed by individuals:

"A main point of departure from the traditional cognitive science framework is that, at the 'work setting' level of analysis, the distributed cognition approach aims to show how intelligent processes in human activity transcend the boundaries of the individual actor. Hence, instead of focusing on human activity in terms of processes acting upon representations inside an individual actor's heads the method seeks to apply the same cognitive concepts, but this time, to the interactions among a number of human actors and technological devices for a given activity. In addition, other concepts coming from the social sciences are utilized to account for the socially-distributed cognitive phenomenon. These include notions like intersubjectivity, organizational learning and the distribution of labour." - Yvonne Rogers and Mike Scaife 1997.

Resource lists represent a technology that supports groups of learners in exchanging recommendations about resources. The traditional model is for a domain expert (e.g. physics professor) to compile references to resources (e.g. paper handouts with journal article titles that can be found in a local library) that are relevant to a learning activity (e.g. introductory quantum mechanics).

The internet has enabled this model to be significantly enhanced. The obvious logistical advantages are:

It is important to remember that the technology a learner chooses to solve a problem may have profound implications on their ability to complete an activity. Learners must always endeavor to use technology in an appropriate way by considering their own specific context. Printing a copy of a journal article and annotating it with a pencil could still be the most effective way to learn - this is perhaps particularly true where the annotations need to be done very quickly and include the creation of diagrams.

A major consideration of the research in this field is understanding how knowledge is represented. The term external cognition refers to how we think using our external environment (as opposed to internally, within our minds). Research here is focused on the role different representation types have in our ability to solve problems. If we define a problem as navigating the information space within a specific subject domain then we can see that an annotated reading/resource list is designed to allow experts to support novice learners with this task.

There is no reason why this information about references needs to be represented as a list. Some learners may benefit from the list being represented as a mind map. Mind maps can be used to filter the information that is available to immediate inspection (so removing unnecessary metadata) and show how different concepts relate to each other. The VUE mind mapping tool allows learners to import a resource list and re-represent the list according to how they understand the subject.

Another feature of VUE is the ability to overlay a sequence on top of the map, which for instance allows experts and learners to communicate how they move their attention through a mind map.

Distributed cognition research emphasizes the importance of seeing knowledge as being organized and manipulated between individuals (as opposed to residing in a single person's mind). It is important to build tools that allow individuals to collaborate in the creation and use of resource lists. This can be done through pasting OpenURLs into emails and messenger tools or printing a list out for a class discussion. Another important component in the overall solution however is the use of repositories that can store resource lists.

If a resource list is uploaded into a database, learners can search it to find resource lists that match their search term. The RLI XML allows resource lists to be described in terms of who created a list, which in turn allows searches to build in the idea of trust. One of the major issues with searching the web today is that it can be difficult to agree on the quality or validity of a resource. If a learner can ask for a list by Author/Group (e.g. a resource list about Logic created by Sir P. Strawson) and they trust that the repository implements secure uploading, then they can trust that the items in the list are of high quality (if they trust that Sir Peter Strawson is an authority on Logic!). Networks of users who trust each other's online contributions are a goal of the ideas behind the semantic web.

Glossary

Discover: NISO have developed the OpenURL specification to enable organizations to manage access to resources on behalf of learners. There is a need for this because many resources require subscriptions to view the content. Basically an OpenURL is a means for passing metadata (a reference) to centrally managed software called a Resolver. The Resolver holds information about the organization's subscriptions to resources and uses the metadata to work out how the learner could locate the appropriate copy of a resource. For example Oxford University might hold a local copy of a journal article and a subscription to a publisher who provides online access to the same article. The Resolver would be able to construct a URL to allow the learner to link to the resource and also provide the shelf-mark information for the physical copy. For more information see the Overview of OpenURL by ExLibris.
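
A minimal sketch of how a reference might be passed to a Resolver as an OpenURL is given below. It assumes the key/value style of OpenURL v.0.1 (keys such as genre, aulast, atitle and date); the resolver address is a made-up placeholder and the metadata values are invented sample data.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver address; each institution runs its own Resolver.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def make_openurl(metadata, base=RESOLVER_BASE):
    """Append the non-empty metadata fields to the resolver address as a query string."""
    pairs = [(key, value) for key, value in metadata.items() if value]
    return f"{base}?{urlencode(pairs)}"


# Invented sample reference; the empty volume field is dropped from the link.
link = make_openurl({
    "genre": "article",
    "aulast": "Rogers",
    "atitle": "A sample article title",
    "date": "2004",
    "volume": "",
})
print(link)

# The Resolver can recover the same metadata from the link.
fields = parse_qs(urlsplit(link).query)
print(fields["aulast"])
```

The learner only ever sees the link; the parsing and the choice of the appropriate copy happen inside the Resolver.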

Distributed search: Implies a simple mechanism for searching across multiple repositories. Some kind of client that takes a user query and sends it to multiple target repositories and then aggregates the results for the user.

See this article about the difference between the distributed search and metadata harvesting paradigms. The article succinctly summarizes the contentious areas in this field of research.
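
The client described in this entry can be sketched as a function that sends one query to several targets in parallel and merges whatever comes back. The targets here are stand-in Python functions with invented names; a real client would call repositories over Z39.50 or SRW, which is outside the scope of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(targets, term):
    """Send the term to every target in parallel and aggregate the results.

    targets maps a target name to a callable that takes the term and
    returns a list of record dictionaries.
    """
    results, failures = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(fn, term) for name, fn in targets.items()}
        for name, future in futures.items():
            try:
                for record in future.result():
                    # Tag each record with the target it came from.
                    results.append({**record, "target": name})
            except Exception as exc:
                # One failing target should not lose the other results.
                failures[name] = str(exc)
    return results, failures


# Stand-in targets for illustration only.
def library_a(term):
    return [{"title": f"{term} in practice"}]


def library_b(term):
    raise ConnectionError("target unavailable")


results, failures = search_all(
    {"library_a": library_a, "library_b": library_b}, "cognition"
)
print(results)
print(failures)
```

Recording failures separately lets the interface report that a target was unavailable while still showing the records from the targets that answered.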

Learner: Throughout this document we have used the word learner as a generic term for all the possible roles that could be using the MDC software. The most obvious roles would be: students, academics and librarians.

Local searching: Typing a term into this box filters the records returned for a search so that only references matching the term are shown. For example, if the search term was 'Rogers' and the local search term was '2004' then 'Your Results' would only display records from the 'Rogers' search whose metadata matched '2004'.
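
The 'Rogers' then '2004' example can be sketched as a simple filter over records that have already been returned. The record fields and the case-insensitive substring match are assumptions for illustration; the MDC interface may match differently.

```python
def local_search(records, term):
    """Keep only records where some metadata value contains the term."""
    needle = term.casefold()
    return [
        record for record in records
        if any(needle in str(value).casefold() for value in record.values())
    ]


# Invented records standing in for the results of a search for 'Rogers'.
records = [
    {"author": "Rogers", "title": "First sample", "year": "2004"},
    {"author": "Rogers", "title": "Second sample", "year": "1997"},
]
filtered = local_search(records, "2004")
print(filtered)
```

No new request is sent to the repositories: the filter works only on what the first search returned.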

Metadata: Information about a resource that forms the basis of a reference. <Author>, <ISBN> and <Title> are all items of metadata that allow the full text of a resource to be discovered.

Repository: These are the targets of a distributed search. Repositories need to implement an interface that allows their catalogues to be searched and methods that allow the records matching the search term to be returned to the user. The JAFER project produced code that insulates repositories from the complexities of interoperable search syntax by translating, for example, a Z39.50 query into a repository's native language such as SQL.
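
The idea of translating an incoming query into a repository's native language can be sketched as below. This is not JAFER's code: a single (field, term) pair stands in for a real Z39.50 query, which is considerably more structured, and the table and column names are invented for the example.

```python
import sqlite3

# Hypothetical mapping from search fields to catalogue columns.
FIELD_TO_COLUMN = {"title": "title", "author": "author", "isbn": "isbn"}


def to_sql(field, term, exact=False):
    """Translate a simplified (field, term) query into parameterised SQL."""
    if field not in FIELD_TO_COLUMN:
        raise ValueError(f"unsupported field: {field}")
    column = FIELD_TO_COLUMN[field]
    if exact:
        return f"SELECT * FROM catalogue WHERE {column} = ?", (term,)
    # Substring match; wildcard characters inside the term are not escaped here.
    return f"SELECT * FROM catalogue WHERE {column} LIKE ?", (f"%{term}%",)


# A throwaway catalogue with invented rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogue (title TEXT, author TEXT, isbn TEXT)")
conn.executemany(
    "INSERT INTO catalogue VALUES (?, ?, ?)",
    [("Sample notes on cognition", "A. Author", "0000000000"),
     ("Another sample", "B. Author", "1111111111")],
)
sql, params = to_sql("title", "cognition")
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Only the column name is placed into the SQL text, and only after it has been checked against the mapping; the search term always travels as a bound parameter.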

Resource Lists: Used to be called reading lists but the definition has been changed to extend the scope of what a list could contain. IMS have produced the Resource List Interoperability specification to enable learners to recommend any type of resource. The specification defines a schema for creating a list of annotated resources and describing the list so that it can in turn be found through an online search. Within the IMS RLI specification there are also definitions as to how a list would be created, updated and deleted - the services that would be provided within a resource list management system.

Scrolling: Using the 'scroll bar' to move up and down a list of records.

Search: The process of querying one or more catalogues to retrieve information or metadata about a resource. As distinct from discover which is concerned with the location of the appropriate copy of the resource (i.e. the shelf mark or URL to full text of a journal article).

Sorting: Using the hyperlink in a column heading to sequence a list in alphabetical order.
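
A minimal sketch of sorting returned records by one column is given below. Ignoring case and treating a missing value as an empty string are assumptions made for this example, as are the record fields.

```python
def sort_by_column(records, column, descending=False):
    """Return the records ordered alphabetically by one metadata column."""
    return sorted(
        records,
        key=lambda record: str(record.get(column, "")).casefold(),
        reverse=descending,
    )


# Invented records for illustration.
records = [{"title": "beta"}, {"title": "Alpha"}, {"title": "gamma"}]
by_title = sort_by_column(records, "title")
print([record["title"] for record in by_title])
```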

Visual inspection: The process of visually scanning information (e.g. the metadata returned from the distributed search) with the intention of finding a match against a set of criteria as discerned by the viewer (e.g. articles on the subject of mentalese where the title suggests a discussion of Fodor from a biological perspective). The human process of inspection augments the sort, search and scroll tools provided in the MDC interface.

References:

UML Course: http://www.soc.staffs.ac.uk/kch1/