
Introduction

Digital libraries, like their conventional counterparts, are by their very nature distributed sources of information. The individual libraries are managed and maintained independently, and each is organized to suit its own specific needs. As both the size of these libraries and their rate of growth increase, new tools are needed to organize such amounts of information. Among other approaches, neural networks have proven useful for meeting the demands encountered in this application domain, namely adapting to a changing environment and coping with the `noise' of large amounts of unstructured information. One of the central goals of applying neural networks in this domain is the content-based structuring of the individual pieces of information stored in a library, which in turn serves as a starting point for query processing and library browsing interfaces. One particular type of neural network, the self-organizing map (SOM) [2], has been applied successfully in this domain on a number of occasions, be it for presenting Usenet newsgroup articles, as in the Websom approach [1,4], or for structuring and visualizing software libraries [5,6]. However, all of these systems share a severe limitation: all data must be available and processed at a single location to produce a SOM representation of the library. This is a serious restriction, given that libraries are by nature distributed and independently managed. Apart from the need to combine these distributed sources of information and integrate them into one single system, we also face the problem of scalability. Training very large maps takes far too much processing time to allow frequent retraining, which becomes increasingly necessary as both the growth of the collections and the pace at which topics change accelerate.

In this paper we describe an approach to representing digital libraries based on self-organizing maps. The main reason for using a SOM-based system to represent a collection of text documents is its ability to preserve the topology of the input data, i.e. in the final SOM representation, similar documents should be located close to each other on the map. We propose using a set of several independently managed self-organizing maps, each of which represents one part of a network of digital libraries; these maps are then integrated to form an overall view of the distributed library system. In a nutshell, this is achieved by using the trained SOMs as input for creating a higher-level SOM that represents a collection of individual libraries. However, the individual maps need not be arranged in a strict hierarchy; they can also reference each other, forming a Web-like structure that is not confined to a single location. Distributed maps can thus be integrated and referenced via a network of systems, while each library map is maintained locally, independently of any other SOMs that reference it, so that it can be retrained whenever necessary. Furthermore, the approach is not limited to the standard SOM architecture for representing such a library, since most approaches based on a similar architecture can be integrated as well. This allows users to choose the network architecture, network size and training frequency that best suit their needs.
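To make the integration idea more concrete, here is a minimal sketch in Python; it is not the authors' implementation. It assumes that "using the trained SOMs as input" means training the higher-level SOM on the weight (codebook) vectors of the units of each library's map, and that documents are already encoded as numeric term-weight vectors. The helpers `train_som`, `best_matching_unit` and `integrate`, the Gaussian neighborhood, the linearly decaying learning rate and radius, and the toy three-term document vectors are all hypothetical choices made for illustration.

```python
import math
import random


def best_matching_unit(codebook, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(codebook, key=lambda u: sum((w - xi) ** 2 for w, xi in zip(codebook[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Hypothetical helper: train a rows x cols SOM, return {(r, c): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialize every unit with a copy of a randomly chosen input vector.
    codebook = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks to a minimum
            bmu = best_matching_unit(codebook, x)
            for (r, c), w in codebook.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood on the grid
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return codebook


def integrate(library_maps, rows, cols, **kw):
    """Hypothetical helper: train a higher-level SOM on the codebook vectors of library maps."""
    inputs, origins = [], []
    for name, codebook in library_maps.items():
        for unit, w in codebook.items():
            inputs.append(list(w))
            origins.append((name, unit))
    top = train_som(inputs, rows, cols, **kw)
    # Record which lower-level units (library, grid position) land on each top-level unit.
    index = {u: [] for u in top}
    for w, origin in zip(inputs, origins):
        index[best_matching_unit(top, w)].append(origin)
    return top, index


# Toy "libraries": documents as 3-term weight vectors scattered around topic centers.
rng = random.Random(42)


def noisy(center):
    return [v + rng.uniform(-0.05, 0.05) for v in center]


library_a = [noisy([1, 0, 0]) for _ in range(10)]
library_b = [noisy([0, 1, 0]) for _ in range(10)] + [noisy([0, 0, 1]) for _ in range(10)]

# Each library trains its own small map locally and independently.
map_a = train_som(library_a, 2, 2, seed=1)
map_b = train_som(library_b, 2, 2, seed=2)

top, index = integrate({"A": map_a, "B": map_b}, 3, 3, epochs=100, seed=3)
for unit in sorted(index):
    print(unit, index[unit])
```

In this sketch, a library can retrain its own map without touching the others; only the higher-level map, which sees nothing but codebook vectors, would need to be recomputed, and its input is far smaller than the combined document collection.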

The rest of the paper is organized as follows: Section 2 describes the basic architecture and training principles of the integrating SOM system. We then describe an application scenario in the field of text document classification in Section 3 and, finally, present our conclusions in Section 4.


Andreas RAUBER
1998-11-02