
Introduction

Recent years have seen an uninterrupted rise in the amount of information available in electronic form, with both the number and the size of such information collections continuing to grow. Similar to large conventional libraries, which are interconnected by some organizational network, several independently managed information repositories must be combinable, so that users can access and query the resulting collection via a single interface. This requirement arises not only from the need for convenient interaction, but also from the very nature of the information repositories currently in use: apart from the desire to combine specialized libraries, publishers offer their annual publications in electronic form, be it online or on CD, and users want to combine these into one collection. Finally, the sheer size of such information repositories calls for splitting them into smaller parts, which are then integrated again into one single system, in order to keep each part at a manageable order of magnitude. Most systems in use today rely on central databases to retrieve text documents based on keyword queries. However, with the ever-increasing amount of information available, these systems face severe limitations due to computational demands, since all queries have to be handled by a central search engine or its mirrors.

On the other hand, research in neural networks has produced promising results in the field of full-text archives. A number of projects [1,6] show in particular the capabilities of one specific type of unsupervised neural network for text document clustering, namely the Self-Organizing Map [2]. Apart from its applicability to query processing, this network architecture offers an intuitive visualization of the structure of the information repository [4], which allows intuitive user interfaces for browsing collections to be implemented. One of the most prominent examples in this application arena is WEBSOM¹, which provides access to over 1 million articles from 83 Usenet newsgroups. However, the sheer size of the map and the huge computational demands pose technical limitations. Recent studies [3] have succeeded in reducing the size of the maps, and thus the computational demands, by using a modified architecture, e.g. the Hierarchical SOM [5], which consists of a hierarchy of (small) maps trained consecutively, layer by layer, from the top down. Even so, all training data must be processed centrally.
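To make the idea of clustering text documents with a Self-Organizing Map more concrete, the following minimal sketch trains a tiny 2x2 map on four toy documents. It is an illustration under stated assumptions, not the implementation used by any of the cited systems: the function names `train_som` and `best_matching_unit`, the four-term feature vectors, the Gaussian neighborhood, and the linear decay schedules are all hypothetical choices made for this example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Random initial weights in [0, 1] for every unit on the grid.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks, floor 0.5
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                # Squared grid distance between this unit and the winner.
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Toy feature vectors (assumed: normalized weights of four index terms).
docs = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.9, 0.0, 0.1],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.1, 1.0, 0.9],
]
som = train_som(docs, rows=2, cols=2)
for d in docs:
    print(d, "->", best_matching_unit(som, d))
```

After training, each document is assigned to its best-matching unit, and on a collection like this one would expect documents with similar term profiles to land on the same or on neighboring units. Note that every document has to be visited in each epoch, which hints at why very large maps and collections become computationally demanding.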

In this paper we propose a model of a distributed SOM-based library system (SOMLib) that combines local text documents with remote library systems that use a conforming feature vector representation. The system is not limited to a specific SOM architecture, nor does it need central parsing of all documents to create the training vector structure. The overall goal is the integration of several independently managed libraries using the learning capabilities of neural networks. Analogously, each user can create a personal library built up from other library systems, as well as a user profile covering her fields of interest to enhance keyword queries.

The rest of the paper is organized as follows: Section 2 describes the principles of our neural network approach for such a library system. Section 3 details the architecture of the system, and Section 4 presents a model setup and application scenario. Finally, we present a short summary in Section 5.


Andreas RAUBER
1998-06-02