
SOMLib - The Architecture

**Figure 1:** SOMLib architecture: integration of several lower order SOMs

The SOMLib architecture comprises two types of SOMs (Figure 1). First, there is a set of independent, small SOMs, referred to as first order maps, which are trained with the feature vectors obtained by parsing the documents. Every node thus represents a set of documents, and the map as a whole represents a topographically ordered mapping of all documents in the library. In a second step, higher order maps are trained using the weight vectors of those first order maps as input vectors. Note, however, that the vocabularies, and thus the vector structures, of the separate libraries differ from each other. A unified feature vector structure therefore has to be created by merging the vector structures of the libraries to be included before the higher order map can be trained. The resulting map is conceptually identical to the library maps it is based upon, except that its nodes now represent sets of nodes from the various lower order maps. Analogously, small SOMs trained with relevant documents can be used as user profiles to enhance keyword queries.
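The merging of vector structures can be illustrated with a minimal sketch. It assumes that each library is given as a vocabulary (a list of terms) together with the weight vectors of its first order map, and that a term unknown to a library receives weight 0.0 in the merged space. The function names and the zero-fill rule are assumptions made for illustration, not part of the SOMLib description.

```python
def merge_vocabularies(vocabularies):
    """Return the sorted union of all library vocabularies."""
    merged = set()
    for vocab in vocabularies:
        merged.update(vocab)
    return sorted(merged)


def project(weight_vector, vocab, merged_vocab):
    """Re-express one node's weight vector in the merged feature space.

    Terms the originating library does not know get weight 0.0
    (an assumption of this sketch, not prescribed by the text).
    """
    weight_of = dict(zip(vocab, weight_vector))
    return [weight_of.get(term, 0.0) for term in merged_vocab]


def build_training_set(libraries):
    """Build higher order training inputs from first order maps.

    libraries: list of (vocab, node_weight_vectors) pairs, one per map.
    Returns the merged vocabulary and a list of
    ((library_index, node_index), projected_vector) entries.
    """
    merged_vocab = merge_vocabularies(vocab for vocab, _ in libraries)
    inputs = []
    for lib_id, (vocab, nodes) in enumerate(libraries):
        for node_id, weights in enumerate(nodes):
            projected = project(weights, vocab, merged_vocab)
            inputs.append(((lib_id, node_id), projected))
    return merged_vocab, inputs


# Two toy libraries with different vocabularies (hypothetical data).
lib_a = (["map", "neural", "text"], [[0.9, 0.1, 0.0], [0.2, 0.7, 0.4]])
lib_b = (["library", "text"], [[0.8, 0.3]])
merged_vocab, training_inputs = build_training_set([lib_a, lib_b])
```

Each training input keeps the pair of library index and node index it came from, so that a higher order node can later be linked back to the lower order nodes it represents.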

For any higher order map, two different types of referencing must be considered. With the static referencing scheme, each node passes on the documents it represents to any higher order map that uses its weight vector for training. In this case every node in any map directly represents a set of documents. With the node referencing scheme, higher order maps contain references to the nodes of the maps they are built upon, in which case a node represents a certain topic area of a lower order map. Static referencing allows direct access to the represented documents and avoids the problems that arise when a lower order map is retrained, in which case the assignment of documents to nodes changes. With the node referencing scheme, retraining requires either that the 'old' map remains available to the higher order maps referencing it, or that the referenced nodes are re-routed by providing appropriate 'substitute nodes'. The latter can easily be achieved by mapping all nodes of the 'old' map onto the new, retrained map. An advantage of the node referencing scheme is that it allows the retrieval of documents that were added to the lower order library's document collection after the higher order map was trained. In the static referencing scheme such documents are not automatically added to the list of referenced documents but require an update process instead. A combined referencing scheme might provide the benefits of both while avoiding their shortcomings: static referencing allows fast browsing of a high order library map and, with appropriate information given by the static referencing scheme, fast access to the desired up-to-date information on the corresponding lowest order library map in an interactive drill-down manner.
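The re-routing via substitute nodes might look as follows. The sketch assumes that "mapping onto the retrained map" means presenting each old weight vector to the new map and selecting the closest node by Euclidean distance, and that both maps share the same feature vector structure. All names and the toy data are hypothetical.

```python
import math


def best_matching_node(vector, weight_vectors):
    """Index of the weight vector closest to `vector` (Euclidean distance)."""
    return min(range(len(weight_vectors)),
               key=lambda i: math.dist(vector, weight_vectors[i]))


def substitute_nodes(old_weights, new_weights):
    """Map every node index of the old map to its closest node on the retrained map."""
    return {old: best_matching_node(w, new_weights)
            for old, w in enumerate(old_weights)}


def resolve(node_refs, substitutes, documents_by_node):
    """Follow node references through the substitute table to the current documents."""
    docs = []
    for old_node in node_refs:
        docs.extend(documents_by_node.get(substitutes[old_node], []))
    return docs


# Old map with two nodes, retrained map with three nodes (toy values).
old_weights = [[0.9, 0.1], [0.1, 0.9]]
new_weights = [[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]]
substitutes = substitute_nodes(old_weights, new_weights)

# Documents currently assigned to the nodes of the retrained map.
documents_by_node = {0: ["doc3.txt", "doc7.txt"], 2: ["doc1.txt"]}

# A higher order node that still references node 1 of the old map.
higher_order_node_refs = [1]
current_docs = resolve(higher_order_node_refs, substitutes, documents_by_node)
```

Because the documents are looked up on the retrained map at query time, documents added after the higher order map was trained are found as well, which is the advantage of node referencing described above.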


Andreas RAUBER
1998-06-02