Department of Software Technology
Vienna University of Technology


Exploration of Text Collections with Hierarchical Feature Maps

Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight into the structure of the data items contained in a text archive. In this paper we present the results of using a hierarchy of self-organizing maps to perform the text classification task. Each of the individual self-organizing maps is trained independently and specializes in a subset of the input data. As a consequence, this particular artificial neural network model makes it possible to establish a genuine document taxonomy. The benefit of this approach is a straightforward representation of document similarities combined with dramatically reduced training time. The hierarchical representation of document collections is particularly appealing because it follows the organizational principle used by librarians and is therefore familiar to the user. The massive reduction in the time needed to train the artificial neural network, together with its highly accurate clustering results, makes it a compelling alternative to conventional approaches.



Comments: rauber@ifs.tuwien.ac.at