Creating a SOMLib Digital Library

Department of Software Technology
Vienna University of Technology

Data Mining in Large Free Text Document Archives

Document classification has been one of the central issues in information retrieval research over the past decades. The challenge of classification is to uncover the similarities between groups of data in order to improve the retrieval effectiveness of the overall system. From an exploratory data analysis point of view, the same process of classification may be used to gain insight into the structure of the various data items and may thus be referred to as data mining in text archives. In this paper we show the results of applying a neural network model, the hierarchical feature map, to such a data mining task. The neural network is designed to impose a hierarchical structure on the underlying document collection, which leads to a straightforward representation of data similarities. Apart from the benefit for text data mining, we are able to demonstrate that the hierarchical feature map leads to a tremendous speed-up of the training process compared to more traditional neural network architectures that are already known to be effective in text classification tasks. It is this time-consuming training process that is commonly regarded as a major obstacle to real-world, large-scale neural network applications. Hence, hierarchical feature maps point the way towards an effective use of neural network technology in realistic applications and thus represent a powerful alternative to traditional methods for text classification.
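The idea of imposing a hierarchy on a document collection can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes documents are already encoded as numeric term vectors, uses small 2x2 maps, and trains a child map only for units that receive more than one document. The names train_som, best_unit, train_hierarchy and locate, as well as all parameter values, are hypothetical and chosen for illustration only.

```python
import math
import random


def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=20, seed=0):
    """Train one small self-organizing map; returns one weight vector per unit (row-major)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumption: initialise each unit from a randomly chosen input vector.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)  # linearly decaying learning rate
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.01  # shrinking neighbourhood
            wr, wc = divmod(best_unit(weights, x), cols)
            for u, w in enumerate(weights):
                ur, uc = divmod(u, cols)
                d2 = (ur - wr) ** 2 + (uc - wc) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def train_hierarchy(data, rows=2, cols=2, depth=2, seed=0):
    """Train a map on data, then one child map per unit on the documents mapped to that unit."""
    weights = train_som(data, rows, cols, seed=seed)
    node = {"weights": weights, "children": {}}
    if depth > 1:
        buckets = {}
        for x in data:
            buckets.setdefault(best_unit(weights, x), []).append(x)
        for unit, subset in buckets.items():
            if len(subset) > 1:  # assumption: skip units with too few documents
                node["children"][unit] = train_hierarchy(
                    subset, rows, cols, depth - 1, seed + unit + 1)
    return node


def locate(node, x):
    """Path of winning units from the top-level map down to the deepest map reached."""
    path = []
    while node is not None:
        unit = best_unit(node["weights"], x)
        path.append(unit)
        node = node["children"].get(unit)
    return path


# Toy term vectors (made-up numbers, two loose groups).
docs = [[1.0, 0.9, 0.0, 0.1], [0.9, 1.0, 0.1, 0.0], [0.8, 0.8, 0.0, 0.0],
        [0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 0.9, 1.0], [0.0, 0.0, 0.8, 0.8]]
tree = train_hierarchy(docs)
print(locate(tree, docs[0]))
```

Each child map in this sketch sees only the documents assigned to its parent unit, so every map stays small. That is one plausible reading of where a training speed-up can come from, but the sketch makes no claim about the size of the effect reported in the paper.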

Comments: rauber@ifs.tuwien.ac.at