Putting the World Wide Web into a Data Warehouse: A DWH-based Approach to Web Analysis

Abstract:

The World Wide Web, owing to its sheer size and dynamics, has become one of the most fascinating and important data sources for large-scale analysis and investigation, ranging from content-based information location and the dynamics of change to community analysis. Yet most projects so far rely on special-purpose tools optimized for a given task, providing only limited flexibility.
In this paper we propose a Data Warehouse-based approach to analyzing the World Wide Web. Information contained in the web pages, metadata on the documents, and information acquired from additional sources such as the WHOIS database are integrated into a multidimensional view of the Web. The resulting system allows for flexible analysis of the various characteristics of the Web. Results from a prototypical study of the Austrian national Web space as part of the AOLA project demonstrate the potential of the presented approach.
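
To illustrate what a multidimensional view of crawled pages might look like, the following is a minimal sketch in Python. It is not the AOLA implementation: the fact record, the dimension names (`domain`, `mime_type`, `registrant_city`), the `roll_up` helper, and the sample rows are all hypothetical and chosen only to show how page content, document metadata, and an externally sourced attribute such as a WHOIS lookup could sit side by side and be aggregated along any one dimension.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFact:
    """One crawled page described along a few hypothetical dimensions."""
    domain: str           # host the page was fetched from
    mime_type: str        # document metadata, e.g. taken from the HTTP header
    registrant_city: str  # external attribute, e.g. obtained via a WHOIS lookup
    size_bytes: int       # numeric measure to aggregate


def roll_up(facts, dimension):
    """Sum page sizes grouped by a single dimension (a simple roll-up)."""
    totals = Counter()
    for fact in facts:
        totals[getattr(fact, dimension)] += fact.size_bytes
    return dict(totals)


# Made-up sample rows, for illustration only.
facts = [
    PageFact("example.at", "text/html", "Vienna", 12000),
    PageFact("example.at", "application/pdf", "Vienna", 250000),
    PageFact("sample.at", "text/html", "Graz", 8000),
]

by_mime = roll_up(facts, "mime_type")
by_city = roll_up(facts, "registrant_city")
```

The same set of facts answers different questions depending on the dimension chosen, which is the kind of flexibility a special-purpose tool built for a single task would typically lack.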

Authors:

Andreas Rauber
Institute of Software Technology, Vienna University of Technology, Austria.

Oliver Witvoet
Institute of Software Technology, Vienna University of Technology, Austria.

Andreas Aschenbrenner
Institute of Software Technology, Vienna University of Technology, Austria.

Robert M. Bruckner
Institute of Software Technology, Vienna University of Technology, Austria.

Publishing Information:

First International Workshop on Very Large Data Warehouses (VLDWH 2002).
In 13th International Workshop on Database and Expert Systems Applications (DEXA'02), IEEE Computer Society Press, Aix-en-Provence, France, September 2002.