Striving Towards Near Real-Time Data Integration for Data Warehouses
Abstract:
The amount of information available to large-scale enterprises is growing
rapidly. While operational systems are designed to meet well-specified (short)
response time requirements, the focus of data warehouses is generally the
strategic analysis of business data integrated from heterogeneous source systems.
The decision-making process in traditional data warehouse environments is often
delayed because data cannot be propagated from the source systems to the data
warehouse in time. A real-time data warehouse aims at decreasing the time it
takes to make business decisions and tries to attain zero latency between the
cause and effect of a business decision. In this paper we present an
architecture of an ETL (extraction, transformation, loading) environment for
real-time data warehouses that supports continual near real-time data
propagation. The architecture takes full advantage of existing J2EE (Java 2
Platform, Enterprise Edition) technology and enables the implementation of a
distributed, scalable, near real-time ETL environment. Vendor-proprietary ETL
solutions are often hard to scale and often do not support optimizing the time
frames allocated for data extracts. Instead of using them, we propose ETLets
(pronounced “et-lets”) and Enterprise JavaBeans (EJB) for the ETL processing
tasks.
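The abstract contrasts continual, near real-time propagation with large periodic extracts. The Java sketch below illustrates only that general idea as a bounded micro-batch loop. Each run waits briefly for captured source changes, drains a small batch, and hands every change to a pluggable transform-and-load component. The names `MicroBatchEtl`, `SourceEvent`, and `EtlComponent` are hypothetical. The in-memory queue stands in for change capture and the map stands in for a warehouse table. This is not the paper's ETLet or EJB API, and it leaves out J2EE containers, distribution, and persistence entirely. It requires Java 16+ because it uses records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical micro-batch ETL loop (NOT the paper's ETLet/EJB API).
public class MicroBatchEtl {
    // A captured change from a hypothetical source system.
    record SourceEvent(String customerId, double amount) {}

    // Hypothetical pluggable unit of transform-and-load work.
    interface EtlComponent {
        void process(SourceEvent event, Map<String, Double> warehouse);
    }

    private final BlockingQueue<SourceEvent> changes = new LinkedBlockingQueue<>();
    private final Map<String, Double> warehouse = new TreeMap<>(); // stand-in for a warehouse table
    private final EtlComponent component;

    MicroBatchEtl(EtlComponent component) {
        this.component = component;
    }

    // Extract side: a change-capture mechanism would call this as source rows change.
    void capture(SourceEvent event) {
        changes.add(event);
    }

    // Waits up to maxWaitMillis for the first change, then takes at most maxBatch
    // changes in total and processes them. Returns the number loaded (0 on timeout).
    int runOnce(int maxBatch, long maxWaitMillis) {
        if (maxBatch < 1) {
            throw new IllegalArgumentException("maxBatch must be at least 1");
        }
        List<SourceEvent> batch = new ArrayList<>();
        try {
            SourceEvent first = changes.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
            if (first == null) {
                return 0; // nothing arrived within the wait window
            }
            batch.add(first);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return 0;
        }
        changes.drainTo(batch, maxBatch - 1); // grab whatever else is ready, up to the bound
        for (SourceEvent event : batch) {
            component.process(event, warehouse);
        }
        return batch.size();
    }

    Map<String, Double> warehouse() {
        return warehouse;
    }

    public static void main(String[] args) {
        // Transform + load: accumulate revenue per customer.
        MicroBatchEtl etl = new MicroBatchEtl(
                (event, wh) -> wh.merge(event.customerId(), event.amount(), Double::sum));
        etl.capture(new SourceEvent("c1", 10.0));
        etl.capture(new SourceEvent("c2", 5.0));
        etl.capture(new SourceEvent("c1", 2.5));
        System.out.println(etl.runOnce(2, 100) + " " + etl.warehouse()); // 2 {c1=10.0, c2=5.0}
        System.out.println(etl.runOnce(2, 100) + " " + etl.warehouse()); // 1 {c1=12.5, c2=5.0}
    }
}
```

Bounding both the batch size and the wait time is one simple way to trade load throughput against propagation latency. The scheduling actually used in the ETLet architecture is not shown here.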
Authors:
Robert M. Bruckner
Institute of Software Technology, Vienna University of Technology, Austria.
Beate List
Institute of Software Technology, Vienna University of Technology, Austria.
Josef Schiefer
IBM Watson Research Center, New York, USA.
Publishing Information:
In Proceedings of the Fourth International Conference on Data
Warehousing and Knowledge Discovery (DaWaK 2002), Springer
LNCS 2454, pp. 317-326, Aix-en-Provence, France, September 2002.