Zero-Latency Data Warehousing

Toward an Integrated Analysis Environment with Minimized Latency for Data Propagations

 

Robert M. Bruckner

Ph.D. thesis. 248 Pages. Department of Computer Science, Vienna University of Technology, November 2002.

 

Abstract:

Data warehousing is a powerful concept for organizations to analyze their business. It generates benefits for the business as it transforms the intelligence contained in the data into better decision-making, which results in more effective action. The most successful data warehouse implementations deliver business value on an iterative and continuous basis. Therefore, we propose a six-stage data warehouse evolution model in order to meet the need for minimized latency in certain data propagation and decision-making processes.

The zero-latency data warehouse is our vision of a data warehouse system that aims to decrease the time it takes to make a business decision. In fact, there should be almost zero latency between the cause and effect of a business decision. This doctoral thesis proposes a technical architecture for a zero-latency data warehouse and investigates its core components:

Time Consistency. We distinguish between two different temporal characterizations of the information appearing in a data warehouse: one is the classical description of the time instant when a given fact has occurred; the other represents the instant when the information is actually intelligible to the system. This distinction, implicit and usually not critical in on-line transaction processing applications, is of particular importance for zero-latency data warehouses. In such systems it can be most useful (or even vital) to determine and analyze what the situation was in the past, using only the information that was available at a given point in time.
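The distinction between the two time instants can be illustrated with a minimal Python sketch. It is not taken from the thesis: the names Fact, valid_time, recorded_time and total_as_known, as well as the sample sales figures, are hypothetical and chosen only for illustration. Each fact carries the instant at which it occurred and the instant at which the warehouse learned of it, so a query can be restricted to what was known at a given point in time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    """One warehouse fact with two time stamps (hypothetical layout)."""
    key: str
    value: float
    valid_time: datetime      # instant when the fact occurred
    recorded_time: datetime   # instant when the warehouse learned of it


def total_as_known(facts, valid_until, as_of):
    """Sum the facts that occurred up to `valid_until`,
    using only the information recorded up to `as_of`."""
    return sum(
        f.value
        for f in facts
        if f.valid_time <= valid_until and f.recorded_time <= as_of
    )


# Illustrative data: the second sale occurred on 1 November
# but reached the warehouse only on the following morning.
facts = [
    Fact("sale-A", 100.0, datetime(2002, 11, 1, 10, 0), datetime(2002, 11, 1, 10, 5)),
    Fact("sale-B", 50.0, datetime(2002, 11, 1, 11, 0), datetime(2002, 11, 2, 9, 0)),
]

end_of_day = datetime(2002, 11, 1, 23, 59)
next_noon = datetime(2002, 11, 2, 12, 0)

known_that_evening = total_as_known(facts, end_of_day, as_of=end_of_day)
known_next_day = total_as_known(facts, end_of_day, as_of=next_noon)
```

In this sketch the same question (total sales on 1 November) has two answers, depending on the point in time from which the warehouse is viewed. A decision taken on the evening of 1 November can only be evaluated fairly against the first answer.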

Near Real-Time Data Integration. We will discuss the changing requirements for near real-time data integration in data warehouses. In that context we will study the convergence of traditional ETL (extract-transform-load) and EAI (enterprise application integration) technology, as well as the ODS (operational data store) concept. Finally, we will describe a detailed architecture for near real-time data integration and evaluate two prototype implementations.

Active Decision-Making. Both for efficiency reasons and for consistency in decision-making, an organization will want to (semi-)automate decisions whenever the human mind does not add significant value. We investigate and evaluate several approaches for automating routine decision tasks (database triggers, event-condition-action rules, notifications, etc.). Furthermore, we will extend one of the prototypes with event-handling capabilities in order to enable active decision-making during near real-time data integration.
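As a rough illustration of the event-condition-action idea mentioned above, the following Python sketch evaluates rules against records as they are integrated. It does not reproduce the prototypes described in the thesis: the Rule class, the dispatch function, the "stock_loaded" event type and the reorder rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Event-condition-action rule (hypothetical structure)."""
    event_type: str                       # event the rule reacts to
    condition: Callable[[dict], bool]     # predicate over the event payload
    action: Callable[[dict], str]         # routine decision to carry out


def dispatch(rules, event_type, payload):
    """Run the action of every rule whose event type matches
    and whose condition holds; return the resulting decisions."""
    decisions = []
    for rule in rules:
        if rule.event_type == event_type and rule.condition(payload):
            decisions.append(rule.action(payload))
    return decisions


# Illustrative rule: propose a reorder when a freshly loaded
# stock record falls below its reorder level.
rules = [
    Rule(
        event_type="stock_loaded",
        condition=lambda rec: rec["quantity"] < rec["reorder_level"],
        action=lambda rec: f"reorder {rec['item']}",
    )
]

low_stock = {"item": "widget", "quantity": 3, "reorder_level": 10}
decisions = dispatch(rules, "stock_loaded", low_stock)
```

Keeping the condition and the action as separate callables means that a rule can be added or changed without touching the integration code that raises the events.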

Finally, we will discuss the strengths and limitations of zero-latency data warehouses, as well as some application scenarios in which the approach we propose strongly improves decision-making.