Overview of data warehousing

Data warehouse databases provide a decision support system (DSS) environment in which you can evaluate the performance of an entire enterprise over time.

In the broadest sense, the term data warehouse is used to refer to a database that contains very large stores of historical data. The data is stored as a series of snapshots, in which each record represents data at a specific time. By analyzing these snapshots you can make comparisons between different time periods. You can then use these comparisons to help make important business decisions.

Data warehouse databases are optimized for data retrieval. The duplication or grouping of data, referred to as database denormalization, increases query performance and is a natural outcome of the dimensional design of the data warehouse. By contrast, traditional online transaction processing (OLTP) databases automate day-to-day transactional operations. OLTP databases are optimized for data storage and strive to eliminate data duplication. Databases that achieve this goal are referred to as normalized databases.

An enterprise data warehouse (EDW) is a data warehouse that services the entire enterprise. An enterprise data warehousing environment can consist of an EDW, an operational data store (ODS), and physical and virtual data marts.

A data warehouse can be implemented in several different ways. You can use a single data management system, such as HCL OneDB™, for both transaction processing and business analytics. Or, depending on your system workload requirements, you can build a data warehousing environment that is separate from your transactional processing environment.

HCL OneDB uses the umbrella terms data warehousing and data warehousing environment to encompass any of the following forms that you might use to store your data:

Data warehouse
A database that is optimized for data retrieval to facilitate reporting and analysis. A data warehouse incorporates information about many subject areas, often the entire enterprise. Typically you use a dimensional data model to design a data warehouse. The data is organized into dimension tables and fact tables using star and snowflake schemas. The data is denormalized to improve query performance. The design of a data warehouse often starts from an analysis of what data already exists and how to collected in such a way that the data can later be used. Instead of loading transactional data directly into a warehouse, the data is often integrated and transformed before it is loaded into the warehouse.
The primary advantage of a data warehouse is that it provides easy access to and analysis of vast stores of information on many subject areas.
Figure 1: A sample snowflake schema which has the DAILY_SALES table as the fact table.

begin figure description - The figure shows the DAILY_SALES fact table in the center of the figure. This fact table is linked to five dimension tables: STORE, PERIOD, PRODUCT, PROMOTION, and CUSTOMER. The STORE table is linked to the CITY table. The CITY table is linked to the REGION table. The PERIOD table is linked directly to the MONTH and QUARTER tables. The PRODUCT table is linked directly to the BRAND and PRODUCT_LINE tables. The PROMOTION table is not linked to any other tables. The CUSTOMER table is linked directly to three tables: CONTACT, ADDRESS, and DEMOGRAPHICS. - end figure

Data mart
A database that is oriented towards one or more specific subject areas of a business, such as tracking inventories or transactions, rather than an entire enterprise. A data mart is used by individual departments or groups. Like a data warehouse, you typically use a dimensional data model to build a data mart. For example the data mart might use a single star schema comprised of one fact table and several dimension tables. The design of a data mart often starts with an analysis of what data the user needs rather than focusing on the data that already exists.
Figure 2: A data mart with the DAILY_SALES fact table

begin figure description - The figure shows the DAILY_SALES fact table which is linked to five dimension tables: STORE, CUSTOMER, PROMOTION, PERIOD, and PRODUCT. The key references in the fact table are used to link to the dimension tables. For example the STOREKEY column in the DAILY_SALES fact table is linked to the STOREKEY column in the STORE dimension table. The CUSTKEY column in the DAILY_SALES fact table is linked to the CUSTKEY column in the CUSTOMER dimension table. - end figure

Operational data store
A subject-oriented system that is optimized for looking up one or two records at a time for decision making. An operational data store (ODS) is a hybrid form of data warehouse that contains timely, current, integrated information. Including the ODS in the data warehousing environment enables access to more current data more quickly, particularly if the data warehouse is updated by one or more batch processes rather than updated continuously. The data typically is of a higher level granularity than the transaction. You can use an ODS for clerical, day-to-day decision making. This data can serve as the common source of data for data warehouses.