What is dimensional data?

Traditional relational databases, such as OLTP databases, are organized around a list of records. Each record contains related information that is organized into attributes (fields). The customer table of the stores_demo demonstration database, which includes fields for name, company, address, phone, and so forth, is a typical example. While this table has several fields of information, each row in the table pertains to only one customer. If you wanted to create a two-dimensional matrix with customer name and any other field, for example, phone number), you would realize that there is only a one-to-one correspondence. The following table is an example of a database table with fields that have only a one-to-one correspondence.

Table 1. A table with a one-to-one correspondences between fields

A four-column table displaying names and phone numbers.

Customer Phone number --->
Ludwig Pauli 408-789-8075 ---------------- ----------------
Carole Sadler ---------------- 415-822-1289 ----------------
Philip Currie ---------------- ---------------- 414-328-4543

You could put any combination of fields from the preceding customer table in this matrix, but you would always end up with a one-to-one correspondence, which shows that this table is not multidimensional and would not be well suited for a dimensional database.

However, consider a relational table that contains more than a one-to-one correspondence between the fields of the table. Suppose you create a table that contains sales data for products sold in each region of the country. For simplicity, the company has three products that are sold in three regions. The following table shows how you might store this data in a table, using a normalized data model. This table lends itself to multidimensional representation because it has more than one product per region and more than one region per product.

Table 2. A simple table with a many-to-many correspondence

A three-column table showing sales data for products sold in each region of the country.

Product Region Unit Sales
Football East 2300
Football West 4000
Football Central 5600
Tennis racket East 5500
Tennis racket West 8000
Tennis racket Central 2300
Baseball East 10000
Baseball West 22000
Baseball Central 34000

Although this data can be forced into the three-field relational table, the data fits more naturally into the two-dimensional matrix in the following table. This matrix better represents the many-to-many relationship of product and region data shown in the previous table.

Table 3. A simple two-dimensional example

A five-column table showing many-to-many relationship of product and region data.

Region Central East West
Product Football 5600 2300 4000
Tennis Racket 2300 5500 8000
Baseball 34000 10000 22000

The performance advantages of the dimensional model over the normalized model can be great. A dimensional approach simplifies access to the data that you want to summarize or compare. For example, using the dimensional model to query the number of products sold in the West, the database server finds the West column and calculates the total for all row values in that column. To perform the same query on the normalized table, the database server has to search and retrieve each row where the Region column equals 'West' and then aggregate the data. In queries of this kind, the dimensional table can total all values of the West column in a fraction of the time it takes the relational table to find all the 'West' records.