Sunday, 9 February 2014

Data Modelling - Slowly Changing Dimensions (SCD)

Dimension is a term in data management and data warehousing. It's the logical groupings of data such as geographical location, customer or product information. With Slowly Changing Dimensions (SCDs) data changes slowly, rather than changing on a time-based, regular schedule. Some examples:


  • product prices changing over time
  • people changing their name
  • country/state names may change over time

A slowly changing dimension is generally categorised into one of three possible types; Type 1, Type 2 and Type 3. In the examples that follow, I explain each type.

This is our initial example data:

Product ID (PK) Year Product Name Product Price
1 2004 Product 1 £150

Type 1: Overwriting the old values

In the year 2005, if the price of the product changes to £250, then the old values of the columns "Year" and "Product Price" have to be updated and replaced with the new values. In this type, there is no way to find out the old value of the product "Product 1" in the year 2004 since the table now contains only the new price and year information.

Product ID (PK) Year Product Name Product Price
1 2005 Product 1 £250

Type 2: Creating another additional record

In this type, the old values will not be replaced but a new row containing the new values will be added to the product table. So at any point of time, the difference between the old values and the new values can be retrieved and easily be compared. This would be very useful for reporting purposes.

Product ID (PK) Year Product Name Product Price
1 2004 Product 1 £150
1 2005 Product 1 £250

The problem with this approach is "Product ID" cannot store duplicate values of Product 1 since "Product ID" is the primary key. Also, the current data structure doesn't clearly specify the effective date and expiry date of Product 1 (that is, when the change to it's price happened). So, it would be better to change the current data structure to overcome the above primary key violation.

Product ID (PK)Effective DateTime (PK)YearProduct NameProduct PriceExpiry DateTime
101-01-2004 12.00 AM2004Product 1£15012-31-2004 11.59 PM
101-01-2005 12.00 AM2005Product 1£250

In the changed table structure, "Product ID" and "Effective DateTime" are composite primary keys. So there would be no violation of primary key constraint. Addition of new columns, "Effective DateTime" and "Expiry DateTime" provides the information about the product's effective date and expiry date which adds more clarity and enhances the scope of the table. This type 2 approach will require additional space in the database due to the additional record being stored, but this should not be an issue as dimensions are not that big in the real world.

Type 3: Creating new fields

In this type, the latest update to the changed values can be seen. New columns are added which allow us to keep track of the changes.

Product ID (PK)Current YearProduct NameCurrent Product PriceOld Product PriceOld Year
12005Product 1£250£1502004

The problem with Type 3 is that the complete history will not be stored if there are several continuous changes over the years (it retains a limited history). For example, in 2006, if the price changes to £350, we would not be able to see the complete history of 2004 prices, since the old values would have been updated with the 2005 product information.

Product ID (PK)Current YearProduct NameCurrent Product PriceOld Product PriceOld Year
12006Product 1£350£2502005

No comments:

Post a Comment