Usage Instructions

Before you start attempting to use the Dimension Merge SSIS Slowly Changing Dimension component or the SCD Wizard, please do your background learning on dimensional modeling and what a slowly changing dimension is. Here are some resources: If you happen to be familiar with the SCD Wizard's terminology and not Kimball's - all you need to know is that a "Changing" attribute is a Type 1, and a "Historical" attribute is a Type 2.

Overview

For a quick runthrough of how to use the component, watch the Overview video (on YouTube, or download the full-quality WMV). See the Documentation page for the full series of videos.

Inputs

Existing Dimension

The flow you connect to this input should be from your data warehouse's dimension table. Use whatever connection manager and source component you wish - the input connection is a data flow, not a connection manager. You will want to provide all rows, and likely all columns from your dimension table to the component - especially the "housekeeping" columns, such as the SCD2 effective and expiry dates.

Source System

The flow you connect to this input should be from your source OLTP system. It's likely you will have transformed the information coming from that source somewhat, in order to generate the attributes you'll need to compare and either insert or update in your data warehouse's dimension table. The format of the source system flow should match the data types of the existing dimension flow.

Special Members

This flow is entirely optional, but provides a clean way of specifying and maintaining any possible "special" members in your data warehouse dimension table. In typical dimensions, there is an "unknown" member that you can use to link fact table rows to when the fact table row refers to a NULL or otherwise unknown dimension value. In some cases, you may have several of these artificial special members in your dimension.
It is completely acceptable to preload your dimension table manually with these members, and/or manually add them to your source system on every load. However, this method does result in those special members acting identically to any other source row when you may not want them to behave in that manner. Typically, all changes to special member rows should be treated as Type 1 - fixing a mistake. The reason for this is that your "unknown" member is "special" and always represents "unknown". If an attribute of an "unknown" member needs to be changed, it's highly unlikely that you would want two versions of "unknown". It's much more likely that you are changing the "unknown" member to clear up confusion or improve its behaviour - which means you would want all "versions" of the member to be revised.

Outputs

The component does not perform any lookups on your dimension table, or persist any changes itself. Instead of performing lookups on the Dimension table, it compares the data you've provided it via the Existing Dimension input. Just like the SCD Wizard, it does not persist any changes. However, unlike the SCD Wizard, it will not create the data flow transformations and destinations needed to persist those changes - that's up to you. (Since the component doesn't know where your dimension table is.)
You may not need to use all of the outputs listed below. The outputs need to be treated differently - please make sure you understand SCD processing, and have watched the video series to understand (in general) what the outputs are used for.

Unchanged

All rows from your dimension table that haven't been changed by the source system will be output here. The rows on this output are typically discarded, but you may want to do some auditing/troubleshooting work with them.

New

Any row from your source system that didn't exist in your dimension table will be output here. The rows on this output are typically inserted directly into your dimension table as the "first version" of this business entity. This output does not include new versions of rows that have SCD changes.

Deleted

Any row from your dimension table that no longer exists in your source system will be output here - but only if your SCD process isn't tracking type 2 changes. If your SCD processing is only tracking type 1 changes, then deleted records will be output here. It's really undefined what you should do with these records... your data warehouse shouldn't be deleting data that's historically relevant... but since you're not tracking type 2 changes, there's no way of marking this record as retired in your warehouse. Rows coming out here should indicate an architecture problem with your data warehouse.

Expired SCD2 with SCD1 Changes

Any row from your dimension table that has SCD type 2 columns changed by the source system will generate two rows from this component. One of those rows will be sent out of this output - the "old version" of the row. All SCD type 1 columns will be set to the "new" values, and the expiry column will be set. It's typical to attach an OLE DB Command transformation to this output... although it's recommended that you send rows to a temporary table instead, and use a set-based UPDATE statement in an Execute SQL Task to perform the update.

New SCD2

Any row from your dimension table that has SCD type 2 columns changed by the source system will generate two rows from this component. One of those rows will be sent out of this output - the "new version" of the row. It's typical to send all these rows to an OLE DB Destination component for direct insertion into the dimension table.

SCD1 Updated

Any rows that have SCD type 1 changes to them (retroactive history changes) will be output here - except for "old versions" of rows that had SCD type 2 changes made. It's typical to attach an OLE DB Command transformation to this output... although it's recommended that you send rows to a temporary table instead, and use a set-based UPDATE statement in an Execute SQL Task to perform the update.

Invalid Input

Any rows that the component detects problems with - NULL business keys, NULL values where you've said there shouldn't be any, etc - are sent out this output. It's imperative that you record this information somewhere. Failure to do so will undoubtedly mean your SCD process is unreliable, and therefore your dimension table contents are suspect.

Auditing

Auditing information - input and output row counts, start and end times, average and maximum cache levels, etc - are sent on one row from this output. Best practice says you should be using some of this information for auditing purposes, but you may not need to use it all. It's typical practice to send this row to an OLE DB Command transform to update a row in an ETL audit table.

Statistics

This output will get sent performance information that's accumulated during the component operation. For package diagnosis, it's very informative to hook this up to a CSV or database table destination for later analysis. Be aware that rows are sent out this output in batches (due to the buffer structure of the data flow), and aren't written to the destination in real-time.

Last edited Apr 13, 2011 at 7:15 PM by toddmcdermid, version 1

Comments

No comments yet.