Usage Instructions

The following should assist you in attaching inputs and outputs, and configuring the component.
I recommend that you download the Sample SSIS project to try it out. I've created videos of how the sample SSIS project was created - view Part One to see the input configuration, and Part Two to see the output configuration. (Those videos show the v1.3 component, which is largely the same as v1.4 - it should give you a good idea.)

Inputs

The component requires two inputs, and can optionally accept a third. The two required inputs are the Existing Dimension input and the Source System input. The optional input is the Special Members input.

Existing Dimension Input

The Existing Dimension input is intended to be supplied with the entire contents of your existing dimension table; all rows (including historical rows), and all columns (including "housekeeping" columns)
For better performance, the input should be sorted according to the Business Key column(s).

Source System Input

The Source System input is intended to be supplied with the information from your OLTP system(s), transformed into the schema of your dimension table. In essence, it should have the same schema (columns and column types) as the Existing Dimension input, except for:
  • A Surrogate Key column
  • SCD2 housekeeping columns (the effective date, expiry date, and/or "current" record indicators)
Column names are not required to match those of the Existing Dimension - but it does assist the component in automatically configuring itself, so it is highly recommended that column naming match the Existing Dimension.
For better performance, the input should be sorted according to the Business Key column(s).

Special Members Input (optional)

The Special Members input is intended to be supplied with any "special" dimension members your dimension requires that aren't supplied by the Source System input. Typically, this consists of at least the "unknown" member - but I've found that there can be several "special purpose" members needed on certain dimensions, so this input can receive an unlimited number of rows. Much like the Source System input, the Special Members input should have the same schema (columns and column types) as the Existing Dimension input, except for:
  • SCD2 housekeeping columns (the effective date, expiry date, and/or "current" record indicators)
The Special Members input can have a Surrogate Key column optionally identified - if your ETL process requires that you manually specify the Surrogate Key of your special dimension members. If you do not supply a Surrogate Key column, the component will automatically assign one according to its configuration settings.

Configuration

Configuring the component consists of filling out five tabs:

Outputs

This component supplies six outputs: Unchanged, New, Expired SCD2, Updated SCD1, Auditing, and Invalid Input.

Unchanged Output

Typically, this output isn't used in "live" packages. It receives the Existing Dimension input rows that haven't been changed due to changes in the Source System or (optional) Special Members inputs. You do not have to place a Row Count transformation on this output in order to collect audit information. Please see the I/O Counters Tab and/or Auditing Output for audit information.

New Output

This output receives rows for several reasons:
  1. A brand new row was seen on the Source System or (optional) Special Members inputs (identified by a Business Key that does not exist in the Existing Dimension)
  2. A row from the Existing Dimension input had the same Business Key as a row from the Source System or Special Members inputs - but one of the columns identified as an SCD2 behaving column was different. The "new" version of the row will be output here, the "old" version will go to the Expired SCD2 output.
Typically, the rows from this output should be inserted directly into the dimension table in the data warehouse. A new column is added to this output, called "Row Change Reason", containing a short description of why the row is appearing on the output.

Expired SCD2 Output

This output receives rows that have been "retired" because of a change to a business entity. Most often, a row output here has a "brother" row appear on the New output.
Typically, the rows from this output should be used to update the dimension table in the data warehouse. Commonly, an OLEDB Command (or similar component) is used to propagate the new date placed by the component in the "expiry" column into the dimension table by using the Surrogate Key column value. A new column is added to this output, called "Row Change Reason", containing the SCD2 column name that caused the expiry. (For dimensions that result in large quantities of SCD2 expiries during processing, the OLE DB Command component can be very slow. In these cases, it is recommended to insert these rows into a staging table with a Destination component, then issue a single UPDATE command in the Control Flow - OR - use the Merge Destination or Batch Destination components found at the SSIS Community Tasks and Components project here on CodePlex.)

Updated SCD1 Output

This output receives all historical rows from the Existing Dimension input that have had a change to a column identified as an SCD1 behaving column. This output may also receive "current" rows from the Existing Dimension input if the only change to that row was on SCD1 columns.
Typically, the rows from this output should be used to update the dimension table in the data warehouse. Commonly, an OLEDB Command (or similar component) is used to propagate the new value(s) in the SCD1 column(s) into the dimension table by using the Surrogate Key column value. A new column is added to this output, called "Row Change Reason", containing the SCD1 column name that caused the change. (See above section on Expired SCD2 output for alternatives to increase performance over the OLE DB Command component.)

Auditing Output

This output receives exactly one row, containing:
  • the Audit Key used to identify the changed rows,
  • several columns containing System variable values (Package Name, Execution Instance, Machine Name, and Start Time),
  • row counts on all inputs and outputs.
This row is intended to be inserted into (or update a row in) an auditing dimension table.

Invalid Input Output

This output receives all rows that cause an error in the component. There are several reasons a row can appear in this output:
  1. The Business Key is a duplicate for the input. Rows appearing in the Source System must have a unique Business Key within the Source System flow and the Special Members flow. Rows appearing in the Existing Dimension input must have a unique combination of the Business Key and SCD2 housekeeping columns - if identified. Rows in the Special Members input must have a unique Business Key withint the Special Members flow, and the Source System flow.
  2. A null value appeared in a column that was identified as not accepting nulls. (All columns are configured by default to reject null values.)
Typically, these rows are directed to some logging or reporting system to alert the operator(s) that a data quality error snuck past the transform operations. Two new columns are added to this output: "Invalid Input", and "Invalid Input Error Description". The "Invalid Input" column will contain the name of the input that the row arrived on (Existing Dimension, Special Members, or Source System). The "Invalid Input Error Description" will contain information related to the reason the row appears here.

Last edited Apr 12, 2009 at 7:11 PM by toddmcdermid, version 14

Comments

toddmcdermid Sep 2, 2009 at 11:44 PM 
Boy - another comment I missed. I've fixed up my RSS subscriptions (using FeedRinse) so that I don't miss these going forward.
Sorry - no, I don't think they do. I believe I wanted to spare all of you from that. However, I'll likely be redoing those with the next release (1.5), as the UI has changed quite dramatically.

greenjeans Jun 23, 2009 at 9:21 PM 
Do your demo videos have sound? I wasn't getting any audio in Windows media player, and didn't get any messages about missing codecs.