Usage Question: Logging Adds and Updates

Coordinator
Feb 12, 2009 at 7:55 PM
fred_mumble asked:

"My package grabs an ETL Log ID and puts it into an SSIS variable to use for logging purposes.  Each of my dimension tables also have an etl_log_id column which gets updated on Type 1 changes and inserted with new adds.  I'm not clear on the best way to handle this using the component; input a derived column with the log id as a separate source, add the log id to the main source input table or directly within the component through some sort of variable assignment."



This advice applies to version 1.4 or higher of the Kimball Method SCD component.

I must have been thinking almost identically to you (or we interpreted the Kimball Auditing practices the same way), because the component is ready to handle your logging requirements with a few simple configuration settings.  I've referred to these features under the term of "auditing" rather than "logging".

1. Make sure that the "Existing Dimension" input flow contains your "etl_log_id" column(s).  It sounds like you only have one column to track SCD changes - this component supports up to four different auditing columns, for different purposes.
2. Open the Kimball Method SCD editor, and on the Existing Dimension Column Usage tab, identify your "etl_log_id" column as one of the "Auditing" types.  From your description, I think that "Audit Key for Last Change" should be appropriate, but here are your choices, and their effects (these must be integer-type columns):
    A) "Audit Key for Adds" - any column identified as this type of auditing column will get a user-supplied number written in it for all rows that are output on the New flow.  The net result of using this kind of auditing is that for any auditing key you supply, you will be able to query your dimension table for all of the rows that were added to your table for that particular ETL run - whether they were "brand new" business keys, or "new versions" of existing members changed by SCD2 attribute changes.
    B) "Audit Key for Last SCD1 Change" - any column identified as this type of auditing column will get a user-supplied number written in it for all rows that are changed as a result of a change in an SCD1 attribute.  Rows receiving this audit key can be output on the New, SCD1 Updates, or Expired SCD2 outputs.
    C) "Audit Key for Last SCD2 Change" - any column identified as this type of auditing column will get a user-supplied number written in it for all rows that are involved in an SCD2 attribute change.  This will include up to two rows - one row on the "New" output (if a new row is required to track an SCD2 change), and/or one row on the "Expired SCD2" output (if an older version was retired due to SCD2 change, or being "deleted" in the source system).
    D) "Audit Key For Last Change" - any column identified as this type of auditing column will get a user-supplied number written in it for all rows sent to all outputs except the Unchanged output.  In other words, if a row changes for any reason, this column will be updated with the audit key.
3. Go to the "Options" tab of the Kimball Method SCD editor, and scroll down to the "Auditing Options" section.  Pull down the "variable containing audit key to use," and select the SSIS variable you have populated with your "ETL logging ID".  This variable MUST be an integer type.  Specify a "no value" value for the component to use for New rows instead of NULL.
Mar 24, 2009 at 9:33 PM
Can someone post a sample of how all of these are used?
I only create a new audit key value for each package execution and use that same key for all record changes (I guess that would be using option D. "Audit Key For Last Change").
I am interested in finding out how the other options may be used?
Also, would it be possible to capture the prior audit key value to include in the row change reason?
An example of using this might be "SCD 1 Existing Row Updated - Previous Audit Key was 45".
This is somewhat analagous to SCD 3 Audit Key tracking and is useful for tracking changes to records.
Coordinator
Mar 25, 2009 at 6:32 PM
Hi Joe,

You're not limited to using only one of the above options - you can use all four at the same time if you like (I use three of them: ABC). 

For example, if you wanted to track your auditing key as an "SCD 3" item - first you'd have to be very clear as to what kind of audit history you're keeping, and what it's useful for.  Let me explain how I "troubleshoot" with the audit keys:

Finding out "what" and "why" a row is how it is:
I use all three audit key types for this.  If I want to know why a certain row exists, and why it exists with the data it has on it, these keys should tell me.  I can see what load (=when) the row was added to the dimension table, and what the last changes to it (via SCD1 or 2) was made.  If it was an SCD2 change that was "last done" (the audit key is higher/later - I know this cause my audit keys are increasing over time), then I can easily find the earlier record to compare differences.  (Of course, you can do this by querying on the business key as well.)  I can't get a complete history, of course - multiple SCD1 changes result in the audit key for SCD1 getting overwritten - just like the history for the SCD1 fields gets overwritten.

Rolling back a dimension load:
It's not "easy", but it's mostly possible.  I can simply delete all of the records that have the audit key in the "added" column.  I can then modify the expiry dates of any row with the key in their SCD2 audit column back to my standard "9999-12-31" expiry date.  Unfortunately, I can't roll back SCD1 changes - because there's no record of what they used to be - but that's not terribly important, because if the source system "fixes" the data to what it should be, it will nicely clobber any "bad" changes that were made anyway.