Kimball method for Deleted records (SCD1 only)

Topics: Questions About Best Practices
Nov 1, 2010 at 10:55 PM

I am just wondering what the recommended behaviour for deleted records is in Kimball. From what I have read I infer that we never want to delete rows from dimension tables - so what happens if you only have type 1 SCD fields and then your source record is deleted?

I had designed a table with [Audit Inserted DateTime], [Audit Updated DateTime], [Audit Deleted DateTime]. I am starting to think that inserted and updated should be replaced with the Audit functionality (although I am still unsure exactly how this works (see my comment http://kimballscd.codeplex.com/Thread/View.aspx?ThreadId=225047) but what about Deleted audit information. Doesn't it make sense to have a 5th SCD Column Type "Audit Column - Row Deleted" along with the others?

I am just trying to ensure that I am following the guidelines in my design and am worried that if not in this (awesome) SSIS component that there is a more standard method of dealing with this.

If it is a worthy feature let me know and I will add the task.

Coordinator
Nov 9, 2010 at 6:19 PM

Problem there is that you're working with two incompatible ideas.  You're talking about "deleting" data from a data warehouse - and that just doesn't happen - or isn't supposed to.  In data warehouses, you don't delete, you mark rows as expired... possibly.  In Microsoft's "world", you don't actually do anything - records in the warehouse don't change if the source system record gets deleted.  The SCD Wizard simply doesn't modify anything.  You can take this same view with using this component - especially if you're not intending to track SCD2 attributes.  Just dump the output from the Deleted output and don't use it.  This seems like a shame (certainly) and if you'd like to use it, then add some effective/expiry columns to your table.  Even if the component doesn't allow you to identify those as SCD2 date housekeeping columns - and that does look like somewhat of an oversight... but what you can do is include an effective/expiry date set of columns - but just don't include them in your data flow.  Define them in the table, but don't retrieve them to shove them in the component.  (Because if you did, you wouldn't be able to define them as something useful.)  On some of the outputs of the component, you'd probably have to add a few Derived columns to add in new values so you could perform your updates - or heck, just hardcode them into the UPDATE statement or whatever you're using to persist the changes.

Nov 9, 2010 at 6:35 PM
Edited Nov 9, 2010 at 6:39 PM

Thanks for the reply Todd.

As I mentioned I do have 3 "Audit DateTime" columns for created, updated and delete and I am doing exactly what you describe - doing a merge command and setting the "Audit Deleted DateTime" field to the current datetime.

I also understand that with SCD2 you have a natural expiry method with the SCD2 dates for deletes. However in my usage presently I am only using SCD1 fields.

I was really just asking about how to tie the deleted rows back to the audit task that performed the delete and whether it would be useful for this to be explicitly suported by the component (to provide the column in the same manner as created/updated fields).

Coordinator
Nov 25, 2010 at 7:25 PM

Yeah - I think I understand what you're getting at.

I suppose I just have a real problem with thinking about dimension tables that only have SCD1 attributes in them.  Those kind of tables just don't seem realistic to me, so I don't think that way...

Problem is that I don't have a specific audit column for "delete"... I probably shouldn't use the SCD1 audit column for it - because it's not really an SCD1 change... I can't use the SCD2 column for it either, cause it's not really an SCD2 change... but maybe it is... except in your case - because you just don't have any SCD2 attributes.  See how it bends my brain?