SCD II - Updating latest value for the same business key

Apr 28, 2010 at 8:19 PM

Hi Todd,

my source data set is like this 11 cat, 12 dog, 11 lion. So there are 3 rows of data, and the business key is the first 2 numbers. There is no date time stamp for the rows, but I need to replace cat with lion in the destination table. I failed to do this SCD, but looks like SCD will load all the three rows. How can I do this using Kimball Method SCD?

Coordinator
Apr 30, 2010 at 7:03 AM

Your business key isn't the first two numbers, or you're trying to use the component to process a "change stream" which it's not capable of. 

The definition of a business key is: the primary key on your source system's table.  Having two records with a key of "11" would violate the uniqueness constraint of a primary key - therefore you don't have the business key you think you do.

OR...

Since you mentioned you have no date/time stamp for any of the rows, you could be attempting to process data like a "change stream".  Such that the "11 cat" row was how the row looked on "day 1", but on "day 2" it changed to "11 lion".  Neither the Kimball SCD, nor the included SCD component in SSIS will process a "change stream" of rows.  In order to process that information at all, you'll have to eliminate the duplicate primary keys.  Here are some methods to do just that: Eliminating Duplicate Primary Keys in SSIS.

Apr 30, 2010 at 7:21 PM

Hi Todd, thanks.

I agree that SSIS considers business key as the 'Primary Key' of the source system table. Lot of times, we get data from legacy proprietary systems in flat files in this format, so I raised this issue. I am familiar with adding a sort task before the SCD. Alternately, I also used row_number() function and eliminated the duplicates. I recently got exposed to Kimball SCD and found the additional features compared to regular SCD. I thought you might have introduced 'change stream' like storing temporarily in a hash file or some other mechanism in 2008.

Does row_number() work the same way as 'sorting in SSIS'? I am trying an article for the algorithm behind SSIS sorting mechanism.

Thanks a lot for your inputs....

Coordinator
May 4, 2010 at 6:53 AM

No - I haven't implemented change stream style processing yet. 

SSIS' Sort does not behave exactly like SQL Server's for some collations - Alberto Ferrari posted a blog article on that.  At the moment, there is no way to configure the collation or any other part of the SSIS Sort behaviour, except for the column order, and whether the sort is ascending or descending.