Problem with Type I update on a record After a Type II update

Jul 27, 2009 at 1:37 PM
Edited Jul 27, 2009 at 2:14 PM

Hi there,

First of all, thanks for this great component! I'm testing the component for a couple of days. It's really fast and is working fine. I got one little hick-up today.

I got a dimension table with 4 columns:
Column 1: SurrogatKey  ID
Column 2: BusinessKey  CODE
Column 3: Type 1 Field NAME
Column 4: Type 2 Field ADRESS

There are 5 records in de source table and 0 in de dimension table.

After building a SSIS package, I run the package. The first run inserts all 5 records in my Dimension. Then, my first change in de sourcetable on a column on a record of type 2 (Column 4 - ADRESS). When I run the package again, the old record is closed (type 2 status = 0, enddate ok) and there is a new record. This works fine!

Then I change the value in the column of type 1 (Column 4) on the same record. I espect a change to the sixth record I just created with the new value on Column 3 (the record which is actual). When I run again, it updates both records, the old one (which is not actual) and the new one.

Example:

AFTER RUN 1
ID    CODE    NAME    ADRESS    ACTUAL    FROMDATE    TILLDATE    
1    100    A    A    1    1753-01-01    9999-12-31
2    101    B    B    1    1753-01-01    9999-12-31
3    102    C    C    1    1753-01-01    9999-12-31
4    103    D    D    1    1753-01-01    9999-12-31
5    104    E    E    1    1753-01-01    9999-12-31

AFTER RUN 2
ID    CODE    NAME    ADRESS    ACTUAL    FROMDATE    TILLDATE    
1    100    A    A    1    1753-01-01    9999-12-31
2    101    B    B    1    1753-01-01    9999-12-31
3    102    C    C    1    1753-01-01    9999-12-31
4    103    D    D    1    1753-01-01    9999-12-31
5    104    E    E    0    1753-01-01    2009-07-26
6    104    E    F    1    2009-07-27    9999-12-31

AFTER RUN 3

ID    CODE    NAME    ADRESS    ACTUAL    FROMDATE    TILLDATE    
1    100    A    A    1    1753-01-01    9999-12-31
2    101    B    B    1    1753-01-01    9999-12-31
3    102    C    C    1    1753-01-01    9999-12-31
4    103    D    D    1    1753-01-01    9999-12-31
5    104    X    E    0    1753-01-01    2009-07-26
6    104    X    F    1    2009-07-27    9999-12-31    


First, I changed the ADRESS of CODE 104 from E -> F. A new record is inserted.
Then I changed the NAME of CODE 104 from E -> X. Both records are updated now.

What is has to do: only let the records go through the pipeline "update ScD1' where the actualstatus = 1.

Thanks!

Image: http://img93.imageshack.us/img93/7066/kimballscd.jpg

Coordinator
Jul 27, 2009 at 7:35 PM

This is expected behaviour for a "type 1" SCD column according to Kimball methodology.  You can alter this behaviour, but I'm not sure why you would.

Why is this expected?  A "type 1" SCD attribute is an attribute that we do not want to record historical changes for.  If the attribute changes, we want it to appear as though it "always was" this way.  To explain this behaviour using your specific example, would you consider that changing your customer's "name" (in your source system) was the result of "correcting a mistake", or was the result of a legal name-change?  By identifying the column as an SCD1 attribute, you are saying that any changes to this column are "fixes" to the data.

So why does the component "fix" all versions of this customer?  Any other behaviour would be arbitrary (inconsistent) and would also not fit with the purpose of "fixing" the data.  It would be arbitrary because the component, as well as you the designer, have no idea at what time the previous SCD2 update was made to this customer.  Therefore, in some instances of a customer having their name "fixed", the effect of this will only go back one "day", and for some it will go back to when the customer was first entered into the system.  It would also not fit with the purpose of "fixing" the data - because we didn't "fix" all of it.  If your source system user corrects a customer's name, doesn't that "fix" apply regardless of time period?  Or is it valid to consider they have an "incorrect" name at some time in the past, but it's "fixed now"?

If you disagree, you may still "prevent" the component from making such changes.  Simply place a Conditional Split component on the SCD 1 Update output, and filter out all rows that have an "expiry date" unequal to the value you have indicated is the "current" record.

If you disagree, I would be interested to know why.  Why is it that you would not consider only updating the "most recent" record would not be "arbitrary" as I've described?

Jul 28, 2009 at 3:30 PM

I think I agree this way of thinking, thanks for the feedback and keep up the good work! I really love the component.

p.s. If you don't want to update the old records (which are not actuel and expired), write all changes to a temp table and update the temp table (not the dimension) records with the data from the dimension (Can you follow me? :-) ) After that, write back alle changes to the dimension.

 

Coordinator
Jul 28, 2009 at 5:56 PM

You're welcome!  When you find the time, please go rate it and/or post something in the Performance Improvement thread.

Your method for not updating the "old" records is very convoluted.  It would be better to use a Conditional Split to filter out the "old" records from the SCD1 updated output.  Then you have a smaller number of records to update, and don't have to "undo" any updates.