SCD1 not output when SCD2 and SCD1 change made

Jun 25, 2010 at 4:08 PM
Edited Jun 25, 2010 at 4:09 PM
I just put in version 1.5 and I have noticed two issues... one is a fairly easy fix and the other is something that is problematic.
Firstly, the checkbox "Route rows to new output (v1.4 behavior)" under the 'New SCD2' output, does not appear to be working. I have this box checked and unless I unionall the 'New SCD2' and 'New' outputs, I get nothing when an SCD2 change occurs.
Now on to the problem that I can't work around. When an SCD2 and an SCD1 change occur on the same row, only the SCD2 related outputs get rows. I get an expired row in expired SCD2 and a new row in New SCD2, but nothing in Updated SCD1. This results in a row being aged and a new row being inserted with a different value for the SCD1 column.
In other words, the source has this row:
id=1, name(scd1)=mynameA, status(scd2)=updated

The warehouse has this row:
id=1, name=myname, status=new

After this task runs, the warehouse has this:
id=1, name=myname, status=new
id=1, name=mynameA, status=updated

This is fairly problematic and if it isn't something silly that I am doing, I am going to have to rollback to version 1.4, which I would rather not do.
Jun 25, 2010 at 6:55 PM

I would expect something like the behaviour you describe - here's what I mean.

If you only have a "current" record (one row for the business key), and a type 1 and 2 change are made on that record, then the following will happen.  Two rows will be sent out - one out the "New SCD2" and one out the "Expired SCD2".  The "New SCD2" record will have the type 1 AND 2 changes made on it.  The "Expired SCD2" will have the type 1 changes made on it.  You may be missing that in your design.  (I really need to make some docs!)  You should go to your Output Columns tab and make sure you are outputting the "recommended" columns on the Expired SCD2 output - you'll see that the SCD1 columns are included in that set, for exactly this reason.

Now, if you had some "history" for that business key, you'd see records going out the SCD1 Updated...

I hope that's your issue!

Jun 26, 2010 at 4:15 PM
Ok, I understand how to overcome the issue now, but is there a reason you chose to do it this way? I am assuming that there must be something weird in my design of my warehouse loads, because now my update dimension task is fairly complex and cluttered (of course not as bad as it would be without your hard work on this excellent component). I can understand having an output just for new SCD2, since you may want to do something to them before loading them, but what is the benefit of having the SCD1 updates in the SCD2 expired output?
Here is what I have to do now:
Source system and existing dimension in warehouse (where invalid_date is null, to reduce un-needed processing) going into the component
Outputs from the component:
  • New and New SCD2 are going to a union all, then to a fastload into the warehouse
  • Expired SCD2 is multicasted, with one part going to a conditional split to only expire non-deleted rows in the warehouse, via a sql command
  • The other part of the multicast is going to a conditional split to see if the row change reason contains the string "SCD1", which is then piped into a union all with the Updated SCD1 output
  • Then of course the auditing output, which is just a straight sql command and simple
Jul 6, 2010 at 7:51 PM

The primary reason that those records appear with SCD 1 changes on the SCD 2 expired output is to reduce confusion :)

If I made the same row (as in, same surrogate key) come out two outputs, I thought it would be natural for the designer (you) to push those to separate OLE DB Command components - or similar batch update processes.  In doing so, you'd have two components attempting to update the same record - and I thought that only having one update was better.  Additionally, you'd have two "same" rows coming out of the component, which I thought would be confusing in and of itself.

That said... perhaps it's confusing/inefficient the way it is now...

Mar 24, 2011 at 6:36 PM

This was the exact issue I was having. I set my existing dimension input to only pull the current rows as that seems a more efficent design to me, and then noticed that no rows were being sent to the Updated SCD1. I either have to pull all dimension records, or add a multicast to the Expired SCD2 output. I'm glad to see this was intentional and not a bug.