Capturing SCD type 1 deletions

Jul 9, 2009 at 10:24 AM

Hello,

How to capture deleted records if I am only handling SCD type 1 modifications? By "deleted" I mean such records that exist in the dimension table, but are no longer found in the source system. Currently these rows seem to go to the Unchanged output - please see http://img411.imageshack.us/img411/3121/scddelete.png . Is it possible to redirect the record with an ID of 5 to a separate output.

The usage of column "Value" is set to SCD1, other options are defaults ( see http://img509.imageshack.us/img509/4503/scdconf2.png ).

 

Thanks,

Rain

Coordinator
Jul 9, 2009 at 9:45 PM

Ah - you're right.

Currently, if you have no SCD 2 columns, the component will send "deleted" rows out the Unchanged output.

I believe that the intent I'd had at that point was that if you weren't tracking history - why bother tracking deletions?  Additionally, there's no "expired" date to set on the row, so it's not possible to mark up when the row was "deleted".  On top of all that - you'd never actually want to delete that row in your Data Warehouse - it still (likely) has fact table rows associated with it.  Even if you "delete" it in your source system, you still need it in your dimension table to help identify the fact rows that relate to it.

Or is there some other scenario I'm not considering?

Jul 10, 2009 at 2:20 AM

There are two usages:

* First I'd like to mark deleted records for further auditing (internal company procedures). As You mentioned, especially interesting cases are such deletions that point to valid fact table rows. Usually such deletions should be accompanied with a transaction in a fact table, that "moves" facts to a new valid dimension (consider ledger accounts), so one result is to find deletions without that matching transfer.

* I am also using this component to simply keep two tables in sync ("naive replication"). I have a legacy of heterogeneous databases and flat-file imports, so SQL replication does not work for me. Truncate-insert is no option, because I would like to generate my own surrogate keys (source data has composite keys containing 3-6 columns). It may not be the direct purpose of the component, but insert-update tracking works quite well already.

 

Br,

Rain

Coordinator
Jul 11, 2009 at 3:45 AM

Here's a thought - let me know what you think...

Add a Derived Column component to the Existing Dimension flow.  Add a column called "ID2" which you will assign the value of "ID" (basically a copy of the ID column).  Add a column called "current", assigned a value of True.

Add a Derived Column component to the Source System flow.  Add a column called "ID2" which you will assign the vlaue of "ID".

Configure the ID2 column in the KSCD as an SCD2 column, and configure the "current" column to be the SCD2 Current column.

Now the KSCD should behave as follows:
* Rows that exist in both tables will pass out the Unchanged output
* Rows that exist only in the Source System will pass out the New output
* Rows that exist only in the Existing Dimension (i.e. deleted from the source) will pass out the SCD2 Expired output

Although - for the "table synchronization" purpose - wouldn't the TableDifference component work better for you?  Maybe even for this purpose?

Jul 15, 2009 at 6:41 PM
Edited Jul 15, 2009 at 6:42 PM

I have tried Your suggestions, but I couldn't get them to work. I'm stuck with a validation error "Existing Dimension has columns identified as SCD2, but no column is identified as SCD2 Effective Date".

I have also tried adding "Effective Date" and "Expiry Date" columns to Existing Dimension flow. When these dates are set, I still don't see the deleted record in the "Expired" output. Also, all existing records are redirected to "New" output, which is not what I wanted. See  http://img440.imageshack.us/img440/7561/200907151.png

I have tried the TableDifference, but I find this component more user-friendly and robust. I especially like the surrogate key generation - it keeps the SSIS packages clear and tidy.

So, if the SCD1 delete capture is not possible with the current version, can this wish be included in the project's feature request list.

 

Rain

Coordinator
Jul 20, 2009 at 3:11 PM
Edited Jul 20, 2009 at 3:12 PM

I've added a note about this to Issue 2317 - watch that for updates.