Expired SCD2 - where source rows are missing

Aug 11, 2009 at 8:27 AM

Hello,

I am very impressed with the work on this tool, however I am finding that if the source is missing rows that are in the dimension then they are being outputed in the expired scd2. Is this a deliberate design or is it a configuration issue maybe? If not, is there any way to switch of this process?

The reason I raise this is that i am working with a new source system that does not have all of the data contained in the dimensions. A flagging process of using a deleted_flag is to indicate records that are to be made inactive is used in my case. I have developed a conditional split to deal with with flagged deletes seperately from the Kimbal method ssis changing dimension component.

Could someone advise please?

Thank you

 

 

 

 

 

Coordinator
Aug 11, 2009 at 2:51 PM

That behaviour is specifically by design - it's an intentional feature improvement over the SCD Wizard.  The reasoning is that if your source system no longer supplies that specific Business Key, then the "business object" it represented no longer exists.  Even it no longer exists, there is no harm keeping it "active" (current = true, or non-expired) in the Data Warehouse Dimension table - in fact, this is the behaviour of the SCD Wizard.  However, not expiring those records has a poor side-effect: there is no way in the Data Warehouse to query for an "active member count" that is meaningful.  This should be one of the sanity checks you should be able to compare the Data Warehouse dimension against the source system(s) with - does the Data Warehouse have the same number of members (today) as the Source System does?  If it does not, then there is definitely a consistency problem between the Dimension table and the OLTP Source System!

If you believe this is a warranted option, please add an WorkItem on the Issue tracker.

Aug 12, 2009 at 1:17 AM

Thanks for the response. There are cases (such as mine) where a purging process has occured and a new data source introduced with a subset of 'active' records. We are using a seperate boolean deleted flag to indicate deletions from the source system. I will hopefully be able to get around this by using a conditional to feed current records to the tool and then ignore any deleted it sees due to 'missing records'. I will process the flagged deleted stream seperately.