New SCD option for millisecond-level effective/expiry

Mar 1, 2010 at 4:48 PM

I created a small modification to KSCD 1.5 to add a new date option: "Old ends 1ms ago, New starts now".  In this mode, the new effective SCD date is DateTime.Now down to millisecond precision, and the expiry SCD date is .001 second before now.

The existing options create expiry times by subtracting one or two DAYS; this new option subtracts one MILLISECOND.

This feature allows multiple ETL passes per day because it avoids a situation where the effective date (with time=0:0:0) of all ETL batches today were the same. If a record changed twice or more per day, the old versions had a quirky condition where effective=today and expiry=yesterday. Such records become invisible because they fail the usual date range predicate.

If McDermid is online, I wish he would advise me how to proceed with this code change.

Mar 3, 2010 at 2:01 AM

Currently some work items within the Issue Tracker are related to this discussion. (Support Multiple Intra-day SCD Processing http://kimballscd.codeplex.com/WorkItem/View.aspx?WorkItemId=2318 and Add Support for a "Change Stream" Source System Input http://kimballscd.codeplex.com/WorkItem/View.aspx?WorkItemId=304 and Enable SCD 2 Date Handling Effective and Expiry Dates Set to SSIS Variable Values http://kimballscd.codeplex.com/WorkItem/View.aspx?WorkItemId=5322). Please vote and add your additional insights.

My original vision behind these work items was for the component to also be able to support an initial data mart load that would have multiple days of data to be processed. Yes, a loop over a period of days could be used for this but that is a lot of extra SSIS gymnastics for an initial load design that required historical data. I also have had times when the data mart was loaded from audit structures that capture mutiple changes per day. The requirements dictated that all these individual changes needed to be caputred (although as it pertains to dimensions, one could argue that this no longer would be characterized as an SCD) rather than aggregating all the changes in a day.

Mar 9, 2010 at 10:43 PM

I could be wrong, but my approach to change data capture has been to use SCD2 and just eliminate the "Deleted" stream. The inputs consist of the deltas from the source (by using ETL batch run date ranges) and from the existing dimension (by using only SCD active members).

This way, my dimension is updated and augmented, but nothing is ever removed unless it's an SCD2 retirement.