SCD2 Data Handling, first record's effective date ...

Oct 1, 2010 at 2:01 PM

Hi,

By default the first record's effective date is 1753...

I want it to be the current date.

So I give it today's date using GETDATE() when I insert any new records that the component spits out.
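
Roughly, the insert for the component's New output looks like the sketch below; the table and column names (dbo.DimCustomer, CustomerBK, RowStartDate, RowEndDate, RowIsCurrent) are just placeholders for my dimension, not anything the component produces itself:

-- rough sketch of loading the New output with GETDATE() as the effective date;
-- all object and column names here are placeholders
INSERT INTO dbo.DimCustomer (CustomerBK, CustomerName, RowStartDate, RowEndDate, RowIsCurrent)
SELECT CustomerBK, CustomerName, GETDATE(), NULL, 1
FROM stg.NewRows;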

However, when I run the component again it inserts new records again for every record, even though there are NO SCD1 or SCD2 changes.

How can I change it to use GETDATE() instead of 1753... without breaking the logic of the component?

Thanks,

Ian

Oct 1, 2010 at 2:20 PM

OK, I'm using v1.5; I will try v1.6 and see if it works.

It comes down to me trying to run it twice in the same day, in which case I expect it to be clever enough not to insert the same records again, based on the business keys specified plus the SCD1 and SCD2 matches.
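
Just to spell out what I mean by matching on the business keys plus the SCD1 and SCD2 columns: a comparison along these lines should return nothing when a second run has no real changes (the table and column names are only examples, with CustomerName standing in for an SCD1 attribute and Address for an SCD2 attribute):

-- example only: compare the source to the current dimension rows on the business key
SELECT s.CustomerBK
FROM stg.Source AS s
JOIN dbo.DimCustomer AS d
    ON d.CustomerBK = s.CustomerBK
   AND d.RowIsCurrent = 1
WHERE s.CustomerName <> d.CustomerName   -- SCD1 attribute
   OR s.Address      <> d.Address;       -- SCD2 attribute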

Oct 1, 2010 at 2:36 PM

Hi,

:( I still cannot get it to work. I installed the v1.6 alpha and get exactly the same results.

How can I configure it to:

1) Use the current date/time, GETDATE(), for the effective date of new records.

2) Run the package more than once in the same day without it creating new (duplicate) records? It is not even expiring the records when I use the Derived Column transformation to populate the effective date.

I need some step-by-step instructions, please.

Bye,

Ian

Oct 3, 2010 at 5:01 AM

Hi,

I have come up with a solution that works for now.

When I retrieve the 'Existing Dimension', I apply the following formula to all the records:

CASE WHEN RowIsCurrent = 1 THEN DATEADD(DAY, -1, RowStartDate) ELSE RowStartDate END AS RowStartDate,

This subtracts one day from the RowStartDate of the current rows.
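
In context, the existing-dimension source query ends up looking roughly like this (the table and the other column names are just placeholders for my dimension):

-- the CASE expression sits in the SELECT that feeds the Existing Dimension input
SELECT
    CustomerSK,
    CustomerBK,
    CustomerName,
    CASE WHEN RowIsCurrent = 1
         THEN DATEADD(DAY, -1, RowStartDate)
         ELSE RowStartDate
    END AS RowStartDate,
    RowEndDate,
    RowIsCurrent
FROM dbo.DimCustomer;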

Now I can run my SSIS package several times - even during the same day.

:)

I'm now trying to 'batch' my SSIS package so that only 500,000 records are processed at a time, and then the next 500,000,

so I'm trying to place a For Loop around my data flow component that will carry on until there are no new records remaining; I'm still contemplating the exact logic.
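
One idea for the source query inside the loop is something like the sketch below. It is SQL Server 2008-era syntax (TOP plus a 'last key processed' variable rather than OFFSET/FETCH), and the table and column names are made up for the example; the loop would keep iterating while the data flow's row count, captured into another variable, is greater than zero:

-- @LastKey holds the highest business key loaded in the previous iteration;
-- in SSIS this would be a package variable mapped in as a ? parameter
DECLARE @LastKey INT = 0;

SELECT TOP (500000)
    CustomerBK,
    CustomerName,
    Address
FROM stg.Source
WHERE CustomerBK > @LastKey
ORDER BY CustomerBK;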

I find that when I have to process about 1,500,000+ records it slows down a bit, but around 500,000 records is perfect.

I will just have to think about how to handle the Delete output, since the source system simply does not yet have the records in question ... maybe also filter the existing dimension somehow for each batch of 500,000 source records based on the business keys, etc. ...
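
If I do go that way, the filter would probably be something like the query below, so the Delete output only ever sees dimension rows whose business keys are actually in the current batch (stg.CurrentBatch and the column names are just placeholders):

-- restrict the existing dimension to the business keys in the current batch
SELECT d.*
FROM dbo.DimCustomer AS d
WHERE EXISTS (SELECT 1
              FROM stg.CurrentBatch AS b
              WHERE b.CustomerBK = d.CustomerBK);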

Oct 15, 2010 at 6:09 PM

That code should not be necessary.  Are new records being given an End Date of NULL?  I'd be interested in knowing a bit more about the process as I think something is missing and it just needs a second pair of eyes to spot it.  PM me if you still need a resolution to this.
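
A quick way to check is something like this (substitute your dimension's actual table and column names; RowEndDate here is just my guess at what yours is called):

-- current rows should have a NULL end date; any hits here point to the problem
SELECT COUNT(*) AS CurrentRowsWithEndDate
FROM dbo.DimCustomer
WHERE RowIsCurrent = 1
  AND RowEndDate IS NOT NULL;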