Would this component be useful for delta fact table loads?

Jun 10, 2010 at 11:58 PM

I am about to start working on my first delta snapshot fact table today and while considering how to best attack the problem it appeared to me to be very similar to a slowly changing dimension problem and got me wondering if I should try to use this component instead of just doing all of the comparisons in SQL.

Here is what I am after, I want to take a daily snapshot of charges over 150 days old to determine the progress we are making in working old charges and trying to get them paid.  The desired columns are:

Date,

DollarsIn150+ - This is how bad the problem of old unpaid charges is.

DollarsMovedInto150+ - This is how quickly new charges are moving into the bad state as defined by charges that haven't been paid in a long time.

DollarsMovedOutOf150+ - This is how quickly we are cleaning up old charges.

NumberOfChargesIn150+ - Just a count of charges that need to be paid.

NumberOfChargesMovedInto150+ - Charges that are now very old that were not yesterday.

NumberOfChargesMovedOutOf150+ - Charges that have been completely paid since yesterday.  Not that a charge may have only a portion of the balance paid which would result in dollars moved out of 150+ but not a charge moved out of 150+.

This seems very similar to a SCD type problem where I need to compare yesterday to today and then take actions based on whether the row is unchanged, new, or deleted.  Should I try using the Kimball Method SCD component for this or am I better off just using SQL, or is there a third method I have not thought of?

-Shane

 

Jun 11, 2010 at 12:23 AM

Shane -

I have been doing just this thing in production mode for a month.  I've commented on this forum on my experience http://kimballscd.codeplex.com/Thread/View.aspx?ThreadId=208962

For me it works great because it intelligently parses the changes into inserts, mods, and deletes which I can in turn make into ledger type transactions to the fact table.  I happen to use "staging" fact table in between the source and ultimate fact target.  The staging fact table is based upon business keys instead of surrogate keys.

Ultimately, you may be limited by the size of your fact table...more directly you will be limited by how many facts are in the source and "existing fact dimension" because, after all, the component is intended for dimensions which by their nature are much smaller than fact tables.  That being said, we have over 2 million fact rows and growing.  I don't have the latest throughput stats at hand, but I'm guessing we process that fact table through KSCD in 15 mnutes at the most.

One thing to watch out for is the size of your business keys in the fact table.  Because it is a fact table and not a dimension table, you will have a composite business key for the fact table.  Long, composite, business keys tend to slow down KSCD.  I was able to transform my 'alpha' business keys into numerics...and performance took off exponentially.

 

Hope this helps,

Tim

Jun 11, 2010 at 12:29 AM

That is very helpful, my keys are fairly small, a combination key of a smallint and an int so 6 bytes, under 1million rows.  Just wanted to make sure I wasn't way off track before I proceeded but it sounds like now it's time to start testing.  I did see Todd mention the CozyRoc components in the other thread but unless they do a better job of the table difference (all I really want to do with the SCD component in this case) I don't see the need to mess around with buying a license for a closed source product. 

It sounds like I can just pipe the different outputs to an aggregation task and get the number of changed records and the dollar amounts of them, then do a few small writes to my daily snapshot table. 

Thanks for your feedback, it's nice to know someone else has had success with this.

 

 

Jun 11, 2010 at 2:07 AM

Close, but no luck so far.  My inputs are incredibly simple, a chargeID and balance for yesterday, and a chargeID and balance for today.

 

I tried mapping it up with no surrogate keys and just comparing on the business key but it doesn't return deleted rows, and for type 1 changes I can only access the new value when I need the old and new value to compute the delta.

 

Then I tried adding a surrogate key and setting the balance to a type 2 scd but it errors out saying I have no current or expiration date columns.  Am I using this component wrong or do I indeed need to use a table difference component?

 

 

Jun 11, 2010 at 2:19 AM

I just did a little more digging and found my new rows are being sent to the new output correctly but my deleted rows are being sent to unchanged.  I put a data viewer on the output and a row that existed yesterday but does not today is showing up in the unchanged output with yesterday's record in there.  Not sure what the next step is.

 

Jun 11, 2010 at 2:27 AM

Oops, I should have searched more first.  I found this thread http://kimballscd.codeplex.com/Thread/View.aspx?ThreadId=208961 discussing the fact that deleted not working properly is a known issue.