Use of Row Change Reason

Topics: Questions About Best Practices
Editor
Aug 31, 2010 at 4:10 AM

Hi, I'm new to SSIS but have spent the last 5 years doing Kimball on Cognos/Oracle so my challenge is applying the same concepts on a different platform.  I must say I'm actually really enjoying the technology with 2008 R2 SSIS and this great SCD component.  One thing I'm just coming to terms with however is the use of the Row Change Reason which I think is a great feature.

I note that your v1.5 YouTube demo Todd, you create derived column transformations for each SCD output.  My question is why not just use the Row Change Reason special output column from the component?  If I use the RCR then I save half a dozen derivations which keeps the design much simpler.

I'm creating extra metadata columns in each dimension to accomodate the Row Change Reason but was it your intent when designing the component that we use the RCR column in a dimension or was it meant to be something we output to a separate audit table?  You seem to indicate in the demo that we could use a less verbose variant of the RCR attribute, maybe that is the thinking here?

I'm also not using unicode data types in my dimensions but can't see any way of pushing our the RCR in non-unicode format.  I therefore seem to need to have a derived column transformation for each SCR output too in order to provide this attribute into the dimension rows. 

My current design is to include all the following metadata attributes in each dimension - row_create_date, row_start_date, row_end_date, row_last_update_date, row_is_current, inferred_member and row_change_reason so we've got a metadata frenzy going on here.

BTW, the WMV link for the Usage Instructions is broken (shame as my new employer doesn't like us using YouTube :(

Grateful for any guidance/direction on the best way of implementing the RCR given the above.

Coordinator
Aug 31, 2010 at 7:47 PM

I'm beginning to regret that YouTube demo - I didn't think through how people would read into it.

The demo I did was a "proof of concept" type demo.  In your "real-life" use, you should NEVER be doing what I did after the component.  I don't see any reason (except your RCR to non-unicode) to have a Derived Column or Union All (of all flows).  I should have set up the video to show the outputs going to the appropriate OLE DB Destination/Command components that would be used in real-life.

But it's fantastic you know Kimball methodology - that means you weren't misled by my union all business... probably.

OK - to handle your metadata frenzy...

First - Your point about a non-unicode option for RCR is very valid.  I'll add that as an issue for me to work on (#7060).

Next, you should be able to handle all the other metadata.  In order to populate the row_create_date column, you should identify that column as an "Audit Column - Row Added" type column, then on the Auditing tab, set it's Audit Value to an appropriate variable - like System::StartTime or a user variable of your choice.  row_start_date and row_end_date are the SCD2 Effective and Expiry types.  row_last_update would be an "Audit Column - Last Changed" type column - with the same kind of setup on the Auditing tab as the row_create_date.  row_is_current should be set up as an SCD2 Current Record type, and inferred_member as an Inferred Member Indicator.

All that make sense?

Editor
Sep 2, 2010 at 9:43 AM
That makes perfect sense and you'll be pleased to hear that common sense prevailed and I had actually worked it out in the end. In fact it wasn't a bad exercise to force me to understand what was going on but yes, that YouTube thing totally threw me as an SSIS newbie! Maybe I could volunteer to help you with some doco? Happy to do some draft content for you if that helps - it's all new to me so might be good coming from a 'know nothing' perspective? BTW if you still want a new icon I also know a tame graphic designer who could help. Oh and final question for today - the version number of the actual transform doesn't appear to increment or reflect your codeplex version. It just struck me that would be useful and it was hard for me to check which one I had installed having played with the stable and alpha versions on the some day and same machine. Thanks for your help and for supporting such a great tool.
Coordinator
Sep 3, 2010 at 7:12 PM

You're welcome - I hope the component serves you well.

If you want to work on some documentation, I won't stand in your way - although I will edit it :)  I've added you as an editor, so you should be able to edit Wiki and Docs pages.  If you don't find the time, don't fret.  I'd appreciate a newbie perspective - things get "forgotten" when I look at it!

Ooh - yes - please do ask your designer.  My icon is LAME.  And if he/she is a UX designer, have them look at the UI for about thirty seconds, and have them tell you why they made a sour face three or four times.

Version numbers, version numbers... aah.  There are several, and they control (and muck up) different things.  I think you're talking about the .Net assembly version number - the one you see in the GAC or in the DLL file properties.  The version number that's there has to stay as it is for all future versions... unless I want to break the upgrade path.  If I change that number, then SSIS sees the component as a completely different component.  It won't automatically (or manually) allow you to upgrade without editing the DTSX XML directly.  The version number that I publish here (currently v1.5 with 1.6 in alpha) is a completely made-up number that isn't reflected anywhere except the installer.  There's another version number hidden (mostly) that's the metadata version number - and it's roughly in step with the published version number - except it's only an integer.  v1.5 is metadata version... uh.. 6.  v1.6 is metadata version 8 (musta skipped 7, or done that internally for me).  OK - they're completely random.  Anyway, the metadata version is the one that "counts" - it lets SSIS and me communicate as to whether your package needs upgrading or not to work.

May 4, 2011 at 9:50 AM

Hi

Can I add another query about the usage of the Row Change Reason? I have 2 questions:

1. Sometimes there are multiple type 2 fields in the dimension that cause a new row to be inserted. I have configured the new record to have the reason for the change. When there is one its fine but when there are more than one it seems that only one field is incldued in the row change reason. Is this by design or am I doing something wrong?

2. Where is it best practice to put the row change reason. I can only see it being used in inserts for type 2. Does it make sense to put it in the expired records. Also I suppose putting it in the updates for type 1 changes is fine but you wont have the history of past row change reasons.

Otherwise its a great component - good work

Thanks

May 5, 2011 at 12:37 PM
Edited May 5, 2011 at 12:39 PM

In order to ensure the fastest possible column by column comparison processing, this component uses bit-wise OR processing of the columns. In other words, the component stops processing after a particular column comparison returns an unmatched result rather than having to incur the overhead of continuing to process all the remaining columns. The result is that only one column is included in the row change reason.

I don't know about a best practice approach but usually I include the row change reason in an audit column alongside other things like IsCurrent and IsInferred.

Often I will have additional Derived Column components add different types of values to each output (e.g. New = Original Record, SCD 1 Updated = Modified Original Record).

Hope this gives you some insights.

May 10, 2011 at 7:17 AM

Thanks for the reply Joe.

I understand the emphasis is on getting the best performance by using bit wise processing. However, I think that in this instance it would be better to favour the intended functionality over performance improvement. What's the purpose of having a row change reason if it only really works when you have one attribute changing. As an alternative (but still not preferable) option, perhaps have a "Multiple attributes changed" text if more than one attribute caused the change.

I do put the row change reason as an additional audit column, I was asking whether best practice calls to include the row change reason in the expired record or new record or both.

Thanks

May 11, 2011 at 8:50 AM

I agree, it is somewhat unintuitive to have a row change reason that doesn't provides all the rows that have changed each time.

I think if the row change reason fields are checked on output fields it should list all of them in the text field (data changed fields: x,y,z etc). Perhaps by default it should be off to maintain the high performance mode we have now.

Alternatively I'd like an option for Row Change Reason (high performance vs verbose) but the above suggestion is more elegant.

This way we can have our cake and eat it.

Regarding the expired vs new row it is a problem and I can't see any specific guidance in the Kimball books on this (not unusual!). I guess with a single field we have to pick one or the other - if you do both you end up with a more complex method to read the context accurately and also you will have to pick a survivor as a row that has replaced and has been replaced will have 2 independent "reasons for change".

Perhaps a way of solving this issue is to have 2 fields (Created Change Reason, Expired Change Reason) as I can see that information in expired rows on why it was superceded might be useful. But currently I haven't had a call to use this information in any live scenario - I wonder if anyone has needed/used this?

May 11, 2011 at 4:02 PM

To each his own I guess. It seems like way too much emphasis is being placed on an area of data warehouse implementation that adds little to no value to the users. Logging and audit information is important but almost never seen or used by the end users. Additionally, IMHO, having a detailed audit of every column that changed should not be part of the data warehouse audit.

I prefer to use row change reason to simply indicate that a row is an "Expired Initial Record", "Expired Version Record", "Current Version Record", "Initial Version Record", etc...

May 12, 2011 at 5:33 AM

Joe I think the point is not about whether this part of the warehouse adds value to the users or not. Rather its about building in a feature into a component that does not really implement the functionality it is supposed to do. I have now abandoned the use of the row change reason for my particular implementation of the Dimension merge SCD component.