Pass Through Columns

Feb 14, 2009 at 5:38 PM
How do I simply pass through a couple of audit columns? I have a couple of columns that are 'derived' prior to the component, but I do not want them to play a part in the determining if the row changed. The Insert/Update statements that follow will use them.

Thanks,
Paul
Coordinator
Feb 14, 2009 at 7:16 PM
Interesting...

I'm not sure that's possible with the current setup.  If you identify a column as "Not Used", then I do not think it's placed on the outputs...

I'll add an Issue for that - it's a very easy item to add to the component - please go vote for it.  I have a bug to fix in v1.4, and can probably squeeze that in at the same time...
Coordinator
Feb 14, 2009 at 7:23 PM
I was just writing up the Issue... and then some thoughts came up:

1. A "PassThrough" type column (that's what I'd call it, I think) would have to be a new "Usage Type" on the Existing Dimension... which means that...
2. That passthrough column would have to be linked to a column on the Source System input - it can't only be on the Existing Dimension.  (If it were, then where would I come up with a value for that column on "new" records?)
3. Which input's value should I take for the passthrough column, should a "match" be determined betweeen an Existing Dimension row and a Source System row?  For example, if I have a Product dimension, and had a "Corn Flakes" row on the ED and SS - which "passthrough" column's value would I use on the outputs?  Do I need to enforce that the "passthrough" column values have to match?  That doesn't seem like it can possibly work - but I might not be understanding what you're looking for.

With all those thoughts/questions... can you post up more information about your scenario?
Feb 14, 2009 at 10:21 PM
For every Dimension that we build we have 2 audit columns, WarehouseLoadDate and WarehouseEventCode, we always update these 2 columns when we insert or update a row. The obvious place to set the values is in a derived column transform on the source input, but, we do not want them to be used as part of the hashing used to determine if the row has changed, rather I just want them to pass through to be used as nessesary in the Insert/Update that follows. these columns will always be present on the3 'Existing' stream.

I would hazard a guess that a 'PassThrough' type would be the way to implement this, the values from the source would overwrite any values on the ED, but would not play a role in any checksum operation, this way they would still be on the pipeline for later use.

The way I do this at the moment is (apologies for the text layout here): -

Source                                                      Existing
  Transforms
  Calc Checksum                                        Calc Checksum      (Component we wrote)
  Add Audit Columns
  Sort                                                          Sort                       (Just keep Business Keys and Checksum value)
                Left Merge Join on business keys
                Conditional Split                                                      
Insert      Update         Dustbin

Does this help any?

Coordinator
Feb 15, 2009 at 6:25 PM
It doesn't sound like you want a "PassThrough" column - because I simply don't think that's possible.  You can't pass a column value through "untouched" unless the Source row and Existing row match exactly (in all respects - even the PassThrough value(s)), OR unless only the Source has that row (it's a brand new row).  If you ever have any SCD activity on the row, you then have to "choose" between what row to take the "PassThrough" value from - the Source System or the Existing Dimension.  So let's not call this problem a "passthrough" problem...

Ummm....

It kindof sounds like you're looking for a non-historical SCD1 column.  But not quite.  (BTW - I can't fathom that concept that the stock SCD Wizard supports.)  Because you don't want that column to "cause" any changes.

OK - let me ask you a question...  These audit columns - what do they contain?  I can assume that the WarehouseLoadDate has today's date in it, correct?  What does the WarehouseEventCode have in it?
Feb 15, 2009 at 7:45 PM
The WarehouseLoadDate (DATETIME) holds the date & time a row was inserted or updated and WarehouseEventCode (NVARCHAR(10)) holds an eventcode that triggered that action, our framework uses event codes to track processing and are integralto the framework I have been using for several years. A row may be created in the Employee dimension that came from an OLTP system (say EventCode SALES) and at a later date the coporate extract provides the authorative information about the employee (say EventCode CORPORATE).

For the sake of efficient code I have always derived these 2 columns on the 'source' stream and then not included them the checksum calculations, because the the WarehouseLoadDate will always differ from the one in the 'existing' stream but it could well be that none of the interesting columns have changed.

One way around this would be to derive the columns after the component, but I would then need to duplicate the task for each of the outputs, i.e. New, Update SCD1, Update SCD2

BTW: Having performed testing with v1.4 I reckon it's a must have component, I am just trying to figure ways to get it to fit into our current scenario
Coordinator
Feb 16, 2009 at 4:33 PM
OK - I see what you're doing.  Those are what I'd consider "auditing" fields... but with the current component, you can't use them.

However, it would be not out of the question for some modifications to allow that.  Now I can add an issue for you.

What I see being possible is expanding the capability of the current auditing columns.  Currently, they only allow integer types, which won't work for you.  Secondly, they only allow one column per "type" of auditing.  If I allow any data type for the auditing columns (including date and chars), AND allow multiple columns to be identified as the same kind of auditing column, AND I find a way to allow you to specify an SSIS variable for each of those columns, then I can see this scenario working for you:

1. You include the WarehouseLoadDate and WarehouseEventCode columns in the Existing Dimension flow ONLY.
2. You create one, maybe two SSIS variables.  One, a string, that contains "SALES" or a similar "code".  The other one (a date) may be optional - because you could use System::StartTime or something similar.
3. You identify both of those columns as "Audit Last Changed" type columns, and associate the "code" SSIS variable with your code column, and a date variable with the date column.

Sound like that would work?  That gets rid of your derived column component too...
Feb 19, 2009 at 7:50 PM
Sounds like that would work nicely, I especially like getting rid of the derived column component
Feb 20, 2009 at 1:44 AM
Todd - this is similar to the question I had asked (http://www.codeplex.com/kimballscd/Thread/View.aspx?ThreadId=46987).  In that situation all of my dimensions have an etl_log_id column that links to a table containing information similar (run date, logical date, load type, etc.) to what Paul is looking to pass through.  The etl_log_id scenario seems easier for the component to deal with at the moment since it's an integer and a single value but the solution you suggested looks like it would be good at giving more flexibility.
Coordinator
Feb 20, 2009 at 6:09 PM
Added as workitem 1678 (http://www.codeplex.com/kimballscd/WorkItem/View.aspx?WorkItemId=1678)...