Record is already exists in Dimension table but still Kimball SCD Component is Identifying it as a new record every time I run Package

Topics: Issues 2: Using the Component (Design-Time)
Sep 2, 2010 at 2:46 PM

Hi Todd,

I am sorry I posted question on http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/e09cf721-52e3-4d50-9c0c-d69455471e92 and not here.

I am continuing it here. I am really sorry.

Kimball meand it is a custom component called Kimball Method SCD Component which identifies if the record is new or SCD1 or SCD2.

It is supposed to identify that if record already exists in table then it should not give that perticular record as a new.

but it is giving me few records as a new even though those already exist in my dimension table.

Now the issue has been resolved when I compltely deleted it from my package and configured it again then it is working fine.

but I would like to know why this was happened earlier. I am unable to find the problem why it behaved weird.

I have also copied youe answer below so that we can continue the discussion.

Given your description of the problem, the most likely explanation is that you'd customized some part of the configuration of the component such that a match wasn't going to happen.  Since you deleted, then replaced the component on the design surface to get it to work - I'm betting the defaults were what you should have had in the first place.  However, not knowing how you had it configured before, it's pretty hard to divine what could have gone wrong.

Typically, I see people:

  • Incorrectly sorting their inputs.  As in: using an ORDER BY on their query, but having their SortKeyPosition properties set differently.
  • Incorrectly mapping the input columns.  As in: mapping the business key from the dimension table to a column on the source data that isn't the business key.
  • Running the package with the same "date" twice.  The transformation of the SCD is NOT idempotent.  Run it twice on the same data with the same config, and you'll get very odd results.

 

Sep 2, 2010 at 2:51 PM

Todd,

I got my answer. May be that was because of same config. thanks a lot and again sorry for troubling you.

My problem got solved.

Thanks & Regards,

Anuja

Sep 3, 2010 at 8:14 PM

No problem Anuja - your comments have taught me that I should improve the documentation to describe the fact the component isn't "idempotent" - that's not obvious...

Sep 7, 2010 at 3:29 PM

Todd,

I am again having the same problem where I havent made any changes in my package. When I am running it today, My source has 6 new rows. Those 6 rows are showing me as a new record every time I run my package and It is resulting into duplicate in my destination where surrogate Key "RepID" is incrementing and those 6 rows are as it is.

Would you recommend making this package again from scratch?

I am really worried.

I am not running this package for perticular date. Everytime I truncate staging and load it again to populate my dimension table " D_Rep"

 

 

Thanks & Regards,

Anuja

Sep 7, 2010 at 10:22 PM

Hi Todd,

When I did this with Look Up transformation it is not showing up any "No Match Records" that means all records are already available in Dimension then I am not getting Why Kimball SCD is unable to identify that records are already available in dimension table.

 

Thanks & Regards,

Anuja

Sep 10, 2010 at 11:51 PM

Can you provide a small sample set of data to illustrate your problem?  Show a record or two of what's in your dimension table, a record or two of what's in your source.  Then show what you expect to see come out of the component, and what actually comes out.

Sep 13, 2010 at 2:50 AM
Edited Sep 13, 2010 at 2:57 AM

Todd,

 Thanks for your concern.

Below Row is already there in My Dimension table.Here the RepID is 2.RepID is surrogate Key.RepCodeBK,BrokerDealerBK,AffiliationstartDate and AffiliationEndDate are my businessKeys.(So there are 4 businesskeys). Column marked in Bold are my business keys

 

RepID Rep RepCodeBK BrokerDealerBK RepLastName RepFirstName RepCRDNumber BirthDate CreatedTS EditedTS EditedBy ExpireDate CurrentDate RepEffectiveDate RepEndDate RepStatus IA_Type AffiliationStartDate AffiliationEndDate PayeeType TerminationDate DBAName

2      WILL BEN ABC 2 BEN WILL XXXXX 1959-07-16 00:00:00.000 2010-08-27 15:06:29.000 2010-09-10 12:03:26.000 ADVISOR\akhambalikar NULL 2010-09-10 12:04:33.990 2010-04-09 00:00:00.000 2010-04-26 00:00:00.000 2 C 2010-04-06 00:00:00.000 2010-04-26 00:00:00.000 Rep 2010-04-26 00:00:00.000 .

Below record is in my source.Other columns values are coming with the use of derived column tranformation

 

RepCode BrokerDealer       FirstName     LastName CRDNumber    AffiliationstartDate           AffiliationEndDate
ABC           2                  WILL                         BEN       XXXXX      2010-04-06 00:00:00.000 2010-04-26 00:00:00.000

After running ETL, above record has been populated as New Record where RepID is now 39,910 as total record exists in my dimension are 39,909. So RepID is just keep on incrementing even though my business keys are same. I really like this transformation.So I would love to use this component for population my dimension table.  It will be pain to use any other thing or component.

 

RepID Rep RepCodeBK BrokerDealerBK RepLastName RepFirstName RepCRDNumber BirthDate CreatedTS EditedTS EditedBy ExpireDate CurrentDate RepEffectiveDate RepEndDate RepStatus IA_Type AffiliationStartDate AffiliationEndDate PayeeType TerminationDate DBAName

39910      WILL BEN ABC 2 BEN WILL XXXXX 1959-07-16 00:00:00.000 2010-08-27 15:06:29.000 2010-09-10 12:03:26.000 ADVISOR\akhambalikar NULL 2010-09-10 12:04:33.990 2010-04-09 00:00:00.000 2010-04-26 00:00:00.000 2 C 2010-04-06 00:00:00.000 2010-04-26 00:00:00.000 Rep 2010-04-26 00:00:00.000 .

Thanks & Regards,

Anuja

Sep 19, 2010 at 9:01 PM
Edited Sep 19, 2010 at 9:01 PM

Can you tell me the data types of your BK columns as SSIS sees them?  Double-click on the flows going in to the KSCD, and post the metadata information.

Have you identified any SCD type 2 columns?  If so, do you have any SCD 2 Housekeeping columns defined?  (SCD2 Effective Date, SCD2 Expiry Date, SCD2 Current Record?)  If so, which columns are those?  If so, what settings do you have in the "SCD2 Date Handling" tab?

Sep 20, 2010 at 4:55 PM

Hi Todd,

Below is the meta data info for BK columns.

  • RepCodeBK- (DT_STR,3)
  • BrokerDealerBK-(DT_I4)
  • AffiliationStartDate-DT_DBTIMESTAMP
  • AffiliationendDate-DT_DBTIMESTAMP

and below are my SCD2 columns.

  • Rep
  • FirstName
  • LastName
  • CRDNumber
  • DBAName.

SCD2 Housekeeping Columns.

  • CurrentDate is SCD2 Effective Date
  • ExpireDate is SCD2 Expiry Date

CurrentDate has been set to explicit date and then I have used derived column transformation immediate after Kimball SCD to update it(CurrentDate) whenever there is SCD2 record identified or new record identified.

ExpiureDate has set to NULL and I update it using OLEDB Command Transformation.

 

Thanks,

Anuja

 

 

 

Sep 20, 2010 at 8:57 PM

OK - I'm not entirely sure I grasp how you have it set up, but I'm pretty sure your configuration of the SCD2 Date Handling tab is working against you.  If I understand what you're trying to do, the derived column you have after the KSCD is unnecessary, and is actually causing your problem.

If I understand you correctly, your dimension table's rules for identifying "current" records is that their expiry dates are NULL.  In order to make this work with KSCD, you need to alter the configuration of the SCD 2 Date Handling tab to the following:

  • Select a variable containining "today's" date - you probably already have this set up appropriately.
  • Selecting a setting for "record changes by marking" - the default is probably fine.  Any setting is probably fine.
  • The "first record's effective date" can really be set to anything you like as well.  I preferred the default "MS SQL DateTime Min or Max" - although I've changed that in v1.6.
  • The critical setting is the last one - the "last record's expiry date".  You need to change that to "Null Value".

If you don't change that last setting to "Null Value" then the KSCD will not work as you expect.  What's happening is that this instruction is used for TWO things - first, detecting the "current" row.  Second, marking the "new" current row properly (if a new version of a row is produced by the transformation).  In your example, your dimension has your "existing row" set with an expiry date of NULL.  The KSCD reads this row in, as well as the row from the source - and DOES match the business keys.  However, it looks at the row from the dimension table and determines that it is a row of history.  It considers this row expired.  Even though that dimension row business key matches the source row business key, the KSCD can't consider that those two things are the same "version" because the existing dimension table doesn't have a current instance of this business key.  Therefore, it must add one - and that's why your source row is always being determined as a "new" row, not a "match" to the row you already have.

If you change that one setting in the KSCD, you can remove your Derived Column component, and your transform should work.

 

 

 

Sep 29, 2010 at 3:17 PM

Todd,

Thanks for your response.

I have done the same "last record's expiry date" to NULL Value. I have put derived column transformation to set the CurrentDate of new records and new SCD2 records.

Here is my setting for SCD 2 Data Handling

  • Select a variable containining "today's" date - has set to "System::StartTime".
  • Selecting a setting for "record changes by marking" - has set to "Old Ends yesterday and New starts Today"
  • The "first record's effective date" - has set to explicit date
  • the "last record's expiry date". - has set to NULL Value.

So I think, my setting are the same as advised by you.

Thanks,

Anuja

 

 

Sep 30, 2010 at 7:41 PM

I can tell you they're not :)

Remove the Derived Column.

Unless you can present me with a compelling case as to why that Derived Column is necessary, I'm going to tell you it's your problem - especially when you tell me you're messing around with dates in it.  One of the core reasons I designed the KSCD the way I did was to make sure that I didn't have to put Derived Columns after it to "fix up" what should have been handled inside the SCD processing.  The fact that you are trying to "fix" something after the KSCD has processed the rows tells me that I either haven't done my job, or you're using the KSCD inappropriately.  My ego tells me it's the latter :)

Oct 1, 2010 at 3:15 PM

Hi Todd

Firstly I'd like to say, that this component is excellent! I'm enjoying using it for a B.I implementation at our company.

However I'm also stumped by this problem (at least I think it is similar to the what the original poster is stumped with)

It works perfectly! ... until i also start messing with adding a derived column to create an EffectiveDate using GETDATE().

ALL that i want is to use the current date for 'new' records and NOT 1753 ... which is the MS SQL DateTime MIN value.

How can one accomplish this without using a derived column? as soon as one adds a derived column and use that value as the start date - the next time the data flow task runs it duplicates all records (inserting another new record for every source record)

 

Thanks,

Ian

Oct 21, 2010 at 1:06 AM

Thanks for the support!

The problem with throwing GETDATE() in as an effective or expiry date is that the component can then misinterpret whether this is the "first" record.  There should be a setting on the SCD2 Date Handling tab that should allow you to set an "explicit" date/time for the "first" record.  I can't recall at the moment whether using a variable is possible with this setting - I don't think it is.  BUT - you should be able to use property expressions on this property in order to set it to a variable value - say "System::StartTime"...

Nov 18, 2010 at 3:00 AM

Hi Ian,

As Todd says above, you should be able to do this by setting the SCD2 Date Handling "first row effective date" to "Explicit Date".

You then need to switch to the Control Flow designer (not the Data Flow designer), select the Data Flow task and then click on the Expressions box in the Properties window.

In the Property Expressions Editor window that opens, select the Property [name of KSCD component].[SCD2 Explicit Effective Date] and in the Expression box add whatever variableyou want to use for the date, e.g. System::StartTime

I haven't tested it myself but I think that should do what you want.

cheers,

Nathan

Feb 9, 2011 at 5:56 AM

AnujaETL says:  

  • Running the package with the same "date" twice.  The transformation of the SCD is NOT idempotent.  Run it twice on the same data with the same config, and you'll get very odd results.

 

Can you explain this further?  I can re-run mine over and over with the exact same records on the exact same date and it does not change my resulting dimension at all.  Everything simply remains "Unchanged."

Thanks!

Feb 22, 2011 at 10:07 PM

I mean running the package on the same "date" with different data.  If you think of the component as performing a transformation - turning one thing into another - it's not idempotent (see the definition in my original post).  The v1.5 component isn't set up to transform data twice in the same day.  Even if you push exactly the same data through, with the same value for "today's date", you may get changes.  You may not.  It really depends on your data.  It's like saying your dimension table has the value "5" in it.  Running the SCD component multiplies that value by "2".  If you run the component twice, you'll get a different result than just running it once.  However, the transformation isn't so simple (is it?) so sometimes the transformation may do what you expect, but most often, it will not.

Mar 25, 2011 at 2:34 PM

I have come across the same problem several times using the component: The second run will send all source rows to the new output. Successive runs treat rows the way it should be, sending them on the unchanged output. Anyhow, the duplication in the second run always breaks the logic in the dimension table.

When this happens I delete the component. A fresh and newly configured component runs as it should.

If I should provide tracing information next time this happens, please advise how that should be done. The component is great and I would appreciate seeing bugs fixed.

Apr 8, 2011 at 7:48 PM

Replacing the component should have absolutely no effect on runtime behaviour.  If you can possibly generate a test case that demonstrates the issue reliably, please post a work item in the Issue Tracker and attach the package.