Deadlocks occurring in Deleted and Expired SCD2 paths - Fixed

Topics: Issues 3: Executing the Component (Run-Time)
Nov 11, 2010 at 8:21 PM

Am I the only one who has experienced this?  My Google searches for others who have had this same problem turned up nothing, but I think I've resolved it on my own, for now anyway.  It always seemed to occur on the Deleted and Expired SCD2 paths, sometimes both at the same time and other times just one or the other path failing.  Rerunning the SCD would either result in another deadlock or eventually complete successfully.

However, after making changes to just about every setting I could in the Kimball SCD, I finally found one that seems to have fixed my problem with no side effects so far.  The setting is actually on the Data Flow Task that contains the KSCD.  Under Properties I changed EngineThreads from 5 to 2.  So far I haven't run into any deadlocks over the last three days!!

So, if you run into this deadlocking issue, give that setting change a shot. 


Jeff.

Nov 12, 2010 at 1:53 PM

Jeff,

You are not alone!  I had the same experience and the same frustration trying to find other reports of people hitting deadlocks in the KSCD.  My problems occurred in Q1 2010 when we first started developing with the KSCD, so keep in mind that my initial testing was on KSCD v1.4.

We experienced deadlocks that were especially frequent during the initial load of large dimensions (>100k rows).  My first attempt was to dial down the threads like you did; in fact, I dialed the threads all the way down to 1.  This seemed to slow down the deadlocks - that is, somewhat fewer occurred, or they occurred a bit later in the load process - but more often than not they still occurred.  In any event, dialing down the threads in KSCD v1.4 did not solve the problem for me.  The only way I could totally rid myself of deadlocks was to break all of the SCD1, SCD2, and delete processing out into separate data flows.  That really wasn't a big deal to engineer.  The bigger deal was the realization, from an engineering point of view, that it made my code more readable/maintainable and had no negative impact on performance.  To make the change, I simply wrote all of the "non-new" outputs to raw files and consumed those raw files within the downstream data flows.


Tim


Coordinator
Nov 25, 2010 at 7:08 PM

The problem here - as I understand it - isn't specifically with the KSCD.  It's with how you're persisting inserts and updates.  With the KSCD, there are a lot of output flows that write to the same table as you're reading from, and that can cause a lock problem.

I'd suggest some combination of these:

  • Use NOLOCK (or something similar) on your source that reads from the dimension table.
  • Dump your updates into temporary tables and use set-based UPDATE commands in an Execute SQL Task to "commit" them (see the sketch after this list).
  • Instead of dividing the flow into SCD1/SCD2/Delete "flows", persist the results to RAW files or temp tables, then commit them later.
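For example, the first two options together might look something like the sketch below.  All of the table, column, and staging names here are invented for illustration - substitute your own dimension and staging tables.

    -- OLE DB Source query for the dimension read: a dirty read via NOLOCK,
    -- so the read doesn't block (or get blocked by) the writers downstream.
    SELECT CustomerKey, CustomerID, RowEffectiveDate, RowExpiredDate
    FROM dbo.DimCustomer WITH (NOLOCK)
    WHERE RowExpiredDate IS NULL;

    -- In the data flow, route the "expired" output to a staging table
    -- (dbo.Stage_Scd2Expired) instead of updating the dimension directly.
    -- Then apply the changes in an Execute SQL Task after the data flow:
    UPDATE dim
    SET    dim.RowExpiredDate = stg.RowExpiredDate
    FROM   dbo.DimCustomer AS dim
           JOIN dbo.Stage_Scd2Expired AS stg
             ON stg.CustomerKey = dim.CustomerKey;

    TRUNCATE TABLE dbo.Stage_Scd2Expired;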
Nov 18, 2013 at 2:16 PM

I had introduced this side effect by accident in my SCD.  It was resolved by setting the MaximumInsertCommitSize property on the insert OLE DB Destination to a reasonable value greater than 0.  The default is either 0 or some massive number like 2147483647, depending on the version; either way, the whole load is effectively held in one giant transaction.
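To illustrate the principle (the batch size and table names here are invented): a commit size of, say, 10000 makes the destination behave roughly like the T-SQL loop below, committing each batch and releasing its locks as the load proceeds instead of holding them all until the end.

    -- Rough T-SQL analogue of MaximumInsertCommitSize = 10000.
    DECLARE @batch INT = 10000, @rows INT = 1;

    WHILE @rows > 0
    BEGIN
        BEGIN TRANSACTION;

        -- Insert the next batch of staged rows not yet in the dimension.
        INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, RowEffectiveDate)
        SELECT TOP (@batch) s.CustomerID, s.CustomerName, s.RowEffectiveDate
        FROM   dbo.Stage_NewRows AS s
        WHERE  NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS d
                           WHERE d.CustomerID = s.CustomerID);

        SET @rows = @@ROWCOUNT;   -- row count of the INSERT above
        COMMIT TRANSACTION;       -- locks from this batch are released here
    END;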

Cheers

Peter