Performance Issues Migrating from 2005 to 2008 R2

Topics: Questions About the Code or Project
Sep 27, 2010 at 3:36 PM

I'm currently using the Kimball Method SCD component on SQL Server 2005 and am migrating to using it in SQL Server 2008 R2.  I'm finding almost no change in performance between the two which is pretty confusing considering my 2008 R2 box has significantly more horsepower than my 2005 box.  Here's the setup that I'm working with:

SQL 2005 Enterprise x64 box with a dual core Xeon at 3.8Ghz and 8Gb of RAM - Server 2003 x64.  Migrating to SQL 2008 R2 Enterprise x64 box with 2 quad core Xeons with hyperthreading at 2.67Ghz and 48Gb of RAM - Server 2008 R2 x64.  I have a SSIS package that is utilizing the Kimball SCD component to process approx. 10,000 incoming (source) records per day against an approx. 2.7 million row existing dimension.  The table has a one column business key and twelve other columns that I'm comparing for SCD 1 inserts/updates.  The incoming source records and existing dimension records are sorted on the business key column in the source objects and the Kimball component is aware of this.  I currently have the Kimball component set to automatically manage thread use.  I don't think it should make a difference, but just so you're aware I took the existing 2005 packages, modified the version of the Kimball component in the .dtsx file and upgraded them to 2008 R2 SSIS.

Running three tests with the exact same start data in the existing dimension and the exact same set of incoming data (reverting back after each test) is yielding the following results:

2005 box - Test 1 = 10 mins 17 secs, Test 2 = 9 mins 53 secs, Test 3 = 9 mins 50 secs

2008 box - Test 1 = 10 mins 13 secs, Test 2 = 9 mins 56 secs, Test 3 = 9 mins 50 secs 

I would expect there to be a pretty big difference in the run times between the two boxes, but that's not what I'm seeing.  I've tried a number of different things to try to improve the performance on the 2008 box, but none of them has made any difference in how fast the package executes.  I can definitely state that all of my other packages which don't utilize the Kimball component yield a pretty big increase in performance so I'm leaning towards the Kimball component as the potential source of the problem.

While this isn't much of an issue with a dimension of this size I have several other dimension tables that have anywhere from 60 million to over 100 million rows in them and some of them are much wider than the sample so the processing time of those tables will become an issue later on.  BTW, I've done a few tests with the larger tables and they are yielding the same results as the sample above.  Has anyone else witnessed this type of behavior?  Does anyone have any ideas regarding why I'm not seeing a performance gain and/or tips/solutions for me?

Sep 30, 2010 at 7:37 PM

A 10K source flow versus a 2.7M row dimension table?  That seems a little off-kilter... but OK.

Can you tell me more about resource usage differences (if any) on the new and old boxes?

My thinking is probably skewed by recent work I've done, but my best thought is that 1.5 wasn't threading as efficiently as it could.  Perhaps - to test - you could try the alpha of v1.6?

Also, can you check/ensure that the small "source" flow gets completely passed to the KSCD "as soon as possible"?  By this, I mean that the KSCD can work faster if it the component immediately feeding it rows on the small "source" input shows green in BIDS.