Sorting Business Keys

Mar 5, 2010 at 12:41 AM

Hi there,

I've been using the KSCD for the past few days and I must say there is a huge difference in performance, which is great.

I have a question though - the KSCD warns me to create a sort task on the input business key(s) to improve runtime performance, whereas SSIS best practice tells us to aviod asynchronous transformations, such as sort - so my question is why is the KSCD saying one thing and best practice another? Who is more right?


Mar 6, 2010 at 7:18 AM

It doesn't advise you to use a Sort transform - it advises that it would be best to provide it with sorted input.  You don't necessarily need a Sort transform to give it what it wants - you may be able to request your data sorted from your source.

However - to answer your underlying question - it depends.  You would have to experiment with your system.

The KSCD transform will work better with sorted inputs because if the inputs are sorted, it's able to make some additional assumptions about how to handle the rows that come in.  If it receives rows with business keys "A", "B", "C", and "D" on the Existing Dimension, then receives rows "C", and "D" on the Source System, it knows that it will never see rows "A" and "B" come in on the Source System.  It can then determine that the "A" and "B" rows from the Existing Dimension must be sent out the "Deleted" output.  If the flows weren't sorted, it couldn't assume that it wouldn't see rows for "A" and "B" come in the Source System later, so it would have to hold those rows in its cache until it had seen each and every row from the Source System.