1
Vote

Unpredictable sorting of "New" output

description

Hi! I have 2 inputs - source data and existing dimension data. Source data is sorted by Order by clause. Starting with 1 existing record in dimension (unknown member), so it's first time filling. When new records going out of DMSC they are sorted in a strange manner - by blocks or something like this, but not like intput records. This is not the order I'd like to see in my dimension table.
I tried to set IsSorted property with sortKeyPosition on business key field, but it didn't help. Tried do that on source, on ed, on both of them and by apart - no use.
Of course Sort component come to the rescue, but asynchronous component is not desirable.
I found a DMSC property - Number of Processing Threads with value 0, changed that to 1 and got what I want - sorted new records. Tried to change to 2 and returned to the problem.
So what to prefer - left 1 thread (on multicore system), use Sort component or do someting else? In terms of performance and best practice.
Thank you.

comments

Sm1th wrote Jan 23, 2012 at 6:06 AM

A remark:
setting Number of Processing Threads = 1 doesn't help with dimension tables size of about 10k rows+. It works just fine with small dimensions, but for a bit larger ones need use Sort component for New records output. Sad that the component messing up the input stream

toddmcdermid wrote Mar 15, 2012 at 11:45 PM

The component makes no statements or commitments about the order of rows emitted from any output. It doesn't try, and couldn't possibly do so except under extremely strict conditions which would compromise performance heavily. In essence, you're asking for me to place the asynchronous Sort inside the component... which is already asynchronous, but that would make it worse :)
However, I'm not sure why you're at all interested or concerned about emitted row order. Can you tell my why it's relevant and important to you?

Sm1th wrote May 17, 2012 at 7:17 PM

"I'm not sure why you're at all interested or concerned about emitted row order. Can you tell my why it's relevant and important to you?"
It's just more convinient when u create clusterd index on PK, then type query
select * from dimTable
and it shows u sorted values. When dim table is releatively small u can look at the result set and find the value with just your eyes - it's just faster to find some row this way than to remember it's business key and type where-clause. It's stupid, I know, just bad habbit =)) left it all =)