Doesn't recognize input as sorted

Apr 1, 2010 at 11:44 PM

I am sorting my inputs by specifying an order by clause in data source sql commands but the component does not recognize it as sorted.

Is there a way to tell it the input is sorted?

Coordinator
Apr 5, 2010 at 1:08 AM

This is a common issue with SSIS.  The problem is that SSIS doesn't understand the "ORDER BY" clause you've placed in your query.  It would behave the same way if your query was a stored procedure with no hints as to what the sort order of the returned rowset was.  In these cases, you have to tell SSIS explicitly two things - first, the output is sorted, and second, how it's sorted.

So - right click on your source, and select Advanced Editor.  Go to the Inputs and Outputs tab.  Open the Output node.  Select the second-level output node.  On the right, you'll see a property called "IsSorted".  Set that to "true".  Open that second-level node to show all the columns.  Find the first column of your sort order and select it.  On the right, you'll see a property called "SortKeyPosition".  Set that to "1".  Find the second column in your sort, and set the SortKeyPosition to 2.  Etc.

Apr 28, 2011 at 9:07 PM

Had a follow-up question to this issue.  I was wondering when using "ORDER BY" with the DMSCD, does the component require the sort to be done using Windows collation as in the Merge and Merge Join Transformations (see http://msdn.microsoft.com/en-us/library/ms137653.aspx under "Sorting the Data").  I tend to use VARCHAR as opposed to NVARCHAR data types in my DB, so wanted to know if it would be possible to simply "ORDER BY" the standard VARCHAR fields, or whether I would have to do a CAST (or some other transformation) to NVARCHAR.  Thanks.

Coordinator
May 3, 2011 at 10:12 PM

The component doesn't require anything - it simply assumes that you know what you're doing.  If you say it's "ordered by" a certain column set, it believes you.  It doesn't care about collation at all.  You just need to be careful that:

  1. It is actually sorted
  2. If you sort both inputs (source and existing dimension) you use the same collation.

The DMSCD uses the sort order like this... (assume both inputs are sorted)

  1. Rows 1 through 10 arrive from the Existing Dimension.  Those ten rows are cached by the component, because it's waiting to see if it needs to compare them to rows coming from the Source.
  2. Rows 1 through 5 arrive from the Source.  Cached rows 1 through 5 from the ED are "matched" to these rows, decisions are made, and rows are pushed out the outputs.  The ED cache is relieved of the rows, so it now only contains rows 6 through 10.
  3. Row 7 arrives from the Source.  Cached row 7 is matched against this row, so a decision can be made about what to do with it.  But because we KNOW that row 6 is "before" row 7... we know that we will NEVER see row 6 from the Source input.  (If they weren't sorted, we wouldn't know that.)  Therefore, even through we haven't seen all of the Source rows, we can decide what to do with row 6 - expire it!

As you should be able to see - the actual collation is irrelevant.  I'm never comparing a row from the Source to a row from the ED - I'm only assuming relative ordering.