Unable to Load Data - Out of Memory Errors

Topics: Issues 3: Executing the Component (Run-Time), Possible Bugs/Missing Features?
Sep 10, 2012 at 6:36 PM

I am creating a new SCD that will have a large number of records (100M+), with approximately 2.5M changes from the source data to evaluate daily.  Right now I am simply building the package and trying to load some sample data.  After about 1.5M rows are read from SOURCE DATA, zero from EXISTING DIMENSION, and 200K rows are output to the NEW flow, I start to get out of memory errors.  See the Scenario 1 error messages below.

  • Server: SQL Server 2008 R2, 32-bit, 20GB RAM, on Windows Server 2008 R2
  • Source Data: 15 nvarchar fields totaling 150 characters, 6 DATETIME2 fields, 5 numeric fields

So I decided to limit the SOURCE DATA records to 500K at a time and loop the feed.  This worked fine until there were 5.3M EXISTING DIMENSION rows.  Then the out of memory errors started again.  In this scenario, the only output was to "NEW".  Both inputs were properly sorted.

Are there some settings I can change that may throttle the data flow and reduce memory requirements?

Below are the various error codes.

Scenario 1 - Bulk load of data error messages:


[Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in ProcessInput sending row to cache.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in ProcessInput adding rows to the cache.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in ProcessInput.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) ProcessCache_SendOtherInputToNew setting SCD2 dates, key: e1714300000000000000000000000000300030003000310036003500.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) ProcessCache_SendOtherInputToNew setting SCD2 dates, key: e1714300000000000000000000000000300030003000310036003500.) ProcessCache_SendOtherInputToNew, key: e1714300000000000000000000000000300030003000310036003500.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error (Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) ProcessCache_SendOtherInputToNew setting SCD2 dates, key: e1714300000000000000000000000000300030003000310036003500.) ProcessCache_SendOtherInputToNew, key: e1714300000000000000000000000000300030003000310036003500.) in ProcessKey, case 4 - New, key: e1714300000000000000000000000000300030003000310036003500.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal exceptions encountered.) in ProcessInput.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error (Internal error (Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) ProcessCache_SendOtherInputToNew setting SCD2 dates, key: e1714300000000000000000000000000300030003000310036003500.) ProcessCache_SendOtherInputToNew, key: e1714300000000000000000000000000300030003000310036003500.) in ProcessKey, case 4 - New, key: e1714300000000000000000000000000300030003000310036003500.) in ProcessKey detecting changes, key: e1714300000000000000000000000000300030003000310036003500.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) ProcessCache_SendOtherInputToNew setting inferred member(s), key: e1714300000000000000000000000000300030003000320033003000.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error (Internal error (Internal error (Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) ProcessCache_SendOtherInputToNew setting SCD2 dates, key: e1714300000000000000000000000000300030003000310036003500.) ProcessCache_SendOtherInputToNew, key: e1714300000000000000000000000000300030003000310036003500.) in ProcessKey, case 4 - New, key: e1714300000000000000000000000000300030003000310036003500.) in ProcessKey detecting changes, key: e1714300000000000000000000000000300030003000310036003500.) in ProcessCache_Thread_ProcessKey.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) ProcessCache_SendOtherInputToNew setting inferred member(s), key: e1714300000000000000000000000000300030003000320033003000.) ProcessCache_SendOtherInputToNew, key: e1714300000000000000000000000000300030003000320033003000.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal exceptions encountered.) in ProcessInput.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal exceptions encountered.) in ProcessInput.

Scenario 2 - Looping load of data (500K records at a time) error messages:

[Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in MatchKeys thread attempting to queue key fe451e01000000000000000000000000320034003100320036003000.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in MatchKeys thread attempting to queue key 10241e01000000000000000000000000300030003600360034003000.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in MatchKeys thread attempting to queue key 10561e01000000000000000000000000300030003000300033003000.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in MatchKeys thread attempting to queue key 10241e01000000000000000000000000300030003600360034003000.) in MatchKeys thread.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in MatchKeys thread attempting to queue key fe451e01000000000000000000000000320034003100320036003000.) in MatchKeys thread.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in MatchKeys thread attempting to queue key 10561e01000000000000000000000000300030003000300033003000.) in MatchKeys thread.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error in MatchKeys thread.) in ProcessCache_Thread_MatchKeys.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error in MatchKeys thread.) in ProcessCache_Thread_MatchKeys.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in MatchKeys thread attempting to queue key 44241e01000000000000000000000000300031003000320035003000.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error in MatchKeys thread.) in ProcessCache_Thread_MatchKeys.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in MatchKeys thread attempting to queue key 44241e01000000000000000000000000300031003000320035003000.) in MatchKeys thread.
[Dimension Merge Slowly Changing Dimension] Error: Internal error (Internal error in MatchKeys thread.) in ProcessCache_Thread_MatchKeys.

Sep 11, 2012 at 10:16 AM

You are using 32-bit with loads of over 100 million records? How is the OS set up, i.e. is it actually using the full 20GB in the machine? Unless you have pulled several tricks out of the bag, you'll never get 32-bit to handle 20GB of RAM!

I think in this scenario you would have a much better chance of getting this to work if you go to a 64-bit OS, and then possibly consider chopping down the 150-character nvarchars to something a bit more sensible. For dimension attributes I usually don't recommend more than 50 characters, as the descriptions can get messy. You can also save memory by switching to varchar instead of nvarchar, if you can get away with not using Unicode.
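As a rough back-of-the-envelope illustration (the component's exact per-row overhead is an assumption here): nvarchar stores 2 bytes per character, so the 150 nvarchar characters per row from the original post come to roughly 300 bytes of string data per cached row versus about 150 bytes with varchar. Across 5 million cached dimension rows that is on the order of 1.5GB versus 750MB for the string attributes alone, before the DATETIME2 and numeric columns, the .NET object overhead, or the source rows are counted, and a 32-bit SSIS process only gets roughly 2-4GB of addressable memory no matter how much RAM is in the server.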

Another thing to try: on the existing dimension, limit the select to only 'current' members.
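For example, assuming the dimension carries an SCD2 current-row flag (the table and column names below are just placeholders), the EXISTING DIMENSION source query could look something like this, sorted on the business key as the component expects:

    -- Placeholder names; adjust to the real dimension schema.
    -- Feeds only current (unexpired) rows into the EXISTING DIMENSION input.
    SELECT  CustomerKey,            -- surrogate key
            CustomerBusinessKey,    -- business key
            Attribute1,
            Attribute2              -- the SCD attributes being compared
    FROM    dbo.DimCustomer
    WHERE   IsCurrent = 1           -- or: RowEndDate IS NULL
    ORDER BY CustomerBusinessKey;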

The looping idea won't really work for you if you're using Type 2 changes, though, unless you set the existing dimension source to pull in just the same business keys as the data in the source. If you don't, the component will start saying 'ah, existing business key z doesn't exist in the source anymore, I therefore have to expire the latest row for business key z', which could lead to terrible problems when you try to load your facts!
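One hedged sketch of that key restriction (table and column names are placeholders; the staging table would be populated with the current batch of source business keys earlier in the package):

    -- Only pull dimension rows whose business key appears in this batch of source data.
    SELECT  d.CustomerKey,
            d.CustomerBusinessKey,
            d.Attribute1,
            d.Attribute2
    FROM    dbo.DimCustomer AS d
    WHERE   d.IsCurrent = 1
      AND   EXISTS (SELECT 1
                    FROM   stg.SourceBatch AS s
                    WHERE  s.CustomerBusinessKey = d.CustomerBusinessKey)
    ORDER BY d.CustomerBusinessKey;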

I think your best bet, though, is to go to 64-bit and get more RAM!

 

Sep 11, 2012 at 11:01 PM

WeeeBob - thank you for your reply.  I'm afraid I'm stuck with my 32-bit dev machine for a couple more months.  The 20GB of RAM is really only used by SQL Server, and even then not as much as needed.

I know SSIS will improve on 64-bit (my production server is 64-bit); however, I am concerned that the task still will not process.  I cannot push something to production without knowing that it works first.

The total nvarchar length I have is 150 characters across 15 fields.  Overall, that does not seem like much.  And yes, I need to use nvarchar, as some fields may have Unicode data.  The existing dimension is limited to just "current" members.

The looping load of the data works this way: 

  1. Business key 1 to 500,000 is processed and loads through the "New" output. 
  2. Next loop loads 1 to 500,000 from the existing dimension and 500,001 to 1,000,000 from source data. 
  3. All of the source data is output as "New". 
  4. I turned off the "Deleted" output as no records are ever deleted. 
  5. This loops fine until there are about 5 million rows of data in the existing dimension.  Then the out of memory errors start.  It seems that the task is not releasing the rows of the existing dimension from memory.  Since the first row of the source is 5,000,001, the task should discard rows from the existing dimension until it reaches a business key of 5,000,001 or greater.  (A sketch of the per-loop queries is shown after this list.)
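To make the loop mechanics concrete, here is a minimal sketch of the two per-iteration queries (the @RangeStart/@RangeEnd values and all table/column names are hypothetical placeholders mapped from SSIS package variables).  In the loop as described, the existing-dimension query is not restricted to the source key window, which is where the memory growth comes from; adding the same range filter to both inputs is essentially weeBob's earlier suggestion:

    -- SOURCE DATA for this iteration: the next window of business keys
    SELECT  BusinessKey, Attribute1, Attribute2
    FROM    stg.SourceData
    WHERE   BusinessKey BETWEEN @RangeStart AND @RangeEnd
    ORDER BY BusinessKey;

    -- EXISTING DIMENSION: as described above, this pulls every key loaded so far;
    -- uncommenting the range filter keeps the cached dimension window small.
    SELECT  DimensionKey, BusinessKey, Attribute1, Attribute2
    FROM    dbo.DimTable
    WHERE   IsCurrent = 1
      -- AND BusinessKey BETWEEN @RangeStart AND @RangeEnd
    ORDER BY BusinessKey;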

My final data will have about 60M "current" records and about 5M source records to evaluate.  Most of the 5M source records will not yield any change (Type 1 or Type 2).  And yes, the 5M has already been reduced based upon modified date fields in the source data.

Does anyone have this level of processing occurring?

Rob

 

Jan 8, 2013 at 9:37 AM

Hi Rob,

I'm having the exact same issue.  I'm trying to bulk load initial data into an empty destination table.

I've monitored my memory commit size while the task is running, and my finding is that the task appears to be creating some sort of cached dataset of all the 'new' rows, which keeps growing and remains in memory until the package finishes.

So the data is retained in memory even though none of the rows coming in will match anything and should just pass through.  I've also noticed that the speed with which rows exit the SCD task is about 1/2 to 1/4 of the speed with which they come in.

I'm assuming it's the same with the data from the 'destination' - it appears to be able to hold more 'existing' records than new records, though; I can load a few times more 'destination' data than source data (I'm guessing a hashed version is held, which is narrower?).

But yes, my finding is that this task can't be used for medium to large data sets, or for bulk loads where a large volume of the rows are new :(

I would have thought that new rows would just pass through, and I'm with you on the retention of rows where the business key is outside the sorted range of the 'source'.  They should go straight to trash.

weeBob's advice of just pulling through the business keys that exist in the source is what I'd do.

Jan 24, 2013 at 6:18 PM

I am loading 3M new rows against a 29M-row Type 2 dimension, where the source and target tables both have over 60 columns, with over 50% of those columns being nvarchar. I just had a system out of memory exception today for the first time in a year. The tables only grow by about 2% a month. My server is Windows 2008 R2 x64 with 48GB RAM. I cannot find a way to reduce my data sets or to expand the cache setting. Does anyone else out there have any ideas on how to resolve this out of memory error?

Jan 24, 2013 at 7:28 PM

I wrote my own set of SCD tasks. I found I could process 350K change rows against 60M source rows in under 30 minutes. The data flow starts with a Merge Join task. The source rows are outer joined to the change rows, so only the change rows flow through the rest of the process.
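In SQL terms, the shape of that step is roughly equivalent to the following (all names below are placeholders, and in the package it is done with the SSIS Merge Join over two inputs sorted on the business key rather than in the database):

    -- Change rows drive the join; dimension values come along where a match exists.
    SELECT  c.BusinessKey,
            c.Attribute1,                    -- incoming values
            c.Attribute2,
            d.DimensionKey,                  -- NULL when the row is brand new
            d.Attribute1 AS ExistingAttribute1,
            d.Attribute2 AS ExistingAttribute2
    FROM    stg.ChangeRows AS c              -- ~350K change rows
    LEFT JOIN (SELECT DimensionKey, BusinessKey, Attribute1, Attribute2
               FROM   dbo.DimTable
               WHERE  IsCurrent = 1) AS d    -- ~60M existing rows, current only
           ON d.BusinessKey = c.BusinessKey;
    -- Only the change rows (with any matching dimension values) continue into
    -- the change-detection and SCD1/SCD2 handling steps.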

Rob

Jan 24, 2013 at 10:11 PM

I considered switching to Execute SQL tasks with T-SQL MERGE statements, or possibly using the Merge Join task. Do you know if both approaches have the same performance? I am leaning towards T-SQL just because it can be easier to troubleshoot.

Ken

Jan 24, 2013 at 11:24 PM

Pragmatic Works, which now sells and supports SSIS Dimension Merge, has a good white paper on the different methods including T-SQL Merge. From their home page, go to Learning Center > Resources.

As I mentioned, my custom SCD used a Merge Join at the start to identify changes. I checked my log and the entire package processes records at about 25K source rows per second.

Jan 24, 2013 at 11:37 PM
Edited Jan 24, 2013 at 11:39 PM

I checked the Pragmatic Works SCD task, but unfortunately it has the same memory problems. I reported it to them and they're looking at it, so it might get fixed eventually.

But I went with the same approach as Rob with the Merge Join. I also use the free Konesans Checksum task to generate a hash of the attributes I want to monitor for change, then use the Merge Join to join the datasets on the business key.

Then a Conditional Split compares the source and dimension hashes, and checks for nulls, to route new and expired/deleted rows.

At its worst (all matches, no changes) this was as fast as the SCD task; at its best it was much faster, and in all cases the memory footprint was much smaller. I'd still use the SCD task for <100K rows, as it's much quicker to set up and the processing time difference is negligible, but for anything bigger I used the above approach.
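A hedged T-SQL rendering of that routing logic (in the package it is Konesans Checksum plus Merge Join plus Conditional Split; here HASHBYTES stands in for the checksum, and every table and column name is a placeholder):

    -- FULL JOIN on the business key, current dimension rows only,
    -- then classify each row the way the Conditional Split would.
    SELECT  COALESCE(s.BusinessKey, d.BusinessKey) AS BusinessKey,
            CASE
                WHEN d.BusinessKey IS NULL THEN 'New'       -- key only in the source
                WHEN s.BusinessKey IS NULL THEN 'Expired'   -- key only in the dimension
                WHEN HASHBYTES('SHA1', ISNULL(s.Attribute1, N'') + N'|' + ISNULL(s.Attribute2, N''))
                  <> HASHBYTES('SHA1', ISNULL(d.Attribute1, N'') + N'|' + ISNULL(d.Attribute2, N''))
                    THEN 'Changed'
                ELSE 'Unchanged'
            END AS RowAction
    FROM      stg.SourceRows AS s
    FULL JOIN (SELECT BusinessKey, Attribute1, Attribute2
               FROM   dbo.DimTable
               WHERE  IsCurrent = 1) AS d
           ON d.BusinessKey = s.BusinessKey;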

T-SQL will be much faster for a large number of changes, because you'll be doing a single set-based update instead of many single-row UPDATE statements, but you lose the auditing (row counts, error output, etc.) of SSIS. If you do have a lot of changes and want to use SSIS, you could send the changed rows for SCD1 updates into a staging update table, then use an Execute SQL task to do a single set-based update.
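For instance, a minimal sketch of that set-based SCD1 step, run from an Execute SQL task after the data flow has written the changed rows to staging (the staging table and column names are assumptions):

    -- One set-based statement replaces per-row OLE DB Command updates.
    UPDATE  d
    SET     d.Attribute1  = u.Attribute1,
            d.Attribute2  = u.Attribute2,
            d.UpdatedDate = GETDATE()
    FROM    dbo.DimTable    AS d
    JOIN    stg.Scd1Updates AS u
            ON u.BusinessKey = d.BusinessKey
    WHERE   d.IsCurrent = 1;

    TRUNCATE TABLE stg.Scd1Updates;   -- clear staging for the next run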