Status of v1.6

Topics: Announcements!
Coordinator
Mar 30, 2010 at 7:10 AM

DO NOT REPLY TO THIS THREAD.  This is an administrative thread to be posted in by project members only.  It is a tool to keep people who want to follow the project's progress informed, without overwhelming them with the standard CodePlex RSS feeds.  If you want to know what's going on with v1.6, subscribe to this thread.

Coordinator
Mar 30, 2010 at 7:16 AM

As I'd said in the v1.5 thread, I'd been hacking at performance a little bit as I (re)loaded history into my warehouse.  That, some conversations with some of you about the intra-day feature request, and a general feeling of well-being have led me to decide that I shouldn't release those performance changes in v1.5 and disrupt what appears to be stable.  I'll be making an alpha page for v1.6 shortly, but roughly, here are the goals for v1.6:

  • The performance changes.  (They really might not amount to much at all - I don't have any benchmarks.)
  • The intra-day load feature
  • Fixing the YouTube video - yes, I just noticed there's a huge bit of dead air in the middle.
  • Making a benchmark package.  I hope to do some presentations at some SQLSaturdays, and I therefore need some ammo :)
  • Documentation, in general.  In moving to the new format for the documentation for v1.5, I discarded some useful information that was with the v1.4 pages.

I probably won't do any other features.  The above is enough for one release.  The changes made from v1.4 to v1.5 were quite substantial, and probably too many...

As always, no ETA.

Coordinator
Jul 18, 2010 at 8:53 PM

Not an ETA update - just a little update on where things are.

I haven't tackled any large issues yet - I'm not sure if I'll try to do the intra-day loads for the next release, as that would probably hold it up too much.  For my own workloads, I'm much more interested in increasing performance further.  To that end, I'm planning to install VS2010 and use it to profile the threading behaviour, hopefully squeezing out what I imagine are some substantial inefficiencies in my locking between threads and in my understanding of the performance profile of the objects I'm interacting with.  I don't expect substantial gains there - but you never know.

I am particularly interested in using "substitute hashed business keys" to improve performance.  By that, I mean some of the techniques I know some of you are using for this, and for roll-your-own SCD processing, where you use the T-SQL HASHBYTES function (or a similar function), or the Multiple Hash component here on CodePlex, to build a smaller key for each row instead of using a large compound key.  Internally, the KSCD already does something like this - but not quite.  The net effect with the current component is that it bears the burden of calculating its hash for every row, on every execution.  If I could offload that calculation onto you - hopefully so you could persist half of the work in the dimension table - then performance ought to increase.  If you can do some non-HASHBYTES magic and still get a unique, smaller representation of your business key - similar to what Darren Gosbell did, knowing that his business key had range limits on specific parts of that key - then you should be able to use that calculation as well.
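To make the "substitute hashed business key" idea concrete, here is a minimal Python sketch. It is not the KSCD's internal code and not T-SQL's HASHBYTES; it just shows the two flavours described above: hashing a compound key down to a fixed-size digest, and packing range-limited key parts into a single integer. The function names, the separator character, and the range limits in `packed_business_key` are hypothetical, chosen only for illustration (they do not describe Darren Gosbell's actual key).

```python
import hashlib
from typing import Optional, Sequence

# Hypothetical separator, assumed never to appear inside key values, so that
# ("AB", "C") and ("A", "BC") do not produce the same input string.
SEPARATOR = "\x1f"


def hashed_business_key(parts: Sequence[Optional[object]]) -> bytes:
    """Collapse a compound business key into a fixed 16-byte MD5 digest.

    Similar in spirit to persisting a HASHBYTES('MD5', ...) column, but the
    bytes will not match SQL Server's output: the string conversion and
    concatenation rules here are this sketch's own choices. NULLs (None)
    are treated as empty strings, which is a simplification.
    """
    joined = SEPARATOR.join("" if p is None else str(p) for p in parts)
    return hashlib.md5(joined.encode("utf-8")).digest()


def packed_business_key(region: int, store: int, product: int) -> int:
    """Pack range-limited key parts into one integer without hashing.

    Assumes (hypothetically) region < 100, store < 10,000 and
    product < 1,000,000; within those limits no two keys can collide,
    and the largest result (999,999,999,999) still fits in a BIGINT.
    """
    if not (0 <= region < 100 and 0 <= store < 10_000 and 0 <= product < 1_000_000):
        raise ValueError("key part outside its assumed range")
    return (region * 10_000 + store) * 1_000_000 + product


print(hashed_business_key(["ACME", "WEST", 42]).hex())
print(packed_business_key(3, 17, 42))  # 30017000042
```

The digest is always 16 bytes no matter how many columns make up the key. Matching hashes are not a strict proof that the underlying keys match, although accidental MD5 collisions are extremely unlikely for ordinary business keys; the packed integer, by contrast, is collision-free as long as the assumed ranges hold.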

Coordinator
Aug 22, 2010 at 6:18 AM

I crossed a bunch of t's and dotted a bunch of i's - the alpha build is now available from the Downloads tab.

Now - keep in mind that it isn't feature-complete yet.  I really do want to get the checksum business key in there.  I've run one of my dimensions with checksums, and it provided a 35% boost versus a six-column business key totalling 32 characters.

I do expect a significant performance boost from this new code.  I have to download v1.5 myself so that I can establish a baseline to measure v1.6 against :) - but I did use some of VS 2010's profiling tools to help me find some places that I could tweak.  Doing so also allowed me to figure out and reason through better optimization strategies for matching rows, which I think should help performance more than the little tweaks profiling exposed.  I'll be blogging about the relative performance benefits of sorting various inputs.  It turns out that you don't have to sort both inputs to get really great numbers...
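On the row-matching point: one generic way to match two inputs without sorting either of them is a hash match, where one input is loaded into a hash table and the other probes it. The sketch below shows that general technique in Python; it is not necessarily the strategy the component uses, and the row shape and output names (new, changed, unchanged, deleted) are assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical row shape: (business_key, tuple of attribute values).
Row = Tuple[Hashable, tuple]


def match_rows(existing: Iterable[Row], incoming: Iterable[Row]) -> Dict[str, List[Hashable]]:
    """Classify incoming rows against existing dimension rows by business key.

    Neither input needs to be sorted: the existing rows go into a dict
    (a hash table) and each incoming row probes it once. Incoming business
    keys are assumed to be unique.
    """
    lookup: Dict[Hashable, tuple] = dict(existing)
    result: Dict[str, List[Hashable]] = {
        "new": [], "changed": [], "unchanged": [], "deleted": [],
    }
    for key, attrs in incoming:
        if key not in lookup:
            result["new"].append(key)
            continue
        # Remove the matched row so that only unmatched rows remain afterwards.
        current = lookup.pop(key)
        result["changed" if current != attrs else "unchanged"].append(key)
    # Anything never probed exists in the dimension but not in the source.
    result["deleted"].extend(lookup.keys())
    return result


print(match_rows(
    [(1, ("a",)), (2, ("b",)), (3, ("c",))],
    [(3, ("c",)), (1, ("A",)), (4, ("d",))],
))
```

The trade-off is memory: the existing rows must fit in the lookup table. When both inputs arrive sorted by business key, a merge-style pass can instead release rows as it goes.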

Anyway, for some of the more avid fans of the component - yes, there are some crazies out there ;) - I first want to say that you are very much appreciated for your feedback.  Second, I would like you to test v1.6 against the specific issues that you've raised.  I can only be trusted so much when it comes to fixing the problems you've exposed...

Coordinator
Oct 22, 2010 at 8:30 PM

Wow, I've fallen behind.

Good news is that I'm stress testing on my own warehouse at the moment, and things are looking good to get a real release out in the next week.

Coordinator
Nov 23, 2010 at 4:28 PM

Apologies yet again - I didn't make enough time before the "conference season" this year to get a new beta up... but I have something to offer in return.  In half an hour (short notice, I know) I'll be running through alternatives to the SCD Wizard for the BI virtual chapter of PASS.  You can attend the webcast for free - no strings attached - just go to the BI Virtual Chapter site.  It's at 12pm ET on Nov 23rd.  If you miss it - no biggie, you can download the recording (soon) either from that site, or my blog.  I'll have a link up to it as soon as I know it's posted.

And I do promise I'll get a new 1.6 up here soon...

Coordinator
Nov 27, 2010 at 12:11 AM
Edited Nov 27, 2010 at 12:12 AM

Here's proof I'm closer.  I found time to make some videos based on the sample package I've used in several presentations.  The videos walk through the simple sample data, a sample SSIS package, and complete demonstrations of how to create dimension loading logic using four techniques: the SCD Wizard, other non-Wizard SSIS components, the T-SQL MERGE command, and the KSCD.  This time the videos show a practical use of the KSCD outputs, not the impractical "demonstration only" arrangement I had in the previous v1.5 video.  I hope they help educate people on how to process dimensions using any one of these techniques, and show that none of them are particularly hard.  Visit my YouTube channel, or the front page of this site, for links.
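For anyone new to the SCD side of these videos, here is a minimal in-memory Python sketch of what dimension loading logic has to do for one incoming row: insert new members, overwrite Type 1 attributes, and expire and re-insert for Type 2 attributes. It is not taken from the videos or the sample package; the column names (`name` as a Type 1 attribute, `city` as a Type 2 attribute, `valid_from`/`valid_to`) and the `apply_change` helper are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    surrogate_key: int
    business_key: str
    name: str                 # hypothetical Type 1 attribute: overwritten
    city: str                 # hypothetical Type 2 attribute: keeps history
    valid_from: date
    valid_to: Optional[date]  # None means "this is the current version"


def apply_change(dim: List[DimRow], business_key: str, name: str,
                 city: str, as_of: date) -> List[DimRow]:
    """Return a new dimension list with one source row applied."""
    current = next((r for r in dim
                    if r.business_key == business_key and r.valid_to is None), None)
    next_sk = max((r.surrogate_key for r in dim), default=0) + 1
    if current is None:
        # New member: insert it as the current version.
        return dim + [DimRow(next_sk, business_key, name, city, as_of, None)]
    out = list(dim)
    if current.name != name:
        # Type 1: overwrite on every version of this member (one common convention).
        out = [replace(r, name=name) if r.business_key == business_key else r
               for r in out]
    if current.city != city:
        # Type 2: expire the current version, then add a new current version.
        out = [replace(r, valid_to=as_of)
               if r.surrogate_key == current.surrogate_key else r for r in out]
        out.append(DimRow(next_sk, business_key, name, city, as_of, None))
    return out


dim = [DimRow(1, "C1", "Ann", "Oslo", date(2010, 1, 1), None)]
dim = apply_change(dim, "C1", "Anne", "Oslo", date(2010, 6, 1))    # Type 1 change
dim = apply_change(dim, "C1", "Anne", "Bergen", date(2010, 7, 1))  # Type 2 change
for row in dim:
    print(row)
```

Real implementations work against the dimension table in bulk rather than one row at a time over a Python list; the sketch only illustrates the per-row rules.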

I plan to post a "pros and cons" conclusion video to cap off that series, as well as PUBLISH THE DAMN THING...

Coordinator
Jan 17, 2011 at 6:45 AM
Edited Jan 17, 2011 at 6:47 AM

Sick and tired of waiting?  Me too - so I'm pushing out the beta.  Sorry if some of the workitems haven't been ticked off.  If you're waiting for v1.6, please download the beta and give it a try.  Post up any problems so I can smooth them out before I switch it to release!

And if you haven't noticed - I've got a series of videos up on the main page that do a better job (I think) of describing both how to use the Kimball SCD and how to use three other standard methods for processing SCD changes to dimensions using SSIS.  Hopefully, those will help educate people on what they should be doing with SCDs in general, as well as how to get the Kimball SCD working.

Coordinator
Apr 8, 2011 at 6:49 PM

WOOHOO.  A few bits of news to report...

FIRST - not so great news.  The component you love has been stripped of its name... but you've probably noticed it's been reborn as the Dimension Merge SCD.  Long live the DMSCD!

SECOND - and I know that anybody reading this (and especially those subscribed to the RSS feed or email notifications) has been waiting FOREVER... v1.6 has just been RTW'd!  Yes, you can now go and download the v1.6 goodness.  Take advantage of performance improvements (thread concurrency, memory reductions, code streamlining) as well as intra-day load capabilities.

THIRD - the good news keeps coming... Pragmatic Works is prepping their next version of Task Factory to include the DMSCD!  That's right.  A hearty thank you has to go out from me to all of you who have found some value in this project.  Your support - simply measured in downloads, reviews, and word of mouth - has resulted in one of the most-often requested "non-features" for this project... a support option.  If you've tried the component and liked it, but couldn't get your organization to risk using unsupported open-source software, I strongly encourage you to go visit Pragmatic Works.  Brian Knight, whom many of you will know as an accomplished author, community champion, and MVP, has been very supportive of this project - and I can't thank him enough.  I'm VERY happy that Pragmatic Works is going to provide support for this component, and perhaps add some features along the way.

A final note for today - you may also notice that I've bundled a ZIP of the source code on the release page.  The reason is that I almost burst a blood vessel dealing with SVN/TFS/whatever pile of junk they run on CodePlex.  I'm sure the "source code" stored in CodePlex is a mess... but honestly, I can't check... because in trying to sort out what's going on, I can't find a way to download the complete source.  I can only download one change-set at a time.  WTF?  Anyway, now I know why so many other projects here have the source as a "release" download.

Coordinator
Apr 12, 2011 at 5:20 PM

I just refreshed the v1.6 release - the upgrade from v1.5 had a minor problem with it!  Everything should work as expected now.  I also failed to include some of the source in the Source zip on the downloads page - all the source is there now.

And if anyone knows how to unmangle TFS/SVN... call me.