How to work with partial Source sets

Topics: Questions About Best Practices
Nov 9, 2010 at 11:43 PM
Edited Nov 9, 2010 at 11:44 PM

Hi guys, my source data I am retrieving as any records that have been added/modified/deleted since I last uploaded. But of course this appears to cause the Kimball component some trouble as it then believes that loads of records have been deleted.

I have now rigged up a load of Multicast, Full Outerjoin, Conditional split work before the kimball component is fired (to ensure that the kimball component is only comparing against dimension rows that relate to those source rows that are presented - but can I just check - does it need to be this complex or am I overlooking something? I would have thought that this pattern (only receiving source rows) was not particularly unusual so wondered if there was a good and established pattern here?

Of course I could ignore the deletes but then my audit information would be incorrect (it would say deletes when none happened) and also I would then need a seperate "real" delete path anyway which seems counter intuitive - however maybe this is simpler overall.

Is this a feature waiting to be specified? :)

Nov 25, 2010 at 7:12 PM

Whenever you're using "CDC" type information in the Source input, you're going to have some problems with "deletes" according to the component.  It doesn't know those rows are intentionally "missing".

If I understand your situation properly, what I'd suggest is removing most of your pre-processing and place a single Conditional Split between your Source and the KSCD.  In that Conditional Split, send the "deleted" rows off to have their expiry date set, and appropriate audit information set.  Send all other rows to the KSCD, and simply ignore the "deleted" output of the KSCD.