How the Kimball SCD Task handles the Eff and Exp Dates

Feb 26, 2010 at 3:30 AM

How the Kimball SCD Task handles the Effective and Expiry Dates while populating a Dimension.

Feb 26, 2010 at 3:42 AM

I am facing an issue while using Kimball SCD dimension task for the dimension population:

Table structure used:

--This is a source table for dimension

CREATE TABLE [dbo].[src_sid](

[id] [int] NULL,

[a] [varchar](2) NULL,

[efdate] [datetime] NULL

) ON [PRIMARY]

GO

INSERT [dbo].[src_sid] ([id], [a], [efdate]) VALUES (1, N'A', GETDATE())

GO

--This is a dimension table structure

CREATE TABLE [dbo].[dim_sid](

[skid] [int] IDENTITY(1,1) NOT NULL,

[id] [int] NULL,

[a] [varchar](2) NULL,

[efdate] [datetime] NULL,

[expdate] [datetime] NULL,

[output] [nvarchar](50) NULL

) ON [PRIMARY]

GO

Selected the following values for the column set:

skid - surrogate key

id - business key

a - SCD 2

efdate - SCD2 Effective Date

expdate - SCD2 Expiration Date

record changes by marking - Old end yesterday and new starts today functionality

first record effective date - MS SQL date min or max

last record effective date - MS SQL date min or max

So , after selecting these values when I run the package for the first time it makes an entry as below:

Step 1)

select * from dim_sid

SKID ID a efdate expdate output

1 1 A 1753-01-01 00:00:00.000 9999-12-31 00:00:00.000 New

Step 2) Run the script -- update src_sid set a = 'B' where id = 1

Running the package again gives the below record set

select * from dim_sid

SKID ID a efdate expdate output

1 1 A 1753-01-01 00:00:00.000 9999-12-31 00:00:00.000 New

2 1 b 2010-02-20 00:00:00.000 9999-12-31 00:00:00.000 New SCD2

Issue here : The data should be like this:

SKID ID a efdate expdate output

1 1 A 1753-01-01 00:00:00.000 2010-02-19 00:00:00.000 New

2 1 B 2010-02-20 00:00:00.000 NULL New SCD2

step 3) Run the script -- update src_sid set a = 'C' where id = 1

Running the Package again gives the below record set:

SKID ID a efdate expdate output

1 1 A 1753-01-01 00:00:00.000 9999-12-31 00:00:00.000 New

2 1 B 2010-02-20 00:00:00.000 9999-12-31 00:00:00.000 New SCD2

3 1 B 2010-02-22 00:00:00.000 9999-12-31 00:00:00.000 Invalid

4 1 C 2010-02-22 00:00:00.000 9999-12-31 00:00:00.000 New SCD2

But the output should be like:

SKID ID a efdate expdate output

1 1 A 1753-01-01 00:00:00.000 2010-02-19 00:00:00.000 New

2 1 b 2010-02-20 00:00:00.000 2010-02-21 00:00:00.000 New SCD2

3 1 c 2010-02-22 00:00:00.000 NULL New SCD2

Please suggest how can i get the data like this by using this scd task, am I missing something.

I don't want to miss the opportunity to use this task as it is very much fast and optimized.

Feb 26, 2010 at 4:49 AM

Hello Guys,

I have resolved the eff and exp date issue. What is did is that, I added a oledb command task next to "SCD" Expiry derived column output task where i put the script " update dim_sid set expdate = getdate() and output = 'Expired SCD' where skid = ?". Thats the beauty of KSCD.  But still on the next update run of the task its sending the output to invalid output task sometimes and sometimes not. Let me check how i am sorting the business keys before passing to the KSCD Task. Please let me know my mistake before i solve it.

Thanks,

Sid 

Feb 26, 2010 at 5:46 AM

Hello Guys,

I am back to you again with the the solution of my own problem. So when i update a value for the second time for the scd2 column then on the execution of the package, it inserts two rows 1st as scd new output and 2nd for invalid input which is not all the right way. The reason for invalid input was Duplicate Business Key In Existing Dimension, so when i checked the sort task for the dimension table before passing the values to the KSCD task I had selected business key and effdate column for sorting and here the issue was which i missed. I sorted the dimension data only by business key in this case an id column which solves this issue.

Thanks a lot guys.

Regards,

Sid

Coordinator
Feb 26, 2010 at 6:40 PM

Hey Sid,

I think that the root of the problem is that you have an "effdate" on your source system which is probably getting linked to the "effdate" on the dimension.  Let me explain...

The columns marked as "SCD2 Effective Date", "SCD2 Expiry Date", and "SCD2 Current Record" on the Existing Dimension should NOT pair up with columns on the source system.  Why?  Because they're housekeeping columns intended to manage the lifetime of the "version" of that record within the warehouse.  As such, the source data flow shouldn't be dictating, suggesting, or setting any of those properties.  Those properties should be determined solely by the loading and processing of the dimension.

However, I think that the problem is that my component isn't actively preventing you from (unintentionally) linking up any of those columns.  I haven't read through all of your sample runs - mostly because you shouldn't be providing an effective date anyway - but I'd wager that you'll see that the component is (or should be) specifically ignoring most of the values you're placing in that effective date column.

So - the real question (for me) is - why are you providing an "effdate" column on the Source flow?

Mar 2, 2010 at 10:07 AM

Hi Todd,

eff date column in present in the source as record eff dat, however here in this task i am not mapping source eff date column to dim eff date column and also if we try to map them keeping the eff date in dim table as SCD Eff date , it will not allow/ will not be in effect to perform like this way.

Thanks and regards,

Sid

Coordinator
Mar 6, 2010 at 5:41 AM

The component must be using the eff date source column improperly - otherwise it should be complaining about not having all columns mapped properly.  At least I think it should.  I'll have to try it out.

My question stands though - what is the "eff date" on the source flow for?  Remove it...