Skip to content

Instantly share code, notes, and snippets.

@danieljarolim
Last active January 7, 2022 14:23
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save danieljarolim/51e4b536cc7230e2ef6d to your computer and use it in GitHub Desktop.
Save danieljarolim/51e4b536cc7230e2ef6d to your computer and use it in GitHub Desktop.
SSIS Range Lookup

SSIS Range Lookup

Range lookups are used for slowly changing dimensions where there is a need to attach a dimension to a fact based on a key and a date range. There is no stock component to do this but there are three ways to carry out range lookups. The first is a merge join followed by a conditional split pattern. Second is a third party component, and last is a script component.

Merge Join Conditional Split Pattern

This requires both the facts and the joining dimension to be sorted by the dimension's natural key. For large fact tables the sort component is restriceted by available memory. This can be easily overcome by using a sort in the source query and marking the source component as sorted by that column but it means only a single SCD can be used in a data flow. It's also slow. The mertge join multiplies the number of fact rows by the number of SCD entires for each key, and gets reduced again by the following conditional split which drops entries that don't fit in the required date range. It means each natural key in the dimension must have records covering the entire date range or fact records will be lost. (How do you tell the conditional split that you haven't matched any of the dante ranges for the key in the SCD?)

Script Component

The downside is that it is slower to setup and needs fiddly editing but is fast to execute. The upsidee is that it is much faster than using a merge join with none of its limitations.

  1. Using the followng template to create a script transform component.
  2. Add an ADO conection manager to the script with the default Connection name.
  3. Select the Input columns containing the Key to join and the date range.
  4. Add output columns with the correct types to be added to the data flow from the lookup.
  5. Edit the script.
  6. Update the LookupClass class to change the Key to the correct type. Change the output columns to the corect names and types.
  7. Update the query for the lookup. The columns are refenced 0 to n-1.
  8. Update the assignement of the class from the query to correctly populate the key, dates, and outputs.
  9. Update the ProcessInputRow method to populate the new output columns from the Lookup class.

By default the compoent assignes null to all rows with no matching lookup. It acts the same as a stock lookup component with "ignore failed lookups" selected.

Note: The code assumes we're using Database Dates (DT_DBATE) where each dimension record goes from StartDate to EndDate. If using Kimbal Method DateTime for StartDate and EndDate where each record is flagged StartDate to EndDate but goes from StartDate to almost EndDate ie to 23:59:997 of day prior to EndDate, then change the check in ProcessInputRow from "Row.InteractionDate <= LookupRecord.EndDate" to "Row.InteractionDate < LookupRecord.EndDate"

/* Microsoft SQL Server Integration Services Script Component
*  Write scripts using Microsoft Visual C# 2008.
*  ScriptMain is the entry point class of the script.*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Data.SqlClient; // Required to read lookup values
using System.Collections.Generic; // For Dictvand List 

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

    // Class to hold key with date range, and output values.
    private class LookupClass
    {
        public String Key;
        public DateTime StartDate, EndDate;
        public Int32 OutputDimensionKey;  // Add more, one for each output column
    }

    // Dictionary to lookup based on natural key, holds list of date ranges
    private Dictionary<String, List<LookupClass>> Lookup;

    public override void PreExecute()
    {
        base.PreExecute();

        // Create Lookup Dictionary
        Lookup = new Dictionary<String, List<LookupClass>>();
        
        // Connect to ADO Connection to get lookup values
        IDTSConnectionManager100 LookupCM = Connections.Connection;
        SqlConnection LookupConnection = (SqlConnection)LookupCM.AcquireConnection(null);

        // Populate Lookup Dictionary
        SqlCommand LookupQuery = new SqlCommand("SELECT  OutputDimensionKey, RecordStartDate, RecordEndDate, NaturalKey FROM Dim_SomeDim", LookupConnection);
        SqlDataReader LookupQueryResult = LookupQuery.ExecuteReader();
        while (LookupQueryResult.Read())
        {
            // Create and Populate new Lookup Record
            LookupClass LookupRecord = new LookupClass();
            LookupRecord.Key = LookupQueryResult[3].ToString();
            LookupRecord.StartDate = (DateTime)LookupQueryResult[1];
            LookupRecord.EndDate = (DateTime)LookupQueryResult[2];
            LookupRecord.OutputDimensionKey = (Int32)LookupQueryResult[0];
            
            // Create new key with list of records (one for each date range)
            if (!Lookup.ContainsKey(LookupRecord.Key))
                Lookup.Add(LookupRecord.Key, new List<LookupClass>());

            // Add Record with date range to list for a key
            Lookup[LookupRecord.Key].Add(LookupRecord);
        }
        LookupQueryResult.Close();
        
    }

    public override void PostExecute()
    {
        base.PostExecute();
        /*
          Add your code here for postprocessing or remove if not needed
          You can set read/write variables here, for example:
          Variables.MyIntVar = 100
        */
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Output default null
        Row.OutputDimensionKey_IsNull = true;

        if (!Row.NaturalKey_IsNull && Lookup.ContainsKey(Row.NaturalKey))
        {
            foreach (LookupClass LookupRecord in Lookup[Row.NaturalKey])
            {
                if (Row.InteractionDate >= LookupRecord.StartDate && Row.InteractionDate <= LookupRecord.EndDate)
                {
                    Row.OutputDimensionKey = LookupRecord.OutputDimensionKey;
                    break;
                }
            }
        }
    }
}
@tajmahalz
Copy link

tajmahalz commented Jun 1, 2016

[Update] I was use OLEDB Type of connection, that made this error
I have got an runtime error,

> Unable to cast COM object of type 'System.__ComObject' to class type 'System.Data.SqlClient.SqlConnection'. Instances of types that represent COM components cannot be cast to types that do not represent COM components; however they can be cast to interfaces as long as the underlying COM component supports QueryInterface calls for the IID of the interface.
> 
> at ScriptMain.PreExecute()
> at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PreExecute()

@qunsong
Copy link

qunsong commented Jun 2, 2016

tajmahalz,
I have encountered the same issue, and wonder if you have found a solution?

Thanks!
Jennifer

@danieljarolim
Copy link
Author

This uses an ADO connection manager. You'll need to change the script to work with an OLEDB connection. It's possible but it's messier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment