Skip to content

Instantly share code, notes, and snippets.

@danieljarolim
Last active December 19, 2021 13:42
Show Gist options
  • Save danieljarolim/e89ff5b41b12383c60c7 to your computer and use it in GitHub Desktop.
Save danieljarolim/e89ff5b41b12383c60c7 to your computer and use it in GitHub Desktop.
SHA1 SSIS Script Component
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Microsoft.SqlServer.Dts.Pipeline;
using System.Security.Cryptography;
using System.Text;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
private int ColumnCount;
private ColumnClass[] ColumnNameArray;
private PipelineBuffer inputBuffer;
private SHA1CryptoServiceProvider SHA1Hash;
private class ColumnClass
{
public int Index;
public string Name;
}
public override void PreExecute()
{
base.PreExecute();
// Initialise the SHA1 Crypto Provider
SHA1Hash = new SHA1CryptoServiceProvider();
//SHA1Hash.Initialize();
ColumnCount = ComponentMetaData.InputCollection[0].InputColumnCollection.Count;
ColumnNameArray = new ColumnClass[ColumnCount];
int[] ColumnIndexes = GetColumnIndexes(ComponentMetaData.InputCollection[0].ID); // Same as InputID in ProcessInput
int columnIndex = 0;
foreach (IDTSInputColumn100 item in ComponentMetaData.InputCollection[0].InputColumnCollection)
{
ColumnNameArray[columnIndex] = new ColumnClass();
ColumnNameArray[columnIndex].Name = Convert.ToString(item.Name);
ColumnNameArray[columnIndex].Index = ColumnIndexes[ComponentMetaData.InputCollection[0].InputColumnCollection.GetObjectIndexByID(item.ID)];
columnIndex++;
}
Array.Sort(
ColumnNameArray,
delegate(ColumnClass column1, ColumnClass column2)
{ return column1.Name.CompareTo(column2.Name); }
);
}
public override void PostExecute()
{
base.PostExecute();
/*
Add your code here for postprocessing or remove if not needed
You can set read/write variables here, for example:
Variables.MyIntVar = 100
*/
}
public override void ProcessInput(int InputID, Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer Buffer)
{
// We need access to the PipelineBuffer which isn't exposed in ProcessInputRow
inputBuffer = Buffer;
base.ProcessInput(InputID, Buffer);
}
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
StringBuilder HashString = new StringBuilder(10000); -- Change to be larger than max size of concatenated columns
for (int columnIndex = 0; columnIndex < ColumnCount; columnIndex++)
{
String ColumnString;
if (inputBuffer[ColumnNameArray[columnIndex].Index] != null)
ColumnString = Convert.ToString(inputBuffer[ColumnNameArray[columnIndex].Index]);
else
ColumnString = "NULL";
HashString.Append(ColumnString);
HashString.Append("|");
}
Row.SHA1 = SHA1Hash.ComputeHash(Encoding.Unicode.GetBytes(HashString.ToString()));
}
}

SHA1 SSIS Script Component

WARNING: Do not copy a script component within a package in VS2015. It will not update the Assembly name when copied so when executing only the first script will be executed for all copies made and result in errors or bad output. Assemly names get updated when copying between backages so ss a workaround copy the script into a different package then copy back.

When comparing entire rows of a wide table with many columns, using the derived column component or conditional split component becomes complicated because of the 4000 character limit. Changes to metadata in the data flow mean lots of work updating complex comparison expressions. It’s easier to calculate a hash value of the fields and compare the hash value in the conditional split.

There are a few ways to get the hash.

  • Using the HASH() function within an SQL query but that’s not possible for columns changed by the data flow.
  • The SSIS Multiple Hash component from http://ssismhash.codeplex.com/ which I’ve found a little slow to setup because of the way input fields are selected.
  • This script. I’ve found it’s just as fast as the SSIS Multiple Hash custom component.

To use the script: Drag in a script component (default C#) and add a single (DT_BYTES,40) output called SHA1, edit the script and paste in the script.

It’s fast to use when creating data flows if you copy and paste the component to create new instances rather than creating from scratch each time. Then select the input rows to hash. Hit edit to open the script and close it again to re-save.

There are caveats to using the script.

  • The fields are sorted in alpha order based on the data flow field names and concatenated before SHA1 is calculated on the whole. This means changes to field names could change sort order and change the hash value. It also means fields on both sides of the indented compare should be named the same or named in a way that results in the same sort order.
  • Because of the method used to sort the fields, anytime columns within the data flow change, the script needs to be opened and closed so SSIS can update the field names and field indexes used for sorting by the script.
@HartCO
Copy link

HartCO commented Apr 7, 2017

This seems super helpful to get away from manually concatenating all the strings, but getting one error on the output assignment:
Row.SHA1 = SHA1Hash.ComputeHash(Encoding.Unicode.GetBytes(HashString.ToString()));
Throws: Cannot implicitly convert type 'byte[]' to 'string'.
Any idea on that?
Thanks much!

Edit: I think this might make sense, but I'm a C# rookie so not confident:
Row.SHA1 = Encoding.Unicode.GetString(SHA1Hash.ComputeHash(Encoding.Unicode.GetBytes(HashString.ToString())));
Edit2: The above returned some gibberish, which seems valid enough but not quite what I was expecting to see, Convert.ToBase64String seems more in line with what I'd expect.

@trumpet7347
Copy link

This is amazing work, thank you so much!!

I also encountered the conversion error, I corrected it by changing the data type of the output on the script component to DT_BYTES instead of DT_WSTR, and this fixed the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment