WARNING: Do not copy a script component within a package in VS2015. It will not update the Assembly name when copied so when executing only the first script will be executed for all copies made and result in errors or bad output. Assemly names get updated when copying between backages so ss a workaround copy the script into a different package then copy back.
When comparing entire rows of a wide table with many columns, using the derived column component or conditional split component becomes complicated because of the 4000 character limit. Changes to metadata in the data flow mean lots of work updating complex comparison expressions. It’s easier to calculate a hash value of the fields and compare the hash value in the conditional split.
There are a few ways to get the hash.
- Using the HASH() function within an SQL query but that’s not possible for columns changed by the data flow.
- The SSIS Multiple Hash component from http://ssismhash.codeplex.com/ which I’ve found a little slow to setup because of the way input fields are selected.
- This script. I’ve found it’s just as fast as the SSIS Multiple Hash custom component.
To use the script: Drag in a script component (default C#) and add a single (DT_BYTES,40) output called SHA1, edit the script and paste in the script.
It’s fast to use when creating data flows if you copy and paste the component to create new instances rather than creating from scratch each time. Then select the input rows to hash. Hit edit to open the script and close it again to re-save.
There are caveats to using the script.
- The fields are sorted in alpha order based on the data flow field names and concatenated before SHA1 is calculated on the whole. This means changes to field names could change sort order and change the hash value. It also means fields on both sides of the indented compare should be named the same or named in a way that results in the same sort order.
- Because of the method used to sort the fields, anytime columns within the data flow change, the script needs to be opened and closed so SSIS can update the field names and field indexes used for sorting by the script.
This is amazing work, thank you so much!!
I also encountered the conversion error, I corrected it by changing the data type of the output on the script component to DT_BYTES instead of DT_WSTR, and this fixed the issue.