Last active: March 20, 2024 04:05
GAS (Google Apps Script) custom function that returns the MD5 hash, or a 4-digit shortened hash, of a string. Works in multibyte (UTF-8, e.g. 2-byte character) environments.
/**
 * ------------------------------------------
 *   MD5 function for GAS (Google Apps Script)
 *
 * Returns the MD5 hash of a string, or optionally a 4-digit (base-36) short hash.
 * ------------------------------------------
 * Usage1:
 *   `=MD5("YourStringToHash")`
 *     or
 *   `=MD5( A1 )`
 *   to use the A1 cell value as the argument of MD5.
 *
 *   result:
 *     `FCE7453B7462D9DE0C56AFCCFB756193`
 *
 *     To double-check, you can verify it locally in your terminal as below.
 *     `$ md5 -s "YourStringToHash"`
 *
 * Usage2:
 *   `=MD5("YourStringToHash", true)` for the short hash
 *
 *   result:
 *     `6MQH`
 *     Note that the short hash has a much higher collision probability.
 *
 * Note on letter case:
 *   The function returns lowercase by default; see the comment above
 *   the return statement to get uppercase output as shown in the results.
 *
 * How to install:
 *   Copy the script and paste it at [Extensions]-[Apps Script]-[Editor]-[<YourProject>.gs]
 *   or go to https://script.google.com and paste it.
 *   For more details, see:
 *     https://developers.google.com/apps-script/articles/
 *
 * License: WTFPL (But mentioning the URL to the latest version is recommended)
 *
 * Version: 1.1.0.2022-11-24
 * Latest version:
 *   https://gist.github.com/KEINOS/78cc23f37e55e848905fc4224483763d
 *
 * Author/Collaborator/Contributor:
 *   KEINOS @ https://github.com/keinos
 *   Alex Ivanov @ https://github.com/contributorpw
 *   Curtis Doty @ https://github.com/dotysan
 *   Haruo Nakayama @ https://github.com/harupong
 *
 * References and thanks to:
 *   https://stackoverflow.com/questions/7994410/hash-of-a-cell-text-in-google-spreadsheet
 *   https://gist.github.com/KEINOS/78cc23f37e55e848905fc4224483763d#gistcomment-3129967
 *   https://gist.github.com/dotysan/36b99217fdc958465b62f84f66903f07
 *   https://developers.google.com/apps-script/reference/utilities/utilities#computedigestalgorithm-value
 *   https://cloud.google.com/dataprep/docs/html/Logical-Operators_57344671
 *   https://gist.github.com/KEINOS/78cc23f37e55e848905fc4224483763d#gistcomment-3441818
 * ------------------------------------------
 *
 * @param {(string|Bytes[])} input The value to hash.
 * @param {boolean} isShortMode Set true for the 4-digit shortened hash, else returns the usual MD5 hash.
 * @return {string} The hashed input value.
 * @customfunction
 */
function MD5( input, isShortMode )
{
    isShortMode = !!isShortMode; // Coerce to boolean (undefined becomes false)
    var txtHash = '';
    var hashVal;
    var i, j; // Declared locally so the loop counters do not leak as globals
    // computeDigest returns an array of signed bytes (-128 to 127)
    var rawHash = Utilities.computeDigest(
        Utilities.DigestAlgorithm.MD5,
        input,
        Utilities.Charset.UTF_8 // Multibyte encoding env compatibility
    );
    if ( ! isShortMode ) {
        for ( i = 0; i < rawHash.length; i++ ) {
            hashVal = rawHash[i];
            if ( hashVal < 0 ) {
                hashVal += 256; // Map the signed byte to 0-255
            }
            if ( hashVal.toString( 16 ).length === 1 ) {
                txtHash += '0'; // Zero-pad to two hex digits
            }
            txtHash += hashVal.toString( 16 );
        }
    } else {
        // Fold each 8-byte half of the digest into two base-36 digits
        for ( j = 0; j < 16; j += 8 ) {
            hashVal = ( rawHash[j]   + rawHash[j+1] + rawHash[j+2] + rawHash[j+3] )
                    ^ ( rawHash[j+4] + rawHash[j+5] + rawHash[j+6] + rawHash[j+7] );
            if ( hashVal < 0 ) {
                hashVal += 1024; // Shift negative results into 512-1023
            }
            if ( hashVal.toString( 36 ).length === 1 ) {
                txtHash += '0'; // Zero-pad to two base-36 digits
            }
            txtHash += hashVal.toString( 36 );
        }
    }
    // change below to "txtHash.toUpperCase()" if needed
    return txtHash;
}
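
If you would rather cross-check both modes outside of Apps Script, the following is a minimal Node.js sketch. It is not part of the gist: the names `md5Hex` and `md5Short` are made up for illustration, and it assumes the input is hashed as UTF-8 text, as in the function above. Node returns unsigned digest bytes, so the short mode first converts them to the signed range that the Apps Script code works with.

```javascript
// Node.js cross-check (runs outside Apps Script). Function names are illustrative only.
const crypto = require('crypto');

// Full MD5 as lowercase hex, hashing the text as UTF-8.
function md5Hex(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Re-creates the short mode: fold each 8-byte half of the digest into two base-36 digits.
function md5Short(text) {
  const digest = crypto.createHash('md5').update(text, 'utf8').digest();
  // Node yields unsigned bytes (0..255); convert to signed (-128..127) to match Apps Script.
  const signed = Array.from(digest, (b) => (b > 127 ? b - 256 : b));
  let out = '';
  for (let j = 0; j < 16; j += 8) {
    let v = (signed[j] + signed[j + 1] + signed[j + 2] + signed[j + 3])
          ^ (signed[j + 4] + signed[j + 5] + signed[j + 6] + signed[j + 7]);
    if (v < 0) v += 1024;                   // same offset as the Apps Script version
    out += v.toString(36).padStart(2, '0'); // zero-pad to two base-36 digits
  }
  return out;
}

console.log(md5Hex('YourStringToHash'));
console.log(md5Short('YourStringToHash'));
```

Both helpers return lowercase; compare case-insensitively against the spreadsheet result, or call `toUpperCase()` on either side.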
Finally -- is getBytes an expensive operation? It seems to take a long time for some (large) files. For example, my script called getBytes on a large Google Sheets file; that took 16 seconds and produced an array of something like 50,000 rows. Then the MD5 function timed out on it.
(edit) As a workaround for the GAS files, I found I can get the file's MIME type and if it's "application/vnd.google-apps.script" then I just skip processing it for now.
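
A rough sketch of that MIME-type check, for anyone who wants to try it. The helper name `hashFileOrSkip` and the `SKIPPED_MIME_TYPES` list are made up for this example; `file` is assumed to be a Drive `File`-like object exposing `getMimeType()` and `getBlob()`, and `hashFn` is whichever hashing function you pass in. Not tested against large files.

```javascript
// Hypothetical helper: hash a file's bytes unless its MIME type is on the skip list.
// `hashFn` is supplied by the caller (for example, the MD5 function from this gist).
var SKIPPED_MIME_TYPES = [ 'application/vnd.google-apps.script' ];

function hashFileOrSkip( file, hashFn ) {
  if ( SKIPPED_MIME_TYPES.indexOf( file.getMimeType() ) !== -1 ) {
    return null; // Skipped: the caller gets "no hash" instead of a timeout
  }
  return hashFn( file.getBlob().getBytes() );
}
```

In a script this might be called as `hashFileOrSkip( DriveApp.getFileById( id ), MD5 )`, where `id` is a placeholder for your own file ID. A `null` result means the file was skipped, so the caller can log it and move on.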
Oops, you've already tried it, sorry.
As you mentioned, I assume getBytes is expensive or has some kind of limitation that stretches the response time and causes the timeout.
I wonder why they don't implement a hash function on the blob class, since it would be useful for de-duplicating files for machine learning.
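
For the de-dup idea, once each file has a hash, grouping by it is straightforward. A small sketch, assuming you have already built an array of `{ name, hash }` records yourself (that record shape and the name `findDuplicates` are made up for this example):

```javascript
// Hypothetical sketch: `entries` is an array of { name, hash } records built by the caller.
function findDuplicates( entries ) {
  var byHash = new Map();
  entries.forEach( function ( entry ) {
    if ( ! byHash.has( entry.hash ) ) {
      byHash.set( entry.hash, [] );
    }
    byHash.get( entry.hash ).push( entry.name );
  } );
  // Keep only the hashes shared by two or more files
  var duplicates = [];
  byHash.forEach( function ( names, hash ) {
    if ( names.length > 1 ) {
      duplicates.push( { hash: hash, names: names } );
    }
  } );
  return duplicates;
}
```

Use the full MD5 hash here rather than the short mode, since the short hash collides far too often for this purpose. Matching hashes are a strong hint rather than proof that two files are identical.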
@knwpsk
Got it! Somewhat like CAS, isn't it?
How about using the getBytes() method of the blob object returned by getBlob()? (not tested though)