@barbietunnie
Created March 12, 2016 07:24
Sanitize a string for use as a filename
/**
* Extracted from node-sanitize-filename (https://github.com/parshap/node-sanitize-filename/blob/master/index.js)
*
* Replaces characters in strings that are illegal/unsafe for filenames.
* Unsafe characters are either removed or replaced by a substitute set
* in the optional `options` object.
*
* Illegal Characters on Various Operating Systems
* / ? < > \ : * | "
* https://kb.acronis.com/content/39790
*
* Unicode Control codes
* C0 0x00-0x1f & C1 (0x80-0x9f)
* http://en.wikipedia.org/wiki/C0_and_C1_control_codes
*
* Reserved filenames on Unix-based systems (".", "..")
* Reserved filenames in Windows ("CON", "PRN", "AUX", "NUL", "COM1",
* "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
* "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", and
* "LPT9") case-insensitively and with or without filename extensions.
*
* Capped at 255 bytes in length (UTF-8 encoded).
* http://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs
*
* @param {String} input Original filename
* @param {Object} options {replacement: String}
* @return {String} Sanitized filename
*/
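var truncate = require("truncate-utf8-bytes");

// Characters that are illegal in filenames on at least one major OS
var illegalRe = /[\/\?<>\\:\*\|"]/g;
// C0 (0x00-0x1f) and C1 (0x80-0x9f) control codes
var controlRe = /[\x00-\x1f\x80-\x9f]/g;
// Names made up only of dots, such as "." and ".."
var reservedRe = /^\.+$/;
// Windows device names, with or without an extension
var windowsReservedRe = /^(con|prn|aux|nul|com[0-9]|lpt[0-9])(\..*)?$/i;

function sanitize(input, replacement) {
  var sanitized = input
    .replace(illegalRe, replacement)
    .replace(controlRe, replacement)
    .replace(reservedRe, replacement)
    .replace(windowsReservedRe, replacement);
  // Cap the result at 255 bytes of UTF-8
  return truncate(sanitized, 255);
}

module.exports = function (input, options) {
  var replacement = (options && options.replacement) || '';
  var output = sanitize(input, replacement);
  if (replacement === '') {
    return output;
  }
  // The replacement string may itself produce an unsafe name,
  // so run a second pass that strips instead of substituting
  return sanitize(output, '');
};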
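
To see what each pattern catches, here is a minimal sketch that repeats the four regular expressions so it runs on its own. The helper name `stripUnsafe` is hypothetical and not part of the gist, and the sketch deliberately skips the byte-truncation step so it has no npm dependency.

```javascript
// Same patterns as in the gist, repeated so this snippet is self-contained.
const illegalRe = /[\/\?<>\\:\*\|"]/g;
const controlRe = /[\x00-\x1f\x80-\x9f]/g;
const reservedRe = /^\.+$/;
const windowsReservedRe = /^(con|prn|aux|nul|com[0-9]|lpt[0-9])(\..*)?$/i;

// Hypothetical helper: applies the four replacements only (no truncation).
function stripUnsafe(name, replacement = "") {
  return name
    .replace(illegalRe, replacement)
    .replace(controlRe, replacement)
    .replace(reservedRe, replacement)
    .replace(windowsReservedRe, replacement);
}

console.log(stripUnsafe("a/b:c?.txt"));         // "abc.txt"
console.log(stripUnsafe(".."));                 // ""
console.log(stripUnsafe("CON.txt"));            // ""
console.log(stripUnsafe("report<1>.pdf", "_")); // "report_1_.pdf"
```

Note that a name consisting only of reserved content collapses to an empty string, so callers probably want a fallback name for that case.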
@akshatsethi2

@Alynva using sanitized.split("").splice(0, 255).join("") is not a good idea. Complex symbols like emojis are often made up of more than one UTF-16 code unit, so splitting a string that contains a single emoji can return an array of 2 elements, and the cut can then land between the two halves.
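
A minimal sketch of the problem, using hypothetical helper names (`naiveTruncate`, `codePointTruncate`) that are not part of the gist. It compares splitting on UTF-16 code units with splitting on code points via `Array.from`.

```javascript
// Splits on UTF-16 code units, so a surrogate pair can be cut in half.
function naiveTruncate(str, max) {
  return str.split("").splice(0, max).join("");
}

// Splits on Unicode code points, so a surrogate pair stays together.
function codePointTruncate(str, max) {
  return Array.from(str).slice(0, max).join("");
}

const s = "a😀"; // "a" is 1 code unit, "😀" (U+1F600) is 2 code units

console.log(s.split("").length);       // 3
console.log(Array.from(s).length);     // 2
console.log(naiveTruncate(s, 2) === "a\ud83d");   // true: ends in a lone surrogate
console.log(codePointTruncate(s, 2) === "a😀");   // true
```

Counting code points still does not bound the size in bytes, and emoji built from several code points (for example ZWJ sequences) can still be split, so this only illustrates the issue rather than solving the filename length limit.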

@Techn1x
Techn1x commented May 5, 2023

With modern JavaScript, we can use TextEncoder and TextDecoder to truncate by byte length, which accounts for characters that take more than 1 byte in UTF-8 (e.g. a☃ is 2 characters, but 1 byte + 3 bytes = 4 bytes).

const truncate = (sanitized: string, length: number): string => {
  const uint8Array = new TextEncoder().encode(sanitized)
  const truncated = uint8Array.slice(0, length)
  return new TextDecoder().decode(truncated)
}

Extra points: new Blob([sanitized]).size will also give you the byte size (though it is less helpful for truncation).
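
One caveat with the snippet above: slicing the byte array at an arbitrary index can cut a multi-byte character in half, and TextDecoder then substitutes U+FFFD for the incomplete sequence. The following is a minimal sketch of a boundary-aware variant, assuming the hypothetical name `truncateUtf8` (it is not part of the gist or of truncate-utf8-bytes). It backs up to the start of the character before decoding.

```javascript
// Hypothetical helper: truncate to at most maxBytes of UTF-8
// without splitting a multi-byte character.
function truncateUtf8(str, maxBytes) {
  const bytes = new TextEncoder().encode(str);
  const limit = Math.max(0, maxBytes);
  if (bytes.length <= limit) return str;

  let end = limit;
  // UTF-8 continuation bytes have the form 10xxxxxx. If the first byte
  // we would drop is one of them, the cut is inside a character, so
  // move back until the cut sits on that character's lead byte.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;

  return new TextDecoder().decode(bytes.subarray(0, end));
}

console.log(truncateUtf8("a☃", 4)); // "a☃" (fits exactly)
console.log(truncateUtf8("a☃", 2)); // "a"  (the 3-byte snowman is dropped whole)
```

The result can be shorter than the limit by up to 3 bytes, which seems an acceptable trade-off for never emitting a broken character.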
