Sanitize a string for use as a filename
/**
 * Extracted from node-sanitize-filename (https://github.com/parshap/node-sanitize-filename/blob/master/index.js)
 *
 * Replaces characters in strings that are illegal/unsafe for filenames.
 * Unsafe characters are either removed or replaced by a substitute set
 * in the optional `options` object.
 *
 * Illegal Characters on Various Operating Systems
 * / ? < > \ : * | "
 * https://kb.acronis.com/content/39790
 *
 * Unicode Control codes
 * C0 0x00-0x1f & C1 (0x80-0x9f)
 * http://en.wikipedia.org/wiki/C0_and_C1_control_codes
 *
 * Reserved filenames on Unix-based systems (".", "..")
 * Reserved filenames in Windows ("CON", "PRN", "AUX", "NUL", "COM1",
 * "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
 * "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", and
 * "LPT9") case-insensitively and with or without filename extensions.
 *
 * Capped at 255 bytes in length (UTF-8).
 * http://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs
 *
 * @param {String} input Original filename
 * @param {Object} options {replacement: String}
 * @return {String} Sanitized filename
 */
var truncate = require("truncate-utf8-bytes");

var illegalRe = /[\/\?<>\\:\*\|"]/g;
var controlRe = /[\x00-\x1f\x80-\x9f]/g;
var reservedRe = /^\.+$/;
var windowsReservedRe = /^(con|prn|aux|nul|com[0-9]|lpt[0-9])(\..*)?$/i;

function sanitize(input, replacement) {
  var sanitized = input
    .replace(illegalRe, replacement)
    .replace(controlRe, replacement)
    .replace(reservedRe, replacement)
    .replace(windowsReservedRe, replacement);
  // Cap the result at 255 UTF-8 bytes
  return truncate(sanitized, 255);
}

module.exports = function (input, options) {
  var replacement = (options && options.replacement) || '';
  var output = sanitize(input, replacement);
  if (replacement === '') {
    return output;
  }
  // Second pass: the replacement itself may have produced an unsafe name
  return sanitize(output, '');
};
Also, it seems that truncate would cut the file name extension?
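One possible way to deal with the extension concern, as a rough sketch only: shorten the base name and re-attach the extension afterwards. The helpers `truncateBytes` and `truncateKeepingExtension` are hypothetical names and not part of the gist or of truncate-utf8-bytes; the rule used to detect the extension (everything from the last dot, ignoring a leading dot) is also an assumption.

```javascript
// Hypothetical helper: cut a string to at most maxBytes UTF-8 bytes
// without splitting a multi-byte character.
function truncateBytes(str, maxBytes) {
  const bytes = new TextEncoder().encode(str);
  if (bytes.length <= maxBytes) return str;
  let end = Math.max(0, maxBytes);
  // Step back while the first excluded byte is a continuation byte (10xxxxxx).
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return new TextDecoder().decode(bytes.slice(0, end));
}

// Hypothetical helper: keep the extension and shorten only the base name.
function truncateKeepingExtension(filename, maxBytes = 255) {
  const dot = filename.lastIndexOf(".");
  // Assumption: no dot, or only a leading dot (".bashrc"), means no extension.
  const ext = dot > 0 ? filename.slice(dot) : "";
  const base = ext ? filename.slice(0, dot) : filename;
  const extBytes = new TextEncoder().encode(ext).length;
  // Assumption: if the extension alone uses up the budget, fall back to a plain cut.
  if (extBytes >= maxBytes) return truncateBytes(filename, maxBytes);
  return truncateBytes(base, maxBytes - extBytes) + ext;
}

const longName = "a".repeat(300) + ".txt";
console.log(truncateKeepingExtension(longName).endsWith(".txt"));
```

The result may come out slightly under the limit when the cut would otherwise land inside a multi-byte character.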
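@Alynva using sanitized.split("").splice(0, 255).join("") is not a good idea. Complex symbols like emojis are made up of more than one UTF-16 code unit, so if you split a string containing an emoji, that single emoji comes back as 2 array elements and the cut can land between them. It also counts code units rather than bytes, so the result can still be longer than 255 bytes.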
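To make the problem concrete, here is a small sketch. The helper names `cutByCodeUnits`, `cutByCodePoints`, and `byteLength` are made up for illustration. Note that cutting by code points, as in the second helper, still does not handle emoji built from several code points (for example ZWJ sequences), and it still does not limit the byte size.

```javascript
// The approach from the comment above: cut by UTF-16 code units.
const cutByCodeUnits = (s, n) => s.split("").splice(0, n).join("");
// Alternative for comparison: cut by Unicode code points.
const cutByCodePoints = (s, n) => Array.from(s).slice(0, n).join("");
// UTF-8 size of a string.
const byteLength = (s) => new TextEncoder().encode(s).length;

const sample = "a\u{1F600}"; // "a" followed by the grinning face emoji

// Keeps "a" plus only the first half of the surrogate pair (0xD83D).
const broken = cutByCodeUnits(sample, 2);
// Keeps "a" plus the whole emoji.
const whole = cutByCodePoints(sample, 2);

// 255 snowmen are 255 code units but 3 bytes each in UTF-8.
const snowmen = "\u2603".repeat(255);
const stillTooLong = byteLength(cutByCodeUnits(snowmen, 255));

console.log(broken.charCodeAt(1).toString(16)); // d83d
console.log(whole === sample);                  // true
console.log(stillTooLong);                      // 765
```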
With modern JavaScript, we can use TextEncoder and TextDecoder to do the truncation by bytes, keeping in mind complex characters that take more than 1 byte in UTF-8 (e.g. a☃ is 2 characters but 1 byte + 3 bytes = 4 bytes):
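const truncate = (sanitized: string, length: number): string => {
  const uint8Array = new TextEncoder().encode(sanitized)
  if (uint8Array.length <= length) return sanitized
  // Back up so the cut never lands inside a multi-byte UTF-8 sequence
  let end = Math.max(0, length)
  while (end > 0 && (uint8Array[end] & 0xc0) === 0x80) end--
  const truncated = uint8Array.slice(0, end)
  return new TextDecoder().decode(truncated)
}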
Extra points: new Blob([sanitized]).size will also give you the byte size (though it is less helpful when it comes to truncation).
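A quick way to check these numbers, and to see why a byte cut has to respect character boundaries. This is only a sketch: the variable names are illustrative, and the `Blob` line is guarded because not every runtime exposes a global `Blob`.

```javascript
const encoder = new TextEncoder();
const text = "a\u2603"; // "a☃"

const utf16Units = text.length;                // 2 code units
const utf8Bytes = encoder.encode(text).length; // 1 + 3 = 4 bytes
// Same byte count via Blob, where the runtime provides it.
const blobBytes = typeof Blob !== "undefined" ? new Blob([text]).size : undefined;

// Cutting after 3 bytes lands inside the snowman's 3-byte sequence.
const cut = encoder.encode(text).slice(0, 3);
// A non-fatal TextDecoder replaces the incomplete sequence with U+FFFD.
const decoded = new TextDecoder().decode(cut);
// U+FFFD itself takes 3 bytes, so the "3-byte" cut re-encodes to 4 bytes.
const reencodedBytes = encoder.encode(decoded).length;

console.log({ utf16Units, utf8Bytes, blobBytes, decoded, reencodedBytes });
```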
I think that truncate(sanitized, 255) can be replaced with sanitized.split("").splice(0, 255).join(""), so you don't need the truncate-utf8-bytes lib...