Last active
February 24, 2024 13:01
notes on choosing file names for janet identifiers across unixy and windows filesystems
naming of usages files
* cannot use all symbol names as-is because of file system and
  url limitations
* problem characters include: /, <, >, :, *, %, ? (others?)
* want readable / typable "escaping" if possible because
  people may need to:
  * figure out what symbol a file is associated with
  * type the file name before it has been created (like for jref work)
* using subdirectories when a name has a slash in it doesn't work well
  for a symbol like `/`. there is at least one thing in spork that ends
  in `/`. who knows what the future may bring...
* notes about specific characters
  * / is common because of things like string/format
  * < and > are used for comparisons and threading macros
  * * turns up in function versions of macros (and other places)
  * % not too common(?) but happens
  * : might be uncommon
  * ? often used in names that are predicates
* escape character non-possibilities
  * windows prevents
    * zero byte
    * ascii representation 1-31
    * < (less than) - e.g. <
    * > (greater than) - e.g. ->>
    * : (colon)
    * " (double quote)
    * / (forward slash) - e.g. string/split
    * \ (backslash)
    * | (vertical bar or pipe)
    * ? (question mark) - e.g. pos?, nan?, truthy?
    * * (asterisk) - e.g. import*
  * *nix
    * depends on fs?
    * : is better to avoid in directory names because of PATH-ish things?
    * some other chars (like |, *, ', and "?) not great for typing because
      of shell expansion-ish issues?
    * ( and ) don't work well with shells?
    * ` is awkward because of shell use?
* escape character possibilities
  * [ - becomes %5B (via http?)
  * ] - becomes %5D (via http?)
  * { - becomes %7B (via http?)
  * } - becomes %7D (via http?)
  * https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#List_of_character_entity_references_in_HTML
    * / - sol
    * < - lt
    * > - gt
    * * - ast
    * % - percent (the entity name itself is percnt)
    * : - colon
    * ? - quest
* proposal 1
  * use square brackets to surround the character entity ref name
  * examples
    * [sol]
    * [lt]
    * [gt]
    * [ast]
    * [percent]
    * [colon]
    * [quest]
  * main downside might be for ? (due to length), but possibly that's not so
    bad
* proposal 2
  * use square brackets to surround abbreviated character entity ref names
  * examples
    * [sol]
    * [lt]
    * [gt]
    * [ast]
    * [per]
    * [col]
    * [que]
  * slightly non-standard, but lengths seem more acceptable
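* rough sketch of proposal 2 in janet, to see what the resulting file
  names look like. the names `escapes`, `sym-to-file-name` and
  `file-name-to-sym` are made up for illustration, and the reverse
  direction assumes that [ and ] never occur in a symbol name, so any
  bracketed chunk in a file name must be an escape.

```janet
# hypothetical escape table for proposal 2 (byte -> bracketed name)
(def escapes
  {(chr "/") "[sol]"
   (chr "<") "[lt]"
   (chr ">") "[gt]"
   (chr "*") "[ast]"
   (chr "%") "[per]"
   (chr ":") "[col]"
   (chr "?") "[que]"})

(defn sym-to-file-name
  "Escape problem characters in a symbol name and append .janet"
  [sym-name]
  (def buf @"")
  (each b sym-name
    (if-let [repl (get escapes b)]
      (buffer/push-string buf repl)
      (buffer/push-byte buf b)))
  (string buf ".janet"))

(defn file-name-to-sym
  "Inverse of sym-to-file-name; assumes the name ends in .janet"
  [file-name]
  (var name
    (string/slice file-name 0 (- (length file-name) (length ".janet"))))
  # undo each escape; order does not matter because no replacement
  # result contains a bracket
  (eachp [b repl] escapes
    (set name (string/replace-all repl (string/from-bytes b) name)))
  name)

(print (sym-to-file-name "string/format"))   # string[sol]format.janet
(print (sym-to-file-name "nan?"))            # nan[que].janet
(print (file-name-to-sym "-[gt][gt].janet")) # ->>
```

  a byte-by-byte loop is used instead of a peg only to keep the sketch
  close to the jdoc-escape style; either approach should do.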
* have a test to make sure that urls that are formed using the
  "proper" file names work?
* janet-lang.org examples naming scheme
  * underscore seems to be used to prefix a number which is the decimal
    value of the character being escaped - this means presumably
    that if a name contains an underscore, that underscore will
    need to be escaped. n.b. a leading underscore is used for gensym
    results
    * though _.janet is a file for `/` - why is this?
  * found code that does the transformation in content/api/gen-docs.janet
    * seems incomplete as it does not handle some cases, e.g.
      <, >, :, ?
    * don't need to handle: ", \, | because these cannot be in janet
      identifiers
    * avoid ascii range 0 - 31
    (def- replacer
      (peg/compile
        ~(accumulate (any (choice (replace (capture (set "/*%"))
                                           ,|(string "_" (0 $)))
                                  (capture 1))))))

    (defn- sym-to-filename
      "..."
      [fname]
      (string "examples/"
              ((peg/match replacer fname) 0)
              ".janet"))

    # ...

    (def- url-repl-chars
      {(chr "%") "%25"
       (chr "?") "_q"
       (chr "=") "%3d"})

    (defn jdoc-escape
      [str]
      (def ret @"")
      (when-let [prefix (dyn :jdoc-prefix)]
        (buffer/push ret prefix))
      (each b str
        (if-let [repl (in url-repl-chars b)]
          (buffer/push-string ret repl)
          (buffer/push-byte ret b)))
      ret)
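* to see what the underscore scheme produces, the replacer excerpt above
  can be run standalone. this is only a sketch: the definitions are
  copied with `def` / `defn` instead of the private forms, and the
  sample names are picked for illustration. it also shows why an
  unescaped underscore is a problem: two different symbols can end up
  with the same file name.

```janet
# same peg as in the excerpt: /, * and % become "_" plus the decimal byte value
(def replacer
  (peg/compile
    ~(accumulate (any (choice (replace (capture (set "/*%"))
                                       ,|(string "_" (0 $)))
                              (capture 1))))))

(defn sym-to-filename
  [fname]
  (string "examples/"
          ((peg/match replacer fname) 0)
          ".janet"))

(print (sym-to-filename "string/format")) # examples/string_47format.janet
(print (sym-to-filename "import*"))       # examples/import_42.janet

# ? is not in the set, so it passes through unchanged
(print (sym-to-filename "pos?"))          # examples/pos?.janet

# collision: "a/b" and "a_47b" map to the same file name
(print (= (sym-to-filename "a/b")
          (sym-to-filename "a_47b")))     # true
```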
* some names have characters that will cause problems on
  different platforms / filesystems or when used as
  part of urls
  * all characters in file name should work in url
    * https://www.rfc-editor.org/rfc/rfc3986#appendix-A
  * all characters in file name should work in windows
    * https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
      * file names (with or without extensions) should not be any of:
        CON,
        PRN,
        AUX,
        NUL,
        COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9,
        LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9
      * don't use the following characters in name:
        * < (less than) - e.g. <
        * > (greater than) - e.g. ->>
        * : (colon)
        * " (double quote)
        * / (forward slash) - e.g. string/split
        * \ (backslash)
        * | (vertical bar or pipe)
        * ? (question mark) - e.g. pos?, nan?, truthy?
        * * (asterisk) - e.g. import*
        * zero byte
        * ascii representation 1-31
      * fs-specific limitations
        * don't end file or dir name with . or space
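* the windows rules listed above could be turned into a check along
  these lines. just a sketch: `reserved-names`, `forbidden-chars` and
  `windows-safe?` are hypothetical names, comparing the part before the
  first dot in upper case is an assumption (windows file names are
  case-insensitive), and url-related limits are not checked here.

```janet
# reserved device names from the list above
(def reserved-names
  ["CON" "PRN" "AUX" "NUL"
   "COM1" "COM2" "COM3" "COM4" "COM5" "COM6" "COM7" "COM8" "COM9"
   "LPT1" "LPT2" "LPT3" "LPT4" "LPT5" "LPT6" "LPT7" "LPT8" "LPT9"])

# characters windows does not allow in a file name
(def forbidden-chars "<>:\"/\\|?*")

(defn windows-safe?
  "Return true if file-name passes the rules noted above"
  [file-name]
  # the part before the first dot decides whether a name is reserved
  (def stem (string/ascii-upper (first (string/split "." file-name))))
  (def tail (last file-name))
  (var ok (and (not (empty? file-name))
               (nil? (find |(= $ stem) reserved-names))
               # no trailing . or space
               (not= tail (chr "."))
               (not= tail (chr " "))))
  (each b file-name
    # bytes 0-31 and the forbidden characters are rejected
    (when (or (< b 32)
              (string/find (string/from-bytes b) forbidden-chars))
      (set ok false)))
  ok)

(print (windows-safe? "string[sol]format.janet")) # true
(print (windows-safe? "string/format.janet"))     # false
(print (windows-safe? "aux.janet"))               # false
```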