@sogaiu
Last active February 24, 2024 13:01
notes on choosing file names for janet identifiers across unixy and windows filesystems
naming of usages files
* cannot use all symbol names as-is because of file system and
  url limitations
* problem characters include: /, <, >, :, *, %, ? (others?)
* want readable / typable "escaping" if possible because
  people may need to:
  * figure out what symbol a file is associated with
  * type the file name before it has been created (like for jref work)
* using subdirectories when a name has a slash in it doesn't work well
  for a symbol like /. there is at least one thing in spork that ends
  in /. who knows what the future may bring...
* notes about specific characters
  * / is common because of things like string/format
  * < and > are used for comparisons and threading macros
  * * turns up in function versions of macros (and other places)
  * % not too common(?) but happens
  * : might be uncommon
  * ? often used in names that are predicates
* escape character non-possibilities
  * windows prevents
    * zero byte
    * ascii representation 1-31
    * < (less than) - e.g. <
    * > (greater than) - e.g. ->>
    * : (colon)
    * " (double quote)
    * / (forward slash) - e.g. string/split
    * \ (backslash)
    * | (vertical bar or pipe)
    * ? (question mark) - e.g. pos?, nan?, truthy?
    * * (asterisk) - e.g. import*
  * *nix
    * depends on fs?
    * : is better to avoid in directory names because of PATH-ish things?
    * some other chars (like |, *, ', and "?) not great for typing because
      of shell expansion-ish issues?
    * ( and ) don't work well with shells?
    * ` is awkward because of shell use?
* escape character possibilities
  * [ - becomes %5B (via http?)
  * ] - becomes %5D (via http?)
  * { - becomes %7B (via http?)
  * } - becomes %7D (via http?)
* https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#List_of_character_entity_references_in_HTML
  * / - sol
  * < - lt
  * > - gt
  * * - ast
  * % - percent
  * : - colon
  * ? - quest
* proposal 1
  * use square brackets to surround the character entity ref name
  * examples
    * [sol]
    * [lt]
    * [gt]
    * [ast]
    * [percent]
    * [colon]
    * [quest]
  * main downside might be for ? (due to length), but possibly that's not so
    bad
* proposal 2
  * use square brackets to surround abbreviated character entity ref names
  * examples
    * [sol]
    * [lt]
    * [gt]
    * [ast]
    * [per]
    * [col]
    * [que]
  * slightly non-standard, but lengths seem more acceptable
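
a minimal sketch of how proposal 2 might look in janet. the names `escape-names`, `escape-sym-name` and `unescape-file-name` are hypothetical and only for illustration. the reverse direction assumes that [ and ] never occur in a janet symbol, so a bracketed name in a file name can only have come from escaping.

```janet
# sketch only: all names below are hypothetical, not from any library

# problem byte -> abbreviated entity name (proposal 2)
(def escape-names
  {(chr "/") "sol"
   (chr "<") "lt"
   (chr ">") "gt"
   (chr "*") "ast"
   (chr "%") "per"
   (chr ":") "col"
   (chr "?") "que"})

(defn escape-sym-name
  "Replace each problem character in sym-name with its bracketed name."
  [sym-name]
  (def ret @"")
  (each b sym-name
    (if-let [name (get escape-names b)]
      (buffer/push-string ret "[" name "]")
      (buffer/push-byte ret b)))
  (string ret))

(defn unescape-file-name
  "Inverse of escape-sym-name (assumes no brackets in the original symbol)."
  [file-name]
  (var ret file-name)
  (eachp [b name] escape-names
    (set ret (string/replace-all (string "[" name "]")
                                 (string/from-bytes b)
                                 ret)))
  ret)

(print (escape-sym-name "string/format")) # prints string[sol]format
```

swapping the table values for the full names (percent, colon, quest) would give proposal 1 with no other changes.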
* have a test to make sure that urls that are formed using the
"proper" file names work?
* janet-lang.org examples naming scheme
  * underscore seems to be used to prefix a number which is the decimal
    value of the character being escaped - this means presumably
    that if a name contains an underscore, that underscore will
    need to be escaped. n.b. a leading underscore is used for gensym
    results
    * though _.janet is a file for `/` - why is this?
  * found code that does the transformation in content/api/gen-docs.janet
    * seems incomplete as it does not handle some cases, e.g.
      <, >, :, ?
    * don't need to handle: ", \, | because these cannot be in janet
      identifiers
    * avoid ascii range 0 - 31

    (def- replacer
      (peg/compile
        ~(accumulate (any (choice (replace (capture (set "/*%"))
                                           ,|(string "_" (0 $)))
                                  (capture 1))))))

    (defn- sym-to-filename
      "..."
      [fname]
      (string "examples/"
              ((peg/match replacer fname) 0)
              ".janet"))

    # ...

    (def- url-repl-chars
      {(chr "%") "%25"
       (chr "?") "_q"
       (chr "=") "%3d"})

    (defn jdoc-escape
      [str]
      (def ret @"")
      (when-let [prefix (dyn :jdoc-prefix)]
        (buffer/push ret prefix))
      (each b str
        (if-let [repl (in url-repl-chars b)]
          (buffer/push-string ret repl)
          (buffer/push-byte ret b)))
      ret)

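
a sketch of how the same underscore-plus-decimal scheme could be widened to cover the missing cases (<, >, :, ?). `wider-replacer` and `sym-to-escaped-name` are hypothetical names, not part of gen-docs.janet. escaping the underscore itself is an assumption taken from the note above, so that a literal _ cannot be confused with an escape prefix.

```janet
# sketch only: same shape as the replacer in gen-docs.janet, wider set.
# wider-replacer and sym-to-escaped-name are hypothetical names.
(def wider-replacer
  (peg/compile
    ~(accumulate (any (choice (replace (capture (set "/*%<>:?_"))
                                       # "_" followed by the decimal byte value
                                       ,|(string "_" (get $ 0)))
                              (capture 1))))))

(defn sym-to-escaped-name
  "Escape problem characters in sym-name as _<decimal byte value>."
  [sym-name]
  ((peg/match wider-replacer sym-name) 0))

(print (sym-to-escaped-name "string/format")) # prints string_47format
(print (sym-to-escaped-name "pos?"))          # prints pos_63
```

the decimal value has no terminator, so decoding would have to rely on every escaped character here having a two-digit code (37, 42, 47, 58, 60, 62, 63, 95).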
* some names have characters that will cause problems on
  different platforms / filesystems or when used as
  part of urls
  * all characters in file name should work in url
    * https://www.rfc-editor.org/rfc/rfc3986#appendix-A
  * all characters in file name should work in windows
    * https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
    * file names (with or without extensions) should not be any of:
      CON,
      PRN,
      AUX,
      NUL,
      COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9,
      LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9
    * don't use following characters in name:
      * < (less than) - e.g. <
      * > (greater than) - e.g. ->>
      * : (colon)
      * " (double quote)
      * / (forward slash) - e.g. string/split
      * \ (backslash)
      * | (vertical bar or pipe)
      * ? (question mark) - e.g. pos?, nan?, truthy?
      * * (asterisk) - e.g. import*
      * zero byte
      * ascii representation 1-31
    * fs-specific limitations
    * don't end file or dir name with . or space
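
the windows rules above could be checked with something like the following sketch. `reserved-names`, `forbidden-chars` and `windows-safe?` are hypothetical names. comparing reserved names without regard to case is an assumption that goes beyond what is listed here.

```janet
# sketch only: all names below are hypothetical helpers

(def reserved-names
  ["CON" "PRN" "AUX" "NUL"
   "COM1" "COM2" "COM3" "COM4" "COM5" "COM6" "COM7" "COM8" "COM9"
   "LPT1" "LPT2" "LPT3" "LPT4" "LPT5" "LPT6" "LPT7" "LPT8" "LPT9"])

# < > : " / \ | ? *
(def forbidden-chars "<>:\"/\\|?*")

(defn windows-safe?
  "Check file-name against the windows rules listed above."
  [file-name]
  # reserved names apply with or without an extension, so compare only
  # the part before the first dot (upper-casing is an assumption)
  (def base (string/ascii-upper (first (string/split "." file-name))))
  (and (not (empty? file-name))
       (nil? (index-of base reserved-names))
       (not (string/has-suffix? "." file-name))
       (not (string/has-suffix? " " file-name))
       # no zero byte, no ascii 1-31, no forbidden character
       (all (fn [b]
              (and (> b 31)
                   (nil? (string/find (string/from-bytes b)
                                      forbidden-chars))))
            (string/bytes file-name))))

(print (windows-safe? "string[sol]format.janet")) # prints true
(print (windows-safe? "string/format.janet"))     # prints false
```

a check like this could run in a test over every generated file name, alongside the url test mentioned earlier.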