@dmtucker
Created August 23, 2018 15:42
Why doesn't it make sense to mix strings and bytes in path components with posixpath.join? This question really only applies to Python 2, since Python 3 does away with implicit conversions, but the unicode could be encoded and the join would succeed. Even if that results in a multi-encoding str, I've been told that's OK in POSIX.

09:19       dtux| instead of raising TypeError on calling posixpath.join on a str and bytes, why isn't the str just encoded with sys.getfilesystemencoding()?
09:19      Yhg1s| dtux: because guessing about what encoding you meant to use is what Python 2 did, and it turned out to be a mistake.
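
For reference, a minimal sketch of the behavior being discussed, assuming Python 3. The helper `try_join` is a hypothetical name used only for illustration. It shows that mixing str and bytes raises TypeError, and that encoding explicitly (here with `os.fsencode`, which uses the filesystem encoding) is the caller's decision rather than something `posixpath.join` guesses.

```python
import os
import posixpath


def try_join(*parts):
    """Hypothetical helper: return the joined path, or the error text on TypeError."""
    try:
        return posixpath.join(*parts)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# Homogeneous arguments work for both types.
str_result = posixpath.join("a", "b")
bytes_result = posixpath.join(b"a", b"b")

# Mixed arguments are rejected instead of being implicitly converted.
mixed_result = try_join("a", b"b")

# The caller can opt in to the filesystem encoding explicitly.
explicit_result = posixpath.join(os.fsencode("a"), b"b")

print(str_result, bytes_result, mixed_result, explicit_result)
```
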
09:20       dtux| Yhg1s: ya, but that's slightly different right? that was done under the hood on all string aperations, not just os/fs related ones. it seems getfilesystemencoding is described for exactly this 
                  purpose
09:21       dtux| operations*
09:21      Yhg1s| dtux: but os.path.join is not a filesystem operation.
09:22      Yhg1s| dtux: it's a path operation, but the paths don't have to exist, so you can't know which encoding will apply to them.
09:22      Yhg1s| dtux: for example, posixpath.join can be used to assemble URLs instead.
09:25       dtux| dtux: path components are expected to correspond to file/directory names though right? and (i've been told) posix allows path components to have different encodings... hmm still wrapping my 
                  head around your URLs point
09:25       dtux| Yhg1s* ^^
09:26       dtux| e.g. posixpath.join('https://example.com', '/foo')  probably isn't what we wanted
09:27      Yhg1s| dtux: they don't have to be paths *on the system you're joining them on*. They can be paths somewhere else. Or purely virtual paths. The local filesystem encoding doesn't necessarily apply.
09:27      Yhg1s| dtux: no, but the path component of URLs (as parsed by urllib.parse for example) is a POSIX path, and can be joined and split and whatever with the posixpath module.
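
The URL point above could be sketched roughly as follows, assuming Python 3. The function `join_url_path` is a hypothetical name, not part of any library. The idea is to split the URL first and apply `posixpath.join` only to its path component, since joining the whole URL with an absolute component discards everything before it.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def join_url_path(url, *segments):
    """Hypothetical helper: append segments to the path component of a URL."""
    parts = urlsplit(url)
    # Only the path component is treated as a POSIX-style path.
    new_path = posixpath.join(parts.path or "/", *segments)
    return urlunsplit(parts._replace(path=new_path))


# Joining the whole URL with an absolute component drops the first argument.
naive = posixpath.join("https://example.com", "/foo")

# Joining only the parsed path component keeps the scheme and host intact.
joined = join_url_path("https://example.com/api", "v1", "users")

# The same module can split the path component again.
head, tail = posixpath.split(urlsplit(joined).path)

print(naive, joined, head, tail)
```
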
09:28       dtux| Yhg1s: so, would my Q make more sense for os.path.join then?
09:28      Yhg1s| dtux: no, because paths still do not have to be local paths.
09:29       dtux| Yhg1s: why does os look at the system it's running on then?
09:30      Yhg1s| dtux: os.path does not look at the system it's running on.
09:30       dtux| er.. ya os.path. hmm
09:30      Yhg1s| dtux: not for os.path.split, basename, dirname, join, etc.
09:31       dtux| Yhg1s: eh, but that's only indirectly true, no? os.path is set based on the system... does this story change for nt users? can os.path still be used to build URLs then?
09:34      Yhg1s| dtux: the URLs thing was just an example of why posixpath doesn't do this. os.path is posixpath on POSIX systems. The more important point is that you cannot assume to know the encoding of the 
                  things passed to os.path functions that do not access the filesystem.
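
A small sketch of that last point, assuming Python 3. The paths below are made up and are not expected to exist anywhere. Functions such as split, basename, dirname, and join only manipulate strings, and both `posixpath` and `ntpath` can be imported on any platform (`os.path` is simply whichever one matches the running system).

```python
import ntpath
import posixpath

# A made-up path; nothing here touches the filesystem.
fake_posix = "/no/such/dir/report.txt"

split_result = posixpath.split(fake_posix)
base = posixpath.basename(fake_posix)
parent = posixpath.dirname(fake_posix)

# Windows-style paths can be handled the same way, even on a POSIX host.
nt_joined = ntpath.join("C:\\no\\such", "dir")
nt_base = ntpath.basename("C:\\no\\such\\file.txt")

print(split_result, base, parent, nt_joined, nt_base)
```
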
