@sivel
Last active October 2, 2019 13:54

Draft

Using to_bytes/to_native/to_text

errors

The default value for errors, although specified as None in the function signature, is surrogate_then_replace.

The most common and recommended values for compatibility between Python 2 and Python 3 are:

  • surrogate_then_replace
  • surrogate_or_strict

When to use which?

surrogate_then_replace should be used when the data is informational only, meaning it is ultimately just written to a log or displayed to the user.

surrogate_or_strict should be used when the data makes a difference to the computer's understanding of the world, such as with file paths or database keys.
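
The difference between the two handlers can be approximated with the standard library alone. The sketch below does not call Ansible: encode_operational and encode_informational are hypothetical helpers that, on Python 3, roughly mirror what surrogate_or_strict and surrogate_then_replace mean when encoding text to bytes.

```python
def encode_operational(text, encoding="utf-8"):
    """Hypothetical stand-in for surrogate_or_strict.

    Escaped bytes round-trip unchanged; anything else that cannot be
    encoded raises UnicodeEncodeError.
    """
    return text.encode(encoding, "surrogateescape")


def encode_informational(text, encoding="utf-8"):
    """Hypothetical stand-in for surrogate_then_replace.

    Never raises on unencodable characters; they are replaced instead.
    """
    try:
        return text.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        return text.encode(encoding, "replace")


# A file name whose bytes are not valid UTF-8 survives a round trip.
raw_name = b"caf\xe9.txt"
name = raw_name.decode("utf-8", "surrogateescape")
round_tripped = encode_operational(name)

# Text the target encoding cannot represent raises for operational data...
try:
    encode_operational(u"caf\u00e9", encoding="ascii")
    raised = False
except UnicodeEncodeError:
    raised = True

# ...but is degraded to a placeholder for informational data.
display_bytes = encode_informational(u"caf\u00e9", encoding="ascii")
```

Silently replacing a character in a log line is harmless, but doing the same to a file path would point at a different file, which is why operational data should fail loudly instead.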

nonstring

This specifies the strategy to use if a non-string object is passed. The default is simplerepr, which returns a string representation using either str(obj) or repr(obj), preferring the str() method.

Other values are empty, which returns an empty string; passthru, which returns the original object; and strict, which raises a TypeError exception.

An example of using passthru would be when passing either a string or a file-like object for use in an HTTP POST request with to_bytes.
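
The four strategies can be sketched as a small dispatch function. This is an illustration only, not Ansible's implementation: coerce_nonstring is a hypothetical name, and the sketch stops before the final conversion to bytes or text.

```python
import io


def coerce_nonstring(obj, nonstring="simplerepr"):
    """Hypothetical helper: decide what to do with a non-string object."""
    if nonstring == "simplerepr":
        # Prefer str(); fall back to repr() if str() cannot cope.
        try:
            return str(obj)
        except UnicodeError:
            return repr(obj)
    if nonstring == "passthru":
        return obj
    if nonstring == "empty":
        return ""
    if nonstring == "strict":
        raise TypeError("obj must be a string type")
    raise TypeError("invalid value %r for nonstring" % (nonstring,))


# A file-like request body is handed back untouched with passthru,
# so the caller can still stream it.
body = io.BytesIO(b"payload")
same_object = coerce_nonstring(body, "passthru") is body

as_string = coerce_nonstring(42)
as_empty = coerce_nonstring(None, "empty")
```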

to_native

"native" in this context is meant to indicate the default string type on Python 2 and 3, as produced by str.

On the controller

On the controller, to_native is used for a small set of functionality:

  1. When converting information for use in exceptions
  2. When the underlying python API expects a native string type

Typically speaking, native values should not be long lived; convert to native at the borders, where the value is needed. If a variable must be assigned a native value, the variable should be prefixed with n_, such as n_output.
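
A minimal sketch of the exception case, assuming Python 3 (where the native type is str) and using bytes.decode in place of to_native so that it runs without Ansible. The describe_failure function is a hypothetical name.

```python
def describe_failure(b_stderr):
    """Build an exception from raw command output (Python 3 sketch)."""
    # Convert to the native type only here, where the exception needs it.
    # The message is informational, so undecodable bytes are replaced.
    n_stderr = b_stderr.decode("utf-8", "replace")
    return RuntimeError("command failed: %s" % n_stderr)


err = describe_failure(b"no such file: caf\xc3\xa9.txt")
message = str(err)
```

The n_stderr variable exists only for the one line that needs it, which keeps the native value short lived.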

On the target

Most strings on the target should use the native string type, which integrates most easily with the underlying Python APIs. However, note the guidance in the errors section, which dictates which errors value to use for informational versus operational values.

to_bytes

"bytes" in this context refers to the data type produced by bytes on Python 2 and Python 3.

On Python 2 this is str and on Python 3 this is bytes.

Values converted to bytes should not be long lived. Typically values should be converted at the borders to bytes where they are needed. If a variable must be assigned to a bytes value, the variable should be prefixed with b_ such as b_path. This includes params in the function signature, if a function accepts a bytes value.

Everywhere

Use to_bytes when dealing with byte-oriented APIs. This is common when dealing with file paths, or with data being passed through HTTP requests.
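
The b_ convention and the conversion at the border might look like the following sketch. It assumes Python 3 and uses str.encode where Ansible code would call to_bytes with errors='surrogate_or_strict'; file_size is a hypothetical helper.

```python
import os
import shutil
import tempfile


def file_size(b_path):
    """Return the size of the file at b_path, which must already be bytes."""
    return os.stat(b_path).st_size


tmp_dir = tempfile.mkdtemp()
try:
    # Inside the program the path is kept as text.
    path = os.path.join(tmp_dir, u"r\u00e9sum\u00e9.txt")

    # Convert at the border, immediately before the byte-oriented calls.
    b_path = path.encode("utf-8", "surrogateescape")
    with open(b_path, "wb") as handle:
        handle.write(b"hello")

    size = file_size(b_path)
finally:
    shutil.rmtree(tmp_dir)
```

Naming the parameter b_path tells callers which type the function expects without them having to read its body.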

to_text

"text" in this context is meant to indicate the type produced by the unicode function on Python 2, and by str on Python 3.

On the controller

  1. When data is ingested into Ansible, values should typically be cast to text for the lifetime of that data.
  2. All information sent to the Display class, such as display.display or display.vvv should be cast to text.

NOTE: Only on the borders where the data leaves Ansible should it be converted to bytes or native.
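
The pattern described here, text inside and bytes only at the borders, can be sketched with the standard library. The ingest and emit names are hypothetical and stand in for calls to to_text and to_bytes; the sketch assumes Python 3.

```python
def ingest(b_data):
    """Border in: decode incoming bytes to text once."""
    return b_data.decode("utf-8", "surrogateescape")


def emit(text):
    """Border out: encode text to bytes only when it leaves the program."""
    return text.encode("utf-8", "surrogateescape")


# Everything between the two borders operates on text, never on bytes.
hostname = ingest(b"web-01.example.com\n").strip()
greeting = u"Connecting to {0}".format(hostname)

b_line = emit(greeting + u"\n")
```

Keeping one type for the lifetime of the data avoids the mixed bytes and text comparisons that behave differently on Python 2 and Python 3.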

On the target

You are unlikely to need to_text in many scenarios on the target. Use it only when the API you are dealing with specifically needs text types, such as in some MySQL libraries.

@abadger

abadger commented Jun 5, 2019

I would not use unicode and non-unicode as used here. Only use "unicode" for the Python 2 unicode type. Most other places should say "text" (or "text string") or "bytes" (or "byte string"). The reason is that the term "unicode" is not very clear in most programmers' minds. They associate "unicode" with one of the encodings of unicode (typically utf-16 or utf-8) rather than an abstract idea of a string of human-readable characters. That association of "unicode" with encodings means that they think of unicode as a byte string, which it most certainly is not in Python.

@abadger

abadger commented Jun 5, 2019

"text" in this context refers to unicode values.

This is a place where my above advice doesn't work, since we need to explain what text actually means. Brainstorming:

  • text is something that people read
  • text is a string of glyphs which humans use to form words, sentences, and other written material
  • text consists of alphabetical characters, digits, punctuation, and select symbols from the whole range of human written communication
  • text has to be encoded as actual bytes for computer hardware to do anything with it

@sivel
Author

sivel commented Jun 14, 2019

At this point, I've removed the references to "unicode" except where talking about the Python 2 unicode function. I can look to make the explanations more robust later.

@mattclay

Is it worth including in this the history behind why this is needed? Specifically, why we chose to default to UTF-8 instead of relying on the user properly setting encoding environment variables on the controller and remote.

@sivel
Author

sivel commented Jun 27, 2019

Sure. I haven't documented anything around encoding yet, and after today, I at least have some useful things to add for when you would want to use it.
