@KeyWeeUsr
Forked from dolang/kivy-kv-encoding.md
Created June 30, 2018 13:22
[Draft] New Default Encoding for .kv Files: Pros / Cons

Note: This would be in addition to an encoding directive like the one in Python files: coding: ...

UTF-8 Everywhere

  • (+) Simple, straightforward
  • (-) Builder.load_file(...) and Builder.load_string(open(...)) would behave differently (Windows, Linux without UTF-8 locale) because open(...) defaults to the system's preferred encoding
  • (?) can't say what the impact on mobile devices is (Android/iOS)
  • (?) can't say what the impact on OSX is
  • (~) unsure how many people on Windows use editors which save their files in the default encoding
  • (~) py2-only developers might be surprised by the difference between .py <-> .kv
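
The (-) point above can be illustrated without Kivy. This is only a sketch, assuming Python 3: it writes a UTF-8 encoded .kv snippet to a temporary file and reads it back twice, once as UTF-8 (what a UTF-8-everywhere Builder.load_file(...) would do) and once as cp1252 (standing in for what a plain open(...) would pick on a typical Western Windows setup). The file name and KV content are made up for the illustration.

```python
import io
import os
import tempfile

# Hypothetical KV content containing non-ASCII characters.
KV_SOURCE = u"Label:\n    text: 'Grüße'\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.kv")

    # The editor saved the file as UTF-8.
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(KV_SOURCE)

    # Reading with an explicit UTF-8 codec round-trips the text.
    with io.open(path, encoding="utf-8") as handle:
        as_utf8 = handle.read()

    # Reading with cp1252 (assumed system-preferred encoding) does not
    # fail, it silently produces mojibake.
    with io.open(path, encoding="cp1252") as handle:
        as_cp1252 = handle.read()

print(as_utf8 == KV_SOURCE)    # the UTF-8 read matches the original
print(as_cp1252 == KV_SOURCE)  # the cp1252 read does not
```

Passing the encoding explicitly, as in io.open(path, encoding="utf-8"), is what would keep Builder.load_string(...) consistent with a UTF-8 default on such systems.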

Python Behaviour

  • (+) it's probably what people expect
    • for those doing only py2 development, they already need to care about the directive, and if not there are no surprises in where it fails (.kv == .py)
    • py2/py3 compat: devs should always include the coding: utf-8 directive, that's imho the easiest way of writing compatible Python programs (together with from __future__ import unicode_literals). [TODO example?]
    • py3 defaults to utf-8 and so would .kv when run with py3
  • (+) could probably build on code in tokenize.py (lower chance of a bug), although simply using utf-8 should be trivial enough
  • (-) Builder.load_file(...) and Builder.load_string(open(...)) would behave differently (Windows, Linux without UTF-8 locale) because open(...) defaults to the system's preferred encoding.
    • at least on py3; py2 might work as expected [TODO verify]
  • (-) py2/py3 compat pretty much requires adding that directive in every file to be on the safe side.
  • (-) Windows (Linux without utf-8 locale): introduces not just one breaking encoding change, but potentially 2, because the behaviour differs between py2 and py3.
  • (?) can't say what the impact on mobile devices is (Android/iOS)
  • (?) can't say what the impact on OSX is
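
As a possible example for the py2/py3 compat point above: a compatible module would start with the coding directive followed by the unicode_literals import. In the sketch below those lines are held as bytes and compiled, purely so the snippet can be run in isolation; MODULE_SOURCE, GREETING and the "<compat_example>" file name are made up for the illustration.

```python
# The first lines of a hypothetical py2/py3-compatible module, followed
# by one string literal. The literal is given as raw UTF-8 bytes.
MODULE_SOURCE = (
    b"# -*- coding: utf-8 -*-\n"
    b"from __future__ import unicode_literals\n"
    b"GREETING = 'Gr\xc3\xbc\xc3\x9fe'\n"
)

namespace = {}
# When compile() is given bytes, it applies the PEP 263 directive to
# decode the source before parsing it.
exec(compile(MODULE_SOURCE, "<compat_example>", "exec"), namespace)
greeting = namespace["GREETING"]

# The literal is a text string of 5 characters, not 7 bytes.
print(len(greeting))
```

With both lines in place, the unprefixed literal should be a text (unicode) string on py2 as well as py3.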

Keep as is

  • (+) Doesn't break existing code
  • (-) People have to add the directive in all their files

Ref

Proposal

  • UTF-8 by default (no py2 <-> py3 distinction).
    • Make it explicit in the documentation and release information.
  • KV/Builder now operates on Unicode strings with Universal Newlines in both py2 and py3. The reader, which implements this proposal, is in charge of the translation from Bytes to Unicode.
  • PEP 263 defines the coding directive. KV has the same semantics with some specific additions:
    • Directive must be on first or second line.
    • 2 syntax styles are accepted:
      • #:coding <encoding>
      • Same as Python, i.e. as defined in PEP 263.
    • If there's a UTF-8 BOM at the start and there's a directive, then the directive must be for UTF-8; otherwise, error.
    • For accepted encoding names, see the stdlib docs -> codecs.
    • The special encoding directive system-preferred means: open the file like before (<= v1.10).
    • Mention this in the migration notes for affected systems, i.e. those where locale.getpreferredencoding(False) returns something other than UTF-8 (primarily: Windows).
  • Implementation note: multibyte encodings have to be treated specially.
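
The rules above could be sketched roughly as follows. This is not the actual Kivy reader, only an illustration under stated assumptions: the names find_directive, decode_kv and KV_STYLE_RE are hypothetical, the KV-style pattern is a guess at what #:coding <encoding> would look like as a regular expression, ValueError is an arbitrary choice for the BOM mismatch error, and the directive search only works for ASCII-compatible encodings (multibyte encodings such as UTF-16 would need the special treatment mentioned in the implementation note).

```python
import codecs
import io
import locale
import re

# Follows the expression given in PEP 263.
PEP263_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
# Assumed pattern for the KV style "#:coding <encoding>".
KV_STYLE_RE = re.compile(br"^[ \t\f]*#:coding[ \t]+([-_.a-zA-Z0-9]+)")


def find_directive(data):
    """Return the encoding named on line 1 or 2 of ``data``, or None."""
    for line in data.splitlines()[:2]:
        match = KV_STYLE_RE.match(line) or PEP263_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def decode_kv(data):
    """Decode raw .kv bytes to text with universal newlines applied."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    if has_bom:
        data = data[len(codecs.BOM_UTF8):]

    name = find_directive(data)
    if name is None:
        encoding = "utf-8"  # proposed default
    elif name == "system-preferred":
        encoding = locale.getpreferredencoding(False)  # pre-proposal behaviour
    else:
        encoding = name

    # codecs.lookup() normalises aliases and raises LookupError for
    # unknown encoding names.
    normalised = codecs.lookup(encoding).name
    if has_bom and normalised not in ("utf-8", "utf-8-sig"):
        raise ValueError("UTF-8 BOM found, but directive says %r" % encoding)

    text = data.decode(normalised)
    # With decoder=None the newline decoder accepts already decoded text
    # and translates "\r\n" and "\r" to "\n".
    newlines = io.IncrementalNewlineDecoder(None, translate=True)
    return newlines.decode(text, final=True)


sample = b"#:coding latin-1\r\nLabel:\r\n    text: 'Gr\xfc\xdfe'\r\n"
print(decode_kv(sample))
```

Searching only the first two lines keeps the directive lookup cheap and leaves room for a shebang on line one.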

Context/Rationale/Explanation

  • Allow both KV style and Python style declaration.
    • These are only relevant for the reader; for the parser they're simply comments or a no-op in case of #:coding ....
    • Reason to accept Python style as well: because Emacs, Vim and others use these to make the encoding clear. No need to reinvent the wheel completely when that can easily be supported.
  • Why accept it on both the first and second line, like Python?
    • Ensures consistency with Python.
    • Allows for a later addition of a shebang without any changes.

Misc. Notes

  • TODO: should #:encoding ... be supported, too?
  • TODO: compile a list with preferred encodings of major platforms, or at least where this might be a breaking change.
  • UTF-8 is technically 2 codecs: utf-8 and utf-8-sig, the latter of which handles the BOM.
  • Both py2 and py3 have the class io.IncrementalNewlineDecoder which can be used to deal with Universal Newlines.
  • See detect_encoding() in tokenize.py of Python 3 for a correct implementation.
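
The last three notes can be tried out directly with the Python 3 standard library. A small sketch, where the detect helper is a hypothetical convenience wrapper around tokenize.detect_encoding():

```python
import codecs
import io
import tokenize


def detect(data):
    """Return the encoding tokenize.detect_encoding() reports for ``data``."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# No BOM and no directive: the Python 3 default.
print(detect(b"x = 1\n"))
# A PEP 263 directive wins (the name comes back normalised).
print(detect(b"# -*- coding: latin-1 -*-\nx = 1\n"))
# A UTF-8 BOM selects the BOM-aware codec.
print(detect(codecs.BOM_UTF8 + b"x = 1\n"))

# utf-8 keeps the BOM as a character, utf-8-sig strips it.
with_bom = codecs.BOM_UTF8 + b"Label:\n"
kept = with_bom.decode("utf-8")
stripped = with_bom.decode("utf-8-sig")

# A BOM combined with a non-UTF-8 directive is rejected.
try:
    detect(codecs.BOM_UTF8 + b"# coding: latin-1\n")
    conflict_rejected = False
except SyntaxError:
    conflict_rejected = True
```

Note that detect_encoding() signals the BOM/directive conflict with SyntaxError; a KV reader would presumably want its own error type.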