KeyWeeUsr/kivy-kv-encoding.md

## kivy-kv-encoding.md

      
New Default Encoding for .kv Files: Pros / Cons

Note: This would be in addition to an encoding directive like the one in
Python files:
coding: ...
UTF-8 Everywhere


(+) Simple, straightforward
(-) Builder.load_file(...) and Builder.load_string(open(...)) would
have differing behaviours (Windows, Linux without UTF-8 locale)
because open(...) defaults to the system's preferred encoding
(?) can't say what the impact on mobile devices is (Android/iOS)
(?) can't say what the impact on OSX is
(~) unsure how many people on Windows use editors which save their
files in the default encoding
(~) py2-only developers might be surprised by the difference between
.py <-> .kv
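
The load_file / load_string(open(...)) mismatch noted above can be illustrated without Kivy. The following is only a sketch under stated assumptions: it does not call Builder at all, the variable names are made up for the example, and cp1252 stands in for the preferred encoding of a typical Windows system. It writes a UTF-8 .kv file, then reads it once as UTF-8 (what a UTF-8-by-default loader would see) and once as cp1252 (what a plain open(...).read() would yield on such a system).

```python
import io
import os
import tempfile

# A tiny KV snippet containing non-ASCII characters ("Grüße").
KV_SOURCE = u"Label:\n    text: 'Gr\u00fc\u00dfe'\n"

fd, path = tempfile.mkstemp(suffix=".kv")
os.close(fd)
try:
    # Save the file the way a UTF-8 editor would.
    with io.open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(KV_SOURCE)

    # What a loader that always assumes UTF-8 would read.
    with io.open(path, encoding="utf-8") as handle:
        as_utf8 = handle.read()

    # What open(path).read() would return where the preferred
    # encoding is cp1252 (simulated here by passing it explicitly).
    with io.open(path, encoding="cp1252") as handle:
        as_cp1252 = handle.read()
finally:
    os.remove(path)

print(as_utf8 == KV_SOURCE)    # same text as written
print(as_cp1252 == KV_SOURCE)  # differs: the non-ASCII characters are mangled
```

Both reads succeed without raising, which is what makes the difference easy to miss: the second string is simply not the text that was saved.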

Python Behaviour


(+) it's probably what people expect

  - for those doing only py2 development: they already need to care
    about the directive, and if they don't, there are no surprises
    about where it fails (.kv == .py)
  - py2/py3 compat: devs should always include the coding: utf-8
    directive; that's imho the easiest way of writing compatible Python
    programs (together with from __future__ import unicode_literals).
    [TODO example?]
  - py3 defaults to utf-8 and so would .kv when run with py3


(+) could probably build on code in tokenize.py (lower chance of a
bug); actually, just using utf-8 should be trivial enough
(-) Builder.load_file(...) and Builder.load_string(open(...)) would
have differing behaviours (Windows, Linux without UTF-8 locale)
because open(...) defaults to the system's preferred encoding.

at least on py3; py2 might work as expected [TODO verify]


(-) py2/py3 compat pretty much requires adding that directive to
every file to be on the safe side.
(-) Windows (Linux without utf-8 locale): introduces not just
one breaking encoding change, but potentially 2, because the
behaviour differs between py2 and py3.
(?) can't say what the impact on mobile devices is (Android/iOS)
(?) can't say what the impact on OSX is
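
The coding: utf-8 plus unicode_literals combination mentioned under this option could look roughly like the sketch below. It is only an illustration: the module source is held in a bytes literal (a hypothetical stand-in for a .py file on disk) so that the directive really sits on line 1 when the source is compiled, and the names SOURCE, namespace and greeting are made up for the example.

```python
# Source of a py2/py3-compatible module, as the bytes found on disk.
SOURCE = (
    b"# -*- coding: utf-8 -*-\n"
    b"from __future__ import unicode_literals\n"
    b"greeting = 'Gr\xc3\xbc\xc3\x9fe'\n"  # 'Grüße' encoded as UTF-8
)

# compile() honours the PEP 263 directive when it decodes the bytes.
namespace = {}
exec(compile(SOURCE, "<demo>", "exec"), namespace)
greeting = namespace["greeting"]

# With the directive and unicode_literals, the literal is text
# (5 characters), not the 7 raw bytes that were stored.
print(len(greeting))
```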

Keep as is


(+) Doesn't break existing code
(-) People have to add the directive in all their files

Ref


"Android note: The Android platform default is always UTF-8."
(excerpt from Charset | Android Developers)


## proposal.rst

      
Proposal


UTF-8 by default (no py2 <-> py3 distinction).
Make it explicit in the documentation and release information.


KV/Builder now operates on Unicode strings with Universal Newlines
in both py2 and py3.  The reader, which implements this proposal, is
in charge of the translation from Bytes to Unicode.


PEP 263 defines the coding directive.  KV has the same semantics
with some specific additions:

- The directive must be on the first or second line.
- Two syntax styles are accepted:

  - KV style: #:coding <encoding>
  - Python style, i.e. as defined in PEP 263.

- If there's a UTF-8 BOM at the start and there's a directive,
  then the directive must be for UTF-8; otherwise, it is an error.
- For accepted encoding names, see the stdlib docs -> codecs.
- The special encoding directive system-preferred means: open the
  file like before (<= v1.10).
  Mention this in the notes on migrating from affected systems, i.e.
  those whose locale.getpreferredencoding(False) returns something
  other than UTF-8 (primarily Windows).


Implementation note: multibyte encodings have to be treated specially.
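
The rules above could be sketched as a small reader like the one below. This is not Kivy's implementation, only a minimal illustration of the proposal: the function names detect_kv_encoding and read_kv_bytes and the KV_RE pattern are hypothetical, the PEP 263 pattern is the one from the PEP, and the sketch assumes the first two lines are ASCII-compatible, so it does not cover the encodings that the implementation note says need special treatment (e.g. UTF-16).

```python
import codecs
import locale
import re

# Pattern from PEP 263, applied to raw bytes.
PEP263_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
# Hypothetical pattern for the KV style "#:coding <encoding>".
KV_RE = re.compile(br"^[ \t\f]*#:coding[ \t]+([-_.a-zA-Z0-9]+)")


def detect_kv_encoding(data):
    """Return (codec_name, payload_without_bom) for raw .kv bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    if has_bom:
        data = data[len(codecs.BOM_UTF8):]

    declared = None
    # The directive is only honoured on the first or second line.
    for line in data.splitlines()[:2]:
        match = KV_RE.match(line) or PEP263_RE.match(line)
        if match:
            declared = match.group(1).decode("ascii")
            break

    if declared is None:
        return "utf-8", data  # the proposed default

    if declared == "system-preferred":
        # Open it like before (<= v1.10): use the locale's encoding.
        declared = locale.getpreferredencoding(False)

    # Normalise aliases; raises LookupError for unknown encodings.
    name = codecs.lookup(declared).name
    if has_bom and name not in ("utf-8", "utf-8-sig"):
        raise ValueError("UTF-8 BOM conflicts with directive %r" % declared)
    return name, data


def read_kv_bytes(data):
    """Decode raw .kv bytes to text with Universal Newlines."""
    encoding, payload = detect_kv_encoding(data)
    text = payload.decode(encoding)
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(read_kv_bytes(b"#:coding latin-1\r\ntext: '\xe9'\r\n"))
```

A file without BOM or directive is decoded as UTF-8, and a BOM combined with a non-UTF-8 directive raises ValueError, matching the error case described in the list above.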


Context/Rationale/Explanation


Allow both KV style and Python style declaration.
These are only relevant for the reader; for the parser they're
simply comments or a no-op in case of #:coding ....
Reason to accept Python style as well: because Emacs, Vim and others
use these to make the encoding clear.  No need to reinvent the wheel
completely when that can easily be supported.


Why accept it on both first and 2nd line like Python?
Ensures consistency with Python.
Allows for a later addition of a shebang without any changes.


Misc. Notes


TODO: should #:encoding ... be supported, too?
TODO: compile a list with preferred encodings of major platforms,
or at least where this might be a breaking change.
UTF-8 is technically 2 codecs: utf-8 and utf-8-sig, the
latter of which handles the BOM.
Both py2 and py3 have the class io.IncrementalNewlineDecoder which
can be used to deal with Universal Newlines.
See detect_encoding() in tokenize.py of Python 3 for a correct
implementation.
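
The three stdlib facilities named in these notes can be tried out directly. The snippet below is just a quick Python 3 illustration (tokenize.detect_encoding() does not exist in py2); the variable names and sample bytes are made up for the example.

```python
import codecs
import io
import tokenize

# 1. utf-8 vs utf-8-sig: only the latter strips a leading BOM.
raw = codecs.BOM_UTF8 + b"Label:\n"
with_bom = raw.decode("utf-8")         # U+FEFF stays as first character
without_bom = raw.decode("utf-8-sig")  # BOM removed

# 2. Universal Newlines on top of an incremental byte decoder.
#    Second argument True means: translate "\r\n" and "\r" to "\n".
decoder = io.IncrementalNewlineDecoder(
    codecs.getincrementaldecoder("utf-8")(), True
)
# A "\r\n" pair split across two chunks is still folded into one "\n".
text = decoder.decode(b"a\r") + decoder.decode(b"\nb\rc", final=True)

# 3. Python 3's own PEP 263 detection, fed from a readline callable.
encoding, first_lines = tokenize.detect_encoding(
    io.BytesIO(b"# -*- coding: cp1252 -*-\nx = 1\n").readline
)

print(repr(with_bom[0]), repr(without_bom), repr(text), encoding)
```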