@sivel
Last active October 4, 2019 15:46
# -*- coding: utf-8 -*-
# Copyright (c) 2019 Ansible Project
# GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)

# Make coding more python3-ish
from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

import ctypes.util
import locale

from six import text_type

# Load the C library and declare the prototypes of the two width functions.
libc_path = ctypes.util.find_library('c')
libc = ctypes.cdll.LoadLibrary(libc_path)
libc.wcwidth.argtypes = (ctypes.c_wchar,)
libc.wcwidth.restype = ctypes.c_int
# The second argument of wcswidth is a size_t, not an int.
libc.wcswidth.argtypes = (ctypes.c_wchar_p, ctypes.c_size_t)
libc.wcswidth.restype = ctypes.c_int

# wcwidth/wcswidth depend on LC_CTYPE, so adopt the locale from the environment.
locale.setlocale(locale.LC_ALL, '')


def width(u_text):
    """Return the display width of ``u_text`` in terminal columns.

    This function is slower than calling ``libc.wcswidth`` once on the
    full string, so prefer that where possible. A helper may still be
    useful for the ``isinstance`` check.
    """
    if not isinstance(u_text, text_type):
        raise ValueError('Value must be text type')

    length = 0
    for c in u_text:
        char_width = libc.wcwidth(c)
        if char_width < 0:
            # wcwidth returns -1 for non-printable characters
            raise ValueError('Non-printable character: %r' % c)
        length += char_width
    return length


if __name__ == '__main__':
    print(libc.wcswidth(u'コンニチハ', 1024))
    columns = width(u'コンニチハ')
    print(u'コンニチハ')
    print(columns * '-')
@abadger

abadger commented Oct 3, 2019

Wild ass guess is that the reason this doesn't work on python2 is that python isn't translating the unicode string into wchar_t correctly. Maybe comparing ctypes code for python-3.0 vs python-2.7 will show if that's correct.

@abadger

abadger commented Oct 3, 2019

This gets it working on Python2:

--- print_wcwidth.py.orig       2019-10-03 15:56:34.498229625 -0700
+++ print_wcwidth.py    2019-10-03 15:57:17.929191395 -0700
@@ -9,6 +9,9 @@
 from six import text_type
 
 import ctypes.util
+import locale
+
+locale.setlocale(locale.LC_ALL, ('en_US', 'UTF-8'))
 libc_path = ctypes.util.find_library('c')
 libc = ctypes.cdll.LoadLibrary(libc_path)
 libc.wcwidth.argtypes = (ctypes.c_wchar,)

Unfortunately we can't just go setting locale in our code. But perhaps it gives us a hint as to how we can fix it.

@abadger

abadger commented Oct 3, 2019

Okay, I think this works and we can use it:

`locale.setlocale(locale.LC_ALL, '')`

@sivel
Author

sivel commented Oct 4, 2019

Perf comparison between the custom width function, and just using libc.wcswidth directly. wcswidth is much faster, which is to be expected.

ansibledev ▶ In [2]: %timeit print_width.width(u'コンニチハ')
3.91 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

ansibledev ▶ In [3]: %timeit print_width.libc.wcswidth(u'コンニチハ', 1024)
963 ns ± 95.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
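
For anyone without IPython, a rough equivalent of the comparison can be sketched with the standard `timeit` module. This is only an illustration: `per_char_width` and `whole_string_width` are hypothetical names, the sample is plain ASCII so the result does not depend on the active locale, and the code assumes Python 3 on a platform where `ctypes.util.find_library('c')` resolves. Absolute numbers will differ from the ones above.

```python
import ctypes
import ctypes.util
import timeit

# Load libc and declare the prototypes (same approach as the gist).
libc = ctypes.CDLL(ctypes.util.find_library('c'))
libc.wcwidth.argtypes = (ctypes.c_wchar,)
libc.wcwidth.restype = ctypes.c_int
libc.wcswidth.argtypes = (ctypes.c_wchar_p, ctypes.c_size_t)
libc.wcswidth.restype = ctypes.c_int


def per_char_width(text):
    # One foreign-function call per character (hypothetical helper).
    return sum(libc.wcwidth(c) for c in text)


def whole_string_width(text):
    # A single foreign-function call for the whole string (hypothetical helper).
    return libc.wcswidth(text, len(text))


sample = 'hello world'  # ASCII only, so every character is one column wide
loops = 20000

t_char = timeit.timeit(lambda: per_char_width(sample), number=loops)
t_str = timeit.timeit(lambda: whole_string_width(sample), number=loops)

print('per character: %.3f us per call' % (t_char / loops * 1e6))
print('whole string:  %.3f us per call' % (t_str / loops * 1e6))
```

Most of the per-character cost is likely the repeated crossing from Python into C, which is why a single `wcswidth` call tends to win.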

@abadger

abadger commented Oct 4, 2019

From the wcswidth man page: "If a nonprintable wide character occurs among these characters, -1 is returned." That might be part of the difference with kitchen.text.display.textual_width. Maybe we should have a width() function as a front end but it first tries to run wcswidth() on the string, then, if -1 is returned, have some slower code that steps through each character to determine what the width is.
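
A minimal sketch of that front end might look like the following. It is not tested against `kitchen.text.display.textual_width`, and it makes two assumptions that are not settled in this thread: non-printable characters are counted as zero columns in the slow path, and the code targets Python 3 only (it checks against `str` rather than `six.text_type`). The `try`/`except` around `setlocale` is also an addition, so the module still imports when the environment names a locale that is not installed.

```python
import ctypes
import ctypes.util
import locale

# Load libc and declare the prototypes (same approach as the gist).
libc = ctypes.CDLL(ctypes.util.find_library('c'))
libc.wcwidth.argtypes = (ctypes.c_wchar,)
libc.wcwidth.restype = ctypes.c_int
libc.wcswidth.argtypes = (ctypes.c_wchar_p, ctypes.c_size_t)
libc.wcswidth.restype = ctypes.c_int

try:
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    # The environment names a locale that is not installed; keep the default.
    pass


def width(u_text):
    """Display width of u_text: wcswidth first, per-character fallback."""
    if not isinstance(u_text, str):
        raise ValueError('Value must be text type')

    # Fast path: one call for the whole string.
    total = libc.wcswidth(u_text, len(u_text))
    if total >= 0:
        return total

    # Slow path, reached when wcswidth reports a non-printable character.
    # ASSUMPTION: such characters occupy zero columns.
    total = 0
    for c in u_text:
        total += max(libc.wcwidth(c), 0)
    return total
```

With this, `width('hello')` takes the fast path, while a string such as `'a\nb'` falls through to the loop because the newline is non-printable. Whether zero is the right width for those characters depends on how the output is rendered, so that policy would need to be decided before using something like this.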
