
@bobvanluijt
Created September 2, 2019 11:45
List of Unicode lowercase characters in Python 3
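import sys

def print_if_lower(i):
    s = chr(i)
    # chr() always returns a one-character string, never None,
    # so the islower() test is the only check needed
    if s.islower():
        print(s)

# range() excludes its upper bound, so the original range(1, 125251)
# stopped at U+1E942; sys.maxunicode covers every code point
for x in range(1, sys.maxunicode + 1):
    print_if_lower(x)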
@etiennedi

There's an equivalent in Go (https://golang.org/src/unicode/graphic.go?s=2957:2983#L80). I assume the output is mostly the same.
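"Mostly the same" can be checked directly. Go's unicode.IsLower reports membership in the general category Ll, while Python's str.islower() is defined in terms of Unicode case properties, so the two lists are not guaranteed to match exactly. A minimal sketch, using only the standard-library unicodedata module, that lists any code points where the two definitions disagree; the function name lowercase_mismatches is hypothetical:

```python
import sys
import unicodedata


def lowercase_mismatches(limit=sys.maxunicode):
    """Return (code point, islower, is_Ll) for every code point up to `limit`
    (inclusive) where str.islower() and general category Ll disagree."""
    mismatches = []
    for cp in range(limit + 1):
        ch = chr(cp)
        is_lower = ch.islower()
        # Category "Ll" is what Go's unicode.IsLower tests for
        is_ll = unicodedata.category(ch) == "Ll"
        if is_lower != is_ll:
            mismatches.append((cp, is_lower, is_ll))
    return mismatches


if __name__ == "__main__":
    for cp, is_lower, is_ll in lowercase_mismatches():
        name = unicodedata.name(chr(cp), "<unnamed>")
        print(f"U+{cp:04X} {name}: islower={is_lower}, Ll={is_ll}")
```

Whatever this prints is the set of characters for which the Python and Go lists would differ; within ASCII the two definitions agree.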

@bobvanluijt

Aha, cool, thanks @etiennedi. The point was to make the case :-)

@bobvanluijt

@etiennedi, I'm assuming that category L means Lowercase...?

@etiennedi

etiennedi commented Sep 18, 2019

> Aha, cool, thanks @etiennedi. The point was to make the case :-)

Haha, yes. What I mean is: since the implementation has to be in Go anyway, I'll just rely on the Go implementation without comparing it to the Python one.

> @etiennedi, I'm assuming that category L means Lowercase...?

Category L just means "letter"; it is then further subcategorized, for example as:

pLu                // an upper-case letter.
pLl                // a lower-case letter.
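Python exposes the same general categories through unicodedata.category(), which returns a two-letter code: the first letter is the major class (L for letter) and the second is the subclass (u for upper-case, l for lower-case, and so on). A small sketch that tallies the letter subcategories; count_letter_categories is a hypothetical helper name:

```python
import sys
import unicodedata
from collections import Counter


def count_letter_categories():
    """Count code points in each letter subcategory (Lu, Ll, Lt, Lm, Lo)."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        category = unicodedata.category(chr(cp))
        if category.startswith("L"):  # major class L = letter
            counts[category] += 1
    return counts


print(unicodedata.category("A"))  # Lu: upper-case letter
print(unicodedata.category("a"))  # Ll: lower-case letter
print(count_letter_categories())
```

So "L" alone answers "is this a letter?", and only the Ll subcategory corresponds to "lower-case letter".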

I'm experimenting with this right now to identify which characters to split on and which to treat as special characters of a language, for weaviate/contextionary#7.
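Purely as an illustration of the kind of rule being explored (not the contextionary implementation), here is a sketch that splits text into runs of letters and numbers and treats every other code point as a separator. The helper name split_on_non_letters and the choice to keep categories L* and N* are assumptions:

```python
import unicodedata


def split_on_non_letters(text):
    """Split text into tokens of consecutive letters (L*) and numbers (N*);
    any other code point (spaces, punctuation, symbols) ends the current token."""
    tokens = []
    current = []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N"):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(split_on_non_letters("Straße, café & naïve-42!"))
```

Under this rule, combining marks (category M) would also act as separators, so decomposed text (e.g. "e" followed by U+0301) would be split mid-word; that is the kind of language-specific edge case that decides which characters count as "special".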
