bobvanluijt/utf8-lowercase.py

Created September 2, 2019 11:45


List of UTF-8 lowercase characters in Python 3

utf8-lowercase.py

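	def printIfLower(i):
	    # Print the character at code point i if Python considers it lowercase.
	    s = chr(i)
	    # chr() always returns a str, so no separate None check is needed.
	    if s.islower():
	        print(s)

	# range() excludes its end value, so code point 125251 (U+1E943) itself is not checked.
	for x in range(1, 125251):
	    printIfLower(x)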

etiennedi commented Sep 18, 2019

There's an equivalent in Go (https://golang.org/src/unicode/graphic.go?s=2957:2983#L80). I assume the output is mostly the same.
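One way to check that assumption from the Python side is to compare `str.islower()` against the Unicode general category `Ll`. This is only a sketch: it assumes that Go's `unicode.IsLower` corresponds to category `Ll`, which is not confirmed in this thread, and the function names `lowercase_by_islower` and `lowercase_by_category` are made up for illustration. The counts depend on the Unicode version bundled with the Python interpreter, so none are quoted here.

```python
import sys
import unicodedata


def lowercase_by_islower():
    """Code points whose character str.islower() reports as lowercase."""
    return {i for i in range(sys.maxunicode + 1) if chr(i).islower()}


def lowercase_by_category():
    """Code points whose general category is Ll (lower-case letter)."""
    return {
        i
        for i in range(sys.maxunicode + 1)
        if unicodedata.category(chr(i)) == "Ll"
    }


by_islower = lowercase_by_islower()
by_category = lowercase_by_category()

# Compare the two definitions of "lowercase".
print("islower():", len(by_islower))
print("category Ll:", len(by_category))
print("only in islower():", len(by_islower - by_category))
print("only in category Ll:", len(by_category - by_islower))
```

If the "only in" counts are non-zero, the two definitions disagree on those code points, and each one can be inspected with `unicodedata.name(chr(i), "?")`.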

Author

bobvanluijt commented Sep 18, 2019

Aha, cool, thanks @etiennedi. The point was to make the case :-)

Author

bobvanluijt commented Sep 18, 2019

@etiennedi, I'm assuming that category L means Lowercase...?

etiennedi commented Sep 18, 2019 (edited)

> Aha, cool, thanks @etiennedi. The point was to make the case :-)

Haha, yes. What I mean is: since the implementation has to be in Go anyway, I'll just rely on the Go implementation without comparing it to the Python one.

> @etiennedi, I'm assuming that category L means Lowercase...?

Category L just means letter; it is then further categorized as:

pLu                // an upper-case letter.
pLl                // a lower-case letter.
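The same split can be seen from Python with the standard `unicodedata` module, where the general category is a two-letter code whose first letter is the major class (`L` for letter) and whose second letter is the subclass (`u` for upper case, `l` for lower case, and so on). A minimal sketch; the helper name `letter_subcategory` is hypothetical:

```python
import unicodedata


def letter_subcategory(ch):
    """Return the two-letter general category if ch is a letter, else None."""
    cat = unicodedata.category(ch)
    return cat if cat.startswith("L") else None


# A lower-case letter, an upper-case letter, a title-case letter,
# a letter with no case, and a digit (not a letter at all).
for ch in ["a", "A", "\u01c5", "\u4e2d", "1"]:
    print(repr(ch), letter_subcategory(ch))
```

So "category L" alone does not separate upper case from lower case; a lowercase check needs the full `Ll` code.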

I'm experimenting with this as we speak, to identify which characters to split on and which to treat as special characters of a language, for weaviate/contextionary#7.
