@adrianyorke
Last active October 25, 2021 22:50
Unicode character testing for Nordic region
"""Unicode test characters for the Nordic region."""
# Even in 2021, Unicode is still not used everywhere, so we must test our entire processing
# chain with uncommon or exceptional characters.
# Within the Nordic region, we must test for common letters that are used in the various
# languages of the region, which can also be found on our country-specific keyboards.
# Note: It is common for those with Russian heritage to live in Nordic countries, especially Finland.
# Our technology stack and tools must also handle these additional letters not found in the default code page.
NORDIC_SPECIAL_CHARS = [
    "ų",
    "ī",
    "ū",
    "ą",
    "ę",
    "į",
    "ū́",
    "Ż",
    "š",
    "č",
    "ẽ",
    "ä",
    "Ä",
    "ö",
    "Ö",
    "å",
    "Å",
]
for c in NORDIC_SPECIAL_CHARS:
    print(c)
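Printing the characters only exercises the terminal. As a minimal sketch of how such a list could be checked against a legacy code page, the hypothetical helper `unencodable` below (not part of the original gist) reports which characters cannot be encoded in a given encoding. It assumes the characters should be compared in NFC form, since an entry built from a combining mark is more than one code point.

```python
import unicodedata

# Hypothetical sample: a subset of the characters listed above.
SAMPLE_CHARS = ["ą", "š", "Ż", "ä", "Ö", "å"]


def unencodable(chars, encoding):
    """Return the NFC-normalized characters that `encoding` cannot represent."""
    missing = []
    for ch in chars:
        # Normalize so that precomposed and decomposed input behave the same.
        ch = unicodedata.normalize("NFC", ch)
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


for enc in ("latin-1", "cp1252", "utf-8"):
    print(enc, unencodable(SAMPLE_CHARS, enc))
```

In this sample the Finnish and Swedish letters fit in Latin-1, while "ą", "š" and "Ż" do not. Windows-1252 additionally covers "š", which is one reason the two code pages should be tested separately.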
@adrianyorke
Author

Here's a Python statement that demonstrates what happens when Nordic letters encoded as UTF-8 are decoded as Latin-1:
print('öäå'.encode('utf-8').decode('latin-1'))

Expected output:
Ã¶Ã¤Ã¥
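
The same mismatch can usually be reversed, provided the garbled text has not been modified or truncated in between. The sketch below is an illustration rather than a general repair tool: it re-encodes the garbled string as Latin-1 to recover the original bytes, then decodes them as UTF-8. The variable names are chosen for this example, and the letters are written as escapes so the snippet does not depend on the encoding of the source file.

```python
original = "\u00f6\u00e4\u00e5"  # the letters ö, ä, å

# Simulate the fault: UTF-8 bytes interpreted as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")

# Undo it: Latin-1 maps every byte to exactly one code point,
# so encoding back to Latin-1 restores the original UTF-8 bytes.
repaired = garbled.encode("latin-1").decode("utf-8")

print(len(original), len(garbled))  # each letter became two characters
print(repaired == original)
```

This works because Latin-1 assigns a character to all 256 byte values. If the wrong decoder was Windows-1252 instead, a few byte values have no mapping and the round trip may fail.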
