@guy4261
Last active April 12, 2023 20:38
Hebrew niqqud unicode point values for Python programmers
# To build a Hebrew letter with niqqud, concatenate:
# Hebrew letter [+ shin/sin dot, only if the letter is shin] [+ optional dagesh] [+ optional niqqud]
# Example: print("ש" + chr(shin_dot_right_shin) + chr(dagesh) + chr(kmz_katan)) => שָּׁ
# The letter must come first. The order of the marks does not affect rendering,
# but strings with different mark orders compare unequal until they are normalized
# (e.g. with unicodedata.normalize):
# print("ש" + chr(kmz_katan) + chr(shin_dot_left_sinn) + chr(dagesh)) => שָּׂ
# Combining marks like these are how the reverse of noël can come out as l̈eon instead of lëon:
# reversing code point by code point leaves the diaeresis after the "l" instead of the "e".
# This is discussed in Edaqa Mortoray's two seminal blog posts:
# https://mortoray.com/we-dont-need-a-string-type/
# https://mortoray.com/the-string-type-is-broken/
# The following post by Joel Spolsky also helped me grok Unicode back in the Python 2.6+ days:
# https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
# niqqud marks
shva = 1456 # שְ
segol_nach = 1457 # שֱ (Unicode name: HATAF SEGOL)
pth_nach = 1458 # שֲ (Unicode name: HATAF PATAH)
kmz_nach = 1459 # שֳ (Unicode name: HATAF QAMATS)
hirik = 1460 # שִ
zere = 1461 # שֵ
segol = 1462 # שֶ
pth = 1463 # שַ
kmz_katan = 1464 # שָ (Unicode name: HEBREW POINT QAMATS, i.e. the regular qamats, despite the _katan name)
holam = 1465 # שֹ
holam_hasser = 1466 # שֺ (Unicode name: HOLAM HASER FOR VAV, intended for use on vav)
kubuz = 1467 # שֻ
dagesh = 1468 # שּ
# 1469 שֽ meteg
# 1470 ־ maqaf
# 1471 שֿ rafe
# 1472 ׀ paseq
shin_dot_right_shin = 1473 # שׁ
shin_dot_left_sinn = 1474 # שׂ
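# 1475 ׃ sof pasuq
# 1476 שׄ upper dot
# 1477 שׅ lower dot
# 1478 ׆ nun hafukha
kmz_gadol = 1479 # שׇ (Unicode name: HEBREW POINT QAMATS QATAN, despite the _gadol name)
# [1480, 1487] - unassigned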
# hebrew letters
aleph = 1488 # א
bet = 1489 # ב
gimel = 1490 # ג
dalet = 1491 # ד
heh = 1492 # ה
vav = 1493 # ו
zayin = 1494 # ז
het = 1495 # ח
tet = 1496 # ט
yod = 1497 # י
kaf_sofit = 1498 # ך
kaf = 1499 # כ
lamed = 1500 # ל
mem_sofit = 1501 # ם
mem = 1502 # מ
noon_sofit = 1503 # ן
noon = 1504 # נ
samech = 1505 # ס
ayin = 1506 # ע
peh_sofit = 1507 # ף
peh = 1508 # פ
zadi_sofit = 1509 # ץ
zadi = 1510 # צ
kof = 1511 # ק
reyish = 1512 # ר
shin = 1513 # ש
tav = 1514 # ת
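# [1515, 1518] - unassigned
# 1519 ׯ yod triangle (added in Unicode 11.0)
# 1520 װ Yiddish double vav ligature
# 1521 ױ Yiddish vav yod ligature
# 1522 ײ Yiddish double yod ligature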
geresh = 1523 # ׳
# 64331 וֹ vav with holam (precomposed presentation form)
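
As a rough illustration of the notes at the top of the gist, the sketch below builds the same shin with its marks in two different orders, shows that the two strings only compare equal after normalization, and reverses "noël" without detaching the diaeresis. The helper names `with_marks` and `reverse_clusters` are made up for this example and are not part of the gist. The cluster logic is a simplification (one base character plus the combining marks that follow it), not full grapheme segmentation.

```python
import unicodedata

# Code points copied from the table above.
SHIN = 1513
DAGESH = 1468
QAMATS = 1464    # called kmz_katan in the table
SHIN_DOT = 1473  # called shin_dot_right_shin in the table


def with_marks(letter, *marks):
    """Return the letter followed by the given combining marks (code points)."""
    return letter + "".join(chr(m) for m in marks)


def reverse_clusters(text):
    """Reverse text while keeping each combining mark attached to its base.

    Simplification: a cluster is one base character plus the combining
    marks that directly follow it.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch   # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new cluster
    return "".join(reversed(clusters))


a = with_marks(chr(SHIN), SHIN_DOT, DAGESH, QAMATS)
b = with_marks(chr(SHIN), QAMATS, DAGESH, SHIN_DOT)

print(a == b)  # False: same glyph, different code point order
print(unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b))  # True

noel = "noe\u0308l"            # "noël" written with a combining diaeresis
print(noel[::-1])              # naive reversal: diaeresis lands on the "l"
print(reverse_clusters(noel))  # diaeresis stays on the "e"
```

Normalizing before comparing or hashing is one way to make mark order irrelevant in code as well as on screen; whether NFD or NFC is the better fit depends on what the rest of the program expects.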
guy4261 commented Apr 12, 2023

Only after posting this did I find this page:
http://www.nashbell.com/technology/he-unicode.php

The things I went through to get to this gist, instead of locating this page and ripping it off, are truly embarrassing 😓
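
For what it's worth, a table like the one in this gist can probably also be generated from Python's standard library instead of being assembled by hand. A minimal sketch, assuming the names shipped with the `unicodedata` module of the running Python version are good enough; `describe_range` is a hypothetical helper, not something from the gist or the linked page:

```python
import unicodedata


def describe_range(first, last):
    """Yield (code point, character, Unicode name) for assigned code points."""
    for cp in range(first, last + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, None)  # None when the code point has no name
        if name is not None:
            yield cp, ch, name


# 1456..1479 is the range of niqqud and related marks listed in the gist.
for cp, ch, name in describe_range(1456, 1479):
    # Show combining marks on a shin so they have a base letter to attach to.
    shown = chr(1513) + ch if unicodedata.combining(ch) else ch
    print(cp, shown, name)
```

The names printed are the official Unicode ones (for example HEBREW POINT SHEVA), so they will not match the variable names used in the gist.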
