
@nega0
Last active June 3, 2022 02:55
Display the list of Unicode dashes/hyphens
#!/usr/bin/env python3
# -*- coding: utf-8; -*-
# table sourced from http://jkorpela.fi/dashes.html
## - U+002D - hyphen-minus the Ascii hyphen, with multiple usage, or “ambiguous semantic value”; the width should be “average”
## ~ U+007E ~ tilde the Ascii tilde, with multiple usage; “swung dash”
## ­ U+00AD ­ soft hyphen “discretionary hyphen”
## ֊ U+058A ֊ armenian hyphen as soft hyphen, but different in shape
## ־ U+05BE ־ hebrew punctuation maqaf word hyphen in Hebrew
## ᐀ U+1400 ᐀ canadian syllabics hyphen used in Canadian Aboriginal Syllabics
## ᠆ U+1806 ᠆ mongolian todo soft hyphen as soft hyphen, but displayed at the beginning of the second line
## ‐ U+2010 ‐ hyphen unambiguously a hyphen character, as in “left-to-right”; narrow width
## ‑ U+2011 ‑ non-breaking hyphen as hyphen (U+2010), but not an allowed line break point
## ‒ U+2012 ‒ figure dash as hyphen-minus, but has the same width as digits
## – U+2013 – en dash used e.g. to indicate a range of values
## — U+2014 — em dash used e.g. to make a break in the flow of a sentence
## ― U+2015 ― horizontal bar used to introduce quoted text in some typographic styles; “quotation dash”; often (e.g., in the representative glyph in the Unicode standard) longer than em dash
## ⁓ U+2053 ⁓ swung dash like a large tilde
## ⁻ U+207B ⁻ superscript minus a compatibility character which is equivalent to minus sign U+2212 in superscript style
## ₋ U+208B ₋ subscript minus a compatibility character which is equivalent to minus sign U+2212 in subscript style
## − U+2212 − minus sign an arithmetic operator; the glyph may look the same as the glyph for a hyphen-minus, or may be longer
## ⸗ U+2E17 ⸗ double oblique hyphen used in ancient Near-Eastern linguistics; not in Fraktur, but the glyph of Ascii hyphen or hyphen is similar to this character in Fraktur fonts
## ⸺ U+2E3A ⸺ two-em dash omission dash, 2 em units wide
## ⸻ U+2E3B ⸻ three-em dash used in bibliographies, 3 em units wide
## 〜 U+301C 〜 wave dash a Chinese/Japanese/Korean character
## 〰 U+3030 〰 wavy dash a Chinese/Japanese/Korean character
## ゠ U+30A0 ゠ katakana-hiragana double hyphen in Japanese kana writing
## ︱ U+FE31 ︱ presentation form for vertical em dash vertical variant of em dash
## ︲ U+FE32 ︲ presentation form for vertical en dash vertical variant of en dash
## ﹘ U+FE58 ﹘ small em dash small variant of em dash
## ﹣ U+FE63 ﹣ small hyphen-minus small variant of Ascii hyphen
## － U+FF0D － fullwidth hyphen-minus variant of Ascii hyphen for use with CJK characters
# for i in "\u002D" "\u007E" "\u00AD" "\u058A" "\u05BE" "\u1400" "\u1806" "\u2010" "\u2011" "\u2012" "\u2013" "\u2014" "\u2015" "\u2053" "\u207B" "\u208B" "\u2212" "\u2E17" "\u2E3A" "\u2E3B" "\u301C" "\u3030" "\u30A0" "\uFE31" "\uFE32" "\uFE58" "\uFE63" "\uFF0D":
#     print(i * 72)
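import re
import unicodedata

# Matches the table rows above ("## <char> U+XXXX ..."); only the code point is
# used, so rows parse whether their columns are separated by tabs or spaces
row = re.compile(r'^## .*?U\+([0-9A-Fa-f]{4,6})')

# Open this script's own source via __file__ (not its basename), so it also works
# when run from another directory; the table holds non-ASCII text, hence utf-8
with open(__file__, mode='r', encoding='utf-8') as f:
    for line in f:
        m = row.match(line)
        if m:
            # Rebuild the character from its code point: invisible or look-alike
            # characters in the first column (e.g. U+00AD) may not survive copying
            ch = chr(int(m.group(1), 16))
            name = unicodedata.name(ch).lower()
            print("{:40} {}".format(name + ':', ch * 18))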
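The list above is hand-curated rather than derived from one Unicode property. Only some entries have the general category Pd (dash punctuation). Others do not: the minus sign U+2212 and the tilde U+007E are math symbols (Sm), and the soft hyphen U+00AD is a format character (Cf). The following is a minimal sketch, using only Python's standard `unicodedata` module, that groups the table's code points by category. The `CODE_POINTS` list and the `group_by_category` helper are hypothetical names for illustration and are not part of the original gist. The list is assumed to be copied by hand from the table.

```python
import unicodedata

# Hypothetical: the 28 code points from the "## " table, copied by hand
CODE_POINTS = [
    0x002D, 0x007E, 0x00AD, 0x058A, 0x05BE, 0x1400, 0x1806,
    0x2010, 0x2011, 0x2012, 0x2013, 0x2014, 0x2015, 0x2053,
    0x207B, 0x208B, 0x2212, 0x2E17, 0x2E3A, 0x2E3B, 0x301C,
    0x3030, 0x30A0, 0xFE31, 0xFE32, 0xFE58, 0xFE63, 0xFF0D,
]


def group_by_category(code_points):
    """Map each Unicode general category (e.g. 'Pd', 'Sm') to its members."""
    groups = {}
    for cp in code_points:
        ch = chr(cp)
        label = "U+{:04X} {}".format(cp, unicodedata.name(ch).lower())
        groups.setdefault(unicodedata.category(ch), []).append(label)
    return groups


if __name__ == "__main__":
    # Print each category code followed by the characters that belong to it
    for category, members in sorted(group_by_category(CODE_POINTS).items()):
        print(category)
        for member in members:
            print("   ", member)
```

Grouping by category shows that matching on Pd alone would miss part of this list, such as the minus signs and the soft hyphen.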