
@rschiang
Last active October 28, 2016 19:45
Ruby text syntax support proposal in CommonMark

Proper ruby text (<rb>) syntax support in Markdown

Originally posted on the CommonMark discussion board, 2016/10/29.

It is fairly common for text in East Asian languages (mostly those written with CJK characters) to carry ruby text annotations. Not only do they provide phonetic guides; the actual meaning of the text might even differ without them.

This technique is currently implemented in HTML as a set of <ruby> tags, as demonstrated below.

ㄔㄡˊㄔㄨˊ

<ruby><rp></rp><rt>ㄔㄡˊ</rt><rp></rp><rp></rp><rt>ㄔㄨˊ</rt><rp></rp></ruby>

両人ふたり

<ruby>両人<rp></rp><rt>ふたり</rt><rp></rp></ruby>
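As a rough illustration of the markup above, here is a minimal Python sketch that builds a `<ruby>` element from a base text and its reading. The function name `ruby_html` is hypothetical, and using ASCII parentheses as the `<rp>` fallback content is an assumption (it is the common convention, not something this proposal mandates).

```python
from html import escape


def ruby_html(base: str, reading: str) -> str:
    """Wrap a base text and its reading in a <ruby> element.

    Assumption: the <rp> fallbacks contain ASCII parentheses, so that
    browsers without ruby support show "base(reading)" instead.
    """
    return (
        f"<ruby>{escape(base)}"
        f"<rp>(</rp><rt>{escape(reading)}</rt><rp>)</rp>"
        "</ruby>"
    )


print(ruby_html("両人", "ふたり"))
```

Escaping both arguments keeps stray `<` or `&` characters in the annotation from breaking the surrounding HTML.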

In the wild, a few Markdown extensions support generating ruby text, but their syntaxes are not consistent with one another.

The Python furigana_markdown package suggests the following syntax:

[図](-と)[書](-しょ)[館](-かん)

The Node.js showdown-kanji package goes a different way, but does not automatically generate <rp> fallback tags:

{漢}(かん){字}(じ)

The PHP parsedown-rubytext extension offers several ways to add annotations. Either inline:

[図書館]^(としょかん)
[図書館]^（としょかん） // Full-width parentheses
[図書館]（としょかん）  // Full-width parentheses

Or by defining document-wide ruby text annotations:

**[図書館]: としょかん

It even allows consecutive ruby texts to be merged into a single annotation:

 [図書館]^(と しょ かん)  <!-- Will generate three <ruby> tags -->
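One way to picture the merged form is the sketch below, which pairs each base character with one space-separated reading. This is not parsedown-rubytext's actual implementation; the function name `split_ruby`, the ASCII parentheses in `<rp>`, and the fallback to a single `<ruby>` element when the counts do not match are all assumptions made for illustration.

```python
from html import escape


def split_ruby(base: str, readings: str) -> str:
    """Pair each base character with one space-separated reading.

    Assumption: if the number of readings does not match the number of
    base characters, emit one <ruby> covering the whole base text.
    """
    parts = readings.split()
    if len(parts) == len(base):
        pairs = list(zip(base, parts))
    else:
        pairs = [(base, "".join(parts))]
    return "".join(
        f"<ruby>{escape(b)}<rp>(</rp><rt>{escape(r)}</rt><rp>)</rp></ruby>"
        for b, r in pairs
    )


html = split_ruby("図書館", "と しょ かん")
print(html.count("<ruby>"))  # 3
```

Note that `len(base)` counts Unicode code points, so base text containing combining sequences would need a more careful segmentation than this sketch provides.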

TL;DR: How could this syntax be properly implemented in the CommonMark spec?

The full-width parenthesis case might not fit well in a language-independent spec, but the [base_text]^(ruby_text) syntax might be worth a try.
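To make the `[base_text]^(ruby_text)` idea concrete, here is a small regex-based sketch in Python. It is only an approximation of how an inline rule might behave, not a CommonMark-conformant parser: the pattern `RUBY_PATTERN` and the function `render_ruby` are hypothetical, nested brackets and backslash escapes are not handled, and the ASCII parentheses in `<rp>` are an assumption.

```python
import re
from html import escape

# Matches [base]^(reading); brackets and parentheses may not nest.
RUBY_PATTERN = re.compile(r"\[([^\[\]]+)\]\^\(([^()]+)\)")


def render_ruby(text: str) -> str:
    """Replace every [base]^(reading) span with a <ruby> element."""

    def repl(match: re.Match) -> str:
        base, reading = match.group(1), match.group(2)
        return (
            f"<ruby>{escape(base)}"
            f"<rp>(</rp><rt>{escape(reading)}</rt><rp>)</rp>"
            "</ruby>"
        )

    return RUBY_PATTERN.sub(repl, text)


print(render_ruby("[図書館]^(としょかん)へ行く"))
```

Because the caret is required between `]` and `(`, an ordinary link such as `[text](url)` is left untouched by this pattern, which is the main reason this form collides less with existing Markdown than the bare full-width variant.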
