@kamahen
Last active December 17, 2023 21:04
Why Japanese Writing is so Weird

  • Peter Ludemann, November 2023

Creative Commons license "BY" - may use, but must attribute the author.

This is an aggregation of what I've learned over the years about the Japanese writing system. You'll have to take my word for many things, or find the references on your own; it would take too long to dig them all up.

Classical Japanese and Middle Chinese are not related (nor is Korean or Vietnamese)

Classical Japanese was an agglutinative language, with a limited number of possible syllables, all of them being either a pure vowel or a consonant plus a pure vowel (no final consonants, no diphthongs). Vowels could be short or long. Most words were polysyllabic and had a pitch accent. Word order was flexible but topic-subject-object-verb was typical, with all parts optional except the verb. Nouns did not inflect, but were usually followed by a particle that was similar to a case marker in other languages. There was no grammatical number or gender, nor articles such as "a" or "the". Adjectives and verbs inflected, with multiple endings (e.g., "iki-taku-na-katta", go-want-not-past, or "[I] did not want to go"). Inflections and sandhi were very regular.1

Middle Chinese was an analytic language, with most words monosyllabic and uninflected, four tone classes, and a syllable structure consisting of initial consonant, glide, main vowel and final consonant, with a large number of initial consonants and a small number of final consonants. Word order was typically subject/topic-verb-object, any of which could be dropped; word order was meaningful and relatively inflexible. There was no grammatical number or gender, nor articles such as "a" or "the". Middle Chinese was not a written language.2

I don't know Korean, but I've been told by Koreans that it is similar to Japanese in many ways, partially from the influence of Chinese (which Korean has had for much longer than Japanese) and partially from the "native" structure of the language (even though linguists haven't been able to show a relationship between the languages, Jared Diamond to the contrary)34.

Literary Chinese

"Literary Chinese" or "Classical Chinese" was the main written language in China until the 20th century; in Vietnam until the 20th century; in Korea until the 15th century; and in Japan in the 8th century (more on this later). In China, it was pronounced in the local "dialect"5; in Korea, Japan, and Vietnam in an approximation of the Middle Chinese pronunciation (and in Japan, it might be the Japanese approximation of the Korean approximation of the Middle Chinese pronunciation).

The role of Literary Chinese in East Asia was similar to the role of Latin in pre-modern Europe. Educated people knew it and could write to each other using it (although using it conversationally was more challenging); and it greatly influenced the vocabulary of educated people.

Literary Chinese is typically written in a very terse style. If you read one of the Greek or Latin histories and compare it with a classical Chinese history such as Sima Qian's [Records of the Grand Historian 史記](https://en.wikipedia.org/wiki/Records_of_the_Grand_Historian), what might be a paragraph or two in Greek or Latin would be barely a sentence in Chinese.

The pronunciation of Literary Chinese has been reconstructed, although many details are disputed; it has many

Similarities between Chinese and Japanese

Although Chinese and Japanese are unrelated, they have some characteristics in common:

In addition, although Japanese verbs and adjectives inflect (and Chinese ones don't), the inflections are very regular.

A brief history of Chinese language influence on Japanese

Although there was some contact between China and Japan dating back to at least the 3rd century, the major transmission of Chinese culture started in the 6th century, when Buddhism was introduced via the Korean kingdom of Baekje.6 With Buddhism came the Buddhist texts and sutras, written in Literary Chinese. Japanese had no writing system, so the Japanese had to simultaneously learn how to read Chinese characters (漢字) and how to understand Literary Chinese.

To do this, the Japanese invented Kanbun, which is a way of marking a Chinese text so that it can be read in Japanese. This required a way of parsing the sentence and indicating the meaning of the words. If such a system had been developed for Latin, the sentence "de gustibus non disputandum est" might be marked up as "de gustibus[4] non[2] disputandum[3] est[1]" and read "it is not disputing about tastes". But imagine further that this were written with kanji: "之 好達 不 議論 也", then marked up as "之 好達[4] 不[2] 議論[3] 也[1]" and read out as "it is not disputing about tastes".7
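
The reordering step in the Latin example can be sketched in code. This is only an illustration, assuming the bracketed-number notation used in the example above (real Kanbun uses its own set of marks); the function name `reading_order` and the gloss table are hypothetical, made up for this sketch.

```python
import re

# Hypothetical word-for-word glosses for the Latin example in the text.
GLOSSES = {
    "est": "it is",
    "non": "not",
    "disputandum": "disputing",
    "de gustibus": "about tastes",
}

def reading_order(marked: str) -> list[str]:
    """Return the marked chunks sorted by their bracketed reading number."""
    # Each chunk is some text followed by "[n]", e.g. "de gustibus[4]".
    chunks = re.findall(r"\s*([^\[\]]+?)\[(\d+)\]", marked)
    ordered = sorted(chunks, key=lambda chunk: int(chunk[1]))
    return [text for text, _ in ordered]

latin = "de gustibus[4] non[2] disputandum[3] est[1]"
words = reading_order(latin)
print(words)  # ['est', 'non', 'disputandum', 'de gustibus']
print(" ".join(GLOSSES[w] for w in words))  # it is not disputing about tastes
```

The same function applied to the kanji version, "之 好達[4] 不[2] 議論[3] 也[1]", yields the chunks in the order 也, 不, 議論, 之 好達. Note that the original text is left untouched; the marks alone carry the reading order.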

One thing that helps Kanbun is that neither Chinese nor Japanese makes any changes inside a word to show grammatical information (unlike English, which has words like "sing", "sang", "sung", "song"), so grammatical information can be kept separate from the original Chinese text.

Besides the mark-up for grammar, there needed to be a way to annotate the Chinese text with Japanese meanings. Chinese already has a way of transcribing words, [音譯](https://en.wikipedia.org/wiki/Transcription_into_Chinese_characters), where Chinese characters are used purely for their phonetic values. It was a simple matter of choosing Chinese characters that sounded similar to Japanese syllables and writing them next to the Chinese words.

Using Chinese characters for their phonetic value, it was then possible to transcribe purely Japanese text. One of the best known examples is the [Man'yōshū 万葉集](https://en.wikipedia.org/wiki/Man%27y%C5%8Dsh%C5%AB), a collection of early poetry. But the influence of Kanbun, and especially the terseness of Literary Chinese, remained. For example, the famous opening of the Tale of Heike, written in Middle Japanese, almost looks like Chinese, and in fact contains two [4-character phrases 四字熟語](https://en.wikipedia.org/wiki/Yojijukugo), derived from Chinese Buddhist texts: 諸行無常, 盛者必衰; the "の" is the possessive (corresponding to Literary Chinese 之 and modern Chinese 的); only 響き, 有り, 顯す are native Japanese words (more on how these are written later).

  祇園精舎の鐘の聲、
  諸行無常の響き有り。
  沙羅雙樹の花の色、
  盛者必衰の理を顯す。

Over time, the phonetic characters became standardized and then simplified. For example, the sound "a" was written 安 and then simplified to あ. Eventually, two sets of such phonetic characters emerged: hiragana and katakana.

Hiragana and katakana are syllabaries, not alphabets. Each syllable is either a single vowel or a consonant+vowel.

Also, over time, the stiff Chinese-style literary style was replaced by a style that was closer to how people talked, with more native Japanese words, as in The Tale of Genji.

From Kanbun to modern Japanese writing

To summarize: using Kanbun, classical Chinese texts can be marked up so that they can be read in a Japanese way. The mark-up is of two kinds:

  • word order and syntactic structures
  • word meanings and pronunciation.

Word order and syntax are fairly straightforward: the sentence is marked with "first", "second", etc. and the Japanese grammatical particles (e.g., は for topic, を for object). Pronunciation annotations can use Chinese characters for their sound value, and eventually their simplified form in hiragana and katakana.

But there isn't always a 1:1 correspondence between a Chinese word and its meaning. For example, the character 生 can mean "to live", "to grow", "raw", "alive", "to give birth", etc. When the character appears in a compound (e.g., 学生 "student"), only the "Chinese" pronunciation needs to be shown ("sei", from Middle Chinese "sraeng"). But when it stands alone, the meaning needs to be shown, and that is done by using the native Japanese word (e.g., "iki-ru", "haya-su", "nama", "u-mu", etc.), where the "iki-", "haya-", "u-" are the root forms of the verbs and the "-ru", "-su", "-mu" are the inflected parts (similar to French "-er", "-re", "-ir", except that the Japanese verbs are regular and the root part doesn't change).

People don't write in Kanbun anymore, so let's take a simple, slightly silly, sentence in modern Japanese:

The student lived to one hundred years of age.

which translates as 学生は百歳まで生きた:

学   生   は 百     歳  まで  生きた。
がく せい は ひゃく さい まで  いきた
gaku-sei wa hyaku- sai made ikita.

You can see the character 生 twice in the sentence — once as part of the word "学生" (gaku-sei: "student") and once as part of the word "生きた" (ikita: "lived"). The "は" ("wa") marks the topic, "まで" ("made") means "until" (it can also be written "迄", but that's a bit pretentious), and "生きた" ("ikita") means "lived" (the root is "生きる" "iki-ru").

There is no longer any sentence markup for order or syntax; the words are put in their normal Japanese order and Japanese particles (は, まで) are used the way they would be in the spoken language. But the use of kanji is a bit complicated.

The character "生" appears twice in this sentence:

  • 学生 is the simplified form of 學生, so its pronunciation derives from Middle Chinese; in this case "生" is pronounced "sei".
  • 生きた is the native Japanese word "ikita" ("lived"), whose root is 生きる.

How do we know whether to use the pronunciation "sei" vs "iki-"? To understand the problem, suppose English was written with Chinese characters and you encountered the word "上見" - should you pronounce this "over-see" (from the "native" pronunciation) or "super-vise" (from the Latin pronunciation)?

The general rule (with many exceptions) is: if a kanji appears by itself, use the "native" pronunciation; if it appears in a non-productive compound, then use the "Chinese" pronunciation. The word "gaku-sei" is a compound from "study" and "life"; the combination "study-life" doesn't obviously mean "a person who is studying at a school or college", so it's non-productive and should use the Chinese pronunciation. But that raises the question: which Chinese pronunciation?
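
The general rule can be sketched as a crude heuristic. This is a simplification under stated assumptions, not a real reading selector: it treats any neighbouring kanji as evidence of a compound, it cannot tell productive from non-productive compounds, and it only recognizes kanji in the basic CJK Unified Ideographs block. The function names are hypothetical.

```python
def is_kanji(ch: str) -> bool:
    """True if ch is in the basic CJK Unified Ideographs block (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff"

def guess_reading_kind(text: str, i: int) -> str:
    """Guess whether the kanji at text[i] takes a "chinese" or "native" reading."""
    if not is_kanji(text[i]):
        raise ValueError(f"{text[i]!r} is not a kanji")
    has_kanji_before = i > 0 and is_kanji(text[i - 1])
    has_kanji_after = i + 1 < len(text) and is_kanji(text[i + 1])
    # A neighbouring kanji suggests a compound, hence the Chinese pronunciation;
    # a kanji standing alone (or followed by kana) suggests the native one.
    return "chinese" if has_kanji_before or has_kanji_after else "native"

sentence = "学生は百歳まで生きた"
for i, ch in enumerate(sentence):
    if ch == "生":
        print(i, guess_reading_kind(sentence, i))
# 1 chinese
# 7 native
```

For the example sentence this gives the right answer for both occurrences of 生 ("sei" in 学生, "iki-" in 生きた), but the many exceptions to the rule mean a real system needs a dictionary rather than a positional guess.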

Three kinds of Chinese pronunciation + Japanese pronunciation

Pronunciation changes over the years, and also varies by region. The word 学生 was pronounced /*m-kˤruk sreŋ/ or /*ɡruːɡ sʰleːŋ/ in Confucius' time, /*haewk sraeng/ in Middle Chinese, and xuéshēng in modern Mandarin (hok6 saang1 in Cantonese, 8ghoq-san in Wu, etc.).

Chinese words entered Japan in three major waves:

  • With Buddhist and Confucian teachings in the 5th and 6th centuries (Go-on 呉音), based on the then-prestigious Jiankang (now Nanjing) dialect.
  • During trade with the Tang Dynasty (Kan-on 漢音), based on the Chang'an pronunciation.
  • During trade and monastic missions during and after the Song Dynasty (Tō-on 唐音).

Of these, Kan-on pronunciations are the most common, often replacing the earlier Go-on. Tō-on were introduced piecemeal over a long period and often are somewhat specialized, such as those used by the Ōbaku Zen school of Buddhism. Go-on are often used for religious meanings, with other meanings using Kan-on.

So, when we see 生, there are three Chinese-derived pronunciations:

  • Go-on: しょう (shō)
  • Kan-on: せい (sei)
  • Tō-on: さん (san) — this is rare

These Chinese-derived readings are called "on-yomi"; the native Japanese readings (such as "iki-ru" or "nama") are called "kun-yomi" ("yomi" is the noun form of "to read").
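
One way to picture the readings attached to a single character is as a small lookup table. The sketch below is illustrative only: the structure and the names `READINGS` and `on_yomi` are assumptions, and the entries are just the readings of 生 mentioned in this article, not a complete list.

```python
# Readings of 生 as romanized in this article; not an exhaustive list.
READINGS = {
    "生": {
        # Chinese-derived readings, keyed by the wave in which they arrived.
        "on": {"Go-on": "shō", "Kan-on": "sei", "Tō-on": "san"},
        # Native Japanese readings.
        "kun": ["iki-ru", "haya-su", "nama", "u-mu"],
    },
}

def on_yomi(kanji: str, layer: str = "Kan-on") -> str:
    """Look up one Chinese-derived reading; Kan-on is the default as the most common layer."""
    return READINGS[kanji]["on"][layer]

print(on_yomi("生"))           # sei
print(on_yomi("生", "Go-on"))  # shō
print(READINGS["生"]["kun"])
```

Defaulting to Kan-on mirrors the observation that Kan-on readings are the most common; which layer a given word actually uses still has to be learned word by word.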

Some of the "kun-yomi" reflect the Japanese etymology rather than the Chinese. For example, 承る "uketamawaru" ("to humbly receive") is made up of "uke(ru)" + "tamawaru" - "to receive" + "to be granted" and could be written 受け賜る.

Jukujikun (熟字訓)

Jukujikun is an inseparable reading of a multi-kanji term that has no relationship with the characters' individual readings, e.g. 今日 (kyō きょう vs konnichi こんにち); or 黄昏. For the latter (which means "twilight"), there is a Chinese word, which would have been pronounced hwang xwon in Middle Chinese and therefore kō-kon in Japanese; but instead the native word tasogare is used. In a sense, this is a gloss on the entire word instead of a guide to its pronunciation.

And there are some "idiomatic" ones, which are more common in pre-WW2 writing than nowadays. For example, 一寸, which literally means "one inch" (and, with that meaning, can be read as "issun"), but which also can be read "chotto", meaning "somewhat" or "excuse me".

Post-WW2 reforms

By the 1900s, the spoken language had drifted quite a bit from the written language. For example, the word that is now written でしょう deshō, meaning "it seems", was written でせう deseu. After WW2, the use of hiragana and katakana was modernized to reflect current pronunciations, with a few exceptions: for example, the topic marker は is pronounced "wa" and not "ha", and the object marker (pronounced "o") is written を ("wo"). Various kanji were simplified (but in a more conservative manner than the Chinese simplification in the 1960s) and the list of standard kanji was reduced to 1,850 (later 2,136), plus another 863 characters that are allowed for names.

Names

But we're not finished. People sometimes choose unusual names, so for example the character for "one" ("一") might be pronounced "hajime" (which means "beginning" and is usually written "初め"). The only reliable way to know how to pronounce a name is to ask the person or - for a place - someone who lives there. But even then there can be differences; for example, the place 上住吉 can be pronounced either "kami-sumi-yoshi" or "ue-sumi-yoshi".

Meaning shift

It's tempting to think of kanji as having immutable meanings. But, just like any other words, their meanings can shift over the years. For example, 湯: in Japanese, this means "hot water" but in modern Chinese, it means "soup".

Korea and Vietnam

For writing "native" words, Korea developed Hangul and Vietnam developed Chữ Nôm.

Hangul is similar to hiragana/katakana in that it can be mixed with Chinese characters, with the phonetic characters used for native words and grammar particles (however, unlike Japanese, there is typically only a single pronunciation for a Chinese character, derived from Chinese). Recently, Hangul has almost completely replaced the use of Chinese characters; this has been helped by the introduction of spaces into text.

Chữ Nôm invented new characters for native vocabulary. It has been replaced by the Vietnamese alphabet.

Footnotes

  1. https://en.wikipedia.org/wiki/Japanese_language

  2. https://en.wikipedia.org/wiki/Chinese_language https://en.wikipedia.org/wiki/Middle_Chinese https://en.wikipedia.org/wiki/Classical_Chinese

  3. Diamond theorizes that Japanese is related to a Koreanic language from about 2900 years ago that died out about 1500 years ago when the kingdom of Silla unified the peninsula.

  4. See also Wikipedia: Classification of the Japonic languages.

  5. It's common to describe Shanghainese (Wu), Cantonese, Hakka, etc. as "dialects" but they are as distant from each other as French, Spanish, and Romanian (or German, Swedish, and English).

  6. There was extensive contact between the southern Korean peninsula's Gaya confederacy and the Japanese island of Kyushu for much longer, but with little written or archeological evidence.

  7. Due to the structure of Chinese and English (both of which are classified as analytic languages), it would make more sense to show this as marking up an English sentence for interpretation in Latin, whose neutral word order is similar to Japanese subject-object-verb.
