The Rarest Mandarin Syllables

Published at 15:34:55-0800 Updated
Tags: chinese, zh, linguistics, research

Introduction

From September 2019 through October 2020, I catalogued the rarest syllables of Modern Standard Mandarin1. Occasional note-taking turned into a systematic search, and I believe this list is close to complete. To the best of my knowledge I am the first to compile much of this information -- but perhaps others have done so, somewhere beyond the reach of my search-fu.

During this investigation, I was specifically concerned with syllables which are rare regardless of which tone they are pronounced in. This meant that even certain uncommon tone-syllable pairings were not rare enough for me; for instance, only two characters are pronounced qiong in fourth tone, but qiong was not eligible for my list because there are many other characters pronounced qiong across the other three tones.
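The criterion above can be sketched in a few lines of Python. This is only an illustration of the idea, not the process I actually followed: the `READINGS` list is a tiny hand-picked sample, and the function names are made up for this example. Tones are written as trailing digits and stripped before counting, so a character counts toward its syllable no matter which tone it carries.

```python
from collections import defaultdict

# Tiny hand-picked sample of (character, numbered-pinyin reading) pairs.
# A real survey would need a full dictionary; this list is illustrative only.
READINGS = [
    ("谁", "shei2"),
    ("谁", "shui2"),
    ("水", "shui3"),
    ("给", "gei3"),
    ("给", "ji3"),
    ("几", "ji3"),
    ("鸡", "ji1"),
    ("穷", "qiong2"),
    ("琼", "qiong2"),
]


def strip_tone(reading):
    """Drop the trailing tone digit (1-5) from a numbered-pinyin reading."""
    return reading.rstrip("12345")


def characters_per_syllable(readings):
    """Map each toneless syllable to the set of characters read that way."""
    index = defaultdict(set)
    for char, reading in readings:
        index[strip_tone(reading)].add(char)
    return dict(index)


def rare_syllables(readings, max_chars=1):
    """Return the toneless syllables backed by at most max_chars characters."""
    index = characters_per_syllable(readings)
    return sorted(s for s, chars in index.items() if len(chars) <= max_chars)


if __name__ == "__main__":
    print(rare_syllables(READINGS))
```

Within this sample, only shei and gei come out as single-character syllables. The syllable ji does not, because its three characters are pooled together even though they are spread over two tones.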

If you know a thing or two about Chinese, I urge you to read the disclaimer and leave comments, especially if you spot an error. The best way to learn something is to be wrong about it on the Internet.

One Character

Eight syllables are associated with a single character.

Unique Syllables in Everyday Use

Three of the eight are everyday words2 in heavy use. Shei is by far the best known of these: the only word pronounced shei is the question word "who." The word "who" is as common in Mandarin as it is in English, yet you'll never hear that syllable anywhere else.

The next everyday syllable with just one character is lia. Its one and only written counterpart is 俩 "two, both", a colloquial shortening of 两个 "two MW3." They can be used interchangeably in phrases like "you two" (“你们俩/你们两个”).

Finally, there is gei, which in third tone is the verb 给 "to give," as well as, roughly speaking, an indirect object marker and a passive marker.

MDBG claims that 假 is pronounced gei to transcribe a single word of Hokkien. I originally took this to mean that gei should be in the two-character category, but on 2020-11-10 I reconsidered, and moved it here. A single Hokkien transcription is far from a central example of Standard Mandarin usage, with which this essay is concerned. My macOS input method suggests that 䢎 is also pronounced gei, but as this character is so rare that even Zdic doesn't list a pronunciation, I'm willing to discard it for the time being. Eventually I will check elsewhere.

[Boy, I wish I didn't have to keep manually checking dictionary sites. If only someone was building a web interface to check all the important dictionaries at once, as part of a Chinese information-management tool.]

Unique Syllables, Rarely Seen

The remaining syllables in the single-character group are associated with characters far less common than the characters for shei, lia, and gei.

The character 欻 can be pronounced chua, first tone, to mean "crashing sound." Meanwhile, tei is an alternate reading for 忒 "too, very" (also pronounced tui, which is anything but rare).

Rather more obscure is fiao4, written 覅, which means something like "need not" or "don't." Also obscure is kei, found only as an alternate reading for 克, meaning "scold." That character, 克, is quite common, especially in adaptations or transcriptions of foreign words, but it is practically always read ke4. Had I not undertaken this project, I might never have encountered the kei reading, even if another eight years of study went by.

Lastly, eng is unambiguously associated with just one character: 鞥, an archaic literary term for a horse's reins. Technically, eng is considered a pronunciation for the interjection 嗯, "ah/oh/hmm?", but that is more often pronounced en5. Furthermore, "en" and "eng" are not distinguished by many speakers of Chinese, so the case for admitting this as a second character pronounced eng is weak, at best.

A Very Special Edge Case

A ninth syllable technically does not belong to this category, but might be considered a special case. The syllable fo, in second tone, is associated only with 佛 "Buddha." There are other characters with this pronunciation, but all of them are variants having exactly the same meaning. By contrast, the characters for the other rare syllables listed here all have distinct etymologies. As I understand it, no other characters have acquired the pronunciation fo because of the combined seriousness of Buddhist religious conviction and traditional Chinese naming taboos.

Two Characters

Three syllables are only encountered as readings for two piddly characters.

Three Characters

There are eight syllables which are associated with three characters.

Honorable Mentions: Four Characters

The honorable mentions have four characters each. Zhua and shuan are definitely in this group, while a few other candidates need more detailed checking: shai seems like it has either 4 or 5, while ha and re are up in the air.

Disclaimer

I am confident that this piece is unfinished. Even as I typed up this essay from my notes, I continued to find obscure characters which forced me to shuffle the above list.

I understand the obstacles to definitively stating that such-and-such a syllable has exactly so many characters. Old, archaic, and variant characters make that quest questionable, as do the differences between simplified and traditional characters6. I have, however, succeeded in ruling out the rarity of most Mandarin syllables; observing what is left -- even when exact numbers are hard to come by -- is enough to make this linguist happy.

I started many of my searches for this project by typing a toneless syllable into the macOS pinyin input method and seeing how many characters it suggested. That was a sufficiently reliable indicator to show whether I should search dictionaries such as CC-CEDICT, Xiandai Hanyu Guifan Cidian, and the Pleco app's in-house dictionary. If there is any large, reliable character database which allows a syllable as a search keyword, I have not found it.
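For anyone who would rather script this than click through dictionary sites, here is a rough sketch of how a syllable index might be built from a text file in CC-CEDICT's layout (`Traditional Simplified [pin1 yin1] /gloss/`). I did not use this for the essay. The `SAMPLE` lines are hand-written in that layout with abbreviated glosses, not copied from the dictionary, and `index_by_syllable` is a name invented for this example. It keeps single-character entries only and tracks traditional and simplified forms separately, since the two can give different counts.

```python
import re
from collections import defaultdict

# Hand-written lines in CC-CEDICT's layout, for illustration only:
#   Traditional Simplified [pin1 yin1] /gloss/
# The glosses are abbreviated and are not verbatim dictionary text.
SAMPLE = """\
# lines starting with '#' are comments
誰 谁 [shei2] /who/
給 给 [gei3] /to give/
倆 俩 [lia3] /two (colloquial)/
發 发 [fa1] /to send out/
髮 发 [fa4] /hair/
兩個 两个 [liang3 ge4] /two (of something)/
"""

ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /(.*)/$")


def index_by_syllable(text):
    """Map toneless syllable -> (traditional chars, simplified chars)."""
    index = defaultdict(lambda: (set(), set()))
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = ENTRY.match(line)
        if match is None:
            continue  # skip anything that does not fit the layout
        trad, simp, pinyin, _gloss = match.groups()
        if len(trad) != 1:
            continue  # multi-character words are not counted
        syllable = pinyin.lower().rstrip("12345")  # drop the tone digit
        index[syllable][0].add(trad)
        index[syllable][1].add(simp)
    return dict(index)


if __name__ == "__main__":
    for syllable, (trad, simp) in sorted(index_by_syllable(SAMPLE).items()):
        print(syllable, len(trad), len(simp))
```

In the sample, fa has two traditional characters but only one simplified character, which is exactly the kind of merger that could throw a count off by one. A toy index like this says nothing about variant or archaic characters, which would still need checking by hand.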

  1. Modern Standard Mandarin (MSM) is the official version of Mandarin on the Chinese mainland, natively called Putonghua 普通话 or "common speech." The Taiwanese standard is usually called Taiwanese Mandarin (国语 "national language.")

  2. These three characters can stand alone as words, but many characters cannot.

  3. "MW" means "measure word," also called a "classifier." Roughly speaking, a measure word is an extra element which is usually7 pronounced between a number and the noun that number counts. English does not have measure words, but it does have something similar in the way "cup" is used in "three cups of milk" (a usage sometimes called a "massifier8"). In Chinese, you add something like "cup" to any noun you count. In English, we could say "two cows" or "two head of cattle," but in Chinese that would always be rendered as "two head of cattle." And in English, there's no word like "head" for snakes or cats, but in Chinese wiggly things like snakes (and pants, and rivers) share a measure word, as do small animals like cats, birds, and dogs.

  4. Thanks to "Unusual Syllables," at East Asia Student.

  5. In one of several tones, depending on the speaker's meaning.

  6. In all the cases I recall investigating for this essay, there was a 1-to-1 relationship between simplified and traditional characters, such that the count was the same whether I considered the traditional or the simplified character as canonical. I predict that any further large changes to my list will come from correcting mistakes made at this phase (e.g. noticing cases where two totally different traditional characters were simplified to one form, in which case my count was N but should be N+1.)