From September 2019 through October 2020, I catalogued the rarest syllables of Modern Standard Mandarin1. Occasional notetaking turned into a systematic search, and I believe this list is close to complete. To the best of my knowledge I am the first to compile much of this information -- but perhaps others have done so, somewhere beyond the reach of my search-fu.
During this investigation, I was specifically concerned with syllables which are rare regardless of which tone they are pronounced in. This meant that even certain uncommon tone-syllable pairings were not rare enough for me; for instance, only two characters are pronounced qiong in fourth tone, but qiong was not eligible for my list because there are many other characters pronounced qiong across the other three tones.
If you know a thing or two about Chinese, I urge you to read the disclaimer and leave comments, especially if you spot an error.
The best way to learn something is to be wrong about it on the Internet.
Eight syllables are associated with a single character.
Unique Syllables in Everyday Use
Three of the eight are everyday words2 in heavy use. Shei is by far the best known of these: the only word pronounced shei is 谁, the question word "who." The word "who" is as common in Mandarin as it is in English, yet you'll never hear that syllable anywhere else.
The next everyday syllable with just one character is lia. Its one and only written counterpart is 俩 "two, both", a colloquial shortening of 两个 "two MW3." They can be used interchangeably in phrases like "you two" (“你们俩/你们两个”).
Finally, there is gei, which in third tone is the verb 给 "to give," as well as, roughly speaking, an indirect object marker and a passive marker.
MDBG claims that 假 is pronounced gei to transcribe a single word of Hokkien. I originally took this to mean that gei should be in the two-syllable category, but on 2020-11-10 I reconsidered, and moved it here. A single Hokkien transcription is far from a central example for Standard Mandarin usage, with which this essay is concerned. My MacOS input suggests that 䢎 is also pronounced gei, but as this character is so rare that even Zdic doesn't list a pronunciation, I'm willing to discard it for the time being. Eventually I will check elsewhere.
[Boy, I wish I didn't have to keep manually checking dictionary sites. If only someone was building a web interface to check all the important dictionaries at once, as part of a Chinese information-management tool.]
Unique Syllables, Rarely Seen
The remaining syllables in the single-character group are associated with characters far less common than the characters for shei, lia, and gei.
The character 欻 can be pronounced chua, first tone, to mean "crashing sound," while tei is an alternate reading for 忒 "too, very" (also pronounced tui, which is anything but rare).
Rather more obscure is fiao4, which, as 覅, means something like "need not" or "don't." Also obscure is kei, found only as an alternate reading for 克, meaning "scold." That character, 克, is quite common, especially in adaptations or transcriptions of foreign words, but it is practically always read ke4. Had I not undertaken this project, I might never have encountered the kei reading, even if another eight years of study went by.
Lastly, eng is unambiguously associated with just one character: 鞥, an archaic literary term for a horse's reins. Technically, eng is considered a pronunciation for the interjection 嗯, "ah/oh/hmm?", but that is more often pronounced en5. Furthermore, "en" and "eng" are not distinguished by many speakers of Chinese, so the case for admitting this as a second character pronounced eng is weak, at best.
A Very Special Edge Case
A ninth syllable technically does not belong to this category, but might be considered a special case. The syllable fo, in second tone, is associated only with 佛 "Buddha." There are other characters with this pronunciation, but all of them are variants having exactly the same meaning. By contrast, the characters for the other rare syllables listed here all have distinct etymologies. As I understand it, no other characters have acquired the pronunciation fo because of the combined seriousness of Buddhist religious conviction and traditional Chinese naming taboos.
Three syllables are only encountered as readings for two piddly characters.
- diu is the reading for 丢 diu1 "lose, discard" and 铥 diu1 "thulium". (I choose to ignore a variant on the former which uses a heng stroke instead of a pie at the top, i.e. 丟; I also choose to leave out the character 颩, which, according to Zdic, is only ever read diu1 when it stands in for 丢.)
- sen is bestowed on these and these alone: 森 sen1 "forest" and 椮 sen1 "lush growth (of trees); fishing using bundled wood."
- seng cuddles up to just two characters: 僧 seng1 "monk; sangha" and 鬙 seng1 "short hair; unkempt."
There are eight syllables which are associated with three characters.
- miu is attached to 谬, 缪, and 唒 -- and I've only found dictionary entries for the first two. The word 荒谬 huang1miu4 "absurd, ridiculous" is the most commonly seen word containing one of those characters.
- zei has 贼 "thief, traitors; wily, deceitful; extremely" and a variant thereon, 戝 (ignoring the difference between simplified and traditional versions of the 贝 radical), as well as 鲗 "cuttlefish."
- cen has 㟥 "uneven, not uniform," 岑 "small hill" (also a surname), and 涔 "overflow; rainwater; tearful."
- fou has 否 "negate, deny"; 紑 "bright; glossy"; and 缶 "pottery." The first, fou3 "negate, deny," is common in formal language; the others are quite specific and rarely seen.
- shua is most commonly encountered as the reading for 刷, as shua1 "to brush, daub, paint" or shua4 "to select." The other characters for this syllable are 耍 "play with; wield; act cool; display (skill, temper)" and 唰, a piece of onomatopoeia for a whistling or rustling sound.
- One of the characters read ne is the extremely common speech particle 呢. It can occur at the ends of questions in certain discourse contexts, and can also be said in between listed items. The other two characters, both pronounced ne4, are 讷 "speak slowly; inarticulate" and the radical 疒 "sick, sickness". I've never heard this last one referred to as ne4, which is its name as a Kangxi radical; in everyday speech, it is called 病字旁, "the radical in 病" (which also means "sick, disease").
- den is oddly distinguished by its three characters 扽, 扥, and 㩐, all of which are read den4 and have the meaning "yank, pull tight."
- nüe has to its credit 虐 "oppressive, tyrannical" and 疟 "malaria." MacOS pinyin input claims that 硸 is one, too, but it's not in any of the dictionaries I've checked so far.
Honorable Mentions: Four Characters
The honorable mentions have four characters each. Zhua and shuan are definitely in this group, while a few other candidates need more detailed checking: shai seems like it has either 4 or 5, while ha and re are up in the air.
I am confident that this piece is unfinished. Even as I typed up this essay from my notes, I continued to find obscure characters which forced me to shuffle the above list.
I understand the obstacles to definitively stating that such-and-such a syllable has exactly so many characters. Old, archaic, and variant characters make that quest questionable, as do the differences between simplified and traditional characters6. I have, however, succeeded in ruling out the rarity of most Mandarin syllables; observing what is left -- even when exact numbers are hard to come by -- is enough to make this linguist happy.
I started many of my searches for this project by typing a toneless syllable into the MacOS pinyin input method and seeing how many characters it suggested. That was a sufficiently reliable indicator to show whether I should search dictionaries such as CC-CEDICT, Xiandai Hanyu Guifan Cidian, and the Pleco app's in-house dictionary. If there is any large, reliable character database which allows a syllable as a search keyword, I have not found it.
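The counting step could, in principle, be automated along these lines. This is a minimal sketch and not the procedure I actually followed: it assumes input lines in CC-CEDICT's plain-text format (`Traditional Simplified [pin1yin1] /gloss/`), and the function names and the tiny `SAMPLE` list are invented for illustration (the sample glosses are abbreviated, not copied from the dictionary file). Note that CC-CEDICT writes ü as `u:`, so nüe would show up as `nu:e`.

```python
import re
from collections import defaultdict

# A single-character entry in CC-CEDICT's line format:
#   Traditional Simplified [syllable + tone digit] /gloss/.../
ENTRY = re.compile(r"^(\S) (\S) \[([A-Za-z:]+)([1-5])\] /")


def characters_by_syllable(lines):
    """Map each toneless syllable to the set of simplified characters read that way."""
    by_syllable = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        match = ENTRY.match(line)
        if match is None:
            continue  # multi-character words and anything else that doesn't fit
        _trad, simp, syllable, _tone = match.groups()
        # Lowercase so capitalised surname readings fall under the same syllable.
        by_syllable[syllable.lower()].add(simp)
    return dict(by_syllable)


def rare_syllables(by_syllable, max_chars):
    """Return, in alphabetical order, syllables with at most max_chars characters."""
    return sorted(s for s, chars in by_syllable.items() if len(chars) <= max_chars)


# Hypothetical sample input; a real run would read the dictionary file instead.
SAMPLE = [
    "# illustrative lines only",
    "誰 谁 [shei2] /who/",
    "倆 俩 [lia3] /two (colloquial)/",
    "丟 丢 [diu1] /to lose/",
    "銩 铥 [diu1] /thulium/",
    "森 森 [sen1] /forest/",
    "僧 僧 [seng1] /monk/",
    "你們 你们 [ni3 men5] /you (plural)/",
]

counts = characters_by_syllable(SAMPLE)
# The sample is tiny, so these counts say nothing about the real dictionary.
print(rare_syllables(counts, 1))
print(sorted(counts["diu"]))
```

Because the sketch keys on the simplified column, a traditional variant such as 丟 is folded into 丢; keying on the traditional column instead would count the other way. A syllable that comes out rare in one dictionary would still need checking by hand against the others, for the reasons given above.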
Modern Standard Mandarin (MSM) is the official version of Mandarin on the Chinese mainland, natively called Putonghua 普通话 or "common speech." The Taiwanese standard is usually called Taiwanese Mandarin (国语 "national language.")↩
These three characters can stand alone as words, but many characters cannot.↩
"MW" means "measure word," also called a "classifier." Roughly speaking, a measure word is an extra element which is usually7 pronounced between a number and the noun that number counts. English does not have measure words, but it does have something similar in the way "cup" is used in "three cups of milk" (which is sometimes called a "massifier8.") In Chinese, you add something like "cup" to any noun you count. In English, we could say "two cows" or "two head of cattle," but in Chinese that would always be rendered as "two head of cattle." And in English, there's no word like "head" for snakes or cats, but in Chinese wiggly things like snakes (and pants, and rivers) share a measure word, as do small animals like cats, birds, and dogs.↩
In one of several tones, depending on the speaker's meaning.↩
In all the cases I recall investigating for this essay, there was a 1-to-1 relationship between simplified and traditional characters, such that the count was the same whether I considered the traditional or the simplified character as canonical. I predict that any further large changes to my list will come from correcting mistakes made at this phase (e.g. noticing cases where two totally different traditional characters were simplified to one form, in which case my count was N but should be N+1.)↩