Phonology & writing system
of gjâ-zym-byn (gzb)

This presentation is somewhat linguistically technical; for a non-technical (but somewhat oversimplified) presentation see "Lesson 0".

The writing system is pretty phonemic, except that morpheme-boundary sandhi is not marked. In email messages and in the plain text pages of this site, I use an ASCII transcription which is a superset of x-convention Esperanto. My handwriting for gzb, however, has evolved into something that resembles the Esperanto alphabet about as much as the Cyrillic alphabet resembles the Greek, if that much.

In ASCII, a letter followed by 'x' or 'q' is a digraph; for instance, {sx} represents the postalveolar fricative in English "shoe" and {iq} represents the lax front vowel in English "lip". In Unicode, used in all the HTML pages of this site, these letters are {ŝ} and {ĭ}. Generally an 'x' digraph in the ASCII transcription corresponds to what was a circumflexed letter in my original handwritten orthography, and a 'q' digraph in the ASCII transcription corresponds to what was a hacek'd letter in my original handwritten orthography. However, not all of these characters are available in Unicode, and some which are in Unicode were not widely supported at the time I first devised the orthography, so I used the closest fits I could find.

Phoneme chart

Whenever in the tables below two letters or phoneme symbols appear paired like "t / d", the first is unvoiced and the second voiced.


 bilabial labiodental dental alveolar postalveolar palatal velar uvular glottal
plosive p / b t / d k / g q
fricative f / v θ / ð s / z ʃ / ʒ ç / ʝ x / ɣ h
affricate p͜f / b͜v t͜s / d͜z t͜ʃ / d͜ʒ c͜ç / ɟ͜ʝ
nasal m n ŋ
approximant w ɹ j
lateral approximant l
trill ʙ
tap ɾ
click ʘ ǀ ǃ
nasalized click ʘ̃


  Front unrounded Front rounded Central Back unrounded Back rounded
Close high i y ɹ̣ u
Open high ɪ ʊ
Close mid ø o
Central mid ə
Open mid ɛ ɔ
Very open mid æ
Open low ɑ

Details on phoneme inventory, with Unicode and ASCII orthographies

Plosive consonants / fĭ-θy kě'pâ-baw

gzb-Unicode gzb-ASCII IPA CXS Description
p / b p / b p / b p / b bilabial stops
t / d t / d t / d t / d alveolar stops
k / g k / g k / g k / g velar stops
ķ kx q q uvular stop or retracted velar stop

Fricative consonants / šî'fy-baw

f / v f / v f / v f / v labiodental fricatives
Φ px ʙ B\ bilabial trill or strongly aspirated /ph/
θ / ð tx / dx θ / ð T / D dental fricatives
s / z s / z s / z s / z alveolar fricatives
ŝ / ĵ sx / jx ʃ / ʒ S / Z postalveolar fricatives
š / ʝ sq / jq ç / ʝ C / j\ palatal fricatives (ich laut)
ĥ / ħ hx / hq x / ɣ x / G velar fricatives (ach laut)
h h h h glottal fricative
₣ / ƴ fx / vx p͜f / b͜v p_f / b_v labiodental affricates
c / ź c / zx t͜s / d͜z t_s / d_z alveolar affricates
ĉ / ĝ cx / gx t͜ʃ / d͜ʒ t_S / d_Z postalveolar affricates
č / ž cq / zq c͜ç / ɟ͜ʝ c_C / J\_j\ palatal affricates

Nasal consonants / nĭm-baw

m m m m bilabial nasal
n n n n alveolar nasal
ŋ nx ŋ N velar nasal (uvular nasal before "ķ" /q/)

Liquid consonants / ler-baw

r r ɾ or ɹ 4 or r\ alveolar tap when alone syllable-initially (e.g. {râm}, "cat");
alveolar approximant if in an initial cluster ({rjâ}, "seeking") or syllable-final ({hyr}, "hour").
l l l l lateral approximant, dental or alveolar

Clicks & ejective / Ќ-ƥ-baw

Ќ kq k_> velar ejective: same point of articulation as k, g, ĥ... tongue suddenly pushed forward
Ł lq ǃ !\ lateral or alveolar click: front of tongue pulled from roof of mouth
ť tq ǀ |\ dental click: tip of tongue pulled from between teeth
ƥ pq ʘ O\ bilabial click: lips pulled apart suddenly
ɱ mq ʘ̃ O\~ similar to {ƥ}, but nasal

Front vowels:

i i i i close high unrounded
î ix y y close high rounded
ĭ iq ɪ I open high unrounded
e e ɛ E open mid unrounded
ô ox ø 2 close mid rounded
â ax æ & very open mid unrounded

Central vowels:

ř rq ɹ̣ r\= high retroflex
ě eq ə or ʌ: @ or V: central mid unrounded (schwa) when unstressed;
back mid unrounded when stressed
a a ɑ A open low unrounded

Back vowels:

u u u u close high rounded
y y ʊ U open high rounded
o o o o close mid rounded
ǒ oq ɔ O open mid rounded

All the vowels have nasal and oral variants. A nasal vowel is indicated by a following {ň}, as in such minimal pairs as {zuň} (alive) vs. {zu} (only), {bâ} (zero) vs. {bâň} (permission).

Semivowels, approximants:

j j j or I j or I palatal approximant
w w w or ʊ w or U bilabial semivowel
r r ɹ r\ alveolar approximant (in initial cluster or syllable-final only)

Many diphthongs occur — almost all the possible combinations of the vowels and approximants above. Keep {âw} and {aw} distinct. The first is the diphthong in English "how" or Esperanto "aŭ", conventionally transcribed /aʊ/ though it's typically pronounced as /æʊ/. For the second, /ɑʊ/, I am not familiar with any natural language equivalent. A particular tongue-twister is the postposition {rřr}, /ɾɹ̣ɹ/ "from far beyond".

This list shows all the diphthongs occurring in the lexicon as of 2012/3/10:
Rising Falling
ja aj
je aw
ju ej
jy ěr
ra ěw
ru ĭj
wa ĭr
we ir
wi îw
wo or
wu ôw
wy uj

Nonphonemic letters:

ň nq indicates the preceding vowel is pronounced nasally
ŗ rx a non-gzb rhotic sound in a foreign name
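
The correspondence between the two orthographies in the tables above is mechanical enough to sketch in code. What follows is only an illustration, not a tool that exists on this site: the function name `ascii_to_unicode` is made up, and it assumes that 'x' and 'q' never occur in ASCII gzb text except as the second half of a digraph.

```python
import re

# Digraph table copied from the inventory tables above.
ASCII_TO_UNICODE = {
    "kx": "ķ", "px": "Φ", "tx": "θ", "dx": "ð", "sx": "ŝ", "jx": "ĵ",
    "sq": "š", "jq": "ʝ", "hx": "ĥ", "hq": "ħ", "fx": "₣", "vx": "ƴ",
    "zx": "ź", "cx": "ĉ", "gx": "ĝ", "cq": "č", "zq": "ž", "nx": "ŋ",
    "kq": "Ќ", "lq": "Ł", "tq": "ť", "pq": "ƥ", "mq": "ɱ",
    "ix": "î", "iq": "ĭ", "ox": "ô", "ax": "â", "rq": "ř", "eq": "ě",
    "oq": "ǒ", "nq": "ň", "rx": "ŗ",
}

# Any lowercase letter followed by 'x' or 'q' is a candidate digraph.
DIGRAPH = re.compile(r"[a-z][xq]")


def ascii_to_unicode(text):
    """Replace each known digraph; leave anything not in the table untouched."""
    return DIGRAPH.sub(
        lambda m: ASCII_TO_UNICODE.get(m.group(0), m.group(0)), text
    )


print(ascii_to_unicode("gjax-zym-byn"))
```

Going the other way would need only the inverted dictionary, since every Unicode letter in the table maps back to exactly one digraph.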

There is an unwritten glottal stop at the beginning of the postpositions {i, o, ř} when they have no directional prefix consonant, as in {mĭ-i}, "about", and {vâ-oŋ-zô}, "to eat"; but also in words like {kujm-o} "for the purpose of", where the previous syllable ends in a consonant. I am not sure if this glottal stop should be analyzed as phonemic or not.

Vowel harmony

By vowel harmony, if any syllable of a word is nasalized, they all are, as are any following clitics. So the grapheme {ň} appears only at the end of a root word (or conjunction), never in suffixes or clitics. This means that nasality is allophonic, not phonemic, for the vowels that only occur in clitics and suffixes. If a compound stem is formed of a root with a nasal vowel and one with an oral vowel, all vowels in the compound are nasal. This can theoretically cause homophony; for instance, the (fairly contrived) compounds {tâ-zuň} (sibling-alive) and {tâň-zu} would be pronounced identically as /'tæ̃.zũ/.
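
A minimal sketch of the harmony rule, for readers who think better in code. It treats nasality as a property of the whole word, so the position of {ň} stops mattering once a word contains it. The function names are hypothetical, and the sketch deliberately ignores stress and the other sandhi rules.

```python
NASAL_SIGN = "ň"


def surface_form(word):
    """Return (segments, is_nasal) for a word in the Unicode orthography.

    The nasal sign, hyphens and apostrophes are dropped from the segments:
    under the harmony rule nasality spreads over every vowel of the word,
    so only its presence or absence is kept.
    """
    is_nasal = NASAL_SIGN in word
    segments = word.replace(NASAL_SIGN, "").replace("-", "").replace("'", "")
    return segments, is_nasal


def homophonous(a, b):
    """True if two written words come out identical under nasal harmony."""
    return surface_form(a) == surface_form(b)


print(homophonous("tâ-zuň", "tâň-zu"))
print(homophonous("zu", "zuň"))
```

The first call compares the two contrived compounds from the paragraph above; the second compares the minimal pair {zu} / {zuň}, which stays distinct because only one of the two words is nasal at all.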

Nasal vowels are rare in the lexicon (occurring in 28 root words out of 1353 as of 2014/03/16), though they are not uncommon in running text because they occur in a few common morphemes such as {zuň} "alive", {ryň} "do, act", {θuň} "story, narrative", and {kiň} "and".

(This type of vowel harmony is rare in natural languages — most natlang vowel harmony systems are based on backness, height or rounding — but something very similar apparently occurs in Guarani, where it also affects the selection of nasal vs. non-nasal consonants.)

Miscellaneous notes on allophony and sandhi

When a morpheme beginning with an affricate follows a morpheme that ends with a nasal consonant, the affricate is lenited into the corresponding plain fricative. e.g., {tyn-ca} "to situate oneself" is usually pronounced /'tʊn.sɑ/, rather than /'tʊn.t͜sɑ/. This also causes occasional homophony; for instance, {râm-źa} (cat-AUG = mountain lion) and {râm-za} (cat-ADJ = relating to cats) are both pronounced /'ræm.zɑ/.
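
The lenition rule can be sketched as a rewrite over hyphen-separated morphemes. This is an illustration under assumptions: it works on the Unicode orthography rather than on IPA, it applies the rule every time although the text above says only "usually", and the name `apply_lenition` is invented for the example. The affricate-to-fricative pairs are read off the fricative and affricate tables earlier in this document.

```python
NASAL_CONSONANTS = {"m", "n", "ŋ"}

# Each affricate letter and the plain fricative at the same place and voicing.
LENITION = {
    "₣": "f", "ƴ": "v",   # labiodental
    "c": "s", "ź": "z",   # alveolar
    "ĉ": "ŝ", "ĝ": "ĵ",   # postalveolar
    "č": "š", "ž": "ʝ",   # palatal
}


def apply_lenition(word):
    """Lenite a morpheme-initial affricate after a morpheme-final nasal consonant."""
    morphemes = word.split("-")
    result = [morphemes[0]]
    for prev, cur in zip(morphemes, morphemes[1:]):
        if prev and cur and prev[-1] in NASAL_CONSONANTS and cur[0] in LENITION:
            cur = LENITION[cur[0]] + cur[1:]
        result.append(cur)
    return "-".join(result)


print(apply_lenition("tyn-ca"))
print(apply_lenition("râm-źa") == apply_lenition("râm-za"))
```

Note that a root ending in the nasal sign {ň} does not trigger the rule in this sketch, since {ň} marks a nasal vowel, not a nasal consonant.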

An initial cluster of semivowels tends to be coarticulated, as in {rjâ} "quest" or {wrym} "decoration".

Consonants that are followed immediately by rounded vowels |u|, |o|, |î|, and |ô| tend to be slightly labialized.

The distinction between |u| /u/ and |y| /ʊ/ tends to be neutralized when they occur before |w| /w/.

|ě| is schwa /ə/ when unstressed, when it occurs in an open CV syllable. If it occurs in a syllable closed by an approximant, e.g. |kěr'nâ| "dogwood tree", it takes primary stress and is pronounced as a long /ʌ:/, thus: /'kʌ:ɹ.næ/. This allophone may tend to occur also preceding a nasal consonant in the same closed syllable, e.g. |sěm'su|, "samosa", /'sʌ


gjâ-zym-byn has very restrictive phonotactic constraints compared to English, German, Esperanto, etc., but comparatively liberal constraints compared with Japanese, Hawai'ian, Konya, etc.

Normally only a click, ejective or vowel can form a syllable nucleus. However, foreign names can have syllabic lateral or nasal consonants (marked in handwriting with a grave accent on the syllabic consonant), and in general are subject to much looser phonotactic constraints than native words.

Generally speaking, the form of a syllable is:


where C = any consonant except a click or ejective, S = semivowel/approximant (|r|, |l|, |j|, |w|), V = vowel, N = nasal consonant (|m|, |n|, |ŋ|). But this is a simplification; not all initial consonants can be followed by an approximant, and some can be followed only by a limited subset of approximants. Not all approximants can cluster with each other syllable-finally. Nasal vowels (in root words) cannot be followed by a nasal consonant, whether or not an approximant intervenes.

The following consonants may be followed by any approximant:

|k|, |g|, |t|, |d|, |p|, |b|, |s|, |z|, |θ|, |ð|, |f|, |v|, |c|, |ź|, |m|, |n|, |ŋ|

The following consonants may be followed only by |r|, |l|, or |w|:

|š|, |ʝ|, |ŝ|, |ĵ|, |ĥ|, |ħ|

The following consonants may be followed only by |r|, |l|, or |j|:

|₣|, |ƴ|

The following consonants may be followed only by |r| or |l|:

|ĉ|, |ĝ|, |w|

These consonants cannot be followed by any approximant:

|Φ|, |ķ|, |j|, |č|, |ž|

In addition, |h| is only followed by |w|, and initial |r| only by |j|. Initial |r| could be followed by |w|, but this isn't yet used in any already-coined word.
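
The onset restrictions listed above amount to a lookup table, sketched here. The helper names are made up for the illustration. Two choices in it are assumptions rather than statements from the lists: initial |r| + |w| is treated as not allowed because no coined word uses it yet, and any consonant the lists do not mention (for example |l|) is treated as taking no following approximant.

```python
ALLOWED_AFTER = {}


def _allow(consonants, approximants):
    """Record which approximants may follow each of the given consonants."""
    for consonant in consonants.split():
        ALLOWED_AFTER[consonant] = set(approximants.split())


_allow("k g t d p b s z θ ð f v c ź m n ŋ", "r l j w")
_allow("š ʝ ŝ ĵ ĥ ħ", "r l w")
_allow("₣ ƴ", "r l j")
_allow("ĉ ĝ w", "r l")
_allow("Φ ķ j č ž", "")
_allow("h", "w")
_allow("r", "j")  # |rw| is described as possible but unused, so it is left out


def onset_allowed(consonant, approximant):
    """True if the lists above permit this consonant + approximant onset."""
    return approximant in ALLOWED_AFTER.get(consonant, set())


for onset in ("fj", "wr", "rj", "šj", "hr"):
    print(onset, onset_allowed(onset[0], onset[1]))
```

The first three onsets checked occur in words cited in this document ({fjâw}, {wrym}, {rjâ}); the last two are excluded by the lists.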

Syllable-finally, any approximant can be followed by any nasal, but the only final combinations of approximants occurring in the lexicon are /rj/ as in {purj}, "environment" and /jl/ as in {ojl}, "across". Possible final clusters of approximants that may be used in future coinings include /wr/ and /wl/.

Samples of syllable types: (C = any consonant except a click or ejective, K = click or ejective, S = semivowel/approximant, V = vowel, N = nasal consonant)

K ť you
CSVS fjâw awe

The vowel in a syllable indicates the type of word it occurs in:

i, o, ř Postpositions ({i}, at; {son}, onto)
iň, oň Certain conjunctions (e.g. {kiň}, "and")
e, ǒ Other conjunctions, adverbial particles, quantifiers, etc.
a, ô Suffixes (e.g. {-van}, the stative verb ending)
î, ě (and their nasal variants) Initial or medial syllable of a polysyllabic noun root
u, y, ĭ, â (and their nasal variants) Sole or final syllable of a noun root
E.g.: {fî'suň}, "Earth"; {ruŋ}, "going, movement"
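
As a rough illustration of how the vowel of a syllable identifies its word class, here is a small classifier built from the table above. The labels and the name `classify_syllable` are invented for the sketch. It assumes the first vowel letter in the syllable is the nucleus, and that a nasal sign changes the class only for {i} and {o}, the two cases the table lists separately.

```python
VOWEL_CLASS = {
    "i": "postposition", "o": "postposition", "ř": "postposition",
    "e": "other conjunction, particle or quantifier",
    "ǒ": "other conjunction, particle or quantifier",
    "a": "suffix", "ô": "suffix",
    "î": "non-final syllable of noun root",
    "ě": "non-final syllable of noun root",
    "u": "sole or final syllable of noun root",
    "y": "sole or final syllable of noun root",
    "ĭ": "sole or final syllable of noun root",
    "â": "sole or final syllable of noun root",
}

# Nasalized {i} and {o} mark a different class from their oral versions.
NASAL_CLASS = {"i": "conjunction", "o": "conjunction"}


def classify_syllable(syllable):
    """Return the word-class label for a syllable, or None if it has no vowel."""
    for position, letter in enumerate(syllable):
        if letter in VOWEL_CLASS:
            nasal = "ň" in syllable[position + 1:]
            if nasal and letter in NASAL_CLASS:
                return NASAL_CLASS[letter]
            return VOWEL_CLASS[letter]
    return None  # e.g. a click or ejective standing alone as a syllable


for syllable in ("son", "kiň", "van", "fî", "suň", "ruŋ"):
    print(syllable, "->", classify_syllable(syllable))
```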

This system originally gave gzb a self-segregating morphology; however, later changes to the phonotactics (within the first year or two), allowing more initial and final clusters, broke the perfect self-segregation. Still, gzb has fewer ambiguous morpheme boundaries than many languages. Ambiguity never extends farther than the consonant cluster(s) at a morpheme boundary, and involves a series of two or more consonants of which at least one might belong to either syllable. In practice, this ambiguity won't occur in speech, or in the ASCII or Unicode orthographies, which mark morpheme boundaries with a hyphen; the handwritten orthography may make them ambiguous, however.

For instance, {ĝyŋla} could be parsed as {ĝy-ŋla} (middle-day.of.week = Wednesday) or {ĝyŋ-la} (sixteen-affectionate); in practice, only the former parse would make sense in most contexts. The fact that {y} occurs only in the final syllable of root words and {a} occurs only in monosyllabic suffixes means that this can't be all one two-syllable morpheme, and the fact that final nasal+liquid is not a legal cluster (plus the fact that only spacetime postpositions can start with a vowel) means that it can't be parsed as {ĝyŋl-a}. Another example, {tĭwmwĭl}, looks like it could be parsed as {tĭw-mwĭl} (furniture-sleep = bed) or {tĭwm-wĭl}, but in practice the latter parse is impossible since neither {tĭwm} nor {wĭl} is an actual meaningful morpheme.

Nonsegmental phonology

Stress and intonation aren't phonemic. I haven't quite figured out what the stress and intonation rules are yet, but getting them wrong won't make a word mean something different (as in Chinese) or turn a statement into a question (as in English), or give a sentence an ironic or sarcastic turn.

Penultimate stress is by far the most common, but there are exceptions, some words getting stressed on the ultima or antepenult. The high front rounded vowel /y/ {î} attracts the stress if it's in the antepenult, e.g. in {θrî'sě'kjurn} "ibis", {dî'fu-zô} "to compare". (/y/ can never appear in the ultima; it occurs only in the initial and medial syllables of two- or three-syllable content root words.) The schwa /ə/ {ě} is never stressed unless it is followed by an approximant or nasal in the same syllable, in which case it's realized as [ʌ:] and usually stressed; a two-syllable word where the first syllable is open and has schwa will be stressed on the ultima.
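
The tendencies in the paragraph above can be written down as a first-guess heuristic. This is only a sketch of those three statements, not a complete stress rule: the rules are not fully worked out, and the sketch ignores unstressable suffixes and sentence context. The function name is hypothetical, and the input is assumed to be already split into syllables.

```python
SCHWA = "ě"


def guess_stress(syllables):
    """Return the index of the syllable most likely to carry primary stress."""
    count = len(syllables)
    if count == 1:
        return 0
    # {î} in the antepenult attracts the stress.
    if count >= 3 and "î" in syllables[-3]:
        return count - 3
    # A two-syllable word whose first syllable is open and ends in schwa
    # is stressed on the ultima.
    if count == 2 and syllables[0].endswith(SCHWA):
        return 1
    # Otherwise fall back on the default, penultimate stress.
    return count - 2


print(guess_stress(["θrî", "sě", "kjurn"]))
print(guess_stress(["kěr", "nâ"]))
print(guess_stress(["kě", "pâ"]))
```

The last call uses {kě'pâ}, taken from the plosives heading and treated here as an isolated two-syllable word purely for illustration. A closed schwa syllable such as {kěr} needs no special case in this sketch, because it sits in the penult and receives the default stress anyway.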

Those rules account for most of the exceptions to penultimate stress, but there are a few others I haven't figured out yet. Tense vowels may be more likely to be stressed than lax vowels, and vowels followed by approximants more likely to be stressed than pure vowels. Also, it seems that the same word can have different syllables stressed in different contexts within specific sentences. Typically the two-syllable derived postpositions are stressed on the penult, but in some contexts neither syllable is stressed, when the primary stress of the postpositional phrase as a whole falls on the syllable before the postposition (the last syllable of the noun, or a clitic following the noun).

Certain suffixes are never stressed, e.g. the verb suffixes and the basic adjectivizing suffixes. Some other suffixes, clitics, and incorporated postpositions may get primary stress, but it's more likely to fall within a content root word. I think the distinction between suffixes and clitics that can and those that can't get stress is at least partly semantic rather than purely phonological, but I'm not sure yet.


I've moved the discussion of the handwritten orthography into a separate document, since it has some largish images.

I gradually worked out this Unicode mapping in late 2004-early 2005. I'm still not entirely satisfied with it, though. In July 2005 John Quijada suggested some possible improvements (e.g. replacing |î| with |ü|) which, on consideration, seemed like too much work to implement.

There are no capital letters. Proper names are indicated to be such by suffixes. In my handwritten script, proper names used to be preceded by an open single quote, and their stressed vowels are sometimes marked with an acute accent; syllabic nasals or liquids are marked with a grave accent. An older form of the ASCII orthography used capitalized vowels for irregular stress.

Acronyms are written with commas separating the letters. (Periods are always and only used to end a sentence, which should simplify the parsing problem if I ever get around to writing a parser for gzb.) They may be pronounced by inserting {ě} (schwa) after the first and medial consonants, {u} after the last, with stress on the last syllable.
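
The acronym pronunciation rule is simple enough to show as code. A minimal sketch, assuming every letter of the acronym is a consonant (nothing above says what happens with vowel letters); the function name and the sample acronym are made up for the illustration.

```python
def pronounce_acronym(written):
    """Split a comma-separated acronym into spoken syllables.

    Every letter but the last gets {ě} (schwa); the last gets {u}.
    Stress falls on the final syllable, i.e. index len(result) - 1.
    """
    letters = [part.strip() for part in written.split(",") if part.strip()]
    if not letters:
        return []
    syllables = [letter + "ě" for letter in letters[:-1]]
    syllables.append(letters[-1] + "u")
    return syllables


print("'".join(pronounce_acronym("g,z,b")))
```

Because periods only ever end sentences, splitting on commas is unambiguous here, which is the kind of parsing simplification mentioned above.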



. ends every sentence
? precedes question sentences
! precedes imperative sentences
* precedes especially important sentences (like English use of "!")
{ } quotation marks (advantage: they can be unambiguously nested, unlike "")
, : ; ( ) used much as in English
- separates morphemes in most compound words
' separates syllables in polysyllabic roots (not strictly necessary for showing syllable divisions, but helpful for providing some whitespace)

The hyphen and apostrophe are used only in the ASCII and Unicode orthographies; I no longer use them in handwriting, except for using hyphenation when splitting a word across two lines (in which case I usually have a hyphen at the end of one line and the beginning of the next, unlike in English). I only hyphenate on morpheme boundaries, not syllable boundaries within morphemes (I wish printers of books in English and Esperanto would do the same). I occasionally mark potentially ambiguous morpheme boundaries with a mid-dot |·|, e.g. between a proper name and the name-type suffix.

Frequency of gzb phonemes in running text

This table shows the frequency of gzb phonemes in my electronic corpus as of September 2010. Total size of the corpus was 65995 phonemes (not graphemes or bytes; sequences of a vowel letter and the nasal sign were counted as one phoneme, and spaces and punctuation were ignored in this count).

6.0792% 4012 a
5.8474% 3859 m
5.2610% 3472 â
4.6307% 3056 i
4.5958% 3033 j
4.3912% 2898 r
4.3822% 2892 n
4.2821% 2826 ĭ
4.0761% 2690 u
3.7230% 2457 l
3.5957% 2373 ô
3.3927% 2239 w
3.3275% 2196 y
3.1593% 2085 k
2.8381% 1873 e
2.7290% 1801 z
2.6260% 1733 ǒ
2.5184% 1662 ŋ
2.2290% 1471 t
2.1199% 1399 v
1.6759% 1106 g
1.6562% 1093 p
1.6259% 1073 o
1.4774% 975 s
1.3486% 890 c
1.2607% 832 b
1.1955% 789 h
1.1865% 783 š
1.1743% 775 f
0.9243% 610 ě
0.9198% 607 d
0.9046% 597 ĥ
0.8501% 561 θ
0.7167% 473
0.6667% 440 ĉ
0.5879% 388 Ќ
0.5410% 357 ř
0.5394% 356 î
0.5273% 348 ð
0.4955% 327 ƥ
0.4743% 313 ŝ
0.4046% 267 ķ
0.3409% 225 ĵ
0.2909% 192 ź
0.2788% 184 ʝ
0.2697% 178 ĝ
0.2682% 177 ť
0.2652% 175 Φ
0.2606% 172
0.1940% 128 ƴ
0.1924% 127 ħ
0.1894% 125 ž
0.1091% 72
0.0909% 60 č
0.0879% 58
0.0803% 53 âň
0.0621% 41 Ł
0.0379% 25 ɱ
0.0076% 5 ŕ
0.0061% 4
0.0061% 4
0.0045% 3 ĭň
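
The counting convention behind this table (a vowel plus the nasal sign is one phoneme; spaces and punctuation are ignored) can be sketched as follows. This is not the script that produced the table, only an illustration with made-up function names. It assumes the nasal sign directly follows its vowel, and it ignores the foreign-name letter {ŗ}.

```python
from collections import Counter

VOWELS = set("iîĭeôâřěauyoǒ")
CONSONANTS = set("pbtdkgķfvΦθðszŝĵšʝĥħh₣ƴcźĉĝčžmnŋrlЌŁťƥɱjw")
NASAL_SIGN = "ň"


def tokenize(text):
    """Split text into phonemes, merging each vowel with a following nasal sign."""
    tokens = []
    for char in text:
        if char == NASAL_SIGN:
            # Attach the sign to the vowel just before it; drop it otherwise.
            if tokens and tokens[-1] in VOWELS:
                tokens[-1] += NASAL_SIGN
        elif char in VOWELS or char in CONSONANTS:
            tokens.append(char)
        # Everything else (spaces, hyphens, apostrophes, punctuation) is skipped.
    return tokens


def percent(count, total):
    """Share of the corpus, as a percentage."""
    return 100.0 * count / total


counts = Counter(tokenize("gjâ-zym-byn"))
total = sum(counts.values())
for phoneme, count in counts.most_common():
    print(f"{percent(count, total):.4f}% {count} {phoneme}")
```

The same `percent` function reproduces the first row of the table: 4012 occurrences out of 65995 phonemes comes to 6.0792%.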

