Occultists and esotericists, such as the Hermetic Order of the Golden Dawn [1], have theorized that ancient Egyptian magic is a primary source for Western magical practice and ideas. Since the Hermetica and Neoplatonic theurgy are known to have had a profound influence on later European magical traditions [2], an inquiry into possible relationships between Egyptian and Greek magical ideas would help in assessing the veracity of the occultists' claim. This paper focuses on one set of ancient texts, the Greek Magical Papyri, which offer considerable potential for investigating this relationship.

The PGM (Papyri Graecae Magicae) [3] is the name given to a cache of papyri containing magical spells, collected by Jean d'Anastasi in Egypt in the early 1800s. Hans Dieter Betz, in his introduction to the newest English translation, speculates that these papyri may have been found in a tomb or temple library and that the largest papyri may have been the collection of one man in Thebes.[4] However, the exact provenance of the PGM is unknown. Betz states that literary sources show that quite a number of magical books of spells were collected in ancient times, most of which were destroyed.[5] The PGM are therefore a very important source of first-hand information about magical practices in the ancient Mediterranean.

The PGM spells run the gamut of magical practices from initiatory rites for immortality to love spells and healing rites. Most of the papyri are in Greek and Demotic with glosses in Old Coptic and are dated between the 2nd century BC and the 5th century AD. The spells call upon Greek, Egyptian, Jewish, Gnostic and Christian deities.

Two of the most intriguing aspects of these texts are the practice of self-identification with deity and the use of voces magicae in performing magical rituals. In many of the spells, the practitioner is told to say "I am" followed by a specific deity name in order to empower or work the spell. PGM I 247-62, a spell for invisibility, states: "I am Anubis, I am Osir-phre, I am OSOT SORONOUIIER, I am Osiris whom Seth destroyed . . ."[6] The voces magicae, the specific magical language of these texts, are abundant. Most of these words are considered "untranslatable" by the scholars working with the papyri.[7] Words of power in the incantations are composed of long strings of vowels (A EE EEE IIII OOOOO YYYYYY OOOOOOO), used alone or together with special names of deities or daimons that are often palindromes of considerable length, as in IAEOBAPHRENEMOUNOTHILARIKRIPHIAEYEAIPIRKIRALITHONUOMENERPHABOEAI.[8] The exact pronunciation of these voces magicae was key to the success of the spells.

Since Egyptian funerary texts clearly identify the deceased with deity and the power of words and language is a predominant feature of Egyptian magic, these notions found in the PGM appeared to provide a possible link between ancient Egyptian and Greek magic.

Throughout the funerary literature of ancient Egypt, from the Pyramid Texts to the Book of the Dead, there is abundant evidence that ancient Egyptians thought that human beings could become deities. Deities were seen as possessing heku, magic, an aspect of the original creative power that formed the cosmos.[9] Thus, magic was perceived to be an intrinsic part of reality and the divine.[10] The Coffin Texts provide a guidebook for the deceased, helping her or him retain the magic they already possess and gain more. Naming is extremely important in this experience: it is the ability to name all the gods and objects encountered that proves one has acquired enough magic to sit with the gods.[11] In these texts, the deceased is clearly identified with the god Osiris. By using historiolae, short mythic narratives recited within a spell, the deceased will successfully navigate the journey to the afterlife as Osiris did. The use of historiolae in magical practice was common, particularly in healing rites.[12] By knowing the names of all those encountered in the afterlife and establishing a link with a deity who had already been successful in this realm, the deceased was well prepared for the journey.

In the Pyramid Texts, the initial Utterances appear to be a script directing the different Egyptian deities to recite specific formulas on the deceased king's behalf. Utterance 1 begins "recitation by Nut, the greatly beneficent", Utterance 2 "recitation by Geb", and so forth.[13] Evidence that these utterances were spoken during funeral rites comes from the notes after the recitations, which give directions such as "pour water" (Utt. 23) and "cold water and 2 pellets of natron" (Utt. 32). The priests and priestesses take the role of the deities in preparing the deceased to join the gods in the afterlife, while the deceased is identified with Osiris. Self-identification with deity is an "authentically Egyptian trait".[14]

Language, and particularly naming, carries substantial magical power in Egyptian thought. The goddess Isis, once she learns Ra's true name, is able to cure him of snakebite.[15] One of the oldest Egyptian cosmologies, from Memphis (approx. 2700 BC), describes the god Ptah creating by his mind (heart) and word (tongue).[16] Thus, words contain a primal substance, and the act of speaking mirrors the original creation: speaking creates reality. Writing was given to humans by the god Thoth, and the Egyptians called their language "words of the gods" and hieroglyphs "writing of the sacred words".[17]

The Pyramid Texts, the Coffin Texts and the Book of the Dead all exhibit the Egyptian belief in the power of language to affect the world. Words, spoken or written, were not just symbols but realities in themselves.[18] Hieroglyphs were held to carry particular magical power, and most of the funerary texts were written in hieroglyphs. The Egyptians clearly believed that humans have energetic doubles in the world beyond the physical, and it seems reasonable to suspect that hieroglyphs were thought to have a similar existence, since they were written on the inside walls of pyramid tombs or coffins, or on scrolls placed inside the coffins for the deceased to use. Further evidence that the images were considered real in themselves comes from the practice of cutting particular hieroglyphs in half to diminish their potential effect.[19]

Vowel chanting is also found in Egyptian religious practice, as reported by Demetrius in his treatise De Elocutione (On Style):

"in Egypt the priests, when singing hymns in praise of the gods, employ the 7 vowels which they utter in due succession and the sound of these letters is so euphonious that men listen to it in place of the flute and lyre" [20]

The distinction between religion and magic in scholarly discourse breaks down in the context of Egyptian religion and it is reasonable to suspect that vowel chanting could be used for more than hymns of praise by Egyptian priests.

Thus, self-identification with deity and the use of a specific kind of magical language found in the PGM place Egyptian magical notions within a Greek magical context. The question then becomes: can evidence be found that Greek magic, prior to the PGM, included these practices, and do they appear in later Greek magical material known to have influenced the European tradition?

Betz states in the Encyclopedia of Religion that "magic was an essential part of Greco-Roman culture and religion."[21] In classical Greece, Egypt and Thessaly were considered prime sources of magical knowledge, but by 323 BC magical material in Greece had increased considerably. Betz further states that it was "Hellenistic syncretism that produced the abundance of material available today."[22] Greek magical practitioners distinguished different types of magic: goeteia (lower magic), mageia (general magic) and theourgia (higher magic). Theourgia appears to be the most likely place to find self-identification with deity and the use of voces magicae.

Self-identification with deity in magical acts is not evident in ancient Greek magical practice prior to the PGM. The Greeks speculated that humans and gods "had the same mother", but a huge gap existed between them. From ancient times to the latest date of the PGM, Greek notions about the relationship between human and divine existence took a variety of forms,[23] but never followed the Egyptian pattern of a possible declarative divine identity. The ancient Greeks believed that communion with the gods was possible, as in the Eleusinian and Dionysian mysteries,[24] and Empedocles declared he had the knowledge to make himself immortal.[25] But the Greek idea of a divine spark within the human soul, which can be activated, contemplated and reunited with the gods, still assumes an otherness of deity and affirms the fundamental separateness of human existence from the divine.

For the Egyptians, the divine appears to be immanent in the world. The worlds of humans and gods were not seen as decidedly different: human activity continued after death, and the gods, embodied in the Pharaoh, lived in human society. Magical practice merely clarified what already existed. For the Greeks, magic was a conduit for communication and communion with deity, or a process whereby the soul could be purified through direct contact with the divine. Egyptians had only to affirm a state of being through speech to create the reality they sought. "Repeated commands or assertions that a desired state of affairs was already in being, are a common feature of Egyptian spells."[26]

However, there are references to the voces magicae in ancient Greek material aside from the PGM. Among the earliest are the Ephesia grammata (ASKION, KATASKION, LIX, TETRAX, DAMNAMENEUS, AISIA), mystic letters supposedly inscribed on the statue of Artemis at Ephesus and used both spoken and written to avert evil. A lead tablet inscribed with the Ephesia grammata dates to the 4th century BC, and the words were said to be spoken as an apotropaic charm while walking in a circle around newlyweds.[27]

Peter Kingsley, writing of Empedocles' magical worldview, states: "there is nothing that is not vibrantly and knowingly alive. For him [Empedocles] - everything - even the words spoken by a man of understanding has an existence, intelligence and consciousness of its own."[28] This notion appears close to the Egyptian idea that words are not symbols but realities.

Orpheus healed human pathos with poems and the lyre, while Pythagoras could chant his disciples to sleep and heal body and soul through musical words.[29] Fox argues that the PGM carry forward this "shamanic" tradition of magical musical charms. For the actual author(s) of the PGM, the notion of the magical potency of language could have been very strong indeed, drawing on both the Egyptian and the Greek magical traditions.

The use of voces magicae continues into later Coptic texts. In a spell invoking a "thundering power to perform every wish", the practitioner should say: "I invoke you . . . who is addressed with the great secret name HAMOUZETH BETH ATHANABASSETONI."[30] Vowel incantations are also found in these Coptic texts, arranged in figures typical of the PGM.[31]


Voces magicae are also referred to in the Chaldean Oracles, which are contemporary with the PGM, and they appear to be an intrinsic part of the theurgist's ritual. What is intriguing about the Chaldean Oracles for this study is the relationship between the voces magicae and the immortalization of the soul, which is the goal of theurgy. These texts provide the closest approximation to self-identification with deity in a non-Egyptian context. According to the Chaldeans, the soul gathers impure substances in its descent into the body. Through theurgic rites, the soul can re-ascend, encounter the divine, be purified of these substances and attain immortality. The voces magicae invoke the assistant spirits that help the soul ascend without fear of being dragged down into Hades.[32] However, even though immortalization is the goal, self-identification with deity is never declared, and only the soul can attain such a state.

The idea that the Egyptian language specifically held magical power is evident in writings of the period. In the Hermetica (CH XVI) a passage states that Greeks will not understand the Hermetica when it is translated into their language, since Greek does not have the power of Egyptian.[33] The Chaldean Oracles state: "do not ever alter the foreign names (of the gods)". Lewy elaborates: "It is impossible to translate the magical formula, because its power is not due to its external sense."[34] Iamblichus, describing the difficulty of translating the Hermetica from Egyptian to Greek, says: ". . . for the very quality of the sounds and the [intonation] of the Egyptian words contain in itself the force of things said."[35] According to Pinch, invocation of deities by their secret names is also characteristic of Egyptian magic prior to the PGM, but unfortunately she does not give examples.[36]

Scholars have identified potential sources other than Egyptian for specific voces magicae. The glossary in the Betz edition of the PGM speculates on the origins of a few of them, offering Jewish and Greek as well as Egyptian origins for the eight names it considers. Betz finds an intricate syncretism of Greek, Egyptian and Jewish elements in the texts.[37] Teasing out the various strands and definitively locating the origin of specific voces magicae remains to be done and will be difficult. What we may be seeing in the voces magicae is a general and widespread ancient Mediterranean magical practice; ABRACADABRA could be a cousin of the voces magicae in the PGM.

Further questions to be asked regarding the voces magicae are: What were the potential avenues of magical communication between Egypt and Greece in the 4th century BC, when the earliest evidence of specific magical words appears in the Ephesia grammata? Is there evidence of specific voces magicae, other than vowel chanting, in Egyptian magical practice prior to the PGM? If the specific form comes from Greek notions, why are the voces magicae in the PGM glossed in Old Coptic in many spells whose main text is in Greek?

In conclusion, the claim that the roots of European magic can be traced to Egyptian magic appears highly suspect with regard to the notions discussed. Egyptian ideas and practices of self-identification with deity do not seem compatible with Greek notions of the relationship between the human and divine worlds. The voces magicae provide evidence of a generalized magical tradition in the ancient Mediterranean, one from which the European tradition may have drawn, but not one derived specifically from Egypt.

Runes (Proto-Norse: ᚱᚢᚾᛟ (runo), Old Norse: rún) are the letters in a set of related alphabets known as runic alphabets, which were used to write various Germanic languages before the adoption of the Latin alphabet and for specialised purposes thereafter. The Scandinavian variants are also known as futhark or fuþark (derived from the first six letters of the alphabet: F, U, Þ, A, R, and K); the Anglo-Saxon variant is futhorc or fuþorc (due to sound changes undergone in Old English by the names of those six letters).

Runology is the study of the runic alphabets, runic inscriptions, runestones, and their history. Runology forms a specialised branch of Germanic linguistics.

The earliest runic inscriptions date from around 150 AD. The characters were generally replaced by the Latin alphabet as the cultures that had used runes underwent Christianisation, by approximately 700 AD in central Europe and 1100 AD in northern Europe. However, the use of runes persisted for specialized purposes in northern Europe. Until the early 20th century, runes were used in rural Sweden for decorative purposes in Dalarna and on runic calendars.

The three best-known runic alphabets are the Elder Futhark (around 150–800 AD), the Anglo-Saxon Futhorc (400–1100 AD), and the Younger Futhark (800–1100 AD). The Younger Futhark is divided further into the long-branch runes (also called Danish, although they were also used in Norway and Sweden); short-branch or Rök runes (also called Swedish-Norwegian, although they were also used in Denmark); and the stavlösa or Hälsinge runes (staveless runes). The Younger Futhark developed further into the Marcomannic runes, the Medieval runes (1100–1500 AD), and the Dalecarlian runes (around 1500–1800 AD).

Historically, the runic alphabet derives from the Old Italic alphabets of antiquity, with some innovations. Which variant of the Old Italic family gave rise to the runes is uncertain; candidates include Raetic, Etruscan and Old Latin. At the time, all of these scripts had the same angular letter shapes suited for epigraphy, which would become characteristic of the runes.

The process of transmission of the script is unknown. The oldest inscriptions are found in Denmark and northern Germany, not near Italy. A "West Germanic hypothesis" suggests transmission via Elbe Germanic groups, while a "Gothic hypothesis" presumes transmission via East Germanic expansion.

Magical or divinatory use

A bracteate (G 205) from approximately AD 400 that features the charm word alu with a depiction of a stylized male head, a horse, and a swastika, a common motif on bracteates

An illustration of the Gummarp Runestone (500-700 AD) from Blekinge, Sweden

Closeup of the runic inscription found on the 6th- or 7th-century Björketorp Runestone located in Blekinge, Sweden


Stanza 157 of Hávamál attributes to runes the power to bring the dead back to life. In this stanza, Odin recounts a spell:

Þat kann ek it tolfta,
ef ek sé á tré uppi
váfa virgilná,
svá ek ríst ok í rúnum fák,
at sá gengr gumi
ok mælir við mik.[19]

I know a twelfth one if I see,
up in a tree,
a dangling corpse in a noose,
I can so carve and colour the runes,
that the man walks
And talks with me.[20]

The earliest runic inscriptions found on artifacts give the name of either the craftsman or the proprietor, or sometimes remain a linguistic mystery. It is therefore possible that the early runes were used not so much as a simple writing system but rather as magical signs for charms. Although some say the runes were used for divination, there is no direct evidence that they were ever used in this way. The name rune itself, taken to mean "secret, something hidden", seems to indicate that knowledge of the runes was originally considered esoteric, or restricted to an elite. The 6th-century Björketorp Runestone issues a warning in Proto-Norse, using the word rune in both senses:

Haidzruno runu, falahak haidera, ginnarunaz. Arageu haeramalausz uti az. Weladaude, sa'z þat barutz. Uþarba spa.
I, master of the runes(?) conceal here runes of power. Incessantly (plagued by) maleficence, (doomed to) insidious death (is) he who breaks this (monument). I prophesy destruction / prophecy of destruction.[21]

The same curse and the same use of the word rune are also found on the Stentoften Runestone. There are also some inscriptions suggesting a medieval belief in the magical significance of runes, such as a panel of the Franks Casket (AD 700).

Charm words, such as auja, laþu, laukaR, and most commonly alu,[22] appear on a number of Migration period Elder Futhark inscriptions, as do variants and abbreviations of them. Much speculation and study has been devoted to the potential meaning of these inscriptions. Rhyming groups that may also be magical in purpose, such as salusalu and luwatuwa, appear on some early bracteates. Further, the Gummarp Runestone (500-700 AD) bears a cryptic inscription describing the use of three runic letters, followed by the Elder Futhark f-rune written three times in succession.[23]

Nevertheless, it has proven difficult to find unambiguous traces of runic "oracles": although Norse literature is full of references to runes, it nowhere contains specific instructions on divination. There are at least three sources on divination with rather vague descriptions that may, or may not, refer to runes: Tacitus's 1st-century Germania, Snorri Sturluson's 13th-century Ynglinga saga, and Rimbert's 9th-century Vita Ansgari.

The first source, Tacitus's Germania, describes "signs" chosen in groups of three and cut from "a nut-bearing tree", although the runes do not seem to have been in use at the time Tacitus was writing. A second source is the Ynglinga saga, where Granmar, the king of Södermanland, goes to Uppsala for the blót. There, the "chips" fell in a way that said he would not live long (Féll honum þá svo spánn sem hann mundi eigi lengi lifa). These "chips", however, are easily explained as a blótspánn (sacrificial chip), which was "marked, possibly with sacrificial blood, shaken, and thrown down like dice, and their positive or negative significance then decided."[24]

The third source is Rimbert's Vita Ansgari, where there are three accounts of what some believe to be the use of runes for divination, but Rimbert calls it "drawing lots". One of these accounts is the description of how a renegade Swedish king, Anund Uppsale, first brings a Danish fleet to Birka, but then changes his mind and asks the Danes to "draw lots". According to the story, this "drawing of lots" was quite informative, telling them that attacking Birka would bring bad luck and that they should attack a Slavic town instead. The tool in the "drawing of lots," however, is easily explainable as a hlautlein (lot-twig), which according to Foote and Wilson[25] would be used in the same manner as a blótspánn.

The lack of extensive knowledge on historical use of the runes has not stopped modern authors from extrapolating entire systems of divination from what few specifics exist, usually loosely based on the reconstructed names of the runes and additional outside influence.

A recent study of runic magic suggests that runes were used to create magical objects such as amulets,[26] but not in a way that would indicate that runic writing was any more inherently magical than other writing systems such as Latin or Greek.

In Norse mythology, the runic alphabet is ascribed a divine origin (Old Norse: reginkunnr). This is attested as early as the Noleby Runestone from approximately 600 AD, which reads Runo fahi raginakundo toj[e'k]a..., meaning "I prepare the suitable divine rune...",[27] and in a 9th-century attestation on the Sparlösa Runestone, which reads Ok rað runaR þaR rægi[n]kundu, meaning "And interpret the runes of divine origin".[28] More notably, in the Poetic Edda poem Hávamál, stanza 80, the runes are also described as reginkunnr:

Þat er þá reynt,
er þú að rúnum spyrr
inum reginkunnum,
þeim er gerðu ginnregin
ok fáði fimbulþulr,
þá hefir hann bazt, ef hann þegir.[19]

That is now proved,
what you asked of the runes,
of the potent famous ones,
which the great gods made,
and the mighty sage stained,
that it is best for him if he stays silent.[29]

The poem Hávamál explains that the originator of the runes was the major deity, Odin. Stanza 138 describes how Odin received the runes through self-sacrifice:

Veit ek at ek hekk vindga meiði a
netr allar nío,
geiri vndaþr ok gefinn Oðni,
sialfr sialfom mer,
a þeim meiþi, er mangi veit, hvers hann af rótom renn.

I know that I hung on a windy tree
nine long nights,
wounded with a spear, dedicated to Odin,
myself to myself,
on that tree of which no man knows from where its roots run.[30]

In stanza 139, Odin continues:

Við hleifi mik seldo ne viþ hornigi,
nysta ek niþr,
nam ek vp rvnar,
opandi nam,
fell ek aptr þaðan.

No bread did they give me nor a drink from a horn,
downwards I peered;
I took up the runes,
screaming I took them,
then I fell back from there.[30]

This passage has been interpreted as a mythical representation of shamanic initiation rituals, in which the initiate must undergo a physical trial in order to receive mystic wisdom.[31]

In the Poetic Edda poem Rígsþula, another account is given of how the runic alphabet became known to humans. The poem relates how Ríg, identified as Heimdall in the introduction, sired three sons (Thrall (slave), Churl (freeman), and Jarl (noble)) by human women. These sons became the ancestors of the three classes of humans indicated by their names. When Jarl reached an age when he began to handle weapons and show other signs of nobility, Ríg returned, claimed him as a son and taught him the runes. In 1555, the exiled Swedish archbishop Olaus Magnus recorded a tradition that a man named Kettil Runske had stolen three rune staffs from Odin and learned the runes and their magic.

Germanic mysticism and Nazi symbolism


Runic script on an 1886 gravestone in Parkend, England

From 1933, Schutzstaffel unit insignia displayed two Sig Runes

The pioneer of the Armanist branch of Ariosophy, and one of the more important figures in esotericism in Germany and Austria in the late 19th and early 20th centuries, was the Austrian occultist, mystic and völkisch author Guido von List. In 1908, in Das Geheimnis der Runen ("The Secret of the Runes"), he published a set of eighteen so-called "Armanen runes", based on the Younger Futhark and runes of List's own introduction, which allegedly were revealed to him in a state of temporary blindness after cataract operations on both eyes in 1902. The use of runes in Germanic mysticism, notably List's "Armanen runes" and the derived "Wiligut runes" of Karl Maria Wiligut, played a role in Nazi symbolism. The fascination with runic symbolism was mostly limited to Heinrich Himmler and was not shared by the other members of the Nazi top echelon. Consequently, runes appear mostly in insignia associated with the Schutzstaffel, the paramilitary organization led by Himmler. Wiligut is credited with designing the SS-Ehrenring, which displays a number of "Wiligut runes".

Modern neopaganism and esotericism

Runes are popular in Germanic neopaganism and, to a lesser extent, in other forms of neopaganism and New Age esotericism. Various systems of runic divination have been published since the 1980s, notably by Ralph Blum (1982), Stephen Flowers (1984 onward), Stephan Grundy (1990) and Nigel Pennick (1995).

The Uthark theory was originally proposed as a scholarly hypothesis by Sigurd Agrell in 1932. In 2002, Swedish esotericist Thomas Karlsson popularized this "Uthark" runic row, which he refers to as the "night side of the runes", in the context of modern occultism.

J. R. R. Tolkien and contemporary fiction

In J. R. R. Tolkien's novel The Hobbit (1937), Anglo-Saxon runes are used on a map to emphasize its connection to the Dwarves. They were also used in the initial drafts of The Lord of the Rings, but were later replaced by the Cirth, a rune-like alphabet invented by Tolkien. Following Tolkien, historical and fictional runes appear commonly in modern popular culture, particularly in fantasy literature, but also in other media such as video games (for example, the video game Heimdall uses them, especially as "magical symbols" associated with unnatural forces).

The 4th edition of Dungeons and Dragons has a class called the Runepriest, which fights by drawing on the supposed original magical use of runes.

Hebrew language

עברית ʿIvrit
Pronunciation: [(ʔ)ivˈʁit] or [(ʔ)ivˈɾit][note 1]
Native to: Gaza Strip, Israel, West Bank;[1] used globally as a liturgical language for Judaism
Native speakers: 5.3 million as L1 (not all native) (1998);[2] used as L1 or L2 by all 7.4 million Israelis[2][3]
Language family: Afroasiatic (West Semitic, Northwest Semitic, Canaanite)
Early forms: Biblical Hebrew
Standard forms: Modern Hebrew
Writing system: Hebrew alphabet; Hebrew Braille
Signed forms: Signed Hebrew (oral Hebrew accompanied by sign)[4]
Official language in: Israel
Regulated by: Academy of the Hebrew Language (האקדמיה ללשון העברית, HaAkademia LaLashon HaʿIvrit)
Language codes (ISO 639-3):
heb – Modern Hebrew
hbo – Ancient Hebrew (liturgical)
smp – Samaritan Hebrew (liturgical)
xdm – Edomite (extinct)
obm – Moabite (extinct)





The Hebrew-speaking world:

  regions where Hebrew is the language of the majority

  regions where Hebrew is the language of a significant minority


Hebrew street sign, above in Hebrew alphabet, below in Latin letter transliteration. Aluf Batslut veAluf Shum(he) ("The Onion Champion and the Garlic Champion") is a play by Hayyim Nahman Bialik.


Hebrew (/ˈhiːbruː/; עברית ʿIvrit [ʔivˈʁit] or [ʕivˈɾit]) is a West Semitic language of the Afroasiatic language family. Historically, it is regarded as the language of the Hebrew Israelites and their ancestors, although the language was not referred to by the name Hebrew in the Tanakh.[note 2] The earliest examples of written Paleo-Hebrew date from the 10th century BCE, in the form of primitive drawings, although "the question of the language used in this inscription remained unanswered, making it impossible to prove whether it was in fact Hebrew or another local language".[9]

Hebrew had ceased to be an everyday spoken language somewhere between 200 and 400 CE, declining since the aftermath of the Bar Kochba War.[8][10][note 3] Aramaic and to a lesser extent Greek were already in use as international languages, especially among elites and immigrants.[12] It survived into the medieval period as the language of Jewish liturgy, rabbinic literature, intra-Jewish commerce, and poetry. Then, in the 19th century, it was revived as a spoken and literary language, and, according to Ethnologue, is now the language of 9 million people worldwide,[13][14] of whom 7 million are from Israel.[3][15] The United States has the second largest Hebrew speaking population, with about 221,593 fluent speakers,[16] mostly from Israel.

Modern Hebrew is one of the two official languages of Israel (the other being Arabic), while pre-modern Hebrew is used for prayer or study in Jewish communities around the world today. Ancient Hebrew is also the liturgical tongue of the Samaritans, while modern Hebrew or Arabic is their vernacular. As a foreign language, it is studied mostly by Jews and students of Judaism and Israel, and by archaeologists and linguists specializing in the Middle East and its civilizations, as well as by theologians in Christian seminaries.

The Torah (the first five books), and most of the rest of the Hebrew Bible, is written in Biblical Hebrew, with much of its present form specifically in the dialect that scholars believe flourished around the 6th century BCE, around the time of the Babylonian exile. For this reason, Hebrew has been referred to by Jews as Leshon HaKodesh (לשון הקדש), "The Holy Language", since ancient times.



The modern word "Hebrew" is derived from the word "Ibri" (plural "Ibrim"), one of several names for the Jewish people. It is traditionally understood to be an adjective based on the name of Abraham's ancestor, Eber ("Ebr" עבר in Hebrew), mentioned in Genesis 10:21. This name is possibly based upon the root "ʕ-b-r" (עבר) meaning "to cross over". Interpretations of the term "ʕibrim" link it to this verb, reading it as "those who cross over" or, homiletically, as the people who crossed over the river Euphrates.[17]

In the Bible, the Hebrew language is called Yәhudit (יהודית) because Judah (Yәhuda) was the surviving kingdom at the time of the quotation (late 8th century BCE; Isaiah 36, 2 Kings 18). In Isaiah 19:18, it is also called the "Language of Canaan" (שפת כנען).


Hebrew belongs to the Canaanite group of languages. In turn, the Canaanite languages are a branch of the Northwest Semitic family of languages.[18]

According to Avraham ben-Yosef, Hebrew flourished as a spoken language in the kingdoms of Israel and Judah from about 1200 to 586 BCE.[19][unreliable source?] Scholars debate the degree to which Hebrew remained a spoken vernacular after the Babylonian exile, when the predominant international language in the region was Old Aramaic.

Hebrew was nearly extinct as a spoken language by Late Antiquity, but it continued to be used as a literary language and as the liturgical language of Judaism, evolving various dialects of literary Medieval Hebrew, until its revival as a spoken language in the late 19th century.

Oldest Hebrew inscriptions

In July 2008, Israeli archaeologist Yossi Garfinkel discovered a ceramic shard at Khirbet Qeiyafa that he claimed may be the earliest Hebrew writing yet discovered, dating from around 3,000 years ago.[20][21] Hebrew University archaeologist Amihai Mazar said that the inscription was "proto-Canaanite" but cautioned that "the differentiation between the scripts, and between the languages themselves in that period, remains unclear," and suggested that calling the text Hebrew might be going too far.[22]

The Gezer calendar also dates back to the 10th century BCE at the beginning of the Monarchic Period, the traditional time of the reign of David and Solomon. Classified as Archaic Biblical Hebrew, the calendar presents a list of seasons and related agricultural activities. The Gezer calendar (named after the city in whose proximity it was found) is written in an old Semitic script, akin to the Phoenician one that through the Greeks and Etruscans later became the Roman script. The Gezer calendar is written without any vowels, and it does not use consonants to imply vowels even in the places where later Hebrew spelling requires it.

The Shebna Inscription, from the tomb of a royal steward found in Siloam, dates to the 7th century BCE.

Numerous older tablets have been found in the region with similar scripts written in other Semitic languages, for example Protosinaitic. It is believed that the original shapes of the script go back to Egyptian hieroglyphs, though the phonetic values are instead inspired by the acrophonic principle. The common ancestor of Hebrew and Phoenician is called Canaanite, and was the first to use a Semitic alphabet distinct from Egyptian. One ancient document is the famous Moabite Stone written in the Moabite dialect; the Siloam Inscription, found near Jerusalem, is an early example of Hebrew. Less ancient samples of Archaic Hebrew include the ostraca found near Lachish which describe events preceding the final capture of Jerusalem by Nebuchadnezzar and the Babylonian captivity of 586 BCE.

Biblical Hebrew

Main article: Biblical Hebrew

In its widest sense, Biblical Hebrew means the spoken language of ancient Israel flourishing between the 10th century BCE and the turn of the 4th century CE.[23] It comprises several evolving and overlapping dialects. The phases of Classical Hebrew are often named after important literary works associated with them.

  • Archaic Biblical Hebrew from the 10th to the 6th century BCE, corresponding to the Monarchic Period until the Babylonian Exile and represented by certain texts in the Hebrew Bible (Tanach), notably the Song of Moses (Exodus 15) and the Song of Deborah (Judges 5). Also called Old Hebrew or Paleo-Hebrew. It was written in a form of the Canaanite script. (A script descended from this is still used by the Samaritans, see Samaritan Hebrew language.)

  • Standard Biblical Hebrew from around the 8th to the 6th century BCE, corresponding to the late Monarchic period and the Babylonian Exile. It is represented by the bulk of the Hebrew Bible, which attained much of its present form around this time. Also called Biblical Hebrew, Early Biblical Hebrew, or Classical Biblical Hebrew (or Classical Hebrew in the narrowest sense).

  • Late Biblical Hebrew, from the 5th to the 3rd centuries BCE, that corresponds to the Persian Period and is represented by certain texts in the Hebrew Bible, notably the books of Ezra and Nehemiah. Basically similar to Classical Biblical Hebrew, apart from a few foreign words adopted for mainly governmental terms, and some syntactical innovations such as the use of the particle shel (of, belonging to). It adopted the Imperial Aramaic script (from which the modern Hebrew script descends).

  • Israelian Hebrew is a proposed northern dialect of biblical Hebrew, attested in all eras of the language, in some cases competing with late biblical Hebrew as an explanation for non-standard linguistic features of biblical texts.

  • Dead Sea Scroll Hebrew from the 3rd century BCE to the 1st century CE, corresponding to the Hellenistic and Roman Periods before the destruction of the Temple in Jerusalem and represented by the Qumran Scrolls that form most (but not all) of the Dead Sea Scrolls. Commonly abbreviated as DSS Hebrew, also called Qumran Hebrew. The Imperial Aramaic script of the earlier scrolls in the 3rd century BCE evolved into the Hebrew square script of the later scrolls in the 1st century CE, also known as ketav Ashuri (Assyrian script), still in use today.

  • Mishnaic Hebrew from the 1st to the 3rd or 4th century CE, corresponding to the Roman Period after the destruction of the Temple in Jerusalem and represented by the bulk of the Mishnah and Tosefta within the Talmud and by the Dead Sea Scrolls, notably the Bar Kokhba Letters and the Copper Scroll. Also called Tannaitic Hebrew or Early Rabbinic Hebrew.

Sometimes the above phases of spoken Classical Hebrew are simplified into "Biblical Hebrew" (including several dialects from the 10th century BCE to the 2nd century BCE and extant in certain Dead Sea Scrolls) and "Mishnaic Hebrew" (including several dialects from the 3rd century BCE to the 3rd century CE and extant in certain other Dead Sea Scrolls).[24] Today, however, most Hebrew linguists classify Dead Sea Scroll Hebrew as a set of dialects evolving out of Late Biblical Hebrew and into Mishnaic Hebrew, thus including elements from both but remaining distinct from either.[25] By the start of the Byzantine Period in the 4th century CE, Classical Hebrew had ceased to be a regularly spoken language, roughly a century after the publication of the Mishnah, having apparently declined since the aftermath of the catastrophic Bar Kokhba War around 135 CE.

Around the 6th century BCE, the Neo-Babylonian Empire conquered the ancient Kingdom of Judah, destroying much of Jerusalem and exiling its population far to the East in Babylon. During the Babylonian captivity, many Israelites were enslaved within the Babylonian Empire and learned the closely related Semitic language of their captors, Aramaic. The Babylonians had taken mainly the governing classes of Israel while leaving behind presumably more-compliant farmers and laborers to work the land.[citation needed] Thus for a significant period, the Jewish elite became influenced by Aramaic.[26] (see below, Aramaic spoken among Israelites).

After Cyrus the Great, the Great King of Persia or "King of Kings", conquered Babylon, he released the Jewish people from captivity and later gave the Israelites permission to return. As a result, a local version of Aramaic came to be spoken in Israel alongside Hebrew. Even before that, the Assyrian Empire had led Israel to use a variant of Aramaic for trade, and in Israel and Judea these languages intermingled. The Greek Era saw a brief ban on the Hebrew language until the period of the Hasmoneans. By the beginning of the Common Era, Aramaic was the primary colloquial language[dubious – discuss] of Samarian, Babylonian and Galilean Jews, and western and intellectual Jews spoke Greek,[citation needed] but a form of so-called Rabbinic Hebrew continued to be used as a vernacular in Judea until it was displaced by Aramaic, probably in the 3rd century CE. Certain Sadducee, Pharisee, scribal, hermit, Zealot and priestly classes insisted on Hebrew, and all Jews maintained their identity through Hebrew songs and simple quotations from Hebrew texts.[27][11][28] Other opinions on the date of this shift range from the 4th century BCE to the end of the Roman period.

Jewish diaspora

While there is no doubt that at a certain point Hebrew was displaced as the everyday spoken language of most Jews, and that its chief successor in the Middle East was the closely related Aramaic language, then Greek,[27][note 4] scholarly opinions on the exact dating of that shift have changed considerably.[10] In the first half of the 20th century, most scholars followed Geiger and Dalman in thinking that Aramaic became a spoken language in the land of Israel as early as the beginning of Israel's Hellenistic Period in the 4th century BCE, and that, as a corollary, Hebrew ceased to function as a spoken language around the same time. Segal, Klausner, and Ben Yehuda are notable exceptions to this view. During the latter half of the 20th century, accumulating archaeological evidence and especially linguistic analysis of the Dead Sea Scrolls disproved that view. The Dead Sea Scrolls, uncovered in 1946–1948 near Qumran, revealed ancient Jewish texts overwhelmingly in Hebrew, not Aramaic.

The Qumran scrolls indicate that Hebrew texts were readily understandable to the average Israelite, and that the language had evolved since Biblical times as spoken languages do.[note 5] Recent scholarship recognizes that reports of Jews speaking Aramaic indicate a multilingual society and do not necessarily identify the primary language spoken. Alongside Aramaic, Hebrew co-existed within Israel as a spoken language.[30] Most scholars now date the demise of Hebrew as a spoken language to the end of the Roman Period, or about 200 CE.[31] It continued as a literary language through the Byzantine Period from the 4th century CE. Many Hebrew linguists[who?] even postulate the survival of Hebrew as a spoken language until the Byzantine Period, but some historians[who?] do not accept this.[citation needed]

The exact roles of Aramaic and Hebrew remain hotly debated. A trilingual scenario has been proposed for the land of Israel: Hebrew functioned as the local mother tongue, with powerful ties to Israel's history, origins, and golden age, and as the language of Israel's religion; Aramaic functioned as the international language with the rest of the Middle East; and eventually Greek functioned as another international language with the eastern areas of the Roman Empire.[citation needed] Communities of Jews (and non-Jews) are also known who immigrated to Judea from these other lands and continued to speak Aramaic or Greek. According to another summary, Greek was the language of government, Hebrew the language of prayer, study and religious texts, and Aramaic the language of legal contracts and trade.[32] There was also a geographic pattern: according to Spolsky, by the beginning of the Common Era, "Judeo-Aramaic was mainly used in Galilee in the north, Greek was concentrated in the former colonies and around governmental centers, and Hebrew monolingualism continued mainly in the southern villages of Judea."[27] In other words, "in terms of dialect geography, at the time of the tannaim Palestine could be divided into the Aramaic-speaking regions of Galilee and Samaria and a smaller area, Judaea, in which Rabbinic Hebrew was used among the descendants of returning exiles."[11][28] In addition, it has been surmised that Koine Greek was the primary vehicle of communication in coastal cities and among the upper class of Jerusalem, while Aramaic was prevalent among the lower class of Jerusalem, but not in the surrounding countryside.[32] After the suppression of the Bar Kokhba revolt in the 2nd century CE, Judaeans were forced to disperse. Many relocated to Galilee, so most of the remaining native speakers of Hebrew at that last stage would have been found in the north.[33]

The Christian New Testament contains some clearly Aramaic place names and quotes.[34] Although the language of such Semitic glosses (and in general the language spoken by Jews in scenes from the New Testament) is usually referred to as "Hebrew"/"Jewish" in the text,[35] this term often seems to refer to Aramaic instead[note 6][note 7] and is rendered accordingly in recent translations.[37] Nonetheless, many glosses can be interpreted as Hebrew as well; and it has been argued that Hebrew, rather than Aramaic or Koine Greek, lay behind the composition of the Gospel of Matthew.[38] (See the Hebrew Gospel hypothesis or Aramaic of Jesus for more details on Hebrew and Aramaic in the gospels.)

Mishnah and Talmud

Main article: Mishnaic Hebrew

The term "Mishnaic Hebrew" generally refers to the Hebrew dialects found in the Talmud תלמוד, excepting quotations from the Hebrew Bible. The dialects fall into Mishnaic Hebrew (also called Tannaitic Hebrew, Early Rabbinic Hebrew, or Mishnaic Hebrew I), which was a spoken language, and Amoraic Hebrew (also called Late Rabbinic Hebrew or Mishnaic Hebrew II), which was a literary language. The earlier section of the Talmud is the Mishnah משנה, which was published around 200 CE and written in the earlier Mishnaic dialect, though many of its stories take place much earlier. The dialect is also found in certain Dead Sea Scrolls. Mishnaic Hebrew is considered to be one of the dialects of Classical Hebrew that functioned as a living language in the land of Israel. A transitional form of the language occurs in the other works of Tannaitic literature dating from the century beginning with the completion of the Mishnah. These include the halachic Midrashim (Sifra, Sifre, Mechilta etc.) and the expanded collection of Mishnah-related material known as the Tosefta תוספתא. The Talmud contains excerpts from these works, as well as further Tannaitic material not attested elsewhere; the generic term for these passages is Baraitot. The dialect of all these works is very similar to Mishnaic Hebrew.

About a century after the publication of the Mishnah, Mishnaic Hebrew fell into disuse as a spoken language. The later section of the Talmud, the Gemara גמרא, generally comments on the Mishnah and Baraitot in two forms of Aramaic. Nevertheless, Hebrew survived as a liturgical and literary language in the form of later Amoraic Hebrew, which sometimes occurs in the text of the Gemara.

Because the scribe had held the highest position in Judaism since the Torah was first written down, Hebrew was always regarded as the language of Israel's religion, history and national pride, and after it faded as a spoken language, it continued to be used as a lingua franca among scholars and Jews traveling in foreign countries.[39] After the 2nd century CE, when the Roman Empire exiled most of the Jewish population of Jerusalem following the Bar Kokhba revolt, Jews adapted to the societies in which they found themselves, yet letters, contracts, commerce, science, philosophy, medicine, poetry, and laws continued to be written mostly in Hebrew, which adapted by borrowing and inventing terms.

Medieval Hebrew

Main article: Medieval Hebrew

After the Talmud, various regional literary dialects of Medieval Hebrew evolved. The most important is Tiberian Hebrew or Masoretic Hebrew, a local dialect of Tiberias in Galilee that became the standard for vocalizing the Hebrew Bible and thus still influences all other regional dialects of Hebrew. This Tiberian Hebrew from the 7th to 10th century CE is sometimes called "Biblical Hebrew" because it is used to pronounce the Hebrew Bible; however properly it should be distinguished from the historical Biblical Hebrew of the 6th century BCE, whose original pronunciation must be reconstructed. Tiberian Hebrew incorporates the remarkable scholarship of the Masoretes (from masoret meaning "tradition"), who added vowel points and grammar points to the Hebrew letters to preserve much earlier features of Hebrew, for use in chanting the Hebrew Bible. The Masoretes inherited a biblical text whose letters were considered too sacred to be altered, so their markings were in the form of pointing in and around the letters. The Syriac alphabet, precursor to the Arabic alphabet, also developed vowel pointing systems around this time. The Aleppo Codex, a Hebrew Bible with the Masoretic pointing, was written in the 10th century, likely in Tiberias, and survives to this day. It is perhaps the most important Hebrew manuscript in existence.

During the Golden Age of Jewish culture in Spain, important work was done by grammarians in explaining the grammar and vocabulary of Biblical Hebrew; much of this was based on the work of the grammarians of Classical Arabic. Important Hebrew grammarians were Judah ben David Hayyuj, Jonah ibn Janah, Abraham ibn Ezra[40] and later (in Provence) David Kimhi. A great deal of poetry was written by poets such as Dunash ben Labrat, Solomon ibn Gabirol, Judah ha-Levi and the two Ibn Ezras, in a "purified" Hebrew based on the work of these grammarians, and in Arabic quantitative or strophic meters. This literary Hebrew was later used by Italian Jewish poets.[41]

The need to express scientific and philosophical concepts from Classical Greek and Medieval Arabic motivated Medieval Hebrew to borrow terminology and grammar from these other languages, or to coin equivalent terms from existing Hebrew roots, giving rise to a distinct style of philosophical Hebrew. This is used in the translations made by the Ibn Tibbon family. (Original Jewish philosophical works were usually written in Arabic.) Another important influence was Maimonides, who developed a simple style based on Mishnaic Hebrew for use in his law code, the Mishneh Torah. Subsequent rabbinic literature is written in a blend of this style and the Aramaized Rabbinic Hebrew of the Talmud.

Hebrew persevered through the ages as the main written language of all Jewish communities around the world for a wide range of uses: not only liturgy, but also poetry, philosophy, science and medicine, commerce, daily correspondence and contracts. There have been, of course, many deviations from this generalization, such as Bar Kokhba's letters to his lieutenants, which were mostly in Aramaic,[42] and Maimonides' writings, which were mostly in Arabic;[43] but overall, Hebrew did not cease to be used for such purposes. This meant not only that well-educated Jews in all parts of the world could correspond in a mutually intelligible language, and that books and legal documents published or written in any part of the world could be read by Jews in all other parts, but also that an educated Jew could travel and converse with Jews in distant places, just as priests and other educated Christians could converse in Latin. For example, Rabbi Avraham Danzig wrote the Chayei Adam in Hebrew, as opposed to Yiddish, as a guide to Halacha for the "average 17-year old" (Ibid. Introduction 1). Similarly, the purpose of the Chofetz Chaim, Rabbi Yisrael Meir Kagan, in writing the Mishna Berurah was to "produce a work that could be studied daily so that Jews might know the proper procedures to follow minute by minute". The work was nevertheless written in Talmudic Hebrew and Aramaic, since "the ordinary Jew [of Eastern Europe] of a century ago, was fluent enough in this idiom to be able to follow the Mishna Berurah without any trouble."[44]

Revival

Main article: Revival of the Hebrew language

Hebrew has been revived several times as a literary language, most significantly by the Haskalah (Enlightenment) movement of early and mid-19th-century Germany. Near the end of that century, the Jewish activist Eliezer Ben-Yehuda, inspired by the ideology of national revival (Shivat Tziyon (שיבת ציון),[note 8] later Zionism), began reviving Hebrew as a modern spoken language. Eventually, as a result of the local movement he created, but more significantly as a result of the new groups of immigrants known as the Second Aliyah, it replaced a score of languages spoken by Jews at that time. Those languages were Jewish languages such as Judeo-Spanish (also called Judezmo or Ladino), Yiddish, Judeo-Arabic, and Bukharian, or local languages spoken in the Jewish diaspora such as Russian, Persian, and Arabic.

The major result of the literary work of Hebrew intellectuals during the 19th century was a lexical modernization of Hebrew. New words and expressions were adapted as neologisms from the large corpus of Hebrew writings since the Hebrew Bible, or borrowed from Arabic (mainly by Eliezer Ben-Yehuda) and older Aramaic and Latin. Many new words were either borrowed from or coined after European languages, especially English, Russian, German, and French. Modern Hebrew became an official language in British-ruled Palestine in 1921 (along with English and Arabic), and then in 1948 became an official language of the newly declared State of Israel. Hebrew is the most widely spoken language in Israel today.

In the Modern Period, from the 19th century onward, the literary Hebrew tradition revived as the spoken language of modern Israel, called variously Israeli Hebrew, Modern Israeli Hebrew, Modern Hebrew, New Hebrew, Israeli Standard Hebrew, Standard Hebrew, and so on. Israeli Hebrew exhibits some features of Sephardic Hebrew from its local Jerusalemite tradition but adapts it with numerous neologisms, borrowed terms (often technical) from European languages and adopted terms (often colloquial) from Arabic.

The literary and narrative use of Hebrew was revived beginning with the Haskalah (Enlightenment) movement. The first secular periodical in Hebrew, Hameassef (The Gatherer), was published by Maskilim literati in Königsberg (today's Kaliningrad) from 1783 onwards.[45] In the mid-19th century, Eastern European Hebrew-language newspapers (e.g. HaMagid, founded in Lyck, Prussia, in 1856) multiplied. Prominent poets were Chaim Nachman Bialik and Shaul Tchernichovsky; novels were also written in the language.

The revival of the Hebrew language as a mother tongue was initiated in the late 19th century by the efforts of Eliezer Ben-Yehuda. He joined the Jewish national movement and in 1881 immigrated to Palestine, then a part of the Ottoman Empire. Motivated by the surrounding ideals of renovation and rejection of the diaspora "shtetl" lifestyle, Ben-Yehuda set out to develop tools for making the literary and liturgical language into an everyday spoken language. However, his brand of Hebrew followed norms that had been replaced in Eastern Europe by a different grammar and style, in the writings of people like Ahad Ha'am and others. His organizational efforts and involvement with the establishment of schools and the writing of textbooks pushed the vernacularization activity into a gradually accepted movement. It was not, however, until the Second Aliyah of 1904–1914 that Hebrew gained real momentum in Ottoman Palestine, through the more highly organized enterprises of the new group of immigrants. When the British Mandate of Palestine recognized Hebrew as one of the country's three official languages (English, Arabic, and Hebrew, in 1922), its new formal status contributed to its diffusion. A constructed modern language with a truly Semitic vocabulary and written appearance, although often European in phonology, was to take its place among the current languages of the nations.

While many saw his work as fanciful or even blasphemous[46] (because Hebrew was the holy language of the Torah and therefore some thought that it should not be used to discuss everyday matters), many soon understood the need for a common language amongst Jews of the British Mandate who at the turn of the 20th century were arriving in large numbers from diverse countries and speaking different languages. A Committee of the Hebrew Language was established. After the establishment of Israel, it became the Academy of the Hebrew Language. The results of Ben-Yehuda's lexicographical work were published in a dictionary (The Complete Dictionary of Ancient and Modern Hebrew). The seeds of Ben-Yehuda's work fell on fertile ground, and by the beginning of the 20th century, Hebrew was well on its way to becoming the main language of the Jewish population of both Ottoman and British Palestine. At the time, members of the Old Yishuv and a very few Hasidic sects, most notably those under the auspices of Satmar, refused to speak Hebrew and spoke only Yiddish. There remains a sizable population in Jerusalem, particularly in the Meah Shearim area, that prefers to speak Yiddish.[citation needed]

In the Soviet Union, the use of Hebrew, along with other Jewish cultural and religious activities, was suppressed. Soviet authorities considered the use of Hebrew "reactionary" since it was associated with Zionism, and the teaching of Hebrew at primary and secondary schools was officially banned by the Narkompros (Commissariat of Education) as early as 1919, as part of an overall agenda aiming to secularize education (the language itself did not cease to be studied at universities for historical and linguistic purposes[47]). The official ordinance stated that Yiddish, being the spoken language of the Russian Jews, should be treated as their only national language, while Hebrew was to be treated as a foreign language.[48] Hebrew books and periodicals ceased to be published and were seized from the libraries, although liturgical texts were still published until the 1930s. Despite numerous protests,[49] a policy of suppressing the teaching of Hebrew operated from the 1930s on. In the 1980s, Hebrew studies reappeared in the USSR among people struggling for permission to emigrate to Israel (refuseniks). Several teachers were imprisoned, among them Ephraim Kholmyansky, Yevgeny Korostyshevsky and others responsible for a Hebrew learning network connecting many cities of the USSR.

Modern Hebrew

Main article: Modern Hebrew

Standard Hebrew, as developed by Eliezer Ben-Yehuda, was based on Mishnaic spelling and Sephardi Hebrew pronunciation. However, the earliest speakers of Modern Hebrew had Yiddish as their native language and often brought into Hebrew idioms and calques from Yiddish.

The pronunciation of modern Israeli Hebrew is based mostly on the Sephardic Hebrew pronunciation. However, the language has adapted to Ashkenazi Hebrew phonology in some respects, mainly the following:

  • the elimination of pharyngeal articulation in the letters chet (ח) and ayin (ע) by many speakers.

  • the conversion of /r/ (ר) from an alveolar flap [ɾ] to a voiced uvular fricative [ʁ] or uvular trill [ʀ] by most speakers, as in most varieties of standard German or Yiddish (see Guttural R).

  • the pronunciation (by many speakers) of tzere ֵ as [eɪ] in some contexts (sifrey and teysha instead of Sephardic sifré and tésha)

  • the partial elimination of vocal Shva ְ (zman instead of Sephardic zĕman)[50]

  • in popular speech, penultimate stress in proper names (Dvóra instead of Dĕvorá; Yehúda instead of Yĕhudá) and some other words[51]

  • similarly in popular speech, penultimate stress in verb forms with a second person plural suffix (katávtem "you wrote" instead of kĕtavtém).[note 9]

Hebrew vocabulary has changed from its original form as the language has been reintroduced into different cultures over time. Each time Hebrew was passed on orally to a new generation of children, it underwent further Europeanization, producing the unique and unpredictable course of development that has led to its current form, classified as Modern Hebrew. This "course that Modern Hebrew has embarked upon is the sure sign that Hebrew has been reborn."[52]

In Israel, Modern Hebrew is currently taught in institutions called Ulpanim (singular: Ulpan). There are government-owned as well as private Ulpanim offering online courses and face-to-face programs.

Current status

Modern Hebrew is the primary official language of the State of Israel. As of 2013, there were about 9 million Hebrew speakers worldwide,[13][14] of whom 7 million spoke it fluently.[3][15][53]

Currently, 90% of Israeli Jews are proficient in Hebrew, and 70% are highly proficient. Some 60% of Israeli Arabs are also proficient in Hebrew, and 30% prefer speaking Hebrew over Arabic. However, Hebrew is the native language of only 49% of Israelis over the age of 20, with Russian, Arabic, Judaeo-Spanish, French, English, and Yiddish being the native tongues of most of the rest. Some 26% of Russian immigrants and 12% of Arabs speak Hebrew poorly or not at all.[54][55]

Amid the current climate of globalization and Americanization, steps have been taken to keep Hebrew the primary language of use and to prevent large-scale incorporation of English words into Hebrew vocabulary. The Academy of the Hebrew Language, housed on the campus of the Hebrew University of Jerusalem, coins about 2,000 new Hebrew words each year for modern concepts, finding an original Hebrew word that captures the meaning as an alternative to incorporating more English words into Hebrew vocabulary. The Haifa municipality has banned officials from using English words in official documents, and is fighting to stop businesses from using only English signs to market their services.[56] In 2012, a Knesset bill for the preservation of the Hebrew language was proposed, which included the stipulation that all signage in Israel must first and foremost be in Hebrew, as must all speeches by Israeli officials abroad. The bill's author, MK Akram Hasson, stated that the bill was proposed as a response to Hebrew "losing its prestige" and to children incorporating more English words into their vocabulary.[57] Since 6 January 2005, Hebrew has also been an official national minority language in Poland.[5]

Phonology

Further information: Biblical Hebrew phonology and Modern Hebrew phonology

Biblical Hebrew had a typical Semitic consonant inventory, with pharyngeal /ʕ ħ/, a series of "emphatic" consonants (possibly ejective, but this is debated), lateral fricative /ɬ/, and in its older stages also uvular /χ ʁ/. /χ ʁ/ merged into /ħ ʕ/ in later Biblical Hebrew, and /b ɡ d k p t/ underwent allophonic spirantization to [v ɣ ð x f θ] (known as begadkefat spirantization). The earliest Biblical Hebrew vowel system contained the Proto-Semitic vowels /a aː i iː u uː/ as well as /oː/, but this system changed dramatically over time.
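Begadkefat spirantization can be illustrated with a minimal Python sketch. It assumes a simplified Latin transliteration with the five plain vowels a, e, i, o, u. It applies only the core condition: each of /b ɡ d k p t/ (written b g d k p t) becomes [v ɣ ð x f θ] when it directly follows a vowel. Gemination, which blocks spirantization, and other conditioning factors are not modeled. The function name `spirantize` is hypothetical.

```python
# Begadkefat stops mapped to their fricative allophones (simplified transliteration).
SPIRANTS = {"b": "v", "g": "ɣ", "d": "ð", "k": "x", "p": "f", "t": "θ"}

# Assumed vowel inventory of the input transliteration.
VOWELS = set("aeiou")


def spirantize(word: str) -> str:
    """Apply a simplified begadkefat rule to a transliterated word.

    A begadkefat consonant becomes its fricative when the immediately
    preceding segment is a vowel. Gemination and other conditioning
    factors are deliberately not modeled in this sketch.
    """
    out = []
    prev = None  # no segment precedes the first letter
    for seg in word:
        if seg in SPIRANTS and prev in VOWELS:
            out.append(SPIRANTS[seg])
        else:
            out.append(seg)
        prev = seg  # vowels are never changed, so the original segment suffices
    return "".join(out)


if __name__ == "__main__":
    for underlying in ("katab", "gadol", "ktab"):
        print(underlying, "->", spirantize(underlying))
```

Under these assumptions, an underlying form such as katab ("he wrote") comes out as kaθav. The word-initial k follows no vowel and stays a stop. In a cluster such as ktab, the t follows a consonant and is likewise left unchanged.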

By the time of the Dead Sea Scrolls, /ɬ/ had shifted to /s/ in the Jewish traditions, though for the Samaritans it merged with /ʃ/ instead. (Elisha Qimron 1986. Hebrew of the Dead Sea Scrolls, 29). The Tiberian reading tradition of the Middle Ages had the vowel system /a ɛ e i ɔ o u ă ɔ̆ ɛ̆/, though other Medieval reading traditions had fewer vowels.

A number of reading traditions have been preserved in liturgical use. In Oriental (Sephardi and Mizrahi) Jewish reading traditions, the emphatic consonants are realized as pharyngealized, while the Ashkenazi (eastern European) traditions have lost emphatics and pharyngeals, and show the shift of /w/ to /v/. The Samaritan tradition has a complex vowel system which does not correspond closely to the Tiberian systems.

Modern Hebrew pronunciation developed from a mixture of the different Jewish reading traditions, generally tending towards simplification. Emphatic consonants have shifted to their ordinary counterparts, /w/ to /v/, and [ɣ ð θ] are not present. Many Israelis merge /ʕ ħ/ with /ʔ χ/, do not have contrastive gemination, and pronounce /r/ as a uvular trill [ʀ] rather than an alveolar trill, as in many varieties of Ashkenazi Hebrew. The consonants /tʃ dʒ/ have become phonemic due to loan words, and /w/ has similarly been re-introduced.

Hebrew grammar


Hebrew grammar is partly analytic, expressing such forms as dative, ablative, and accusative with prepositional particles rather than grammatical cases. However, inflection plays a decisive role in the formation of verbs and nouns. For example, nouns have a construct state, called "smikhut", to denote the relationship of "belonging to"; this is the converse of the genitive case of more inflected languages. Words in smikhut are often joined with hyphens. In modern speech, the construct is sometimes interchangeable with the preposition "shel", meaning "of". There are many cases, however, where older declined forms are retained (especially in idiomatic expressions), and "person" enclitics are widely used to "decline" prepositions.


Like all Semitic languages, Hebrew builds its stems on consonantal roots, typically triliteral (three consonants; four-consonant roots also exist), from which nouns, adjectives, and verbs are formed in various ways: e.g. by inserting vowels, doubling consonants, lengthening vowels, and/or adding prefixes, suffixes, or infixes.

Hebrew uses a number of one-letter prefixes that are added to words for various purposes. These are called inseparable prepositions or "Letters of Use" (Hebrew: אותיות השימוש, Otiyot HaShimush). Such items include: the definite article ha- (/ha/) (="the"); prepositions be- (/bə/) (="in"), le- (/lə/) (="to"; a shortened version of the preposition el), mi- (/mi/) (="from"; a shortened version of the preposition min); conjunctions ve- (/və/) (="and"), she- (/ʃe/) (="that"; a shortened version of the Biblical conjunction asher), ke- (/kə/) (="as", "like"; a shortened version of the conjunction kmo).

The vowel accompanying each of these letters may differ from those listed above, depending on the first letter or vowel that follows it. The rules governing these changes are hardly observed in colloquial speech, where most speakers use the regular form, but they may be heard in more formal circumstances. For example, if a preposition is put before a word that begins with a moving Shva, the preposition takes the vowel /i/ (and the initial consonant may be weakened): colloquial be-kfar (="in a village") corresponds to the more formal bi-khfar.

The definite article may be inserted between a preposition or a conjunction and the word it refers to, creating composite words like mé-ha-kfar (="from the village"). This example also demonstrates the change in the vowel of mi-. With be- and le-, the definite article is assimilated into the prefix, which then becomes ba- or la-. Thus *be-ha-matos becomes ba-matos (="in the plane"). This does not happen to mé- (the form of "min" or "mi-" used before the letter "he"), so mé-ha-matos is a valid form, meaning "from the airplane".

* indicates that the given example is grammatically non-standard.
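The contraction of be- and le- with the definite article, and its absence with mé-, can be expressed as a small rule. The function below is a minimal sketch, assuming Latin-script transliteration with hyphens as in the examples above. The name attach_prefix is hypothetical, and the function models only the three prefixes and the forms cited in this section; it does not produce formal variants such as bi-khfar.

```python
def attach_prefix(prefix: str, word: str, definite: bool = False) -> str:
    """Hypothetical helper: join an inseparable preposition (be-, le-, mi-)
    to a transliterated word, optionally with the definite article ha-."""
    if prefix not in ("be", "le", "mi"):
        raise ValueError(f"unsupported prefix: {prefix!r}")
    if not definite:
        # Colloquial form: the prefix keeps its regular vowel (be-kfar).
        return f"{prefix}-{word}"
    if prefix in ("be", "le"):
        # The article is absorbed into the prefix: *be-ha-matos -> ba-matos.
        return f"{prefix[0]}a-{word}"
    # mi- takes the form mé- before ha-, and the article stays separate.
    return f"mé-ha-{word}"


print(attach_prefix("be", "matos", definite=True))  # ba-matos
print(attach_prefix("mi", "matos", definite=True))  # mé-ha-matos
print(attach_prefix("be", "kfar"))                  # be-kfar
```

Keeping the mé- case as a separate branch mirrors the text: the article is assimilated only after be- and le-.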


Like most other languages, the vocabulary of the Hebrew language is divided into verbs, nouns, adjectives, and so on, and its sentence structure can be analyzed by terms like object, subject, and so on. However, speakers of languages such as English, French, Urdu or Persian may find the structure of Hebrew sentences quite surprising.

  • Many Hebrew sentences have several correct word orders: the words can be rearranged without changing the meaning. For example, the sentence "Dad went to work" includes a word for Dad (אבא aba), for went (הלך halaḵ), and for to work (literally "to the workplace", לעבודה la-ʿavoda). Unlike in English, these three words can be put in almost any order (אבא הלך לעבודה‎/ לעבודה אבא הלך‎/ לעבודה הלך אבא‎/ הלך אבא לעבודה and so on).

  • Hebrew has no indefinite article, i.e. no word like English "a" that must come before every singular noun.

  • Hebrew sentences do not have to include verbs: the present tense of "to be" is omitted (although it may be implied). For example, the sentence "I am here" (אני פה ani po) has only two words, one for I (אני) and one for here (פה). In the sentence "I am that person" (אני הוא אדם זה ani hu adam ze), the word for "am" corresponds to the word for "he" (הוא). However, this word may also be omitted, so the sentence אני אדם זה is identical in meaning.

  • Unlike the verb "to have" in English, none of the possession terms in Hebrew is a verb.

  • Though early Biblical Hebrew had a verb-subject-object ordering, this gradually transitioned to a subject-verb-object ordering.[58]

  • Definite direct objects must be marked in Hebrew with a specific preposition, את (et), which has no counterpart in English. The English phrase "he ate the cake" would in Hebrew be הוא אכל את העוגה hu akhal et ha'ugah (literally, "He ate את the cake").

Writing system


Hebrew alphabet

Modern Hebrew is written from right to left using the Hebrew alphabet, which is an abjad, or consonant-only script of 22 letters. The ancient paleo-Hebrew alphabet is similar to those used for Canaanite and Phoenician. Modern scripts are based on the "square" letter form, known as Ashurit (Assyrian), which was developed from the Aramaic script. A cursive Hebrew script is used in handwriting: the letters tend to be more circular in form when written in cursive, and sometimes vary markedly from their printed equivalents. The medieval version of the cursive script forms the basis of another style, known as Rashi script. When necessary, vowels are indicated by diacritic marks above or below the letter representing the syllabic onset, or by use of matres lectionis, which are consonantal letters used as vowels. Further diacritics are used to indicate variations in the pronunciation of the consonants (e.g. bet/vet, shin/sin); and, in some contexts, to indicate the punctuation, accentuation, and musical rendition of Biblical texts (see Cantillation).

Liturgical use in Judaism


Hebrew has always been used as the language of prayer and study, and the following pronunciation systems are found.

Ashkenazi Hebrew, originating in Central and Eastern Europe, is still widely used in Ashkenazi Jewish religious services and studies in Israel and abroad, particularly in the Haredi and other Orthodox communities. It was influenced by the Yiddish language.

Sephardi Hebrew is the traditional pronunciation of the Spanish and Portuguese Jews and Sephardi Jews in the countries of the former Ottoman Empire, with the exception of Yemenite Hebrew. This pronunciation, in the form used by the Jerusalem Sephardic community, is the basis of the Hebrew phonology of Israeli native speakers. It was influenced by the Judezmo language.

Mizrahi (Oriental) Hebrew is actually a collection of dialects spoken liturgically by Jews in various parts of the Arab and Islamic world. It was possibly influenced by the Aramaic and Arabic languages, and in some cases by Sephardi Hebrew, although some linguists maintain that it is the direct heir of Biblical Hebrew and thus represents the true dialect of Hebrew. The same claim is sometimes made for Yemenite Hebrew, or Temanit, which differs from other Mizrahi dialects in having a radically different vowel system and in distinguishing between diacritically marked consonants that are pronounced identically in other dialects (for example, gimel and "ghimel").

These pronunciations are still used in synagogue ritual and religious study, in Israel and elsewhere, mostly by people who are not native speakers of Hebrew, though some traditionalist Israelis are bi-dialectal.

Many synagogues in the diaspora, even though Ashkenazi by rite and by ethnic composition, have adopted the "Sephardic" pronunciation in deference to Israeli Hebrew. However, in many British and American schools and synagogues, this pronunciation retains several elements of its Ashkenazi substrate, especially the distinction between tsere and segol.



Lingua latīna

Latin inscription in the Colosseum

Vulgar Latin developed into the Romance languages between the 6th and 9th centuries; the formal language continued as the scholarly lingua franca of medieval Catholic Europe and as the liturgical language of the Roman Catholic Church.

Writing system: Latin alphabet

Map indicating the greatest extent of the Roman Empire (c. 117 AD) and the area governed by Latin speakers (dark green). Many languages other than Latin, most notably Greek, were spoken within the empire.

Range of the Romance languages, the modern descendants of Latin, in Europe.


Latin (/ˈlætɪn/; Latin: lingua latīna, IPA: [ˈlɪŋɡʷa laˈtiːna]) is an ancient Italic language[3] originally spoken by the Italic Latins in Latium and Ancient Rome. Like most European languages, it descends from the ancient Proto-Indo-European language. Influenced by the Etruscan language and using the Greek alphabet as a basis, it took recognizable form as Latin in the Italian Peninsula. The modern Romance languages are continuations of dialectal forms of the language (Vulgar Latin). Additionally, many students, scholars, and some members of the Christian clergy speak it fluently, and it is taught in primary, secondary and post-secondary educational institutions around the world.[4][5]

Latin is still used in the creation of new words in modern languages of many different families, including English, and largely in biological taxonomy. Latin and its derivative Romance languages are the only surviving languages of the Italic language family. Other languages of the Italic branch were attested in the inscriptions of early Italy, but were assimilated to Latin during the Roman Republic.

The presence of elements of vernacular speech in the works of the earliest authors of the Roman Republic makes it clear that a colloquial language, the predecessor of Vulgar Latin, existed apart from, and side by side with, the literary language throughout the classical period of the Republic. By the late Roman Republic, a standard, literary form had arisen from the speech of the educated, now referred to as Classical Latin. Vulgar Latin, by contrast, was the more rapidly changing colloquial language, which was spoken throughout the empire.[6]

Because of the Roman conquests, Latin spread to many Mediterranean and some northern European regions, and the dialects spoken in these areas, mixed to various degrees with the indigenous languages, developed into the modern Romance languages.[7] Classical Latin slowly changed with the decline of the Roman Empire, as education and wealth became ever scarcer. The consequent Medieval Latin, influenced by various Germanic and proto-Romance languages until expurgated by Renaissance scholars, was used as the language of international communication, scholarship, and science until well into the 18th century, when it began to be supplanted by vernaculars.

Latin is a highly inflected language, with three distinct genders, five to seven noun cases, four verb conjugations, six tenses, three persons, three moods, two voices, two aspects, and two numbers.



The Latin language has been passed down through various forms.


Some inscriptions have been published in an internationally agreed-upon, monumental, multivolume series termed the "Corpus Inscriptionum Latinarum (CIL)". Authors and publishers vary, but the format is about the same: volumes detailing inscriptions with a critical apparatus stating the provenance and relevant information. The reading and interpretation of these inscriptions is the subject matter of the field of epigraphy. About 270,000 inscriptions are known.


Julius Caesar's Commentarii de Bello Gallico is one of the most famous classical Latin texts of the Golden Age of Latin. The unvarnished, journalistic style of this patrician general has long been taught as a model of the urbane Latin officially spoken and written in the floruit of the Roman republic.

The works of several hundred ancient authors who wrote in Latin have survived in whole or in part, in substantial works or in fragments to be analyzed in philology. They are in part the subject matter of the field of Classics. Their works were published in manuscript form before the invention of printing and now exist in carefully annotated printed editions such as the Loeb Classical Library, published by Harvard University Press, or the Oxford Classical Texts, published by Oxford University Press.

Latin translations of modern literature such as The Hobbit, Treasure Island, Robinson Crusoe, Paddington Bear, Winnie the Pooh, The Adventures of Tintin, Asterix, Harry Potter, Walter the Farting Dog, Le Petit Prince, Max und Moritz, How the Grinch Stole Christmas, The Cat in the Hat, and a book of fairy tales, "fabulae mirabiles," are intended to garner popular interest in the language. Additional resources include phrasebooks and resources for rendering everyday phrases and concepts into Latin, such as Meissner's Latin Phrasebook.


Latin influence in English has been significant at all stages of its insular development. In the medieval period, much borrowing from Latin occurred through ecclesiastical usage established by Saint Augustine of Canterbury in the sixth century, or indirectly after the Norman Conquest through the Anglo-Norman language. From the 16th to the 18th centuries, English writers cobbled together huge numbers of new words from Latin and Greek words. These were dubbed "inkhorn terms", as if they had spilled from a pot of ink. Many of these words were used once by the author and then forgotten. Some useful ones, though, survived, such as 'imbibe' and 'extrapolate'. Many of the most common polysyllabic English words are of Latin origin, through the medium of Old French.

Due to the influence of Roman governance and Roman technology on the less developed nations under Roman dominion, those nations adopted Latin phraseology in some specialized areas, such as science, technology, medicine, and law. For example, the Linnaean system of plant and animal classification was heavily influenced by Historia Naturalis, an encyclopedia of people, places, plants, animals, and things published by Pliny the Elder. Roman medicine, recorded in the works of physicians such as Galen, ensured that today's medical terminology would derive primarily from Latin and Greek words, the Greek being filtered through the Latin. Roman engineering had the same effect on scientific terminology as a whole. Principles of Roman law survive partly in a long list of legal Latin terms.

Many international auxiliary languages have been heavily influenced by Latin. Interlingua, which lays claim to a sizable following, is sometimes considered a simplified, modern version of the language. Latino sine Flexione, popular in the early 20th century, is Latin with its inflections stripped away, among other grammatical changes.


A multi-volume Latin dictionary in the University Library of Graz

Throughout European history, an education in the Classics was considered crucial for those who wished to join literate circles. Instruction in Latin is an essential aspect of Classics. In today's world, a large number of Latin students in America learn from Wheelock's Latin: The Classic Introductory Latin Course, Based on Ancient Authors. This book, first published in 1956,[8] was written by Frederic M. Wheelock, who received a PhD from Harvard University. Wheelock's Latin has become the standard text for many American introductory Latin courses.

The Living Latin movement attempts to teach Latin in the same way that living languages are taught, i.e., as a means of both spoken and written communication. Such instruction is available at the Vatican and at some institutions in the U.S., such as the University of Kentucky and Iowa State University. The British Cambridge University Press is a major supplier of Latin textbooks for all levels, such as the Cambridge Latin Course series. It has also published a subseries of children's texts in Latin by Bell & Forte, which recounts the adventures of a mouse called Minimus.

Latin and Ancient Greek Language - Culture - Linguistics at Duke University in 2014.

In the United Kingdom, the Classical Association encourages the study of antiquity through various means, such as publications and grants. The University of Cambridge,[9] the Open University (OU),[10] a number of prestigious independent schools, such as Eton and Harrow, and Via Facilis,[11] a London-based charity, still run Latin courses. In the United States and Canada, the American Classical League supports every effort to further the study of classics. Its subsidiaries include the National Junior Classical League (with more than 50,000 members), which encourages high school students to pursue the study of Latin, and the National Senior Classical League, which encourages students to continue their study of the classics into college. The league also sponsors the National Latin Exam. Classicist Mary Beard wrote in The Times Literary Supplement in 2006 that the reason for learning Latin is what was written in it.[12]

Official status

Latin has been, or still is, an official language in the following European states:

  •  Croatia - Latin was the official language of Croatian Parliament (Sabor) from the 13th until the 19th century (1847). The oldest preserved records of the parliamentary sessions (Congregatio Regni totius Sclavonie generalis)—held in Zagreb (Zagabria), Croatia—date from 19 April 1273. An extensive Croatian Latin literature exists.

  •  Poland - officially recognized and widely used[13][14][15][16] between the 9th and 18th centuries, commonly used in foreign relations and popular as a second language among some of the nobility[17]

  •  Holy See - used in the diocese, with Italian being the official language of Vatican City

History of Latin


A number of historical phases of the language have been recognized, each distinguished by subtle differences in vocabulary, usage, spelling, morphology and syntax. There are no hard and fast rules of classification; different scholars emphasize different features. As a result, the list has variants, as well as alternative names. In addition to the historical phases, Ecclesiastical Latin refers to the styles used by the writers of the Roman Catholic Church, as well as by Protestant scholars, from Late Antiquity onward.

After the Roman Empire in Western Europe fell, and Germanic kingdoms took its place, the Germanic people adopted Latin as a language more suitable to legal, and other more formal, expression.[citation needed]

Old Latin


The earliest known form of Latin is Old Latin, which was spoken from the Roman Kingdom to the middle Republican period, and is attested both in inscriptions and in some of the earliest extant Latin literary works, such as the comedies of Plautus and Terence. During this period, the Latin alphabet was devised from the Etruscan alphabet. The writing style later changed from an initial right-to-left or boustrophedon[18] to a left-to-right script.[19]

Classical Latin


During the late republic and into the first years of the empire, a new Classical Latin arose, a conscious creation of the orators, poets, historians and other literate men, who wrote the great works of classical literature, which were taught in grammar and rhetoric schools. Today's instructional grammars trace their roots to these schools, which served as a sort of informal language academy dedicated to maintaining and perpetuating educated speech.[20][21]

Vulgar Latin


Philological analysis of Archaic Latin works, such as those of Plautus, which contain snippets of everyday speech, indicates that a spoken language, Vulgar Latin (termed sermo vulgi, "the speech of the masses", by Cicero), existed at the same time as the literary Classical Latin. This informal language was rarely written, so philologists have been left with only individual words and phrases cited by Classical authors, as well as those found as graffiti.[22]

As vernacular Latin was free to develop on its own, there is no reason to suppose that the speech was uniform either diachronically or geographically. On the contrary, Romanized European populations developed their own dialects of the language.[23] The decline of the Roman Empire meant a deterioration in educational standards that brought about Late Latin, a post-classical stage of the language seen in Christian writings of the time. This language was closer to everyday speech, not only because of the decline in education but also because of a desire to spread the word to the masses.

Despite dialect variation (which is found in any sufficiently widespread language) the languages of Spain, France, Portugal and Italy retained a remarkable unity in phonological forms and developments, bolstered by the stabilizing influence of their common Christian (Roman Catholic) culture. It was not until the Moorish conquest of Spain in 711 cut off communications between the major Romance regions that the languages began to diverge seriously.[24] The Vulgar Latin dialect that would later become Romanian diverged somewhat more from the other varieties due to its being largely cut off from the unifying influences in the western part of the Empire.

One way to determine whether a Romance language feature was present in Vulgar Latin is to compare it with its parallel in Classical Latin. If it was not preferred in Classical Latin, then it most likely came from the unattested contemporaneous Vulgar Latin. For example, Romance "horse" (cavallo/cheval/caballo/cavalo) came from Latin caballus. However, Classical Latin used equus. Caballus was therefore most likely the spoken form (slang).[25]

Vulgar Latin began to diverge into distinct languages by the 9th century at the latest, when the earliest extant Romance writings begin to appear. Throughout this period these languages were confined to everyday speech, since Medieval Latin, the successor of Late Latin, was used for writing.

Medieval Latin


Latin Bible from 1407

Medieval Latin is the written Latin in use during that portion of the post-classical period when no corresponding Latin vernacular existed. The spoken language had developed into the various incipient Romance languages; however, in the educated and official world Latin continued without its natural spoken base. Moreover, this Latin spread into lands that had never spoken Latin, such as the Germanic and Slavic nations. It became useful for international communication between the member states of the Holy Roman Empire and its allies.

Without the institutions of the Roman empire that had supported its uniformity, medieval Latin lost its linguistic cohesion: for example, in classical Latin sum and eram are used as auxiliary verbs in the perfect and pluperfect passive, which are compound tenses. Medieval Latin might use fui and fueram instead.[26] Furthermore the meanings of many words have been changed and new vocabularies have been introduced from the vernacular. Identifiable individual styles of classically incorrect Latin prevail.[26]

Renaissance Latin


Most 15th century printed books (incunabula) were in Latin, with the vernacular languages playing only a secondary role.[27]

The Renaissance briefly reinforced the position of Latin as a spoken language, through its adoption by the Renaissance Humanists. Often led by members of the clergy, they were shocked by the accelerated dismantling of the vestiges of the classical world and the rapid loss of its literature. They strove to preserve what they could. It was they who introduced the practice of producing revised editions of the literary works that remained by comparing surviving manuscripts, and they who attempted to restore Latin to what it had been. They corrected medieval Latin out of existence no later than the 15th century and replaced it with more formally correct versions supported by the scholars of the rising universities, who attempted, through scholarship, to discover what the classical language had been.

Early modern Latin


During the Early Modern Age, Latin was still the most important language of culture in Europe. Therefore, until the end of the 17th century, the majority of books and almost all diplomatic documents were written in Latin. Afterwards, most diplomatic documents were written in French, and later in native or agreed-upon languages.

Modern Latin


The signs at Wallsend Metro station are in English and Latin as a tribute to Wallsend's role as one of the outposts of the Roman Empire.

The largest organization that retains Latin in official and quasi-official contexts is the Catholic Church. Latin remains the language of the Roman Rite; the Tridentine Mass is celebrated in Latin, and although the Mass of Paul VI is usually celebrated in the local vernacular language, it can be and often is said in Latin, in part or whole, especially at multilingual gatherings. Latin is the official language of the Holy See, the primary language of its public journal, the Acta Apostolicae Sedis, and the working language of the Roman Rota. The Vatican City is also home to the world's only ATM that gives instructions in Latin.[28]

In the Anglican Church, after the publication of the Anglican Book of Common Prayer of 1559, a 1560 Latin edition was published for use at universities such as Oxford and the leading public schools, where the liturgy was still permitted to be conducted in Latin[29] and there have been several Latin translations since. Most recently a Latin edition of the 1979 USA Anglican Book of Common Prayer has appeared.[30]

Some films with ancient settings, such as Sebastiane and The Passion of the Christ, have been made with dialogue in Latin for the sake of realism. Occasionally, Latin dialogue is used because of its association with religion or philosophy, in such films and TV series as The Exorcist and Lost ("Jughead"). Subtitles are usually shown for the benefit of those who do not understand Latin. There are also songs written with Latin lyrics. The libretto of Igor Stravinsky's opera-oratorio Oedipus rex is in Latin.

Switzerland uses the country's Latin short name "Helvetia" on coins and stamps, since there is no room to use all four of the nation's official languages. For a similar reason, it adopted the international vehicle and internet code CH, which stands for Confoederatio Helvetica, the country's full Latin name.

The polyglot European Union has adopted Latin names in the logos of some of its institutions for the sake of linguistic compromise and as a sign of the continent's heritage (e.g. the EU Council: Consilium).

Many organizations today have Latin mottos, such as "Semper paratus" (always ready), the motto of the United States Coast Guard, and "Semper fidelis" (always faithful), the motto of the United States Marine Corps. Several of the states of the United States also have Latin mottos, such as "Montani semper liberi" (Mountaineers are always free), the state motto of West Virginia; "Sic semper tyrannis" (Thus always for tyrants), that of Virginia; "Qui transtulit sustinet" ("He who transplanted still sustains"), that of Connecticut; "Esse quam videri" (To be rather than to seem), that of North Carolina; and "Si quaeris peninsulam amoenam, circumspice" ("If you seek a pleasant peninsula, look about you"), that of Michigan. Another Latin motto is "Per ardua ad astra" (Through adversity/struggle to the stars), the motto of the RAF. Some schools adopt Latin mottos, such as "Disce aut discede" of the Royal College, Colombo. Harvard University's motto is "Veritas" (truth). Veritas was the goddess of truth, a daughter of Saturn, and the mother of Virtue.

Similarly, Canada's motto "A mari usque ad mare" (from sea to sea) and most provincial mottos are also in Latin (for example, British Columbia's is Splendor Sine Occasu, "splendor without diminishment").

Occasionally, some media outlets broadcast in Latin, which is targeted at enthusiasts. Notable examples include Radio Bremen in Germany, YLE radio in Finland and Vatican Radio & Television, all of which broadcast news segments and other material in Latin.[31]

There are many websites and forums maintained in Latin by enthusiasts. The Latin Wikipedia has more than 100,000 articles written in Latin.

Latin is taught in many high schools, especially in Europe and the Americas. It is most common in British Public Schools and Grammar Schools, the Italian Liceo classico and Liceo scientifico, the German Humanistisches Gymnasium, the Dutch gymnasium, the Boston Latin School and Boston Latin Academy. In the pontifical universities postgraduate courses of Canon law are taught in Latin and papers should be written in the same language.



No inherited verbal knowledge of the ancient pronunciation of Latin exists. It must be reconstructed. Among the data used for reconstruction are explicit statements about pronunciation by ancient authors, misspellings, puns, ancient etymologies, and the spelling of Latin loanwords in other languages.[32]


The consonant phonemes of Classical Latin are shown in the following table.[33]

































During the time of Old and Classical Latin, the Latin alphabet had no distinction between uppercase and lowercase, and the letters ⟨J U W⟩ did not exist. In place of ⟨J U⟩, the letters ⟨I V⟩ were used. ⟨I V⟩ represented both vowels and consonants. Most of the letterforms were similar to modern uppercase, as can be seen in the inscription from the Colosseum shown at the top of the article.

The spelling systems used in Latin dictionaries and modern editions of Latin texts, however, normally use ⟨i u⟩ in place of Classical-era ⟨I V⟩. Some systems use ⟨j v⟩ for the consonant sounds /j w/, except in the combinations ⟨gu su qu⟩, where ⟨v⟩ is never used.

Some notes concerning the mapping of Latin phonemes to English graphemes are given below.




  • ⟨c⟩, ⟨k⟩: always hard as k in sky, never soft as in Caesar, cello, or social

  • ⟨t⟩: as t in stay, never as t in nation

  • ⟨s⟩: as s in say, never as s in rise or issue

  • ⟨g⟩: always hard as g in good, never soft as g in gem; before ⟨n⟩, as ng in sing

  • ⟨n⟩: as n in man; before ⟨c⟩, ⟨x⟩, and ⟨g⟩, as ng in sing

  • ⟨l⟩: when doubled ⟨ll⟩ and before ⟨i⟩, as clear l in link (l exilis);[34][35] in all other positions, as dark l in bowl (l pinguis)

  • ⟨qu⟩: similar to qu in quick, never as qu in antique

  • ⟨u⟩ or ⟨v⟩: sometimes at the beginning of a syllable, or after ⟨g⟩ and ⟨s⟩, as w in wine, never as v in vine

  • ⟨i⟩ or ⟨j⟩: sometimes at the beginning of a syllable, as y in yard, never as j in just; doubled between vowels, as y y in toy yacht

  • ⟨x⟩: a letter representing ⟨c⟩ + ⟨s⟩, as x in English axe, never as x in example

Doubled consonants in Latin are pronounced long. In English, consonants are only pronounced double between two words or morphemes, as in unnamed, which has a doubled /nn/ like the nn in Latin annus.


Simple vowels

  • Close: front /iː ɪ/, back /ʊ uː/

  • Mid: front /eː ɛ/, back /ɔ oː/

  • Open: central /a aː/

In the Classical period, the letter ⟨U⟩ was written as ⟨V⟩, even when used as a vowel. ⟨Y⟩ was adopted to represent upsilon in loanwords from Greek, but it was pronounced like ⟨u⟩ and ⟨i⟩ by some speakers.

Classical Latin distinguished between long and short vowels. During the Classical period, long vowels, except for ⟨I⟩, were frequently marked using the apex, which was sometimes similar to an acute accent ⟨Á É Ó V́ Ý⟩. Long /iː/ was written using a taller version of ⟨I⟩, called i longa "long I": ⟨ꟾ⟩. In modern texts, long vowels are often indicated by a macron ⟨ā ē ī ō ū⟩, and short vowels are usually unmarked, except when necessary to distinguish between words, in which case they are marked with a breve: ⟨ă ĕ ĭ ŏ ŭ⟩.

Long vowels in the Classical period were pronounced with a different quality from short vowels, as well as being longer. The difference is described in the table below.

Pronunciation of Latin vowels

Latin grapheme   Modern examples
⟨a⟩              similar to u in cut when short
                 similar to a in father when long
⟨e⟩              as e in pet when short
                 similar to ey in they when long
⟨i⟩              as i in sit when short
                 similar to i in machine when long
⟨o⟩              as o in sort when short
                 similar to o in holy when long
⟨u⟩              similar to u in put when short
                 similar to u in true when long
⟨y⟩              similar to ü in German Stück when short (or as short u or i)
                 as in French lune when long (or as long u or i)

A vowel and ⟨m⟩ at the end of a word, or a vowel and ⟨n⟩ before ⟨s⟩ or ⟨f⟩, is long and nasal, as in monstrum /mõːstrũː/.


Classical Latin had several diphthongs. The two most common were ⟨ae au⟩. ⟨oe⟩ was fairly rare, and ⟨ui eu ei ou⟩ were very rare, at least in native Latin words.[36]

These sequences sometimes did not represent diphthongs. ⟨ae⟩ and ⟨oe⟩ also represented a sequence of two vowels in different syllables in aēnus [aˈeː.nʊs] "of bronze" and coēpit [kɔˈeː.pɪt] "began", and ⟨au ui eu ei ou⟩ represented sequences of two vowels, or of a vowel and one of the semivowels /j w/, in cauē [ˈka.weː] "beware!", cuius [ˈkʊj.jʊs] "whose", monuī [ˈmɔn.ʊ.iː] "I warned", soluī [ˈsɔɫ.wiː] "I released", dēlēuī [deːˈleː.wiː] "I destroyed", eius [ˈɛj.jʊs] "his", and nouus [ˈnɔ.wʊs] "new".

Old Latin had more diphthongs, but most of them changed into long vowels in Classical Latin. The Old Latin diphthong ⟨ai⟩ and the sequence ⟨āī⟩ became Classical ⟨ae⟩. Old Latin ⟨oi⟩ and ⟨ou⟩ changed to Classical ⟨ū⟩, except in a few words, where ⟨oi⟩ became Classical ⟨oe⟩. These two developments sometimes occurred in different words from the same root: for instance, Classical poena "punishment" and pūnīre "to punish".[36] Early Old Latin ⟨ei⟩ usually changed to Classical ⟨ī⟩.[37]

In Vulgar Latin and the Romance languages, ⟨ae au oe⟩ merged with ⟨e ō ē⟩. A similar pronunciation also existed during the Classical Latin period among less educated speakers.[36]

Diphthongs classified by beginning sound

Close: ui /ui̯/ (back)
Mid: ei /ei̯/ (front); oe /oe̯/, ou /ou̯/ (back)
Open: ae /ae̯/, au /au̯/


Main article: Latin alphabet

The Duenos Inscription, from the 6th century BC, is one of the earliest known Old Latin texts.

Latin was written in the Latin alphabet, derived from the Old Italic alphabet, which was in turn drawn from the Greek and ultimately the Phoenician alphabet.[38] This alphabet has continued to be used over the centuries as the script for the Romance, Celtic, Germanic, Baltic, Finnic, and many Slavic languages (Polish, Slovak, Slovene, Croatian and Czech), and has been adopted by many languages around the world, including Vietnamese, the Austronesian languages, many Turkic languages, and most languages in sub-Saharan Africa, the Americas, and Oceania, making it by far the world's single most widely used writing system.

The number of letters in the Latin alphabet has varied. When it was first derived from the Etruscan alphabet, it contained only 21 letters.[39] Later, G was added to represent /ɡ/, which had previously been spelled C, while Z ceased to be included in the alphabet due to non-use, as the language had no voiced alveolar fricative at the time.[40] The letters Y and Z were later added to represent the Greek letters upsilon and zeta respectively in Greek loanwords.[40] W was created in the 11th century from VV. It represented /w/ in Germanic languages, not in Latin, which still uses V for the purpose. J was distinguished from the original I only during the late Middle Ages, as was the letter U from V.[40] Although some Latin dictionaries use J, it is for the most part not used for Latin text, as it was not used in classical times; many other languages, however, do use it.

Classical Latin did not contain sentence punctuation, letter case,[41] or interword spacing, though apices were sometimes used to distinguish length in vowels and the interpunct was used at times to separate words. So, the first line of Catullus 3, originally written as

LV́GÉTEÓVENERÉSCVPꟾDINÉSQVE ("Mourn, O Venuses and Cupids")

or with interpunct as

LV́GÉTE·Ó·VENERÉS·CVPꟾDINÉSQVE

would be rendered in a modern edition as

Lugete, O Veneres Cupidinesque

or with macrons

Lūgēte, Ō Venerēs Cupīdinēsque.
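The conventions just described can be sketched in Python. The hypothetical function below turns a macronized modern spelling back into Classical-style capitals: each long vowel gets an apex (i longa for long i), ⟨U J⟩ become ⟨V I⟩, and spaces and punctuation are dropped. It assumes every macron corresponds to an apex, which overstates Roman practice, since apices were applied only sometimes.

```python
import unicodedata

# Long vowels (macrons in modern editions) mapped to apex forms: an
# acute-like mark on most vowels, i longa (U+A7FE) for long i, and
# V plus a combining acute for long u. Keys are precomposed lowercase.
APEX = {
    "\u0101": "\u00c1",   # ā -> Á
    "\u0113": "\u00c9",   # ē -> É
    "\u012b": "\ua7fe",   # ī -> ꟾ (i longa)
    "\u014d": "\u00d3",   # ō -> Ó
    "\u016b": "V\u0301",  # ū -> V́
    "\u0233": "\u00dd",   # ȳ -> Ý
}


def to_classical_capitals(modern: str) -> str:
    """Render a macronized modern spelling in Classical-style capitals.

    Assumption: every macron becomes an apex, and spaces and punctuation
    are simply dropped (no interpuncts are inserted).
    """
    out = []
    for ch in unicodedata.normalize("NFC", modern).lower():
        if ch in APEX:
            out.append(APEX[ch])
        elif ch.isalpha():
            # There was no U or J yet: write V and I instead.
            out.append(ch.upper().replace("U", "V").replace("J", "I"))
        # Any other character (space, comma, ...) is omitted.
    return "".join(out)


print(to_classical_capitals("Lūgēte, Ō Venerēs Cupīdinēsque"))
# LV́GÉTEÓVENERÉSCVPꟾDINÉSQVE
```

Applied to the modern line of Catullus 3, it reproduces the unspaced form with apices quoted above; producing the interpunct version would need an extra step that this sketch does not attempt.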

A replica of the Old Roman Cursive inspired by the Vindolanda tablets

The Roman cursive script is commonly found on the many wax tablets excavated at sites such as forts, an especially extensive set having been discovered at Vindolanda on Hadrian's Wall in Britain. Curiously enough, most of the Vindolanda tablets show spaces between words, though spaces were avoided in monumental inscriptions from that era.

Alternate scripts

Occasionally Latin has been written in other scripts:

  • The disputed Praeneste fibula is a 7th-century BC pin with an Old Latin inscription written using the Etruscan script.

  • The rear panel of the early eighth-century Franks Casket has an inscription that switches from Old English in Anglo-Saxon runes to Latin in Latin script and to Latin in runes.


Main article: Latin grammar

Latin is a synthetic, fusional language, in the terminology of linguistic typology. In more traditional terminology, it is an inflected language, although typologists are apt to say "inflecting". Thus words include an objective semantic element and also markers specifying the grammatical use of the word. This fusion of root meaning and markers produces very compact sentence elements. For example, amō, "I love," is produced from a semantic element, ama-, "love," to which -ō, a first person singular marker, is suffixed.

The grammatical function can be changed by changing the markers: the word is "inflected" to express different grammatical functions. The semantic element does not change. Inflection uses affixing and infixing. Affixing is prefixing and suffixing. Latin inflections are never prefixed. For example, amābit, "he or she will love", is formed from the same stem, amā-, to which a future tense marker, -bi-, is suffixed, and a third person singular marker, -t, is suffixed. There is an inherent ambiguity: -t may denote more than one grammatical category, in this case either masculine, feminine, or neuter gender. A major task in understanding Latin phrases and clauses is to clarify such ambiguities by an analysis of context. All natural languages contain ambiguities of one sort or another.

The inflections express gender, number, and case in adjectives, nouns, and pronouns—a process called declension. Markers are also attached to fixed stems of verbs, to denote person, number, tense, voice, mood, and aspect—a process called conjugation. Some words are uninflected, not undergoing either process, such as adverbs, prepositions, and interjections.


Main article: Latin declension

A regular Latin noun belongs to one of five main declensions, a group of nouns with similar inflected forms. The declensions are identified by the genitive singular form of the noun. The first declension, with a predominant ending letter of a, is signified by the genitive singular ending of -ae. The second declension, with a predominant ending letter of o, is signified by the genitive singular ending of -i. The third declension, with a predominant ending letter of i, is signified by the genitive singular ending of -is. The fourth declension, with a predominant ending letter of u, is signified by the genitive singular ending of -ūs. And the fifth declension, with a predominant ending letter of e, is signified by the genitive singular ending of -ei.
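Because the genitive singular ending identifies the declension, the lookup can be sketched as a simple suffix test. The Python below is a hypothetical helper, not a standard tool: it strips macrons, then checks the endings in an order where -ei is tried before -i. Spelling alone can mislead (a second-declension genitive such as Pompeī would come out as fifth declension), so a dictionary entry remains the reliable source.

```python
import unicodedata
from typing import Optional


def strip_macrons(word: str) -> str:
    """Drop vowel-length marks so that, e.g., 'diēī' compares like 'diei'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Genitive singular endings, tried in order: -ei has to be checked
# before the bare -i of the second declension.
GENITIVE_ENDINGS = [
    ("ae", 1),
    ("ei", 5),
    ("is", 3),
    ("us", 4),  # -ūs with the macron stripped
    ("i", 2),
]


def declension_of(genitive_singular: str) -> Optional[int]:
    """Guess a noun's declension (1-5) from its genitive singular form."""
    g = strip_macrons(genitive_singular).lower()
    for ending, declension in GENITIVE_ENDINGS:
        if g.endswith(ending):
            return declension
    return None  # no recognised ending


for gen in ["puellae", "dominī", "rēgis", "manūs", "diēī"]:
    print(gen, declension_of(gen))
```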

There are seven Latin noun cases, which also apply to adjectives and pronouns. These mark a noun's syntactic role in the sentence by means of inflections, so word order is not as important in Latin as it is in other less inflected languages, such as English. The general structure and word order of a Latin sentence can therefore vary. The cases are as follows:

  1. Nominative – used when the noun is the subject or a predicate nominative. The thing or person acting; e.g., the girl ran: puella cucurrit, or cucurrit puella

  2. Genitive – used when the noun is the possessor of or connected with an object (e.g., "the horse of the man", or "the man's horse"—in both of these instances, the word man would be in the genitive case when translated into Latin). Also indicates the partitive, in which the material is quantified (e.g., "a group of people"; "a number of gifts"—people and gifts would be in the genitive case). Some nouns are genitive with special verbs and adjectives too (e.g., The cup is full of wine. Poculum plēnum vīnī est. The master of the slave had beaten him. Dominus servī eum verberāverat.)

  3. Dative – used when the noun is the indirect object of the sentence, with special verbs, with certain prepositions, and when used as agent, reference, or even possessor. (e.g., The merchant hands the stola to the woman. Mercātor fēminae stolam trādit.)

  4. Accusative – used when the noun is the direct object of the verb, and as the object of a preposition indicating place to which. (e.g., The man killed the boy. Homō necāvit puerum.)

  5. Ablative – used when the noun demonstrates separation or movement from a source, cause, agent, or instrument, or when the noun is used as the object of certain prepositions; adverbial. (e.g., You walked with the boy. Cum puerō ambulāvistī.)

  6. Vocative – used when the noun is used in direct address. The vocative form of a noun is the same as the nominative except for second-declension nouns ending in -us, where the -us becomes -e in the vocative singular. If the noun ends in -ius (such as fīlius), the vocative singular ends in just -ī (fīlī), as distinct from the nominative plural (fīliī). (e.g., "Master!" shouted the slave. "Domine!" clāmāvit servus.)

  7. Locative – used to indicate a location (corresponding to the English "in" or "at"). This is far less common than the other six cases of Latin nouns and usually applies to cities, small towns, and islands smaller than the island of Rhodes, along with a few common nouns, such as the word domus, house. In the first and second declension singular, its form coincides with the genitive (Roma becomes Romae, "in Rome"). In the plural, and in the other declensions, it coincides with the ablative (Athēnae becomes Athēnīs, "at Athens"). In the case of the fourth declension word domus, the locative form, domī ("at home") differs from the standard form of all the other cases.
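The vocative and locative rules in items 6 and 7 are mechanical enough to sketch. The hypothetical helpers below take the nominative, genitive and ablative forms as inputs rather than generating them, and treat domus as the single hard-coded exception; other irregular nouns are not covered.

```python
# Hypothetical table of irregular locatives: only the exception named in item 7.
IRREGULAR_LOCATIVES = {"domus": "domī"}


def vocative_singular(nominative: str, declension: int) -> str:
    """Only second-declension nouns in -us change in the vocative singular."""
    if declension == 2 and nominative.endswith("ius"):
        return nominative[:-3] + "ī"  # fīlius -> fīlī
    if declension == 2 and nominative.endswith("us"):
        return nominative[:-2] + "e"  # dominus -> domine
    return nominative  # otherwise identical to the nominative


def locative(nominative: str, genitive: str, ablative: str,
             declension: int, plural: bool) -> str:
    """Genitive form in the 1st/2nd declension singular; ablative form
    in the plural and in the other declensions."""
    if nominative in IRREGULAR_LOCATIVES:
        return IRREGULAR_LOCATIVES[nominative]
    if declension in (1, 2) and not plural:
        return genitive
    return ablative


print(vocative_singular("dominus", 2))                               # domine
print(locative("Rōma", "Rōmae", "Rōmā", 1, plural=False))            # Rōmae
print(locative("Athēnae", "Athēnārum", "Athēnīs", 1, plural=True))   # Athēnīs
```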

Latin lacks both definite and indefinite articles; thus puer currit can mean either "the boy is running" or "a boy is running".


Main article: Latin declension

There are two types of regular Latin adjectives: first and second declension and third declension, so called because their forms are similar, if not identical, to first and second declension and third declension nouns, respectively. Latin adjectives also have comparative (more ..., -er) and superlative (most ..., -est) forms. There are also a number of Latin participles.

Latin numbers are sometimes declined, but more often are not. See Numbers below.

First and second declension adjectives

First and second declension adjectives are declined like first declension nouns for the feminine forms and like second declension nouns for the masculine and neuter forms. For example, for mortuus, mortua, mortuum (dead), mortua is declined like a regular first declension noun (such as puella (girl)), mortuus is declined like a regular second declension masculine noun (such as dominus (lord, master)), and mortuum is declined like a regular second declension neuter noun (such as auxilium (help)).
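The mortuus example can be made concrete with a small table of singular endings. This is a sketch assuming a regular -us, -a, -um adjective; the -er adjectives and the plural are left out, and the function name is hypothetical.

```python
# Singular endings of a regular -us, -a, -um adjective, following the
# model nouns named above.
SINGULAR_ENDINGS = {
    "masculine": {"nom": "us", "gen": "ī", "dat": "ō", "acc": "um", "abl": "ō", "voc": "e"},    # like dominus
    "feminine": {"nom": "a", "gen": "ae", "dat": "ae", "acc": "am", "abl": "ā", "voc": "a"},    # like puella
    "neuter": {"nom": "um", "gen": "ī", "dat": "ō", "acc": "um", "abl": "ō", "voc": "um"},      # like auxilium
}


def decline_singular(masculine_nominative: str) -> dict:
    """Return {gender: {case: form}} for the singular of an -us, -a, -um adjective."""
    if not masculine_nominative.endswith("us"):
        raise ValueError("this sketch only handles -us, -a, -um adjectives")
    stem = masculine_nominative[:-2]  # mortuus -> mortu-
    return {
        gender: {case: stem + ending for case, ending in endings.items()}
        for gender, endings in SINGULAR_ENDINGS.items()
    }


forms = decline_singular("mortuus")
print(forms["feminine"]["gen"])  # mortuae
print(forms["neuter"]["acc"])    # mortuum
```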

First and second declension -er adjectives

Some first and second declension adjectives have -er as the masculine nominative singular form. These are otherwise declined like regular first and second declension adjectives. Some keep the e in all of their forms while others drop it: compare līber, lībera, līberum (free) with pulcher, pulchra, pulchrum (beautiful).

Third declension adjectives

Third declension adjectives are mostly declined like normal third declension nouns, with a few exceptions. In the neuter nominative plural, for example, the ending is -ia (ex. omnia (all, everything)), while for third declension nouns the neuter nominative plural ending is -a (ex. capita (heads)). They can have one, two, or three forms for the masculine, feminine, and neuter nominative singular.


Latin participles, like English participles, are formed from a verb. There are a few main types of participles, including:


Latin sometimes uses prepositions, and sometimes does not, depending on the type of prepositional phrase being used. Prepositions can take two cases for their object: the accusative (ex. "apud puerum" (with the boy), with "puerum" being the accusative form of "puer", boy) and the ablative (ex. "sine puero" (without the boy), with "puero" being the ablative form of "puer", boy).


Main article: Latin conjugation

A regular verb in Latin belongs to one of four main conjugations. A conjugation is "a class of verbs with similar inflected forms."[42] The conjugations are identified by the last letter of the verb's present stem. The present stem can be found by removing the -re (or -rī, in the case of a deponent verb) ending from the present infinitive. The infinitive of the first conjugation ends in -ā-re or -ā-rī (active and passive respectively), e.g., amāre, "to love", hortārī, "to exhort"; that of the second conjugation in -ē-re or -ē-rī, e.g., monēre, "to warn", verērī, "to fear"; that of the third conjugation in -ere or -ī, e.g., dūcere, "to lead", ūtī, "to use"; and that of the fourth in -ī-re or -ī-rī, e.g., audīre, "to hear", experīrī, "to attempt". Irregular verbs may not follow these types, or may be marked in a different way. The "endings" presented above are not the suffixed infinitive markers. The first letter in each case is the last letter of the stem, which is why the first, second and fourth conjugations are also called the a-conjugation, e-conjugation and i-conjugation. The fused infinitive ending is -re or -rī. Third-conjugation stems end in a consonant: the consonant conjugation. Further, there is a subset of the 3rd conjugation, the i-stems, which behave somewhat like the 4th conjugation, as they are both i-stems, one short and the other long.[43] These stem categories descend from Indo-European, and can therefore be compared to similar conjugations in other Indo-European languages.
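The identification step can be sketched as a suffix match on the infinitive. This hypothetical Python helper assumes the infinitive is written with macrons: without them, monēre and dūcere would both end in -ere and the second and third conjugations could not be told apart. An irregular verb such as esse falls through to None, but some irregular infinitives may still be matched to a conjugation they do not really belong to.

```python
import unicodedata
from typing import Optional

# Infinitive endings from the paragraph above, most specific first: the
# deponent -ārī/-ērī/-īrī before bare -ī, and -āre/-ēre/-īre before -ere.
INFINITIVE_ENDINGS = [
    ("ārī", 1), ("ērī", 2), ("īrī", 4),
    ("āre", 1), ("ēre", 2), ("īre", 4),
    ("ere", 3), ("ī", 3),
]


def conjugation_of(infinitive: str) -> Optional[int]:
    """Return 1-4 for a macronized present infinitive, or None if unrecognised."""
    word = unicodedata.normalize("NFC", infinitive)
    for ending, conjugation in INFINITIVE_ENDINGS:
        if word.endswith(ending):
            return conjugation
    return None


for inf in ["amāre", "monēre", "dūcere", "audīre", "hortārī", "ūtī", "esse"]:
    print(inf, conjugation_of(inf))
```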

There are six general tenses in Latin (present, imperfect, future, perfect, pluperfect, and future perfect), three moods (indicative, imperative and subjunctive, in addition to the infinitive, participle, gerund, gerundive and supine), three persons (first, second, and third), two numbers (singular and plural), two voices (active and passive), and three aspects (perfective, imperfective, and stative). Verbs are described by four principal parts:

  1. The first principal part is the first person singular, present tense, indicative mood, active voice form of the verb. If the verb is impersonal, the first principal part will be in the third person singular.

  2. The second principal part is the present infinitive active.

  3. The third principal part is the first person singular, perfect indicative active form. Like the first principal part, if the verb is impersonal, the third principal part will be in the third person singular.

  4. The fourth principal part is the supine form, or alternatively, the nominative singular, perfect passive participle form of the verb. The fourth principal part can show either one gender of the participle, or all three genders (-us for masculine, -a for feminine, and -um for neuter), in the nominative singular. The fourth principal part will be the future participle if the verb cannot be made passive. Most modern Latin dictionaries, if only showing one gender, tend to show the masculine; however, many older dictionaries will instead show the neuter, as this coincides with the supine. The fourth principal part is sometimes omitted for intransitive verbs, although strictly in Latin these can be made passive if used impersonally, and the supine exists for these verbs.

There are six tenses in the Latin language. These are divided into two tense systems: the present system, which is made up of the present, imperfect, and future tenses, and the perfect system, which is made up of the perfect, pluperfect, and future perfect tenses. Each tense has a set of endings corresponding to the person and number referred to. This means that subject (nominative) pronouns are generally unnecessary for the first (I, we) and second (you) persons, unless emphasis on the subject is needed.

The table below displays the common inflected endings for the indicative mood in the active voice in all six tenses. For the future tense, the first listed endings are for the first and second conjugations, while the second listed endings are for the third and fourth conjugations.


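Tense           1st sg.        2nd sg.        3rd sg.        1st pl.        2nd pl.        3rd pl.
Present         -ō             -s             -t             -mus           -tis           -nt
Imperfect       -bam           -bās           -bat           -bāmus         -bātis         -bant
Future          -bō, -am       -bis, -ēs      -bit, -et      -bimus, -ēmus  -bitis, -ētis  -bunt, -ent
Perfect         -ī             -istī          -it            -imus          -istis         -ērunt
Pluperfect      -eram          -erās          -erat          -erāmus        -erātis        -erant
Future Perfect  -erō           -eris          -erit          -erimus        -eritis        -erint
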
Note that the future perfect endings are identical to the future forms of sum (with the exception of erint) and that the pluperfect endings are identical to the imperfect forms of sum.
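Unlike the present system, the perfect-system endings attach in the same way to the perfect stem of every regular verb, that is, the third principal part minus its final -ī. The sketch below (a hypothetical helper, active indicative only) builds all three perfect-system tenses that way; its pluperfect and future perfect endings are the forms of sum noted above, with -erint in place of erunt. The alternative perfect ending -ēre (amāvēre) is not generated.

```python
import unicodedata
from typing import Dict, List

# Perfect-system active indicative endings, 1st sg. through 3rd pl.
PERFECT_SYSTEM_ENDINGS = {
    "perfect": ["ī", "istī", "it", "imus", "istis", "ērunt"],
    "pluperfect": ["eram", "erās", "erat", "erāmus", "erātis", "erant"],
    "future perfect": ["erō", "eris", "erit", "erimus", "eritis", "erint"],
}


def perfect_system(third_principal_part: str) -> Dict[str, List[str]]:
    """Build perfect, pluperfect and future perfect forms from e.g. 'amāvī'."""
    part = unicodedata.normalize("NFC", third_principal_part)
    if not part.endswith("ī"):
        raise ValueError("expected a 1st person singular perfect ending in -ī")
    stem = part[:-1]  # amāvī -> amāv-
    return {tense: [stem + ending for ending in endings]
            for tense, endings in PERFECT_SYSTEM_ENDINGS.items()}


print(perfect_system("amāvī")["pluperfect"])
# ['amāveram', 'amāverās', 'amāverat', 'amāverāmus', 'amāverātis', 'amāverant']
```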

Deponent verbs

A number of Latin verbs are deponent: their forms are in the passive voice, while they retain an active meaning, e.g. hortor, hortārī, hortātus sum (to urge).


As Latin is an Italic language, most of its vocabulary is likewise Italic, deriving ultimately from Proto-Indo-European (PIE). However, because of close cultural interaction, the Romans not only adapted the Etruscan alphabet to form the Latin alphabet, but also borrowed some Etruscan words into their language, including persona (mask) and histrio (actor).[44] Latin also included vocabulary borrowed from Oscan, another Italic language.

After the fall of Tarentum (272 BC), the Romans began hellenizing, or adopting features of Greek culture, including the borrowing of Greek words, such as camera (vaulted roof), sumbolum (symbol), and balineum (bath).[44] This hellenization led to the addition of "Y" and "Z" to the alphabet to represent Greek sounds.[45] Subsequently the Romans transplanted Greek art, medicine, science and philosophy to Italy, paying almost any price to entice skilled and educated Greeks to Rome, and sending their youth to be educated in Greece. Thus, many Latin scientific and philosophical words were Greek loanwords or had their meanings expanded by association with Greek words, as with ars (craft) and Greek τέχνη (technē).[46]

Because of the Roman Empire’s expansion and subsequent trade with outlying European tribes, the Romans borrowed some northern and central European words, such as beber (beaver), of Germanic origin, and bracae (breeches), of Celtic origin.[46] The specific dialects of Latin across Latin-speaking regions of the former Roman Empire after its fall were influenced by languages specific to the regions. These spoken Latins evolved into particular Romance languages.

During and after the adoption of Christianity into Roman society, Christian vocabulary became a part of the language, formed either from Greek or Hebrew borrowings, or as Latin neologisms.[47] Continuing into the Middle Ages, Latin incorporated many more words from surrounding languages, including Old English and other Germanic languages.

Over the ages, Latin-speaking populations produced new adjectives, nouns, and verbs by affixing or compounding meaningful segments.[48] For example, the compound adjective, omnipotens, "all-powerful," was produced from the adjectives omnis, "all", and potens, "powerful", by dropping the final s of omnis and concatenating. Often the concatenation changed the part of speech; i.e., nouns were produced from verb segments or verbs from nouns and adjectives.[49]


In the phrases below, accents mark the stressed syllable.[50] Most Latin words are stressed on the second-to-last (penultimate) syllable, called in Latin paenultimus or syllaba paenultima.[51] Fewer words are stressed on the third-to-last syllable, called in Latin antepaenultimus or syllaba antepaenultima.[51]
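The passage states the tendency; the rule usually given in grammars is that two-syllable words are stressed on the first syllable, while longer words are stressed on the penult if it is heavy (it contains a long vowel or diphthong, or ends in a consonant) and on the antepenult otherwise. The sketch below applies that rule, assuming the caller has already divided the word into syllables and marked long vowels with macrons; the function names are hypothetical.

```python
import unicodedata
from typing import List

LONG_VOWELS = set(unicodedata.normalize("NFC", "āēīōūȳ"))
VOWELS = set("aeiouy") | LONG_VOWELS
DIPHTHONGS = ("ae", "au", "oe")  # the common ones; rarer diphthongs are ignored


def is_heavy(syllable: str) -> bool:
    """Heavy = long vowel, diphthong, or closed by a consonant."""
    s = unicodedata.normalize("NFC", syllable).lower()
    if any(ch in LONG_VOWELS for ch in s):
        return True
    if any(d in s for d in DIPHTHONGS):
        return True
    return s[-1] not in VOWELS  # ends in a consonant


def stressed_index(syllables: List[str]) -> int:
    """Index of the stressed syllable under the penultimate rule."""
    if len(syllables) <= 2:
        return 0  # monosyllables and disyllables: first syllable
    penult = len(syllables) - 2
    return penult if is_heavy(syllables[penult]) else penult - 1


def show_stress(syllables: List[str]) -> str:
    """Join the syllables with hyphens, upper-casing the stressed one."""
    i = stressed_index(syllables)
    return "-".join(s.upper() if k == i else s for k, s in enumerate(syllables))


print(show_stress(["ac", "ci", "pi", "te"]))  # ac-CI-pi-te
print(show_stress(["sal", "vē", "te"]))       # sal-VĒ-te
```

The results match the accents in the list that follows, e.g. accípite, salvéte, grátias and exoptátus.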

sálve to one person / salvéte to more than one person - hello

áve to one person / avéte to more than one person - greetings

vále to one person / valéte to more than one person - goodbye

cúra ut váleas - take care

exoptátus to male / exoptáta to female, optátus to male / optáta to female, grátus to male / gráta to female, accéptus to male / accépta to female - welcome

quómodo váles?, ut váles? - how are you?

béne - good

amabo te - please

béne váleo - I'm fine

mále - bad

mále váleo - I'm not good

quáeso (['kwajso]/['kwe:so]) - please

íta, íta est, íta véro, sic, sic est, étiam - yes

non, minime - no

grátias tíbi, grátias tíbi ágo - thank you

mágnas grátias, mágnas grátias ágo - many thanks

máximas grátias, máximas grátias ágo, ingéntes grátias ágo - thank you very much

accípe sis to one person / accípite sítis to more than one person, libénter - you're welcome

qua aetáte es? - how old are you?

25 ánnos nátus to male / 25 ánnos náta to female - 25 years old

loquerísne ... - do you speak ...

  • Latíne? - Latin?

  • Gráece? (['grajke]/['gre:ke]) - Greek?

  • Ánglice? (['aŋlike]) - English?

  • Italiáne? - Italian?

  • Gallice? - French?

  • Hispánice? - Spanish?

  • Lusitánice? - Portuguese?

  • Theodísce? ([teo'diske]) - German?

  • Sínice? - Chinese?

  • Iapónice? ([ja'po:nike]) - Japanese?

  • Coreane? - Korean?

  • Tagale? - Tagalog?

  • Arábice? - Arabic?

  • Pérsice? - Persian?

  • Indice? - Hindi?

  • Rússice? - Russian?

úbi latrína est? - where is the toilet?

ámo te / te ámo - I love you


In ancient times, numbers in Latin were written only with letters. Today, they can be written with Arabic numerals as well as with Roman numerals. The numbers 1, 2 and 3, and those from 200 to 900, are declined as nouns and adjectives, with some differences.

ūnus, ūna, ūnum (masculine, feminine, neuter)



duo, duae, duo (m., f., n.)



trēs, tria (m./f., n.)


























Roman numeral    Value
I                One (1)
V                Five (5)
X                Ten (10)
L                Fifty (50)
C                One Hundred (100)
D                Five Hundred (500)
M                One Thousand (1000)

The numbers from quattuor (four) to centum (one hundred) do not change their endings.
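Converting between Arabic and Roman numerals is a short greedy algorithm over the numeral letters I, V, X, L, C, D and M. This Python sketch writes the subtractive forms (IV, IX, XL, ...) that are standard today; the parser also accepts additive forms such as IIII, which also occur in Roman usage. The function names are hypothetical.

```python
# Value/letter pairs, largest first, including the subtractive pairs.
PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def to_roman(n: int) -> str:
    """Greedy conversion: repeatedly take the largest value that fits."""
    if not 0 < n < 4000:
        raise ValueError("this sketch covers 1 to 3999 only")
    out = []
    for value, letters in PAIRS:
        count, n = divmod(n, value)
        out.append(letters * count)
    return "".join(out)


def from_roman(numeral: str) -> int:
    """Sum letter values, subtracting a letter that precedes a larger one."""
    total = 0
    for i, ch in enumerate(numeral):
        v = VALUES[ch]
        if i + 1 < len(numeral) and v < VALUES[numeral[i + 1]]:
            total -= v  # e.g. the I in IV
        else:
            total += v
    return total


print(to_roman(1994), from_roman("MCMXCIV"), from_roman("IIII"))  # MCMXCIV 1994 4
```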

Example text

Commentarii de Bello Gallico, also called De Bello Gallico (The Gallic War), written by Gaius Julius Caesar, begins with the following passage:

Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur. Hi omnes lingua, institutis, legibus inter se differunt. Gallos ab Aquitanis Garumna flumen, a Belgis Matrona et Sequana dividit. Horum omnium fortissimi sunt Belgae, propterea quod a cultu atque humanitate provinciae longissime absunt, minimeque ad eos mercatores saepe commeant atque ea quae ad effeminandos animos pertinent important, proximique sunt Germanis, qui trans Rhenum incolunt, quibuscum continenter bellum gerunt. Qua de causa Helvetii quoque reliquos Gallos virtute praecedunt, quod fere cotidianis proeliis cum Germanis contendunt, cum aut suis finibus eos prohibent aut ipsi in eorum finibus bellum gerunt. Eorum una pars, quam Gallos obtinere dictum est, initium capit a flumine Rhodano, continetur Garumna flumine, Oceano, finibus Belgarum; attingit etiam ab Sequanis et Helvetiis flumen Rhenum; vergit ad septentriones. Belgae ab extremis Galliae finibus oriuntur; pertinent ad inferiorem partem fluminis Rheni; spectant in septentrionem et orientem solem. Aquitania a Garumna flumine ad Pyrenaeos montes et eam partem Oceani quae est ad Hispaniam pertinet; spectat inter occasum solis et septentriones.

See also

  • "Schools". Britannica (1911 ed.).

  • Nordhoff, Sebastian; Hammarström, Harald; Forkel, Robert; Haspelmath, Martin, eds. (2013). "Latin". Glottolog 2.2. Leipzig: Max Planck Institute for Evolutionary Anthropology.

  • Bryson, Bill (1996). The mother tongue: English and how it got that way. New York: Avon Books. pp. 33–34. ISBN 0-14-014305-X.

  • Who only knows Latin can go across the whole Poland from one side to the other one just like he was at his own home, just like he was born there. So great happiness! I wish a traveler in England could travel without knowing any other language than Latin!, Daniel Defoe, 1728

  • Anatol Lieven, The Baltic Revolution: Estonia, Latvia, Lithuania and the Path to Independence, Yale University Press, 1994, ISBN 0-300-06078-5, Google Print, p.48

  • Kevin O'Connor, Culture And Customs of the Baltic States, Greenwood Press, 2006, ISBN 0-313-33125-1, Google Print, p.115

  • Karin Friedrich et al., The Other Prussia: Royal Prussia, Poland and Liberty, 1569–1772, Cambridge University Press, 2000, ISBN 0-521-58335-7, Google Print, p.88

  • Sacks, David (2003). Language Visible: Unraveling the Mystery of the Alphabet from A to Z. London: Broadway Books. p. 80. ISBN 0-7679-1172-5.

  • Pope, Mildred K (1966). From Latin to modern French with especial consideration of Anglo-Norman; phonology and morphology. Publications of the University of Manchester, no. 229. French series, no. 6. Manchester: Manchester university press. p. 3.

  • Monroe, Paul (1902). Source book of the history of education for the Greek and Roman period. London, New York: Macmillan & Co. pp. 346–352.

  • Pei, Mario; compiled,, ; Gaeng, arranged by Paul A. (1976). The story of Latin and the Romance languages (1st ed.). New York: Harper & Row. pp. 76–81. ISBN 0-06-013312-0.

  • Elabani, Moe (1998). Documents in medieval Latin. Ann Arbor: University of Michigan Press. pp. 13–15. ISBN 0-472-08567-0.

  • "Conjugation". Webster's II new college dictionary. Boston: Houghton Mifflin. 1999.

  • Wheelock, Frederic M. (2011). Wheelock's Latin (7th ed.). New York: CollinsReference.

  • Sacks, David (2003). Language Visible: Unraveling the Mystery of the Alphabet from A to Z. London: Broadway Books. p. 351. ISBN 0-7679-1172-5.

  • Ebbe Vilborg - Norstedts svensk-latinska ordbok - Second edition, 2009.

  • Tore Janson - Latin - Kulturen, historien, språket - First edition, 2009.


Trilingual cuneiform inscription of Xerxes at Van Fortress in Turkey, written in Old Persian, Akkadian, and Elamite


Type

Logographic and syllabic

Languages

Akkadian, Eblaite, Elamite, Hattic, Hittite, Hurrian, Luwian, Sumerian, Urartian, Old Persian

Time period

c. 31st century B.C.E. to 1st century C.E.

Parent systems


  • Cuneiform

Child systems

influenced shape of Ugaritic
apparently inspired Old Persian

ISO 15924

Xsux, 020



Unicode alias


Unicode range

U+12000 to U+123FF (Sumero-Akkadian Cuneiform)
U+12400 to U+1247F (Numbers)
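The Unicode ranges listed above can be used to test whether a character is a cuneiform sign. The sketch below checks only those two blocks; later Unicode versions added an Early Dynastic Cuneiform block, which it ignores. The helper name is hypothetical.

```python
from typing import Optional

# Code point ranges as listed above (inclusive).
CUNEIFORM_BLOCKS = {
    "Sumero-Akkadian Cuneiform": (0x12000, 0x123FF),
    "Numbers": (0x12400, 0x1247F),
}


def cuneiform_block(ch: str) -> Optional[str]:
    """Name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, (first, last) in CUNEIFORM_BLOCKS.items():
        if first <= cp <= last:
            return name
    return None


sag = chr(0x12295)  # U+12295, the code point the text gives for SAG
print(cuneiform_block(sag))  # Sumero-Akkadian Cuneiform
print(cuneiform_block("A"))  # None
```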


Cuneiform script[nb 1] is one of the earliest known systems of writing,[1] distinguished by its wedge-shaped marks on clay tablets, made by means of a blunt reed for a stylus. The name cuneiform itself simply means "wedge shaped", from the Latin cuneus "wedge" and forma "shape," and came into English usage probably from Old French cunéiforme.

In use in Sumer as early as the late 4th millennium B.C.E. (the Uruk IV period), cuneiform writing began as a system of pictographs. In the third millennium, the pictorial representations became simplified and more abstract as the number of characters in use grew smaller, from about 1,000 in the Early Bronze Age to about 400 in the Late Bronze Age (Hittite cuneiform). The system consists of a combination of logophonetic, consonantal alphabetic and syllabic signs.[2]

The original Sumerian script was adapted for the writing of the Akkadian, Eblaite, Elamite, Hittite, Luwian, Hattic, Hurrian, and Urartian languages, and it inspired the Ugaritic and Old Persian alphabets. Cuneiform writing was gradually replaced by the Phoenician alphabet during the Neo-Assyrian Empire. By the 2nd century C.E., the script had become extinct, and all knowledge of how to read it was lost until it began to be deciphered in the 19th century.

Between half a million[3] and two million cuneiform tablets are estimated to have been excavated in modern times. Of these, only approximately 30,000[4] to 100,000 have been read or published. The British Museum holds the largest collection (c. 130,000), followed by the Vorderasiatisches Museum Berlin, the Louvre, the Istanbul Archaeology Museums, the National Museum of Iraq, the Yale Babylonian Collection (c. 40,000) and the Penn Museum. Most of these have "lain in these collections for a century without being translated, studied or published,"[3] as there are only a few hundred qualified cuneiformists in the world.[4]



The cuneiform writing system was in use for more than three millennia, through several stages of development, from the 34th century B.C.E. down to the 2nd century C.E.[5] Ultimately, it was completely replaced by alphabetic writing (in the general sense) in the course of the Roman era, and there are no cuneiform systems in current use. It had to be deciphered as a completely unknown writing system in 19th-century Assyriology. Successful completion of its decipherment is dated to 1857.

The cuneiform script underwent considerable changes over a period of more than two millennia. The image below shows the development of the sign SAG "head" (Borger nr. 184, U+12295).


  1. shows the pictogram as it was drawn around 3000 B.C.E.

  2. shows the rotated pictogram as written around 2800 B.C.E.

  3. shows the abstracted glyph in archaic monumental inscriptions, from c. 2600 B.C.E.

  4. is the sign as written in clay, contemporary to stage 3

  5. represents the late 3rd millennium

  6. represents Old Assyrian ductus of the early 2nd millennium, as adopted into Hittite

  7. is the simplified sign as written by Assyrian scribes in the early 1st millennium, and until the script's extinction.

Proto-literate period

Sumerian inscription in monumental archaic style, c. 26th century B.C.E.

The cuneiform script proper developed from pictographic proto-writing in the late 4th millennium B.C.E. Mesopotamia's "proto-literate" period spans roughly the 35th to 32nd centuries. The first documents unequivocally written in the Sumerian language date to c. the 31st century, found at Jemdet Nasr.

Originally, pictographs were either drawn on clay tablets in vertical columns with a sharpened reed stylus, or incised in stone. This early style lacked the characteristic wedge shape of the strokes.

Certain signs used to indicate the names of gods, countries, cities, vessels, birds, trees, etc., are known as determinatives; they were the Sumerian signs for the terms in question, added as a guide for the reader. Proper names continued to be written usually in purely "logographic" fashion.

The earliest known Sumerian king whose name appears on contemporary cuneiform tablets is Enmebaragesi of Kish. Surviving records only very gradually become less fragmentary and more complete for the following reigns, but by the end of the pre-Sargonic period, it had become standard practice for each major city-state to date documents by year-names commemorating the exploits of its lugal (king).

From about 2900 B.C.E., many pictographs began to lose their original function, and a given sign could have various meanings depending on context. The sign inventory was reduced from some 1,500 signs to some 600 signs, and writing became increasingly phonological. Determinative signs were re-introduced to avoid ambiguity. Cuneiform writing proper thus arises from the more primitive system of pictographs at about that time (Early Bronze Age II).

Archaic cuneiform


Letter sent by the high-priest Lu'enna to the king of Lagash (maybe Urukagina), informing him of his son's death in combat, c. 2400 B.C.E., found in Telloh (ancient Girsu).

In the mid-3rd millennium B.C.E., writing direction was changed to left to right in horizontal rows (rotating all of the pictographs 90° counter-clockwise in the process), and a new wedge-tipped stylus was used which was pushed into the clay, producing wedge-shaped ("cuneiform") signs; these two developments made writing quicker and easier. By adjusting the relative position of the tablet to the stylus, the writer could use a single tool to make a variety of impressions.

Cuneiform tablets could be fired in kilns to provide a permanent record, or they could be recycled if permanence was not needed. Many of the clay tablets found by archaeologists were preserved because they were fired when attacking armies burned the building in which they were kept.

The script was also widely used on commemorative stelae and carved reliefs to record the achievements of the ruler in whose honor the monument had been erected.

The spoken language contained many similar sounds, and in the beginning similar-sounding words such as "life" [til] and "arrow" [ti] were written with the same symbol. After the Semites conquered southern Mesopotamia, some signs gradually changed from pictograms to syllabograms, most likely to make writing clearer. In that way the sign for the word "arrow" became the sign for the sound "ti". If a sound represented many different words, the words would all have different signs; for instance, the syllable "gu" had fourteen different symbols. When words had similar meanings but very different sounds, they were written with the same symbol. For instance, "tooth" [zu], "mouth" [ka] and "voice" [gu] were all written with the symbol for "voice". To be more precise, scribes began adding to signs or combining two signs to define the meaning, using either geometrical patterns or another cuneiform sign.[2]

Over time, cuneiform became very complex and the distinction between a pictogram and a syllabogram became vague. Several symbols had too many meanings to permit clarity. Therefore, symbols were combined to indicate both the sound and the meaning of a compound. The word "raven" [UGA] had the same logogram as the words "soap" [NAGA], "name of a city" [ERESH] and "the patron goddess of Eresh" [NISABA]. Two phonetic complements were used to define the word: [u] in front of the symbol and [gu] behind it. Finally, the symbol for "bird" [MUSHEN] was added to ensure proper interpretation. Written Sumerian remained in use as a learned language until the 1st century C.E., although the spoken language died out around the 18th century B.C.E.[2]
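The raven example combines two kinds of clue, a phonetic complement and a determinative, and each one narrows the set of possible readings of a single polyvalent sign. The sketch below models that as successive filters. It is a toy model under stated assumptions: the semantic class attached to each reading, the DETERMINATIVES mapping and the function name disambiguate are hypothetical, and only the leading complement [u] is modeled (the trailing [gu] is left out).

```python
from typing import Dict, List, Optional

# Candidate readings of the single polyvalent sign in the "raven" example.
# The semantic class attached to each reading is a hypothetical label for this sketch.
CANDIDATES: Dict[str, str] = {
    "uga": "bird",        # "raven"
    "naga": "substance",  # "soap"
    "eresh": "city",      # name of a city
    "nisaba": "deity",    # patron goddess of Eresh
}

# Hypothetical mapping from a determinative to the semantic class it marks.
DETERMINATIVES: Dict[str, str] = {"MUSHEN": "bird"}


def disambiguate(prefix: str = "", determinative: Optional[str] = None) -> List[str]:
    """Return the readings still possible given a leading phonetic complement
    and/or a determinative; each clue can only narrow the candidate list."""
    readings = list(CANDIDATES)
    if prefix:
        # A complement written before the sign must match the start of the reading.
        readings = [r for r in readings if r.startswith(prefix)]
    if determinative is not None:
        # A determinative keeps only readings of the class it marks.
        wanted = DETERMINATIVES.get(determinative)
        readings = [r for r in readings if CANDIDATES[r] == wanted]
    return readings


print(disambiguate())                          # -> ['uga', 'naga', 'eresh', 'nisaba']
print(disambiguate(prefix="n"))                # -> ['naga', 'nisaba']
print(disambiguate(prefix="u"))                # -> ['uga']
print(disambiguate(determinative="MUSHEN"))    # -> ['uga']
```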

Akkadian cuneiform

A list of Sumerian deities, c. 2400 B.C.E.

The archaic cuneiform script was adopted by the Akkadians from c. 2500 B.C.E., and by 2000 B.C.E. had evolved into Old Assyrian cuneiform, with many modifications to Sumerian orthography. The Semitic equivalents for many signs became distorted or abbreviated to form new "phonetic" values, because the syllabic nature of the script as refined by the Sumerians was unintuitive to Semitic speakers. At this stage, the former pictograms were reduced to a high level of abstraction, and were composed of only five basic wedge shapes: horizontal, vertical, two diagonals and the Winkelhaken impressed vertically by the tip of the stylus. The signs exemplary of these basic wedges are:

  • AŠ (B001, U+12038): horizontal;

  • DIŠ (B748, U+12079): vertical;

  • GE23, DIŠ tenû (B575, U+12039): downward diagonal;

  • GE22 (B647, U+1203A): upward diagonal;

  • U (B661, U+1230B): the Winkelhaken.

Except for the Winkelhaken which has no tail, the length of the wedges' tails could vary as required for sign composition.

Signs tilted by about 45 degrees are called tenû in Akkadian, thus DIŠ is a vertical wedge and DIŠ tenû a diagonal one. If a sign is modified with additional wedges, this is called gunû or "gunification;" if signs are crosshatched with additional Winkelhaken, they are called šešig; if signs are modified by the removal of a wedge or wedges, they are called nutillu.

Cuneiform tablet from the Kirkor Minassian collection in the US Library of Congress, c. 24th century B.C.E.

One of the Amarna letters, 14th century B.C.E.

Neo-Assyrian ligature KAxGUR7; the KA sign was a Sumerian compound marker, and appears frequently in ligatures enclosing other signs. GUR7 is itself a ligature of SÍG.AḪ.ME.U, meaning "to pile up; grain-heap" (Akkadian kamāru; karû).

"Typical" signs usually have about five to ten wedges, while complex ligatures can consist of twenty or more (although it is not always clear whether a ligature should be considered a single sign or two collated but still distinct signs); the ligature KAxGUR7 consists of 31 strokes.

Most later adaptations of Sumerian cuneiform preserved at least some aspects of the Sumerian script. Written Akkadian included phonetic symbols from the Sumerian syllabary, together with logograms that were read as whole words. Many signs in the script were polyvalent, having both a syllabic and logographic meaning. The complexity of the system bears a resemblance to Old Japanese, written in a Chinese-derived script, where some of these Sinograms were used as logograms, and others as phonetic characters.

Assyrian cuneiform

This "mixed" method of writing continued through the end of the Babylonian and Assyrian empires, although there were periods when "purism" was in fashion and there was a more marked tendency to spell out the words laboriously, in preference to using signs with a phonetic complement. Yet even in those days, the Babylonian syllabary remained a mixture of logographic and phonemic writing.

Hittite cuneiform is an adaptation of the Old Assyrian cuneiform of c. 1800 B.C.E. to the Hittite language. When the cuneiform script was adapted to writing Hittite, a layer of Akkadian logographic spellings was added to the script, thus the pronunciations of many Hittite words which were conventionally written by logograms are now unknown.

In the Iron Age (c. 10th to 6th centuries B.C.E.), Assyrian cuneiform was further simplified. From the 6th century, the Assyrian language was marginalized by Aramaic, written in the Aramaean alphabet, but Neo-Assyrian cuneiform remained in use in literary tradition well into Parthian times (250 B.C.E. – 226 C.E.). The last known cuneiform inscription, an astronomical text, was written in 75 C.E.[6]

Derived scripts

The complexity of the system prompted the development of a number of simplified versions of the script. Old Persian was written in a subset of simplified cuneiform characters known today as Old Persian cuneiform. It formed a semi-alphabetic syllabary, using far fewer wedge strokes than Assyrian used, together with a handful of logograms for frequently occurring words like "god" and "king". The Ugaritic language was written using the Ugaritic alphabet, a standard Semitic style alphabet (an abjad) written using the cuneiform method.


For centuries, travellers to Persepolis, in modern-day Iran, had noticed carved cuneiform inscriptions and were intrigued.[7] Attempts at deciphering these Old Persian writings date back to Arabic/Persian historians of the medieval Islamic world, though these early attempts at decipherment were largely unsuccessful.[8]

In the 15th century the Venetian Barbero explored ancient ruins in the Middle East and came back with news of a very odd writing he had found carved on the stones in the temples of Shiraz and on many clay tablets.

In 1625 the Roman traveler Pietro Della Valle, returning from Mesopotamia and Persia, brought back a tablet written with cuneiform glyphs that he had found in Ur, as well as a copy of five characters he had seen in Persepolis. Della Valle understood that the writing had to be read from left to right, following the direction of the wedges, but did not attempt to decipher the scripts.

Englishman Sir Thomas Herbert, in the 1634 edition of his travel book A relation of some yeares travaile, reported seeing at Persepolis carved on the wall “a dozen lines of strange characters…consisting of figures, obelisk, triangular, and pyramidal” and thought they resembled Greek. In the 1664 edition he reproduced some and thought they were ‘legible and intelligible’ and therefore decipherable. He also guessed, correctly, that they represented not letters or hieroglyphics but words and syllables, and were to be read from left to right.[7] Herbert is rarely mentioned in standard histories of the decipherment of cuneiform.

Carsten Niebuhr brought the first reasonably complete and accurate copies of the inscriptions at Persepolis to Europe.[7] Bishop Friedrich Münter of Copenhagen discovered that the words in the Persian inscriptions were divided from one another by an oblique wedge and that the monuments must belong to the age of Cyrus and his successors. One word, which occurs without any variation towards the beginning of each inscription, he correctly inferred to signify "king".[7] By 1802 Georg Friedrich Grotefend had determined that the names of two kings mentioned in the inscriptions were Darius and Xerxes (but in their native Old Persian forms, which were unknown at the time and therefore had to be conjectured), and had been able to assign correct alphabetic values to the cuneiform characters which composed the two names.[9] Although Grotefend's Memoir was presented to the Göttingen Academy on September 4, 1802, the Academy refused to publish it; it was subsequently published in Heeren's work in 1815, but was overlooked by most researchers at the time.[10]

In 1836, the eminent French scholar Eugène Burnouf discovered that the first of the inscriptions published by Niebuhr contained a list of the satrapies of Darius. With this clue in his hand, he identified and published an alphabet of thirty letters, most of which he had correctly deciphered.[7][11][12]

A month earlier, a friend and pupil of Burnouf's, Professor Christian Lassen of Bonn, had also published his own work on The Old Persian Cuneiform Inscriptions of Persepolis.[12][13] He and Burnouf had been in frequent correspondence, and his claim to have independently detected the names of the satrapies, and thereby to have fixed the values of the Persian characters, was consequently fiercely attacked. According to Sayce, whatever his obligations to Burnouf may have been, Lassen's "contributions to the decipherment of the inscriptions were numerous and important. He succeeded in fixing the true values of nearly all the letters in the Persian alphabet, in translating the texts, and in proving that the language of them was not Zend, but stood to both Zend and Sanskrit in the relation of a sister".[7]

Meanwhile, in 1835 Henry Rawlinson, a British East India Company army officer, visited the Behistun Inscriptions in Persia. Carved in the reign of King Darius of Persia (522–486 B.C.E.), they consisted of identical texts in the three official languages of the empire: Old Persian, Babylonian, and Elamite. The Behistun inscription was to the decipherment of cuneiform what the Rosetta Stone was to the decipherment of Egyptian hieroglyphs.[14]

Rawlinson correctly deduced that Old Persian was written in a phonetic script, and he successfully deciphered it. In 1837 he finished his copy of the Behistun inscription, and sent a translation of its opening paragraphs to the Royal Asiatic Society. Before his article could be published, however, the works of Lassen and Burnouf reached him, necessitating a revision of his article and the postponement of its publication. Then came other causes of delay. In 1847 the first part of Rawlinson's Memoir was published; the second part did not appear until 1849.[15][nb 2] With that, the task of deciphering the Persian cuneiform texts was virtually accomplished.[7]

After translating the Persian, Rawlinson and, working independently of him, the Irish Assyriologist Edward Hincks, began to decipher the others. (The actual techniques used to decipher the Akkadian language have never been fully published; Hincks described how he sought the proper names already legible in the deciphered Persian while Rawlinson never said anything at all, leading some to speculate that he was secretly copying Hincks.[16]) They were greatly helped by Paul Émile Botta's discovery of the city of Nineveh in 1842. Among the treasures uncovered by Botta were the remains of the great library of Ashurbanipal, a royal archive containing tens of thousands of baked clay tablets covered with cuneiform inscriptions.

By 1851, Hincks and Rawlinson could read 200 Babylonian signs. They were soon joined by two other decipherers: young German-born scholar Julius Oppert, and versatile British Orientalist William Henry Fox Talbot. In 1857 the four men met in London and took part in a famous experiment to test the accuracy of their decipherments. Edwin Norris, the secretary of the Royal Asiatic Society, gave each of them a copy of a recently discovered inscription from the reign of the Assyrian emperor Tiglath-Pileser I. A jury of experts was empanelled to examine the resulting translations and assess their accuracy. In all essential points the translations produced by the four scholars were found to be in close agreement with one another. There were of course some slight discrepancies. The inexperienced Talbot had made a number of mistakes, and Oppert's translation contained a few doubtful passages which the jury politely ascribed to his unfamiliarity with the English language. But Hincks' and Rawlinson's versions corresponded remarkably closely in many respects. The jury declared itself satisfied, and the decipherment of Akkadian cuneiform was adjudged a fait accompli.

In the early days of cuneiform decipherment, the reading of proper names presented the greatest difficulties. However, there is now a better understanding of the principles behind the formation and the pronunciation of the thousands of names found in historical records, business documents, votive inscriptions, literary productions and legal documents. The primary challenge was posed by the characteristic use of old Sumerian non-phonetic logograms in other languages that had different pronunciations for the same symbols. Until the exact phonetic reading of many names was determined through parallel passages or explanatory lists, scholars remained in doubt, or had recourse to conjectural or provisional readings. Fortunately, in many cases, there are variant readings, the same name being written phonetically (in whole or in part) in one instance, and logographically in another.


Extract from the Cyrus Cylinder (lines 15–21), giving the genealogy of Cyrus the Great and an account of his capture of Babylon in 539 B.C.E.

Cuneiform has a specific format for transliteration. Because of the script's polyvalence, transliteration requires certain choices of the transliterating scholar, who must decide in the case of each sign which of its several possible meanings is intended in the original document. For example, the sign DINGIR in a Hittite text may represent the Hittite syllable an; it may be part of an Akkadian phrase, representing the syllable il; it may be a Sumerogram, representing the original Sumerian meaning, 'god'; or it may be the determinative for a deity. In transliteration, a different rendition of the same glyph is chosen depending on its role in the present context.

Therefore, a text containing DINGIR and A in succession could be construed to represent the words "ana", "ila", god + "a" (the accusative ending), god + water, or a divine name "A" or Water. Someone transcribing the signs would decide how they should be read and assemble them as "ana", "ila", "Ila" ("god" + accusative case), etc. A transliteration of these signs, however, would separate the signs with dashes: "il-a", "an-a", "DINGIR-a" or "ᵈa". This is still easier to read than the original cuneiform, but now the reader is able to trace the sounds back to the original signs and determine whether the correct decision was made on how to read them. A transliterated document thus presents both the reading preferred by the transliterating scholar and the opportunity to reconstruct the original text.
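Because each sign may carry several values, a transliterator in effect picks one path through a small set of options. The sketch below enumerates those options for a two-sign sequence, assuming the second sign is A (read a), which is what the readings "il-a", "an-a" and "DINGIR-a" imply. The tiny sign inventory and the function name transliterations are hypothetical; the deity determinative is written as a superscript ᵈ that attaches directly to the following sign.

```python
from itertools import product
from typing import Dict, List

DET = "ᵈ"  # determinative for a deity, written as a superscript d

# Illustrative sign inventory (not a real sign list): each sign name maps to the
# renderings a transliterator may choose between.
READINGS: Dict[str, List[str]] = {
    "DINGIR": ["an", "il", "DINGIR", DET],  # Hittite an, Akkadian il, Sumerogram "god", determinative
    "A": ["a"],
}


def transliterations(signs: List[str]) -> List[str]:
    """Enumerate every transliteration of a sign sequence. Signs are joined
    with '-', except that a determinative attaches directly to the next sign."""
    if not signs:
        return []
    results = []
    for choice in product(*(READINGS[s] for s in signs)):
        text = choice[0]
        for prev, value in zip(choice, choice[1:]):
            text += value if prev == DET else "-" + value
        results.append(text)
    return results


print(transliterations(["DINGIR", "A"]))
# -> ['an-a', 'il-a', 'DINGIR-a', 'ᵈa']
```

Choosing among these outputs is exactly the editorial decision described above; the enumeration only makes the alternatives explicit.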

There are differing conventions for transliterating Sumerian, Akkadian (Babylonian) and Hittite (and Luwian) cuneiform texts. One convention that sees wide use across the different fields is the use of acute and grave accents as an abbreviation for homophone disambiguation. Thus, u is equivalent to u1, the first glyph expressing phonetic u. An acute accent, ú, is equivalent to the second, u2, and a grave accent ù to the third, u3 glyph in the series (while the sequence of numbering is conventional but essentially arbitrary and subject to the history of decipherment). In Sumerian transliteration, a multiplication sign 'x' is used to indicate ligatures. As shown above, signs as such are represented in capital letters, while the specific reading selected in the transliteration is represented in small letters. Thus, capital letters can be used to indicate a so-called Diri compound – a sign sequence that has, in combination, a reading different from the sum of the individual constituent signs (for example, the compound IGI.A – "eye" + "water" – has the reading imhur, meaning "foam"). In a Diri compound, the individual signs are separated with dots in transliteration. Capital letters may also be used to indicate a Sumerogram (for example, KÙ.BABBAR – Sumerian for "silver" – being used with the intended Akkadian reading kaspum, "silver"), an Akkadogram, or simply a sign sequence of whose reading the editor is uncertain. Naturally, the "real" reading, if it is clear, will be presented in small letters in the transliteration: IGI.A will be rendered as imhur4.
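Since the accents are only shorthand for numeric indices, converting between the two notations is mechanical. A minimal sketch, assuming readings arrive as Unicode strings: the function name accent_to_index is hypothetical, and only the acute (index 2) and grave (index 3) cases are handled, because higher indices are already written with digits. It uses Python's standard unicodedata normalization to separate base letters from combining accents while keeping other diacritics, such as the caron in š.

```python
import unicodedata

# Combining accents used as homophone indices: acute = 2, grave = 3.
ACCENT_INDEX = {"\u0301": 2, "\u0300": 3}


def accent_to_index(reading: str) -> str:
    """Rewrite an accented reading in numeric-index form, e.g. 'ú' -> 'u2',
    'šù' -> 'šu3'. Unaccented readings (index 1, conventionally left unwritten)
    and readings that already carry a digit, such as 'du4', come back unchanged."""
    index = 1
    kept = []
    # NFD splits 'ú' into 'u' + combining acute, and 'š' into 's' + combining caron.
    for ch in unicodedata.normalize("NFD", reading):
        if ch in ACCENT_INDEX:
            index = ACCENT_INDEX[ch]
        else:
            kept.append(ch)  # other diacritics (caron, breve below) are kept
    # Recompose so that 's' + caron becomes 'š' again.
    base = unicodedata.normalize("NFC", "".join(kept))
    return base if index == 1 else f"{base}{index}"


for r in ["u", "ú", "ù", "šù", "KÙ", "du4"]:
    print(r, "->", accent_to_index(r))
```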

Since the Sumerian language has only been widely known and studied by scholars for approximately a century, changes in the accepted reading of Sumerian names have occurred from time to time. Thus the name of a king of Ur, read Ur-Bau at one time, was later read as Ur-Engur, and is now read as Ur-Nammu or Ur-Namma; for Lugal-zaggisi, a king of Uruk, some scholars continued to read Ungal-zaggisi; and so forth. Also, with some names of the older period, there was often uncertainty whether their bearers were Sumerians or Semites. If the former, their names could be assumed to be read as Sumerian, while, if they were Semites, the signs for writing their names were probably to be read according to their Semitic equivalents, though occasionally Semites might be encountered bearing genuine Sumerian names. There was also doubt whether the signs composing a Semite's name represented a phonetic reading or a logographic compound. For example, when inscriptions of a Semitic ruler of Kish, whose name was written Uru-mu-ush, were first deciphered, that name was taken to be logographic because uru mu-ush could be read as "he founded a city" in Sumerian, and scholars accordingly retranslated it back to the original Semitic as Alu-usharshid. It was later recognized that the URU sign can also be read as rí and that the name is that of the Akkadian king Rimush.


The tables below show signs used for simple syllables of the form CV or VC. As used for the Sumerian language, the cuneiform script was in principle capable of distinguishing at least 16 consonants, transliterated as

b, d, g, g̃, ḫ, k, l, m, n, p, r, ř, s, š, t, z

as well as four vowel qualities, a, e, i, u. The Akkadian language had no use for g̃ or ř but needed to distinguish its emphatic series, q, ṣ, ṭ, adopting various "superfluous" Sumerian signs for the purpose (e.g. qe=KIN, qu=KUM, qi=KIN, ṣa=ZA, ṣe=ZÍ, ṭur=DUR, etc.). When Hittite adopted the Akkadian cuneiform, it further introduced signs for the glide w (e.g. wa=PI, wi5=GEŠTIN) as well as a ligature I.A for ya.





Signs for CV syllables (row: initial consonant; columns: following vowel)

Consonant | -a | -e | -i | -u
(none) | a, á | e, é | i, í=IÁ | u, ú, ù
b- | ba, =PA, =EŠ | be=BAD, =BI, =NI | bi, =NE, =PI | bu, =PÙ
d- | da, =TA | de=DI, =NE | di, =TÍ | du, =TU, =GAG, du4=TUM
g- | ga | ge=GI, =KID, =DIŠ | gi, =KID, =DIŠ, gi4, gi5=KI | gu, =KA, gu4, gu5=KU, gu6=NAG, gu7
ḫ- | ḫa, ḫá=ḪI.A, ḫà=U, ḫa4=ḪI | ḫe=ḪI, ḫé=GAN | ḫi, ḫí=GAN | ḫu
k- | ka, =GA | ke=KI, =GI | ki, =GI | ku, =GU7, ku4
l- | la, =LAL, =NU | le=LI, =NI | li, =NI | lu
m- | ma | me, =MI | mi, =ME | mu, =SAR
n- | na, =AG, na4 ("NI.UD") | ne, =NI | ni, =IM | nu, =NÁ
p- | pa, =BA | pe=PI, =BI | pi, =BI, =BAD | pu=BU, =TÚL
r- | ra, =DU | re=RI, =URU | ri, =URU | ru, =GAG, =AŠ
s- | sa, =DI, =ZA, sa4 ("ḪU.NÁ") | se=SI, =ZI | si, =ZI | su, =ZU, =SUD, su4
š- | ša, šá=NÍG, šà | še, šè | ši=IGI, ší=SI | šu, šú, šù=ŠÈ, šu4=U
t- | ta, =DA | te, =TÍ | ti, =DIM, ti4=DI | tu, =UD, =DU
z- | za, =NA4 | ze=ZI, =ZÌ | zi | zu, =KA





Signs for VC syllables (row: final consonant; columns: preceding vowel)

Consonant | a- | e- | i- | u-
(none) | a, á | e, é | i, í=IÁ | u, ú, ù
-b | ab, áb | eb=IB, éb=TUM | ib, íb=TUM | ub, úb=ŠÈ
-d | ad, ád |  | íd=A.ENGUR | ud, úd=ÁŠ
-g | ag, ág | eg=IG, ég=E | ig, íg=E | ug
-ḫ | aḫ, áḫ=ŠEŠ | eḫ=AḪ | iḫ=AḪ | uḫ=AḪ, úḫ
-k | ak=AG | ek=IG | ik=IG | uk=UG
-l | al, ál=ALAM | el, él=IL | il, íl | ul, úl=NU
-m | am, ám=ÁG | em=IM | im, ím=KAŠ4 | um, úm=UD
-n | an | en, èn=LI | in, in4=EN, in5=NIN | un, ún=U
-p | ap=AB | ép=TUM | ip=IB, íp=TUM | up=UB, úp=ŠÈ
-r | ar, ár=UB | er=IR | ir, ír=A.IGI | ur, úr
-s | as=AZ | es=GIŠ, és=EŠ | is=GIŠ, ís=EŠ | ús=UŠ
-š | áš | éš=ŠÈ |  | úš=BAD
-t | at=AD, át=GÍR gunû |  |  | ut=UD, út=ÁŠ
-z | az | ez=GIŠ, éz=EŠ | iz=GIŠ, íz=IŠ | uz=ŠE&HU, úz=UŠ, ùz
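Read the other way round, the syllable table shows the polyvalence discussed earlier: a single sign such as GIŠ serves several syllables. A minimal sketch that inverts a handful of value=SIGN entries taken from the VC table; the selection of entries and the function name sign_values are illustrative, and readings whose sign name is not spelled out in the table are left out.

```python
from collections import defaultdict
from typing import Dict, List

# A few "value=SIGN" entries copied from the VC table.
ENTRIES: List[str] = [
    "es=GIŠ", "is=GIŠ", "ez=GIŠ", "iz=GIŠ",
    "és=EŠ", "ís=EŠ", "éz=EŠ",
    "eb=IB", "ip=IB",
    "ak=AG",
]


def sign_values(entries: List[str]) -> Dict[str, List[str]]:
    """Invert 'value=SIGN' entries into a map from sign name to its syllabic values."""
    index = defaultdict(set)
    for entry in entries:
        # Split on the first '=' and tolerate stray spaces such as 'iz= GIŠ'.
        value, sign = (part.strip() for part in entry.split("=", 1))
        index[sign].add(value)
    return {sign: sorted(values) for sign, values in index.items()}


for sign, values in sign_values(ENTRIES).items():
    print(sign, values)
```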

Sign inventories


Cuneiform writing in Ur, southern Iraq

The Sumerian cuneiform script had on the order of 1,000 distinct signs (or about 1,500 if variants are included). This number was reduced to about 600 by the 24th century B.C.E. and the beginning of Akkadian records. Not all Sumerian signs are used in Akkadian texts, and not all Akkadian signs are used in Hittite.

Falkenstein (1936) lists 939 signs used in the earliest period (late Uruk, 34th to 31st centuries). With an emphasis on Sumerian forms, Deimel (1922) lists 870 signs used in the Early Dynastic II period (28th century, "LAK") and for the Early Dynastic IIIa period (26th century, "ŠL"). Rosengarten (1967) lists 468 signs used in Sumerian (pre-Sargonian) Lagash, and Mittermayer ("aBZL", 2006) lists 480 Sumerian forms, written in Isin-Larsa and Old Babylonian times. Regarding Akkadian forms, the standard handbook for many years was Borger ("ABZ", 1981) with 598 signs used in Assyrian/Babylonian writing, recently superseded by Borger ("MesZL", 2004) with an expansion to 907 signs, an extension of their Sumerian readings and a new numbering scheme.

Signs used in Hittite cuneiform are listed by Forrer (1922), Friedrich (1960) and the HZL (Rüster and Neu 1989). The HZL lists a total of 375 signs, many with variants (for example, 12 variants are given for number 123 EGIR).


Babylonian numerals

The Sumerians used a numerical system based on 1, 10 and 60. A number such as 70 was written with the sign for 60 followed by the sign for 10. This way of counting is still used today for measuring time, with 60 seconds per minute and 60 minutes per hour.[2]
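The base-60 scheme can be sketched in code. This assumes the place-value form of the system, in which each base-60 digit is written with ten-signs and unit signs and a digit in the sixties place uses the same signs as one in the units place; '<' and 'Y' are ASCII stand-ins for the ten and unit wedges, and the function names are hypothetical. Under that assumption 70 becomes one unit in the sixties place followed by one ten-sign, matching the description above.

```python
from typing import List, Tuple


def sexagesimal_digits(n: int) -> List[int]:
    """Base-60 digits of a positive integer, most significant first."""
    if n <= 0:
        raise ValueError("this sketch handles positive integers only")
    digits = []
    while n:
        n, d = divmod(n, 60)
        digits.append(d)
    return digits[::-1]


def wedge_groups(n: int) -> List[Tuple[int, int]]:
    """(tens, ones) sign counts for each base-60 digit of n."""
    return [divmod(d, 10) for d in sexagesimal_digits(n)]


def render(n: int) -> str:
    """ASCII rendering: '<' per ten-sign, 'Y' per unit sign, one space
    between places (an empty place comes out as an empty group)."""
    return " ".join("<" * tens + "Y" * ones for tens, ones in wedge_groups(n))


print(render(70))    # -> Y <
print(render(3661))  # -> Y Y Y   (1*3600 + 1*60 + 1, i.e. 1 h 1 min 1 s)
```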



Unicode (as of version 6.0) assigns to Sumero-Akkadian Cuneiform script the following ranges:

U+12000–U+123FF (879 assigned characters) "Cuneiform"
U+12400–U+1247F (103 assigned characters) "Cuneiform Numbers and Punctuation"
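Because the script is encoded in fixed code-point ranges, checking whether a character belongs to it is a simple range test. A minimal sketch using the two Unicode 6.0 ranges listed above; the helper names are hypothetical, character names come from whatever Unicode version the local Python's unicodedata module ships, and later Unicode versions add further cuneiform characters, so these two ranges are not exhaustive for current versions.

```python
import unicodedata
from typing import Optional

# Block ranges as given in the text (Unicode 6.0).
BLOCKS = [
    ("Cuneiform", 0x12000, 0x123FF),
    ("Cuneiform Numbers and Punctuation", 0x12400, 0x1247F),
]


def cuneiform_block(char: str) -> Optional[str]:
    """Name of the cuneiform block containing char, or None."""
    cp = ord(char)
    for name, start, end in BLOCKS:
        if start <= cp <= end:
            return name
    return None


def describe(char: str) -> str:
    """Code point plus the character name from Python's own Unicode database."""
    return f"U+{ord(char):04X} {unicodedata.name(char, '(unnamed in this Python build)')}"


# Code points cited in this article for SAG, AŠ and DIŠ.
for cp in (0x12295, 0x12038, 0x12079):
    ch = chr(cp)
    print(describe(ch), "|", cuneiform_block(ch))
```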

The final proposal for Unicode encoding of the script was submitted by two cuneiform scholars working with an experienced Unicode proposal writer in June 2004.[17] The base character inventory is derived from the list of Ur III signs compiled by the Cuneiform Digital Library Initiative of UCLA based on the inventories of Miguel Civil, Rykle Borger (2003), and Robert Englund. Rather than opting for a direct ordering by glyph shape and complexity, according to the numbering of an existing catalog, the Unicode order of glyphs was based on the Latin alphabetic order of their "last" Sumerian transliteration as a practical approximation.

List of major cuneiform tablet discoveries

  • Abu Salabikh

  • Mari, Syria

  • Library of Ashurbanipal: tens of thousands of tablets[24]

  • Ebla tablets: written in Sumerian and Eblaite

  • Amarna letters

Note [nb 2]: It seems that the various parts of Rawlinson's paper formed Vol. X of the journal of the Royal Asiatic Society. The final part, III, comprised chapters IV (Analysis of the Persian Inscriptions of Behistun) and V (Copies and Translations of the Persian Cuneiform Inscriptions of Persepolis, Hamadan, and Van), pp. 187–349.


A pictogram, also called a pictogramme, pictograph, or simply picto,[1] and also an 'icon', is an ideogram that conveys its meaning through its pictorial resemblance to a physical object. Pictographs are often used in writing and graphic systems in which the characters are to a considerable extent pictorial in appearance.

Pictography is a form of writing that uses representational, pictorial drawings, like cuneiform and, to some extent, hieroglyphic writing, which also uses drawings as phonetic letters or determinatives. In certain modern uses, pictograms form part of a formal language (e.g. hazard pictograms).



Ojibwa pictographs on cliff-face at Agawa Rock, Lake Superior Provincial Park

Early written symbols were based on pictographs (pictures which resemble what they signify) and ideograms (symbols which represent ideas). Ancient Sumerian, Egyptian, and Chinese civilizations began to use such symbols over time, developing them into logographic writing systems. Pictographs are still in use as the main medium of written communication in some non-literate cultures in Africa, the Americas, and Oceania. Pictographs are often used as simple, pictorial, representational symbols by most contemporary cultures.

Pictographs can be considered an art form, or can be considered a written language, and are designated as such in Pre-Columbian art, Native American art, Ancient Mesopotamia and Painting in the Americas before Colonization. One example of many is the rock art of the Chumash people, part of the Native American history of California. In 2011, UNESCO added a new site, "Petroglyph Complexes of the Mongolian Altai, Mongolia",[2] to its World Heritage List, recognizing the importance of the pictograms engraved in rocks.

Some scientists in the field of neuropsychiatry and neuropsychology, such as Prof. Dr. Mario Christian Meyer, are studying the symbolic meaning of indigenous pictograms and petroglyphs,[3] aiming to create new ways of communication between native people and modern scientists to safeguard and valorize their cultural diversity.[4]

Modern uses

An early modern example of the extensive use of pictographs may be seen in the map in the London suburban timetables of the London and North Eastern Railway, 1936-1947, designed by George Dow, in which a variety of pictographs was used to indicate facilities available at or near each station. Pictographs remain in common use today, serving as pictorial, representational signs, instructions, or statistical diagrams. Because of their graphical nature and fairly realistic style, they are widely used to indicate public toilets, or places such as airports and train stations.

Pictographic writing as a modernist poetic technique is credited to Ezra Pound, though French surrealists accurately credit the Pacific Northwest American Indians of Alaska who introduced writing, via totem poles, to North America.[5]

Contemporary artist Xu Bing created Book from the Ground, a universal language made up of pictograms collected from around the world. A Book from the Ground chat program has been exhibited in museums and galleries internationally.

Pictograms are used in many areas of modern life for practical purposes, often as a formal language (see below).

In mathematics

A compound pictogram showing the breakdown of the survivors and deaths of the maiden voyage of the RMS Titanic by class and age/gender

In statistics, pictograms are charts in which icons represent numbers, making the data more interesting and easier to understand. A key is often included to indicate what each icon represents. All icons must be of the same size, but a fraction of an icon can be used to show the corresponding fraction of that amount.[6]

For example, a table of the number of letters sent on each day can be graphed as a pictogram with the key "one letter icon = 10 letters". As the values are rounded to the nearest 5 letters, the second icon on Tuesday is the left half of the original icon.
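The arithmetic behind such a chart is simple: divide each value by the amount one icon stands for and round to the smallest fraction the chart draws. A minimal sketch, assuming a key of 10 letters per icon, a half icon as the smallest fraction (so values are rounded to the nearest 5, as in the example), non-negative values, and hypothetical daily counts; text block characters stand in for the letter icon, and pictogram_row is an illustrative name.

```python
import math
from typing import Dict

FULL = "█"  # one whole icon = per_icon letters
HALF = "▌"  # left half of an icon = half of per_icon


def pictogram_row(value: float, per_icon: int = 10) -> str:
    """Draw one pictogram row: round value to the nearest half icon
    (the nearest 5 when per_icon is 10) and emit full/half icons."""
    half_icons = math.floor(value / (per_icon / 2) + 0.5)  # round half up
    full, has_half = divmod(half_icons, 2)
    return FULL * full + (HALF if has_half else "")


# Hypothetical counts, for illustration only.
letters_sent: Dict[str, int] = {"Monday": 20, "Tuesday": 14, "Wednesday": 31}
for day, count in letters_sent.items():
    print(f"{day:<10} {pictogram_row(count)}")
```

With these counts, Tuesday's 14 letters round to 15 and are drawn as one full icon plus a left half icon.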


Pictographs can often transcend languages in that they can communicate to speakers of a number of tongues and language families equally effectively, even if the languages and cultures are completely different. This is why road signs and similar pictographic material are often applied as global standards expected to be understood by nearly all.

A standard set of pictographs was defined in the international standard ISO 7001: Public Information Symbols. Another common set of pictographs are the laundry symbols used on clothing tags and the chemical hazard symbols as standardized by the GHS system.

Pictograms have been popularized on the web and in software, where they are better known as "icons": images displayed on a computer screen to help users navigate a computer system or mobile device.


Pictish language

Pictish is the extinct language, or dialect, spoken by the Picts, the people of northern and central Scotland in the Early Middle Ages. There is virtually no direct attestation of Pictish, short of a limited number of place names and names of people found on monuments and the contemporary records in the area controlled by the Kingdom of the Picts. However, evidence from place names and personal names points to the language being closely related to the Brittonic language spoken prior to Anglo-Saxon settlement in what is now southern Scotland, England and Wales. A minority view held by a few scholars claims that Pictish was at least partially non-Indo-European or that a non-Indo-European and Brittonic language coexisted. Pictish was replaced by Gaelic in the latter centuries of the Pictish period.


Language classification

Picture by H. E. Marshall (1867–1941) depicting Columba preaching to Bridei, king of Fortriu in 565.

The existence of a distinct Pictish language during the Early Middle Ages is attested clearly in Bede's early 8th century Historia ecclesiastica gentis Anglorum, which names Pictish as a language distinct from that spoken by the Britons, the Irish, and the English.[2] Bede states that Columba, a Gael, used an interpreter during his mission to the Picts. A number of competing theories have been advanced regarding the nature of the Pictish language.

Most scholars agree that Pictish was a branch of the Brittonic language, while a few scholars merely accept that it was related to the Brittonic language. Pictish came under increasing pressure and influence from Old Irish spoken in Dál Riata from the 5th century until its eventual replacement.[3]

Pictish is thought to have influenced the development of modern Scottish Gaelic. This is perhaps most obvious in the contribution of loan words, but more importantly it is thought that Pictish influenced the syntax of Scottish Gaelic, which bears greater similarity to Brittonic languages than does Irish.[4]

Position within Celtic

The evidence of place names and personal names demonstrates that an Insular Celtic language related to the more southerly Brittonic languages was formerly spoken in the Pictish area.[5] The view of Pictish as a P-Celtic language was first proposed in 1582 by George Buchanan, who aligned the language with Gaulish.[6] A compatible view was advanced by antiquarian George Chalmers in the early 19th century. Chalmers considered that Pictish and Brittonic were one and the same, basing his argument on P-Celtic orthography in the Pictish king lists and in place names predominant in historically Pictish areas.[7]

Personal names of Roman-era chieftains from the Pictish area, including Calgacus (above) have a Celtic origin.[8]

Celtic scholar Whitley Stokes, in a philological study of the Irish annals, concluded that Pictish was closely related to Welsh.[9] This conclusion was supported by philologist Alexander MacBain's analysis of the place and tribe names in Ptolemy's 2nd century Geographia.[10] Toponymist William Watson's exhaustive review of Scottish place names demonstrated convincingly the existence of a dominant P-Celtic language in historically Pictish areas, concluding that the Pictish language was a Northern extension of British and that Gaelic was a later introduction from Ireland.[11]

William Forbes Skene argued in 1837 that Pictish was a Goidelic language, the ancestor of modern Scottish Gaelic.[12] He suggested that Columba's use of an interpreter reflected his preaching to the Picts in Latin, rather than any difference between the Irish and Pictish languages.[13] This view, involving independent settlement of Ireland and Scotland by Goidelic people, obviated an Irish influence in the development of Gaelic Scotland and enjoyed wide popular acceptance in 19th century Scotland, but is no longer given credence.[14]

While Skene's notion of an exclusively Q-Celtic Pictish language has long been rejected, the Picts were under increasing political, social and linguistic pressure from Dál Riata from around the 5th century. The Picts were steadily Gaelicised through the latter centuries of the Pictish Kingdom, and by the time of the merging of the Pictish and Dál Riatan kingdoms, the Picts were essentially a Gaelic-speaking people.[15] Forsyth speculates that a period of bilingualism may have outlasted the Pictish kingdom in peripheral areas by several generations.[16] Scottish Gaelic, unlike Irish (and, for that matter, Old Irish), maintains a substantial corpus of Brittonic loan-words and, moreover, uses a verbal system modelled on the same pattern as Welsh.[17]

Pre-Indo-European theory

Difficulties in translation of Ogham inscriptions, like those found on the Brandsbutt Stone, led to a widely held belief that Pictish was a non-Indo-European language

John Rhys, in 1892, proposed that Pictish was a non-Indo-European language. This opinion was based on the apparently unintelligible ogham inscriptions found in historically Pictish areas.[18] A similar position was taken by Heinrich Zimmer, who argued that the Picts' supposedly exotic cultural practices (tattooing and matriliny) were equally non-Indo-European,[19] and a Pre-Indo-European model was maintained by some well into the 20th century.[20]

A modified version of this theory was advanced in an influential 1955 review of Pictish by Kenneth Jackson. Jackson proposed a two-language model: while Pictish was undoubtedly P-Celtic, it may have had a non-Celtic substratum and a second language may have been used for inscriptions.[21] Jackson's hypothesis was framed in the then-current model that a Brittonic elite, identified as the Broch-builders, had migrated from the south of Britain into Pictish territory, dominating a pre-Celtic majority.[22] He used this to reconcile the perceived translational difficulties of Ogham with the overwhelming evidence for a P-Celtic Pictish language. Jackson was content to write off Ogham inscriptions as inherently unintelligible.[23]

Jackson's model became the orthodox position for the latter half of the 20th century. However, it has become progressively undermined by advances in understanding of late Iron Age archaeology, as well as by improved understanding of the enigmatic Ogham inscriptions, a number of which have since been interpreted as Celtic.[24]

Despite this, Eric P. Hamp, in his 2012 Indo-European family tree, classified Pictish as a non-Indo-European language.[25]

Discredited theories

Traditional accounts (now rejected) claimed that the Picts had migrated to Scotland from Scythia, a region that encompassed Eastern Europe and Central Asia.[26] Buchanan, looking for a Scythian P-Celtic candidate for the ancestral Pict, settled on the Gaulish-speaking Cotini (which he rendered as Gothuni), a tribe from the region that is now modern-day Slovakia. This was later misunderstood by Robert Sibbald in 1710, who equated Gothuni with the Germanic-speaking Goths.[27] John Pinkerton expanded on this in 1789, claiming that Pictish was the predecessor to Modern Scots.[28] Pinkerton's arguments were often rambling, bizarre and clearly motivated by his belief that Celts were an inferior people. The theory of a Germanic Pictish language is no longer considered credible.[29]


Chinese language

汉语/漢語 or 中文
Hànyǔ or Zhōngwén

Hànyǔ (Chinese) written in traditional (left) and simplified (right) characters

Native to: China, Taiwan, Singapore, Malaysia, the United States, Canada, Indonesia, and other places with significant overseas Chinese communities
Native speakers: unknown (1.2 billion cited 1984–2001)[1]
Language family: Sino-Tibetan
Standard forms: Standard Chinese
Varieties: Wu (incl. Shanghainese); Min (incl. Amoy, Teochew, Hoochew); Yue (incl. Cantonese, Taishanese)
Writing system: Chinese characters, zhuyin fuhao, Latin, Arabic, Cyrillic, braille; historically also the 'Phags-pa script
Official language in: China, Taiwan, Singapore, Hong Kong, Macau, Wa State (Burma), United Nations
Recognised minority language in: United States
Regulated by: National Commission on Language and Script Work[2]; National Languages Committee; Promote Mandarin Council; Chinese Language Standardisation Council

Language codes
ISO 639-1: zh
ISO 639-2: chi (B), zho (T)
ISO 639-3: zho (inclusive code)
Individual codes:
cdo – Min Dong
cjy – Jinyu
cmn – Mandarin
cpx – Pu Xian
czh – Huizhou
czo – Min Zhong
gan – Gan
hak – Hakka
hsn – Xiang
mnp – Min Bei
nan – Min Nan
wuu – Wu
yue – Yue
och – Old Chinese
ltc – Late Middle Chinese
lzh – Classical Chinese





Map of the Sinophone world. Legend: countries that identify Chinese as a primary, administrative, or native language; countries with more than 5,000,000, 1,000,000, 500,000, or 100,000 Chinese speakers; major Chinese-speaking settlements.


Chinese languages (spoken): Traditional Chinese 漢語; Simplified Chinese 汉语; literal meaning "Han language"
Chinese language (written): 中文; literal meaning "Chinese text"



Chinese (汉语/漢語; Hànyǔ or 中文; Zhōngwén) is a group of related but in many cases mutually unintelligible language varieties, forming a branch of the Sino-Tibetan language family. Chinese is spoken by the Han majority and many other ethnic groups in China. Nearly 1.2 billion people (around 16% of the world's population) speak some form of Chinese as their first language.

The varieties of Chinese are usually described by native speakers as dialects of a single Chinese language, but linguists note that they are as diverse as a language family.[a] The internal diversity of Chinese has been likened to that of the Romance languages, but may be even more varied. There are between 7 and 13 main regional groups of Chinese (depending on classification scheme), of which the most spoken, by far, is Mandarin (about 960 million), followed by Wu (80 million), Yue (70 million) and Min (70 million). Most of these groups are mutually unintelligible, although some, like Xiang and the Southwest Mandarin dialects, may share common terms and some degree of intelligibility. All varieties of Chinese are tonal and analytic.

Standard Chinese (Putonghua/Guoyu/Huayu) is a standardized form of spoken Chinese based on the Beijing dialect of Mandarin. It is the official language of China and Taiwan, as well as one of four official languages of Singapore. It is one of the six official languages of the United Nations. The written form of the standard language (中文; Zhōngwén), based on the logograms known as Chinese characters (汉字/漢字; hànzi), is shared by literate speakers of otherwise unintelligible dialects.

Of the other varieties of Chinese, Cantonese (the prestige variety of Yue) is influential in Guangdong province and in Hong Kong and Macau, and is widely spoken among overseas communities. Min Nan, part of the Min group, is widely spoken in southern Fujian, in neighbouring Taiwan (where it is known as Taiwanese or Hoklo) and in Southeast Asia (also known as Hokkien in the Philippines, Singapore, and Malaysia). There are also sizeable Hakka and Shanghainese diasporas, for example in Taiwan, where most Hakka communities are also conversant in Taiwanese and Standard Chinese.



History of the Chinese language

Chinese can be traced back over 3,000 years to the first written records, and even earlier to a hypothetical Sino-Tibetan proto-language. The language has evolved over time, with various local varieties becoming mutually unintelligible. In reaction, central governments have repeatedly sought to promulgate a unified standard.[4]


Most linguists classify all varieties of Chinese as part of the Sino-Tibetan language family, together with Burmese, Tibetan and many other languages spoken in the Himalayas and the Southeast Asian Massif.[5] Although the relationship was first proposed in the early 19th century and is now broadly accepted, reconstruction of Sino-Tibetan is much less developed than for families such as Indo-European or Austroasiatic. Difficulties have included the great diversity of the languages, the lack of inflection in many of them, and the effects of language contact. In addition, many of the smaller languages are spoken in mountainous areas that are difficult to access, and are often also sensitive border zones.[6] Without a secure reconstruction of proto-Sino-Tibetan, the higher-level structure of the family remains unclear.[7] A top-level branching into Chinese and Tibeto-Burman languages is often assumed, but has not been convincingly demonstrated.[8]

Old and Middle Chinese

The earliest examples of Chinese are divinatory inscriptions on oracle bones from around 1250 BCE in the late Shang dynasty.[9] Old Chinese was the language of the Western Zhou period (1046–771 BCE), recorded in inscriptions on bronze artifacts, the Classic of Poetry and portions of the Book of Documents and I Ching.[10] Scholars have attempted to reconstruct the phonology of Old Chinese by comparing later varieties of Chinese with the rhyming practice of the Classic of Poetry and the phonetic elements found in the majority of Chinese characters.[11] Although many of the finer details remain unclear, most scholars agree that Old Chinese differed from Middle Chinese in lacking retroflex and palatal obstruents but having initial consonant clusters of some sort, and in having voiceless nasals and liquids.[12] Most recent reconstructions also describe an atonal language with consonant clusters at the end of the syllable, developing into tone distinctions in Middle Chinese.[13] Several derivational affixes have also been identified, but the language lacked inflection, and indicated grammatical relationships using word order and grammatical particles.[14]

Middle Chinese was the language used during the Southern and Northern Dynasties and the Sui, Tang, and Song dynasties (6th through 10th centuries CE). It can be divided into an early period, reflected by the Qieyun rime book (601 CE), and a late period in the 10th century, reflected by rhyme tables such as the Yunjing constructed by ancient Chinese philologists as a guide to the Qieyun system.[15] These works define phonological categories, but with little hint of what sounds they represent.[16] Linguists have identified these sounds by comparing the categories with pronunciations in modern varieties of Chinese, borrowed Chinese words in Japanese, Vietnamese and Korean, and transcription evidence.[17] The resulting system is very complex, with a large number of consonants and vowels, but they were probably not all distinguished in any single dialect. Most linguists now believe it represents a diasystem encompassing 6th-century northern and southern standards for reading the classics.[18]

Rise of northern dialects

After the fall of the Northern Song dynasty, and during the reign of the Jin (Jurchen) and Yuan (Mongol) dynasties in northern China, a common speech (now called Old Mandarin) developed based on the dialects of the North China Plain around the capital.[19] The Zhongyuan Yinyun (1324) was a dictionary that codified the rhyming conventions of the new sanqu verse form in this language.[20] Together with the slightly later Menggu Ziyun, this dictionary describes a language with many of the features characteristic of modern Mandarin dialects.[21]

Until the mid-20th century, most of the Chinese people living in many parts of southern China spoke only their local language. As a practical measure, officials of the Ming and Qing dynasties carried out the administration of the empire using a common language based on Mandarin varieties, known as Guānhuà (官話, literally "language of officials").[22] For most of this period, this language was a koiné based on dialects spoken in the Nanjing area, though not identical to any single dialect.[23] By the middle of the 19th century, the Beijing dialect had become dominant and was essential for any business with the imperial court.[24]

In the 1930s a standard national language Guóyǔ (国语/國語 "national language") was adopted. After much dispute between proponents of northern and southern dialects and an abortive attempt at an artificial pronunciation, the National Language Unification Commission finally settled on the Beijing dialect in 1932. The People's Republic founded in 1949 retained this standard, calling it pǔtōnghuà (普通话/普通話 "common speech").[25] The national language is now used in education, the media, and formal situations in both Mainland China and Taiwan.[26] In Hong Kong and Macau, because of their colonial and linguistic history, the language of education, the media, formal speech and everyday life remains the local Cantonese, although the standard language is now very influential and taught in schools.[27]


Adoption of Chinese literary culture and Sino-Xenic vocabularies

The Tripitaka Koreana, a Korean collection of the Chinese Buddhist canon

The Chinese language has spread to neighbouring countries through a variety of means. Northern Vietnam was incorporated into the Han empire in 111 BCE, beginning a period of Chinese control that ran almost continuously for a millennium. The Four Commanderies were established in northern Korea in the first century BCE, but disintegrated in the following centuries.[28] Chinese Buddhism spread over East Asia between the 2nd and 5th centuries CE, and with it the study of scriptures and literature in Literary Chinese.[29] Later Korea, Japan and Vietnam developed strong central governments modelled on Chinese institutions, with Literary Chinese as the language of administration and scholarship, a position it would retain until the late 19th century in Korea and (to a lesser extent) Japan, and the early 20th century in Vietnam.[30] Scholars from different lands could communicate, albeit only in writing, using Literary Chinese.[31]

Although they used Chinese solely for written communication, each country had its own tradition of reading texts aloud, the so-called Sino-Xenic pronunciations. Chinese words with these pronunciations were also borrowed extensively into the Korean, Japanese and Vietnamese languages, and today comprise over half their vocabularies.[32] This massive influx led to changes in the phonological structure of the languages, contributing to the development of moraic structure in Japanese[33] and the disruption of vowel harmony in Korean.[34]

Borrowed Chinese morphemes have been used extensively in all these languages to coin compound words for new concepts, in a similar way to the use of Latin and Ancient Greek roots in European languages.[35] Many new compounds, or new meanings for old phrases, were created in the late 19th and early 20th centuries to name Western concepts and artifacts. These coinages, written in shared Chinese characters, have then been borrowed freely between languages. They have even been accepted into Chinese, a language usually resistant to loanwords, because their foreign origin was hidden by their written form. Often different compounds for the same concept were in circulation for some time before a winner emerged, and sometimes the final choice differed between countries.[36] The proportion of vocabulary of Chinese origin thus tends to be greater in technical, abstract or formal language. For example, Sino-Japanese words account for about 35% of the words in entertainment magazines, over half the words in newspapers, and 60% of the words in science magazines.[37]

Vietnam, Korea and Japan each developed writing systems for their own languages, initially based on Chinese characters, but later replaced with the Hangul alphabet for Korean and supplemented with kana syllabaries for Japanese, while Vietnamese continued to be written with the complex Chữ nôm script. However, these were limited to popular literature until the late 19th century. Today Japanese is written with a composite script using both Chinese characters (Kanji) and kana, but Korean is written exclusively with Hangul in North Korea, and supplementary Chinese characters (Hanja) are increasingly rarely used in the South. Vietnamese is written with a Latin-based alphabet.

Examples of loan words in English include "tea", from Hokkien (Min Nan) (茶), and "kumquat", from Cantonese gam1gwat1 (金橘).


Varieties of Chinese

Jerry Norman estimated that there are hundreds of mutually unintelligible varieties of Chinese.[38] These varieties form a dialect continuum, in which differences in speech generally become more pronounced as distances increase, though the rate of change varies immensely.[39] Generally, mountainous South China exhibits more linguistic diversity than the North China Plain. In parts of South China, a major city's dialect may only be marginally intelligible to close neighbours. For instance, Wuzhou is about 120 miles (190 km) upstream from Guangzhou, but the Yue variety spoken there is more like that of Guangzhou than is that of Taishan, 60 miles (95 km) southwest of Guangzhou and separated from it by several rivers.[40] In parts of Fujian the speech of neighbouring counties or even villages may be mutually unintelligible.[41]

Until the late 20th century, Chinese emigrants to Southeast Asia and North America came from southeast coastal areas, where Min, Hakka and Yue dialects are spoken.[42] The vast majority of Chinese immigrants to North America spoke the Taishan dialect, from a small coastal area southwest of Guangzhou.[43]


Local varieties of Chinese are conventionally classified into seven dialect groups, largely on the basis of the different evolution of Middle Chinese voiced initials:[44][45]

  • Mandarin

  • Wu

  • Gan

  • Xiang

  • Min

  • Hakka

  • Yue

The classification of Li Rong, which is used in the Language Atlas of China (1987), distinguishes three further groups:[46][47]

  • Jin, previously included in Mandarin.

  • Huizhou, previously included in Wu.

  • Pinghua, previously included in Yue.

The primary branches of Chinese in eastern China and Taiwan[46]

Numbers of first-language speakers (all countries):[1]

  •   Mandarin: 847.8 million (70.9%)

  •   Wu: 77.2 million (6.5%)

  •   Min: 71.8 million (6.0%)

  •   Yue: 60 million (5.0%)

  •   Jin: 45 million (3.8%)

  •   Xiang: 36 million (3.0%)

  •   Hakka: 30.1 million (2.5%)

  •   Gan: 20.6 million (1.7%)

  •   Huizhou: 4.6 million (0.4%)

  •   Pinghua: 2 million (0.2%)
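The percentages in this list appear to be shares of the combined total of the ten listed groups (about 1,195.1 million) rather than of a separately reported figure. That reading is an inference, so the short Python sketch below simply recomputes each share under that assumption; the `SPEAKERS_MILLIONS` table and the `shares` helper are illustrative names, not part of any source.

```python
# First-language speaker counts in millions, copied from the list above.
SPEAKERS_MILLIONS = {
    "Mandarin": 847.8,
    "Wu": 77.2,
    "Min": 71.8,
    "Yue": 60.0,
    "Jin": 45.0,
    "Xiang": 36.0,
    "Hakka": 30.1,
    "Gan": 20.6,
    "Huizhou": 4.6,
    "Pinghua": 2.0,
}


def shares(counts: dict[str, float]) -> dict[str, float]:
    """Return each group's percentage of the summed total, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(SPEAKERS_MILLIONS.values())
    print(f"Assumed total: {total:.1f} million")
    for name, pct in shares(SPEAKERS_MILLIONS).items():
        print(f"{name}: {pct}%")
```

Every recomputed value rounds to the percentage printed in the list, which is consistent with (though it does not prove) the assumption about the denominator.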

Some varieties remain unclassified, including Danzhou dialect (spoken in Danzhou, on Hainan Island), Waxianghua (spoken in western Hunan) and Shaozhou Tuhua (spoken in northern Guangdong).[48]

Standard Chinese and diglossia


Putonghua / Guoyu, often called "Mandarin", is the official standard language used by the People's Republic of China, the Republic of China (Taiwan), and Singapore (where it is called "Huayu" or simply Chinese). It is based on the Beijing dialect of Mandarin. It is promoted as a common language of communication for speakers of all Chinese varieties, and is therefore used in government agencies, in the media, and as a language of instruction in schools.

In mainland China and Taiwan, diglossia has been a common feature: many Chinese people can speak two or even three varieties of the Sinitic languages (or "dialects") in addition to Standard Chinese. For example, in addition to putonghua, a resident of Shanghai might speak Shanghainese and, if he or she grew up elsewhere, is also likely to be fluent in the dialect of that area. A native of Guangzhou may speak both Cantonese and putonghua, and a resident of Taiwan both Taiwanese and putonghua/guoyu. A person living in Taiwan may commonly mix pronunciations, phrases, and words from Mandarin and Taiwanese, and this mixture is considered normal in daily or informal speech.


In common English usage, Chinese is considered a language and its varieties "dialects", a classification that agrees with Chinese speakers' self-perception. Most linguists prefer instead to call Chinese a family of languages, because of the lack of mutual intelligibility between its divisions. Mutual intelligibility cannot be measured precisely, but Chinese is often compared to the Romance languages in this regard. Under the Ausbausprache, Abstandsprache and Dachsprache framework, however, mutual intelligibility is not the decisive criterion for classifying language forms as "dialect" or "language". Some linguists also find the term "Chinese languages" problematic, because it can imply a set of disruptive "religious, economic, political, and other differences" between speakers, such as exist between French Catholics and English Protestants in Canada, but not between speakers of Cantonese and Mandarin in China, owing to China's near-uninterrupted history of centralized government.[49]

Chinese itself has a term for its unified writing system, Zhōngwén (中文), while the closest equivalent used to describe its spoken variants would be Hànyǔ (汉语/漢語, "spoken language[s] of the Han Chinese"). This term could be translated as either "language" or "languages", since Chinese lacks grammatical number. For centuries in China, owing to the widespread use of a written standard in Classical Chinese, there was no uniform speech-and-writing continuum, as indicated by the use of two separate morphemes, yǔ (语/語) and wén (文). The characters used in written Chinese are logographs that denote whole morphemes rather than their phonemes, although most logographs are compounds of a similar-sounding character and a semantic component (the "radical"). Modern-day Chinese speakers of all kinds communicate using the modern standard written language, the written form of Standard Chinese.

In Chinese, the major spoken varieties are called fāngyán (方言, literally "regional speech"), and mutually intelligible variants within these are called dìdiǎn fāngyán (地点方言/地點方言 "local speech"). Both terms are customarily translated into English as "dialect".[49] Ethnic Chinese often consider these spoken variations to be a single language, for reasons of nationality and because they share a common cultural and linguistic heritage in Classical Chinese. Han native speakers of Wu, Min, Hakka, and Cantonese, for instance, may consider their own linguistic varieties to be separate spoken languages, but the Han Chinese to be a single, albeit internally very diverse, ethnicity. To Chinese nationalists, the idea of Chinese as a language family may suggest that Chinese identity is much more fragmented and disunified than it actually is, and as such it is often looked upon as culturally and politically provocative. Additionally, in Taiwan it is closely associated with Taiwanese independence, some of whose supporters promote the local Taiwanese Hokkien spoken language.


Written Chinese

The relationship between spoken and written Chinese is rather complex. The spoken varieties evolved at different rates, while written Chinese itself has changed much less. Classical Chinese literature began in the Spring and Autumn period, although written records go back as far as the Shang dynasty oracle bones of the 14th to 11th centuries BCE, inscribed in oracle bone script.

The Chinese orthography centers on Chinese characters, hanzi, which are written within imaginary rectangular blocks, traditionally arranged in vertical columns, read from top to bottom down a column, and right to left across columns. Chinese characters represent morphemes independently of phonetic change. Thus the character 一 ("one") is pronounced yī in Standard Chinese, jat1 in Cantonese and chi̍t/it in Hokkien (a form of Min). Vocabularies of the major Chinese varieties have diverged, and colloquial non-standard written Chinese often makes use of unique "dialectal characters" for Cantonese and Hakka, which are considered archaic or unused in standard written Chinese.

Written colloquial Cantonese has become quite popular in online chat rooms and instant messaging amongst Hong-Kongers and Cantonese-speakers elsewhere. Use of it is considered highly informal, and does not extend to many formal occasions.

In Hunan, women in certain areas write their local language in Nü Shu, a syllabary derived from Chinese characters. The Dungan language, considered by many a dialect of Mandarin, is nowadays written in Cyrillic, and was previously written in the Arabic script. The Dungan people are primarily Muslim and live mainly in Kazakhstan, Kyrgyzstan, and Russia; some of the related Hui people also speak the language and live mainly in China.

Chinese characters


"Preface to the Poems Composed at the Orchid Pavilion" by Wang Xizhi, written in semi-cursive style

Each Chinese character represents a monosyllabic Chinese word or morpheme. In 100 CE, the famed Han dynasty scholar Xu Shen classified characters into six categories, namely pictographs, simple ideographs, compound ideographs, phonetic loans, phonetic compounds and derivative characters. Of these, only 4% were categorized as pictographs, including many of the simplest characters, such as rén (human), rì (sun), shān (mountain; hill) and shuǐ (water). Between 80% and 90% were classified as phonetic compounds such as chōng (pour), combining a phonetic component zhōng (middle) with a semantic radical (water). Almost all characters created since have been of this type. The 18th-century Kangxi Dictionary recognized 214 radicals.

Modern characters are styled after the regular script. Various other written styles are also used in Chinese calligraphy, including seal script, cursive script and clerical script. Calligraphy artists can write in traditional and simplified characters, but they tend to use traditional characters for traditional art.

There are currently two systems for Chinese characters. The traditional system, still used in Hong Kong, Taiwan, Macau and Chinese-speaking communities (except Singapore and Malaysia) outside mainland China, takes its form from standardized character forms dating back to the late Han dynasty. The Simplified Chinese character system, introduced by the People's Republic of China in 1954 to promote mass literacy, simplifies most complex traditional glyphs to fewer strokes, many to common cursive shorthand variants.

Singapore, which has a large Chinese community, is the first and at present the only foreign nation to officially adopt simplified characters, although simplified characters have also become the de facto standard for younger ethnic Chinese in Malaysia. The Internet provides a platform for practising reading the alternative system, be it traditional or simplified.

A well-educated Chinese reader today recognizes approximately 4,000–6,000 characters; approximately 3,000 characters are required to read a Mainland newspaper. The PRC government defines literacy amongst workers as a knowledge of 2,000 characters, though this would be only functional literacy. A large unabridged dictionary, like the Kangxi Dictionary, contains over 40,000 characters, including obscure, variant, rare, and archaic characters; fewer than a quarter of these characters are now commonly used.


Standard Chinese has fewer than 1,700 distinct syllables but 4,000 common written characters, so there are many homophones. For example, the following characters (not necessarily words) are all pronounced jī: 鸡/雞 chicken, 机/機 machine, 基 basic, 击/擊 to hit, 饥/饑 hunger, and 积/積 accumulate. In speech, the meaning of a syllable is determined by context (for example, in English, "some" as the opposite of "none" as opposed to "sum" in arithmetic) or by the word it is found in ("some" or "sum" vs. "summer"). Speakers may clarify which written character they mean by giving a word or phrase it is found in: 名字叫嘉英,嘉陵江的嘉,英國的英 Míngzi jiào Jiāyīng, Jiālíng Jiāng de jiā, Yīngguó de yīng – "My name is Jiāyīng, 'Jia' as in 'Jialing River' and 'ying' as in 'England'."

Southern Chinese varieties like Cantonese and Hakka preserved more of the rimes of Middle Chinese and also have more tones. Several of the Mandarin homophones above have distinct pronunciations in Cantonese (romanized using jyutping): gai1, gei1, gei1, gik1, gei1, and zik1 respectively. For this reason, southern varieties tend to need fewer multisyllabic words.
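The homophone argument can be made concrete by grouping the six example characters by their reading in each variety. This is an illustrative sketch: the readings are the ones given in the text (with 基 taken as the character for "basic"), only simplified forms are listed, and `READINGS` and `homophone_groups` are hypothetical names.

```python
from collections import defaultdict

# (character, Mandarin pinyin, Cantonese jyutping), simplified forms only.
READINGS = [
    ("鸡", "jī", "gai1"),  # chicken
    ("机", "jī", "gei1"),  # machine
    ("基", "jī", "gei1"),  # basic
    ("击", "jī", "gik1"),  # to hit
    ("饥", "jī", "gei1"),  # hunger
    ("积", "jī", "zik1"),  # accumulate
]


def homophone_groups(column: int) -> dict[str, list[str]]:
    """Group characters by the reading in the given column (1 = Mandarin, 2 = Cantonese)."""
    groups = defaultdict(list)
    for row in READINGS:
        groups[row[column]].append(row[0])
    return dict(groups)


if __name__ == "__main__":
    print("Mandarin:", homophone_groups(1))
    print("Cantonese:", homophone_groups(2))
```

Mandarin collapses all six characters into the single syllable jī, while Cantonese spreads them over four syllables, so a Cantonese speaker needs context or a longer word less often to tell them apart.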


Phonology

The phonological structure of each syllable consists of a nucleus consisting of a vowel (which can be a monophthong, diphthong, or even a triphthong in certain varieties), preceded by an onset (a single consonant, or consonant+glide; zero onset is also possible), and followed (optionally) by a coda consonant; a syllable also carries a tone. There are some instances where a vowel is not used as a nucleus. An example of this is in Cantonese, where the nasal sonorant consonants /m/ and /ŋ/ can stand alone as their own syllable.

Across all the spoken varieties, most syllables tend to be open syllables, meaning they have no coda (assuming that a final glide is not analyzed as a coda), but syllables that do have codas are restricted to /m/, /n/, /ŋ/, /p/, /ɻ/, /t/, /k/, or /ʔ/. Some varieties allow most of these codas, whereas others, such as Standard Chinese, are limited to only /n/, /ŋ/ and /ɻ/.
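The syllable template described above (onset, nucleus, optional coda, tone) and the two coda inventories can be expressed as a small data model. This is a simplified sketch under stated assumptions: the `Syllable` class and `coda_allowed` helper are hypothetical, segments are plain IPA strings, tone numbers are variety-specific labels, and final glides are not analysed as codas.

```python
from dataclasses import dataclass
from typing import Optional

# Coda inventories from the text, as IPA symbols (hypothetical names).
ALL_CODAS = {"m", "n", "ŋ", "p", "ɻ", "t", "k", "ʔ"}  # across all varieties
STANDARD_CHINESE_CODAS = {"n", "ŋ", "ɻ"}              # Standard Chinese only


@dataclass(frozen=True)
class Syllable:
    """Simplified syllable model: (onset) + nucleus + (coda), plus a tone."""
    onset: str           # "" for a zero onset; may be consonant + glide
    nucleus: str         # vowel, diphthong, triphthong, or syllabic /m/, /ŋ/
    coda: Optional[str]  # None for an open syllable
    tone: int            # tone label; numbering differs between varieties

    def is_open(self) -> bool:
        """An open syllable has no coda."""
        return self.coda is None


def coda_allowed(syl: Syllable, inventory: set[str]) -> bool:
    """Open syllables always pass; closed ones need a coda from the inventory."""
    return syl.is_open() or syl.coda in inventory


if __name__ == "__main__":
    shan = Syllable("ʂ", "a", "n", 1)  # Standard Chinese shān
    jat = Syllable("j", "ɐ", "t", 1)   # Cantonese jat1 ("one")
    print(coda_allowed(shan, STANDARD_CHINESE_CODAS))
    print(coda_allowed(jat, STANDARD_CHINESE_CODAS))
    print(coda_allowed(jat, ALL_CODAS))
```

Under this model shān passes the Standard Chinese check, while Cantonese jat1 fails it because its /t/ coda appears only in the wider inventory.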

The number of sounds in the different spoken varieties varies, but in general there has been a tendency toward fewer sounds since Middle Chinese. The Mandarin dialects in particular have lost a great many sounds and so have far more multisyllabic words than most other spoken varieties. As a result, the total number of syllables in some varieties is only about a thousand, including tonal variation, roughly an eighth as many as in English.[b]


All varieties of spoken Chinese use tones to distinguish words.[50] A few dialects of north China may have as few as three tones, while some dialects in south China have up to 6 or 10 tones, depending on how one counts. A notable exception is Shanghainese, which has reduced its set of tones to a two-toned pitch-accent system much like that of modern Japanese.

A very common example used to illustrate tones in Chinese is the four tones of Standard Chinese (along with the neutral tone) applied to the syllable ma. The tones are exemplified by the following five Chinese words:

Example of Standard Mandarin tones

  • mā 妈/媽 (tone 1, high level) – "mother"

  • má 麻 (tone 2, high rising) – "hemp"

  • mǎ 马/馬 (tone 3, low falling-rising) – "horse"

  • mà 骂/罵 (tone 4, high falling) – "scold"

  • ma 吗/嗎 (neutral tone) – question particle

Standard Cantonese, by contrast, has six tones in open syllables and three tones in syllables ending with stops:[51]

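Example of Standard Cantonese tones

  • si1 詩 (high level, high falling) – "poem"

  • si2 史 (high rising) – "history"

  • si3 弒 (mid level) – "to assassinate"

  • si4 時 (low falling) – "time"

  • si5 市 (low rising) – "market"

  • si6 事 (low level) – "matter"

  • sik1 識 (high level, stopped) – "knowledge"

  • sik3 錫 (mid level, stopped) – "tin"

  • sik6 食 (low level, stopped) – "to eat"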

Phonetic transcriptions

The Chinese had no uniform phonetic transcription system until the mid-20th century, although enunciation patterns were recorded in early rime books and dictionaries. Early Indian translators, working in Sanskrit and Pali, were the first to attempt to describe the sounds and enunciation patterns of Chinese in a foreign language. After the 15th century, the efforts of Jesuits and Western court missionaries resulted in some rudimentary Latin transcription systems, based on the Nanjing Mandarin dialect.


"National language" (國語; Guóyǔ) written in Traditional and Simplified Chinese characters, followed by various romanizations.

Romanization

Romanization is the process of transcribing a language into the Latin script. There are many systems of romanization for the Chinese languages due to the lack of a native phonetic transcription until modern times. Chinese is first known to have been written in Latin characters by Western Christian missionaries in the 16th century.

Today the most common romanization standard for Standard Chinese is Hanyu Pinyin, often known simply as pinyin, introduced in 1956 by the People's Republic of China, and later adopted by Singapore and Taiwan. Pinyin is almost universally employed now for teaching standard spoken Chinese in schools and universities across America, Australia and Europe. Chinese parents also use Pinyin to teach their children the sounds and tones of new words. In school books that teach Chinese, the Pinyin romanization is often shown below a picture of the thing the word represents, with the Chinese character alongside.

The second most common romanization system, Wade–Giles, was invented by Thomas Wade in 1859 and modified by Herbert Giles in 1892. Because this system approximates the phonology of Mandarin Chinese with English consonants and vowels (that is, it is an Anglicization), it may be particularly helpful for beginners from an English-speaking background. Wade–Giles was common in academic use in the United States, particularly before the 1980s, and until recently was widely used in Taiwan.

When used within European texts, the tone markings in both pinyin and Wade–Giles are often left out for simplicity, and Wade–Giles' extensive use of apostrophes is also usually omitted. Thus, most Western readers will be much more familiar with Beijing than with Běijīng (pinyin), and with Taipei than with T'ai²-pei³ (Wade–Giles). This simplification makes syllables that differ in tone look like homophones, and therefore exaggerates the number of homophones by almost a factor of four.
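The effect of dropping tone marks can be checked mechanically. The sketch below uses Python's standard `unicodedata` module; `strip_tones` and `group_by_toneless` are hypothetical helper names. It removes only the four combining tone diacritics of pinyin, so the ü in lǜ survives, and it does not handle Wade–Giles, whose superscript tone numbers and apostrophes would need separate treatment.

```python
import unicodedata
from collections import defaultdict

# Combining marks that pinyin uses for tones 1-4 (the neutral tone is unmarked).
TONE_MARKS = {
    "\u0304",  # macron: tone 1 (ā)
    "\u0301",  # acute: tone 2 (á)
    "\u030C",  # caron: tone 3 (ǎ)
    "\u0300",  # grave: tone 4 (à)
}


def strip_tones(pinyin: str) -> str:
    """Remove pinyin tone diacritics while keeping the diaeresis of ü."""
    decomposed = unicodedata.normalize("NFD", pinyin)  # split letters from marks
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)  # recompose u + diaeresis into ü


def group_by_toneless(words):
    """Group tone-marked spellings that collapse to the same toneless form."""
    groups = defaultdict(list)
    for word in words:
        groups[strip_tones(word)].append(word)
    return dict(groups)


print(strip_tones("Běijīng"))   # Beijing
print(strip_tones("délǜfēng"))  # delüfeng
print(group_by_toneless(["mā", "má", "mǎ", "mà", "ma"]))
# {'ma': ['mā', 'má', 'mǎ', 'mà', 'ma']}
```

Five syllables that the tones keep apart all land in a single bucket once the marks are removed, which is the homophone inflation the paragraph describes.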

Here are a few examples of Hanyu Pinyin and Wade–Giles, for comparison:

Mandarin Romanization Comparison (Wade–Giles / Hanyu Pinyin)

  • Pei³-ching¹ / Běijīng – capital of the People's Republic of China

  • T'ai²-pei³ / Táiběi – capital of the Republic of China (Taiwan)

  • Mao² Tse²-tung¹ / Máo Zédōng – former Communist Chinese leader

  • Chiang³ Chieh⁴-shih² / Jiǎng Jièshí – former Nationalist Chinese leader (better known to English speakers as Chiang Kai-shek, based on the Cantonese pronunciation)

  • K'ung³ Tzu³ / Kǒng Zǐ – Confucius


Other systems of romanization for Chinese include Gwoyeu Romatzyh, the French EFEO, the Yale (invented during WWII for U.S. troops), as well as separate systems for Cantonese, Min Nan, Hakka, and other Chinese languages or dialects.

Other phonetic transcriptions

Chinese languages have been phonetically transcribed into many other writing systems over the centuries. The 'Phags-pa script, for example, has been very helpful in reconstructing the pronunciations of pre-modern forms of Chinese.

Zhuyin (also called bopomofo), a semi-syllabary, is still widely used in Taiwan's elementary schools to aid standard pronunciation. Although bopomofo characters are reminiscent of katakana script, there is no source to substantiate the claim that katakana was the basis for the zhuyin system.

There are also at least two systems of cyrillization for Chinese. The most widespread is the Palladius system.

Grammar and morphology



Chinese is often described as a "monosyllabic" language. However, this is only partially correct. It is largely accurate when describing Classical Chinese and Middle Chinese; in Classical Chinese, for example, perhaps 90% of words correspond to a single syllable and a single character. In the modern varieties, it is still usually the case that a morpheme (unit of meaning) is a single syllable; contrast English, with plenty of multi-syllable morphemes, both bound and free, such as "seven", "elephant", "para-" and "-able". Some of the conservative southern varieties of modern Chinese still have largely monosyllabic words, especially among the more basic vocabulary.

In modern Mandarin, however, most nouns, adjectives and verbs are disyllabic. A significant cause of this is phonological attrition: sound change over time has steadily reduced the number of possible syllables. In modern Mandarin, there are now only about 1,200 possible syllables, including tonal distinctions, compared with about 5,000 in Vietnamese (still largely monosyllabic) and over 8,000 in English.[b]

This phonological collapse has led to a corresponding increase in the number of homophones. As an example, the small Langenscheidt Pocket Chinese Dictionary[52] lists six common words pronounced shí (tone 2): "ten"; "real, actual"; "know (a person), recognize"; "stone"; "time"; "food". These were all pronounced differently in Early Middle Chinese; in William H. Baxter's transcription they were dzyip, zyit, syik, dzyek, dzyi and zyik respectively. They are still pronounced differently in today's Cantonese; in Jyutping they are sap6, sat6, sik1, sek6, si4, sik6. In modern spoken Mandarin, however, tremendous ambiguity would result if all of these words could be used as-is; Yuen Ren Chao's modern poem Lion-Eating Poet in the Stone Den exploits this, consisting of 92 characters all pronounced shi. As such, most of these words have been replaced (in speech, if not in writing) with a longer, less ambiguous compound. Only the first one, "ten", normally appears as such when spoken; the rest are normally replaced with, respectively, 实际 shíjì (lit. "actual-connection"); 认识 rènshi (lit. "recognize-know"); 石头 shítou (lit. "stone-head"); 时间 shíjiān (lit. "time-interval"); 食物 shíwù (lit. "food-thing"). In each case, the homophone was disambiguated by adding another morpheme, typically either a synonym or a generic word of some sort (for example, "head", "thing"), whose purpose is simply to indicate which of the possible meanings of the other, homophonic syllable should be selected.

However, when one of the above words forms part of a compound, the disambiguating syllable is generally dropped and the resulting word is still disyllabic. For example, shí alone, not 石头 shítou, appears in compounds meaning "stone-", for example, 石膏 shígāo "plaster" (lit. "stone cream"), 石灰 shíhuī "lime" (lit. "stone dust"), 石窟 shíkū "grotto" (lit. "stone cave"), 石英 shíyīng "quartz" (lit. "stone flower"), 石油 shíyóu "petroleum" (lit. "stone oil").

Most modern varieties of Chinese have the tendency to form new words through disyllabic, trisyllabic and tetra-character compounds. In some cases, monosyllabic words have become disyllabic without compounding, as in 窟窿 kūlong from 孔 kǒng; this is especially common in Jin.

Chinese morphology is strictly bound to a set number of syllables with a fairly rigid construction; these syllables are the morphemes, the smallest building blocks of the language. While many of these single-syllable morphemes (字, zì) can stand alone as individual words, they more often than not form multi-syllabic compounds, known as cí (词/詞), which more closely resemble the traditional Western notion of a word. A Chinese cí (“word”) can consist of more than one character-morpheme, usually two, but there can be three or more.

For example:

  • yún 云/雲 – "cloud"

  • hànbǎobāo, hànbǎo 汉堡包/漢堡包, 汉堡/漢堡 – "hamburger"

  • wǒ 我 – "I, me"

  • rén 人 – "people"

  • dìqiú 地球 – "earth"

  • shǎndiàn 闪电/閃電 – "lightning"

  • mèng 梦/夢 – "dream"

All varieties of modern Chinese are analytic languages, in that they depend on syntax (word order and sentence structure) rather than morphology—i.e., changes in form of a word—to indicate the word's function in a sentence.[53] In other words, Chinese has very few grammatical inflections—it possesses no tenses, no voices, no numbers (singular, plural; though there are plural markers, for example for personal pronouns), and only a few articles (i.e., equivalents to "the, a, an" in English).[c]

They make heavy use of grammatical particles to indicate aspect and mood. In Mandarin Chinese, this involves the use of particles like le 了 (perfective), hái 还/還 (still), yǐjīng 已经/已經 (already), and so on.

Chinese features a subject–verb–object word order, and like many other languages in East Asia, makes frequent use of the topic–comment construction to form sentences. Chinese also has an extensive system of classifiers and measure words, another trait shared with neighbouring languages like Japanese and Korean. Other notable grammatical features common to all the spoken varieties of Chinese include the use of serial verb construction, pronoun dropping and the related subject dropping.

Although the grammars of the spoken varieties share many traits, they do possess differences.


The entire Chinese character corpus since antiquity comprises well over 20,000 characters, of which only roughly 10,000 are now commonly in use. However, Chinese characters should not be confused with Chinese words. Because most Chinese words are made up of two or more characters, there are many times more Chinese words than there are characters.

Estimates of the total number of Chinese words and phrases vary greatly. The Hanyu Da Zidian, a compendium of Chinese characters, includes 54,678 head entries for characters, including oracle bone versions. The Zhonghua Zihai (1994) contains 85,568 head entries for character definitions and is the largest reference work based purely on characters and their literary variants. The CC-CEDICT project (2010) contains 97,404 contemporary entries including idioms, technology terms and names of political figures, businesses and products. The 2009 version of the Webster's Digital Chinese Dictionary (WDCD),[54] based on CC-CEDICT, contains over 84,000 entries.

The most comprehensive purely linguistic Chinese-language dictionary, the 12-volume Hanyu Da Cidian, records more than 23,000 head Chinese characters and gives over 370,000 definitions. The 1999 revised Cihai, a multi-volume encyclopedic dictionary, gives 122,836 vocabulary entry definitions under 19,485 Chinese characters, including proper names, phrases and common zoological, geographical, sociological, scientific and technical terms.

The 6th edition (2012) of Xiandai Hanyu Cidian, an authoritative one-volume dictionary of modern Standard Chinese as used in mainland China, has 69,000 entries and defines 13,000 head characters.


Loanwords

Like any other language, Chinese has absorbed a sizable number of loanwords from other cultures. Most Chinese words are formed out of native Chinese morphemes, including words describing imported objects and ideas. However, direct phonetic borrowing of foreign words has gone on since ancient times.

Some early Indo-European loanwords in Chinese have been proposed, notably mì "honey", shī "lion", and perhaps also mǎ "horse", zhū "pig", quǎn "dog", and é "goose".[d] Ancient words borrowed from along the Silk Road since Old Chinese include 葡萄 pútáo "grape", 石榴 shíliú "pomegranate" and 狮子/獅子 shīzi "lion". Some words were borrowed from Buddhist scriptures, including 佛 Fó "Buddha" and 菩萨/菩薩 Púsà "bodhisattva". Other words came from nomadic peoples to the north, such as 胡同 hútóng "hutong". Words borrowed from the peoples along the Silk Road, such as 葡萄 "grape", generally have Persian etymologies. Buddhist terminology is generally derived from Sanskrit or Pāli, the liturgical languages of North India. Words borrowed from the nomadic tribes of the Gobi, Mongolian or northeast regions generally have Altaic etymologies, such as 琵琶 pípa, the Chinese lute, or 酪 lào/luò "cheese" or "yoghurt", but from exactly which source is not always clear.[55]

Modern borrowings and loanwords

Modern neologisms are primarily translated into Chinese in one of three ways: free translation (calque, or by meaning), phonetic translation (by sound), or a combination of the two. Today, it is much more common to use existing Chinese morphemes to coin new words in order to represent imported concepts, such as technical expressions and international scientific vocabulary. Any Latin or Greek etymologies are dropped and converted into the corresponding Chinese characters (for example, anti- typically becomes 反 fǎn, literally "opposite"), making them more comprehensible for Chinese speakers but introducing more difficulties in understanding foreign texts. For example, the word telephone was loaned phonetically as 德律风/德律風 (Shanghainese: télífon [təlɪfoŋ], Mandarin: délǜfēng) during the 1920s and widely used in Shanghai, but later 电话/電話 diànhuà (lit. "electric speech"), built out of native Chinese morphemes, became prevalent (電話 is in fact from the Japanese 電話 denwa; see below for more Japanese loans). Other examples include 电视/電視 diànshì (lit. "electric vision") for television, 电脑/電腦 diànnǎo (lit. "electric brain") for computer, 手机/手機 shǒujī (lit. "hand machine") for mobile phone, 蓝牙/藍牙 lányá (lit. "blue tooth") for Bluetooth, and 网志/網誌 wǎngzhì (lit. "internet logbook") for blog in Hong Kong and Macau Cantonese. Occasionally half-transliteration, half-translation compromises (phono-semantic matching) are accepted, such as 汉堡包/漢堡包 hànbǎobāo (漢堡 hànbǎo "Hamburg" + 包 bāo "bun") for "hamburger". Sometimes translations are designed so that they sound like the original while incorporating Chinese morphemes, such as 拖拉机/拖拉機 tuōlājī "tractor" (lit. "dragging-pulling machine"), or 马利奥/馬利奧 Mǎlì'ào for the video game character Mario. This is often done for commercial purposes, for example 奔腾/奔騰 bēnténg (lit. "dashing-leaping") for Pentium and 赛百味/賽百味 Sàibǎiwèi (lit. "better-than hundred tastes") for Subway restaurants.

Foreign words, mainly proper nouns, continue to enter the Chinese language by transcription according to their pronunciations. This is done by employing Chinese characters with similar pronunciations. For example, "Israel" becomes 以色列 Yǐsèliè, and "Paris" becomes 巴黎 Bālí. A rather small number of direct transliterations have survived as common words, including 沙发/沙發 shāfā "sofa", 马达/馬達 mǎdá "motor", 幽默 yōumò "humor", 逻辑/邏輯 luójí "logic", 时髦/時髦 shímáo "smart, fashionable", and 歇斯底里 xiēsīdǐlǐ "hysterics". The bulk of these words were originally coined in the Shanghai dialect during the early 20th century and were later loaned into Mandarin, hence their Mandarin pronunciations may differ considerably from the English. For example, 沙发/沙發 "sofa" and 马达/馬達 "motor" sound more like their English counterparts in Shanghainese. Cantonese differs from Mandarin with some transliterations, such as 梳化 so1 faa3*2 "sofa" and 摩打 mo1 daa2 "motor".

Western foreign words representing Western concepts have influenced Chinese since the 20th century through transcription. From French came 芭蕾 bāléi "ballet" and 香槟 xiāngbīn "champagne"; from Italian, 咖啡 kāfēi "caffè". English influence is particularly pronounced. Many English words were borrowed into early 20th-century Shanghainese, such as 高尔夫/高爾夫 gāoěrfū "golf" and the above-mentioned 沙发/沙發 shāfā "sofa". Later, American cultural influence gave rise to 迪斯科 dísīkē "disco", 可乐/可樂 kělè "cola", and 迷你 mínǐ "mini [skirt]". Contemporary colloquial Cantonese has distinct loanwords from English, such as 卡通 kaa1 tung1 "cartoon", 基佬 gei1 lou2 "gay people", 的士 dik1 si6*2 "taxi", and 巴士 baa1 si6*2 "bus". With the rising popularity of the Internet, there is a current vogue in China for coining English transliterations, for example, 粉丝/粉絲 fěnsī "fans", 黑客 hēikè "hacker" (lit. "black guest"), and 博客 bókè "blog". In Taiwan, some of these transliterations are different, such as 駭客 hàikè for "hacker" and 部落格 bùluògé for "blog" (lit. "interconnected tribes").

Another result of English influence on Chinese is the appearance in Modern Chinese texts of so-called 字母词/字母詞 zìmǔcí (lit. "lettered words") spelled with letters from the English alphabet. These appear in magazines, newspapers, on websites, and on TV: 三G手机/三G手機 "3rd generation cell phones" (三 sān "three" + G "generation" + 手机/手機 shǒujī "mobile phones"), IT界 "IT circles" (IT "information technology" + 界 jiè "industry"), HSK (Hànyǔ Shuǐpíng Kǎoshì, 汉语水平考试/漢語水平考試), GB (Guóbiāo, 国标/國標), CIF价/CIF價 (CIF "Cost, Insurance, Freight" + 价/價 jià "price"), e家庭 "e-home" (e "electronic" + 家庭 jiātíng "home"), W时代/W時代 "wireless era" (W "wireless" + 时代/時代 shídài "era"), TV族 "TV watchers" (TV "television" + 族 zú "social group; clan"), 后PC时代/後PC時代 "post-PC era" (后/後 hòu "after/post-" + PC "personal computer" + 时代/時代), and so on.

Since the 20th century, another source of words has been Japanese, using existing kanji (Chinese characters used in Japanese). Japanese re-molded European concepts and inventions into wasei-kango (和製漢語, lit. "Japanese-made Chinese"), and many of these words have been re-loaned into modern Chinese. Other terms were coined by the Japanese by giving new senses to existing Chinese terms or by referring to expressions used in classical Chinese literature. For example, jīngjì (经济/經濟; 経済 keizai in Japanese), which in the original Chinese meant "the workings of the state", was narrowed to "economy" in Japanese; this narrowed definition was then re-imported into Chinese. Consequently, these terms are virtually indistinguishable from native Chinese words: indeed, for some of them there is dispute over whether the Japanese or the Chinese coined them first. As a result of this loaning, Chinese, Korean, Japanese, and Vietnamese share a corpus of linguistic terms describing modern terminology, paralleling the similar corpus of terms built from Greco-Latin and shared among European languages.


Yang Lingfu, former curator of the National Museum of China, giving Chinese language instruction at the Civil Affairs Staging Area in 1945.

Chinese as a foreign language

With the growing importance and influence of China's economy globally, Mandarin instruction is gaining popularity in schools in the USA and has become an increasingly popular subject of study among young people in the Western world, for example in the UK.[56]

In 1991 there were 2,000 foreign learners taking China's official Chinese Proficiency Test (comparable to the English Cambridge Certificate), while in 2005, the number of candidates had risen sharply to 117,660.[57] By 2010, 750,000 people had taken the Chinese Proficiency Test.

