The History and Etymology of Proto-Indo-Iranian
Explore Proto-Indo-Iranian—the PIE descendant behind Sanskrit and Avestan—tracing Indo-Aryan and Iranian roots, sound shifts, and early migrations.
Proto-Indo-Iranian is the reconstructed ancestor language that sits between Proto-Indo-European (PIE) and the earliest attested Indo-Aryan and Iranian traditions, most famously Vedic Sanskrit and Old Avestan. Linguists rebuild its sound system, grammar, and core vocabulary by comparing thousands of shared cognates and regular sound correspondences across Indo-Iranian languages, from modern Hindi/Urdu, Bengali, and Punjabi to Persian (Farsi), Kurdish, and Pashto. This is where classic parallels like Sanskrit sapta vs Avestan hapta (“seven”) and soma vs haoma come from, along with hallmark shifts (like Iranian s often becoming h) and deeply shared ritual terms and poetic formulae. Studying Proto-Indo-Iranian is one of the most direct ways to understand the history and etymology of Indo-Iranian vocabulary, how the branches split, and why Sanskrit and Avestan can look uncannily like two dialects of the same older language.
Proto-Indo-Iranian Language Model
- PIE
  - Proto-Indo-Iranian
    - Proto-Indo-Aryan → Vedic/Classical Sanskrit → Middle Indo-Aryan (Prakrits etc.) → modern Indo-Aryan (Hindi/Urdu, Bengali, Marathi, Punjabi, …)
    - Proto-Iranian → Old Iranian (Avestan, Old Persian) → Middle Iranian → New Iranian (Modern Persian, Kurdish, Pashto, …)
Britannica explicitly treats Indo-Aryan and Iranian as the core Indo-Iranian groups. (Encyclopedia Britannica)
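One way to make the tree concrete is to store it as parent links and ask questions of it. The sketch below is only an illustration, not a linguistic tool: the node labels are simplified from the list above, a single modern language stands in for each branch, and the helper names (`lineage`, `common_ancestor`, `is_ancestor`) are invented for this example.

```python
# Parent links for a simplified version of the tree above.
# Labels and the choice of one modern language per branch are
# illustrative assumptions, not a complete classification.
PARENT = {
    "Proto-Indo-Iranian": "PIE",
    "Proto-Indo-Aryan": "Proto-Indo-Iranian",
    "Vedic/Classical Sanskrit": "Proto-Indo-Aryan",
    "Middle Indo-Aryan": "Vedic/Classical Sanskrit",
    "Hindi/Urdu": "Middle Indo-Aryan",
    "Proto-Iranian": "Proto-Indo-Iranian",
    "Old Iranian": "Proto-Iranian",
    "Middle Iranian": "Old Iranian",
    "Modern Persian": "Middle Iranian",
}


def lineage(language):
    """Return the language followed by its ancestors, nearest first."""
    chain = [language]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_ancestor(a, b):
    """Return the nearest node that appears in both lineages, or None."""
    seen = set(lineage(a))
    for node in lineage(b):
        if node in seen:
            return node
    return None


def is_ancestor(older, younger):
    """True if `older` sits strictly above `younger` in the tree."""
    return older != younger and older in lineage(younger)


if __name__ == "__main__":
    print(common_ancestor("Hindi/Urdu", "Modern Persian"))
    print(is_ancestor("Vedic/Classical Sanskrit", "Modern Persian"))
```

In this toy tree the nearest shared node for Hindi/Urdu and Modern Persian is Proto-Indo-Iranian, and Sanskrit does not appear anywhere in the lineage of Modern Persian.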
How close are the Iranian and Indo-Aryan languages?
They’re very closely related in the family-tree sense: Persian (Farsi) is Iranian, and Iranian + Indo-Aryan are the two big sister branches inside Indo-Iranian, which is itself a branch of Proto-Indo-European (PIE). (Encyclopedia Britannica)
- Old Iranian vs Vedic Sanskrit: very close—close enough that historical linguists often use them together to reconstruct Proto-Indo-Iranian. Encyclopaedia Iranica notes Old Avestan is “closely akin” to the oldest Rigvedic language. (Encyclopaedia Iranica)
- Modern Persian vs modern Indo-Aryan (e.g., Hindi/Urdu): still related, but “family resemblance” is less obvious because both have had ~3+ millennia of independent sound changes, grammar simplification, and heavy borrowing.
Easy similarities (shared inheritance)
You see lots of cognates (same inherited word, sound-shifted):
- father: Sanskrit pitā (stem pitṛ-) ~ Persian pedar
- mother: Sanskrit mātā (stem mātṛ-) ~ Persian mādar
- name: Sanskrit nāma ~ Persian nām
And they share big Indo-Iranian sound/history traits (e.g., satem-type developments), which is part of why Indo-Iranian is treated as a tight subgroup. (Encyclopedia Britannica)
Signature differences
1) The famous Iranian s → h shift
A classic “tell” is that many Iranian languages weaken s to h in certain environments. A historical phonology overview describes “Pan-Iranian [s] > [h]” and even gives the textbook comparison Sanskrit sapta- vs Avestan hapta (“seven”). That’s why you get Persian haft “seven” where Sanskrit has sapta.
A related cultural/religious pair you’ll see in intro books:
- Sanskrit asura vs Avestan ahura (same root, Iranian shows the h-type reflex). (Britannica discusses this equivalence in passing when explaining Ahura Mazda.) (Encyclopedia Britannica)
2) Grammar: Persian simplified hard; Indo-Aryan simplified differently
Both started with rich inherited morphology (cases, complex verb systems). Over time:
- Persian becomes strongly analytic (more “using helper words” than endings). Iranica notes there isn’t a clean break from Middle to New Persian, but rather a long, gradual evolution, with major post-conquest shifts in texts and structure. (Encyclopaedia Iranica)
- Many Indo-Aryan languages also become more analytic, but often keep grammatical gender (Hindi/Urdu: masc/fem), and use postpositions built on older case forms.
(So: both simplify, but they don’t land on the same endpoint.)
3) Vocabulary layering differs a lot
- Persian: massive Arabic borrowing after the Islamic conquest (plus later French/Russian/English borrowings, depending on era and register). (Encyclopaedia Iranica)
- Indo-Aryan: strong Sanskrit “learned” layer (tatsama) in many registers, plus Persian/Arabic loans especially in Urdu/Hindustani (historical prestige and administration), plus local substrate influences.
Do these all ultimately descend from Sanskrit?
No. That’s the key misconception.
- Sanskrit is not the ancestor of Persian/Iranian. Iranian and Indo-Aryan are sister branches descending from Proto-Indo-Iranian, not one from the other. (Encyclopedia Britannica)
- Even for modern Indo-Aryan, the direct spoken ancestors are better thought of as Middle Indo-Aryan (Prakrits, Apabhraṃśa, etc.), not Classical Sanskrit. Sanskrit is hugely influential as a prestige/liturgical source, but it’s not “the parent language” of the whole family in a simple linear way.
Old Avestan vs Sanskrit
Old Avestan (the language of the Gāthās) and Vedic Sanskrit (esp. early Rigveda) are basically sister languages: both come from Proto-Indo-Iranian, but one is Old Iranian and the other is Old Indo-Aryan. That’s why they line up so well that you can often “translate” forms across by applying a small set of regular sound correspondences. Encyclopaedia Iranica puts it bluntly: Avestan morphology is inherited from PIE via Proto-Indo-Iranian and “agrees largely with that of Vedic,” and systematic comparison with Vedic is a major tool for understanding Avestan forms. (Encyclopaedia Iranica)
What “close” looks like: quick cognate set
Here are high-signal pairs (not borrowings; shared inheritance). Some are also culturally matched (ritual terms) because the religions split from a shared Indo-Iranian background.
| Old Avestan | Vedic Sanskrit | Meaning | Source |
|---|---|---|---|
| hapta | sapta | seven | Encyclopaedia Iranica |
| haoma | soma | pressed ritual drink/plant | Encyclopaedia Iranica |
| yasna | yajña | sacrifice/ritual | Encyclopedia Britannica |
| zaotar | hótar- | (chief) priest | Encyclopedia Britannica |
| aṧa | ṛtá | truth/cosmic order | Encyclopaedia Iranica |
| dugədar- | duhitár- | daughter | Encyclopaedia Iranica |
| vərəṇtē | vṛṇīte | “he chooses / wishes” | Encyclopaedia Iranica |
The big sound correspondences (the stuff you notice fast)
1) Proto-Indo-Iranian s → Avestan h (very often)
This is the headline Iranian shift. Iranica describes it as follows: outside a few protected environments, Proto-Indo-Iranian s becomes h in Iranian, including Avestan; the classic example is Avestan hapta vs Vedic saptá. (Encyclopaedia Iranica) The same pattern underlies haoma vs soma, which Iranica discusses under a reconstructed sauma-. (Encyclopaedia Iranica)
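To see how a regular sound law behaves, it helps to run one mechanically. The sketch below starts from simplified, accent-free stand-ins for reconstructed forms (`sapta`, `sauma`, `asura`, `asti`) and derives an Avestan-style and a Vedic-style outcome. The rule lists are toy assumptions: "s becomes h before a vowel" is a rough approximation of the real environments, and the `au` rules (Avestan `ao`, Vedic `o`) are added here only so that the haoma/soma pair comes out right.

```python
import re

# Ordered rewrite rules as (regex, replacement) pairs.
# These are deliberately tiny approximations, not a full historical phonology.
AVESTAN_RULES = [
    (r"s(?=[aiu])", "h"),  # s weakens to h before a vowel (toy environment)
    (r"au", "ao"),         # assumed diphthong outcome on the Iranian side
]
VEDIC_RULES = [
    (r"au", "o"),          # assumed diphthong outcome on the Indo-Aryan side
]


def derive(form, rules):
    """Apply each rule in order to the whole word and return the result."""
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


if __name__ == "__main__":
    for start in ["sapta", "sauma", "asura", "asti"]:
        print(start, derive(start, AVESTAN_RULES), derive(start, VEDIC_RULES))
```

The `asti` case is the interesting one: the s sits before a consonant, so the rule does not fire, which is a small-scale version of the "protected environments" mentioned above.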
2) Aspirated stops vs fricatives (Sanskrit keeps aspiration; Iranian lenites)
A simplified version: Sanskrit is happy with bh dh gh / ph th kh; Iranian tends to turn related material into fricatives like f θ x in certain contexts. Iranica summarizes key rules: Proto-Indo-Iranian p, t, k → f, θ, x before consonants; and even pʰ, tʰ, kʰ → f, θ, x before vowels (with wrinkles/exceptions). (Encyclopaedia Iranica) So Avestan often looks “softer” where Vedic looks “stop-heavy / aspirated.”
3) The “palatal” series lines up differently
Proto-Indo-Iranian palatal affricates end up as Vedic ś/j/h vs Avestan s/z/z in many common words. Iranica gives: Avestan satəm vs Vedic śatám (“hundred”), and zaotar vs hótar- (“priest”). (Encyclopaedia Iranica)
4) Syllabic ṛ behavior diverges
Avestan typically writes ərə (or ər) where Vedic keeps syllabic ṛ, as in Avestan vərəṇtē vs Vedic vṛṇīte (“chooses”). A related habit is the extra ə that Avestan inserts after r before a consonant, visible in Avestan arəθa- vs Vedic ártha- (“meaning/aim/thing”); Iranica’s vowel/sonant discussion gives further examples. (Encyclopaedia Iranica)
Grammar: very similar type, with predictable differences
Nouns: same old Indo-European machinery
Both languages (at their oldest stages) have:
- 3 genders (m/f/n),
- 3 numbers (sg/du/pl),
- a rich case system (the classic IE set: nom/acc/gen/dat/abl/inst/loc/voc—details vary by stem class and tradition).
Iranica’s grammar section explicitly says Avestan noun/adjective/pronoun/verb morphology is inherited via Proto-Indo-Iranian and “agrees largely with Vedic,” which is why comparing to Vedic is so productive. (Encyclopaedia Iranica) You can see the same declensional logic across stem types in the Avestan paradigms Iranica lays out (a-stems, ā/ī-stems, i/u-stems, etc.). (Encyclopaedia Iranica)
Pronouns: very familiar shapes
Example from Avestan: second-person pronoun has tū / tūm among its forms. (Encyclopaedia Iranica) That should “feel right” if you know Vedic/Sanskrit tvám etc.—same inherited pronominal system, slightly different sound history.
Verbs: active vs middle; subjunctive/optative; old endings
Old Avestan preserves an old Indo-Iranian verb system with:
- active vs middle (like Vedic),
- moods like subjunctive and optative,
- lots of inherited endings.
Iranica gives very explicit side-by-side anchors, e.g. Avestan present middle dastē corresponding to Vedic datté, and an imperative dasuuā corresponding to Vedic d(h)atsva. (Encyclopaedia Iranica)
So how do they relate, in one sentence?
Old Avestan and Vedic Sanskrit are extremely close sister languages—two early daughters of Proto-Indo-Iranian—so close that much of the work is “apply the known sound laws, then match the shared Indo-European morphology.” (Encyclopaedia Iranica)
Old Avestan → Vedic Sanskrit conversion cheat sheet
| # | If you see in Old Avestan… | Often corresponds to in Vedic Sanskrit… | Example pair (Avestan → Vedic) | What to do mentally |
|---|---|---|---|---|
| 1 | h (from earlier s) | s | hapta → sapta (“7”) | Try swapping h → s first |
| 2 | s (where Vedic has palatal) | ś | satəm → śatám (“100”) | Map s → ś in “satem-ish” words |
| 3 | z (often from earlier palatal affricates) | j / h (varies by context) | zaotar → hótar- (priest) | Treat z as “palatal-origin” and test j/h outcomes |
| 4 | x (velar fricative) | kh / k (often from aspirated/cluster contexts) | (pattern-based) | When you see x, suspect a “stronger” k/kh on the Vedic side |
| 5 | θ (dental fricative) | th / t | (pattern-based) | θ often matches Sanskrit th or sometimes t |
| 6 | f (labial fricative) | p / ph | (pattern-based) | f frequently lines up with p/ph |
| 7 | -ā / -ō vowel endings (stem-class dependent) | -ā / -aḥ / -am etc. | (paradigm dependent) | Don’t panic: match by case/number, not just the vowel |
| 8 | -m (common in the accusative singular, but also found in nominative pronouns) | -m | tūm ↔ tvám (2sg “you”, nominative) | Match the final -m, then check the paradigm before assuming an object form |
| 9 | -tē (middle endings show up a lot) | -te | vərəṇtē → vṛṇīte (“chooses”, middle) | Spot the shared middle marker -te |
| 10 | ər / ar sequences | ṛ / ra | (pattern-based) | Try collapsing ər → ṛ when the root matches |
Blunt Caveat: a few rows are “pattern-based” because the exact outcome depends on environment (neighboring sounds, stress, morphology). But this is still enough to get surprising mileage when you’re eyeballing cognates.
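Because several rows offer more than one outcome, the cheat sheet works better as a candidate generator than as a one-way converter. Here is a minimal sketch of that idea, covering only the consonant rows 1 to 6. The table name `CHEAT_SHEET` and the function `vedic_candidates` are hypothetical, keeping plain s as a fallback in row 2 is an assumption, and vowels are deliberately left untouched.

```python
from itertools import product

# Avestan segment -> possible Vedic counterparts (rows 1-6 of the cheat sheet).
# Anything not listed passes through unchanged.
CHEAT_SHEET = {
    "h": ["s"],        # row 1
    "s": ["ś", "s"],   # row 2; the plain-s fallback is an assumption
    "z": ["j", "h"],   # row 3
    "x": ["kh", "k"],  # row 4
    "θ": ["th", "t"],  # row 5
    "f": ["p", "ph"],  # row 6
}


def vedic_candidates(avestan):
    """Return every combination of row-by-row substitutions for the word."""
    options = [CHEAT_SHEET.get(ch, [ch]) for ch in avestan]
    return ["".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    for word in ["hapta", "zaotar", "satəm"]:
        print(word, vedic_candidates(word))
```

For hapta there is exactly one candidate, sapta. For zaotar the sketch offers jaotar and haotar, and a human still has to pick the h option and line up the vowels (ao vs o, as in haoma vs soma) to reach the Vedic form.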
Old Avestan vs Vedic walkthroughs (noun + verb)
| Mini-example | Old Avestan (pieces) | Apply the cheat sheet | Vedic-style result | What happened |
|---|---|---|---|---|
| 1) “Seven” | hapta | #1: h → s; the pt cluster carries over unchanged | sapta | Classic Iranian h where Vedic has s; the rest of the word matches as-is |
| 2) “Hundred” | satəm | #2: s → ś | śatám | Avestan plain s often matches Vedic ś in this set |
| 3) “He chooses” (verb) | vərəṇtē | #9: -tē ↔ -te; #10: ər → ṛ; then match the root | vṛṇīte | Same verb/root, both using middle morphology, with predictable vowel/sonant differences |
| 4) “Priest” (agent noun) | zaotar | #3: z → h/j (here h); then align the agent ending | hótar- | Same inherited title, different reflex of the palatal-origin consonant |
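The walkthroughs can also be replayed step by step, which shows how far the mechanical part gets and what is left for "match the root." In this rough sketch the regex versions of rows 1, 2, 3, 9, and 10 are assumptions made for illustration: the word-edge anchors are mine, and treating Avestan ərə as a spelling of syllabic ṛ is a simplification. The `trace` helper is likewise hypothetical.

```python
import re
import unicodedata

# Hypothetical regex renderings of a few cheat-sheet rows.
RULES = {
    1: (r"^h", "s"),        # initial h -> s
    2: (r"^s", "ś"),        # initial s -> ś (only valid for the palatal set)
    3: (r"^z", "h"),        # initial z -> h (the j outcome is ignored here)
    9: (r"tē$", "te"),      # middle ending -tē -> -te
    10: (r"ərə|ər", "ṛ"),   # collapse ərə / ər into syllabic ṛ
}


def nfc(text):
    """Normalize to NFC so composed and decomposed letters compare equal."""
    return unicodedata.normalize("NFC", text)


def trace(avestan, rule_ids):
    """Apply the chosen rows in order and return every intermediate form."""
    form = nfc(avestan)
    steps = [form]
    for rule_id in rule_ids:
        pattern, replacement = RULES[rule_id]
        form = re.sub(nfc(pattern), nfc(replacement), form)
        steps.append(form)
    return steps


if __name__ == "__main__":
    examples = [("hapta", [1]), ("satəm", [2]),
                ("vərəṇtē", [10, 9]), ("zaotar", [3])]
    for word, ids in examples:
        print(" -> ".join(trace(word, ids)))
```

Only the first example lands exactly on the Vedic form. The others stop at śatəm, vṛṇte, and haotar, so the remaining gap (vowel quality, the -nī- of the Vedic present stem, accent) is the part that still calls for knowing the morphology.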
Conclusion
Old Avestan and Vedic Sanskrit are close enough that, with a handful of recurring correspondences, you can often jump from one to the other without “learning” the whole language—because you’re really exploiting the shared Proto-Indo-Iranian skeleton underneath. The trick is to treat the consonant swaps (especially Avestan h vs Vedic s, and Avestan s/z vs Vedic ś/j/h) as systematic, then let morphology do the rest: once you recognize common endings (like middle -te) and stem behavior, the remaining differences are usually just predictable sound history rather than random drift.