07 August 2015

Sanskrit, Dravidian, and Munda

Modern distribution of Indian languages
In this essay, I will reiterate some important points made by Michael Witzel about the linguistic history of India. When the first anatomically modern humans reached India ca. 70,000 years ago, they almost certainly used language. But all the direct evidence for language is much more recent, the oldest being written forms of language. Comparative linguistics allows us to infer a great deal more about the history of language, so that we can get a picture of how people spoke long before writing was invented.

Like many historians, I use the term India or, sometimes, Greater India, to mean the whole of the sub-continent, taking in the political territories of modern-day Pakistan, India, Nepal, Sri Lanka, and Bangladesh. The main languages of North India and Sri Lanka (Urdu, Panjabi, Hindi, Bihari, Bengali, Nepali, and Sinhala) are all modern Indic languages, so the modern political divisions belie the common linguistic history they share. However, we must be a little cautious. Language, ethnicity, and geography can be independent variables when discussing culture. This essay mainly concerns languages and the speakers of languages. We cannot be sure of the ethnicity of these people.

We know with some certainty that the speakers of Old Indic languages (now represented only by Vedic) came from outside India. This is an unpopular thesis amongst Indian nationalists, who try to make a case for Sanskrit arising in India and spreading out. Some would have us believe it is the original language (cf. Eco 1997). However, the relationship of Old Indic with Old Iranian, and a variety of other internal evidence, show that Indo-Iranian was spoken by nomadic peoples of southern Central Asia. Indo-Iranian was an early offshoot of Proto-Indo-European that later split into two sub-families, Iranian and Indic. Old Indic is mostly distinguished from Old Iranian by a few sound changes. Later, grammatical forms drifted apart as well, though the earliest attested languages, Vedic and Avestan, remained closely related.

Comparative linguists showed in the late 18th century that Greek, Latin, and Sanskrit are so similar that they must have derived from a common ancestor. That hypothetical language is nowadays called Proto-Indo-European (PIE), and the language family that it spawned is called Indo-European (IE). The IE family also has a Germanic branch giving rise to all the Germanic languages (including English) and a Slavonic branch incorporating all the Slavic languages, and it takes in many of the languages of Iran and Afghanistan, not to mention Armenian. In addition, we have written evidence of a number of now-dead Indo-European languages, such as Tocharian and Khotanese from Central Asia. By comparing the changes in many languages, linguists are able to formulate pragmatic 'rules' that describe how the sounds and forms of words change. This procedure has been very successful in some areas, and PIE is probably the best example. But the Sino-Tibetan language family also gives a clear view of the proto-language that underlies its members.

There have been efforts, of varying success, to cover all the languages of the world in this way. This has naturally led some scholars to propose a further, more ancient layer of relatedness. For example, there is the conjectured Nostratic proto-language (or macro-family) that takes in Afroasiatic (including the Semitic languages), Kartvelian (Caucasian languages and possibly Basque), Indo-European, Uralic (including Finno-Ugric), Dravidian, Altaic (covering Turkic and other Central Asian languages, and probably Korean and Japanese), and Eskimo–Aleut. These macro-families are still controversial, though many of the objections are ideological rather than logical.

A major branch of the IE family is Indo-Iranian, taking in languages that were spoken throughout the combined sphere of influence of Persia and India, including large swathes of Central Asia. In this essay, I will refer to the Indian branch of Indo-Iranian as Indic. It has previously been referred to as Aryan or Indo-Aryan, but these terms have been deprecated because of the racial overtones of the word 'aryan' and the discrediting of old ideas about race. Indic is a strictly linguistic term that gives us no information about ethnicity. We can talk about three phases of Indic: Old Indic, principally attested as Vedic, though other varieties must have existed (before ca. 500 BCE); Middle Indic, attested by Pāḷi, Gāndhārī, and Apabhraṃśa (ca. 500 BCE to 1000 CE); and New or Modern Indic (emerging in the last millennium).

When the speakers of Old Indic crossed the Hindu Kush and entered India, ca. 1700–1500 BCE, they met people who spoke languages with a much longer history in Greater India.

There is a whole family of Dravidian languages, for example, including Tamil, Telugu, Malayalam, and Kannada. Today, speakers of Dravidian languages are a large minority (about 20%). Some linguists (e.g., McAlpin 1974, 1975, 1981) have noted a similarity between Dravidian and the language spoken in ancient Elam, in what is now south-western Iran near the border with Iraq, at the head of the Persian Gulf. Written records of Elamite stretch back to 3000 BCE. McAlpin et al. believe that Dravidian speakers split off from Elamite speakers and entered India very early, perhaps 4000 BCE. Others are more doubtful (Blench 2008), dismissing the evidence as flimsy and pointing out affiliations with other language groups as well.

Less well known is the Austroasiatic family. This family of languages extends from north-east India to Vietnam. One Indian branch of this geographically dispersed family is Munda, with several languages spoken in small pockets of India today, though probably more widespread in the past. In Burma there is a strong overlay of Tibeto-Burman languages that descended from the north, but there are still enclaves of Austroasiatic speakers as well. Genetic studies of Austroasiatic speakers suggest that the Austroasiatic language family may have arisen in India and spread east.

Additionally, there are a number of languages in India that appear to be unrelated to any known language. These language isolates, as they are called, are found among the so-called tribal peoples, who seem never to have been assimilated into the mainstream of Indian culture (in other words, they were never Brahmanised).

Michael Witzel's exploration of the linguistic history of India begins by establishing his parameters; most important for the purposes of this essay are the periods of composition of the Ṛgveda (1999: 3):
  • I. The early Ṛgvedic period: c. 1700–1500 BCE: books (maṇḍala) 4, 5, 6, and maybe book 2, with the early hymns referring to the Yadu-Turvaśa, Anu-Druhyu tribes;
  • II. The middle (main) Ṛgvedic period, c. 1500–1350 BCE: books 3, 7, 8.1–66 and 1.51–191; with a focus on the Bharata chieftain Sudās and his ancestors, and his rivals, notably Trasadasyu, of the closely related Pūru tribe.
  • III. The late Ṛgvedic period, c. 1350–1200 BCE: books 1.1–50, 8.67–103, 10.1–84; 10.85–191: with the descendant of the Pūru chieftain Trasadasyu, Kuruśravaṇa, and the emergence of the super-tribe of the Kuru (under the post-RV Parikṣit).
These layers of composition have been established on the basis of "internal criteria of textual arrangement, of the ‘royal’ lineages, and independently from these, those of the poets (ṛṣis) who composed the hymns. About both groups of persons we know enough to be able to establish pedigrees which sustain each other." (1999: 3).

The Dutch Indologist F. B. J. Kuiper had already identified some 383 words in the Ṛgveda that are not Indic and must be loan words from other language families. We know this because they break the phonetic rules of Indic languages. An example from English demonstrates the principle. We have the word ptolemaic, which comes from the name of Ptolemy, the Greek astronomer of Roman Egypt. It refers to a particular view of the world as earth-centred. We know that ptolemaic cannot be a native English word, because English words cannot start with /pt/; indeed, native English speakers cannot easily pronounce this sound combination and tend just to say /t/. It is clues like this that linguists use to identify loan words. But we also have to take into account that loan words are often naturalised. Many loan words in English are Anglicised. So another loan word, chocolate, has been altered to fit English spelling patterns from an original more like xocolātl, which clearly breaks English phonetic rules. We also have a number of Yiddish loan words, like shlemiel, shlep, shlock, shmaltz, shmuck, and shnoozle, that defy, but also to some extent redefine, English spelling. Similarly, retroflex consonants (ṭ, ṭh, ḍ, ḍh, ṇ, ṣ) are not found in other early Indo-European languages, but Old Indic absorbed them from languages it met in India, and they had become a naturalised part of Indic phonology by the time the Ṛgveda was composed.

It's not always possible to identify where a loan word has come from. But Kuiper and Witzel manage to identify most of these words as belonging to Proto-Dravidian or Proto-Munda, with a few from other language families, such as Tibeto-Burman.

Perhaps the most striking finding, which Witzel repeats several times in his essay, is that in the early Ṛgvedic period there are no loan words from Dravidian, e.g.:
"It is important to note that RV level I has no Dravidian loan words at all (details, below § 1.6); they begin to appear only in RV level II and III." (Witzel 1999: 6)
"Ṛgvedic loans from Drav[idian] are visible, but they also are now datable only to middle and late Ṛgvedic (in the Greater Panjab), and they can both be localized and dated for the Post-Ṛgvedic texts." (Witzel 1999: 19)
This is an important finding. The landscape of the Ṛgveda is that of the modern-day Panjab, as is clear, for example, from the names of the rivers it mentions, such as the Kabul, Indus, Sarasvati (now dried up), and Yamuna. So the earliest hymns, which contain no Dravidian loan words, were composed in a region where, apparently, no Dravidian language was then spoken.

Loan words from the earliest period are from the Austroasiatic language family, meaning that the people living in this area when the Vedic speakers arrived spoke a variety of proto-Munda. This is important because the people living in this area are believed to have been the descendants of the collapsed Indus Valley Civilisation (IVC). They had scattered as the climate became much drier, making their large-scale cities unliveable. The IVC had disappeared by 1700 BCE. If the people of the Panjab ca. 1500 BCE spoke a variety of proto-Munda, this strongly suggests that the people of the IVC also spoke an Austroasiatic language, rather than, as is usually supposed, a Dravidian or even an Indic language. Indian nationalists often assume that the IVC spoke Sanskrit, but this was never plausible. Interestingly, the very name we have for the north of this region, Gandhāra, is itself an Austroasiatic loan word.

It's often suggested that, because there are northern pockets of Dravidian speakers with whom the Vedic speakers presumably interacted, Dravidian was once considerably more widespread, and perhaps that the language of the IVC was Dravidian. The loan words in the Ṛgveda argue against this view. The north-western pockets of Dravidian could be isolated populations left behind by the migration of Dravidian speakers into southern India from Mesopotamia. Those in the north-east are more consistent with a previously larger territory, but if they were ever on the Ganges Plain, they were forced out of it completely, leaving remnant populations only as far north as the mountain ranges on the southern edge of the Ganges Valley.


The picture that emerges is that Old Indic-speaking people crossed the Hindu Kush in small numbers and met people who spoke a form of proto-Austroasiatic; and then later, perhaps as they penetrated further into the sub-continent, people who spoke proto-Dravidian languages. The Dravidian speakers, themselves probably immigrants, had lived in India for some thousands of years already, displacing and assimilating even earlier waves of human migrants. The pockets of people who speak language isolates, not related to any known language, have presumably lived in India for a very long time. Indeed, they often pursue a hunter-gatherer lifestyle that reinforces this impression.

Other authors have suggested that the Old Indic speakers had the advantage of superior technology and that this led them to dominate the original inhabitants. We can't really know how it happened at this distant time but, in any case, Indic languages came to dominate the north of India, from Afghanistan to the Ganges Delta. Again, it is worth repeating that language, culture, and location may not be correlated. To the extent that we can make comparisons, there were a few surviving similarities between the people who composed the Ṛgveda and those who composed the Avesta. But, in many respects, their cultures had diverged along with their languages. Zoroastrianism was the major innovation in Iran, although the dates of its founder, Zarathustra, are difficult to pin down; the most likely scenario places him a little after the Ṛgveda. Based on informal comments by Michael Witzel, I have argued for a trickle of Iranian tribes entering India ca. 1000–800 BCE, who ended up settling on the margins of the Central Ganges city states of the second urbanisation, especially Kosala and Magadha (Attwood 2012). Genetic studies suggest that, though their language came to be spoken throughout the Panjab and down into the Ganges Valley, the Vedic speakers contributed little to the gene pool, which is remarkably homogeneous in India. The genetic contribution is far less striking than patterns of culture or language family might lead us to imagine (Attwood 2012).

This poses a difficulty for Indian nationalists who want Sanskrit to be the mother tongue of India (I'm not sure how they fit Dravidian into the picture) and to have originated within the subcontinent. People with this view often express their hatred of Michael Witzel, referring to him in extremely uncomplimentary terms. But, as rational people, we have to follow the evidence and allow it to guide us to conclusions, even when these are uncomfortable for us. And the evidence is abundantly clear in this case. If any language is the mother tongue, then it is probably Proto-Austroasiatic, the ancestor of the modern Munda and other Austroasiatic languages. Sanskrit developed from Indo-Iranian, initially somewhere in Greater Iran, and was then carried into India by Vedic-speaking migrants. Since we know they were nomadic cattle herders (unlike, say, the Śākyas, who were settled agriculturalists), they may have made the journey up the Khyber Pass seeking greener pastures.

In Attwood (2012) I tried to show that certain important features of early Indian Buddhist culture could be tied to Zoroastrianism and/or Iran. Unfortunately, academics all too often divide the history of the region into Indian and Iranian, and thus I fear that many connections between the two regions have been overlooked. The connections that are evident seem to demand more attention from suitably qualified scholars. We know a great deal about the interactions of Greece and Persia, but far too little about relations between Persia and India.



Attwood, Jayarava. (2012) Possible Iranian Origins for Sākyas and Aspects of Buddhism. Journal of the Oxford Centre for Buddhist Studies. 3.

Blench, Roger. (2008) Re-evaluating the linguistic prehistory of South Asia. In Toshiki Osada and Akinori Uesugi (eds.), Occasional Paper 3: Linguistics, Archaeology and the Human Past, pp. 159–178. Kyoto: Indus Project, Research Institute for Humanity and Nature.

Eco, Umberto. (1997) The Search for the Perfect Language. London: Fontana Press.

McAlpin, David W. (1974) Toward Proto-Elamo-Dravidian. Language 50: 89-101.

McAlpin, David W. (1975) Elamite and Dravidian: Further Evidence of Relationship. Current Anthropology 16: 105-115.

McAlpin, David W. (1981) Proto-Elamo-Dravidian: The Evidence and Its Implications. American Philosophical Society.

Witzel, Michael. (1999) Substrate Languages in Old Indo-Aryan: Ṛgvedic, Middle and Late Vedic. Electronic Journal of Vedic Studies. 5(1): 1–67.