A Rosetta Stone for the Indus script Rajesh Rao

[Music]

[Applause]

I’d like to begin with a thought

experiment imagine that it’s 4,000 years

into the future civilization as we know

it has ceased to exist no books no

electronic devices no Facebook or

Twitter all knowledge of the English

language and the English alphabet has

been lost

now imagine archaeologists digging

through the rubble of one of our cities

what might they find well perhaps some

rectangular pieces of plastic with

strange symbols on them perhaps some

circular pieces of metal maybe some

cylindrical containers with some symbols

on them and perhaps one archaeologist

becomes an instant celebrity when she

discovers buried in the hills somewhere

in North America massive versions of these

same symbols now let’s ask ourselves

what could such artifacts say about us

to people 4,000 years into the future

now this is no hypothetical question in

fact this is exactly the kind of

question we’re faced with when we try to

understand the Indus Valley Civilisation

which existed 4,000 years ago now the

Indus civilization was roughly

contemporaneous with the much better

known Egyptian and the Mesopotamian

civilizations but it was actually much larger

than either of these two civilizations

it occupied an area of approximately

1 million square kilometers covering

what is now Pakistan northwestern India

and parts of Afghanistan and Iran now

given that it was such a vast

civilization you might expect to find

really powerful rulers kings and huge

monuments glorifying these powerful

kings in fact what archaeologists have

found is none of that they found small

objects such as these so here’s an

example of one of these objects well

obviously this is a replica but who is

this person a king a god a priest or

perhaps an ordinary person like you or

me we don’t know

but the Indus people also left behind

artifacts with writing on them well no

not pieces of plastic but stone seals

copper tablets pottery and surprisingly

one large signboard which was found

buried near the gate of a city now we

don’t know if it says Hollywood or even

Bollywood for that matter in fact we

don’t even know what any of these

objects say and that’s because the Indus

script is undeciphered we don’t know

what any of these symbols mean

now the symbols are most commonly found

on seals so you see up there one such

object it’s the square object with the

unicorn-like animal on it now that’s a

magnificent piece of art so how big do

you think that is perhaps that big or

maybe that big well let me show you

so here’s a replica of one such seal

it’s only about one inch by one inch in

size pretty tiny so what were these used

for we know that these were used for

stamping clay tags that were attached to

bundles of goods that were sent from one

place to the other so you know those

packing slips you get on your FedEx

boxes so these were used to make those

kinds of packing slips now you might

wonder what these objects contain in

terms of their text so perhaps they’re

the name of the sender or some

information about the goods that are

being sent from one place to the other

we don’t know we need to decipher the

script to answer that question now

deciphering the script is not just an

intellectual puzzle it’s actually become

a question that’s become deeply

intertwined with the politics and the

cultural history of South Asia in fact

the script has become a battleground of

sorts between three different groups of

people so first there’s a group of

people who are very passionate in their

belief that the Indus script does not

represent a language at all these people

believe that these symbols are very

similar to the kind of symbols you find

on traffic signs or the emblems you find

on shields now there’s a second group of

people who believe that the Indus script

represents an indo-european language so

if you look at a map of India today

you’ll see that most of the languages

spoken in North India belong to the

indo-european language family so some

people believe that the Indus script

represents an ancient indo-european

language such as Sanskrit now there’s a

third group of people who believe that the

Indus people were the ancestors of people

living in South India today now these

people believe that the Indus script

represents an ancient form of the

Dravidian language family which is the

language family spoken in much of South

India today and the proponents of this

theory point to that small pocket of

Dravidian speaking people in the North

actually near Afghanistan and they say

that perhaps sometime in the past the

Dravidian languages were spoken all over

India and that this suggests that the

Indus civilization is perhaps also

Dravidian now which of these hypotheses

can be true we don’t know but perhaps if

you decipher the script you would be

able to answer this question but

deciphering the script is a very

challenging task first there’s no

rosetta stone I don’t mean the software

I mean an ancient artifact that contains

the same text both in a known text and

in an unknown text so we don’t have such

an artifact for the Indus script and

furthermore we don’t even know what

language they spoke and to make matters

even worse most of the texts that we

have are extremely short so as I showed

you they’re usually found on these seals

that are very very tiny and so given

these formidable obstacles one might

wonder and worry whether one will ever

be able to decipher the Indus script so

in the rest of my talk I’d like to tell

you about how I learned to stop worrying

and love the challenge posed by the

Indus script now I’ve always been

fascinated by the Indus script ever

since I read about it in a middle school

textbook and why was I fascinated

well it’s the last major undeciphered

script in the ancient world now my

career path led me to become a

computational neuroscientist so in my

day job I create computer models of the

brain to try to understand how the brain

makes predictions how the brain makes

decisions how the brain learns and so on

but in 2007 my paths crossed again with

the Indus script that’s when I was in

India and I had the wonderful

opportunity to meet with some Indian

scientists who were using computer

models to try to analyze the script and

so it was then that I realized it was an

opportunity for me to collaborate with

these scientists and so I jumped at the

opportunity and I’d like to describe

some of the results that we’ve found or

better yet let’s all collectively

decipher are you ready

the first thing that you need to do when

you have an undeciphered

script is try to figure out the direction

of writing so here are two texts that

contain some symbols on them so can you

tell me if the direction of writing is

right to left or left to right I’ll give

you a couple of seconds okay right to

left how many okay

left to right oh it’s almost 50/50 okay

so the answer is if you look at the

left-hand side of the two texts you’ll

notice that there’s a cramping of signs

and it seems like four thousand years

ago when the scribe was writing from

right to left

they ran out of space and so they had to

cram the signs and one of the signs is

also below the text on the top and

so this suggests the direction of

writing is probably from right to left

and so that’s one of the first things we

know that directionality is a very key

aspect of linguistic scripts and the

Indus script now has this particular

property what other properties of

language does it show so languages

contain patterns right so if I give you

the letter Q and I ask you to predict

the next letter what do you think that

would be most of you said U which is

right now if I ask you to predict one

more letter what do you think that would

be now there are several possibilities

it could be E it could be I it could be A but

certainly not B C or D right now the

Indus script also exhibits similar kinds

of patterns there are a lot of texts that

start with this diamond-shaped symbol

and this in turn tends to be followed by

this quotation-marks-like symbol and

this is very similar to a Q and U

example this symbol can in turn be

followed by these fish-like symbols and

some other signs but never by these

other signs at the bottom and

furthermore there’s some signs that

really prefer the end of texts such as

this jar-shaped sign and this sign in

fact happens to be also the most

frequently occurring sign in this script

now given such patterns here was our

idea right so the idea was to use a

computer to learn these patterns and so

we give the computer the existing

texts and the computer learns a

statistical model of which symbols tend

to occur together and which symbols tend

to follow each other

now given the computer model we can test

the model by essentially quizzing it so

we could deliberately erase some symbols

and we can ask it to predict the missing

symbols so here are some examples

so you may regard this as perhaps the

most ancient game of the Wheel of

Fortune so what we found was that the

computer was successful in 75% of the

cases in predicting the correct symbol

now in the rest of the cases typically

the second best guess or the third

best guess was the right answer
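
The quiz described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the talk does not say which statistical model was used, so a simple first-order (bigram) model stands in for it, and the sign names and toy texts below are hypothetical, not real Indus inscriptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: made-up sign names in reading order.
# These are NOT real Indus texts.
TEXTS = [
    ["diamond", "quote", "fish", "jar"],
    ["diamond", "quote", "fish", "fish", "jar"],
    ["diamond", "quote", "strokes", "fish", "jar"],
    ["strokes", "fish", "jar"],
]
START, END = "<s>", "</s>"


def train_bigrams(texts):
    """Count how often each sign is followed by each other sign."""
    follows = defaultdict(Counter)
    for text in texts:
        padded = [START] + list(text) + [END]
        for prev, nxt in zip(padded, padded[1:]):
            follows[prev][nxt] += 1
    return follows


def rank_missing(follows, prev, nxt):
    """Rank candidates for one erased sign between `prev` and `nxt`.

    Score = P(cand | prev) * P(nxt | cand), using unsmoothed counts.
    """
    after_prev = follows.get(prev, Counter())
    total_prev = sum(after_prev.values())
    scores = {}
    for cand, n_prev_cand in after_prev.items():
        after_cand = follows.get(cand, Counter())
        n_cand_nxt = after_cand[nxt]
        if n_cand_nxt == 0:
            continue  # cand was never seen followed by nxt
        p_cand = n_prev_cand / total_prev
        p_nxt = n_cand_nxt / sum(after_cand.values())
        scores[cand] = p_cand * p_nxt
    # Best guess first; ties broken alphabetically.
    return sorted(scores, key=lambda s: (-scores[s], s))


model = train_bigrams(TEXTS)
# Erase the sign between "quote" and "fish" and ask for ranked guesses.
print(rank_missing(model, "quote", "fish"))
```

Returning a ranked list rather than a single answer is a deliberate choice here: it lets you check whether the right sign shows up as the second or third guess when the first guess is wrong.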

now there’s also a practical use for this

particular procedure there’s a lot of

these texts that are damaged so here’s

an example of one such text and we can

use the computer model now to try to

complete this text and make a best guess

prediction so here’s an example of a

symbol that was predicted and this could

be really useful as we try to decipher

the script by generating more data that

we can analyze now here’s one other

thing you could do with the computer

model

so imagine a monkey sitting at a

keyboard I think you might get a random

jumble of letters that looks like this

now such a random jumble of letters is

said to have a very high entropy this

is a physics and information theory term

but just imagine it’s a very random

jumble of letters now how many of you have

ever spilled coffee on a keyboard you

might have encountered the stuck key

problem so basically the same symbol

being repeated over and over again now

this kind of a sequence is said to have

a very low entropy because there’s no

variation at all

now language on the other hand has an

intermediate level of entropy it’s

neither too rigid nor is it too random
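
To make the three cases concrete, here is a rough Python sketch that measures them. It rests on assumptions: the talk does not say which entropy measure was plotted, so this uses the conditional entropy of the next symbol given the previous one, and the sample strings (including the short English passage) are made up for illustration.

```python
import math
import random
from collections import Counter


def conditional_entropy(seq):
    """Entropy in bits of the next symbol given the previous symbol."""
    pair_counts = Counter(zip(seq, seq[1:]))
    prev_counts = Counter(seq[:-1])
    total_pairs = sum(pair_counts.values())
    h = 0.0
    for (prev, _nxt), n in pair_counts.items():
        p_pair = n / total_pairs                 # P(prev, next)
        p_next_given_prev = n / prev_counts[prev]  # P(next | prev)
        h -= p_pair * math.log2(p_next_given_prev)
    return h


rng = random.Random(0)  # fixed seed so the run is repeatable
alphabet = "abcdefghijklmnopqrstuvwxyz "

# The monkey at the keyboard: every key equally likely.
random_jumble = "".join(rng.choice(alphabet) for _ in range(5000))
# The stuck key: one symbol over and over.
stuck_key = "a" * 5000
# A small hypothetical sample of ordinary English.
english = ("the quick brown fox jumps over the lazy dog and then "
           "the dog chases the fox around the garden ") * 50

h_random = conditional_entropy(random_jumble)
h_stuck = conditional_entropy(stuck_key)
h_english = conditional_entropy(english)
print(f"random jumble: {h_random:.2f} bits")
print(f"stuck key:     {h_stuck:.2f} bits")
print(f"english text:  {h_english:.2f} bits")
```

With a sample this small the exact numbers mean little; the point is only the ordering, with the stuck key at zero, the random jumble near the top, and the English text in between.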

what about the Indus script so here’s a

graph that plots the entropies of a whole

bunch of sequences at the very top you

find the uniformly random sequence which

is a random jumble of letters and

interestingly we also find a DNA sequence

from the human genome and instrumental

music and both of these are very very

flexible which is why you find

them in the very high range now at the

lower end of the scale you find a rigid

sequence the sequence of all A’s and you

also find a computer program in this

case in the language Fortran which

obeys really strict rules linguistic

scripts occupy the middle range now what

about the Indus script so we found that

the Indus script actually falls within the

range of the linguistic scripts now when

this result was first published it was

highly controversial there were people

who raised a hue and cry and these

people were the ones who believed that

the Indus script does not represent

language

I even started to get some hate mail my

students said that I should really

seriously consider getting some

protection now who would have thought

that deciphering could be a dangerous

profession now what does this result

really show it shows that the Indus

script shares an important property of

language so as the old saying goes if it

looks like a linguistic script and it

acts like a linguistic script then

perhaps we may have a linguistic script

on our hands what other evidence

is there that the script could actually

encode language well linguistic scripts

can actually encode multiple languages

so for example here’s the same sentence

written in English and the same

sentence written in Dutch using

the same letters of the alphabet now if

you don’t know Dutch and you only know

English and I give you some words in

Dutch you’ll tell me that these words

contain some very unusual patterns some

things are not right and you say these

words are probably not English words now

the same thing also happens in the case

of the Indus script so the computer

found several texts two of them are

shown here that have very unusual

patterns so for example the first text

has a doubling of this jar-shaped

sign now this sign is the most

frequently occurring sign in the

Indus script and it’s only in this text that

it occurs as a doubling pair so why is

that the case we went back and looked at

where these particular texts were found

it turns out that they were found very

very far away from the Indus Valley they

were found in present-day Iraq and Iran

and why were they found there so what I

haven’t told you is that the Indus

people were very very enterprising they

used to trade with people pretty far

away from where they lived and so in

this case they were traveling by sea all

the way to Mesopotamia present-day Iraq

and what seems to happen here is that

the Indus traders the merchants were

using the script to write a foreign

language it’s just like our English and

Dutch example and that would explain why

we have these strange patterns that are

very different from the kinds of

patterns you see in the texts that are

found within the Indus Valley so this

suggests the same script the Indus

script could be used to write different

languages the results we have so far

seem to point to the conclusion that the

Indus script probably does represent

language so if it does represent language

then how do we read the symbols that’s

the next big challenge so you’ll notice

that many of the symbols look like

pictures of humans of insects of fishes

of birds so most ancient scripts use the

rebus principle which is using pictures

to represent words so as an example

here’s a word can you write it using

pictures I’ll give you a couple of

seconds got it okay great so here’s my

solution so you could use the picture of

a bee followed by a picture of a leaf and

that’s belief right there could be other

solutions now in the case of the Indus

script the problem is the reverse so

you have to figure out the sounds of

each of these pictures such that the

entire sequence makes sense so this is

just like a crossword puzzle except that

this is the mother of all crossword

puzzles because the stakes are so high

if you solve it now my colleagues

Iravatham Mahadevan and Asko Parpola

have been making some headway on this

particular problem and I’d like to give

you a quick example of Parpola’s work so

here’s a really short text it contains

seven vertical strokes followed by this

fish-like sign and I already mentioned

that these seals were used for stamping

clay tags that were attached to

bundles of goods so it’s quite likely

that these texts at least some of them

contain names of merchants and it turns

out that in India

there’s a long tradition of names being

based on horoscopes and the star

constellations present at the time of

birth in Dravidian languages the word

for fish is meen which happens to sound

just like the word for star and so seven

stars would stand for elu meen which is

the Dravidian word for the Big Dipper

Star constellation now similarly there’s

another sequence of six stars and that

translates to aru meen which is the

old Dravidian name for the Star

constellation Pleiades and finally

there’s other combinations such as this

fish sign with something that looks like

a roof on top of it and that could be

translated into mey meen which is the

old Dravidian name for the planet Saturn

so this is pretty exciting it looks like

we’re getting somewhere but does this

prove that these seals contain

Dravidian names based on planets and

star constellations

well not yet so we have no way

of validating these particular readings

but if more and more of these readings

start making sense and if longer

and longer sequences appear to be

correct then we know that we’re on the

right track so today we can write a word

such as TED in Egyptian hieroglyphics

and in the cuneiform script because both

of these were deciphered in the 19th

century the decipherment of these two

scripts enabled these civilizations to

speak to us again directly now the

Mayans started speaking to us in the

20th century but the Indus civilization

remains silent so why should we care

well the Indus civilization does not

belong to just the South Indians or the

North Indians or the Pakistanis it

belongs to all of us so these are our

ancestors yours and mine they were

silenced by an unfortunate accident of

history now if you decipher the script

you would enable them to speak to us

again so what would they tell us what

would we find out about them about us I

can’t wait to find out thank you

[Applause]