The Turing test: Can a computer pass for a human? - Alex Gendler

What is consciousness?

Can an artificial machine really think?

Does the mind just consist of neurons
in the brain,

or is there some intangible spark
at its core?

For many, these have been
vital considerations

for the future of artificial intelligence.

But British computer scientist Alan Turing
decided to disregard all these questions

in favor of a much simpler one:

can a computer talk like a human?

This question led to an idea for measuring
artificial intelligence

that would famously come to be known
as the Turing test.

In his 1950 paper, “Computing Machinery
and Intelligence,”

Turing proposed the following game.

A human judge has a text conversation
with unseen players

and evaluates their responses.

To pass the test, a computer must
be able to replace one of the players

without substantially
changing the results.

In other words, a computer would be
considered intelligent

if its conversation couldn’t be easily
distinguished from a human’s.
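
The game is easy to picture as a small harness. Here is a minimal Python sketch, assuming a single judge, two hidden players labeled A and B, and a canned placeholder reply for the machine (none of these details come from Turing’s paper):

```python
import random

class Player:
    """One unseen conversation partner: a human or a machine."""
    def respond(self, message: str) -> str:
        raise NotImplementedError

class Human(Player):
    def respond(self, message: str) -> str:
        return input(f"[human answers] {message}\n> ")

class Machine(Player):
    def respond(self, message: str) -> str:
        # Placeholder chatbot; a real contestant would go here.
        return "That's an interesting question. What do you think?"

def turing_test(questions: list[str]) -> str:
    # Shuffle the players so the judge can't tell by position
    # which label hides the machine.
    contestants = [Human(), Machine()]
    random.shuffle(contestants)
    players = dict(zip("AB", contestants))

    # The judge converses with both unseen players by text alone.
    for question in questions:
        for label, player in players.items():
            print(f"{label}: {player.respond(question)}")

    # The machine passes if the judge cannot pick it out.
    guess = input("Which player is the machine, A or B? > ").strip().upper()
    return "detected" if isinstance(players[guess], Machine) else "passed"

if __name__ == "__main__":
    print(turing_test(["What's your favorite memory?",
                       "Why is a raven like a writing desk?"]))
```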

Turing predicted that by the year 2000,

machines with 100 megabytes of memory
would be able to easily pass his test.

But he may have jumped the gun.

Even though today’s computers
have far more memory than that,

few have succeeded,

and those that have done well

focused more on finding clever ways
to fool judges

than on using overwhelming computing power.

Though it was never subjected
to a real test,

the first program with
some claim to success was called ELIZA.

With only a fairly short
and simple script,

it managed to mislead many people
by mimicking a psychologist,

encouraging them to talk more

and reflecting their own questions
back at them.

Another early script, PARRY,
took the opposite approach

by imitating a paranoid schizophrenic

who kept steering the conversation
back to his own preprogrammed obsessions.

Their success in fooling people
highlighted one weakness of the test.

Humans regularly attribute intelligence
to a whole range of things

that are not actually intelligent.

Nonetheless, annual competitions
like the Loebner Prize

have made the test more formal

with judges knowing ahead of time

that some of their conversation partners
are machines.

But while the quality has improved,

many chatbot programmers have used
similar strategies to ELIZA and PARRY.

1997’s winner, Catherine,

could carry on amazingly focused
and intelligent conversation,

but mostly if the judge wanted
to talk about Bill Clinton.

And the more recent winner,
Eugene Goostman,

was given the persona of a
13-year-old Ukrainian boy,

so judges interpreted its non sequiturs
and awkward grammar

as language and culture barriers.

Meanwhile, other programs like Cleverbot
have taken a different approach

by statistically analyzing huge databases
of real conversations

to determine the best responses.

Some also store memories
of previous conversations

in order to improve over time.
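
A toy version of that retrieval approach: given a corpus of logged (prompt, reply) pairs, answer with the reply whose prompt best matches the incoming message. Word overlap here is a stand-in for whatever scoring Cleverbot actually uses (its internals are unpublished), and the corpus is made up:

```python
# Tiny made-up corpus; real systems score millions of logged exchanges.
CORPUS = [
    ("hello how are you", "I'm fine, thanks. How are you?"),
    ("what is your favorite movie", "I really liked Blade Runner."),
    ("do you like music", "Yes, mostly jazz. What about you?"),
]

def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the two messages' word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def best_reply(message: str, corpus=CORPUS) -> str:
    prompt, reply = max(corpus, key=lambda pair: similarity(message, pair[0]))
    return reply

def chat(message: str, corpus=CORPUS) -> str:
    reply = best_reply(message, corpus)
    # Storing each new exchange is a crude model of how such bots
    # "remember" conversations and improve over time.
    corpus.append((message, reply))
    return reply

print(chat("how are you doing"))  # -> "I'm fine, thanks. How are you?"
```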

But while Cleverbot’s individual responses
can sound incredibly human,

its lack of a consistent personality

and inability to deal
with brand new topics

are a dead giveaway.

Who in Turing’s day could have predicted
that today’s computers

would be able to pilot spacecraft,

perform delicate surgeries,

and solve massive equations,

but still struggle with
the most basic small talk?

Human language turns out to be
an amazingly complex phenomenon

that can’t be captured by even
the largest dictionary.

Chatbots can be baffled by simple pauses,
like “umm…”

or questions with no correct answer.

And a simple conversational sentence,

like, “I took the juice out of the fridge
and gave it to him,

but forgot to check the date,”

requires a wealth of underlying knowledge
and intuition to parse.

It turns out that simulating
a human conversation

takes more than just increasing
memory and processing power,

and as we get closer to Turing’s goal,

we may have to deal with all those big
questions about consciousness after all.
