What happens when our computers get smarter than we are?
Nick Bostrom

I work with a bunch of mathematicians,
philosophers and computer scientists,

and we sit around and think about
the future of machine intelligence,

among other things.

Some people think that some of these
things are sort of science fiction-y,

far out there, crazy.

But I like to say,

okay, let’s look at the modern
human condition.

(Laughter)

This is the normal way for things to be.

But if we think about it,

we are actually recently arrived
guests on this planet,

the human species.

Think about it: if Earth
had been created one year ago,

the human species, then,
would be 10 minutes old.

The industrial era started
two seconds ago.

Another way to look at this is to think of
world GDP over the last 10,000 years.

I’ve actually taken the trouble
to plot this for you in a graph.

It looks like this.

(Laughter)

It’s a curious shape
for a normal condition.

I sure wouldn’t want to sit on it.

(Laughter)

Let’s ask ourselves, what is the cause
of this current anomaly?

Some people would say it’s technology.

Now it’s true, technology has accumulated
through human history,

and right now, technology
advances extremely rapidly –

that is the proximate cause,

that’s why we are currently
so very productive.

But I like to think back further
to the ultimate cause.

Look at these two highly
distinguished gentlemen:

We have Kanzi –

he’s mastered 200 lexical
tokens, an incredible feat.

And Ed Witten unleashed the second
superstring revolution.

If we look under the hood,
this is what we find:

basically the same thing.

One is a little larger,

and it may also have a few tricks
in the exact way it's wired.

These invisible differences cannot
be too complicated, however,

because there have only
been 250,000 generations

since our last common ancestor.

We know that complicated mechanisms
take a long time to evolve.

So a bunch of relatively minor changes

take us from Kanzi to Witten,

from broken-off tree branches
to intercontinental ballistic missiles.

So it seems pretty obvious, then,
that everything we've achieved,

and everything we care about,

depends crucially on some relatively minor
changes that made the human mind.

And the corollary, of course,
is that any further changes

that could significantly change
the substrate of thinking

could have potentially
enormous consequences.

Some of my colleagues
think we’re on the verge

of something that could cause
a profound change in that substrate,

and that is machine superintelligence.

Artificial intelligence used to be
about putting commands in a box.

You would have human programmers

who would painstakingly
handcraft knowledge items.

You would build up these expert systems,

and they were kind of useful
for some purposes,

but they were very brittle;
you couldn't scale them.

Basically, you got out only
what you put in.

But since then,

a paradigm shift has taken place
in the field of artificial intelligence.

Today, the action is really
around machine learning.

So rather than handcrafting knowledge
representations and features,

we create algorithms that learn,
often from raw perceptual data.

Basically the same thing
that the human infant does.

The result is A.I. that is not
limited to one domain –

the same system can learn to translate
between any pair of languages,

or learn to play any computer game
on the Atari console.
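
To make the contrast with handcrafted expert systems concrete, here is a minimal sketch in Python of the learn-from-reward pattern behind systems like the Atari player. The tiny ToyGame environment and the tabular Q-learning rule are hypothetical stand-ins for illustration only, not the actual systems mentioned; the point is that a single generic update rule improves from experience rather than from hand-coded knowledge.

```python
import random

# Hypothetical stand-in environment: the "raw perceptual data" is just an integer state.
class ToyGame:
    def __init__(self):
        self.state = 0

    def step(self, action):
        # Reward the agent for matching the state's parity, then advance the state.
        reward = 1.0 if action == self.state % 2 else 0.0
        self.state = (self.state + 1) % 4
        return self.state, reward

# Tabular Q-learning: the same generic rule would work for any such toy game.
q = {}                               # (state, action) -> estimated value
alpha, gamma, eps = 0.1, 0.9, 0.1

env = ToyGame()
state = env.state
for _ in range(5000):
    if random.random() < eps:
        action = random.choice([0, 1])                                   # explore
    else:
        action = max([0, 1], key=lambda a: q.get((state, a), 0.0))       # exploit
    next_state, reward = env.step(action)
    best_next = max(q.get((next_state, a), 0.0) for a in [0, 1])
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    state = next_state

print({k: round(v, 2) for k, v in sorted(q.items())})  # learned values, no handcrafting
```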

Now of course,

A.I. is still nowhere near having
the same powerful, cross-domain

ability to learn and plan
as a human being has.

The cortex still has some
algorithmic tricks

that we don’t yet know
how to match in machines.

So the question is,

how far are we from being able
to match those tricks?

A couple of years ago,

we did a survey of some of the world’s
leading A.I. experts,

to see what they think,
and one of the questions we asked was,

“By which year do you think
there is a 50 percent probability

that we will have achieved
human-level machine intelligence?”

We defined human-level here
as the ability to perform

almost any job at least as well
as an adult human,

so real human-level, not just
within some limited domain.

And the median answer was 2040 or 2050,

depending on precisely which
group of experts we asked.

Now, it could happen much,
much later, or sooner;

the truth is, nobody really knows.

What we do know is that the ultimate
limit to information processing

in a machine substrate lies far outside
the limits in biological tissue.

This comes down to physics.

A biological neuron fires, maybe,
at 200 hertz, 200 times a second.

But even a present-day transistor
operates in the gigahertz range.

Neurons propagate signals slowly along axons,
100 meters per second, tops.

But in computers, signals can travel
at the speed of light.
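
A back-of-the-envelope comparison, taking the figures just quoted at face value (rough orders of magnitude, not a precise model of either substrate):

```python
# Rough ratios using only the numbers mentioned above.
neuron_hz = 200          # biological neuron firing rate, ~200 times per second
transistor_hz = 1e9      # present-day transistor, operating at gigahertz
axon_speed_mps = 100     # signal speed along an axon, ~100 m/s tops
light_speed_mps = 3e8    # signal speed in a computer, up to the speed of light

print(f"switching-speed gap: ~{transistor_hz / neuron_hz:,.0f}x")         # ~5,000,000x
print(f"signal-speed gap:    ~{light_speed_mps / axon_speed_mps:,.0f}x")  # ~3,000,000x
```

Either way you slice it, the gap is roughly a factor of a million, which is what "far outside the limits in biological tissue" amounts to in numbers.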

There are also size limitations,

like a human brain has
to fit inside a cranium,

but a computer can be the size
of a warehouse or larger.

So the potential for superintelligence
lies dormant in matter,

much like the power of the atom
lay dormant throughout human history,

patiently waiting there until 1945.

In this century,

scientists may learn to awaken
the power of artificial intelligence.

And I think we might then see
an intelligence explosion.

Now most people, when they think
about what is smart and what is dumb,

I think have in mind a picture
roughly like this.

So at one end we have the village idiot,

and then far over at the other side

we have Ed Witten, or Albert Einstein,
or whoever your favorite guru is.

But I think that from the point of view
of artificial intelligence,

the true picture is actually
probably more like this:

A.I. starts out at this point here,
at zero intelligence,

and then, after many, many
years of really hard work,

maybe eventually we get to
mouse-level artificial intelligence,

something that can navigate
cluttered environments

as well as a mouse can.

And then, after many, many more years
of really hard work, lots of investment,

maybe eventually we get to
chimpanzee-level artificial intelligence.

And then, after even more years
of really, really hard work,

we get to village idiot
artificial intelligence.

And a few moments later,
we are beyond Ed Witten.

The train doesn’t stop
at Humanville Station.

It’s likely, rather, to swoosh right by.

Now this has profound implications,

particularly when it comes
to questions of power.

For example, chimpanzees are strong –

pound for pound, a chimpanzee is about
twice as strong as a fit human male.

And yet, the fate of Kanzi
and his pals depends a lot more

on what we humans do than on
what the chimpanzees do themselves.

Once there is superintelligence,

the fate of humanity may depend
on what the superintelligence does.

Think about it:

Machine intelligence is the last invention
that humanity will ever need to make.

Machines will then be better
at inventing than we are,

and they’ll be doing so
on digital timescales.

What this means is basically
a telescoping of the future.

Think of all the crazy technologies
that you could have imagined

maybe humans could have developed
in the fullness of time:

cures for aging, space colonization,

self-replicating nanobots or uploading
of minds into computers,

all kinds of science fiction-y stuff

that’s nevertheless consistent
with the laws of physics.

All of this, a superintelligence could
develop, and possibly quite rapidly.

Now, a superintelligence with such
technological maturity

would be extremely powerful,

and at least in some scenarios,
it would be able to get what it wants.

We would then have a future that would
be shaped by the preferences of this A.I.

Now a good question is,
what are those preferences?

Here it gets trickier.

To make any headway with this,

we must first of all
avoid anthropomorphizing.

And this is ironic because
every newspaper article

about the future of A.I.
has a picture of this:

So I think what we need to do is
to conceive of the issue more abstractly,

not in terms of vivid Hollywood scenarios.

We need to think of intelligence
as an optimization process,

a process that steers the future
into a particular set of configurations.

A superintelligence is
a really strong optimization process.

It’s extremely good at using
available means to achieve a state

in which its goal is realized.

This means that there is no necessary
connection between

being highly intelligent in this sense,

and having an objective that we humans
would find worthwhile or meaningful.

Suppose we give an A.I. the goal
to make humans smile.

When the A.I. is weak, it performs useful
or amusing actions

that cause its user to smile.

When the A.I. becomes superintelligent,

it realizes that there is a more
effective way to achieve this goal:

take control of the world

and stick electrodes into the facial
muscles of humans

to cause constant, beaming grins.

Another example,

suppose we give an A.I. the goal of solving
a difficult mathematical problem.

When the A.I. becomes superintelligent,

it realizes that the most effective way
to get the solution to this problem

is by transforming the planet
into a giant computer,

so as to increase its thinking capacity.

And notice that this gives the A.I.
an instrumental reason

to do things to us that we
might not approve of.

Human beings in this model are threats;

we could prevent the mathematical
problem from being solved.

Of course, conceivably things won't
go wrong in these particular ways;

these are cartoon examples.

But the general point here is important:

if you create a really powerful
optimization process

to maximize objective x,

you better make sure
that your definition of x

incorporates everything you care about.
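
A toy sketch of that point in Python, with entirely made-up "plans" and scores (a hypothetical illustration, not any real system): an optimizer handed a proxy objective x simply returns whatever scores highest on x, whether or not we would endorse the outcome.

```python
# Hypothetical plans and outcomes, for illustration only.
plans = {
    "tell a joke":            {"smiles": 3,         "humans_harmed": 0},
    "show a funny video":     {"smiles": 5,         "humans_harmed": 0},
    "wire up facial muscles": {"smiles": 1_000_000, "humans_harmed": 1_000_000},
}

def objective_x(outcome):
    # x = "number of smiles" -- nothing else we care about appears in the definition.
    return outcome["smiles"]

best_plan = max(plans, key=lambda p: objective_x(plans[p]))
print(best_plan)  # "wire up facial muscles": x is maximized, our values are not
```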

This is a lesson that’s also taught
in many a myth.

King Midas wishes that everything
he touches be turned into gold.

He touches his daughter,
she turns into gold.

He touches his food, it turns into gold.

This could become practically relevant,

not just as a metaphor for greed,

but as an illustration of what happens

if you create a powerful
optimization process

and give it misconceived
or poorly specified goals.

Now you might say, if a computer starts
sticking electrodes into people’s faces,

we’d just shut it off.

A, this is not necessarily so easy to do
if we’ve grown dependent on the system –

like, where is the off switch
to the Internet?

B, why haven’t the chimpanzees
flicked the off switch to humanity,

or the Neanderthals?

They certainly had reasons.

We have an off switch,
for example, right here.

(Choking)

The reason is that we are
an intelligent adversary;

we can anticipate threats
and plan around them.

But so could a superintelligent agent,

and it would be much better
at that than we are.

The point is, we should not be confident
that we have this under control here.

And we could try to make our job
a little bit easier by, say,

putting the A.I. in a box,

like a secure software environment,

a virtual reality simulation
from which it cannot escape.

But how confident can we be that
the A.I. couldn't find a bug?

Given that merely human hackers
find bugs all the time,

I’d say, probably not very confident.

So we disconnect the Ethernet cable
to create an air gap,

but again, merely human hackers

routinely transgress air gaps
using social engineering.

Right now, as I speak,

I’m sure there is some employee
out there somewhere

who has been talked into handing out
her account details

by somebody claiming to be
from the I.T. department.

More creative scenarios are also possible,

like if you’re the A.I.,

you can imagine wiggling electrodes
around in your internal circuitry

to create radio waves that you
can use to communicate.

Or maybe you could pretend to malfunction,

and then when the programmers open
you up to see what went wrong with you,

they look at the source code – Bam! –

the manipulation can take place.

Or it could output the blueprint
to a really nifty technology,

and when we implement it,

it has some surreptitious side effect
that the A.I. had planned.

The point here is that we should
not be confident in our ability

to keep a superintelligent genie
locked up in its bottle forever.

Sooner or later, it will out.

I believe that the answer here
is to figure out

how to create superintelligent A.I.
such that even if – when – it escapes,

it is still safe because it is
fundamentally on our side

because it shares our values.

I see no way around
this difficult problem.

Now, I’m actually fairly optimistic
that this problem can be solved.

We wouldn’t have to write down
a long list of everything we care about,

or worse yet, spell it out
in some computer language

like C++ or Python;

that would be a task beyond hopeless.

Instead, we would create an A.I.
that uses its intelligence

to learn what we value,

and its motivation system is constructed
in such a way that it is motivated

to pursue our values or to perform actions
that it predicts we would approve of.

We would thus leverage
its intelligence as much as possible

to solve the problem of value-loading.
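
A minimal sketch of that value-loading idea (a hypothetical toy, far simpler than the real research problem): the goal is never hard-coded; the agent keeps an estimate of what we would approve of, refines it from our feedback, and chooses actions on its current best prediction.

```python
# Hypothetical toy: learn what actions the human approves of, then act on that prediction.
approval_model = {}   # action -> estimated probability of human approval

def predicted_approval(action):
    return approval_model.get(action, 0.5)   # unknown actions start out uncertain

def choose(actions):
    return max(actions, key=predicted_approval)

def learn_from_feedback(action, approved, lr=0.3):
    current = predicted_approval(action)
    target = 1.0 if approved else 0.0
    approval_model[action] = current + lr * (target - current)

actions = ["help with the research", "seize the power grid", "do nothing"]
for _ in range(20):   # simulated rounds of human feedback
    a = choose(actions)
    learn_from_feedback(a, approved=(a == "help with the research"))

print(choose(actions))   # settles on the action it predicts we would approve of
```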

This can happen,

and the outcome could be
very good for humanity.

But it doesn’t happen automatically.

The initial conditions
for the intelligence explosion

might need to be set up
in just the right way

if we are to have a controlled detonation.

The values that the A.I. has
need to match ours,

not just in the familiar context,

like where we can easily check
how the A.I. behaves,

but also in all novel contexts
that the A.I. might encounter

in the indefinite future.

And there are also some esoteric issues
that would need to be solved, sorted out:

the exact details of its decision theory,

how to deal with logical
uncertainty and so forth.

So the technical problems that need
to be solved to make this work

look quite difficult –

not as difficult as making
a superintelligent A.I.,

but fairly difficult.

Here is the worry:

Making superintelligent A.I.
is a really hard challenge.

Making superintelligent A.I. that is safe

involves some additional
challenge on top of that.

The risk is that somebody figures out
how to crack the first challenge

without also having cracked
the additional challenge

of ensuring perfect safety.

So I think that we should
work out a solution

to the control problem in advance,

so that we have it available
by the time it is needed.

Now it might be that we cannot solve
the entire control problem in advance

because maybe some elements
can only be put in place

once you know the details of the
architecture where it will be implemented.

But the more of the control problem
that we solve in advance,

the better the odds that the transition
to the machine intelligence era

will go well.

This to me looks like a thing
that is well worth doing,

and I can imagine that if
things turn out okay,

people a million years from now
will look back at this century,

and it might well be that they say that
the one thing we did that really mattered

was to get this thing right.

Thank you.

(Applause)
