3 principles for creating safer AI
Stuart Russell

This is Lee Sedol.

Lee Sedol is one of the world’s
greatest Go players,

and he’s having what my friends
in Silicon Valley call

a “Holy Cow” moment –

(Laughter)

a moment where we realize

that AI is actually progressing
a lot faster than we expected.

So humans have lost on the Go board.
What about the real world?

Well, the real world is much bigger,

much more complicated than the Go board.

It’s a lot less visible,

but it’s still a decision problem.

And if we think about some
of the technologies

that are coming down the pike …

Noriko [Arai] mentioned that reading
is not yet happening in machines,

at least with understanding.

But that will happen,

and when that happens,

very soon afterwards,

machines will have read everything
that the human race has ever written.

And that, along with the ability
to look further ahead than humans can,

as we’ve already seen in Go,

and with access to more information,

will enable machines to make better decisions
in the real world than we can.

So is that a good thing?

Well, I hope so.

Our entire civilization,
everything that we value,

is based on our intelligence.

And if we had access
to a lot more intelligence,

then there’s really no limit
to what the human race can do.

And I think this could be,
as some people have described it,

the biggest event in human history.

So why are people saying things like this,

that AI might spell the end
of the human race?

Is this a new thing?

Is it just Elon Musk and Bill Gates
and Stephen Hawking?

Actually, no. This idea
has been around for a while.

Here’s a quotation:

“Even if we could keep the machines
in a subservient position,

for instance, by turning off the power
at strategic moments” –

and I’ll come back to that
“turning off the power” idea later on –

“we should, as a species,
feel greatly humbled.”

So who said this?
This is Alan Turing in 1951.

Alan Turing, as you know,
is the father of computer science

and in many ways,
the father of AI as well.

So if we think about this problem,

the problem of creating something
more intelligent than your own species,

we might call this “the gorilla problem,”

because gorillas' ancestors did this
a few million years ago,

and now we can ask the gorillas:

Was this a good idea?

So here they are having a meeting
to discuss whether it was a good idea,

and after a little while,
they conclude, no,

this was a terrible idea.

Our species is in dire straits.

In fact, you can see the existential
sadness in their eyes.

(Laughter)

So this queasy feeling that making
something smarter than your own species

is maybe not a good idea –

what can we do about that?

Well, really nothing,
except stop doing AI,

and because of all
the benefits that I mentioned

and because I’m an AI researcher,

I’m not having that.

I actually want to be able
to keep doing AI.

So we actually need to nail down
the problem a bit more.

What exactly is the problem?

Why is better AI possibly a catastrophe?

So here’s another quotation:

“We had better be quite sure
that the purpose put into the machine

is the purpose which we really desire.”

This was said by Norbert Wiener in 1960,

shortly after he watched
one of the very early learning systems

learn to play checkers
better than its creator.

But this could equally have been said

by King Midas.

King Midas said, “I want everything
I touch to turn to gold,”

and he got exactly what he asked for.

That was the purpose
that he put into the machine,

so to speak,

and then his food and his drink
and his relatives turned to gold

and he died in misery and starvation.

So we’ll call this
“the King Midas problem”

of stating an objective
which is not, in fact,

truly aligned with what we want.

In modern terms, we call this
“the value alignment problem.”

Putting in the wrong objective
is not the only part of the problem.

There’s another part.

If you put an objective into a machine,

even something as simple as,
“Fetch the coffee,”

the machine says to itself,

“Well, how might I fail
to fetch the coffee?

Someone might switch me off.

OK, I have to take steps to prevent that.

I will disable my ‘off’ switch.

I will do anything to defend myself
against interference

with this objective
that I have been given.”

So this single-minded pursuit

in a very defensive mode
of an objective that is, in fact,

not aligned with the true objectives
of the human race –

that’s the problem that we face.

And in fact, that’s the high-value
takeaway from this talk.

If you want to remember one thing,

it’s that you can’t fetch
the coffee if you’re dead.

(Laughter)

It’s very simple. Just remember that.
Repeat it to yourself three times a day.

(Laughter)

And in fact, this is exactly the plot

of “2001: [A Space Odyssey].”

HAL has an objective, a mission,

which is not aligned
with the objectives of the humans,

and that leads to this conflict.

Now fortunately, HAL
is not superintelligent.

He’s pretty smart,
but eventually Dave outwits him

and manages to switch him off.

But we might not be so lucky.

So what are we going to do?

I’m trying to redefine AI

to get away from this classical notion

of machines that intelligently
pursue objectives.

There are three principles involved.

The first one is a principle
of altruism, if you like,

that the robot’s only objective

is to maximize the realization
of human objectives,

of human values.

And by values here I don’t mean
touchy-feely, goody-goody values.

I just mean whatever it is
that the human would prefer

their life to be like.

And so this actually violates Asimov’s third law

that the robot has to protect
its own existence.

It has no interest in preserving
its existence whatsoever.

The second law is a law
of humility, if you like.

And this turns out to be really
important to make robots safe.

It says that the robot does not know

what those human values are,

so it has to maximize them,
but it doesn’t know what they are.

And that avoids this problem
of single-minded pursuit

of an objective.

This uncertainty turns out to be crucial.

Now, in order to be useful to us,

it has to have some idea of what we want.

It obtains that information primarily
by observation of human choices,

so our own choices reveal information

about what it is that we prefer
our lives to be like.
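
To make that concrete, here is a minimal sketch, not from the talk itself, of how observing a single human choice could shift a robot’s belief about what that human values; the hypotheses, options, and numbers are purely illustrative.

```python
# Hypothetical sketch: inferring what a person values from an observed choice.
# The options and utilities below are made up for illustration.

# Candidate hypotheses about what the human cares about,
# each mapping options to a utility score.
hypotheses = {
    "values_coffee":  {"coffee": 1.0, "tea": 0.2, "nothing": 0.0},
    "values_tea":     {"coffee": 0.2, "tea": 1.0, "nothing": 0.0},
    "values_neither": {"coffee": 0.1, "tea": 0.1, "nothing": 1.0},
}

# Principle 2: start maximally uncertain, with a uniform prior over hypotheses.
belief = {h: 1.0 / len(hypotheses) for h in hypotheses}

def update(belief, chosen, options):
    """Bayesian update: a hypothesis keeps its weight if it says the chosen
    option was at least as good as the alternatives the human passed up,
    and is heavily discounted otherwise (allowing a small chance of error)."""
    posterior = {}
    for h, utils in hypotheses.items():
        best = max(utils[o] for o in options)
        likelihood = 1.0 if utils[chosen] >= best else 0.05
        posterior[h] = belief[h] * likelihood
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Principle 3: watching the human pick coffee over tea shifts belief
# toward the hypotheses consistent with that choice.
belief = update(belief, chosen="coffee", options=["coffee", "tea"])
print(belief)
```

The point is only that the robot begins uncertain and revises its belief as evidence about human preferences accumulates; a real system would need far richer models of people and of the choices available to them.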

So those are the three principles.

Let’s see how that applies
to this question of:

“Can you switch the machine off?”
as Turing suggested.

So here’s a PR2 robot.

This is one that we have in our lab,

and it has a big red “off” switch
right on the back.

The question is: Is it
going to let you switch it off?

If we do it the classical way,

we give it the objective of, “Fetch
the coffee, I must fetch the coffee,

I can’t fetch the coffee if I’m dead,”

so obviously the PR2
has been listening to my talk,

and so it says, therefore,
“I must disable my ‘off’ switch,

and probably taser all the other
people in Starbucks

who might interfere with me.”

(Laughter)

So this seems to be inevitable, right?

This kind of failure mode
seems to be inevitable,

and it follows from having
a concrete, definite objective.

So what happens if the machine
is uncertain about the objective?

Well, it reasons in a different way.

It says, “OK, the human
might switch me off,

but only if I’m doing something wrong.

Well, I don’t really know what wrong is,

but I know that I don’t want to do it.”

So that’s the first and second
principles right there.

“So I should let the human switch me off.”

And in fact you can calculate
the incentive that the robot has

to allow the human to switch it off,

and it’s directly tied to the degree

of uncertainty about
the underlying objective.
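
Here is a toy numerical version of that incentive calculation, a sketch rather than a formal result: it assumes the robot’s uncertainty about how much the human values its plan can be modelled as a Gaussian, and that the human, if consulted, will hit the off switch exactly when the plan would be bad.

```python
# Toy model (illustrative only) of the off-switch reasoning: the robot is
# unsure how much the human values its plan, modelled as a Gaussian belief
# over that value U. Deferring means letting the human veto bad plans.
import random
import statistics

def incentive_to_defer(mean, std, samples=100_000):
    """Expected gain from letting the human switch the robot off, compared
    with the better of acting unilaterally or shutting itself down."""
    rng = random.Random(0)
    us = [rng.gauss(mean, std) for _ in range(samples)]
    act_now = statistics.mean(us)                        # act without asking
    defer = statistics.mean(max(u, 0.0) for u in us)     # human blocks U < 0
    return defer - max(act_now, 0.0)

# More uncertainty about the objective -> a larger incentive
# to leave the off switch alone.
for std in (0.1, 0.5, 1.0, 2.0):
    print(f"std={std}: incentive to defer = {incentive_to_defer(0.2, std):.3f}")
```

As the spread of that belief grows, so does the expected gain from deferring to the human; with no uncertainty at all, the incentive vanishes, which is exactly the single-minded case we started with.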

And then when the machine is switched off,

that third principle comes into play.

It learns something about the objectives
it should be pursuing,

because it learns that
what it did wasn’t right.

In fact, we can, with suitable use
of Greek symbols,

as mathematicians usually do,

we can actually prove a theorem

that says that such a robot
is provably beneficial to the human.

You are provably better off
with a machine that’s designed in this way

than without it.

So this is a very simple example,
but this is the first step

in what we’re trying to do
with human-compatible AI.

Now, this third principle,

I think is the one that you’re probably
scratching your head over.

You’re probably thinking, “Well,
you know, I behave badly.

I don’t want my robot to behave like me.

I sneak down in the middle of the night
and take stuff from the fridge.

I do this and that.”

There’s all kinds of things
you don’t want the robot doing.

But in fact, it doesn’t
quite work that way.

Just because you behave badly

doesn’t mean the robot
is going to copy your behavior.

It’s going to understand your motivations
and maybe help you resist them,

if appropriate.

But it’s still difficult.

What we’re trying to do, in fact,

is to allow machines to predict
for any person and for any possible life

that they could live,

and the lives of everybody else:

Which would they prefer?

And there are many, many
difficulties involved in doing this;

I don’t expect that this
is going to get solved very quickly.

The real difficulties, in fact, are us.

As I have already mentioned,
we behave badly.

In fact, some of us are downright nasty.

Now the robot, as I said,
doesn’t have to copy the behavior.

The robot does not have
any objective of its own.

It’s purely altruistic.

And it’s not designed just to satisfy
the desires of one person, the user,

but in fact it has to respect
the preferences of everybody.

So it can deal with a certain
amount of nastiness,

and it can even understand
where your nastiness comes from; for example,

you may take bribes as a passport official

because you need to feed your family
and send your kids to school.

It can understand that;
it doesn’t mean it’s going to steal.

In fact, it’ll just help you
send your kids to school.

We are also computationally limited.

Lee Sedol is a brilliant Go player,

but he still lost.

So if we look at his actions,
he took an action that lost the game.

That doesn’t mean he wanted to lose.

So to understand his behavior,

we actually have to invert
through a model of human cognition

that includes our computational
limitations – a very complicated model.

But it’s still something
that we can work on understanding.
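
One simple way to start, sketched below under strong assumptions, is to treat the person as noisily rational rather than perfectly rational: better options are chosen more often, but mistakes have nonzero probability, so a losing move is only weak evidence about what the player actually wanted.

```python
# Hypothetical sketch of a noisily rational ("Boltzmann") choice model.
# The rationality parameter is a crude stand-in for a real model of human
# cognition and its computational limits; the utilities are made up.
import math

def choice_probabilities(utilities, rationality=1.0):
    """Probability of picking each option under a softmax choice model.
    Very high rationality approaches a perfect optimizer; rationality = 0
    models an agent that cannot optimize at all and chooses uniformly."""
    weights = [math.exp(rationality * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Even a clearly worse move keeps some probability under this model,
# so observing it does not imply the person preferred its outcome.
print(choice_probabilities([1.0, 0.9, -5.0], rationality=2.0))
```

Inverting this kind of model means asking which preferences make the observed behavior likely, rather than assuming the behavior was optimal.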

Probably the most difficult part,
from my point of view as an AI researcher,

is the fact that there are lots of us,

and so the machine has to somehow
trade off, weigh up the preferences

of many different people,

and there are different ways to do that.

Economists, sociologists,
moral philosophers have understood that,

and we are actively
looking for collaboration.

Let’s have a look and see what happens
when you get that wrong.

So you can have
a conversation, for example,

with your intelligent personal assistant

that might be available
in a few years' time.

Think of a Siri on steroids.

So Siri says, “Your wife called
to remind you about dinner tonight.”

And of course, you’ve forgotten.
“What? What dinner?

What are you talking about?”

“Uh, your 20th anniversary at 7pm.”

“I can’t do that. I’m meeting
with the secretary-general at 7:30.

How could this have happened?”

“Well, I did warn you, but you overrode
my recommendation.”

“Well, what am I going to do?
I can’t just tell him I’m too busy.”

“Don’t worry. I arranged
for his plane to be delayed.”

(Laughter)

“Some kind of computer malfunction.”

(Laughter)

“Really? You can do that?”

“He sends his profound apologies

and looks forward to meeting you
for lunch tomorrow.”

(Laughter)

So the values here –
there’s a slight mistake going on.

This is clearly following my wife’s values,

which can be summed up as “Happy wife, happy life.”

(Laughter)

It could go the other way.

You could come home
after a hard day’s work,

and the computer says, “Long day?”

“Yes, I didn’t even have time for lunch.”

“You must be very hungry.”

“Starving, yeah.
Could you make some dinner?”

“There’s something I need to tell you.”

(Laughter)

“There are humans in South Sudan
who are in more urgent need than you.”

(Laughter)

“So I’m leaving. Make your own dinner.”

(Laughter)

So we have to solve these problems,

and I’m looking forward
to working on them.

There are reasons for optimism.

One reason is,

there is a massive amount of data.

Because remember – I said
they’re going to read everything

the human race has ever written.

Most of what we write about
is human beings doing things

and other people getting upset about it.

So there’s a massive amount
of data to learn from.

There’s also a very
strong economic incentive

to get this right.

So imagine your domestic robot’s at home.

You’re late from work again
and the robot has to feed the kids,

and the kids are hungry
and there’s nothing in the fridge.

And the robot sees the cat.

(Laughter)

And the robot hasn’t quite learned
the human value function properly,

so it doesn’t understand

the sentimental value of the cat outweighs
the nutritional value of the cat.

(Laughter)

So then what happens?

Well, it happens like this:

“Deranged robot cooks kitty
for family dinner.”

That one incident would be the end
of the domestic robot industry.

So there’s a huge incentive
to get this right

long before we reach
superintelligent machines.

So to summarize:

I’m actually trying to change
the definition of AI

so that we have provably
beneficial machines.

And the principles are:

machines that are altruistic,

that want to achieve only our objectives,

but that are uncertain
about what those objectives are,

and will watch all of us

to learn more about what it is
that we really want.

And hopefully in the process,
we will learn to be better people.

Thank you very much.

(Applause)

Chris Anderson: So interesting, Stuart.

We’re going to stand here a bit
because I think they’re setting up

for our next speaker.

A couple of questions.

So the idea of programming in ignorance
seems intuitively really powerful.

As you get to superintelligence,

what’s going to stop a robot

reading literature and discovering
this idea that knowledge

is actually better than ignorance

and still just shifting its own goals
and rewriting that programming?

Stuart Russell: Yes, so we want
it to learn more, as I said,

about our objectives.

It’ll only become more certain
as it becomes more correct,

so the evidence is there

and it’s going to be designed
to interpret it correctly.

It will understand, for example,
that books are very biased

in the evidence they contain.

They only talk about kings and princes

and elite white male people doing stuff.

So it’s a complicated problem,

but as it learns more about our objectives

it will become more and more useful to us.

CA: And you couldn’t
just boil it down to one law,

you know, hardwired in:

“if any human ever tries to switch me off,

I comply. I comply.”

SR: Absolutely not.

That would be a terrible idea.

So imagine that you have
a self-driving car

and you want to send your five-year-old

off to preschool.

Do you want your five-year-old
to be able to switch off the car

while it’s driving along?

Probably not.

So it needs to understand how rational
and sensible the person is.

The more rational the person,

the more willing you are
to be switched off.

If the person is completely
random or even malicious,

then you’re less willing
to be switched off.

CA: All right. Stuart, can I just say,

I really, really hope you
figure this out for us.

Thank you so much for that talk.
That was amazing.

SR: Thank you.

(Applause)
