How computers are learning to be creative Blaise Agüera y Arcas

So, I lead a team at Google
that works on machine intelligence;

in other words, the engineering discipline
of making computers and devices

able to do some of the things
that brains do.

And this makes us
interested in real brains

and neuroscience as well,

and especially interested
in the things that our brains do

that are still far superior
to the performance of computers.

Historically, one of those areas
has been perception,

the process by which things
out there in the world –

sounds and images –

can turn into concepts in the mind.

This is essential for our own brains,

and it’s also pretty useful on a computer.

The machine perception algorithms,
for example, that our team makes,

are what enable your pictures
on Google Photos to become searchable,

based on what’s in them.

The flip side of perception is creativity:

turning a concept into something
out there in the world.

So over the past year,
our work on machine perception

has also unexpectedly connected
with the world of machine creativity

and machine art.

I think Michelangelo
had a penetrating insight

into this dual relationship
between perception and creativity.

This is a famous quote of his:

“Every block of stone
has a statue inside of it,

and the job of the sculptor
is to discover it.”

So I think that what
Michelangelo was getting at

is that we create by perceiving,

and that perception itself
is an act of imagination

and is the stuff of creativity.

The organ that does all the thinking
and perceiving and imagining,

of course, is the brain.

And I’d like to begin
with a brief bit of history

about what we know about brains.

Because unlike, say,
the heart or the intestines,

you really can’t say very much
about a brain by just looking at it,

at least with the naked eye.

The early anatomists who looked at brains

gave the superficial structures
of this thing all kinds of fanciful names,

like hippocampus, meaning “little shrimp.”

But of course that sort of thing
doesn’t tell us very much

about what’s actually going on inside.

The first person who, I think, really
developed some kind of insight

into what was going on in the brain

was the great Spanish neuroanatomist,
Santiago Ramón y Cajal,

in the 19th century,

who used microscopy and special stains

that could selectively fill in
or render in very high contrast

the individual cells in the brain,

in order to start to understand
their morphologies.

And these are the kinds of drawings
that he made of neurons

in the 19th century.

This is from a bird brain.

And you see this incredible variety
of different sorts of cells;

even the cell theory itself
was quite new at this point.

And these structures,

these cells that have these arborizations,

these branches that can go
very, very long distances –

this was very novel at the time.

They’re reminiscent, of course, of wires.

That might have been obvious
to some people in the 19th century;

the revolutions of wiring and electricity
were just getting underway.

But in many ways,

these microanatomical drawings
of Ramón y Cajal’s, like this one,

they’re still in some ways unsurpassed.

More than a century later, we’re still

trying to finish the job
that Ramón y Cajal started.

These are raw data from our collaborators

at the Max Planck Institute
of Neuroscience.

And what our collaborators have done

is to image little pieces of brain tissue.

The entire sample here
is about one cubic millimeter in size,

and I’m showing you a very,
very small piece of it here.

That bar on the left is about one micron.

The structures you see are mitochondria

that are the size of bacteria.

And these are consecutive slices

through this very, very
tiny block of tissue.

Just for comparison’s sake,

the diameter of an average strand
of hair is about 100 microns.

So we’re looking at something
much, much smaller

than a single strand of hair.

And from these kinds of serial
electron microscopy slices,

one can start to make reconstructions
in 3D of neurons that look like these.

So these are sort of in the same
style as Ramón y Cajal.

Only a few neurons lit up,

because otherwise we wouldn’t
be able to see anything here.

It would be so crowded,

so full of structure,

of wiring all connecting
one neuron to another.

So Ramón y Cajal was a little bit
ahead of his time,

and progress on understanding the brain

proceeded slowly
over the next few decades.

But we knew that neurons used electricity,

and by World War II, our technology
was advanced enough

to start doing real electrical
experiments on live neurons

to better understand how they worked.

This was the very same time
when computers were being invented,

very much based on the idea
of modeling the brain –

of “intelligent machinery,”
as Alan Turing called it,

one of the fathers of computer science.

Warren McCulloch and Walter Pitts
looked at Ramón y Cajal’s drawing

of visual cortex,

which I’m showing here.

This is the cortex that processes
imagery that comes from the eye.

And for them, this looked
like a circuit diagram.

So there are a lot of details
in McCulloch and Pitts’s circuit diagram

that are not quite right.

But this basic idea

that visual cortex works like a series
of computational elements

that pass information
one to the next in a cascade,

is essentially correct.

Let’s talk for a moment

about what a model for processing
visual information would need to do.

The basic task of perception

is to take an image like this one and say,

“That’s a bird,”

which is a very simple thing
for us to do with our brains.

But you should all understand
that for a computer,

this was pretty much impossible
just a few years ago.

The classical computing paradigm

is not one in which
this task is easy to do.

So what’s going on between the pixels,

between the image of the bird
and the word “bird,”

is essentially a set of neurons
connected to each other

in a neural network,

as I’m diagramming here.

This neural network could be biological,
inside our visual cortices,

or, nowadays, we start
to have the capability

to model such neural networks
on the computer.

And I’ll show you what
that actually looks like.

So the pixels you can think
about as a first layer of neurons,

and that’s, in fact,
how it works in the eye –

that’s the neurons in the retina.

And those feed forward

into one layer after another layer,
after another layer of neurons,

all connected by synapses
of different weights.

The behavior of this network

is characterized by the strengths
of all of those synapses.

Those characterize the computational
properties of this network.

And at the end of the day,

you have a neuron
or a small group of neurons

that light up, saying, “bird.”

Now I’m going to represent
those three things –

the input pixels and the synapses
in the neural network,

and bird, the output –

by three variables: x, w and y.

There are maybe a million or so x’s –

a million pixels in that image.

There are billions or trillions of w’s,

which represent the weights of all
these synapses in the neural network.

And there’s a very small number of y’s,

of outputs that that network has.

“Bird” is only four letters, right?

So let’s pretend that this
is just a simple formula,

x “×” w = y.

I’m putting the times in scare quotes

because what’s really
going on there, of course,

is a very complicated series
of mathematical operations.

That’s one equation.

There are three variables.

And we all know
that if you have one equation,

you can solve one variable
by knowing the other two things.

So the problem of inference,

that is, figuring out
that the picture of a bird is a bird,

is this one:

it’s where y is the unknown
and w and x are known.

You know the neural network,
you know the pixels.

As you can see, that’s actually
a relatively straightforward problem.

You multiply two times three
and you’re done.
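
That inference step, known pixels x and known weights w producing an unknown output y, can be sketched as a tiny forward pass (a toy illustration only; the layer sizes, the ReLU non-linearity, and the random weights here are stand-ins, not the network from the demo):

```python
import numpy as np

def forward(x, weights):
    """Inference: push the pixels x through successive layers of weights w."""
    a = x
    for w in weights:
        a = np.maximum(0.0, a @ w)  # a simple non-linearity between layers
    return a

rng = np.random.default_rng(0)
x = rng.random(8)                         # stand-in for the million or so pixels
weights = [rng.standard_normal((8, 8)),
           rng.standard_normal((8, 4))]   # stand-in for the billions of w's
y = forward(x, weights)                   # a very small number of outputs
print(y.shape)                            # (4,)
```

Each `a @ w` is the "very complicated series of mathematical operations" hiding behind the times sign; the shapes shrink from many pixels down to a handful of outputs.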

I’ll show you an artificial neural network

that we’ve built recently,
doing exactly that.

This is running in real time
on a mobile phone,

and that’s, of course,
amazing in its own right,

that mobile phones can do so many
billions and trillions of operations

per second.

What you’re looking at is a phone

looking at one after another
picture of a bird,

and actually not only saying,
“Yes, it’s a bird,”

but identifying the species of bird
with a network of this sort.

So in that picture,

the x and the w are known,
and the y is the unknown.

I’m glossing over the very
difficult part, of course,

which is how on earth
do we figure out the w,

the brain that can do such a thing?

How would we ever learn such a model?

So this process of learning,
of solving for w,

if we were doing this
with the simple equation

in which we think about these as numbers,

we know exactly how to do that: 6 = 2 × w,

well, we divide by two and we’re done.

The problem is with this operator.

So, division –

we’ve used division because
it’s the inverse to multiplication,

but as I’ve just said,

the multiplication is a bit of a lie here.

This is a very, very complicated,
very non-linear operation;

it has no inverse.

So we have to figure out a way
to solve the equation

without a division operator.

And the way to do that
is fairly straightforward.

You just say, let’s play
a little algebra trick,

and move the six over
to the right-hand side of the equation.

Now, we’re still using multiplication.

And that zero – let’s think
about it as an error.

In other words, if we’ve solved
for w the right way,

then the error will be zero.

And if we haven’t gotten it quite right,

the error will be greater than zero.

So now we can just take guesses
to minimize the error,

and that’s the sort of thing
computers are very good at.

So you take an initial guess:

what if w = 0?

Well, then the error is 6.

What if w = 1? The error is 4.

And then the computer can
sort of play Marco Polo,

and drive down the error close to zero.

As it does that, it’s getting
successive approximations to w.

Typically, it never quite gets there,
but after about a dozen steps,

we’re up to w = 2.999,
which is close enough.

And this is the learning process.
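
That Marco Polo guessing game, driving the error toward zero step by step, is a minimal sketch of gradient descent (the learning rate and step count below are arbitrary choices for illustration):

```python
def solve_for_w(x=2.0, y=6.0, lr=0.1, steps=50):
    """Solve y = x * w iteratively, by shrinking the error instead of dividing."""
    w = 0.0                        # initial guess: the error starts at 6
    for _ in range(steps):
        error = x * w - y          # zero exactly when w is right
        w -= lr * 2 * x * error    # gradient of error**2 with respect to w
    return w

print(round(solve_for_w(), 3))     # 3.0
```

Each step nudges w in the direction that reduces the squared error, and the successive approximations settle on w = 3, the answer division would have given directly.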

So remember that what’s been going on here

is that we’ve been taking
a lot of known x’s and known y’s

and solving for the w in the middle
through an iterative process.

It’s exactly the same way
that we do our own learning.

We have many, many images as babies

and we get told, “This is a bird;
this is not a bird.”

And over time, through iteration,

we solve for w, we solve
for those neural connections.

So now, we’ve held
x and w fixed to solve for y;

that’s everyday, fast perception.

We figure out how we can solve for w;

that’s learning, which is a lot harder,

because we need to do error minimization,

using a lot of training examples.

And about a year ago,
Alex Mordvintsev, on our team,

decided to experiment
with what happens if we try solving for x,

given a known w and a known y.

In other words,

you know that it’s a bird,

and you already have your neural network
that you’ve trained on birds,

but what is the picture of a bird?

It turns out that by using exactly
the same error-minimization procedure,

one can do that with the network
trained to recognize birds,

and the result turns out to be …

a picture of birds.

So this is a picture of birds
generated entirely by a neural network

that was trained to recognize birds,

just by solving for x
rather than solving for y,

and doing that iteratively.
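
In the same toy spirit, solving for x means holding w fixed and nudging the input until the output matches the target (the tiny weight matrix, targets, and step size here are all invented for illustration; the real work runs this kind of loop over a deep convolutional network):

```python
import numpy as np

# A toy stand-in for a trained network: 3 "pixels" in, 2 outputs out.
w = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y_target = np.array([1.0, 0.0])           # the output we want: "bird"

lr = 1.0 / np.linalg.norm(w, 2) ** 2      # step size below the stability limit
x = np.array([0.5, 0.5, 0.5])             # a blank-ish starting canvas
for _ in range(200):
    error = x @ w - y_target              # the same error as in learning ...
    x -= lr * error @ w.T                 # ... but now we nudge the pixels x

print(x @ w)                              # very close to y_target
```

It is exactly the error-minimization procedure from learning, with the gradient taken with respect to the input instead of the weights.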

Here’s another fun example.

This was a work made
by Mike Tyka in our group,

which he calls “Animal Parade.”

It reminds me a little bit
of William Kentridge’s artworks,

in which he makes sketches, rubs them out,

makes sketches, rubs them out,

and creates a movie this way.

In this case,

what Mike is doing is varying y
over the space of different animals,

in a network designed
to recognize and distinguish

different animals from each other.

And you get this strange, Escher-like
morph from one animal to another.

Here he and Alex together
have tried reducing

the y’s to a space of only two dimensions,

thereby making a map
out of the space of all things

recognized by this network.

Doing this kind of synthesis

or generation of imagery
over that entire surface,

varying y over the surface,
you make a kind of map –

a visual map of all the things
the network knows how to recognize.

The animals are all here;
“armadillo” is right in that spot.

You can do this with other kinds
of networks as well.

This is a network designed
to recognize faces,

to distinguish one face from another.

And here, we’re putting
in a y that says, “me,”

my own face parameters.

And when this thing solves for x,

it generates this rather crazy,

kind of cubist, surreal,
psychedelic picture of me

from multiple points of view at once.

The reason it looks like
multiple points of view at once

is because that network is designed
to get rid of the ambiguity

of a face being in one pose
or another pose,

being looked at with one kind of lighting,
another kind of lighting.

So when you do
this sort of reconstruction,

if you don’t use some sort of guide image

or guide statistics,

then you’ll get a sort of confusion
of different points of view,

because it’s ambiguous.

This is what happens if Alex uses
his own face as a guide image

during that optimization process
to reconstruct my own face.

So you can see it’s not perfect.

There’s still quite a lot of work to do

on how we optimize
that optimization process.

But you start to get something
more like a coherent face,

rendered using my own face as a guide.
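
The guide image can be grafted onto the same toy optimization as an extra pull on x (the weight lam, the guide values, and the shapes below are made up for illustration):

```python
import numpy as np

w = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                # the same toy "trained network"
y_target = np.array([1.0, 0.0])           # the identity we want to reconstruct
guide = np.array([0.2, 0.4, 0.1])         # stand-in for the guide image
lam = 0.5                                 # how strongly the guide constrains x

lr = 1.0 / (np.linalg.norm(w, 2) ** 2 + lam)
x = guide.copy()                          # start from the guide itself
for _ in range(500):
    grad = (x @ w - y_target) @ w.T + lam * (x - guide)
    x -= lr * grad                        # match the output, but stay near the guide
```

The second term in the gradient resolves the ambiguity: among all x's that produce the right output, the optimization settles on one close to the guide.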

You don’t have to start
with a blank canvas

or with white noise.

When you’re solving for x,

you can begin with an x
that is itself already some other image.

That’s what this little demonstration is.

This is a network
that is designed to categorize

all sorts of different objects –
man-made structures, animals …

Here we’re starting
with just a picture of clouds,

and as we optimize,

basically, this network is figuring out
what it sees in the clouds.

And the more time
you spend looking at this,

the more things you also
will see in the clouds.

You could also use the face network
to hallucinate into this,

and you get some pretty crazy stuff.

(Laughter)

Or, Mike has done some other experiments

in which he takes that cloud image,

hallucinates, zooms, hallucinates,
zooms, hallucinates, zooms.

And in this way,

you can get a sort of fugue state
of the network, I suppose,

or a sort of free association,

in which the network
is eating its own tail.

So every image is now the basis for,

“What do I think I see next?

What do I think I see next?
What do I think I see next?”

I showed this for the first time in public

to a group at a lecture in Seattle
called “Higher Education” –

this was right after
marijuana was legalized.

(Laughter)

So I’d like to finish up quickly

by just noting that this technology
is not constrained to imagery.

I’ve shown you purely visual examples
because they’re really fun to look at.

It’s not a purely visual technology.

Our artist collaborator, Ross Goodwin,

has done experiments involving
a camera that takes a picture,

and then a computer in his backpack
writes a poem using neural networks,

based on the contents of the image.

And that poetry neural network
has been trained

on a large corpus of 20th-century poetry.

And the poetry is, you know,

I think, kind of not bad, actually.

(Laughter)

In closing,

I think Michelangelo was right;

perception and creativity
are very intimately connected.

What we’ve just seen are neural networks

that are entirely trained to discriminate,

or to recognize different
things in the world,

able to be run in reverse, to generate.

One of the things that suggests to me

is not only that
Michelangelo really did see

the sculpture in the blocks of stone,

but that any creature,
any being, any alien

that is able to do
perceptual acts of that sort

is also able to create

because it’s exactly the same
machinery that’s used in both cases.

Also, I think that perception
and creativity are by no means

uniquely human.

We start to have computer models
that can do exactly these sorts of things.

And that ought to be unsurprising;
the brain is computational.

And finally,

computing began as an exercise
in designing intelligent machinery.

It was very much modeled on the question

of how we could make machines intelligent.

And we finally are starting to fulfill now

some of the promises
of those early pioneers,

of Turing and von Neumann

and McCulloch and Pitts.

And I think that computing
is not just about accounting

or playing Candy Crush or something.

From the beginning,
we modeled computers after our minds.

And they give us both the ability
to understand our own minds better

and to extend them.

Thank you very much.

(Applause)
