Digital humans that look just like us
Doug Roble

Hello.

I’m not a real person.

I’m actually a copy of a real person.

Although, I feel like a real person.

It’s kind of hard to explain.

Hold on – I think I saw
a real person … there’s one.

Let’s bring him onstage.

Hello.

(Applause)

What you see up there is a digital human.

I’m wearing an inertial
motion capture suit

that’s figuring out what my body is doing.

And I’ve got a single camera here
that’s watching my face

and feeding some machine-learning software
that’s taking my expressions,

like, “Hm, hm, hm,”

and transferring them to that guy.

We call him “DigiDoug.”

He’s actually a 3-D character
that I’m controlling live in real time.
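
The talk never shows the pipeline itself, but the loop being described (read the suit, read the face camera, run the learned model, drive and render the character) can be sketched roughly as below. Every function and number here is a made-up placeholder, not the real system.

```python
# Minimal sketch of the real-time loop described above: body pose from the
# inertial suit, facial parameters from a single camera run through a trained
# network, both driving a rendered 3-D character every frame.
# All names and values are hypothetical stand-ins.

import time

def read_body_pose():
    """Stand-in for the inertial mocap suit: joint rotations for this frame."""
    return {"spine": (0.0, 0.0, 0.0), "head": (0.0, 5.0, 0.0)}

def read_face_frame():
    """Stand-in for the head-mounted camera: one video frame of the face."""
    return b"\x00" * (640 * 480)  # placeholder grayscale image buffer

def infer_expression(frame):
    """Stand-in for the trained network: face image -> facial parameters."""
    return {"smile": 0.4, "brow_raise": 0.1, "jaw_open": 0.2}

def render_digidoug(body_pose, expression):
    """Stand-in for the real-time renderer driving the digital character."""
    print(f"head={body_pose['head']}  smile={expression['smile']:.2f}")

if __name__ == "__main__":
    for _ in range(3):                      # a few frames, for illustration
        t0 = time.perf_counter()
        body = read_body_pose()             # inertial suit
        face = read_face_frame()            # single camera on the face
        params = infer_expression(face)     # machine-learning step
        render_digidoug(body, params)       # drive and draw DigiDoug
        print(f"frame took {(time.perf_counter() - t0) * 1000:.2f} ms")
```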

So, I work in visual effects.

And in visual effects,

one of the hardest things to do
is to create believable, digital humans

that the audience accepts as real.

People are just really good
at recognizing other people.

Go figure!

So, that’s OK, we like a challenge.

Over the last 15 years,

we’ve been putting
humans and creatures into film

that you accept as real.

If they’re happy, you should feel happy.

And if they feel pain,
you should empathize with them.

We’re getting pretty good at it, too.

But it’s really, really difficult.

Effects like these take thousands of hours

and hundreds of really talented artists.

But things have changed.

Over the last five years,

computers and graphics cards
have gotten seriously fast.

And machine learning,
deep learning, has happened.

So we asked ourselves:

Do you suppose we could create
a photo-realistic human,

like we’re doing for film,

but where you’re seeing
the actual emotions and the details

of the person who’s controlling
the digital human

in real time?

In fact, that’s our goal:

If you were having
a conversation with DigiDoug

one-on-one,

is it real enough so that you could tell
whether or not I was lying to you?

So that was our goal.

About a year and a half ago,
we set off to achieve this goal.

What I’m going to do now is take you
basically on a little bit of a journey

to see exactly what we had to do
to get where we are.

We had to capture
an enormous amount of data.

In fact, by the end of this thing,

we had probably one of the largest
facial data sets on the planet.

Of my face.

(Laughter)

Why me?

Well, I’ll do just about
anything for science.

I mean, look at me!

I mean, come on.

We had to first figure out
what my face actually looked like.

Not just a photograph or a 3-D scan,

but what it actually looked like
in any photograph,

how light interacts with my skin.

Luckily for us, about three blocks away
from our Los Angeles studio

is this place called ICT.

They’re a research lab

that’s associated with the University
of Southern California.

They have a device there,
it’s called the “light stage.”

It has a zillion
individually controlled lights

and a whole bunch of cameras.

And with that, we can reconstruct my face
under a myriad of lighting conditions.
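
The talk doesn't go into how the light stage data gets used, but the general published idea behind devices like it is image-based relighting: photograph the face once per individually controlled light, then approximate any new lighting environment as a weighted sum of those one-light-at-a-time images. A tiny sketch of that idea, with all shapes and values invented for illustration:

```python
# Hedged sketch of the general light-stage idea (not ICT's actual pipeline):
# one photo per individually controlled light, then new lighting is a
# weighted combination of that one-light-at-a-time (OLAT) basis.

import numpy as np

num_lights = 156                      # hypothetical count of stage lights
h, w = 64, 64                         # tiny stand-in image resolution

# One image per light: (lights, height, width, rgb)
olat_images = np.random.rand(num_lights, h, w, 3)

# Weights describing a new environment: how bright each stage light would be
# if it were reproducing that environment (values are illustrative only).
env_weights = np.random.rand(num_lights)

# Relit face = linear combination of the OLAT basis images.
relit = np.tensordot(env_weights, olat_images, axes=(0, 0))
print(relit.shape)                    # (64, 64, 3)
```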

We even captured the blood flow

and how my face changes
when I make expressions.

This let us build a model of my face
that, quite frankly, is just amazing.

It’s got an unfortunate
level of detail, unfortunately.

(Laughter)

You can see every pore, every wrinkle.

But we had to have that.

Reality is all about detail.

And without it, you miss it.

We are far from done, though.

This let us build a model of my face
that looked like me.

But it didn’t really move like me.

And that’s where
machine learning comes in.

And machine learning needs a ton of data.

So I sat down in front of some
high-resolution motion-capturing device.

And also, we did this traditional
motion capture with markers.

We created a whole bunch
of images of my face

and moving point clouds
that represented the shapes of my face.

Man, I made a lot of expressions,

I said different lines
in different emotional states …

We had to do a lot of capture with this.

Once we had this enormous amount of data,

we built and trained deep neural networks.
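
The networks themselves aren't described in the talk. As a stand-in for the idea of learning a mapping from captured face images to face parameters, here is a deliberately oversimplified sketch that fits a plain linear model instead of a deep network; every dimension and name is hypothetical.

```python
# Drastically simplified stand-in (not the production networks) for
# "train a model that maps face images to face parameters" on captured data.
# A linear least-squares fit replaces the deep network, purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

num_frames = 1000                     # captured frames of the face
image_dim = 256                       # flattened (tiny) face image
param_dim = 50                        # expression / wrinkle / blood-flow params

images = rng.standard_normal((num_frames, image_dim))   # inputs
params = rng.standard_normal((num_frames, param_dim))   # targets from capture

# Fit weights so that images @ W approximates params; the real deep nets
# learn a far richer, nonlinear version of this mapping.
W, *_ = np.linalg.lstsq(images, params, rcond=None)

new_frame = rng.standard_normal((1, image_dim))
predicted_params = new_frame @ W
print(predicted_params.shape)         # (1, 50)
```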

And when we were finished with that,

in 16 milliseconds,

the neural network can look at my image

and figure out everything about my face.

It can compute my expression,
my wrinkles, my blood flow –

even how my eyelashes move.

This is then rendered
and displayed up there

with all the detail
that we captured previously.

We’re far from done.

This is very much a work in progress.

This is actually the first time
we’ve shown it outside of our company.

And, you know, it doesn’t look
as convincing as we want;

I’ve got wires coming out
of the back of me,

and there’s a sixth-of-a-second delay

between when we capture the video
and when we display it up there.

Sixth of a second – that’s crazy good!

But it’s still why you’re hearing
a bit of an echo and stuff.
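
For scale, here are the two timings quoted in the talk side by side (the arithmetic, not any new measurement):

```python
# Inference takes about 16 ms per frame (roughly a 60 fps budget), while the
# whole capture-to-display path is "a sixth of a second".
inference_ms = 16
end_to_end_ms = 1000 / 6
print(f"inference: {inference_ms} ms (~{1000 // inference_ms} fps)")
print(f"capture-to-display delay: {end_to_end_ms:.0f} ms")
```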

And you know, this machine learning
stuff is brand-new to us,

sometimes it’s hard to convince it
to do the right thing, you know?

It goes a little sideways.

(Laughter)

But why did we do this?

Well, there’s two reasons, really.

First of all, it is just crazy cool.

(Laughter)

How cool is it?

Well, with the push of a button,

I can deliver this talk
as a completely different character.

This is Elbor.

We put him together
to test how this would work

with a different appearance.

And the cool thing about this technology
is that, while I’ve changed my character,

the performance is still all me.

I tend to talk out of the right
side of my mouth;

so does Elbor.

(Laughter)

Now, the second reason we did this,
and you can imagine,

is this is going to be great for film.

This is a brand-new, exciting tool

for artists and directors
and storytellers.

It’s pretty obvious, right?

I mean, this is going to be
really neat to have.

But also, now that we’ve built it,

it’s clear that this
is going to go way beyond film.

But wait.

Didn’t I just change my identity
with the push of a button?

Isn’t this like “deepfake”
and face-swapping

that you guys may have heard of?

Well, yeah.

In fact, we are using
some of the same technology

that deepfake is using.

Deepfake is 2-D and image-based,
while ours is full 3-D

and way more powerful.

But they’re very related.

And now I can hear you thinking,

“Darn it!

I thought I could at least
trust and believe in video.

If it was live video,
didn’t it have to be true?”

Well, we know that’s not
really the case, right?

Even without this, there are simple tricks
that you can do with video

like how you frame a shot

that can make it really misrepresent
what’s actually going on.

And I’ve been working
in visual effects for a long time,

and I’ve known for a long time

that with enough effort,
we can fool anyone about anything.

What this stuff and deepfakes are doing

is making it easier and more accessible
to manipulate video,

just like Photoshop did
for manipulating images, some time ago.

I prefer to think about

how this technology could bring
humanity to other technology

and bring us all closer together.

Now that you’ve seen this,

think about the possibilities.

Right off the bat, you’re going to see it
in live events and concerts, like this.

Digital celebrities, especially
with new projection technology,

are going to be just like the movies,
but alive and in real time.

And new forms of communication are coming.

You can already interact
with DigiDoug in VR.

And it is eye-opening.

It’s just like you and I
are in the same room,

even though we may be miles apart.

Heck, the next time you make a video call,

you will be able to choose
the version of you

you want people to see.

It’s like really, really good makeup.

I was scanned about a year and a half ago.

I’ve aged.

DigiDoug hasn’t.

On video calls, I never have to grow old.

And as you can imagine,
this is going to be used

to give virtual assistants
a body and a face.

A humanity.

I already love it that when I talk
to virtual assistants,

they answer back in a soothing,
humanlike voice.

Now they’ll have a face.

And you’ll get all the nonverbal cues
that make communication so much easier.

It’s going to be really nice.

You’ll be able to tell when
a virtual assistant is busy or confused

or concerned about something.

Now, I couldn’t leave the stage

without you actually being able
to see my real face,

so you can do some comparison.

So let me take off my helmet here.

Yeah, don’t worry,
it looks way worse than it feels.

(Laughter)

So this is where we are.

Let me put this back on here.

(Laughter)

Doink!

So this is where we are.

We’re on the cusp of being able
to interact with digital humans

that are strikingly real,

whether they’re being controlled
by a person or a machine.

And like all new technology these days,

it’s going to come with some
serious and real concerns

that we have to deal with.

But I am just so really excited

about the ability to bring something
that I’ve seen only in science fiction

for my entire life

into reality.

Communicating with computers
will be like talking to a friend.

And talking to faraway friends

will be like sitting with them
together in the same room.

Thank you very much.

(Applause)
