Fake videos of real people and how to spot them | Supasorn Suwajanakorn

Look at these images.

Now, tell me which Obama here is real.

(Video) Barack Obama: To help families
refinance their homes,

to invest in things
like high-tech manufacturing,

clean energy

and the infrastructure
that creates good new jobs.

Supasorn Suwajanakorn: Anyone?

The answer is none of them.

(Laughter)

None of these is actually real.

So let me tell you how we got here.

My inspiration for this work

was a project meant to preserve our last
chance for learning about the Holocaust

from the survivors.

It’s called New Dimensions in Testimony,

and it allows you to have
interactive conversations

with a hologram
of a real Holocaust survivor.

(Video) Man: How did you
survive the Holocaust?

(Video) Hologram: How did I survive?

I survived,

I believe,

because providence watched over me.

SS: Turns out these answers
were prerecorded in a studio.

Yet the effect is astounding.

You feel so connected to his story
and to him as a person.

I think there’s something special
about human interaction

that makes it much more profound

and personal

than what books or lectures
or movies could ever teach us.

So I saw this and began to wonder,

can we create a model
like this for anyone?

A model that looks, talks
and acts just like them?

So I set out to see if this could be done

and eventually came up with a new solution

that can build a model of a person
using nothing but these:

existing photos and videos of a person.

If you can leverage
this kind of passive information,

just photos and video that are out there,

that’s the key to scaling to anyone.

By the way, here’s Richard Feynman,

who in addition to being
a Nobel Prize winner in physics

was also known as a legendary teacher.

Wouldn’t it be great
if we could bring him back

to give his lectures
and inspire millions of kids,

perhaps not just in English
but in any language?

Or if you could ask our grandparents
for advice and hear those comforting words

even if they’re no longer with us?

Or maybe using this tool,
book authors, alive or not,

could read aloud all of their books
for anyone interested.

The creative possibilities
here are endless,

and to me, that’s very exciting.

And here’s how it’s working so far.

First, we introduce a new technique

that can reconstruct a highly detailed
3D face model from any image

without ever 3D-scanning the person.

And here’s the same output model
from different views.

This also works on videos,

by running the same algorithm
on each video frame

and generating a moving 3D model.

And here’s the same
output model from different angles.

It turns out this problem
is very challenging,

but the key trick
is that we are going to analyze

a large photo collection
of the person beforehand.

For George W. Bush,
we can just search on Google,

and from that, we are able
to build an average model,

and iteratively refine the model
to recover the expression

in fine details,
like creases and wrinkles.
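
The "average model, then refine" idea can be illustrated with a small Python sketch. This is not the reconstruction algorithm from the talk: the `Shape` type (a flat list of landmark coordinates), the function names, and the simple move-toward-the-observation update are all assumptions chosen only to show the two stages.

```python
from typing import List

# Hypothetical representation: a face as a flat list of landmark coordinates.
Shape = List[float]


def average_shape(shapes: List[Shape]) -> Shape:
    """Average many per-photo shapes into one person-specific starting model."""
    n = len(shapes)
    return [sum(s[i] for s in shapes) / n for i in range(len(shapes[0]))]


def refine(start: Shape, observed: Shape, step: float = 0.5, iters: int = 20) -> Shape:
    """Iteratively pull the model toward the landmarks observed in one photo.

    Each iteration closes a fraction `step` of the remaining gap, which is
    gradient descent on the squared error between model and observation.
    """
    model = list(start)
    for _ in range(iters):
        model = [m + step * (o - m) for m, o in zip(model, observed)]
    return model


# Two toy "photos" of the same person, then a refinement toward a third.
photos = [[0.0, 2.0], [2.0, 4.0]]
base = average_shape(photos)
fitted = refine(base, observed=[1.5, 2.0])
```

The average gives a stable starting point built from the whole collection, and the refinement loop adapts it to the expression in one particular image.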

What’s fascinating about this

is that the photo collection
can come from your typical photos.

It doesn’t really matter
what expression you’re making

or where you took those photos.

What matters is
that there are a lot of them.

And we are still missing color here,

so next, we develop
a new blending technique

that improves upon
a simple averaging method

and produces sharp
facial textures and colors.
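
To see why plain averaging tends to blur and a selective blend can stay sharp, here is a hedged Python sketch. The talk does not describe the blending technique itself, so the distance-based weighting, the `sharpness` parameter, and the function names below are assumptions made for illustration.

```python
import math
from typing import List

Texture = List[float]  # hypothetical: pixel intensities of an aligned face texture


def plain_average(textures: List[Texture]) -> Texture:
    """Give every photo equal weight; mixing different expressions blurs detail."""
    n = len(textures)
    return [sum(t[i] for t in textures) / n for i in range(len(textures[0]))]


def weighted_blend(textures: List[Texture], distances: List[float],
                   sharpness: float = 4.0) -> Texture:
    """Weight each photo by how close its expression is to the target.

    `distances[k]` is an assumed expression distance for photo k; smaller
    distances get exponentially larger weights.
    """
    weights = [math.exp(-sharpness * d) for d in distances]
    total = sum(weights)
    return [
        sum(w * t[i] for w, t in zip(weights, textures)) / total
        for i in range(len(textures[0]))
    ]


# One photo matches the target expression (distance 0), the other does not.
textures = [[0.0, 0.0], [1.0, 1.0]]
blurred = plain_average(textures)
sharp = weighted_blend(textures, distances=[0.0, 10.0])
```

With these toy inputs the plain average lands halfway between the two photos, while the weighted blend stays close to the photo that matches the target expression.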

And this can be done for any expression.

Now we have control
of a model of a person,

and the way it’s controlled now
is by a sequence of static photos.

Notice how the wrinkles come and go,
depending on the expression.

We can also use a video
to drive the model.

(Video) Daniel Craig: Right, but somehow,

we’ve managed to attract
some more amazing people.

SS: And here’s another fun demo.

So what you see here
are controllable models

of people I built
from their internet photos.

Now, if you transfer
the motion from the input video,

we can actually drive the entire party.

(Video) George W. Bush:
It’s a difficult bill to pass,

because there’s a lot of moving parts,

and the legislative processes can be ugly.

(Applause)

SS: So coming back a little bit,

our ultimate goal, rather,
is to capture their mannerisms

or the unique way each
of these people talks and smiles.

So to do that, can we
actually teach the computer

to imitate the way someone talks

by only showing it
video footage of the person?

And what I did exactly was,
I let a computer watch

14 hours of pure Barack Obama
giving addresses.

And here’s what we can produce
given only his audio.

(Video) BO: The results are clear.

America’s businesses have created
14.5 million new jobs

over 75 straight months.

SS: So what’s being synthesized here
is only the mouth region,

and here’s how we do it.

Our pipeline uses a neural network

to convert input audio
into these mouth points.
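
A rough Python sketch of the audio-to-mouth-points step follows. The talk only says that a neural network is used, so the simple recurrent cell, the layer sizes, the 13 audio features per frame, and the 18 mouth points are assumptions; the weights are random and untrained, so the sketch only shows the shape of the computation.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def make_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real model would learn these from video footage."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyRecurrentNet:
    """Maps a sequence of audio feature vectors to 2D mouth points per frame."""

    def __init__(self, n_audio: int, n_hidden: int, n_points: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_in = make_matrix(n_hidden, n_audio, rng)
        self.w_rec = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(2 * n_points, n_hidden, rng)

    def forward(self, audio_frames: List[List[float]]) -> List[List[Tuple[float, float]]]:
        hidden = [0.0] * self.n_hidden
        outputs = []
        for features in audio_frames:
            # The hidden state carries context from earlier audio frames.
            a = matvec(self.w_in, features)
            b = matvec(self.w_rec, hidden)
            hidden = [math.tanh(x + y) for x, y in zip(a, b)]
            flat = matvec(self.w_out, hidden)
            # Pair up consecutive outputs as (x, y) coordinates.
            outputs.append([(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)])
        return outputs


net = TinyRecurrentNet(n_audio=13, n_hidden=8, n_points=18)
mouth_points = net.forward([[0.1] * 13 for _ in range(5)])
```

A recurrent cell is used here because mouth shape at one moment depends on the sounds just before it; other sequence models would fit the same interface.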

(Video) BO: We get it through our job
or through Medicare or Medicaid.

SS: Then we synthesize the texture,
enhance details and teeth,

and blend it into the head
and background from a source video.
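
The final compositing step can be sketched as ordinary per-pixel alpha blending. This is an assumption about how such a blend might look, not the talk's actual method: the `mask` values, the grayscale image layout, and the function name are hypothetical.

```python
from typing import List

Image = List[List[float]]  # hypothetical: grayscale pixels as rows of floats


def blend_into_frame(source: Image, synthesized: Image, mask: Image) -> Image:
    """Composite a synthesized mouth region over a source video frame.

    `mask` holds values in [0, 1]: 1 keeps the synthesized pixel, 0 keeps the
    source pixel, and values in between feather the seam.
    """
    return [
        [m * syn + (1.0 - m) * src for src, syn, m in zip(src_row, syn_row, mask_row)]
        for src_row, syn_row, mask_row in zip(source, synthesized, mask)
    ]


source_frame = [[10.0, 10.0, 10.0]]
mouth_patch = [[20.0, 20.0, 20.0]]
soft_mask = [[0.0, 0.5, 1.0]]
composited = blend_into_frame(source_frame, mouth_patch, soft_mask)
```

A soft mask with intermediate values at the boundary avoids a visible edge between the synthesized mouth and the original head.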

(Video) BO: Women can get free checkups,

and you can’t get charged more
just for being a woman.

Young people can stay
on a parent’s plan until they turn 26.

SS: I think these results
seem very realistic and intriguing,

but at the same time
frightening, even to me.

Our goal was to build an accurate model
of a person, not to misrepresent them.

But one thing that concerns me
is its potential for misuse.

People have been thinking
about this problem for a long time,

since the days when Photoshop
first hit the market.

As a researcher, I’m also working
on countermeasure technology,

and I’m part of an ongoing
effort at AI Foundation,

which uses a combination
of machine learning and human moderators

to detect fake images and videos,

fighting against my own work.
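
One way to picture combining machine learning with human moderators is a score-based triage. The sketch below is hypothetical: the thresholds, the labels, and the idea that a classifier emits a single fake-probability score are assumptions, not a description of how AI Foundation's system works.

```python
def triage(fake_score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Route content based on an assumed classifier score in [0, 1].

    Confident predictions are handled automatically; uncertain ones go to
    a human moderator.
    """
    if fake_score >= high:
        return "flag"
    if fake_score <= low:
        return "pass"
    return "human_review"


decisions = [triage(s) for s in (0.95, 0.05, 0.5)]
```

The thresholds trade moderator workload against the risk of automatic mistakes: widening the middle band sends more items to people.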

And one of the tools we plan to release
is called Reality Defender,

which is a web-browser plug-in
that can flag potentially fake content

automatically, right in the browser.

(Applause)

Despite all this, though,

fake videos could do a lot of damage,

even before anyone has a chance to verify,

so it’s very important
that we make everyone aware

of what’s currently possible

so we can have the right assumptions
and be critical about what we see.

There’s still a long way to go before
we can fully model individual people

and before we can ensure
the safety of this technology.

But I’m excited and hopeful,

because if we use it right and carefully,

this tool can allow any individual’s
positive impact on the world

to be massively scaled

and really help shape our future
the way we want it to be.

Thank you.

(Applause)
