How we can protect truth in the age of misinformation
Sinan Aral

Translator: Ivana Korom
Reviewer: Krystian Aparta

So, on April 23 of 2013,

the Associated Press
put out the following tweet on Twitter.

It said, “Breaking news:

Two explosions at the White House

and Barack Obama has been injured.”

This tweet was retweeted 4,000 times
in less than five minutes,

and it went viral thereafter.

Now, this tweet wasn’t real news
put out by the Associated Press.

In fact it was false news, or fake news,

that was propagated by Syrian hackers

that had infiltrated
the Associated Press Twitter handle.

Their purpose was to disrupt society,
but they disrupted much more.

Because automated trading algorithms

immediately seized
on the sentiment on this tweet,

and began trading based on the potential

that the president of the United States
had been injured or killed

in this explosion.

And as they started trading,

they immediately sent
the stock market crashing,

wiping out 140 billion dollars
in equity value in a single day.

Robert Mueller, special counsel
prosecutor in the United States,

issued indictments
against three Russian companies

and 13 Russian individuals

for conspiracy to defraud
the United States

by meddling in the 2016
presidential election.

And the story this indictment tells

is the story of the Internet
Research Agency,

the shadowy arm of the Kremlin
on social media.

During the presidential election alone,

the Internet Research Agency’s efforts

reached 126 million people
on Facebook in the United States,

issued three million individual tweets

and 43 hours' worth of YouTube content.

All of which was fake –

misinformation designed to sow discord
in the US presidential election.

A recent study by Oxford University

showed that in the recent
Swedish elections,

one third of all of the information
spreading on social media

about the election

was fake or misinformation.

In addition, these types
of social-media misinformation campaigns

can spread what has been called
“genocidal propaganda,”

for instance against
the Rohingya in Burma,

or trigger mob killings in India.

We studied fake news

and began studying it
before it was a popular term.

And we recently published
the largest-ever longitudinal study

of the spread of fake news online

on the cover of “Science”
in March of this year.

We studied all of the verified
true and false news stories

that ever spread on Twitter,

from its inception in 2006 to 2017.

And when we studied this information,

we used news stories that had been verified

by six independent
fact-checking organizations.

So we knew which stories were true

and which stories were false.

We can measure their diffusion,

the speed of their diffusion,

the depth and breadth of their diffusion,

how many people become entangled
in this information cascade and so on.
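
To make those measurements concrete, here is a minimal sketch, not the study’s actual code, of how the size, depth and maximum breadth of a single retweet cascade can be computed from parent-child retweet edges; the toy cascade at the bottom is hypothetical.

```python
# A minimal sketch (not the study's code) of computing the size, depth and
# maximum breadth of one retweet cascade from parent-child retweet edges.
from collections import defaultdict, deque

def cascade_metrics(edges, root):
    """edges: list of (parent_user, child_user) retweet pairs; root: original poster."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    size = 1                           # unique users reached, including the root
    depth = 0                          # longest retweet chain from the root
    breadth_per_level = {0: 1}         # users at each hop from the root

    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        for child in children[node]:
            size += 1
            depth = max(depth, level + 1)
            breadth_per_level[level + 1] = breadth_per_level.get(level + 1, 0) + 1
            queue.append((child, level + 1))

    return {"size": size, "depth": depth, "max_breadth": max(breadth_per_level.values())}

# Hypothetical toy cascade: A posts, B and C retweet A, D retweets B.
print(cascade_metrics([("A", "B"), ("A", "C"), ("B", "D")], root="A"))
# -> {'size': 4, 'depth': 2, 'max_breadth': 2}
```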

And what we did in this paper

was we compared the spread of true news
to the spread of false news.

And here’s what we found.

We found that false news
diffused further, faster, deeper

and more broadly than the truth

in every category of information
that we studied,

sometimes by an order of magnitude.

And in fact, false political news
was the most viral.

It diffused further, faster,
deeper and more broadly

than any other type of false news.

When we saw this,

we were at once worried but also curious.

Why?

Why does false news travel
so much further, faster, deeper

and more broadly than the truth?

The first hypothesis
that we came up with was,

“Well, maybe people who spread false news
have more followers or follow more people,

or tweet more often,

or maybe they’re more often ‘verified’
users of Twitter, with more credibility,

or maybe they’ve been on Twitter longer.”

So we checked each one of these in turn.

And what we found
was exactly the opposite.

False-news spreaders had fewer followers,

followed fewer people, were less active,

less often “verified”

and had been on Twitter
for a shorter period of time.

And yet,

false news was 70 percent more likely
to be retweeted than the truth,

controlling for all of these
and many other factors.
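
For readers who want to see what “controlling for these factors” looks like in practice, here is a minimal sketch with entirely simulated data: a logistic regression in which the exponentiated coefficient on falsity gives a retweet odds ratio after adjusting for account characteristics. The library, feature names and effect sizes are my illustrative choices, not the paper’s.

```python
# A minimal sketch, with entirely simulated data, of estimating the retweet
# odds ratio for falsity while controlling for account characteristics.
# The feature names and effect sizes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

is_false = rng.integers(0, 2, n)                 # 1 if the story is false
X = np.column_stack([
    is_false,
    rng.normal(6, 2, n),                         # log follower count
    rng.normal(5, 2, n),                         # log followee count
    rng.normal(7, 2, n),                         # log number of tweets
    rng.integers(0, 2, n),                       # verified flag
    rng.normal(3, 1, n),                         # account age in years
])

# Simulated outcome: was the tweet retweeted? Falsity nudges the odds upward.
logit = -1.0 + 0.5 * is_false + 0.1 * X[:, 1]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, y)
odds_ratio = np.exp(model.coef_[0][0])           # exp of the falsity coefficient
print(f"estimated retweet odds ratio for falsity: {odds_ratio:.2f}")
```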

So we had to come up
with other explanations.

And we devised what we called
a “novelty hypothesis.”

So if you read the literature,

it is well known that human attention
is drawn to novelty,

things that are new in the environment.

And if you read the sociology literature,

you know that we like to share
novel information.

It makes us seem like we have access
to inside information,

and we gain in status
by spreading this kind of information.

So what we did was we measured the novelty
of an incoming true or false tweet,

compared to the corpus
of what that individual had seen

in the 60 days prior on Twitter.
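
As an illustration only, and not the paper’s exact pipeline, here is one simple way to operationalize that kind of novelty score: compare the word distribution of an incoming tweet with the user’s prior 60-day corpus using a smoothed KL divergence. The example texts are hypothetical.

```python
# A minimal sketch (not the paper's pipeline) of scoring the novelty of an
# incoming tweet against a user's prior 60-day corpus: compare smoothed word
# distributions with a KL divergence. The example texts are hypothetical.
import math
from collections import Counter

def word_dist(texts):
    counts = Counter(word for text in texts for word in text.lower().split())
    return counts, sum(counts.values())

def novelty(incoming, prior_texts):
    """KL(incoming || prior): higher means the tweet looks less like what the user has seen."""
    inc_counts, inc_total = word_dist([incoming])
    prior_counts, prior_total = word_dist(prior_texts)
    vocab = set(inc_counts) | set(prior_counts)
    kl = 0.0
    for word in vocab:
        p = (inc_counts[word] + 1) / (inc_total + len(vocab))      # smoothed incoming dist
        q = (prior_counts[word] + 1) / (prior_total + len(vocab))  # smoothed prior dist
        kl += p * math.log(p / q)
    return kl

prior = ["the market opened higher today", "earnings season starts this week"]
print(novelty("earnings rose again this week", prior))              # lower: familiar content
print(novelty("explosions reported at the white house", prior))     # higher: novel content
```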

But that wasn’t enough,
because we thought to ourselves,

“Well, maybe false news is more novel
in an information-theoretic sense,

but maybe people
don’t perceive it as more novel.”

So to understand people’s
perceptions of false news,

we looked at the information
and the sentiment

contained in the replies
to true and false tweets.

And what we found

was that across a bunch
of different measures of sentiment –

surprise, disgust, fear, sadness,

anticipation, joy and trust –

false news exhibited significantly more
surprise and disgust

in the replies to false tweets.

And true news exhibited
significantly more anticipation,

joy and trust

in reply to true tweets.
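
As a rough illustration of how reply sentiment can be scored, here is a minimal sketch that counts matches against an emotion lexicon. The tiny lexicon below is invented for the example; the study relied on an established word-to-emotion lexicon covering these categories.

```python
# A minimal sketch of scoring replies against an emotion lexicon. The tiny
# lexicon below is invented for illustration; the study used an established
# word-to-emotion lexicon covering categories like surprise, disgust and trust.
from collections import Counter

EMOTION_LEXICON = {
    "unbelievable": "surprise",
    "shocking": "surprise",
    "disgusting": "disgust",
    "hopeful": "anticipation",
    "wonderful": "joy",
    "reliable": "trust",
}

def emotion_profile(replies):
    """Fraction of lexicon hits per emotion category across a set of replies."""
    hits = Counter()
    for reply in replies:
        for word in reply.lower().split():
            word = word.strip(".,!?")
            if word in EMOTION_LEXICON:
                hits[EMOTION_LEXICON[word]] += 1
    total = sum(hits.values()) or 1
    return {emotion: count / total for emotion, count in hits.items()}

print(emotion_profile(["Unbelievable, just shocking.", "This is disgusting!"]))
# -> roughly {'surprise': 0.67, 'disgust': 0.33}
```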

The surprise corroborates
our novelty hypothesis.

This is new and surprising,
and so we’re more likely to share it.

At the same time,
there was congressional testimony

in front of both houses of Congress
in the United States,

looking at the role of bots
in the spread of misinformation.

So we looked at this too –

we used multiple sophisticated
bot-detection algorithms

to find the bots in our data
and to pull them out.

So we pulled them out,
we put them back in

and we compared what happened
to our measurements.
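
Here is a minimal sketch, on hypothetical cascades, of that kind of robustness check: compute the false-versus-true spread measurements with bot-attributed activity included, remove it, and compare. The records and the bot flags are made up for illustration.

```python
# A minimal sketch, on hypothetical cascades, of the bot robustness check:
# compare the average spread rate of false vs. true news with and without
# the cascades attributed to bot accounts.
def mean_speed(cascades):
    """Average retweets per hour over (retweet_count, hours_to_reach_them) pairs."""
    return sum(retweets / hours for retweets, hours in cascades) / len(cascades)

def false_vs_true_speed(records, drop_bots):
    kept = [r for r in records if not (drop_bots and r["bot_driven"])]
    false_c = [(r["retweets"], r["hours"]) for r in kept if r["false"]]
    true_c = [(r["retweets"], r["hours"]) for r in kept if not r["false"]]
    return mean_speed(false_c), mean_speed(true_c)

records = [  # hypothetical cascades; "bot_driven" crudely stands in for bot detection
    {"false": True,  "bot_driven": False, "retweets": 900, "hours": 3},
    {"false": True,  "bot_driven": True,  "retweets": 400, "hours": 1},
    {"false": False, "bot_driven": False, "retweets": 300, "hours": 3},
    {"false": False, "bot_driven": True,  "retweets": 150, "hours": 1},
]

print("with bots:   ", false_vs_true_speed(records, drop_bots=False))
print("without bots:", false_vs_true_speed(records, drop_bots=True))
# In both cases the false/true gap remains, which is the point of the check.
```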

And what we found was that, yes indeed,

bots were accelerating
the spread of false news online,

but they were accelerating
the spread of true news

at approximately the same rate.

Which means bots are not responsible

for the differential diffusion
of truth and falsity online.

We can’t abdicate that responsibility,

because we, humans,
are responsible for that spread.

Now, everything
that I have told you so far,

unfortunately for all of us,

is the good news.

The reason is because
it’s about to get a whole lot worse.

And two specific technologies
are going to make it worse.

We are going to see the rise
of a tremendous wave of synthetic media.

Fake video and fake audio
that are very convincing to the human eye and ear.

And this will be powered by two technologies.

The first of these is known
as “generative adversarial networks.”

This is a machine-learning model
with two networks:

a discriminator,

whose job it is to determine
whether something is true or false,

and a generator,

whose job it is to generate
synthetic media.

So the synthetic generator
generates synthetic video or audio,

and the discriminator tries to tell,
“Is this real or is this fake?”

And in fact, it is the job
of the generator

to maximize the likelihood
that it will fool the discriminator

into thinking the synthetic
video and audio that it is creating

is actually true.
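
For the technically inclined, here is a minimal sketch of that adversarial setup in PyTorch. A one-dimensional Gaussian stands in for real video or audio, and every architecture and training choice here is an illustrative assumption rather than any production deepfake system.

```python
# A minimal sketch of a generative adversarial network in PyTorch. A 1-D
# Gaussian stands in for "real" media; every architecture and training choice
# here is an illustrative assumption, not a production deepfake system.
import torch
import torch.nn as nn

def real_data(n):
    return torch.randn(n, 1) * 0.5 + 2.0          # stand-in "real" distribution

def noise(n):
    return torch.randn(n, 8)                      # random input to the generator

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                 # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Discriminator: label real samples 1 and generated samples 0.
    real, fake = real_data(64), G(noise(64)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator call its fakes "real" (label 1).
    fake = G(noise(64))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

print("mean of generated samples:", G(noise(1000)).mean().item())  # should drift toward 2.0
```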

Imagine a machine in a hyperloop,

trying to get better
and better at fooling us.

This, combined with the second technology,

which is essentially the democratization
of artificial intelligence to the people,

the ability for anyone,

without any background
in artificial intelligence

or machine learning,

to deploy these kinds of algorithms
to generate synthetic media,

makes it ultimately so much easier
to create videos.

The White House issued
a false, doctored video

of a journalist interacting with an intern
who was trying to take his microphone.

They removed frames from this video

in order to make his actions
seem more punchy.

And when videographers
and stuntmen and women

were interviewed
about this type of technique,

they said, “Yes, we use this
in the movies all the time

to make our punches and kicks
look more choppy and more aggressive.”

They then put out this video

and partly used it as justification

to revoke the White House press pass
of the reporter, Jim Acosta.

And CNN had to sue
to have that press pass reinstated.

There are about five different paths
that I can think of that we can follow

to try and address some
of these very difficult problems today.

Each one of them has promise,

but each one of them
has its own challenges.

The first one is labeling.

Think about it this way:

when you go to the grocery store
to buy food to consume,

it’s extensively labeled.

You know how many calories it has,

how much fat it contains –

and yet when we consume information,
we have no labels whatsoever.

What is contained in this information?

Is the source credible?

Where is this information gathered from?

We have none of that information

when we are consuming information.

That is a potential avenue,
but it comes with its challenges.

For instance, who gets to decide,
in society, what’s true and what’s false?

Is it the governments?

Is it Facebook?

Is it an independent
consortium of fact-checkers?

And who’s checking the fact-checkers?

Another potential avenue is incentives.

We know that during
the US presidential election

there was a wave of misinformation
that came from Macedonia

that didn’t have any political motive

but instead had an economic motive.

And this economic motive existed,

because false news travels
so much farther, faster

and more deeply than the truth,

and you can earn advertising dollars
as you garner eyeballs and attention

with this type of information.

But if we can depress the spread
of this information,

perhaps it would reduce
the economic incentive

to produce it at all in the first place.

Third, we can think about regulation,

and certainly, we should think
about this option.

In the United States, currently,

we are exploring what might happen
if Facebook and others are regulated.

While we should consider things
like regulating political speech,

labeling the fact
that it’s political speech,

making sure foreign actors
can’t fund political speech,

it also has its own dangers.

For instance, Malaysia just instituted
a six-year prison sentence

for anyone found spreading misinformation.

And in authoritarian regimes,

these kinds of policies can be used
to suppress minority opinions

and to continue to extend repression.

The fourth possible option
is transparency.

We want to know
how Facebook’s algorithms work.

How does the data
combine with the algorithms

to produce the outcomes that we see?

We want them to open the kimono

and show us exactly the inner workings
of how Facebook is working.

And if we want to know
social media’s effect on society,

we need scientists, researchers

and others to have access
to this kind of information.

But at the same time,

we are asking Facebook
to lock everything down,

to keep all of the data secure.

So, Facebook and the other
social media platforms

are facing what I call
a transparency paradox.

We are asking them, at the same time,

to be open and transparent
and, simultaneously, secure.

This is a very difficult needle to thread,

but they will need to thread this needle

if we are to achieve the promise
of social technologies

while avoiding their peril.

The final thing that we could think about
is algorithms and machine learning.

Technology devised to root out
and understand fake news, how it spreads,

and to try and dampen its flow.

Humans have to be in the loop
of this technology,

because we can never escape

that underlying any technological
solution or approach

is a fundamental ethical
and philosophical question

about how we define truth and falsity,

to whom we give the power
to define truth and falsity,

which opinions are legitimate,

which type of speech
should be allowed and so on.

Technology is not a solution for that.

Ethics and philosophy
are the solution for that.

Nearly every theory
of human decision making,

human cooperation and human coordination

has some sense of the truth at its core.

But with the rise of fake news,

the rise of fake video,

the rise of fake audio,

we are teetering on the brink
of the end of reality,

where we cannot tell
what is real from what is fake.

And that’s potentially
incredibly dangerous.

We have to be vigilant
in defending the truth

against misinformation.

With our technologies, with our policies

and, perhaps most importantly,

with our own individual responsibilities,

decisions, behaviors and actions.

Thank you very much.

(Applause)
