The era of blind faith in big data must end
Cathy O’Neil

Algorithms are everywhere.

They sort and separate
the winners from the losers.

The winners get the job

or a good credit card offer.

The losers don’t even get an interview

or they pay more for insurance.

We’re being scored with secret formulas
that we don’t understand

that often don’t have systems of appeal.

That raises the question:

What if the algorithms are wrong?

To build an algorithm you need two things:

you need data, what happened in the past,

and a definition of success,

the thing you’re looking for
and often hoping for.

You train an algorithm
by looking at the past, figuring things out.

The algorithm figures out
what is associated with success.

What situation leads to success?
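
As a sketch of those two ingredients, here is a toy version in Python. Every feature name, record, and counting rule in it is invented; it only shows how a chosen definition of success turns past data into learned associations.

```python
# Toy sketch of the two ingredients of an algorithm.
# All feature names and records here are invented.

# Data: what happened in the past.
past_records = [
    {"features": {"degree": 1, "referral": 0}, "outcome": "stayed and was promoted"},
    {"features": {"degree": 0, "referral": 1}, "outcome": "left after a year"},
    {"features": {"degree": 1, "referral": 1}, "outcome": "stayed and was promoted"},
]

# Definition of success: the builder's opinion about what counts.
def is_success(outcome):
    return outcome == "stayed and was promoted"

# "Training" here is just counting which feature values co-occur with success.
def train(records):
    counts = {}
    for record in records:
        succeeded = is_success(record["outcome"])
        for name, value in record["features"].items():
            hits, total = counts.get((name, value), (0, 0))
            counts[(name, value)] = (hits + int(succeeded), total + 1)
    return {key: hits / total for key, (hits, total) in counts.items()}

print(train(past_records))
# {('degree', 1): 1.0, ('referral', 0): 1.0, ('degree', 0): 0.0, ('referral', 1): 0.5}
```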

Actually, everyone uses algorithms.

They just don’t formalize them
in written code.

Let me give you an example.

I use an algorithm every day
to make a meal for my family.

The data I use

is the ingredients in my kitchen,

the time I have,

the ambition I have,

and I curate that data.

I don’t count those little packages
of ramen noodles as food.

(Laughter)

My definition of success is:

a meal is successful
if my kids eat vegetables.

It’s very different
from if my youngest son were in charge.

He’d say success is if
he gets to eat lots of Nutella.

But I get to choose success.

I am in charge. My opinion matters.

That’s the first rule of algorithms.

Algorithms are opinions embedded in code.
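
To make that rule concrete, here is a toy version of the meal algorithm in Python; the pantry contents and the vegetable list are made up. The point is that both the data curation and the success test are opinions written straight into the code.

```python
# Toy version of the meal "algorithm". Ingredients and rules are invented.

PANTRY = ["broccoli", "pasta", "ramen packet", "carrots", "nutella"]

def curate(ingredients):
    # Opinion #1: little packages of ramen noodles don't count as food.
    return [item for item in ingredients if item != "ramen packet"]

def is_successful(meal):
    # Opinion #2: a meal succeeds if the kids eat vegetables.
    vegetables = {"broccoli", "carrots", "spinach"}
    return any(item in vegetables for item in meal)

def sons_definition(meal):
    # A different opinion gives the "same" algorithm a different goal.
    return "nutella" in meal

meal = curate(PANTRY)
print(meal, "->", "success" if is_successful(meal) else "failure")
print(meal, "->", "success" if sons_definition(meal) else "failure")
```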

That’s very different from what
most people think of algorithms.

They think algorithms are objective
and true and scientific.

That’s a marketing trick.

It’s also a marketing trick

to intimidate you with algorithms,

to make you trust and fear algorithms

because you trust and fear mathematics.

A lot can go wrong when we put
blind faith in big data.

This is Kiri Soares.
She’s a high school principal in Brooklyn.

In 2011, she told me
her teachers were being scored

with a complex, secret algorithm

called the “value-added model.”

I told her, “Well, figure out
what the formula is, show it to me.

I’m going to explain it to you.”

She said, “Well, I tried
to get the formula,

but my Department of Education contact
told me it was math

and I wouldn’t understand it.”

It gets worse.

The New York Post filed
a Freedom of Information Act request,

got all the teachers' names
and all their scores

and they published them
as an act of teacher-shaming.

When I tried to get the formulas,
the source code, through the same means,

I was told I couldn’t.

I was denied.

I later found out

that nobody in New York City
had access to that formula.

No one understood it.

Then someone really smart
got involved, Gary Rubinstein.

He found 665 teachers
from that New York Post data

that actually had two scores.

That could happen if they were teaching

seventh grade math and eighth grade math.

He decided to plot them.

Each dot represents a teacher.

(Laughter)

What is that?

(Laughter)

That should never have been used
for individual assessment.

It’s almost a random number generator.

(Applause)

But it was.
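
A plot like Gary Rubinstein’s can be summarized with a single correlation. This sketch simulates 665 pairs of unrelated scores to show what “almost a random number generator” looks like numerically; the data is invented, not the real Post data.

```python
# Sketch: if the same teachers get two scores, the scores should agree.
# These numbers are simulated; a correlation near 0 is the "random number
# generator" pattern, a correlation near 1 is what a usable score would show.
import random

random.seed(0)
score_7th_grade = [random.uniform(0, 100) for _ in range(665)]
score_8th_grade = [random.uniform(0, 100) for _ in range(665)]  # unrelated on purpose

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

print(round(pearson(score_7th_grade, score_8th_grade), 3))  # close to 0
```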

This is Sarah Wysocki.

She got fired, along
with 205 other teachers,

from the Washington, DC school district,

even though she had great
recommendations from her principal

and the parents of her kids.

I know what a lot
of you guys are thinking,

especially the data scientists,
the AI experts here.

You’re thinking, “Well, I would never make
an algorithm that inconsistent.”

But algorithms can go wrong,

even have deeply destructive effects,
despite good intentions.

And whereas an airplane
that’s designed badly

crashes to the earth and everyone sees it,

an algorithm designed badly

can go on for a long time,
silently wreaking havoc.

This is Roger Ailes.

(Laughter)

He founded Fox News in 1996.

More than 20 women complained
about sexual harassment.

They said they weren’t allowed
to succeed at Fox News.

He was ousted last year,
but we’ve seen recently

that the problems have persisted.

That raises the question:

What should Fox News do
to turn over another leaf?

Well, what if they replaced
their hiring process

with a machine-learning algorithm?

That sounds good, right?

Think about it.

The data, what would the data be?

A reasonable choice would be the last
21 years of applications to Fox News.

Reasonable.

What about the definition of success?

Reasonable choice would be,

well, who is successful at Fox News?

I guess someone who, say,
stayed there for four years

and was promoted at least once.

Sounds reasonable.

And then the algorithm would be trained.

It would be trained on those applications
to learn what led to success,

what kind of applications
historically led to success

by that definition.

Now think about what would happen

if we applied that
to a current pool of applicants.

It would filter out women

because they do not look like people
who were successful in the past.
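
Here is a minimal sketch of that failure mode; the numbers are invented. A model scored purely on who was labeled successful in the past hands the historical pattern straight back.

```python
# Sketch: training on historical "success" reproduces historical bias.
# The records below are invented for illustration.

historical_hires = [
    # (gender, counted_as_successful)   success = stayed 4 years, promoted once
    ("m", True), ("m", True), ("m", False), ("m", True),
    ("f", False), ("f", True), ("f", False), ("f", False),
]

def past_success_rate(gender):
    outcomes = [ok for g, ok in historical_hires if g == gender]
    return sum(outcomes) / len(outcomes)

# A naive model that scores applicants by how people "like them" fared
# doesn't measure ability; it measures who the old culture promoted.
def score(applicant_gender):
    return past_success_rate(applicant_gender)

print("score for men:  ", score("m"))   # 0.75
print("score for women:", score("f"))   # 0.25
```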

Algorithms don’t make things fair

if you just blithely,
blindly apply algorithms.

They don’t make things fair.

They repeat our past practices,

our patterns.

They automate the status quo.

That would be great
if we had a perfect world,

but we don’t.

And I’ll add that most companies
don’t have embarrassing lawsuits,

but the data scientists in those companies

are told to follow the data,

to focus on accuracy.

Think about what that means.

Because we all have bias,
it means they could be codifying sexism

or any other kind of bigotry.

Thought experiment,

because I like them:

an entirely segregated society –

racially segregated, all towns,
all neighborhoods

and where we send the police
only to the minority neighborhoods

to look for crime.

The arrest data would be very biased.

What if, on top of that,
we found data scientists

and paid them to predict
where the next crime would occur?

Minority neighborhood.

Or to predict who the next
criminal would be?

A minority.

The data scientists would brag
about how great and how accurate

their model would be,

and they’d be right.
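
Here is a sketch of that thought experiment; every number in it is invented. Crime is spread evenly, but arrests only happen where the police are sent, so a model judged against the arrest records looks perfectly accurate while saying nothing about where crime actually occurs.

```python
# Sketch of the thought experiment. All data is invented.

neighborhoods = ["minority"] * 50 + ["white"] * 50
true_crime = [i % 10 == 0 for i in range(100)]           # same 10% rate everywhere
policed = [n == "minority" for n in neighborhoods]       # patrols sent to one place only

# The training labels: a crime is only recorded where someone was looking.
recorded_crime_locations = [
    n for n, crime, patrol in zip(neighborhoods, true_crime, policed) if crime and patrol
]

# The "model": predict that the next recorded crime is in a minority neighborhood.
predictions = ["minority"] * len(recorded_crime_locations)
accuracy = sum(
    p == actual for p, actual in zip(predictions, recorded_crime_locations)
) / len(recorded_crime_locations)

print("accuracy against the arrest records:", accuracy)  # 1.0, and still misleading
```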

Now, reality isn’t that drastic,
but we do have severe segregation

in many cities and towns,

and we have plenty of evidence

of biased policing
and justice system data.

And we actually do predict hotspots,

places where crimes will occur.

And we do predict, in fact,
the individual criminality,

the criminality of individuals.

The news organization ProPublica
recently looked into

one of those “recidivism risk” algorithms,

as they’re called,

being used in Florida
during sentencing by judges.

Bernard, on the left, the black man,
was scored a 10 out of 10.

Dylan, on the right, 3 out of 10.

10 out of 10, high risk.
3 out of 10, low risk.

They were both brought in
for drug possession.

They both had records,

but Dylan had a felony

and Bernard didn’t.

This matters, because
the higher your score,

the more likely you are to be given
a longer sentence.

What’s going on?

Data laundering.

It’s a process by which
technologists hide ugly truths

inside black box algorithms

and call them objective;

call them meritocratic.

When they’re secret,
important and destructive,

I’ve coined a term for these algorithms:

“weapons of math destruction.”

(Laughter)

(Applause)

They’re everywhere,
and it’s not a mistake.

These are private companies
building private algorithms

for private ends.

Even the ones I talked about,
for teachers and public policing,

those were built by private companies

and sold to the government institutions.

They call it their “secret sauce” –

that’s why they can’t tell us about it.

It’s also private power.

They are profiting from wielding
the authority of the inscrutable.

Now you might think,
since all this stuff is private

and there’s competition,

maybe the free market
will solve this problem.

It won’t.

There’s a lot of money
to be made in unfairness.

Also, we’re not rational economic agents.

We all are biased.

We’re all racist and bigoted
in ways that we wish we weren’t,

in ways that we don’t even know.

We know this, though, in aggregate,

because sociologists
have consistently demonstrated this

with these experiments they build,

where they send out a bunch
of job applications,

equally qualified but some
have white-sounding names

and some have black-sounding names,

and it’s always disappointing,
the results – always.

So we are the ones that are biased,

and we are injecting those biases
into the algorithms

by choosing what data to collect,

like I chose not to think
about ramen noodles –

I decided it was irrelevant.

But by trusting the data that’s actually
picking up on past practices

and by choosing the definition of success,

how can we expect the algorithms
to emerge unscathed?

We can’t. We have to check them.

We have to check them for fairness.

The good news is,
we can check them for fairness.

Algorithms can be interrogated,

and they will tell us
the truth every time.

And we can fix them.
We can make them better.

I call this an algorithmic audit,

and I’ll walk you through it.

First, data integrity check.

For the recidivism risk
algorithm I talked about,

a data integrity check would mean
we’d have to come to terms with the fact

that in the US, whites and blacks
smoke pot at the same rate

but blacks are far more likely
to be arrested –

four or five times more likely,
depending on the area.

What is that bias looking like
in other crime categories,

and how do we account for it?
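
A data integrity check can be as simple as comparing the behavior rate with the arrest rate. This sketch uses placeholder counts chosen to mirror the roughly equal usage and the four-to-five-times arrest gap mentioned above.

```python
# Sketch of a data integrity check. The counts are illustrative placeholders.

population = {"white": 1000, "black": 1000}
pot_users  = {"white": 130,  "black": 130}   # roughly equal usage rates
arrests    = {"white": 10,   "black": 45}    # what the dataset actually records

for group in population:
    use_rate = pot_users[group] / population[group]
    arrests_per_user = arrests[group] / pot_users[group]
    print(group, "use rate:", use_rate, " arrests per user:", round(arrests_per_user, 3))

disparity = (arrests["black"] / pot_users["black"]) / (arrests["white"] / pot_users["white"])
print("arrest disparity for the same behavior:", round(disparity, 1), "x")  # ~4.5x
```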

Second, we should think about
the definition of success,

audit that.

Remember – with the hiring
algorithm? We talked about it.

Someone who stays for four years
and is promoted once?

Well, that is a successful employee,

but it’s also an employee
that is supported by their culture.

That said, it can also be quite biased.

We need to separate those two things.

We should look to
the blind orchestra audition

as an example.

That’s where the people auditioning
are behind a sheet.

What I want to think about there

is the people who are listening
have decided what’s important

and they’ve decided what’s not important,

and they’re not getting
distracted by the rest.

When the blind orchestra
auditions started,

the number of women in orchestras
went up by a factor of five.
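
In code, the sheet amounts to deciding which attributes the scoring ever sees. A minimal sketch, with made-up candidates and criteria:

```python
# Sketch: the evaluators choose the relevant attributes up front;
# the scoring function never sees anything else. Names are invented.

candidates = [
    {"name": "A", "gender": "f", "tone": 9, "timing": 8},
    {"name": "B", "gender": "m", "tone": 7, "timing": 9},
]

RELEVANT = ("tone", "timing")   # decided before anyone auditions

def behind_the_sheet(candidate):
    return {key: candidate[key] for key in RELEVANT}

def score(audition):
    return sum(audition.values())

ranked = sorted(candidates, key=lambda c: score(behind_the_sheet(c)), reverse=True)
print([c["name"] for c in ranked])  # ['A', 'B']
```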

Next, we have to consider accuracy.

This is where the value-added model
for teachers would fail immediately.

No algorithm is perfect, of course,

so we have to consider
the errors of every algorithm.

How often are there errors,
and for whom does this model fail?

What is the cost of that failure?
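
Those three questions can be asked of any model by breaking its errors out per group rather than reporting one overall accuracy. A toy sketch with an invented handful of cases:

```python
# Sketch: error rates per group, not just overall accuracy. Data is invented.

# (group, model_said_high_risk, actually_reoffended)
cases = [
    ("A", True,  False), ("A", True,  True), ("A", True,  False), ("A", False, False),
    ("B", False, False), ("B", False, True), ("B", True,  True),  ("B", False, False),
]

def false_positive_rate(group):
    # Among people who did not reoffend, how many were labeled high risk?
    did_not_reoffend = [pred for g, pred, actual in cases if g == group and not actual]
    return sum(did_not_reoffend) / len(did_not_reoffend)

for group in ("A", "B"):
    print(group, "false positive rate:", round(false_positive_rate(group), 2))
# A 0.67, B 0.0: the same "accurate" model can be far costlier for one group.
```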

And finally, we have to consider

the long-term effects of algorithms,

the feedback loops that they engender.

That sounds abstract,

but imagine if Facebook engineers
had considered that

before they decided to show us
only things that our friends had posted.
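
Here is one way to see a feedback loop in miniature; the categories and numbers are invented. Whatever the ranking shows is the only thing that can earn clicks, so the system’s early choice feeds on itself.

```python
# Sketch of a feedback loop: the system only learns from what it chose to show.
# Categories and click counts are invented.

clicks = {"friends": 10, "news": 10, "other": 10}   # true interest starts even

for day in range(5):
    shown = max(clicks, key=clicks.get)   # show whatever has the most clicks so far
    clicks[shown] += 5                    # only what is shown can be clicked

print(clicks)   # one category runs away with it, regardless of true interest
```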

I have two more messages,
one for the data scientists out there.

Data scientists: we should
not be the arbiters of truth.

We should be translators
of ethical discussions that happen

in larger society.

(Applause)

And the rest of you,

the non-data scientists:

this is not a math test.

This is a political fight.

We need to demand accountability
for our algorithmic overlords.

(Applause)

The era of blind faith
in big data must end.

Thank you very much.

(Applause)
