Bias in Tech Algorithms

Transcriber: Gm Choi
Reviewer: Lucas Kaimaras

It’s that time of week again,
Friday night, pandemic style.

Around this time last year,
like most of you,

I decided to settle in for a cozy night
on the couch with the TV remote

instead of any wild Friday night
adventures with COVID hanging around.

As I browsed through Netflix,

I passed the movie I had watched yesterday,

a cheesy rom-com recommended by a friend,

completely not my style.

I quickly moved past it and landed
on the Recommended For You category.

I eagerly began browsing through this one,

since this is where I found
most of the good movies I ended up liking.

Suddenly, though, I sat bolt upright.

Since yesterday, almost all of the movies
in my Recommended For You

were now rom-coms,
each one cheesier than the last.

I’m sure I’m not the only one who feels

like Netflix seems to know me
better than myself sometimes,

but this scenario got me thinking.

How did Netflix know
what I watched yesterday

and how were they able to recommend movies
similar to that one for me to watch today?

To answer that question,

we need to think about
what an algorithm is.

This is a pretty popular buzzword
many of you might’ve heard before,

but what actually is an algorithm?

An algorithm is simply
a list of steps to solve a problem.

In the technology world, algorithms
consist of computer-implementable commands

that allow you to perform computations.
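
To make that concrete, here is a toy example written in Python purely for illustration, not something from the talk itself: a short list of steps for finding the largest number in a list.

```python
# A toy "list of steps": find the largest number in a list.
def find_largest(numbers):
    largest = numbers[0]       # Step 1: start with the first number.
    for n in numbers[1:]:      # Step 2: look at each remaining number.
        if n > largest:        # Step 3: keep whichever is bigger.
            largest = n
    return largest             # Step 4: report the result.

print(find_largest([3, 7, 2, 9, 4]))  # prints 9
```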

In fact, algorithms are what Netflix uses

to generate the movies
in the Recommended For You category

that I had such a bitter
experience with last year.

Oftentimes, algorithms that are used

in artificial intelligence
or machine learning,

where we try to get the computer
to imitate human behavior,

are created using data.

The best way to mimic human behavior

is to analyze how we humans
behave through our actions.

For instance, when Netflix writes
their movie recommendation algorithm,

they likely harness data

from the millions of people
who use their services

to predict how humans behave
when it comes to watching movies.

For example, if one user, such as myself,
watches the first To All the Boys movie,

then they’ll likely watch
the second one and then the third one.

What Netflix does is compile this trend
of movie watching into a big database,

which it then sends to the computer
to look for patterns.

Most people, after they’ve seen
the first part of a movie series,

will likely watch the second.

The algorithm’s job is
to recognize this pattern from the data,

use it to create a model of what
the correct output is, given an input,

and then apply this model to other users.
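
To make that idea concrete, here is a rough sketch in Python. The viewing histories and the recommend function are invented for illustration; this is one simple way a pattern like “people who watch the first movie tend to watch the second” could be turned into recommendations, not Netflix’s actual system.

```python
from collections import Counter, defaultdict

# Hypothetical viewing histories, one list of titles per user.
histories = [
    ["To All the Boys 1", "To All the Boys 2", "To All the Boys 3"],
    ["To All the Boys 1", "To All the Boys 2"],
    ["Some Action Movie", "To All the Boys 1", "To All the Boys 2"],
]

# Count, for each title, which titles users watched next.
follow_ups = defaultdict(Counter)
for history in histories:
    for current, nxt in zip(history, history[1:]):
        follow_ups[current][nxt] += 1

def recommend(last_watched):
    """Recommend the most common follow-up to the last watched title."""
    counts = follow_ups.get(last_watched)
    return counts.most_common(1)[0][0] if counts else None

print(recommend("To All the Boys 1"))  # -> "To All the Boys 2"
```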

This is one of the ways
in which Netflix’s algorithm

produces the movies we see
in the Recommended For You section.

Which is why those movies are often
similar to ones we’ve seen previously,

whether that is by content,
genre or actors.

The idea is that computers
can learn on their own

when given instructions
to follow or data to mine,

and thus mimic human behavior.

But what this idea doesn’t encompass

is the fact that computers
are fundamentally not humans.

As such, they lack many of the things
we humans take for granted

when it comes to making decisions

such as common sense,
ethics and logical reasoning.

This is where the problem arises.

Before algorithms existed,
humans made all the decisions.

Now computers
either make decisions for us

or heavily influence
how we make decisions.

Whether it’s something simple,
like deciding which movie to watch next

or something more serious,

like determining the creditworthiness
of an individual

before giving them a loan.

But these computers
aren’t learning from thin air.

They’re learning from us.

Machine learning algorithms
rely on human data.

They need data to understand
how we humans behave and act

in terms of concrete, hard behaviors.

They then use this data to replicate
human behavior and make tasks simpler.

The problem arises when the data that is
being passed to these machines is flawed.

This is when we run the risk
of teaching a computer the wrong thing,

or in other words,
creating a flawed algorithm.

The fundamental cause
behind this phenomenon is this:

computers rely on data
and data comes from human behavior.

However, human behavior
often represents what is,

rather than what should be,

and “what is” is often racist,
sexist, xenophobic and so on.

As recent events in the US
have illustrated, our society

holds systems of institutionalized
discrimination and oppression

that have been at play for generations.

As such, it isn’t shocking

that any data we collect from people
in our society will be biased,

whether that is against certain ideas,
beliefs, groups of people or traditions.

The machines we are relying on
more heavily day by day are biased.

Why?

Because they’re learning from us
and we are biased.

It may seem like this issue
is far removed from our everyday lives.

“What does bias in tech algorithms
have to do with us?”

Well, aside from influencing
the movies Netflix recommends for us,

there are other uses for algorithms
that can impact our lives quite heavily.

One such example
is employee hiring algorithms.

A recent prime example
of how such an algorithm can become flawed

lies with tech giant Amazon.

Taken together, Amazon’s global workforce
is over 60% male,

with 75% of managerial positions
also held by men.

The data that was fed
to their employee hiring algorithm

allowed it to learn that women
were a minority at this company,

and thus it penalized
any resumé it came across

that contained the word “woman” in it,

leading to gender bias in how employees
were hired at this top company.
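
To see how a penalty like that can emerge purely from the data, here is a deliberately simplified sketch with made-up résumés and hiring decisions. It is not Amazon’s actual system, just an illustration of the mechanism.

```python
from collections import Counter

# Invented historical hiring decisions reflecting a male-dominated workforce.
past_resumes = [
    ("software engineer chess club", True),
    ("software engineer robotics", True),
    ("software engineer women's chess club", False),
    ("software engineer women's coding club", False),
    ("data scientist hackathon", True),
]

hired_words, rejected_words = Counter(), Counter()
for text, hired in past_resumes:
    (hired_words if hired else rejected_words).update(text.split())

def word_score(word):
    """Fraction of past appearances of this word that led to a hire."""
    total = hired_words[word] + rejected_words[word]
    return hired_words[word] / total if total else 0.5

print(word_score("engineer"))  # 0.5: appears in hired and rejected résumés alike
print(word_score("women's"))   # 0.0: appears only in rejected résumés, so it is penalized
```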

Another example can be seen
with offender risk assessments.

What are these?

Well, US judges use automated risk
assessments to determine things

like bail and sentencing limits
for individuals accused of a crime.

These assessments rely
on large data sets that go back ages

and include variables like arrest records,
demographics,

financial stability and so on.

The algorithms that these
assessments are based on

have been found to be inherently biased
against African-Americans

by recommending things
such as detention,

longer prison sentences
and higher bail,

in comparison to white counterparts
who are equally likely to reoffend.

When we think about it,
this isn’t entirely surprising,

though it is completely atrocious.

There has been a long history of
marginalisation and racism in our society,

which the data set used to create
this algorithm most likely reflects.

As such, the algorithm
has learned to recommend actions

that continue to oppress African-Americans

because that is what the data set
it was trained on shows.

An example closer to home can be seen
with facial recognition software.

We use this algorithm
every time we unlock our iPhones.

However, what we may not know
is that facial recognition,

though widely reported
to be over 90% accurate,

is actually not this accurate
for everyone.

As the image illustrates,
facial recognition varies in accuracy

depending on the demographic of the user

and is most inaccurate for darker-skinned women,

illustrating the bias hidden
within this algorithm.
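
One simple way to surface a gap like this, sketched here with made-up numbers rather than real benchmark results, is to compute accuracy separately for each demographic group instead of reporting a single overall figure.

```python
from collections import defaultdict

# Hypothetical results from a face recognition model (invented data,
# not real benchmark numbers): (demographic group, prediction correct?)
results = [
    ("lighter-skinned men", True),   ("lighter-skinned men", True),
    ("lighter-skinned women", True), ("lighter-skinned women", True),
    ("darker-skinned men", True),    ("darker-skinned men", False),
    ("darker-skinned women", False), ("darker-skinned women", False),
]

totals, correct = defaultdict(int), defaultdict(int)
for group, ok in results:
    totals[group] += 1
    correct[group] += ok

print(f"overall: {sum(correct.values()) / len(results):.0%}")   # one number hides the gap
for group in totals:
    print(f"{group}: {correct[group] / totals[group]:.0%}")      # per-group numbers reveal it
```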

Though companies that make the software
have since announced commitments

to modify testing
and improve data collection,

it’s important to note

the widespread prevalence of facial
recognition in our society today,

and as such, the damage that this
flawed algorithm has already caused.

Not only do we use it in our phones,

but law enforcement, employment offices,
airports, security firms and more

all use facial recognition
in multiple capacities.

What if law enforcement incorrectly
flagged an innocent woman as a criminal

while failing to identify
the real perpetrator?

What if this happened over
and over and over again?

Can we imagine the terrible
effects that would have?

It may seem bleak
to begin scratching the surface

of how technology can be taught
to imitate human fallacies and bias,

creating long lasting
and far reaching negative impacts.

But there are ways we can
improve the situation.

One way is to widen the breadth of data
used to create these algorithms.

If we give computers more varied
and diverse data to learn from,

we can help ensure that they are learning
correct patterns within human behavior

that accurately reflect
what we want them to do.
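
A first step in that direction, sketched here with invented labels, can be as simple as checking how balanced the training data actually is before any model is built.

```python
from collections import Counter

# Invented demographic labels attached to a training set.
training_labels = ["group A"] * 800 + ["group B"] * 150 + ["group C"] * 50

counts = Counter(training_labels)
total = sum(counts.values())
for group, n in counts.most_common():
    print(f"{group}: {n / total:.0%} of the data")
# If one group dominates like this, the patterns the algorithm learns
# will mostly reflect that group; collecting more data for the others
# is one concrete way to widen the breadth of the data.
```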

Another way is to rally together
in demanding increased transparency

when it comes
to creating these algorithms.

Within the past few decades,
tech giants like Facebook and Google

have harnessed tremendous
amounts of personal data

to create powerful
machine learning technology

with little insight or oversight
from the public and government.

This means that only
a small subset of people

oversaw the creation of something
that a large subset of society is using,

which can inherently lead to bias.

We can fix this by instituting policies

that govern when and how
personal data can be used,

as well as by incorporating the work
of diversity and equity leaders

in the creation of these algorithms.

A final way we can improve the situation
is to delve more deeply

into the idea of teaching machines
societal ethics.

For instance, widely held beliefs
like “innocent until proven guilty”,

common sense reasoning and the elimination
of logical or emotional fallacies.

It is more important now than ever

for us to educate ourselves
about bias in machine learning algorithms,

a dangerous phenomenon
that gives us a dark insight

into how technology, often thought
to be created by humans for humans,

can turn ugly.

As we’ve seen, our society has
systems of bias embedded within it.

It is our job to ensure
that this inequity and unfairness

does not widen and spread
into the realm of technology,

specifically within
artificial intelligence.

I hope that you all leave today
not only with new knowledge

about an issue
within the technology world,

but also with a heart filled with empathy

towards those oppressed
by the systems in place,

and a sense of what we can do
to combat those systems

and create meaningful change
using computer science.

Technology is a powerful field.

One that only seems to be getting
stronger and stronger

as our society pivots to
and remains in remote work.

It is incredibly important for us
to recognize and understand this power

so we can use it to our advantage.

It is up to us to ensure

that technology and artificial
intelligence are working for us,

not against us.

Thank you.
