Reimagining Trust in AI


You may have heard the saying: common sense is not so common. It happens every day around us, where people break simple rules and exemplify this notion. For instance: don't text while you drive; only cross the street when the light is green. Apparently not so common in New York, which is the very first thing I learned there. We all have cognitive biases as to what common sense is, and this notion also applies between human intelligence and artificial intelligence.

Here's an example. How would you label the sentiment of this conversation? "Good morning, your flight is rescheduled to 3 a.m. tomorrow." "Perfect." Is it positive, neutral, or negative to you?

In natural language processing, an algorithm will put a positive label on it, because it recognizes positive words like "good" and "perfect," but it might not be able to capture any negative signal. In fact, detecting sarcasm or irony is one of the most challenging topics in this field.
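To make this concrete, here is a minimal sketch using NLTK's lexicon-based VADER analyzer; this is an illustrative stand-in, not the specific system discussed in this talk:

```python
# A lexicon-based sentiment score, blind to sarcasm.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # fetch the word-sentiment lexicon once

sia = SentimentIntensityAnalyzer()
text = "Good morning, your flight is rescheduled to 3 a.m. tomorrow. Perfect."
print(sia.polarity_scores(text))
# Expect a positive 'compound' score: the lexicon counts "good" and
# "perfect" as positive and has no notion of irony.
```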

Then, when responses from AI machines don't make sense, would you trust AI? It is time to reimagine trust in AI. To start with, it's worth seeing what AI really is and what we should expect from it.

Let's do a little experiment. Think about the phrase "artificial intelligence." What is the first image that pops up in your mind? Is it something like these? Every time we talk about AI, we use images of those humanoid, brainy sci-fi robots that have little to do with AI. We should really stop using those, because AI actually looks like this.

Artificial intelligence is artificial, after all. At its core, AI is ultimately a process of optimization: based on given inputs, it learns the patterns from given information and makes optimal decisions through computations.
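For a feel of what that optimization loop looks like, here is a toy sketch, entirely illustrative, that "learns a pattern" by minimizing a squared-error cost with gradient descent:

```python
# Fit a line to noisy points by stepping down the cost gradient.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = 3.0 * x + 1.0 + rng.normal(0, 1, 50)  # the "pattern" hidden in the data

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    err = (w * x + b) - y               # current prediction error
    w -= lr * 2 * np.mean(err * x)      # gradient of mean squared error w.r.t. w
    b -= lr * 2 * np.mean(err)          # ... and w.r.t. b
print(f"learned w={w:.2f}, b={b:.2f}")  # close to the true 3.0 and 1.0
```

Nothing magical is happening: given inputs, a cost, and enough compute, the loop grinds toward the optimum.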

But when AI is overly humanized, or sometimes overhyped, it seems to have magic powers beyond human intelligence. However, is that a realistic expectation? We're approaching it, but generally, we're not there yet. Now let's take a step back and think about why that is. Let's compare how we humans make decisions versus how AI does.

Here's a decision for you to make. Say that you and your partner have been living in Seattle for five years. Today you finally got a job offer from your dream company, but it requires you to relocate to Boston. How would you decide? There's definitely a long list to consider: your personal development, your relationship, timing, expenses, lifestyle, and on and on. If you have kids, ten more items get added to your list. As you can imagine, decision making is a fairly complicated process for us.

Then how do AI algorithms decide? It's utterly simple: finding the value of either minimum cost or maximum reward through mathematical equations. And for this process to work, all the complexity of human considerations has to be simplified and quantified into several metrics, or sometimes even a single number. Therefore, it's really unrealistic to expect a perfect transition from the decision making of human beings to the decision making of machines.
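To see how much gets lost in that flattening, here is a hypothetical sketch of the relocation decision as a pure maximization problem; every weight and score below is invented for illustration:

```python
# Collapse a life decision into one weighted number per option.
weights = {"career": 0.4, "relationship": 0.3, "cost": 0.2, "lifestyle": 0.1}

options = {
    "stay in Seattle": {"career": 0.5, "relationship": 0.9, "cost": 0.8, "lifestyle": 0.7},
    "move to Boston":  {"career": 0.9, "relationship": 0.4, "cost": 0.5, "lifestyle": 0.6},
}

def score(option):
    # A single scalar reward -- all the nuance is gone by this point.
    return sum(weights[k] * option[k] for k in weights)

best = max(options, key=lambda name: score(options[name]))
print({name: round(score(opt), 2) for name, opt in options.items()}, "->", best)
```

The optimizer will happily pick whichever number is larger; whether those weights actually capture your life is the part no equation answers.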

So far, AI is a hardcore optimizer with strong computing power, and also with limitations in its optimization mechanism. Now, let's just say we do have a perfect algorithm today. Can you fully trust it now?

Don't forget, there's a condition: everything is processed upon given inputs. For machines, these inputs are the only learning materials and the foundation of any trustworthy outputs. That's why we need to feed in sufficient, unbiased, and representative data in the first place, which is quite challenging, to be honest. Because in reality, we may not have enough data, there are always random factors, the data may not fully describe the real world, and the world is changing dynamically. So naturally, there are limitations in the given inputs. With that being said, with limitations in AI algorithms and in training data, how could we trust AI? Well: never just trust, but validate.

There are plenty of examples of AI bringing trouble in real cases when there is no validation. I noticed a project in healthcare where the goal was to predict whether a patient would have a negative or positive diagnosis by looking at their X-ray images. At first, the predictions were extremely accurate. After a while, the doctors found out why, and it's because some images were taken by portable X-ray machines and some were taken by regular machines. What does that mean? Think about what kind of patients get X-rayed by portable machines: the patients who couldn't make it to the regular machine, and those are the patients who are more likely to have an unfavorable diagnosis. It turned out that this model was merely looking at the type of X-ray device rather than the actual pathology. As a matter of fact, it is so common in healthcare that patients who have severe symptoms get different treatments than those who are less sick. But apparently, this should not be the only evidence for diagnosing.

That's why, when given a model, we should always validate: ask what drives the model predictions, and whether those drivers make sense to you.
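One common way practitioners ask "what drives the predictions?" is permutation importance. Here is a minimal sketch with scikit-learn on synthetic data that echoes the X-ray story; the feature names and numbers are hypothetical:

```python
# If the spurious "device type" feature dominates, the model is keying on
# the machine, not the pathology -- exactly what a validator should catch.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 1000
device_is_portable = rng.integers(0, 2, n)  # spurious shortcut feature
pathology_score = rng.normal(size=n)        # the real signal (weak here)
# Labels that leak through the device type, as in the story above.
y = (0.2 * pathology_score + 2.0 * device_is_portable + rng.normal(size=n)) > 1.0

X = np.column_stack([device_is_portable, pathology_score])
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in zip(["device_is_portable", "pathology_score"], result.importances_mean):
    print(f"{name}: {imp:.3f}")
```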

And validation exercises can be done not only by AI practitioners; they can be done by anyone. For example, my lovely younger sister, who is a dance teacher. Once I asked her to think about this, and she looked at me: "How am I supposed to do that? Is there an easier way to validate AI?" I told her: yes, there are ways for you to do it easily and critically.

For example, validate with corner cases. Corner cases are just the situations that occur outside of the normal cases. If a technology works in both normal and corner cases, we know we can trust it to a larger extent.

A couple of months ago, I was researching a very popular object detection technology. It is an algorithm that can locate and identify objects in images or videos, so that we can do image annotation, face recognition, and so forth. So I was testing the model by throwing some images at it. Let's take a look.

Take a typical rush-hour Friday afternoon in New York. We can see that in this busy picture, the algorithm captured a lot of items. They are located in bounding boxes, with labels indicating the categories and the respective probabilities. Essentially, those probabilities tell us how certain the algorithm is. So here, the model is 80 percent sure that inside this yellow box is a car. It also detected the pedestrians, traffic lights, etc. This model is by no means perfect, but it did a fair job in general.
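For readers who want to reproduce this kind of output, here is a minimal sketch using a pretrained Faster R-CNN from torchvision; it is an illustrative stand-in, not necessarily the model used in this talk, and the image path is hypothetical:

```python
# Run a pretrained detector and keep only confident detections.
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

img = convert_image_dtype(read_image("street_scene.jpg"), torch.float)
with torch.no_grad():
    pred = model([img])[0]  # dict with 'boxes', 'labels', 'scores'

keep = pred["scores"] > 0.8  # mirror the ~80%-sure car in the slide
print(pred["boxes"][keep], pred["labels"][keep], pred["scores"][keep])
```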

Now let's take it up another level by testing some corner cases. What if we apply a picture from a different angle, for example, from an overhead view like this one? When the shapes of items twist as we look at them from above, would the model still hold up?

Now it got a bit confused. You can see that in the center, the people and the chairs were correctly and confidently detected, while in this corner the model viewed that person as a teddy bear. It also missed the flag, the picture in the frame, and the table; those are the various objects the model is supposed to detect, but it wasn't able to identify them at weird angles.

Let's try another case. What if we feed in an image with incomplete information, for example, a partially covered cat face? Would the detector figure this out? Somehow, it totally missed the cute cat and magically detected three ties instead. I know, the kitty feels the same: astonished. As you can see, by thinking critically, asking what-if questions, and testing corner cases, it's almost effortless to validate.
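Such what-if tests are easy to script. Below is a hypothetical corner-case harness that perturbs an image and checks whether the detector still finds the expected object; the COCO label index for "cat" and the confidence threshold are assumptions:

```python
# Perturb an image (rotate, occlude) and re-run the detector on each variant.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
img = torch.rand(3, 480, 640)  # stand-in for a real photo of the cat

def corner_case_variants(img):
    """Yield simple corner-case versions of an image tensor (C, H, W)."""
    yield "rotated_90", torch.rot90(img, 1, dims=(1, 2))  # odd viewing angle
    occluded = img.clone()
    occluded[:, : img.shape[1] // 2, :] = 0.0             # cover the top half
    yield "half_occluded", occluded

CAT = 17  # "cat" in torchvision's COCO category list (assumed index)

for name, variant in corner_case_variants(img):
    with torch.no_grad():
        pred = model([variant])[0]
    found = CAT in pred["labels"][pred["scores"] > 0.5].tolist()
    print(f"{name}: cat still detected = {found}")
```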

And the bonus point here is that being critical of science inspires science, just as we saw when it didn't work on the covered cat face.

In July 2020, the U.S. National Institute of Standards and Technology tested 89 facial recognition algorithms and found that the error rate spiked up to 50 percent for faces wearing masks. Plenty of developer teams were actively working on this during COVID-19, and six months later there was a huge improvement: in January 2021, the U.S. Department of Homeland Security tested a new algorithm that can identify airline passengers with masks 96 percent of the time. Wearing masks is a corner case for facial recognition, and being aware of this corner case accelerated the technology's advancement.

Yet still, a lot of the time, AI developers might not be aware of the corner cases or the potential consequences. It really relies on everyone to dive deeper and validate from different perspectives. It's not only one AI expert needing validation from other AI experts, but also input from people outside of this field, even if your expertise is in something completely different. If you are a musician, a nurse, a lawyer, a factory worker, or someone in sports, your input matters. Start asking questions of the AI technologies that try to impress you, or the data conclusions that try to convince you.

We're at a time when everyone is close to data and close to AI technologies, while we're also at a time when people and AI developers are far apart. So I'm asking both sides to come closer and bridge the gap. For AI practitioners, in order to collect more corner cases and be more aware of the impact, feedback loops and platforms should be created and made easily accessible to the audience. The audience, on the other hand, is also responsible for participating: not just as a pure consumer, a passenger, but as a feedback provider. This teamwork really matters to help AI make more sense, and more importantly, it matters for advancing AI as a product, as a service, as a policy, and as a society. Now let's go back to the beginning of this talk: reimagining trust in AI.

Just like how trust is built up between us humans: we spend some time getting to know each other through both glories and imperfections, test each other with corner cases, subconsciously or consciously, see each other through fun times and down times, and work on challenges together. Then, eventually, you know whether you trust that person.

It's the same with AI. We build trust with AI through understanding what it really is and acknowledging its limitations, through interactions and validations, and more crucially, through collaborations. Trustworthy AI is always the result of joint efforts. We need collective human intelligence to progress machine intelligence. We're all in this revolutionary disruption together, and that is the idea worth spreading. Thank you.
