Four Ingredients for K12 Data Science

I like to think of this as sort of a cooking presentation, right? We’re going to be talking about what the ingredients need to be to teach data science in K-12 classes. I’ve worn a lot of different hats in my life. I’ve been a computer scientist and professional programmer. As John told you, I’ve been a math teacher right here in Boston. I’ve had the incredible privilege to work alongside giants in the field like Shriram Krishnamurthi and Kathi Fisler on a research project called Bootstrap, based at Brown University, in the field of computer science education. And most recently I’ve donned a hat as the father to the coolest girl in the world. I promised Maya she’d be in here, and while I would love to spend the next nine and a half minutes giving you a TED talk focused on her, instead we’re going to focus on something slightly less interesting, which is what’s going on at the cutting edge of computer science research.

Let me take you back a ways, about 10 or 15 years. Remember when everybody was saying, “CS for All! CS for All! We’ve got to get coding into schools,” right?

At the time, we made a very controversial bet at Bootstrap. First, we said, you know, we don’t think siloed classes are the only way to do this. In fact, they might not even be the best way. Second, we gambled on the idea that we could fuse computing and mathematics authentically, so that instead of undermining the math, the computing actually reinforced it. And third, we bet there was a way to do this so it worked equitably for all students.

So fast forward a little bit. This curriculum sort of busted out of the lab and became one of the most widely used computing curricula nationwide. And while we’re thrilled with our scale, we’re proud of our diversity. The reason that we have those numbers is that we’re working with the teachers who already reach every child: not the computer science teachers, but the mainstream math teachers who have no computing background at all. For them it was just a powerful way to teach mathematics. Now, we didn’t want to be one-hit wonders, right? So we rinsed and repeated the formula and extended this to things like algebra, physics, and beyond.

And about half a decade ago, we started getting really excited about something that nobody was terribly excited about, which was: what if you could teach data science in K-12?

Fast forward to today. It’s not CS for All anymore, it’s Data Science for Everyone, and they’re asking the same questions that we asked 10 years ago: what should these classes look like, and where do they fit? Curriculum design is essentially a recipe, and every recipe has room for flexibility. Your cupcake might involve cream cheese frosting, and your cupcake might involve, you know, coconut shreds or something. Maybe not coconut shreds, I wouldn’t put that on you. But there’s room for flexing with these ingredients. One thing we can all agree on, though, is that if you leave an ingredient out completely, well, it might be delicious, but you haven’t baked what you set out to bake. So the question becomes: what are the must-have ingredients for a responsible K-12 data science class?

Now, the prevailing wisdom is that we can all agree on at least two ingredients: mathematics and computing. When I say computing, I mean programming, algorithms, structured data. And for you data scientists out there who may not be familiar with the K-12 math standards, those are the standards that cover the statistics content, right, the concepts that are necessary for rigor. Those are the standards that cover data visualization, right? It’s those standards that talk about histograms and lines of best fit. What we’re hearing, and this is sort of the loudest voice in the room, is that the solution, here we go, the solution is: we’re going to take stats classes (thank you, R. A. Fisher, from 100 years ago), we’re going to add some coding, and boom, we’ve baked ourselves a data science class.

And therefore we should elevate statistics to be just as important as calculus. Now, as a former math teacher, I am all about elevating statistics to be just as important as calculus. I think it’s great. But if our goal is to bring data science to K-12, I’m here to tell you that this formula is dangerously flawed. Imagine an amazing CS class. Kids are building virtual worlds and 3D games, and at the end of the class they spend, like, two weeks being given a calculator and being taught which stats buttons to press to do some statistics. Is that a data science class? Obviously not.

Now let’s flip that. Suppose you have an amazing statistics class, totally awesome, and then at the very end we’re going to have two weeks where they learn what commands to type into Python. Also not a data science class. As a team that’s been working on this for the last 15 years, and that knows something about combining math and computing, we know it’s not that simple. What you need to do, if you want to mix these ingredients, is find the computational concepts that bridge these worlds. There are a lot of them. I’ll just give you three quick examples.

How do you take a complex problem, break it down into simpler pieces, and know that when you’ve solved those pieces you’ve actually solved the original problem you set out to answer? How do you trust a computation that’s been performed on a data set with 10,000 rows that nobody could possibly check by hand? And how do you ensure that your results are reproducible, that anyone else could take your data and your code and see the same results that you did? These concepts were critical to our success over a decade ago, and they’re just as critical now.
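To make the second and third questions concrete, here is a minimal sketch in Python. It is an illustration only, not Bootstrap’s own materials: the function name `mean_of_column`, the helper `make_big_table`, the `height_cm` column, and the made-up data are all assumptions. The idea is to check a function on a table small enough to verify by hand before trusting it on 10,000 rows, and to fix the random seed so anyone re-running the code sees the same result.

```python
import random

def mean_of_column(rows, column):
    """Average the values in one column of a table (a list of dicts)."""
    values = [row[column] for row in rows]
    return sum(values) / len(values)

# Step 1: check the function on a table small enough to verify by hand.
# (150 + 160 + 170) / 3 is 160.
tiny_table = [{"height_cm": 150}, {"height_cm": 160}, {"height_cm": 170}]
assert mean_of_column(tiny_table, "height_cm") == 160

def make_big_table(seed, n_rows=10_000):
    """Build made-up data; the seed makes it identical on every run."""
    rng = random.Random(seed)
    return [{"height_cm": rng.randint(140, 190)} for _ in range(n_rows)]

# Step 2: run the same, now-tested function on a table nobody could check by hand.
big_table = make_big_table(seed=2024)
result = mean_of_column(big_table, "height_cm")

# Step 3: same data + same code gives the same answer, so the result is reproducible.
assert result == mean_of_column(make_big_table(seed=2024), "height_cm")
print(round(result, 2))
```

The design choice worth noticing is that the small example and the large run share one function, so the hand-checked case is evidence about the computation nobody can check.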

And recognize that if you’re still thinking that what’s necessary is just teaching some coding, that doesn’t touch any of them.

But it actually gets worse, because there are two other ingredients that are often left out of the conversation, with disastrous results. I call this “when data science goes bad.”

This may come as a surprise to many of you, but we live in a society that’s kind of racist. And when you do data analysis on that data, guess what models and algorithms come out? Kind of racist ones. And this isn’t just an isolated headline, right? This has become essentially an epidemic, where the darkest and deepest divides in our society are being institutionalized in code, affecting everything from medical care to sentencing guidelines. And it doesn’t stop at racism. Political consultants are mining voter data and everything else to build tactically precise gerrymandered districts that serve to further deepen the polarization in our democracy.

And of course we all talk about how important it is for students to learn about cybersecurity, right? We’ve got to teach them what a good password is, teach them not to hand out that password. And yet what we really need to be doing is teaching them enough data science to understand why they should not be filling out that survey that tells them which Harry Potter character they are most like. Because it turns out that when you mine that freely available data on social media, it can be weaponized to shift public opinion about issues as major as the fracturing of the European Union: Brexit itself.

So why are these being left out? Well, because just teaching math and computing doesn’t get the job done. There are two more ingredients that need to be part of the conversation and that are always left out.

The first is civic responsibility, so let’s talk about civic responsibility. If you’re viewing this as math and code, great, I’m sure you’ll tell students the dangers of taking a biased sample. But what we need to be doing is teaching students the dangers of a good random sample taken from a society filled with bias.
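A small simulation can illustrate that last point. This is only a sketch with invented numbers: the two groups, the approval rates of 0.70 and 0.40, and the names `make_population` and `approval_rate` are all assumptions made up for the example. It draws a perfectly fair simple random sample from records that already encode a disparity, and the sample faithfully reproduces that disparity.

```python
import random

def approval_rate(people, group):
    """Fraction of people in one group whose record says 'approved'."""
    members = [p for p in people if p["group"] == group]
    return sum(p["approved"] for p in members) / len(members)

def make_population(rng, size=100_000):
    """Invented historical records in which group B was approved less often."""
    rates = {"A": 0.70, "B": 0.40}  # assumed rates, for illustration only
    population = []
    for _ in range(size):
        group = rng.choice(["A", "B"])
        approved = rng.random() < rates[group]
        population.append({"group": group, "approved": approved})
    return population

rng = random.Random(7)  # fixed seed so the run is repeatable
population = make_population(rng)
sample = rng.sample(population, 5_000)  # an unbiased simple random sample

pop_gap = approval_rate(population, "A") - approval_rate(population, "B")
sample_gap = approval_rate(sample, "A") - approval_rate(sample, "B")
print(f"population gap: {pop_gap:.2f}, sample gap: {sample_gap:.2f}")
```

The sampling step is statistically sound, yet any model trained on `sample` would inherit the gap, which is why good sampling technique alone does not address the problem.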

If you’re thinking of this as just math and code, well, great, I’ll teach you the algorithms to help you aggregate data to predict human behavior and find out which of you in the crowd are most likely to commit a crime. But what we need is civic responsibility that asks whether it is ethical to ask that question or gather that data in the first place.

Now again, if the strategy is that we’re going to put it all on math teachers, are they ready to have this conversation? And if they are, is it fair to demand that it fall solely on them? I don’t think so. When you teach medicine without civic responsibility, you get the Tuskegee experiments. When you teach data science without this ingredient, you get racially biased algorithms and weaponized social media.

The next ingredient that we need to consider is domain investment. Because I could be the most incredible programmer and statistician you’ve ever met, but if I don’t know anything about baseball, I cannot go down to Yawkey Way and analyze sports statistics for the Red Sox. So imagine a teacher decides that her kids are going to analyze a data set about the best vineyards in Tuscany. Which students are engaged? Which students feel included? Which students feel left out?

It turns out that the choice of data, the actual investment in the domain, is a critical component not just of engagement and relevance, but also of diversity, equity, and inclusion. We’ve got a paper coming out of this research group in a couple of weeks that speaks specifically to this. So what we need is to have teachers who can speak to the content areas that matter to kids and meet them where they are.

Again, is it fair to put all of that on the math teachers? Disrespecting the domain expertise of humanities folks has been standard operating procedure for the STEM world for too long. We cannot afford to repeat that mistake.

So I’m excited to share with you some of the research results that we’ve had here. Currently we’ve got a curriculum that is in use around the country. Right now, in the nation’s largest school district, New York City, we’ve got social studies teachers having kids analyze the stop-and-frisk data set, teaching social studies in a revolutionary new way.

Out in Arizona, we’ve got physics teachers who already had their kids gather experimental data, but now their kids can analyze the data and try to figure out, “What kind of equation models what I’m seeing?” And they can figure it out before they even see the equation in the book.
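As a rough illustration of what that kind of analysis could look like, here is a sketch in Python that fits a line of best fit by ordinary least squares. The measurements are invented, and the function name `line_of_best_fit` and the choice of a falling-object experiment are assumptions for the example, not a description of what the Arizona classes actually do.

```python
def line_of_best_fit(xs, ys):
    """Ordinary least-squares slope and intercept for paired measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented lab data: drop time in seconds and distance fallen in meters.
times = [0.2, 0.4, 0.6, 0.8, 1.0]
distances = [0.20, 0.78, 1.77, 3.14, 4.90]

# Distance against time is curved, so a student might try distance
# against time squared and see whether that looks like a straight line.
t_squared = [t ** 2 for t in times]
slope, intercept = line_of_best_fit(t_squared, distances)
print(f"distance is about {slope:.2f} * t^2 + {intercept:.2f}")
```

With these made-up numbers the fitted slope comes out close to 4.9, which is the kind of pattern a student could notice in the data before meeting the corresponding equation in the textbook.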

Students in California are looking at climate data. You can have students in a phys ed class analyzing their free-throw percentages, or in a nutrition class looking at their snacking habits. This can be a full-court press, and it’s happening now.

Where I want to leave this talk is by saying that this notion that mixing math and coding is easy is flawed. But even if you do it right, leaving it at math and coding is fundamentally dangerous.

For those of us who care about data science, if the headline becomes “it’s the New Math 2.0,” we are sunk.

This needs to be an interdisciplinary solution, a full-court press that engages teachers across grade levels and across disciplines. We need to make sure these ingredients are part of the conversation. We need to make sure that we’re not just picking tools because they’re free or because they’re popular, but that we’re choosing a tool that is appropriate for the learning goals of the subject and for the cognitive demands on the students. We need to make sure that we’re not just dumping more data sets on kids. We need to make sure they’re actually better data sets.

Are they engaging? Do they meet kids where they need to be? Are the columns of your data set actually accessible? Because if it takes a student a week to learn what a data set is even about, we’ve lost.

And finally, because we believe in this so thoroughly, we think it’s important to make it free. All of our curricular materials we’re giving away, in the hopes that all of you out there will join us and engage teachers from across the disciplines to make data science real, but also make it responsible.

I’m fortunate enough to work with an incredible team, and I want to thank all of you for your time.
