I like to think of this as sort of a cooking presentation, right? We're
going to be talking about what the ingredients need to be to teach data
science in K-12 classes. I've worn a lot of different hats in my life.
I've been a computer scientist and professional programmer. As John told
you, I've been a math teacher right here in Boston. I've had the
incredible privilege to work alongside giants in the field like Shriram
Krishnamurthi and Kathi Fisler on a research project called Bootstrap,
based at Brown University, in the field of computer science education.
And most recently I've donned a hat as the father to the coolest girl in
the world. I promised Maya she'd be in here. And while I would love to
spend the next nine and a half minutes giving you a TED talk focused on
her, instead we're going to focus on something slightly less
interesting, which is what's going on at the cutting edge of computer
science research. Let me take you back a ways, to about 10 or 15 years
ago. Remember when everybody was saying "CS for All, CS for All, we've
got to get coding into schools," right?
At the time, we made a very controversial bet at Bootstrap. First, we
said, you know, we don't think siloed classes are the only way to do
this. In fact, they might not even be the best way. Second, we gambled
on the idea that we could fuse computing and mathematics authentically,
so that instead of undermining the math, the computing actually
reinforced it. And third, we bet there was a way to do this so it worked
equitably for all students.
So, fast forward a little bit. This curriculum sort of busted out of the
lab and became one of the most widely used computing curricula
nationwide. And while we're thrilled with our scale, we're proud of our
diversity. And the reason that we have those numbers is because we're
working with the teachers that already reach every child: not the
computer science teachers, but the mainstream math teachers who have no
computing background at all. For them, it was just a powerful way to
teach mathematics. Now, we didn't want to be one-hit wonders, right? So
we rinsed and repeated the formula and extended this for things like
algebra, physics, and beyond.
And about a half decade ago, we started getting really excited about
something that nobody was terribly excited about, which was: what if you
could teach data science in K-12?
Fast forward to today. It's not CS for All anymore, it's Data Science
for Everyone, and they're asking the same questions that we asked 10
years ago: what should these classes look like, and where do they fit?
Curriculum design is essentially a recipe, and every recipe has room for
flexibility. Your cupcake might involve cream cheese frosting, and your
cupcake might involve, you know, coconut shreds or something. Maybe not
coconut shreds, I wouldn't put that on you. But there's room for flexing
with these ingredients. But one thing we can all agree on is that if you
leave an ingredient out completely, well, it might be delicious, but you
haven't baked what you set out to bake. So the question becomes: what
are the must-have ingredients for a responsible K-12 data science class?
Now, the prevailing wisdom is that we can all agree on at least two
ingredients: mathematics and computing. When I say computing, I mean
programming, algorithms, structured data. And for you data scientists
out there who may not be familiar with the K-12 math standards, those
are the standards that cover the statistics content, right, the concepts
that are necessary for rigor. Those are the standards that cover the
data visualization, right? It's those standards that talk about
histograms and lines of best fit. What we're hearing, and this is sort
of like the loudest voices in the room, is that the solution, here we
go, the solution is: we're going to take stats classes (thank you, R.A.
Fisher, from 100 years ago), we're going to add some coding, and boom,
we've baked ourselves a data science class.
And therefore we should elevate statistics to be just as important as
calculus. Now, as a former math teacher, I am all about elevating
statistics to be just as important as calculus. I think it's great. But
if our goal is to bring data science to K-12, I'm here to tell you that
this formula is dangerously flawed.
Imagine an amazing CS class. Kids are building virtual worlds and 3D
games, and at the end of the class they spend, like, two weeks being
given a calculator, and they're taught which stats buttons to press to
do some statistics. Is that a data science class?
Obviously not.
Now let's flip that. Suppose you have an amazing statistics class,
totally awesome, and then at the very end we're going to have two weeks
where they learn what commands to type into Python. Also not a data
science class. And as a team that's been working on this for the last 15
years, who knows something about combining math and computing, we know
it's not that simple. What you need to do, if you want to mix these
ingredients, is find the computational concepts that bridge these
worlds. There's a lot of them. I'll just give you three quick examples.
How do you take a complex problem, break it down into simpler pieces,
and know that when you've solved those pieces you've actually solved the
original problem you set out to answer?
How do you trust a computation that's been performed on a data set with
10,000 rows that nobody could possibly check by hand? And how do you
ensure that your results are reproducible, that anyone else could take
your data and your code and see the same results that you did?
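
As a rough illustration of those three questions, here is a minimal Python sketch. It is not taken from the Bootstrap curriculum. The function names (`mean_by_group`, `make_rows`), the column names, and the synthetic pet data are all assumptions made up for this example. The idea is to break the problem into a small function, check that function on a table small enough to verify by hand, and only then run it on 10,000 rows generated from a fixed seed so that anyone can reproduce the result.

```python
import random
import statistics


def mean_by_group(rows, key, value):
    """Split the problem in two: group the rows, then average each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {name: statistics.mean(vals) for name, vals in groups.items()}


# Step 1: a table small enough to check by hand.
# cats: (4.0 + 6.0) / 2 = 5.0, dogs: 20.0 / 1 = 20.0
tiny = [
    {"species": "cat", "weight": 4.0},
    {"species": "cat", "weight": 6.0},
    {"species": "dog", "weight": 20.0},
]
assert mean_by_group(tiny, "species", "weight") == {"cat": 5.0, "dog": 20.0}


def make_rows(n, seed):
    """Build n synthetic rows; the fixed seed makes the data reproducible."""
    rng = random.Random(seed)
    return [
        {"species": rng.choice(["cat", "dog"]), "weight": rng.uniform(1, 40)}
        for _ in range(n)
    ]


# Step 2: the same, already-checked function on 10,000 rows.
big = make_rows(10_000, seed=42)
result = mean_by_group(big, "species", "weight")

# Step 3: same data + same code gives the same answer on a second run.
rerun = mean_by_group(make_rows(10_000, seed=42), "species", "weight")
assert result == rerun
print(result)
```

The hand-checked example is what justifies trusting the large run: the code that produced the 10,000-row answer is exactly the code that was verified on three rows.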
These concepts were critical to our success over a decade ago, and
they're just as critical now.
And recognize that if you're still thinking that what's necessary is
just teaching some coding, it doesn't touch any of them.
But it actually gets worse, because there are two other ingredients that
are often left out of the conversation, with disastrous results. I call
this "when data science goes bad."
This may come as a surprise to many of you, but we live in a society
that's kind of racist.
And when you do data analysis on that data, guess what models and
algorithms come out?
Kind of racist ones. And this isn't just an isolated headline, right?
This has become essentially an epidemic, where the darkest and deepest
divides in our society are being institutionalized in code, affecting
everything from medical care to sentencing guidelines. And racism is not
where it stops.
Political consultants are mining voter data and everything else to build
tactically precise gerrymandered districts that serve to further deepen
the polarization in our democracy.
And of course we all talk about how important it is for students to
learn about cybersecurity, right? We've got to teach them what a good
password is, teach them not to hand out that password.
And yet what we really need to be doing is teaching them enough data
science to understand why they should not be filling out that survey
that tells them which Harry Potter character they are most like. Because
it turns out that when you mine that freely available data on social
media, it can be weaponized to shift public opinion about issues as
major as the fracturing of the European Union, Brexit itself.
So why are these being left out? Well, because just teaching math and
computing doesn't get the job done. There are two more ingredients that
need to be part of the conversation that are always left out.
The first is civic responsibility. So let's talk about civic
responsibility. If you're viewing this as math and code, great, I'm sure
you'll tell students the dangers of taking a biased sample. But what we
need to be doing is teaching students the dangers of a good random
sample taken from a society filled with bias.
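
The point about a "good random sample" can be made concrete with a small simulation. This is only a sketch: the population, the two groups, and the approval rates below are invented for illustration and do not come from any real data set. Both groups are equally qualified by construction, but the historical decisions recorded in the data favor group A. A perfectly fair simple random sample then reproduces that gap faithfully.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible


def make_population(n):
    """Synthetic society: equal qualification, unequal historical treatment."""
    people = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        qualified = rng.random() < 0.5  # same chance in both groups
        # Assumed bias: qualified people in group B were approved less often.
        approve_rate = 0.9 if group == "A" else 0.5
        approved = qualified and rng.random() < approve_rate
        people.append(
            {"group": group, "qualified": qualified, "approved": approved}
        )
    return people


def approval_rate(rows, group):
    """Share of qualified people in the given group who were approved."""
    qualified = [r for r in rows if r["group"] == group and r["qualified"]]
    return sum(r["approved"] for r in qualified) / len(qualified)


population = make_population(100_000)
sample = rng.sample(population, 5_000)  # an unbiased simple random sample

rate_a = approval_rate(sample, "A")
rate_b = approval_rate(sample, "B")
print(f"approval among qualified, group A: {rate_a:.2f}")
print(f"approval among qualified, group B: {rate_b:.2f}")
```

Nothing is wrong with the sampling here. Any model trained on `sample` to predict `approved` would learn the gap between the groups, because the gap is in the society the sample was drawn from.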
If you're thinking of this as just math and code, well, great, I'll
teach you the algorithms to help you aggregate data to predict human
behavior and find out which of you in the crowd are most likely to
commit a crime.
But what we need is civic responsibility that asks whether it is ethical
to ask that question or gather that data in the first place.
Now again, if the strategy is we're going to put it all on math
teachers, are they ready to have this conversation? And if they are, is
it fair to demand that it falls solely on them? I don't think so. When
you teach medicine without civic responsibility, you get the Tuskegee
experiments. When you teach data science without this ingredient, you
get racially biased algorithms and weaponized social media.
The next ingredient that we need to consider is domain investment.
Because I could be the most incredible programmer and statistician
you've ever met, but if I don't know anything about baseball, I cannot
go down to Yawkey Way and analyze sports statistics for the Red Sox. So
imagine if a teacher decides that her kids are going to analyze a data
set about the best vineyards in Tuscany.
Which students are engaged?
Which students feel included? Which students feel left out?
It turns out that the choice of data, the actual investment in the
domain, is a critical component not just of engagement and relevance but
also of diversity, equity, and inclusion. We've got a paper coming out
of this research group in a couple of weeks that speaks specifically to
this. So what we need is to have teachers who can speak to the content
areas that matter to kids and meet them where they are.
Again, is it fair to put all of that on the math teachers?
Disrespecting the domain expertise of humanities folks has been standard
operating procedure for the STEM world for too long. We cannot afford to
repeat that mistake.
So I'm excited to share with you some of the research results that we've
had here. Currently we've got a curriculum that is in use around the
country. Right now, in the nation's largest school district, New York
City, we've got social studies teachers having kids analyze the
stop-and-frisk data set, teaching social studies in a revolutionary new
way.
Out in Arizona, we've got physics teachers who already had their kids
gather experimental data, but now their kids can analyze the data and
try to figure out, "What kind of equation models what I'm seeing?" And
they can figure it out before they even see the equation in the book.
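
One way a student might ask "what kind of equation models what I'm seeing?" is to fit a line of best fit to their own measurements. The sketch below is an assumption about what that could look like in Python, not a description of the classroom activity itself. The measurements are hypothetical (times in seconds and distances in meters for a cart moving at roughly constant speed), and `fit_line` is an ordinary least-squares fit written out by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical lab data: time (s) and distance (m) for a rolling cart.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
distances = [0.1, 2.0, 4.1, 5.9, 8.0]

slope, intercept = fit_line(times, distances)
print(f"distance is roughly {slope:.2f} * time + {intercept:.2f}")
```

For this made-up data the slope comes out close to 2, so a student could conjecture "distance is about 2 meters per second times time" before meeting the constant-velocity equation in the textbook.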
Students in California are looking at climate data. You can have
students in a phys ed class analyzing their free throw percentages, or
in a nutrition class looking at their snacking habits.
This can be a full-court press, and it's happening now.
Where I want to leave this talk is by saying: this notion that mixing
math and coding is easy is flawed, but even if you do it right, leaving
it at math and coding is fundamentally dangerous.
For those of us who care about data science, if the headline becomes
"it's the new math 2.0," we are sunk.
This needs to be an interdisciplinary solution, a full-court press that
engages teachers across grade levels and across disciplines. We need to
make sure these ingredients are part of the conversation. We need to
make sure that we're not just picking tools because they're free or
because they're popular, but that we're choosing a tool that is
appropriate for the learning goals of the subject and for the cognitive
demands of the students. We need to make sure that we're not just
dumping more data sets on kids; we need to make sure they're actually
better data sets.
Are they engaging? Do they meet kids where they need to be? Are the
columns of your data set actually accessible? Because if it takes a
student a week to learn what a data set is even about, we've lost.
And finally, because we believe in this so thoroughly, we think it's
important to make it free. All of our curricular materials we're giving
away, in the hopes that all of you out there will join us and engage
teachers from across the disciplines to make data science real, but also
make it responsible.
I'm fortunate enough to work with an incredible team, and I want to
thank all of you for your time.
{{