Four Ingredients for K-12 Data Science
I like to think of this as sort of a cooking presentation, right? We’re going to be talking about what the ingredients need to be to teach data science in K-12 classes.
I’ve worn a lot of different hats in my life. I’ve been a computer scientist and professional programmer. As John told you, I’ve been a math teacher right here in Boston. I’ve had the incredible privilege to work alongside giants in the field like Shriram Krishnamurthi and Kathi Fisler on Bootstrap, a computer science education research project based at Brown University. And most recently, I’ve donned a hat as the father to the coolest girl in the world. I promised Maya she’d be in here. And while I would love to spend the next nine and a half minutes giving you a TED talk focused on her, instead we’re going to focus on something slightly less interesting, which is what’s going on at the cutting edge of computer science research.
Let me take you back about 10 or 15 years. Remember when everybody was saying “CS for All”? CS for All, we’ve got to get coding into schools, right?
At the time, we made a very controversial bet at Bootstrap. First, we said: you know, we don’t think siloed classes are the only way to do this; in fact, they might not even be the best way. Second, we gambled on the idea that we could fuse computing and mathematics authentically, so that instead of undermining the math, the computing actually reinforced it. And third, we bet there was a way to do this so that it worked equitably for all students. So, fast forward a little bit.
This curriculum sort of busted out of the lab and became one of the most widely used computing curricula nationwide. And while we’re thrilled with our scale, we’re especially proud of our diversity. The reason we have those numbers is that we’re working with the teachers who already reach every child: not the computer science teachers, but the mainstream math teachers, who have no computing background at all. For them, it was just a powerful way to teach mathematics. Now, we didn’t want to be one-hit wonders, right? So we rinsed and repeated the formula and extended it to things like algebra, physics, and beyond.
And about half a decade ago, we started getting really excited about something that nobody else was terribly excited about: what if you could teach data science in K-12?
Fast forward to today: it’s not CS for All anymore, it’s data science for everyone, and people are asking the same questions we asked 10 years ago. What should these classes look like, and where do they fit? Curriculum design is essentially a recipe, and every recipe has room for flexibility. My cupcake might involve cream cheese frosting, and yours might involve, you know, coconut shreds or something. Maybe not coconut shreds; I wouldn’t put that on you. But there’s room for flexing with these ingredients. One thing we can all agree on, though, is that if you leave an ingredient out completely, well, it might be delicious, but you haven’t baked what you set out to bake. So the question becomes: what are the must-have ingredients for a responsible K-12 data science class?
Now, the prevailing wisdom is that we can all agree on at least two ingredients: mathematics and computing. When I say computing, I mean programming, algorithms, and structured data. And for you data scientists out there who may not be familiar with the K-12 math standards, those are the standards that cover the statistics content, the concepts that are necessary for rigor. Those are also the standards that cover data visualization; they’re the ones that talk about histograms and lines of best fit. What we’re hearing, and these are sort of the loudest voices in the room, is that the solution is this: we’re going to take stats classes (thank you, R.A. Fisher, from 100 years ago), add some coding, and boom, we’ve baked ourselves a data science class.
And therefore, the argument goes, we should elevate statistics to be just as important as calculus. Now, as a former math teacher, I am all about elevating statistics to be just as important as calculus. I think it’s great. But if our goal is to bring data science to K-12, I’m here to tell you that this formula is dangerously flawed.
Imagine an amazing CS class. Kids are building virtual worlds and 3D games, and at the end of the class they spend two weeks or so being handed a calculator and taught which stats buttons to press to do some statistics. Is that a data science class? Obviously not.
Now let’s flip that. Suppose you have an amazing statistics class, totally awesome, and then at the very end there are two weeks where students learn which commands to type into Python. Also not a data science class. And as a team that’s been working on this for the last 15 years, and that knows something about combining math and computing, we know it’s not that simple. If you want to mix these ingredients, you need to find the computational concepts that bridge these worlds. There are a lot of them; I’ll give you three quick examples.
How do you take a complex problem, break it down into simpler pieces, and know that when you’ve solved those pieces, you’ve actually solved the original problem you set out to answer? How do you trust a computation that’s been performed on a data set with 10,000 rows that nobody could possibly check by hand? And how do you ensure that your results are reproducible, so that anyone else could take your data and your code and see the same results you did?
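A minimal Python sketch can make those three questions concrete. It is an illustration only, not Bootstrap’s own materials: the table, its columns (`grade`, `minutes`), and every helper name below are hypothetical, and the 10,000 rows are synthetic rather than real student data.

```python
import random
import statistics

# Hypothetical helpers: each one solves a small, checkable piece of the problem.
def filter_rows(rows, keep):
    """Keep only the rows for which keep(row) is True."""
    return [r for r in rows if keep(r)]

def column(rows, name):
    """Extract one column from a table stored as a list of dicts."""
    return [r[name] for r in rows]

def group_mean(rows, group_col, group, value_col):
    """Decomposition: the original question, composed from the smaller pieces."""
    matching = filter_rows(rows, lambda r: r[group_col] == group)
    return statistics.mean(column(matching, value_col))

# Trust: test each piece on a table small enough to check by hand.
tiny = [
    {"grade": 9, "minutes": 30},
    {"grade": 9, "minutes": 50},
    {"grade": 10, "minutes": 20},
]
assert column(filter_rows(tiny, lambda r: r["grade"] == 9), "minutes") == [30, 50]
assert group_mean(tiny, "grade", 9, "minutes") == 40

# Reproducibility: a fixed seed means anyone rerunning this gets the same rows.
def make_rows(n, seed):
    """Generate synthetic (hypothetical) rows; not real student data."""
    rng = random.Random(seed)
    return [
        {"grade": rng.choice([9, 10, 11, 12]), "minutes": rng.randint(0, 120)}
        for _ in range(n)
    ]

big = make_rows(10_000, seed=42)
assert make_rows(10_000, seed=42) == big  # same seed, same data, same answer
print(round(group_mean(big, "grade", 9, "minutes"), 1))
```

The tests on `tiny` are what justify trusting the answer on `big`: the same functions run on both tables, but only the small one can be checked by hand.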
These concepts were critical to our success over a decade ago, and they’re just as critical now. And recognize that if you’re still thinking all that’s necessary is teaching some coding, that approach doesn’t touch any of them.
But it actually gets worse, because there are two other ingredients that are often left out of the conversation, with disastrous results. I call this “When Data Science Goes Bad.”
This may come as a surprise to many of you, but we live in a society that’s kind of racist. And when you do data analysis on that society’s data, guess what models and algorithms come out? Kind of racist ones. And this isn’t just an isolated headline. It has become essentially an epidemic, where the darkest and deepest divides in our society are being institutionalized in code, affecting everything from medical care to sentencing guidelines. And racism is not where it stops.
Political consultants are mining voter data and everything else to build tactically precise gerrymandered districts that further deepen the polarization in our democracy.
And of course we all talk about how important it is for students to learn about cybersecurity, right? We’ve got to teach them what a good password is, and teach them not to hand out that password. And yet what we really need to be doing is teaching them enough data science to understand why they should not be filling out that survey that tells them which Harry Potter character they’re most like. Because it turns out that when you mine that freely available data on social media, it can be weaponized to shift public opinion on issues as major as the fracturing of the European Union: Brexit itself.
So why are these being left out? Well, because just teaching math and computing doesn’t get the job done. There are two more ingredients that need to be part of the conversation, and they’re always left out.
The first is civic responsibility, so let’s talk about it. If you’re viewing this as math and code, great: I’m sure you’ll tell students about the dangers of taking a biased sample. But what we need to be doing is teaching students the dangers of a good random sample taken from a society filled with bias.
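The danger of a good random sample from a biased society can be shown with a short simulation. This Python sketch is hypothetical throughout: it assumes a population of 100,000 records in which past enforcement flagged group “B” twice as often as group “A” (20% vs. 10%), numbers invented purely for illustration. The sampling itself is perfectly random, yet the sample faithfully reproduces the disparity, so any model trained on it would learn the same bias.

```python
import random

# Hypothetical population: the *recorded* outcomes already carry bias.
# 50,000 records per group; group "B" was flagged at twice the rate of group "A".
population = (
    [{"group": "A", "flagged": i < 5_000} for i in range(50_000)]
    + [{"group": "B", "flagged": i < 10_000} for i in range(50_000)]
)

def flag_rate(rows, group):
    """Fraction of rows in `group` that were flagged."""
    members = [r for r in rows if r["group"] == group]
    return sum(r["flagged"] for r in members) / len(members)

# A textbook simple random sample: no sampling bias at all.
rng = random.Random(0)
sample = rng.sample(population, 5_000)

rate_a = flag_rate(sample, "A")
rate_b = flag_rate(sample, "B")
print(f"A: {rate_a:.3f}  B: {rate_b:.3f}  ratio: {rate_b / rate_a:.2f}")
# The ratio lands near 2: the sample is unbiased, but the society it came from is not.
```

No statistical technique applied to `sample` can detect this problem, because the sample is an accurate picture of the data; the question of whether the data should drive decisions is exactly the civic one.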
If you’re thinking of this as just math and code, well, great: I’ll teach you the algorithms to aggregate data, predict human behavior, and find out which of you in the crowd are most likely to commit a crime. But what we need is civic responsibility, which asks whether it is ethical to ask that question or gather that data in the first place.
Now again, if the strategy is to put it all on math teachers, are they ready to have this conversation? And if they are, is it fair to demand that it falls solely on them? I don’t think so. When you teach medicine without civic responsibility, you get the Tuskegee experiments. When you teach data science without this ingredient, you get racially biased algorithms and weaponized social media.
The next ingredient we need to consider is domain investment. I could be the most incredible programmer and statistician you’ve ever met, but if I don’t know anything about baseball, I can’t go down to Yawkey Way and analyze sports statistics for the Red Sox. So imagine a teacher decides that her kids are going to analyze a data set about the best vineyards in Tuscany. Which students are engaged? Which students feel included? Which students feel left out?
It turns out that the choice of data, the actual investment in the domain, is a critical component not just of engagement and relevance but also of diversity, equity, and inclusion. We’ve got a paper coming out of this research group in a couple of weeks that speaks specifically to this. So what we need is teachers who can speak to the content areas that matter to kids and meet them where they are.
Again, is it fair to put all of that on the math teachers? Disrespecting the domain expertise of humanities folks has been standard operating procedure in the STEM world for too long. We cannot afford to repeat that mistake.
So I’m excited to share with you some of the research results we’ve had here. Our curriculum is in use around the country right now. In the nation’s largest school district, New York City, we’ve got social studies teachers having kids analyze the stop-and-frisk data set, teaching social studies in a revolutionary new way.
Out in Arizona, we’ve got physics teachers whose kids already gathered experimental data, but now those kids can analyze the data and try to figure out what kind of equation models what they’re seeing, and they can figure it out before they even see the equation in the book.
Students in California are looking at climate data. You can have students in a phys ed class analyzing their free-throw percentages, or students in a nutrition class looking at their snacking habits. This can be a full-court press, and it’s happening now.
Where I want to leave this talk is by saying that the notion that mixing math and coding is easy is flawed. But even if you do it right, leaving it at math and coding is fundamentally dangerous. For those of us who care about data science, if the headline becomes “it’s the new Math 2.0,” we are sunk.
This needs to be an interdisciplinary solution, a full-court press that engages teachers across grade levels and across disciplines. We need to make sure these ingredients are part of the conversation. We need to make sure we’re not just picking tools because they’re free or because they’re popular, but that we’re choosing tools appropriate for the learning goals of the subject and for the cognitive demands placed on students. We need to make sure we’re not just dumping more data sets on kids; we need to make sure they’re actually better data sets. Are they engaging? Do they meet kids where they need to be? Are the columns of your data set actually accessible? Because if it takes a student a week just to learn what a data set is even about, we’ve lost.
And finally, because we believe in this so thoroughly, we think it’s important to make it free. We’re giving away all of our curricular materials in the hope that all of you out there will join us and engage teachers from across the disciplines to make data science real, but also to make it responsible.
I’m fortunate enough to work with an incredible team, and I want to thank all of you for your time.