How data is helping us unravel the mysteries of the brain | Steve McCarroll

Nine years ago,

my sister discovered lumps
in her neck and arm

and was diagnosed with cancer.

From that day, she started to benefit

from the understanding
that science has of cancer.

Every time she went to the doctor,

they measured specific molecules

that gave them information
about how she was doing

and what to do next.

New medical options
became available every few years.

Everyone recognized
that she was struggling heroically

with a biological illness.

This spring, she received
an innovative new medical treatment

in a clinical trial.

It dramatically knocked back her cancer.

Guess who I’m going to spend
this Thanksgiving with?

My vivacious sister,

who gets more exercise than I do,

and who, like perhaps
many people in this room,

increasingly talks about a lethal illness

in the past tense.

Science can, in our lifetimes –
even in a decade –

transform what it means
to have a specific illness.

But not for all illnesses.

My friend Robert and I
were classmates in graduate school.

Robert was smart,

but with each passing month,

his thinking seemed to become
more disorganized.

He dropped out of school,
got a job in a store …

But that, too, became too complicated.

Robert became fearful and withdrawn.

A year and a half later,
he started hearing voices

and believing that people
were following him.

Doctors diagnosed him with schizophrenia,

and they gave him
the best drug they could.

That drug made the voices
somewhat quieter,

but it didn’t restore his bright mind
or his social connectedness.

Robert struggled to remain connected

to the worlds of school
and work and friends.

He drifted away,

and today I don’t know where to find him.

If he watches this,

I hope he’ll find me.

Why does medicine have
so much to offer my sister,

and so much less to offer
millions of people like Robert?

The need is there.

The World Health Organization
estimates that brain illnesses

like schizophrenia, bipolar disorder
and major depression

are the world’s largest cause
of lost years of life and work.

That’s in part because these illnesses
often strike early in life,

in many ways, in the prime of life,

just as people are finishing
their educations, starting careers,

forming relationships and families.

These illnesses can result in suicide;

they often compromise one’s ability
to work at one’s full potential;

and they’re the cause of so many
tragedies harder to measure:

lost relationships and connections,

missed opportunities
to pursue dreams and ideas.

These illnesses limit human possibilities

in ways we simply cannot measure.

We live in an era in which
there’s profound medical progress

on so many other fronts.

My sister’s cancer story
is a great example,

and we could say the same
of heart disease.

Drugs like statins will prevent
millions of heart attacks and strokes.

When you look at these areas
of profound medical progress

in our lifetimes,

they have a narrative in common:

scientists discovered molecules
that matter to an illness,

they developed ways to detect
and measure those molecules in the body,

and they developed ways
to interfere with those molecules

using other molecules – medicines.

It’s a strategy that has worked
again and again and again.

But when it comes to the brain,
that strategy has been limited,

because today, we don’t know
nearly enough, yet,

about how the brain works.

We need to learn which of our cells
matter to each illness,

and which molecules in those cells
matter to each illness.

And that’s the mission
I want to tell you about today.

My lab develops technologies
with which we try to turn the brain

into a big-data problem.

You see, before I became a biologist,
I worked in computers and math,

and I learned this lesson:

wherever you can collect vast amounts
of the right kinds of data

about the functioning of a system,

you can use computers in powerful new ways

to make sense of that system
and learn how it works.

Today, big-data approaches
are transforming

ever-larger sectors of our economy,

and they could do the same
in biology and medicine, too.

But you have to have
the right kinds of data.

You have to have data
about the right things.

And that often requires
new technologies and ideas.

And that is the mission that animates
the scientists in my lab.

Today, I want to tell you
two short stories from our work.

One fundamental obstacle we face

in trying to turn the brain
into a big-data problem

is that our brains are composed of
and built from billions of cells.

And our cells are not generalists;
they’re specialists.

Like humans at work,

they specialize into thousands
of different cellular careers,

or cell types.

In fact, each of
the cell types in our body

could probably give a lively TED Talk

about what it does at work.

But as scientists,
we don’t even know today

how many cell types there are,

and we don’t know what the titles
of most of those talks would be.

Now, we know many
important things about cell types.

They can differ dramatically
in size and shape.

One will respond to a molecule
that the other doesn’t respond to;

they’ll make different molecules.

But science has largely
been reaching these insights

in an ad hoc way, one cell type at a time,

one molecule at a time.

We wanted to make it possible to learn
all of this quickly and systematically.

Now, until recently, it was the case

that if you wanted to inventory
all of the molecules

in a part of the brain or any organ,

you had to first grind it up
into a kind of cellular smoothie.

But that’s a problem.

As soon as you’ve ground up the cells,

you can only study the contents
of the average cell –

not the individual cells.

Imagine if you were trying to understand
how a big city like New York works,

but you could only do so
by reviewing some statistics

about the average resident of New York.

Of course, you wouldn’t learn very much,

because everything that’s interesting
and important and exciting

is in all the diversity
and the specializations.

And the same thing is true of our cells.

And we wanted to make it possible to study
the brain not as a cellular smoothie

but as a cellular fruit salad,

in which one could generate
data about and learn from

each individual piece of fruit.
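
The smoothie-versus-fruit-salad contrast can be made concrete with a minimal Python sketch. The cell-type names, gene names and counts below are invented purely for illustration; they are not measurements from the talk or from any real experiment.

```python
from statistics import mean

# Hypothetical per-cell molecule counts for two invented genes.
# Every name and number here is made up for illustration.
cells = [
    {"type": "neuron", "gene_a": 90, "gene_b": 0},
    {"type": "neuron", "gene_a": 110, "gene_b": 0},
    {"type": "glia", "gene_a": 0, "gene_b": 40},
    {"type": "glia", "gene_a": 0, "gene_b": 60},
]

GENES = ("gene_a", "gene_b")


def bulk_average(cells):
    """The 'smoothie': one average per gene across all cells."""
    return {g: mean(c[g] for c in cells) for g in GENES}


def per_type_average(cells):
    """The 'fruit salad': averages kept separate for each cell type."""
    by_type = {}
    for c in cells:
        by_type.setdefault(c["type"], []).append(c)
    return {t: bulk_average(group) for t, group in by_type.items()}


smoothie = bulk_average(cells)
fruit_salad = per_type_average(cells)
print(smoothie)
print(fruit_salad)
```

In this toy data the bulk average reports a moderate amount of both genes, which describes no actual cell: each invented cell type uses only one of the two. Keeping the cells separate preserves that difference.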

So we developed
a technology for doing that.

You’re about to see a movie of it.

Here we’re packaging
tens of thousands of individual cells,

each into its own tiny water droplet

for its own molecular analysis.

When a cell lands in a droplet,
it’s greeted by a tiny bead,

and that bead delivers millions
of DNA bar code molecules.

And each bead delivers
a different bar code sequence

to a different cell.

We incorporate the DNA bar codes

into each cell’s RNA molecules.

Those are the molecular
transcripts it’s making

of the specific genes
that it’s using to do its job.

And then we sequence billions
of these combined molecules

and use the sequences to tell us

which cell and which gene

every molecule came from.

We call this approach “Drop-seq,”
because we use droplets

to separate the cells for analysis,

and we use DNA sequences
to tag and inventory

and keep track of everything.
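
The bookkeeping idea behind the bar codes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the lab's software: the bar code strings, gene names and reads are hypothetical, and each read is assumed to arrive already split into the cell bar code it carries and the gene it matched. A real analysis also has to handle sequencing errors and match RNA sequences to genes, which this sketch leaves out.

```python
from collections import defaultdict

# Hypothetical sequenced reads, each already split into the cell bar code
# it carries and the gene its RNA portion matched. All values are invented.
reads = [
    ("ACGT", "gene_a"),
    ("ACGT", "gene_a"),
    ("ACGT", "gene_b"),
    ("TTAG", "gene_b"),
    ("TTAG", "gene_b"),
]


def count_by_cell_and_gene(reads):
    """Tally reads per (cell bar code, gene) pair."""
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, gene in reads:
        counts[barcode][gene] += 1
    # Convert the nested defaultdicts to plain dicts for easy inspection.
    return {barcode: dict(genes) for barcode, genes in counts.items()}


expression = count_by_cell_and_gene(reads)
print(expression)
```

Because every read carries the bar code of the droplet it came from, the reads can be pooled and sequenced together and still be sorted back into a per-cell, per-gene table afterward.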

And now, whenever we do an experiment,

we analyze tens of thousands
of individual cells.

And today in this area of science,

the challenge is increasingly
how to learn as much as we can

as quickly as we can

from these vast data sets.

When we were developing Drop-seq,
people used to tell us,

“Oh, this is going to make you guys
the go-to for every major brain project.”

That’s not how we saw it.

Science is best when everyone
is generating lots of exciting data.

So we wrote a 25-page instruction book,

with which any scientist could build
their own Drop-seq system from scratch.

And that instruction book has been
downloaded from our lab website

50,000 times in the past two years.

We wrote software
that any scientist could use

to analyze the data
from Drop-seq experiments,

and that software is also free,

and it’s been downloaded from our website
30,000 times in the past two years.

And hundreds of labs have written us
about discoveries that they’ve made

using this approach.

Today, this technology is being used
to make a human cell atlas.

It will be an atlas of all
of the cell types in the human body

and the specific genes
that each cell type uses to do its job.

Now I want to tell you about
a second challenge that we face

in trying to turn the brain
into a big-data problem.

And that challenge is that
we’d like to learn from the brains

of hundreds of thousands of living people.

But our brains are not physically
accessible while we’re living.

So how can we discover molecular factors
if we can’t hold the molecules?

An answer comes from the fact that
the most informative molecules, proteins,

are encoded in our DNA,

which has the recipes our cells follow
to make all of our proteins.

And these recipes vary
from person to person to person

in ways that cause the proteins
to vary from person to person

in their precise sequence

and in how much each cell type
makes of each protein.

It’s all encoded in our DNA,
and it’s all genetics,

but it’s not the genetics
that we learned about in school.

Do you remember big B, little b?

If you inherit big B, you get brown eyes?

It’s simple.

Very few traits are that simple.

Even eye color is shaped by much more
than a single pigment molecule.

And something as complex
as the function of our brains

is shaped by the interaction
of thousands of genes.

And each of these genes
varies meaningfully

from person to person to person,

and each of us is a unique
combination of that variation.

It’s a big-data opportunity.

And today, it’s increasingly
possible to make progress

on a scale that was never possible before.

People are contributing to genetic studies

in record numbers,

and scientists around the world
are sharing the data with one another

to speed progress.

I want to tell you a short story
about a discovery we recently made

about the genetics of schizophrenia.

It was made possible
by 50,000 people from 30 countries,

who contributed their DNA
to genetic research on schizophrenia.

It had been known for several years

that the human genome’s largest influence
on risk of schizophrenia

comes from a part of the genome

that encodes many of the molecules
in our immune system.

But it wasn’t clear which gene
was responsible.

A scientist in my lab developed
a new way to analyze DNA with computers,

and he discovered something
very surprising.

He found that a gene called
“complement component 4” –

it’s called “C4” for short –

comes in dozens of different forms
in different people’s genomes,

and these different forms
make different amounts

of C4 protein in our brains.

And he found that the more
C4 protein our genes make,

the greater our risk for schizophrenia.

Now, C4 is still just one risk factor
in a complex system.

This isn’t big B,

but it’s an insight about
a molecule that matters.

Complement proteins like C4
were known for a long time

for their roles in the immune system,

where they act as a kind of
molecular Post-it note

that says, “Eat me.”

And that Post-it note
gets put on lots of debris

and dead cells in our bodies

and invites immune cells
to eliminate them.

But two colleagues of mine found
that the C4 Post-it note

also gets put on synapses in the brain

and prompts their elimination.

Now, the creation and elimination
of synapses is a normal part

of human development and learning.

Our brains create and eliminate
synapses all the time.

But our genetic results suggest
that in schizophrenia,

the elimination process
may go into overdrive.

Scientists at many drug companies tell me
they’re excited about this discovery,

because they’ve been working
on complement proteins for years

in the immune system,

and they’ve learned a lot
about how they work.

They’ve even developed molecules
that interfere with complement proteins,

and they’re starting to test them
in the brain as well as the immune system.

It’s potentially a path toward a drug
that might address a root cause

rather than an individual symptom,

and we hope very much that this work
by many scientists over many years

will be successful.

But C4 is just one example

of the potential for data-driven
scientific approaches

to open new fronts on medical problems
that are centuries old.

There are hundreds of places
in our genomes

that shape risk for brain illnesses,

and any one of them could lead us
to the next molecular insight

about a molecule that matters.

And there are hundreds of cell types that
use these genes in different combinations.

As we and other scientists
work to generate

the rest of the data that’s needed

and to learn all that we can
from that data,

we hope to open many more new fronts.

Genetics and single-cell analysis
are just two ways

of trying to turn the brain
into a big-data problem.

There is so much more we can do.

Scientists in my lab
are creating a technology

for quickly mapping the synaptic
connections in the brain

to tell which neurons are talking
to which other neurons

and how that conversation changes
throughout life and during illness.

And we’re developing a way
to test in a single tube

how cells with hundreds
of different people’s genomes

respond differently to the same stimulus.

These projects bring together
people with diverse backgrounds

and training and interests –

biology, computers, chemistry,
math, statistics, engineering.

But the scientific possibilities
rally people with diverse interests

into working intensely together.

What’s the future
that we could hope to create?

Consider cancer.

We’ve moved from an era of ignorance
about what causes cancer,

in which cancer was commonly ascribed
to personal psychological characteristics,

to a modern molecular understanding
of the true biological causes of cancer.

That understanding today
leads to innovative medicine

after innovative medicine,

and although there’s still
so much work to do,

we’re already surrounded by people
who have been cured of cancers

that were considered untreatable
a generation ago.

And millions of cancer survivors
like my sister

find themselves with years of life
that they didn’t take for granted

and new opportunities

for work and joy and human connection.

That is the future that we are determined
to create around mental illness –

one of real understanding and empathy

and limitless possibility.

Thank you.

(Applause)