How data is helping us unravel the mysteries of the brain
Steve McCarroll
Nine years ago,
my sister discovered lumps
in her neck and arm
and was diagnosed with cancer.
From that day, she started to benefit
from the understanding
that science has of cancer.
Every time she went to the doctor,
they measured specific molecules
that gave them information
about how she was doing
and what to do next.
New medical options
became available every few years.
Everyone recognized
that she was struggling heroically
with a biological illness.
This spring, she received
an innovative new medical treatment
in a clinical trial.
It dramatically knocked back her cancer.
Guess who I’m going to spend
this Thanksgiving with?
My vivacious sister,
who gets more exercise than I do,
and who, like perhaps
many people in this room,
increasingly talks about a lethal illness
in the past tense.
Science can, in our lifetimes –
even in a decade –
transform what it means
to have a specific illness.
But not for all illnesses.
My friend Robert and I
were classmates in graduate school.
Robert was smart,
but with each passing month,
his thinking seemed to become
more disorganized.
He dropped out of school,
got a job in a store …
But that, too, became too complicated.
Robert became fearful and withdrawn.
A year and a half later,
he started hearing voices
and believing that people
were following him.
Doctors diagnosed him with schizophrenia,
and they gave him
the best drug they could.
That drug makes the voices
somewhat quieter,
but it didn’t restore his bright mind
or his social connectedness.
Robert struggled to remain connected
to the worlds of school
and work and friends.
He drifted away,
and today I don’t know where to find him.
If he watches this,
I hope he’ll find me.
Why does medicine have
so much to offer my sister,
and so much less to offer
millions of people like Robert?
The need is there.
The World Health Organization
estimates that brain illnesses
like schizophrenia, bipolar disorder
and major depression
are the world’s largest cause
of lost years of life and work.
That’s in part because these illnesses
often strike early in life,
in many ways, in the prime of life,
just as people are finishing
their educations, starting careers,
forming relationships and families.
These illnesses can result in suicide;
they often compromise one’s ability
to work at one’s full potential;
and they’re the cause of so many
tragedies harder to measure:
lost relationships and connections,
missed opportunities
to pursue dreams and ideas.
These illnesses limit human possibilities
in ways we simply cannot measure.
We live in an era in which
there’s profound medical progress
on so many other fronts.
My sister’s cancer story
is a great example,
and we could say the same
of heart disease.
Drugs like statins will prevent
millions of heart attacks and strokes.
When you look at these areas
of profound medical progress
in our lifetimes,
they have a narrative in common:
scientists discovered molecules
that matter to an illness,
they developed ways to detect
and measure those molecules in the body,
and they developed ways
to interfere with those molecules
using other molecules – medicines.
It’s a strategy that has worked
again and again and again.
But when it comes to the brain,
that strategy has been limited,
because today, we don’t know
nearly enough, yet,
about how the brain works.
We need to learn which of our cells
matter to each illness,
and which molecules in those cells
matter to each illness.
And that’s the mission
I want to tell you about today.
My lab develops technologies
with which we try to turn the brain
into a big-data problem.
You see, before I became a biologist,
I worked in computers and math,
and I learned this lesson:
wherever you can collect vast amounts
of the right kinds of data
about the functioning of a system,
you can use computers in powerful new ways
to make sense of that system
and learn how it works.
Today, big-data approaches
are transforming
ever-larger sectors of our economy,
and they could do the same
in biology and medicine, too.
But you have to have
the right kinds of data.
You have to have data
about the right things.
And that often requires
new technologies and ideas.
And that is the mission that animates
the scientists in my lab.
Today, I want to tell you
two short stories from our work.
One fundamental obstacle we face
in trying to turn the brain
into a big-data problem
is that our brains are composed of
and built from billions of cells.
And our cells are not generalists;
they’re specialists.
Like humans at work,
they specialize into thousands
of different cellular careers,
or cell types.
In fact, each of
the cell types in our body
could probably give a lively TED Talk
about what it does at work.
But as scientists,
we don’t even know today
how many cell types there are,
and we don’t know what the titles
of most of those talks would be.
Now, we know many
important things about cell types.
They can differ dramatically
in size and shape.
One will respond to a molecule
that the other doesn’t respond to;
they’ll make different molecules.
But science has largely
been reaching these insights
in an ad hoc way, one cell type at a time,
one molecule at a time.
We wanted to make it possible to learn
all of this quickly and systematically.
Now, until recently, it was the case
that if you wanted to inventory
all of the molecules
in a part of the brain or any organ,
you had to first grind it up
into a kind of cellular smoothie.
But that’s a problem.
As soon as you’ve ground up the cells,
you can only study the contents
of the average cell –
not the individual cells.
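
[Editor's illustration, not part of the talk] The point about averages can be sketched in a few lines of Python. The cell-type names and the amounts below are invented purely for illustration; they are not measurements from the talk.

```python
from statistics import mean

# Hypothetical amounts of one molecule in six cells of two made-up cell types.
cells = {
    "type_1": [10.0, 11.0, 9.0],
    "type_2": [0.0, 1.0, 2.0],
}

# "Smoothie": all cells are pooled, so only one overall average survives.
pooled = [amount for amounts in cells.values() for amount in amounts]
smoothie_average = mean(pooled)

# "Fruit salad": cells stay separate, so each type keeps its own profile.
per_type_average = {name: mean(amounts) for name, amounts in cells.items()}

print("pooled average:", smoothie_average)
print("per-type averages:", per_type_average)
```

In this toy data the pooled average describes neither cell type well, which is the limitation the speaker is describing.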
Imagine if you were trying to understand
how a big city like New York works,
but you could only do so
by reviewing some statistics
about the average resident of New York.
Of course, you wouldn’t learn very much,
because everything that’s interesting
and important and exciting
is in all the diversity
and the specializations.
And the same thing is true of our cells.
And we wanted to make it possible to study
the brain not as a cellular smoothie
but as a cellular fruit salad,
in which one could generate
data about and learn from
each individual piece of fruit.
So we developed
a technology for doing that.
You’re about to see a movie of it.
Here we’re packaging
tens of thousands of individual cells,
each into its own tiny water droplet
for its own molecular analysis.
When a cell lands in a droplet,
it’s greeted by a tiny bead,
and that bead delivers millions
of DNA bar code molecules.
And each bead delivers
a different bar code sequence
to a different cell.
We incorporate the DNA bar codes
into each cell’s RNA molecules.
Those are the molecular
transcripts it’s making
of the specific genes
that it’s using to do its job.
And then we sequence billions
of these combined molecules
and use the sequences to tell us
which cell and which gene
every molecule came from.
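
[Editor's illustration, not part of the talk] A minimal sketch of the bookkeeping described here, assuming each sequenced molecule has already been reduced to a pair of (cell bar code, gene name). The bar codes, gene names and the function `tally_by_cell` are hypothetical; this is not the lab's actual software.

```python
from collections import Counter

# Hypothetical sequenced molecules: (cell bar code, gene). All values invented.
molecules = [
    ("AAAC", "gene_A"),
    ("AAAC", "gene_A"),
    ("AAAC", "gene_B"),
    ("GGTT", "gene_B"),
    ("GGTT", "gene_B"),
    ("GGTT", "gene_C"),
]

def tally_by_cell(molecules):
    """Group molecules by bar code, then count genes within each cell."""
    per_cell = {}
    for barcode, gene in molecules:
        per_cell.setdefault(barcode, Counter())[gene] += 1
    return per_cell

profiles = tally_by_cell(molecules)
for barcode, genes in sorted(profiles.items()):
    print(barcode, dict(genes))
```

Because every molecule carries its cell's bar code, the cells can be analyzed together and still be told apart afterward: each bar code ends up with its own gene-by-gene tally.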
We call this approach “Drop-seq,”
because we use droplets
to separate the cells for analysis,
and we use DNA sequences
to tag and inventory
and keep track of everything.
And now, whenever we do an experiment,
we analyze tens of thousands
of individual cells.
And today in this area of science,
the challenge is increasingly
how to learn as much as we can
as quickly as we can
from these vast data sets.
When we were developing Drop-seq,
people used to tell us,
“Oh, this is going to make you guys
the go-to for every major brain project.”
That’s not how we saw it.
Science is best when everyone
is generating lots of exciting data.
So we wrote a 25-page instruction book,
with which any scientist could build
their own Drop-seq system from scratch.
And that instruction book has been
downloaded from our lab website
50,000 times in the past two years.
We wrote software
that any scientist could use
to analyze the data
from Drop-seq experiments,
and that software is also free,
and it’s been downloaded from our website
30,000 times in the past two years.
And hundreds of labs have written us
about discoveries that they’ve made
using this approach.
Today, this technology is being used
to make a human cell atlas.
It will be an atlas of all
of the cell types in the human body
and the specific genes
that each cell type uses to do its job.
Now I want to tell you about
a second challenge that we face
in trying to turn the brain
into a big-data problem.
And that challenge is that
we’d like to learn from the brains
of hundreds of thousands of living people.
But our brains are not physically
accessible while we’re living.
But how can we discover molecular factors
if we can’t hold the molecules?
An answer comes from the fact that
the most informative molecules, proteins,
are encoded in our DNA,
which has the recipes our cells follow
to make all of our proteins.
And these recipes vary
from person to person to person
in ways that cause the proteins
to vary from person to person
in their precise sequence
and in how much each cell type
makes of each protein.
It’s all encoded in our DNA,
and it’s all genetics,
but it’s not the genetics
that we learned about in school.
Do you remember big B, little b?
If you inherit big B, you get brown eyes?
It’s simple.
Very few traits are that simple.
Even eye color is shaped by much more
than a single pigment molecule.
And something as complex
as the function of our brains
is shaped by the interaction
of thousands of genes.
And each of these genes
varies meaningfully
from person to person to person,
and each of us is a unique
combination of that variation.
It’s a big-data opportunity.
And today, it’s increasingly
possible to make progress
on a scale that was never possible before.
People are contributing to genetic studies
in record numbers,
and scientists around the world
are sharing the data with one another
to speed progress.
I want to tell you a short story
about a discovery we recently made
about the genetics of schizophrenia.
It was made possible
by 50,000 people from 30 countries,
who contributed their DNA
to genetic research on schizophrenia.
It had been known for several years
that the human genome’s largest influence
on risk of schizophrenia
comes from a part of the genome
that encodes many of the molecules
in our immune system.
But it wasn’t clear which gene
was responsible.
A scientist in my lab developed
a new way to analyze DNA with computers,
and he discovered something
very surprising.
He found that a gene called
“complement component 4” –
it’s called “C4” for short –
comes in dozens of different forms
in different people’s genomes,
and these different forms
make different amounts
of C4 protein in our brains.
And he found that the more
C4 protein our genes make,
the greater our risk for schizophrenia.
Now, C4 is still just one risk factor
in a complex system.
This isn’t big B,
but it’s an insight about
a molecule that matters.
Complement proteins like C4
were known for a long time
for their roles in the immune system,
where they act as a kind of
molecular Post-it note
that says, “Eat me.”
And that Post-it note
gets put on lots of debris
and dead cells in our bodies
and invites immune cells
to eliminate them.
But two colleagues of mine found
that the C4 Post-it note
also gets put on synapses in the brain
and prompts their elimination.
Now, the creation and elimination
of synapses is a normal part
of human development and learning.
Our brains create and eliminate
synapses all the time.
But our genetic results suggest
that in schizophrenia,
the elimination process
may go into overdrive.
Scientists at many drug companies tell me
they’re excited about this discovery,
because they’ve been working
on complement proteins for years
in the immune system,
and they’ve learned a lot
about how they work.
They’ve even developed molecules
that interfere with complement proteins,
and they’re starting to test them
in the brain as well as the immune system.
It’s potentially a path toward a drug
that might address a root cause
rather than an individual symptom,
and we hope very much that this work
by many scientists over many years
will be successful.
But C4 is just one example
of the potential for data-driven
scientific approaches
to open new fronts on medical problems
that are centuries old.
There are hundreds of places
in our genomes
that shape risk for brain illnesses,
and any one of them could lead us
to the next molecular insight
about a molecule that matters.
And there are hundreds of cell types that
use these genes in different combinations.
As we and other scientists
work to generate
the rest of the data that’s needed
and to learn all that we can
from that data,
we hope to open many more new fronts.
Genetics and single-cell analysis
are just two ways
of trying to turn the brain
into a big-data problem.
There is so much more we can do.
Scientists in my lab
are creating a technology
for quickly mapping the synaptic
connections in the brain
to tell which neurons are talking
to which other neurons
and how that conversation changes
throughout life and during illness.
And we’re developing a way
to test in a single tube
how cells with hundreds
of different people’s genomes
respond differently to the same stimulus.
These projects bring together
people with diverse backgrounds
and training and interests –
biology, computers, chemistry,
math, statistics, engineering.
But the scientific possibilities
rally people with diverse interests
into working intensely together.
What’s the future
that we could hope to create?
Consider cancer.
We’ve moved from an era of ignorance
about what causes cancer,
in which cancer was commonly ascribed
to personal psychological characteristics,
to a modern molecular understanding
of the true biological causes of cancer.
That understanding today
leads to innovative medicine
after innovative medicine,
and although there’s still
so much work to do,
we’re already surrounded by people
who have been cured of cancers
that were considered untreatable
a generation ago.
And millions of cancer survivors
like my sister
find themselves with years of life
that they didn’t take for granted
and new opportunities
for work and joy and human connection.
That is the future that we are determined
to create around mental illness –
one of real understanding and empathy
and limitless possibility.
Thank you.
(Applause)