How we can store digital data in DNA
Dina Zielinski

I could fit all movies ever made
inside of this tube.

If you can’t see it,
that’s kind of the point.

(Laughter)

Before we understand how this is possible,

it’s important to understand
the value of this feat.

All of our thoughts
and actions these days,

through photos and videos –

even our fitness activities –

are stored as digital data.

Aside from running out of space

on our phones,

we rarely think about
our digital footprint.

But humanity has collectively
generated more data

in the last few years

than all of preceding human history.

Big data has become a big problem.

Digital storage is really expensive,

and none of these devices that we have
really stands the test of time.

There’s this nonprofit website
called the Internet Archive.

In addition to free books and movies,

you can access web pages
as far back as 1996.

Now, this is very tempting,

but I decided to go back and look at
the TED website’s very humble beginnings.

As you can see, it’s changed
quite a bit in the last 30 years.

So this led me to the first-ever TED,

back in 1984,

and it just so happened
to be a Sony executive

explaining how a compact disc works.

(Laughter)

Now, it’s really incredible
to be able to go back in time

and access this moment.

It’s also really fascinating
that after 30 years, after that first TED,

we’re still talking about digital storage.

Now, if we look back another 30 years,

IBM released the first-ever hard drive

back in 1956.

Here it is being loaded for shipping
in front of a small audience.

It held the equivalent of one MP3 song

and weighed over one ton.

At 10,000 dollars a megabyte,

I don’t think anyone in this room
would be interested in buying this thing,

except maybe as a collector’s item.

But it’s the best we could do at the time.

We’ve come such a long way
in data storage.

Devices have evolved dramatically.

But all media eventually wear out
or become obsolete.

If someone handed you a floppy drive today
to back up your presentation,

you’d probably look at them
kind of strange, maybe laugh,

but you’d have no way
to use the damn thing.

These devices can no longer meet
our storage needs,

although some of them can be repurposed.

All technology eventually dies or is lost,

along with our data,

all of our memories.

There’s this illusion that
the storage problem has been solved,

but really, we all just externalize it.

We don’t worry about storing
our emails and our photos.

They’re just in the cloud.

But behind the scenes,
storage is problematic.

After all, the cloud is just
a lot of hard drives.

Now, most digital data,
we could argue, is not really critical.

Surely, we could just delete it.

But how can we really know
what’s important today?

We’ve learned so much about human history

from drawings and writings in caves,

from stone tablets.

We’ve deciphered languages
from the Rosetta Stone.

You know, we’ll never really have
the whole story, though.

Our data is our story,

even more so today.

We won’t have our record
recorded on stone tablets.

But we don’t have to choose
what is important now.

There’s a way to store it all.

It turns out that there’s
a solution that’s been around

for a few billion years,

and it’s actually in this tube.

DNA is nature’s oldest storage device.

After all, it contains
all the information necessary

to build and maintain a human being.

But what makes DNA so great?

Well, let’s take our own genome

as an example.

If we were to print out
all three billion A’s, T’s, C’s and G’s

in a standard font, standard format,

and then we were
to stack all of those papers,

it would be about 130 meters high,

somewhere between the Statue of Liberty
and the Washington Monument.

Now, if we converted
all those A’s, T’s, C’s and G’s

to digital data, to zeroes and ones,

it would total a few gigs.

And that’s in each cell of our body.

We have more than 30 trillion cells.

You get the idea:

DNA can store a ton of information
in a minuscule space.
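
The "few gigs" estimate can be checked with a little arithmetic. This is a back-of-the-envelope sketch: the talk does not say which encoding it assumes, so both densities below are assumptions, and the names `BASES` and `size_gb` are made up for illustration.

```python
# Rough size of one genome's worth of letters as digital data.
BASES = 3_000_000_000  # "three billion A's, T's, C's and G's", from the talk


def size_gb(n_bases: int, bits_per_base: float) -> float:
    """Size in gigabytes (10**9 bytes) at a given number of bits per letter."""
    return n_bases * bits_per_base / 8 / 1e9


packed = size_gb(BASES, 2)   # assumption: 4 possible letters, so 2 bits each
as_text = size_gb(BASES, 8)  # assumption: one 8-bit text character per letter

print(f"packed: {packed} GB, as plain text: {as_text} GB")
```

Either way the result is on the order of gigabytes, and that much information sits inside a single cell.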

DNA is also very durable,

and it doesn’t even require
electricity to store it.

We know this because scientists
have recovered DNA from ancient humans

that lived hundreds
of thousands of years ago.

One of those is Ötzi the Iceman.

Turns out, he’s Austrian.

(Laughter)

He was found high, well-preserved,

in the mountains
between Italy and Austria,

and it turns out that he has living
genetic relatives here in Austria today.

So one of you could be a cousin of Ötzi.

(Laughter)

The point is that we have a better chance
of recovering information

from an ancient human

than we do from an old phone.

It’s also much less likely
that we’ll lose the ability to read DNA

than any single man-made device.

Every single new storage format
requires a new way to read it.

We’ll always be able to read DNA.

If we can no longer sequence,
we have bigger problems

than worrying about data storage.

Storing data on DNA is not new.

Nature’s been doing it
for several billion years.

In fact, every living thing
is a DNA storage device.

But how do we store data on DNA?

This is Photo 51.

It’s the first-ever photo of DNA,

taken about 60 years ago.

This is around the time that
that same hard drive was released by IBM.

So really, our understanding of digital
storage and of DNA have coevolved.

We first learned to sequence, or read DNA,

and very soon after, how to write it,

or synthesize it.

This is much like how we learn
a new language.

And now we have the ability
to read, write and copy DNA.

We do it in the lab all the time.

So anything, really anything,
that can be stored as zeroes and ones

can be stored in DNA.

To store something digitally,
like this photo,

we convert it to bits, or binary digits.

Each pixel in a black-and-white photo
is simply a zero or a one.

And we can write DNA much like an inkjet
printer can print letters on a page.

We just have to convert our data,
all of those zeroes and ones,

to A’s, T’s, C’s and G’s,

and then we send this
to a synthesis company.

So we write it, we can store it,

and when we want to recover our data,
we just sequence it.
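
As a minimal sketch of that conversion, each pair of bits can be mapped to one of the four letters and back again. The particular mapping below (00 to A, 01 to C, 10 to G, 11 to T) is a hypothetical choice for illustration, not necessarily the scheme used in the work described here, and it ignores the practical constraints of real synthesis.

```python
# Hypothetical 2-bits-per-letter mapping, chosen only for illustration.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of A/C/G/T back into the original bytes."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"TED")
print(strand)          # the letters that would be sent off for synthesis
print(decode(strand))  # what comes back after "sequencing"
```

Writing corresponds to `encode`, and reading back after sequencing corresponds to `decode`; every byte becomes exactly four letters.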

Now, the fun part of all of this
is deciding what files to include.

We’re serious scientists,
so we had to include a manuscript

for good posterity.

We also included a $50 Amazon gift card –

don’t get too excited, it’s already
been spent, someone decoded it –

as well as an operating system,

one of the first movies ever made

and a Pioneer plaque.

Some of you might have seen this.

It has a depiction of a typical –
apparently – male and female,

and our approximate location
in the Solar System,

in case the Pioneer spacecraft
ever encounters extraterrestrials.

So once we decided what sort of files
we wanted to encode,

we packaged up the data,

converted those zeroes and ones
to A’s, T’s, C’s and G’s,

and then we just sent this file off
to a synthesis company.

And this is what we got back.

Our files were in this tube.

All we had to do was sequence it.

This all sounds pretty straightforward,

but the difference between
a really cool, fun idea

and something we can actually use

is overcoming these practical challenges.

Now, while DNA is more robust
than any man-made device,

it’s not perfect.

It does have some weaknesses.

We recover our message
by sequencing the DNA,

and every time data is retrieved,

we lose the DNA.

That’s just part
of the sequencing process.

We don’t want to run out of data,

but luckily, there’s a way to copy the DNA

that’s even cheaper and easier
than synthesizing it.

We actually tested a way to make
200 trillion copies of our files,

and we recovered
all the data without error.

So sequencing also introduces
errors into our DNA,

into the A’s, T’s, C’s and G’s.

Nature has a way
to deal with this in our cells.

But our data is stored
in synthetic DNA in a tube,

so we had to find our own way
to overcome this problem.

We decided to use an algorithm
that was used to stream videos.

When you’re streaming a video,

you’re essentially trying to recover
the original video, the original file.

When we’re trying to recover
our original files,

we’re simply sequencing.

But really, both of these processes are
about recovering enough zeroes and ones

to put our data back together.
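
The idea of recovering a file from "enough" pieces, whichever ones happen to arrive, can be sketched with a toy fountain-style code. The talk does not name the algorithm, so treating it as a fountain code is an assumption, and everything below (the function names, the degree choice of 1 to 3 chunks per droplet, the simple peeling decoder) is a simplified illustration rather than the method actually used.

```python
import random
from typing import Iterable, Iterator


def pick_indices(seed: int, n_chunks: int) -> list[int]:
    """Chunk indices mixed into the droplet with this seed (toy degree choice)."""
    rng = random.Random(seed)
    degree = rng.choice([1, 2, 3])
    return sorted(rng.sample(range(n_chunks), min(degree, n_chunks)))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def droplets(chunks: list[bytes]) -> Iterator[tuple[int, bytes]]:
    """Endless stream of (seed, payload); payload is the XOR of the picked chunks."""
    seed = 0
    while True:
        payload = bytes(len(chunks[0]))  # all zeroes; chunks must be equal length
        for i in pick_indices(seed, len(chunks)):
            payload = xor(payload, chunks[i])
        yield seed, payload
        seed += 1


def decode(stream: Iterable[tuple[int, bytes]], n_chunks: int) -> list[bytes]:
    """Peel droplets until every chunk is known, then return the chunks in order."""
    known: dict[int, bytes] = {}
    pending: list[tuple[set[int], bytes]] = []
    for seed, payload in stream:
        pending.append((set(pick_indices(seed, n_chunks)), payload))
        progress = True
        while progress:
            progress = False
            remaining = []
            for idx, data in pending:
                # Strip out chunks we already know from this droplet.
                for i in [i for i in idx if i in known]:
                    data = xor(data, known[i])
                idx = {i for i in idx if i not in known}
                if len(idx) == 1:
                    known[idx.pop()] = data  # exactly one unknown left: solved
                    progress = True
                elif idx:
                    remaining.append((idx, data))
            pending = remaining
        if len(known) == n_chunks:
            return [known[i] for i in range(n_chunks)]
    raise ValueError("stream ended before all chunks were recovered")


chunks = [b"DNA_", b"data", b"stor", b"age!"]
# Simulate loss: every third droplet never arrives.
lossy = (d for k, d in enumerate(droplets(chunks)) if k % 3 != 0)
recovered = decode(lossy, len(chunks))
print(b"".join(recovered))
```

The decoder never needs any particular droplet, only enough of them, which is why losing some in transit (or in sequencing) does not prevent recovery.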

And so, because of our coding strategy,

we were able to package up all of our data

in a way that allowed us to make
millions and trillions of copies

and still always recover
all of our files.

This is the movie we encoded.

It’s one of the first movies ever made,

and now the first to be copied
more than 200 trillion times on DNA.

Soon after our work was published,

we participated in an “Ask Me Anything”
on the website reddit.

If you’re a fellow nerd,
you’re very familiar with this website.

Most questions were thoughtful.

Some were comical.

For example, one user wanted to know
when we would have a literal thumb drive.

Now, the thing is,

our DNA already stores everything
needed to make us who we are.

It’s a lot safer to store data

in synthetic DNA in a tube.

Writing and reading data from DNA
is obviously a lot more time-consuming

than just saving all your files
on a hard drive –

for now.

So initially, we should focus
on long-term storage.

Most data are ephemeral.

It’s really hard to grasp
what’s important today,

or what will be important
for future generations.

But the point is,
we don’t have to decide today.

There’s this great program by UNESCO
called the “Memory of the World” program.

It’s been created to preserve
historical materials

that are considered of value
to all of humanity.

Items are nominated
to be added to the collection,

including that film that we encoded.

While a wonderful way
to preserve human heritage,

it doesn’t have to be a choice.

Instead of asking
the current generation – us –

what might be important in the future,

we could store everything in DNA.

Storage is not just about how many bytes

but how well we can actually
store the data and recover it.

There’s always been this tension
between how much data we can generate

and how much we can recover

and how much we can store.

Every advance in writing data
has required a new way to read it.

We can no longer read old media.

How many of you even have
a disk drive in your laptop,

never mind a floppy drive?

This will never be the case with DNA.

As long as we’re around, DNA is around,

and we’ll find a way to sequence it.

Archiving the world around us
is part of human nature.

This is the progress we’ve made
in digital storage in 60 years,

at a time when we were only
beginning to understand DNA.

Yet, we’ve made similar progress
in half that time with DNA sequencers,

and as long as we’re around,
DNA will never be obsolete.

Thank you.

(Applause)