How we’re building the world’s largest family tree
Yaniv Erlich

People use the internet
for various reasons.

It turns out that one of the most
popular categories of website

is something that people
typically consume in private.

It involves curiosity,

non-insignificant levels
of self-indulgence

and is centered around recording
the reproductive activities

of other people.

(Laughter)

Of course, I’m talking about genealogy –

(Laughter)

the study of family history.

When it comes to detailing family history,

in every family, we have this person
that is obsessed with genealogy.

Let’s call him Uncle Bernie.

Uncle Bernie is exactly the last person
you want to sit next to

at Thanksgiving dinner,

because he will bore you to death
with peculiar details

about some ancient relatives.

But as you know,

there is a scientific side to everything,

and we found that Uncle Bernie’s stories

have immense potential
for biomedical research.

We let Uncle Bernie
and his fellow genealogists

document their family trees through
a genealogy website called geni.com.

When users upload
their trees to the website,

it scans their relatives,

and if it finds matches to existing trees,

it merges the existing
and the new tree together.
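The merge step described above can be sketched with a union-find (disjoint-set) structure. This is a minimal illustration, not geni.com’s actual matching logic: it assumes a hypothetical profile format of (id, name, birth year) and a hypothetical matching rule (same normalized name and birth year), which is far cruder than real record matching. `merge_trees` and `match_key` are illustrative names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical profile format: (profile_id, full_name, birth_year).
Profile = Tuple[str, str, int]


class UnionFind:
    """Disjoint-set structure that groups profiles describing the same person."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to keep lookups fast.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra  # the earlier-seen profile stays canonical


def match_key(p: Profile) -> Tuple[str, int]:
    # Hypothetical matching rule: same normalized name and same birth year.
    _, name, year = p
    return (" ".join(name.lower().split()), year)


def merge_trees(existing: List[Profile], uploaded: List[Profile],
                edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Merge matching profiles, then rewrite parent->child edges onto merged IDs."""
    uf = UnionFind()
    first_seen: Dict[Tuple[str, int], str] = {}
    for p in existing + uploaded:
        uf.find(p[0])  # register the profile
        key = match_key(p)
        if key in first_seen:
            uf.union(first_seen[key], p[0])  # same person found in both trees
        else:
            first_seen[key] = p[0]
    merged: Dict[str, Set[str]] = {}
    for parent, child in edges:
        merged.setdefault(uf.find(parent), set()).add(uf.find(child))
    return merged
```

For example, an uploaded profile `b2` for "bob  smith" born 1925 is folded into an existing `b1` for "Bob Smith" born 1925, so the uploaded child of `b2` ends up attached to the existing family.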

The result is that large
family trees are created,

beyond the individual level
of each genealogist.

Now, by repeating this process
with millions of people

all over the world,

we can crowdsource the construction
of a family tree of all humankind.

Using this website,

we were able to connect 125 million people

into a single family tree.

I cannot draw the tree
on the screens over here

because they have fewer pixels

than the number of people in this tree.

But here is an example of a subset
of 6,000 individuals.

Each green node is a person.

The red nodes represent marriages,

and the connections represent parenthood.

In the middle of this tree,
you see the ancestors.

And as we go to the periphery,
you see the descendants.

This tree has seven
generations, approximately.

Now, this is what happens
when we increase the number of individuals

to 70,000 people –

still a tiny subset
of all the data that we have.

Despite that, you can already see
the formation of gigantic family trees

with many very distant relatives.

Thanks to the hard work
of our genealogists,

we can go back
hundreds of years in time.

For example, here is Alexander Hamilton,

who was born in 1755.

Alexander was the first
US Secretary of the Treasury,

but is mostly known today
thanks to a popular Broadway musical.

We found that Alexander has deeper
connections in the showbiz industry.

In fact, he’s a blood relative of …

Kevin Bacon!

(Laughter)

Both of them are descendants
of a lady from Scotland

who lived in the 13th century.

So you can say that Alexander Hamilton

is 35 degrees of Kevin Bacon genealogy.

(Laughter)

And our tree has millions
of stories like that.

We invested significant effort
to validate the quality of our data.

Using DNA, we found that 0.3 percent of
the mother-child connections in our data

are wrong,

which is consistent with the adoption rate
in the US before the Second World War.

For the father’s side,

the news is not as good:

1.9 percent of the father-child
connections in our data are wrong.

And I see some people smirk over here.

It is what you think –

there are many milkmen out there.

(Laughter)

However, this 1.9 percent error rate
in patrilineal connections

is not unique to our data.

Previous studies found
a similar error rate

using clinical-grade pedigrees.

So the quality of our data is good,

and that should not be a surprise.

Our genealogists have
a profound, vested interest

in correctly documenting
their family history.

We can leverage this data to learn
quantitative information about humanity,

for example, questions about demography.

Here is a look at all our profiles
on the map of the world.

Each pixel is a person
that lived at some point.

And since we have so much data,

you can see the contours
of many countries,

especially in the Western world.

In this clip, we stratified
the map that I’ve shown you

based on individuals’ years of birth
from 1400 to 1900,

and we compared it
to known migration events.

The clip is going to show you
that the deepest lineages in our data

go all the way back to the UK,

where they had better record keeping,

and then they spread along
the routes of Western colonialism.

Let’s watch this.

(Music)

[Year of birth: ]

[1492 - Columbus sails the ocean blue]

[1620 - Mayflower lands in Massachusetts]

[1652 - Dutch settle in South Africa]

[1788 - Great Britain penal
transportation to Australia starts]

[1836 - First migrants use Oregon Trail]

[all activity]

I love this movie.

Now, since these migration events
provide the context for families,

we can ask questions such as:

What is the typical distance
between the birth locations

of husbands and wives?

This distance plays
a pivotal role in demography,

because the patterns in which
people migrate to form families

determine how genes spread
in geographical areas.

We analyzed this distance using our data,

and we found that in the old days,

people had it easy.

They just married someone
in the village nearby.

But the Industrial Revolution
really complicated our love life.

And today, with affordable flights
and online social media,

people typically migrate more than
100 kilometers from their place of birth

to find their soul mate.
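The talk does not say exactly how this distance was measured. One plausible sketch, assuming each birthplace has already been geocoded to latitude and longitude, computes the great-circle (haversine) distance for every couple, takes the median as the "typical" distance, and reports the share of couples born more than 100 kilometers apart. The function names, the spherical-Earth radius, and the choice of the median are all assumptions.

```python
import math
from statistics import median
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

# Hypothetical format: ((husband_lat, husband_lon), (wife_lat, wife_lon)).
Couple = Tuple[Tuple[float, float], Tuple[float, float]]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def couple_distances(couples: List[Couple]) -> List[float]:
    """Distance between the two birthplaces of each couple."""
    return [haversine_km(*h, *w) for h, w in couples]


def typical_marital_distance(couples: List[Couple]) -> float:
    """Median husband-wife birthplace distance, in km."""
    return median(couple_distances(couples))


def share_beyond(couples: List[Couple], km: float = 100.0) -> float:
    """Fraction of couples whose birthplaces are more than `km` apart."""
    dists = couple_distances(couples)
    return sum(d > km for d in dists) / len(dists)
```

One degree of longitude along the equator comes out at roughly 111 km, so a couple born one degree apart there already counts as "more than 100 kilometers."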

So now you might ask:

OK, but who does the hard work
of migrating from place to place

to form families?

Are these the males or the females?

We used our data to address this question,

and at least in the last 300 years,

we found that the ladies do the hard work

of migrating from place
to place to form families.

Now, these results
are statistically significant,

so you can take it as scientific fact
that males are lazy.

(Laughter)

We can move from questions
about demography

and ask questions about human health.

For example, we can ask

to what extent genetic variations
account for differences in life span

between individuals.

Previous studies analyzed the correlation
of longevity between twins

to address this question.

They estimated that the genetic
variations account for

about a quarter of the differences
in life span between individuals.

But twins can be correlated
due to so many reasons,

including various environmental effects

or a shared household.

Large family trees give us the opportunity
to analyze both close relatives,

such as twins,

all the way to distant relatives,
even fourth cousins.

This way we can build robust models

that can tease apart the contribution
of genetic variations

from environmental factors.
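The intuition behind using relatives of varying closeness can be illustrated with a deliberately simplified model. This is not the model from the study: it assumes purely additive genetic effects and no shared environment, so that the life-span correlation between two relatives is roughly the heritability h2 times their coefficient of relationship (1.0 for identical twins, 0.5 for siblings, 1/8 for first cousins, 1/512 for fourth cousins). Fitting a line through the origin then gives a rough h2 estimate. The function name and input format are hypothetical, and a real analysis must also model shared-household effects, which this sketch ignores.

```python
from typing import List, Tuple


def heritability_through_origin(pairs: List[Tuple[float, float]]) -> float:
    """Least-squares slope (through the origin) of trait correlation on relatedness.

    Each pair is (coefficient_of_relationship, observed_lifespan_correlation).
    Under the assumed additive, no-shared-environment model,
    correlation is approximately h2 * relatedness, so the slope estimates h2.
    """
    num = sum(r * c for r, c in pairs)
    den = sum(r * r for r, _ in pairs)
    return num / den
```

Including distant relatives matters because they rarely share a household, so their correlations are less inflated by common environment than those of twins.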

We conducted this analysis using our data,

and we found that genetic variations
explain only 15 percent

of the differences in life span
between individuals.

That is five years, on average.

So genes matter less to life span
than we previously thought.

And I find this great news,

because it means that
our actions can matter more.

Smoking, for example, determines
10 years of our life expectancy –

twice as much as what genetics determines.

We can make even more surprising findings

as we move beyond family trees

and let our genealogists
document and crowdsource DNA information.

And the results can be amazing.

It might be hard to imagine,
but Uncle Bernie and his friends

can create DNA forensic capabilities

that even exceed
what the FBI currently has.

When you place the DNA
on a large family tree,

you effectively create a beacon

that illuminates the hundreds
of distant relatives

that are all connected to the person
that originated the DNA.

By placing multiple beacons
on a large family tree,

you can now triangulate the DNA
of an unknown person,

the same way that the GPS system
uses multiple satellites

to find a location.
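The triangulation idea can be sketched as a graph search. In this illustration, which is an assumption and not the method any agency actually used, the family tree is an undirected graph of parent-child links, and each beacon is a DNA match paired with an estimated number of parent-child steps to the unknown person (for example, 2 for a sibling, 4 for a first cousin, 8 for a third cousin). Each beacon defines a "ring" of people at that distance, and the candidates are the people lying on every ring. Treating estimated relationships as exact shortest-path lengths is a simplification; real DNA estimates are ranges.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]  # undirected parent-child adjacency list


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Shortest number of parent-child steps from `source` to every reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def triangulate(graph: Graph, beacons: List[Tuple[str, int]]) -> Set[str]:
    """Intersect, over all beacons, the people at each beacon's estimated distance."""
    candidates: Optional[Set[str]] = None
    for beacon, steps in beacons:
        ring = {n for n, d in bfs_distances(graph, beacon).items() if d == steps}
        candidates = ring if candidates is None else candidates & ring
    return candidates or set()
```

With a single first-cousin beacon, every cousin in that generation is a candidate; adding a sibling beacon collapses the set to one branch, much as each extra satellite narrows a GPS fix.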

The prime example
of the power of this technique

is capturing the Golden State Killer,

one of the most notorious criminals
in the history of the US.

The FBI had been searching
for this person for over 40 years.

They had his DNA,

but he never showed up
in any police database.

About a year ago, the FBI
consulted a genetic genealogist,

and she suggested that they submit
his DNA to a genealogy service

that can locate distant relatives.

They did that,

and they found a third cousin
of the Golden State Killer.

They built a large family tree,

scanned the different
branches of that tree,

until they found a profile
that exactly matched

what they knew about
the Golden State Killer.

They obtained DNA from this person
and found a perfect match

to the DNA they had in hand.

They arrested him
and brought him to justice

after all these years.

Since then, genetic genealogists
have started working with

local US law enforcement agencies

to use this technique
in order to capture criminals.

And in the past six months alone,

they were able to solve
over 20 cold cases with this technique.

Luckily, we have people like Uncle
Bernie and his fellow genealogists.

These are not amateurs
with a self-serving hobby.

These are citizen scientists
with a deep passion to tell us who we are.

And they know that the past
can hold a key to the future.

Thank you very much.

(Applause)