[ Music ]
>> Start off with some
basic rules of probability.
You should all know
these even intuitively
if you don't know
them from mathematics.
But if you've got
mutually exclusive events,
the probability, if
you want one or the other
and don't care which, is the sum.
If you've got events that
are independent
and you want both to
occur, you're dealing
with the product of
the probabilities.
So, thinking in terms of
gambling, rolling snake eyes,
probability is just 1
in 36, either snake eyes
or box cars is the sum of those
two probabilities or 1 in 18.
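The dice arithmetic above can be checked with a minimal sketch. It assumes two fair six-sided dice; the variable names are illustrative only.

```python
from fractions import Fraction

# One fair six-sided die: each face has probability 1/6 (assumption: unloaded dice).
p_face = Fraction(1, 6)

# Independent events that must both occur: multiply the probabilities.
p_snake_eyes = p_face * p_face   # both dice show a one
p_boxcars = p_face * p_face      # both dice show a six

# Mutually exclusive outcomes where either one will do: add the probabilities.
p_either = p_snake_eyes + p_boxcars

print(p_snake_eyes)  # 1/36
print(p_either)      # 1/18
```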
Hardy-Weinberg, again,
basic probability
of two random gametes
fusing given that they come
from the same population
with allele frequencies.
So we're talking about a
single nucleotide polymorphism,
when will a SNP show a most--
at what allele frequency will a
SNP most likely show a different
genotype between two
unrelated individuals?
And you will remember
I said before--
oops-- that it was 0.5, 0.5.
And mathematically
you can calculate it
and that would be the answer.
So, for two independent events,
the probability of both
is the product of the
probability of each; for one
or another of two or more mutually
exclusive events, it's the sum.
So the probability
of two individuals
having the same genotype
at one SNP is
simply the difference
between one and the
previous one.
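As a rough check on the 0.5, 0.5 answer, here is a small sketch. It assumes a biallelic SNP in Hardy-Weinberg proportions and two unrelated individuals from the same population; the function names are hypothetical.

```python
def genotype_freqs(p):
    """Hardy-Weinberg genotype frequencies (AA, Aa, aa) for allele frequency p."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)

def prob_same_genotype(p):
    """Probability that two unrelated individuals carry the same genotype."""
    return sum(g * g for g in genotype_freqs(p))

def prob_different_genotype(p):
    """One minus the probability of the same genotype."""
    return 1.0 - prob_same_genotype(p)

# Scan allele frequencies 0.01 .. 0.99 and keep the one where
# two people are most likely to differ.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=prob_different_genotype)

print(best_p)                      # 0.5
print(prob_same_genotype(best_p))  # 0.375
```

At allele frequencies of 0.5 and 0.5 the genotype frequencies are 0.25, 0.5, and 0.25, so the chance of a shared genotype at that single SNP is 0.375 and the chance of a difference is 0.625.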
So, for individual
identification, not particularly
of interest to anthropology
but certainly a major issue
and the main way DNA is
now used in forensics.
We wanna find SNPs with
high heterozygosity, i.e.,
close to 0.5, 0.5, to
minimize the probability
of two unrelated individuals
being the same and we want
to find SNPs that have the
smallest allele frequency
variation around the world.
So the probability is
not related to ancestry.
Right now, for the
CODIS markers in court,
you need to present the
probability of this genotype
for which the defendant
matches the crime scene DNA.
If it's African-American,
if it's European-American,
if it's Asian, in some
cases, if it's Hispanic.
Hispanic has no genetic
meaning whatsoever.
But the US census has it.
So when we fill out a grant
application, we have to say,
"Are we studying Hispanics," et
cetera, and have to justify, "No,
because it's a genetic study."
So, we identified a set of SNPs,
40 of them that constituted
a good preliminary panel
and published that.
And here, the match
probabilities,
again, around the world.
Mbuti is a very small
isolated Pygmy group.
I've already mentioned
the Samaritans.
The Nasioi-- ah, who wants to
explain who the Nasioi are?
Okay. They were part
of her PhD work.
It is one group on the
island of Bougainville
in Papua New Guinea, one
of the Solomon Islands.
There are 27 different
languages spoken on that island,
meaning 27 different tribes
if you will or ethnic groups,
and this is one of them.
So it is isolated.
It's relatively small.
And then the various
Mexican-Pima and the Surui
from the Amazon Basin
also relatively isolated
and low amounts of
genetic variation.
And yet look, even in the
Nasioi the probability
of unrelated individuals
being the same is less
than 10 to the minus 12.
This is in the range of what
the CODIS markers give even
in European populations.
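To see how a panel of about 40 SNPs can reach numbers this small, here is a sketch under stated assumptions: the SNPs are unlinked and independent, each is in Hardy-Weinberg proportions, and every one sits at the ideal allele frequency of 0.5. The panel is hypothetical and is not the published allele frequency data.

```python
import math

def prob_same_genotype(p):
    """Chance two unrelated people share a genotype at one biallelic SNP,
    assuming Hardy-Weinberg proportions with allele frequency p."""
    q = 1.0 - p
    return (p * p) ** 2 + (2.0 * p * q) ** 2 + (q * q) ** 2

def panel_match_probability(allele_freqs):
    """Multiply per-SNP probabilities; valid only if the SNPs are independent."""
    result = 1.0
    for p in allele_freqs:
        result *= prob_same_genotype(p)
    return result

# Hypothetical best-case panel: 40 unlinked SNPs, all at allele frequency 0.5.
ideal_panel = [0.5] * 40
log10_match = math.log10(panel_match_probability(ideal_panel))

print(round(log10_match, 1))  # about -17
```

Real populations depart from 0.5 at many SNPs, which raises the match probability, so a best case near 10 to the minus 17 is consistent with observed values below 10 to the minus 12 in small isolated groups.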
So we've subsequently
published 45 SNPs that give
another two orders of
magnitude improvement over this,
and that's being worked into
commercial kits, we hope.
Though I just wanna mention,
there are ethical problems
in dealing with these
kinds of studies
and our informed consent says
there will be no intellectual
property or commercial use based
on studies of these samples.
All data must be made public.
So when I talked about
commercial panels,
I don't get a penny out of it.
This is free for any
company to develop,
the data are publicly available.
Here are just examples of two
with the lowest variation out of
that panel and you can
see they're pretty uniform
in the allele frequencies.
So for ancestry identification,
empirically clearly
allele frequencies vary
around the world and the
underlying principle is
that the population in which the
observed genotype is most likely
to occur is the population
that the unknown is most
likely to come from.
Now, every genotype is
rare in every population.
So the absolute probability
is meaningless.
We have to think in
terms of likelihood
and relative likelihood
to interpret.
So the likelihood is
related to probability.
But probability is
the probability
of an outcome given
a hypothesis.
Snake eyes, given the hypothesis
that each die has six sides,
only one of which is a one,
and both are unloaded.
So, it's 1/6 times 1/6.
So there's a clear hypothesis.
Likelihood, is the likelihood
of a hypothesis given
the set of data.
So what if you roll
the pair of dice
and you rolled them 500 times
and you never got snake eyes.
So here's your data,
what's your hypothesis?
I think they're probably loaded.
So you talk about
the relative likelihood
of different amounts of loading.
So, likelihood is
proportional to the probability
of the event given
the hypothesis.
>> So, it's a proportionality.
But the constant of
proportionality cancels out
and that's why we look
at relative likelihood.
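The 500-rolls example can be sketched as follows. The "loaded" per-roll probabilities are invented for illustration; only the fair-dice value of 1/36 comes from the discussion above.

```python
# Per-roll probability of snake eyes under a few hypotheses about the dice.
# The two "loaded" values are made-up assumptions for illustration.
hypotheses = {
    "fair dice": 1 / 36,
    "mildly loaded": 1 / 200,
    "heavily loaded": 1 / 2000,
}
n_rolls = 500

def likelihood_no_snake_eyes(p_snake_eyes, n):
    """Probability of seeing zero snake eyes in n independent rolls."""
    return (1.0 - p_snake_eyes) ** n

likelihoods = {
    name: likelihood_no_snake_eyes(p, n_rolls) for name, p in hypotheses.items()
}

# Only ratios matter: any constant of proportionality cancels out.
fair = likelihoods["fair dice"]
for name, value in likelihoods.items():
    print(f"{name}: relative likelihood {value / fair:.3g}")
```

Under the fair-dice hypothesis the data have a probability below one in a million, so every loaded hypothesis here comes out far more likely in relative terms.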
So, if we've got two different
hypotheses, like flip a coin,
we expect heads and tails
50 percent of the time.
I flip the coin 100 times;
it always comes up heads.
I've got two hypotheses.
It's a coin with two heads
or I've been very unlucky.
The probability of that outcome if
it's a coin with heads and tails is very low.
The probability if it's heads
only, both heads, is one,
the probability of that
outcome under the hypothesis.
So we can look at the
relative likelihoods
of those two hypotheses.
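A minimal sketch of that comparison, assuming 100 independent flips that all come up heads:

```python
import math

n_flips = 100

# Probability of 100 heads in a row under each hypothesis.
p_data_given_fair = 0.5 ** n_flips        # ordinary coin, heads and tails
p_data_given_two_headed = 1.0 ** n_flips  # coin with two heads

# Relative likelihood of the two hypotheses given the same data.
likelihood_ratio = p_data_given_two_headed / p_data_given_fair
log10_ratio = math.log10(likelihood_ratio)

print(round(log10_ratio, 1))  # 30.1
```

The two-headed hypothesis is favored by about 30 orders of magnitude, which is why the ratio is usually reported on a log scale.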
So, slight digression, but it
becomes relevant: there are lots
of scientific issues in forensic
DNA use, and these are "scientific"
in quotes because they
relate to quality of DNA,
the use to which it's put, how
close the match is, is it likely
to be a relative versus just--
there's always a
probability of a close match.
So, none of this is foolproof.
The underlying science
can be pretty rigorous
but the implementation is not
always and that's important.
So, we've always got
chain of custody.
When I testified, you
know, "Is this the pattern
of the defendant and
the DNA?" I think
those two patterns are the same.
That's all I can say.
If that really is the
DNA of the defendant
and that really is the DNA
of-- from the crime scene,
then yes, they're the same.
But as an expert who hasn't
been involved that's a chain
of custody issue that
I can't attest to.
So, all of this becomes
relevant.
So let me talk a bit
about another resource
and that is ALFRED.
It's a database that I
started in the late--
the mid-1990s, I guess.
We're currently up to 35
million allele frequencies
where you can learn
about an allele
if it's been studied extensively
and here is the front
page, the home page.
We can look here: the 35
million allele frequency tables
for molecularly
defined DNA polymorphisms
in anthropologically
defined populations, with links
to anthropology
databases, molecular databases,
et cetera, to provide
that resource.
And as a reasonably new
feature, we've added SNP sets
to the database because
as a forensic scientist,
you wouldn't be interested in
one SNP, you'd be interested
in ethnic identification
or ancestry identification
panel of SNPs.
So we have several here, some
of which come from our lab,
the SNPforID group, Sanchez,
Phillips, the CODIS DNA markers,
and there are some other
sets that didn't fit
on this screenshot. Here
is going into what we have
as our final set of 45
best unlinked markers
and we can sort this list,
by any of several criteria
and we can also click
for other views.
For ancestry informative
markers, we're interested
in allele frequencies in
different populations.
You have to have that support.
And so to make these
data meaningful,
we have all of that support in
ALFRED online as a reference
that hasn't been tested but
should be readily acceptable
in court because
it's publicly available
and highly vetted by publications
supporting the data.
So the opposite of IISNPs,
the highly variable SNPs,
here are five different ones
that vary a lot around
the world.
There are multiple panels
not just the one we're trying
to work on now.
No single panel is
going to be perfect.
But the problem is most of
the existing panels that are
out there and also used for
ancestry identification tend
to be based on continental
samples, Asian, European,
African, Native American, most
not even Native Americans,
as though all Africans
are the same,
all Asians are the
same, et cetera.
Only one panel, published
by Seldin,
of 128 SNPs has been
studied on a lot of samples.
His panel was published by Kosoy
et al. in 2009 and this spring--
or this winter, January, my
wife, that's the Kidd there,
extended that by pulling
data in from other databases
on 67 populations from
other labs to bring it
up to almost 5,000 individuals
from 119 different
populations.
More on that: here's
one of the SNP panels,
the ancestry panel
from the SNPforID group,
and you see DARC here.
Remember, that's the
Duffy blood group.
If you went into the real online
version and clicked Google Map,
you would get this
generated out of our database.
And you can see that here
the C allele that provides
that resistance to vivax
malaria is essentially fixed
across tropical
Africa, becomes very rare
in Northern Africa,
but it's present also
in the Mediterranean populations
and then essentially
absent every place else.
Here, two of the
skin color genes.
I showed you before
those surfer plots of two
of them, here are two.
One of them, the red one,
has a much more geographically
restricted distribution
than the other one.
Notice Southwest Asia.
The blue one is still nearly
fixed for the light skin
allele but the other
light skin allele is at 40
to 50 percent allele frequency.
The same with moving
into central Asia,
you see a rapid decrease
in the red
and still reasonably high
frequency of blue and the other.
Elsewhere in the world,
both are quite uncommon.
>> In Northern Europe, the
right-hand side here is the
Northern European populations.
Clearly both are essentially
at 100 percent frequency.
So, we're working
to try to find more
of these alleles in a good set.
And as that figure just showed,
phenotype informative can also
be ancestry informative.
So we're trying to include both.
One of the things we'll talk
about and show you and you get
to play with tomorrow is a set
of 39 provisional ancestry
informative markers
that have been studied
now in 43 populations.
And you can put in an
individual genotype and find
out what population
that individual came
from by relative likelihood.
Here is a structure plot.
You'll also get a chance to use
this tomorrow. If you tell
the program how many groups
of individuals there are,
the K value on the left,
the program will try to come up
with, through Monte Carlo simulation,
so it doesn't always come
up with exactly the same answer,
the best
estimate of allele frequencies
for those three groups, or
however many you specify,
that fit most individuals.
It doesn't need
to consider their ancestry,
but we can then display
it by ancestry.
So here you can see at
the continental level,
we got a pretty good split
between Africa, Europe,
East Asia, Pacific-- and the
Americas, but Central Asia looks
a little funny and the
Pacific doesn't come
out until we allow the
program to have more degrees
of freedom to fit these data.
And what you see starting
at 5 and then through 7,
at 5 Southwest Asia comes
out as distinct.
Remember the tree that
had that little cluster
of Southwest Asia before
the cluster of Europeans.
We're beginning to see this
with only 39 SNPs whereas the
other was almost 3,000 SNPs
but randomly selected.
And then we're beginning
at K equals 6 and 7 to see
that there's variation within
East Asia and the Pacific.
So you can read this and
we'll talk more about it.
The other database that we were
funded to develop is FROG.
I like nice names for things.
So here, URL, it's
online, it's free,
everything is open
to the public.
It's a pilot version.
I'd say right now it's
approaching a beta test point.
We've only actually had anything
up on the web for a little
over a month and we
made a special effort
to get things refined so you
can play with it tomorrow.
And there are data input tables.
You can put in a genotype and
what you get out-- we put
in the genotype with those 39
from a Mexican-Pima individual
and what we've got is
match probability;
now the title is
changed for that.
It's the probability of that
genotype in that population
which is interpreted
as a match probability
for individual identification.
It's what would be reported
in a court case, and notice,
great, Mexican-Pima comes
out on top but the next one
which is not significantly
different is the Hakka
from Taiwan.
Now, these are markers selected
to show no variation
around the world.
And if you look, there are only
two or three orders of magnitude
between Native Americans
and Yoruba from Nigeria.
There's very little information
on ancestry here even
though Mexican-Pima did come
out on top.
And you can graph it and
the cursor is a running one
that will tell you the
population under that line.
If we put in for an
ancestry informative one,
here's a Korean.
That comes out on top.
But the next three
by likelihood ratio--
where for significance,
roughly think of an order
of magnitude, a power of 10;
it's a pretty good
measure of significance--
the next three are not
significantly worse.
So what could you conclude?
Well, all of the top ones
are East Asians and all
of the bottom ones are
Africans, and notice it's a 30 orders
of magnitude difference,
10 to the 30th.
You can conclude pretty safely:
almost certainly an East Asian,
very clearly not an African.
And here, notice again,
the range of the log
likelihood is 30 orders
of magnitude.
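The ranking logic described here can be sketched roughly as below. This is not FROG's actual code: the population names, allele frequencies, and three-SNP genotype are all invented, and the sketch assumes Hardy-Weinberg proportions and independent SNPs.

```python
import math

# Hypothetical frequencies of allele "A" at three SNPs in three made-up populations.
freqs = {
    "PopX": [0.90, 0.80, 0.10],
    "PopY": [0.85, 0.70, 0.20],
    "PopZ": [0.10, 0.20, 0.90],
}

def genotype_prob(p, n_copies):
    """Hardy-Weinberg probability of carrying 0, 1, or 2 copies of allele A."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[n_copies]

def log10_likelihood(pop_freqs, genotype):
    """Sum of per-SNP log10 probabilities (product of probabilities on a log scale).
    Fails if a frequency of exactly 0 or 1 makes a genotype impossible."""
    return sum(math.log10(genotype_prob(p, g)) for p, g in zip(pop_freqs, genotype))

genotype = [2, 2, 0]  # copies of allele A at each SNP for the unknown individual

scores = {pop: log10_likelihood(f, genotype) for pop, f in freqs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
best = ranked[0]

for pop in ranked:
    gap = scores[best] - scores[pop]  # orders of magnitude behind the top population
    verdict = "not clearly worse" if gap < 1.0 else "clearly worse"
    print(pop, round(scores[pop], 2), verdict)
```

With these made-up numbers PopX ranks first, PopY falls within one order of magnitude of it, and PopZ is about five orders of magnitude behind, mirroring the rule of thumb that a gap of roughly a power of 10 is what counts as a meaningful difference.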
So interpretation, we can
go through this tomorrow
as we start on the computers.
So I think SNPs for
ancestry inference,
individual identification, and
phenotype inference are going
to be very important in
forensic anthropology.
We can already begin
to get some inference.
I didn't show you the locus
for thin and/or wavy
versus really thick
and straight hair.
There's a gene that seems
to be involved in that.
I don't have the
thick hair alleles.
There are other things and
several studies underway
for facial inference based
on SNPs, some very big ones,
some not so big, not so great.
I'm involved with the "not
so big not great" one.
But here are various references,
here're the necessary
formalities.
Okay, are there gonna
be any questions?
Yes?
>> I have a couple of--
I don't think questions
so much as comments.
The population geneticist
in me loves this stuff.
I think it's fantastic and
I used one of these programs
to answer anthropological
questions and demonstrate
biological variation,
but the forensic anthropologist
in me says there are some
practical considerations
that suggest this
technology is a long way off
from being implemented, and I
don't recall how often you come
in contact with forensic
anthropologists, but many
of them are concerned
about DNA and its role
in forensic anthropology,
issues like this,
and I guess my comments are:
forensic labs are bound
by their SOPs, and starting new
panels is gonna take a while
and is gonna cost a lot of
money in order to get personnel
that are trained
to interpret them,
that are gonna optimize
them on their machines.
>> And so that is one comment
where I think this is
still a little ways off.
And my second comment
is I come from a state
where I work in a
coroner system.
And coroners have
a very small budget
and our two questions are
how fast and how cheap?
And doing a lot of
skeletal DNA work,
I know the how fast is usually
not as fast as you want
and the how cheap is definitely
not as cheap as you think.
And when it comes to statements
like "high probability
they came from a certain
part of the world,"
I think that's what we do
as forensic anthropologists,
and we can look at
discrete traits and answer a lot
of those questions much
faster and much cheaper.
So I guess my general comment is
that I think this is
fantastic technology
but I just think its
practical applications are a
long way off in the
forensic realm.
>> They are clearly not here.
I disagree in my opinion
of how useful they will be.
They can be very cheap.
I would argue that they are
probably a lot more accurate,
in adequate numbers, and
probably much easier to document
that accuracy with
respect to ancestry.
That's not saying you can't
do a good job from bones.
I mean, I'm-- my first
faculty position was
in an anthropology department.
So I'm-- I consider myself
an anthropologist as well
as a human geneticist.
So the technology is
evolving so rapidly.
It is now possible
for 5,000 dollars
to get a whole human genome
sequenced at 30X coverage.
Meaning, on average every
nucleotide is sequenced 30 times
which makes you very
confident of that nucleotide
of that stretch of DNA.
It means, some will be poorly
covered but 30X is very good.
>> There still is always the
lag between forensic labs
and academic labs, and
that's because they are
so strictly bound by their SOPs
and what they're able to do
in those laboratories.
So I'm not gonna--
[ Simultaneous Talking ]
>> Okay--
>> -- say that --
>> so--
>> -- technology, I just think
that it's gonna take a little
bit longer to implement them.
>> I think it's gonna
take awhile to implement.
But when there is better
and cheaper technology,
it should be implemented.
>> It should be but it's not.
It's-- And those people
that work in academia
and forensics see that things
don't always work the same way
in both fields.
>> I know that.
[ Laughter ]
Maybe, I'm an optimist,
but it's coming.
I know of depart-- forensic
departments that are trying
to work towards implementing
this sort of technology.
A lot of money is being
spent on research.
Right now, I've got good funding
from the National Institute
of Justice to work
on the FROG database
and to improve phenotype
and ancestry informative
marker panels,
because it's
considered important.
Now structure which
you're all gonna play
with a little bit is very good
at the population
genetics level,
it's used throughout
the literature.
It's not good in
forensic application.
That's why we're
developing FROG.
We can give you the
likelihood of--
for every population that is
documented for a set of SNPs,
we can give you the likelihood
of that particular genotype
in any one of those populations.
And the more populations we
have and the better our markers,
we can make a fine distinction
between different
regions of the world.
I don't know because
that's not my expertise,
but how fine a distinction
can you make looking at bones
in terms of what population
that individual came from?
>> Usually pretty good.
>> What's good?
[ Simultaneous Talking ]
>> I'm saying that I come
from a rural state, we--
>> 50 different...
>> We don't have a lot
of money and I just--
even knowing about DNA
technology, the thought
of telling coroners-- you
know, I can understand
for positive identification,
you know, oftentimes they would
like to spend the money for
DNA positive identification,
but to spend extra money on the
ancestry when that's something
that we can use other
tools to do.
>> How much do you
think it would cost
to get an estimate of ancestry?
>> I don't know.
I know that the labs-- the
DNA labs we send stuff to,
for CODIS, charge almost a
thousand dollars a sample.
>> Yeah, CODIS is an
antiquated technology.
And with SNPs, you
can probably do it
for a couple of hundred dollars.
[ Inaudible Remark ]
>> And probably with faster
turnaround than CODIS.
>> Are you talking about
using it as a chip technology?
And-- and--
>> Yeah.
>> -- what are some of the
issues with trying to do
that with copying loci samples.
Are-- are there issues, have
any of these been run
with skeletal
samples that you know of?
>> No. But these are markers
for which there are not
known copy number variants.
So, chips are used all the time
to identify copy number variants;
that's somewhat different
than typing SNPs on a chip.
It's a slightly different
use of the chip.
But yes, I think
you could use chips.
That's what everybody
is thinking about.
In a research setting, we
aren't, because we are trying
to find the SNPs by
searching likely candidates
that we then have to test on
all of the other populations.
So we've got a very
high throughput system
of typing 3,000 individuals
in two hours,
you know, for one SNP.
And it's automated with
robots and databases
and electronic transfer,
et cetera; then we spend many,
many hours analyzing the data.
[ Laughter ]
>> [ Inaudible ]
data
>> So, yeah, there's a--
>> Our biggest issue is just
dealing with skeletal samples
in particular and it's just
difficult to deal with.
I just wonder what the issues
might be on trying to--
I haven't ever tried to
use a chip technology
[ inaudible ]
. I was just wondering
if there was any kind
of issues that you know.
>> I think they're
pretty automated.
In the future, I think--
>> -- all the stuff that
comes along with it.
I mean, they're not the
purest of samples,
and there's not a lot of
DNA and I just wonder what--
what potential problems
there could be.
>> All of those are
potential problems
and you won't always find
usable DNA in a skeleton.
>> We can if we use the
proper extraction techniques
[ laughter ]
but--
[ Simultaneous Talking ]
>> Well, it depends on
how old the skeleton is
and the conditions in
which it's been kept and--
>> No, we can do it.
We just have these proper
extraction techniques,
but we can't put those
into laboratories
because they can't get those
validated and then permitted
into the laboratories to be
able to get good enough DNA
out of the samples
because that takes time
and so we haven't even
been able, I mean--
>> But most of your concerns
are just a matter of,
it's cracking the ice.
You know, right now,
Dr. Kidd is, you know--
>> Well, I love it!
>> I want to do it!
>> I want to try myself--
[ Simultaneous Talking ]
>> We see the opposite
side, where the DNA labs--
where you don't have
enough money
to process the samples there
and you have limited
technology, implementing
such as this is a question of--
I think that academics
were performing, people
were doing good work,
but the question
is how do we get
[ inaudible ]
the government is
funding the research.
Is the government going to buy
you the machines for the labs?
And train personnel?
That's the question.
>> Some lab somewhere,
Colorado,
is going to take a chance and
they're gonna, for one reason
or another, have a
surplus of money
for a short period of time.
And then the results
would get out.
And police departments and DA's
offices will say, you know,
ME's office, and they'll
talk to me and they'll say,
"Well, we heard that Colorado was
doing this, can you guys send
over there?"
So, sure. We'll send
off to Colorado.
Colorado will do it.
Then they'll start
getting more business
[ inaudible ]
.
>> I think it does
become an expectation
[ inaudible ]
.
[ Simultaneous Talking ]
>> There's little-- you know,
they don't show it on CSI and
[ inaudible ]
.
[ Laughter ]
>> Well, yeah.
I always say that, you
know, the real world
in the forensic lab is not
like CSI where the DNA sample
or the blood sample or the
crime scene sample comes in,
and the DNA results
are available
after the next commercial.
No. Not that simple but I am--
>> Mostly I was just asking
if you know people
are already working
on issues, seeing any-- .
>> I know Broward County has
tried the TaqMan OpenArray
technology on SNPs and they got
some control samples from us
and we're doing some
of these as a test.
There are papers coming
out at various times.
Europeans have it very much
approved to use their panel
of 52 individual identification
SNPs which are not nearly
as good as our 45 SNPs because
there's much more variation
around the world.
Still, the largest probability
they've got in any sample is
around 10 to the minus
8, 10 to the minus 9
which is still adequate for
most courtroom situations,
whereas our highest
probability is in the Nasioi
where currently it's 10 to the
minus 15 and there are not 10
to the 15th individuals.
[ Laughter ]
>> Just a little
food for thought
as we send you on the way for
[ inaudible ]
. I do wanna mention
that, remember,
Ken started out by saying that
the most compromised samples
from the World Trade
Center disaster were the ones
that were processed
by SNP analysis.
So they really are
suitable for--
>> CODIS, right?
>> No.
>> Sort of short tandem repeat.
>> Okay, yes.
>> They require electrophoresis.
>> What SNP typically do
[ inaudible ]
?
>> They tried some, but
that's what motivated me
because those they tried had
little empiric documentation
of how well they would work.
So they knew what they
were like in Europeans,
but nobody knew what it
would be like in Chinese
or Japanese what the
allele frequencies were.
So that was what
motivated me having done
so much testifying early on.
I had a clear sense that
the courts were going
to require pretty
good documentation
if this was ever gonna be
accepted in the courts.
So that's why I started
to use our populations
to provide documentation,
that's why we're working on FROG
to make all of this ability
available to anybody.
If you've got that
set of SNPs typed,
we can tell match probabilities
or ancestry likelihoods.
>> And they work well on
compromised samples, so.
>> Well, SNPs worked very
well on compromised samples,
much better than STRs.
>> [ Inaudible ]
companies may use CODIS because
they can make comparisons
across the-- some counties.
So if we were to
implement a new panel--
I guess the question is then,
how do we go back, and do we have
to reanalyze all the stuff
that's already stored in order
to be able to use the panel?
>> Right. And buggy whip
makers are not still
around because nobody
uses buggies much anymore.
There's a problem of
shifting to a new technology.
I think for individual
identification
which is what the database,
the big database is good for,
there's likely gonna have
to be some parallel testing.
It will certainly go on as
the testing in a real lab goes
on to get it more
broadly accepted.
There will have to be a
countrywide acceptance
of a panel that everybody
will use
and several people are
advocating the one I developed
which so far is plenty
good and then there has
to be a technology
and willingness.
But in 10 years, sure for
old cold cases maybe it's
still relevant.
But 10 years from now,
a lot of the individuals
in the database are not
gonna be committing crimes.
They're gonna be dead.
Major criminals have a very
short life expectancy or,
you know, if they're
out, that's shortened;
if they're in jail for life,
they're no longer
committing crimes.
So the existing database
ages out over time.
So sure, in two years,
three years, five years,
of course it's still relevant.
What we have now, how relevant
will it be in 15 years?
Not very. So one has
to be thinking today is not
the only thing to think about.
So I'm not arguing that this
is gonna happen overnight.
I well recognize all the
problems, maybe not quite
as intimately in terms of
the bureaucracy as you do.
I don't wanna know that.
I've got enough bureaucracy
doing the research.
But given my experience with
forensics and testifying
and the World Trade
Center advisory committee,
the Katrina advisory committee,
I think I have a pretty good
understanding of the system
and what's needed at least from
the scientific point of view.
And I'm trying to encourage
companies to market a panny chip
that will implement these
45 individual identity SNPs.
The ancestry, we're
still too far away.
But in another year or two,
there probably would be a small
ancestry informative panel
that most people could agree on,
could be subsequently
amplified for finer structure.
>> The exciting
part of this is
that it can all be done
together on one chip.
We could have all the
ancestry information,
[ inaudible ]
individual identification all
together and it could be a 100 to
200 dollar cost to
run this thing.
>> Right.
>> Which is exciting.
>> Yeah, when the chips
are mass produced--
>> You're under utilizing
it just many on there.
>> Yeah, they're very cheap.
And in the meantime there
are do-it-yourself chips
where you put a tag on the
end of your PCR primer
that binds only to one cell.
And so you can try it with
do-it-yourself technology,
if you will, in an
individual lab without lots
of really expensive equipment.
[ Music ]
