JEANNETTE WING: So I'm going
to talk about Data For Good,
Data Science at Columbia
University.
But I'll just say, as Manuela has said, because
of my experience in industry
at Microsoft Research,
and in government
at the National Science
Foundation,
and, of course, in academia
at Carnegie Mellon
and here at Columbia,
I'm more than happy to answer
questions on almost anything.
So please feel free to ask.
So what I wanted to start out
doing was putting in context how
I think of data.
And, of course, I'm going to talk about the data lifecycle.
It starts on the very left
with the generation of data.
Now, in science we've
been generating lots and lots
of data for many, many years.
If you think about large scientific instruments like the Large Hadron Collider, telescopes in Chile, or neutrino detectors at the South Pole, these one-off, very expensive scientific instruments have been generating and continue to generate volumes and volumes of data.
And so scientists have been
dealing with lots and lots
of data all the time.
And I think the finance industry
is the same.
What's newer to the universe is that, more recently, we, people, are generating lots and lots of data.
Of course, we've always done
that,
but through our digital devices
and our interaction
with the digital world,
we are generating lots and lots
of data.
And it's these, if you will,
big companies that are
collecting all this data
about us,
to the point that others know
more about us than we know
about ourselves.
So generation of data.
Then we collect the data.
We don't always collect all
the data we generate.
We process the data.
Under processing I include encryption and compression, but also the less sexy things like data wrangling, data cleaning, and so on.
Then we actually store the data
in some medium.
Then we usually store it
in a way that we can retrieve it
quickly, access it quickly.
So that's where
all the database, data
management aspects of data
come in to play.
Then there's data analysis.
And I think this is what--
it's AI and machine learning
that people usually identify
with data science.
It's this data analysis phase
which really is where
the machine learning and the AI
are coming to fore.
But it's not enough to spit out
probabilities, or yeses or nos,
or cats or dogs.
One has to visualize
these results.
And that's where data
visualization comes into play.
And it wasn't until I joined Columbia University and talked to all my colleagues across the university, in particular the School of Journalism, that I really came to appreciate that last step I put on my data lifecycle.
The journalists call it
storytelling.
I call it interpretation.
And that is really,
it's not enough to show a pie
chart or a bar graph.
You really have to explain
to the end user, what am I
looking at?
Tell a story about the data
interpretation.
So I like to emphasize
the privacy and ethical concerns
throughout this data lifecycle.
I think that's very important, from the very start: what data do we collect, what data do we analyze, and what answers do we provide the end user?
So let me also share with you
a very succinct definition
of data science, which I think
this audience will especially
appreciate.
Data science is the study
of extracting value from data.
And there are two
important words
in this definition.
The most important word
is value.
And I deliberately leave value open to the interpretation of the reader.
So value to a scientist
is discovering new knowledge.
But value to a company
likely accrues
to the bottom line.
In fact, it's very likely
calculable.
And value to a policymaker
is information
so that the policymaker can make
a decision
about the local community.
So the other important word here
is extracting, because it takes
a lot of work
to get this value from the data.
And now I want to share with you
my three part mission statement
for the Data Science Institute.
The first is to advance
the state of the art in data
science.
This is about pushing
the frontiers of the field,
about doing basic research,
about doing basic long term
research,
and inventing new techniques,
new discoveries,
and new science.
The second is transforming
all fields, professions,
and sectors
through the application of data
science.
And this really speaks
to the prevalence of AI, machine
learning, data science that's
affecting all fields,
all sectors as we see it today.
Everyone has data,
and data is everywhere.
And data today is what feeds
those very hungry machine
learning algorithms.
So with a lot of data
you can do a lot using
these machine learning
techniques.
And finally, ensure
the responsible use of data
to benefit society.
This really speaks to two things. One is benefiting society using data, which is really tackling societal grand challenges, like health care, energy, climate change, and the UN Sustainable Development Goals, for instance. But it's the "ensure the responsible use" part, which I inserted into my mission statement, that I really want to emphasize.
With all the concern that we read about every day in the news, about biased algorithms, biased models, or biased data that we feed our algorithms to produce biased models, I think it's very important that we, the technology people, try to ensure that what we do is non-discriminatory, is fair, and takes these ethical concerns into consideration.
So I summarize this long-winded mission statement into my tagline, Data For Good: using data to do good for society, and also using data in a good manner.
So what I wanted to do--
oh, a few more facts
about the Data Science
Institute, and then I'll
go through some research
stories.
So actually at Columbia
University, the Data Science
Institute is a university level
and a university wide institute.
So we have over 350 faculty now
from 12 different schools
across the university.
Every single profession
is represented.
Every single discipline
is represented.
Every single sector
is represented, from arts
and sciences, architecture,
business, dentistry, all
the engineering disciplines,
public policy, journalism, law,
medicine, nursing,
public health, and social work.
So this is partly reflecting
the breadth of the university.
We have a few centers that are
thematic.
I'm not going to belabor this,
but just to say that we have
some themes that pop out,
like business and financial
analytics.
We have a center there.
We have a center in health
analytics,
computational social science,
and so on.
We have a robust master's in data
science program at Columbia
University.
And this is just to show you
how we define a minimum bar
of what it means to be a data
scientist.
I share this with you because in industry today, across the country and across the world, there's a lot of confusion about what a data scientist is.
And there are a lot of titles
out there.
And so I thought,
well, let me set a bar for what
makes a data scientist.
So we have a highly selective
program, and a very rigorous
program.
So there are six
required courses, three
in computer science, three
in statistics,
and then the capstone course,
which is where an industry
person or affiliate comes in,
brings industry data to a team
of students.
They work on that data set,
and answer real world questions,
driven by the industry
affiliate.
And, of course, everyone gets
a job.
So that's our program.
I should mention, we just
started a PhD specialization
in data science at Columbia.
So if you're already enrolled
in certain PhD programs,
you can take
these additional courses
and get a specialization.
I wanted to mention a couple
of things going on
in the education space.
There's a program called
the CoLaboratory that's joint
between the Data Science
Institute and Columbia
Entrepreneurship.
And we have now run this program for a few years. It requires that professors from two or more different disciplines come together to co-design and co-teach a new course, where one of those professors is from computer science, data science, or applied math.
It's really to bring
computational and data science
expertise to some other domain.
And the one I want to highlight
is not just a course,
but an eight-course curriculum
that was designed
by the business school
along with computer science
and data science to the point
where now 50% of the MBAs coming
out of Columbia
have had some exposure to data
science.
So this is phenomenal.
And the only limiting factor
here is capacity.
So if we could,
it would probably be 100%
eventually.
So what this says to me is,
first of all, students are
always smart.
They see the future.
They know what they need
to learn while they're in school in order to be good, productive employees in any discipline later.
We have a robust industry
affiliates program.
I am double checking that JP
Morgan is there.
Yes, it is.
And we even, this past year,
expanded it
to international companies,
like three from China: Alibaba, Baidu, and DiDi should be up there.
And we have a company
from Brazil.
Very recently, we created
a center with IBM specific
to blockchain and data
transparency.
This was actually quite
a big deal for Columbia.
And it's going strong.
We have three tracks, one
on research, one on education,
and one on innovation, which
is really an accelerator,
incubator track.
So startups can get nurtured
and then spawned out
of this center.
So now what I wanted to do
in my remaining time
is just to share with you
a few research stories to show
the kind of work that's going on
at Columbia in Data Science.
And I'll just do one
on advancing the state
of the art,
and a few in terms
of the other two parts
of the statement.
I wanted to emphasize
that unique to Columbia
University, the foundation
of data
science builds on three pillars
of strength.
Computer science, of course.
Statistics, of course.
But also operations research.
So at Columbia we have a very
strong OR department
in the School of Engineering,
and a very strong OR group
in our business school.
In fact, they all work together.
And so from afar, it's really
a great strength of Columbia OR.
And OR, a lot of it
is optimization.
And there's just so
many similar techniques
and interests.
And OR, I think, as a field,
is moving more into machine
learning, data science,
and so on.
So the one story I wanted
to share with you in terms
of advancing state of the art
has to do with causal inference.
Of course, causal inference
has been of interest
to statisticians for decades.
It is the bread and butter
of economics, political science,
and so on.
And I think the machine learning
community has deliberately
and rightly been shy to ever
infer causality in all
the patterns they recognize.
They say it's just
a correlation.
But still, what we want, and what decision makers and policymakers especially want, is to know: does this cause that? Does smoking cause cancer? That's the canonical example.
So what I want to share with you
are some new results by Yixin
Wang and David Blei
on multiple causal inference.
And it turns out that the classical causal inference problem is univariate: you just want to know about a single cause having an effect. The multiple causal inference problem is actually the more prevalent problem. And it also turns out to be an easier problem to solve, with weaker assumptions.
So let me frame it in terms
of an example.
So pretend I'm a movie director
and I want to choose actors
for the movie I'm going
to produce.
And I want to know
how much money am I going
to make?
I want to predict how much money
I'm going to make.
So what happens to movie revenue
if I place a certain actor
in my movie?
And what I have at my disposal
is a little database.
It says, for this movie,
for these actors I made
that amount of money.
And mathematically, from a statistics point of view of causality, we would frame this as solving this equation, or this expression: estimating the potential outcomes of a set of actors in my movie. Or, if you are a computer scientist and you learned causality in some course, or you read about it through Judea Pearl's book, The Book of Why, you might express it in terms of the do notation. They're essentially equivalent.
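To make the notation concrete (the slide itself isn't reproduced here, so this is only a sketch of the standard forms), with Y the revenue and A_1, ..., A_m the multiple causes, that is, which actors are cast, the two formulations read roughly:

$$ \mathbb{E}\left[\,Y(a_1,\dots,a_m)\,\right] \;=\; \mathbb{E}\left[\,Y \mid \mathrm{do}(A_1=a_1,\dots,A_m=a_m)\,\right], $$

and once the confounders W are accounted for, both are estimated with the usual adjustment formula, $\mathbb{E}_W\!\left[\,\mathbb{E}[\,Y \mid A=a, W\,]\,\right]$.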
So understanding causality.
And this, by the way
is multiple causes.
The problem with understanding
causality is that there can be
many possible confounding
factors that can influence
the outcome.
And I want to account
for those confounding factors.
Otherwise, I will overcount
some factors and undercount
others.
So this problem has many applications, whether it's in genetics: what genes cause a particular trait? Or the people I choose for my sports team: how many points am I going to score? Or prices in a supermarket: how much money is going to be spent? That depends on the prices in the supermarket.
So
in classical causal inference--
by the way, for the movie
example, let me just give you
some examples
of some confounders.
And I think that will motivate
why it's a difficult problem.
So if I were making an action
movie, the genre of the movie
is likely to affect the revenue.
Action movies make more
than artsy movies.
And even knowing that I'm going
to make an action movie,
will likely affect who I'm going
to choose to be in my movie.
And then who I might choose
in a movie might affect who else
I might choose in the movie.
And whether the movie's a sequel
or not
is another confounding factor.
So today, when we do classical causal inference in terms of a single cause, the approach is: you think about all these confounders, genre, sequel, other actors, and so on and so forth.
That's the human task.
And then assuming you
have thought of all
the confounders, then you can
basically plug the numbers in
and estimate the causal effects.
That's the right hand side
of the equation.
The problem is it's
a big assumption.
And it's untestable.
But we seem to be OK with that.
But this is the way it is.
So we make this assumption
that we've been really smart.
We've thought of everything.
And then we crank it out.
And we get our numbers.
We base everything on what we
just cranked out, probably
forgetting that we made
that big assumption.
So in the new approach,
under the assumption that we
have multiple causes, the idea
is to construct what's called
a deconfounder.
And the beauty of this approach is that there are two advantages. One is that it needs a weaker assumption. And the other is that one can test the model, the deconfounder that you're constructing, against a goodness function. So it's more constructive.
So the basic idea is to fit
a local latent variable model
to the assigned causes.
Those are the observables.
So think of a factor model.
And then infer the latent
variable for each data point.
That's what the z-hat-sub-i's are. And that is the substitute confounder.
And then instead of using
the w's from the previous slide,
which are the confounders I
thought of, use the z's, which
come out of the model
that I've constructed.
And then you use the usual right-hand-side construct, with the substitute confounder, in the causal inference.
And the only assumption that we need to make is that there's no unobserved single-cause confounder.
In the classical causal
inference case, we had to assume
we thought of all
the unobserved confounders.
So it's a weaker assumption.
Moreover, once you construct
this model, you can actually
test it for goodness
against some function
that you define as good.
And then there's a proof
in the paper that shows it's
an unbiased inference.
So this is actually,
I think, quite a move
forward in the grand scheme
of causal inference.
So if at all you're
interested in this, there's
a paper you can read.
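To make the recipe concrete, here is a minimal sketch in Python, not the authors' implementation: the cast matrix, the revenues, and the choice of factor model are all stand-ins assumed for illustration.

```python
# Deconfounder sketch: (1) fit a factor model to the assigned causes,
# (2) infer a per-movie latent variable z_hat, (3) use z_hat as a substitute
# confounder in the outcome regression.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_movies, n_actors = 500, 30
A = rng.binomial(1, 0.2, size=(n_movies, n_actors))        # hypothetical cast assignments
revenue = A @ rng.normal(1.0, 0.5, n_actors) + rng.normal(0, 1, n_movies)  # toy revenues

# Steps 1-2: the factor model over the causes yields the substitute confounder.
factor_model = FactorAnalysis(n_components=5, random_state=0)
z_hat = factor_model.fit_transform(A)                       # one z_hat per movie

# (A predictive check on held-out causes would go here to "test for goodness.")

# Step 3: outcome regression on the causes plus the substitute confounder.
outcome_model = LinearRegression().fit(np.hstack([A, z_hat]), revenue)
actor_effects = outcome_model.coef_[:n_actors]              # adjusted per-actor effects
print(actor_effects[:5])
```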
So let me go back to movies.
What does this mean once we
construct a deconfounder based
on that little database
that I showed you?
This is a snapshot of a James
Bond movie.
Sean Connery used to play
the role of James Bond, 007.
I don't know if any of you
are old enough to remember Sean
Connery.
You probably remember Roger
Moore.
But anyway,
with the deconfounder, Sean
Connery, the person who played
James Bond, his value goes up.
Whereas, unfortunately, those
actors who played the lesser
roles, M and Q, their values
go down.
What this means is that without the deconfounder, Sean Connery's value was underestimated, and the values of the actors who played M and Q were overestimated.
So the deconfounder corrects
for that.
And then once you have
this model, just as with
any causal model,
you can do
this counter-factual reasoning.
What if?
What if this?
What if that?
OK.
So that's one story in terms
of advancing the state
of the art.
I think causal inference is
still a very hot topic.
And it's always been
for statistics, but it's really
rearing its head in the AI
and computer science community
now.
Now, what about transforming all fields, professions, and sectors through the application of data science?
What I wanted to do
is run through a lot
of little stories
to show you the breadth
of what's going on at Columbia
University.
And I'm going to start
with some science stories,
in particular biology.
And there's not
sophisticated machine learning
going on here.
It's just the big data problem,
where you're using this DNA
sequencing, in particular
of the microbiome,
around pancreatic cancer tumor
cells.
And what the scientists
discovered
is that the microbiome around the pancreatic cancer tumor cells was counteracting the effect of the chemotherapy used to treat the tumor.
So this is not good.
It basically says
that chemotherapy is
ineffective.
But the scientists went one step
further to show that if you
inject the tumor cells
with an antibiotic,
that antibiotic would counteract
the effect of the microbiome,
therefore making
the chemotherapy treatment
effective.
So all of that
was done through just lots
and lots of data.
I'm very
impressed by the astronomy
community, because they have
so much data.
And they've been capturing so much data from all these large telescopes, on Earth and flying around in space, giving us images of the universe.
And why I'm
so impressed by the astronomers
is that they'll throw anything
at this data.
They're very courageous.
And so this group of faculty at Columbia, along with some computer scientists and data scientists, used convolutional neural networks to look at weak gravitational lensing images coming from large telescopes that are flying around. And what they showed is that they were able to estimate the parameters of the Lambda cold dark matter model, which is a model of the universe, far better than off-the-shelf statistical techniques.
Now, as a person who has
witnessed the use
of neural networks
and deep learning
over the past five years
or so and how it's exploded
in its success in applications
from image processing,
to speech processing,
to natural language translation
and everything,
it's exasperating to me that yet
again we throw neural networks
at this,
and there's huge success.
Because as a scientist what's
exasperating is we don't really
know why these neural networks
are so successful.
But there you go.
For something completely
different-- this is coming from
our economics faculty--
they have been looking
at online labor markets.
In particular, Amazon Mechanical
Turk and other markets
such as that.
And what they've discovered
using this technique
called double machine learning
is that these online labor
markets do not behave
as a regular free marketplace.
Rather, they behave like what's
called a monopsony.
So all of you probably know what
monopsonies are.
But I didn't when I read
this paper.
A monopoly is when you have one
seller and multiple buyers.
A monopsony is when you have one
buyer and multiple sellers.
And so these online labor
markets actually behave more
like that.
And it's counter-intuitive.
An example of a piece
of information they were
able to generate from all
the data they collected
is here where they show
that high reward tasks do not
get picked up more quickly
than similar low reward tasks.
The idea is that if it were
a normal marketplace,
you would go
after the high reward tasks because you get more money.
But that's not how this labor
marketplace behaves.
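As a rough illustration of the double machine learning idea, not the economists' actual pipeline, here is a minimal sketch on made-up task data: two flexible models partial the controls out of the outcome and the treatment, and a residual-on-residual regression then estimates the effect of the reward on pickup time.

```python
# Double machine learning (partialling-out) sketch on synthetic task data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 10))                         # hypothetical task features
reward = X[:, 0] + rng.normal(size=n)                # treatment: posted reward
pickup_time = X[:, 0] + rng.normal(size=n)           # outcome: reward has no true effect

# Cross-fitted nuisance predictions to avoid overfitting bias.
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, pickup_time, cv=5)
d_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, reward, cv=5)

# Final stage: effect of reward on pickup time after partialling out X.
final = LinearRegression().fit((reward - d_hat).reshape(-1, 1), pickup_time - y_hat)
print(final.coef_[0])   # close to zero, echoing "high reward tasks aren't picked up faster"
```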
For something perhaps more related to your world: finance and reinforcement learning.
This is a colleague of mine
in the operations research
department who has been looking
at robo advising.
So this is all very
familiar to you.
If you have a lot of money,
you might have
a personal financial advisor.
And over time
that personal financial advisor
learns your investor
preferences, what your risk
aversion is to investing
in a certain portfolio
of instruments.
And so what he does--
and I just wanted to show one
equation--
basically using
standard reinforcement learning,
over eight or nine iterations
of this formula you learn what
the investor's preferences are.
And what he also goes on to show is that actually the combination of human and machine still outperforms either the human or the machine alone.
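The equation from the slide isn't reproduced here, so as a purely illustrative toy, assuming a mean-variance investor and made-up market parameters, here is one way an advisor could learn a risk-aversion parameter over a handful of interactions; the update rule and numbers are stand-ins, not the formula from the talk.

```python
# Toy preference-learning loop: infer an investor's risk aversion (gamma) from
# the portfolio weights they accept, refining the estimate over a few rounds.
import numpy as np

mu, r, sigma = 0.08, 0.02, 0.2        # assumed expected return, risk-free rate, volatility
true_gamma = 3.5                      # the investor's (unknown) risk aversion
rng = np.random.default_rng(2)

gamma_estimate = 1.0                  # advisor's initial guess
for t in range(1, 10):                # "eight or nine iterations"
    proposed_w = (mu - r) / (gamma_estimate * sigma**2)                     # advisor's proposal
    accepted_w = (mu - r) / (true_gamma * sigma**2) + rng.normal(0, 0.02)   # noisy investor feedback
    implied_gamma = (mu - r) / (accepted_w * sigma**2)
    gamma_estimate += (implied_gamma - gamma_estimate) / t                  # running-average update
    print(t, round(proposed_w, 3), round(gamma_estimate, 3))
```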
Now, again, for something
completely different,
we have some very modern history
faculty who are using
standard machine learning
techniques,
like topic modeling, sentiment analysis, and so on, to paw through documents.
So one set of documents that one
of the history professors
has accumulated is the largest
set of declassified documents.
So he has downloaded everything
that the federal government
produces every year.
And one example of what he's done with that: looking at just the cablegrams that diplomats sent each other in the 1970s, he wanted to see if he could detect the anomalous events, which should be the interesting historical events.
So each black dot here
represents one
of those interesting events.
And you can recognize some
of them,
the evacuation of Saigon,
the death of Mao Zedong
and so on.
So this is really the history
department and the history
faculty using machine learning,
AI, and so on in doing
their research.
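In that spirit, here is a hedged sketch, not the professor's actual method, of one way to flag anomalous periods in a stream of cables: fit a topic model over per-week batches and flag weeks whose topic mixture sits far from the average.

```python
# Anomaly detection over a document stream via topic mixtures (illustrative only).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

weekly_cables = [
    "trade agreement negotiations embassy routine",
    "visa request consular routine staffing",
    "evacuation saigon emergency helicopters embassy",   # an unusual week
    "trade agreement routine consular staffing",
]   # placeholder text; the real input would be one concatenated document per week

counts = CountVectorizer().fit_transform(weekly_cables)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

centroid = topics.mean(axis=0)
distances = np.linalg.norm(topics - centroid, axis=1)
threshold = distances.mean() + distances.std()
print([week for week, d in enumerate(distances) if d > threshold])  # anomalous weeks
```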
It's actually quite interesting that the history faculty have said to me that they're at this juncture: what should they teach their next generation of students? Because there are all these techniques that you learn as a historian. But now, all of a sudden, there are these computational techniques, these machine learning techniques, and, knowing that everything is now digitized, it's important for the history students to learn this material.
So now let me talk a little bit
about Data For Good, responsible
use of data.
And I want to share with you
an acronym that I really don't
like anymore.
But it reminds me
of the principles
that we need to subscribe to
in terms of using data for good.
So fairness, accountability,
transparency, ethics, safety
and security.
And I call that FATES.
My sole contribution
to this acronym is S for Safety
and Security.
Others have come up with FAT,
and FATE, and so on.
But what I wanted to focus on
is safety and security.
So there's a system here, and actually JP Morgan, thanks to Manuela's little program, is helping to fund some larger efforts that we are exploring at Columbia University on looking at deep learning, using formal methods and programming language techniques to better understand these deep learning systems.
In this particular work, DeepXplore, they're using two techniques inspired by software engineering and programming languages to look at deep neural networks.
And the one technique that I
find very easy to understand
is this notion of neuron
coverage.
So we know from programming
languages and writing computer
programs, when you're testing
a program there's a notion
of code coverage.
So you want to test all
the paths in your program
to make sure that at least
for those paths,
you get the right answer.
So, inspired by that idea, why not use it for a notion of neuron coverage, where you want to tickle every node in your network and every edge in your network, and see what is germane, given the inputs, to the output.
So that's one idea.
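As a rough sketch of that notion, not the DeepXplore implementation itself, neuron coverage can be computed as the fraction of neurons that fire above some threshold for at least one test input; the tiny network and threshold below are assumptions for illustration.

```python
# Neuron coverage: fraction of neurons activated above a threshold by a test set.
import numpy as np

rng = np.random.default_rng(3)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))   # a tiny toy network

def activations(x):
    hidden = np.maximum(0, W1 @ x)        # hidden-layer ReLU activations
    return hidden, W2 @ hidden

def neuron_coverage(test_inputs, threshold=0.5):
    covered = np.zeros(W1.shape[0], dtype=bool)
    for x in test_inputs:
        hidden, _ = activations(x)
        covered |= hidden > threshold     # a neuron counts once it fires above threshold
    return covered.mean()

test_set = [rng.normal(size=4) for _ in range(20)]
print(f"neuron coverage: {neuron_coverage(test_set):.0%}")
```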
And what they found, using another idea called differential testing, was that when they took off-the-shelf, state-of-the-art DNNs used for image processing, like those trained on ImageNet and so on, they basically found flaws in these classifiers.
So in particular, let me give
you an example.
In the first case on the left
is an image for which
the classifier-- and now think
of this classifier
as being in your car,
or your self-driving car,
the camera of your car.
So this classifier looks
at this image and correctly
says, veer to the left.
And that's fine.
What DeepXplore does is it finds
natural perturbations to input
images in such a way
that you can fool the classifier
to do the wrong thing.
So in this case,
a natural perturbation
is to darken the image just
slightly, which
is a natural event, because we
don't always drive in daylight.
And once you do that, then
this classifier will actually
say veer to the right.
You hit the guardrail.
You fall down the cliff.
And you die.
And so that's why they call it
fatal errors.
OK.
[LAUGHTER]
OK.
So another interest in data
science and computing more
generally, and especially
with EU GDPR, is privacy.
And what this group did was combine the notion of differential privacy, which has been out there for over a decade now, with deep learning to ask: rather than testing a deep learning system to see if it will do the right thing, can we, once and for all, ensure that for all inputs, or for a set of inputs, the classifier is guaranteed to be robust to perturbations?
And so they use this notion
of adding noise that we get
from differential privacy
to add a noise layer to the DNN.
In particular, they found
that adding the noise layer
early on
will give us this ability
to prove once and for all that
the output is
robust to perturbations
from the input.
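As a hedged sketch of that idea, not the code from the work described, one can picture a layer early in the network that adds calibrated Gaussian noise, with predictions averaged over many noisy passes; the architecture and noise level here are illustrative stand-ins, and the formal robustness certificate is in the paper.

```python
# A noise layer inserted early in a DNN, in the spirit of differential privacy.
import torch
import torch.nn as nn

class NoiseLayer(nn.Module):
    def __init__(self, sigma=0.25):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        return x + self.sigma * torch.randn_like(x)   # calibrated Gaussian noise

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    NoiseLayer(sigma=0.25),          # noise added early on, as described in the talk
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

# Average many noisy forward passes; the spread of the scores is what bounds how
# much a small input perturbation can change the decision.
x = torch.randn(1, 3, 32, 32)
scores = torch.stack([model(x) for _ in range(50)]).mean(dim=0)
print(scores.argmax(dim=1))
```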
So I think this is very exciting
work.
And it, there again, shows this combination of results from different parts of the field coming together.
So now, let me close
with some stories
about tackling
societal grand challenges.
I wanted to share this one
because it's a New York City
happening.
This is a huge grant that a combination of many universities, including Rutgers, Columbia, and NYU, received from the National Science Foundation to basically put, in a one-square-mile area in Harlem, right next to Columbia University, a test bed that will be advanced in terms of the kinds of antennas put in, the kind of wireless testbed that people can play with.
And this is so that we can look
at not just 5G protocols,
but protocols beyond 5G.
And not just
the wireless protocols
that we need,
but the applications that will
sit on top of that.
So this is going to light up
a lot of interesting research.
And having it close to Columbia
and working with the Harlem
community has been really
wonderful.
So in terms of climate science,
as many of you probably know,
Columbia University has an Earth
Institute, which is a university
level, university
wide institute.
And within the Earth Institute is the Lamont-Doherty Earth Observatory, which is probably one of the premier climate science capitals in the world.
And this work that I'm showing
you here is actually an example
of the kind of infrastructure
needed for many scientists
to share models, share data,
share algorithms.
It's called Pangeo, an open source platform, also partly funded by the National Science Foundation, but by other agencies as well, as you can see.
And it is already being used
by climate scientists
around the world to share
their models,
to share their data,
to share their algorithms.
And it's really
a nicely layered stack
of software.
And it's already being used to do meteorological examples, hydrology, and so on.
And the example I wanted to show
you is not being done by Pangeo.
But this is a simulation done
by NASA and JPL of the ocean
currents flowing
around the Earth.
This is a simulation based on 2
petabytes of data.
And 2 petabytes right now is not a lot, because the IPCC anticipates generating up to 100 petabytes within the next 10 years.
And there's no one system that
can simulate all that data.
And, of course, when I look
at that, I immediately want
to zoom in and out.
I want more than the oceans.
I want so much more.
But this is state of the art.
OK.
My last example is in the health
care arena.
And I wanted to use this example
for a couple of reasons.
First, Columbia
is the coordinating center
for this federated data set
of electronic health records
around the world.
There are already 600 million
patient records
in this federated data set,
coming from 25
different countries, 80
different databases.
What is phenomenal to me,
as the IT person usually
in the room,
is that these records are all
in the same format.
Yeah.
[LAUGHTER]
But once you have this data set,
as you all know,
you can do things you would not
be able to do
in traditional medical science.
So what I wanted to show
you is a couple of examples
of what results they were
able to see from just looking
at these patient records.
No clinical trials.
Just observational data.
So first, the other reason
I wanted to show you this
is to reinforce the point
that data visualization is not
enough.
I have to tell you the stories
behind these beautiful pictures.
So first of all, they looked
at three different diseases,
diabetes, hypertension,
and depression, from left
to right.
Each circle of rings
represents a single data set.
So the topmost circle of the middle column is CUMC. That's Columbia University Medical Center.
So if you are treated
for hypertension,
you're in there.
And now, what they were looking
at is for each disease,
for each patient, the sequence
of drugs given to that patient
to treat that person
for that disease.
And so first you're given
a drug.
And that would represent
the inner circle.
So, for instance, if I pick on that hypertension one at the top, for CUMC, the drug represented by the orange color is the first drug of treatment.
If that works, fine.
But if that doesn't work,
then I'm given a second drug.
And that drug is represented
by the second ring
around that circle.
And if that doesn't work,
I'm given a third drug.
If that doesn't work,
I'm given a fourth drug.
And so on.
And, by the way,
I'm only showing you four rings
for each circle.
This goes on for tens of rings,
in some cases.
So the first interesting
observation the scientists made
by looking at just this data
is that if I collapse all
the rings and circles
for hypertension across all
the data
sets that they collected,
a quarter of patients treated for hypertension are treated uniquely.
That's pretty astounding.
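To make that observation concrete, here is an illustrative sketch, not the actual tooling behind the study, of how one would assemble each patient's treatment pathway from observational records and count how many pathways are unique.

```python
# Build ordered drug sequences per patient and count unique treatment pathways.
from collections import Counter

# Hypothetical records: (patient_id, days_since_diagnosis, drug)
records = [
    (1, 0, "drug_A"), (1, 90, "drug_B"),
    (2, 0, "drug_A"), (2, 60, "drug_B"),
    (3, 0, "drug_C"), (3, 30, "drug_D"), (3, 200, "drug_A"),
]

pathways = {}
for patient, day, drug in sorted(records):
    pathways.setdefault(patient, []).append(drug)

counts = Counter(tuple(seq) for seq in pathways.values())
unique_share = sum(1 for seq in pathways.values() if counts[tuple(seq)] == 1) / len(pathways)
print(f"{unique_share:.0%} of patients have a treatment pathway no one else shares")
```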
So that says: Manuela, if you have hypertension and you're in that quarter, and you say to me, Dr. Jeannette, is there anyone else in the world being treated like me?
The answer is no.
So this is just an observation.
Of course, this observation then
asks more questions.
Like why is this?
What's really going on?
So that's one interesting result
from just looking at the data.
The second is the lower left
hand corner here.
This is diabetes.
If I were to show you
all the rings of circles for all
the data sets that are
represented, they would pretty
much look like the top two,
where this first drug,
this chartreuse drug,
works pretty well.
But you'll notice it doesn't
for the lower left hand corner.
And that lower left hand corner
represents
a Japanese medical clinic.
And it turns out the Japanese
are predisposed
against that chartreuse drug.
This was not known until looking
at the observational data.
And so, of course, this also
raises interesting questions
of why and so on.
And I asked my colleague, well,
does this hold for the Chinese
and the Koreans?
And it doesn't.
So, again,
some interesting new science
to discover what's really going
on.
So I will close now with Data For Good. Just remember that from my talk, and I'll be thankful.
Thank you.
