- Good afternoon, everyone.
That mic will get you
going for a second there.
Thank you.
I'm Tom Russo, I'm the AVP
Industry Corporate Research
in the office of Vice
President of Research.
And on behalf of myself and
the GW Data Science Institute,
represented by Brian
Wright, and I don't know
if Eric Lawrence is here yet,
we would like to welcome
the faculty who are here,
staff, and students, welcome back.
You had a day off yesterday, so it's nice
for you to be back today.
And we also have some of
our corporate partners
here today, so I welcome
all of you to this event.
Our topic for today is
the future of data science
in the DC region.
And rather than go into all the details,
we have a long program today,
I'll just get on with the logistics
for today's events.
Today's program will
run for about an hour.
The event will be recorded
and will be available
for later viewing on my website,
the DSI website, and the
GW Business School website,
as well.
We will have a Q and A.
We'll go on for about
probably 30 to 45 minutes,
and then we'll have a Q and A.
And if you do have a question,
there'll be two colleagues
with microphones
in the aisle here, if you
just queue up behind them,
and let us know your name
and what school you're from
otherwise they will remind you.
And then, we'll go on.
And then following this,
we'll be having a reception,
in a green room where
we think everyone can
network and have a chance
to meet our two speakers today.
Our moderator for today's
event is Dean Anuj Mehrotra.
A lot of you may not have met him.
He just came here to GW
in July of this year,
and prior to coming here he
held leadership positions,
at the Miami Business School.
His own research is also in analytics,
in large-scale optimization,
and in interdisciplinary applications.
Please help me in welcoming Dean Mehrotra.
(clapping)
Our guest speaker for
today is Dr. Kirk Borne.
He's the principal data
scientist and executive advisor
at global technology firm
Booz Allen Hamilton.
Kirk focuses on applications
of data science,
data management,
machine learning,
AI,
modeling and simulations,
across a wide variety of disciplines.
He's an astrophysicist by training.
And he spent 20 years, almost 20 years,
at NASA on the Hubble Space Telescope,
as a data archive project scientist.
He has extensive experience
in large scientific,
databases,
including expertise in
scientific data mining.
Kirk is no stranger to academia,
as he spent 12 years at
George Mason University.
And he was the co-creator,
of their data science
undergraduate program.
Dr. Borne earned his
BS in physics from LSU,
and a PhD in astronomy from Caltech.
Kirk is consistently listed
among the top worldwide
social media influencers in
big data and data science.
So big hand for Dr. Kirk Borne please.
(clapping)
Anuj please.
- Alright thank you Tom,
and I'll jump right into the questions.
Kirk if you will allow me.
First of all thank you all for being here.
I met Kirk a few weeks ago at one of the,
symposiums that we had at
the School of Business.
And we got talking about
some of his background.
And he told me about how
he was an astrophysicist,
who has worked for NASA and whatnot.
And I felt like maybe
that is the reason why he
is a good data scientist.
And perhaps there is no
hope for the audience here,
who have not gone through
the physics degree.
But I wanted to ask you,
I was very intrigued by
your story, about how you,
came about and started doing data science.
So it would be great to hear
some of your background,
and how you got into this field,
and how is that migration from,
doing some real stuff in astrophysics,
and physics for NASA.
And now,
doing something which is in data science,
and machine learning
and things of that type.
- Alright great, well thank you Anuj.
And thank you guys for being here today.
I don't know if I'm echoing
too much here or not.
The story really begins
very early in my life,
where I just became very curious
about the world around us,
as a child exploring interesting spaces.
My father was US Air Force and,
we lived in sort of remote areas.
And often times I was finding
myself in a desert climate,
or in a mountain range
or somewhere like that.
And as a child just looking at the sky,
or looking in the ground
and so the curiosity,
that's natural to every
single human being,
was in play there.
And so I decided very early age,
I wanted to be an astronomer.
So I pursued that and did that.
So as a scientist in that field,
like any other science field,
you're always working with data.
So I tell people that
my night job was data,
as an astronomer.
(laughing)
But my first job after grad school,
and a couple post docs,
one of my post doctorate
fellowships actually was in DC,
at the Carnegie Institution of Washington,
just up the road here.
And so after a couple of
those post doctoral positions,
I got to work at the
Hubble Space Telescope.
And then not only was my,
night job data, but my day job was data.
So they actually put me
in charge of a database.
And then I expanded
that database work
into an entire system
that every astronomer in the
world used for a few years
in their use of the
Hubble Space Telescope.
So I became sort of fluent
in various languages,
always working with data and databases.
I guess that is part of my day job,
is also part of my night job.
From there I moved on to be
the Hubble archive data
project scientist for NASA,
where I basically
oversaw the development of this
state-of-the-art science data system
for science use around the world.
Focused a lot on user
experience and design thinking.
A lot of the things that are
now natural to data scientists.
Think about who's gonna use the stuff,
and how can you make it usable to them.
And I went from there to
a management position,
at NASA Goddard Space Flight Center,
right around the beltway.
So during those years,
I was working with data.
So my management position at NASA,
was at the National Space
Science Data Center.
And we were curating
the experiments,
data from experiments in space science.
And at that point this was 20 years ago,
20 something years ago.
We were managing 15,000
experiments of data.
So every,
NASA experiment, whether
it's a big one like Hubble,
or a little teeny,
sensor that bolted onto
the side of a spacecraft,
collects data.
The PI's at some university,
when they're finished with their project,
the data comes back to NASA,
to be preserved for all time,
in this digital library,
called the National Space
Science Data Center.
So Congress mandated hey,
taxpayer you, me, us,
we pay for this data.
So we have to curate it and maintain it,
and make it usable and
useful to the world.
So about 20 years ago,
I remember it pretty well, 1997,
a colleague of mine
met me at a conference,
and said hey my project is finishing,
we want to turn our data
over to the Data Center.
Can we do that?
And so he was telling me about it.
I knew about the project but
I didn't know much about,
sort of their data, per se.
And he said oh yeah, It's two terabytes.
Well, two terabytes
today is pretty trivial.
You probably have it on your thumb drive,
or something like that.
It's certainly in the cloud on your iPhone,
you got at least two terabytes right.
But 20 something years ago,
two terabytes was enormous.
And so,
I knew it was enormous
but I didn't realize,
how enormous it was until I went back,
to the management team at the
Space Science Data Center,
after this conference and said,
hey we've got this opportunity,
to bring in these two terabytes of data,
from this particular experiment.
And they looked at me like,
you're missing something here Kirk.
And I'm saying what am I missing.
And they said you realize we
have 15,000 experiments,
of data archived here.
I said yes.
And the sum total,
cumulative combined of all
those 15,000 experiments,
is less than one terabyte.
And you want to bring in
number 15,001 after that,
which by itself would require
tripling the capacity,
of the Data Center.
So at that point I didn't know,
what to do.
So a friend of mine said look,
why don't we, put together a proposal,
to get some funding to build
out the infrastructure,
to take in this data.
And I said how do you write a
proposal for infrastructure.
I'm an astrophysicist.
I know about galaxies and black
holes and things like that.
I don't know about infrastructure.
And so he said well,
why don't you look into data mining.
And I said that's a curious
combination of two words,
data and mining.
(laughing)
So in 1997, I got into looking into,
what that was and then
discovered this field,
called machine learning.
And having been a physicist I had,
a thousand semesters of math, roughly.
(laughing)
And,
I had no problem with the math,
but,
this was completely different
from what I've seen before.
Actually discovering
patterns and anomalies,
and interesting features in data.
And then, I said this
is really interesting.
So I started digging
further and discovered,
that not only was this true
in the sciences, that
people had lots of data,
but also across the world in healthcare,
and national security,
energy sector and health,
precision medicine,
things like this.
It was just data, data everywhere.
And so that's,
essentially how I just fell in love,
with the concept of doing
discovery from data.
And I would say the birth of
the data scientist in me,
it might have been when I
was nine years old, years ago,
you know collecting
data about the universe,
but in terms of what we
would now call data science,
with that machine learning piece,
was about 20 years ago.
But anyway so that's,
how I got to that point in my life,
and, well,
I won't tell you all
the rest of the story,
since then.
- That's great.
I mean from what you are talking about,
it seems like you were
perhaps among the first,
experts in data science so to speak,
given where you came from,
and how far back in time
you started working on it.
So it's a curious question.
From your vantage position,
especially as it pertains
to the region of DC,
what do you think is the future of,
the data science ecosystem here?
How do you see it moving forward,
particularly with the buzz
of Amazon Headquarters
Two coming in here?
So I don't know whether that impacts,
the future immensely or not.
Would love to hear your thoughts on that.
- Well there's a lot,
there's a lot of thoughts
in that one question there.
But I'll start it off by just saying that,
looking to the future in this field,
is a very scary proposition because,
essentially the turn over
time of the knowledge,
or the state of the field
changes like every two years.
I mean it's a very rapid turn over,
in terms of what the algorithms are,
what kind of applications we're doing.
And so even two years out sometimes,
we can't even,
get it right.
Because you know two years ago,
we were just starting to
talk about deep learning.
Prior to that we weren't talking
about deep learning at all.
So there's all kinds of things that will,
appear on the scene,
in short time scales with
this field rapidly changing.
And so when we first started the program,
at George Mason University,
which was about 10 years ago,
the undergraduate program,
we were the first in the world
to create such a program.
So we really didn't know
exactly what to put in it,
and it turns out we had,
we did it wrong, okay.
I have to really admit that.
We had an awful lot of math prerequisites.
Because all of us that were
forming this program were,
physicists and folks like that.
And so we thought all these
students needed to know,
calculus, five or six semesters,
of calculus like the rest of us.
And that of course was completely foolish.
So we quickly learned
and corrected ourselves.
And the other common
mistake that happened,
which we made,
which is to think that,
the data science degree
program in university,
is the only place that this
data science would live,
within the university.
So we kept trying to like
control and handle it,
by ourselves,
when all of a sudden
there's these little groups,
that were spawning off
like in the health school.
There was health informatics,
and then in business school
there was business analytics.
And then the engineering school,
they had the, what do you call it,
massive data analytics program.
And the policy, government
school, government policy,
they had a policy informatics program.
So everywhere we looked,
people were spawning off
these informatics based,
analytics based data driven programs.
And came to realize
that really what it is,
is that sometimes actually,
you can have this concentrated,
program, much like you have here at GW,
where you have an institute,
you know which is focused on the,
specific training for people
who want to be data scientists.
But since every single discipline,
has information, digital information,
and data,
it has to be just infused
throughout those programs.
So I see more and more of this happening,
in the future, so maybe
this is just a cop out.
What we see happening is more and more,
different disciplines recognizing
that this data analytics,
machine learning, AI,
whatever you want to call it,
the different pieces of it,
those are now gonna be
tools of every profession.
Okay, so I was just reading an article,
about the healthcare profession,
and they're saying that gone are the days,
of sort of the hand
written note so to speak.
And the stethoscope is
really more data driven.
So physicians,
and nurse practitioners and so forth,
need to be trained in the new tools of AI,
computer vision.
Maybe they're not necessarily gonna be,
doing research in these fields,
but they certainly need to know the tools.
Because those are the tools,
that they will be using
for their business,
for their organizations.
And so, I think what happens
when Amazon comes to town,
is not so much that,
I really don't really know what,
specifically in the data
science area they're gonna do,
in the office here.
And I know they claim to have
a lot of technology jobs,
a lot of software development jobs,
a lot of sort of operations
type of activities.
But at the same time, they
are an innovative type,
company, a tech company that uses,
machine learning and data science
throughout their business.
I mean if you ever watch,
Jeff Bezos give a talk,
he's always talking about
data, data, data all the,
I mean it's all about the data.
I mean the guy, he lives in his data.
(laughing)
And he's a shining example,
of a data scientist,
who's really made it
big in the world right.
Cause he's, from the very beginning,
believed in the value of the data,
to learn how to serve customers,
what to deliver.
So there's no way, with
them coming in,
that we could avoid having,
an enrichment of our entire DC region,
around data and analytics,
just from that.
And not only just because,
of what they're doing,
but the talent that they will attract,
the other companies
that they will attract,
the start ups that will
grow out of this ecosystem.
So that's been sort of one
of the things that was,
visible in Seattle,
the whole incubator community,
that sort of grew out of the
presence of Microsoft there,
and then Amazon there.
And we anticipate sort of the same here.
So there'll just be a lot
of start ups that will be,
coming around.
So whether you want to work
for the big giant company,
or the small start up,
there's just gonna be a real
vibrant sort of data science,
fabric here across this region.
- So you mentioned,
that it's difficult to even predict,
what it will be in two years from now.
And that poses for academic institutions,
like ourselves here,
both a challenge and an
opportunity with Amazon coming in,
and with the evolution of,
data science overall in the DC region.
So, what are some of your
thoughts on what kind of a role,
will academia play and
can play in terms of,
streamlining the operation towards,
this evolution of data science,
for the region.
- Well,
okay then again,
this is,
(laughing)
a question I was,
mulling over for years
and struggling with and,
sort of the solution I
came up with in my head,
is a three part solution,
I guess that's what a
physicist does right.
So sort of,
I'll jump around a little bit.
Let's start at the PhD
level and we'll work around,
to Bachelors, then to Masters,
in that order.
So at the PhD level, since
it is a research degree,
basically you just have to
provide the foundational tools,
of what data science would
look like as a PhD program.
Machine learning, algorithms,
a lot of the math and modeling
and simulation techniques.
And then it's really up to the,
to the student in the
program and their advisor,
to come together and say what's,
what are the new frontiers of research.
And so that will evolve sort of naturally,
independent of,
what's happening in the field.
The researchers who need to,
if they're gonna be
doing PhD level research,
find the frontier that
needs to be explored,
where the gaps in knowledge
haven't been filled yet.
So I'm not worried about that level,
because those people are,
automatically the people
that are creating,
the movement of the frontier.
Then if you look at the Bachelors,
again this was sort of
our learning experience.
We were thinking of the,
Bachelors level data
science program as itself,
sort of the stepping stone,
to a full career as a data scientist.
And what we came to
realize that it really,
was more of the foundational stuff,
just like any undergraduate degree,
is really the foundational stuff,
you need for a discipline right.
You can have a bachelors
degree in medical technology,
or healthcare, but that
doesn't make you a physician.
So you need the foundational stuff.
And so that foundational stuff,
maybe won't evolve
quite as rapidly either.
Cause again it's the
math, machine learning,
visualization tools,
the database tools, data
management concepts,
the programming skills.
These things will not evolve as quick.
But the tricky part,
which I saved for last,
is at the Master's level.
Cause to me that's the real sweet spot.
The sweet spot is the Master's level,
cause it's a professional level degree,
where ideally you not only have the,
discipline of the data science,
but you also have the
discipline of a subject matter.
Okay whether it's business
or health or cyber security,
or whatever your discipline is.
And so that is the thing
that is going to be,
sort of like, very fluid as the,
market out there changes,
what, right now healthcare
and cyber security,
might be the hottest things on the market.
Maybe that won't change too soon,
but maybe other things will emerge
as hot areas.
And so all of sudden universities say,
oh well we don't have a
program that focuses on,
the data science of Pokemon
or something like that.
I mean whatever the thing is that's,
the hottest sort of,
trend.
So the advantage of the
Master's program is that,
it can be more flexible
within a university.
Certainly you can think of it,
without the Master's degree.
You can think about graduate certificates,
are completely flexible,
as you can sort of
create those on the fly,
in response to,
industry and government demand.
Master's program I think you can sort of,
have similar fluidity in the sense,
that there's foundational courses,
and then you have courses
that can be added to it,
which are more focused on those,
specific hot topics, that
industry's demanding.
And so you can bring in,
if necessary even an outside expert,
who's not on the university faculty,
to teach some of those more
subject-matter-focused courses.
So I think there's a way to respond to it,
but you're gonna have to be more agile,
at the Master's program level I think.
Because that is the professional degree,
which is the stepping stone,
I believe,
to a long-term career in this field.
The PhD is more of a stepping
stone into a research career,
maybe academic or a
laboratory type of career.
A bachelors program is really sort of,
again it's foundational for
you to do almost anything,
in the digital world we live in.
But it's not necessarily saying,
you're gonna be a data scientist.
- So that's excellent advice I think.
But let me,
focus on the research frontiers
that you're talking about.
And we have the Bay area.
We have the Boston area.
Where the universities have
somewhat of a consortium,
to be able to go in and compete
for some of the grants,
especially for data science
and computational work.
Does it make sense for the,
universities in the DC region,
to somewhat form that
consortium type of spirit?
If yes, then how does one go about,
establishing that,
especially in an area where there is,
a sense of competition,
and everybody's trying to get
a bigger piece of the pie?
- Well the first part of
your question was a yes answer.
(laughing)
I already warned you a yes, no question,
I'll just give the yes, no answers.
(laughing)
So you beat me to it.
So if yes, okay.
Excuse me.
So, I definitely believe that's,
something that has to be solved.
I think as a region,
we have amazing universities,
just in short driving
distance around this beltway,
within the beltway,
and just outside the beltway.
A lot of even, slightly
farther away universities,
both in Maryland and Virginia,
have campuses,
either in DC or in Arlington.
So everyone realizes
the value of being here,
so why don't we take
advantage of being here,
and work closely together.
And I think,
it's not,
I mean I've been in academia
long enough to know,
that it's not easy to share our toys.
Okay.
(laughing)
I used to joke with astronomers about,
when I would start giving,
I started giving talks on data mining,
at conferences,
and as much about my galaxy work.
And they'd say what has data mining
ever done for astronomy.
Which actually it's done quite a bit but,
my answer to them would be,
we've done data mining forever right.
The data are mine and you can't have them.
So we hold tight to our things,
our ideas and,
even our grant money.
And so sharing
gets to be challenging.
But I think,
when you look at some of the
bigger things that are coming.
And I've seen a couple of
them come out of DARPA,
which are eight-figure grant programs.
I mean an eight-figure grant.
I mean DARPA, there was
one program I remember,
was 39 million dollars and,
it's like,
my university alone
put in three proposals,
which I never could figure out,
why we were competing
against ourselves okay.
So, these kind of things happen and,
you scratch your head and say why are
we competing against ourselves.
Well same thing with the region.
If there's something that's
significant like that,
can't we find a way,
to get the best out of all of us.
The chances of winning something small,
is certainly better than
100% chance of zero.
(laughing)
And so I think,
I don't have the solution
to that other than,
I like the idea of a consortium.
I like the idea,
that maybe the,
VPRs at the different universities,
getting together and saying heck,
how can we do this.
How can we create this?
And again one of the ways it can work,
is you make sure each person,
or each group or laboratory or whoever,
has some specialty.
Make sure that specialty is represented.
So not everybody is all,
trying to do the exact same thing.
I always told this to my grad students,
who would do group projects in my class.
They were required to do projects.
Of course all students
hate group projects right.
(laughing)
I would tell them, I'd say
don't let everyone on your team,
be the exact same kind of person.
Cause if everyone has
exactly the same skill,
and there's no diversity on your team,
you're bound to fail.
And sure enough, those
were the group projects,
that got the worst grades,
cause everyone was
either a Java programmer,
or an SQL database administrator.
They were all the same kind of people.
So they naturally gravitated together,
formed a team and then,
the project failed because
there was no diversity.
So find the complementary,
skills, research areas,
to build a beautiful project.
And I think,
if you pitch it in a way that says,
this is gonna be a beautiful project,
and how can the agency not fund us.
You have,
a better than zero chance,
of getting some money,
versus a 100% chance
of getting zero money.
(laughing)
- I guess I purposely asked,
a yes question to start that conversation,
in the region.
But let me also,
let me change gears a little bit.
Data science is a relatively new field,
and it has really exploded at this point,
with lots of interest.
We are finding a lot of
interest from the students,
from the community overall.
But there is also significant growth,
in the area of artificial intelligence.
And it's a question on how
the practice of data science,
is going to inform or going to be,
related to how AI takes over so to speak.
And then there is a related
question I would like to ask is,
do you have concerns?
Because there is a lot of concern,
about how AI might be used,
on how data science might be used.
So what should some of those concerns be,
and are there,
mechanisms that one
should be thinking about,
and what kind of
mechanisms can there be to,
prevent the misuse or the bad use of AI,
and the bad use of data
science so to speak?
- Well that's, yeah,
that's a challenge that the
entire community is doing,
self evaluation on right
now, self questioning.
There's been a few examples in the media,
in recent years where something,
sort of went off the rails a little bit,
or a lot of bit.
(laughing)
But first I want to start
off by just saying that,
sort of an academic,
answer to the first part of your question,
and then move into the ethics part.
And sort of the academic
side of the question,
is, I see AI and data science,
they're two different things.
I mean data science is,
the application of scientific methodology,
to discovery from data,
that's how I like to say it.
So the two most important
things in data science are,
the data and the science.
Okay.
And that seems like a silly joke,
but it's really the fact.
The data are the fuel,
the data are the experimental
observational input.
And science is the methodology,
where you infer a hypothesis,
you design an experiment,
you test your hypothesis,
you collect data from the experiment,
to evaluate your hypothesis,
and then you either accept
or refine the hypothesis.
And so it's this cycle right.
And so you follow a rigorous methodology,
where you have this validation step,
where you have these checks and balances,
where you have transparency
about what you're doing,
and the data you're collecting,
and how you're using it.
If you're doing that correctly,
it's sort of self policing if you will.
Now AI now becomes an application of that.
So AI, the artificial intelligence,
and I like to say,
artificial intelligence,
there's nothing artificial about it.
I actually wrote an article
called, "The Seven A's of AI",
which is augmented, assisted,
accelerated, adaptable,
actionable, and a couple others.
A friend of mine just added more.
She said, hey you need
awesome intelligence.
(audience laughing)
But it's really about the
actions and the augmentation,
and the assistance to the human,
in that activity.
So whether the machine
takes over completely,
autonomously, or it just
assists or augments the human,
amplifies the human intelligence,
either way it's not artificial,
it's the real thing.
So where the ethical question comes in is,
are we,
are we using data that
we shouldn't be using.
Are we using the data correctly?
Okay so there's proper use of data.
There's use of proper data.
And there's also,
statistical fallacies that
we can fall into like,
biased sampling,
or confirmation bias,
or cherry picking, that is you only,
pick the things that are
gonna confirm your hypothesis.
So there's sort of a confirmation,
via a statistical bias.
But then there's also
other kinds of biases,
that have impact on people's lives.
Where the AI takes some kind of action,
presents some kind of behavior,
that is in some way detrimental,
to a person or a class of
persons or something like that.
So we always have to be
on the look out for that.
And really,
one of the ways, again
I'm not an ethics expert,
but is transparency,
is that we've got to be able to,
show what we're doing,
have it evaluated by others,
to have them determine,
am I fooling myself that
this is a legitimate,
exercise of these algorithms
on these data or not.
And have different perspectives,
of people look at that for you.
So it is right now like I said,
we're doing a lot of self
examination in this field,
around this, because
there have been those,
examples in the media.
You're probably familiar
with the one, Microsoft Tay.
Okay so it was a bot on Twitter,
that basically just could,
was to interact with people on Twitter,
and became,
sort of a vindictive Nazi within 24 hours.
(laughing)
It was pretty awful.
But,
we can try to understand
how that probably happened,
because there are people
on there probably,
testing it to see if it would,
fall into,
sort of repeating bad behaviors.
And just like any child,
if you emulate bad behaviors,
they'll probably repeat bad behavior so.
So I don't think any of this is,
necessarily a fault of AI.
It may be a fault, again it's
a fault of how we train it.
And just like a child, they
see us and they imitate us.
Okay so we're training children,
even when we're not aware
that we're training them.
And then I think that's sort of the,
the mind set we have to
remember also is that,
whatever, data is being
exposed to this algorithm,
it's learning from that data,
whether that's a chat
bot that's being fed,
nasty language.
(laughing)
Or whatever.
It's learning,
that that's,
how humans behave.
So let me behave that way too.
- You're so right about,
we are training children,
even when we are not
consciously training children.
That's what my wife tells me,
so I have been on good behavior all along.
But let me ask you as a follow up,
what about data science
product development,
when it comes to dev ops,
when it comes to data ops?
How will that need to be
undertaken as organizations,
have to become more data oriented,
for competitive reasons?
It almost seems like to me,
that this is like trying to change tires,
in a moving car.
And I am only seeing that done in races,
but there's no bay here to stop,
to be able to do that.
So how will,
organizations need to?
- Well those cars do stop
before they change the tires.
(laughing)
Only for two seconds, but they do stop.
So that's a real interesting,
question you're asking there because,
one of the things that we've
talked about in the field,
in recent years is this data ops concept,
which for lack of a
better way of saying it,
this is dev ops for data science.
And so what dev ops is,
is sort of that agile,
quick iterative,
product development cycle.
Where you,
get requirements from your end user,
and you build something small
to see is this what you want,
is this what you mean,
is that what you want.
So it's,
if you want, some people
call it fail fast,
learn fast model.
So don't build like the big battleship,
without any sort of testing,
and then 20 years later,
you finally launch it out at sea,
and it sinks.
Oops.
Be sure to test some of the
systems long before that.
And so,
dev ops is that process,
that agile iterative,
short cycle development for,
software development.
Data ops I would say is
just another way of saying,
the same thing for data
science and machine learning.
So what,
so we sort of incorporate
a lot of the concepts,
from dev ops,
when we talk about data ops.
One of those concepts is
the minimal viable product,
the MVP.
So the thing that,
minimally does what,
the customer wants the product to do.
So we're moving from
minimal viable product,
because you could just
build a viable product,
a proof of concept in the sandbox.
It can never be actually deployed,
in a real operational setting.
Ideally you want a minimal,
viable,
not a minimal viable product
but a minimal lovable product,
that's something that
someone can actually,
get value out of.
Okay so the value proposition,
that's to say,
in the context of a business.
Is it making money for us?
Is it producing value for us?
That is, do our customers love it?
Are they actually spending money on it?
Those small little incremental products,
that you build,
that bring
value, that bring that
return on investment.
The benefit of that is,
first of all again,
besides the fail fast, learn fast,
that is you learn how to fix it,
if something's broken quickly.
But the other advantage
is it builds advocacy.
Yeah I can say that.
So the question comes up today,
what do I see as one of
the biggest challenges,
in this area.
It's always the cultural challenge.
Trying to get people onboard,
to do this new thing,
this new way of thinking about business,
or thinking about the things you do.
And the culture challenge
can be at any level,
like the senior executives might,
resist data science,
because they,
or decision science or AI or
whatever you want to call that,
because they may think this new idea,
of data science is gonna take
away their decision authority.
People at the grassroots level,
the business, are gonna fear it because,
they think automation is
gonna remove their job.
And then the middle management people,
probably fear it because they think,
automation is gonna remove their job.
(laughing)
I mean so there's fear and
trepidation at all levels.
Well when they see the
value that it's returning,
that I haven't lost my job yet,
and this thing is
actually producing value,
even monetary return,
if you're in the business
of selling something.
That builds advocacy.
So now all of a sudden,
you get people who are resistant saying,
hmm, this looks good,
let's try more of this.
Let's try more of this.
Let's try more of this.
And so you can now,
you can actually build out
an entire organization,
in these little incremental steps.
And so a famous innovator,
Guy Chiarello.
He's a thought leader
in the field of innovation.
He says the whole,
the difference between
success and failure,
in startups is this,
innovation mentality of,
think big but start small,
fail fast, and learn fast.
It really comes down to whoever's
gonna have the big idea,
but then you got to start small.
So if you fail, you fail fast,
and then you can learn
fast to build on that.
And so I think the,
the learning and the value,
creation is how you sort of get past,
some of the hurdles of
the cultural challenges.
And again it rolls right back to this,
data ops concept.
- That's great.
I'm gonna open it up for
questions in a couple of minutes,
so you can start thinking
about your questions.
But there is another,
you mentioned in the beginning that,
where we are seeing in
universities and other places,
data science happens to be
in many many different areas.
And it has become a complement of,
many many,
whether it's business or healthcare or,
engineering, or other places.
Do you think that by pooling resources,
the students for data science
would be better served,
if data science were to be taken,
as a stand alone discipline,
or is it going to prosper more the way,
it is prospering right now in terms of,
being a component of various disciplines?
- Yeah I think,
what I said earlier was,
sort of addressing that,
and that is I think,
explicitly.
I think both,
things need to happen.
That is there needs to be a,
concentrated program I would say,
like a data science institute for example.
Because that's gonna be,
probably the place where
the people who really just,
in a sense want to make a
career out of data science,
if you will.
But at the same time,
there's a lot of other
people in the world,
who are not gonna become data scientists,
who are gonna do all of,
traditional work.
Whether they're journalists,
journalism students,
communication students,
or healthcare people, or
even artists or athletes,
or cyber security.
I mean it's just,
every discipline is now digital.
And so these people
need to learn the tools,
to be able to do their job,
in the modern world.
Which is a data driven world,
it's a digital world.
And so the digital
transformation that's happening,
across all those disciplines requires people,
to have the expertise and the discipline,
as well as the tools.
So there's gonna be some
diffusion of this stuff,
in every department and
university let's say.
But at the same time,
there's gonna be a core I think,
where if I truly want
to be a data scientist,
I'm not necessarily gonna want
to go to the health school,
necessarily to the policy school,
or the engineering school.
This is where I want to go.
And so both those,
options need to be available to students.
And what that gives you also,
is the ability to share faculty which,
I think is another cool thing,
because then,
I actually saw one university,
they actually, the faculty
who are experts in,
let's say your field, OR okay.
So operations research,
they might want to borrow,
that faculty member who's the expert,
from the decision science department,
to teach us a,
unit on optimization in OR.
So the Data Science Institute,
doesn't need to hire a
faculty member for that,
they can use this person.
And vice versa, that department may need,
a course on computer vision,
and you guys already have one,
so we're not gonna create our own.
So it actually benefits
the university as well,
in terms of resources right,
cause you're,
we're always worrying
about duplication right.
And so, you get the benefit both ways.
You get the expert to come to you,
or you can provide your course,
to another department.
And so I think,
again maybe not,
a firm answer to your question,
but a little squishy
answer to your question,
is I think both of these,
models--
- And that's a Solomon like solution,
right there.
So let me ask you one other question.
I think,
I've heard so many speakers talk,
when they're asked to give
some advice to students,
for example.
I have heard a lot of
folks come back and say,
that there are at least,
there are two things, two languages,
that you must learn.
And it used to be English and Spanish,
or something else.
And now it is more and
more speakers are coming,
and then saying the two
languages that you must learn,
are Mandarin,
and a programming language.
(laughing)
Do you agree with that?
And if you, disagree with
that, or agree with that,
perhaps a good advice to a student,
and to the faculty,
in terms of what may be one
thing that we are not teaching,
that we should teach
across the curriculum,
or what is it one thing that
students are shying away from,
that they should learn,
because of the way,
the world is evolving,
and data science is playing
a role in almost everything,
that they should be doing?
So I wanted to hear your comments on that.
And then I'll open it
up for some questions.
- Well even as far back as
when I was in grad school,
which was,
in the dark ages,
40 years ago was,
they dropped the second
language requirement,
the second foreign language requirement,
and allowed a computer language,
which at the time was
FORTRAN in science of course.
So they allowed the programming language,
to be the substitute
for the second language.
Which, really bothered me,
cause I worked really
really hard in college,
to take both French and German.
(laughing)
Which turned out to be a disaster,
because I had seven years of French,
plus my family was French.
But after I took the German class,
I couldn't remember any
of the French anymore,
cause it was all garbled in my head.
So I wish I had known that in advance.
I would have never taken the German class.
But I think absolutely,
every student should learn
some kind of coding skill.
Now again, there's different
levels of that obviously.
I learned FORTRAN programming,
even in high school which,
that was practically,
almost 50 years ago okay.
My high school at an Air Force base,
had a very high end education program.
And,
so I knew that very technical language,
but I taught myself BASIC.
Very simple programming.
I taught myself BASIC,
because the very first PC I ever bought,
didn't have a FORTRAN compiler.
So the very first PC I bought,
had a BASIC program on it.
And so I learned BASIC,
and I wrote little gaming
programs for my children,
to play with.
Little pong games,
and little games.
And I created my own spreadsheet
to do my monthly budget.
And it was all those skills of,
doing those things on a very
trivial programming language,
which landed me that job,
at the Hubble Space Telescope.
Because I was building my own database.
I was building my own spreadsheets.
I was learning how to do user interfaces.
I was doing all this stuff
that no other astronomer,
in the building knew how to do.
But I did it just to have
fun with my children.
And so,
no matter how trivial that language is,
I think it's gonna open your mind to a,
greater,
computational skill,
sometime later on if that's,
what your job requires.
But if you haven't learned
the basic fundamentals,
what people call computational thinking,
then you're gonna be at a disadvantage.
So I think the same,
I think computational
literacy and data literacy,
are two things no matter
how you look at it,
or how you acquire it,
I think those are absolutely essential.
And I think students do shy away from,
some of those things because,
they do tend to be mathematical.
So I like to tell the story,
when I was at George Mason,
in the undergraduate program,
one of the courses I taught
was the intro course,
the 101, the freshman level course.
And we called that in Virginia,
the general education course.
I don't know what you call that at GW,
but it's one of those
courses that a student,
any undergraduate needs,
one science course,
one math course, one,
literature course, et cetera,
so one of each of these things,
they need for every student
who graduates needs,
one from list of,
ten different categories.
So you have some right.
What are they called here?
- The breadth requirements.
- Right okay.
So breadth requirements, so we call them,
the Gen Ed, general
education, or the Gen Ed.
So I taught the Gen Ed course.
So students would come and take my course,
as the Gen Ed science class,
because they said I don't
want to know physics.
I don't want chemistry.
I don't want biology.
Okay so they came to my class because,
yeah this is data science,
that's gotta be easy.
(laughing)
So normally what would happen
the first day of class,
these students would come up to me,
in the front of the class and say,
there's no math in this class right.
(laughing)
I hate math.
There's no math in this class right.
And I said believe me,
we will have math,
but I will teach you
the math that you need.
And then another student would come up,
and say oh I hate science.
I hate science.
I totally hate science.
And I got to take this class to graduate.
And anyways, so I get all these excuses.
And I think students are
trying to negotiate with me.
I'm sure you've seen this too,
they try to negotiate and say,
as if the professors,
oh okay I'll just drop
all the science and math,
in my data science class.
Would you all be happy now?
(laughing)
Anyway, so,
during the course of that semester,
I would teach the students calculus.
I would use calculus in the class,
and I would teach it to them.
But I wouldn't tell them it
was calculus until after,
we had finished it,
and then their eyes just like oh.
I mean they heard this word calculus,
in high school right.
It was only for the,
those other people over there.
I'm not one of those people.
And I never used the word.
We just learned how to do differential,
and integral calculus
without calling it that.
And then we did all kinds of just little,
science experiments,
where we would, for example,
infer a model of what customer
would buy what product.
And I'd say, suppose you had this data,
from your smartphone.
You're searching on the web,
for this and you downloaded this app.
And so then, yeah yeah,
I do that all the time.
And so you show them sort
of what a data product,
would look like, for example.
Here's the products you looked at,
time of day you looked at it.
Other people who looked at this product,
may also want to,
also looked at that.
And so maybe you could
recommend this product.
And so they build these little models.
And I said, you just did science.
You built a model.
You inferred a hypothesis from data.
And you built it.
And then you can deploy it,
and see if people actually
clicked on your thing,
and then you can refine your
model and improve that science.
And they go, that's science?
I thought science was
memorizing facts in books.
And I said don't get me started,
on what's wrong with science
education in America,
which is memorizing factoids.
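(The "other people who looked at this product also looked at that" exercise described above can be sketched in a few lines of Python. This is only a minimal illustration, not the material used in the class: the session data, the product names, and the helper names `build_co_view_counts` and `recommend` are all made up for the example.)

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_view_counts(sessions):
    """Count how often each pair of products is viewed in the same session."""
    counts = defaultdict(Counter)
    for viewed in sessions:
        # set() avoids double counting a product viewed twice in one session.
        for a, b in permutations(set(viewed), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, product, k=1):
    """Return up to k products most often viewed alongside `product`."""
    return [item for item, _ in counts[product].most_common(k)]


# Hypothetical browsing sessions (invented data, for illustration only).
sessions = [
    ["phone case", "charger", "headphones"],
    ["phone case", "charger"],
    ["headphones", "speaker"],
]

counts = build_co_view_counts(sessions)
print(recommend(counts, "phone case"))
```

(The model here is just a table of co-view counts inferred from the data. Deploying it, logging which recommendations get clicked, and rebuilding the counts is the refine-and-improve loop the speaker describes.)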
So at the end of this process,
these are the students who,
hated the math, hated the science,
they're very fearful of all this stuff.
At the end of the semester,
transformation.
I mean I still to this day get choked up.
This one student who
was the most vehement,
at the beginning of the semester,
for his hatred of math and science,
he came up to me at the
end of the final exam.
He put the final exam,
on the desk in front of me,
at the end of that semester.
And he reached out to shake my hand.
So I shook his hand and he
thanked me for the class.
And he said, you know what,
he said, this is not only
the best science class,
I've ever had, this is the
best class I've ever had.
- Oh that's great.
- And this is a student who absolutely,
had fear and trepidation
of anything to do with,
math and science three months earlier.
And so I think we need to show,
relevance of this,
and I can give you a long story about,
that which I won't right now.
But I've seen cases
where I've seen students,
who are shown the relevance
of a math or a science,
to something that they care about,
and all of a sudden
they're jumping right in,
and learning it,
when just minutes before
they declared they would,
never ever want to learn this stuff.
And it has, once they see that,
in their life,
then I think it can break
down those barriers.
- That's great advice.
So let me open it up for questions,
if you have a question
please state your name,
and please get to the question so that--
- Since it's a data science thing,
also give me your social security
number and date of birth.
(laughing)
- [Jazalen] Hi I'm Jazalen Pride.
I'm a senior biology and physics major,
from Columbia Tennessee.
I'm a Howard University
Consortium student.
I previously DM'd you on Twitter like,
maybe three or four months ago about this.
I've been speaking to a few people,
at Booz Allen about data science
and relating it to physics.
And some of them really didn't understand,
how exactly.
And I was wondering if you had,
any long term goals,
relating to physics for
data science at Booz Allen?
- Good question.
So, thinking way out there,
we have a quantum computing group,
which is I get quantum
physics and quantum computing,
really are two different things,
but we are definitely,
we have physicists working in that group.
But I think the long term
value of a physicist,
as you probably are aware,
is the problem solving
skills that you develop,
as a physicist right.
So all those long nights and weekends,
where you're sitting there
solving those physics problems,
believe it or not,
that ability to solve problems,
is not taught in a lot of other fields.
And so,
that's the value you bring to a business.
So you can start interjecting,
sort of your physics knowledge,
and your physics skill,
in a place where people
didn't even realize,
that that was a necessary
component of the problem,
they were trying to solve.
So I wouldn't say that there's anything,
specifically we would call physics,
at Booz Allen Hamilton.
I mean if you were to
call something physics,
maybe you're thinking about a large,
particle experiment.
And that's not what we do.
We don't do like physics research.
We don't do that kind of research.
But we're definitely using,
those skills and talents that you learn,
as a physicist.
Certainly the mathematical skills,
the problem solving skills,
how to pose a question.
And one of the tricks
in physics of course,
is you're given all this information,
and you have to decide
what pieces of information,
in the problem statement do
I need to get the solution.
And that's really what
data science is about.
What pieces of data do I
need to get this solution.
So you're already thinking and,
your mind is already being,
directed into the kinds of thinking,
that makes a successful data scientist.
That's not to say other
people can't do that.
I'm just saying that,
there's a lot of value in
having the physics degree,
to be able to just step right in,
and do the things that are being asked,
of the data scientist.
- Over here.
- [Sajid] So, we have an octave program,
problem in blockchain.
There are a couple of direction
and a couple of departments.
So what is the relation
between blockchain and data science,
how do they relate, or what's your
opinion about blockchain?
- Your name please.
- [Sajid] Sajid Hermanesto.
- Okay so blockchain is,
going through,
its crazy hype cycle right now.
And so people will claim,
it's gonna solve all the
problems in the world,
like AI last year and
big data two years ago.
But I think in terms of,
any kind of secure transaction.
Okay so blockchain has so many different,
whether it's in healthcare,
for example sharing
electronic health records.
Of course it sort of has
its origins primarily,
in financial transactions.
So again having a secure,
record,
that is only accessible by those,
who are permitted to see it.
And again it's unalterable hopefully,
immutable hopefully.
So I think the applications will start,
we'll see more and more applications,
as more people accept
the fact that this is,
a really good way to do it.
Plus it's distributed,
which I think in the modern world,
we're seeing more and
more decentralization,
of different activities.
So I've seen,
examples of these smart contracts,
in terms of sharing data,
like organizations will share data.
And it's part of the contract of,
the NDA, and who can see, who can use it.
So even just in data management itself,
it becomes a tool that can be used.
For example in,
a lot of medical research.
There's all kinds of PII,
and certainly in financial data there's,
personally identifiable information.
So if you can build your
model without actually,
exposing the data to,
a human,
but only to the model,
then that becomes,
yet another application of blockchain.
That is, the data can be fed to the model,
without that human being actually,
seeing the data.
So,
the model building takes place in that,
place where the data are not exposed.
So anyways, I think there's,
a lot to come from blockchain.
I'm not a blockchain expert,
but I certainly pay
attention to the field.
And one of my sort of,
first thoughts about it,
is the data security aspect,
and data privacy aspect.
- Maybe this way.
- [Angelica] Hello my
name's Angelica Wilts,
and I'm a first year data
science grad student here at GW.
And my question for you
is more so on the side of,
ethics and integration in terms
of the data science field,
in this region.
So,
my question,
is more so just on thoughts
of what do you believe,
can be done within this region,
to help the data science field,
progress more in the diversity
and inclusion aspect?
I see that Booz Allen Hamilton,
has done a very very great job with,
opening up their programs for people,
from different races and genders,
and orientations and backgrounds,
but there still needs
to be more work done,
within the field itself,
not just within one particular,
organization.
So what are your thoughts on how,
we as a field can
progress in terms of that?
- Well I think there is a lot
of advancement in this area,
which I'm really,
proud of.
My company does, but I
see other companies too.
But one of the ways,
that I think this,
happens is there's,
a lot of meetups in the DC area,
where people can go and
give some presentations,
and hear other people.
And I think the more we see,
different representations
at those meetups,
the speakers as well as the audience.
I mean it will start,
breaking down whatever barriers,
people may have thinking that,
certain persons can't do this stuff.
I mean there's this guy,
who's like the spokesperson
for cloud computing,
for Microsoft.
He's 12 years old.
I don't know if you've seen this guy,
on YouTube.
But he's, so.
He walks into the room,
no one's gonna take him seriously right.
So there's an age
discrimination going on there.
Like who is this guy,
to tell me how to build
my cloud computing.
So, the way we sort of
break down those thoughts,
is communicate more with people,
spend more time with people to see that,
they have,
a valid viewpoint, a valid perspective,
and they have the skills.
So let's--
and it doesn't happen
in that vacuum again.
I think these meetups,
provide an opportunity.
There's like literally dozens
of them in the DC area.
And so, if you're involved
with one of those, great.
If you invite yourself to be able to talk.
I know those places are always
trying to find speakers.
And again that sort of
builds that inclusivity,
and diversity just by just,
stepping out and just,
making it happen.
And so some of that is really,
we have to will ourselves to do it.
Because if we wait for
someone else to do it,
sometimes you know it
isn't gonna happen right.
- Okay.
- [Shadeeh] Hi.
I'm Shadeeh Bose in statistics
in Columbian College.
And my question, I'll
give a bit of background.
My question is how would you,
distinguish a data
scientist from someone who,
just picks up an arbitrary problem,
writes the code in Python,
and solves it.
The background is as follows.
So,
in the general education
requirement at Columbian College,
of arts and sciences,
there's something called G-PAC.
It stands for perspective,
analysis, and communication.
And several years ago,
I was on the,
quantitative reasoning sub committee,
which is part of analysis.
And we faced a problem
defining what's meant by,
quantitative reasoning class.
And our core definition was to say,
of course it covers,
takes a real world problem,
converts it to mathematical, statistical,
or whatever form,
solves it using tools, let's
say math or stat or geometry,
or whatever,
and then converts the solution,
back into the real world problem.
So that was to show why a math
and stat class would count,
versus a course,
somebody taught a course
in human sexuality,
and they showed a chart or a
graph and claimed that was a,
quantitative reasoning course.
So you're gonna have some,
so in data science,
what do you constitute
a real data science,
a data science or data science person,
as opposed to someone,
who dashes out a piece of code,
and says well I'm a data scientist,
because I programmed this.
- Good question.
You probably are aware
that almost everyone,
on LinkedIn is now a data scientist right.
So everyone likes to claim
they are a data scientist,
and they may be the world's
best Java programmer.
And I think you raise a
really good point and that is,
we need to sort of clean up our act,
of what we call data scientist.
Because it's exactly what you said.
You have to be able to start with,
a description,
which is in the context
of maybe a customer,
a client or a real world scenario,
then convert that,
into the,
essentially the mathematical,
piece that you're an expert in,
and maybe the coding and the modeling,
that you're an expert in.
And then at the end of all this,
convert that back to a message,
like what we call data storytelling.
You got to be able to communicate it,
back to an audience.
And so the,
going back to what I said earlier,
the two most important
things in data science,
are the data and science.
It's not the coding okay.
The coding is sort of what you do to,
as part of the scientific process,
of manipulating and exploring
and doing discovery,
and inference from data.
So if a person's not doing
the scientific steps of,
taking,
a real world situation,
and here's some data from
this real world situation,
and infer a model from it,
design an experiment to test it,
and then be able to communicate
that back to someone,
then I'd say that they may
be a really excellent coder,
but they're not necessarily,
a data scientist.
And so,
I like to say there's,
sort of three types of
people who call themselves,
data scientists in the world.
There's people who like to,
who can talk the talk.
Okay so there's people,
they can sort of use all the buzzwords,
in a sentence,
maybe not even entirely correctly,
but even if they could
use them all correctly,
they're just using buzzwords.
So, they're talking the talk.
And then you got these people who can,
they can,
walk the talk.
That is they know how to
actually do the hard stuff.
So they actually can do it.
And so that's called walking the talk.
And what I like to tell people is,
in my youth I was that guy.
But I'm no longer that coding guy anymore.
So my role at Booz Allen is,
talking the walk.
That is to say,
I can explain the hard stuff,
to people.
So,
so I'm not gonna sit there
and write the Python.
I wouldn't trust myself around Python,
even though I did learn some Python,
I wouldn't trust myself around that.
But I can communicate that.
So what we're looking for
in our data scientists is,
people who have some
mixture of those things.
I mean again,
so some mixture,
so maybe,
one person won't have all those skills.
I think what's one thing we've learned,
in the last few years of
the data scientist role,
was there was this mythical unicorn,
which people talked about a few years ago,
the data scientist who
knew everything right.
They knew all these programming languages,
and knew databases,
and knew a domain,
were a domain expert, et cetera.
And the statistics,
they just knew everything.
And we realize now there
is no such mythical beast,
on the planet.
So the real truth is,
you need a team of people.
Okay so just like,
if you were to take,
for example,
just pick NFL football for example.
If you took all the top 11 quarterbacks,
in the NFL and you put
them on a single team,
that team would lose every game.
Because it's not about,
having 11 quarterbacks.
It's about having the diversity of skills,
the diversity of talent.
And so,
but if you're gonna be the data scientist,
in an organization,
and again, you've got to have at least,
parts of those three phases
where you can talk about it,
you can sort of see the problem,
but then you can actually
work on the problem,
and then also you can convert it back,
and explain it to someone.
- Great.
- We'll have time for one last question.
Okay go ahead.
(laughing)
- Catch me later.
(laughing)
- [Audience Member] Thanks
very much Dr. Borne.
I thought that was a great presentation.
I have a question for you.
We've spent a lot of time
thinking about education,
at the undergraduate
and graduate level but,
one of the things that
I'm curious about is,
what we need to do at
the K through 12 level.
I have two daughters.
One who's now in college and she's,
taking science training.
And my other daughter
who's taken one course.
But it seems like the training
at the K through 12 level,
is really disjointed.
And,
so one question is should
there be a requirement,
at the K through 12 level that,
kids have to graduate with a certain,
not just math,
but a certain level of data
science or programming,
or software skills?
And if so, how do you see that evolving?
- Wow.
(laughing)
The thing I feel most
passionate about in life,
you just asked me about.
How many hours you got?
(laughing)
I mean this is one of the things,
from the very beginning,
why I got into this field.
Can I share a story?
- Sure.
- Okay.
Because I definitely
believe this should be done.
And let me just share a part of,
that part of the story
that I stopped abruptly,
at the very beginning,
that 1997.
Okay so we had that big data set come in.
A friend of mine told
me about data mining,
I was trying to learn
about machine learning.
I couldn't quite get,
what it was all about.
It seemed like it was just more math.
And okay I had a lot of math,
so why is it,
okay why is this really,
beneficial to me?
It's just more math right.
And so I was really questioning what,
benefit this was.
So right around 1997, 1998,
this was happening 20 years ago.
A little over now,
21 years ago.
And,
I was working at NASA
Goddard Space Flight Center.
And at the center there,
there's literally like one or
two seminars every single day,
on something.
On engineering, on science
and software development,
whatever, I mean.
There's something every,
there's 25,000 people that work there,
so there's, and they're all
scientists and engineers,
so there's two or three seminars a day.
And one day across my email,
a seminar announcement comes,
that a machine learning expert,
from IBM Watson Research Lab,
was gonna give a talk,
a lunchtime talk on machine learning.
I said oh here's my chance to
go find out from an expert,
what is this machine learning stuff.
And so this was probably in early 1998,
so 21 years ago.
I'll never forget to this day, this talk.
I went to this talk and this lady she,
the first half hour of her talk,
she filled the board,
it was a black board, not a
white board in those days.
Filled the board,
with equations.
And,
almost like an opaque lecture.
Didn't know,
exactly what was the point.
So I was thinking,
so,
in hindsight,
I would say to myself nowadays,
that this was the most
expert professional speaker,
I had ever seen and I learned
a lot from her presentation,
because what she did was,
she led us down this path,
thinking this was like the most un-useful,
irrelevant,
opaque field imaginable.
Halfway through this lunchtime talk,
she stopped abruptly,
in the middle of all this
mathematics and said,
I will now tell you about
our summer intern program.
And I don't know about the
other people in the room,
but it was like,
you could hear a pin drop in the room.
Like what?
I mean, surrounded by all this,
she's now gonna tell us about
her summer intern program.
Again so she,
I swear to God, she was
playing with us right.
So she said yes,
we teach data mining,
to inner-city high school
kids in New York City.
And again it's like this silence right.
No way, I mean it's like,
we're looking at this stuff,
no way.
And she paused.
The pregnant pause, they call it,
intentionally I'm sure,
just for us to let
that sink in just how,
unbelievable that statement was she made.
She said yeah we actually
teach this stuff,
in the context of what
matters most in their life.
She pauses again,
I'm thinking.
Now as I mentioned earlier,
I was an Air Force brat.
We grew up in small little remote towns.
I have no idea what
people in the inner city,
care most about.
I'm sorry,
I just never had that experience.
I don't know.
So she volunteered it for us.
She said they care most
about street basketball.
Okay, street basketball.
They live, street basketball.
Before school, during
school, after school.
So we teach data mining in
the context of basketball.
Because at that time,
IBM created this software
product called Scout,
which then became Advanced Scout,
which they sold to all the
NBA defensive coordinators,
to predict next play in a basketball game,
given the players on the court,
the time left on the clock,
the scoring differential,
if it's a fast break or three on one,
or whatever.
They could predict what the
play would be in advance.
So they basically mined
the play by play history,
of all these previous NBA games,
and created a machine learning model,
for the coaches of these teams.
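(The idea of mining play-by-play history to predict the next play can be illustrated with a simple frequency table. This is a toy sketch under stated assumptions, not a description of how Advanced Scout actually worked: the situation features, the play labels, the records, and the function names `fit_play_model` and `predict_play` are all invented for the example.)

```python
from collections import Counter, defaultdict


def fit_play_model(history):
    """Map each game situation to a tally of the plays that followed it."""
    model = defaultdict(Counter)
    for situation, play in history:
        model[situation][play] += 1
    return model


def predict_play(model, situation):
    """Return the most frequent play for a situation, or None if unseen."""
    plays = model.get(situation)
    if not plays:
        return None
    return plays.most_common(1)[0][0]


# Made-up play-by-play records: ((score situation, clock bucket), play).
history = [
    (("trailing", "under_1_min"), "three_point_attempt"),
    (("trailing", "under_1_min"), "three_point_attempt"),
    (("trailing", "under_1_min"), "drive_to_basket"),
    (("leading", "under_1_min"), "run_clock"),
]

model = fit_play_model(history)
print(predict_play(model, ("trailing", "under_1_min")))
```

(A real system would use far richer features, such as the players on the court, and a proper learning algorithm. The frequency count only shows the shape of the problem: a situation goes in, and the most likely next play comes out.)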
And one of the teams actually
won the National Championship,
with, I mean the NBA championship
which is another story,
I could tell you about
where I actually heard,
Al Michaels in the NBA,
the sports casters
talking about data mining,
on National TV.
It blew my mind.
Anyway so she was telling us this story,
and she said that once they
saw the relevance of this,
crazy mathematical stuff,
they decided they wanted to learn math,
because they wanted to
learn the science and math,
because it was relevant to
something that they cared about.
And so to me I was starting to,
now starting to get a feel for it.
I said yeah, this can be a hook to get,
people,
again pre-college,
in this case high school students,
into the math and the science field.
Cause you're showing the relevance of it,
to something they care about,
and then they want to go after it.
They want to learn.
So if she had stopped right there,
I think I would have,
walked away a happy person,
that I finally got it.
I finally understood the value of this.
But what she said next,
truly did change my life.
And I'll never forget it.
The next thing she said was,
she said, what matters to us is,
our success metric.
And our success metric,
is how many of these students
actually graduate high school.
She said that these students,
come from high schools
in inner city New York,
where less than 50% graduate.
I mean more than 50% of the students,
drop out before graduation.
So that's the population
they're drawing from.
She said, but the
students that go through
their intern program,
97% of them graduate high school.
- Wow.
- Thank you.
He said wow.
That was 21 years ago.
And I remember,
exactly what my thoughts
were in that moment.
I mean I will never forget,
exactly my thoughts when she said that.
I said to myself,
I said if this stuff has that much power,
to change people's lives,
I have got to do this
for the rest of my life.
And that's when I made that,
flip in my career path.
And I believe if we show the
relevance of these things,
we can teach statistics and coding,
and data science, if you want to call it that,
as age-appropriate material,
throughout the K-12 curriculum.
So you're probably
familiar with the concept,
of scaffolding, right?
So you build the foundation,
then you build on that, you build on that.
Okay so children are already
good at classifying things.
They classify things by color,
by shape.
They even classify their toys by function.
So you can play ball
with something spherical.
You can build blocks with
something rectangular.
Okay, you can build a
castle with one thing.
You play soccer with the other.
So they already have these
concepts of sorting things,
clustering things, classifying things.
You don't call it those words,
but you sort of teach the concept,
so then you build it at
the age-appropriate level,
the right math.
Some coding, again starting
with some very simple things,
like Scratch,
and these other kind of things.
And maybe some BASIC eventually.
And then once you get to high school,
then probably you're really
into hardcore statistics,
and Python or R programming.
But I absolutely believe,
this data literacy, computational
literacy, has to start very young,
and it has to be throughout
the curriculum.
And I believe that once
people see the value of this,
we're gonna,
see a complete transformation
in the STEM pipeline.
People who have run away from it,
just like those students in my classes,
who said I hate math and science,
but I need this class to graduate.
That's the only reason I'm here.
Please change your entire
course for me, Dr. Borne.
No I'm not gonna do that, sorry.
(laughing)
I said I'll teach you the
math you need to know.
I didn't say I'll teach you
the calculus you need to know,
even though I taught them calculus,
and they had the surprise moment,
when I told them that's what they learned.
And so,
I think absolutely yes to your question.
Again, I could talk hours about this.
I've for 21 years been
talking about this topic.
I feel very strongly about it.
And I think it's not only K through 12.
I think there's all kinds
of other populations,
and underserved populations,
returning vets,
people in drug abuse programs,
who need to see the value in their life,
that they have a skill,
they can learn a skill,
to produce value.
My resolve in this was,
magnified significantly,
two years ago,
when my younger brother
died of a drug overdose.
A person who worked in a
factory his entire life.
He went blind.
Lost his job and felt he had no value,
to contribute to the world.
He was wayward,
he turned to drugs.
And one day I got that phone call,
and I said all he needed
was a purpose in his life.
All he needed was to see
that he could learn a skill,
that he could do something
that would contribute.
And I just feel like that,
we're, if we don't do this,
in a digital world,
in a digital universe,
that these children are moving into,
the jobs are gonna be moving into,
then we are doing a huge disservice,
to the future population.
And I'll get off my soapbox.
(laughing)
- On that note,
please join me in thanking Dr. Borne,
here this evening.
(clapping)
It's been a
privilege talking to you,
and thank you for sharing your thoughts,
and very inspiring stories.
Thank you very much.
- Well I appreciate that.
Thank you all.
- Please join us,
for a reception outside.
- Thank you.
Thank you for your question.
Thank you.
