Long time no bark, humans!
My name is Professor Meatball
PhDog
and I am talking about how to spot a
fake data scientist today
I put a surprise at the end of the video 
so I'll see you there
Awooooooo~...!
[Snazzy hecking intro]
How's it going? Welcome back to DataLeap.
We are bringing data science down to earth.
My name is Andrew, and we are going to talk about our favorite meme lord, salt master, king of the drama, data scientist John Singer.
Electrical engineer, data scientist. Man after my own heart.
[Flashback] I am actually about to get fired after six months on the job.
That's sad.
I just passed the six-month mark and I still have not been fired.
Oooooooh.
Final update: I have officially been let go. Now, with the coronavirus, nobody's hiring. HELP!
What a roller coaster.
He answers a question, and the question today is: Can a data scientist fake it until they make it?
He gives his personal take on whether or not a data scientist can fake it.
[John] Unfortunately, I think you can, but only in an unsuspecting organization that doesn't know any better. A fake data scientist is what we call a black box button pusher (BBBP).
[Andrew] Okay, it's a lot of words.
[John] That is, they can instantiate pre-made library functions but have no clue what goes on under the hood. In fact, I have a fake data scientist on my team who was hired before I came, and I knew she was fake from day one.
[Andrew] This man is the master of salt. This man is very good at creating this kind of drama in his own team. His full name is here, unless that's not his real name. But if it is his real name, the fact that he was able to talk about his own team like this...
[John] She can answer all the basic questions, like supervised learning versus unsupervised learning. That'll fool most people, but she didn't fool me.
[Andrew] John Singer, master of data.
[John] Fake data scientists are a big issue these days. If you've heard of a company called DataRobot, they claim one of the big reasons they came into existence is because there were too many fake data scientists hidden in an organization.
[Andrew] This is true, but this also strikes me more as a marketing point for their business.
[John] I do not work for DataRobot...
[Andrew] That's a perfect ploy for someone who does.
[John] ...and I am not endorsing their product. I just like their philosophy on fake data scientists. Besides, I hear they have a huge toxic culture with a high turnover. But then again, that sounds like every other tech company on Glassdoor. LOL.
[Andrew] Our boy John, back with the lols. He not only called out his team member, someone who can verifiably be traced to his company, he also calls out this whole other company and says they're toxic, and then he calls out every single tech company on Glassdoor. To be a man with so few flips to give. Oh, see the field in which I sow my flips, and see how barren it is. You gotta love John. Wow, he makes good content. He should be a YouTuber. I want to subscribe to him. By the way, subscribe and like if you like this, if you want to see more John. This could just be a John channel.
[John] Companies don't know how to hire real data scientists. Consequently, they hire some good ones and some fake ones.
[Andrew] This is true in the sense that companies who hire data scientists, and I'm going to cover this in an entirely separate video, are often looking for someone who is either a machine learning engineer or a data analyst. Data scientists perform both of these tasks, but they're not necessarily someone who lends their skills to creating this task. It's been prevalent through the entire industry. Before data scientists, they were, you know, business intelligence consultants. This conflates two positions that have almost nothing to do with each other.
A data analyst munches through data, picks and chooses what they want to see, and then maybe puts it through Excel and presents it on a PowerPoint. They need to have the interpersonal skills to make sense to every other data stakeholder. So the designer, the product manager, the salesperson, they all have to understand what the data analyst is saying. Therefore there's a lot of interpersonal communication and cross-functional team support.
But a data scientist works with data in a more theoretical and model-based sense. They'll pump data through a very high-functioning algorithm, or, at the higher end of the spectrum, they'll be working on machine learning models and artificial intelligence. This is a blanket term for the kinds of deep learning models that exist in industry. So when Amazon recommends a hit movie, or Netflix knows exactly what show you'll be dying to see when it comes out, that's because a data scientist, a machine learning engineer, was able to plug that in on the back end. That person doesn't necessarily have to make pretty graphs. That's an entirely different job.
It's like combining a baker and a bricklayer. A bricklayer can build an oven, then the baker uses the oven, so I guess they're kind of related. But you see that, depending on whether or not you have the infrastructure already in place, you truly only need one or the other. Let's keep going.
[John] The norm, or trending norm, today is to have data scientists in every department, such as marketing, HR, et cetera. So what ends up happening is there are many fake data scientists hidden in each department.
[Andrew] Sure. John's coming to the point that data scientists work in silos. In a FAANG company, each data scientist actually might not work with the others on sharing different resources, on collaborating on specific projects, on recreating code or removing the need for superfluous code. They'll actually work depending on what team they're on. So an HR data scientist might actually never talk to a marketing data scientist, depending on how big the team is, depending on how big the company is. So that's problematic because, one, you're wasting a lot of potential, because models and product sense flow very well throughout a company, right?
So a data scientist is always working with half of the picture. They're not working with the whole picture. The other problem is that, with this divide of information, John is saying that suddenly you don't have the ability to understand how many fake data scientists there are, right? If there was a big head count every single year, and we said, okay, these data scientists measure up here and these data scientists measure up here, that would be easier. But it's harder, since there are so many different siloed data scientist roles.
[John] Fake data scientists build bad models, which is actually worse than having no data scientists, because they build flawed models that do more harm than good. After all, when a data scientist builds a model, most non-data-science staff do not question the validity of the model. Hardly anybody pauses to ask: Is this person doing the right thing? Is he qualified to do this? What qualifies him?
[Andrew] I have to say that the problem with this argument is that when you go to a dentist or a doctor and say, show me what qualifies you, they can show you the piece of paper, right? They can show you that they had the degree, they had the residency. Experience compounds and layers on expertise and professionalism, so it's not easy to see out in the open. I would never go up to someone on a team and ask them what qualifies them to make designs, right? What qualifies you to make a scalable infrastructure as an engineer? What qualifies you to sell as a salesperson?
But at the same time, he is right that nobody asks whether or not the model is correct. This is integral, right? You need to test, test, test, test. Engineers don't like to do it. Data scientists don't like to do it. But if you don't have the tests to back up your model, that will do more harm than good, and that's true for both fake data scientists and real data scientists. I suppose what he's trying to say here is it's not like you can catch a fake data scientist with a model, especially since human explainability goes out the window for modern machine learning. And the more abstracted a model is from human understanding, the easier it is for a fake data scientist to get away with it.
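One way to picture the "test, test, test" point is a held-out check against a trivial baseline. The sketch below is only an illustration, not anything shown in the video: the synthetic data, the straight-line model, and the helper names (`fit_line`, `evaluate`) are all assumptions made for the example.

```python
import random
from statistics import mean

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar

def evaluate(xs, ys, train_fraction=0.7, seed=0):
    """Fit on a training split, then score model and baseline on unseen data."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_fraction)
    train, test = idx[:cut], idx[cut:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    baseline = mean(ys[i] for i in train)  # "always predict the training mean"
    y_test = [ys[i] for i in test]
    model_mae = mean_absolute_error(y_test, [a * xs[i] + b for i in test])
    baseline_mae = mean_absolute_error(y_test, [baseline] * len(test))
    return model_mae, baseline_mae

# Hypothetical data: a noisy linear relationship.
rng = random.Random(42)
xs = [float(i) for i in range(100)]
ys = [3.0 * x + 5.0 + rng.gauss(0, 2.0) for x in xs]

model_mae, baseline_mae = evaluate(xs, ys)
print(f"model MAE: {model_mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

If a model cannot beat the baseline on data it has never seen, that is a signal worth raising no matter who built it.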
[John] Finally, DataRobot uplift themselves...
[Andrew] Caught a typo.
[John] Finally, DataRobot uplifts themselves by saying that the answer to this issue is one of their products, which of course is AutoML. AutoML is huge today, to where data scientists might soon become less desired, because you have an online tool that does all the math, science, and model picking for you.
[Andrew] John, come on, man, what are you doing, giving out all the secrets? Such as how much a data scientist makes, over here on the top left corner.
[John] It knows what algorithm to pick and tune. All you need to do is the feature engineering part and feed in the right data.
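To make "it knows what algorithm to pick and tune" concrete, here is a toy sketch of the general idea: try several candidate models and settings, score each on validation data, and keep the best. It is not how DataRobot or any real AutoML product works; the candidates, the search space, and every name in it (`SEARCH_SPACE`, `auto_select`) are assumptions made for the example.

```python
import random
from statistics import mean

def make_mean_model(train, _param):
    """Baseline candidate: always predict the training average."""
    avg = mean(y for _, y in train)
    return lambda x: avg

def make_knn_model(train, k):
    """k-nearest-neighbors candidate: average the k closest training targets."""
    def predict(x):
        nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

# Hypothetical search space: model name -> (factory, settings to try).
SEARCH_SPACE = {
    "mean": (make_mean_model, [None]),
    "knn": (make_knn_model, [1, 3, 5, 9]),
}

def mae(model, data):
    """Mean absolute error of a model over (x, y) pairs."""
    return mean(abs(model(x) - y) for x, y in data)

def auto_select(train, valid):
    """Try every candidate and setting, keep the lowest validation error."""
    best = None
    for name, (factory, params) in SEARCH_SPACE.items():
        for p in params:
            score = mae(factory(train, p), valid)
            if best is None or score < best[0]:
                best = (score, name, p)
    return best

# The human still supplies the data and the features (here, a single x).
rng = random.Random(0)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.5)) for x in range(200)]
rng.shuffle(data)
train, valid = data[:150], data[150:]

score, name, param = auto_select(train, valid)
print(f"picked {name} with setting {param}, validation MAE {score:.3f}")
```

The loop automates the picking and tuning, but deciding what goes into `data` is still left to a person, which is the part John says remains.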
This really is the future: AutoML, Cloud ML, etc. But we are still not there yet, because most companies need a data scientist to take care of at least some of these things in-house.
[Andrew] This reminds me of that Simpsons quote where there's this general, and he's talking to a bunch of recruits, and he says: In the future, all wars are going to be fought with tiny robots, but there will be people to maintain and clean those robots, and those people will be you.
This is true. AutoML will reduce the necessity for data science, but like I cover in a couple of my videos, data science still has a big mismatch between demand and supply, in the good way. Data scientists are in heavy demand, and until ML becomes a lot more reliable and a lot more human-explainable, it's not going to catch on as quickly as John is foretelling it will. But once it does catch on, it actually still necessitates a good data scientist. So this is just a positive feedback loop. You want to be as good of a data scientist as possible.
Understand your statistics, understand your calculus, understand your linear algebra, and then when they need someone who can actually oversee all of these AutoML automations, you're that guy. That's you. You did it.
[John] On that last point, you may not even need a real data scientist in the future. A data analyst or a fake data scientist could just upload data to a cloud platform and run a few tests and get results. Remember that the AutoML tools will pick the model, optimize, and give you the results. Heck, the date... the technical manager could do that.
[Andrew] I would love to know how to read.
[John] Remember that the AutoML tools will pick the model, optimize, and give you the results. Heck, a technical manager could do that at that point, and even a marketing guy.
[Andrew] I worked in a company that tried to abstract natural language models to be as easy to work with as possible for a non-technical consumer.
I have to say that a marketing person can do it, yes, but a marketing person will ultimately become a consultant, right? A solutions consultant, as the industry would like to say. That person ultimately becomes a hand-holder, and therefore they cannot scale, right? They're not the person that actually ends up controlling all the automation. They're actually just the person that sells and provides support for consumers that use this product. All until that gets automated too, but we're not there yet.
[John] To be fair, there are about 50 companies that have such AutoML tools.
[Andrew] Only 50. John can name them all. Can you? To be fair, John is very fair. I would like John Singer for president.
[John] But as of today, let's assume you still need a data scientist in-house to run complex models. How do you know if the person you are hiring is good enough? Here is my definitive list.
[Andrew] This should be its own video. I feel like I've had so much fun with John today. This should be his own video. John is such a... wow, there's so much more John.
Are you still there?
To hear more options...
What are you talking about?
You...
