[Hushed Whispers] 
I heard someone is going to get fired today
Is that?
Is that him?
A couple of those characters uh didn't
know how to wear a mask correctly
[refreshingly amazing intro]
Today we are talking about... that's too
loud
[faster] today we're talking about
data scientists getting fired apparently
it can happen we are
going to jump down the rabbit hole that
is Quora
uh Quora is a website for
people with sticky little questions um i
recently signed up for Quora so that i
can start
sharing my content and now Quora sends me
a daily email
uh one's about dnd and one's about drama
it started out benign it started out
like hey uh this data scientist is going
to you know
use this method or use this library
should they do it and i'm like that's
not interesting
and then Quora sends
[deep voice] Have you ever seen a data scientist get fired?
And then I said...
that's interesting that's the stuff i
want to see
today is kind of my
initiative to edit record and
not script this video all in one morning
so let's see if i can get this out by
today
if this goes out tomorrow
you'll know
you won't know
so have you
seen a data scientist get fired if yes
why i am actually about to get fired
after six months on the job
that's sad especially now updated april
8th
Oooooooo
that was the time not to be fired let me
explain
first you have to understand the
following this is the truth that no one
wants to hear
you must separate the hype from reality
this person is just
hyping me up i feel like i'm ready to
learn what this guy has to say
machine learning is not completely
usable to the level that is hyped up to
be in every single industry out there
the thinking today is you toss a bunch
of features into a model
and the model will predict with 95%
accuracy
here is the truth oh my god the truth
95% accuracy now that's pretty... the
problem with accuracy is that if you're
looking into outlier detection which is
what many models are looking for like
looking for something that shouldn't be
happening as in
a credit card transaction that isn't supposed
to be happening or they're looking for
maybe people doing weird things on your website
you know anomalous stuff
you are probably going to achieve 95 to
100% accuracy
just by guessing normal right because 95
to 100%
of the population is normal so 95%
accuracy is probably
correct depending on the situation so
that's why it's very
it's kind of a skewed uh understanding
of accuracy
for most of the non-machine learning
public
in industry but let's see most companies
and industries are not ready for ml
oh my god they're not ready
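Here is a rough sketch of the accuracy point in Python. The numbers are made up for illustration (1,000 transactions, 5% of them fraud) and the helper names are hypothetical. A "model" that always guesses normal gets high accuracy while catching none of the anomalies, which is why a second number like recall helps.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives (here: anomalies) that were caught."""
    predictions_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_on_positives:
        return 0.0
    return sum(p == positive for p in predictions_on_positives) / len(predictions_on_positives)


# Assumed toy data: 1 = anomalous transaction, 0 = normal transaction.
labels = [1] * 50 + [0] * 950

# A "model" that never flags anything.
always_normal = [0] * len(labels)

baseline_accuracy = accuracy(labels, always_normal)
baseline_recall = recall(labels, always_normal)

print(f"accuracy: {baseline_accuracy:.2f}, recall on anomalies: {baseline_recall:.2f}")
```

So a 95% accuracy claim only means something once you know how rare the thing being predicted is.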
ml is so scary they don't have the
right set of features have access to the
right set of features or even know
what features drive their market it
seems like this
this is gonna be some juicy tea i don't
think this person's gonna admit
a lot of faults that they did to get
fired i think they're just gonna say
the company wasn't ready for me ml
is seen as a magic bullet that can solve
anything which is simply not true
if your feature space is not correlated
in some way to the target
variable then no model can have any
meaningful predictive power
for example having a feature such as how
many presidents over 50 have won a
second term election will be useless to
predict who is going to win
in 2020. the reason i say this is
because those are the types of features
that are most available
related but not really driving the model
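A small sketch of that "related but not really driving" idea, using synthetic data only. Everything here is an assumption for illustration: the feature names are hypothetical, and Pearson correlation is just one simple, linear way to check whether a feature carries signal about the target.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Synthetic target, plus one feature generated independently of it
# and one feature that is the target plus some noise.
target = [rng.gauss(0, 1) for _ in range(n)]
unrelated_feature = [rng.gauss(0, 1) for _ in range(n)]
driving_feature = [t + rng.gauss(0, 0.5) for t in target]

r_unrelated = pearson(unrelated_feature, target)
r_driving = pearson(driving_feature, target)

print(f"unrelated feature: r = {r_unrelated:.3f}")
print(f"driving feature:   r = {r_driving:.3f}")
```

The unrelated feature should land near zero and the driving one much higher. A correlation near zero does not rule out a nonlinear relationship, so treat this as a first look rather than a verdict.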
okay so that's interesting because it's
a very specific gripe
that like the way that people think ml
works is not exactly how
you can actually feed data into a system
so usually how you
feed data that isn't uh numeric right
either
gender race uh ethnic background is that
you have to
either one-hot encode it which is you
have
a list of possibilities so let's say
gender right and then you have
zero or one right so zero or one for
gender but then
if it is not binary then you have to
kind of say
okay so it's zero... one-hot encoding is
for categorical data where you say hey
i'm gonna
hold the mustard right well if you hold
the mustard
you have to have all the other sorry
hard to explain
i'm happy to do a video on one-hot encoding
just put a comment right down below
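In the meantime, a minimal sketch of one-hot encoding in plain Python, borrowing the mustard example. The condiment column and the `one_hot` helper are hypothetical, not from any library. Each category gets its own column, and each row has a 1 in exactly one of them.

```python
def one_hot(values):
    """Turn a list of category labels into (categories, rows of 0/1 flags)."""
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # exactly one column is "hot" per row
        rows.append(row)
    return categories, rows


# Assumed toy column with more than two possible values.
condiments = ["mustard", "ketchup", "mayo", "mustard"]
categories, encoded = one_hot(condiments)

print(categories)
for label, row in zip(condiments, encoded):
    print(label, row)
```

With two possible values a single zero-or-one column is enough. With three or more, giving each value its own column avoids implying an order between them.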
most of the problems that companies want
to throw at ml are
usually not easy problems they usually
want what we call
pie in the sky problems like predicting
the stock market this is actually a fair
gripe because i feel like
a poor understanding of what machine
learning can do
will actually lead to a lot of mismatch
between
hires and the output that
they can produce
uh which is kind of funny because it's
as if people didn't know what dentists
could do
and then you have a dentist come into
your company and they say okay thank you
so much please
unclog the toilet now and then they'll
be like
what in my particular case
this is what is happening been at the
job for six months and i have been given
several ml projects but ultimately none
were solvable
this was because every single problem
that they want me to solve is a pie in
the sky problem
the first project reverse engineer a
famous tech company's algorithm and
outsmart their algorithm and essentially
game the system
this was doomed to fail from the
beginning now mind you this is a famous
company that brings in millions
plus millions plus millions plus
this is a famous company that brings in
millions plus in profit
using this algorithm the point was to
outsmart the algorithm with my
ml efforts so we don't have to pay this
company as much money
i'm supposed to find an ml loophole that
has predictive power or better yet this
fortune 500 company is so stupid
that anyone could crack their algorithm
yeah right ultimately this project was
unsuccessful with the predictive power
of a random guess
did you randomly guess or did you random
forest um
okay i'm actually not sure what this
what the question is and you do find
like you know smaller companies
asking for you know like an in now
undermining larger companies
is don't try to do that if you're a
smaller company
because that's
doing that kind of business is bad so
this company is already bad if this
person
allegedly is correct that i don't know
what this person is trying to do
i hope he doesn't think that his
understanding of ml is randomly guessing
that's not very confident of you as a
data scientist
the second project another problem like
the one above this time
trying to predict the future of the
company in one way or another this was
partially successful
i kind of want to go into that why don't
you want to talk about that more
the successful one wouldn't that help
your case of why
you shouldn't be fired the third project
predict
company profit at the site level you can
already see where this is going
this company has thousands of sites
across the country akin to a retail
chain
i'm supposed to come up with one model
that captures the behavior to predict
the profit for every site at the site
level
we are actually able to get a lot of
census
we are actually we are actually we
i can't read we are actually able to get
a lot of census statistics to make sense
of this
for example you'd think all you'd need
are features like population
jobs etc to see how this will drive the
market
again this is like trying to predict the
stock market because common census
statistics alone do not drive profit
we have every feature we could think of
but still a single model
with a set of features is unable to
capture the behavior of say
new york versus modesto convincingly
because
ultimately it's not about job growth or
the people that live there
but there are other hidden socioeconomic
variables
that we cannot capture or we don't have
access to or even know about
this model actually produced reasonable
predictive power but they want 90%
accuracy so the project is deemed a
failure
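One thing worth spelling out: "90% accuracy" is ambiguous when the target is a number like profit. One possible reading, and this is an assumption rather than anything the post states, is "predictions land within 10% of the real profit on average", which is mean absolute percentage error. A sketch with made-up site profits:

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual| across sites."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical profit per site and a model's predictions for the same sites.
actual_profit = [100.0, 200.0, 400.0]
predicted_profit = [110.0, 180.0, 400.0]

mape = mean_absolute_percentage_error(actual_profit, predicted_profit)

# Under the assumed reading, "90% accuracy" means an average error of at most 10%.
meets_target = mape <= 0.10

print(f"MAPE: {mape:.3f}, meets assumed 90% target: {meets_target}")
```

Whether a model passes depends entirely on which definition the company has in mind, so it is worth pinning that down before the project starts.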
so so this project is kind of tricky
um i actually had a similar project like
this because
companies that don't invest in
data aggregation and data scraping will
always say hey you know the census data
is out there that's free you know the
government is paying for our data
aggregation
uh that's not true the census data is
bad because like you said there's a
lot of
hidden variables but that is kind of the
data scientist's job
to find those hidden variables and try
to like correlate that with activity
um but yeah if you if your entire
predictive model is hinged on just
census data you're going to be missing a
lot of
huge pieces uh 90% accuracy that
still depends on what the labels are
right because if he was trying to
predict the profit at every site at the
site level
predict the profit 90%
accuracy for predicting a scalar
like just a just a value this is hard i
i feel like i feel like this company
wasn't ready for a data scientist so
maybe this is his gripe
is that they're trying to make me do all
these awesome projects
and i don't know how to do them now my
company is realizing that i haven't
really solved any problems yet i make a
high salary someone is going to say
well if he's not solving any problems
then why keep him around
so you see you could have all the ml
skills in the world but if you are given
these types of unsolvable problems
which is what they want ml to solve
because they think
ml is a magic bullet you are ultimately
deemed useless
the truth is that either the industry is
not ready for ml or the type of problems
being thrown at ml is not well thought
out if you could toss
a bunch of variables and predict the
stock market don't you think people
would have done that by now
yes people have done that people have
that's what
that's what like entire business models
like Robinhood
and and other like high frequency traders
uh
they are entirely built off of trying to
game the stock market just a little bit
um if you go to Robinhood there's
actually a little like buy
sell hold meter of of what they think
you should do with a specific stock and
like are they even allowed to do that
are they allowed to give you advice
i guess there's a disclaimer um but the
main point is
the stock market this is less a
mismatch of like
unsolvable problems than a mismatch of
data availability
you have almost no data they may they
are making you
you know aggregate all this data and you
didn't have either the industry
experience or the data or the product
landscape knowledge
of what exactly you needed to gather uh
so it's not exactly his fault but it is
not exactly the company's fault
it is not correct to say that this is
like trying to predict the stock
market because people are trying to do
that
right if they have enough data you see
yet that's the type that are thrown at
you because those are the types of
problems that bring company value
the reality is that you're probably not
going to get a neat python data set
what's a python data set what do you
mean by python data set
there's a lot of easy ways to
piece together that that certain people
aren't data scientists or at the very least
they didn't have a formal data science
education
and we can we can get into it in a video
you comment down below if that's what
you want
you're probably not going to get a neat
python data set type
problems where the results are great
instead what you are going to get is the
toughest problem the company has i.e. pie
in the sky
because those are the types of problems
if solved would be of most value to the
company
it's not it's not false if you are the
first data scientist
uh expect this because they don't know
what you can do
and they won't ask now it is an awkward
situation where
none of his models are useful maybe he's
too stupid or maybe ml just can't solve
any of the problems you see
they're acknowledging that it's not his
problem
um eventually it will be his problem when
he's fired but
it's uh it's not his problem right now
obviously the latter is more true
lol i like how he's um you know he's
keeping it casual
well i'm i'd either i'm too stupid
or ml's not going to solve the problem
so what i have proven is ml can't solve
all these problems convincingly
instead of actually solving the
company's problems so before i get fired
i need to find a job with a company in
industry that is actually using ml in
the right direction to solve meaningful
and well thought out problems
this is something everybody should do so
that your resume is not full of failed
projects
why would you put failed projects into
your resume uh that's kind of
interesting
a project never truly fails if you're
doing a statistical project
your project might just not be
significant or they didn't have enough
power
or maybe not enough accuracy but
at the same time no project is a failure
especially since you are tackling these
humongous projects in a smaller company
i think that there is a lot of value
in building out uh infrastructure for a
company that doesn't have
any data pipelines that doesn't have any
previously
statistically significant results but to
be fair that they didn't give you a lot
of data who
who would want to hire a data scientist
with a bunch of failed projects
depends on how depends on how you failed
fail with
style my final point be careful where
you take a job
make sure you research the company and
make sure the industry is ready for ml
fair enough he makes good points uh and
he
he still doesn't seem to take any of the
blame
but who wants to take blame on the
internet um but he but it's pretty
level-headed it seems like he
maybe didn't understand exactly what
what the company was
was pitching him for um and that's fair
if you're going to be the first data
scientist in the company just expect
that they won't know
what you can do so um you're either
going to be a huge disappointment
or a huge uh happy surprise
intermediate update i just passed the
six month mark and i still have not been
fired
oh the funny story is i believe
i was almost fired one friday everybody
started giving me that look as if
somebody just shot their dog
the it guy gave me this look like he
knew something about me but he couldn't
tell me
i suspect he was told to terminate my
access by the end of the day
that feeling that feeling must not be
good like if you go walk around and
people are like
oh my god get fired oh my god is that
the hell
is that is that him a couple of those
characters uh didn't know how to wear a
mask correctly
and at about 3 p.m. that day there was a
sudden turn of events and a huge project
came my way by accident that the ceo
mandated obviously there was nobody else
that could do it but me
because i was the resident ml guy again he
leaves out some details why
how big is this company now after six
months i believe
i've shown what type of projects can't
be done and what can
so some of the projects coming my way
have been more sensible
recently i put a model into production
by the ceo mandate as long as the model
has to be retrained and maintained i may
have job security lol
hopefully this trend can continue thanks
he thanked us
thank you final update i have officially
been let go
now with the coronavirus nobody's hiring
help
what a roller coaster
i want to help this person well if you
know who this person is or if this
person is you
um get in touch
write a comment down below about whether
or not you like this format uh
it is new but i think it's funny and
spicy
and i think if you enjoyed this content
leave a like
leave a comment have an awesome day
subscribe
peace
[Killer Outro]
