Sanyam Bhutani: Hey, this is
Sanyam Bhutani and you're
listening to "Chai Time Data
Science", a podcast for data
science enthusiasts, where I
interview practitioners,
researchers, and Kagglers about
their journey, experience, and
talk all things about data
science.
Hello, and welcome to another
episode of the "Chai Time Data
Science" show. In this episode,
I interview a data scientist,
but from the other side of the
world, a data scientist at
Kaggle, Walter Reade, also known
as Inversion famously on Kaggle.
Walter, I believe was the first
Kaggle discussions Grand Master.
And in this interview, we talk
all about his journey into data
science and on Kaggle: how his
journey into data science
started via Kaggle, and his
journey now on the other side
of Kaggle, working as a
data scientist for the team.
Walter holds a PhD in chemical
engineering. This interview also
talks a lot about the science in
data science, how Walter has
approached problems over the
years, and also what data
science looks like on the other
side of Kaggle. I feel this is a very
special interview in the sense
that this time it's a Kaggle
Grand Master that is a part of
Kaggle's team. So I am really
excited to be sharing this
interview. Without further ado,
here's my interview with Dr.
Walter Reade. Please enjoy the
show. And for the audience, as a
quick reminder, please remember
to subscribe to my newsletter. You
can find the link in the
description of this podcast if
you'd like to stay updated with
the interview releases, and if
you're a non-native English
speaker, please remember to go
to YouTube and enable the
subtitles for a better watching
experience. For now, here's the
conversation. Please enjoy the
show.
Hi, everyone. It's a privilege
for me to be talking to the
first discussions Grand Master
ever, Grand Master Inversion. Now
he's on the other side of
Kaggle, and I'm on the call with
Dr. Walter Reade. Thank you so
much, Walter, for joining me on
the podcast.
Inversion: Yeah, absolutely. I'm
happy to be here and answer your
questions.
Sanyam Bhutani: Likewise, I'm
really excited. But before we
talk about Kaggle and your work
before joining Kaggle, I want to
talk about your background,
because you did a PhD in
chemical engineering and you
followed a pretty traditional
path in research. Could you tell
us what made you pick chemistry
as a career path and take up
research in that domain?
Inversion: Yeah, yeah. So I like
telling the story, not because
it's particularly interesting,
but it just kind of highlights
the fact that in today's world,
you really can reinvent
yourself. And so yeah, I did
a master's and a PhD in chemical
engineering. My master's was
basically in understanding
reaction rates of burning carbon
particles. It was experimental
as well as modeling. I enjoyed
that. And then my PhD
was 100% computationally
based, and it was understanding
the mechanisms of coagulating
particles in turbulent fluid.
Sanyam Bhutani: Okay.
Inversion: So, the
traditional path. This part
almost sounds unreal,
it's kind of ridiculous.
I was in high school a long
time ago, the mid '80s. And just
to give
you an idea for how things were
different, my first computer had
two kilobytes of RAM. And I
upgraded it to 16 kilobytes of
RAM, which was kind of a big
deal. And I had taken a Pascal
class in high school, so I really
liked programming. And I wanted
to go
into computers. But when I was
getting close to graduating,
literally everyone I knew was
going into computers. And I,
being the genius high school
student that I was, thought,
well, the market will be
completely saturated and there
won't be any jobs by the time I
graduate. It's probably the same
thought process that data
scientists who are going into
school have now: well,
all the jobs will be gone. So I
ended up studying chemical
engineering, for literally the
only reason that I thought it
sounded really cool. Chemical
engineering, whoo! I had no idea
what it was.
Sanyam Bhutani: You wanted to
rebel against the prevalent
sentiment?
Inversion: Exactly. So, you
know, it was a lot of fun. I
enjoyed it, it was challenging.
Obviously, I went on to get
three degrees. And it gave
me a fairly good career path. So
that's kind of my
background. And the idea is that
oftentimes your path is not as
well defined as you think it's
going to be. That's kind of life.
You just go with what life hands
you and you adjust as needed.
Sanyam Bhutani: Yeah, now when
did machine learning start to
come into the picture for you?
And what even attracted you
to machine learning?
Inversion: Yeah, so this is
another thing I think back on. In
the early 2000s, 2001, 2002, the
company I worked for had
distance learning classes that
you could take. This was
before Coursera. The distance
learning classes were taught by
university professors, and they
would record their
lectures on DVD and they would
mail you the DVDs. You'd watch
the DVDs at home, you would do
the homework, you would email
them the homework, they would
grade it. So it was like
Coursera before Coursera,
you know, but very, very clunky. And
I took this class called
computational intelligence, and
it was what was really hot back
then: it was fuzzy logic,
it was genetic
algorithms, and it was neural
networks. And I really enjoyed
the course. And in particular the
final project was, we had these
little 80 by 120 black and
white images of soda cans, pop
cans, and we had to classify
which: was it a Dr. Pepper, was it
a Pepsi? And I was blown away
that a computer could learn to
recognize these images. Now,
what was interesting about this
is, you know, we actually didn't
write neural networks for this
final project, the professor
gave us his precompiled Windows
binary that we ran on our single
core machines. And it was still
extremely hard to have it,
like, it was always
diverging. So just getting it to
actually train and predict was
the challenge. Now, after
that, I spent years and years
looking for opportunities to
apply these tools, but there was
such a gap between what they
were able to do and any real
world problems that I kind of just
gave up. I was like, this really
doesn't solve any real world
problems, which it really
didn't back then. But it
planted the seed that, you know,
this is kind of some cool stuff
that's going on.
Sanyam Bhutani: So it seems like
you were always ahead of the
curve, even in school. Now
Kaggle didn't exist in 2001.
When did you discover
Kaggle? And I know you
became addicted to
it. How did that journey start?
Inversion: So it was in 2012.
And I stumbled upon a
competition, and I don't remember
how. I don't know if it was an ad
or a press release or whatnot.
But it was the Merck molecular
challenge in 2012. And the idea
was, they gave just a huge
number, like 10,000, molecular
descriptors, and you had to
predict basically the molecular
activity. And, hey, I'm a
chemical engineer, I thought
I was really good at data
analysis. The top prize was
$25,000. I was convinced this
was going to be the easiest
money. Just like everyone
who joins Kaggle, like, you
know, I'm going to win the
million dollars. And
this competition is the one
that Geoff Hinton and his team
won in 2012, right, just to
show that with neural networks,
you could apply them to
real problems. I did really
poorly. Of the 300-and-some
teams, I ended up coming 23rd
from the bottom of the
leaderboard.
Sanyam Bhutani: Oh, okay.
Inversion: Right. So, you know,
people go, like, Kaggle's hard.
Yeah, it was not only
hard, it was, like,
embarrassing. I was upset.
It kind of crushed my view
that I knew how to do data
analysis. And that was, in my
opinion, a really good thing
because what it taught me was,
there were all of these tools,
new tools that I just had no
idea existed, and it really did
light a fire under me to, you
know, get better and to learn
these tools. And that was really
what started my journey. So it
was both the
interestingness of it, as well
as that I was a little upset that
I couldn't do well in these
things, that led me
to enter quite a few
competitions.
Sanyam Bhutani: You accepted the
challenge served by Kaggle.
Inversion: Yeah, yeah. But
with a lot of fear and
trepidation and intimidation and
self doubt, which, you know, my
guess is what tons of people who
join Kaggle feel. You know,
it's like, okay,
it seems like I
succeeded, but it was a
rough journey to start.
Sanyam Bhutani: Now you were
active as a competitor for quite
a long time. And I actually
checked your profile: you
competed in 84% of competitions
solo, which is without a team.
Can you tell us more about your
competitive journey and if you
have any favorite, what I like to
call, battle stories from the
competitions?
Inversion: Yeah. So you know,
it's interesting, when you look
at my numbers, you know, if you
scroll through my
competitions, a lot of them
weren't really
good, but I tended to enter
every single competition that
was launched, and of course,
it's very difficult to focus on
two or even three competitions.
So what I would tend to do is I
would join every competition, I
would start it, see if it was
fun, see if I was doing well,
and then kind of just
jettison the ones that weren't.
In terms of joining a team,
yeah, I didn't do a lot of team
competing, and one of the reasons
was I didn't feel like I was
worthy to join another team and
drag them down on the
leaderboard. And again, that's
another mental block that I had.
You know, it took me a few
competitions before I did
anything that I
would be proud of on the
leaderboard. And once I started
doing that, then it was like,
okay, now I'm really, really
addicted because, okay, I
can actually get better and you
see the progress. You know, the
teams that I was on were
really meaningful.
Personally, I did some very
large teams, I did some small
ones also, one or two people.
The one or two people
that I joined with, those were the
most rewarding. The ones that
were larger were typically not as
fun. So, battle stories. We were
disqualified from one
competition that was a two stage
competition because the team
leader was doing things until
the very last minute and missed
the model upload timeline by 16
seconds. There's a deadline, you
had to upload your stage one
model, we missed it by 16
seconds. We were disqualified from
that competition. Also, it's a
funny story, I don't want to
give too many details, but I was
working with somebody who was
very, very good at neural
networks, studying them in a PhD
program, but couldn't cross
validate, like, didn't
understand cross
validation and how you're
supposed to do it. So, you know,
just the little things that,
as I look back,
are fun to reminisce on.
Sanyam Bhutani: Did you check
later if you would have won the
competition had they made
the 16 second deadline?
Inversion: Ah, yeah, I probably
did. We didn't do that well, so,
yeah, I
didn't feel really, really badly.
Sanyam Bhutani: Okay. You were
also the first discussions Grand
Master and this was achieved by
you after they redid the
rankings, which meant that you
were already very active in
discussions unlike people like
me who aim to become a Grand
Master because it's already
there. You were active by
yourself. Could you tell us
about your community journey and
what led you to being so active
there?
Inversion: Yeah. So, you know,
when I joined Kaggle, I
joked with Anthony Goldbloom
that I had been working for Kaggle
for five years before. So
again, I came to Kaggle very
intimidated, thinking I was the only
one who didn't know what he was
doing, which of course is false.
Most of us don't know what
we're doing. And so I was
very tentative to start
participating on the forums. But
once I did, it was a very
good experience. If you
think of your online
communities, Kaggle is amazing
in the sense of the people.
It's strange, because it's
competitive, but yet people are
willing to help and, you know,
give ideas and suggestions and
help you forward. So, so that
became very rewarding. Then, you
know, the other thing, I kind of
joke, but it's true: anytime
I was doing my real
job and I was bored, I would
just go on to the Kaggle forums
and participate. And the fact that
I had 1200 forum posts suggests
that I avoided my real job quite
a bit over those years. But, you
know, again, I look at our
forums and it's really
a sense of pride. Since I joined
Kaggle, there have been others
that have really stepped up to
the plate. You've interviewed
some of those people, who have
just become incredible resources
to the community. And yeah, it's
really neat to see that.
Sanyam Bhutani: I think like,
like you said, sorry.
Inversion: But the one thing I
would say is, you know, no
matter where you are in your
journey, if somebody has helped
you, when you get to the point
where you can help somebody
answer a question, just,
like, try paying it forward as
quickly as possible. And once
you start doing that, with that
mindset, you don't have to be
at the top. You can, you know, just
be above 2%, then start helping
that 2%. When you get to 5%, help
that 4%, and keep
doing that, and that'll get you
where you need to be.
Sanyam Bhutani: But it's not
just rewarding in terms of
upvotes, but also the community
starts to recognize you and you
start to see those familiar
little icons of faces that are
always active. It's also a
warm and welcoming community in
that sense, if I may. Now,
coming to community, do you have
any favorite moments from the
discussions, any favorite
stories again that you'd like to
mention?
Inversion: Yeah, so I
thought of a few, but
unfortunately a lot of them were
kind of controversy and
drama. And so I don't
necessarily want to focus on
those, but they do stick in my
mind. You know, there was an
incident where there was a very,
very, very well respected, kind
hearted person, I don't know if
they were a Grand Master at the
time, but very high ranked, who
had passed away, and the
outpouring of support towards
this individual again reinforced
what a wonderful community we
have. You know, I mentioned the
support and the sharing. But on
top of that, the Kaggle community
does a good job of keeping it
light hearted. So we do get in
our little fights and spats and
arguments, but for the most
part, there's not
a lot of posturing and politics.
It's people wanting to learn and
it's people also having fun. So
it's that combination that I
really enjoy.
Sanyam Bhutani: Okay, now once
you became active on Kaggle,
after that you made a switch
into a machine learning role,
which I believe many people even
look for when they're signing up
on Kaggle. Can you tell us more
about the transition? And how
was Kaggle helpful in
helping you make that
transition?
Inversion: Yeah, so I spent
about 15 years in industry, and,
you know, the jobs were
fine. I really missed that
kind of intellectual challenge
that you had in grad school. And
I think so many people are in
that spot. And, you know, so
Kaggle, for me, was a way
to, you know, be
learning on something that is
fun, but solving a problem and
doing something that I thought
was really kind of interesting
to work on. And probably about
three years in, when I started to
rise in the Kaggle ranks, I
mentioned to my boss, you know,
that I do Kaggle and I have these
skills, and he was really
supportive. Like, he
would tell other directors and
people, and so within
the company I was working for,
which is a consumer products
company, I started to get a little
bit of a reputation. And it was
interesting because it opened
some doors to work with various
teams that were struggling
solving some problems with data.
And I was able to do a few side
projects like this working with
teams, and what I found
curious was oftentimes they
would start explaining their
project, and because I had
done so many Kaggle
competitions, I would be like,
oh, right, so what you need to
do is this, this, this, because
this is going to be a problem.
And they'd be like,
how do you know this? Like, well,
because I did three EEG problems,
so I understand how to do the
signal processing or whatnot. So
if you want to talk about
Kaggle, it's one of the few
places that you can explore so
many different types of machine
learning problems, right? So
that's a plus. So I started
doing side projects, but it was
very slow going and, like I said,
after about three years, I
really wanted to make a
decision: I'm going to do
machine learning and data
science. That's going
to be my career, either here or
somewhere else. And so I spent
about six months, literally
every day looking at all the new
job postings. Okay. Again, this
fits into this intimidation and
you know, I'm not worthy. And so
I really wanted to see what
people were posting. See if I
had the skills and I finally
pulled the trigger. The first
one, I didn't pass the phone
interview. I like completely
screwed up some question about
Python classes. It was an easy
question. I just didn't do a lot
of coding with classes. Okay,
just parenthetically, we ran a
competition with this company
that rejected me within the last
year. Funny. So then I
started, you know, over time
getting better at Kaggle, and two
companies reached out to me, and
I was going through the
interview process with both of
these and was very, very close to
sealing the deal. And then I got
a promotion at my current
company, and the idea was that
in this role I'd be
able to apply some of the
machine learning. I thought, you
know, this is a great
opportunity for me. And so that
kind of
sidelined going to these other
companies, and which was
actually a good thing, because I
think if I had joined these
other two companies, I would
have passed on the Kaggle
opportunity.
Sanyam Bhutani: Okay. How did
the dots then connect for you?
You later joined Kaggle. What
was the story there?
Inversion: Yeah, so
this was in the fall of 2016.
And then in the spring of the
next year, 2017, you know,
about six months later, in the job
I was not doing the machine
learning that I thought I would
be doing. It was like, you know,
months and months of battling
with IT to even get, like, a GPU
to do neural network training.
So I would do my
training at home and take the
model to work. Right.
Sanyam Bhutani: Okay.
Inversion: So it was
really frustrating. It was like,
again, do I want to
spend all my time doing the
political battles? And on my
birthday in April, I got an
email from Anthony Goldbloom. He
said that, you know, we've been
acquired by Google.
And so now we're expanding our
data science team, and, you
know, would you like
to apply? And I was having dinner
with my wife and my daughter and
I got this email. And so I was
like, super excited. And I read
this to my family, and my daughter,
who was 16 at the time, says, Dad,
that sounds like a scam
email. So, yeah, of
course I was absolutely, you
know, more than happy to. So
then Anthony told me about the
Google interview process. And
he said, but don't worry,
we'll send you, there's a book
you can study up on. And I got
the book. It was 700 pages,
Cracking the Coding Interview.
Sanyam Bhutani: Yeah.
Inversion: And, you know, again,
this fear and intimidation,
this whole imposter, I'm not
worthy syndrome kicked in, and I
had actually drafted a letter to
Anthony, giving all these really
bad excuses, reasons why I wasn't
going to do the interview or,
you know, go through the
process. But I decided,
you know what, I'm going
to kick myself if I pass this
up. I went through the process and
ended up doing okay, working at
Kaggle.
Sanyam Bhutani: Can you tell us
more about your current role?
What parts of the operations do
you handle?
Inversion: So, yeah, at Kaggle,
you know, we have a lot of
engineers who are working on
making notebooks and journals
and data sets and all of that.
The data science team is really
broken up into two parts. We
have our program managers, and
we have our data scientists. The
program managers handle all the
inbound leads. By the way,
Kaggle gets far more inbound
leads than we can manage.
Most of those are, this isn't
going to make a good
competition, but for the ones that
are, they work with them and
kind of help them understand
what needs to happen. We get
sample data from them. Once we
get the sample data, then the
data scientists kick in. So, you
know, that's me, Sohier Dane,
Phil Culliton, and
Will Cukierski. We basically
say, is this a good problem? Is
all of the data public? Is, you
know, the data set large
enough? Is there leakage? We kind
of go through that and
ultimately prepare the data for
the competition. So, you
know, one of the things I really
liked about Kaggle competitions
is how modular it was. You know,
you do a PhD program, it's four
years or longer working on one
problem. With Kaggle, if you
get sick of the competition, in
three months you're done and
you get another competition. And
it's the same thing with the
work that we do. It does
overlap, you know, quite a bit.
But, you know, we get to work on
all sorts of interesting
problems, whether it's,
you know, signal data,
astronomical data, you know,
whales, you name it, we
work on it.
Sanyam Bhutani: And if I dare to
say, you also get to do all of
the cutting edge research or
applications that are there in
the industry if you're working on
a domain specific competition.
Inversion: Yeah. And that's
actually funny too, because, you
know, I look at the people who,
I'll give you an example, the
recent quantum chemistry atomic
coupling one, I look at what
people did, and I'm like, I don't
understand sequence models.
So it actually highlights the
gap of what I don't know when you
look at what the winners are
doing. And so I kind
of believe, no matter where
you're at, you're always going
to feel like there's such a
large gap between you and what
you need to know. But, to
your point, being able to see
what people are doing and know
what they're doing, so that if I
were to work on a type of
problem, I know at least
where to look, I think that's a
very valuable resource to have
on Kaggle.
Sanyam Bhutani: That's
interesting. Also currently,
you're a developer advocate, a
data scientist, and even active in
the community. If you were to
pick your favorite task of your
everyday, which one do you
enjoy the most?
Inversion: Yes, so in terms of
the community, if you look at
my stance, it really hasn't
changed over the years. I
view Kaggle as one of the best
places to learn machine learning
and data science. The thing
is, the leaderboard
doesn't lie. You know, there are
things you can say, well, you just
ensemble to get there, and that's
true. But the reality is you have to
produce good models, you have to
iterate quickly. And in terms of
a free resource where you can go
and get your hands dirty on all
sorts of problems, there isn't a
better place. So my favorite
thing is, when we have these
pockets of drama, and people are
upset, and often rightly, for
real reasons, just being able
to take a step back and say,
hey, look at what we
have, and look at what it
provides. And I do enjoy
that. Because, you know, I look
at what Kaggle has taught me and
provided me. And
by the way, 2012, when I
started Kaggle, that's when
Coursera also launched. And, you
know, when I was
a kid, there was no Google.
If you didn't know something you
had to go to the library. Okay.
And then, you know, if I was mitre
??? or if I wanted to take this
distance course, somebody had
to mail me DVDs. And then we
have Coursera, where you can
take Andrew Ng's course for free.
Sanyam Bhutani: Yeah.
Inversion: So yeah, I view these
things as just absolutely
amazing, democratizing these
things to anyone who has the
interest and drive and motivation
to do it. And so I love taking a
step back and looking at the
higher view.
Sanyam Bhutani: I think the
sentiment, we were just
talking about this, is also
reflected in the community as well.
People, no matter where they are
on the leaderboard, or no matter
in what hierarchical position
they are, they'll always go
ahead and share all sorts of
amazing information in writeups
during, after, even before the
competitions.
Inversion: Yeah, yeah, one of
the things, so Addison Howard is
one of our product managers. And
what he started doing was
writing these summary emails
listing out, here's all of the
summaries, here's all the top
kernels, and you look at that and
you're like, what is being
shared is just amazing. Yeah.
So, again, I'm working at Kaggle,
but just seeing that summary
is like, wow.
Sanyam Bhutani: If
I may say this, in three
months, the community comes
together and creates this plethora
of resources, of kernels, of
discussions, and later even
solutions, that if you go
through, you can learn much more
than at least what I learned
during my college days.
Inversion: Yeah, absolutely, for
sure.
Sanyam Bhutani: Now about
competitions, I can,
unfortunately, speak very little
about competing because I've only
competed in a tiny
number. But to talk about the
other side, of hosting the
competitions, can you give us
some insight about what efforts
go into hosting a competition?
Is it just downloading the data
and uploading it, or are there
other things that we as
competitors miss?
Inversion: Yeah. Well, one of
the things that absolutely blew
me away was how many possible
leakage vectors there are in
data. And, you know,
for the bits of
leakage, so if you're a competitor
and, oh, you know, it seems like all
the competitions have leakage,
that is like a little drop out
of the swimming pool that we got
rid of before we launched the
competition. You know, you think
of images, and you have to
look at the metadata, the
different types of, you
know, camera, the image size.
And the thing is, you know,
there's one data scientist who's
scrutinizing this, then we
launch the competition and you
have 2000 data scientists who
are scrutinizing it. So
that thing alone really does
keep us up at night, trying to
think of all the different ways
that not only the data can have
leakage, but also be abused in
some sort of way. Then the other
challenge is, you know, some
sponsors are absolutely
fantastic. Some have these weird
ideas of the types of metrics we
should use that are really kind
of not very good metrics, that
will drive the wrong behavior.
So there's also a component
where we have to kind of use
soft skills to convince hosts
or sponsors to, you know, do
things differently.
Sanyam Bhutani: Okay, now,
again, talking about the other
side, which is actually
competing in competitions. Can
you tell us about your favorite
competition that you competed
in? And do you miss being active
in competitions now? Because
you're not as active as you used
to be earlier?
Inversion: Yeah, so
absolutely. I say I joke but I'm
serious that, like, the day I
leave Kaggle, whether I get
fired, or, you know, whether I
retire or whatever it is, I'm
gonna compete again because I
find it so rewarding.
Sanyam Bhutani: Okay.
Inversion: My favorite one was,
you know, I had three
competitions where I didn't do
so well, you know, the first and
then a couple of ???, but then
the fourth one I did was the
Acquire Valued Shoppers challenge.
And the idea was, you're given a
year's worth of purchases from a
person. And the idea was,
okay, we're going to send these
people a coupon. So the
data is on people who got the
coupon, and they all redeemed the
coupon; predict which ones would
then continue to buy the
product. In other words, you
want to send it to the people
who are going to be most
valuable for this
coupon. And the reason it was my
favorite was because
there were about 1000 competitors, I
was about 500th. So like, middle
of the road. I was using
logistic regression, because
that was the only thing I knew
how to use at the time, right.
That was it. That was my tool,
binary classification. And I
started doing a lot of EDA. And
I saw this thing in the data and
I realized if I took
this group out and trained
separate models on these groups,
I jumped from 500 to 50,
just doing that one little
insight, and I ended up 32nd. So
that was my favorite, just
because it was so exhilarating
to suddenly do well, and to
realize that I wasn't working
any harder. I just had an
insight in the data. So, yeah, I
found that very rewarding, very
motivating, and that again
helped keep me addicted to
Kaggle.
Sanyam Bhutani: Were you still
using logistic regression in
your final 32nd position
solution?
Inversion: Yeah, yeah. No, it
was. I mean, I
think my computer had eight
megabytes or gigabytes of RAM at
the time. I barely knew any
Python and I may have actually
even been using, like, an old copy
of MATLAB at the time because
that's all I had. Okay, there
was this time where I
started using some commercial
software that I had, and then,
you know, MATLAB, and realizing,
oh, this isn't going to cut it.
And so, you know, I had
to learn Python. Then I
realized, well, my computer's
not going to cut it. So, you
know, you kind of do this
journey of, I've got to improve my
tools, I have to improve
my knowledge. And that was very
early at the time, right? I
didn't have very many tools.
Sanyam Bhutani: Okay. Now,
again, to go back and summarize
your journey, you started
competing on Kaggle. After
taking a few courses, then you
switched into machine learning
slash data science role, and
then you joined Kaggle. Now, to
people who are just signing up
on Kaggle, looking at this
journey or one similar to
this, what parts do you think we
are missing? What parts might
they miss, being a data science
newbie or data science
enthusiast or newbie Kagglers?
Inversion: Yeah, you know, what
I experienced, what, seven years
ago, about being intimidated,
it's only gotten worse, right?
Sanyam Bhutani: Oh, sure.
Inversion: It's only gotten
worse for a number of reasons.
Number one, Kaggle's a much
bigger place. So whereas 900
teams was big seven years ago,
now, you know, there's a
competition where we have almost
9000 teams. So there's that. Then
the fact that data science has
exploded so much. You know,
seven years ago if you were using
a random forest you were cutting
edge, you know, and XGBoost,
that was the high tech. Now
it's like, XGBoost, right? That's
like the baseline. And then it's
even worse if you think of all
of the neural network type
architectures that seem to just
come out like mad, right?
It's hard to
keep up with them. So here's,
here's the one. Let me let me
give a tip that I think is very
appropriate is even seven years
ago, even back then I realized
you can't learn it all. at once.
You can't learn it all and
especially you can't learn and
also I would join competitions,
not necessarily to win, but what
did I need to learn next. So
like, Okay, I need to learn
combinational neural network, I
would join a competition. And I
would just focus on learning
combinational neural I was I
didn't care if I where I placed,
right? Top 30 or whatever I
tried to do well, and I think
that's the most important thing
is forget what's going on and
look at yourself and say I don't
know how to do random force,
okay? Enter tabular and just,
I'm just gonna learn how to do
random forests. I'm gonna learn
how to cross validate, I'm gonna
learn scikit-learn, I'm gonna,
you know, just do that, and then
make a plan, and then take the
next step. But it's really easy
to go, "Oh, everyone's doing
XGBoost." No, learn random
forests, then learn XGBoost.
And, you know, that's, to me...
And it takes a lot of self
control, because there's always
the shiny toy. I remember back
when neural networks were
starting to get more active,
people would say, "Well, I tried
this neural network, it didn't
work." "What neural network are
you using?" "I'm using Lasagne!"
I'm always learning. "I'm using
Keras." It's like, you're
chasing the wrong thing.
Sanyam Bhutani: Yeah. And when
would you say someone should
jump into competitions? Should
they complete a PhD in chemistry
or do a few courses or just jump
in?
Inversion: I think it's good for
sure to know the basics. You
know, the tools in data science
have become much more push
button, and that's a good thing.
But it's also important to not
skip at least the fundamentals,
like, understand what's going
on. You know, how does logistic
regression work, and what's
gradient descent? How does a
decision tree work, and, okay,
how do you go from that to a
random forest or XGBoost? You
don't need to be able to derive
the equations or write the code,
but you need to understand the
fundamentals. I've seen some
well meaning people use some
very powerful tools and really
burn themselves because they
didn't understand the basics.
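[Note: to make the "how does
logistic regression work, and
what's gradient descent" point
concrete, here is a minimal
sketch of logistic regression
fit by batch gradient descent in
plain Python. It is not from the
interview; the toy data, the
names sigmoid and fit_logistic,
and the learning rate and epoch
count are all illustrative
assumptions.]

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Derivative of the log loss with respect to the logit.
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        # Step against the average gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: the label is 1 more often when x is large.
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(200)]
ys = [1 if x + random.gauss(0, 1) > 0 else 0 for x in xs]

w, b = fit_logistic(xs, ys)
correct = sum((sigmoid(w * x + b) > 0.5) == (y == 1) for x, y in zip(xs, ys))
accuracy = correct / len(xs)
print(f"w={w:.2f}, b={b:.2f}, train accuracy={accuracy:.2f}")
```

[The whole model is two numbers,
w and b, nudged repeatedly in
the direction that lowers the
loss. Library implementations
add regularization and faster
solvers on top of the same
idea.]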
Sanyam Bhutani: Okay. And once
someone joins a new competition,
how should they go about it? Do
you have any suggestions on what
competitions to join, whether to
team up or not, what approaches
to take? Should they just copy
or fork the top-ranking kernel
at that time?
Inversion: Oh, so first of all,
I would say do what you're
comfortable with. If you're
new, I would recommend only
teaming up with somebody that
you already know. Don't join and
say, "I just want to team up
with somebody," because you
don't know who that person is or
what they can contribute. So I
would, you know, approach it as
a way to get yourself
comfortable. Don't worry where
you place. If you know somebody,
team with them, especially if
you're co-located: you work
together as a team and help and
explore different things. The
most important thing is don't
worry. It took me four
competitions before I started to
do halfway decent. Like, no one
goes and wins their first
competition. It's happened,
actually. But it's very, very
rare. In terms of "should I fork
a kernel?": if all you care
about is "I can get high on the
leaderboard by forking the
current kernel," don't do it. I
would recommend that you
approach the problem and write
your own very simple code, like
the bare minimum. I'm going to
write a logistic regression, the
bare minimum. I'm going to do no
feature engineering. Let me just
process the data and see how
that does, kind of like a naive
dummy model. I would do that.
The exception is if the data
requires a lot of
preprocessing; then I think it's
fine to swipe some code, right?
Some people do post, "Here's
some preprocessing." But
otherwise, no, I would just
start from scratch and see how
far you get. And then later you
can start incorporating what
other people are doing. But I
wouldn't just swipe and start
with theirs. I would start with
yours and pull from other
people's and build it into
yours. That's what I would
recommend.
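[Note: the "bare minimum"
starting point described above
might look something like the
sketch below. It is not code
from the interview. It assumes
scikit-learn is installed and
uses a synthetic table from
make_classification as a
stand-in for a competition's
training data. It compares a
dummy model with a plain
logistic regression, with no
feature engineering, under
5-fold cross-validation.]

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a competition's training table (hypothetical data).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)

models = {
    "dummy (most frequent class)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    # Mean 5-fold cross-validated accuracy on the raw features.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

[The dummy score is the floor.
Anything borrowed later from
other people's kernels can be
judged by how far it moves the
cross-validated score above your
own baseline.]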
Sanyam Bhutani: Okay, and this
is a stupid question, but do you
think one can learn data science
by competing? This originates
from the "hey, Kaggle's not data
science" opinion that's now
trending slightly on social
media.
Inversion: Yeah, I mean, it's
easy to say Kaggle's not data
science, because you have to
define what data science is. Is
putting models into production
data science? Is writing
algorithms data science? Is
cleaning your data? If data
science was working with
organizations to get their
legacy systems and clean up
their data, there's no fun in
that, right? Kaggle is the fun
part of data science, in my
opinion. There's plenty of other
skills that are important to
know and to learn. But, you
know, Kaggle is, to me, the most
critical, right? You learn how
not to overfit. You learn how to
quickly do exploratory data
analysis, how to write quick
models, baseline
models. So can you learn data
science? I would say that,
absolutely you, you will learn
what works in data science. I
have read a lot of papers and
tried to implement them and they
don't work very well. Kaggle is
ruthless. If it works, people
are going to use it. If it
doesn't work, they're not going
to use it. So if you want to
learn what works in data
science, that's where Kaggle is
really helpful. You know,
there's things that aren't
helpful: ensembling 20 models
isn't very helpful. But I've
talked to some Grand Masters who
do very well, and they say, "I
will make one neural network
model and focus on feature
engineering. And then I'll do
one tree-based model, focus on
feature engineering, and I'll
ensemble those two models." And
they do very well because they
focus on the feature engineering
and understanding the problem.
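[Note: a two-model ensemble of
the kind mentioned above can be
as simple as averaging predicted
probabilities. The sketch below
is an illustration only, not the
Grand Masters' actual method. It
assumes scikit-learn, uses
MLPClassifier and
GradientBoostingClassifier as
stand-ins for "one neural
network" and "one tree-based
model", blends them with equal
weights, and runs on synthetic
data. The feature engineering
that the speaker stresses is not
shown.]

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for competition data (hypothetical).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One small neural network (on scaled inputs) and one tree-based model.
nn_model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
tree_model = GradientBoostingClassifier(random_state=0)

nn_model.fit(X_train, y_train)
tree_model.fit(X_train, y_train)

# Probability of the positive class from each model on held-out data.
p_nn = nn_model.predict_proba(X_valid)[:, 1]
p_tree = tree_model.predict_proba(X_valid)[:, 1]

# Equal-weight blend; the weights are an assumption, not a tuned choice.
p_blend = 0.5 * p_nn + 0.5 * p_tree

for name, p in [("neural net", p_nn), ("tree model", p_tree), ("blend", p_blend)]:
    print(f"{name}: log loss = {log_loss(y_valid, p):.3f}")
```

[Whether the blend beats either
single model depends on the
data, so it is worth checking on
a held-out set rather than
assuming.]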
Sanyam Bhutani: Do you think
there are any aspects that maybe
a data scientist might miss
while just competing on Kaggle,
and how can they fill those in,
if there are any?
Inversion: Yeah, so for sure.
When most machine learning
people or data scientists go to
a real corporation, there's this
reality that you end up spending
more time in meetings, talking
about what you're going to do
and how you get access. Of
course, you don't get that on
Kaggle. Kaggle does tend to give
you a much more encapsulated,
cleaner problem with a defined
question. I think in the real
world, you have to think
through: what exactly is the
question we are trying to
answer? How do we best answer
that? What's the data we need to
answer that? And, you know, do
we need more? Do we need less?
So those are very, very
important skills that you're not
going to develop on Kaggle,
because we give you the problem.
Sanyam Bhutani: Okay, now, my
final question to you would be:
do you have any best advice, as,
if I may, the official
representative of Kaggle, for
any newbie Kagglers who are just
getting started on competing or
just learning via Kaggle?
Inversion: Yeah, just the thing
I've been saying: suppress that
anxiety and fear and
intimidation and inadequacy.
Suppress that. I felt it seven
years ago. I still feel it when
I see the top Kagglers'
solutions. I'm like, wow, again
I'm competing against this. Just
go and have fun, and, you know,
I promise you will be rewarded
by what you learn. Even if you
never win a gold medal, even if
you just, you know, win some
bronze, some silver, what you
learn will be useful. And
hopefully you'll have a great
time. You'll be hard pressed to
find a community that is more
supportive of you as a newcomer,
as somebody who, you know, may
not know everything. So jump
into the community, have fun,
help other people, and then, as
you start doing well, pay it
forward and help the people who
have just joined Kaggle.
Sanyam Bhutani: Awesome. Before
we end the call, can you mention
the platforms where we can
follow you and follow your work?
Inversion: Oh, sure. Wow. So,
Twitter. Let's see, I think it's
Walter Reade, without a dot.
Twitter is probably the main
place that I post. I'm
intermittent: sometimes I post a
lot, sometimes I don't. That's
probably the best place. Please
feel free to tag me on the
forums at Inversion. I'll just
say that name does not come from
any mathematical theory, for the
people out there. You know, I
like to ride roller coasters in
my spare time. So, you know,
roller coasters have inversions.
So if there's anything I can do
to help, let me know.
Sanyam Bhutani: Awesome. Is that
the story behind your username
Inversion?
Inversion: That is exactly my
username. Yeah. I tend to get
into these hobbies and go kind
of hog wild and there was one
year I went to 14 different
reason?? parks and rode east
it's different roller coasters
and I was rich??. I was making
like maps and pins on them where
the roller coasters were, so on
a business trip I could make
sure I got to them. I'm very,
very into roller coasters.
Sanyam Bhutani: Okay. Awesome.
Thank you so much, Walter, for
this very honest, open and
very insightful opinion about
Kaggle and your journey, and
thank you so much for joining me
on the "Chai Time Data Science"
podcast.
Inversion: Yeah, thank you so
much for giving me the
opportunity. I really enjoyed
it.
Sanyam Bhutani: Thank you so
much for listening to this
episode. If you enjoyed the
show, please be sure to give it
a review or feel free to shoot
me a message. You can find all
of the social media links in the
description. If you like the
show, please subscribe and tune
in each week to "Chai Time Data
Science".
