Sanyam Bhutani: Hey, this is
Sanyam Bhutani and you're
listening to "Chai Time Data
Science", a podcast for data
science enthusiasts, where I
interview practitioners,
researchers, and Kagglers about
their journeys and experiences, and talk all things data science.
Hello, and welcome to another
episode of the "Chai Time Data
Science" show. In this episode,
I interview another legendary
Kaggle Grand Master Mark Landry.
In this interview, we talk all about Mark's journey into machine learning: how he got interested in machine learning, how he discovered Kaggle, his addiction to Kaggle, and how his approach on Kaggle has evolved over all of these years that he's been active on the platform. We talk about his journey at H2O.ai, where he's been for a few years now. We also talk about AutoML, and we reveal an interesting story: AutoML at H2O.ai was originally called "Auto Mark Landry". That's another story that we uncover in this episode. And there's a lot of discussion around data science on and off Kaggle, and how to approach Kaggle or data science problems, broadly speaking. I
usually put a three word or four
word title in the description of
a podcast. This one is all about data science, Kaggle, and H2O.ai, and it's true to those three words: we talk all about them and about Mark's journey in all three. So I hope you enjoy the conversation as much as I did. A quick reminder for the non-native English speaking audience: please remember to enable the subtitles on YouTube for a better watching experience. Without further
ado, here's my conversation with
Grand Master Mark Landry. Please
enjoy the show.
Hi everyone, today I have a great guest: the data science artist at H2O.ai, not my words for the record, Grand Master Mark Landry.
Thank you so much for joining me
on the "Chai Time Data Science"
podcast.
Mark Landry: Thanks so much.
Yeah, it's a pleasure being
here. I listened to several of
yours in the past. So it's an
honor.
Sanyam Bhutani: The honor is
mine. Really excited to be
talking to you. But I want to
start by talking about your
background. You did your
bachelor's in computer science
and have worked across different
roles. When did machine learning first come into the picture? When did you find your passion for the field?
Mark Landry: Yeah, really, the machine learning, I think some of the seeds started back then, actually, under the term data mining. That was one of my interests when I did computer science. There were a lot of programming classes, but I took an AI class, I took some database classes; you couldn't do much as an undergraduate, really, at that level. But I did what I could, I thought, and then went straight out to work in the field, in business intelligence and data warehousing. And so it kind of just stayed a little bit of a seed there. I had seen a little bit of it, but you just don't get that much. I
had a couple of books. I would
read them every now and then.
But it was years before it
finally came together. And the
story part of it that I used to tell people: when I had my older son, who's now eight, he wouldn't go to sleep, and I was watching a lot of sports videos. So I would take him at night, all night long; I'd let my wife sleep and hold him. And after kind of wasting a couple of months doing this, I started watching a lot of data science videos instead, and started following along with some of the online courses that were coming out. You know, Stanford was doing theirs; some of the MOOCs were coming alive. Actually, there was a really low-tech one I really liked from Stanford; it was co-taught with Google, and it was really approachable. And I didn't know anything about R at the time.
And so it all sort of came together in a whirlwind: I started going to an R meetup, I met some people there, and that moved me along to where I wanted to go seek a job in data science. And eventually, oddly enough, at my company at the time, where I'd been probably about six, seven years by that point, I was pushing for the creative department to actually start to pursue this, because we didn't have that capacity. And all this came together in the span of just, you know, two to four months, something like that; I don't even remember the timeline exactly, but it was really quick. And that led to Kaggle. So
once people heard hey, isn't
this what you're doing? Like
you're doing this analytics
thing? I was like, oh, no, I,
you know, I'm just barely
learning, you know, just trying
to get by. And, you know, it was
one of these things that came in threes. It was about the third time someone had asked me if I was doing it, and the third real thing was going to a meetup with someone who was at the time ranked fifth in all of Kaggle. He just showed up to the Austin R-Meetup, and it was a talk about random forests, and he was polite. At the end, he just said, well, that's all nice, but all the cool kids are doing GBM these days. And I thought, so what's that? And, you know, two nights later, I think I made my first Kaggle submission, and that was easy. And really, then, that became what drove the passion, I think. You know, I look at it a lot
as problem solving. And I feel
like I did that even when doing
business intelligence reporting,
all that kind of stuff. You
know, you're doing a lot of
problem solving and kind of
anywhere when you're writing
code of any sort, really, but
yeah, it really came together about there. So it was quite a while of doing other stuff while I wanted to do that. I kept saying I want to do it, but I finally acted on it, and it came through pretty quickly.
Sanyam Bhutani: Awesome. Talking about your Kaggle journey: you've been active for a few years now, and Kaggle itself has evolved. I remember when you joined, there was no Grand Master title. You became a Master, I think, around 2014?
Mark Landry: Sounds about right. Yeah, a couple of years in.
Sanyam Bhutani: How has your Kaggle journey been from your first medal to today? How have your approach and views on Kaggle evolved?
Mark Landry: I think, you know, I lucked out a little bit. As I was saying, the seeds were planted in some of the first competitions I did; I did pretty well. It was all kind of new to me, the medal world and everything, but it was fun. The competitive drive started coming out, which was pretty dormant in me, but it was fun to rise up the leaderboard, and there are different ways to do that. The first couple of competitions I did were sort of reaffirming of some of these thoughts: when you read the books, you don't see everything outlined so that you can really do a problem. And I think we've gotten better as a community; there are so many awesome notebooks out there now, but that wasn't so prevalent back then. So seeing a couple of competitions with some starter code, like we now see in kernels and notebooks on Kaggle, you know, they've done the sharing in different ways, but back then it was Kaggle producing their own stuff, and it was really nice to see that. It was kind of affirming of some of these ideas I had in the back of my head, and I latched on to those pretty well. Actually, I think it was a shared script where I saw some nice features created, and I said, yes, that makes sense. And so
Kaggle itself has evolved. It's gotten a lot harder, you know. I mean, with some of my early ones, I think at one point I had, like, the second most silver medals or something like that. There were a lot of silvers in that early time, and bronzes and things like that, and they were really fun. That part of Kaggle is still there; there are new competitions to do. I think every time I did one, I would find something new: there's some new wrinkle that you have to discover and overcome, and that's what's just really fun. And that's what's nice about Kaggle too; in the work world, you don't always get that kind of churn through problems. I mean, it's a grind in one sense, you know; two to four months, typically, is a long time to spend with a problem. But just the same, there are many different competitions running concurrently, and for those that can do it actively enough to learn something, it feels like there's always something to learn. So,
obviously, we've seen a shift towards deep learning competitions. They were out there back then, you know; it was really interesting to see that Geoff Hinton and his team actually competed in a competition a long time ago and won it, but that wasn't even an image one. And they've gotten more and more image-related up to the current point, and that's good; they're evolving, that's what's interesting, it keeps it fresh. But it's a very different world now, you know, so there's a bunch of us, maybe old-timers, that will wait for another tabular one to come out and just feast off of that a little bit. So I think there are ebbs and flows; it's a
time thing as well. And so, you know, just a year ago about this time, I found a lot of time to do some, did pretty well, and then it goes dormant for a little while. But I think Kaggle's done really well at staying kind of the same, you know; they've gotten better, the crowdsourcing is amazing, and that's why it's really tough. In fact, that's the biggest thing, I think, in why it's tough: they've really got momentum, a lot more users trying these things. A particular tabular one, you're going to easily hit four or five thousand teams in just a plain old tabular competition these days. And then the notebooks and kernels just make sharing so easy, and you take it for what it is. You can leapfrog onto someone else's work and pretend, you know; you can fool yourself into thinking you can get in the top 15% or something just by hitting copy. But if you take it for what it really is, you learn little bits here and there. And that sharing has made some really powerful submissions, so you could go back to some older competitions from five, seven years ago and, no doubt, take what we know now, take what's come about, XGBoost, you know, LightGBM, and do really well on some of the older ones from before those were coming out. So knowing the tools was part of it early on when I started. That's why I think I really just had a little bit of luck to fall into GBM at the right time and see the feature prep at the right time. I think that was a good
combination. And so, I know one of the questions was going to be, what's your favorite algorithm? But as anybody who's seen my talks knows, it's definitely GBM, the general purpose one. XGBoost and LightGBM are awesome improvements on that, and we have, you know, CatBoost and others too. So, yeah, GBM in its flavors is certainly my favorite. I'm glad I latched on to it at the time I did, and just got a pretty good knack for using it early on. And you could do pretty well: if you knew problem solving, you could get some features in there and use GBM, and that was about all you needed back then, it felt like, to do pretty well.
Sanyam Bhutani: Awesome. So was the Kaggle addiction, if I may call it that, because of the competitiveness or because of the learning factor? Maybe a mix?
Mark Landry: Definitely a mix of both. Yeah. Because really, I don't find myself being that competitive in any other place in life. You know, I did sports, but the competitive drive wasn't really there; it was to play for fun, almost. But now, yeah, the competition, the leaderboard, is really addictive. It takes time to do well at one of these, and the strategy sometimes, if I don't know how much time I have, is that I can still have fun in a Kaggle competition even now. You can look back at some somewhat embarrassing results, but what I was trying at the time was, okay, let me take this idea, I'm going to join seven, eight days before a competition is over, and just see what I can do. Of course, I'm not going to compete for the top. But whatever little bit it is, this looks like a fun problem; I'll get my hands around the data and see if I can just improve something here and there, and that's what I'm going to learn. So that kind of
time, that's fun for me. And then there's still rising up the leaderboard and trying to almost nitpick. Sometimes those are not really sustaining things; you don't want to do those in industry so much, but there are little gamification things you can do, just, you know, nudging your numbers a little bit, or gaming the problem a little bit and exploiting this or that. There's a little bit of the competitive drive that gets in there. But
the learning, yeah, is definitely the more rewarding piece. I'm not sure when I really realized that; it was sort of early on. It gave me confidence, I suppose, that I kind of understood this whole thing. It was barely called data science then, but, you know, machine learning, artificial intelligence, whatever you want to call it at that time. Yeah, I get it: this makes sense, the supervised problems, we've got learners that do that. And you can make these great features that almost solve the problem for yourself, and then let the machine learning algorithm take over from there. And so yeah, I think
it gave me confidence, almost starting, not from scratch really, but without a lot of the background that you see a lot of people have. And Kaggle is great for that, too. That's what I realized early on: we had people from all over the globe competing at Kaggle, and we still do, but it was really exciting to see the different countries that people were coming from. Kaggle breaks down a lot of barriers. They may be fake barriers, but they're there nonetheless, and people get interested. And then I realized,
yeah, I think what really hit me was that I was doing an interview at one point, and I realized everything I was talking about in that interview was about Kaggle competitions. You know, I was there doing it for the real world; I'd been doing that job for about a year by that time, so I had some of that experience. But the more relevant experience, the things you can really talk about, came from the Kaggle problems. There's a lot of pain in the real world stuff that sometimes doesn't really translate into something you can show; it's usually one or two things. But with Kaggle, if you get a question in an interview, you can kind of relate it to a Kaggle competition. When we started bringing on some of the Kagglers ourselves, we would almost talk in terms of former Kaggle problems. Dmitry and I would go, oh yeah, that one, it's like Rossmann, it has that kind of flavor to it, that sort of thing. And it's true. You almost feel like you don't want to do that, you know, I do real world data science and Kaggle, but Kaggle's very much real world if you take it with the right grain of salt, for what it is. There's a lot to learn there. And so yeah, that part was, and still is, amazing, especially, you know, for where it's going with deep learning; they're pushing the envelope on so many different things. And
the notebooks are just fantastic. You don't even really have to join a competition, though it's always more fun, I think, if you do, but you can really absorb that code and try to tweak something. You know, even if you're taking someone else's great work, if you really understand it, you'll know the parts to tweak and the parts to get better, and when it doesn't get better, you're going to learn something. So yeah, I think all of that put together is why you're right to call it kind of an addiction. It takes a lot of time. You know, on one of our Kaggle Grand Master panels, that question went around, and it was almost embarrassing that everyone really said yes, it takes a lot of time. I've admired some of those people; it feels like it comes to them a little quicker. There are a few people out there that seem to just drop in late in a competition and immediately, with their first submission, get into like 15th place or something like that. That's impressive. But I think for most of us, you know, it takes time and;
Sanyam Bhutani: It's also a lot of fun. Like Grand Master Marios mentioned, to him it's sort of a game-like addiction. Definitely fun, too, not just overworking, not this burnout of endless nights.
Mark Landry: Yeah, I think there's something there that, you know, wasn't necessarily competitive. But I think back, you know, I'm not much of a gamer, but I would play these sports games on Super Nintendo, these old systems. And I would play them just to keep up with the statistics; I'd go through whole seasons of a baseball game, you know, hours and hours with one team, and try to see if I could then beat those statistics with another team. And I would record it on paper and things like that. I was like 14, 15 years old or something. And, you know, that was all fake; you gained nothing. The really cool thing about Kaggle is you are learning the skills that can actually change the world, and that's exciting. Kaggle really does play into some of that, and the real world sometimes should. We try to seek that; sometimes it's hard to get in the real world, the capitalist world. In actual professional data science, you know, you don't always get the same rewards; there are no leaderboards, no one to challenge against. So that's why that confidence is important: if you know that you can consistently hit a leaderboard, you'll feel more confident that you're doing the right thing when it's all on you, which a lot of times out there it is. So yeah, I would agree with Marios on that.
Sanyam Bhutani: You mentioned the professional world. Another aspect of real world value is this aspiration by Kagglers to transition into the industry by using Kaggle. You were already in the industry when you found Kaggle. Did you find Kaggle useful for the real world skills?
Mark Landry: I mean, yes and no. So, a little bit, it was frustrating. And I think, again, it's that confidence to really put it in the right perspective, and I was still new to it at that point. You know, I had this job because of this interest, and I found Kaggle a few months later, I suppose. And so, you know, it was weird. I was given this department with kind of a fishing license. It was back when that Target story came out, you know, about how Target could identify new moms sometimes before their parents did; it was a kind of creepy story, but it turned a lot of heads and got a lot of executives thinking. And so at my own company, they'd read that story and said, we want to be able to do that. Okay, it's hard to do that. It was a medical company at the time, health insurance really, specifically. And so we did what we could and paired it up with some of these things. But it is hard to define the machine learning problems out there. Sometimes, you know, a lot of these problems, I think, unfortunately, get approached with the mindset of machine learning but can be solved with maybe just simple business intelligence. And so
maybe that's, you know, a little bit of my background speaking. But when I had moved over to Dell, there was a team member that was really curious. He knew I did some of these competitions, and he said, I want to learn how to do this; let's schedule a little bit of time here and there. And so I said, great, find a problem out there and let's just go take it on. And he did, and we talked about it a lot, in quite a bit of detail, and I said, you know, look, I just did a pivot table that shows there's nothing here. The pivot table showed me everything I needed to know. Sure, we could throw a machine learning algorithm at it and get the last, you know, 1% or 0.1% on this.
But the problem they were trying to solve, and it was an active problem, had a poor definition; there was nothing left, it had been solved. In this case it was churn prediction, and the definition of churn at the time, for a particular segment, was just too late: they would wait to call it churn until after about four quarters, and nobody was coming back after three. And I think that's the problem solving bit. Yeah, you can throw a machine learning algorithm at it, but studying the problem and seeing where it's at, you know, Excel showed me everything I really needed to know: wow, we need to change this problem altogether, it's the wrong definition of churn. If you want to really act on this, you've got to act faster; we're going to change something here. So I didn't need a machine learning model to do that, I just needed, you know, a few hours of Excel. And everyone has their own EDA paths and things like that, but yeah, those are valuable.
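The "pivot table first" check Mark describes, tabulating outcomes before reaching for a model, can be sketched with a simple aggregation. The data and field names below are entirely made up for illustration.

```python
# Before any model: count how many customers ever come back,
# broken down by how many quarters they were inactive.
from collections import Counter

records = [
    # (customer_id, quarters_inactive_before_return); None = never returned
    ("a", 1), ("b", 2), ("c", 1), ("d", 3), ("e", None),
    ("f", 2), ("g", None), ("h", 1), ("i", None), ("j", 3),
]

# The pivot: returns counted per inactivity gap
returns_by_gap = Counter(gap for _, gap in records if gap is not None)
for gap in range(1, 5):
    print(f"inactive {gap} quarter(s): {returns_by_gap.get(gap, 0)} returned")
```

If the counts show that nobody returns after three quarters of inactivity, then a churn label that only fires at four quarters is too late to act on, and no model is needed to see it; that is the kind of problem redefinition Mark's pivot table surfaced.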
And you learn that in Kaggle, too, I think. It's up to everybody what they take from it; you can just sit there and blend the models, if that's what makes you happy. It's not what makes me happy, and I'll lose a lot of times when that's the case. So creating features, I think, is where that comes in, and almost these little mini models: I'm happiest if I have features that are nearly a model themselves, and then we'll take machine learning to get that last extra bit. I think all of that is real world. So how
does someone really get started? I've sort of just bounced around that question, but, yeah, picking it up, there's a lot to learn there, a lot to see. These are real problems, real data sets that a lot of people have a tough time coming by otherwise. The data sets are big enough; there are several legitimately big ones out there too. One big barrier that I think a lot of people have, and that Kaggle's done a great job with, is just small data, medium data, big data. These aren't strict definitions, but if you've done a lot of Kaggle competitions, you don't think twice about working with, you know, 250,000 rows or a million; it's just nothing. Your laptop can handle that, the tools can handle that these days, all in memory, all fast. Whereas I think if you go back 10 years, in some cases people would be a little scared once they crossed a thousand rows, or something we would say is nearly trivial from doing these. So there's
a lot of stuff out there. The kernels are great; it's a great way to learn. But I would say certainly competing and trying to figure out what you know and what you don't, that's the biggest asset of Kaggle, I think. It's a bit like school in that way, and it just keeps on coming: you've got this project, and you can fool yourself, but you really can't fake what you know and what you don't know. If you just go and try, go see where you stack up on the leaderboard. You may have hit on this great feature that lets you sit on top of everything, but you know that's very unlikely. So I think just continuing to do it matters too, because no single one of these competitions is going to teach you everything; they're all going to have their own little wrinkle. You don't want to always be optimizing RMSE and think that you completely understand, you know, the whole robust regression world and all that kind of stuff. But you could go a year without seeing a competition like that. So there's always more out there.
So, yeah, Kaggle's useful, I would say. People pay attention to Kaggle results, for sure. I've had some people send stuff to me touting their 3,000th place in a competition that had, you know, 5,000 entrants or something like that. So I don't know whether the recruiters are catching that. But I think it's realistic, you know, if you're honest with yourself about where things are at and whether you could do this when you're out there for a job. I think that's maybe one of the big things to think about: as you're doing Kaggle, kind of prepare yourself. If you get a job in most companies, not all of them, but most, you're going to be either solo or on a very small team that's going to rely on you to do some things. So unless you get a great opportunity to learn from someone, like an internship, or at some of the companies that are so big they have giant departments where you really can sit in there and learn from people without too much pressure, there's no crowd for you to follow. There's no
correction. So many times, I'll still do it years in: day one is usually the most fun of a competition. You've got this new data set, you've seen where the leaderboard's at, and your own model, your CV, is telling you that, man, this could be a first place submission, I'm onto something here. But you know better, and so you drop that in, and yeah, sure enough, you're down, you missed the boat, you're at the 80th percentile or something like that. That correction doesn't happen when you're out there in the real world. So that's one thing to be sure of: look at it and just really try to test yourself all the time and be honest with what you know. And if you don't, the people who share afterwards, that stuff is great. I don't read those as much as I used to, but early on, every time you've done a competition and put all your effort into it, and then all of a sudden it's over, in the top 10, you know, half of them are saying this is what I did and how I did it. And sometimes there are going to be some people who do things you didn't think about that are really just great, and you can learn from that. And there are going to be some people that did what you thought you did, and they just did it better, you know? And then you have to look at why: that was my idea, why did it not work for me, those kinds of things. So that's the kind of learning, I think, if you look at it with the right lens, you can really get a lot out of it. Unfortunately it takes about two months to get there, because these competitions last a long time. But yeah, Kaggle's awesome for learning, I would say.
Sanyam Bhutani: It's also a humbling experience, like you mentioned, and I think you've mentioned this before as well: you spend an equal amount of time on the same problem as everyone else, and you realize the other person has done much better. And then you go back, do your homework, learn, and come back for another competition.
Mark Landry: Yeah, that's really a lot of what that early part was. There were almost these eras of super Kagglers, the people up at the top. There were a lot of R users; fortunately, one of them sort of grew to the top at that time, and he was very active when I was starting, and then several other really good ones were putting a lot of nice R code out there. So it was really good to see. And yeah, it almost amazed me how many times, when you would look at it, it wasn't that otherworldly what they had done. There were a lot of really cool ones where it was like, wow, that is just impressive, there's no way I would have ever gotten there: either it was a good idea, or some people just have the execution and the persistence to really go the extra mile, or 10 miles it seemed sometimes, to really see that great feature and just grind the screws in and keep taking it further and further and further. Sometimes I've been fortunate enough to latch on to something where I can do that and just keep with it for a while. But that part's hard. You see it and think, that was the idea, but man, that's a way better implementation of it, or it would take me so long to write that code to get it working the way they did. So, yeah, lots to learn on those.
But yeah, I think you just have to be honest with yourself about where you're at a lot of the time. I guess I've said this a lot in the last few minutes, but I've seen a lot of people who look at some of these solutions and think they know it. And then, you know, I either teamed up with them, or we were just discussing things, and they're a little too passive about it. You see that solution and think, oh, it looks so easy, I can do that. That's why actually getting in there and actually competing, and not making excuses for yourself on why you're in the bottom half of the leaderboard, matters. There's stuff out there, and there's something for everybody. Some people legitimately are like, yeah, I see what they were doing, but this is what I'm doing: I'm trying to optimize GLM for this one. It's not going to win, but this is what I want to do. That's great, you know,
on Kaggle there's no pressure. It's nice in that there's really no pressure around having a bad finish. Early on, I felt there were a couple where I was embarrassed, like, oh my goodness, I finished in, you know, the 14th percentile or something like that, how could that be? But time is such a factor in some of these things, and you can't even really count submissions sometimes; you can just get in there and fire off five cheap submissions, and that doesn't really show how much time it took. So it does take time for most of us to do well, and sometimes you don't have it. But, you know, that's the thing: how easy is it to get into Kaggle? It's really easy. You could just go submit all zeros if you want to; you're on the leaderboard, and then you just go from there. I mean, the mean, you know, that was kind of it: the very first submission I had was just the average submission, and that got me about halfway up. I was like, well, that was easy, it was just a CSV, that's nice and easy, I could do better than that, and I just started doing some other things. And these concepts kind of gelled, and that was it: I was hooked from, as far as I can remember, day one, or night one, really, I think, as it was. Yeah, because it was really approachable. So that could be a route in for some people.
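The baseline Mark describes, submitting the training-set average for every row, can be sketched in a few lines. The column names and values below are hypothetical; real competitions specify their own submission format.

```python
# "Just submit the average": predict the training-set mean target for
# every test id and write the kind of CSV a Kaggle competition expects.
import csv
import io

train_targets = [3.0, 5.0, 4.0, 8.0]   # hypothetical training labels
test_ids = [101, 102, 103]             # hypothetical test row ids

mean_pred = sum(train_targets) / len(train_targets)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "target"])      # header row
for tid in test_ids:
    writer.writerow([tid, mean_pred])  # same constant prediction everywhere

print(buf.getvalue())
```

For metrics like RMSE, the constant-mean prediction is the best possible single number, which is why such a trivial CSV can land "about halfway" up a leaderboard and makes a sensible first sanity check.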
Sanyam Bhutani: Do you have any
favorite battle stories or any
favorite competitions out of all
of the huge number that you've
competed in?
Mark Landry: Yeah, I mean, I
think there's there's these
little little things about all
of them, I guess. And so you
know, it's fun to go solo on
some of these, but most of them
are. My favorite experiences are
as teams, certainly with SRK. I
think there's, you know, there's
some of these aha moments, you
know, SRK is just so much better
at it at all the little pieces
of modeling than I was. And so
my contribution to a lot of
those teams was just coming up
with an interesting idea to try.
Often my implementations would
be worse if we were doing the
same things. You know, he's just
great. But, yeah, there were
these times and I just remember
I have these, you know, we were
communicating on GitHub with
these GitHub issues, like kind
of hijacking that tool — we weren't doing Slack back then. And it was late in one of the competitions that eventually we got a gold medal in; we'd been doing this for what felt like a couple months or so
and slowly creeping up the
leaderboard, and there are a lot
of familiar faces in the
leaderboard at that time too.
And it just felt like this aha
moment and as I started to
realize one of the little kinks
in the competition we just
hadn't been looking at one part
of it right. And it wasn't a super valuable part of the problem, you know, overall, so it
wasn't really embarrassing that
we missed it, but I figured out
oh, this is what it was and we
just finally reached into the
top 10 when that happened and so
I feel like like those aha
moments are kind of fun. Because
I guess that's my personality in these competitions: I'm always in it sort of for the features. In this explore-exploit trade-off, I'm always in favor of more exploring, usually, for these sorts of things. I hate just saying, okay, I'm finally going to give up on my features. It can be a trade-off, explore versus exploit, but, you know, I feel like once I get into hyperparameter tuning for quite a while, then I feel like I'm done. Like I said, I always feel like it's the features — that's what's going to make the difference. And sometimes it is, sometimes it isn't. But yeah, that was one of the competitions.
ICDM — if I'm getting that right; ICML? — with SRK and then Rob from Austin. And that was a fun one,
I suppose. But there's a lot of
those little moments in these
competitions. I mean, being first for the first time, that was exciting. That was with the Data Geek. We teamed up — my first ever time teaming up with somebody I didn't already know personally. He was from Austin, and he just kind of reached out to me early on, and so we worked
pretty hard at that one and just
really started getting things working, and we had this really nice, you know, beautiful trade-off: he was doing a classification model that fed into my regression model. Nice separation of duties, both doing some very similar tasks. And
yeah, we finally got up into
number one, I think at about two in the morning wherever I was at the time — that was exciting. A screenshot at number one in a Kaggle competition is exciting. So those were a couple of standouts. And then there's the team part of it too: just having someone to share some of those things with. Yeah, that's nice as well.
Sanyam Bhutani: This is from the
AMA, any advice on how to win
your first gold medal in Kaggle
competition?
Mark Landry: Yeah, to me — all I can really speak to is what I am, to some extent. You know, I'm sort of a feature person. And I like this: I mean, knowing GBM, knowing how to operate GBM. I used the R implementation for the longest time, before XGBoost actually started to get popular and built a huge user base. I chose GBM and still use it to this day. But the early R GBM — I just sort of understood it at a pretty natural level; the idea of decision trees just came a little bit naturally. So I think you definitely need to be familiar with the tool you've got. And so, you know, the tools now, you see — you kind of have to ensemble at least, you know, your best GBM basic algorithms if it's a tabular one. Deep learning, I'm not going to make too many comments on that, because I'm not the authority on it for sure — you can see my track record on that; that's altogether different. So study
up on those kinds of things. But um, yeah, if it's a tabular one — with the gold medals I guess I've earned, they are hard. And so, you know, for some of them, I guess you just really have to understand the problem. CV — you'll see the same kind of thing in the way you set up. I think a lot of the ones that have fallen the right way for me would be ones where you understand your CV structure a little better than some of the other people. And so there may be non-obvious ways of cross-validating some of the tougher problems, I think. And
so, in fact, the reason I met Marios and SRK was because of a post, you know, back in the rain competition, where I was first for about half that competition, and H2O decided to share some information at the meetup there. So I gave Arno some content he could use — we wanted to stay compliant. So we made a post that kept everything inside the Kaggle rules and said, okay, here was one of the tricks. You know, I had two main tricks, and I said I can't give them both away, but here's this one. You
know, I looked at this problem, and I think a lot of people were not seeing it right. And so one of the keys for me to doing really well was that I just spent the time looking at the setup. It was a very weird competition metric for a lot of people, but it kind of became obvious when I looked at it. It takes some time. So explore, and really understand: what are you doing? How does it all feed into how we cross-validate ourselves, or just validate ourselves, however you do folds or whatever it is? That's a huge part of the problem — particularly in industry, too. I think that's a really big part: how does your test data look? How does Kaggle frame the problem, so that I can do the same with my inner validation, that kind of stuff.
That's a key part. And then just trying to figure out, I suppose, what the algorithms won't see by themselves. We have some of that: target encoding has caught on in the last couple of years. You could do really well if you understood target encoding maybe two years back or so, but now it's become pretty common to be able to do that — you have kind of a mini model in itself there. But whatever flavor of GBM you're using, they're all decision trees at the back of it, and decision trees have limitations. How can you overcome that? What can they see, what can't they see? That's
a hard thing to describe. But if you understand the algorithm — so maybe study that a little bit and see, you know, why people are changing a feature that way. A lot of times we think feature engineering is this simple concept, where we're really just talking about whether we're one-hot encoding or label encoding these features to present to the algorithm, and that's part of it. But really, when you get deeper, there are just things that the algorithms can't see very well that would be obvious to you solving the problem, especially, you know, doing EDA and things like that. So I think exploring
all the time, and trying to figure out the next thing to give the algorithm that might help — and then understanding that it's going to fail a lot. Almost everyone says that: there are a lot of good ideas, a lot of great ideas — how can this not work? — and it just doesn't. You know, the algorithm already had that, or there just wasn't anything there. It looks nice and clean when you see it your way, but there's a lot of noise at the end of that and the model can't use it. So some of that takes time, I suppose. I think of it as a lot of problem solving. If you really take it from the basics sometimes, I think you can get pretty far with that. You might uncover something in that phase that helps you create features to go into some of these algorithms. But that's most of what it's about.
And you see it with everyone — all the people that are consistently at the top of the leaderboard: almost always, step one is to figure out what the cross-validation strategy is. So really understand that, understand it well, because you can make more submissions to yourself than you can to the Kaggle leaderboard, and you don't want to overfit it too much anyway. So you want to have a nice, clean validation process. That's important.
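The "mini model" nature of target encoding that Mark alludes to can be sketched like this (toy data and hypothetical column names, not from any specific competition); computing each row's encoding from the other folds is what keeps a row's own label from leaking into its feature:

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy frame: one categorical feature and a binary target.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "y":    [1,   0,   1,   1,   1,   0,   0,   1],
})

global_mean = df["y"].mean()
df["city_te"] = global_mean  # fallback for categories unseen in a fold

# Out-of-fold target encoding: each row gets the category mean computed
# on the other folds, so its own label never contributes to its feature.
for tr_idx, val_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(df):
    fold_means = df.iloc[tr_idx].groupby("city")["y"].mean()
    encoded = df.iloc[val_idx]["city"].map(fold_means).fillna(global_mean)
    df.loc[df.index[val_idx], "city_te"] = encoded.values
```

The same fold structure you use for validation should drive the encoding, which is one more reason a clean CV setup matters.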
Sanyam Bhutani: Unfortunately, you don't join many competitions now, but if you were to become active in a competition today, what would be your first go-to steps when working on it? How would you approach the problem?
Mark Landry: I think, you know, to me — I guess everyone is sort of different. I get excited about the data sets, and I can sometimes get excited about things I should almost know better about. You're looking into it and you really see some meaty categorical features and think, oh yes, I've got this — and then, you know, the algorithm can see those too. But so, I think I get excited about the data sets. I've certainly downloaded far more data sets than I've actually participated in; sometimes I don't even really get a submission in. A lot of times I'll get just a few submissions in, because it's just so fun to
try some of these, you get into
some, like some of the
earthquake predictions, you
know, I didn't have a
particularly great finishing
that one. But it's really fun to
look at that data set and to
really think, and that's another
one too, if you step back, you
look at that earthquake
prediction. Nobody knew what
they were doing in that one, you
know, the scores are all really
bad if you really look at what
we're trying to do, of solving
that problem. And there's,
there's these higher level
things going on, if you really
look at the way that those
requests are, you had some that
would last, you know, three
times as long as the others and
these things matter, you know,
understanding your data, I
guess, from the maybe old school
statistician kind of standpoint,
but really understanding the
nature of your data, not just
how the algorithms see it, or
how a lot of people do. You know, what problem are we solving, and what are the things that make a difference in the way this problem might be solved? Your algorithms, again, do a lot of it for you, but it still helps to kind of have that. So that's it: spend some time with the data set. It's also fun to sort of get in there early and get some submissions in; it is kind of fun to be first sometimes. I don't know how many I've done that for — probably four or five, something like that.
And that's kind of fun, but
eventually it'll fade unless you
really have some some traction
on something you're gonna get
that by usually putting more
time into it. So yeah, for me it's just looking at it, trying to study the problem — like, literally look at the data set. See what some of the things are, try some ideas, and begin to get an understanding of that data set. That's always what's fun to me. So that would be my go-to on all these sorts of things. Even the Santanders and all those kinds of things: just asking what are we solving, what would work, and developing a mental model. I guess that's the truth of my go-to: studying the problem and developing a mental model of what I think would solve it, so that I could illustrate it almost in words to somebody, whether I've tried an algorithm against it or not — preferably even if I have. This is what I think, because then you have a way to gauge your expectations when a feature is a total miss. So I think the
problem would be this way — and it is, or it isn't. Once your models are going forward and you see the feature importance, or whatever your favorite way is of trying to figure out how useful these features are, you're going to start to see that initial perception go away, you know — it's either wrong, or this problem just isn't that way because they framed it this way. So you learn a lot by having, again, this sort of mental model of how you would solve the problem, what you think the problem is, how it can be solved. And some of them are very difficult to do that with. They don't necessarily require industry expertise, but it's just hard to understand the data sometimes.
But so I guess I prefer the ones where you sort of can understand them. I think one of the last ones where I was sort of high up was the Microsoft malware one, where time was a big deal — understanding the way that they release the software and all of that. A lot of my model was really just trying to figure out how the data changed as new releases would come out. A lot of people spent time on the timing of all that, but I based a lot of my model on some of that and what I had observed in the training set, and I felt like I understood it quite a bit at that point. So I guess I feel in control when I do that, too — even when sometimes your mental model doesn't really work out, or it's just blown away by the machine learning several times. It feels like it doesn't help you in the end, but I think it probably does to some extent.
Sanyam Bhutani: Okay. Now,
fortunately for Kagglers,
there's one less person to
compete with, he's not active as
much.
Mark Landry: I am waiting, yeah. So um, yeah, there's a bit of it — it's a sore spot, and I still love Kaggle. You know, that's okay. And I have had some of these discussions a little bit, like, you know: okay, I was 33rd at one point, and I'm not going to get back to that; I don't really have the time to put in to get there. Got the Grand Master thing — what's left to achieve? You know, I'm not going to reach for number one; I'm just not that good. And so, but it's just so
fun, I think. And so, you know, people were trying to say, well, yeah, you get away from it a little bit — this is years ago — but I still want to do it, you still want it. And I think, you know, learning deep learning better is certainly something I can do with it, but I am sort of waiting for one of the tabular ones to come out. And in the meantime, the competitive drive is still there. And so, you know, Kaggle is certainly the main show, but there are these, as I refer to
them sometimes, "off-Kaggle" kind of competitions — Analytics Vidhya; CrowdAnalytix is a recent one — that I've been seeing and having fun doing. They all have different perspectives. You know, it's not Kaggle, you're not going to have the same kind of field to compete with, but they're still fun: there's a distribution of a leaderboard, and while it might be a little easier to hit the top in some of those, it's still fun. And the difference, particularly with Analytics Vidhya early on, was that they would do them quickly. And so
you know, where I can't always find the time to grind it out for two months on a Kaggle competition, doing one in a weekend — I can find the time for that one. And within a few hours you'll know if you have traction on that problem and whether you want to keep pursuing it or not. So that was kind of fun, doing some of those other ones. And I find that, really, that first stage — again, it comes back to that mental model: getting your tools to execute what you think, and being a quick riser, I guess, is probably a little bit more of my personality in some of these. So just hold on to the end while everyone else sort of catches up. With these short competitions, that's everything: what can you do in a day? You know, that's exciting. And so
I found, you know, that's usually the way I would look at hiring people: what you can do in two months is interesting, but you're standing on the shoulders of giants — it's sometimes hard to separate your work from the shoulders of other people, at least. With the Kaggle kernels there's so much to build off of, the ideas are out there, and a lot of that just doesn't apply to the positions I've been hiring for. What we need is someone who's confident in looking at a problem quickly and acting on it, really knowing what they can do, and then a diversity of approaches. So when you start to see people consistently hitting the leaderboard quickly, before some of those ideas are out — on Analytics Vidhya you can see that very quickly. So that's where I've spent some of my time. I'm almost embarrassed about that sometimes, because, yeah, the long ones are just so hard to do, but, you know, I've participated here and there. So it'll come back — we have March Madness coming up pretty soon, so I'll get involved with that one. And my time has freed up: my second son took a lot of that time away, but he sleeps a little more at night now. So I'm ready to get back in, but I'm not going to be quite what I was before.
Sanyam Bhutani: Coming to where you are currently, solving exciting challenges at h2o.ai — can you tell us what tasks you are working on? And how does your Kaggle experience come into the picture?
Mark Landry: Sure, yeah. The immediate stuff is pretty exciting. As much as I say that, you know, deep learning isn't my strong suit on Kaggle — you can clearly see that — I've learned a lot. We have Brandon on the team; it's been great having Brandon on the team for pushing four years now. And he's diverse on both sides — excellent at both sides of the problem — and Kaggle keeps you current on a lot of the new models. So what
we've been doing lately is using a variety of machine learning techniques to look at PDF documents — incoming info; you can think of it as invoice reading. There's a lot of interest in doing that right now, so we've got our own kind of brand of doing that, but we're doing it specifically for a different context. We have a different twist on it where it's not just straight-up invoice reading. So there are a few little bits of machine learning that come in here and there. And I think the part of Kaggle that, I guess, helps directly with this problem — and certainly other problems we've done over time —
is feeling confident in how you approach a problem. How do you break up the problem? We've approached this problem in three different ways. And getting your predictions out at the end of the day — sometimes, you know, picking up a Kaggle competition can mean a lot of code to get from picking up the data set to an actually compliant submission. Eventually those kernels will come out and you can borrow all that, but those first few days it can be like, oh my goodness, now I've got to do this, and this, and this. There are a lot of steps sometimes, and if you're confident in coding, you can usually get through this pretty quickly, and maybe cut corners here and put them back later. So, you know, that bit
all comes together. I think of it as having this pipeline where it's a bit difficult to see what all the pieces are. It's somewhat bizarre — you don't just set a model loose on some of these invoice tasks. What we have to do is associate pieces together, but they're not obvious. So we've tried some deep learning approaches that treat it as image recognition, like semantic segmentation models, and that works in some cases; in other cases we have NLP models that work on the OCR'd text, converting it into more of a text mining problem. So there's a diversity of models going in there, which is kind of fun. But there's a lot of post-processing that comes in. You know, post-processing in some competitions is one of my favorite parts: just tweaking the numbers a little bit to better fit the distribution. Okay, the model does this part; now I'm going to kind of take it further. But
usually you've only written, you know, 10 lines of code to do that. Here, we're running a lot of code to sort of fix up every last little situation. So I think in this case it's really about paying attention to every little prediction, to an extent that sometimes you don't get with Kaggle. And that's what's
fun with some of the March Madness ones: there are only 63 games that get played — it's basketball, for those who don't know that one. So you're predicting all the possible games that can happen, but only 63 are going to happen, and they happen very slowly. And you get a feel for it: oh no, I hope Michigan State — just to pick a team — doesn't lose here, because they'll go down. You feel your predictions: why did I predict this? Why am I predicting 92% that Lou was going to win? That's ridiculous. Oftentimes in machine learning you're issuing so many predictions — the downside of it being so easy to get a quarter million, 10 million, 20 million records is that you're just issuing 20 million things; one here or there is off, and you don't really feel the pain of that. So I
think the problem we've got right now is right there, and so we're paying a lot of attention to how these models can fit together and things like that. But I think the Kaggle experience is helpful for sitting in that position. Part of what we do is listen to the business needs: what do we want to do here? Is that possible with the datasets we have? How can you contort that? So framing problems — that's a really important part. Kaggle will do that for you. So again, if you don't have your blinders on, you'll realize that you
know, a lot of your work as a data scientist in the field is done for you by Kaggle. They do so much: they choose the metric, they choose the test sets. There are a lot of important things to learn there — you can make mistakes doing that, and for all the things they get right, occasionally some leakage things come out; it's a hard task. But then setting up the problems, taking the datasets, picking a problem that makes sense, all that kind of stuff — there are a lot of ideas that just don't make it past the concept phase, because we know it just won't work: the timing of the data and that kind of stuff. So I think
again, iterating through Kaggle, doing a lot of them, just gets you in that frame of mind where you can see problems for what they are. If you think, again, of what these problems really are — with the earthquake problem, what's the data we've got? What are they trying to solve? How does this target actually work out this way? Is this the way it's really happening? That understanding of the data set is pretty crucial, certainly in how you work as a practicing practitioner, because a lot of those problems are going to come at you, and they're counting on your expertise in framing that problem. So you want to be good at doing that and think of all the options, but also know when to say no to some of these things, too, so.
Sanyam Bhutani: Okay, now zooming out to your journey at h2o: you've been at h2o for a few years, how has your work evolved? You were already involved with the framework and very familiar with it before you joined — how has the journey been since?
Mark Landry: Yeah, you know, I mentioned this a little bit on your podcast with him a couple months ago. It was the African soil competition — and, exactly as he says, there were a lot of people, you know, using that to show off H2O's neural nets, and so many people were following along. I was following along; David P., who was working with me at the time, was following along too — he's moved on now. Well, we were following along, and it was really cool to see neural nets on a CPU — legitimately deep learning happening on a CPU — and h2o was making that happen, and in R of all things, because it's remote controlled. So that was really cool. And when that competition finished, you know, I did pretty well, and it was a lot about this framing of the problem. That was one of the things — and I know I kind of messed something up on that one: you had to keep all the samples from the same region within a fold, because that's how the test set was going to come at you; otherwise you're accidentally leaking information. I learned that when you don't do that. So this is one of these conscious decisions you definitely have to make — you know, Kaggle usually makes it for you; they didn't make that one clear, but you could figure it out if you looked at the data set. So that's where really paying attention, studying these things, and really thinking about it matters.
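A grouped split of the kind that lesson calls for — keeping every sample from a region in the same fold — is what scikit-learn's `GroupKFold` does. This is a generic sketch with made-up region labels, not the actual competition setup:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Twelve samples drawn from four regions (hypothetical labels).
X = np.arange(12).reshape(12, 1)
y = np.linspace(0.0, 1.0, 12)
regions = np.array(["r1"] * 3 + ["r2"] * 3 + ["r3"] * 3 + ["r4"] * 3)

# Every region lands entirely in train or entirely in validation, so the
# validation score mimics predicting on regions the model never saw.
for train_idx, val_idx in GroupKFold(n_splits=4).split(X, y, groups=regions):
    assert set(regions[train_idx]).isdisjoint(regions[val_idx])
```

A plain shuffled `KFold` here would scatter each region across folds and quietly leak region-level information, exactly the mistake described above.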
Because sometimes you can ask: is it this way, or is it not? And the data will tell you a lot of times; a submission will tell you; you can look at it either way. But yeah, so that's when I reached out to Arno, because it just seemed cool — you know, he was really actively participating — along with Jo-fai, who's also now at h2o; he wasn't at the time, he was doing, I think, Domino. And so
there was this fusion happening already, and I just sort of luckily got to help out with that. So yeah, I had no intention of really joining h2o; I just wanted to team up with Arno — it would be really cool to team up with, you know, a developer like that and really get into the framework. And Arno is amazing. So that was fun. And I remember just talking through one of the problems on the weekend — click, click, click on one of my GitHub threads I was using as communication. I'm thinking I'm talking to Arno a little bit, but the whole team is watching what I'm saying, I guess, and Cliff says, no, that's not how random forest works, or something like that. Whoa — Cliff is the original CTO and wrote a lot of h2o-3. So whoa, you know. And so that was fun. I did that for quite a while and eventually did join the team. So yeah, I was very familiar with it at that time, very much wanting to just kind of help out and try it. It was
of help out and try it. It was
really neat for an hour user,
especially to really handle some
of these data sizes. I really
there the GBM was, was, you
know, much better than what I
had been using an R and I think
fortunately, when I joined, we
started get it, catch it up to x
g boost at that point, as a sort
of put it. So those were those
were fun. So a lot of the
journey that for the first half
years, though, was, you know,
some of it, you know, it was
sort of billed as being, you
know, competitive data
scientists. So I think what
happened LinkedIn and still, I
guess, technically my job title,
I was doing kaggle competitions.
You know, a lot more rather than
I was prior to joining h2o, so
early on, I was yet definitely
legitimately spending some of my
85 not all of it, but some of
it. So it's legitimate. So doing
that, that was fun, but you
know, trying to latch onto
something at the time, it was a
show so small, it didn't seem
you know, doing Kaggle full time
was not really a reality. But
what it really did, and that was
there was really fun, just as
much as I said, like taking an
interview, and talking
exclusively about Kaggle
problems. The same was one that
being true talking to our
clients, you know, prospective
clients, just taking a call, you
know, again, just having good
problem solving ideas,
understanding a problem, be able
to jump in on some of those
calls and listen to some of the
data scientist or data engineers
on the other end of the call
that was was really fun. So it
was fun to take some of those
calls and just try to help walk
people through find a fit with
h2o or just sometimes just walk
them through some maybe mistakes
they were doing or things they
were, you know, not seeing and
see where we can help them. So
yeah, that was a lot of fun. Now
I'm working on some pretty specific problems — the last several years, really — but those problems change shape, and a lot of the Kaggle-style algorithms are in there with some of the products we've been able to produce the last couple of years, so.
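The small distribution-fitting post-processing Mark describes might look like this sketch (the scores and target rate are made up for illustration): a rank-preserving shift of the raw predictions toward a known overall rate, followed by clipping:

```python
import numpy as np

# Hypothetical raw model scores and the positive rate seen in training.
preds = np.array([0.10, 0.40, 0.65, 0.90, 0.20])
train_rate = 0.30

# Shift every score by the same amount so the mean matches the expected
# rate (rank order is unchanged), then clip back into [0, 1].
adjusted = np.clip(preds + (train_rate - preds.mean()), 0.0, 1.0)
```

Clipping can nudge the mean off target again, so in practice you would iterate or use a proper calibration method; the point is just how little code such a tweak takes, in contrast to the heavy per-prediction fix-up code the PDF pipeline needs.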
Sanyam Bhutani: I think Sri mentioned this on Twitter: AutoML was named after you — "Auto Mark Landry." Can you tell me more about that story? Because I wasn't a part of h2o then.
Mark Landry: He wasn't kidding. Yeah, that is a coincidence, but yeah, sure — he made it seem like it wasn't, so now... Yeah, this was at H2O World, probably 2015. That was where the Steam dream team sort of came out, and this AutoML idea. And he means it. Coincidentally, Dmitri Larko's initials are "deep learning," so you know, he had AutoML and auto-DL there — pure coincidence, but he's not entirely kidding when he says that. But no, you know, Erin's driven our AutoML product to be much better since those days. I was originally tasked with the AutoML for a few weeks, and then sort of things changed. There's a little bit of a silence there: the path I was taking it was as a data science problem — it's still an interesting track, a little bit — but I'm thankful that Dmitri and Arno, and the interest, you know, the great work there on Driverless, got us to things that I just would not have really thought possible then. So Jarvis is really cool, and I cannot say that I had that vision back in, again, late 2015, when the AutoML term sort of came up. He wasn't kidding, and I was working on that for a few weeks — a little bit; it was kind of in my court — but I'm thankful that Dmitri kind of got this, what auto-DL is and Driverless AI as we know it now, on track. It's awesome.
Sanyam Bhutani: Okay, is this a ritual at h2o — you team up with someone on Kaggle, you smash the leaderboard, and then you invite them to join the team, and you continue doing that during lunch? Is this a ritual?
Mark Landry: No, I guess I have to dispel that one. So no, unfortunately not. I think I've met some really great people along the way, and it's just a lot of almost coincidence. You know, a lot of times when I do these competitions, I want to keep my cards close — like, if I have an edge, I'm not going to give that away; I'm not going to just talk about it in some cases. But the one time I did, all this stuff unfolded: meeting SRK, meeting Marios, and just teaming up on these things. And I think, to me, it's sort of organic kinds of relationships that got created. Eventually the circumstances changed. We were lucky: I just happened to try and meet up again with Marios in London on a work trip, and that just sort of got his mind going about thinking of joining h2o. With SRK it's been, you know, back and forth, but he's always been intrigued with that. And so it was great that he was able to join and has done a lot of cool NLP stuff. You know, and Brandon was
stuff. You know, and Brandon was
just something different. I
legitimately wanted to meet
Brandon, but you know, I just
found like, wow, there's just
someone you know, out there, so
close to the Bay Area, and we
really could use someone like
Brandon, and so I said, hey,
let's meet up. But you know, I
maybe we could tell you know,
it's like, it would be really
nice if you would join. So
Brandon was kind of the first of
the Grand Masters I don't know
if the Grand Master was a term
back then but if so he certainly
beat me to it. But yeah, he was
kind of the first that joined we
had Dimitri join, who I hadn't met. You know, a little while passed; I wasn't the one that got him to join, but I was kind of instrumental, a little bit, in him joining. But yeah, I just met Dimitri at, you know, a talk. So I think a lot
of the team ups I wish we could
do a little more you know, if
you really look you know, the
h2o team ups I think a lot of
those happened really prior to
joining h2o. So SRK and I always, always want to get back into it. You've seen a couple of our team-ups since then, but our kind of best work was really before that.
Sanyam Bhutani: Okay. Now, this is another one of the mysteries from h2o, I think: "makers gonna make." Can you elaborate on that philosophy? What's it all about?
Mark Landry: Well, you know, it's interesting, because the maker culture is one that, you know, with the places I've worked, call it industry, and everything's industry, but, you know, just with the sort of jobs I had, the maker culture almost passed me by. So I've had to sort of learn it on the fly a little bit too. And so I do kind of wear that "makers gonna make" shirt proudly, and, you know, my son has one; really, when he was six years old, my wife made him a mini-size "makers gonna make" shirt. But the maker culture, you know, it does make a lot of sense. And I think as data scientists, you know, that does come naturally to a lot of us, and it should. You know, I
was just seeing one of the, one of the people saying, you know, the way they set up their problems, it comes all different ways: they have someone coming at them with a problem, or they have a cool data set and think maybe it could solve a different problem, or they just think of a data set, what could we solve with this? You know, and you want to free people to think that way. And we just look, you know, it's just code we're writing; it's not that hard. So, you know, makers gonna make, look at the grand things we can make. True, you know, of the makers within h2o, certainly. So
there's really two sides of this, you know. The Arno/Prithvi side, just amazing people. It's like, it is dizzying to sit next to Arno and watch him, like, something flashing through all the windows; he's a phenomenal coder. And I'm, for sure, almost the opposite: I'm a bad coder that just gets lucky enough with our data table a lot. But yeah, Arno is just phenomenal, and the passion and persistence that he has. So, you know, that maker thing, that's it exactly in him. Like, look what he has created; he's created some really great stuff, and then in turn, you know, we can enable other people to make as well. So makers gonna make: you know, take this software, use it, go make something great. You know, I was next to, you know, when someone was showing us they were doing some legitimate cancer research using h2o's, you know, deep learning; that was really Arno's first big project with h2o. It's amazing. You know, it's really amazing to see this tool you build, and someone who has gone off and used it to make the world a better place. So we're pleased to try to do that. So, you know, that's, to me, really my interpretation. Sri probably has, you know, maybe a different take on that; everyone may have their own kind of take. But, you know, makers gonna make, it's just really, let's free everyone up to use their creativity. You know, some of the things that Prithvi has done, there's just, you know, wow. Flow, the GUI for h2o-3, I still like it, it is awesome, I really like it, and he just kind of did that as a side project, you know, and the Driverless AI interface, these things. We have some people who are phenomenal, you know, tireless workers. You know, he has an immense knowledge of, you know, visual elements, but also, you know, the data backends; all kinds of reporting systems, data warehousing, he knows all that kind of stuff. So, and same with, you know, all of us: Kaggle helps you study kind of what the latest and greatest is, it points us in the right direction, but, you know, the software world's not quite always the world we're in. So, you know, we have some phenomenal people that really, it just comes natural to them, and they're free to make.
Sanyam Bhutani: Awesome. Now, this would be the last data science question. If you were to give one best piece of advice to someone who's just starting on Kaggle, feeling completely intimidated, has no clue what to do, what would it be?
Mark Landry: Yeah. I'd say certainly get past that. So, you know, getting past that, to me, would be to just get a submission in there, you know, make that submission. It depends on, really, where you are, how you take that question, a few different ways. But yeah, the first barrier is thinking about Kaggle and being like a Kaggle lurker or something like that, paying attention a little bit. But just go in there. Just go join one, see what happens. It's so easy, you know, CSVs are nothing fancy.
There's a few competitions where
you have to work quite hard to
get a valid submission, but so
many of them and if it's a live
competition, I find personally,
that's the most fun. But look at
the wealth of historical
competitions that are out there,
you know, and you can submit to
almost all of those too, you
know, you can see where you'd be
at with some of those other
ones. So whatever your kind of take: if you're intimidated by a leaderboard and all these people attacking this problem actively, then maybe you seek out an older one or something like that. But, you know, I would really say get an account, go sign in, and, you know, go put a submission in. You know, make it all zeros, make it the average, you know, and just improve. It's an iterative process for almost all of us. So, at the one meetup I
got to attend, one down in the Bay Area, François Chollet was doing, you know, something about Keras, which was pretty new at the time, newer, I should say, at least. And so, you know, he
that the backbone of really what
he was thinking of, in trying to
do this is common with software
in general, we build things on
top of things on top of things.
And so he says, you've got to
free people up so that they can
experiment, you know, about 40
times a day, you know, you're
running experiments, you know,
you'd be as fluid as possible.
So, there's different, you know, one part of that is this iterative nature. Like, you know, it's usually not best to go and solve the problem and plan out your full solution, as much as I say you want this mental model around problems. You don't want to solve the whole thing; you don't want to sit there on the sidelines for most of it and develop this idea, "I think I'd want to do this, I think I want to do that." Just go in there and do it. And most people will have better results if they can chip away at that problem in bite-sized chunks. So the first thing is get the data in, go submit something really basic, and then just iterate from there and see where it takes you.
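Mark's "make it all zeros, make it the average, then iterate" advice can be sketched in a few lines of Python. This is a hypothetical, self-contained sketch, not tied to any real competition: the inline `train_csv` and `sample_submission_csv` strings and the `id`/`target` column names stand in for whatever files and columns a given Kaggle competition actually provides.

```python
# A minimal "mean baseline" submission of the kind Mark describes:
# predict the training-set average for every test row, submit, then iterate.
import csv
import io

# Inline stand-ins for a competition's files; a real competition supplies
# train.csv and sample_submission.csv with its own column names.
train_csv = "id,target\n1,0.2\n2,0.4\n3,0.9\n"
sample_submission_csv = "id,target\n101,0\n102,0\n103,0\n"

# Average the training target (rounded only to keep the CSV tidy).
rows = list(csv.DictReader(io.StringIO(train_csv)))
mean_target = round(sum(float(r["target"]) for r in rows) / len(rows), 6)

# Fill every test id with the mean and write a submission file in memory.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "target"])
for r in csv.DictReader(io.StringIO(sample_submission_csv)):
    writer.writerow([r["id"], mean_target])

submission = out.getvalue()
print(submission)
```

From there, iterating just means replacing `mean_target` with something slightly smarter and resubmitting, one bite-sized improvement at a time.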
Sanyam Bhutani: Now, these are a few interesting questions that I think I got from the AMA. First one: what's your favorite sport?
Mark Landry: I played baseball early on, but it's definitely, at a later age, American football. I do like international football, soccer as we call it here in the US, quite a bit, but no, American football. Hopefully we can get some of the player safety things out of the way; the nice things John was doing, hopefully that will go and help the game. That's an unfortunate side effect; we're learning a lot more about both footballs, really. So I feel a bit guilty saying that, because it's a truth, it's a reality, but nonetheless, you know, this sport, the game, watching it, yeah. The Dallas Cowboys have been my favorite team since I lived in Connecticut, and yeah, I'm a pretty big football fan.
Sanyam Bhutani: Okay, this is an
interesting one if you could be
a superhero who would it be?
Mark Landry: You know, I think for me it could be several, maybe. I mean, I think the trait of a superhero I would look most forward to is something almost simple, a little bit, but, you know, flying; it's been a thing of mine for a while. So, you know, we could go classic and go Superman or Iron Man or something like that, I'm not really sure. And my superhero knowledge is not that great. But I think, you know, it would be exciting to have the ability to fly, of course, and make cool things, so why not go with it?
Sanyam Bhutani: I would have I
would have said I want to be a
Grand Master. You're already
one.
Mark Landry: Yeah. We're not superheroes, though. Anyone watching this, you can do it too. It's the passion, the persistence, I think, to keep at it, and a lot of time put into it; you can't really ignore that one.
Sanyam Bhutani: Last one: if you had a time machine that could take you anywhere for one day, where and when would you go?
Mark Landry: Yeah, that's a hard one, actually. So I think, you know, I mean, this is an awful answer, but, you know, I like where I'm at. It's one of these things, I wouldn't really change so much, you know, so many of these. I guess I feel like a lot of, you know, luck has gone into so many things, and maybe they would happen a different way or something like that. So, I think, you know, maybe, to cite an oddly religious sort of answer, I'd want to go back to kind of zero BC times or something like that. Something like that, or, um, you know, pick an interesting sporting event; but what is that, you're just watching a sporting event. So yeah, I don't have a great, great answer for that, but I think that would probably be it.
Sanyam Bhutani: Okay. Mark,
thank you so much for joining me
on the podcast and thank you for
all of your contributions to the
community.
Mark Landry: And the same, yeah. It's exciting to get these together, so I'm happy to be part of this, and yeah, definitely the honor is mine on this. Glad I can, hopefully, help somebody out.
Sanyam Bhutani: Thank you so
much for listening to this
episode. If you enjoyed the
show, please be sure to give it
a review, or feel free to shoot
me a message. You can find all
of the social media links in the
description. If you like the
show, please subscribe and tune
in each week to "Chai Time Data
Science".
