- Hello and welcome to
Experian's Weekly Data Talk,
a show where we talk to data science
leaders from around the world.
Today we are talking
about how data science
is improving e-commerce and we are super
honored and excited to
have Dr. Liangjie Hong,
who is the head of data science at Etsy,
which is one of my favorite online stores.
He previously was the senior
manager of research at Yahoo.
He received his PhD in computer science
from Lehigh University and
it's just an honor, Dr.
Hong, to have you today.
- Yeah, thank you for having me.
- So, can you come share
with our community,
kind of your path that led you
to start to work in data science?
- Yeah, sure, absolutely.
So, I first studied sort of
machine learning and data mining
in my graduate school, then
I slowly developed an interest
in machine learning and how
machine learning can apply
to real-world problems. At that time,
probably about 10 years
ago, social networks were
very popular, so the majority
of my dissertation work
is about how to apply data
science to social networks.
Then I came to Yahoo
Research, so I basically
spent quite a bit of time
applying cutting-edge
machine learning techniques to
a wider range of problems.
Then I came to Etsy where I spent a lot
of time studying those
problems for e-commerce
as well as how to interact with
product design and product management.
- Very cool, so did you...
Did you always know...
Because I remember even ten years ago,
the term "data scientist"
wasn't really around.
I don't remember hearing it.
- Right.
- So I'm kinda curious, like...
You were in love with computer science.
- Right, right.
- You were in school.
Where were you kind of...
Were you planning on teaching, or
did you wanna get out of academia and
actually apply what you were learning?
- Right, so, "data scientist", this
buzzword, actually came
around in 2011, 2012,
and actually that was funny because
during that time, I was an intern at
LinkedIn, and I believe LinkedIn was
one of the first couple of, you know, places
where people coined this
term "data scientist".
So, yes, before that, I was, in general,
interested in data mining
and machine learning.
Those are more towards
the algorithm part, right?
And so, like, "Oh, this
algorithm, that algorithm.
This model and that model."
I think the beauty and the passion of
the data scientists is really, we bring
models and real-world problems together,
and try to sort of help out the business,
and so on and so forth.
- So, could you tell us a little bit of
now that you're head of
data science at Etsy,
can you kind of talk about your...
The work you're doing now?
- Yeah, absolutely.
So, we have roughly 15 data
scientists in the team.
So, we have half the
team in San Francisco,
half of the team in Brooklyn, New York,
which is our sort of headquarters.
Basically, we are part of an engineering
organization where the
goal of our teams is
to build engineering-quality, kind of,
end-to-end machine learning solutions to
a lot of our products inside Etsy.
For example, we build up
machine learning solutions for
search ranking, where you sort of
type a keyword and we want to
return the most relevant results to you,
and we also develop algorithms and
solutions for our recommendations.
So, if you come to Etsy, you see different
modules, and how we can recommend the
most relevant results to you.
So this is a mixture of engineering, plus
we work very closely with
our product managers,
designers, UX researchers, to really
flesh out what is the
best user experiences
to present to users.
- Yeah, I mean...
And for those who have never gone to
Etsy.com, you definitely
gotta check it out.
There are just so many amazing things to...
I dunno.
For me and my wife, when we had kids,
we were shopping on Etsy every week
because there's so many
cute products there.
- Right.
- And, I dunno, I was
even browsing yesterday,
and I was looking at the
home and living section,
and at the top it showed you how many
items are in that section, and I counted,
it said there was over nine million items
just in the home and living section.
It boggles my mind.
- Yep, yep, gladly, yeah.
So, we have like 39
million active listings.
We have more than 40
million active buyers.
We have 3 million active sellers, so
it is really a very, very large-scale
marketplace, I would say.
Of course, compared to Amazon or eBay,
we're still sort of small, but
in terms of unique goods,
hand-crafted goods,
we're definitely a very large marketplace.
- Yeah, so, how does...
With so many millions of items,
like you said, 39 million, plus the
hundreds of thousands that are
being added every single day, how
does machine learning help and
assist people find what they need?
Because there's so many things that
you could possibly find there.
- Yeah, exactly.
So, this is an ongoing challenge of ours,
so A) is when we process
the, like you mentioned,
the hundreds of thousands
of new listings per day,
we have to tag them,
we give them labels,
we have to sort of categorize them into
different categories, and make sure
we understand, "Oh, this
is female, small size,
wedding dress" versus that
is "ring" or something,
right, so we use a lot of machine learning
techniques to basically process the data.
A lot of data is not perfect, and
a majority of data is very, very noisy,
so we have to make a lot of effort there.
Now, after that, we're basically
ingesting them into our search engines
or recommendation engines, where it is
a really challenging job to find the
things that you would
like to interact with
in the future, right?
So, typically, in either case,
search or recommendations, we need to
search among millions of listings and
narrow it down to, let's say,
a pool of one or
two thousand candidates, then we
use more advanced machine
learning to basically
rank those things and make sure that we
recommend the final
five or six items for you.
So, it is a really challenging problem.
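The two-stage process described above (narrow millions of listings to a candidate pool, then rank the pool with a heavier model) can be sketched in Python as follows. This is only an illustration under stated assumptions, not Etsy's system: the `Listing` fields, the keyword-overlap filter, the `popularity` signal, and the 0.7/0.3 weights are all hypothetical stand-ins for whatever retrieval index and learned ranker a real marketplace would use.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    listing_id: int
    title: str
    popularity: float  # hypothetical precomputed signal in [0, 1]


def retrieve_candidates(query, listings, pool_size=2000):
    """Stage 1: cheap keyword-overlap filter that narrows the catalog to a pool."""
    terms = set(query.lower().split())
    matches = []
    for item in listings:
        overlap = len(terms & set(item.title.lower().split()))
        if overlap > 0:
            matches.append((overlap, item))
    # Keep the listings that share the most query terms.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in matches[:pool_size]]


def rank(query, candidates, top_k=5):
    """Stage 2: costlier scorer (a stand-in for a learned model) run on the pool only."""
    terms = set(query.lower().split())
    if not terms:
        return []

    def score(item):
        coverage = len(terms & set(item.title.lower().split())) / len(terms)
        # Illustrative weights, not tuned on any data.
        return 0.7 * coverage + 0.3 * item.popularity

    return sorted(candidates, key=score, reverse=True)[:top_k]


catalog = [
    Listing(1, "lace wedding dress", 0.9),
    Listing(2, "wedding ring", 0.5),
    Listing(3, "oak chair", 0.8),
    Listing(4, "vintage wedding dress", 0.2),
]
pool = retrieve_candidates("wedding dress", catalog)
top = rank("wedding dress", pool, top_k=2)
print([item.listing_id for item in top])
```

The point of the split is cost: the cheap filter touches every listing, while the expensive scorer only ever sees the small pool.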
- I'm kinda curious,
because I know when...
If you're trying to
sell something, you can
write your own descriptions and maybe
categorize and tag, and
I'm kind of curious,
how well do humans tag versus
machine learning tagging?
Because sometimes people might mistag
because we make errors.
- Yep.
So, human tagging is
definitely very useful,
because you need it to
bootstrap the data.
Users need to give us the
input, "Oh, this is this size.
That is that category."
Or, "What's the material of this product?"
Machine learning is there to solve the
scalability issue, because only a very, very
tiny portion of the data
can be tagged by human beings.
So, in general, for that portion, I think,
the human being...
The sort of human quality
is very, very high.
And, of course, machine
learning algorithms can only
be as good as the data, right?
So, that's the human part.
But for the vast majority of the
non-tagged world, the non-human-tagged world,
that's where the machine learning
models can be put into play.
Yeah, so that's why we use these models.
It's not necessarily that they're more
accurate, but more that they
can be applied at large scale.
- You mentioned that
there's a lot of noisy data.
Can you kind of explain
what you mean by that?
- Well, A) is that...
So, from our side, we definitely want to
know more about your product, right?
So you upload a product listing to Etsy,
we want to know, okay, what kind of
materials you have, what's the color,
what's the size, where are some of
the raw materials coming from, right?
So, if there is a brand...
All kinds of aspects, right, and
in fact, we probably have hundreds of
such aspects, or such attributes that...
- Wow.
- But imagine you are a seller, right?
So you just want to upload a photo, and
just list it on the website.
- Yeah.
- That's a very, very, sort of
cumbersome kind of process.
So, we need a balance between user
experience and the quality of the data.
So, usually, we ask some key elements
that you need to fill in, but you
can leave the rest of the things blank,
then we can ask some other crowdsourcing
kind of support to help us to tag.
Now, with the crowdsourcing,
because they are not the owner,
they are not the seller of those products,
they may misunderstand things.
So, the noisy data
comes from multiple places.
- Mm.
Yeah, I can't even imagine the amount of
data that you're having to collect to
help make the user experience better,
especially with all the
human involvement and
people categorizing their own content,
writing descriptions,
and then, on top of that,
you have the various machine learning
algorithms helping to
sort through all that.
Can you talk a little bit about some...
For data scientists who are interested in
getting involved in e-commerce,
can you kind of talk about what are
some popular machine learning algorithms
or techniques that are used in e-commerce?
- Yeah, that's a great question.
So, I always tell
interview candidates that
machine learning in e-commerce
is extremely challenging,
and the interesting part of that is that
there are not too many off-the-shelf
algorithms or models they can use.
That's actually the beauty of the work,
is that you keep exploring, and
there are a lot of things you can borrow,
of course, from
traditional sort of domains,
but there are a lot of things you
need to innovate, right?
So, I usually give this example, for
example, if you are at Netflix, and
you want to recommend
movies to users, and
so, let's say you watch House of Cards 1,
and you say, "Oh, let me recommend
House of Cards 2, House of Cards 3, okay."
That's okay for Netflix.
There's usually some kind of Netflix
recommendation system that will do that.
But imagine that's e-commerce, right?
So, you just bought a camera, right?
- Right.
- And then we start to show you other
cameras, and people are going to complain.
And in fact, for us, we
have a situation where
a customer from Britain purchased a
wedding dress, then we keep showing
the wedding dress, right?
So then, this person actually complained,
wrote an email to us.
(laughter)
- Yeah, I bought my wedding dress.
- "Stop showing my wedding dress."
You know, you buy chairs, and we're
going to show you chairs.
So, that's sort of the phenomenon of
machine learning in e-commerce.
Meaning, like, there
are a lot of problems
we need to rethink, repurpose for
the e-commerce domain.
- Yeah, I can see that, what a
huge challenge, because for certain items,
like a wedding dress,
you just want one, right?
- Yeah, for a very short time.
- Yeah, and you don't need to see
any more after you choose that.
That's a huge challenge, and so,
I guess, for certain types of products,
there has to be rules in place,
like if someone buys this, probably
not a good idea to show
other similar items.
- Well, I mean, even coming up
with those rules is challenging, right?
So, we're talking about 40 million buyers,
and all buyers are very different, right?
- Yeah.
- Because some of them might be resellers.
In fact, we definitely have very highly
engaged, high-volume buyers.
Not the wedding dress, per se, but
they keep buying wedding-related stuff.
- Mm.
- You can imagine that's kind of
mainly for resale or other purposes, right?
So, if you have a hard logic saying
you purchased wedding-related things,
so then we just stop showing you,
for the next two weeks or next two months,
those users may jump to us, saying,
"Hey, what's wrong with your algorithm?
I want to see similar things.
I think I showed you strong enough
my personal preference, why don't
you take my personal
preference into account?"
So, it's an extremely
difficult problem to solve.
- Yeah, I could see that.
I could just see how complex the
work is for you and your team.
- Right, so...
And a lot of the challenge
with e-commerce
is also about the sparsity
of the data, right?
So, because a lot of users go to
Amazon or eBay on a daily basis,
they actually give a lot of
opportunities for those sites to be
able to exploit their
personal preferences,
and so on and so forth, right?
For Etsy, a lot of people come here to
buy gifts, right, so
to buy special things
for their special occasions.
So we do have a lot of buyers.
They show up in, say,
Thanksgiving, or holiday season,
right, then they disappear
for the whole year.
They show up again in
the next holiday season.
- Okay, yeah.
- Right, so then you can imagine, like,
"Oh, we don't have too
many data points," right?
"The last batch was last year."
Are you willing to
utilize those data points?
Or do you say, "Those data points are
outdated, so we don't have too much
information about these guys," right?
So it's a very difficult situation
where we need to provide personalized
and engaging experiences for these users.
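One common way to handle the "use last year's data or call it outdated" question is to keep the old data points but down-weight them by age. The Python sketch below uses exponential time decay for this. It is an illustration only: nothing in the interview says Etsy uses this technique, and the function names, the `(category, age_days)` event format, and the 180-day half-life are all assumptions chosen for the example.

```python
def decayed_weight(age_days, half_life_days=180.0):
    """Weight of an interaction that is age_days old; it halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def category_affinity(events, half_life_days=180.0):
    """Sum time-decayed weights per category.

    events: iterable of (category, age_days) pairs, a hypothetical format.
    """
    scores = {}
    for category, age_days in events:
        weight = decayed_weight(age_days, half_life_days)
        scores[category] = scores.get(category, 0.0) + weight
    return scores


# A buyer who bought two gifts last holiday season and browsed jewelry this week.
history = [("gifts", 365), ("gifts", 370), ("jewelry", 5)]
affinity = category_affinity(history)
print(affinity)
```

With a short half-life the seasonal buyer looks almost like a new user; with a long one, last year's purchases still shape the recommendations. Choosing that value is exactly the trade-off raised in the answer above.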
- Mm.
I'm kinda curious, also, obviously there's
people who do trolling
and do inappropriate
things, and I'm kind of curious,
how do you help prevent and care for
the Etsy community by
making sure there's no
offensive content, you know,
pictures being uploaded?
- Yeah, that's a great question.
So, we have dedicated teams to basically
vet through a lot of shops and sellers,
and a lot of content that we have on-site.
We also have machine
learning algorithms to
scan for fraud, and
even money laundering.
- Mm.
- Yeah, so there's a
mixture of a lot of human
investment as well as
machine learning algorithms.
- Can you talk a little bit about that
machine learning being
used to prevent fraud?
I'm curious about that.
- Right, so basically,
we have teams trained to,
over the years, look at user activities,
and look at how people
want to exploit the site,
or want to exploit a lot of the rules that
we put in place, and
use those behaviors and
train our models such that
we can detect those things.
It's a never-ending process, because
people change their behavior.
They invent new games, and then we
have to catch up on those frauds, but
yes, we utilize that for a lot
of fraud-detection problems.
- You know, I'm also curious about
how machine learning is helping shoppers
when they're...
You know, I think more and more people are
looking through their mobile devices, and
I think behavior is
sometimes different on how
we use mobile devices
versus desktop computers.
- Yep.
So, actually half of the traffic to
Etsy is from mobile devices, and
we also understand the behaviors on
mobile devices are very
different from desktop.
One thing, I mean, that's
probably special for
Etsy is that people tend to browse and
explore on their mobile devices,
but eventually check out
from desktop machines.
One reason is that a lot of things are
actually very expensive.
They're not commodities, right?
So, you buy a painting from the UK
that's probably going to be $70 or $80,
so, then, a lot of people want to
make sure that transactions and
all the things are correct, so
that's where they use their desktop,
but mobile devices are definitely
driving more and more traffic nonetheless.
- Mm.
Yeah, I think, just when I
watch my wife as she's...
She loves to browse, she'll browse Etsy on
her mobile device, and add things to
her cart that look interesting, but
then she'll go to her desktop to
actually make the purchase.
Is that what you see a lot?
- Yes, yes, and that's
a very common pattern.
I mean, on the other side, we are
trying to improve the checkout procedures
such that people feel comfortable
to check out on their mobile devices.
- Yeah, okay.
Yeah, and also, it's funny, because
you were talking about how everybody's so
different in the way that they shop.
For example, when I
shop on an online store,
I usually do my research, look at
product reviews, and then I'll buy
within that 30-minute period.
I shop fast, and if I go to a store,
like a physical store, I'm
set to what I wanna buy.
My wife, on the other hand,
likes taking her time.
She will spend a lot of time thinking
before she actually will buy something.
I'm kind of curious, how does machine
learning kind of adjust things that
are shown, or "Hey,
it's time to buy this"?
- Yeah, that's a very, very good question.
So, I think...
Okay, so generally, I actually say to
a lot of candidates and a lot of
people who may think
about machine learning or
especially recommendations,
or search in e-commerce,
is that, a lot of people...
Let's say you go to a
shopping mall, right,
so you go to the department store.
I would say not everybody is willing to
buy certain things.
A lot of people are exploring and just
walking around, and they
also still enjoy the
atmosphere, the environment, and
from the shop's perspective, they also
understand that not everybody is
interested in buying instantly.
They want to inspire you, that
maybe you are purchasing next time.
So, that's very normal in our sort of
offline kind of shopping experience.
I think the challenge is that, how
can we mimic that experience online?
I think we're doing a lot of...
A reasonably good job for the folks that
have a very, very strong shopping intent.
They know exactly what they want to buy,
they have exactly some kind of keyword,
and they just type that in, then checkout,
like you said, right, so...
- Yeah.
- Read the real checkout, and, you know,
it's all very strict.
And we have challenges, and I think that's
not only for Etsy, but that's for
e-commerce across the board.
How can we really model a
discovery process, right?
So, say I come to the site, I have,
say, ten minutes to kill.
I don't have a very strong intent
in my mind, so what is this inspirational
kind of process such that we can
inspire people to purchase things?
That's where, I think,
machine learning
can come into play, and
also where a lot of innovations in machine
learning for e-commerce should happen.
I think right now we are at the
very, very early stages of this,
because nobody has defined it yet...
You know, you search the papers,
you search the blogs, there's no such thing as
an e-commerce discovery model, or
an e-commerce inspiration model.
So, that really, I think, will
change the way people shop online.
- Mm.
It's fascinating hearing about
all the different ways you're leveraging
machine learning for e-commerce,
and all the challenges involved.
I mean, so many challenges that
you and your team are working through.
- Yeah.
- Daily.
I was noticing, I was on your website,
your personal one, and I saw you were at
a Big Data meetup recently, and
you talked about optimizing gross
merchandise value in e-commerce.
Can you talk a little bit about
what you mean by that?
- Right, so, that's one example
where I mentioned a little earlier that
we need to adapt the traditional
models to the e-commerce domain, right?
Traditional information retrieval, or
traditional search, a
classic example is Google,
where they optimize static relevance.
So, say you want to search Barack Obama,
then you have Wikipedia, probably,
jumping to the top, and you have
some other sites, and this ranking is
basically golden for every single person,
and the notion of relevance is
sort of like built in,
or generically there.
But e-commerce, it's different.
A) let's say you search Harry Potter,
and you want to buy
some magical sticker or
something, and I search Harry Potter,
I want to buy a T-shirt, so the
notion of relevance is personalized,
in general, in the e-commerce search,
and B) is that for the e-commerce side,
I mean, relevance is one way to
look at the things, but we optimize
revenue, which is called
gross merchandise value.
It's basically...
You can think of that as
expanding your revenue.
We want to optimize when
people search things,
so it's not only that we want to provide the
most relevant result, but also
the result that can generate the most revenue.
So then, we need to model how likely
you are to click on that thing, and
after you click on that thing,
how likely you are going
to purchase that thing,
and we also need to take
the price into account.
Okay, do we recommend the things that
have a higher conversion
rate but a lower price?
Or a low conversion rate
but a very high price?
Right, so you see all these trade-offs
and all these compromises that
we need to make, such that
we adapt the traditional model to
optimize incremental revenue
in the e-commerce setup.
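The trade-off described above (click likelihood, purchase likelihood after a click, and price) can be written as an expected-value score: P(click) × P(purchase | click) × price. The Python sketch below ranks two made-up listings that way. The function names, the dictionary keys, and every number are hypothetical; a production system would get the probabilities from trained models and would likely blend this score with relevance rather than sort on it alone.

```python
def expected_gmv(p_click, p_purchase_given_click, price):
    """Expected merchandise value of showing one item once."""
    return p_click * p_purchase_given_click * price


def rank_by_expected_gmv(items):
    """Sort items so the highest expected value comes first."""
    return sorted(
        items,
        key=lambda it: expected_gmv(
            it["p_click"], it["p_purchase_given_click"], it["price"]
        ),
        reverse=True,
    )


# Made-up numbers: a cheap item that converts well vs. an expensive one that rarely does.
items = [
    {"name": "sticker", "p_click": 0.10, "p_purchase_given_click": 0.20, "price": 15.0},
    {"name": "painting", "p_click": 0.05, "p_purchase_given_click": 0.02, "price": 400.0},
]
ranked = rank_by_expected_gmv(items)
print([it["name"] for it in ranked])
```

Here the sticker is worth about $0.30 per impression and the painting about $0.40, so the low-conversion, high-price item ranks first. Small changes in either probability flip the order, which is why the estimates need to be measured rather than assumed.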
- Mm.
That's amazing, I never
even thought about...
(laughter)
How you're placing products, or
recommending to different people based on,
"This has a higher conversion rate but
lower cost, but this item generates more
revenue for the company,
but lower conversion rate."
Are you doing just a lot of testing?
- Yeah, we do hundreds of A/B tests.
Offline, we also do a lot of testing,
to make sure that all the algorithms
or all the models we put out there
have measurable effects, right?
We know, for every single one we
put out there, what's the incremental
revenue it's generating, what's the
incremental user engagement,
that kind of thing, so, yeah.
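To make "measurable effects" concrete, here is a minimal Python sketch of one standard way to read an A/B test on conversion rate: a two-proportion z-test using the pooled standard error. The interview does not say which statistical test Etsy uses, so the choice of test, the function name, and the visitor and conversion counts are all assumptions for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control converts 200 of 10,000 visitors, variant 260 of 10,000.
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the lift is unlikely to be noise. Revenue per visitor is usually much more skewed than conversion, so it often calls for a different test or a bootstrap.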
- So, one of the questions I always
love to ask data science leaders is,
because we have a lot of people in
our community that are looking to
get jobs in data science, and so
I'm always curious, when you're
hiring someone for your team, what
skill sets and maybe even personality
types are important to you for
someone who's going to be good to
work in a machine learning team
specialized in e-commerce?
- Right, so, I also
always get such questions,
going to meetups and conferences and so on.
I think I want to
emphasize something that's
probably not super emphasized.
So, one is the ability to formulate the
real-world problems into
machine learning setups.
A lot of students, a lot of people who are
very interested in the field, they
tend to think machine learning or
data science is a basket of models.
A basket of techniques, and "I need to
learn these 20 models, I
need to learn these five
programming languages,"
and so on and so forth.
Those are definitely very important.
Those are hard skills that
you need to have, right?
But for us, one very, very important
thing is that, because we talk to
product managers, we talk to designers.
They are not necessarily folks with
machine learning and data science backgrounds.
So then, we have to translate, if you
like this word, translate whatever
their requirements and the way they
think into machine learning setups.
This is a very, very
difficult skill, actually.
Because there are too many possibilities.
Like, one scenario you
can translate into
five different setups, and all these
five different setups might mean
different things, and may have
different consequences,
and so on and so forth.
So, how you think about this
is very, very important for us.
I think it's key for data scientists,
because this is where this kind of
scientist, or science part
is really taking place.
So, that's one very, very important
skill set that we are looking for.
The other part is very similar to this.
It is communication skills, right?
So, again, you invented
this fantastic model.
You sort of discovered these
really good solutions,
but how can you communicate
with the stakeholders?
Again, we are talking
about stakeholders that
have very diverse backgrounds, like
product managers, designers,
company executives,
all sides, like students,
and so on and so forth.
So, how can you make sure the things that
you put up there can be summarized in
plain English words, right?
So, this is a very, very important skill as
data scientists grow, and I think
it's going to sort of
help them along the way.
- Yeah, I think the soft skills
right there are so key, right, because
if you can't communicate it well,
you're not gonna get buy-in, or it's
not gonna be very easy to sell
it within the organization, and yeah.
So, I'm glad you touched on that,
because a lot of times people will
focus on the hard skills: the models
and the background in stats, or
the different programming languages.
But, to your point, to get anything
done in an organization, you gotta
have that soft side.
- Yeah, so, right now
the hard side is already
emphasized enough, so I
think we sort of agree
on what the kind of hard sides are.
But the way I look at this is,
I see more successful
data scientists are...
They have much more mature soft skills.
They can maneuver inside
the organization, and
how they can really put data science and
machine learning as a driving force in
the organizations, so that's why
I've emphasized the softer skills.
- Before we end, I always
like to ask a series of
questions, and the first one is, what is
your favorite programming language?
- Right.
(laughter)
- It's like children, it's like children.
- Right, right, right.
So, I like Python quite a bit.
I think it's a very flexible and
a very good tool for data science.
- Okay.
And last question is,
what advice do you have
for our community who is interested in
getting started in a data science career?
- Yeah, so, one piece of advice I would have
is to have patience and keep learning.
I want to share a very short, good example:
we recently had a candidate
apply for our full-time
data scientist job, and that person
actually is from Juilliard, the music school.
- Really?
- His major, actually, he's getting a
master's in piano performance,
and he has a lot of...
All his reference letters are from
the Lincoln Performance Center, right.
So, I actually had to send an email to
this guy that said, "Look, obviously
you're not the data scientist role that
we're looking for, but if you really
think you have a passion about that,"
because this person also attached his
GitHub repository in his resume, and
obviously this person has a bimodal
kind of interest, alright, so daytime,
probably, he's a musician, but
free-time as a data scientist.
So, I had sent a personal email to
this person, encouraging
him to pursue the way.
So, that's where I want to give the
folks that advice, alright, so
even though, today, you may not
really tap into this industry, but
just keep your interest in place,
and one day, I think, there
is some good outcome of it.
- That's a cool story,
I can't believe that...
Somebody who obviously
is going to Juilliard
and getting a master's in piano,
that's on another level.
- Yeah, exactly.
- Super smart, complex thinker, and
then actually wanting
to pursue data science.
That is amazing.
(laughter)
- I was shocked when I
looked at that resume.
- That's brilliant.
Okay, before we end, where can
everyone learn more about you?
- Where?
- Yeah, what website, or...
If someone wants to connect with you.
- I have a personal website, so
just search my name, and that's
basically the top one from Google results.
So there, you can actually check out
what we're looking for, like the
job description and so on and so forth,
and we also list a bunch of papers,
blog posts that we post out there.
- Awesome.
And I'll make sure to put links to
your LinkedIn profile, so
people can follow you there.
And also links to your
website on our blog,
and for those that are
listening to the podcast,
the short url is just ex.pn/datatalk40,
and that'll bring you over to the
website where we'll have this interview
in video format along with the
podcast episode, and a full transcription,
and links to where you
can connect with Liangjie.
So, thank you so much, Dr. Hong.
- No problem.
- Fascinating talking with you,
and I hope you have a great week.
- Sure, thank you.
- Alright, take care.
We'll see you all next week on Data Talk.
