Thank you so much for the inspiring talk, Dani.
Hopefully we can learn a lot from it as well.
Building on the many points that Dani discussed,
I want to go straight to the theme: what is the research potential at Bukalapak?
As Dani mentioned before, when we have a large amount of data, we can put learning systems to use on it.
And here, these are the things we have in Bukalapak.
One thing in Bukalapak: from the beginning, our principle has been to just collect the data first, even though we cannot use it yet.
Take transactions, for example: we have a lot of them, tens of millions of transaction records.
And every transaction has rich context.
So we know not only how much each transaction was worth and who it was between, but also from which city to which city,
for what type of goods, at what price, what the delivery cost was, and even how long the item took to arrive.
All of this data has great potential to be mined to build, for example, an AI that can make predictions and help our buyers with purchasing decisions.
Then, apart from the transaction data... can we go to the next slide?
We also have product data, and the product data is much bigger.
If the transaction data is in the tens of millions, the product data can reach hundreds of millions.
These are all products that have been uploaded to Bukalapak.
And every product has, for example, an image, a product description, a title, a price, and categories.
So it can actually be considered a gigantic dataset.
Just imagine, we have a dataset of, for instance, images that have been labeled with categories such as mobile phone or laptop.
And we can really build an AI that makes predictions from the images, object recognition, to help our sellers find something that fits better with what they will sell.
And this is now the most widely used, this and our data fixion; there will be one example of how we started building AI from here. Can we go to the next slide?
Another example, well this is related to the dimension of natural language understanding.
Indeed, we still have a lot of homework to do for Bahasa Indonesia compared to English, which already has good support for natural language understanding.
But we also have very big data here, we have hundreds of millions of chat messages in Bukalapak.
So this is everything that has ever circulated on our platform.
And everything is in natural language because people are chatting to the seller; “Hi, I’m looking for this item, is there any stock left? Do you have this one?”
We can really use those to build a system that can interpret all kinds of data,
even to answer high-level questions such as: what kind of buyer language style do people like?
That's a high-level question, right? And we can build it based on the data we have.
This is a highlight too: it's not just static data but also traffic data, which is very large behavioral data.
We have more than 1.2 billion views on the Bukalapak system, from mobile phones, from the web, and from desktop.
And every one of these behaviors has rich context too, for example, which category?
From this we can build an AI system that is truly beneficial for our users.
I'll give you a few examples; this is a live example of AI that already exists in Bukalapak.
We have product recommendation. Why did I bring this? We have several active AI systems, but this is the one we can share the most about in terms of impact.
We used to have product recommendations like this, based on Elasticsearch.
Elasticsearch is what powers our search, and it can find products similar to the one being viewed.
It searches based on similar titles or, for example, similar descriptions.
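To illustrate the title-based similarity described above, here is a minimal sketch in plain Python. It does not use Elasticsearch and is not Bukalapak's implementation; it assumes a simple token-overlap (Jaccard) score, and the catalog and the function names are hypothetical, invented for the example.

```python
import re

def tokens(text):
    """Lowercase a title and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Token overlap between two titles: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def similar_by_title(current, catalog, k=3):
    """Return the k catalog titles with the highest overlap with `current`."""
    ranked = sorted(catalog, key=lambda t: jaccard(current, t), reverse=True)
    return ranked[:k]

# Hypothetical catalog, invented for illustration only.
CATALOG = [
    "Portable sewing machine mini",
    "Sewing machine portable mini white",
    "Heavy duty sewing machine for beginners",
    "Sewing thread set 24 colors",
    "Laptop sleeve 14 inch",
]

top = similar_by_title("Mini portable sewing machine", CATALOG, k=2)
```

With this kind of scoring, the top results are near-duplicates of the product already on screen, which is the limitation the speaker goes on to describe.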
And here it also produces the same kind of products, but this is boring, right?
So boring, actually. For instance, if I as a buyer want to find other products, I’d rather search.
I will get the search results; no need for this.
A good recommendation, by contrast, should be able to provide inspiration and offer alternatives.
I want to buy this, oh there is another one, this one is better, okay I’ll buy this instead.
Now the idea is, we have very large behavioral data, very large traffic data, where we can build an AI from this behavioral data to help our users.
And this is what we built, we built an AI for product recommendations from all the data we have, and this is an example of this AI.
Here you can see the before and after; quite different, right? If, for example, we offer other alternatives to the sewing machine here,
people can see there are other models that are better for beginners, for example, or other models that fit their budget better.
So this AI set out to transform this section from merely displaying similar products into one that inspires our users.
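One common way to build recommendations from behavioral data is to count which products are viewed together in the same session. The sketch below shows that idea only; the talk does not say which method Bukalapak uses, and the session data, product IDs, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_coview_counts(sessions):
    """Count how often each pair of products is viewed in the same session."""
    counts = defaultdict(Counter)
    for viewed in sessions:
        # De-duplicate within a session so repeat views count once.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(product_id, counts, k=3):
    """Return up to k products most often co-viewed with product_id."""
    neighbours = counts.get(product_id, Counter())
    # Sort by count (descending), then by ID so ties are deterministic.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pid for pid, _ in ranked[:k]]

# Hypothetical browsing sessions, invented for illustration only.
SESSIONS = [
    ["sewing_basic", "sewing_beginner", "sewing_budget"],
    ["sewing_basic", "sewing_beginner"],
    ["sewing_basic", "sewing_budget", "thread_set"],
    ["sewing_basic", "sewing_beginner", "thread_set"],
    ["laptop_sleeve", "thread_set"],
]

coviews = build_coview_counts(SESSIONS)
suggestions = recommend("sewing_basic", coviews, k=2)
```

Unlike title matching, this ranks products by what other shoppers actually looked at, so alternatives with quite different titles can surface.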
And this AI system generates 50 billion rupiah in GMV per month. So this is an example of AI in Bukalapak,
even though we are still in the foundation-building phase and are still establishing a strong division.
But, as is our tradition, we do trial and error, we A/B test, we look at the results, and several systems have already gone live and generate value.
That’s why we want a full commitment to establishing a very strong research team in Bukalapak,
because we have gone beyond the theoretical phase; we have proven that we can generate value from this.
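The A/B testing mentioned above can be illustrated with a two-proportion z-test, a standard way to check whether a variant's conversion rate differs from the control's. This is a sketch under assumptions: the talk does not describe how Bukalapak evaluates its tests, and the user and conversion counts below are made up.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate (B vs. A).

    Returns (z, p_value). Uses the pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: control (old widget) vs. variant (new recommender).
z, p_value = two_proportion_z(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
```

A small p-value suggests the difference is unlikely to be noise; in practice the metric, sample size, and decision threshold would be fixed before the test starts.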
So that was a little sharing from Dani and me.
So basically this has two aspects, right? First, Google DeepMind has already been building something very advanced, world class.
We have enormous potential, we have big data, and we have started moving step by step in that direction. Dani mentioned earlier structure and good organization:
how do we structure research so that it becomes more advanced than other research?
And then there’s the matter of focusing in a certain direction. Maybe Dani can elaborate on what a good structure looks like to him.
Dani: In my opinion, there are two options if we want to build a research division. The first is a hybrid method,
where the research division is integrated with the teams that require this technology.
So a researcher can sometimes also be an engineer, and an engineer can sometimes also be a researcher.
Let’s just call it a hybrid method, in which there are small AI teams in the various groups that require this technology. That's one option.
The other alternative, which I personally think is better, is to create a dedicated research division whose work is just to produce new technologies.
This research division can do research that may not be integrated into any product in the next 3 months to 1 year,
but they are still given the opportunity: oh, we might be able to use this technology in the next 5 years.
Because we never know; sometimes it comes faster than we expected, and the next thing we know, we’re discovering something completely new.
This division’s only task is doing research; there are also teams that help transfer the technology
produced by this research division so it can be applied to the products.
So there are 3 divisions; one is the research team, which can consist of research scientists and research engineers,
whose main task is to work with the scientists to implement ideas.
I could also act as an engineer myself, but there are many who are stronger than me at implementation,
especially when it has to scale to the tens of thousands.
So it’s good if the researchers get some support; maybe the engineers can pitch in ideas to make progress faster.
You can actually implement it yourself at a small scale, but when it is about to be evaluated at product scale,
there must be a team that is used to dealing with these kinds of problems.
And then the second one is the group that takes this technology and helps deploy it to the products.
I think this kind of structure is clearer: everyone has their own role and everyone is hired to do what they really want to do,
and the technology transfer is also easier to understand than if everything is combined,
which makes it a bit blurry and can even obstruct innovation.
Then the research becomes really short-term: maybe we can improve the recommendation system for now,
but in the future there is no long-term vision if we lose the groups that really just do exploratory research. That's my personal opinion.
Ibrahim: Thank you, thank you. So I was thinking, I'm kind of curious: when structuring these groups, there are basically 2 approaches we can take.
First, each group has expertise in a certain knowledge area; for example, there is a group that is expert in computer vision
and a group that is expert in NLP/NLU. Or is it better to structure per problem domain?
For example, there is a domain about how we can assist our buyers, and a domain about how we understand more about our customers.
In your view, which of these 2 approaches is best?
Dani: Between the two?
Ibrahim: Between grouping by problem domain and grouping by knowledge domain.
Dani: It depends on how big the group is. I think it’s okay if the research team is made into one; it’s just that the technology transfer is better organized according to the problem domain.
Take the group that picks up the technology: for instance, alongside the research division of Bukalapak,
there is also an applied research group that takes the technology, because when it goes directly into the product,
it’s better handled by the ones who understand the product, with the assistance of people in the research division.
So the people in the basic research division assist the people in the applied research division with the implementation in the product.
It would be better if the people in applied research also have expertise in the product.
It would be fine for the people in basic research to be made into one group because the technology is similar, unless the group is big, and I mean huge.
Ibrahim: Okay, how big is big?
Dani: Around 100-200. One team can maybe handle 10-15 people who are mutually interested in the same field,
for example a language team, a vision team, a reinforcement learning team, and so on.
If you only have 10, 15, or 20 people, separate teams end up feeling exclusive, because you haven’t got a lot of crew yet.
Ibrahim: You mentioned collaboration with the product side, right? From your own perspective as a machine learning scientist,
what kind of support and collaboration do you expect from the product side? Because most of the colleagues present today are from the product development side.
Dani: It’s more convenient if we know what kind of technology the products need, so that people know how to assist.
The collaborations that usually work come from people who are really interested in the product.
Usually the scientists lack an understanding of the product data, its scale, and so on.
So smooth communication from the beginning is a must. We as scientists must think about which model we should use,
because it depends on the product too: we cannot just apply the biggest models, because if prediction takes a long time, they are not suitable for some products.
So there must be smooth communication from the beginning, and the product team must be really interested in using AI technology in their products,
so that the AI people will be enthusiastic about doing their job, compared to working with those who are half-hearted because they’re not interested.
Ibrahim: Interesting. You discussed sitting together with the product side and generating ideas,
but that was before the people from research jump in and are able to see what they can build.
Maybe you can share the flow, from sitting together until an AI product finally goes live in production.
Dani: There are actually two types. In the first, the research division produces something. For instance, we have several research groups,
and one of them is working on synthetic speech and suddenly comes up with a great model, better than any model in the world; it could happen.
The project starts from their own interest in something; they develop it, they pursue it, and when it’s ready they can deploy it,
since they have, for instance, the best model in the whole world. That is the first flow.
The second starts from the product itself. From the beginning, the product team has discovered,
for instance, that if they automated this part it would be easier for them to get a lot of clicks.
They discuss with us what they’re trying to do with the product, the product team elaborates on the product specifications,
and the AI division can assess whether it can be automated or not.
That’s how it usually starts, and from then on the flow is straightforward: we can provide the data and process it
until it can be used by the AI team, and then the AI team does trial and error; if it fails, they iterate over and over until they see improvement.
Ibrahim: Typically, how long does it take? I understand it depends on the complexity, but what’s the typical time frame?
Dani: About 6 months from the inception of the idea until you can actually see the product, if you work really hard. I might check on you in about 6 months.
Ibrahim: Thank you so much for sharing, Dani.
