[MUSIC PLAYING]
SPEAKER 1: Ladies and
gentlemen, good afternoon.
Please welcome Sudhir Hasbe.
[APPLAUSE]
SUDHIR HASBE: Hello, everybody.
I'm Sudhir Hasbe.
I'm the Director of Product
for Data Analytics in GCP.
Thank you for coming
to this session.
I know it's just after
lunch, or around lunchtime,
so I hope I'm not going to bore
you too much in this session.
We'll keep it exciting.
So let's get started
with the session.
The key thing is
most of the folks
in the audience or outside
know Google from the search box.
And the first experience
people have with Google is,
go to the search box,
search for a term,
and you get some
interesting results
that you're looking for.
Actually, behind
the scenes, when
you search for anything
on the search box,
there's a lot of infrastructure.
There's a lot of
analytics that's going on.
We are one of the largest
organizations that
collects massive
amounts of data,
and analyzes it, and uses that.
It's not just search, though.
If you look at it, we have
seven products with more than
a billion users each.
And I know I heard in
keynote earlier today we
may have an eighth
one with Drive,
which will hit a billion
users, active users monthly,
going forward.
The key here is big
data is in our DNA.
We leverage data.
We leverage machine
learning for giving those
amazing experiences in
all of these products.
And the way we do that is
through internal technology
that we have built. If you
think about Dremel, which
we use internally for
all our analytics,
BigQuery is actually
an enterprise version
of that same piece
of technology that's
available for enterprises.
So what we are doing here
is bringing the technology
that we have invested
over years and making
it available for our
customers in the Cloud.
If you think about it, data
across the world is growing.
It will be 163
zettabytes by 2025.
And as datasets grow
within organizations,
you want to have infrastructure.
You want to have analytics
capability that can actually
process that amount of data.
Just a data point--
one of our customers
really, when they
started their data
collection and their
streaming analytics pipelines,
they used to collect like
50 million events a day.
Now they are up to five
billion within 18 months.
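To put that growth figure in perspective, here is a small arithmetic sketch. It assumes the volume compounded at a steady monthly rate, which the talk does not claim; only the two event counts and the 18-month span come from the speaker.

```python
import math

# Figures quoted in the talk.
start_events_per_day = 50_000_000
end_events_per_day = 5_000_000_000
months = 18

# Overall multiplier over the whole period.
growth_factor = end_events_per_day / start_events_per_day

# Assumption: steady compound growth, so the monthly multiplier is the 18th root.
monthly_factor = growth_factor ** (1 / months)

# Time for the volume to double at that steady rate.
doubling_months = math.log(2) / math.log(monthly_factor)

print(f"overall growth: {growth_factor:.0f}x")
print(f"average monthly growth: {(monthly_factor - 1) * 100:.1f}%")
print(f"volume doubles roughly every {doubling_months:.1f} months")
```

Under that assumption the volume grows by a bit under 30% per month, which is the kind of curve that makes manual capacity planning hard to keep up with.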
So what happens is, as you start
seeing value from your data,
you will collect more and more.
But you want ability
and infrastructure
that can actually seamlessly
scale as your needs grow
within your organizations.
Similarly, there
was a survey done--
MIT survey was done about
machine learning and AI,
how many customers are
using it, how it was going.
The key thing was
the organizations
that are actually using AI
are able to make 2x faster
decisions, 5x faster decisions.
They're able to do two times
more data-driven decisions
within their
organizations, and also
3x faster execution on the
decisions that they are making.
So overall, if you think
about it, machine learning,
AI is going to be critical
for all your organizations.
And then the key
point though is,
if your organization is
not good at analytics,
it's never going
to be great at AI.
So the first thing,
the foundation--
you have to have great
foundation on analytics data.
How do you process data?
How do you analyze data?
And then you can think
about how you go ahead
and do machine learning on
top of that data, leverage
AI for differentiation.
If you look at the numbers,
though, less than 1%
of world's unstructured
data is actually
being used for analytics
and analysis today.
Less than 50% of structured
data is analyzed today
within organizations.
So what is our approach to this?
If you look at Google,
what are we doing?
There are four key things.
One, we are focusing on
infrastructure or solutions
that allow you to go out
and focus on analytics, not
infrastructure.
We'll talk more about that.
The second is developing
comprehensive solutions.
So we know customers need the
whole portfolio of solutions
to go ahead and do analysis.
We are focusing on
end-to-end, all the components
that you need.
End-to-end ML lifecycle, and
we'll look at that quickly.
And then innovative and open--
being an open cloud,
making sure you
have options with
open source software
so that you can go ahead and
run the workloads the way you
want to run is super
critical for us.
And we have a lot of
investments that we do
in making sure we promote that.
Let's talk about what does
focusing on analytics means
and not on infrastructure.
If you think about us--
if you're doing analysis
with BigQuery, which
is our cloud scale
[INAUDIBLE] product, you can
get started within seconds.
You basically can
bring your datasets
and start analyzing instantly.
The key thing is, if you're
not using a serverless product
like BigQuery or
Dataflow, you will
have to worry about
monitoring, you
will have to worry
about performance
tuning, infrastructure.
How many nodes do I need?
What kind of cluster
size do I need?
How do you performance tune?
None of that is going to
be a problem if you're
focused on serverless.
So that's what our focus is.
We want to provide you
with infrastructure
that automatically scales, gives
you ability to do analysis,
and you don't have to
worry about anything.
Just bring your data,
start doing analysis on it.
Let's talk about
the second point--
end-to-end
comprehensive solutions.
The big thing is, if you
think about analysis,
it actually starts
with ingestion.
How do I get my data?
The first step is, how do
I get my streaming data?
We have lots of customers using
massive amount of streaming
events that are coming to them.
And how do you scale this
infrastructure seamlessly?
So Cloud Pub/Sub is
our solution that
allows you to do millions
of events per second
that you can collect
and do analysis on them.
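The publish/subscribe pattern behind this can be sketched in a few lines. The class below is a toy in-memory stand-in written for illustration only; it is not the Cloud Pub/Sub client library, and the names `InMemoryTopic`, `analytics` and `archive` are hypothetical.

```python
from collections import deque


class InMemoryTopic:
    """Toy topic: every subscription receives its own copy of each message."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name):
        # Register a subscription; messages published earlier are not replayed.
        self._subscriptions.setdefault(name, deque())

    def publish(self, message):
        # Fan the message out to every registered subscription.
        for queue in self._subscriptions.values():
            queue.append(message)

    def pull(self, name, max_messages=10):
        # Hand back up to max_messages pending messages, oldest first.
        queue = self._subscriptions[name]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


topic = InMemoryTopic()
topic.subscribe("analytics")
topic.subscribe("archive")
for event_id in range(3):
    topic.publish({"event_id": event_id})

analytics_batch = topic.pull("analytics", max_messages=2)
archive_batch = topic.pull("archive")
print(len(analytics_batch), len(archive_batch))
```

The point of the pattern is that the producer never needs to know who consumes the events, so each consumer can read at its own pace.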
Similarly, a lot
of our customers
use different Google
products, like, for example,
AdWords and DoubleClick
and all of those,
for advertising purposes.
What we have done is we've made
it really easy for customers
who want to use Google Cloud
for marketing analytics.
Within a few clicks,
you can literally
go ahead and get your AdWords
data, your DoubleClick data,
into BigQuery for
analysis seamlessly.
Similarly, IoT is
super critical.
You saw some amazing
announcements yesterday morning
with Edge TPU and on Cloud IoT Core.
We have Cloud IoT Core.
If you are interested
in collecting IoT data,
you can seamlessly
collect that and actually
leverage the whole
platform from there.
So we've covered the ingestion.
If you think about reliable
data processing and streaming
pipelines, we have multiple
options for our customers.
One is Dataflow with Beam.
So Beam is an open
source SDK for you
to build batch and
streaming pipelines
with the same programming model.
You can use Dataflow,
which allows
you to automatically build
large scale data processing
pipelines.
Great for developers.
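The idea of one programming model for batch and streaming can be illustrated without the Beam SDK. The sketch below is plain Python, not Beam's API: a single windowed-count function that accepts either a finite list or a generator. It assumes events arrive in timestamp order, which real streaming systems cannot rely on.

```python
from collections import Counter


def count_per_window(events, window_seconds=60):
    """Yield (window_start, Counter) for fixed windows over (timestamp, key) events.

    Assumes events arrive in timestamp order (no late data).
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is not None and window_start != current_start:
            # A later window has started, so the previous one is complete.
            yield current_start, counts
            counts = Counter()
        current_start = window_start
        counts[key] += 1
    if current_start is not None:
        # Flush the final window when the input ends.
        yield current_start, counts


batch = [(0, "click"), (30, "view"), (61, "click"), (62, "click")]


def live_stream():
    # Hypothetical stand-in for an unbounded source.
    yield from batch


batch_result = list(count_per_window(batch))
stream_result = list(count_per_window(live_stream()))
print(batch_result == stream_result)
```

The same function body serves both cases; only the source changes.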
But we also realize that
a lot of our customers
have in-house capabilities
with Spark and Hadoop.
And they love Spark.
I used to use Spark before
in my previous role.
So I love Spark too.
So for that, we have managed
Hadoop and Spark environment
with Dataproc.
And then for
analysts-- and we know
a lot of our analyst community,
which is familiar with data,
also wants to do
raw data wrangling,
also wants to do
data preparation,
so that they know best
before the data is
used what kind of
analysis they want
to do and clean up the data.
So we have Cloud Dataprep
for that audience.
After that, once
your data is ready,
you want to do
analysis at scale.
You want to build
your data lakes.
You can actually use GCS,
Google Cloud Storage,
to go ahead and put all your
structured, unstructured data,
and then process it.
Or you can use Cloud Scale
Data Warehouse with BigQuery
to put all this data
at petabyte scale,
and then do analysis
on top of it.
And once you have analysis
platform ready for you, then
for advanced analytics,
you can use ML Engine,
you can use TensorFlow.
For visualizations, you
can use Data Studio.
We'll see some of the
new enhancements we
are making available on that.
And also Sheets, a lot of
our customers, especially
G Suite customers who
use Sheets every day--
we're making some
enhancements on that
to easily make data from
BigQuery and other places
available today.
So that's there.
If you think about
ML lifecycle--
there's the whole
lifecycle, right?
ML lifecycle is you
start from ingestion.
You have to explore.
You have to prepare.
You have to pre-process.
Then you start the process of
training, hypertuning, testing,
and predictions.
There's a whole lifecycle
that has to happen.
And what we provide
is a whole suite
of products that
allow you to go do
every one of those processes.
But what we are also doing
is making it very easy
for you to do machine learning.
And you heard some
of the announcements
we did earlier today.
And I will go a bit
more detail into that.
And actually, we have
an amazing demo for you
later in the session.
From a customer
momentum perspective,
that's our portfolio.
We are seeing tremendous
growth in the data analytics
side with our customers.
A lot of customers using the
whole portfolio across industry
verticals, from
financial services
to retail, from gaming
to media entertainment,
all across the
board, manufacturing.
All across the board, we
are seeing tremendous growth
in our data analytics
capability being used
in different organizations.
And it's across the different
sizes of data sets, too.
So you heard earlier
today, [INAUDIBLE]
talked about moving their
large scale Hadoop deployment.
I think it was mentioned
300 petabytes of data
being moved into GCP and
running that scale of cluster.
And the highlight was like
our network and our capability
that we provide with our
networking stack allows
you to have this
decoupling of storage
and compute that really
makes it easy to manage
the whole environment,
reduce the costs and all.
So we are seeing
tremendous growth
with folks like Twitter,
Yahoos of the world,
but also a lot of
enterprise customers
that are using the platform.
So with that, let me
invite Aireen Omar, who's
the Deputy CEO for
AirAsia, on the stage
to talk more about this.
[APPLAUSE]
SUDHIR HASBE: Hi,
Aireen, how are you?
AIREEN OMAR: Hi, thank you.
SUDHIR HASBE: Can you just
do a quick introduction
about you, your role, and tell
us a bit more about AirAsia.
AIREEN OMAR: Sure.
AirAsia is the largest
low-cost carrier in Asia.
So we started back in 2001
with just two aircraft.
We carried about
20,000 passengers.
And now, 16 years later,
we have over 230 aircraft.
SUDHIR HASBE: Wow.
AIREEN OMAR: And
over the years, we
have carried over 500
million passengers.
And this year, we're looking at
about 90 million passengers
that we're carrying per year.
So we've grown very fast.
We have bases in Southeast Asia.
Southeast Asia, ASEAN,
is our backyard.
And the reason why we are
focused in building that market
base, because it has
over 600 million
population base, the third
largest after China and India.
And it has a very
young population base
with about a median age
of 28, 29 years old.
50% of the population
is under 30.
70% is under 40.
50% of the population
live in urban areas.
And it's one of
the fastest growing
GDP in the world and one of the
fastest growing middle income
earners in the world.
So this is where, as a low-cost
carrier, there's a fantastic opportunity
to grow with the population.
And if you look at the
geographical landscape
of Southeast Asia, it's
surrounded by water.
And that's where we feel there's
a lot of opportunity to learn
about the population, to
grow further, and build
other business
opportunities, apart
from just running an airline.
SUDHIR HASBE: So
tremendous growth
within two to three years from
two planes to 230, I guess,
now.
AIREEN OMAR: Yeah.
SUDHIR HASBE: What were the
key challenges you were facing?
And then, tell us
more about what
were the business challenges.
And how are you using Google
Cloud for some of those?
AIREEN OMAR: I think
that the key challenge is
because we have operations
in various countries--
Malaysia, Thailand,
Indonesia, Philippines,
and recently in
India and also Japan.
And we're looking at
getting data from all
over, from various systems,
or so forth, basically.
So we have data coming
from our booking system.
80% of our booking goes
through our internet
and our mobile app,
unlike other airlines,
which is the other way around.
And then we have data
coming from our aircraft
and from our engines.
And we use our aircraft
in the most efficient--
and we maximize the
utilization rate.
The A320 that we use,
we fly 14 hours a day.
And we have 25 minutes to turn
around, so that we can fit in
as many sectors as we can.
So if you look at
the whole group
in terms of departing flights,
there's about 1,500 per day.
And we're looking at departing
passengers about almost 300,000
per day.
So that's a lot of
data coming there.
And it's really
important when you
are running
efficient operations,
you need it to be precise.
And you need something
that's scalable and accurate,
so that we'll be
able to understand
this data better and be able
to focus more on serving
our consumers better.
So the data that we
need is really more on,
how do we improve the consumer's
experience and the revenue
that we can get
from them, and be
able to provide the right
kind of products and offerings
for them?
And how do we use
this data to improve
the overall operational
efficiency of our operations
and so that we
reach productivity
in the most efficient
way and be able to focus
more of our efforts into
looking at the insights of not
only just our operations,
but also the behavior
of our consumers so we
can provide better product
offerings, and so forth?
SUDHIR HASBE: Got it.
And I know you use
BigQuery, and Data Studio,
and all the other
tools in Google Cloud.
Are there key metrics you
can share with us where
you have seen, really, growth
or savings that-- can you
share some of the things
with the audience?
AIREEN OMAR: Yeah.
So I'm also in charge of
digital transformation.
So the key thing is for us to
integrate all this data coming
from various sources and to
be able to combine those data
and make meaningful
algorithm out of it.
And what we have found, even
though we only probably use
less than 20% of the data
that we have already combined,
is that the conversion rate
of the revenues or consumer
has doubled.
SUDHIR HASBE: Oh, wow.
AIREEN OMAR: Every
1% conversion rate
actually increased the revenue
by about 50 million US dollars,
and so forth.
And what we have seen
also, because we're
able to predict better
in terms of operations,
in terms of maintenance,
and so forth,
we reduced the number of
aircraft on the ground.
And that means it's a better
experience for our passengers
and so forth.
And we have seen that
the cost has probably
reduced by at least 10% or so.
And that's actually quite
big in our operations
of running an airline.
SUDHIR HASBE: That's
amazing, especially
in an airline
where, as you said,
the operation cost is heavy.
So 10% saving, doubling
the conversion rate,
and you're just using
20% of the data.
AIREEN OMAR: Yes.
It's probably a little
bit less than that,
because we've just started
only a couple of years ago.
And there's a lot to
do, so it's very key
to be able to streamline
all those in BigQuery.
And it's a powerful
tool that allows
us to be scalable, and
be able to work faster,
and be more focused on the
requirements of our consumers,
and so forth.
Yeah.
SUDHIR HASBE: That's great.
Thank you.
Thanks a lot.
AIREEN OMAR: Thank you.
SUDHIR HASBE: These
are awesome results.
And I'm looking forward
to what we can do together
as you get to 20%, to
30%, to 100% of the data,
as you said, analyzing it.
AIREEN OMAR: Thank you.
SUDHIR HASBE: Thank you, Aireen.
AIREEN OMAR: Thank you.
[APPLAUSE]
SUDHIR HASBE: So
that's about AirAsia.
Let's talk about there
are four key areas
that we focus on normally
when we talk to our customers,
when they're using the
different portfolio of solutions
that we have.
One is, of course, modernizing
the data warehouses.
And we'll talk more about that.
Analyzing streaming data,
which is super critical
as organizations are collecting
massive amounts of event data
from different places--
clickstream to IoT devices--
streaming data and
processing streaming data
is super critical
in organizations.
Running open source
software and, of course,
visualizing and using the
data in a visual manner
is critical for organizations.
Let's talk about
BigQuery for a second.
BigQuery is actually a
cloud-scale data warehouse.
It's a natively built--
if you haven't read Dremel
paper, you should check it out.
It's a data warehouse
built from the ground up.
It's cloud-scale.
You can do petabyte-scale
queries within seconds.
It supports standard SQL.
You can actually get
started with it at no cost.
There is a free tier
that's available.
How many of you actually
use BigQuery here?
Great.
There are a lot of people who
don't, so my recommendation
would be you should
go check it out.
It will take you a
couple of minutes
to go actually bring your
data in and start analyzing.
As I said, completely
serverless.
You don't have to worry
about infrastructure.
Bring data in and
start analyzing.
That's the key thing.
It's highly secure.
We encrypt the data at rest.
And it's highly available.
And then, real-time streaming
is native to BigQuery.
You can stream hundreds
of thousands of events
directly into BigQuery, and
then actually analyze it
at the same time.
So that's super critical.
So one of the announcements
you heard today morning
was Rajen talking
about BigQuery ML.
The key thing in this was--
the two big challenges
we started hearing
from our customers
was, it's great to use
BigQuery-- massive amount of
data, bring all the data in.
But if you want to do
any machine learning,
you have to move that data out.
And then, if you have
seen some numbers,
like 80% of data scientists
spend time in data preparation,
moving data around, and doing
testing of the models and all,
so our thing was, how
do you reduce that time
by making machine
learning available
in the data warehouse instead of
moving data to a machine learning
engine?
Why can't I move machine
learning engine closer to data?
So that was the whole
premise of that.
The second thing
was skill set gap.
In the industry, we just
don't have that many PhD data
scientists to go do
advanced machine learning.
So our thing was,
can we just leverage
the skill our audience
already has, which is SQL,
and then make machine learning
available to them in SQL?
So that's exactly what
we have tried to do.
BigQuery ML is nothing but
a SQL-based machine learning
model creation within BigQuery.
If you have BigQuery,
you are already
using SQL to analyze data.
You have queries ready.
You understand your data.
Just write two lines
of code on top of it.
Create model, what
type of model you want.
We can auto-detect
models if you want.
And then just give us the input
and what you want to predict.
And for prediction, you're
just saying, select ML.Predict,
and you can get the
predictions out.
So that's how easy
it is to do machine
learning within BigQuery.
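The "two lines of code" described here are a `CREATE MODEL` statement wrapped around an existing query, plus a `SELECT` over `ML.PREDICT`. The sketch below only builds those SQL strings in Python so the shape is visible; it does not connect to BigQuery. The dataset, table, column and model names are hypothetical, chosen for illustration.

```python
def create_model_sql(model_name, model_type, label_column, training_query):
    """Wrap an existing SELECT in a BigQuery ML CREATE MODEL statement."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', "
        f"input_label_cols=['{label_column}']) AS\n"
        f"{training_query}"
    )


def predict_sql(model_name, input_query):
    """Build a prediction query over rows returned by input_query."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model_name}`, ({input_query}))"


# Hypothetical names: a trips table with a numeric column to predict.
training_query = (
    "SELECT distance_km, hour_of_day, duration_minutes\n"
    "FROM `demo_dataset.trips`"
)
create_sql = create_model_sql(
    model_name="demo_dataset.trip_model",
    model_type="linear_reg",  # 'logistic_reg' is the classification counterpart
    label_column="duration_minutes",
    training_query=training_query,
)
prediction_sql = predict_sql(
    "demo_dataset.trip_model",
    "SELECT distance_km, hour_of_day FROM `demo_dataset.new_trips`",
)

print(create_sql)
print(prediction_sql)
```

An analyst who already has the training query only adds the `CREATE MODEL ... OPTIONS(...) AS` header, which is why SQL skills are enough to get a first model.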
One of the things, if
you saw earlier today,
was 20th Century Fox
where they talked
about how they were able to
predict what audience are more
likely to come back to a movie,
to come back to the newer movie
that they're launching.
I want to take a different
example right now with Geotab.
So why don't I invite--
Neil, can you please
come on the stage
and help us understand
what Geotab does?
Come.
[APPLAUSE]
Thanks, Neil.
NEIL CAWSE: Nice to be here.
SUDHIR HASBE: So can you do a
quick introduction of yourself
and tell us a bit
more about Geotab?
NEIL CAWSE: Sure.
Geotab is a global leader
in vehicle telematics.
Many people ask, what
is vehicle telematics?
We have a little device that
collects data out of a vehicle.
We are in 1.2 million vehicles.
We collect all that
data, and then we
analyze it at massive scale.
So we collect information
about where the vehicle is,
how fast it's moving, how
the engine's performing,
fuel consumption information,
whether you're
going over a pothole, whether
you slammed on the brakes.
And so you can just
imagine the opportunities
that we have to
analyze that data,
to deliver results to our
customers using products
like BigQuery and machine
learning are really massive.
And so that's
really what we do.
SUDHIR HASBE: Awesome.
Can you share more about
your current existing
infrastructure?
Before you went into BigQuery
ML, what kind of technology
do you use from Google Cloud?
How does the business do?
And then your transition to
BigQuery ML, we can discuss.
NEIL CAWSE: Sure.
We think of our
relationship with Google
as our competitive advantage.
We have more than 500 servers
in GCP that process the data.
Every single piece of data
that the organization generates
is actually pushed up
into Google BigQuery.
And we're a massive user of
Google ML and TensorFlow.
We use Dataproc.
We use products like Kubernetes.
And anything that gets
announced by Google,
we very keenly look at because,
really, the benefit is--
and it's an
understated problem--
is that when first you
start collecting the data,
you have it in one place.
The next point is, if
you want to leverage ML,
you have to have that ML
close to where the data is.
Otherwise, you spend your
life just moving data around.
And so it's been a
great relationship
and a great partnership.
SUDHIR HASBE: And
I know you have
been involved with BigQuery ML
since we announced our alpha.
So I also know you have a demo.
So why don't you
tell us what you're
going to show in
the demo, and then
what audience are we targeting,
and then show us the demo then?
NEIL CAWSE: Sure, I'll do that.
Just to kind of level-set,
we do have, probably,
the most comprehensive and
largest big dataset of vehicles
in the world.
And as I mentioned before, this
data set is very, very rich.
You know, we know the ambient
air temperature, air pressures.
We know whether it's a
dangerous intersection.
We know a tremendous
amount of data.
So one of the things
that I'm going to
show you here today
is how we have
an add-in into our
standard product, our fleet
management product, but
this one's focused more
around smart city.
And what we're
going to do is we're
going to use ML to predict
outcomes for safety,
based on weather.
So I'll get to it, and I'll show
you how that all fits together
and how that works.
SUDHIR HASBE: Great, Neil.
And while you are
getting set up on that,
the key thing is-- there's
another key thing we'll
be launching today,
which is GIS alpha.
So BigQuery will
natively support
GIS capabilities,
like GIS data types,
within the data warehouse.
We'll talk more
about it a bit later.
And there's a detail
session at 3:15
that we are going to talk about,
but I will hand it over to Neil
to talk more about the demo.
NEIL CAWSE: OK, super.
So we'll get the demo up.
All right, we're up.
So what you're seeing over here
is a view inside our product.
As I mentioned before,
this is an add-in.
This is one of hundreds
of add-ins that
are available in the product.
This is one really cool
one where we're leveraging
Google ML and Google GIS--
the BigQuery's GIS features that
have just been announced here
that we just talked about--
in order to get some
really interesting data.
And this is just
starting to scratch
the surface of where
we can go with this,
and you can understand.
So what you're seeing
on the left-hand side
is a view of the dangerous
intersections in the Chicago
area, so over the
last two weeks.
And essentially, the
hotspots are areas
where it's more dangerous.
Now you ask, how could
we possibly tell that?
Well, there's about
100,000 accidents a year
happening in our
pool of vehicles.
We know where people
are slamming on brakes,
so we aggregate that data.
We can then look at where are
people having these accidents?
And where are people slamming
on brakes, or dangerous lane
changes, and
swerving, and whatnot?
So what happened is the Big
Data team, which are actually
sitting here today, what they
did is they took the data
and then they said, let's
use the public dataset that
was available in Google
BigQuery around weather data.
And so we know for a
particular date and time,
for a particular
location, what is
the weather in that location.
And they used 250
different metrics
to analyze and compute,
what can we tell about how
weather impacts safety?
And so they ran this
experiment, and I'm
going to show you
some results of that.
So let's say we drop the
temperature down to around
freezing, and let's do snow.
And I'm going to run the
predictive analysis now live.
And then what we see is
actually really interesting.
Some of the areas
that were dangerous
before are still
dangerous, but there's
been a big change
in the pattern.
And so we are seeing things
look remarkably different.
And if we zoom into
areas now, now we
can start seeing,
well, where are
those dangerous intersections?
Let's just take one
little area over here
where I'm going to zoom in to.
And we'll find that
wherever it snows,
we seem to have a dangerous
area near a school.
And so we might consider
what is happening here.
Maybe the parents are
waiting across the road
to pick up the kids.
And it's snowing,
and so the kids
are running across the road.
And so you get the circumstance.
Or perhaps vehicles
break down there.
But the point is, by leveraging
ML, by leveraging this data,
cities can now look at
what the infrastructure is
and change the way the
roads are set up in order
to keep everybody safer.
And this is really just starting
to scratch the surface of what
you can do when
you leverage such
a powerful tool, like Google
BigQuery and Google ML.
SUDHIR HASBE: Thank you, Neil.
This is awesome.
Thanks a lot.
NEIL CAWSE: Thank you.
SUDHIR HASBE: The
key thing is just
making the city smarter and
just having that kind of impact.
And you can actually do model
generation and prediction
so fast, it's just going to
expedite the whole solution
creation.
NEIL CAWSE: Absolutely.
One of the key things
was how quickly our team
was able to put this together.
There is no coding involved.
There's no Kubernetes.
There's no spinning up of
multitudes of servers.
SUDHIR HASBE: We love Kubernetes
too, but we have SQL people.
We love SQL, so.
NEIL CAWSE: All right.
Thank you.
SUDHIR HASBE: Thanks, Neil.
Thanks a lot.
[APPLAUSE]
There is actually
a session at 3:15
to go deep-dive into the Geotab
solution, the GIS capabilities.
If you're interested in
GIS data types and all,
that would be a good session
to go to later today.
Other than that, we've also
worked with our partners
to go ahead and
give an integrated
experience for the
BigQuery ML capabilities.
So Looker, for example, has
this end-to-end workflow
that they have built within
Looker where you can actually
go ahead and pull out a
dataset, see it in Looker views,
actually create a model within
that, visualize a prediction,
and then actually go ahead
and fine-tune your model
from the Looker UI itself.
So we will be working
with more partners
to bring these kind of
integrated capabilities,
so analysts who are
using these tools
can, from within
the tools, actually
leverage BigQuery
ML from these tools
and make it really easy
for creating these models,
visualizing the models, and all.
So yeah, looking forward
to this, going forward.
A couple of things
in BigQuery ML--
you have linear and logistic
regression models that
are already available.
The beta is available,
so please go try it.
Give us some more
feedback in the beta mode.
A couple of other things.
We are also announcing
Clustering beta is coming.
Again, I won't be
able to go in details
of partitioning, clustering,
key capabilities.
Just think about it this way--
you can do a petabyte-scale
query from BigQuery.
You could do it
like two years back.
You can do it now.
But with partitioning
plus clustering,
you can reduce the
cost drastically
because the queries are going
to be way more efficient.
We only access the data that
is required within that cluster
or within that partition.
So partitioning
plus clustering is
going to help you make your
queries way more efficient
and actually reduce
costs drastically
if you are using
on-demand pricing model.
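Why partitioning plus clustering cuts the data read can be shown with a toy model. The code below is a simplified illustration, not BigQuery's storage engine: rows are grouped by a partition key (date) and then by a cluster key (customer), and a filtered scan skips every group that cannot match. The column names and row counts are made up.

```python
from collections import defaultdict


def build_table(rows):
    """Group rows by partition key (date), then by cluster key (customer)."""
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        table[row["date"]][row["customer"]].append(row)
    return table


def scan(table, date=None, customer=None):
    """Return (matching_rows, rows_read), skipping groups that cannot match."""
    matches, rows_read = [], 0
    for part_date, clusters in table.items():
        if date is not None and part_date != date:
            continue  # whole partition pruned, nothing read
        for cluster_customer, block in clusters.items():
            if customer is not None and cluster_customer != customer:
                continue  # block skipped inside the partition
            rows_read += len(block)
            matches.extend(block)
    return matches, rows_read


# Made-up data: 10 days x 5 customers, one row each.
rows = [
    {"date": f"2018-07-{day:02d}", "customer": customer, "amount": day}
    for day in range(1, 11)
    for customer in ("a", "b", "c", "d", "e")
]
table = build_table(rows)

_, full_scan = scan(table)
_, pruned_scan = scan(table, date="2018-07-03", customer="b")
print(full_scan, pruned_scan)
```

With on-demand pricing the bill follows the data read, so reading one block instead of all fifty is where the saving comes from in this toy setup.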
There's a detailed session at
3:15 by Jordan Tigani today.
You should absolutely
go if you're
interested in that
topic later today.
There's some amazing demos
Jordan does in that session.
Again, as we just
quickly touched upon,
GIS alpha is available today.
The scenario that we were
hearing from our customers
was all around-- for example,
we are in Moscone Center.
Within a two-mile
radius, how many
taxis are available
in this region?
If you want to do
that kind of query,
historically, it's been
really difficult to do.
And with availability
of GIS features,
you can do that kind of query
directly inside BigQuery now.
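The radius question can be sketched in plain Python with the haversine formula. In BigQuery itself this kind of filter would be expressed with geography functions such as `ST_GEOGPOINT` and `ST_DWITHIN` (which takes a distance in meters); the code below is only an illustration of the computation. The Moscone Center coordinates are approximate and the taxi positions are invented.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate location of Moscone Center (an assumption for this sketch).
MOSCONE = (37.7843, -122.4007)

# Invented taxi positions.
taxis = {
    "taxi_1": (37.7880, -122.4075),  # a few blocks away
    "taxi_2": (37.8044, -122.2712),  # across the bay
    "taxi_3": (37.7694, -122.4862),  # western side of the city
}

nearby = sorted(
    name
    for name, (lat, lon) in taxis.items()
    if haversine_miles(MOSCONE[0], MOSCONE[1], lat, lon) <= 2.0
)
print(nearby)
```

Doing this per row in application code does not scale to a warehouse-sized table, which is the gap native geography types are meant to close.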
We have some new connectors
that are going live.
One of the other key
things we are launching
is our new BigQuery
UI, which will give you
new capabilities. It looks better,
and it also has some
one-click experiences
to go into Data Studio
and do visualization.
And then we'll
quickly take a look
at Google Sheets integration
that's available.
So this is an example.
Along with the GIS capability
of the core data types
and their ability
to query, we also
have a visual tool
that we are launching,
which allows you to go ahead
and visually fire a query
and look at the
points on the map.
Because if you're doing
a query around, hey,
show me all the points that
are in a two-mile radius
of another point, how are
you going to visualize it?
It's really difficult. So we
worked with our Earth Engine
team at Google and
have this visual tool
that gives you the ability
to visualize that data.
So please take a look at that.
Again with Sheets, as I
said, a lot of our customers
use Sheets for analysis
and move data into it.
Now with Google Sheets, you
have a connector for BigQuery.
From within there,
you can just click,
connect to your
BigQuery instance,
pull data in, and
start analyzing that,
visualizing it out-of-the-box.
So one of the other
key capabilities--
making it easy to go do
that analysis, connect
to the datasets, and all.
So that's been one of our
big themes for this year.
So that's BigQuery--
how do you make
it easy to go ahead and
analyze data on BigQuery?
Streaming analytics-- I
touched upon it earlier.
We have a whole
portfolio of products
that allow you to do that, like
you can collect millions of
events using Pub/Sub.
Dataflow allows you to do
large scale data processing.
You can use Cloud ML or BigQuery
to go ahead and do analysis
on top of that data.
Brightcove is one of the
best examples of this.
They literally collect 8,500
years of video per month.
That's seven billion events
a day is what they collect.
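A quick back-of-the-envelope conversion of those figures, as a sketch. The two input numbers are the ones quoted in the talk; the even spread over the day and the 365-day year are simplifying assumptions.

```python
# Figures quoted in the talk.
events_per_day = 7_000_000_000
video_years_per_month = 8_500

# Assumption: events are spread evenly over the day.
seconds_per_day = 24 * 60 * 60
events_per_second = events_per_day / seconds_per_day

# Assumption: a 365-day year when converting years of video to hours.
video_hours_per_month = video_years_per_month * 365 * 24

print(f"{events_per_second:,.0f} events per second on average")
print(f"{video_hours_per_month:,} hours of video per month")
```

The average works out to roughly eighty thousand events per second, and real traffic will peak above the average.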
And they use
Dataflow plus Pub/Sub
to go ahead and analyze
those videos and leverage
some great insights from it.
But it's not just
the Brightcove.
Traveloka uses it for
e-commerce, clickstream
collection, and analyzing that.
Qubit is another example
where, in retail, they are
doing point-of-sale analysis.
Amazing scenarios with
Nintendo in in-game analysis,
in-game utilization
of consumables.
And then also Nest for IoT data.
So any kind of large-scale event
collection processing
analytics, you can use
Pub/Sub, Dataflow for that.
We are actually announcing a
few enhancements in that space.
One of the big things that
we are doing is Python.
Python is one of the fastest
growing languages in GitHub,
if you just look at all
of the commits and all.
And we wanted to make it easy
for our Python developers
to do streaming.
So now, we are going to enable
Python streaming capability
with Beam, so customers can
actually build scalable data
pipelines using Python.
So that's going in beta now.
So customers can use that.
We also have Dataflow streaming
and shuffle capability.
It will help you do large-scale
data processing easily.
Auto-scaling capabilities
will come with it.
There are detailed
deep-dive sessions on these
that you should check
out if you're interested.
One of the other
things we have done
is we have actually enhanced
the performance of our Pub/Sub
libraries and made them much
more efficient in the
seven different
languages that you can use.
But in addition to
that, we have a lot
of customers who love Kafka.
They're like, hey,
I use Kafka already.
I want to continue
using it on GCP.
What are my options?
So historically, you could just
go ahead and deploy it yourself
and manage it.
But what we have now
is, with Confluent,
we have a managed Kafka
solution that's available.
So if you want to go
ahead and use a managed
service for Kafka, you can just
use Confluent Kafka on GCP.
And that's one of
our strategies is
to work with our
partners to provide
these end-to-end solutions that
you can leverage as a customer.
So it's already available
that you can use.
One of the other things,
which is core to our strategy
as well as core belief,
is this open source
and being an open cloud.
And we fundamentally look
at the things from Istio
to Kubernetes that
we are investing in.
On our side in the
big-data world,
we are investing a lot in
open-source technologies.
If you look at just
the big data roadmap over the
last 15 years, the
amount of innovation
that Google has driven
and made available--
before Google Cloud, we used
to make this available as papers
so that the industry could learn
from all the research that we
had done, everything
from Dremel paper,
to MapReduce, to GFS,
like all different papers.
And then we also are building
a lot of these products
based on these technologies.
There are two key product areas
that we have been investing in
on open-source side.
One is Dataproc.
It's managed Hadoop
and Spark capability,
as well as Composer.
Composer is fascinating.
When it was in private alpha,
we had more than 1,000 customers
using it.
I have no idea how you
keep it private
and then have that
many customers using it.
So it just took off.
It's based on Airflow,
Apache Airflow,
and it was just basically
all the customers loved it.
And we started seeing a
tremendous adoption of it.
So we are announcing
the GA for Composer now.
So it's already available.
You should be able to use it.
Major enhancements in
our Dataproc side--
auto-scaling and
custom packages.
Custom packages allow
you, with a few clicks,
to pick our top-level
Apache projects
that you want to go ahead
and deploy now in Dataproc.
That's interesting.
And auto-scaling based
on your resource needs
automatically will scale
your Hadoop clusters, Hadoop
and Spark clusters,
on your behalf.
And then, of course, we
announced a few weeks back
that Hortonworks now
supports their infrastructure
on GCP natively.
So you can use HDP or
HDF directly on GCP.
With that, let me call upon
Michael from Blue Apron
to talk about how
they're using GCP.
[APPLAUSE]
Michael.
Welcome.
MICHAEL COLLIS: Hi, Sudhir.
How are you doing?
SUDHIR HASBE: Good.
MICHAEL COLLIS: Good to be here.
SUDHIR HASBE: Can you do
a quick intro of yourself,
as well as the
company, your role?
MICHAEL COLLIS: Sure.
Absolutely.
Hey, everyone.
Hope you're enjoying the
second day of Next, like I am.
So Blue Apron was founded
six years ago with a--
well, with a modest goal.
And that goal was to
re-envision how the food system
worked in this country.
And so while we've made
some good progress,
that is an audacious goal as
visions are supposed to be.
And we thought we can
get at this vision
by making home cooking
more accessible, easier,
more affordable for more
people in this country.
And in doing so, we
could go out there,
work with farmers,
producers, and make sure
that we were investing in
sustainable agriculture,
humane ways to raise livestock,
all these different things.
So basically, what
we do is we send out
pre-portioned, seasonal
ingredients to you in a box
with a recipe to make those.
And we are around millions
of dinner tables in the US
every single night,
which is a privilege.
SUDHIR HASBE: I'm one of
them, so I love Blue Apron.
MICHAEL COLLIS: OK.
SUDHIR HASBE: So how is data
analytics used at Blue Apron?
MICHAEL COLLIS: So one of
the greatest privileges
about working in food, I
think, that I've learned
is that people always want
to tell you what they think.
We don't have to really go
out and solicit much customer
feedback.
[LAUGHTER]
No.
As I said, you're around
people's dinner tables.
It's a very personal
moment, right?
And it's very intimate.
And basically, we have a
responsibility to listen.
And as I said,
people will show us,
they'll tell us what they
want in their recipes.
We were joking before
that all the recipes have
kale in them in the summer.
Don't ask me to fix that.
I can't fix that.
So data is a really
core part of how
we make our business decisions.
And that's not
immediately obvious,
if you look at what we do.
You think, oh, you
ship a box of food.
OK.
So that's great.
But actually, we are looking
at the customer lifecycle
at every stage, and
we are ingesting data
about what you like, what
recipes appeal to you,
what photos appeal to you,
what titles appeal to you.
And we're building up a
profile of what you like.
And as I said, people
tell us how they feel.
If you've ever written a
comment on one of our recipes,
know that a human
being has read it.
SUDHIR HASBE: That's awesome.
MICHAEL COLLIS: But we
can do better, right?
And what we think
is we can build
a virtuous cycle around
what we're doing here
and the vision with data.
And the way that we think
about doing that is--
just if we use an
example of something
that my team does
a lot of, which
is recipe recommendations,
obviously-- helping people
make sure we put the
right recipe in the box
that you'll like, obviously.
So if we have better
recommendations,
we have better forecasting.
We have better purchasing.
We are going out and sourcing
the right ingredients,
and the right proteins, and
the right dry goods that
meet our needs.
That is reducing food waste.
It is cutting out another
middleman in the step, right?
The supermarket.
And if we get better
at that, we end up
saving thousands and thousands
of tons of wasted food, right?
So every small change is really
important for us in that.
And at scale, it makes
a huge difference.
SUDHIR HASBE: And tell me
more about your philosophy
around open-source
software, and how
you use it, and stuff like
that within the organization.
MICHAEL COLLIS: Right.
So we're on the record
as using a laundry
list of GCP services--
you know, our Enterprise
Data Warehouse is BigQuery.
We use Dataflow for our
streaming processing.
We use Dataproc for our
batch machine learning.
We use GCS for our data lake,
for our prepared features,
our trained models,
all of this stuff.
But a lot of that orchestration,
we use Airflow for.
We have been using Airflow
from, more or less,
day one that data engineering
existed at Blue Apron.
And it's incredibly
important for us
because it helps us
ingest information
from outside sources.
It helps us run
batch ETL processes.
It helps us run our
batch machine learning
models, all of that stuff.
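The core job of an orchestrator in this kind of setup is to run tasks
in dependency order. A minimal sketch of that idea using Python's
standard-library `graphlib` (Python 3.9+); this is not Airflow's API,
and the task names are hypothetical stand-ins for the steps described
above:

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each maps to the set of tasks it depends on.
dependencies = {
    "ingest_external_sources": set(),
    "batch_etl": {"ingest_external_sources"},
    "train_models": {"batch_etl"},
    "compute_recommendations": {"train_models"},
    "publish_artifact": {"compute_recommendations"},
}

# static_order() yields every task after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())

for task in run_order:
    print("run", task)
```

A real orchestrator adds scheduling, retries, and monitoring on top of
this ordering, which is much of what makes running one yourself costly.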
And it's actually a key
piece of how we end up
actually serving our batch
machine learning predictions
as well.
We use Airflow to compute
122 million recommendations
every day.
And we load those up into
a little LevelDB artifact
that we serve in memory from
our services, which is great,
because it means
we can serve 122
million recommendations daily
with about 15-microsecond
latency.
That's pretty good.
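The pattern described here, precomputing predictions in a batch job and
serving them from memory, can be sketched as follows. This is an
illustration under stated assumptions, not Blue Apron's implementation:
a plain dict stands in for the LevelDB artifact, and the function
names, users, recipes, and scores are all hypothetical.

```python
def build_artifact(scores, top_n=3):
    """Batch step: rank recipes per user and keep only the top N."""
    artifact = {}
    for user, recipe_scores in scores.items():
        ranked = sorted(recipe_scores.items(), key=lambda kv: kv[1], reverse=True)
        artifact[user] = [recipe for recipe, _ in ranked[:top_n]]
    return artifact


def serve(artifact, user):
    """Serving step: a single in-memory lookup, with no model call."""
    return artifact.get(user, [])


# Hypothetical model output: user -> {recipe: predicted score}
scores = {
    "user_a": {"kale_salad": 0.2, "salmon": 0.9, "tacos": 0.7, "ramen": 0.5},
    "user_b": {"kale_salad": 0.8, "salmon": 0.1, "tacos": 0.4, "ramen": 0.6},
}
artifact = build_artifact(scores, top_n=2)
```

Because the expensive ranking happens once in the batch job, serving
reduces to a key lookup, which is what makes very low latency possible.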
SUDHIR HASBE: Wow.
MICHAEL COLLIS: We
can work with that.
SUDHIR HASBE: That's awesome.
MICHAEL COLLIS: Right.
But open source is a
huge part of that, right?
We got burned early on by--
I think this story should be
familiar to everyone who's
worked at a startup, maybe.
We got burned early on by vendor
lock-in on certain clouds.
And we've been committed to
open source from the beginning.
But that really made
us realize, oh, we
have to take open
source seriously
as an engineering organization
and not end up in that position
again.
We're not a big
engineering organization.
Data engineering, for
us, we're only 15 people.
We've got to work on what we
have competitive advantages in,
and that is not running Airflow.
Our Data Operations team managed
to do the most recent Airflow
1.9 update on our cluster.
Yeah.
Well, they did not
sleep well that week.
So we don't want
to get locked in.
And we want to write it
once, run it everywhere
in our hybrid cloud.
And when Google
is saying, we are
committed to an open cloud,
that's very important for us.
And that's very
important because you
can compete for our business
on any other dimension,
but it's not, "you're
locked into our product."
And that matters a lot, like
that's a good signal to us.
Beam, Spark, TensorFlow--
these are all things that we
have big investments in.
And if it's open-source, we
can move it anywhere we like,
and we're not--
SUDHIR HASBE: We hope you'll
never move them, but I get it.
You have an option of moving
them whenever you want.
MICHAEL COLLIS: I could.
[LAUGHTER]
SUDHIR HASBE: Perfect.
Thanks.
Thanks a lot, Michael.
Any other key metrics
that you have seen
or business results
you would like
to share before we wrap up?
MICHAEL COLLIS: You can't ask me
that on the week of an earnings
release.
But no, I mean,
basically, we have
seen a huge amount of uptick
in engagement with our product.
And when we give customers
more ways to give us feedback,
we get even more feedback.
So it is a really
virtuous cycle there.
And we're also
using those insights
to help our culinary team and
our amazing chefs basically
plan recipes better.
So that's a new,
exciting frontier for us.
It's using AI to actually
provide feedback from what we
know our customers will like,
so that on the menu are more
things that--
you know, there's something
for everyone and things
that people are going
to love that much more.
SUDHIR HASBE: Awesome.
Thank you.
Thanks, Michael.
MICHAEL COLLIS: Right.
Thanks.
SUDHIR HASBE: Thank you.
[APPLAUSE]
So as you saw, when I talk to
customers across the board,
this whole idea of an open
cloud actually resonates a lot,
especially keeping the
expertise customers
have in Spark, Hadoop, as well
as with Beam what we have done,
and other areas.
The fourth topic I want
to talk about quickly
was visualizing and
activating your data.
The key thing is
self-service BI is
one of the priorities in
various organizations.
How do you explore
your data yourself,
like enable your users to
go out and explore the data,
as well as do collaborative
data-driven decision making,
are the topics that come
up in every conversation
I have with customers.
So one of the things-- if
you haven't used Data Studio,
it's our BI tool that's
available, highly
collaborative.
It's basically built
around collaboration.
The key thing is with the
new BigQuery UI capability
that I announced, if
you use the new UI,
you can literally do a
one-click from your query.
Click once, and directly
do a visualization and data
exploration.
So you can go explore
what dataset it is.
You can blend that data
with other sources,
like AdWords or something,
and pull that data in.
And you can actually go ahead
and create a report out of it
within seconds.
Like literally, you don't need
a specialist to go do that.
Additionally, we also
have pre-built templates
that are available now.
So you can literally go in and--
there's actually a template
I found on Cloud Billing.
So if you want to just
visualize your billing
on cloud, or Google
Cloud, you can actually
have a template for that.
Or you want to analyze
your AdWords stuff,
you have a template for that.
So really good capabilities.
We also have our data
visualization developer preview
that's available.
With it, you can do
D3-based visualizations
and create custom visualizations.
The other area where
we have invested,
with one of our
partners, Trifacta,
is the Dataprep solution.
So a lot of our
customers want to do
data wrangling, as analysts
want to go do that visually.
Dataprep actually allows
you to go ahead and visualize
your data, which
may be in BigQuery,
figure out what anomalies
are there in the data,
clean that data up,
and store it back.
As we are getting
ready for our GA
with that tool over
the next few months,
the key thing is we
have focused a lot
on getting the
feedback from beta,
and we'll have some key
capabilities available.
One big area of
enhancements we have done
is all around team-based
data wrangling.
How do you share your recipes,
share and copy your flows?
How do you go ahead and reuse
your custom sample recipes
and stuff like that?
So a lot of focus on that.
Focus more on productivity,
like how do you
go ahead and have
quick shortcuts
to the popular items and all?
And then we have a completely
new comprehensive design,
which looks much better
and is way more efficient.
So that's one of the areas.
Before I jump into the next
one, so one of the other things.
Somebody told me a while back,
being good is not enough,
you should also do good.
So we have been working
with some nonprofits
to see how we can help
democratize our analytics
and machine learning
capabilities
in the nonprofit community.
And so let's run the video
of how Precision Medicine is
using it, and then I'm going
to talk more about that.
Can we take it away?
[VIDEO PLAYBACK]
[MUSIC PLAYING]
- My name is Robert Tabz.
And five and a half years ago,
my mother was diagnosed with
Alzheimer's.
I knew not all the
medicines were working.
The entire time was
a downward spiral.
- I also lost my grandfather
about 25 years ago
to the disease.
My family, at the time,
felt like it was already
too late to change the
trajectory of the disease.
And it breaks my heart when I
hear the same stories today.
The mission for Foundation
For Precision Medicine
is to bring AI and
health care together
to detect Alzheimer's early.
- If you can detect
Alzheimer's very early on,
that is when the disease is
most susceptible to treatment.
- The data that
we have access to
are anonymized electronic
health care records.
We needed a HIPAA-compliant
environment, which
is why we used Google Cloud.
- We're dealing with
hundreds of variables
on millions of
patients, which generate
billions of lines of data.
- Google Cloud enables us
to scale our operations.
And with BigQuery ML, we're
able to develop machine learning
models faster and
utilize our entire data.
Being a nonprofit, we rely on
our volunteers across the US,
and Google Cloud really
enabled us to do that.
We wanted them to be able
to apply machine learning
on the data and look
at trends themselves,
to empower them to come up
with more innovative approaches
to change the progression
of the disease.
- This work is so
important to me,
because it helps us address
this devastating illness that
has no cure.
- I heard somewhere
that they said,
don't forget the dots
on the plot are people.
And we really take
that seriously.
[MUSIC PLAYING]
[END PLAYBACK]
SUDHIR HASBE: So great example
of how Precision Medicine
is using the data analytics
capability in BigQuery
ML, along with other
BigQuery features,
to go ahead and do
advancement in their area.
So what we were able
to do was, today, we
are announcing Data
Solutions for Change.
It's a program that we're
launching for nonprofits
across the world where they can
go ahead and, on an as-needed basis,
get access to Google
Cloud credits,
along with
self-training resources
and hands-on enablement.
As I've said, our
goal is, how do we
democratize analytics and
machine learning for nonprofits
around the world and
put these capabilities
in the hands of organizations that
want to do good in the world?
So that's launching today.
One more thing
that we are launching
is Visualize 2030.
So this is a collaboration with
World Bank, United Nations,
U.N. Foundation, and other
affiliate organizations where
we want to drive awareness
and action around the U.N.
Sustainable Development Goals.
There are 17 goals that,
within the next 12 years,
we want to meet.
And so basically, this is
a storytelling competition
for students, the grad students
around the world, where
they can go ahead and create
and submit visual stories,
insights, and
actions, based on Data
Studio and the public
datasets in BigQuery.
So BigQuery has
70-plus public datasets
that are available that you can
go use, start analyzing today.
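As a hedged illustration of what querying a public dataset can look
like from Python: the sketch below assumes the `google-cloud-bigquery`
package is installed and that credentials for a Google Cloud project
are already configured. The table is one well-known public dataset
chosen for illustration, not one named in the talk, and `top_names` is
a hypothetical helper.

```python
# Standard SQL against a BigQuery public dataset (illustrative choice).
SQL = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total DESC
LIMIT 5
"""


def top_names(sql=SQL):
    """Run the query and return (name, total) pairs.

    Assumption: google-cloud-bigquery is installed and default
    credentials are set up; otherwise this call will fail.
    """
    # Imported lazily so the sketch can be loaded without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [(row["name"], row["total"]) for row in client.query(sql).result()]
```

Calling `top_names()` submits the query and waits for the result rows;
the same SQL can also be pasted directly into the BigQuery UI.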
So in this, you can go ahead
and create these visual stories,
and then submit it
by end of September.
And then we'll announce the
winners at the U.N. World Data
Forum in Dubai, in October.
So this is one of the things
that we are announcing today.
We want the
next-generation students
who are like-- earlier, we
were talking about 80 million
students using G Suite.
We want to extend
similar capabilities
on data analytics
for this audience
so that they can start
analyzing, visualizing,
and coming up with
insights to go solve.
Along with that, one of the
things I do want to talk about
is our partner ecosystem
is super critical to us.
We have partners
across the board.
Like from [INAUDIBLE], we
have some amazing partners.
If you want to get data into
BigQuery or different analytics
products that we have,
we have amazing partners
that provide those solutions.
We have data
integration partners.
We have partners
for visualization.
You saw a previous
example of Looker.
Tableau is a big
partner in that--
Qlik.
There are a bunch of partners
that provide BI tools,
as well as a lot of SI partners
coming on board to help you
with your various engagements
that you may have.
So that's key.
Google is a leader in the
Insight Platforms-as-a-Service
Wave from Forrester.
And I'm hoping
we'll be recognized
more and more in different
upcoming reports that come out.
The key thing for me
is there's a lot more
information on big data that's
available on the solution
place.
Please take a look at that.
There are amazing sessions.
I highlighted the GIS
one, the deep-dive
one on clustering
and all, with the
Enterprise Data Warehouse,
and Beyond Enterprise
Warehouse by Jordan Tigani.
There are a lot of other
good sessions on the big data
topics in the conference.
Please attend them, and
give us more feedback.
Thank you, everybody.
[APPLAUSE]
[MUSIC PLAYING]
