[MUSIC PLAYING]
MARK MANDEL: Hi, and
welcome to episode
number 192 of the weekly
"Google Cloud Platform Podcast."
My name is Mark, and just for
funs, I'm joined by Mark again.
[LAUGHS] How you doing, Mark?
MARK MIRCHANDANI: Hey, Mark.
I'm doing well.
How are you?
MARK MANDEL: I'm good.
Just to confuse people who are
listening, it's Mark and Mark.
MARK MIRCHANDANI: Always
excited to come on here
and have the two Marks talk
about all the cool Google Cloud
things.
MARK MANDEL: Actually,
what I should do
is I should say my
name's "Maak" and I
should say your name
is Mark, and then
I use the appropriate accent
at the appropriate time.
MARK MIRCHANDANI: [LAUGHS]
I don't think I can--
I don't think I have
an accent that I can
switch to so quickly like that.
MARK MANDEL: [LAUGHS]
Anyway, why don't we
get stuck into the
actual podcast.
Who are we talking to this week?
MARK MIRCHANDANI:
Well, this week
we're talking to Billy
Jacobson about Bigtable, which
is something we have not
talked about in quite a while.
So--
MARK MANDEL: Yeah.
MARK MIRCHANDANI: Super
interesting to hear
a little bit more
about, especially
for people who haven't used it,
what Bigtable is good for, what
it's not good for,
how it plays into all
of the different products that
Google Cloud Platform has.
MARK MANDEL: Great interview
with Billy as well.
But then I have a fun question
for you, wherein I ask you,
if I have an organization
for my GCP project,
how do I break down my
billing data by folder?
MARK MIRCHANDANI: Yeah,
it's a surprisingly--
it takes a little bit of
extra work to do that.
But it is something
that anyone who
is running a GCP organization
is super interested in knowing,
because, well, as it turns
out, when you pay money,
you like to generally
track where it's going.
MARK MANDEL: Psh.
I guess.
It's fine.
MARK MIRCHANDANI: I mean.
Just not--
MARK MANDEL: It's fine.
MARK MIRCHANDANI: --throwing
dollar bills everywhere.
Just not making it rain, right?
MARK MANDEL: [LAUGHS]
MARK MIRCHANDANI:
You need to know.
MARK MANDEL: You need to know.
These things are important.
Awesome.
Well, why don't we get
stuck into our cool things
of the week so we can talk
about all the other fun
stuff that's coming up, too?
[ELECTRONIC MUSIC PLAYING]
What have you got?
MARK MIRCHANDANI: Sounds good.
Well, the first thing I've
got is our Cloud Run button,
which is-- first of
all, it's a button,
so that immediately makes
it more fun and very cool.
MARK MANDEL: Yep.
MARK MIRCHANDANI:
We actually talked
about this a couple of weeks
ago, or maybe a few months ago.
But now it's built into the
Google Cloud official GitHub
repositories.
And it's a pretty simple--
it's actually pretty
much just a link.
So if you've got
your code on GitHub
and it either uses a build
pack or it's dockerized,
then you can just take
this quick button, put it
into your GitHub
repo, and then people
who are viewing that
repo can click the button
and it'll just send it
right over to Cloud Run.
So you don't need to
do any additional work.
It's just a really fantastic way
for someone to take your code
and be like, oh, well,
I can see the code,
but let me see what it actually
looks like when I run it.
Well, just by
clicking this, it'll
set up Cloud Run
on their account.
It'll get the code in there.
It'll install it.
And then you'll have that
serverless app ready to go.
MARK MANDEL: Oh, that's sweet.
So if I have like a little
app or something in GitHub,
I can just push a button.
It'll [INAUDIBLE] on my project,
and I can just Cloud Run it?
MARK MIRCHANDANI: That's it.
You can just Cloud Run it, and
then people can see, like, hey.
You know, it's great
to see the code,
but let's see what
it actually does.
MARK MANDEL: Nice.
Very cool.
I've got a fun one, too,
by a fellow teammate.
Dane Zeke Liergaard
did an article
and has written a solution
for Firebase Unity
Solutions, Update game
behavior without deployment
with Remote Config.
So Remote Config's a
great Firebase tool that
can be used for mobile
apps, especially,
if you want to be able to
change how your app behaves
at runtime.
Basically, you can change
configuration values
within the Firebase Console,
and that gets propagated out
to all your applications,
and you know,
you can be like, make this thing
bluer or redder, or et cetera.
This is also great
for games, as well.
Maybe you want to change
things like difficulty levels
or how many hit points are
available to certain characters
in the game, or if you
want to balance your play
style or your play
mechanics inside your game,
without having to push
a whole new version.
MARK MIRCHANDANI: Right.
There's so much extra
work that has to go
into pushing a new version.
But if you can use--
MARK MANDEL: Yep.
MARK MIRCHANDANI:
--Remote Config,
it sounds like you're easily
able to just say, well,
let's make these tiny
tweaks, and then people
will see those changes
instantaneously.
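The idea described here, shipping defaults in the app and letting remotely managed values override them at runtime, can be sketched in a few lines. This is a toy model only and not the Firebase Remote Config API: the names `DEFAULTS` and `effective_config`, and the parameter keys, are hypothetical.

```python
# Toy model of remote configuration: values shipped with the game act as
# defaults, and values fetched at runtime override them.
# Hypothetical names throughout; this is not the Firebase SDK.
DEFAULTS = {"enemy_hit_points": 100, "difficulty": "normal"}


def effective_config(defaults, remote_values):
    """Return defaults with any remotely supplied overrides applied.

    Keys the game does not know about are ignored, so a typo in the
    remote console cannot introduce an unexpected setting.
    """
    merged = dict(defaults)
    for key, value in remote_values.items():
        if key in merged:
            merged[key] = value
    return merged


# Pretend this dictionary was fetched from the remote service.
fetched = {"enemy_hit_points": 80, "unknown_flag": True}
config = effective_config(DEFAULTS, fetched)
```

With this shape, rebalancing a character means changing one remote value rather than shipping a new build.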
MARK MANDEL: Yeah.
So what's super
nice about this is
this is built in directly
into the Unity Editor.
You can just pop in the
provided component onto
any game object
or prefab, and you
can tweak how you want
to basically set up
each of the properties
within there
and how they tie to the
Remote Config Management API.
Not only that, once your
game is actually live,
you can actually go back
into the Unity Editor,
see the values that
are available in there,
and actually upload
any changes you
want to try tweaking
locally up to Remote
Config so it gets propagated
out without actually
having even to go to the
Firebase Console as well.
So super nice thing.
Thank you very much
for that work, Dane.
MARK MIRCHANDANI:
Yeah, it sounds
like a really cool way
and really quick way
to make those tweaks.
Well, on the not-so gaming side,
but in talking about BigQuery,
we've actually just
launched out a new Terraform
module for BigQuery.
So if you're into
Terraform and you're
looking at automating
deployment or really
just taking your
infrastructure-as-code solutions
and adding in the
ability to say,
well, now we can
actually push out
which tables in BigQuery,
which data sets in BigQuery,
structure out your
BigQuery structure,
and then load in data into it.
All of that is now
possible through Terraform,
so you have an open
source alternative now.
Instead of using Cloud
Deployment Manager,
you can use, well,
Terraform to drive out
what that structure
looks like and then
also to make sure that you have
that backed up as code, right?
So now you can actually
version control it.
MARK MANDEL: Nice.
Always good to see
more stuff happening
with Terraform in Google Cloud.
That makes me very happy.
Finally, just a
nice announcement,
talking about Macy's, the
very large retail store,
is now using Google
Cloud to streamline
its retail operations.
So it is moving its
infrastructure to the cloud
and taking advantage
of Google Cloud data
warehousing and analytical
solutions, basically
to make everything a
little bit more efficient.
So welcome to the
family, Macy's.
MARK MIRCHANDANI:
Always very cool to see
cases of big customers like
Macy's, which here is certainly
known as a relatively
large shopping chain
and seeing how they
can streamline it.
But in reality, it's like there
are so many little pieces here
and there that the Cloud offers
to people, and in combination,
especially with
these retail stores
that can use the hardware.
You know, I was actually just
at the DMV the other day,
and I saw that all of the monitors
that everyone was using
were Chromebooks.
MARK MANDEL: Oh.
MARK MIRCHANDANI: So it was
super exciting to see there,
and then to add on
all the physical layer
with the actual
cloud layer, I think
there's some really cool
possibilities out there.
MARK MANDEL: Definitely.
And lots of data, lots
of data to manage.
MARK MIRCHANDANI: So much data.
MARK MANDEL: So much data.
Excellent.
Well, speaking of so much data,
why don't we go talk to Billy
all about Bigtable?
MARK MIRCHANDANI: Sounds good.
[MUSIC PLAYING]
MARK MANDEL: Very happy today to
have a fellow DevReller, Billy
Jacobson, coming all the
way down from New York
to come hang out with
us in the studio today.
How are you doing, Billy?
BILLY JACOBSON: I'm good.
Thanks for having
me, Mark and Mark.
[CHUCKLES]
MARK MIRCHANDANI: Absolutely.
MARK MANDEL: Do you
feel peer pressure
to change your name to Mark
so that you could blend
in with this table of Marks?
BILLY JACOBSON: What if I told
you my middle name was Mark?
MARK MANDEL: Really?
BILLY JACOBSON: Nope.
MARK MANDEL: Oh.
MARK MIRCHANDANI: Oh.
[CHUCKLING]
I was so close to give him entry
into the Marks club for that.
MARK MANDEL: But no?
OK, fair enough.
[LAUGHS] Before we get stuck
talking about Bigtable, which
is what we're going
to talk about today,
why don't you tell us a
little about yourself?
What do you do here at Google?
BILLY JACOBSON: All right.
I've been at Google
for four years,
and I've actually been working
on the Google Cloud Platform
for all four of those years.
My first three years I was
working on the Cloud Console UI
as a front-end
engineer, so I had
to learn about a ton of
different products,
about the user experience
of those products.
I think Michael Kleinerman,
who was on that team,
was actually just interviewed
on this podcast a few episodes
ago.
But now I am in
developer relations.
I'm a developer
programs engineer
focusing on Cloud Bigtable.
So that involves writing code
samples and the documentation,
creating code labs and
tutorials, working on going
and giving some talks.
Really just trying to look
at the developer experience
and try to improve that in
as many ways as I see fit.
MARK MIRCHANDANI: So
I have a question.
MARK MANDEL: Uh-oh.
MARK MIRCHANDANI:
What is Bigtable?
MARK MANDEL: [LAUGHS]
MARK MIRCHANDANI:
Because there has been--
Bigtable's been mentioned on
the podcast before, right?
MARK MANDEL: So
the last episode we
did was back in
2016 with Ian Lewis.
So it's been a
while, episode 18.
[LAUGHS]
MARK MIRCHANDANI: So a
refresher on what Bigtable
is would be super helpful.
BILLY JACOBSON: Yeah.
Our phrasing for it is
that it's our petabyte-scale, fully
managed NoSQL database.
And it's really the combination
of all of those things
that make it special,
that you can get so big,
that it's fully
managed, and that it's
NoSQL, which kind of
ends up being part
of the reason it can be so big.
MARK MANDEL: Right.
And so that sounds like it
could be used for anything,
but I'm willing to bet
that there are probably
some particular niches
that it probably
works better in than others?
BILLY JACOBSON: Yeah.
MARK MANDEL: Like, what sort
of use cases is it good for?
BILLY JACOBSON: One of the big
uses is for time series data,
like user analytics.
So if you have your application
just tracking every interaction
people have on that,
thinking-- well,
Spotify is one of
the big customers,
so tracking every song that
gets listened to and storing
that in Bigtable,
all of those listens.
And then having
that data they can
run different kinds
of analytics on
and different kinds of
machine learning on.
Once you have all
that time series data,
then you can do so many
different things with it
that you might not be able
to do with other databases.
So running all kinds
of analytics on it,
doing different kinds
of personalization.
And you can do that
all in real time, which
you can't necessarily
do with something
like BigQuery, where you can
take those queries and run them
ad hoc.
Bigtable lets you set that up
to be used in an application,
and maybe create a dashboard
of all those analytics.
Or when you're training
machine learning,
you can train
directly on the data.
In Bigtable, you don't have
to move it to somewhere
that can be accessed faster.
So all of those
different pieces are
why you might want
to use Bigtable
adjacent to
something like BigQuery.
MARK MIRCHANDANI: So BigQuery
is more like the warehouse,
kind of?
You go and you
make these queries.
They come back, you know,
usually pretty fast, seconds,
but that's still seconds.
Whereas Bigtable, you're
placing it as a proper database,
and you can treat it like
one, and it works like one?
BILLY JACOBSON: Yeah, exactly.
So BigQuery is your
data warehouse.
You might store the
data in both locations,
but you could give BigQuery
to your data analysts
and let them run queries
over the entire data set
or even just play around
with different data sources.
But really, if you're
doing a full table scan,
BigQuery would be
where you'd want to go.
Bigtable, you're going to want
to do different pieces of that.
So maybe get me all
the data for this user,
and then do training on it,
or display it on a dashboard,
or something like that.
MARK MANDEL: And what is
it about Bigtable that
makes that such a good fit?
What can it do that makes it
be able to do that so well?
BILLY JACOBSON:
So one of the ways
Bigtable's able
to do all of this
is that it's a NoSQL database.
And because of that,
there's also no schema,
so-- well, there's one index,
which is just on the row key.
So you define your row key
based on a few properties.
So let's say you would set
maybe a timestamp, a user ID,
and some kind of other
categories in that row key.
Then you can only perform a
row get or a row range scan.
So you'd say, maybe, get me all
of the user data for this user
ID in this time range,
and maybe for this genre
or maybe for all genres.
And it would depend
on how you organize
those segments of that row key.
So it's basically just
a concatenated string
of a few of those
pieces of data.
So with BigQuery, you really
get that full SQL interface,
and it's really easy to
create queries, give it
to some kind of
analyst, whereas with Bigtable,
you really have to
think about how you're
going to be using
it, setting it up
for specific kinds of queries.
And once those are set up, those
are the only kinds of queries
you're going to be able to run.
But they're going
to be super fast.
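The row key idea described above can be sketched with a toy in-memory table. This is an illustration under stated assumptions, not the Bigtable client library: `ToyTable`, `make_row_key`, the `#` separator, and the segment order (user, then genre, then timestamp) are all hypothetical choices.

```python
from bisect import bisect_left


def make_row_key(user_id: str, genre: str, timestamp: int) -> str:
    """Concatenate segments into one string key.

    The timestamp is zero-padded so that string order matches numeric order.
    """
    return f"{user_id}#{genre}#{timestamp:010d}"


class ToyTable:
    """Rows kept sorted by key, which is the only index (a stand-in model)."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._rows[key] = value

    def get(self, key):
        """Row get: look up exactly one key."""
        return self._rows.get(key)

    def scan(self, start, end):
        """Row range scan over the half-open key range [start, end)."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = ToyTable()
table.put(make_row_key("user42", "jazz", 1000), "song-a")
table.put(make_row_key("user42", "jazz", 2000), "song-b")
table.put(make_row_key("user42", "rock", 1500), "song-c")
table.put(make_row_key("user7", "jazz", 1200), "song-d")

# "All jazz listens for user42 between t=0 and t=3000" is one contiguous range.
jazz_for_user42 = table.scan(
    make_row_key("user42", "jazz", 0),
    make_row_key("user42", "jazz", 3000),
)
```

With this segment order, "all genres for user42 in a time range" is no longer a single contiguous range, which is why the order of the segments has to follow the queries you plan to run.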
MARK MIRCHANDANI: So by
putting a little bit more,
it sounds like thought,
into how you're
going to be arranging especially
the keys, but kind of how
the data is going to be put
into this database, that's
how you're able to keep
those really quick speeds up
and that huge scale.
BILLY JACOBSON: Yeah.
With Bigtable you're getting
sub-10 millisecond latency.
And Bigtable scales with nodes.
I said it's massively
scalable, so with each node
you get 10,000 QPS.
So you can just keep
adding more nodes.
So the typical minimum is three
nodes, which is 30,000 QPS,
but you can really scale
that up until there aren't
any more nodes left, like in all
of Google's cloud, which is--
MARK MIRCHANDANI: Until
Google runs out of compute.
[LAUGHTER]
BILLY JACOBSON: Yes.
Until we run out of computers,
that's the end point.
MARK MIRCHANDANI: It's OK.
It's the cloud.
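The scaling arithmetic quoted here (about 10,000 QPS per node, three nodes as a typical minimum) can be written out as a small calculation. The per-node figure is the one mentioned in the conversation; real throughput depends on the workload, and the function names are hypothetical.

```python
import math

# Figure quoted in the conversation; treat it as a rough planning number.
QPS_PER_NODE = 10_000


def cluster_qps(nodes: int) -> int:
    """Approximate throughput for a given node count."""
    return nodes * QPS_PER_NODE


def nodes_needed(target_qps: int, minimum_nodes: int = 3) -> int:
    """Smallest node count that covers target_qps, never below the minimum."""
    return max(minimum_nodes, math.ceil(target_qps / QPS_PER_NODE))


three_node_capacity = cluster_qps(3)
nodes_for_95k = nodes_needed(95_000)
```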
MARK MANDEL: That's great.
And just so I'm
100% clear as well,
I'm assuming you can do that
while things are live, as well?
I can scale that up and
down as my throughput
and my requirements change?
BILLY JACOBSON: Yeah.
So you can scale
that up and down.
It's not going to
be instantaneous,
because some of the data
needs to rearrange itself
for you to see that benefit.
I'm not totally sure how long
it takes, but it's not days.
It's--
MARK MANDEL: Minutes?
BILLY JACOBSON:
Definitely minutes, yeah.
Definitely on that scale.
But back to those
row keys, if those
aren't distributed
evenly, then you
might run into some
performance issues.
But we've got a lot
of documentation
and some code labs that
walk you through how
to think about those row keys.
And there are also
a bunch of talks
that we've done at Next
showing sample row keys.
And we've got this great tool
called the Key Visualizer.
So you can maybe take
a subset of your data
and a subset of
your queries when
you're starting to plan
out what that schema might
look like and see, are
there any hotspots of data?
Are there any scans we're doing
a lot that don't look great?
We have a guide on what
query or what kind of image
doesn't look great.
So you can use that.
We've got that tool
to help you figure out
what schema, in quotes, is
going to be best for your data.
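Why uneven row keys cause hotspots can be illustrated with a rough simulation. The assumption here is that keys sharing a prefix sit next to each other in sorted order, so a shared prefix stands in for "the same key range"; the real service manages its key ranges itself. All names and the bucket width are hypothetical.

```python
from collections import Counter


def timestamp_first(user_id: str, ts: int) -> str:
    return f"{ts:010d}#{user_id}"


def user_first(user_id: str, ts: int) -> str:
    return f"{user_id}#{ts:010d}"


def bucket(key: str, width: int = 6) -> str:
    """Crude stand-in for a contiguous key range: the first few characters."""
    return key[:width]


# 100 users each write once, one second apart.
users = [f"user{n:03d}" for n in range(100)]
writes = [(user, 1_700_000_000 + i) for i, user in enumerate(users)]

# Count how many writes land in each simulated key range.
ts_buckets = Counter(bucket(timestamp_first(u, ts)) for u, ts in writes)
user_buckets = Counter(bucket(user_first(u, ts)) for u, ts in writes)
```

With timestamp-first keys every write in this window lands in one range, while user-first keys spread the same writes over ten ranges.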
MARK MANDEL: And
so, also, like so we
were talking about it being
like a NoSQL store, as well.
So you've got your key, right?
So that can basically be
a concatenated string,
it sounds like?
BILLY JACOBSON: Yeah.
MARK MANDEL: And
then your value.
Are we just dumping JSON
blobs in, or is there
a binary protocol, or--
BILLY JACOBSON: So we've got--
MARK MANDEL:
--whatever you like?
BILLY JACOBSON: Yeah.
So we've got column families
and column qualifiers.
So the families do some
grouping, making it faster
if you're only querying
things from the same column
family. And the
column qualifiers
can help you map
to a specific cell.
So you can just have a
bunch of cells with data.
And a cell is identified by
the row key, the column family,
and the column
qualifiers-- just kind
of-- we just shortcut that
as the column, as well
as the timestamps.
You can have multiple timestamps
within each cell.
So that can make it
really good if you
want to use that for some part
of your time series management,
maybe saying, store an
hour of data in each cell,
or even just to have
just version tracking
across the board.
But you don't want to use that
for your entire time series,
because there's a
limit to how much data
you can store per cell.
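The addressing scheme described above (row key, column family, column qualifier, and timestamped versions within a cell) can be modeled for a single row. This is a toy data structure, not the Bigtable client; `ToyRow` and its method names are hypothetical.

```python
from collections import defaultdict


class ToyRow:
    """One row: (family, qualifier) -> {timestamp: value}."""

    def __init__(self):
        self._cells = defaultdict(dict)

    def set_cell(self, family, qualifier, value, timestamp):
        """Write one timestamped version into a cell."""
        self._cells[(family, qualifier)][timestamp] = value

    def latest(self, family, qualifier):
        """Return the newest version in a cell, or None if the cell is empty."""
        versions = self._cells.get((family, qualifier))
        if not versions:
            return None
        return versions[max(versions)]

    def versions(self, family, qualifier):
        """Return all (timestamp, value) pairs, newest first."""
        versions = self._cells.get((family, qualifier), {})
        return sorted(versions.items(), reverse=True)

    def family(self, family):
        """Return the latest value of every column in one family."""
        return {q: self.latest(f, q) for (f, q) in self._cells if f == family}


row = ToyRow()
row.set_cell("stats", "plays", 10, timestamp=100)
row.set_cell("stats", "plays", 12, timestamp=200)
row.set_cell("profile", "name", "Ada", timestamp=100)
```

Reading only the `stats` family touches one group of columns, which mirrors the point that families group data that is queried together.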
MARK MIRCHANDANI: So Bigtable
is a giant NoSQL database.
Why would someone use
Bigtable over setting up
their own like Mongo
instance, for example?
BILLY JACOBSON: Because
Bigtable is fully managed,
you're going to get the
security and the reliability
that Google Cloud is offering.
So we offer however many nines
of reliability for Bigtable.
We have replication as a feature
so you can distribute your data
amongst the same
zone, or we also
have multiregion replication.
So you can really
distribute your data
and take advantage of
all the computing power
that Google Cloud has to offer.
MARK MIRCHANDANI: Plus, as the
name implies, I hear it's big.
[LAUGHTER]
BILLY JACOBSON: Yeah.
And a table.
MARK MANDEL: And it's a table.
[LAUGHS] You know, I think
you said something really
interesting there as well.
You were talking about
it where not only can we
do interzonal
communication, basically
like regional
failover, it sounds
like we can do global failover.
Is that correct?
BILLY JACOBSON: Yeah, so you
can set it up to have multiregion
replication.
So if you're a global
business and want
to have your data stored
across all different regions
so it's easier
for you to access,
you're going to get that faster.
You're going to
get lower latency
to people trying to access
data within those regions.
MARK MANDEL: So it
sounds like your data
is essentially mirrored?
Or like you said, replicated,
I guess, around the globe.
It's not just a failover system.
You're actually making sure
you have multiple copies all
over the world?
BILLY JACOBSON: Yeah, so you'll
be able to actually set up
different application
profiles in terms of where
that data might get accessed.
You can specify that if your,
maybe, customers are in the US,
you could have them go towards
the US replication, access
data from that.
But you can set it up
in a few different ways.
If you want to be using the
replications as a backup,
then you can have those not
get accessed at all for any
reads or writes, and
only use that to store
the data as a backup.
Or you can have it where
you're continuously
interacting with both
instances of the replication.
It's going to be an
eventually consistent system
so you might have some
stale data that comes back.
But it really just
depends on your use case.
It's a very flexible
product, so really, you
need to figure out what parts
of it you're going to use
and how they apply
to your use case
and turn all the knobs
specifically for that use case.
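The stale-read behavior mentioned here can be shown with a toy model of two copies of the data. It is a sketch of eventual consistency in general, not of how Bigtable replication is implemented; the class names and the explicit `replicate()` step are hypothetical.

```python
class ToyCluster:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class ToyReplicatedInstance:
    """Writes land on the primary first and reach the replica later."""

    def __init__(self):
        self.primary = ToyCluster()
        self.replica = ToyCluster()
        self._pending = []  # writes not yet copied to the replica

    def write(self, key, value):
        self.primary.data[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Copy all pending writes to the replica, in order."""
        for key, value in self._pending:
            self.replica.data[key] = value
        self._pending.clear()


instance = ToyReplicatedInstance()
instance.write("user42", "v1")

read_from_primary = instance.primary.data.get("user42")
stale_read = instance.replica.data.get("user42")  # replica has not caught up yet

instance.replicate()
fresh_read = instance.replica.data.get("user42")
```

Routing an application's reads to one copy or the other is the kind of choice the application profiles described above are for.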
MARK MANDEL: Now
I'm wondering here,
it sounds like I could
probably do something like,
maybe these kind of rows
that have this kind of data,
maybe replicate that in one way,
but these kinds of rows maybe
don't.
Would that be right, or--
BILLY JACOBSON: Yes, I think
it's more of-- at least
at the table level.
It might even be the entire--
MARK MANDEL: Oh,
that makes sense.
BILLY JACOBSON: I think it's of
the entire instance, actually.
MARK MANDEL: Yeah.
BILLY JACOBSON: But we can
include the replication docs
in the show notes.
MARK MANDEL: Absolutely.
MARK MIRCHANDANI: So if you
were going to design a system
and had your major
user base, let's just
say, being in the
US, but you wanted
to still have lower read latency
for people across the globe.
If you were then to
spin up a major writing
center in, let's
say, the APAC area,
could there be a
multimaster scenario here,
or would everyone
just be writing to one
and then replicating to
all these read replicas?
BILLY JACOBSON:
Yeah, I think there
could be multiple masters.
From what I know, I
think that sounds OK.
I'd have to check in the
docs just to confirm.
MARK MANDEL: So I kind of want
to head over to the open source
side, too, which is
kind of interesting.
I know we support
like an HBase client.
Can you tell us about HBase as
well, and just how that works?
BILLY JACOBSON: Yeah.
So this is in the history
of how Bigtable formed.
Google's been using
Bigtable internally for--
basically since the
start of Google, almost.
It's helped our search,
helped our Gmail,
and these core business
products that we have.
And I don't remember
which year, exactly,
but we published a white
paper that explained,
what is Bigtable?
Basically, how could
someone create it?
How we're using it?
And the open source
community took that paper
and created HBase, which
is the open source database
version of this, basically.
And over time, some
of these features
have diverged from what
internal Bigtable has to offer.
And then a few years
after we created
this white paper and HBase came
out, we created Cloud Bigtable.
So when we started
with Cloud Bigtable,
we wanted to make
sure that people
who were using
HBase and used to it
had an ability to just take
whatever they were using
and just swap it directly
out from HBase to Bigtable.
So we support the HBase API.
So anyone who's familiar
with using HBase
should be able to just transfer
their stuff over pretty easily
and use all of the API
commands that they're used to
and not have to really
change their code.
And because of the open
source community with HBase
and because we have
this HBase client,
there are all these
open source libraries
that we get to use for free
with Bigtable, just because
of that compatibility.
So one of them is JanusGraph,
which does graph databases.
There's OpenTSDB for
time series and GeoMesa
for geospatial data.
MARK MIRCHANDANI: So it kind of
fits pretty well into the cloud
modularity model, where you
can take these different pieces
and swap them out.
And this one happens
to be a great solution
for a fully-managed
database in the backend,
but then your application,
wherever it is
or whatever it is,
could theoretically
swap this out and get all
those benefits without having
to rewrite anything?
BILLY JACOBSON: Yeah, yeah.
So it's really nice for that.
But now, many years later--
I mean, I know the
last time you all had
some talk about Bigtable was--
MARK MANDEL: [LAUGHS]
It's a while ago.
BILLY JACOBSON: --in 2016.
MARK MANDEL: Yeah.
BILLY JACOBSON: So now we
actually have nine clients
to use with Bigtable.
We just came out
with a Java client.
So for people who want to
use a more idiomatic Java
client, as with the
other products we have,
because they don't
have their own-- like,
Spanner doesn't have an HBase or
open source equivalent client.
So now if you're using
other products that we have,
you'll be able to use a very
easy-to-use Java client.
And we have Python, Go.
It's all the assorted languages
that we have to offer.
MARK MANDEL: So here's an
interesting question, then.
Where do you think people should
choose, using either the GCP
native clients, or should
they use the HBase ones?
Or where does that
pro and con come from?
BILLY JACOBSON: I think it's
just if you are using the HBase
clients already, probably
stick with HBase clients.
I know a lot of people,
even before we came out
with this Java client, had
created their own Java clients
and just found that
easier to work with.
So we were like, oh,
let's do that, too.
MARK MANDEL: So like
greenfield, to use one of ours,
is probably a bit of
a nicer experience.
But if you already
have HBase, then maybe
go with that instead?
BILLY JACOBSON:
Yeah, definitely.
MARK MIRCHANDANI: And you
also mentioned-- you know,
earlier we were talking about
Bigtable versus BigQuery.
And so they're very
different solutions,
despite sounding
somewhat similar.
BILLY JACOBSON: Yes.
MARK MIRCHANDANI: And then
you just mentioned Spanner.
So if someone's coming to
this from a fresh perspective,
when is a use case that you
might use Bigtable or Spanner,
or vice versa?
BILLY JACOBSON: Yeah.
I think this is a very
common point of confusion,
because it really does
depend on your use case.
And Daniel Bergqvist
and I actually
gave a talk on this about
specifically internet
of things, at Google
I/O. So we should
link that in the description--
MARK MANDEL: Yep.
BILLY JACOBSON: --for
more of the details.
But for Internet of Things, we
realized that's a very common use
case for all three of these
types of databases or data
warehouses.
And I guess what
we were saying is,
if you have this
time series data
and you're looking at
these three solutions, what
might be helpful to consider
is typically, if you're just
trying to do something with
a time series database,
go with Bigtable.
That has all the basics you
need for a time series database.
It's really fast, can
scale, and do all of that.
We also looked a little bit at
Datastore and Firestore,
and that's good at a smaller
scale with lower QPS.
And when you get into the
scale of it, with the Bigtable
or with Spanner, you
have to play with it
to figure that out.
And then for Spanner, if
you have transactions.
That's really where if you're
using a time series database,
transactions are
really where you're
going to get the power of
using Spanner over something
like Bigtable.
And you also get the SQL
interface, which is nice,
so that might be
part of the reason.
But typically it's because
of those transactions.
And with Spanner,
you're going to get--
I believe the writes are going
to be slower with Spanner, just
because of the way the
data is replicated.
So they have to write the
data to all the locations
that the data is replicated
before you can read from it,
whereas Bigtable, because
it's eventually consistent,
you'll write it and then
you can read it right away.
It just may not be there,
especially if it's replicated.
MARK MANDEL: Yep.
BILLY JACOBSON: And
then for BigQuery, we
kind of said, think about those
databases totally separately.
So if you're going to choose
Datastore or Firestore,
Bigtable or Spanner, that's
one part of your solution.
And then to use a
data warehouse, that's
kind of like a
totally separate part.
So for that, you'd use BigQuery.
And like I said, you
do those ad hoc queries,
you give that to a
data analyst, you
can do full table scans, which
aren't really great for Spanner
or Bigtable.
That's why you
would use BigQuery.
MARK MIRCHANDANI: And
like you mentioned,
I mean, you might
have a system in place
where you use both Bigtable
and BigQuery at the same time
for different purposes.
BILLY JACOBSON:
Yeah, it's a very--
yeah.
So that's very common.
We typically see people
have a Dataflow job that
writes to both at the
same time, or even takes
their data from BigQuery
and writes it over
to Bigtable, or vice versa.
MARK MANDEL: Oh.
I know what I want
to talk about.
[CHUCKLING]
So what's the local developer
experience like with Bigtable?
Do I have any emulation
tools or anything like that
that I can run
locally, so that I
can play with this
without having
to spin up a bunch of
clusters in the cloud?
BILLY JACOBSON: So we
have a Bigtable emulator.
I haven't had a chance
to play too much, so--
MARK MANDEL: [LAUGHS]
BILLY JACOBSON:
--I can't answer--
MARK MANDEL: It
is there, though.
BILLY JACOBSON: So we have that.
MARK MANDEL: Yep.
BILLY JACOBSON:
And we also have--
so Bigtable nodes
are fairly expensive.
I think they're around 600, 650.
I don't want to say
the exact number.
MARK MANDEL: We
can look into it.
BILLY JACOBSON: Per
node, so typically you
go with three nodes for
a production environment,
so that can get pretty
costly per month.
And I think-- although,
when I was listening back
to Ian's podcast, I think it was
triple that rate for one node.
So that's at least great
we've come down on the price.
MARK MANDEL: [LAUGHS]
BILLY JACOBSON: But we do have
a development instance you can
use, so that's just one node.
MARK MANDEL: Oh, cool.
BILLY JACOBSON: So
that's a good way
to actually just play
with it in the cloud
and actually get to use--
you're going to lose
out on some of the power
by only having one
node, but you can really
start interacting
with it and get
a feel for the API and stuff.
MARK MANDEL: So yeah.
I mean, as you said, it's been a
while since we talked about it.
What has changed in the last
two years with Bigtable?
We've got like new, cool
stuff, just more columns
and more keys?
[LAUGHTER]
BILLY JACOBSON: Well,
I mean, at the base,
Bigtable has stayed
the same, because I
think it's such a foundational
product it can't really
change too much.
But we've added the replication.
We've added these new clients.
And we also have
this Key Visualizer.
I think what's changed is the
environment around Bigtable.
I mean, GCP has grown so
much, so there's more--
if you want to store
stuff in Bigtable,
the rest of your ecosystem
can comfortably be in GCP,
and you're going to find
all those ways you'd
want to interact
with Bigtable there.
So Dataflow jobs, Dataproc,
even, some of the ML
features that we have in GCP.
So really, it's all
about this environment
that's around Bigtable.
You want-- Bigtable, you're
getting that low latency,
so you don't want to have
your stuff in Bigtable
and then be doing
analytics on it
somewhere else,
because then you're
going to lose some
of that low latency.
So getting to have
an ecosystem that
supports Bigtable and
supports everything around it,
I think that's where GCP has
grown over the past few years.
MARK MANDEL: One of
my favorite questions.
What has been the most
interesting thing you've
seen someone do with Bigtable?
Possibly weird or wacky.
BILLY JACOBSON: One of
the things I really like
is Spotify's Discover Weekly.
So that happens on Bigtable.
We've given a few talks
about how it works,
about their ecosystem.
And I think they were saying
before they had Bigtable,
they'd have some
engineers up all night
trying to make sure that their
recommendation algorithm wasn't
going to crash or was
going to be available
that Monday morning for those
Discover Weekly playlists.
And for them to
get on to Bigtable
and start using the
machine learning on that
and to have it to just be
not something they even
have to worry about anymore
and could bring more
recommendations and
different ways with that,
I think that's just such a cool
thing that's impacted my life.
One of the cool ways I've
got to play with Bigtable
was this past GCP
Next, or Next 2019.
We were figuring out what
can we do to really try out
a real Bigtable use case,
play with it in a way
that developers might
be playing with it?
So we were like,
where do we find
like a cool Internet of Things
or time series database?
And we were poking
around, and we
found this team called
Air View, and they've
put air quality sensors in the
back of Google Street View cars
and--
MARK MANDEL: Oh, wow.
BILLY JACOBSON: --have been
driving them around the Bay
Area and a few other cities.
So we contacted them and started
streaming some of that data
into Bigtable through
a Dataflow pipeline.
And then we're able to create
a real visualization with it.
So that was cool
just to see like, oh,
if you've got like
an Internet of Things
and a fleet of
Internet of Things,
how that process
can be done and how
you could use that to
create a real-time dashboard
and do real analytics on it.
And while we only
had a few cars,
it's cool to know
we could easily
scale it up to a
thousand cars or 10,000 even,
just by adding more nodes.
MARK MANDEL: Is there anywhere
people should specifically
go if they want to learn
more about Bigtable?
BILLY JACOBSON: I would say go
to the Bigtable documentation.
We've got links to
a lot of the talks
we did at Next, which show off
some of these really cool use
cases.
Some of the personalization,
some anti-fraud,
the time series
analysis, that talk
at I/O about how to
choose your database.
I think with
Bigtable, you really
want to know how other
people are using it
so you can learn about
each knob and how to turn
them to best benefit you.
So before you dive
into any development
with Bigtable,
learning a lot about it
is probably the best
recommendation I'd have.
MARK MANDEL: Fantastic.
MARK MIRCHANDANI: And
it sounds like that talk
about choosing the
right database would
be super helpful, too, because
there are a lot of options
out there.
BILLY JACOBSON: Yeah.
MARK MANDEL: And are you going
to be anywhere in particular?
You going to any conferences
where people can come
see you talk?
BILLY JACOBSON: I have nothing
scheduled, unfortunately.
MARK MANDEL: [LAUGHS]
BILLY JACOBSON: But if anyone
listening wants to hear more
about Bigtable, sign me up.
MARK MANDEL: There you go.
And I'm sure you have a
Twitter account, I assume?
BILLY JACOBSON: I have a
Twitter account, @BillyJacobson.
MARK MANDEL: There you go.
We'll make sure we put
that in the show notes--
BILLY JACOBSON: Sure.
MARK MANDEL: --so people
can find you as well.
Awesome.
Well, Billy, thank you
so much for joining us
today and talking to
us all about Bigtable.
BILLY JACOBSON:
Thanks for having me.
MARK MIRCHANDANI:
Thanks so much to Billy
for coming in, for
talking about Bigtable.
I mean, it's been a while.
I think we had
mentioned, you know,
quite a number of episodes since
Bigtable was last mentioned.
MARK MANDEL: Yep.
MARK MIRCHANDANI: But it's
a really, really cool tool,
and I think understanding
what it's really good at
and how people can look
at what's using it as well
as any future changes
for it, I think really
highlights exactly
how cool it is.
MARK MANDEL: Yeah, no.
Great interview.
Thanks so much for
joining us, Billy.
Well, let's get stuck into
our question of the week.
[MUSIC PLAYING]
So if I have an
organization, I've
set up an organization for
my variety of GCP projects,
and my projects are
separated out by folder
so I can manage them
in a coherent way,
it's easy to see
what's going on,
how do I break down my
billing data by folder
so I can see what's going
on at that folder level?
MARK MIRCHANDANI: Right, yeah.
I mean, this is a really
important question
for people who are running an
organization inside of GCP.
So a lot of people who are
just either individual users
or hobbyists or something like
that can spin up Google Cloud
Platform with their
personal accounts,
and they can still set
up a billing account,
export that billing
account to BigQuery,
and then you have an
actual line-by-line more
detailed version of
your billing data set.
So you can write any analytics--
MARK MANDEL: Nice.
MARK MIRCHANDANI:
--based on that,
or you can do any
work based on that.
But when you're an organization,
maybe even like a Macy's,
and your organization
structure is a little bit more
complicated, you can create
folders to group your projects.
But when you export
that to BigQuery,
you only have the IDs of the
folders, not the actual names.
MARK MANDEL: Oh.
MARK MIRCHANDANI: And that can
be a big challenge, because,
you know--
MARK MANDEL: Got it.
MARK MIRCHANDANI: --I may not
be looking for folder 1256734,
I'm looking for folder
Production and Development.
So Nick has written this cool
little article and a very, very
quick tutorial to show you how
to use the open source GitHub
project GCP Folder Look
Up, which will basically
take your organization,
take your folders,
and put that out
to a BigQuery table,
and then you join
that with your billing
data set to actually get
a more detailed view.
Now all of your line
items will actually
have the organization
structure in them as well.
So any analysis you're
doing with BigQuery
is a lot easier because you
can say, OK, well, give me
all the costs and show
it to me by this folder,
or show it to me by
this folder, but only
for this specific GCP project.
MARK MANDEL: Oh, nice.
So this creates another
table inside your BigQuery data
structures so that you
can then join that up
based on the ID of that table?
Would that-- is that right?
MARK MIRCHANDANI: Exactly, yeah,
with the name of the folder
and then with the line items
that come through the BigQuery
export.
It's way, way more
powerful, and you
can see every single individual
cost over any time span
as soon as you start enabling
the BigQuery export, of course.
So it's a great way to either do
data analytics on your billing
data or even just to
look up information.
You can also plug
it into Data Studio
and you get a nice,
cool visualization layer
on top of that.
MARK MANDEL: Oh, that's nice.
I'm just looking at it here.
You can both get like
the parent of the folder,
but also what level and stuff.
So if you have like semantic
reasoning behind which level
your folders are
at, you'll be like,
grab me everything from level 2.
You can do some really
neat stuff this way.
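[EDITOR'S NOTE: The join described above can be sketched in a few
lines of Python. This is a minimal illustration, not the tool's
actual output or the real billing export schema. The field names
(folder_id, name, parent, level, cost), the sample rows, and the
cost_by_folder helper are all hypothetical. In practice the same
join would be written as a BigQuery query over the two tables.]

```python
from collections import defaultdict

# Hypothetical folder lookup table: one row per folder, mapping the
# numeric folder ID to a readable name, its parent, and its depth.
FOLDERS = [
    {"id": "1001", "name": "Production", "parent": "organization", "level": 1},
    {"id": "1002", "name": "Development", "parent": "organization", "level": 1},
    {"id": "2001", "name": "Payments", "parent": "1001", "level": 2},
]

# Hypothetical billing line items that carry only the folder ID.
BILLING_ROWS = [
    {"project": "shop-frontend", "folder_id": "1001", "cost": 12.50},
    {"project": "shop-frontend", "folder_id": "1001", "cost": 7.25},
    {"project": "sandbox", "folder_id": "1002", "cost": 3.00},
    {"project": "card-processor", "folder_id": "2001", "cost": 20.00},
]


def cost_by_folder(billing_rows, folders, level=None):
    """Join line items to folder names and total the cost per folder.

    If level is given, only folders at that depth are included.
    Line items whose folder ID is not in the lookup are skipped.
    """
    by_id = {folder["id"]: folder for folder in folders}
    totals = defaultdict(float)
    for row in billing_rows:
        folder = by_id.get(row["folder_id"])
        if folder is None:
            continue  # no name known for this folder ID
        if level is not None and folder["level"] != level:
            continue  # filter to the requested folder depth
        totals[folder["name"]] += row["cost"]
    return dict(totals)


if __name__ == "__main__":
    print(cost_by_folder(BILLING_ROWS, FOLDERS))
    print(cost_by_folder(BILLING_ROWS, FOLDERS, level=2))
```

[Calling cost_by_folder with level=2 mirrors the "grab me everything
from level 2" idea: only folders at that depth are totaled.]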
MARK MIRCHANDANI: Yep.
A full view of
your organization.
So I think a lot
of people will be
really excited to be able to
follow the instructions here.
And like I said,
they're very quick,
and then you'll have a lot more
information in your BigQuery
export.
MARK MANDEL: Very cool.
Awesome.
Well, before we
finish up, Mark, are
you producing any cool
videos, going anywhere fancy?
What are you up to?
MARK MIRCHANDANI: Oh, well,
I'll be headed to LA for a week,
in a few weeks.
But in the meantime,
I think it's mostly
just working on videos.
We are working on
releasing some new billing
content, very much in line
with our previous question.
MARK MANDEL: Yep.
MARK MIRCHANDANI: And
talking a little bit more
about how people can
understand the tools that
are available to them inside
of the Google Cloud Platform
console.
So I'm super excited to see
those coming out, probably
in the next couple weeks.
MARK MANDEL: Very, very cool.
MARK MIRCHANDANI:
How about yourself?
MARK MANDEL: I'm going to shoot
a little bit in the future,
because there's some stuff
I'm real excited about.
So when this comes
out on Wednesday,
I'll be probably finished
with PAX Dev, where
I'll be doing two
panels, but I'll
be hanging around at PAX West
for the rest of the week,
so you can always drop
me a line if you're
going to be there as well.
Later in the year I
will be at KubeCon
in some capacity or
another, so I'm very much
looking forward to that.
And so one thing
else I also want
to talk about is next year,
but I'm very excited about it.
Game Developers
Conference next year,
the big game
developers conference,
essentially, myself
and Ed Pereira
have been working with
Game Developers Conference
to do a one day
summit at the event.
Basically, Game
Developers Conference
does a series of summits
on Mondays and Tuesdays
about specific topics.
And so Online Games
Technology Summit
is what we're calling it.
Basically, it's for
anyone who works
in backend systems for
online games, anyone
who works on client
and server networking.
Basically, anything
to do with multiplayer
online connected games,
anything in that capacity,
infrastructure management,
scaling, security, monitoring,
all that kind of stuff.
There really isn't a
space for us people
who work in that to really
gather and do stuff.
So the CFP will open on
the 29th of August,
so after this comes out.
So if you're looking to
attend or possibly submit,
you know, put it in your
calendar now and get it done.
I'm really excited about it and
very happy to work with the GDC
on this initiative.
So I got very
excited about this.
MARK MIRCHANDANI: Yeah,
that's super cool.
I know you've been
talking about working
with them for a little while,
so it's really exciting
to see it come to fruition.
And I think, like
you said, this isn't
a space that exists right now.
There's not a lot of
people talking about this,
so having a conference
for all these people
to get together and talk
about these technologies
I think will ultimately
help the information sharing
aspect of it.
MARK MANDEL: That's the idea.
Trying to help grow
this particular part
of the industry.
But anyway, it's
good, good stuff.
MARK MIRCHANDANI: Well,
it'll be a lot of fun,
and I'm sure a lot of people
are super excited to hear
about it as well.
MARK MANDEL: I hope so.
That's the plan.
MARK MIRCHANDANI:
[LAUGHS] All right, Mark.
Well, thanks so much for
joining us this week.
And we'll talk to you all soon.
MARK MANDEL: See
you all next week.
[ELECTRONIC MUSIC PLAYING]
SPEAKER 1: Coming to you
live from New York City--
MARK MANDEL: In San Francisco.
SPEAKER 1: In San Fran--
MARK MIRCHANDANI: And not live.
[LAUGHS]
MARK MANDEL: And not live.
MARK MIRCHANDANI: We're
just lying to everybody today.
MARK MANDEL: Yeah.
[CHUCKLES] There we go.
