>> Displaying data on a map can
become a problem when
you have a lot of data,
and you can actually
leverage features from
Azure Maps for clustering the data.
Ricky, from the Azure
Maps team, joins us again
on the IoT Show to
tell us everything you
need to know about
clustering data on maps.
[MUSIC]
Hi everyone, thanks for
watching the IoT Show.
I'm Olivier, your host.
Today, we have Ricky,
from the Azure Maps team.
Ricky, thanks for coming
to the show again.
>> Thanks for having me.
>> So today, we'll talk about
something which is
pretty interesting.
It's Azure Maps, one of
its many capabilities.
Maps is about visualization,
and Maps is also about things
that are happening in the back.
Today, we're talking
about clustering data.
So when we think about IoT scenarios,
because I'd like to put
things around IoT scenarios.
It's called the IoT Show after all.
Potentially, you have lots
of data you want to show.
You want every device
to actually show on the map,
with its own dataset,
wherever it's deployed, and
you can have millions
of devices showing up.
>> Right, yeah. One of
the big issues that
developers often come across is they
have all these things they
want to show on a map.
Sure, our map control can
handle large datasets,
but you get it on there and you
realize that you have so
many pins on the map.
You can't even see the map.
>> Exactly.
>> Half the pins are
hidden by other pins.
So you want to make that more
usable and more navigable.
So when you're zoomed out and
you're looking at the whole world,
you don't need to see
every individual pin.
You just want to know, "Okay,
where is my group of pins?"
So that aggregation is
known as clustering.
>> Clustering, that's
the name of that thing.
So there are, I guess, different
methods for doing that.
From the developer's perspective,
the idea is to simplify the
developer's life so they
don't have to think about
it, and give them options,
some knobs to configure all of that.
So how is it done in Azure Maps?
Show me that.
>> Well, let's start
off with just taking
a look at the whole
concept of clustering.
So if we switch to the computer here,
we can see a typical example.
I think this is actually shipwrecks.
So tons of shipwrecks
all over North America.
>> Along the coast.
>> Yeah, along the coast. Well,
some of them are in lakes.
>> In lakes as well.
>> In lakes as well, we see
a lot in the great lakes.
There's a lot of different
algorithms for clustering,
but one of the basic ones
is called grid-based
clustering where essentially
you take the map,
you break it up into a grid,
and then you count the number
of pins in each grid cell.
That's the size of your cluster.
>> Pretty straightforward.
>> Then you can go through and
you pick a point inside of
that grid where you want
to put your cluster.
It could be the average location,
it could be just the
center of the cell.
Then now, we get our clusters
inside of each cell.
Then finally, we'll remove that.
Normally, the grid doesn't
show up. That's just
for demonstration purposes.
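The grid-based approach described above can be sketched in a few lines. This is only an illustration, not the code from the MSDN article or from Azure Maps: the `PixelPoint` and `GridCluster` types, the `gridCluster` function, and the 100-pixel cell size are all hypothetical, and points are assumed to be already projected to pixel coordinates.

```typescript
// Minimal grid-based clustering sketch (illustrative only, not Azure Maps code).
// Points are assumed to be already projected to pixel coordinates.
interface PixelPoint { x: number; y: number; }
interface GridCluster { x: number; y: number; count: number; }

function gridCluster(points: PixelPoint[], cellSize: number): GridCluster[] {
  // Keep a running coordinate sum and a count for every grid cell that has pins.
  const cells: Record<string, { sumX: number; sumY: number; count: number }> = {};
  for (const p of points) {
    const key = `${Math.floor(p.x / cellSize)}:${Math.floor(p.y / cellSize)}`;
    const cell = cells[key] || { sumX: 0, sumY: 0, count: 0 };
    cell.sumX += p.x;
    cell.sumY += p.y;
    cell.count += 1;
    cells[key] = cell;
  }
  // Place each cluster at the average location of the pins in its cell.
  return Object.keys(cells).map(key => {
    const c = cells[key];
    return { x: c.sumX / c.count, y: c.sumY / c.count, count: c.count };
  });
}

const pins: PixelPoint[] = [
  { x: 10, y: 10 }, { x: 30, y: 20 }, { x: 90, y: 90 }, // all in the same 100px cell
  { x: 150, y: 40 },                                    // in the neighbouring cell
];
const gridClusters = gridCluster(pins, 100);
console.log(gridClusters);
```

Using the cell center instead of the average location would only change the last step. Recomputing with a cell size that covers less ground per cell is what makes clusters break apart as you zoom in.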
>> Pretty straightforward.
>> Yeah.
>> Useful. Because yes, the result
you're seeing here is actually way
more understandable and
digestible for the human eye.
>> Right. Then as you zoom in,
this gets recalculated and
you'll end up with a
different set of clusters.
The clusters will break apart.
When you zoom in close enough,
there won't be any more clusters.
We'll look at the individual pins.
>> Okay. Love it.
>> Yeah. So, a bit of interesting
history about this.
So the grid-based
algorithm is an older one;
there are a lot of
newer things here.
But 12 years ago,
I was contracted by Microsoft to
write some interesting
articles on how to use Maps.
So this was one thing I
was experimenting with,
then I ended up writing
an article for MSDN on
how to do grid-based
clustering and made
all the code open source
as an article on MSDN.
That's how we did it back then.
>> Twelve years ago, first
person on the MSDN come in.
>> Then it ended up making its way
into a bunch of the
Microsoft mapping platforms,
and actually ended up making
it into community platforms
such as Leaflet and even some of
our competitor mapping
platforms out there too.
>> Nice.
>> As time went on, the
community took over,
and made it a lot better,
and made a lot more improvements.
Now today, we actually use
an open-source library
called Supercluster.
>> What goes around, comes around.
>> So yeah, that initial payoff
of handing off something to
the community has come back and we got
something that was
better, which is great.
>> The baby is no longer
a baby; it's a grown-up now.
>> Right.
>> Awesome.
>> So let's take a
quick look at how to
actually code an actual map
and how to actually do this.
So as we can see here,
we have a map with some
clusters similar to before.
As I zoom in, these
clusters will break apart
into smaller bits until
you get the individual ones.
Inside of Azure Maps,
so in this example here,
we're pulling in data from earthquake
feeds from the past month.
In Azure Maps, one of the
main ways of bringing in data,
GeoJSON data, is to
put it into what we
call the data source.
That manages all the data
inside the map for you.
It does all the heavy lifting.
One of the options in
there is a cluster option,
so we can see this here
right in the code.
With that, there's a bunch of other
clustering options
such as the radius.
So instead of a grid-based
method, this one
takes the first
point, applies a radius,
puts everything inside
that radius into a cluster,
and then takes the next unclustered
point, and does it again.
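The radius-based idea described here (take a point, gather everything within a radius, move on to the next unclustered point) could be sketched as follows. This is a simplified greedy version written for illustration; it is not the Supercluster source, which uses a spatial index and works per zoom level. The `Pt` and `RadiusCluster` types and the `radiusCluster` function are hypothetical names.

```typescript
// Greedy radius-based clustering sketch (illustrative; not the Supercluster source).
interface Pt { x: number; y: number; }
interface RadiusCluster { x: number; y: number; count: number; }

function radiusCluster(points: Pt[], radius: number): RadiusCluster[] {
  const used: boolean[] = points.map(() => false);
  const result: RadiusCluster[] = [];
  for (let i = 0; i < points.length; i++) {
    if (used[i]) continue; // already absorbed by an earlier cluster
    used[i] = true;
    let sumX = points[i].x;
    let sumY = points[i].y;
    let count = 1;
    // Pull in every still-unclustered point that lies within the radius of point i.
    for (let j = i + 1; j < points.length; j++) {
      if (used[j]) continue;
      const dx = points[j].x - points[i].x;
      const dy = points[j].y - points[i].y;
      if (dx * dx + dy * dy <= radius * radius) {
        used[j] = true;
        sumX += points[j].x;
        sumY += points[j].y;
        count++;
      }
    }
    // Position the cluster at the average of its members.
    result.push({ x: sumX / count, y: sumY / count, count });
  }
  return result;
}

const quakes: Pt[] = [
  { x: 0, y: 0 }, { x: 10, y: 0 },
  { x: 100, y: 100 }, { x: 105, y: 100 },
  { x: 300, y: 0 },
];
const radiusClusters = radiusCluster(quakes, 45);
console.log(radiusClusters.map(c => c.count));
```

The nested loop is quadratic in the number of points, which is the "a little bit more compute" trade-off; production libraries avoid that cost with an index.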
>> Okay.
>> So slightly different algorithm,
but the very same end goal.
>> Do you have rules on which
method to use versus another one?
>> So we only use the one.
So the grid one is nice and fast,
but the final layout
isn't always great.
Whereas the point-based one
is a little bit more compute,
but it always looks a lot better.
>> Got it. Okay, makes sense.
>> So that's what we prefer.
So we can go through,
set the radius, you can even
set a maximum zoom level.
Because if you have two points
that are right next
door to each other,
you have to zoom in really close
before they'll actually separate.
So you might say, "I don't care
if they overlap a little bit."
So let's say, once we hit zoom
level 15, just turn off clustering.
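As a rough sketch, the options mentioned here might look like the object below. The option names `cluster`, `clusterRadius`, and `clusterMaxZoom` are the ones the Azure Maps Web SDK data source accepts; the radius value of 45 is an assumption for illustration, and the SDK calls are shown only in comments because this snippet does not load the SDK.

```typescript
// Data source clustering options. The values are illustrative.
const clusterOptions = {
  cluster: true,       // turn clustering on
  clusterRadius: 45,   // radius, in pixels, used to group points (assumed value)
  clusterMaxZoom: 15,  // maximum zoom level at which points are clustered
};

// In a page that loads the Web SDK, these would be passed to the data source:
//   const datasource = new atlas.source.DataSource(null, clusterOptions);
//   map.sources.add(datasource);
console.log(clusterOptions);
```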
>> Okay, I got it. Yeah.
>> So that's what we do. So we simply
just set these options
on the data source.
Everything else you
would do normally with
Azure Maps for rendering
stays the same.
One thing that you would
probably want to do is
render clusters differently.
So in this example here,
I used a bubble layer,
and I'm using some
expressions to scale
the size based on the number
of points inside the cluster
and also to change the color
based on the number of
points inside the cluster.
So there's this point count,
which is one of the
properties of the cluster.
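A sketch of what such bubble layer options could look like follows. The `step` and `get` expressions and the `point_count` cluster property are part of the Web SDK's data-driven styling; the thresholds, sizes, and colors are made up for illustration, and `evalStep` is a hypothetical helper that only mimics how a `step` expression picks its output.

```typescript
// Data-driven style expressions keyed on the cluster's point_count property.
// Thresholds, sizes and colors are made-up illustrative values.
const clusterBubbleOptions = {
  // Radius steps up as the cluster holds more points.
  radius: ["step", ["get", "point_count"], 20, 100, 30, 750, 40],
  // Color changes at the same thresholds.
  color: ["step", ["get", "point_count"], "lime", 100, "yellow", 750, "red"],
  // Only render features that are clusters.
  filter: ["has", "point_count"],
};
// With the Web SDK loaded, roughly:
//   map.layers.add(new atlas.layer.BubbleLayer(datasource, null, clusterBubbleOptions));

// Hypothetical helper mirroring "step" semantics: the base value applies below the
// first threshold, then each threshold that the input reaches overrides the output.
function evalStep(input: number, base: number, stops: [number, number][]): number {
  let output = base;
  for (const [threshold, value] of stops) {
    if (input >= threshold) output = value;
  }
  return output;
}

const radiusStops: [number, number][] = [[100, 30], [750, 40]];
console.log(evalStep(50, 20, radiusStops), evalStep(100, 20, radiusStops), evalStep(1000, 20, radiusStops));
```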
>> Pretty straightforward.
You realize that people will
actually just use your
sample here to get on
>> Yeah. Copy, paste,
that's the beauty. All
of this is on GitHub.
>> Once again, 12 years from now,
you'll see your code coming back.
>> It'll be even better.
So in this example,
I was using scaled circles
but you can use any of
the rendering capabilities
inside of Azure Maps.
So here's an example of
doing clustering and using
the icons. So we see these.
The triangles represent
a cluster,
and the circular things are
individual earthquakes in this case.
>> Okay.
>> Some other interesting things,
clusters represent an area and
so sometimes you want to know what is
that area or what is
that general area.
So I have this sample
here where you can hover
over different points and
it calculates an area.
What this essentially is doing
is taking all the points
of that cluster and calculating
what's called the convex hull.
With the convex hull, basically
you take an elastic band,
stretch it around your data,
and that's your polygon.
>> That gives you. Got it.
>> So it's actually quite
simple to implement.
So in this case here
I've set up a mouse over
event and that calls this function
here called displayClusterArea.
I grabbed my shape out of that event.
Then with the data source,
we have some methods
that we've exposed
for getting access to that raw data.
So you'd go through for clusters.
You can pass in the ID
of the cluster and say,
"Get the cluster leaves."
So that basically gets
all the children.
From there, it has a
callback, because the
calculation can take some time,
and it'll return
all the shapes, or all
the points, in that cluster.
From there, if there's
only two points,
we don't even need to calculate,
we just draw a line.
But if there's more than two points,
then we can use the
getConvexHull function of
our math library that is built into
Azure Maps and simply
pass those points in.
That generates our polygon and we
just add it to the data
source and we're done.
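The flow just described (fetch the cluster's points, draw a line for two points, otherwise wrap them in a convex hull) can be sketched without the SDK. The `convexHull` function below is a stand-in using Andrew's monotone chain algorithm; it is not the SDK's `getConvexHull` implementation, it treats longitude and latitude as flat coordinates, and the `LngLat` type and `clusterArea` function are hypothetical names.

```typescript
type LngLat = [number, number]; // [longitude, latitude], treated as planar here

// Cross product of the vectors o->a and o->b; positive means a counter-clockwise turn.
function cross(o: LngLat, a: LngLat, b: LngLat): number {
  return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0]);
}

// Andrew's monotone chain, used as a stand-in for the SDK's getConvexHull.
function convexHull(points: LngLat[]): LngLat[] {
  const pts = points.slice().sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  if (pts.length < 3) return pts;
  const lower: LngLat[] = [];
  for (const p of pts) {
    while (lower.length >= 2 && cross(lower[lower.length - 2], lower[lower.length - 1], p) <= 0) lower.pop();
    lower.push(p);
  }
  const upper: LngLat[] = [];
  for (let i = pts.length - 1; i >= 0; i--) {
    const p = pts[i];
    while (upper.length >= 2 && cross(upper[upper.length - 2], upper[upper.length - 1], p) <= 0) upper.pop();
    upper.push(p);
  }
  // The last point of each chain repeats the first point of the other chain.
  lower.pop();
  upper.pop();
  return lower.concat(upper);
}

// Two points or fewer give a line; more than two give a hull polygon.
// In the SDK, the points would come from the data source's getClusterLeaves call.
function clusterArea(leaves: LngLat[]): { kind: "line" | "polygon"; points: LngLat[] } {
  if (leaves.length <= 2) return { kind: "line", points: leaves };
  return { kind: "polygon", points: convexHull(leaves) };
}

console.log(clusterArea([[0, 0], [2, 0], [2, 2], [0, 2], [1, 1]]));
```

The interior point `[1, 1]` is dropped from the result, which is the elastic-band effect: only the outermost points shape the polygon.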
>> Love it. One thing I want
to point out is, once again,
because it's a PaaS service,
calculations are
happening on the backend.
So you can have a client,
which is the browser and
the machine it's running on,
which is pretty poor in
terms of CPU and so on.
Because maps is actually
powered by the Cloud,
you would actually have that
interaction with that map even on
a poorly featured,
limited-hardware client
and actually have the
same result in terms of
the comfort of viewing
information and so forth.
>> So in this example here,
I'm using the library that's
built into our web SDKs.
This is happening
locally but we do have
a set of services for doing
the same calculations in the Cloud.
>> But the data actually
is in the Cloud itself.
So some calculation is
happening locally,
but lots of it is actually
happening in the Cloud as well.
>> Yeah. So let's take a
look at another example.
So this is a newer
feature that we exposed.
So you have your clusters again,
but in this case here these are
clustered points of interest,
but it's a mix of
four different types.
So what I've done is we have this
thing called cluster aggregates.
So a lot of times you have a cluster,
you want to do some calculation
with the data inside of it.
Maybe you want to calculate
the total revenue
of all the things inside.
Well, the best time to do
that calculation is when
the cluster is being calculated
because we're doing
one loop of the data.
If I had to do it later,
then now I'm doing a
second loop of the data.
I can do this in line and
say in my data source,
also calculate this while
you're doing the clustering.
So in this case here if I
click on one of the clusters,
we're just seeing a
pop-up that lists
the number of points
of interest of each
type, of the four types.
We've got gas stations
and things like that.
So if we take a quick look.
Here we are again at the data source.
We've got cluster and I've modified
my radius for whatever
reason that needed to be.
But we introduced this thing called
cluster properties,
and then I go through
and define an object where
the key is whatever I want it to be.
In this case, I'm just
using the entity names.
But then we use a data-driven
expression to do our math.
So we have a whole other video on
how to do expressions and
tons of documentation.
But the idea here is I'm looking for
the entity type
property of each point.
Checking to see if, in this
case, it is a gas station.
If it is, I then take
my aggregate which
is gas stations again,
and add one to it.
Otherwise I add zero.
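A sketch of the aggregate described here is shown below. `clusterProperties` is the data source option name in the Web SDK; the `EntityType` property name, the two category names, and the radius value are assumptions, and `countByType` is a hypothetical plain-TypeScript equivalent of what the expression computes for a single cluster.

```typescript
// clusterProperties-style definition: key -> aggregate expression.
// "EntityType" and the category names are assumed values for illustration.
const aggregateOptions = {
  cluster: true,
  clusterRadius: 50, // assumed value
  clusterProperties: {
    "Gas Station": ["+", ["case", ["==", ["get", "EntityType"], "Gas Station"], 1, 0]],
    "Restaurant": ["+", ["case", ["==", ["get", "EntityType"], "Restaurant"], 1, 0]],
  },
};

// Plain equivalent of the expression for one cluster: each point adds 1 to the
// counter matching its type and 0 to every other counter.
function countByType(entityTypes: string[], categories: string[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const category of categories) totals[category] = 0;
  for (const entityType of entityTypes) {
    for (const category of categories) {
      totals[category] += entityType === category ? 1 : 0;
    }
  }
  return totals;
}

const aggregateTotals = countByType(
  ["Gas Station", "Restaurant", "Gas Station", "Cafe"],
  Object.keys(aggregateOptions.clusterProperties),
);
console.log(aggregateTotals);
```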
>> Okay. Pretty straightforward.
>> So these calculations will
happen for each cluster
as it's being built.
So as I pan and zoom in,
all that's already done.
I don't have to do
all of these again.
Additionally, as I go down,
I want to go through and
show that in the pop-up.
Let's go here, we click the cluster.
So now, if I go into the cluster,
I have the properties.
Then there's a bunch of properties
that come out of there.
But all the properties I added,
our cluster properties, will also
be properties of that as well.
So I'm doing just a for each
on those properties here,
on the entity types and
just creating some HTML
to list that all out.
But it's all right there.
So I don't have to
do any calculations
on the fly, it's all done for me.
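Building the pop-up content from those properties might look like this sketch. The `clusterPopupHtml` function and the category names are hypothetical; `point_count` is the cluster's built-in count property, and the aggregate keys are assumed to be trusted strings, so no HTML escaping is done here.

```typescript
// Builds pop-up HTML from a cluster's properties (built-in plus aggregates).
// Category names are assumptions; keys are assumed trusted, so nothing is escaped.
function clusterPopupHtml(props: Record<string, number>, entityTypes: string[]): string {
  // One list item per entity type; a missing aggregate is shown as 0.
  const rows = entityTypes.map(t => `<li>${t}: ${props[t] || 0}</li>`);
  return `<div>Cluster of ${props["point_count"] || 0} points<ul>${rows.join("")}</ul></div>`;
}

const popupHtml = clusterPopupHtml(
  { point_count: 3, "Gas Station": 2, "Restaurant": 1 },
  ["Gas Station", "Restaurant"],
);
console.log(popupHtml);
```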
Another interesting way to use
cluster aggregates is
create a pie chart.
So now we've got a really
interesting thing, and I'm actually
using SVG with mouseovers
for each slice.
So I'm using an HTML
marker to render each pie,
and so now it's fully interactive.
I get a ton of data, so I get
the number of points in that cluster.
I can see the data for
each individual metric.
As I zoom in,
this data here will also update.
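One way the pie slices could be generated from the cluster aggregates is sketched below. This is not the sample's actual code: `pieSlicePaths`, the `PieSlice` type, and the category names are hypothetical. Each returned string is the `d` attribute for an SVG `<path>`, to which mouseover handlers could be attached before handing the SVG to an HTML marker.

```typescript
interface PieSlice { label: string; path: string; }

// Turns aggregate counts into SVG path data, one slice per non-zero category.
// Note: a single category holding 100% of the points yields a degenerate arc;
// a real implementation would draw a full circle in that case.
function pieSlicePaths(values: Record<string, number>, radius: number): PieSlice[] {
  const labels = Object.keys(values);
  let total = 0;
  for (const label of labels) total += values[label];
  const slices: PieSlice[] = [];
  let startAngle = 0;
  for (const label of labels) {
    if (values[label] === 0) continue; // nothing to draw for empty categories
    const endAngle = startAngle + (values[label] / total) * 2 * Math.PI;
    const x1 = (radius + radius * Math.cos(startAngle)).toFixed(2);
    const y1 = (radius + radius * Math.sin(startAngle)).toFixed(2);
    const x2 = (radius + radius * Math.cos(endAngle)).toFixed(2);
    const y2 = (radius + radius * Math.sin(endAngle)).toFixed(2);
    // The large-arc flag is needed when a slice covers more than half the circle.
    const largeArc = endAngle - startAngle > Math.PI ? 1 : 0;
    slices.push({
      label,
      path: `M${radius},${radius} L${x1},${y1} A${radius},${radius} 0 ${largeArc} 1 ${x2},${y2} Z`,
    });
    startAngle = endAngle;
  }
  return slices;
}

const pieSlices = pieSlicePaths({ "Gas Station": 1, "Restaurant": 1, "Cafe": 2, "Hotel": 0 }, 10);
console.log(pieSlices);
```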
>> Nice.
>> So it's a really nice
data visualization to see
a lot of data on the map
and get a lot of insights,
without having to manually
do a lot of work.
>> You stole my words; that's what
I was about to ask.
Once again the notion is not
just to display data somewhere.
It's actually to allow for the
user to extract insights rapidly.
So sometimes you have to take
action on these data as a user,
or you need to not spend hours
actually deciphering or trying
to understand what's going on.
So this typically is the kind
of thing that you would
expect the user to get.
I feel we make it simpler for
developers to deliver
that kind of result.
>> Yeah. Again it all comes back
to the power of the community.
Being able to reuse the libraries
that are already out there.
>> Fixed flow use though.
>> There's been a lot of
improvements over time,
over the years and
more things get added.
So this just helps us do
development faster and be able to
get these things to customers faster.
>> I love it. Thanks Ricky.
>> Thank you.
>> So if you want to learn more
about clustering in Azure Maps,
you go to aka.ms/IoTshow/mapsclustering,
"mapsclustering" in one word.
Awesome. Ricky, thanks again for
another great IoT Show
episode about maps.
>> Well, thanks for having me.
>> We're looking forward
to seeing you again soon.
>> Great.
>> Well, thanks for watching
the IoT Show. See you soon.
[MUSIC]
