- Coming up, we look at how we
can now bridge live analytics
over your operational
data without compromise.
Microsoft's technical
fellow Raghu Ramakrishnan
joins us for an overview
of the new cloud-native approach
for hybrid transactional
analytical processing or HTAP
with Azure Synapse Link,
along with Cosmos DB.
Now we'll put it to the test
and also look at how this new
approach enables analytics
over your operational data
in seconds versus hours.
So, Raghu, welcome to Microsoft Mechanics.
- Thanks, Jeremy, thanks for having me.
- Thanks so much for
joining us from home today
and congrats on your announcement
for Azure Synapse Link.
Now just to set some context
here before we start,
Azure Synapse Link is an
extension of Azure Synapse,
Microsoft's single
managed service for analytics
over your data lake and data warehouse,
using either serverless
or provisioned compute.
And now this even extends to
your operational data sources.
So what's the significance then
of what we're announcing today?
- We have developed the first
cloud-native version of HTAP.
There's always been a bottleneck
between operational and analytic systems,
and today, we've
eliminated that bottleneck.
- Right, and just to
set some context here,
HTAP as a concept, it's been
around for a few decades now
but it's the idea of
really being able to shift
from after-the-fact analysis
of your operational data
to real-time analytics
against the transactions
as they're occurring.
But the promise has never
been fully realized, right?
- No, it's not an easy nut to crack,
because on the one hand,
you're always chasing
continuous analytics.
So you want to look at the data
in the operational store in real time.
On the other hand,
you don't want to interfere
with the operational store.
Usually people end up
having separate operational
and analytic systems and that
bottleneck, it's a problem.
- Right, it's not a trivial thing
to provision all the parts that are needed
even for the analytics infrastructure.
You've got to maintain it
from an IT perspective,
troubleshoot if failures occur
and if these systems are on premises,
you can't easily scale them
without buying lots of
expensive metal, right?
- Now this is not an easy
thing. Take an example:
there's a manufacturing plant
with its operational database.
And on the side because you
don't want to interfere with it,
you have your analytical store.
And because the amount of data in here
can get to be very, very large,
with historical
aggregations from many sources,
you're going to do everything
to optimize the analytical store
and keep costs down.
At the same time,
you typically need to build
and manage an ETL pipeline
to get data from the operational store
over into the analytical store.
And again, to minimize
interference, keep costs down,
you're going to try and schedule
this with off-peak hours.
And that means your data
is going to unpredictably
lag behind the source.
So that clearly interferes
with real-time analytics.
- And to be clear, there's
a lot of complexity here
and I know that there are some approaches
that try to solve
for this today, right?
- Yeah, and those
basically amount to trying
to do your operational
and analytic workloads
in the same system.
And that requires
provisioning it to do both;
expensive in terms of memory, resources,
and you can never really scale this enough
for the analytics which are,
as I said, very large scale.
- Right, and this is an area I think
where the cloud can help
but almost warrants this
cloud-native approach
and all the elasticity that we get there.
So how do we solve for this
and how's your team been solving this,
using all the different
components we have in Azure?
- [Raghu] They've been
working for a long time
to lay the foundation by
separating storage from compute
across all our data services.
On the transactional side, for
example, we have Cosmos DB,
a high performance, geo-replicated,
multi-model database service.
On the analytics side, as you know,
we have limitless scale
with Azure Synapse.
So you can scale the resources
for both transactions
and for analytics independently.
Together we have most
of the pieces we need
to make cloud-native HTAP a reality.
But there's one little problem,
the data on the operational side
needs to be made available
to the analytics side
without interfering with
the operational workload.
This is where users need to build
and manage their own ETL pipelines.
Today these pieces come together
simply with Azure Synapse Link.
As soon as the user indicates
what data in Cosmos DB
they need to make available for analytics,
this data becomes available in Synapse.
We take the operational
data you want to analyze
and automatically maintain
an analytics-oriented
columnar version of it.
Any changes to the
operational data in Cosmos DB
are continuously reflected
in your linked data in Synapse.
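To make the row-versus-column idea concrete, here's a minimal, self-contained Python sketch; the documents and field names are invented for illustration, but it shows why an analytics-oriented columnar copy favors aggregate scans:

```python
# Operational stores keep whole documents (row-oriented),
# much like items in Cosmos DB.
rows = [{"id": i, "qty": i % 5, "region": "west"} for i in range(1000)]

# An analytics-oriented columnar version keeps one array per field,
# so an aggregate only has to touch the column it needs.
qty_column = [r["qty"] for r in rows]

row_total = sum(r["qty"] for r in rows)  # scans every whole document
col_total = sum(qty_column)              # scans one compact column

assert row_total == col_total
```

The columnar layout is what lets analytical queries avoid reading every field of every document.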
- This is great because
there's no more scheduled batch processing
or having to build and
maintain operational pipelines
but I want to make this real
and put this to the test.
So in my case, I actually
have a manufacturing system,
and you'll see here that
everything is running
and the back end is in Cosmos DB.
So this is actually pulling
all the data in real time
as it's being transacted.
And you can look at the throughput here,
I have around 1.27 million
requests per second.
So it's not a small data set.
But what we want to do now
is have a look at
what we get with HTAP
and without HTAP.
So here, in my case, I have a
couple of different queries,
on the left hand side, the no HTAP query,
meaning it's against current
state-of-the-art services.
It's got its own pipeline as
you can see in the lower left,
so I have to do all this
stuff to ETL and get it in.
On the right hand side you'll
see there's no pipeline,
we don't need to do that,
and you'll see it's got
more or less the same query
but we've got one linked service here
with Cosmos DB analytics.
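As a sketch of what a query through that linked service can look like from a Synapse Spark pool, reading the Cosmos DB analytical store uses the `cosmos.olap` format; the linked service name "CosmosDbLink" and container name "Orders" here are hypothetical, and `spark` is the session a Synapse notebook provides:

```python
# Runs inside an Azure Synapse Spark pool; placeholder names throughout.
df = (spark.read
      .format("cosmos.olap")  # the analytical store, not the transactional store
      .option("spark.synapse.linkedService", "CosmosDbLink")
      .option("spark.cosmos.container", "Orders")
      .load())

df.groupBy("status").count().show()
```

The query reads the columnar copy, so it never touches the operational container's request units.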
So what I actually want to
do is take a look at these
and how they look when they run.
Again, look at the right hand side;
it's more or less the same query.
And if I go back to the left,
I'm going to go ahead and run this.
And that will take just about
three or four seconds to run.
And now we'll see our results here.
And I'm just going to expand this out
so I can look at it full screen,
and you can see there's about
a one to two minute lag
between these results
and the operational data.
When I run it on the right, though,
you can see that there's about
14 to 21 milliseconds of gap.
So, way more real time,
way more fidelity on the right hand side
using what we have on the Synapse end.
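Those two gaps, a one-to-two-minute lag on the ETL path versus tens of milliseconds with Synapse Link, come down to a simple freshness calculation. This is an illustrative Python sketch, with the timestamps invented to match the numbers above:

```python
from datetime import datetime, timedelta

def freshness_lag_seconds(written_at, visible_at):
    """Seconds between a write landing in the operational store
    and that write becoming visible to analytics."""
    return (visible_at - written_at).total_seconds()

t0 = datetime(2020, 5, 19, 12, 0, 0)  # arbitrary example write time

# Batch ETL path: the write shows up minutes later.
etl_lag = freshness_lag_seconds(t0, t0 + timedelta(minutes=2))

# Synapse Link path: the write shows up tens of milliseconds later.
link_lag = freshness_lag_seconds(t0, t0 + timedelta(milliseconds=20))

assert etl_lag == 120.0
assert abs(link_lag - 0.02) < 1e-9
```

The difference is what the dashboards below make visible: minutes-old data can paint a rosier picture than near-real-time data.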
But where this gets even more real
is when we look at our Power BI report.
So Power BI against the
non-HTAP environment,
what you're seeing here
actually looks very green,
it's a very rosy picture,
in the sense that nothing
looks like it's going wrong.
But when I actually start
to incorporate that higher fidelity data,
I can show you that here
we'll see a lot more red,
we'll see a lot more truth
in terms of the data itself.
So we can see just the big
difference it makes here
in terms of better recency of our data.
Now one other thing that's
super important to point out
is that when we run this,
you can see that on the
left hand side, with no HTAP,
every time we run against
our operational data
it's actually pulling down
our transactions per second.
On the right hand side, with HTAP,
it's not impacting our operational data.
So everything is healthy,
we're not actually taking a
toll on the operational systems.
So let's look at what it
takes to enable all of this.
So all I have to do in this
case, in my Cosmos environment,
is click the Enable Synapse Link button.
That's going to do
everything at the back end
to actually wire up the
two different datasets,
it's creating all of the integration
between the two systems.
And then when I do that,
you can see it show up here:
Cosmos DB is now one of the
data sets available to me.
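For reference, the portal's Enable Synapse Link toggle has a command-line counterpart. This is a hedged sketch using the Azure CLI; the account, resource group, database, and container names are all made up, so check `az cosmosdb --help` against your environment before relying on it:

```shell
# Create a Cosmos DB account with the analytical store enabled
# (hypothetical names throughout).
az cosmosdb create \
  --name my-cosmos-account \
  --resource-group my-rg \
  --enable-analytical-storage true

# Create a container whose data is retained indefinitely
# in the analytical store (TTL of -1 means never expire).
az cosmosdb sql container create \
  --account-name my-cosmos-account \
  --resource-group my-rg \
  --database-name mydb \
  --name Orders \
  --partition-key-path /id \
  --analytical-storage-ttl -1
```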
So, very simple to get all
of that up and running,
and I can train or retrain
machine learning models
or make more accurate predictions
based on the recency of the data.
So, you saw that it's pretty
straightforward to set up
and get up and running,
and it's going to save a
lot of people a ton of time.
- Yes, it's much simpler
to perform continuous live analytics.
And it's also more performant and cheaper.
- And this is all seamless,
there are no pipelines to manage
between your transactional
data that's in Cosmos DB
and your analytics data
that's in Azure Synapse.
And another thing to point out here
is that there aren't any
VNets or virtual networks
to set up for your data
services to talk to one another.
But Raghu, are there
any operational changes
then for data teams as
they start to incorporate
and use these technologies?
- No, their lives just got simpler.
All they need to do
is indicate what data
in the operational store
they need to make available for analytics
and Synapse Link takes care of the rest.
- It's really great to see
all of this work being done
but what's next on the
agenda though for you
and the data team in Azure?
- Big shout-out to the Cosmos DB team.
Today we announced
the integration of Cosmos DB with Synapse.
So we're going to allow similar integrations
from Azure SQL, Azure Postgres,
and sources beyond Azure.
Also, we're working on a lot of features,
like data lifecycle management
and time travel in Synapse,
and all of these will accrue
to the data in Synapse Link.
So, very excited.
- All right, thanks so much
for joining us today, Raghu
and we'll be watching
this space super closely.
Now if you liked what you saw
with this new approach to HTAP,
you can learn more by
being among the first
to try Azure Synapse Link
by signing up for the private
preview at aka.ms/AzureHTAP.
And if you're new to Azure Synapse,
it's available in
public preview today.
And you can try that out, go
to aka.ms/TryAzureSynapse.
And of course,
don't forget to subscribe
to Microsoft Mechanics
if you haven't already
for the latest updates across Microsoft.
Thanks for watching and good bye for now.
(upbeat music)
