[MUSIC PLAYING]
BRANT SWIDLER: Howdy, y'all.
My name is Brant Swidler.
I'm the director of customer
success at SparkCognition.
And this is Ben
Wilson, my partner.
BEN WILSON: I'm his partner.
Glad you all are here.
When people see people like
us, they know we're from Texas.
And usually what they're
picturing in their head
is this.
They think this is
what goes on in Texas.
But I don't really
think it does.
And what's really
interesting about this
is we were told by the
Google press people--
that might be
offensive to cowboys.
But we didn't find
it that offensive.
BRANT SWIDLER: But
what really happens
in Texas, when
you actually think
about Texas, maybe some Texas
BBQ, Willie Nelson, South
by Southwest.
But what we really
think of Texas
is big houses and
no state income tax.
So what Ben and I are here to
talk to you guys today about,
we're going to talk about
GCP and SparkCognition.
I'm going to do a little bit
of an intro into who we are
and SparkCognition.
And then we're going to walk
through an oil and gas customer
use case.
Again, my name is Brant.
I'm the director of customer
success at SparkCognition.
You can think of that
as a quasi-product
manager/engagement manager role.
My background is in oil and gas.
I spent a number of
years at Schlumberger,
both as a consultant
and an engineer.
So for those of you who haven't
heard about us, SparkCognition,
we're an AI and machine
learning software provider.
We have three main
products in this space.
The first one, which I'll be
heavily focusing on today,
is a product we call Darwin,
which is for automated machine
learning.
We have a second product
called SparkPredict,
which is a software that
allows users to interact
with the outputs of models.
So for an oil and
gas customer, you
can think of being
able to interpret
the output of a model,
and make decisions
based on the statistics of the
data sets that are coming in.
And then the third product is
a natural language processing
product that we have
to help users or help
industrial companies sort of
make sense of maintenance logs,
as well as other
NLP applications
that they might have.
We've been working on
delivering solutions
to markets since 2014.
And we're based
in Austin, Texas,
as you can probably tell.
We raised $56 million
in our Series B--
that was earlier this
year-- from the likes
of Boeing's HorizonX
investment group,
Invenergy's Future Fund--
Invenergy's a large wind
provider-- as well as
Verizon Ventures.
And we serve a
number of industries,
not just oil and gas.
We're in government.
We have financial applications,
aviation, utilities,
and manufacturing.
And we've been highly ranked
on some disruptors lists.
So the idea of what
we're going through today
is a product that we
have called Darwin.
And I'm going to go into a
lot more details about what
Darwin actually is.
But it's an automated
machine learning tool.
And it differentiates itself
in many different ways.
But as of right now, Darwin
sits on Google Cloud,
and you can access it there.
A user makes a call to Darwin's
API, which uploads their data
to Google Cloud Storage.
And it uses Google Compute
Engine to actually do
a lot of the machine
learning work that's
happening behind the scenes.
And I'll go into a
lot more detail about
how it actually works.
And it will get a
little bit technical.
So look forward to that.
BEN WILSON: Awesome.
You didn't press the button.
There we go.
So we're here to talk about
how machine learning works
in the oil and gas industry.
For me, one of the things I
like to think about when I think
about machine learning
is what platform
are you going to build upon?
When I think about a
platform, I think about it
in basically three main ways.
The first, because many
people around the world
rely on Google Gmail, Maps, and
of course, the all important
cat videos, Google has created
a rock solid infrastructure
that you can rely on.
So if you are going to go
build a petabyte-scale
seismic data lake,
or you're going
to do seismic
interpretation, this
is a performant environment
that you can work upon.
The second I'd
like to talk about
is our serverless data platform.
Here we built a
unique platform that
allows you to spend more
time analyzing your data
and understanding your data,
and less time on the care and
feeding of your production
applications
or your seismic applications.
So you're allowed to go
down deep into your data,
and be able to really
make decisions.
Third, because we believe deeply
in the power of developers,
we have a
developer-centric platform.
And what we like
to do is we like
to let developers do
what they love most--
and that's code--
while we manage
the deploying, the
scaling of that code
so that they don't have to.
Whoops, I missed that.
I forgot to press that button.
There we go.
Let developers just code.
This platform is
in 15 regions right now,
expanding with five more.
These data centers are
all over the world,
and probably within reach.
And these data centers do things
that are absolutely fantastic.
And they have technology
that just is unsurpassed.
To give you a little
taste of that,
when you think about
bandwidth, and you think
about bisectional bandwidth,
the entire internet
has a bisectional bandwidth of
roughly 200 terabits a second.
Now that sounds like a lot,
but when you take a peek inside
a Google data center, and we
have 1,300 terabits a second
for bisectional bandwidth, we
feel pretty confident that we
can handle almost any
network-intensive application
that you might have--
things like seismic
interpretation as just
yet an example.
Now I put this slide
up, and this is always
an interesting one, especially
when I show it to an energy
group, because they
look at it, it's
like, wow, that's a lot of dots.
That's a lot of lines.
Like, OK, you have
a nice network.
Why is this important?
Well, we operate the largest
backbone network in the world.
Third parties estimate roughly
25% of all internet traffic
travels across Google's
network on any given day.
We have more than 100 points of
presence across 33 countries,
and continue scaling
regions and zones
every day to meet customers'
preferences and policy
requirements.
Now I said we have
a large network.
But the question is, when you
look at that, how big is that?
That's roughly 100,000 miles
of fiber optic network.
That is huge.
Now we'd like to think
we have the largest,
but we can't actually prove it,
because our competitors don't
actually publish
that information.
But one thing that we all
do publish are cache nodes.
And we have roughly
800 cache nodes
throughout the whole
world-- and those
are those little
gray circles there.
We like to think that Google's
well-provisioned, highly
reliable global network
matters to customers who care
about privacy and security.
And that's almost
every single customer.
One of the reasons
we're able to go
do this is because
we do something
that's called cold potato.
Now you might ask,
what is cold potato?
Cold potato is this.
Basically, it means
that what Google does
is we hold onto a data
packet as long as possible
before it gets to the end user.
And we do that because we
want to provide security
and privacy, but we also want to
make sure it's very performant.
When you look at
other cloud providers,
generally what they do
is they play hot potato.
They get your packet, and they
want to pass it off to an ISP
as soon as possible.
And that might be
an ISP that doesn't
have the reliability, the
privacy, and quality of service
that you get
inside Google Cloud.
I actually think highly
performant networks
are really, really important.
If you go talk to
Niantic as an example--
they make "Pokemon Go"--
one of the things
that they did is
they looked at Google's
network and said,
hey, that's a great network for
our application, for our game.
And it turned out to be
the fastest-growing game
on the internet.
And they had to have a network
like ours to make it work.
But the question you
have to ask yourself
is, if you're in
the energy business,
why does this really matter?
I'm not making games.
I'm actually going and
looking for oil and gas,
or I'm producing electricity.
Since most oil and gas
companies or energy companies
are all over the world, they
need secure communications
in places like Africa,
Western Australia,
all over Asia, Alaska.
We have a network there.
We have edge nodes
and POPs everywhere.
So you can make sure
that your data is
going to get there
performantly, and also securely.
Now when I talk
about the platform,
I'm really enamored with
it, because I'm a developer,
and I like to do these things.
The question is, what
are others thinking?
In this case, this is Forrester,
and this is their wave chart.
And they happen to put
us in the leader circle.
I like that.
I think it's true.
But I like to think it's more
because we have integrated
new things together,
like languages like Go,
Kubernetes container
clusters, globally
scalable databases like Spanner.
And we've also got that global
network all over the world,
with more regions
coming on, that
allows you to have the
best experience that's
highly reliable and secure.
Now for me, one of the reasons
I like SparkCognition is I
like to think that's
one of the reasons why
they decided to build on
Google Cloud Platform.
So with that, let me hand
it over, and you can go on.
BRANT SWIDLER: Thanks.
BEN WILSON: Sure.
BRANT SWIDLER: Yes.
So as Ben mentioned,
a lot of why
we chose GCP was because it
was driven by our engineers.
So in that case,
we decided to put
Darwin, which is our automated
machine learning product,
onto Google Cloud.
So what is Darwin,
and what is it
in the context of
the industrial space,
and especially in oil and gas?
So I'm going to go through a
little bit about how it works
and what's happening
behind the scenes.
And then we'll get into
an oil and gas use case.
So for those of you who
aren't aware of the issues
with industrial data
sets, we're essentially
working on sensor information
coming off of big machines.
And sensors are really
inconsistent and untrustworthy.
There's sparse labels,
so people are not
sitting there labeling all of
the information that comes off
of it, which is mostly required
for adequate machine learning.
And it's not prepped for
adequate data science.
So if we look at two
separate wind turbines--
I go through the
wind turbine example
because it's a lot
easier for people
to understand relative
to oil and gas examples.
But essentially, if
you have wind turbines
in two separate
locations, those are
going to require two separate
machine learning models.
A wind turbine in Location A
might have a different failure
profile or predictive profile
than one in Location B.
An operator that runs a wind
farm might operate their farm
differently than another one.
The sensor packages that
are on a wind turbine
will be different.
And the actual turbine
that's actually
running in the wind
farm, those will
be different between
two separate locations.
So what does that mean?
That actually means that you
cannot build one model that
will run on both of these.
There needs to be a
scalable solution--
an automated solution-- that
can build specific models
for each individual asset.
So when we looked at this
problem when we started out,
and we started receiving
a lot of information
from industrial companies
looking for data science work,
we ended up having too
many customer requests for
the existing data science
capacity that we had.
So these were some of
the internal factors
that pointed us to the need
for an automated solution.
In addition, to
actually build a model
for each individual
turbine, that's
a time-intensive process.
The current solutions that we
were offering were not scalable,
and there was clearly a lack
of an automated solution
on the market that actually
could do predictions
for each individual asset.
Some of the external
factors that pointed us
to automation-- and I went
through a lot of these,
but there's a lot of variation
from different assets
depending on location,
what's on the assets and all
that kind of stuff.
In addition, you
needed something
to solve problems with sparse
data and noisy data as well.
So when we saw all of these
problems, we went through
and we said, OK, let's look
at the workflow of a data
scientist, and figure out
what we can automate.
So for a data scientist,
the first thing that you do
is you go through a
data cleaning step.
You take your data.
You do any sort of imputation,
or any sort of methods
that sort of make it into
a functional data set.
Then you go through your
feature engineering practice,
and you try to figure
out what features
you need to engineer,
what you can drop,
and all of those sorts
of little nuances
that's sort of the magic of
one specific data scientist.
And then you need
to build the model.
And if that model
doesn't work, depending
on what tools you
use, you go back
and you revisit your
cleaning methods.
You go back and you revisit your
feature engineering methods.
And you go back and
you revisit your model.
And it's inherently just
an iterative process
that can take a very long time.
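That iterative loop can be sketched in a few lines of Python. Everything here is a toy stand-in, not SparkCognition code: the cleaning, feature, and scoring functions are made up to show the shape of the loop.

```python
import random

# Toy stand-ins for the three steps described above: clean the data,
# engineer features, fit a model -- then loop back and revisit choices.

def clean(data, strategy):
    # impute missing values with the chosen strategy
    present = [x for x in data if x is not None]
    fill = {"mean": sum(present) / len(present), "zero": 0.0}[strategy]
    return [fill if x is None else x for x in data]

def engineer(data, window):
    # a rolling mean as a stand-in for feature engineering
    out = []
    for i in range(len(data)):
        chunk = data[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def fit_and_score(features):
    # pretend to train a model and return an accuracy score
    return random.random()

random.seed(0)
raw = [1.0, None, 3.0, 4.0, None, 6.0]

# the inherently iterative part: revisit cleaning and feature
# engineering choices until the model is good enough
best = 0.0
for strategy in ("mean", "zero"):
    for window in (1, 2, 3):
        score = fit_and_score(engineer(clean(raw, strategy), window))
        best = max(best, score)
print(round(best, 3))
```

In practice each revisit is a human decision, which is exactly why the process is slow and hard to scale across thousands of assets.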
Now on two separate
wind turbines,
maybe you could
expedite this by just
having background knowledge
of what usually works.
But in essence, you will need
a person or a data science team
to actually build a model
for each individual asset.
So all of this requires a high
amount of technical expertise,
a lot of time.
And there was a lack
of productization tools
out there that could
solve all these problems.
So we went out, and we
tried to build something
that could automate the
data cleaning process,
automate the feature
engineering process,
and especially automate
the model building process.
So I'm going to take a
quick technical segue
to talk about how we
went through this,
and I hope that everyone
can stick with me on this.
So when we talk about automating
the feature engineering process
and cleaning step, I'll
get to that in a second.
But our approach towards
model building, this
is an image of a neural network.
There are three layers inside
of this neural network.
And essentially, math is
happening at each circle.
And each line, you can think
of as a coefficient.
Now traditionally, a data
scientist designed
what this looks like.
So where all those circles are,
and where all those lines are,
a data scientist said, this is
the structure of the network
that I want.
And what machine
learning does is
it goes and just tries to
figure out what weights
or what coefficients happen
at each one of those lines.
Now if this is not the
optimal network structure,
there is no way to determine
what the optimal one might be.
You're just leveraging whatever
inherent knowledge the data
scientist has, or whatever is
in a library that
already exists.
So what we asked was,
how can we automate
the search for the
right network topology--
what structure actually works
best for a specific data set?
So we went through,
and we built a product
that leverages neuroevolution.
So it uses
evolutionary algorithms
to actually search for the
network architecture that
solves this problem.
And it not only searches for
the network architecture,
it chooses different imputation
methods for cleaning.
It chooses different features
after features are generated
automatically, and it searches
for the network architecture
that is optimized
for the actual data set
that it's being trained on.
So how does it do this?
So you upload your data set.
And in the first
generation it builds
70 different neural networks.
And we also combined it with
some tree-based algorithms.
And in that generation,
we look at what
are the genomes of those
specific models that
worked the best to make
those models accurate.
Those genomes then get mutated
into the next generation,
and spawn 70 new neural networks
off of that initial population.
And we go through
that process again.
We take the genomes from
the best-performing models,
we mutate and breed them
into the next generation.
And we spawn off 70
new models again.
And that happens for n
number of generations.
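The generational loop just described can be sketched like this. It is a toy illustration of the idea, not Darwin's implementation: the "genome" (a list of hidden-layer widths) and the fitness function are made up, standing in for real architectures and real training runs.

```python
import random

random.seed(42)
POP_SIZE = 70     # models per generation, as described above
GENERATIONS = 5   # "n number of generations"
WIDTHS = [4, 8, 16, 32]

def random_genome():
    # a genome here is just a list of hidden-layer widths
    return [random.choice(WIDTHS) for _ in range(random.randint(1, 3))]

def mutate(genome):
    # change one layer's width -- a stand-in for mutating/breeding genomes
    g = list(genome)
    g[random.randrange(len(g))] = random.choice(WIDTHS)
    return g

def fitness(genome):
    # stand-in for "train this network and measure its accuracy";
    # here we pretend architectures totalling ~40 units score best
    return -abs(sum(genome) - 40)

population = [random_genome() for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    # keep the genomes of the best-performing models...
    parents = sorted(population, key=fitness, reverse=True)[:10]
    # ...and mutate them into the next generation of 70
    population = [mutate(random.choice(parents)) for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(best, fitness(best))
```

The expensive part in the real system is the fitness evaluation, since each candidate genome means training a model, which is where the Google Compute backend does the heavy lifting.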
So in this case, now you
can upload your data set,
and it will do all of the
feature engineering, the data
cleaning process, and
the network search--
the network
architecture search--
specific to the data set
that you upload.
And this is really
important when
we talk about things like wind
turbine A versus wind turbine
B. You're actually
doing a lot of the steps
specific to the asset that
it's being trained upon.
So what do these networks
end up looking like?
So here are three
separate use cases.
It's essentially
use case agnostic.
The one on the left
predicts bearing failure.
That's the architecture that was
trained off of 400 generations.
The one in the middle we
trained to essentially mimic
one of our in-house
pilots using a flight
simulator to try and determine
exactly how his flight
path works.
And the one on the right, that
is kind of blacked out here,
trades Dow futures.
So when you talk
about deep learning,
you can see how big
a model can actually
get by just doing this
generative process,
and running it for however
many generations you see fit.
So again, it's
use case agnostic.
It's not necessarily specific
to industrial applications,
although that's
where our DNA is.
We've kind of grown up
in the industrial space,
working on wind turbines
and oil and gas assets.
But with that, I'm
going to kind of go
through what the oil
and gas application is
that we've used this for.
So for those of you who are not
familiar with the oil industry,
there is a concept
called artificial lift.
So when you first
produce from an oil well,
the natural
pressures of the well
will lift fluids to the surface.
So if you've ever seen those
images of oil spewing out
from the top of
a derrick, that's
because the natural
pressure of the earth
is pushing that oil up.
Now after a certain
amount of time,
the pressure starts to
subside-- you've produced a lot
from the reservoir-- and you
need a way to lift that oil out
of the ground.
So the sucker rod pump on the
left, there are about a million
of these in operation
around the world.
What this is essentially
doing is if you've ever
had a cup of water
with a straw in it,
you put your thumb over
the straw and lift up,
and you see the column
of fluid come up with it,
it's essentially just
doing that on repeat.
So it's just kind of sucking all
of the oil out of the ground.
An electric submersible
pump is essentially a pump
that you lower
into the ground.
And it turns and pumps
oil to the surface.
So at a certain point
in the production
of a reservoir, after the
pressure has subsided and
you can no longer
produce to the surface,
you start implementing all of
these artificial lift systems.
And with these there are
many different manufacturers.
They've been in place
since the early 1940s.
There is a variety
in every field--
the same aspects of what I went
through with the wind turbines.
Different fields have
different profiles.
Different sensors are put
on each piece of machine.
And different operators have
different tactics in order
to make sure that
they're optimizing
these specific assets.
So with this, we
worked with a company--
one of our partners, Apergy.
Apergy is one of
the biggest software
providers that actually
monitors a lot of these assets.
So you can think about a
monitoring software that's
streaming information directly
from these artificial lift
systems into their platform.
And all of their data kind of
sits inside of their monitoring
software.
And production analysts
and specialists
look through this
information, and try and make
determinations on
what's going to fail,
what's going to optimize,
those sorts of things.
Apergy also sits across
wide-ranging applications
both in drilling for oil,
as well as producing it.
So even though we're
talking specifically
about these artificial
lift systems here,
we're eventually
going to start moving
into more and more of these
really complex applications.
You can go all the way up to
drilling applications as well.
But for this, as you
see circled on the left,
we're focusing on the
artificial lift systems.
So the way that this
works in practice
is that an ESP in the field
will stream data directly
into an Apergy
software, where someone
can make interpretations
off of what's
actually happening inside of
the well and make decisions.
And then we'll take that
data either through an API
or through just a downloaded
CSV from their software,
and put it into
Darwin, where that
will build a predictive model.
And Darwin sits in Google Cloud.
That model can then
be put back
into Apergy's software,
and any user, any producer
can actually just make
determinations off
of the prediction model.
Now for us, Darwin, we have
it as a standalone software.
And I'm going to go
through that demo.
You can upload CSVs
just to the software
that sits on the cloud.
Or you can leverage it in
sort of an API call connection
to any sort of monitoring
software there.
Now here I talk about this
specific monitoring software.
But you can imagine it being
any sort of monitoring software
or business intelligence tool
that you want to put it into.
I had an executive come
to me and say, well,
in the context here, what do
you guys mean about BI tool?
And I said, you don't even know.
We're thinking so far past AI.
No, that didn't hit that well?
[CHUCKLING]
BEN WILSON: That's a bad joke.
That's just a bad joke.
BRANT SWIDLER: No one
was prepped for it.
BEN WILSON: We weren't
sure if it was a good joke,
but now that I see it--
BRANT SWIDLER: I thought
that that hit a lot harder.
Already thinking past AI.
So I'll go through
the demonstration
of how this actually works.
So I'm going to
upload this data set.
This data set is
all of the information
that's coming off of this
electric submersible pump.
This is an electric
submersible pump.
So it's the one that you
lower into the ground,
and it pumps oil to surface.
What we're trying to
do is do a prediction
on how much oil it will produce
within the next seven days.
So off of the sensor,
we're getting some metadata
on where this well is actually
located, production figures--
so oil, water, gas.
An oil well doesn't
just produce oil.
It produces a mixture of oil,
water, gas, and other sorts
of dead dinosaur stuff.
And then you also have
the tubing pressure,
casing pressure, and all of the
sensors coming off of the pump.
And again, what we're
trying to get to
is just a very simple
prediction of what
the next seven-day
forecast will be.
So within our software-- and
this is the software version--
you can also set
it up where it just
builds models automatically
through a connection.
But what we're
going to go through
is here I've already
built all of these models.
They're basically cards that
sit inside the software.
Let's see if it starts playing.
I go Create New Model.
And I can name the
model that I want to create.
In this case it's Seven
Day Production Forecast.
I can give it a description.
And then I'm given all
of these workflows-- so
Infer Values, Forecast
Values, Infer Categories, Find
Patterns, or Predict Events.
So in this case, we're
trying to build
a library of these
sorts of cards
that, depending on the
workflow people want,
can sort of organize
the data set in that fashion.
Infer Values is a
simple regression.
Forecast Values is just
shifting that target column
to be able to predict
something in the future.
Infer Categories, this
is a classification.
Find Patterns is our
clustering approach.
And Predict Events
is our strategy
towards predicting
a point in time.
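That "Forecast Values" shift can be shown in a few lines of plain Python. The column names and numbers below are made up for illustration: each day's sensor readings get paired with the production seven days later, turning forecasting into an ordinary regression.

```python
# Turn a forecasting problem into a plain regression by shifting the
# target column forward. Column names are illustrative.
HORIZON = 7  # predict oil production seven days ahead

rows = [{"day": d, "pump_speed": 50 + d, "oil_bbl": 100 + 2 * d}
        for d in range(10)]

# pair each day's sensor readings with production HORIZON days later
training = [
    {"pump_speed": rows[i]["pump_speed"],
     "target_oil_bbl": rows[i + HORIZON]["oil_bbl"]}
    for i in range(len(rows) - HORIZON)
]
print(training[0])
```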
In this case, we're
going to go ahead,
and we're going to infer values.
So I go to this card.
And I'll click Infer Values.
If you have any questions on the
software, you can go through.
And here I'm pointed to--
I want to upload it, drag
and drop my data set.
Give a little
description of what
the data set should look like.
So we try and
formulate everything
in these rectangular
formats where,
if you think about this
as an Excel file, or CSV,
all of your inputs
are in columns,
and every timestamp is a row.
And one of your
predicted outputs
is in the target column.
So you go find your data set.
You can visualize
your data set here.
And then you can go ahead and
select the targeted column.
So in this case, you're
going to go ahead and select
the column that is Oil
Production Prediction.
Click Next.
And then you're driven
to a point where
you can choose how long you
want to train your model for,
or how many generations
you actually
want to go through and
actually build this network
architecture.
Obviously, the
longer you train for,
the more it should converge.
We're giving out metrics
as well to make sure
that you can see how well
it's converging over time.
But as of right
now, you can just
determine the amount
of time that you want
to wait for this to train.
There are also advanced features.
You can change the
accuracy targets.
For this, we use normalized
root mean squared error.
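Normalized root mean squared error can be computed like this. Note this uses range normalization, which is one common convention; the talk doesn't specify Darwin's exact formula.

```python
import math

# Normalized RMSE: RMSE divided by the range of the observed values
# (one common normalization convention).
def nrmse(actual, predicted):
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / (max(actual) - min(actual))

print(nrmse([100.0, 110.0, 120.0], [102.0, 108.0, 121.0]))
```

Normalizing by the range makes the score comparable across wells whose raw production volumes differ widely.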
But now we're building our
model behind the scenes.
It's going through this process
of building models, determining
the right genomes, mutating
those into the next generation,
and continually going
through that process
until we derive the
network architecture
specific to this data set.
We give out the statistics
on this specific model.
So you can see how accurate the
models got over a generation
or over time.
You can look at all
of the statistics
around the data set
that you uploaded.
In addition, you can
eventually get to the point
where you're looking
at feature importances.
Here's the graph of
the actual output.
So you can see your
predicted versus actual,
and see how that
trends over time.
And then you can look at
your feature importances.
Or here you can see
that in an Excel format,
or CSV format, the
predicted versus actual.
In this case now
you get to the point
where you can look at
what features are actually
driving these predictions.
So these are the model-based
prediction feature importances
for the model for this
specific ESP.
And once you have
your model built,
you have a number of options.
You can run it, meaning
test your model on unseen data.
You can improve it, meaning
upload new data
to train the model further.
You can publish it, meaning
make an API call
to the model that you have.
Or you can download it
in the pickle file format
that we have, so you can put
it into a runtime engine.
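The "download as a pickle, run in your own engine" path might look roughly like this. Here a dict of linear weights stands in for the real model object, and the feature names are assumptions for illustration; the real downloaded artifact is whatever Darwin exports.

```python
import pickle

# A dict of linear weights stands in for the real pickled model object.
weights = {"pump_speed": 1.5, "intake_pressure": -0.2, "bias": 40.0}
blob = pickle.dumps(weights)      # what the downloaded file would hold
model = pickle.loads(blob)        # load it inside your runtime engine

def predict(model, row):
    # linear scoring as a stand-in for the evolved network
    return model["bias"] + sum(model[k] * v for k, v in row.items())

print(predict(model, {"pump_speed": 60.0, "intake_pressure": 200.0}))
```

The usual caveat with pickle applies: only load files from a source you trust, since unpickling can execute arbitrary code.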
And now we're going
to upload a test set.
So for the test set,
you can upload your data
in the same drag-and-drop
format.
And you have your predictions.
So here is now the predictions
on the next test set.
We're doing seven days
production forecasting.
And now you have
your model built.
You can see the
testing accuracy.
And in addition, we
actually give out
another feature that gives
you a per prediction feature
importance.
So when I click
on a value, I can
see what's driving this
prediction down,
and what's driving
this prediction up.
So on a per
prediction basis, now
you can see what's actually
driving your predictions.
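One simple way to compute a per-prediction importance like that is to perturb one feature at a time and measure how far the prediction moves. Darwin's internal method isn't specified in the talk, so this is an illustrative stand-in with a toy linear model and made-up feature names.

```python
# Perturbation-based local feature importance (illustrative only).
def predict(row):
    # toy trained model
    return 3.0 * row["pump_hz"] + 0.5 * row["intake_psi"]

def local_importance(row, delta=1.0):
    # nudge each feature by delta and record the prediction shift
    base = predict(row)
    return {k: abs(predict({**row, k: v + delta}) - base)
            for k, v in row.items()}

imp = local_importance({"pump_hz": 55.0, "intake_psi": 300.0})
print(imp)
```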
So here through
this entire tool we
went through the
process of uploading
a data set from an
ESP, building a model
to predict the forecasted
values of the next seven days
production, and then being able
to look at the model output
and test it on new
data sets, and see
each individual prediction, and
what drives that up or down.
So even more than
that, the way that I
like to think about this from
the oil and gas perspective
is what decisions can I
actually make off of this model
that I've now built. So in this
case, what I've gone and done
is I have the test set up there.
And I changed some of
the control parameters
that I have as an operator.
So in this case, I can
make the pump pump faster.
I can make the pump pump slower.
And I can see how that
impacts my production outcome.
So the gray line here is the
actual changed production,
where the orange line is what
was originally predicted.
So when I make control
parameter changes to my pumps,
I can see how this impacts
my production figures.
And now prediction becomes
part of my optimization plan.
So in this case, the
average production
increase from changing
the pump parameter
is 700
barrels per week.
And now you can go ahead
and just make changes,
and see how that might impact
the optimization of your ESP.
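That what-if workflow amounts to re-scoring a copy of the test rows with a control parameter changed and comparing the predicted production. The quadratic "model" and all numbers below are toy stand-ins, not the real ESP model.

```python
# Change a control parameter, re-run the model, compare predictions.
def predict(row):
    # pretend trained model: faster pump -> more oil, up to a point
    return 500.0 + 4.0 * row["pump_hz"] - 0.02 * row["pump_hz"] ** 2

baseline = [{"pump_hz": 55.0}, {"pump_hz": 55.0}]
what_if = [dict(r, pump_hz=60.0) for r in baseline]  # speed the pumps up

base_total = sum(predict(r) for r in baseline)
new_total = sum(predict(r) for r in what_if)
print(round(new_total - base_total, 1))  # predicted production uplift
```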
Now for those of you who
are oil and gas people,
you might look at this and
say, well, it's really actually
kind of rare to have a
production figure per well.
So in actuality,
what happens is you
have upwards of
60 wells producing
into one massive tank.
And inside of that one tank is
where you get your production
figures.
So in this case,
you actually have
all of these wells that
are using different types
of artificial lift systems.
They're possibly operated
by different people
in different fields.
And they're all going
into one battery that gives
the only targeted outcome.
So for the first
one I went through,
I had this seven-day
production forecast.
That seven-day
production forecast
is specific to that well.
In actuality you'll
have 60 wells
that are producing
into that one forecast.
So now your
optimization is how do I
change all of these
pump parameters
in order to make sure that I'm
driving the most production.
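That multi-well optimization can be sketched as a search over per-well pump settings against per-well models, with only the combined tank total as the objective. Every model and number here is an illustrative stand-in.

```python
import itertools

# One toy model per well; only the shared tank total is optimized.
def well_model(a, b):
    # each well gets its own response curve to pump frequency
    return lambda hz: a * hz - b * hz ** 2

wells = [well_model(4.0, 0.03), well_model(3.0, 0.02), well_model(5.0, 0.05)]
settings = [40.0, 50.0, 60.0]  # candidate pump frequencies per well

# brute-force search over all per-well setting combinations
best_total, best_combo = max(
    (sum(m(hz) for m, hz in zip(wells, combo)), combo)
    for combo in itertools.product(settings, repeat=len(wells))
)
print(best_combo, round(best_total, 1))
```

With 60 wells the brute-force product blows up, which is why a real system would need a smarter search, but the structure of the problem is the same.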
BEN WILSON: Yeah, I mean
for me, for those of you who
have actually operated
batteries, and operated pumps,
this is the realistic situation.
I've done a whole
bunch of AI trying
to do prediction,
and also trying to do
preventive maintenance
to predict failures.
And the biggest challenge is
being able to go out there
and say, OK, I have a battery.
It's got 60 wells.
Odds are all 60 wells
were not drilled by me.
They were drilled by probably
three or four other parties.
They have three or four
different types of equipment.
They have different
types of sensors.
So being able to have this
iterative approach to going
and looking at the
data that's coming off,
and being able to do
predictions by well
makes a huge difference.
You can think about it from
an operational perspective,
and how full is that
tank going to be.
You can also look at it just
from a maintenance perspective,
too.
So when I look at this as
an operator, I look at it
and say, this is really,
really important.
One other thing I'd
say, too, is in case
you don't recognize the name
Apergy, that used to be Dover.
And most people will
recognize Dover as one
of the largest artificial lift
manufacturers in the world.
And when you think
about artificial lift,
you think about those pumps
pushed down into the well.
And usually you put three or
four or five of these things
together.
And being able to get these
things to operate correctly
across a set of
batteries like this--
really complex.
So this really does
simplify that process.
I will also mention that
if you didn't
see the dev keynotes today,
they actually did all of theirs
live.
They didn't use
video like you did.
Maybe next time we
need to do it live.
BRANT SWIDLER: Yeah,
I appreciate that.
Thanks, Ben.
BEN WILSON: Yeah,
I'm here to help.
BRANT SWIDLER: I mean,
the big thing here
is that what we're
going after is
the scale of the problem
that's actually in hand here,
where there is
upwards of a million
of these wells in
operation around the world.
So being able to build
these models automatically,
and through the tools
that we've built,
is a huge benefit to a lot
of oil and gas operators.
And being able to make these
changes in optimization
plans in real time,
and see how that
impacts the
predictive production,
is going to be pretty huge.
Kind of back to
Darwin for a second,
I pitched it
originally as this sort
of use case agnostic position.
And in reality we've tested
it on a number of data sets.
And we've done a number
of Kaggle competitions
as well, where we've
seen that it's actually
a pretty useful tool.
Depending on how long
you train it for,
or how long you want to
go through the process,
it really does well for time
series, especially when we
add recurrent,
look-back-into-the-past
components to the model
building [INAUDIBLE].
BEN WILSON: In case you
don't recognize-- oh,
you've got to go back one.
I got to say something.
Kaggle credit card fraud,
Google owns Kaggle.
Nice to see that we can actually
do something right there, huh?
BRANT SWIDLER: Some
of the new things
that we're adding in here--
unsupervised learning.
We've had some success
with different approaches
on unsupervised learning before.
Now we're starting to automate
and do a very similar process
for a Gaussian mixture model
variational autoencoder.
That's fancy-speak
for saying we're
doing clustering and
dimensionality reduction in one
step.
In addition, we do what we
call normal behavior modeling,
where it's trying to just
predict normal or not normal,
and what that
actually looks like.
We also use a very
similar approach
towards leveraging the
evolutionary approach
for building autoencoders.
In addition, we're still
working on our user experience.
So any sort of feedback that you
have on what you'd like to see,
or what those cards might
look like, or workflows
that we can sort of
automate to make sure
that the tool works
for the design process,
we're working on that as well.
And then we're always working
on speed and accuracy.
So speed in the model
development process
comes from both leveraging
Google and their expertise,
as well as our
approach towards making
the evolutionary
algorithms smarter.
With that, I mean, it's been
awesome working with Google.
Ben's been fun, even though he
kind of makes fun of me a lot.
BEN WILSON: My job.
Thank you.
I'm Ben Wilson.
I'm the smarter,
better-looking cowboy.
This is Brant.
Oh, I'm sorry.
You're the
better-looking cowboy.
BRANT SWIDLER: There you go.
BEN WILSON: We appreciate
you all being here.
Thank you very much.
[MUSIC PLAYING]
