My name is Sean Sall.
I have a background in economics and healthcare
microsimulation modeling.
My capstone project for the Galvanize Data
Science Program was to build an early detection
model for forest fires.
So my motivation for the project stems from
the fact that we have satellite imagery of
detected fires but it's not at all used to
aid in forest fire prevention.
I don't think it's an issue of anybody having
not thought of this.
I think it's an issue of it being incredibly
hard to actually use that imagery to aid in
forest fire prevention.
From my understanding, I'm not a forest fire
expert by any means.
I'm kind of just diving into this realm.
But forest fires move incredibly quickly and
there's so many different factors that go
in, the type of forest it is, the weather,
the location, and how dry that kind of forest
is.
So it's just all those factors come into play
and it's incredibly hard to capture that in
one model.
So the way it currently works is that NASA
has satellite imagery that gets run through
this algorithm out of the University of Maryland,
and that algorithm outputs this dataset that
holds a large number of rows that are supposedly
detected fires.
Those detected fires could be anything from
house fires to farm or burn piles to actual
forest fires.
The main question I was trying to originally
set out to ask was, "Could I somehow pare
down that dataset and grab only those detected
fires which were forest fires?"
It just so happens that states, at the end of
every day, are required to submit forest fire
perimeter boundaries.
If you take the latitude, longitude coordinates
from all those detected fires and compare
those with the forest fire perimeter boundaries,
you can pretty easily pare down the detected
fire dataset to only those which are forest
fires.
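
The comparison described above, checking whether a detected fire's coordinates fall inside a reported perimeter, can be sketched roughly as follows. This is only a minimal illustration and not necessarily how the project did it: it assumes each perimeter is a simple polygon given as a list of (longitude, latitude) pairs, it treats coordinates as planar, and the names `point_in_polygon` and `label_detections` and the sample coordinates are made up for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: cast a ray east from the point and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def label_detections(detections: List[Point],
                     perimeters: List[List[Point]]) -> List[bool]:
    """Mark a detection True if it falls inside any reported perimeter."""
    return [any(point_in_polygon(d, p) for p in perimeters) for d in detections]


# Hypothetical data: one square perimeter and two detected fires.
perimeter = [(-120.0, 38.0), (-119.0, 38.0), (-119.0, 39.0), (-120.0, 39.0)]
detections = [(-119.5, 38.5), (-118.0, 38.5)]
labels = label_detections(detections, [perimeter])
```

With real perimeter files a geometry library would likely be a better fit than a hand-written test, but the labeling idea is the same: detections inside a perimeter become the forest fire examples.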
The issue with that is that you're doing
that after the fact and historically.
And if you want to do that in real time, what
you have to do is build some kind of predictive
model that could take as input the detected
fire dataset and output which detected fires
are actually forest fires.
So that's the goal that I set out with at
the beginning, was to kind of build this model
that would take in that data set and output
which detected fires are forest fires.
My findings are ongoing.
I have done... I think I've made a very good
start, but I still have a bunch of data that
I'd like to input into the model and I'm still
kind of re-working the model and trying to
tease everything out that I can.
So my hope is that eventually I'll get to
some point where I can use this model, hopefully,
to actually aid in forest fire prevention
in real time.
The way that those satellite images are used
currently is that at the end of every day
the Forest Services will say, "Here are these
forest fire perimeter boundaries in states.
Is there a satellite image of a detected fire
to kind of back that up?
Can we confirm these forest fire perimeter
boundaries with these satellite images?"
And so that happens at the end of every day.
And if you think about it, if we could know
that there is a fire somewhere, two, four,
six, twelve hours before we used to know that
there was a fire there, that could help quite
a bit.
Theoretically, any fire, if you catch it early
enough, could be put out with a garden hose.
But if you don't catch it early enough, it could
develop into something where you're sending
in choppers, you're sending in forest fire
teams, so it can get pretty expensive
and pretty costly pretty fast.
So my current methodology has been to build
predictive models that use the detected fire
dataset along with geographical information
to maximize ROC area under the curve.
That metric will tell me how effective my
model is at producing a probability that a
detected fire is actually a forest fire.
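
To make that metric concrete, here is a small sketch of how the area under the ROC curve can be computed directly from labels and predicted probabilities. It uses the pairwise interpretation (the fraction of forest fire / non-forest-fire pairs that the model ranks correctly, counting ties as half). The function name `roc_auc` and the sample numbers are invented for illustration; in practice a library routine would normally be used instead.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = forest fire, 0 = some other detected fire.
y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.7, 0.1]
auc = roc_auc(y_true, y_score)
```

A value of 0.5 means the probabilities are no better than random at separating the two groups, and 1.0 means every forest fire is scored above every other detection.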
After countless iterations on Amazon Web
Services of fitting a logistic regression, random
forest, gradient boosting model, and a neural
network to the data, I found that the gradient
boosting model consistently outperforms the
others and gives me the highest ROC area under
the curve.
Next steps though are to start including weather
data because I have yet to include that.
And I think that would be incredibly helpful
in adding predictive power.
The only issue right now is finding a robust,
granular weather dataset.
I have robust weather data and I have granular
weather data, but I don't have the intersection
of the two.
