thank you a lot it's great to be here
it's great to meet the students and the
faculty for the program and hopefully
it's a fun talk so this is a standard
quote that gets thrown around a lot
you know the NFL is a very unpredictable
league where you know a lot of times the
better team wins but generally speaking
lots of crazy stuff happens I don't know
how many people are like avid football
fans or minor fans but even if you're
not this talk is kind of designed to be
showing like how you can use machine
learning for this particular problem but
it's also designed to show you how
machine learning is something you can actually
use right most of the time when we talk
about machine learning Netflix is giving
you recommendations Google is searching
things for you Amazon wants you to buy
things but you know it feels very much like other
people are doing it for you like what if
you actually have a real life problem
yourself right for me my problem was I'm
in a fantasy football league with my
brothers-in-law and I want to win right
so and I've ... I was doing it without
machine learning for years I was just
thinking like oh I'll pick this team
I'll pick that team and every year I was
doing it I'm like I could probably do
better I should get the computer to make
some picks for me and I waited waited
many years and then finally I'm like I'm
just gonna do it and so this is sort of
like an example for you guys also that
you know you can actually use this stuff
in like real life so this is joint work
with a good friend of mine we kind of
came up with the idea together we coded
it together you know he teaches at
another University in a financial
engineering program also honorary
mention to my 12 year old daughter
she also helps with this process she
helps make picks she insists on taking
some cut of the winnings
fifty dollars is a lot for like a
twelve-year-old so she's pretty happy
anyways okay so how do fantasy football
leagues generally work right there's all
different varieties of leagues this one
that we're looking at today is like a
relatively simple league we're not
picking individual players but just like
let's go back one step you know
basically as a football fan or a sports
fan you watch every week and you think
you like know better the coach should
have done this or that team should have
won or you know like you think you know
better and then also there's like all
the people on TV they're talking like
this should happen or that should happen
this team's gonna win like is that all
really true the other thing is that you
know we have like some market
information in the sense that you know
before the games like there is a betting
line you can go to Las Vegas you can bet
on these games and you know you can see
this thing called the point spread which
is like the amount that a certain team
is supposed to beat another team by like
that probably encapsulates like a lot of
information right ... people are you
know sort of irrational people always
vote for their own team but you imagine
like lots of people voting and they're
actually putting their real money at
stake they're not gonna on average
make stupid decisions so one of the
ideas for this league is that you know
we start with the point spreads as like
a simple way to get started and see if
we can really like do any better than
that using machine learning techniques
okay so how does our league work our
league is like I said relatively simple
it's called you know a pick'em league
meaning that on average every week
there's like 16 games sometimes there's
14 on bye weeks and stuff like that but
you're supposed to rather than worry
about the point spreads you're just
supposed to pick who wins so for example
you know whatever like I forgot all
the games this week but I think Denver
was playing like you know Oakland this
week right and you're supposed to pick
outright you're not supposed to worry
about who's favored and who's not
favored you just pick who you think is a
winner and then the way you kind of like
accumulate points in this league is you
have to assign points like 16 all the
way down to one and if you get your top
pick right then you get 16 points
you get your second top pick right you
get 15 points if you miss your 14th pick
you get zero points for that one so you
can imagine like and then the way you
win this league is you accumulate points
over the course of the year and the
person with the most points at the end
of the year wins the league and
generally speaking you can win in an
individual week right by just going
crazy
picking all the right upsets and getting
everything just right but on average if
you're gonna win over the course of the
season it's gonna be better to be steady
and consistent and just like not make
mistakes so let's just look a little bit
at how this looks when I go to the
website and this is how I make my picks
I don't even remember this was probably
like a long time ago but this is just
kind of how it works you know you pick
these two teams this week I you know the
model or I or whoever like decided that
Indianapolis was the top pick and so we
assigned like 16 weight to them 15 weight
to them and then you go and you just
enter your picks and it kind of locks it
in and then you're competing against
everybody else okay so what are the
various strategies right like so one
thing I kind of already mentioned and
alluded to is like let's pick the
simplest strategy that requires like no
brain power well I mean one brain power
no brain power would just be to guess
randomly right but that's not what we're
trying to do so we take exactly what you
know Las Vegas is telling us and we
basically take the team that has the
highest spread so if a certain team is
favored to win by ten and that's the
highest that week we put them at sixteen
and then the next team maybe they're only
favored to win by seven so we put them
second and then we go down the line and
order them in that ... sort of way and
then we have like some various tie
breakers like if two teams are both you
know favored to win by four you know we
just pick the one that's like a home
team or like if there's still a tie then
we pick okay which one's got the better
record but on average like those
little differences like don't make much
of a difference on the other hand you
could just do it ad hoc right
you could not care about what Las Vegas
says you could just do your own thing
you could look at the win-loss records
of the teams you could look at are
they playing a good team are they
playing an away game or a home game are
they playing a division game or a non
division game so this is a little
nuanced depending on how familiar you
are with the NFL basically the league is
broken up into you know like six
divisions or I think it's like eight
divisions nowadays and basically you
play all the teams in your division
multiple times during the season as
opposed to the rest of the league where
you don't play everybody so you have a
much more
familiar relationship with the teams in
your division you play them much more
there's more heated rivalries there's
more competition you tend to play a
little bit different and then you know
other things you could look into you could
look into injury reports you could just
have personal preference intuition for
example I know my brother-in-law is a
giant Steelers fan he kind of can't
physically bet against them even though
they might be like you know favored to
lose so but you know but he'll ...
he'll pick them but he'll put them at
the bottom like for one point even if it
means that he's picking the wrong thing
but you know like I'm not going to do
that I want the machine to tell me
what's the right thing to do um but the
other thing to remember is like ideally
aside from the personal preference and
intuition part ideally the point-spread
encapsulates a lot of what's out there
in the world if some major player got
injured the point spread will reflect
that if some team doesn't play well on
you know artificial turf or you know bad
weather like it should encapsulate that
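The spread-based baseline described a moment earlier (rank the games by point spread, break ties by home team and then by record) can be sketched roughly as follows. This is only an illustration: the `Game` fields, the `spread_picks` helper, and the team names are made up, and the talk does not show the actual code used for the league.

```python
from dataclasses import dataclass


@dataclass
class Game:
    favorite: str            # team favored by the betting line
    underdog: str
    spread: float            # points the favorite is expected to win by
    favorite_is_home: bool
    favorite_win_pct: float  # favorite's win-loss record as a fraction


def spread_picks(games, top_points=16):
    """Rank games by spread; break ties by home favorite, then by record."""
    ranked = sorted(
        games,
        key=lambda g: (g.spread, g.favorite_is_home, g.favorite_win_pct),
        reverse=True,
    )
    # The most confident pick gets top_points, the next one point fewer, etc.
    return [(top_points - i, g.favorite) for i, g in enumerate(ranked)]


# Hypothetical slate of games, for illustration only.
games = [
    Game("Team A", "Team B", 10.0, True, 0.75),
    Game("Team C", "Team D", 4.0, False, 0.60),
    Game("Team E", "Team F", 7.0, True, 0.50),
    Game("Team G", "Team H", 4.0, True, 0.40),
]
picks = spread_picks(games)
for points, team in picks:
    print(points, team)
```

In this made-up slate the two four-point favorites are separated by the home-team tie-breaker, so the home favorite is ranked above the road favorite even though its record is worse.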
so our data set is actually relatively
clean so if we just look back like
historically at this league like you
know what happens it turns out that this
spread guessing strategy which you know
I say like requires no brain power wins
... straight up will win this league
half the time and you know so I just
kind of compiled the years the winning
score of like whoever won that year and
then what this spread method you know
using some back testing would
have gotten us and you can see like you
know basically four out of the eight
years that I looked at it just using you
know no smartness no machines no
intuition you would have won this league
so now you've already set a pretty high
bar right because all these people maybe
like 50 people in the league they're all
doing their best to try to win and you
know this really simple method that
requires no guessing is already kind of
outperforming them so how can you do
better so this is where I decided to
begin my machine learning project and see if I
could do better so just some
machine learning basics we're gonna use
a technique called supervised learning
supervised learning is where you give
the computer some training data you give
it what we call features which are
like the known you know variables and
then you actually give it a known result
and then the computer extracts like a
model out of that and then using that
model it can now predict what's going to
happen with new examples that it's never
seen before
and how good your model is is basically
how well did you train the model okay so
one quick thing I think we've all seen
linear regression before this is not
what we're gonna use linear regression
is good for predicting you know some Y
variable when you have a bunch of X
variables and you know we've all done
this before we've minimized things but
that's not necessarily going to help us
with this problem because we're trying
to predict wins and losses on the other
hand a technique called logistic
regression is good for classifying
things so this is like you know some
sort of I think this is from the
Coursera machine learning course but
basically you're trying to discriminate
between two people and you can see this
blue line is what's called the decision
boundary and you have people that are
these yellow dots and you have those
black pluses and the decision boundary
more or less does a good job of
discriminating between the two but it
doesn't get everything quite right and
maybe that's okay because you know you
can't expect that your machine learning
algorithm like will get it a hundred
percent right and you're willing to live
sort of with what Ed mentioned is like
some googliness right like it's okay it
doesn't have to be perfect and in fact
if you had drawn the perfect line that
like you know just discriminates between
the you know the two different data sets
here you get into a problem area that's
called overfitting like you fit your
data exactly but you're actually
not very good at making
predictions you're only good at
memorizing what happened in the past
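A small sketch of the overfitting point, using only the standard library. It is not the model from the talk: the "memorizer" below is a one-nearest-neighbour rule swapped in purely to show what fitting the training data exactly looks like, and the data is synthetic with 20% of the labels deliberately flipped.

```python
import random

random.seed(0)


def make_data(n):
    """One feature x in [-1, 1]; true rule is x > 0, with 20% of labels flipped."""
    data = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        label = int(x > 0)
        if random.random() < 0.2:
            label = 1 - label  # label noise that no sensible model should chase
        data.append((x, label))
    return data


train, test = make_data(200), make_data(200)


def memorizer(x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def simple_rule(x):
    """A deliberately simple decision boundary at x = 0."""
    return int(x > 0)


def accuracy(model, data):
    """Fraction of examples where the model's prediction matches the label."""
    return sum(model(x) == label for x, label in data) / len(data)


for name, model in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memorizer scores perfectly on the data it has already seen, because every training point is its own nearest neighbour, but that perfect score does not carry over to the held-out games.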
so you don't want to get into that
problem so when you're doing logistic
regression because you're basically
doing like some sort of a binary
classification you want to use a
function that like helps you sort stuff
out so you can see this the standard
thing that goes into a
logistic regression is this thing called
a sigmoid function it has like a nice
feature that it's like smooth and then
if your input is above zero
you very quickly go up to 1 if
your input is below zero you very quickly go
down to zero and basically the
answer you get is like related to the
probability or the confidence in that
pick so if your probability is close to
0.99 then you have a very high
probability if your probability is like
close to 0.1 then you're closer to zero
so you have a low probability of being classified
as a 1 you have a very high probability
of being classified as a zero and this
works for us because in the end what
we're trying to do is we're trying to
classify based on our you know history
of these NFL games like did the team
that was favored did they win the game
that they were supposed to win or not so
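The sigmoid described above is the standard logistic function, 1 / (1 + e^(-z)). A minimal sketch of how it turns a weighted sum of features into a probability that the favorite wins follows; the weights, bias, and feature values are hypothetical numbers chosen for illustration, not values from the talk's model.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def favorite_win_probability(features, weights, bias):
    """Weighted sum of the features, squashed by the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# The output is 0.5 at z = 0 and saturates toward 0 or 1 away from it.
for z in (-6, -2, 0, 2, 6):
    print(z, round(sigmoid(z), 3))

# Hypothetical inputs: a 7-point spread and a home-game flag of 1.
p = favorite_win_probability(features=[7.0, 1.0], weights=[0.15, 0.3], bias=-0.2)
print("pick favorite" if p >= 0.5 else "pick underdog", round(p, 3))
```

An output above 0.5 reads as "the favorite should win", and the distance from 0.5 is what later lets the games be ordered by confidence.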
now we're getting a little bit closer to
solving our problem so in the simplest
form like the logistic regression has a
set of inputs called features and
it has a single output for a binary
classifier and in our case we have to
figure out what are the relevant
features that I want to include in the
model and I have to also think about
like exactly like carefully like what am
I going to get the computer to predict
because I want that probability to be
meaningful for when I go to like make my
picks like 1 through 16 so what are the
things that we picked we picked a very
simple amount of features we didn't look
at a ton of data we just looked at your
current year's and last year's win loss
record we looked at what week of the
season it is because let's say you have
a hundred percent winning record that's
... much different if you're 1 and 0 than
if you're 10 and 0 which is much more
meaningful
also we look to see if it's a home
game because there's a clear advantage to
playing at home and we look to see if
it's a division game because this is
one of the things that I've kind of
noticed over the years that like two
teams in general will play each other
well and in these division games even if
you're like an underdog you tend to play
much much better against your division
opponents because of the familiarity
and especially here at home and I don't
know how many people are like fans but
all sorts of crazy stuff happens you
know like the Jets for example are not
good but they'll beat New England at
home and you know nobody's surprised and
then the spread this is also like one of
the key pieces of information that goes
into it and the idea here is that we're
using the spread and we're using these
other features to sort of augment the
model to see if we can do better and
then the binary classifier the final
thing we're trying to predict is did the
team that was favored did they win the
game or not so just zeros and ones okay so
it's a data science talk we're gonna do
some Python so we use Python which has
a really nice machine learning
package I'm sure if you're taking the
machine learning course you've run into
what's called scikit-learn it's actually
pretty straightforward like the actual ...
like Ed said like 80% of this work was
getting the data formatted correctly so
that it could actually do three lines of
code right literally three lines of code
you have X's which are your features you
have Y which is your classifier and you
fit the model you score the model and
you predict and that's it
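The "three lines" (fit, score, predict) might look roughly like this with scikit-learn's `LogisticRegression`. This is a sketch on synthetic data, not the speaker's notebook: the two features, the made-up outcome rule, and the 300/100 split are all assumptions. What the talk calls the "classifier" is the label vector `y` here, and `predict_proba` is used for the last step because the picks need a probability rather than just a 0 or 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for historical games (not real NFL data).
# Columns: point spread for the favorite, favorite-is-home flag.
n_games = 400
spread = rng.uniform(1.0, 14.0, size=n_games)
home = rng.integers(0, 2, size=n_games)
X = np.column_stack([spread, home])

# Made-up outcome rule: bigger spreads make a favorite win more likely.
p_true = 1.0 / (1.0 + np.exp(-(0.15 * spread + 0.3 * home - 0.2)))
y = (rng.uniform(size=n_games) < p_true).astype(int)  # 1 = favorite won

# Hold out the last 100 games so the score is not measured on memorized data.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

model = LogisticRegression()
model.fit(X_train, y_train)                 # fit
accuracy = model.score(X_test, y_test)      # score on held-out games
proba = model.predict_proba(X_test)[:, 1]   # predict: P(favorite wins)

print("held-out accuracy:", accuracy)
print("first five probabilities:", proba[:5].round(2))
```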
and all the other stuff I'm going to
show you is the 80% which like goes into
making sure that like the ones and the
zeros and the numbers all look good
together alright so and then how do we
do this in Python there's these things
called IPython notebooks you know
normally on a weekly basis I have like
just scripts that run automatically and
you know spit out the right answer but
when I'm doing what's called exploratory
data analysis or looking at results and
trying to visualize results we try to
use these notebooks so let's see if this
doesn't break completely ... oh look at that
nice use of technology all right so
here's my notebook we'll go through it
relatively quickly there's you know
first of all there are just some like
setup steps you import some directories you
import some packages turn off warnings
let's see here so I'm not going to run
it live because
I'm sure if I try to do that it would
break but I did run it just not too long
ago so you should believe me it's not
completely canned okay so first of all
we have some reference data ... here's
the teams what league they're in what
division they're in this is important
aside from the historical data the next
thing we're going to do is we're
going to define what we call the test
and training sets so anytime you're
doing machine learning
you're trying to make predictions and
you're trying to see how good your
predictions are so you don't want to
validate your model based on stuff that
it memorized so you want to hold out
some data that you haven't seen before
and then you want to see how good your
model works on that luckily for us
because we have like a lot of historical
data I can basically run the model on
let's say what we chose to do is
like pick three years of data so let's
say we took the data from 2008 9 and 10
and then we predict what we think would
have happened in 2011 and since 2011 has
passed already
we can test to see if our model was any
good or not and so this is how we tested
the model but this is actually live
where I'm gonna show you like what we do
on a weekly basis to make the
predictions for this week so right now
the test year is 2016 we don't know
what's gonna happen we want to predict
for 2016 and we're gonna train based on
these three years 2013 through 15 and we
kind of messed with different ideas of
how many years to use like five
years seems like a good idea but ended
up you know like
incorporating information that was a little
too old one season was like not enough
information to get the statistics like
kind of robust and then the other thing
I would like remind you is that this is
mostly like a fun project and you know
you guys can ask like a ton of questions
like did I do this and did I do that and
we thought about some things and we
didn't think about others but I think
the idea is that you know you can you
know use this as a starting point in
your explorations using machine learning
and see how far you want to go but I'm
happy for suggestions though because I
do want the model to get better ok so
this is the part that's like the
80% basically getting all the training
data you read in all the games you like
look at the records of the teams you
have to compute all these like metrics
for
you know who's in what division who won
who lost and then so I do it for the
training set I do it for the test set
not that exciting
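A rough sketch of that preparation step: splitting games into three training seasons plus a held-out test season, and turning each game into the numeric features listed earlier. The record layout and field names are hypothetical, and using only the favorite's records is an assumption based on the "favored record" column mentioned in the talk.

```python
def split_by_season(games, test_year, n_train_years=3):
    """Train on the n seasons before test_year; hold out test_year itself."""
    train_years = range(test_year - n_train_years, test_year)
    train = [g for g in games if g["season"] in train_years]
    test = [g for g in games if g["season"] == test_year]
    return train, test


def win_pct(wins, losses):
    """Winning percentage, defined as 0.0 before any game has been played."""
    played = wins + losses
    return wins / played if played else 0.0


def feature_row(game):
    """Turn one game record into the numeric inputs listed in the talk."""
    return [
        win_pct(*game["fav_record"]),       # favorite's current-season record
        win_pct(*game["fav_prev_record"]),  # favorite's record last season
        game["week"],                       # week of the season
        abs(game["line"]),                  # size of the point spread
        int(game["fav_is_home"]),           # 1 if the favorite plays at home
        int(game["division_game"]),         # 1 if the teams share a division
    ]


def label(game):
    """1 if the favored team won, else 0."""
    return int(game["fav_won"])


# Hypothetical records, for illustration only.
games = [
    {"season": 2013, "week": 1, "fav_record": (0, 0), "fav_prev_record": (10, 6),
     "line": -3.5, "fav_is_home": True, "division_game": False, "fav_won": True},
    {"season": 2015, "week": 9, "fav_record": (6, 2), "fav_prev_record": (12, 4),
     "line": 7.0, "fav_is_home": False, "division_game": True, "fav_won": False},
    {"season": 2016, "week": 9, "fav_record": (4, 4), "fav_prev_record": (8, 8),
     "line": -4.0, "fav_is_home": True, "division_game": True, "fav_won": True},
    {"season": 2012, "week": 3, "fav_record": (1, 1), "fav_prev_record": (9, 7),
     "line": -6.0, "fav_is_home": True, "division_game": False, "fav_won": True},
]

train, test = split_by_season(games, test_year=2016)
X_train = [feature_row(g) for g in train]
y_train = [label(g) for g in train]
X_test = [feature_row(g) for g in test]
print(X_train, y_train, X_test)
```

The 2012 game falls outside the three-season window and is dropped, which mirrors the point made above about older seasons carrying stale information.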
okay so right before I'm about to send
in the data to the model like what does
it look like so the computer doesn't
care whether Baltimore's playing
Pittsburgh it's just a name to it
right so the things that the computer
cares about are the features that I
talked about so this is what the
features look like the favored record
this is the first week of the season so
clearly your ... your current record
for everybody is 0% and then and this is
why the previous year's record is
somewhat important because the first
game of the season who knows who's gonna
win there's just the spread right but
hopefully if like the Super Bowl winner
is playing you know somebody who was
like terrible last year that's some
indication of you know who might be
better so we have the previous record we
have which week of the season you're at we
have the line we take the absolute value
of it because we have another field here
that says favored home game so that
automatically accounts for the minus
sign or the plus sign as to who might
be favored and there's this flag for a
division game and then this is the
classifier it's not that exciting it's
just zeros and ones in that week did the
favored team win so I send this all to
the scikit classifier and it's pretty
straightforward this is all wrapped so
that we can you know run this over and
over again but what I showed you before
about running the classifier and
predicting it inside it really is like
just those three lines so we set up the
classifier and then we can predict week
9 which is the week that just happened
so we're gonna look to see what happens
and then we kind of look at like what
does the prediction data look like
and so basically what's happening is you
know these were the games this last week
and I ranked them by the probability that
the particular team would win and so the
nice thing here is like not only does it
tell me that if I'm above 50%
the favored team should
win and there's only one upset pick this
week turned out it didn't work but
everything that's above 50% should be
that the favored team wins and this
also gives me a way to rank the teams
between 16 all the way down to one
there's also only 14 games
this week so it goes 16 down to three and
so this is what I need to do in order to
make my picks into the system and then
you can just see like and that's pretty
much it like we can see what the model
would have predicted and so let's just
jump back to basically the other thing
here we'll present and then we'll show a
little bit of results now that we showed how
we use this okay so back testing so we
trained over multiple sets of three year
periods and like looking forward like
another year and we look to see how the
spread strategy would have done against
the person who won the league that year
and we also extrapolated how like the
machine learning strategy would have
done that year and it looks pretty good
and this is back testing so we just have
to remember that like back testing's like
never as good as forward testing I don't
know if anyone's ever traded on Wall
Street at a hedge fund you have all
these great ideas you're gonna make
money you try to put it in action in
real life it doesn't work but you know
but still you have to do your back
testing and you have to convince
yourself that you went through some
like reasonable amount of you know
effort to make sure that you think the
strategy is gonna work going forward and
then you tweak it along the way as
things break or you come up with more
information one thing that I'll mention
is like I keep referring to this
moderate strategy over here there's a
bunch of like different ways that you
could actually make the picks in this
particular league one particular way
which I call the conservative strategy
is to just always pick the favorite
regardless but then only you like use
the numbers to kind of reshuffle the
order so that would be very similar to
the spread strategy it would just kind
of change the order of some of them the
other thing is to actually pick the
predicted team so for example I don't
know if you remember at the bottom it
said Baltimore was well
Pittsburgh had a 44 percent chance of
winning which means Baltimore the
underdog should be favored to win so
we're gonna actually pick Baltimore to
be favored to win but we're gonna put
them at the bottom of the pile just
because it's an upset well the other
thing we could do which I call the
aggressive strategy
is to figure out what's the relation to
the point five because like what if
Baltimore what if the probability of
Pittsburgh winning was zero right that
means it's a hundred percent chance that
Baltimore is gonna win so then I should
actually take Baltimore and put it way
at the top at sixteen but you know we
did some back testing on that and it
turned out that the aggressive strategy
tends to have like a very high standard
deviation it like wins some years like
by one hundred and forty points and it
loses other years by a hundred and forty
points and so you know in an effort to
be you know a little bit more
conservative and to see if we could like
win more consistently we decided to pick
this moderately conservative strategy
and and then live testing right live
testing like how how does it work any
good at all or not so 2014 was the first
year that we ran the strategy the spread
strategy actually won that year my
daughter was happy because she's the one
who puts in the picks for the spread
strategy because she's pretty sure that
that's the best one the moderate
strategy did not do so well
that year last year was pretty ideal the
moderate strategy came in first place
and the spread strategy came in third
place and the second person was just
barely above the spread strategy so um
that was actually kind of nice and it
was like a little bit of validation of
the model and how it works
and we were happy to see that happen and
then currently we're not doing so
hot but I will say that because it's
like a slow and steady strategy like
about two-thirds of the way through the
season is like when it really kind of
like starts to build up and like the the
consistency of it starts to like
outperform the people that are just like
making random guesses on a weekly basis
so hopefully good things will happen and
then just you know depending on how
much football you watch on Sundays and
Monday nights this is what we had picked
for this current week and you can see
that the spread strategy which is the
far corner if you see favored win they
only got two wrong whereas the algorithm
with the moderate strategy actually got
three wrong because it wrongly picked
the upset of Baltimore over Pittsburgh
and Pittsburgh actually won
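Counting wrong picks is only part of the story, because under the league rules described earlier each pick carries its own point value and a miss earns zero. A minimal sketch of the weekly scoring, with hypothetical picks and results:

```python
def weekly_score(picks, winners):
    """Sum the confidence points of every pick whose team actually won."""
    return sum(points for points, team in picks if team in winners)


def season_score(weeks):
    """Total points over a season, given (picks, winners) for each week."""
    return sum(weekly_score(picks, winners) for picks, winners in weeks)


# Hypothetical week, for illustration only.
picks = [(16, "Team A"), (15, "Team C"), (14, "Team F"), (13, "Team G")]
winners = {"Team A", "Team D", "Team F", "Team G"}

print(weekly_score(picks, winners))  # 16 + 14 + 13 = 43
```

Missing the 15-point pick here costs more than missing a pick near the bottom of the list would, which is why a wrong upset pick placed low does limited damage.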
so um and then the Seattle game
which is why I'm wearing a Seattle
t-shirt is gonna happen tonight and
they're predicted to win I'm not
necessarily a fan but it's fun to root
for the algorithm ... and that's all I
believe we have a little time for questions
thanks very much we've got a hand right
up there at the back straight away if
anyone's got a mic we'll go to the back
thank you hi thanks so much for your
presentation one immediate question I
have is football to me doesn't see I'm a
sports fan I like a lot of different
sports and if I were gonna do something
like this football would not be first on
my list because of the very limited
number of games
that seems like at least one
thing that would make this a little less
conducive since your training sets and
dev sets and all that just can't be as
large you know baseball yeah so did that
go into it I mean are you just a huge
football fan like what are the
contributors no I'm like a big
sports fan all around and the idea is to
sort of like use this as like a starting
point and then we definitely want to
look into like baseball and even like I
don't know Pro Cycling you know it's
like one of my favorite things you know
it doesn't seem like a team sport but it
is if so yeah I agree like with baseball
there's definitely a lot more like 162
games over the course of the season lots
of individual player statistics and then
the other thing you know this was just
literally to get started to win this
particular league but even like you can
imagine starting to look at player
statistics and how to how do you do like
a player team oriented fantasy league
but yeah it's certainly a good point not
limited to football at all Thanks okay
we've got let's go right across at the
end are we down to one mic at
the moment yeah
okay gentleman in the white shirt there
and then we'll and then you can pass it
back for the next question after that
thank you hello thank you for the
presentation we can hear you yeah it's
good yes so my question is that it
sounds like your model relies a lot on
the Vegas spread mm-hmm I was thinking
um why do you use the Vegas spread and have
you thought about replacing it say
with I don't know the predictions from
538 for instance instead of the Vegas
spread yeah so that's a good point
so for I think the thing is like what
538 does is they do some version of this
right they do something else that is
also like model driven right so I think
one of the lessons that I got from the
years of working on Wall Street is
there's like market information right so
538 is model information and there's
market information and there's a
difference between what the bank says is
the valuation of a security according to
the model and what the market says and
companies have gone bankrupt and credit
crises have happened so the idea was
to take market information and I think
what we do is we look at 538 to see what
they're predicting I think even like
Bing like Microsoft's search engine if
you just type in like NFL games like
they give like a probability of winning
and we've kind of like matched up ours
you know see like oh like are we totally
off-base what are they doing we're
trying to we have you know we don't
really know like what they're doing but
it must be something along these lines
but you know probably they're using like
a much larger richer data set to kind of
you know pull in this is meant to be
like relatively simple like literally
like five inputs into the model and see
if we could like you know do something
that's interesting and effective but yeah
great question
hi my question is can you talk a little
bit about why you picked logistic
regression versus any other classifier
yeah so the thing is one of
the things I didn't show here is that we
do like to run a third strategy and
... using like support vector
machines and that one also is like a
really high volatility and so we haven't
gotten that to work so there is like
a certain amount of like what's the word
like machine learning know-how and like
really understanding some of those
algorithms on a much more
theoretical basis logistic regression I
feel like is the simplest to understand
because of the binary classifier and
because we're doing a binary output for
example the support vector machines when
it gives the probability it's not
usually visualizable in this like
sigmoid function oriented way so I think
for the illustrative purposes like
logistic regression is also like a great
idea but we are trying to test like
other you know decision trees and random
forests and see if like something would
do better or not but that being said
like so far over the years the logistic
regression has actually performed the
best in terms of like going up against
live competition okay thank you okay
we've got a hand right at the back there
in the corner yes mic's coming to you
thanks sorry since it's a dialogue I
feel obligated to talk for a minute not
ask you a question is that fair kind of
trying to talk about what Ed talked
and you talked together with the
question of like why start at a domain
that doesn't have that much data not
sports but this particular sport I'd
like to actually tell you that I think
statistics and machine learning started
with small data not big data and I know
it's a very good thing to always think
about big data as the challenge like
processing the data and doing all that
it's much more intensive when you have
big data but the challenge with small
data is actually a very important one I
know sports may not be like
life-threatening moments
and we may not think about it as
important as I personally don't think
about it as important as other things
in life but I think that this
raises actually a really good point
in data more data is better than no data
and it's a really big important topic
there's a lot of domains that don't have
big data and are very very important to
tackle and the algorithms not all of
them but a lot of them especially like
the stuff that you were talking about
are applicable and you should not shy
away from them just because they're
small smaller data sets so I really do
I'm going to talk a little bit about
agriculture where life isn't as pretty
with big data sometimes and so I'm
really a big advocate of taking
something with small data and showcasing
it trying it and even failing and
learning from it so I appreciate the
effort to go not into baseball which I
personally dislike so thank you for
trying to tackle something different ...
I've got a mic so I'm going to talk so
going along
the same line of thinking do you think
beyond those initial five adding additional
features would improve the quality of
your predictions given your experience
or do you feel like the simplest method
is the best given the success of the
spread relatively so I think definitely
like more data would kind of enhance the
model but the nice thing about machine
learning models is like you don't
presuppose like so I put in this thing
like for the division games that I think
is important right but then when you
actually run the model it kind of spits
back out at you like what is the
relative weight of that factor and if it
thought it was useless it would be zero
or you know another thing you
can do with what's called feature
engineering in machine learning is like
you can take out that feature and see
if your training accuracy goes up or
the results are more stable so we kind
of did that with at least the features
that we picked and then but we are
trying to figure out like how to add
more data but it is also an 80% problem
right like you know it's just like work
and and we just haven't gotten there but
I think definitely like there's lots of
other data in the world where you can
look at like defensive statistics and
offensive
statistics you know do you care about
like individual like players and
injuries and stuff like that like if a
star quarterback is like not playing
like is that the difference I guess the
thing is like the whole point
of this exercise is that the spread is
already telling you something and can
the model uncover like does a home game
mean more than what the spread is
already telling you right because
generally speaking you kind of hear
anecdotally that like all things being
equal the home team has like a
three-point advantage in the spread
right so it's already baked in so it's
an efficient market I thought what's
that like the efficient market
hypothesis right exactly
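
The two checks described in this answer, reading the weight the model puts on each factor and taking a feature out to see how training accuracy changes, can be sketched roughly as follows. This is a minimal illustration and not the code from the talk: it uses a small hand-written logistic regression in plain Python instead of scikit-learn, the feature names (spread, home, division) are assumed for illustration, and the games are synthetic, generated so that only the spread matters.

```python
import math
import random

FEATURES = ["spread", "home", "division"]  # hypothetical feature names

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, steps=300, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, k = len(rows), len(rows[0])
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(rows, labels, w, b):
    # Fraction of games where the predicted side (probability >= 0.5) won.
    hits = 0
    for x, y in zip(rows, labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        hits += (p >= 0.5) == (y == 1)
    return hits / len(rows)

def make_synthetic_games(n=400, seed=0):
    # Synthetic data only: by construction the outcome depends on the spread alone.
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        spread = rng.uniform(-1.0, 1.0)    # scaled point spread
        home = rng.choice([0.0, 1.0])      # 1.0 if playing at home
        division = rng.choice([0.0, 1.0])  # 1.0 if a division game
        rows.append([spread, home, division])
        labels.append(1 if rng.random() < sigmoid(4.0 * spread) else 0)
    return rows, labels

def ablation(rows, labels):
    """Training accuracy with all features and with each one removed in turn."""
    w, b = fit_logistic(rows, labels)
    results = {"all": accuracy(rows, labels, w, b)}
    for j, name in enumerate(FEATURES):
        reduced = [[v for i, v in enumerate(x) if i != j] for x in rows]
        w_j, b_j = fit_logistic(reduced, labels)
        results["without " + name] = accuracy(reduced, labels, w_j, b_j)
    return results, dict(zip(FEATURES, w))

rows, labels = make_synthetic_games()
results, weights = ablation(rows, labels)
for name, acc in results.items():
    print(f"{name:>18}: training accuracy {acc:.3f}")
print("weights with all features:", weights)
```

On data built this way, the weight on a factor the outcome does not depend on should come out close to zero rather than exactly zero, and removing it should barely move the accuracy. Whether that holds for home or division games on real NFL data is the open question raised in the talk.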
so it's only like can the model discover
something that it's not taking into
account as much as it should or over
doing something and so yeah that's like
an open question like who knows which
things will be relevant or not but it's
definitely we're trying with other data
sets volunteers are welcome ... we have
time for a couple more questions let's
take yeah the mic is getting passed to
you there and then we'll go to the back
for the final question so has anyone in
your league become more data-driven as a
result of you doing this project I'm not
sure they all know I'm doing it I even
tell them that the person who like wins
the league just straight up uses spreads
but I think the thing is everybody like
it's a fun thing right so there is
like some I think what happens with a
lot of people is they start with the
spreads and then they make tweaks like
they just kind of adjust based on
personal preference and intuition and
what they happen to know so I think
maybe some people are doing that a
little bit more but it still seems to be
just kind of like a fun thing let me see
if I can outwit the algorithm and not to
say the algorithm is amazing all the
time it does pick bizarre things and you
know even I like question it and I'm
like who knows but you know we just go
with it and we root for the algorithm yeah
question at the back there yeah
all right so when you were talking about
the spread because there's a lot of
interesting information baked into it
have you ever actually thought of trying
to predict the spread and see what kind
of inputs actually go into the spread
itself and try to understand that a
little bit more deeply so we haven't
tried to predict the spread but one
thought that we had was to take out the
spread and see if like it could like
rank you know independent of the spread
right so that would be like an
interesting thing because then you're
almost like recreating the spread or
doing like an agnostic thing where
you're not taking this like market
information so that's like one thing we
thought of we haven't tried to predict
the spread that's an interesting idea it would
be I guess what you'd have to do is
you'd have to take these probabilities
and like map them historically to what
that means right like is a 90%
probability a 10-point spread or not you
know then you would do some linear
regression probably yeah
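
The mapping suggested here, from the model's win probability to a point spread by linear regression, could look roughly like this. It is a sketch under stated assumptions: the (probability, spread) pairs are invented for illustration and are not real NFL history, the function names are hypothetical, and a straight line is used only because that is what the answer proposes; the true relationship may well be nonlinear.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_spread(prob, slope, intercept):
    # Point spread implied by a win probability under the fitted line.
    return slope * prob + intercept

# Invented pairs of (model win probability, closing spread), not real data.
history = [(0.52, 1.0), (0.58, 2.5), (0.65, 3.0),
           (0.71, 6.5), (0.80, 7.0), (0.88, 10.5)]
probs = [p for p, _ in history]
spreads = [s for _, s in history]

slope, intercept = fit_line(probs, spreads)
print(f"spread ~ {slope:.1f} * probability + {intercept:.1f}")
print(f"implied spread at 90%: {predict_spread(0.90, slope, intercept):.1f}")
```

With real historical games in place of the invented pairs, the fitted line would give one answer to whether a 90% probability corresponds to something like a 10-point spread.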
seems like it'd be fun yeah and the
other thing I was thinking of have you
thought of incorporating because we have
these beautiful simulators that we've
been building for 12 years now as in
Madden 2016 and actually running a
bunch of games on that and then adding
that as another feature
the one thing I have thought of though
so this is like obviously like writing a
bunch of Python code using scikit-learn
etc etc there are starting to be like
drag-and-drop machine learning tools for
the non programmer I think like
Microsoft has something there's
something called BigML there's this new
thing I just ran across the other day
called Orange I forgot what it was
called Orange something or other but
basically there are tools for the data
savvy person but not necessarily like a
Python programmer to like pull in your
data you know say that these things are
you know kind of do a little bit of
cleansing do a little bit of that 80
percent and you know say that these are
the features this is the variable and
predict for me and I actually tried
running this on BigML and it more or
less gives like the same answer that you
know the model was giving
short of like not knowing what the
tiebreakers were so that was kind of
cool I thought that it was like
this is almost achievable for the masses
or my brother-in-law if he wanted to you
know great Amit thanks so much for
sharing your fantasy football work with us
