Hi, everyone, welcome back.
In this next bit of time, we're
going to be talking about machine learning
and artificial intelligence. I'm
going to give a basic introduction
to these different techniques and
then dive into just one area.
As I get started, I just wanted
to mention that I've got actually
quite a bit of material online if
you are interested in going deeper,
because in just an hour of discussions,
there's only so far we'll get.
I gave a two-day course on machine
learning and causal inference, which is
available, with videos, on the American
Economic Association website.
And then on that site, there's also
a link to a Google Drive, where
since then I've uploaded more recent
versions of those slides; I have
tutorials and R scripts, and a GitHub
with sample data and so on.
So there's actually quite a bit of
material that you can draw from.
And I've also got on this slide a list
of kind of the friendly introductions
that you can start to read if you just
want to get a taste of what's going on.
I have this Annual Reviews
paper that Guido Imbens and
I wrote, called "Machine Learning Methods
That Economists Should Know About," which
focuses more on putting prediction
methods in the context of econometrics.
And then some of my other articles
talk more about causal inference,
which has really been my big focus.
So just to introduce some themes, let me
start out by saying there are two kinds of
machine learning, really there's more
than two, but the two that you're most
likely to encounter first are supervised
and unsupervised machine learning.
Supervised learning is basically going to be
richer versions of regressions,
or of classification models or
multinomial choice models.
In supervised learning, we're going to have
a Y variable and some X variables, and
we're going to try to use the X
variables to predict Y.
Unsupervised learning is going to
be about finding groups or
clusters of objects that are similar,
without any Y variable.
I'm not going to talk about unsupervised
methods as much today, partly
because I don't feel like economics
really adds that much to the standard
methods that are out there in
the general machine learning field.
So, in general, when people
are using unsupervised methods,
what I suggest is just go google it, read
blog posts showing how to do it, and apply
those best practices, which are evolving
every few months, to your own problem;
I wouldn't necessarily do anything
different because you're an economist.
So in unsupervised, you might have
something like a collection of images,
or a collection of documents, or
maybe collections of the histories
of individual internet activity,
but you have a collection.
And then what you do is take
that raw data and
you ask an algorithm to
put the objects into groups.
So the input is just the list of
documents, and the output is the groups.
So you might get:
group one has objects number 3, 7, 11,
243, and 1,026.
And then group two will have
a different set of objects.
So what you get out of that is just
the group assignments, nothing else.
But when you look at those objects,
a human might later describe those groups.
So for example, if you run unsupervised
learning on YouTube videos,
the biggest group that will generally
pop out, call it group A,
is just the most popular group,
the one with the most views.
If you start watching all the videos
in there, there'll be videos of cats.
And then if you start watching the second
set of videos, there'll be videos of dogs.
And so what you'll be able
to do as a human is say,
group one is the cat videos and
group two is the dog videos.
But the thing is that we didn't tell the
algorithm anything about cats or dogs or
animals or anything; we gave the algorithm
a bunch of bits describing
the videos, and the algorithm figured
out which ones were similar, okay?
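Just to make that input-output contract concrete, here is a minimal sketch of unsupervised learning in Python; the feature matrix is synthetic, and k-means is just one standard off-the-shelf choice, not anything specific to the example above.

```python
# Minimal unsupervised-learning sketch: the algorithm sees only raw features
# (no labels) and returns group assignments. K-means is just one standard
# choice, and the data are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Pretend each row is an object (an image, a document, a browsing history)
# already encoded as a numeric feature vector.
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 50)),   # one bunch of similar objects
    rng.normal(loc=3.0, scale=1.0, size=(100, 50)),   # a second bunch
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_            # group assignment for each object
print(np.bincount(labels))         # how many objects landed in each group
# A human then inspects the members of each group and names them
# ("these look like the cat videos", "these look like the dog videos").
```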
So that is unsupervised learning, and people have
used it in fields like macroeconomics and
political science, looking at
the text and minutes of Fed meetings, or
the text of political speeches in Congress.
And historically, economics was usually
ahead of political science in terms of
econometrics and statistical techniques,
but political science has actually been
ahead of economics in terms of using text in
social science.
So if you want to see how people are
using this type of technique in
social science, you'll find
a few applications from economics, but
actually a fair number from
political science as well.
And Matt Gentzkow also has some
really nice work sort of at
the intersection, around media polarization
and news; he's an economist, but it's in
sort of the political economy camp.
But the reason I don't spend
more time on those in my lectures is that,
again, the goal and the way you
think about things is very similar
across all the different applications, and
so there's no reason we necessarily need
to innovate or do something really
different there.
Now, there are a few caveats to that, and
I think we probably will see some
customization given that we're going
to use these things inside
econometric models, but it's close to
being what you want already off the shelf.
On the other hand,
supervised machine learning often is not
exactly what you want off the shelf.
And so that's why I think there's
been a lot of interest and
methodological work in trying
to combine econometrics and
supervised machine learning.
Although some people have written good
papers just using supervised machine
learning off the shelf,
most of the time it's going to be used
as part of another exercise that has
different objectives, and so
we can actually improve things by
not taking it off the shelf.
That's why I want to
spend more time on it.
The main thing about supervised machine
learning is that it's really a paradigm
that requires very few assumptions, and
it's very easy to teach and learn.
The basic assumption is
that you have a bunch of independent
observations; in the simplest case,
you have a cross section of
independent observations.
For each unit,
you have X's and you have Y's.
And you assume it's a stable
environment, basically that all of
the units are exchangeable.
So there's no notion that some of them
come from a different distribution than
others, or at least, if that's true,
it's accounted for by the X's.
And there are two kinds of supervised
learning: regression or prediction, where
you're essentially trying to
estimate the expectation of Y given X,
and classification, where you're trying to
find the probability that Y is equal to
a discrete value given X.
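As a rough illustration of those two tasks, here is a small sketch on synthetic data; the random forests are just placeholder learners, and nothing here is specific to any particular application.

```python
# Two flavors of supervised learning on synthetic data: regression
# (approximate E[Y | X]) and classification (approximate P(Y = k | X)).
# Random forests are used only as placeholder learners.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))

# Regression: continuous outcome, predict its conditional mean.
y_cont = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=1000)
reg = RandomForestRegressor(random_state=0).fit(X, y_cont)
print(reg.predict(X[:3]))          # predicted conditional means

# Classification: discrete outcome, predict the label (and, if asked, a probability).
y_disc = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)
clf = RandomForestClassifier(random_state=0).fit(X, y_disc)
print(clf.predict(X[:3]))          # the "guess" of the label
print(clf.predict_proba(X[:3]))    # probabilities, only if you ask for them
```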
Now, I've written it this way
because it makes it look like what
you learned in econometrics class,
but that's not actually the way that
machine learners think about it.
They don't think about you as actually
wanting to learn those functions;
they think about you as
wanting to accomplish a task.
So you're just going to get the output,
the guess of which label it is;
you're not necessarily
going to get a probability.
And in fact,
with a lot of these algorithms,
you have to ask for more computation
if you want a probability out.
It's generally just going to spit
out: that's a cat, that's a dog.
And in production systems, they don't
necessarily tell you a probability;
sometimes they do, sometimes they don't.
But it's really the mentality that
these are machines that spit out a guess,
rather than statistical estimation
routines that try to estimate a function.
And I'll try to show you
as I go along why that matters.
Of course, sometimes you can
be a little bit shorthand about it
and not think about the distinction.
But a lot of times for economics,
we do care about that distinction.
And once you understand the way the
machine learners are thinking about it,
it helps you understand why they're
asking certain questions and not others.
All right, so how I started going
to machine learning conferences:
I got into all of this when I
started working for a search engine.
So in 2007, I started consulting for
Microsoft and I quickly became their
consulting chief economist, and
I worked on the search engine.
And so when I got there, I had never
heard of the term machine learning or
at least I didn't really
understand what it meant.
And I knew basically nothing about the way
that this group of people was trained
to think about data.
And of course, back in 2007, machine
learning itself was still quite young.
So I kind of got plucked in there
thinking I knew a lot about data, and
I suddenly met hundreds and thousands of
people, all working on a problem,
who thought about data very
differently than I did, and
I was the only one who
thought about it my way.
So that kind of indoctrination by fire,
[LAUGH] taught me a lot.
And then I started also
going to conferences and
things like that to try to learn more and
understand.
Because it was very clear that I was not
going to make these thousands of people
think differently than they were already
thinking without understanding very well
what they were doing.
And of course, you need to be humble that
probably, if they were doing it that way,
there was a reason, and it might be right,
maybe there was nothing I had to add.
But it turned out that there were
things I could bring, though
I needed to understand both sides.
So when you started going to these computer
science conferences, especially
around 2010, '11, '12,
you would go to these AI conferences,
and basically session after session
would follow a similar format.
People would put up a description of
a problem, like image
classification, and
my colleague at Stanford,
Fei-Fei Li, actually created a big data set
that then everybody used to test against.
And then people would say,
here's my new way to do neural nets, and
I can make them deeper, I can train them
faster, or I can tweak them a little bit.
And here's my
goodness of fit, this is how much
better I do classifying cats and
dogs relative to what they did last year.
And then they would show a few cat and
dog pictures, and I'd be like, okay, well,
when are you going to tell me
like how this works or why?
And I kept going to session after session
thinking that suddenly the insight was
going to emerge, and I was finally
going to learn how all of this worked.
And then I realized after a while
that that was never going to happen, and
in fact,
most of this was about engineering.
So there's a very interesting point here
about the sociology of science.
By putting up a big data set,
a bunch of pictures with labels,
so literally an image and then the label
cat, then an image and then the label dog,
and getting an entire research community with
thousands of people in it to spend all
their time trying to do a better
job at the same task,
they made massive progress in improving
their ability to do that task.
But one of the things that was
really important about making
that work was that there
was a right answer.
So you could hold out the test data, and
here would be some pictures with cats and
dogs.
And then you could see whether
an algorithm could actually tell cats and
dogs apart in the test data.
So if mine was better than yours,
that was going to be very
clear from the performance.
And you can contrast that to your typical
economics seminar where suppose we start
arguing about whether my paper that shows
the minimum wage doesn't hurt employment
is better than your paper that shows
the minimum wage does hurt employment.
And we can argue, and argue, and
argue for an entire seminar for weeks, or
months, or years, and
not actually know who was right.
Well, with this type of thing, you know
whose was better, because
you hold out data, find a data set
those algorithms haven't seen before,
you test them, and you see which one works.
So it's a very different problem, I think
of it as an easy problem because I think
a lot of our problems in economics come
from not knowing what the right answer is.
A problem where you do know what the right
answer is sounds like an easy problem.
So it's easy in some ways, but of course,
it's hard in other ways because it took
years of work to be able to get these
neural nets to actually be able to
accomplish this task with high accuracy,
okay?
So you go to the seminars,
you see cats and dogs.
So here what you're going to do is
you're going to take these images, and
you're going to translate them into Xs.
And so, if you think about it, your
monitor has red, green, and blue, so I can
take any image and think about making
that image out of red, green, and blue.
I can have three matrices,
where each matrix says how bright is the red,
how bright is the green,
how bright is the blue.
That's going to describe the picture,
and of course,
each little point in the matrix
would be a pixel, okay?
So that's one way to encode this:
translate the image into X's, and
then we want to say, given
that X, is that a cat?
Now, if we just
had images of cats and dogs, I could
put this into a binary logistic regression;
I could just put in all the pixels
as X's, maybe with interactions
between the pixels, and try to do this.
But that wouldn't work very well.
What we would get out is the probability
of a cat, or of a dog, as
a function of the pixels.
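Here is a minimal sketch of that encoding and the naive logistic-regression approach; the "images" and labels are randomly generated placeholders, so this only illustrates the mechanics, not a model that would actually classify anything.

```python
# Encoding images as covariates: three brightness matrices (red, green, blue),
# one entry per pixel, flattened into a single feature vector per image, then
# fed to a plain logistic regression -- which, as noted above, would not work
# well on real images. Everything here is randomly generated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_images, height, width = 200, 32, 32
images = rng.random(size=(n_images, height, width, 3))   # RGB intensities in [0, 1]
labels = rng.integers(0, 2, size=n_images)               # 1 = "cat", 0 = "dog" (made up)

X = images.reshape(n_images, -1)          # each image becomes one row of 32*32*3 pixel values
model = LogisticRegression(max_iter=1000).fit(X, labels)
print(model.predict_proba(X[:3]))         # P(dog | pixels), P(cat | pixels) for the first three images
```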
The magic of the neural nets is that they
find a really flexible functional form
with lots, and lots, and
lots of parameters in it.
And you can think of it a little bit like
trying to find ways to transform, and
retransform, and retransform those pixels
into features, or kind of constructed X's.
And then those constructed
Xs will be used to predict.
And so they can, for example,
figure out over here this triangle thing,
that's going to be a feature.
And if I see that triangle thing
somewhere else on the picture,
I'm going to think it's
more likely to be a cat.
So it's kind of like automated variable
discovery going on in the background.
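To give a rough feel for that, here is a toy sketch in which a small feed-forward network is fit to the same kind of flattened pixel data; the hidden layers play the role of the constructed X's. The data are again random placeholders, and a real image model would use a convolutional architecture and vastly more data.

```python
# A small feed-forward network fit to flattened pixel data. Its hidden layers
# play the role of the "constructed X's": learned transformations of the raw
# pixels that feed the final prediction. Data are random placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random(size=(200, 32 * 32 * 3))     # 200 flattened fake "images"
labels = rng.integers(0, 2, size=200)       # fake cat/dog labels

net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
net.fit(X, labels)
# The first weight matrix maps the 3072 pixels into 64 learned features,
# the constructed variables the rest of the network builds on.
print(net.coefs_[0].shape)                  # (3072, 64)
```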
And I actually have longer lectures with
sort of an introduction to neural nets
for economists.
So if you're interested,
you can look on my Google Drive or
ping me and I can show you good ones
to learn more about how they work.
So now when I look at this
particular picture of a cat,
we see this cat is playing a piano.
So one thing about these algorithms
is that they are very much a black box.
So you as a user didn't tell
the algorithm anything about whiskers,
ears, nose, eyes, animals,
biology, bones, nothing.
You just put in the pixels, and the
labels, and out came the classifications.
What that's going to
mean is that whatever it
is in your data that's predictive is
going to get picked up by the algorithm.
So here we see this cat is playing
a piano, and it turns out there are more
piano-playing cats than piano-playing dogs
on YouTube and in these image data sets.
So your algorithm
will create features or
variables that are black and
white rectangles.
And when it sees those together,
it will increase the likelihood the image
gets classified as a cat, okay?
Now, as an economist, you might sit back
and say, well, I'm not sure I'd like that.
But then ask yourself,
well, why don't I like that?
What is it that's special
about an ear shape that it should be
part of the model, while
a piano should not be part of the model?
Pause and
think about why it would be
wrong to have pianos predict cats.
I think one way that I would
like to phrase it is this:
A piano is not a stable feature of a cat.
It's not a structural feature of a cat.
It happened that today there's
more piano playing cats than there
are piano playing dogs.
But if I trained my dog to play the piano
and started a craze of piano playing dogs,
then tomorrow it could be that
there's more piano playing dogs than
there are piano playing cats.
So as an economist, I would
think that this model
is going to work well in this sample, but
it might not work well in another
situation, at a different point in time.
And that makes you realize that,
as economists,
we often want models that are
generalizable, that have really stable or
structural features of
the environment as part of them.
Because we're trying to actually
build a model of how the world works.
But that is not part of the explicit
objective of most machine learning models.
Machine learning models, off the shelf,
are just trying to fit the data that you
have and they're going to use everything.
And the fact that they're a black box
means it's really hard to make them not
use something.
If you wanted to have a really good
neural net that told cats and
dogs apart but did not pick up on pianos,
you would have to do quite
a bit of work, and
it would take you a long time.
You would basically have to
add a penalty for the pianos and
change your objective function.
You would have to do engineering, and
most people out of grad school
wouldn't really even
know how to do that well.
So with these things,
you're kind of stuck:
it's all or nothing, you get the whole
black box or you get nothing.
And that's been the way it works until
recently, when machine learners
also started to catch on
that some of these things can be problems.
So just to think about how that
matters in practice: suppose you were
doing macro modeling, say
working for the Federal Reserve,
trying to understand risk for
banks.
Or, I'm on the board
of a tech company that does lending,
so
suppose we had a credit scoring model.
Well, we might hire someone out of
Stanford who says, hey, why don't you use
a neural net because that has better
goodness of fit in your data for
predicting loan default?
Why don't you use that to make your loans?
And the problem would be, well,
what if something comes along like
COVID-19, where the economy changes?
How would I even know whether my model was
going to hold up or continue to perform
well when the world changed if I can't
even understand how it's working?
Whereas if I have a simpler
model that I understand,
I might be able to evaluate: well,
gosh, it's loading up a lot on this one
variable, which maybe doesn't have the
same information content it did before.
So I could actually, as a human, evaluate
how important it is to change my model and
what problems I might face
if I kept using that model.
Okay, so when somebody comes in and
says let's use a neural net in a business
environment or a macro forecasting
environment, you have to be very
cautious and think through,
do I need this to be stable or not?
Now when I talked to the guy who did
Google Images about this, I said,
hey, don't you realize
you're going to be unstable?
He says, well, who cares?
Because we add new data every day.
And if dogs started playing the pianos,
we update.
And if we're making mistakes we'll
just continue to retrain our model.
So as long as you can update your
model faster than the world changes,
you don't really care
that it's not stable.
And as long as you have a way to continue
to assess its performance, you don't care.
But in contrast, if you're a bank and
you need to hold your models fixed for
a period of time because of regulation and
a variety of other factors,
then you may not like these kinds
of models that pick up on spurious
things that might change over time.
Okay, so
those are some of the considerations.
Now, when I first started teaching
about this, I felt that most machine
learners actually really hadn't thought
this all the way through themselves.
They really weren't very articulate
about the weaknesses of the models.
One of the things that's happened
over even the past five years is that
as people go out and try to
implement these things in the world,
they find lots of problems they're facing.
And now more and
more sessions at the top machine learning
conferences focus on things
like interpretability and
stability,
to try to address these problems.
So just to give another sense of what's
good and bad about these models,
this is a marketing
brochure from McKinsey.
And so this is an example where they
were trying to show why machine learning
is better for predicting when
customers are going to quit.
So this is driver A and driver B;
you can think of these as x_A and x_B,
two different covariates.
And what you're trying to predict is
a binary outcome: does a customer quit, or
churn, churn being the word for quit.
And they say, in our old version,
where we just did a logistic regression,
this green line is
the iso-probability line,
and it's a very nice, simple line.
So when drivers A and B are high,
we think you're likely to quit;
when drivers A and B are low,
you're unlikely to quit.
Then when they used machine learning,
they get these iso-probability lines,
lines where the predicted probabilities
are equal, that are much richer.
So in their brochure,
they're saying, hey, look,
we discovered these complicated
relationships in the data that we
wouldn't have learned if
we just did it manually,
and now we're going to get a better fit.
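The flavor of that contrast is easy to reproduce on simulated data: a logistic regression gives smooth, monotone iso-probability curves in the two drivers, while a flexible learner can trace out much wigglier ones. In this toy version any wiggles come purely from overfitting noise, not from the omitted-covariate story I'll describe in a moment, and all the numbers below are invented.

```python
# A toy version of the brochure's comparison: fit a logistic regression and a
# flexible learner (gradient boosting here) to two "drivers" and a churn
# outcome, all simulated, then sweep driver B holding driver A fixed at 80.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
driver_a = rng.uniform(0, 100, n)
driver_b = rng.uniform(0, 100, n)
# True churn probability rises smoothly in both drivers.
p = 1 / (1 + np.exp(-(0.03 * driver_a + 0.03 * driver_b - 3)))
churn = rng.binomial(1, p)
X = np.column_stack([driver_a, driver_b])

logit = LogisticRegression().fit(X, churn)
boost = GradientBoostingClassifier(random_state=0).fit(X, churn)

grid = np.column_stack([np.full(50, 80.0), np.linspace(0, 100, 50)])
print(logit.predict_proba(grid)[:, 1])   # moves smoothly in one direction
print(boost.predict_proba(grid)[:, 1])   # can wiggle up and down from fitting noise
```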
But when I look at that kind of picture,
I say, well, that's great,
you got a better fit in
your data, wonderful.
But if you were trying to use this to
make decisions, or maybe take an action
that might change driver A or
driver B, you might be misled.
So for example, if we hold driver
A at 80 and increase driver B,
what this thing says is that the probability
a customer quits goes up and then down,
and up and then down,
and up and then down.
I would say, well,
I don't know what driver B is, but
whatever it is, I think it's highly
unlikely, holding driver A fixed,
that there even exists a variable
for which the probabilities really go up and
down and up and down and up and down.
It's very unlikely that that
is a causal relationship,
or even that it's necessarily
a correct partial correlation if I
just held driver A fixed.
You might say, well,
why did I get this picture then?
That's because actually they probably
put in a whole lot of covariates and
there's a very complicated correlation
structure going on in the background.
And the reason you get this wavy shape
is that as you change driver B,
that's changing the conditional
distribution of a bunch of other things.
And this was the best fit in this
highly multi-dimensional space.
But it's not something that
you would really believe as
a conditional correlation.
And part of what's going on is that
the model has made choices;
it hasn't really had enough
data to control for everything.
So it's going to do well, and the wavy
shape doesn't really matter, as long as
you, on average, figure out who's going to
quit and who's not going to quit.
But it's not so good if you're
actually trying to say, gee,
I want to go and
target these particular people,
but if their driver B goes up a little
bit, I'm not going to target them anymore.
That kind of stuff doesn't really work.
Another problem here is
that this is a predictive model.
And you'd say, well, why do I want to
predict which customers are going to quit?
Well, you don't want to predict anything
unless you can take an action on it.
Well, sometimes you want to do it for
your financial budgeting or something.
But if you really think about it, the
reason you want to predict which customers
are going to quit is often because you
might want to call those customers or
send them an offer to get them to stay.
But predicting who's going to
quit is not the same thing as
figuring out who would
respond the best to a call.
And some people might be quitting because
they're moving out of the country, or
they don't need your service anymore.
So predicting who's going to quit is not
the same thing as determining who you
should intervene on.
And that's the difference between
prediction and causal inference.
I can predict who's going to quit, but
that's not who I should
put a treatment on.
I don't want to treat the people who
are going to quit no matter what I do.
I want to treat the people
who might quit and
might not quit depending
on how I treat them.
And there was a nice paper out of
Columbia that did an analysis of this
with a company,
and they found that there was only
a 50% overlap between the people who
were going to quit and the people
who responded to the intervention.
So, now I've laid a little
bit of groundwork.
What I hope I've done so far is give you a
sense that machine learning is kind of a black
box, and started to help you think about
the difference between prediction
and causal inference.
Now, let's kind of get a little
bit more precise here.
So suppose I want to do prediction
in a stable environment.
What we're going to try
to do is build something,
say, if it's a continuous Y,
an estimate of the expectation of Y given X.
And our goal is to minimize mean
squared error in a new data set
where only X is observed.
So you're going to see the X, you're
going to come up with mu hat of X,
and then your goal is to
minimize the mean squared error.
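Written out in symbols, with mu hat denoting the fitted prediction function, that is:

```latex
% The prediction target and the evaluation criterion just described:
\mu(x) = \mathbb{E}[\,Y \mid X = x\,], \qquad
\text{choose } \hat{\mu} \text{ to minimize } \mathbb{E}\big[(Y - \hat{\mu}(X))^2\big]
\text{ in a new (test) data set in which only } X \text{ is observed.}
```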
So a few more things to note about this.
First of all,
no matter how complex a model you use,
the output,
the prediction is a single number.
And you evaluate the model only by
how well your predicted single number
matches the actual single number.
And you want to do it in a test set,
which is a different data set than
the one you used to construct mu hat.
And so that's really the formalization
of the idea that you can
tell if you did a good
job by just checking.
I can hide data in my drawer and send a
research assistant off to build a mu hat.
They can come back;
I don't need to look at their code,
I don't need to know if they were smart.
I can just say, here are my X's,
let's run your model,
let's get some mu hats, and I can tell if
they did a good job just by seeing if those
match up with the Y's that I hid from them.
So, you really can tell very
objectively how things were done.
And so there's an analogy here:
I think about machine learning as
a robotic research assistant.
So I'm willing to delegate something to
a black box if I am able to check if they
did a good job.
Just like I'd be willing
to delegate something to
a research assistant without
checking their code,
if I could just check afterwards
whether their work performed well.
And again, the only assumptions
required are independent observations,
and the joint distribution of Y and
X being the same in
the test set as in the training set.
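That "hide the data in a drawer" check is simple in practice; here is a minimal sketch with synthetic data, where the random forest is just a stand-in for whatever black box the robotic research assistant hands back.

```python
# The "hide data in a drawer" check: fit whatever black box you like on the
# training half, then judge it only by mean squared error on the held-out half.
# The data and the random forest below are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
mu_hat = RandomForestRegressor(random_state=0).fit(X_train, y_train)   # the black box
test_mse = mean_squared_error(y_test, mu_hat.predict(X_test))          # the only number needed to judge it
print(test_mse)
```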
Now, minimizing mean squared error is
actually going to entail what's called
the bias-variance trade-off,
and one of the things about minimizing
mean squared error is that you will always
accept some bias.
The idea is that if the estimator is
too sensitive to the current data set,
then it won't do well in a test set.
So we're going to trade off between making a very
rich model and making a simpler model,
in order to balance having the model
be expressive: if it's expressive,
it'll get the right answer for
every individual in an unbiased way.
If I just predict everybody with
the sample mean, it's not going to be very
expressive, and it won't be the right
answer for any individual;
but on the other hand, if you give me 10
different data sets, as long as they're
sufficiently big, I'll get the same
answer from each of the 10.
If instead I use a very expressive model
and estimate a model with lots of parameters,
my estimates will differ
from data set to data set, but
the model will be expressive,
it'll be very personalized, and
it'll get the right answer for
each x more often.
And so there's this trade off between
expressiveness and simplicity.
And so what you typically do is you use
cross validation to mimic the idea of
a test set in order to
make that trade off.
So for machine learning algorithms,
you consider a family of models, and
you use the data to select among
the models using cross-validation.
You might break the data into 10 folds,
estimate models on nine-tenths
of the data, and then see how
well they fit on the last tenth.
You do that over and over again, from
simple models to really expressive models.
Then you see, across all of those
holdout sets, where in the spectrum
from simple models to complex models
you get the best fit in the held-out data.
And then you pick that: you say,
the medium-expressive model is the best,
I estimate it on all the data, and
that's what I use.
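Here is a compact sketch of that procedure, using polynomial degree as a stand-in for the simple-to-expressive spectrum; the data-generating process is made up for illustration.

```python
# Cross-validation to pick a point on the simple-to-expressive spectrum, with
# polynomial degree standing in for model complexity. Ten folds, average
# held-out error per degree, keep the winner, then refit on all the data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(x[:, 0]) + rng.normal(scale=0.3, size=300)

cv_mse = {}
for degree in range(1, 10):                                # simple -> expressive
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    cv_mse[degree] = -cross_val_score(model, x, y, cv=10,
                                      scoring="neg_mean_squared_error").mean()

best = min(cv_mse, key=cv_mse.get)                         # complexity with the best held-out fit
final_model = make_pipeline(PolynomialFeatures(best), LinearRegression()).fit(x, y)
print(best, cv_mse[best])
```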
And again I can accurately
evaluate their performance without
additional assumptions.
So a robotic research
assistant can do great, and
I'm happy to delegate to a black box.
Okay, contrasting that with traditional
econometrics, economics is typically
focused on the case with substantially
more observations than covariates.
So n, the number of observations, is much
greater than p, the number of covariates.
In that situation, the in-sample mean
squared error is a good approximation to
the out-of-sample mean squared error.
So if you just passed your econometrics
comps not too long ago, you probably,
like I did 25 years ago, had to
memorize the proof that OLS is BLUE,
the best linear unbiased estimator.
And so, you might wonder,
why wouldn't I always use OLS for
everything if it's best?
But one of the things we asked for
there was that OLS was unbiased.
An unbiased estimator is generally
not going to minimize mean squared error,
first of all.
And second of all, you probably didn't
spend a lot of time thinking about test
sets in your intro econometrics class.
The reason you didn't
was that there was an unstated,
under-emphasized assumption:
that you had a fixed model that
the god of economics gave you,
and then n went to infinity.
And if you have a lot of observations
and a fixed model, then your in-sample
mean squared error is the same as your
out-of-sample mean squared error.
With a very simple model, like just
estimating a sample mean: if I
have 200 observations, my goodness of fit
in the data set that I used to estimate
the mean is probably going to be about
the same as in a new data set.
But if instead I try to fit a regression
model that has 200 covariates
on a data set with 200 observations,
then it's going to fit perfectly
in the first data set,
and it'll fit horribly
in the second data set.
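That extreme case is easy to simulate; in the sketch below the outcome is pure noise, so any fit is overfitting by construction.

```python
# The p = n extreme: 200 covariates, 200 observations, and an outcome that is
# pure noise. OLS fits the training data essentially perfectly and does
# terribly on a fresh draw from the same distribution.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 200
X_train, y_train = rng.normal(size=(n, p)), rng.normal(size=n)
X_new, y_new = rng.normal(size=(n, p)), rng.normal(size=n)

ols = LinearRegression().fit(X_train, y_train)
print(ols.score(X_train, y_train))   # in-sample R^2: essentially 1
print(ols.score(X_new, y_new))       # out-of-sample R^2: negative, i.e., worse than the mean
```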
And so the machine learning paradigm is
about a world where the model gets richer
as you get more covariates.
And so you can't assume that in sample
fit is the same as out of sample fit.
And I guess if you think about it,
for me, that's actually really the right
way to think about it.
In most cases, actually,
I haven't got the right functional form,
and I would like to make a richer and
richer model as I get more data.
So actually,
I should be taking seriously the fact that
I'm overfitting the data that I have.
Now, going back to what else
traditional econometrics does: we often
think about causal effects, counterfactual
predictions, correlation versus causality.
We think about standard errors,
we think about structural models
incorporating behavioral assumptions.
And all of those things are not part
of the basic Machine Learning Toolkit.
They don't think about
any of those things.
Now, there are some parts of
machine learning that do.
There are some machine learners
who write Bayesian models,
generative models, which
are close to structural models.
But in the basic, Machine Learning 101
version that you would get from
an intro course, you wouldn't really
think about any of these things
I just talked about.
Now when we think in economics
about identification,
we think about correlation
versus causation.
Identification problems cannot be
evaluated using a holdout set.
So if we have prices, and
prices are responding to unobservables
in a training set, that'll
also be happening in a holdout set.
And so
a lot of the problems we worry about and
argue about in economics cannot be
solved with test sets and training sets.
The thing that can be checked is
whether your functional form is a good fit,
but not whether there is some unobservable.
Okay, so
that's really important to think about.
And then another thing to think about is
causal methods sacrifice goodness of fit
to focus only on the variation in the data
that identifies parameters of interest.
So, as you probably learned with supply and
demand, you might look for
an instrument for price.
If you just regress quantity on price,
you could get an R-squared of 0.95;
once you instrument for price,
your goodness of fit might fall to where
you explain only
1% of the variation.
But we do that without batting an eyelash,
because what we care about is an unbiased
estimate of the price effect rather
than just predicting quantity.
Okay?
So our goal is often
very different as well.
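Here is a small simulated version of that supply-and-demand point, with an invented cost-shifter instrument and a true demand slope of minus one; the two-stage least squares step is done by hand, and no standard errors are computed.

```python
# A simulated supply-and-demand example: OLS of quantity on an endogenous
# price fits reasonably well but gets the slope badly wrong, while two-stage
# least squares with a cost-shifter instrument recovers the true demand slope
# of -1 even though it explains very little of the variation in quantity.
# All coefficients below are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10000
z = rng.normal(size=n)                        # instrument: a cost shifter
u = rng.normal(size=n)                        # unobserved demand shock
price = 0.5 * z + 1.0 * u + rng.normal(scale=0.1, size=n)
quantity = -1.0 * price + 5.0 * u + rng.normal(scale=0.1, size=n)

# OLS: decent fit, badly biased slope (price is correlated with the demand shock).
ols = LinearRegression().fit(price.reshape(-1, 1), quantity)
print(ols.coef_[0], ols.score(price.reshape(-1, 1), quantity))

# 2SLS by hand: project price on the instrument, then regress quantity on the
# fitted price. The slope comes out near -1, but the R^2 on quantity is tiny.
first = LinearRegression().fit(z.reshape(-1, 1), price)
price_hat = first.predict(z.reshape(-1, 1))
second = LinearRegression().fit(price_hat.reshape(-1, 1), quantity)
print(second.coef_[0], second.score(price_hat.reshape(-1, 1), quantity))
```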
So then, just to contrast what we say
versus what we do in econometrics:
we say we do all these
things, causal inference,
God gave us the model,
the god of economics told us that
it should be income squared and
not income cubed, and we
pretend we don't do any model selection.
But what we actually do is get our
research assistants to run 200 versions
of our regressions to make
sure things look their best.
And then we pick the one that
looks the most stable, and
we report five columns in our papers.
So that is actually kind of dishonest.
And it's also uncomfortable for
the research assistant and the researcher.
And it makes us vulnerable to
non-replicable findings and
invalidates our standard errors.
So I would argue that today we're in
an untenable situation where we pretend we
don't do model selection.
But we do it in this ad hoc way,
and because we don't know how to report
our standard errors if we tell people what
we really did, we lie about what we did,
because we don't know how to fix it.
And so what I would argue is that machine
learning can help us have a systematic way
to do model selection where we can
tell our referees what we did.
People can reproduce what we did.
And then if somebody finds a different
result somewhere else, we can say,
well, okay, but we basically
tried to build the functional forms
using this robotic research assistant,
and this is how we selected, so we weren't
being dishonest with what we showed you.
Maybe there is something
else we didn't think of,
but this is the process we went
through to find our models.
But, as I show in some of my research,
when you go down that road,
you have to do a few things.
You have to use things like sample
splitting and be careful about test sets
and overfitting to your own data to
still get your old econometrics to work.
And so one way to think about a bunch of
my research is that it allows us to bring
in machine learning models to
get the best of both worlds:
to use the data to select the models, but
to keep our econometric properties, so
I can still make the same
tables with standard errors and
so on that we used before.
I'm not going to have time
to tell you how I did that.
So you're going to have to
read my papers to see it.
But that's the goal.
And really today I just
wanted to set up the goal.
So why don't I pause here and
take questions.
Then I'll take a break, and
in the second part,
which is going to be shorter,
I'll show you a little bit about some
applications in panel models,
which I think might be
the part of this that's
most interesting for this audience.
So let me now pause here if we
can turn off the recording.
