{Music}
 - So, I'm very, very pleased
 to make the introduction
 for our first academic panel,
 which will be led by my
 colleague, Hanna Halaburda.
 So, Hanna is a star in
 blockchain research,
and she has recently been working with the Bank of Canada.
 So, this is really exciting, I
 think it'll be a great panel.
So, Hanna, over to you.
  - Thank you, thank
  you very much.
Technology continuously
transforms finance,
 and in this session
 we're going to see
how one pretty well-established technology,
 machine learning, is
 changing financial activity,
 and also look at the potential
 of another new technology,
 blockchain and smart
 contracts, to do so.
 We will have Bryan
 Kelly to tell us how
 machine learning is
 improving our ability
to predict risk premia,
 shedding light on
 some academic debates
about which factors are best in asset pricing,
 but also solving some very
 practical investment problems.
Then, Julapa Jagtiani
will talk not only about
  machine learning, but also
  about new data sources
that are enabled by technology. Online lending, which started with peer-to-peer lending, has allowed us to collect new data, new information on borrowers' activity and behavior.
 And Julapa is going to tell us
how using this new alternative data along with machine learning helps us price loans more efficiently.
 Then finally, Will
 Cong will talk about
 blockchain and smart
 contracts and their potential
to further revolutionize
data collection,
and data processing in finance.
  It is important to keep
  in mind, at the same time,
as I'm sure our speakers
today will remind us,
 that these new
 technologies also come
  with some undesired risks.
 So, with that introduction,
 let us start with Bryan Kelly
 of Yale School of
 Management, and AQR Capital.
  (applause)
- All right, thank you
very much for having me,
 it's great to be back
 at my Alma Mater.
  So, today we'll
  be talking about,
 well we'll be talking about
 machine learning in finance,
 but I want to couch
 it around a question,
which is: can machines learn finance?
So, to give us a little
bit of a baseline,
  all you have to do is look
 at the popular
 media a little bit,
 and you'll understand
 that it appears
machine learning can
do just about anything.
Deep neural networks, they're accomplishing things
 that historically we
 would've considered
 just absolutely unachievable.
  For example, they can beat
 the best chess players in the
 world, they can drive cars,
 they can beat the best
 Jeopardy player in the world,
 they can recognize speech
 and translate it on the spot,
they can beat the best
Go players in the world.
 I like this example, because
 when they first beat Kasparov,
 there were headlines
 around that time
  that said, "That's fine,
  but chess is an easy game.
  "Go is much more
  complicated, the machines
 "will never be able to
 beat humans at that."
So, you can see exactly
how long it took
 for machines to knock
 that one off the list.
 And then, robotics are driven
 by deep neural networks.
 This was an example that was
 in the New York Times
 a couple weeks ago.
  The researcher was talking
  to his robot and said,
 "Please show me the letter A,"
 and the robot was surrounded
 by a bunch of objects,
 the robot reached
 down, picked up a cube
 with a bunch of letters on it,
  manipulated it in
  its robot hand,
 and showed letter A
 to the researcher.
 That was all driven by a bunch
 of sensors in its environment
 and a neural network that was
 processing that sensor data.
  So, it seems like machines
  can really do anything,
 they can sing, dance, juggle.
 But the big question
 in my industry
 is can machines learn finance?
And I want to emphasize
that the answer,
 from my point of view, is
 by no means an obvious yes.
 It might be useful, and
 there's a lot of speculation
and hope that it will be useful,
  but what I would
  like to emphasize
is that finance is just
fundamentally different
 from the domains
 where machine learning
has had success to date.
So, let me just talk about what I think is the most important way that finance differs from things like driving cars, or image recognition, et cetera.
The best way to think about this
is really to talk about
signal to noise ratios.
So, when I talk about a
signal to noise ratio,
think about the expected success
that a human would have
in a particular task.
 So for example,
 classifying images.
I hand 1,000 images to a human,
  and I ask them to
  categorize them
  as an image of a
  cat, or not a cat.
  The human's going
  to achieve that
 with an R squared
 of essentially one.
There might be a couple
misses here and there,
 if the image is
 particularly blurry,
 they couldn't tell if
 it was really a cat
 or a dog in that
 particular picture,
 but almost always the
 human will nail it.
The reason why is because that's
  a high signal to
  noise environment.
 The signal, what the
 image represents,
 dominates the
 noise in the image,
 things that are coming
 from background,
  or blurriness, et cetera.
 All right, so now let's
 contrast that with finance.
If we gathered 1,000 of the top finance experts in the world and asked them to forecast which direction the S&P 500 was going to move tomorrow, their hit rate would be almost exactly 50%.
 I feel very confident
 in that prediction.
 And that's not a shortcoming
 of the finance researcher,
  or the market participant,
  that's just a statement
 about the nature of
 financial markets.
  Financial markets
  are competitive,
 and because they're
 extremely, highly competitive,
 they're extremely efficient,
 and it's the very
 efficiency of markets
 that make them unpredictable.
 It makes the signal
 to noise ratio
 in financial markets
 extremely low.
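To make that signal-to-noise contrast concrete, here is a minimal sketch in Python, with made-up numbers purely for illustration: even when you hold the true predictor in your hand, the best achievable fit in a market-like regime is an R squared near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 600  # months of (made-up) data

for label, signal_share in [("image-like, high SNR", 0.95),
                            ("market-like, low SNR", 0.005)]:
    x = rng.standard_normal(n)                      # the true predictor
    noise = rng.standard_normal(n)
    y = np.sqrt(signal_share) * x + np.sqrt(1.0 - signal_share) * noise
    beta = (x @ y) / (x @ x)                        # OLS slope
    r2 = 1.0 - np.sum((y - beta * x) ** 2) / np.sum((y - y.mean()) ** 2)
    print(f"{label}: R squared = {r2:.4f}")
```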
 So, suppose that
 I, as the investor,
 had a good forecast
 for where the S&P 500
 was gonna be a week from now,
 I believed it was gonna go up,
  and I had a really strong
  conviction about that.
 I wouldn't wait a
 week to begin trading,
 I'd start trading now,
and that would start to
push up the price now.
 That very action
 would actually pull
  some of the predictability
  out of the market,
 but I wouldn't stop
 there of course.
  If there was still some
  predictability left over,
  I'd buy more, and I'd buy
  more, and I'd buy more,
 and I'd keep
 pushing the price up
til a very particular point, the point at which there was no longer any predictability: the price today equaled my best guess of where it was gonna be a week from now.
 That's the entire idea
 underlying efficient markets.
Because markets are competitive,
 because all of these
 traders in the world
 are trying to make a buck
 faster than everybody else,
 they make markets efficient,
 they drive down signal
 to noise ratios,
they pull predictability out of the market.
 All right, so machine
 learning tools
 are tools of
 predictive analytics,
  they work best in settings
 where predictability
 is really high.
When you go to recognize a cat,
 it doesn't make future
 cat images blurrier.
 That's essentially
 what happens though
 when you go to predict
 in financial markets.
  If you can make a
  better prediction,
 you're making the
 market more efficient,
 and less predictable.
 So, long story short,
 finance is just
 a very different
 system than the places
where machine learning has been,
 historically, very successful.
So, what do we do when we're in this kind of situation? You talk to lots of asset managers these days.
 They're going around
 telling their clients
 that they're developing
 some machine learning funds,
 relying heavily on big
 data and algorithms
 to put together portfolios.
But they can't tell you what they're doing, cause their argument will be
 they need to protect
 intellectual property,
 but based on the
 argument I just made,
machine learning might
be really poorly suited
 for asset management.
  So, when you find yourself
  in this kind of situation,
this is where academics
start to be useful.
We don't always know if they're useful,
but this is where
academics can be useful.
 They can do research
 and shed light
  on exactly these questions
  that asset managers
 would prefer to be
 more reticent about.
  So, that's the idea behind
  this paper that I wrote
  with some of my colleagues
  at University of Chicago.
Empirical Asset Pricing
Via Machine Learning.
 What we're trying
 to do in this paper
is really to just shed light
 on what we can get
 in asset management
 from machine learning methods.
  All right, so when you're
  trying to understand
 what an asset manager is doing
 when they say they're building
 machine learning portfolios,
 it's really a tough problem.
Because a lot of things
can be contributing
 to their performance.
For example, they might
just have different,
or more beneficial
data in some dimension.
 That's not about the
 algorithm at all.
We all know that big data's better data.
  You have more information
  to work off of,
 you have a better chance at
 building a good portfolio.
But we can't do that performance attribution from the outside. That's what we'd really like to do: split out the information that comes from the data versus the information that's gained from having a better model, a better technique. So, this paper is about a comparative analysis of methods holding the data fixed.
 All right, so my
 view as a researcher
 is that if I want to
 help us understand
 where machine learning
 as a method is useful,
  let's start with a
  baseline data set
 that we all understand
 really well.
 I'll tell you
 about that data set
 a little bit in a minute,
 but it's essentially gonna be
 US monthly stock returns, and
 all of the standard predictors
  that people have looked at
in the literature up til today.
 All right, so the
 primary contribution
from this research
comes in a couple forms.
 First of all, I'm gonna argue
that machine learning is
economically meaningful.
  Cause what we're
  doing in finance,
 asset pricing in particular,
 the entire object
of my field of research
is to understand
  an object called
  the risk premium.
All right, Gene Fama, in his Nobel work, pointed out that this is the central question in asset pricing,
 why do different assets earn
 different average returns,
and why do those average returns
  seem to behave differently
  over the business cycle?
  All right, so that
  object, the risk premium,
is a conditional
expectation of a return.
Well, what happens when
you build a forecast?
Let's say you run a
forecasting regression.
  What are you really doing?
What you're really trying to do
 is build a conditional
 expectation
of whatever your Y variable is.
  That's to say, the object
  that we care most about
  as researchers in finance
  is exactly the output
 from a forecasting tool set.
 That just means that
 machine learning
 is really well adapted
 to help us understand
 the field of asset pricing,
 hence the argument that it's
 economically meaningful.
All right, I'm not gonna go into too much more depth on this, but this is also the idea of why it's ideally suited to asset pricing.
  But I also want
  to make the point
 that we're gonna provide
 some empirical context here,
and we'll show you that compared
  to essentially the leading
  traditional methods
  that people have studied
  in the finance literature,
 you get some incremental gain
 from doing machine learning.
  It's not going to be huge,
  and I think that's
  a sensible answer.
  It's not going to
  be a revolution
 from using machines
 to form portfolios,
 it's gonna be an evolution.
It's gonna take the
quantitative investment process
that we've been looking
at for a long time
 and putting it on
 steroids a little bit.
That's sort of how the quant industry has been evolving for 30 years.
Let's find more, newer,
better, bigger data,
and let's incrementally improve
our methods as we go along
 to incrementally make
 our portfolio choices
 better, and better over time.
 All right, so here's
 the empirical setting.
  This is the data set that
  I'm gonna fix myself to.
And I really think
that this is a data set
 that if you ask any empirical
 researcher in finance,
  they'll know how this data
  set behaves really well.
 So again, it's monthly stock
 returns, all US stocks.
 All right, so a very efficient
 market to begin with.
 I'll be looking at about 100
 stock level characteristics
 that are now prominent, highly
 cited in the literature.
 You know them,
 you've heard of them.
 They're size, value, momentum,
 accruals, yada, yada, yada.
 I'll also be looking at some
 aggregate predictor variables.
  Things like the aggregate
  price dividend ratio,
  the term spread, the short
  rate, the default yield,
a whole bunch of macro variables
  that people have studied,
  again in the literature,
 and figured out that
 these are useful
for understanding
business cycle variation
 and expected returns.
  So, I'm gonna put
  those together,
 I'm gonna build a bunch of
 forecasts at the stock level.
 It's really going to
 be a big panel model
 to try and understand returns
 on each individual stock
 at each point in time.
  All right, so we
  have a lot of ways
 that we can evaluate
 the performance
  of this particular
  set of models,
 and the one that I'm
 gonna be focusing on
is to aggregate all of
my stock level forecasts
into portfolios and see how well
  my portfolio forecast does
 at forecasting
 portfolio performance.
  All right, by aggregating
  up to something
 like a portfolio level return,
it's something that it's easier
 for you and I to have
 a conversation about.
 All right, so what
 is machine learning,
 what are the methods that
 we're gonna actually look at?
Well, the methods that we study
 for this comparative analysis
kind of look like the chapter outline
 for any leading graduate
 textbook on machine learning.
 So, we're gonna start
 with linear models.
Linear models include
ordinary least squares,
 and in fact that's
 gonna be my benchmark
 because that is the
 most standard tool set
  when you're doing
  empirical finance.
I'm gonna look at a particular least squares model as my benchmark, one that uses only three predictor variables.
  And there's a typo
  on this slide.
The three predictor variables that I'm gonna look at are an asset's market equity, its book-to-market ratio, and its momentum. Okay, it's 12 month momentum.
  The reason why I wanna use
  this set as a benchmark
is because it's well
known in the literature,
that's a highly selected model. It's worked well, and it's worked well for a long time.
 All right, so in some sense
 it's a conservative benchmark.
Think of it as what
the humans have learned
 over a long period of time.
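As a rough sketch of what that benchmark might look like in code (the column names here are hypothetical stand-ins, not from the talk):

```python
import pandas as pd
import statsmodels.api as sm

def benchmark_ols(panel: pd.DataFrame):
    """Pooled OLS of next-month returns on size, value, and momentum."""
    X = sm.add_constant(panel[["log_market_equity",   # size
                               "book_to_market",      # value
                               "momentum_12m"]])      # 12-month momentum
    return sm.OLS(panel["ret_fwd_1m"], X, missing="drop").fit()
```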
All right, beyond
that, I'm gonna look at
much bigger models now.
 I'm gonna have all of
 my characteristics,
size, value, momentum, et cetera,
all of these macro
predictors put together
 in much bigger models.
So, I'll start off with
a big linear model,
 OLS with about 1,000
 predictor variables.
 Then, I'll realize right away
 that that's gonna do horribly,
  so I'll compare to a model
  that uses least squares,
 just basic regression,
 but uses penalization.
So, you may have heard of things
 like a lasso, or elastic net.
 So, that's gonna
 basically shrink down
the parametrization of the
traditional least squares model.
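A minimal sketch of that penalized variant, assuming a pre-built matrix of roughly 1,000 predictors (names and tuning values are hypothetical):

```python
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# l1_ratio mixes lasso (1.0) and ridge (0.0) penalties; alpha sets strength.
penalized = make_pipeline(StandardScaler(),
                          ElasticNet(alpha=1e-3, l1_ratio=0.5))
# penalized.fit(X_train, y_train); forecasts = penalized.predict(X_test)
```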
 All right, so those
 are linear models,
we'll also look at some
dimension reduction techniques.
You've probably heard of
some of these as well,
things like principal components analysis,
  and partial least squares.
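A sketch of those two dimension-reduction approaches, again with hypothetical inputs:

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Compress ~1,000 predictors into a handful of components, then forecast.
pcr = make_pipeline(PCA(n_components=5), LinearRegression())  # principal components
pls = PLSRegression(n_components=5)                           # partial least squares
# Both expose .fit(X_train, y_train) and .predict(X_test).
```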
 Then, from there we'll move
 on to more sophisticated,
 bigger, and now
 non-linear techniques.
  So, the first one
  I'm gonna look at
 is a generalized linear model,
this is kind of the easiest way
 you could put non-linearities
 in your model.
  Instead of just
  regressing onto X,
you'll regress onto X, X
squared, X cubed, and so forth.
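A sketch of that basis-expansion idea (the degree and penalty are illustrative choices, not the paper's):

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Regress on X, X^2, X^3 via a basis expansion; penalize to tame the blow-up.
glm_like = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                         Ridge(alpha=1.0))
```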
 All right, but that's kind of
 a contrived non-linear model.
 We have some true
 non-linear models that exist
 and that are really
 key components
 of the machine
 learning repertoire.
  In particular,
  tree based models,
 so things like random forests,
  and then we'll be looking
  at deep neural networks.
Which, if you think about where
  the successes are
  in Silicon Valley,
 they're primarily dominated
 by deep neural networks.
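A sketch of those two non-linear families, with hyperparameters chosen purely for illustration:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

forest = RandomForestRegressor(n_estimators=300, max_depth=6)
# Nets of one to five hidden layers, mirroring the comparison in the talk.
nets = {depth: MLPRegressor(hidden_layer_sizes=(32,) * depth, max_iter=500)
        for depth in range(1, 6)}
```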
 All right, so a summary of
 the main empirical findings
 before I actually
 show you some data.
The basic conclusion is
that machine learning
  seems to work for
  portfolio choice.
Again, these are not
gonna be massive gains,
 they're gonna be
 incremental, but important.
All right, where are
these gains coming from?
They're primarily coming
from non-linearities.
  The nice thing about doing
  a comparative analysis
with a lot of different methods,
  you can see where
  some methods work,
  and some methods fail, and
  then use that comparison
 to draw inference about what
 is the source of the gain.
  So, we have really
  simple models in finance,
 mostly because we only
 have simple theories
that we find tractable.
 What we're actually
 seeing in the data
 is that we need a lot
 of non-linearities,
 and we need a lot of
 interactions among
 predictor variables.
  That means that we need a
  more sophisticated model,
  a richer model than we've
  typically looked at.
  So, if you want to
  understand where
 the gains are coming from
 from using machine learning,
  it's from having more
  flexibility in our model.
 Another point that is
 maybe a little bit subtler,
 and more relevant for people
 with some background
 in this area
  is that shallow learning
  outperforms deep learning.
 So, if you ask how a
 driverless car works,
or how image recognition
works at Facebook,
 they use neural networks,
 but they use neural networks
  with dozens and
  dozens of layers.
 Which means they are extremely
 highly parametrized models.
 What we find is that
 performance kind of maxes out
 with a three layered
 neural network.
 So, that's viewed as typically
a pretty shallow neural network.
But I think the reason for that
 is sort of sensible as well.
 In finance, we don't
 have as much data
 as you have about
 sensory information
  from a self-driving car,
  where they have billions,
 and billions of observations.
 We have a couple hundred
 months of data to work off of,
 and maybe 1,000
 predictor variables.
 So, it's not a true
 big data setting.
 That means you can't support
 such a huge deep model.
 Another interesting point is
 that this relative performance
 of non-linear methods
 versus linear methods
becomes wider when I
 start to look at portfolios,
 rather than individual stocks,
 and I think this
 makes sense as well.
 What happens when I
 build a portfolio?
I'm taking all of these stocks that have a bunch
 of individual, idiosyncratic
 noise, and I'm averaging them.
That's what you do when
you build a portfolio.
And in doing so, you average out
 a lot of their
 idiosyncratic risk.
  What does that do?
 Well, that actually boosts
 up the signal to noise ratio.
It eliminates a lot of the noise
in your prediction setting.
  By eliminating
  some of the noise,
 you give these more
 sophisticated methods
 a better chance,
 you kind of bring them
 closer to their home court,
 and they do better.
 All right, so a lot
 of those statements
you can see just by looking at the statistical performance of the model,
but as economists, we want to understand
 how these performance
 improvements look
 from an economic standpoint.
 So, instead of just reporting
 things in terms of statistics,
 R squareds, et cetera, I'll
 put them in economic terms,
 in terms of portfolio
 Sharpe ratios.
 And we'll see that the
 gains are meaningful
 from that perspective as well.
And the best predictors are
things that we've sort of known
for a long time are predictive.
 What's interesting now
 is that we recognize
  that we need some
  non-linearities
 to really get all
 the juice out of them
 in terms of their
 predictive content.
 All right, so we have a lot
 of results in the paper,
  I have time to go through
  about two or three,
 so let's just start
 with this basic
 cross-sectional comparison.
What happens when I try and forecast the S&P 500 using each of these different machine learning models?
 So, this is my
 benchmark model here,
my OLS with those three
simple predictors.
I'm gonna do all of my model comparisons out of sample.
So, what we see is that
even the best model
 from the literature, well
 that looks kind of data mined.
 In sample, in the literature,
 it looks quite good,
  but once we start doing
  an out of sample analysis,
 the R squared actually
 goes negative.
 Which means that you'd
 actually do better
 with a naive
 forecast of something
 like just the risk free rate.
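For reference, the out-of-sample R squared being used here compares the model against exactly that naive benchmark; a minimal sketch:

```python
import numpy as np

def oos_r2(y_true, y_model, y_naive):
    """Negative when the model loses to the naive benchmark forecast."""
    y_true, y_model, y_naive = map(np.asarray, (y_true, y_model, y_naive))
    return 1.0 - np.sum((y_true - y_model) ** 2) / np.sum((y_true - y_naive) ** 2)
```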
 As I increase, sort of,
 the richness of the models,
here's partial least squares and principal components,
you see that again
these are linear models,
 they're more sophisticated,
 but they're still linear
  and they just are
  not working well.
Where do we start to hit gains?
We start to hit predictive gains
when we get into things
like random forest,
 and different neural
 network architectures.
So, I mentioned that
we have this conclusion
 that shallow learning
 outperforms deeper learning.
  So, these are one
  layer, two layer,
 all the way up to five
 layer neural nets.
  If we were doing
  image recognition,
 we'd see R squareds
 rise, rise, rise, rise
as I went to 16, 17, 20 layers.
 What we're seeing here is that
the R squareds top out at three layers.
 There's the conclusion
 that shallow learning
 is the most useful.
All right, so these are
in terms of R squared.
 Those are hard to interpret
 from an economic standpoint.
 So, I'm gonna convert them
 into a Sharpe ratio statement.
 And again, it's useful
 to think about this
 from the perspective
 of the S&P 500.
 The annualized Sharpe
 ratio of the S&P 500
 over a long period
 of time is about 0.4.
 That's what you could do if
 you're a buy and hold investor
 that just puts all your wealth
 in the S&P 500 and sits.
 So, the question is how much
 would I benefit as an investor
 if I used the
 predictive information
  from each of these models
  to time the S&P 500?
 Put more weight in when I have
 a prediction of a high move,
 and less weight in when I have
 a prediction of a down move.
 So, what this tells you here
 is the incremental annual
 Sharpe ratio that
 you'd get from using
  the predictive information
  from each model.
 So, again these are the models
 that had negative R squareds,
  so I'm not even
  considering those.
  These incremental gains
  from the non-linear models
 mean you go from a
 Sharpe of about 0.4
 to about 0.6, a 50% increase.
 Right, so from the
 welfare of an investor
 that's trying to save their
 earnings for their future
that's a very meaningful
economic gain.
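A hedged sketch of that timing exercise, where the weighting rule below is a hypothetical stand-in rather than the paper's:

```python
import numpy as np

def annualized_sharpe(monthly_excess_returns):
    r = np.asarray(monthly_excess_returns)
    return np.sqrt(12) * r.mean() / r.std(ddof=1)

def timing_returns(market_excess, forecasts, scale=5.0):
    """Lean in after a high forecast, lean out after a low one."""
    weights = np.clip(scale * np.asarray(forecasts), 0.0, 2.0)  # hypothetical rule
    return weights * np.asarray(market_excess)

# Compare annualized_sharpe(market_excess) to
# annualized_sharpe(timing_returns(market_excess, forecasts)).
```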
The last thing that I want to show, and by the way, this is true for a whole bunch of preexisting portfolios that people study in the literature, think about size, value, momentum portfolios, if I were trying to forecast those.
  All right, what
  I want to do last
is say what if I just
built my own portfolios
  by looking at which stocks
  were the best predicted
  in terms of their returns
  coming out of each model,
 the worst predicted
 in terms of returns
 coming from out of each model,
and then took a
long-short bet on those?
 So, what I'm gonna do
 is for each method,
  I'm gonna sort
  stocks every month
  based on their
  forecasted return.
I'm gonna buy the stocks
that have high forecast,
 sell the stocks that
 have low forecast,
 and track their performance
 out of sample over time.
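A minimal sketch of that monthly sort (column names hypothetical):

```python
import pandas as pd

def long_short_returns(panel: pd.DataFrame, n_bins: int = 10) -> pd.Series:
    """panel needs columns: month, forecast, ret_fwd_1m (hypothetical names)."""
    def one_month(df):
        bins = pd.qcut(df["forecast"], n_bins, labels=False, duplicates="drop")
        top = df.loc[bins == bins.max(), "ret_fwd_1m"].mean()     # buy high forecasts
        bottom = df.loc[bins == bins.min(), "ret_fwd_1m"].mean()  # sell low forecasts
        return top - bottom
    return panel.groupby("month").apply(one_month)
```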
  So, here's where
  you see that even
 the benchmark model
 does pretty well.
 This is like a market
 neutral portfolio,
 cause it's long-short,
 but it still has
 a Sharpe ratio of about 0.3.
 All right, so there
 is some information,
  there is some alpha in
  that long-short portfolio.
 What's fascinating
 is that when you go
 to these more sophisticated
 non-linear methods,
you start to see out of
sample Sharpe ratios now
that are exceeding one.
All right, this is on a value-weighted basis,
 so I'm not just forecasting
 small, illiquid stocks,
 I'm actually
 forecasting big stocks
  that are big components of
  each of these portfolios,
  and we're still seeing the
  predictive gains there.
  I've not taken into
  account transaction costs,
 or anything like that, so
 there's an overstatement here,
 but the point is there
 are gains to be had.
 All right, so I just
 want to summarize,
come back to this question
that I motivated the talk with,
 can machines learn finance?
 All right, and I want
 to put this question
 in a little bit of
 historical context.
Let's go back to about
2011 when Google started
  its Google Brain Division.
 At that point it was unclear
 how well deep neural nets
were gonna do in tasks
like image recognition,
 self driving cars, et cetera.
 Since then, Google
 now has roughly 1,500
 machine learning,
 deep learning projects
that they're working on.
 All right, so I want to think
 about the performance here,
  although it's incremental,
  it's not revolutionary,
 we should think about
 this as early days.
There's a lot of hope
for using these methods
  in finance going forward.
  You might think about this
  as something like 2011 cat
  recognition, thank you.
  (applause)
 - Thank you Bryan, our next
 speaker is Julapa Jagtiani.
 She is a Senior Special
Advisor at the Philadelphia Fed,
 and a fellow at Wharton
 Financial Institutions Center.
Welcome, Julapa.
  (applause)
 - So, good morning,
 as Hanna said,
I'm a Senior Advisor at
the Philadelphia Fed,
 and so I'm required
 to remind you
that everything I say today is my own view, and not necessarily that of the Federal Reserve Bank of Philadelphia, or the Federal Reserve System.
 So, I would like to
 thank the organizers
  for inviting me here, and
  so I'm going to talk about
the roles of alternative
data in FinTech lending,
 and how it impacts consumers.
 So, what kind of alternative
 data are we talking about?
 So, generally when
 we apply for a loan,
we know that FICO score
is a traditional factor
 that's used commonly,
and for a mortgage you also have to report your income and employment.
Now, alternative data are currently used pretty widely among FinTech lenders, and that could include cash flow, since you can now actually allow FinTech lenders to access your bank accounts and your bank statement transactions.
 Also, they are using utility
 payments, rent payments,
 your medical payments,
 your online footprint,
 all the websites that
 you have visited,
 and how long you are there,
  how often you go
  to these websites,
your shopping habits, your education and your major,
  all kinds of information.
 And so, if you
 actually allow lenders
 to access your cellphone,
 that's a lot of information
 about all the apps
 that you load there,
  and what time you get up,
and how often your phone has run out of battery.
So, all this information
could be used,
 and there are benefits, and
 new types of risk involved
in this type of lending and credit decision.
 So, I'm going to talk
 about the benefit,
 and new types of risk, and
 also the impact on consumers.
So, this is just an example, a real example, of a vendor that focuses on identity verification, but also on fraud detection. Basically, it has over 400 members, and you can see some examples of the members: Facebook, eBay, Walmart, Citibank, Visa, Mastercard, Equifax.
So, it's a consortium of data, a lot of information actually pulled together from over 6,000 websites, right.
 So, it is pretty scary
 to see how much data
about all of us is out
there, and we don't know
whether it's accurate, we don't know who is using it, and for what purpose.
So, there are benefits to this, because we have seen statistics showing that using this alternative data would help a lot of consumers; in the US, about 26 million Americans either don't have a bank account or have a thin credit file.
 And so by using this
 alternative data,
  they would be able
  to be included
 in the financial system,
 and it's not just in the US,
 because particularly in China,
we can see that a lot of people
don't have a bank account,
 but they have a cellphone, and
 that has a lot of information
 that could be used
 for credit decision.
There are risks involved. As Bryan also said earlier, it's very complicated using machine learning to analyze big data and alternative data.
 I forgot to mention
 that for this vendor,
  so every time you log into
  your Facebook account,
for example, about 100 variables
 are being collected about you.
  So, they can tell,
  and also lenders
 can tell where you are
 applying for the loans from,
  are you sitting in
  a high crime area?
What kind of router you're using, basically everything about you is there.
  So, the risk is
  that we don't know
 what information is there, we
 don't know if it's accurate,
we don't know if
the relationship
is stable enough going forward.
So, there are issues around that,
 and as regulators, we
 intend to provide protection
 to consumers to
 make sure it's fair.
 It's also difficult to
 interpret the results often
 with the unsupervised
 machine learning
because it's sometimes
just not interpretable.
  There's no theory that
  explains the relationship,
 and we don't know if it's fair
  because it could be
  related to race or gender.
So, from the regulators' viewpoint, there are issues related to the black box.
A lot of lenders that subscribe to these AI vendors actually may not really understand how they themselves make the credit decision,
  because they don't
  fully understand
 what's inside the black box.
In addition, a lot of AI vendors are actually working, on an outsourced basis, with many large institutions,
  a lot of lenders,
  including FinTech
and traditional lenders,
 and so if there is something
 wrong in the black box,
 it could be a pretty
 widespread problem,
 it could potentially
 impact how we measure
 systemic risk and financial
 stability overall.
 Now, I'm going to
 talk about my research
that looks at FinTech lenders
 compared with
 traditional lenders
 and how it impacts credit
 access and consumers overall.
So, there are a lot of, or at least some of, the FinTech lenders listed here, and they all have a comparative advantage in different kinds of unique access to different information, alternative data.
In the consumer lending space, there's a paper that is already on the website, and has actually been in SSRN's top ten downloads.
It looks at LendingClub data, which is the public data on the LendingClub website. This is loan level data, and it has a lot of information about the borrowers, and a lot of information about performance, because every month it is updated, so we could see how the loans perform over the years.
I compared LendingClub loans with loan level data from traditional banks. This is Y-14M data, it's stress test data that we collect from large banks, (mumbles) banks, loan level credit card data, and we focus only on credit cards that actually carry a balance.
So, these are people who actually borrow using a credit card, as opposed to just using it for transaction purposes, and we can compare the rates, controlling for all the risks, and see how the rates compare.
Also, we look at the mortgage space.
  Comparing FinTech lending
  with traditional lenders,
 and so for this paper, it's
 still a work in progress,
 it's not fully done,
 but I wanted to share
 a little bit about that,
 because we have some evidence
 related to consumer
 access to credit.
So, we used HMDA data, which covers every mortgage application in the US.
And so we can see whether the application was accepted or denied, and whether the loan was originated. We also look at Mintel data, which is credit offer data from FinTech and non-FinTech firms.
So, basically, we have information
 about the lenders, we have
 information about the pricing
that the lenders offer
to different consumers,
  and some information about
  consumers' credit risk.
 So, as I said, the first paper
 is based on LendingClub data.
And LendingClub was founded in 2006, so it has gone through pretty much the whole economic cycle. These are consumer loans from the consumer platform.
When people apply for a loan on the LendingClub platform, what happens is that you pretty much get the tentative decision immediately, and if it is accepted, then LendingClub assigns its own rating grade, from A to G, right.
And so, if you look at loans that were originated in 2007, it looks like the rating grade is very highly correlated with FICO score, which means that the information used to assign the rating is basically not too creative, not really alternative.
But over the years we
see that the correlation
actually declined from about 80% to only about 30%,
and so we can see that
increasingly LendingClub
has been using more, and
more alternative data
 in assigning the rating grade
 A to G in the credit decision.
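A minimal sketch of that correlation-by-vintage calculation (column names are hypothetical; grades A through G are mapped to 1 through 7):

```python
import pandas as pd

def grade_fico_corr_by_vintage(loans: pd.DataFrame) -> pd.Series:
    """Correlation of the A-G grade (mapped to 1-7) with FICO, per origination year."""
    grade_rank = loans["grade"].map({g: i for i, g in enumerate("ABCDEFG", 1)})
    return (loans.assign(grade_rank=grade_rank)
                 .groupby(loans["orig_date"].dt.year)
                 .apply(lambda df: df["grade_rank"].corr(df["fico"])))
```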
And this A to G grade is assigned, and it is used for pricing,
 so it determines what spread
the consumers would have to pay.
  So, we're going
  to focus on loans
that were originated in 2015,
and you can see from the right panel the distribution of FICO segments for each rating grade, from A to G. Some of the sub-prime borrowers, based on the traditional measure, FICO score, actually were rated A or B by LendingClub, so the best ratings.
 So, we are going to
 follow these people
  to see how they
  actually perform,
whether LendingClub's A to G rating is actually informative, or whether LendingClub is making a mistake, right?
 So, on the left panel,
 the plot on the left
 actually includes only
 sub-prime borrowers,
only people who actually
have FICO below 680,
but they are rated by LendingClub from A to G,
 and the vertical axis
 actually represents
 default probability
 within 24 months
 after the loan origination.
So, we can see that actually
of all the sub-prime borrowers,
 they're not defaulting
 at the same rate.
  The A and B consumers
  actually have very low PD
compared to the F or G.
So, it looks like alternative data has actually allowed lenders to identify what we call invisible prime consumers, who are pooled within the sub-prime group.
And this is the same
for the right hand side,
basically looking across all the FICO segments.
It's okay, I have a few minutes.
 And so, this shows that most
 of the LendingClub loans,
90%, are used to pay off credit card balances,
 and for debt consolidation.
So, we compare the credit card rate for people who actually borrow from the bank through a credit card with the rate that they have to pay at LendingClub, controlling for FICO score, and we find that there's a big saving for these people who actually borrow from LendingClub to pay off their credit card balance.
Now, in terms of credit access, we tried many different measures, the (mumbles) index, and the decline in bank branches, just to identify underserved areas, and this shows that about 50% of LendingClub loans were originated in areas that have fewer bank branches per capita.
Now, so overall we find that alternative data has an important role in identifying invisible prime consumers, allowing them to actually have access to loans at a much lower cost.
I wanted to show also the lift from using these data. We include actually not just FICO, but four different models: FICO score with all the economic factors and the risk measures of the borrowers, including income, employment, credit inquiries, home ownership, everything. But still, without the rating grade it's not good enough, and with the rating grade included as well, there's a lift, a significant lift there.
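A sketch of that kind of lift comparison: fit the same default-prediction model with and without the rating grade and compare a ranking metric such as ROC AUC (a stand-in metric here, not necessarily the paper's):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def default_model_auc(X_train, y_train, X_test, y_test):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# Compare default_model_auc(...) on features without vs. with the rating grade.
```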
Now, just on mortgages, I wanted to give you a quick summary, because I only have like one minute left.
 So, using mortgage HMDA data,
  we find that consumers who
  apply for a mortgage loan,
 and got denied by
 traditional lenders
  actually turned to
  FinTech lenders.
So, what we find is that areas that have a lot of denials, a high denial rate on mortgage applications, see an increase in the ratio of FinTech usage in the next period, actually within the county.
Now, some statistics show, and this is from the Mintel credit offer data, that basically the mortgage rate that FinTech lenders offer lies between traditional banks and shadow bank lenders, but the higher rate over traditional banks could also be explained by the fact that they actually made more offers to less creditworthy consumers, in terms of lower income and lower FICO scores.
 So overall,
 basically we see that
  alternative data
  has a role to play
  in expanding consumer
  credit, and at lower cost.
  Thank you.
  (applause)
 - [Hanna] Thank you Julapa.
Our final speaker of
the session is Will Cong
  of University of Chicago's
  Booth School of Business.
Thank you, welcome Will.
 - Thank you.
  (applause)
A very good morning, everyone,
 today I'm going to talk about
 blockchain disruption,
 and smart contracts.
  Rather than going
  into the details
 of a specific paper,
 what I plan to do
is actually to clarify concepts
and provide you a
framework to think about
this very new, emerging
field of FinTech.
 I'm sure many of
 you have come across
  all these concepts
  about blockchain,
  it comes under different
  names, different articles.
This is just showing the Google search volume for the word blockchain as compared to the S&P 500. It's scaled, you should notice; the average search volume is actually more than 10 times higher.
 And I'm sure you
 know that blockchains
 are also typically associated
 with cryptocurrencies,
which also come with a gazillion different names,
altcoins, cryptotokens,
so on, so forth.
  So, what I really hope is
  by the end of this session
 you can take away what
 exactly is blockchain,
what are the key issues
that we should look at?
You can approach investing in cryptocurrencies, or ICOs, with a proper framework,
and that would be my goal today.
 But towards the end,
 I'll also go back
to one particular paper
on blockchain disruption
 and smart contract,
 just to give you
  some more concrete
  examples of the mechanism.
  So, just yesterday
  I read an article,
 kind of a review article from
 a top research institution,
which talks about blockchains as if they were Bitcoin.
I'm very opposed
to that concept.
Bitcoin is an early experiment of the technology.
 Many people will say, "Well,
 the key feature of blockchain
 "is anonymity, because Bitcoin
 provides this key feature."
 Well, that's a
 feature of Bitcoin,
 but that's a parameter
 we can design
under the blockchain technology.
So, what exactly is blockchain?
I would argue the key innovation
 for the technology is really
 decentralized consensus.
 And decentralized here
 is a matter of extent.
Even if we are talking about permissioned blockchains, where a few big institutions form a consortium group using this shared ledger to record their information, that's also more decentralized than what is traditionally done.
  Right, so that's what I
  mean by decentralization.
 So, what is consensus?
 Well, consensus is
 a familiar concept.
 Societies, and economies
 have functioned on consensus
  for hundreds of years, if
  not thousands of years.
Typically, it's provided
by a centralized party,
 such as a government, court,
 or a third party
 business arbitrator.
 So, the innovation of
 blockchain technology
 is really having a
 more decentralized way
 to generate consensus so that
 even if you don't like the law
 you're still going to behave
 as if it were the truth.
 So, it's a way for people to
 interact and work together.
Now, what are the benefits of having a more decentralized consensus?
 Well, typically
 there are two reasons
practitioners and economists would give
 for more decentralized
 consensus.
 Number one is we can prevent
 single points of failure.
It could be a technical
failure of a database
located in a particular region.
For example, cloud computation.
  If it's located in
  a particular city,
and there is a natural disaster,
 then the system could go down,
  whereas if you do it in
  a more decentralized way,
 there's (mumbles) that
 we can gain to it.
It doesn't have to be technical.
 It could be a single
 judge who's prone
to bribery or corruption that could make a case fail.
 Whereas if we have
 a more decentralized
 way of generating consensus
 that would prevent this.
Another reason people
typically talk about is
  we'll get more
  disintermediation,
  we'll reduce the rent
  paid to the intermediaries
because, at least with public blockchains,
 there's free entry, there's
 a lot of competition
  for people who
  compete to provide
this service of
decentralized consensus.
That is an endogenous variable, if I may use an economist's term,
 so I don't take a
 strong stand on that,
  but these are the typical
  reasons people give.
 So, given this, there are
 really two sets of questions
 we can ask ourselves.
The first set is, "Okay,
within the system,
"how do we provide this
decentralized consensus?
  "What are the trade offs?
 "Is that a sustainable
 system going forward?"
  One subset of this
  set of questions
is how this consensus provision game plays out.
 I'm sure you've
 heard of mining games
where miners compete by
solving cryptographic problems,
 or puzzles to win
 the right to provide
 this decentralized consensus
in the form of recording
the next block,
 and getting rewarded for that.
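A toy version of such a puzzle, purely illustrative of the proof-of-work idea: find a nonce whose hash of the block data falls below a difficulty target.

```python
import hashlib

def mine(block_data: bytes, difficulty_bits: int = 20) -> int:
    """Find a nonce whose SHA-256 hash of the block falls below a target."""
    target = 2 ** (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce  # the winner records the next block and takes the reward
        nonce += 1
```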
 Right, so there's a
 number of studies,
 I leave some references here,
  I believe the slides will
  be available publicly,
 so you can definitely
 read more into that.
Another key question is that many of the protocols, whether proof of work or proof of stake, typically operate under the premise that there is adequate decentralization.
That's a technical possibility,
but it may not be an economic reality,
 in the sense that
 when people actually
play the game according
to the protocols,
 there might be
 centralizing forces.
 What are some of the
 centralizing forces?
 Well, we've all heard
 how Bitcoin mining
 is taking up more than 1.5%
 of US energy consumption,
comparable to Switzerland's annual energy consumption.
So, energy cost is pretty high.
 Why is that the case?
Well, if we want
to decentralize,
 we have to duplicate nodes,
  we have to update all
  the nodes in the network,
 there is a force
 for centralization,
 at least an argument
 for centralization
 to reduce duplication cost.
On smart contracts and information, I'll come back to that later in this talk; there's also an information consideration that's more related to data storage and data processing.
Here I'm just going to give one quick example of another force for centralization, which is risk sharing.
 So, if we think about
 miners for Bitcoin,
  or proof of work
  based blockchains,
 what they do is they
 solve these puzzles,
they have a probability
of being allocated
 the right to record
 the next block,
 and win the reward,
 but that's random,
there's some randomness.
So, in order to share the risk,
  very much like an
  insurance company,
 mining pools naturally arise.
We pool together, and no matter who successfully mines the next block, we can share the reward.
That (mumbles) our consumption,
and there are benefits to that.
 Is that going to lead
 to over concentration,
which defies the purpose of a decentralized system?
Well, there is a lot of concern,
 and this is discussed heavily,
 debated vehemently in
 forums among practitioners.
  This is a picture showing
  the overall global
 computation power devoted
 to Bitcoin mining over time.
And all these different
colors are mining pools
plotted as a percentage
of global mining power,
  computation power
  devoted to this activity.
We see the rise of mining pools, but it seems that there's a mean-reverting force there.
 So, what is leading to this?
Well, yes, risk sharing leads to concentration of pools,
 but that's not the only force,
there are other forces that
would sustain decentralization.
 For example, if we've
 taken Finance 101
 we know that instead
 of devoting everything
 to the biggest pool,
 I can also diversify
across smaller pools,
and that gives me the same risk-sharing benefit.
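A stylized sketch of that diversification point, with made-up per-block win probabilities: the expected reward is the same either way, but splitting hash power across pools cuts the variance.

```python
import numpy as np

rng = np.random.default_rng(0)
blocks, reward, p = 100_000, 1.0, 0.01   # hypothetical per-block win probability

solo = reward * rng.binomial(1, p, size=blocks)    # all power in one pool
k = 5                                              # split across five pools
split = reward * rng.binomial(k, p, size=blocks) / k

print(solo.mean(), split.mean())   # roughly equal expected reward
print(solo.std(), split.std())     # standard deviation falls by about sqrt(k)
```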
There are also industrial organization forces playing a role: larger pools are going to charge larger fees, and that's going to make them grow proportionally less.
 So, the key concern is really
 not over concentration.
It's really about this arms race where, at least under the proof of work protocols, we are spending a lot of energy in a tournament that's not socially beneficial.
That's just one example
of centralization
versus decentralization.
 Now, to summarize that
 in a picture here,
 the internal economics
 of blockchain,
 at least in my humble opinion,
is really about
balancing three factors.
  This is very much akin to
  the impossibility trinity
of international finance
where you can't have a fixed exchange rate, free capital flow, and sovereign monetary policy at the same time. Here, it's very hard to have decentralization, consensus, and scalability at the same time,
 which together can provide
 you a functional trust system.
 Bitcoin has decentralization
 and consensus,
but it doesn't have scalability.
  Visa or Mastercard have
  consensus and scalability,
but they're not decentralized systems.
 So, this is something
 practitioners
are actively working on,
  whether through layer one
  or layer two innovations.
 So, I'm not going to go into
 that in detail too much.
  Now, the second line, or
  second set of questions is
taking the functionality
of blockchains as given,
 how do they impact traditional
 industries and business?
There are many references listed here, including work done by Hanna on platform adoption and auditing related to the technology as well. One quick example I can give is how ICOs or tokens affect platforms with network effects.
This is just plotting the Ethereum platform, which is for smart contracting.
  I'm plotting both
  the market cap,
and the number of active user addresses.
 So, as you can see the
 endogenous adoption
 of the platform
 is closely related to how
 much these tokens are valued.
 Right, so in a separate study
 we actually look at
 the token pricing
  and the endogenous
  adoption of users
to relate these two quantities.
 I just want to point
 out we can talk about
the fundamental valuation of tokens.
Where do they derive value from?
Well, they are used as a medium of exchange on these platforms.
So, in that sense they
have some money feature,
but they are not stable enough.
  You can actually show they
  are inherently volatile.
So, it's not pure money,
but at the same time,
  it's not cash flow based.
So, it's not our typical
investment asset.
 So, it's more like a hybrid,
  and I highly encourage you
 to explore more
 along that dimension.
So, that's more relevant
for secondary market.
  If we go back to
  the primary market
 of investing in
 ventures and startups,
it's also important to
understand the roles of tokens.
Actually, I just mentioned one role of tokens,
  which is accelerating the
  adoption of platforms,
 which is very much in line
 with practitioners' concept
of bootstrapping the community,
bootstrapping the platform.
I think that's very
important to understand
 when you look at a startup,
 if they claim they
 are using tokens for,
 in (mumbles) the community,
and there's no user network effect for the community, then that's not a valid claim.
  I think these are concepts
  important to clarify.
So, now let me come back
 to this smart
 contract a little bit.
So, this is a paper titled Blockchain Disruption and Smart Contracts.
The main idea is, "Okay,
we know blockchain
"provides decentralized
consensus,
  "presumably that's
  going to allow us
 "to contract on
 certain contingency,
  "or contingent outcomes."
Let me just jump to the example,
  I think that's a little
  bit easier to talk about.
I'll come back to these
quotes in a little bit.
 The example I like to use
 is a trade finance example.
It's when an exporter sends goods to an importer.
 Traditionally, well firstly,
 this is a three trillion
  US dollar annual business,
and traditionally, it's
not very efficient.
You need to get a letter of credit from banks, you need multiple pieces of paperwork to get it done.
The sender doesn't want to send
before they receive the payment,
the receiver doesn't want to pay
 before they receive
 the goods, right.
 So, this is where blockchain
 and smart contract
could potentially help.
We could generate a
decentralized consensus,
 and more real time consensus
 of the delivery status.
For example, the shipment has passed certain ports,
or we can use internet of
things to monitor the condition
of a shipment of wine, for example,
from, I don't know,
California to New York.
 Whether it's under the proper
 temperature environment.
 And based on this consensus,
 the sender and the receiver
 can then use smart contract
 to automate certain transfers
  of either cryptocurrency,
or the equivalent value in fiat money.
So, that's where smart
contracting could help,
using the blockchain technology.
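A toy sketch of that escrow logic, where the delivery and temperature flags stand in for the consensus inputs; this is purely illustrative, not a real smart-contract API.

```python
from dataclasses import dataclass

@dataclass
class TradeEscrow:
    price: float
    settled: bool = False

    def settle(self, delivered: bool, temperature_ok: bool) -> str:
        """Release payment only if consensus confirms delivery in good condition."""
        if self.settled:
            return "already settled"
        if delivered and temperature_ok:
            self.settled = True
            return f"release {self.price} to exporter"
        return "hold funds in escrow"
```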
But at the same time, how do we maintain this decentralized consensus?
Remember, we're saying,
"Well, if we wanted
 "to be very advanced,
 we might have to
 "make it reasonably
 decentralized
 "and that requires
 distribution of information."
  There are several reasons
  why that's a concern.
Well, firstly, several central banks have reported on generating this consensus, or on distributing more information.
There are privacy concerns for the clients, for sure.
I'd like to focus your attention on the third quote, which says that
 "Technology also
 facilitates cartel."
Because a few big oligopoly firms
 can actually observe
 more information
 if they are in this
 blockchain system.
And they can tacitly collude,
  which is a concern
  for regulators,
 and that's something we
 explore more in this paper.
In the interest of time,
I'm going to actually
skip this illustration,
 it's just showing
 there's a trade off
 between decentralization
 and the quality of consensus
 we can maintain or generate
 in this environment.
I have one minute, yeah.
So, let me quickly wrap up here.
At least as regards this particular study.
To generate
decentralized consensus,
 we necessarily have to
 distribute more information,
  and you might say, "Well,
  there's proof of work.
 "There are algorithms
 that allow us
 "to encrypt the information.
"That should solve the problem."
 Well, encrypted information
 is still information,
and the more you
encrypt the information,
 the less you can verify or
 confirm (mumbles) dimensions.
  So, there is a trade off.
 So, that is something
 I wanted to point out.
And that should be relevant for how we regulate monopoly and market power, and for the industrial organization of trade finance; that's just one example, you can think about transactions, or trading and (mumbles), so on, so forth.
 And there are more details
 in the references I listed,
 and I hope this is a
 helpful, conceptual framework
 for everyone to apply,
 whether you are in
 secondary market, or primary
 market investing in ventures.
 Thank you very much.
  (applause)
 - So, I want to give
 a very warm thank you
 to Hanna and to this panel.
That was some fascinating stuff.
So, I think everybody
probably wants to go out
 and figure out how to trade
on a three-layer neural network right about now.
  (laughter)
When you're thinking about that,
go get a cup of coffee,
and come back here
by around 10:30, 10:35 for our next panel,
and also some really cool demos.
So, we'll see you in about 20 minutes.
