It is the hot topic for data journalists in this election. Some call it MRP, some Mr P. But the full name is multi-level regression with post-stratification. So what is it? In short, it's a way of using a big national poll to estimate how people will vote at constituency level.
National polls of 1,000 people are good at telling us what share of the national vote each party will get, but not so good at predicting who will win each of the 650 seats. And in the UK system it's seats, not votes, that count. For example, in 2015 the Conservatives won 37 per cent of votes and 51 per cent of seats. Ukip won 13 per cent of the vote and less than 1 per cent of seats.
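To see how vote share and seat share can come apart under first past the post, here is a minimal Python sketch. The five seats and their vote counts are invented for illustration (they are not the 2015 results); the only rule applied is that the candidate with the most votes in a seat wins it.

```python
from collections import Counter

# Hypothetical vote counts in five toy seats (NOT real election data).
seats = {
    "Seat A": {"Con": 21_000, "Lab": 18_000, "Ukip": 9_000},
    "Seat B": {"Con": 20_000, "Lab": 17_000, "Ukip": 10_000},
    "Seat C": {"Con": 19_000, "Lab": 16_000, "Ukip": 12_000},
    "Seat D": {"Con": 8_000, "Lab": 30_000, "Ukip": 9_000},
    "Seat E": {"Con": 9_000, "Lab": 28_000, "Ukip": 10_000},
}

def vote_and_seat_shares(results):
    """Return {party: (vote share, seat share)} under first past the post."""
    total_votes = Counter()
    seats_won = Counter()
    for counts in results.values():
        total_votes.update(counts)
        # The candidate with the most votes in the seat takes it.
        seats_won[max(counts, key=counts.get)] += 1
    all_votes = sum(total_votes.values())
    n_seats = len(results)
    return {
        party: (total_votes[party] / all_votes, seats_won[party] / n_seats)
        for party in total_votes
    }

shares = vote_and_seat_shares(seats)
for party, (vote_share, seat_share) in shares.items():
    print(f"{party}: {vote_share:.0%} of votes, {seat_share:.0%} of seats")
```

In this toy example the Conservatives take 60 per cent of seats on about 33 per cent of the vote, Labour's votes pile up in two safe seats, and Ukip's 21 per cent, spread thinly, wins nothing.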
Traditionally, pollsters have tried to get around this using methods such as uniform national swing. If Labour was down 11 percentage points nationally on the last general election, pollsters would subtract 11 percentage points from Labour's vote share in every seat.
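Uniform national swing fits in a few lines. This Python sketch assumes previous results are stored as percentage shares per seat; the seat names, the shares and the Conservative change of +4 points are hypothetical, and clipping shares at zero is a simplifying choice of this sketch rather than a fixed rule.

```python
def uniform_national_swing(previous_shares, national_change):
    """Apply the same national change to every seat.

    previous_shares: {seat: {party: share in %}} from the last election.
    national_change: {party: change in percentage points}, e.g. {"Lab": -11}.
    Shares are clipped at zero (a simplification in this sketch).
    """
    return {
        seat: {
            party: max(0.0, share + national_change.get(party, 0.0))
            for party, share in shares.items()
        }
        for seat, shares in previous_shares.items()
    }

def winners(projected):
    """Party with the largest projected share in each seat."""
    return {seat: max(shares, key=shares.get) for seat, shares in projected.items()}

# Hypothetical previous results and national changes.
previous = {
    "Seat X": {"Lab": 45.0, "Con": 40.0},
    "Seat Y": {"Lab": 60.0, "Con": 30.0},
}
change = {"Lab": -11.0, "Con": 4.0}
projected = uniform_national_swing(previous, change)
print(winners(projected))
```

Every seat receives exactly the same change regardless of who lives there, which is precisely the limitation of the method.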
But that can't capture all the electoral nuance: for example, the influence of Leave and Remain in particular areas, or big student votes in university towns. So in comes MRP.
Step one: a large poll sample, tens of thousands of people across the country. That's because you want dozens of people in each constituency, and the more the better, to pick up on what makes that seat different from the rest.
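The arithmetic behind "tens of thousands" is easy to check. A rough Python sketch, assuming respondents are spread evenly across the 650 seats and using the textbook margin of error for a simple random sample; real polls have design effects, so these figures are optimistic, and the sample sizes other than 1,000 are illustrative.

```python
import math

N_SEATS = 650  # constituencies in the UK House of Commons

def respondents_per_seat(sample_size, n_seats=N_SEATS):
    """Average respondents per constituency, assuming an even spread."""
    return sample_size / n_seats

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a share p estimated from n people.

    Simple-random-sample formula z * sqrt(p(1-p)/n); design effects in
    real polls make the true uncertainty larger.
    """
    return z * math.sqrt(p * (1 - p) / n)

# 1,000 is a typical national poll; the larger sizes are illustrative.
for sample in (1_000, 10_000, 50_000, 100_000):
    n = respondents_per_seat(sample)
    moe = margin_of_error(0.40, n)
    print(f"{sample:>7,} people: ~{n:.1f} per seat, "
          f"+/-{100 * moe:.0f} points on a 40% share")
```

A 1,000-person poll leaves fewer than two respondents per seat. Even at 50,000 (about 77 per seat), a 40 per cent share read off one seat's respondents alone would carry a margin of error of roughly 11 points, which is why more is better.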
Step two: don't just ask them who they're voting for, but who they are: age, sex, ethnicity, education level, housing, occupation and how they voted in the EU referendum.
You'll also have gathered lots of local information about their constituency, from which parties have historically done well or poorly there to what's happened to house prices. So you have data at the individual level, but also the context of the wider geographical area. That's why it's called multi-level.
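One way to picture multi-level data is as two linked tables: one row per respondent, and one row per constituency that every respondent in that seat shares. This Python sketch uses dataclasses; the field names and example values are hypothetical stand-ins for the kinds of variables just described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Respondent:
    """Individual level: who the person is and how they say they'll vote."""
    constituency: str
    age_band: str
    education: str
    eu_ref_vote: str
    vote_intention: str

@dataclass(frozen=True)
class SeatContext:
    """Constituency level: facts about the place, not the person."""
    prev_con_share: float      # Conservative share at the last election (%)
    house_price_change: float  # % change in average house prices

def attach_context(respondents, contexts):
    """Pair each respondent with their constituency's context.

    Everyone in the same seat gets the same SeatContext object, which is
    what makes the data two-level: people nested within places.
    """
    return [(r, contexts[r.constituency]) for r in respondents]

# Hypothetical example data.
contexts = {
    "Seat A": SeatContext(prev_con_share=48.0, house_price_change=12.5),
    "Seat B": SeatContext(prev_con_share=22.0, house_price_change=3.1),
}
respondents = [
    Respondent("Seat A", "65+", "left school at 16", "Leave", "Con"),
    Respondent("Seat A", "18-24", "degree", "Remain", "Lab"),
    Respondent("Seat B", "35-44", "degree", "Remain", "LD"),
]
rows = attach_context(respondents, contexts)
```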
You then run a regression on that data. That's a statistical technique that estimates the probability of someone with a given combination of personal and local characteristics (a) voting at all and (b) voting for a particular party.
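The regression step can be sketched as two models: a logistic model for (a) turnout and a multinomial logistic (softmax) model for (b) party choice, each adding individual characteristics to a constituency-level effect. All coefficients below are made up for illustration. A real MRP fits them from the survey, usually as a multilevel model in which a seat's effect is pulled towards the average when that seat has few respondents; this Python sketch only shows how fitted coefficients turn into probabilities, not how they are estimated.

```python
import math

def logistic(x):
    """Map a log-odds score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    """Turn one score per party into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Hypothetical coefficients on the log-odds scale (not from any real model).
TURNOUT = {"intercept": 0.2, "age_65_plus": 0.9, "degree": 0.5}
PARTY = {
    "Con": {"intercept": 0.0, "age_65_plus": 0.8, "degree": -0.3, "leave": 0.9},
    "Lab": {"intercept": 0.0, "age_65_plus": -0.6, "degree": 0.2, "leave": -0.4},
    "LD": {"intercept": -0.8, "age_65_plus": 0.0, "degree": 0.7, "leave": -1.0},
}
# Hypothetical constituency-level effects (estimated in a real model).
SEAT_TURNOUT = {"Seat A": 0.1, "Seat B": -0.2}
SEAT_PARTY = {
    "Seat A": {"Con": 0.3, "Lab": -0.2, "LD": 0.0},
    "Seat B": {"Con": -0.5, "Lab": 0.6, "LD": 0.1},
}

def score(coefs, person):
    """Intercept plus coefficient * indicator for each characteristic."""
    return coefs["intercept"] + sum(
        w * person.get(k, 0) for k, w in coefs.items() if k != "intercept"
    )

def p_turnout(person, seat):
    """(a) Probability that this kind of person in this seat votes at all."""
    return logistic(score(TURNOUT, person) + SEAT_TURNOUT[seat])

def p_party(person, seat):
    """(b) Probability of backing each party, given that they vote."""
    return softmax({party: score(c, person) + SEAT_PARTY[seat][party]
                    for party, c in PARTY.items()})

# 0/1 indicators for one hypothetical person.
older_leaver = {"age_65_plus": 1, "degree": 0, "leave": 1}
print(p_turnout(older_leaver, "Seat A"), p_party(older_leaver, "Seat A"))
print(p_turnout(older_leaver, "Seat B"), p_party(older_leaver, "Seat B"))
```

The same person gets different probabilities in the two seats purely because of the constituency-level terms.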
So we've done MR. Then comes P, post-stratification. This is where the modellers use data from sources like the census and the Annual Population Survey. They can tally up the number of people with each combination of these demographic and socio-economic characteristics in every constituency, and then apply the voting probabilities from the MR step to that population data.
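Post-stratification itself is a weighted sum: for each seat, multiply the number of people in each demographic cell by that cell's modelled turnout and party probabilities, then add up across cells. A minimal Python sketch in which the two cells, the population counts and the probabilities are all hypothetical; in practice the counts come from sources such as the census and the probabilities from the regression step.

```python
def poststratify(population, p_vote, p_party):
    """Expected votes for each party in each seat.

    population[seat][cell]      -> adults in that demographic cell
    p_vote[seat][cell]          -> modelled probability of voting (a)
    p_party[seat][cell][party]  -> modelled probability of each party (b)
    """
    totals = {}
    for seat, cells in population.items():
        seat_votes = {}
        for cell, count in cells.items():
            expected_voters = count * p_vote[seat][cell]
            for party, p in p_party[seat][cell].items():
                seat_votes[party] = seat_votes.get(party, 0.0) + expected_voters * p
        totals[seat] = seat_votes
    return totals

# Hypothetical inputs for one seat with two demographic cells.
population = {"Seat A": {"under_40_degree": 10_000, "over_40_no_degree": 30_000}}
p_vote = {"Seat A": {"under_40_degree": 0.5, "over_40_no_degree": 0.8}}
p_party = {"Seat A": {
    "under_40_degree": {"Lab": 0.6, "Con": 0.4},
    "over_40_no_degree": {"Lab": 0.3, "Con": 0.7},
}}
votes = poststratify(population, p_vote, p_party)
print({party: round(v) for party, v in votes["Seat A"].items()})
```

The under-40 graduates lean Labour, but the over-40s without degrees are both more numerous and more likely to vote, so the seat's expected total comes out at about 18,800 Conservative to 10,200 Labour.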
So you have an estimate of how, for example, a white British male who left school at 16 is likely to vote. And that estimate will differ between Great Grimsby, North West Durham and Glasgow Central. In fact, you have a series of estimates for different demographic combinations in different places.
So you then combine them to give the total number of votes each party is likely to secure in every one of the 650 constituencies. That can be used to calculate which party is most likely to win each seat, which is most likely to be its closest challenger, how big the margin between first and second place is likely to be, and so on.
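Turning expected votes into seat-level calls is a matter of ranking the parties in each seat. This Python sketch assumes at least two parties per seat and treats the party with the most expected votes as the projected winner; the vote figures are invented. A fuller treatment would also use the model's uncertainty, for example by simulating many plausible results and counting how often each party comes first.

```python
from collections import Counter

def seat_summary(expected_votes):
    """Projected winner, closest challenger and margin for each seat.

    expected_votes[seat][party] -> expected votes (needs >= 2 parties per seat).
    The margin is the gap between first and second, in percentage points.
    """
    summary = {}
    for seat, votes in expected_votes.items():
        ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
        (first, v1), (second, v2) = ranked[0], ranked[1]
        summary[seat] = {
            "winner": first,
            "runner_up": second,
            "margin_pts": 100 * (v1 - v2) / sum(votes.values()),
        }
    return summary

# Hypothetical expected votes, e.g. from a post-stratification step.
expected = {
    "Seat A": {"Con": 18_800, "Lab": 10_200, "LD": 1_000},
    "Seat B": {"Lab": 21_000, "Con": 20_000, "LD": 4_000},
}
summary = seat_summary(expected)
seats_won = Counter(s["winner"] for s in summary.values())
# Closest contests first: the seats a campaign might prioritise.
closest = sorted(summary, key=lambda seat: summary[seat]["margin_pts"])
print(seats_won, closest)
```

Sorting by margin gives a simple list of the tightest contests, with Seat B (a gap of about 2 points) ahead of Seat A (about 29 points).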
This allows parties to better target their campaigning resources on seats that are going to be close. And it can help people like you make a more informed decision on who to vote for tactically. It gives the public a more nuanced picture of how the election is likely to play out.
Now, it's not a perfect system. This type of modelling is complex, and there are many variables. And the choices the modellers make mean one model will produce different outputs from the next. But MRP is the most refined tool we have until the votes are actually counted.
