In probability theory, the expected value
of a random variable is a key aspect of its
probability distribution. Intuitively, a random
variable's expected value represents the average
of a large number of independent realizations
of the random variable. For example, the expected
value of rolling a six-sided die is 3.5, because
the average of all the numbers that come up
converges to 3.5 as the number of rolls approaches
infinity (see § Examples for details). The
expected value is also known as the expectation,
mathematical expectation, mean, or first moment.
A bit more formally, the expected value of
a discrete random variable is the probability-weighted
average of all its possible values. In other
words, each possible value the random variable
can assume is multiplied by its probability
of occurring, and the resulting products are
summed to produce the expected value. The
same principle applies to an absolutely continuous
random variable, except that an integral of
the variable with respect to its probability
density replaces the sum. The formal definition
subsumes both of these and also works for
distributions which are neither discrete nor
absolutely continuous; the expected value
of a random variable is the integral of the
random variable with respect to its probability
measure.

The expectation of a random variable plays an important role in a variety of contexts.
For example, in decision theory, an agent
making an optimal choice in the context of
incomplete information is often assumed to
maximize the expected value of their utility
function.
For a different example, in statistics, where
one seeks estimates for unknown parameters
based on available data, the estimate itself
is a random variable. In such settings, a
desirable criterion for a "good" estimator
is that it is unbiased; that is, the expected
value of the estimate is equal to the true
value of the underlying parameter.
== History ==
The idea of the expected value originated
in the middle of the 17th century from the
study of the so-called problem of points,
which seeks to divide the stakes in a fair
way between two players who have to end their
game before it is properly finished. This problem
had been debated for centuries, and many conflicting
proposals and solutions had been suggested
over the years, when it was posed in 1654
to Blaise Pascal by French writer and amateur
mathematician Chevalier de Méré. Méré
claimed that this problem could not be solved
and that it showed just how flawed mathematics
was when it came to its application to the
real world. Pascal, being a mathematician,
was provoked and determined to solve the problem
once and for all. He began to discuss the
problem in a now famous series of letters
to Pierre de Fermat. Soon enough they both
independently came up with a solution. They
solved the problem in different computational
ways but their results were identical because
their computations were based on the same
fundamental principle. The principle is that
the value of a future gain should be directly
proportional to the chance of getting it.
This principle seemed to have come naturally
to both of them. They were very pleased by
the fact that they had found essentially the
same solution and this in turn made them absolutely
convinced they had solved the problem conclusively.
However, they did not publish their findings.
They only informed a small circle of mutual
scientific friends in Paris about it.

Three years later, in 1657, the Dutch mathematician
Christiaan Huygens, who had just visited Paris,
published a treatise (see Huygens (1657))
"De ratiociniis in ludo aleæ" on probability
theory. In this book he considered the problem
of points and presented a solution based on
the same principle as the solutions of Pascal
and Fermat. Huygens also extended the concept
of expectation by adding rules for how to
calculate expectations in more complicated
situations than the original problem (e.g.,
for three or more players). In this sense
this book can be seen as the first successful
attempt at laying down the foundations of
the theory of probability.
In the foreword to his book, Huygens wrote:
"It should be said, also, that for some time
some of the best mathematicians of France
have occupied themselves with this kind of
calculus so that no one should attribute to
me the honour of the first invention. This
does not belong to me. But these savants,
although they put each other to the test by
proposing to each other many questions difficult
to solve, have hidden their methods. I have
had therefore to examine and go deeply for
myself into this matter by beginning with
the elements, and it is impossible for me
for this reason to affirm that I have even
started from the same principle. But finally
I have found that my answers in many cases
do not differ from theirs." (cited by Edwards
(2002)). Thus, Huygens learned about de Méré's
problem in 1655 during his visit to France;
later, in 1656, he learned from his correspondence
with Carcavi that his method was essentially
the same as Pascal's, so that before his book
went to press in 1657 he knew about Pascal's
priority in this subject.
Neither Pascal nor Huygens used the term "expectation"
in its modern sense. In particular, Huygens
writes: "That my Chance or Expectation to
win any thing is worth just such a Sum, as
wou'd procure me in the same Chance and Expectation
at a fair Lay. ... If I expect a or b, and
have an equal Chance of gaining them, my Expectation
is worth (a+b)/2." More than a hundred years
later, in 1814, Pierre-Simon Laplace published
his tract "Théorie analytique des probabilités",
where the concept of expected value was defined
explicitly:
… this advantage in the theory of chance
is the product of the sum hoped for by the
probability of obtaining it; it is the partial
sum which ought to result when we do not wish
to run the risks of the event in supposing
that the division is made proportional to
the probabilities. This division is the only
equitable one when all strange circumstances
are eliminated; because an equal degree of
probability gives an equal right for the sum
hoped for. We will call this advantage mathematical
hope.
The use of the letter E to denote expected
value goes back to W.A. Whitworth in 1901,
who used a script E. The symbol has become
popular since for English writers it meant
"Expectation", for Germans "Erwartungswert",
for Spanish "Esperanza matemática" and for
French "Espérance mathématique".
== Definition ==
=== Finite case ===
Let $X$ be a random variable with a finite number of finite outcomes $x_1, x_2, \ldots, x_k$ occurring with probabilities $p_1, p_2, \ldots, p_k,$ respectively. The expectation of $X$ is defined as

$$\operatorname{E}[X] = \sum_{i=1}^{k} x_i\, p_i = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k.$$

Since all probabilities $p_i$ add up to 1 ($p_1 + p_2 + \cdots + p_k = 1$), the expected value is the weighted average, with the $p_i$'s being the weights.
If all outcomes $x_i$ are equiprobable (that is, $p_1 = p_2 = \cdots = p_k$), then the weighted average turns into the simple average. If the outcomes $x_i$ are not equiprobable, then the simple average must be replaced with the weighted average, which takes into account the fact that some outcomes are more likely than the others.
==== Examples ====
Let $X$ represent the outcome of a roll of a fair six-sided die. More specifically, $X$ will be the number of pips showing on the top face of the die after the toss. The possible values for $X$ are 1, 2, 3, 4, 5, and 6, all of which are equally likely with a probability of 1/6. The expectation of $X$ is

$$\operatorname{E}[X] = 1\cdot{\frac{1}{6}} + 2\cdot{\frac{1}{6}} + 3\cdot{\frac{1}{6}} + 4\cdot{\frac{1}{6}} + 5\cdot{\frac{1}{6}} + 6\cdot{\frac{1}{6}} = 3.5.$$

If one rolls the die $n$ times and computes the average (arithmetic mean) of the results, then as $n$ grows, the average will almost surely converge to the expected value, a fact known as the strong law of large numbers.

The roulette game consists of a small ball and a wheel with 38 numbered pockets around the edge. As the wheel is spun, the ball bounces around randomly until it settles down in one of the pockets. Suppose random variable $X$ represents the (monetary) outcome of a $1 bet on a single number ("straight up" bet). If the bet wins (which happens with probability 1/38 in American roulette), the payoff is $35; otherwise the player loses the bet. The expected profit from such a bet will be

$$\operatorname{E}[\,\text{gain from } \$1 \text{ bet}\,] = -\$1\cdot{\frac{37}{38}} + \$35\cdot{\frac{1}{38}} = -\${\frac{1}{19}}.$$

That is, a $1 bet stands to lose $1/19 on average; in other words, its expected value is $-\${\frac{1}{19}}.$
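The two computations above, and the convergence of the sample mean promised by the strong law of large numbers, can be sketched in a few lines of Python (the sample size and seed are arbitrary choices):

```python
import random

# Exact expectation of a fair six-sided die: probability-weighted sum,
# with the common factor 1/6 pulled out.
die_expectation = sum(range(1, 7)) / 6
print(die_expectation)  # 3.5

# Exact expected profit of a $1 straight-up bet in American roulette.
roulette_expectation = -1 * (37 / 38) + 35 * (1 / 38)
print(roulette_expectation)  # ≈ -0.0526, i.e. -$1/19

# Simulate repeated die rolls: by the strong law of large numbers,
# the sample mean approaches the expected value 3.5.
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))  # close to 3.5
```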
=== Countably infinite case ===
Intuitively, the expectation of a random variable
taking values in a countable set of outcomes
is defined analogously as the weighted sum
of the outcome values, where the weights correspond
to the probabilities of realizing that value.
However, convergence issues associated with
the infinite sum necessitate a more careful
definition. A rigorous definition first defines
expectation of a non-negative random variable,
and then adapts it to general random variables.
Let $X$ be a non-negative random variable with a countable set of outcomes $x_1, x_2, \ldots,$ occurring with probabilities $p_1, p_2, \ldots,$ respectively. Analogous to the finite case, the expected value of $X$ is then defined as the series

$$\operatorname{E}[X] = \sum_{i=1}^{\infty} x_i\, p_i.$$

Note that since $x_i p_i \geq 0$, the infinite sum is well-defined and does not depend on the order in which it is computed. Unlike the finite case, the expectation here can be equal to infinity, if the infinite sum above increases without bound.
For a general random variable $X$ that need not be non-negative, first define $X^{+} = \max\{X, 0\}$ and $X^{-} = \max\{-X, 0\}$. Observe that $X = X^{+} - X^{-}$, and both $X^{+}$ and $X^{-}$ are non-negative random variables. Hence, $\operatorname{E}[X^{+}]$ and $\operatorname{E}[X^{-}]$ are well-defined (using either the definition for finite discrete random variables or for non-negative countable random variables). Then, we define $\operatorname{E}[X]$ as follows:

$$\operatorname{E}[X] = \begin{cases} \operatorname{E}[X^{+}] - \operatorname{E}[X^{-}] & \text{if } \operatorname{E}[X^{+}] < \infty \text{ and } \operatorname{E}[X^{-}] < \infty; \\ \infty & \text{if } \operatorname{E}[X^{+}] = \infty \text{ and } \operatorname{E}[X^{-}] < \infty; \\ -\infty & \text{if } \operatorname{E}[X^{+}] < \infty \text{ and } \operatorname{E}[X^{-}] = \infty; \\ \text{undefined} & \text{if } \operatorname{E}[X^{+}] = \infty \text{ and } \operatorname{E}[X^{-}] = \infty. \end{cases}$$
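As a small numerical illustration of this decomposition, a finite discrete expectation can be computed from the two non-negative parts; the outcomes and probabilities below are illustrative assumptions, not values from the text:

```python
# Expectation via the positive and negative parts X+ = max(X, 0) and
# X- = max(-X, 0), as in the definition above.
outcomes = [-2.0, 1.0, 3.0]   # illustrative values
probs = [0.5, 0.3, 0.2]       # illustrative probabilities (sum to 1)

e_plus = sum(max(x, 0) * p for x, p in zip(outcomes, probs))    # E[X+]
e_minus = sum(max(-x, 0) * p for x, p in zip(outcomes, probs))  # E[X-]

# Both parts are finite here, so E[X] = E[X+] - E[X-],
# which agrees with the direct probability-weighted sum.
expectation = e_plus - e_minus
print(expectation)
```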
==== Examples ====
Suppose $x_i = i$ and $p_i = {\frac{k}{i 2^{i}}}$ for $i = 1, 2, 3, \ldots,$ where $k = {\frac{1}{\ln 2}}$ (with $\ln$ being the natural logarithm) is the scale factor such that the probabilities sum to 1. Then, using the direct definition for non-negative random variables, we have

$$\operatorname{E}[X] = \sum_i x_i p_i = 1\left({\frac{k}{2}}\right) + 2\left({\frac{k}{8}}\right) + 3\left({\frac{k}{24}}\right) + \cdots = {\frac{k}{2}} + {\frac{k}{4}} + {\frac{k}{8}} + \cdots = k.$$
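Both the normalization and the value of this series can be checked numerically; the truncation point `N` below is an arbitrary choice, chosen large enough that the remaining tail is negligible:

```python
import math

k = 1 / math.log(2)  # normalizing constant 1/ln 2

# Truncated sums of the probabilities p_i = k / (i * 2**i) and of the
# expectation series x_i * p_i = k / 2**i.
N = 60
prob_sum = sum(k / (i * 2**i) for i in range(1, N + 1))
expectation = sum(i * k / (i * 2**i) for i in range(1, N + 1))

print(prob_sum)     # ≈ 1.0, since sum of 1/(i 2^i) is ln 2
print(expectation)  # ≈ k ≈ 1.4427
```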
An example where the expectation is infinite arises in the context of the St. Petersburg paradox. Let $x_i = 2^{i}$ and $p_i = {\frac{1}{2^{i}}}$ for $i = 1, 2, 3, \ldots$. Once again, since the random variable is non-negative, the expected value calculation gives

$$\operatorname{E}[X] = \sum_{i=1}^{\infty} x_i\, p_i = 2\cdot{\frac{1}{2}} + 4\cdot{\frac{1}{4}} + 8\cdot{\frac{1}{8}} + 16\cdot{\frac{1}{16}} + \cdots = 1 + 1 + 1 + 1 + \cdots = \infty.$$
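The divergence is visible in the partial sums, which can be sketched directly (the truncation points are arbitrary choices):

```python
# Partial sums of the St. Petersburg series: every term x_i * p_i equals
# 2**i * (1 / 2**i) = 1, so the partial sum over the first n terms is
# exactly n, and the series grows without bound.
partial_sums = {}
for n in (10, 100, 1000):
    partial_sums[n] = sum(2**i * (1 / 2**i) for i in range(1, n + 1))
    print(n, partial_sums[n])
```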
For an example where the expectation is not well-defined, suppose the random variable $X$ takes values 1, −2, 3, −4, ..., with respective probabilities ${\frac{c}{1^{2}}}, {\frac{c}{2^{2}}}, {\frac{c}{3^{2}}}, {\frac{c}{4^{2}}}, \ldots,$ where $c = {\frac{6}{\pi^{2}}}$ is a normalizing constant that ensures the probabilities sum up to one.

Then, it follows that $X^{+}$ takes value $2k-1$ with probability $c/(2k-1)^{2}$ for $k = 1, 2, 3, \ldots$ and takes value $0$ with the remaining probability. Similarly, $X^{-}$ takes value $2k$ with probability $c/(2k)^{2}$ for $k = 1, 2, 3, \ldots$ and takes value $0$ with the remaining probability. Using the definition for non-negative random variables, one can show that both $\operatorname{E}[X^{+}] = \infty$ and $\operatorname{E}[X^{-}] = \infty$ (see Harmonic series). Hence, the expectation of $X$ is not well-defined.
=== Absolutely continuous case ===
If $X$ is a random variable whose cumulative distribution function admits a density $f(x)$, then the expected value is defined as the following Lebesgue integral, if the integral exists:

$$\operatorname{E}[X] = \int_{\mathbb{R}} x f(x)\, dx.$$
The expected value of a random variable may
be undefined, if the integral does not exist.
An example of such a random variable is one
with the Cauchy distribution, due to its large
"tails".
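When the integral does exist, it can be approximated numerically. As a minimal sketch, the expectation of an exponential density $f(x) = \lambda e^{-\lambda x}$ (an illustrative choice, not from the text; its exact mean is $1/\lambda$) via a midpoint Riemann sum:

```python
import math

# Numerical check of E[X] = ∫ x f(x) dx for the exponential density
# f(x) = lam * exp(-lam * x) on [0, ∞); the exact mean is 1/lam.
# The rate, step size, and integration range are illustrative choices.
lam = 2.0
dx = 1e-4
total = 0.0
for k in range(int(20 / dx)):      # midpoint Riemann sum on [0, 20];
    x = (k + 0.5) * dx             # the tail beyond 20 is negligible
    total += x * lam * math.exp(-lam * x) * dx
print(total)  # ≈ 0.5 = 1/lam
```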
=== General case ===
In general, if $X$ is a non-negative random variable defined on a probability space $(\Omega, \Sigma, \operatorname{P})$, then the expected value of $X$, denoted by $\operatorname{E}[X]$, is defined as the Lebesgue integral

$$\operatorname{E}[X] = \int_{\Omega} X(\omega)\, d\operatorname{P}(\omega).$$
For a general random variable $X$, define as before $X^{+}(\omega) = \max(X(\omega), 0)$ and $X^{-}(\omega) = -\min(X(\omega), 0)$, and note that $X = X^{+} - X^{-}$, with both $X^{+}$ and $X^{-}$ nonnegative. Then, the expected value of $X$ is defined as

$$\operatorname{E}[X] = \begin{cases} \operatorname{E}[X^{+}] - \operatorname{E}[X^{-}] & \text{if } \operatorname{E}[X^{+}] < \infty \text{ and } \operatorname{E}[X^{-}] < \infty; \\ \infty & \text{if } \operatorname{E}[X^{+}] = \infty \text{ and } \operatorname{E}[X^{-}] < \infty; \\ -\infty & \text{if } \operatorname{E}[X^{+}] < \infty \text{ and } \operatorname{E}[X^{-}] = \infty; \\ \text{undefined} & \text{if } \operatorname{E}[X^{+}] = \infty \text{ and } \operatorname{E}[X^{-}] = \infty. \end{cases}$$
For multidimensional random variables, their expected value is defined per component, i.e.

$$\operatorname{E}[(X_1, \ldots, X_n)] = (\operatorname{E}[X_1], \ldots, \operatorname{E}[X_n])$$

and, for a random matrix $X$ with elements $X_{ij}$,

$$(\operatorname{E}[X])_{ij} = \operatorname{E}[X_{ij}].$$
== Uses and applications ==
It is possible to construct an expected value
equal to the probability of an event by taking
the expectation of an indicator function that
is one if the event has occurred and zero
otherwise. This relationship can be used to
translate properties of expected values into
properties of probabilities, e.g. using the
law of large numbers to justify estimating
probabilities by frequencies.
The expected values of the powers of X are
called the moments of X; the moments about
the mean of X are expected values of powers
of X − E[X]. The moments of some random
variables can be used to specify their distributions,
via their moment generating functions.
To empirically estimate the expected value
of a random variable, one repeatedly measures
observations of the variable and computes
the arithmetic mean of the results. If the
expected value exists, this procedure estimates
the true expected value in an unbiased manner
and has the property of minimizing the sum
of the squares of the residuals (the sum of
the squared differences between the observations
and the estimate). The law of large numbers
demonstrates (under fairly mild conditions)
that, as the size of the sample gets larger,
the variance of this estimate gets smaller.
This property is often exploited in a wide
variety of applications, including general
problems of statistical estimation and machine
learning, to estimate (probabilistic) quantities
of interest via Monte Carlo methods, since
most quantities of interest can be written
in terms of expectation, e.g. $\operatorname{P}(X \in \mathcal{A}) = \operatorname{E}[{\mathbf{1}}_{\mathcal{A}}]$, where ${\mathbf{1}}_{\mathcal{A}}$ is the indicator function of the set $\mathcal{A}$.
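This identity is exactly what a Monte Carlo probability estimate exploits: the sample mean of the indicator estimates the probability. A minimal sketch, where the uniform distribution and the event $A = [0.2, 0.5)$ are illustrative choices, not from the text:

```python
import random

# Monte Carlo estimate of P(X ∈ A) via P(X ∈ A) = E[1_A]: average the
# indicator of the event over many independent samples.  Here X is
# uniform on [0, 1) and A = [0.2, 0.5), so the true value is 0.3.
random.seed(1)
n = 200_000
estimate = sum(1 for _ in range(n) if 0.2 <= random.random() < 0.5) / n
print(estimate)  # ≈ 0.3
```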
In classical mechanics, the center of mass
is an analogous concept to expectation. For
example, suppose X is a discrete random variable
with values $x_i$ and corresponding probabilities $p_i$. Now consider a weightless rod on which are placed weights, at locations $x_i$ along the rod and having masses $p_i$ (whose sum is one). The point at which the rod balances is $\operatorname{E}[X]$.
Expected values can also be used to compute
the variance, by means of the computational
formula for the variance
$$\operatorname{Var}(X) = \operatorname{E}[X^{2}] - (\operatorname{E}[X])^{2}.$$
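For the fair-die example above, this computational formula gives the variance in a few lines:

```python
# Variance of a fair six-sided die via Var(X) = E[X^2] - (E[X])^2.
values = range(1, 7)
ex = sum(values) / 6                   # E[X]   = 3.5
ex2 = sum(x * x for x in values) / 6   # E[X^2] = 91/6
var = ex2 - ex**2
print(var)  # 35/12 ≈ 2.9167
```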
A very important application of the expectation
value is in the field of quantum mechanics.
The expectation value of a quantum mechanical
operator $\hat{A}$ operating on a quantum state vector $|\psi\rangle$ is written as

$$\langle \hat{A} \rangle = \langle \psi |{\hat{A}}|\psi \rangle.$$

The uncertainty in $\hat{A}$ can be calculated using the formula

$$(\Delta A)^{2} = \langle \hat{A}^{2} \rangle - \langle \hat{A} \rangle^{2}.$$
== Basic properties ==
The basic properties below replicate or follow immediately from those of the Lebesgue integral.
Let ${\mathbf{1}}_{A}$ denote the indicator function of an event $A$. Then

$$\operatorname{E}[{\mathbf{1}}_{A}] = 1 \cdot \operatorname{P}(A) + 0 \cdot \operatorname{P}(\Omega \setminus A) = \operatorname{P}(A).$$
If $X = Y$ (a.s.), then $\operatorname{E}[X] = \operatorname{E}[Y]$.

If $X = c$ (a.s.) for some constant $c \in [-\infty, +\infty]$, then $\operatorname{E}[X] = c$. In particular, for a random variable $X$ with well-defined expectation, $\operatorname{E}[\operatorname{E}[X]] = \operatorname{E}[X]$.

If $X \geq 0$ (a.s.), then $\operatorname{E}[X] \geq 0$.
Linearity of expectation: The expected value operator (or expectation operator) $\operatorname{E}[\cdot]$ is linear in the sense that, for any random variables $X$ and $Y$, and a constant $a$,

$$\begin{aligned} \operatorname{E}[X+Y] &= \operatorname{E}[X] + \operatorname{E}[Y], \\ \operatorname{E}[aX] &= a\operatorname{E}[X], \end{aligned}$$
whenever the right-hand side is well-defined.

The following statements regarding a random variable $X$ are equivalent:

$\operatorname{E}[X]$ exists and is finite.

Both $\operatorname{E}[X^{+}]$ and $\operatorname{E}[X^{-}]$ are finite.

$\operatorname{E}[|X|]$ is finite.

Sketch of proof: Indeed, $|X| = X^{+} + X^{-}$. By linearity, $\operatorname{E}[|X|] = \operatorname{E}[X^{+}] + \operatorname{E}[X^{-}]$.
For the reasons above, the expressions "$X$ is integrable" and "the expected value of $X$ is finite" are used interchangeably throughout this article.

Monotonicity: If $X \leq Y$ (a.s.), and both $\operatorname{E}[X]$ and $\operatorname{E}[Y]$ exist, then $\operatorname{E}[X] \leq \operatorname{E}[Y]$. The proof follows from the linearity and the previous property applied to $Z = Y - X$, since $Z \geq 0$ (a.s.).

Non-degeneracy: If $\operatorname{E}|X| = 0$, then $X = 0$ (a.s.).
If $\operatorname{E}[X] < +\infty$, then $X < +\infty$ (a.s.).

Corollary: if $\operatorname{E}[X] > -\infty$, then $X > -\infty$ (a.s.).

Corollary: if $\operatorname{E}|X| < \infty$, then $X \neq \pm\infty$ (a.s.).
For an arbitrary random variable $X$, $|\operatorname{E}[X]| \leq \operatorname{E}|X|$.
Proof. By the definition of the Lebesgue integral,

$$\begin{aligned} |\operatorname{E}[X]| &= \left|\operatorname{E}[X^{+}] - \operatorname{E}[X^{-}]\right| \leq \left|\operatorname{E}[X^{+}]\right| + \left|\operatorname{E}[X^{-}]\right| \\ &= \operatorname{E}[X^{+}] + \operatorname{E}[X^{-}] = \operatorname{E}[X^{+} + X^{-}] \\ &= \operatorname{E}|X|. \end{aligned}$$
This result can also be proved based on Jensen's
inequality.
=== Non-multiplicativity ===
In general, the expected value operator is not multiplicative, i.e. $\operatorname{E}[XY]$ is not necessarily equal to $\operatorname{E}[X] \cdot \operatorname{E}[Y]$. Indeed, let $X$ assume the values of 1 and −1 with probability 0.5 each. Then

$$\left(\operatorname{E}[X]\right)^{2} = \left({\frac{1}{2}}\cdot(-1) + {\frac{1}{2}}\cdot 1\right)^{2} = 0,$$

and

$$\operatorname{E}[X^{2}] = {\frac{1}{2}}\cdot(-1)^{2} + {\frac{1}{2}}\cdot 1^{2} = 1, \quad \text{so } \operatorname{E}[X^{2}] \neq \left(\operatorname{E}[X]\right)^{2}.$$

However, if $X$ and $Y$ are independent, then $\operatorname{E}[XY] = \operatorname{E}[X]\operatorname{E}[Y]$.
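Both sides of this contrast can be sketched with a simulation (the sample size and seed are arbitrary choices):

```python
import random

# E[XY] vs E[X]E[Y]: X and Y take values ±1 with probability 1/2 each.
random.seed(2)
n = 100_000
xs = [random.choice([-1, 1]) for _ in range(n)]
ys = [random.choice([-1, 1]) for _ in range(n)]

# Independent X and Y: E[XY] = E[X]E[Y] = 0, so the sample mean of
# the products is near 0.
prod_mean = sum(x * y for x, y in zip(xs, ys)) / n
print(prod_mean)  # ≈ 0

# Dependent case Y = X: E[X·X] = E[X^2] = 1, while E[X]·E[X] = 0.
sq_mean = sum(x * x for x in xs) / n
print(sq_mean)  # exactly 1.0, since x*x is always 1
```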
=== Counterexample: $\operatorname{E}[X_i] \not\to \operatorname{E}[X]$ despite $X_i \to X$ pointwise ===
Let $\left([0,1], {\mathcal{B}}_{[0,1]}, {\mathrm{P}}\right)$ be the probability space, where ${\mathcal{B}}_{[0,1]}$ is the Borel $\sigma$-algebra on $[0,1]$ and ${\mathrm{P}}$ the linear Lebesgue measure. For $i \geq 1,$ define a sequence of random variables

$$X_i = i \cdot {\mathbf{1}}_{\left[0, {\frac{1}{i}}\right]}$$

and a random variable

$$X = \begin{cases} +\infty & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases}$$

on $[0,1]$, with ${\mathbf{1}}_{S}$ being the indicator function of the set $S \subseteq [0,1]$.

For every $x \in [0,1],$ as $i \to +\infty,$ $X_i(x) \to X(x),$ and

$$\operatorname{E}[X_i] = i \cdot {\mathrm{P}}\left(\left[0, {\frac{1}{i}}\right]\right) = i \cdot {\frac{1}{i}} = 1,$$

so $\lim_{i\to\infty} \operatorname{E}[X_i] = 1.$ On the other hand, ${\mathrm{P}}(\{0\}) = 0,$ and hence $\operatorname{E}[X] = 0.$
=== Countable non-additivity ===
In general, the expected value operator is not $\sigma$-additive, i.e.

$$\operatorname{E}\left[\sum_{i=0}^{\infty} X_i\right] \neq \sum_{i=0}^{\infty} \operatorname{E}[X_i].$$
By way of counterexample, let $\left([0,1], {\mathcal{B}}_{[0,1]}, {\mathrm{P}}\right)$ be the probability space, where ${\mathcal{B}}_{[0,1]}$ is the Borel $\sigma$-algebra on $[0,1]$ and ${\mathrm{P}}$ the linear Lebesgue measure. Define a sequence of random variables

$$X_i = (i+1) \cdot {\mathbf{1}}_{\left[0, {\frac{1}{i+1}}\right]} - i \cdot {\mathbf{1}}_{\left[0, {\frac{1}{i}}\right]}$$

on $[0,1]$, with ${\mathbf{1}}_{S}$ being the indicator function of the set $S \subseteq [0,1]$. For the pointwise sums, we have

$$\sum_{i=0}^{n} X_i = (n+1) \cdot {\mathbf{1}}_{\left[0, {\frac{1}{n+1}}\right]}, \qquad \sum_{i=0}^{\infty} X_i(x) = \begin{cases} +\infty & \text{if } x = 0 \\ 0 & \text{otherwise.} \end{cases}$$

By finite additivity,

$$\sum_{i=0}^{\infty} \operatorname{E}[X_i] = \lim_{n\to\infty} \sum_{i=0}^{n} \operatorname{E}[X_i] = \lim_{n\to\infty} \operatorname{E}\left[\sum_{i=0}^{n} X_i\right] = 1.$$

On the other hand, ${\mathrm{P}}(\{0\}) = 0,$ and hence

$$\operatorname{E}\left[\sum_{i=0}^{\infty} X_i\right] = 0 \neq 1 = \sum_{i=0}^{\infty} \operatorname{E}[X_i].$$
=== Countable additivity for non-negative
random variables ===
Let $\{X_i\}_{i=0}^{\infty}$ be non-negative random variables. It follows from the monotone convergence theorem that

$$\operatorname{E}\left[\sum_{i=0}^{\infty} X_i\right] = \sum_{i=0}^{\infty} \operatorname{E}[X_i].$$
== Inequalities ==
=== Cauchy–Bunyakovsky–Schwarz inequality
===
The Cauchy–Bunyakovsky–Schwarz inequality states that

$$(\operatorname{E}[XY])^{2} \leq \operatorname{E}[X^{2}] \cdot \operatorname{E}[Y^{2}].$$
=== Markov's inequality ===
For a nonnegative random variable $X$ and $a > 0$, Markov's inequality states that

$$\operatorname{P}(X \geq a) \leq {\frac{\operatorname{E}[X]}{a}}.$$
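The inequality can be checked empirically; the exponential distribution below is an illustrative choice, not from the text (it has $\operatorname{E}[X] = 1$ and $\operatorname{P}(X \geq a) = e^{-a}$, comfortably under the bound):

```python
import random

# Empirical check of Markov's inequality P(X ≥ a) ≤ E[X]/a for a
# nonnegative random variable; here X ~ Exponential(1).
random.seed(3)
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]
mean = sum(samples) / n

results = {}
for a in (1.0, 2.0, 4.0):
    tail = sum(1 for s in samples if s >= a) / n  # empirical P(X ≥ a)
    results[a] = (tail, mean / a)                 # (probability, bound)
    print(a, tail, mean / a)
```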
=== Bienaymé–Chebyshev inequality ===
Let $X$ be an arbitrary random variable with finite expected value $\operatorname{E}[X]$ and finite variance $\operatorname{Var}[X] \neq 0$. The Bienaymé–Chebyshev inequality states that, for any real number $k > 0$,

$$\operatorname{P}\left(\left|X - \operatorname{E}[X]\right| \geq k{\sqrt{\operatorname{Var}[X]}}\right) \leq {\frac{1}{k^{2}}}.$$
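A quick empirical check of this bound, using a uniform distribution on $[0, 1)$ as an illustrative choice (not from the text):

```python
import random

# Empirical check of the Bienaymé–Chebyshev inequality
# P(|X - E[X]| ≥ k·σ) ≤ 1/k², with X uniform on [0, 1).
random.seed(4)
n = 100_000
samples = [random.random() for _ in range(n)]
mean = sum(samples) / n
sigma = (sum((s - mean) ** 2 for s in samples) / n) ** 0.5

checks = {}
for k in (1.5, 2.0, 3.0):
    frac = sum(1 for s in samples if abs(s - mean) >= k * sigma) / n
    checks[k] = (frac, 1 / k**2)  # (empirical probability, bound)
    print(k, frac, 1 / k**2)
```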
=== Jensen's inequality ===
Let $f: {\mathbb{R}} \to {\mathbb{R}}$ be a Borel convex function and $X$ a random variable such that $\operatorname{E}|X| < \infty$. Jensen's inequality states that

$$f(\operatorname{E}(X)) \leq \operatorname{E}(f(X)).$$
Remark 1. The expected value $\operatorname{E}(f(X))$ is well-defined even if $X$ is allowed to assume infinite values. Indeed, $\operatorname{E}|X| < \infty$ implies that $X \neq \pm\infty$ (a.s.), so the random variable $f(X(\omega))$ is defined almost surely, and therefore there is enough information to compute $\operatorname{E}(f(X)).$
Remark 2. Jensen's inequality implies that $|\operatorname{E}[X]| \leq \operatorname{E}|X|$, since the absolute value function is convex.
=== Lyapunov's inequality ===
Let $0 < s < t$. Lyapunov's inequality states that

$$\left(\operatorname{E}|X|^{s}\right)^{1/s} \leq \left(\operatorname{E}|X|^{t}\right)^{1/t}.$$
Proof. Applying Jensen's inequality to $|X|^{s}$ and $g(x) = |x|^{t/s}$, obtain

$$\left|\operatorname{E}|X|^{s}\right|^{t/s} \leq \operatorname{E}\left(|X|^{s}\right)^{t/s} = \operatorname{E}|X|^{t}.$$

Taking the $t$th root of each side completes the proof.
Corollary.
E
⁡
|
X
|
≤
(
E
⁡
|
X
|
2
)
1
/
2
≤
⋯
≤
(
E
⁡
|
X
|
n
)
1
/
n
≤
⋯
{\displaystyle \operatorname {E} |X|\leq {\Bigl
(}\operatorname {E} |X|^{2}{\Bigr )}^{1/2}\leq
\cdots \leq {\Bigl (}\operatorname {E} |X|^{n}{\Bigr
)}^{1/n}\leq \cdots }
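The corollary's chain of moment norms can be verified numerically; this sketch (illustrative, not from the article) uses a fair six-sided die, but any distribution with finite moments works:

```python
# Check the nondecreasing chain E|X| <= (E|X|^2)^{1/2} <= (E|X|^3)^{1/3} <= ...
# from Lyapunov's inequality, for a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6
norms = [sum(p * abs(x) ** n for x in outcomes) ** (1 / n) for n in range(1, 7)]
# norms[0] = 3.5, norms[1] ~ 3.89, and the sequence keeps increasing
assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
```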
=== Hölder's inequality ===
Let p and q satisfy 1 ≤ p ≤ ∞, 1 ≤ q ≤ ∞, and 1/p + 1/q = 1. Hölder's inequality states that
{\displaystyle \operatorname {E} |XY|\leq (\operatorname {E} |X|^{p})^{1/p}(\operatorname {E} |Y|^{q})^{1/q}.}
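A quick numerical check (the joint distribution below is made up for illustration), using the conjugate pair p = q = 2, which is the Cauchy–Schwarz special case:

```python
# Numerical check of Hölder's inequality E|XY| <= (E|X|^p)^{1/p} (E|Y|^q)^{1/q}
# with p = q = 2 on a small, made-up joint distribution.
pairs = [((1, 2), 0.25), ((2, -1), 0.25), ((-3, 1), 0.5)]  # ((x, y), prob)
p, q = 2, 2  # conjugate exponents: 1/p + 1/q = 1

E_XY = sum(pr * abs(x * y) for (x, y), pr in pairs)                # 2.5
norm_X = sum(pr * abs(x) ** p for (x, _), pr in pairs) ** (1 / p)  # about 2.40
norm_Y = sum(pr * abs(y) ** q for (_, y), pr in pairs) ** (1 / q)  # about 1.32
assert E_XY <= norm_X * norm_Y + 1e-12
```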
=== Minkowski inequality ===
Let p be a real number satisfying 1 ≤ p ≤ ∞. Suppose, in addition, that E|X|^p < ∞ and E|Y|^p < ∞. Then, according to the Minkowski inequality, E|X+Y|^p < ∞ and
{\displaystyle {\Bigl (}\operatorname {E} |X+Y|^{p}{\Bigr )}^{1/p}\leq {\Bigl (}\operatorname {E} |X|^{p}{\Bigr )}^{1/p}+{\Bigl (}\operatorname {E} |Y|^{p}{\Bigr )}^{1/p}.}
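The inequality is the triangle inequality for L^p norms; a numerical sketch (with made-up numbers) for p = 3:

```python
# Numerical check of the Minkowski inequality
# (E|X+Y|^p)^{1/p} <= (E|X|^p)^{1/p} + (E|Y|^p)^{1/p} with p = 3.
pairs = [((1.0, 2.0), 0.2), ((-2.0, 0.5), 0.3), ((3.0, -1.0), 0.5)]  # ((x, y), prob)
p = 3

def lp_norm(weighted_values):
    # (E|V|^p)^{1/p} for a finite weighted list of values
    return sum(pr * abs(v) ** p for v, pr in weighted_values) ** (1 / p)

lhs = lp_norm([(x + y, pr) for (x, y), pr in pairs])
rhs = lp_norm([(x, pr) for (x, _), pr in pairs]) + lp_norm([(y, pr) for (_, y), pr in pairs])
assert lhs <= rhs + 1e-12
```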
== Taking limits under the E sign ==
=== Monotone convergence theorem ===
Let the sequence of random variables {X_n} and the random variables X and Y be defined on the same probability space (Ω, Σ, P). Suppose that
all the expected values E[X_n], E[X], and E[Y] are defined (differ from ∞ − ∞);
E[Y] > −∞;
for every n, −∞ ≤ Y ≤ X_n ≤ X_{n+1} ≤ +∞ (a.s.);
X is the pointwise limit of {X_n} (a.s.), i.e. X(ω) = lim_n X_n(ω) (a.s.).
The monotone convergence theorem states that
{\displaystyle \lim _{n}\operatorname {E} [X_{n}]=\operatorname {E} [X].}
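A standard way the theorem is used is with the truncations X_n = min(X, n), which increase pointwise to X. The following sketch (illustrative, not from the article) checks this for a geometric distribution:

```python
# MCT sketch: X_n = min(X, n) increases pointwise to X, so E[X_n] -> E[X].
# X is geometric on {1, 2, ...} with P(X = k) = 2^(-k), hence E[X] = 2;
# the tail beyond k = 60 is numerically negligible.
probs = {k: 2.0 ** -k for k in range(1, 61)}
EX = sum(k * p for k, p in probs.items())
EXn = [sum(min(k, n) * p for k, p in probs.items()) for n in range(1, 31)]
assert all(a <= b for a, b in zip(EXn, EXn[1:]))  # E[X_n] is nondecreasing
assert abs(EXn[-1] - EX) < 1e-6                   # and converges to E[X]
```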
=== Fatou's lemma ===
Let the sequence of random variables {X_n} and the random variable Y be defined on the same probability space (Ω, Σ, P). Suppose that
all the expected values E[X_n], E[lim inf_n X_n], and E[Y] are defined (differ from ∞ − ∞);
E[Y] > −∞;
−∞ ≤ Y ≤ X_n ≤ +∞ (a.s.), for every n.
Fatou's lemma states that
{\displaystyle \operatorname {E} [\liminf _{n}X_{n}]\leq \liminf _{n}\operatorname {E} [X_{n}].}
(lim inf_n X_n is a random variable by the properties of limit inferior.)
Corollary. Let
X_n → X pointwise (a.s.);
E[X_n] ≤ C, for some constant C (independent of n);
E[Y] > −∞;
−∞ ≤ Y ≤ X_n ≤ +∞ (a.s.), for every n.
Then E[X] ≤ C.
Proof is by observing that X = lim inf_n X_n (a.s.) and applying Fatou's lemma.
=== Dominated convergence theorem ===
Let {X_n}_n be a sequence of random variables. Suppose that X_n → X pointwise (a.s.), that |X_n| ≤ Y ≤ +∞ (a.s.), and that E[Y] < ∞. Then, according to the dominated convergence theorem,
the function X is measurable (hence a random variable);
E|X| < ∞;
all the expected values E[X_n] and E[X] are defined (do not have the form ∞ − ∞);
lim_n E[X_n] = E[X] (both sides may be infinite);
{\displaystyle \lim _{n}\operatorname {E} |X_{n}-X|=0.}
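A small illustration (constructed for this sketch, not from the article): perturb a die roll X by a vanishing deterministic term, so the sequence converges pointwise and is dominated by the integrable bound Y = |X| + 1:

```python
# DCT sketch: X_n = X + (-1)^n / n converges pointwise to X and satisfies
# |X_n| <= |X| + 1 = Y with E[Y] < infinity, so E[X_n] -> E[X].
probs = {k: 1 / 6 for k in range(1, 7)}  # fair six-sided die
EX = sum(k * p for k, p in probs.items())  # 3.5
EXn = [sum((k + (-1) ** n / n) * p for k, p in probs.items())
       for n in range(1, 1001)]
# E[X_n] = E[X] + (-1)^n / n, so the gap at n = 1000 is exactly 1/1000
assert abs(EXn[-1] - EX) < 1.1e-3
```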
=== Uniform integrability ===
In some cases, the equality lim_n E[X_n] = E[lim_n X_n] holds when the sequence {X_n} is uniformly integrable.
== Relationship with characteristic function ==
The probability density function f_X of a scalar random variable X is related to its characteristic function φ_X by the inversion formula:
{\displaystyle f_{X}(x)={\frac {1}{2\pi }}\int _{\mathbb {R} }e^{-itx}\varphi _{X}(t)\,dt.}
For the expected value of g(X) (where g : R → R is a Borel function), we can use this inversion formula to obtain
{\displaystyle \operatorname {E} [g(X)]={\frac {1}{2\pi }}\int _{\mathbb {R} }g(x)\left[\int _{\mathbb {R} }e^{-itx}\varphi _{X}(t)\,dt\right]\,dx.}
If E[g(X)] is finite, changing the order of integration, we get, in accordance with the Fubini–Tonelli theorem,
{\displaystyle \operatorname {E} [g(X)]={\frac {1}{2\pi }}\int _{\mathbb {R} }G(t)\varphi _{X}(t)\,dt,}
where
{\displaystyle G(t)=\int _{\mathbb {R} }g(x)e^{-itx}\,dx}
is the Fourier transform of g(x). The expression for E[g(X)] also follows directly from the Plancherel theorem.
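The inversion formula can be checked numerically; this sketch (illustrative, not from the article) recovers the standard normal density at x = 0 from its characteristic function φ_X(t) = e^{-t²/2}:

```python
import math

# Inversion-formula sketch: f_X(0) = (1/2π) ∫ φ_X(t) dt for a standard
# normal, since e^{-itx} = 1 at x = 0 and the integrand is then real.
n, upper = 100_000, 20.0  # integrate over [-20, 20]; the tail is negligible
dt = 2 * upper / n
integral = sum(math.exp(-(-upper + (i + 0.5) * dt) ** 2 / 2) * dt
               for i in range(n))  # midpoint rule for ∫ e^{-t^2/2} dt
f0 = integral / (2 * math.pi)
assert abs(f0 - 1 / math.sqrt(2 * math.pi)) < 1e-9  # density at 0 is 1/sqrt(2π)
```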
== The law of the unconscious statistician ==
The expected value of a measurable function of X, g(X), given that X has a probability density function f(x), is given by the inner product of f and g:
{\displaystyle \operatorname {E} [g(X)]=\int _{\mathbb {R} }g(x)f(x)\,dx.}
This formula also holds in the multidimensional case, when g is a function of several random variables, and f is their joint density.
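The formula can be sketched numerically (an illustration, not from the article) with X uniform on [0, 1] and g(x) = x², whose exact expectation is 1/3:

```python
# LOTUS sketch: E[g(X)] = ∫ g(x) f(x) dx for X uniform on [0, 1]
# (density f(x) = 1) and g(x) = x^2; the exact value is 1/3.
n = 100_000
dx = 1.0 / n
approx = sum(((i + 0.5) * dx) ** 2 * 1.0 * dx for i in range(n))  # midpoint rule
assert abs(approx - 1 / 3) < 1e-9
```

The point of the law is that one never needs the distribution of g(X) itself, only the density of X.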
== Alternative formula for expected value ==
=== Formula for non-negative random variables ===
==== Finite and countably infinite case ====
For a non-negative integer-valued random variable X : Ω → {0, 1, 2, 3, …} ∪ {+∞},
{\displaystyle \operatorname {E} [X]=\sum _{i=1}^{\infty }\operatorname {P} (X\geq i).}
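The tail-sum identity is easy to verify exactly for a small distribution; this sketch (illustrative, not from the article) uses a fair six-sided die:

```python
# Check the tail-sum formula E[X] = Σ_{i>=1} P(X >= i) for a fair die,
# where the sum has only six nonzero terms.
probs = {k: 1 / 6 for k in range(1, 7)}
EX = sum(k * p for k, p in probs.items())  # 3.5
tail_sum = sum(sum(p for k, p in probs.items() if k >= i) for i in range(1, 7))
# tail_sum = 1 + 5/6 + 4/6 + 3/6 + 2/6 + 1/6 = 3.5
assert abs(EX - tail_sum) < 1e-12
```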
==== General case ====
If X : Ω → [0, +∞] is a non-negative random variable, then
{\displaystyle \operatorname {E} [X]=\int \limits _{[0,+\infty )}\operatorname {P} (X\geq x)\,dx=\int \limits _{[0,+\infty )}\operatorname {P} (X>x)\,dx,}
and
{\displaystyle \operatorname {E} [X]={\hbox{(R)}}\int \limits _{0}^{\infty }\operatorname {P} (X\geq x)\,dx={\hbox{(R)}}\int \limits _{0}^{\infty }\operatorname {P} (X>x)\,dx,}
where (R)∫₀^∞ denotes an improper Riemann integral.
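A numerical sketch of the survival-function integral (illustrative, not from the article), for an exponential random variable with rate 1, where P(X > x) = e^{-x} and E[X] = 1 exactly:

```python
import math

# Check E[X] = ∫_0^∞ P(X > x) dx for X exponential with rate 1:
# the survival function is P(X > x) = e^{-x}, and E[X] = 1.
n, upper = 200_000, 40.0  # the tail beyond x = 40 is numerically negligible
dx = upper / n
approx = sum(math.exp(-(i + 0.5) * dx) * dx for i in range(n))  # midpoint rule
assert abs(approx - 1.0) < 1e-6
```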
=== Formula for non-positive random variables ===
If X : Ω → [−∞, 0] is a non-positive random variable, then
{\displaystyle \operatorname {E} [X]=-\int \limits _{(-\infty ,0]}\operatorname {P} (X\leq x)\,dx=-\int \limits _{(-\infty ,0]}\operatorname {P} (X<x)\,dx,}
and
{\displaystyle \operatorname {E} [X]=-{\hbox{(R)}}\int \limits _{-\infty }^{0}\operatorname {P} (X\leq x)\,dx=-{\hbox{(R)}}\int \limits _{-\infty }^{0}\operatorname {P} (X<x)\,dx,}
where (R)∫ denotes an improper Riemann integral. This formula follows from the formula for the non-negative case applied to −X.
If, in addition, X is integer-valued, i.e. X : Ω → {…, −3, −2, −1, 0} ∪ {−∞}, then
{\displaystyle \operatorname {E} [X]=-\sum _{i=-1}^{-\infty }\operatorname {P} (X\leq i).}
=== General case ===
If X can be both positive and negative, then E[X] = E[X_+] − E[X_−], and the above results may be applied to the positive part X_+ and the negative part X_− separately.
== See also ==
Center of mass
Central tendency
Chebyshev's inequality (an inequality on location and scale parameters)
Conditional expectation
Expected value is also a key concept in economics, finance, and many other subjects
The general term expectation
Expectation value (quantum mechanics)
Law of total expectation – the expected value of the conditional expected value of X given Y is the same as the expected value of X
Moment (mathematics)
Nonlinear expectation (a generalization of the expected value)
Wald's equation (for calculating the expected value of a random number of random variables)
