We’ve seen how we can describe light using
different models, each model more complicated
and more complete than the preceding one.
The simplest model is the ray model, which
says that each object point scatters rays
of light in all directions, and that each
ray travels in a straight line. With this
model, the pinhole camera can be explained.
In the double-slit experiment, one can observe
interference fringes which cannot be explained
using the ray model. Rather, to explain this
phenomenon we have to model light as a wave.
Later we found that light has a property called
polarization, which can be demonstrated using
polarization filters: depending on the orientation
of the filter, different amounts of light
can be transmitted. So we found that light
is described using a vectorial wave equation.
More specifically, it was found that light
is an electromagnetic wave, and its wave equation
can be derived from Maxwell’s equations
that govern electromagnetism.
In classical physics, this is the most complete
and accurate model of light. However, in the
early twentieth century, certain observations
indicated that light comes in discrete energy
packets, or quanta, which is not predicted
by Maxwell’s equations. One such observation
comes from blackbody radiation. To accurately
describe the radiation spectrum of a blackbody
as a function of its temperature, it was assumed
that this radiation is emitted or absorbed
in discrete packets of energy.
Another observation was the photoelectric
effect. When shining light on a metal surface,
electrons are emitted, and to explain why this
emission depends on the frequency of the light
rather than on its intensity,
it was assumed that light comes in discrete
packets of energy, called photons.
A third observation was Compton scattering.
If light hits a particle from a certain direction,
the light gets scattered in one direction
while at the same time getting a longer wavelength,
and the particle recoils in another direction.
The way this scattering occurs could be explained
by assuming that light is quantized.
So we have these two conflicting models of
light, both well-supported by experiment:
one stating that light is an electromagnetic
wave described by Maxwell’s equations, the
other stating that light acts more like a
particle, coming in discrete packets of energy
called quanta, or photons. Naturally, the
question arises: how do we reconcile these
two models?
The short answer would be: quantum mechanics.
We will elaborate on this answer in four parts.
First, we need to understand how in quantum
mechanics particles have a wave-like character.
For example, an electron orbiting the nucleus
of an atom isn’t really like a planet orbiting
the Sun. Rather, it is described by a wave function
that is present all around the nucleus.
Next, we need to understand how we can assign
observables to a particle if it is described
by a wavefunction. More specifically, if we
could imagine an electron as a point particle
orbiting the nucleus, we could at each point
in time assign it a position and a momentum.
But if it is instead described as a wave function
that is always spread out over space, then
what would its position or momentum be? We
will see that the position isn’t a vector
anymore, but an operator, and just like there
is a position operator, there is also a momentum
operator.
Once we know how to describe the behavior
of a particle using quantum mechanical theory,
we will study a particle in a particular system,
namely the harmonic oscillator. In a classical
harmonic oscillator, the particle would oscillate
the same way an electric field in an electromagnetic
wave would oscillate. So once we know how
to treat the harmonic oscillator quantum mechanically,
we’ll be able to quantize the electromagnetic
field.
Let’s see how we can describe a particle
as a wave. We can imagine some wave function
psi(x) that somehow describes the particle.
In quantum mechanics, this wave function is
typically denoted using special brackets.
The reason we want to do this is to distinguish
the physical object from a particular representation.
For example, we can write down the wave function
as a function of position, but we could also
take the Fourier transform and write the same
wave function as a function of spatial frequency.
We’re still describing the same particle,
and nothing physically changed, but we simply
represent the same physical object in two
different ways.
As another example of how the same physical
object can be represented in different ways,
consider some vector. On its own it is a physical
object, but we cannot yet represent
it with numbers. Once we choose a certain
coordinate system, or basis vectors, we can
express the vector in terms of these basis
vectors. But if we choose a different coordinate
system, or different basis vectors, then we
will use different numbers to represent the
same physical object.
More generally, we can describe it as follows.
If we have two basis vectors e_1 and e_2,
then we can represent the vector psi by projecting
it onto these basis vectors. So the first
coefficient is given by the inner product
of psi with e_1, and the second coefficient
is given by the inner product of psi with
e_2. We could write these inner products more
explicitly by multiplying a row vector with
a column vector. In quantum mechanics, the
row vector is called a bra-vector, and the
column vector is called a ket-vector, so that
the inner product of two vectors forms a bracket.
In the example we just saw, psi was a simple
two-dimensional vector, but the same holds
when psi is a function of a continuous variable.
In this case, the position-representation
of psi is found by taking the inner product
of psi with position basis vectors. This inner
product can be written as an integral, where
the basis vector for a position x is (in the
position basis) given by a delta function
that peaks at x. Similarly, the frequency
representation is found by taking the inner
product of psi with frequency basis vectors,
where the basis vector for a spatial frequency
f_x is (in the position basis) given by a
plane wave with period 1/f_x. In the frequency
basis, the frequency basis vectors would be
given by delta functions.
So we’ve seen how the same physical wave
function can be represented in different ways
by projecting the wave function onto different
basis vectors. But the more important question
is: if this wave function is supposed to represent
a particle, then how do we assign particle-like
observables (such as position and momentum)
to a wave function? After all, a point particle
has one certain position at a certain point
in time, but a wave function can be spread
out throughout space.
What we can do is try to imagine what the
wave function should look like if it were
to have a definite position or definite momentum.
A wave function with a definite position would
have to be a delta function: if the delta
function peaks at some position x_0, then
we could unambiguously say that the particle
with that wave function has a position x_0.
So what would the wave function of a particle
with definite momentum look like? According
to De Broglie’s hypothesis, the momentum
of a particle is related to its wavelength:
a particle with wavelength lambda has a momentum
of Planck’s constant h divided by the wavelength.
We can rewrite this expression in terms of
the reduced Planck’s constant h-bar, and
the wave number.
So a particle with definite momentum should
have a definite wavelength, which means its
wave function should be a plane wave, and
its wave number is equal to its momentum divided
by h-bar. Note that this means that in quantum
mechanics, a particle cannot have a definite
position and momentum at the same time, because
a wave function cannot simultaneously be a
delta peak and a plane wave.
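Written out explicitly (a sketch; overall normalization constants are left out), the two special wave functions and de Broglie's relation are

\psi_{x_0}(x) = \delta(x - x_0), \qquad \psi_{p_0}(x) \propto e^{i k x} = e^{i p_0 x/\hbar}, \qquad p = \frac{h}{\lambda} = \hbar k, \quad k = \frac{2\pi}{\lambda}.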
We can now find how the momentum-representation
is related to the position-representation
of the wave function. By taking the inner
product of the wave function with the momentum
states (both in the position representation),
we find that the momentum representation psi-tilde
of p is given by the Fourier transform of
the position representation psi of x. Because
the x- and p-representations are related by
Fourier transform, they are called conjugate
variables.
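As a sketch, with one common choice of normalization (conventions differ by constant prefactors), this Fourier-transform relation reads

\tilde{\psi}(p) = \frac{1}{\sqrt{2\pi\hbar}} \int \psi(x)\, e^{-i p x/\hbar}\, dx.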
So in general a particle has a wave function
that is spread out in both the position and
momentum representation. But what does it
mean for a wave function to have multiple
positions and momenta? To answer that question,
we have to consider the squared modulus of
the wave function. Note that in terms of optics,
if psi(x) were an optical field, then the
squared modulus of psi would be the intensity,
which is what we would see or measure with
a camera. If we sent an optical field through
a double slit, then the intensity we would
measure on the screen shows interference fringes.
Now imagine that instead of light, we send
electrons through the double slit. On the
screen, we would detect the electrons one
at a time, at random but definite positions,
since electrons are particles. However, if
we collect electrons over a sufficiently long
time, we will find that an interference pattern
emerges. This is evidence that the electron,
while it is a particle, also exhibits wave-like
behavior. It also indicates how we should
interpret the wave function: the squared modulus
of the wave function describes a probability
distribution, according to which the electrons
hit the screen. If we measure enough electrons,
this probability distribution is revealed.
This statistical interpretation of the wave
function is called Born’s rule. Also note
that since modulus psi squared is a probability
distribution, it must integrate to 1. In other
words, the wave function must be normalized.
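In formulas, Born's rule and the normalization condition read

P(x) = |\psi(x)|^2, \qquad \int |\psi(x)|^2\, dx = 1.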
Let’s discuss another consequence of the
probabilistic interpretation of the wave function.
Let’s say the position of the electron is
described by the probability distribution
modulus psi(x) squared. Once we measure the
position of the electron (one way or another),
there is no uncertainty anymore in the position,
and so the probability distribution must collapse,
and accordingly, the wave function must collapse.
As the wave function collapses to a delta
function in the position representation, its
momentum representation must become a plane wave
of constant modulus, which means that while the
position is now completely certain, the momentum
is now completely uncertain.
So there is an uncertainty between position
and momentum: if we know the position, then
the momentum is uncertain and vice versa.
We can then ask the question: can we quantify
this uncertainty between position and momentum?
Fundamentally, the uncertainty comes from
the fact that the position and momentum representations
are related by Fourier transform, and if in
one domain you make the function narrower,
then in the other domain the function becomes
broader.
We can quantify this relation using the variance,
which is given by the square of the standard
deviation, which is denoted by sigma. To calculate
the variance in x, we integrate the squared
distance from x to its mean value, weighted by the
probability distribution of x, which we just
saw is given by the squared modulus of psi
of x. Similarly we can write down the expression
for the variance in p. Using the fact that
psi of x and psi tilde of p are related by
Fourier transform, one can derive the following
relation between sigma x and sigma p: the
product of the two must always be larger than
or equal to h-bar over two. This mathematical
result is referred to as the Kennard inequality,
while its physical implications regarding
position and momentum are known as Heisenberg’s
uncertainty principle.
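Written out, the variances and the resulting inequality are

\sigma_x^2 = \int (x - \langle x \rangle)^2\, |\psi(x)|^2\, dx, \qquad \sigma_p^2 = \int (p - \langle p \rangle)^2\, |\tilde{\psi}(p)|^2\, dp, \qquad \sigma_x\, \sigma_p \ge \frac{\hbar}{2}.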
We have seen how we can describe a particle
using a wave function, how this leads to uncertainties
in position and momentum, and how these uncertainties
are related. The next question is: how do
we use these observables in calculations,
given that they have inherent uncertainties?
For example, if we have a quadratic energy
potential, which is proportional to x squared,
what would it mean for a particle that doesn’t
have a definite x? Or what if we want to calculate
the kinetic energy which is proportional to
p squared? How would we do that if p is not
well defined?
Let’s first look at the simplest case: what
if we have a state with definite position
x_0, or a state with definite momentum p_0?
In this case we can easily define the value
for the position, which is x_0, or the value
for the momentum, which is p_0. The values
for the potential and kinetic energies are
then also calculated straightforwardly by
plugging the value of x_0 or p_0 in the expression
for the potential or kinetic energy.
Now let’s suppose we have a state which
is the superposition of two states with definite
position, x_0 and x_1, or a state which is
the superposition of two states with definite
momentum p_0 and p_1. What would be ‘the
position’ or ‘the momentum’ of such
a state? We cannot simply assign a single
value to the position observable, or a single
value to the momentum observable. Rather,
observables now become operators that can
act on the wave function, and these operators
are indicated with a hat. When you apply the
x-operator to the wave function, then the
state with position x_0 will get multiplied
by x_0, and the state with position x_1 will
get multiplied by x_1, and the same goes for
the momentum operator. By the same reasoning,
the potential energy and kinetic energy now
also become operators, and if you apply the
potential energy operator to the wave function,
then the state with position x_0 will get
multiplied by the potential energy at x_0,
et cetera.
Let’s now extend these results to arbitrary
wave functions. If we have a wave function
psi, we can represent it in the position basis
by taking the inner product with states of
definite position, or we can represent it
in the momentum basis by taking the inner
product with states of definite momentum.
If we apply the position operator to psi,
and then express the result in the position
basis, then we have simply multiplied psi(x)
with x, and similarly, applying the momentum
operator corresponds to a multiplication with
p when in the momentum basis. If we apply
the potential energy operator to the wave
function, then we multiply by V(x) in the
position basis, and if we apply the kinetic
energy operator, then we multiply by T(p)
in the momentum basis.
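In bra-ket notation, these statements read

\langle x|\hat{x}|\psi\rangle = x\,\psi(x), \quad \langle p|\hat{p}|\psi\rangle = p\,\tilde{\psi}(p), \quad \langle x|\hat{V}|\psi\rangle = V(x)\,\psi(x), \quad \langle p|\hat{T}|\psi\rangle = T(p)\,\tilde{\psi}(p).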
So we’ve seen how to define the potential
and kinetic energy in quantum mechanics, namely
as operators that act on a wave function.
But now let’s say we want to calculate the
total energy H, which is the sum of the potential
and kinetic energy. Quantum mechanically,
H becomes an operator that acts on a wave
function psi, and we can express the result
in the position basis by projecting it onto
states with definite position. We can write
H as a sum of V and T, and what we find then
is that we need to apply T to psi in the position
basis. Now we’ve already seen how to apply
T in the momentum basis, but how do we apply
it in the position basis? Or more fundamentally,
how do we apply the momentum operator in the
position basis?
There are two ways to find the answer to this
question. One way is to simply apply p in
the momentum basis, and then express the result
in the position basis by inverse Fourier transforming
it. So let’s write down the inverse Fourier
transform of p times psi tilde of p. We can
substitute the factor p with a derivative
with respect to x, which can be taken out
of the integral. The integral can then be
expressed as psi of x, so we find that applying
the momentum operator in the position basis
means taking the derivative of the wave function.
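As a sketch of this calculation (using the same Fourier convention as before):

\frac{1}{\sqrt{2\pi\hbar}} \int p\, \tilde{\psi}(p)\, e^{i p x/\hbar}\, dp = -i\hbar\,\frac{d}{dx}\, \frac{1}{\sqrt{2\pi\hbar}} \int \tilde{\psi}(p)\, e^{i p x/\hbar}\, dp = -i\hbar\,\frac{d\psi(x)}{dx},

since p\, e^{i p x/\hbar} = -i\hbar\,\frac{d}{dx}\, e^{i p x/\hbar}.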
Let’s write down the same calculation in
a somewhat different notation to get a different
perspective on what we’ve done. We wanted
to apply the momentum operator to psi, and
express the result in the x-basis. We found
that this expression should be equal to an
integral over all momenta of a momentum state
expressed in the position basis, multiplied
by the momentum operator applied to psi, expressed
in the momentum basis. We can rewrite this
expression by observing that we only integrate
over p, and by comparing the left-hand side
of the equation to the right-hand side of
the equation, we conclude that this integral
must be the identity operator.
We can intuitively see why that must be the
case by looking at a simple two-dimensional
example. Let’s say we apply this operator
to a wave function psi which is now a two-dimensional
vector. The vectors e_1 and e_2 are orthogonal
basis vectors. When applying the operator
to psi, you compute the projections of psi
onto the basis vectors, so you decompose psi
into e_1 and e_2 components. Then you sum
the components back together, to find psi
again. By changing the order of the symbols
around, one can conveniently rewrite this
procedure of decomposing psi into orthogonal
components and adding them back together as
a single operator, which is the identity operator
since it does not change the wave function.
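In bra-ket notation, this is the completeness (resolution of the identity) relation, for the two-dimensional example and for the continuous momentum basis:

|e_1\rangle\langle e_1| + |e_2\rangle\langle e_2| = \hat{1}, \qquad \int |p\rangle\langle p|\, dp = \hat{1}.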
So we saw that applying the momentum operator
means taking the derivative in position space.
Another way to find the same result is by
considering the eigenvectors and eigenvalues
of the momentum operator. Let’s consider
once more the case where the state has a definite
position x_0 or a definite momentum p_0. We
saw that when applying the position or momentum
operator to such a state, the state is simply
multiplied by the corresponding position or
momentum value. In other words, if you have
an operator that corresponds to a certain
observable, then its eigenvectors correspond
to the state with a definite value for that
observable, and that value is given by the
eigenvalue.
So in the case of the momentum operator, we
know that in the position basis the eigenvectors
are given by plane waves, and the eigenvalues
are p_0. Indeed, these are also the eigenvectors
and eigenvalues of the operator -i h-bar times
the derivative with respect to x.
So we saw that in the position representation,
the position operator corresponds to a multiplication
with x, and the momentum operator corresponds
to taking the derivative. Using this result,
we can make the following observation: if
we first apply p and then x, we get something
different than if we first apply x and then
p. In other words: x and p do not commute.
We can make a more precise statement by defining
the commutator of x and p, which is the difference
between xp and px. If we apply this operator
to an arbitrary wave function psi, then we
can write out the derivative using the product
rule and cancel some terms to find that we’ve
multiplied psi with a constant factor of i
h-bar. This result is known as the canonical
commutation relation.
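Written out in the position representation, this check is a one-line calculation:

[\hat{x}, \hat{p}]\,\psi(x) = x\left(-i\hbar\,\frac{d\psi}{dx}\right) + i\hbar\,\frac{d}{dx}\big(x\,\psi(x)\big) = -i\hbar\, x\,\psi' + i\hbar\,\psi + i\hbar\, x\,\psi' = i\hbar\,\psi(x),

so [\hat{x}, \hat{p}] = i\hbar.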
Now let’s get back to our original question:
how do we calculate the total energy of a
particle quantum mechanically? The classical
expression for the total energy H, also called
the Hamiltonian, is given by the potential
energy plus the kinetic energy. In quantum
mechanics, this Hamiltonian becomes an operator,
and its expression in position space is found
by substituting the momentum by the momentum
operator, which is -i h-bar times the derivative
with respect to position. This means that the Hamiltonian
contains a second derivative, or in the case
of three-dimensional space, it contains a
Laplace operator.
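Explicitly, for a particle of mass m:

\hat{H} = \frac{\hat{p}^2}{2m} + V(\hat{x}) = -\frac{\hbar^2}{2m}\,\frac{d^2}{dx^2} + V(x), \qquad \text{or in three dimensions} \quad \hat{H} = -\frac{\hbar^2}{2m}\,\nabla^2 + V(\mathbf{r}).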
So we have found the operator for the total
energy. Remember the observation we made before:
the states of definite energy are given by
the eigenvectors of the energy operator, and
the possible values for the energy are given
by the eigenvalues of the energy operator.
One could use this equation to calculate the
energy levels of an electron in a hydrogen
atom: plug in the expression for the electrostatic
potential energy, and then solve for the eigenvalues
and eigenvectors. This procedure correctly
predicts the observed energy levels up to
a certain degree of accuracy, which is further
evidence that this is indeed the correct way
to treat a particle quantum mechanically.
We now know how to find the position, momentum,
and energy of a quantum mechanical wave function.
We now ask: how does the wave function evolve
in time? Remember that the entire idea of
quantum mechanics is to describe a particle
as a wave. Therefore, if we want to know how
the wave function of a particle evolves in
time, let’s recall how a normal, classical
wave evolves in time. A monochromatic field
has some complex amplitude phi that is a function
of position, and a time-dependent part that
evolves as e to the power -i omega t. The
time evolution of a polychromatic field is
found by summing all the monochromatic fields
together.
Now comes the quantum mechanics: according
to the Planck-Einstein relation, the angular
frequency and the energy of a particle are
related to each other. So if we substitute
omega by E over h-bar, then we have the expression
for how a quantum mechanical wave function
evolves in time. We can further rewrite the
expression by recalling that the energies
E_n are given by the eigenvalues of the Hamiltonian
H, and the functions phi_n are the eigenvectors
of H. So we can apply H to psi, and write
psi as the sum of monochromatic fields phi_n.
Because phi_n are eigenfunctions with eigenvalues
E_n, we can substitute H with E_n. The factor
of E_n can also be obtained by taking the
time derivative, and this derivative can be
taken outside of the summation. The summation
is the same as psi, so we now find the differential
equation that describes the time-evolution
of a quantum mechanical wave function. This
equation is known as the Schrodinger equation.
The eigenvalue equation that gives the energies
and the energy states is also referred to
as the time-independent Schrodinger equation.
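For reference, the two equations just described are

i\hbar\,\frac{\partial \psi}{\partial t} = \hat{H}\,\psi \quad \text{(time-dependent Schrodinger equation)}, \qquad \hat{H}\,\phi_n = E_n\,\phi_n \quad \text{(time-independent Schrodinger equation)},

with the general solution \psi(x,t) = \sum_n c_n\, \phi_n(x)\, e^{-i E_n t/\hbar}.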
We can draw an insightful analogy between
the time-independent Schrodinger equation
from quantum mechanics, and the Helmholtz
equation from classical optics. We can write
down the time-independent Schrodinger equation,
and rearrange its terms. We can then write
down the Helmholtz equation for a refractive
index n that changes as a function of position,
and we find that the two equations have the
same structure: in both cases, the wave function
phi is multiplied by a Laplacian plus some
spatially varying function.
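Side by side, writing k_0 = \omega/c for the vacuum wave number, the two equations are

\nabla^2 \phi + \frac{2m}{\hbar^2}\big(E - V(\mathbf{r})\big)\,\phi = 0 \quad \text{(time-independent Schrodinger)}, \qquad \nabla^2 \phi + k_0^2\, n^2(\mathbf{r})\,\phi = 0 \quad \text{(Helmholtz)}.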
So we can draw the following analogy: if light
goes from a medium with a high refractive
index like glass, to a medium with a low refractive
index like air, then the field doesn’t propagate
in the second medium if the angle of incidence
is higher than the critical angle. Instead,
there is an evanescent field that decays exponentially.
However, if there is a second medium with
a high refractive index, then light can still
propagate through the gap. This phenomenon
is known as Frustrated Total Internal Reflection.
To make the analogy with quantum mechanics,
we can write the potential energy V in terms
of the refractive index n to make the time-independent
Schrodinger equation the same as the Helmholtz
equation. We now see that a wave function
can propagate through a potential energy barrier
the same way light can propagate across a
gap of air between two glass blocks even at
an angle higher than the critical angle. In
quantum mechanics, this phenomenon is known
as quantum tunneling.
Now let’s summarize all that we have seen
so far on basic quantum mechanics. The fundamental
assumption is that we can describe the particle
with a wave function. This wave function can
be represented in different bases, for example
the position basis or the momentum basis.
The squared modulus of such a function can
be interpreted as a probability distribution
according to Born’s rule. A wave function
with a definite position would be a delta
function in the position basis, and because
of De Broglie’s relation, a wave function
with a definite momentum would be a plane
wave in the position basis. From these two
observations, it follows that the position
and momentum representations of the wave function
are related by Fourier transform.
Using properties of the Fourier transform,
one can derive that an uncertainty relation
holds between position and momentum, which
is known as Heisenberg’s uncertainty principle.
Because in general a wave function doesn’t
have a single unambiguous position or momentum,
these observables cannot be described by a
single value. Rather, they become operators
that can act on the wave function. If the
wave function has a definite position, then
applying the position operator to it simply
multiplies the wave function with that position
value. So mathematically speaking, states
with a definite observable value are given
by the eigenvectors of the observable’s
operator, and the corresponding observable’s
value is given by the eigenvalue of the operator.
Since we know the eigenstates and eigenvalues
of the position and momentum operators, we
can derive that in the position representation,
the position operator is a multiplication
with x, and the momentum operator is a derivative
with respect to x. One can straightforwardly
check that these operators don’t commute,
and that their commutator is i h-bar. We can
find the allowed energies and their corresponding
wave functions by finding the eigenvalues
and eigenvectors of the energy operator, which
is known as the Hamiltonian. According to
the Planck-Einstein relation, these energies
are related to angular frequencies, which
describe the time evolution of each energy
state. Using this property, one can derive
the Schrodinger equation, which is a differential
equation that relates the Hamiltonian to the
time evolution of the wave function.
We know how to treat a particle quantum mechanically.
But what we want to achieve in the end is
to describe light quantum mechanically. We
know that a monochromatic field oscillates
harmonically, so let’s study a harmonically
oscillating particle using quantum mechanics,
so that we can later use those results to
quantize optical fields.
Let’s first recall how to treat the harmonic
oscillator classically. We have some potential
that is quadratic in x. By taking the derivative
we can calculate the force that acts on the
particle, which is in this case proportional
to negative x. From Newton’s second law
we know that this force determines the acceleration
of the particle. By equating the two expressions
for the force we find a second-order differential
equation for the position as a function of
time. One solution is given by x(t) equals
cosine omega t, which describes a harmonic
oscillation. From this we can also calculate
the momentum of the particle as a function
of time, which turns out to be proportional
to sine omega t.
What we can do next is to make these quantities
dimensionless so that they can be compared
to each other more easily. We can multiply
x by some factor and p by some other factor,
so that we now have two quantities which are
both dimensionless. We can now write them
as the real part and imaginary part of a complex
exponential, just like we could write a monochromatic
optical field as a real part of a complex
exponential. Since these two quantities are
now dimensionless, we can conveniently plot
them against each other in a phase space picture.
We find that as a function of time, the particle
rotates in circles in phase space.
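As a sketch, one common choice for these factors (an assumption here, chosen to be consistent with the commutator [X, P] = i/2 used later) is

V(x) = \tfrac{1}{2} m\omega^2 x^2, \qquad x(t) = A\cos\omega t, \qquad p(t) = -m\omega A\sin\omega t,

X = \sqrt{\frac{m\omega}{2\hbar}}\, x, \qquad P = \frac{p}{\sqrt{2m\hbar\omega}}, \qquad X(t) + i P(t) \propto e^{-i\omega t}.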
Now let’s turn to the quantum mechanical
treatment. We saw previously that to understand
the dynamics of a system quantum mechanically,
we need to look at the energy operator, or
the Hamiltonian, which is given by the sum
of the potential and kinetic energy. We can
plug in the expressions for the potential
and kinetic energy, and then introduce the
same dimensionless quantities we saw previously
to end up with a fairly simple Hamiltonian.
What we want to do next is to find the energy
levels of the system, which are given by the
eigenvalues of H.
There are two ways to do this. One way is
to substitute the x and p operators with their
representations in the position basis, which
transforms the eigenvalue equation into a
differential equation that can be solved for
psi(x) under the condition that psi(x) should
be normalizable. The other approach, which
is the one we will use, is to use ladder operators.
We start with our Hamiltonian with dimensionless
quantities. The trick is to try to factorize
X^2 + P^2. If X and P were scalars, we could
straightforwardly factorize the expression
into X-iP and X+iP. However, X and P are not
scalars, but operators that do not commute.
Therefore, if we expand the product, the two
cross terms do not cancel out. Rather, they
can be rewritten as the commutator of X and
P.
From the canonical commutation relation, we
can work out that the commutator of the dimensionless
X and P is given by i over 2. We can plug
this result into our equation, and then add
one half to both sides to find an expression
for X^2 + P^2. If we then multiply both sides
by h-bar omega, we find a sort of factorized
expression for the Hamiltonian. We can rewrite
this expression further by introducing the
ladder operators A and A dagger, which are
also called the annihilation and creation
operators for reasons that will become apparent
in a minute. The dagger indicates the Hermitian
conjugate of an operator. With these definitions,
we can rewrite the Hamiltonian in terms of
A and A dagger.
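In formulas, with the dimensionless X and P and the commutator [X, P] = i/2, the factorization reads

A = X + iP, \qquad A^\dagger = X - iP, \qquad A^\dagger A = X^2 + P^2 + i[X, P] = X^2 + P^2 - \tfrac{1}{2},

\hat{H} = \hbar\omega\,(X^2 + P^2) = \hbar\omega\left(A^\dagger A + \tfrac{1}{2}\right).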
Using the commutation relation between X and
P and the definitions of A and A dagger, we
can furthermore derive that the commutator
of A and A dagger equals 1. With these results
we can demonstrate that the creation and annihilation
operators respectively raise and lower the
energy level of the system, hence their names.
To see how this works, let’s assume that
phi_n is a state with energy E_n. This means
that phi_n is an eigenvector of H with eigenvalue
E_n. Using the commutation relation, one can
then demonstrate that A times phi_n is also
an eigenvector of H, with energy E_n minus
h-bar omega. In other words, applying the
annihilation operator reduces the energy level
of the system by h-bar omega. Similarly, one
can demonstrate that applying the creation
operator increases the energy level of the
system by h-bar omega.
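As a sketch of that argument:

[A, A^\dagger] = [X + iP,\, X - iP] = -2i\,[X, P] = 1, \qquad [\hat{H}, A] = \hbar\omega\,[A^\dagger A, A] = -\hbar\omega\, A,

so \hat{H}\,(A\,\phi_n) = A\,\hat{H}\,\phi_n - \hbar\omega\, A\,\phi_n = (E_n - \hbar\omega)\,(A\,\phi_n), and the analogous computation with A^\dagger raises the energy by \hbar\omega.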
It can be shown that by applying the creation
and annihilation operators repeatedly, all
energy levels of the quantum harmonic oscillator
can be found. So if we plot the energy levels
of the harmonic oscillator, we find that we
can start at the lowest energy level, which
is not zero but one half times h-bar omega,
and then we can go up in steps of h-bar omega
by repeatedly applying the creation operator.
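In other words, the allowed energies of the quantum harmonic oscillator are

E_n = \left(n + \tfrac{1}{2}\right)\hbar\omega, \qquad n = 0, 1, 2, \ldots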
Notice that we’re getting on the right track
now. We saw that to explain the photoelectric
effect, we assume that light comes in packets
that have an energy of h-bar omega. Since
a classical electric field oscillates harmonically
with frequency omega, we decided to look at the
quantum harmonic oscillator. And indeed we
found that the energy levels of the harmonic
oscillator increase in steps of h-bar omega.
So in a sense, we can think of the harmonic
oscillator as containing packets of energy
of h-bar omega, which can be created or annihilated
with the creation and annihilation operators.
Note that the number of packets, or quanta,
corresponds to the eigenvalues of the operator
A dagger times A. Therefore, this operator
is also referred to as the number operator,
as it counts the number of energy packets
that are present in the harmonic oscillator.
So the results we found for the quantum harmonic
oscillator seem plausible, but it’s still
not obvious how the quantum mechanical and
the classical harmonic oscillator resemble
each other. For example, we saw that classically
the position and momentum of the particle
oscillate harmonically and out of phase with
each other, but where is the oscillatory motion
in the quantum case?
Classically, a particle has at each point
in time a definite position x and a definite
momentum p. Quantum mechanically, there is
an uncertainty relation between the two. So
to find the quantum state that most resembles
the classical case, it makes sense to find
the state for which the uncertainties in x
and p are minimal and equal. Such quantum
states are called coherent states.
So let’s see if we can find the quantum
states that minimize the uncertainty in x
and p. Let’s start by recalling the uncertainty
principle. It states that the standard deviation
of x times the standard deviation of p must
at least be h-bar over 2. By squaring this
inequality we find that the product of the
variances must be at least h-bar squared over
4. If we consider the dimensionless quantities,
we find that the product of their variances
must be at least 1 over 16. If we want the
uncertainties in X and P to be minimal and
equal, we require that the variance in X must
be 1 over 4, and the variance in P must also
be 1 over 4.
Now let’s see how these variances depend
on the quantum state psi. Recall that the
variance of X is defined as the expectation
value of the squared difference between X
and its mean value. The expectation value
of an arbitrary observable Q is given by the
integral of Q times its probability function.
According to Born’s rule, the probability
function of an observable is given by the
squared modulus of the wave function in the
basis of that observable. This squared modulus
of the function can be written as that function
times its complex conjugate. The multiplication
with Q can be obtained by applying the Q-operator
to the wave function in the Q-basis. We can
now rewrite the integral to find the identity
operator, which we can leave out. We then find
the simple but important result that the expectation
value of an observable Q is found by sandwiching
the Q-operator between the bra- and ket-vectors
of the wave function. We can use this result
to write the variances of X and P as a function
of the wave function psi.
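In bra-ket notation, the result just derived and its application to the variances read

\langle Q \rangle = \langle \psi|\,\hat{Q}\,|\psi\rangle, \qquad \sigma_X^2 = \langle \psi|\,(\hat{X} - \langle X\rangle)^2\,|\psi\rangle, \qquad \sigma_P^2 = \langle \psi|\,(\hat{P} - \langle P\rangle)^2\,|\psi\rangle.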
So now we have obtained two equations using
the variances of X and P. Using the commutation
relation between X and P, we can write down
a third equation. With these three equations,
we will find the condition for psi to be a
coherent state. So let’s write down these
three equations. We multiply the bottom equation
with i, and then add the three equations together,
and expand the squares. Next, we’re going
to collect a bunch of terms and rewrite them
in terms of creation and annihilation operators.
First, we can rewrite X^2 plus iXP as X times
A. Then we can write P^2 minus iPX as minus
i P A. We can combine these terms into A dagger
A. Next, we can factorize the square of the mean
of X plus the square of the mean of P, so that we
get the mean of A dagger times the mean of A.
For the remaining terms, let’s consider
the expressions for A dagger times the mean
of A, which we can write in terms of X and
P and expand, and A times the mean of A dagger,
for which we can do the same. If we add them,
we get 2 times X times the mean of X plus
2 times P times the mean of P, which are exactly
the two remaining terms in our equation. So
we can substitute them to obtain an equation
purely in terms of A dagger and A.
We can further factorize this expression,
and we find that some vector times its Hermitian
conjugate equals zero. This means that the
length of that vector is zero, which means
that the vector itself must be the null vector.
If we now rearrange the terms, we find an
eigenvalue equation that says that A applied
to psi must equal the mean value of A times
psi. So in other words, a coherent state is
an eigenstate of the annihilation operator,
and its eigenvalue is the expectation value
of A.
We have found the condition for psi to be
a coherent state. Now let’s find out what
the wave function of such a coherent state
looks like as a function of position. First,
for the ease of notation, it’s common to
call the expectation value of A alpha. Now
recall the definitions of A, and the dimensionless
quantities X and P. If we use this to write
A in terms of the usual x and p-operators,
and substitute these with their position basis
representations, we now find how the annihilation
operator acts on a wave function psi(x). By
plugging this into the eigenvalue equation
for psi, we find the differential equation
that needs to be solved.
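With the dimensionless convention assumed earlier, the annihilation operator in the position basis and the resulting differential equation read (a sketch; the prefactors depend on that convention)

A = \sqrt{\frac{m\omega}{2\hbar}}\,\hat{x} + \sqrt{\frac{\hbar}{2m\omega}}\,\frac{d}{dx}, \qquad \sqrt{\frac{m\omega}{2\hbar}}\, x\,\psi(x) + \sqrt{\frac{\hbar}{2m\omega}}\,\frac{d\psi(x)}{dx} = \alpha\,\psi(x).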
We can rearrange the terms of this equation,
and find that the solution is given by a Gaussian
function, which is shifted by alpha. Note
that in general alpha can be complex-valued,
so let’s write it as its real part plus
its imaginary part. If we then expand the
square, we find three terms. The first term
is a quadratic term depending on x. The second
term is a constant term independent of x.
And the third term is a purely imaginary term
depending on x. We can pull out the constant
term and absorb it in the integration constant,
since the wave function must be normalized anyway.
Moreover, if we’re interested in the probability
distribution of x, which is given by the squared
modulus of psi, then the imaginary part of
the exponent becomes irrelevant, and we find
that the probability distribution of x is
given by a Gaussian that is shifted by the
real part of alpha. And similarly, one can
demonstrate the probability distribution of
the momentum p is given by a Gaussian that
is shifted by the imaginary part of alpha.
So we have found that a coherent state, which
is the most classical-like quantum state,
is an eigenstate of the annihilation operator,
which is a Gaussian wave packet in both the
position and momentum representation.
We wanted to see whether for these coherent
states one could observe the same oscillatory
motion one sees in a classical harmonic oscillator.
So, we must now look at the time evolution
of the coherent state, which means we need
to describe it in the energy basis. Recall
that the energy levels of the quantum harmonic
oscillator are spaced equally apart by h-bar
omega, and that one can go up or down these
energy levels by applying the creation or
annihilation operators.
Let’s denote the states of definite energy
as n, where the number n indicates the number
of energy quanta in the harmonic oscillator.
These states are called Fock states, and they
are eigenstates of the Hamiltonian, and also
eigenstates of the number operator, with eigenvalue
n. So if we calculate the expectation value
of the number operator for an n-photon Fock
state, we will of course find n. We can rewrite
the expectation value of the number operator
as the squared length of A times n, and we
know that A times n is proportional to the
(n-1)-photon Fock state. We can combine these two results
to infer that the annihilation operator applied
to an n-photon Fock state gives an (n-1)-photon
Fock state multiplied by square root n.
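In formulas, choosing the conventional phase, the chain of reasoning is

\langle n|\,A^\dagger A\,|n\rangle = n = \big\|\,A|n\rangle\,\big\|^2, \qquad A|n\rangle \propto |n-1\rangle \;\Rightarrow\; A|n\rangle = \sqrt{n}\,|n-1\rangle.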
We now know that a coherent state is an eigenstate
of the annihilation operator, and we now know
how the annihilation operator acts on an energy
state. So we can find an expression for a
coherent state in the energy basis, which
means we can calculate the time evolution
of a coherent state. Let’s write the coherent
state as a linear combination of energy states,
with coefficients that are to be determined.
We can apply the annihilation operator to
both sides of the equation, and we know that
on the left hand side it reduces to a multiplication
with alpha, and on the right hand side each
n-photon Fock state becomes an (n-1)-photon
Fock state multiplied by square root n.
Because the n equals 0 term of the summation
vanishes, we can start the summation at n
equals 1. Now we can shift all indices from
n to n+1 and start the summation from n equals
0 again. If we then divide both sides by alpha,
we can compare this equation to the one above
it and find the recursive relation that relates
c_n+1 to c_n. If we write c_n+1 as a function
of c_n, we can pick some arbitrary c_0, and
use the recursive relation to subsequently
find c_1, c_2, etcetera. If we look at the
expressions we get, we can come up with a
formula that directly relates c_n to c_0.
We can plug this formula in the expression
for the coherent state. Now all we need to
do is to find c_0, which is determined by
the normalization condition. If we plug in
the expression we found for the coherent state,
and recognize the summation as the Taylor
expansion of the natural exponential function,
then we find an expression for c_0. And with
this result, the expression for the coherent
state in the energy basis is now complete.
Note that the coefficients give a Poissonian
probability distribution with a mean that
is the squared modulus of alpha.
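Collecting these results, the coherent state in the energy basis and the photon-number probabilities are

|\alpha\rangle = e^{-|\alpha|^2/2} \sum_{n=0}^{\infty} \frac{\alpha^n}{\sqrt{n!}}\,|n\rangle, \qquad P(n) = |c_n|^2 = e^{-|\alpha|^2}\,\frac{|\alpha|^{2n}}{n!},

a Poisson distribution with mean |\alpha|^2.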
With this expression for the coherent state
in the energy basis, we can straightforwardly
find the time evolution by multiplying each
coefficient with a time-dependent complex
exponential. We can plug in the expression
for the energy and then simplify and rearrange
some factors to eventually find that the time
evolution of the coherent state is given by
multiplying alpha with a time-dependent complex
exponential.
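Explicitly, since E_n = (n + 1/2)\hbar\omega, each coefficient picks up a phase e^{-i(n + 1/2)\omega t}, so that up to an overall phase factor the state remains a coherent state with

\alpha(t) = \alpha\, e^{-i\omega t}.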
Now recall that the probability distributions
of the position and momentum were Gaussians
shifted by the real and imaginary part of
alpha. This means that as a function of time,
the probability distributions oscillate as
a cosine and sine, just like a particle in
a classical harmonic oscillator. We could
describe the behavior of a classical particle
as a point rotating in circles in phase space.
For a coherent state we could do the same,
except the point now becomes a spread out
function because the probability distributions
for X and P are Gaussian. The radius of the
circle is given by the modulus of alpha, and
the spread of the function is one half, due
to Heisenberg’s uncertainty principle. Just
like in the classical case, this function
rotates in circles with angular frequency
omega.
This plot is called a Wigner quasiprobability
distribution. It is created using the probability
distributions of X and P. However, it cannot
be interpreted as an actual two-dimensional
joint probability distribution, because in
quantum mechanics you cannot measure position
and momentum simultaneously. So you cannot
assign a certain probability to a pair X and
P. Another way to see why it cannot be an
actual probability distribution is because
for some quantum states the Wigner function
has negative values, and it makes no sense
to speak of a negative probability.
Let’s summarize what we have seen about
the harmonic oscillator. In the classical
case we saw that the position and momentum
oscillate harmonically and out of phase with
each other. Quantum mechanically, we have
to look at the energy operator, or Hamiltonian,
which we can write either in terms of position
and momentum, or in terms of the creation
and annihilation operators. By analyzing this
Hamiltonian, we saw that the energy levels
of the harmonic oscillator are equally spaced
by h-bar omega. By applying the creation operator
you can move up one energy level, or create
one energy quantum, and by applying the annihilation
operator you can move down one energy level,
or destroy one energy quantum. The operator
A dagger A is the number operator, which counts
the number of quanta that are present in the
oscillator.
The quantum states that most resemble classical
states are coherent states. These are states
that have minimal and equal uncertainty in
position and momentum. The probability distributions
for X and P are Gaussians, and they oscillate
harmonically and out of phase with each other,
just like in the classical case. It can also
be demonstrated that the coherent state is
an eigenstate of the annihilation operator,
and that the probability coefficients of the
energy states follow a Poisson distribution.
Now let’s get back to our main question:
how can we quantize the electromagnetic field?
We know that classically a monochromatic field
oscillates harmonically, and we just learned
how to treat a particle in a harmonic oscillator
quantum mechanically, so we should be able
to guess how the electric field is quantized.
To treat a particle quantum mechanically,
we need to calculate its total energy, which
for a harmonic oscillator is proportional to X^2 + P^2.
So to quantize the electric field, we probably
also have to look at its energy, which is
proportional to the squared modulus of the
complex field, which we can rewrite as the
real part squared plus the imaginary part
squared. Note that classically the position
and momentum oscillate harmonically and out
of phase, just like the real part and imaginary
part of the complex-valued electric field.
For a particle we could define the creation
and annihilation operators as the position
operator minus or plus i times the momentum
operator. These operators could add or remove
quanta of energy. Similarly then, the real
part minus or plus i times the imaginary part
of the complex-valued electric field should
constitute operators that add quanta to or
remove quanta from the electric field. Classically
these operators would correspond to the complex
field and its complex conjugate. We could
represent coherent states, which are the most
classical-like quantum states, as a spot rotating
in circles in the X-P plane. By analogy then,
we can represent the quantized electric field
as a spot rotating in the plane defined by
the real and imaginary part of the field.
In general, if we have a scalar polychromatic
classical real-valued field, we can write
it as the sum of real parts of complex-valued
monochromatic fields. We can write the real
part of E as one half times E plus its complex
conjugate, and we can write this as two separate
sums. We can then substitute omega with minus
omega, so that both sums contain the same
exponential e to the power minus i omega t.
The first sum is over all positive frequencies,
and the second sum is over all negative frequencies.
We can then associate the positive frequency
part, which contains the complex amplitudes
E, with an annihilation operator, and we can
associate the negative frequency part, which
contains E conjugate, with a creation operator.
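As a sketch (ignoring polarization, the magnetic field, and overall prefactors), the decomposition and the identification read

E_{\mathrm{real}}(\mathbf{r}, t) = \frac{1}{2}\sum_{\omega} \left[ E_\omega(\mathbf{r})\, e^{-i\omega t} + E_\omega^*(\mathbf{r})\, e^{+i\omega t} \right], \qquad E_\omega \;\leftrightarrow\; \hat{A}, \qquad E_\omega^* \;\leftrightarrow\; \hat{A}^\dagger.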
Note the creation and annihilation operators
we found previously were time-independent,
whereas these new field operators depend on
position and time. To treat this in further
detail, one would have to turn to quantum
field theory, but that’s beyond the scope
of the current introduction. Also note that
in this discussion we have for simplicity
ignored polarization and the magnetic field.
So we’ve seen that we can identify the complex-valued
field and its complex conjugate with the annihilation
and creation operators. But how would that
manifest itself in practice? Or in other words,
how does it affect what we measure? In the
end, a detector detects photons by absorbing
them one way or another, which means that
in the electric field a photon is annihilated.
The rate at which this happens is given by
Fermi’s golden rule. If the field starts
in some initial state i, the rate at which
it transitions to a final state f after photon
annihilation is proportional to the squared
modulus of the matrix element W_if.
Since we aren’t interested in the final
state f of the field, and we don’t measure
it, we find the total transition rate by summing
over all final states. We can write out the
squared modulus as the product with the complex
conjugate. Now recall from earlier that the
summation over the outer products of all f
is equal to the identity operator, which means
we can leave it out. What’s left is the
product of the creation operator with the
annihilation operator, which is the same as
the number operator, which counts the number
of photons in the field. Recall that by sandwiching
this operator between the bra- and ket-vectors
of the quantum state, we compute its expectation
value. In other words, the photon absorption
rate is proportional to the mean number of
photons in the field.
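In formulas, the steps just described are

R \;\propto\; \sum_f \big|\langle f|\,\hat{A}\,|i\rangle\big|^2 = \langle i|\,\hat{A}^\dagger \Big(\sum_f |f\rangle\langle f|\Big) \hat{A}\,|i\rangle = \langle i|\,\hat{A}^\dagger \hat{A}\,|i\rangle = \langle \hat{n} \rangle.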
Classically, the annihilation operator corresponds
to the complex-valued field, and the creation
operator corresponds to its complex conjugate.
Their product gives the intensity, which makes
sense: the rate at which we detect photons
is related to the intensity that one would
expect classically.
We can now study the photon statistics. Let’s
say we have a photon detector, and we shine
light on it so that it detects on average
10 photons per second. How many photons do
we measure in a 1 second interval? Obviously,
on average that would be 10 photons because
we just stated that, but one has to bear in
mind that this is only an average number,
and for each 1-second interval, the number
of detected photons might be different. For
example, we could detect 8 photons in the
first 1-second interval, 5 photons in the
second interval, 11 in the third, and so on.
When taking the average of all these numbers,
we expect to find approximately 10 photons
per interval.
What we can do now is plot a histogram of
the photon statistics. So on the horizontal
axis we can put the number of photons detected
in a 1 second interval, and on the vertical
axis we can put how often we detect that amount,
which we also call the frequency, and which
is not to be confused with the optical frequency
omega of the electric field. We will find
a distribution that has its mean at 10 photons,
but also has a certain spread caused by the
random fluctuations. So what determines the
photon statistics? What determines what this
histogram will look like?
We saw that the photon absorption rate at
time t is proportional to the expectation
value of the photon number at time t, and
that we could find the classical equivalent
by substituting the annihilation and creation
operators with the complex field and its complex
conjugate, the product of which gives the
intensity. Now we’re interested in the time
interval tau between two photon detections.
So we can ask: if at a time t a photon was
absorbed, what is the absorption rate at t+tau?
If at time t a photon was absorbed, it means
that the annihilation operator of time t was
applied to the quantum state psi. If we now
want to calculate the absorption rate at time
t+tau, we use the same formula for the photon
absorption rate as used above: we have the
quantum state and its Hermitian conjugate,
and in between we put the number operator.
The resulting quantity is called the second-order
correlation function, which is evaluated at
tau.
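Written out (up to normalization), the quantity described here is

G^{(2)}(\tau) \;\propto\; \langle \psi|\,\hat{A}^\dagger(t)\,\hat{A}^\dagger(t+\tau)\,\hat{A}(t+\tau)\,\hat{A}(t)\,|\psi\rangle,

i.e. the number operator at time t + \tau evaluated in the state \hat{A}(t)|\psi\rangle that remains after the first detection.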
We can again look at the classical analogue,
which is found by substituting the annihilation
and creation operators with the complex-valued
fields and their complex conjugates. We write
the products as intensities, and find that
the classical analogue is given by the intensity
correlation. So we have a second-order correlation
function G^(2) of tau that can be interpreted
classically as the intensity correlation,
and quantum mechanically as the probability
of having a certain time interval tau between
photon detections. The classical intensity
correlation we can write more explicitly as
a correlation integral.
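Explicitly, the classical version is the intensity autocorrelation

G^{(2)}_{\mathrm{cl}}(\tau) \;\propto\; \int I(t)\, I(t+\tau)\, dt.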
Now we will study these quantities for three
different types of light: coherent light,
thermal light, and non-classical light which
can only be understood quantum mechanically.
Fully coherent light is classically given
by a monochromatic field, and its intensity
is constant in time. So, the intensity correlation
function will also be constant as a function
of tau. Quantum mechanically, coherent light
would be described by a coherent state, which
is an eigenvector of the annihilation operator.
So, just like in the classical case, the value
of G^(2) of tau is independent of tau. Recall
that quantum mechanically, tau represents
the interval between two photon detections.
So we find that every interval tau between
photons is equally likely to occur. This means
that the photon detections are completely
uncorrelated, so independent from each other,
and such a process is described by a Poissonian
distribution. One characteristic of the Poissonian
distribution is that its variance, which is
a measure for the spread of the histogram,
is equal to its mean value.
Now let’s consider thermal light. For thermal
light, the intensity can vary chaotically
as a function of time. If we compute the correlation
for different tau, we find that G^(2) of tau
must peak at tau=0, which makes sense intuitively:
the intensity at time t is more correlated
with itself than with the intensities at other
times. How would we interpret this plot quantum
mechanically?
Quantum mechanically, G^(2) of tau represents
the probability to observe a certain time
interval tau between two photon detections.
So if G peaks around tau equals 0, it means
it is more likely to observe short intervals
between photon detections than long intervals.
So photons tend to bunch together, and indeed
this phenomenon is called photon bunching.
If we were to plot a histogram of photon counts,
we would find it has a larger spread than
the histogram for coherent light, because
the intensity fluctuates in time. This means
that the variance is larger than the mean,
so we call it a super-Poissonian distribution.
Now let’s look at non-classical states of
light, that can only be understood in the
context of quantum mechanics. If we consider
an arbitrary function I of t, then its correlation
integral must always peak at tau equals 0.
So a correlation function that has a dip at
tau equals 0 would make no sense classically.
But can we make sense of it using the quantum
mechanical interpretation?
In the quantum mechanical interpretation,
a dip at tau equals 0 would mean that it is
less likely to observe short intervals between
photon detections. This is known as photon anti-bunching,
and it would lead to a narrower photon count
histogram compared to the coherent case, because
the photons are detected at more regular intervals.
Because the variance is in this case smaller
than the mean, this is called a sub-Poissonian
distribution.
Let’s summarize what we’ve seen for the
three types of light. One way to distinguish
them is by looking at the second-order correlation
function G^(2) of tau, which classically describes
intensity correlations. Another way is by
looking at whether photon detections are uncorrelated,
bunched, or anti-bunched. A third way is by
looking at whether the photon statistics follow
a Poissonian, super-Poissonian, or sub-Poissonian
distribution.
So now we know the theory behind quantized
light, and how this theory can explain certain
experimental observations. But can we also
use this new theory for practical applications?
One important application is noise reduction.
We’ve seen that if we measure photons at
a certain rate, then for classical light the
variance in the number of photons that we
detect in a certain interval is at least as
large as the mean number of photons. So if
N denotes the mean number of detected photons,
and delta N denotes the standard deviation,
then delta N squared must be at least as large
as N for classical light. And for coherent
light we have a Poissonian distribution, in
which case the variance is equal to the mean.
We can define the relative error as the standard
deviation divided by the mean. We can then
find that for Poisson noise the error decreases
with the detected number of photons as 1 over
square root N. Here’s what it would look
like in practice. Let’s consider an ideal,
noise-free interference pattern which consists
of light and dark fringes that we theoretically
expect to measure on a camera. Depending on
the strength of the light source and the exposure
time of the camera, we expect for each pixel
a mean number of detected photons N.
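As a quick check of the 1 over square root N scaling quoted above: for a Poissonian distribution the variance equals the mean, so

\Delta N^2 = N \;\Rightarrow\; \frac{\Delta N}{N} = \frac{\sqrt{N}}{N} = \frac{1}{\sqrt{N}}.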
Now let’s pick a very low exposure time,
so that even at the pixels where the interference
pattern is brightest, we only have a mean
number of detected photons of 0.1. We find
that the detected pattern is very noisy, because
we detect very few photons. If we increase
the exposure time so that we measure more
and more photons, we see that the relative
error decreases, and the interference pattern
becomes clearer. So the quantization of light
results in noise that is for classical light
limited by 1 over square root N, which follows
from the properties of the Poissonian distribution.
However, we have seen that non-classical light
can follow sub-Poissonian photon statistics
with a lower variance, so can we beat this
quantum noise limit using non-classical light?
Recall that the electric field has some inherent
uncertainty due to Heisenberg’s uncertainty
principle. A coherent state, which is a state
that most resembles a classical state, has
a minimal and equally distributed uncertainty.
Now we want to decrease the uncertainty, but
according to Heisenberg’s uncertainty principle,
if we want to decrease the uncertainty in
one observable, we have to increase the uncertainty
in another observable. If we do this, we get
something that’s called squeezed light.
We could for example decrease the uncertainty
in the phase and increase the uncertainty
in the amplitude to get phase squeezed light,
or do the opposite to get amplitude squeezed
light.
Just like a coherent state can be represented
as a spot rotating in circles, squeezed states
can be represented as elliptical spots rotating
in circles. The probability distribution of
the real part of the field can be measured
experimentally, which is evidence that these
states indeed exist and can be generated.
We saw that for classical light the error
scales as 1 over square root N, so in principle
we could make the error arbitrarily low by
making N arbitrarily high. However, in practice
it’s not always possible to increase the
laser power or exposure time by the required
amount. In that case, squeezed light can help
reduce the noise error while keeping N fixed.
This technique is used in interferometers
for the detection of gravitational waves.
So in summary: to quantize the electric field
we have to interpret the complex field E as
the annihilation operator, and its complex
conjugate as the creation operator. The classical
intensity then corresponds to the number operator.
With this, we can describe the photon detection
process, and we can distinguish three types
of light based on the photon statistics. For
coherent light the photon statistics follow
a Poissonian distribution and the photon detections
are uncorrelated. For thermal light we have
a super-Poissonian distribution, and photons
are bunched together. Non-classical light
gives a sub-Poissonian distribution, and photons
anti-bunch. Because photon detection is a
random process that follows at best a Poissonian
distribution in the classical case, measurements
are shot-noise limited. This limit can be
overcome using non-classical light, such as
squeezed light.
