In 1982, the fruits of a partnership between
Sony and Philips were released to the music-loving public.
The Compact Disc, or CD, was unveiled
as the first consumer digital audio format.
These 12 cm discs could store up to 74 minutes
of perfect quality digital audio, and as was
famously demonstrated many times over, were
immune to liquid damage, could withstand
significant scratching, and could be played
a theoretically infinite number of times
without wearing down. That’s because,
unlike all previous sound formats where the
sound is encoded as an analog impression of
the sound wave on plastic discs or magnetic tape,
the sound on a CD is encoded as a series
of samples, which function as a set of instructions
on how to recreate the sound.
The Compact Disc was a big freaking deal in
many ways. It represented a giant leap in
convenience for the consumer, quality of the
recorded sound, and in raw data storage capacity.
That last bit wouldn’t be too relevant to
the computing industry for some time, but
it solved the central problem of digital sound--needing
a, for the time, absurdly massive amount of
raw data.
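To put a rough number on that "absurdly massive" amount of data, here is a minimal Python sketch. It borrows the 16-bit sample size and the 44.1 kilohertz sampling rate that come up later in this video, and it assumes two channels (stereo), which the video does not state. It counts only the raw audio samples, not error correction or any other overhead on the disc.

```python
SAMPLE_RATE_HZ = 44_100   # CD sampling rate (mentioned later in the video)
BITS_PER_SAMPLE = 16      # CD sample size (mentioned later in the video)
CHANNELS = 2              # assumption: stereo, not stated in the video
MINUTES = 74              # playing time quoted above


def cd_audio_bytes(minutes=MINUTES):
    """Raw audio payload in bytes, ignoring error correction and other overhead."""
    bytes_per_second = SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS
    return bytes_per_second * 60 * minutes


total = cd_audio_bytes()
print(f"{total:,} bytes, about {total / 1e6:.0f} MB")
```

Under those assumptions, 74 minutes works out to roughly 783 megabytes of raw audio.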
Before we get too far into the specifics of
the Compact Disc, it’s time to dig a little
deeper into how digital sound actually works.
Recall from my last video that an analog-to-digital
converter is taking instantaneous samples
of the analog signal at a specific sampling frequency.
Then, a digital-to-analog converter
can recreate the original analog signal with
only those samples. Well, a lot of people
think that this can’t possibly work to recreate
all the detail in the original analog sound
wave. But thanks to mathematics, we know that
it can. And does.
It’s time to explore the Nyquist-Shannon
sampling theorem. This theorem was co-discovered
by E. T. Whittaker and Vladimir Kotelnikov
so it is also, but less commonly, referred
to as the Whittaker–Nyquist–Kotelnikov–Shannon
sampling theorem. Anyway, Harry Nyquist and
Claude Shannon, along with the other two,
discovered that a band-limited signal, that
means a signal which does not contain frequencies
above a certain limit, can be perfectly described
and perfectly reconstructed by taking instantaneous
samples at a rate of at least twice the frequency limit.
This gets a little complicated, so
I’m going to explain it as best as I can.
Our hearing itself is bandlimited. Though
frequencies exist in nature above 20,000 hertz,
our ears cannot detect anything above that
frequency. Some people claim they can, but,
well, most people rapidly lose their hearing
at those high frequencies as they age, and
I’m talking like as they leave childhood,
so let’s just not go there. With the knowledge
that anything above 20,000 hertz will just
not be audible, we can therefore capture all
of the audio we can hear by only recording
sounds below 20 kilohertz.
Passing an analog input through what’s called
a low-pass filter will eliminate any frequencies
above a specified point. There’s no sense
in recording sounds we can’t hear, so we
can use a low-pass filter to eliminate frequency
components above 20 kilohertz and we end up
with a signal that represents all that we
can hear, and is, importantly, band-limited.
For now, we’re going to assume this can
work perfectly, so we’ll say that any signals
above 20 kHz cannot pass through the low pass
filter.
Nyquist-Shannon tells us that by sampling
at a rate of 40 kilohertz, we can capture
all the detail possible in this newly band-limited
signal. That may sound a little unintuitive,
but let’s explain why it’s true. When
you have a band-limited signal, certain types
of sounds get reduced to a representation
of themselves made as a sum of sine wave harmonics.
Fourier transformation, which is another complicated
math subject that will rear its ugly head
a little later, means that we can represent
any waveform as a sum of sine wave harmonics.
The most classic illustration of this is a
square wave.
A true square wave looks like this, but to
actually reproduce this signal would require
near infinite bandwidth. And I know what you’re
saying, that seems silly, this is just a rapid
hard cut in and out of a signal. Like flipping
a light switch on and off. Ehh, that’s true,
but if we could produce this signal, then
this vertical piece, which represents an instantaneous
increase in amplitude, would require a frequency
response that’s ridiculously high. See, if we can
make that instantaneous shift from low to
high intensity, then we must be able to produce
the same shift downward in the same amount
of time--which is no time at all if it truly
is instantaneous. To do that would require
infinite bandwidth. That’s more than we
have to play with, so when this signal is
passed through a low-pass filter of 20 kilohertz,
it comes out like this. This is the sum of
harmonics that would create this square wave
via a Fourier transform, but the highest frequency
harmonic possible is 20 kilohertz.
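As a rough illustration of that sum of harmonics, the sketch below builds a band-limited square wave from its Fourier series in Python. The 1 kilohertz fundamental is an assumption (it matches the test tone used later in the video), and the function names are made up for this example. A square wave contains only odd harmonics, each at 1/k of the fundamental's amplitude, so with a 20 kilohertz limit only the harmonics at 1, 3, 5 and so on up to 19 kilohertz survive.

```python
import math


def bandlimited_square(t, fundamental_hz=1_000.0, band_limit_hz=20_000.0):
    """Sum the odd sine harmonics of a +/-1 square wave up to the band limit."""
    total = 0.0
    k = 1
    while k * fundamental_hz <= band_limit_hz:
        total += math.sin(2 * math.pi * k * fundamental_hz * t) / k
        k += 2  # square waves contain only odd harmonics
    return 4 / math.pi * total


def harmonics_kept(fundamental_hz=1_000.0, band_limit_hz=20_000.0):
    """List the harmonic numbers that fit under the band limit."""
    return list(range(1, int(band_limit_hz // fundamental_hz) + 1, 2))


# Scan one 1 ms cycle in 1 microsecond steps and find the highest point.
peak = max(bandlimited_square(n * 1e-6) for n in range(1000))
print(len(harmonics_kept()), "harmonics, peak value", round(peak, 3))
```

With these assumptions ten harmonics get through, and the waveform overshoots the flat top of the ideal square wave to roughly 1.18 before wiggling back toward 1. Those are the wiggles in the filtered waveform.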
Because we’ve placed a bandlimit on the
input, we’re dealing with nothing but sine
waves now, piled on top of each other. We
are constructing any other waveforms as a
sum of sine wave harmonics, and this explains
why the Nyquist-Shannon theorem holds true.
If the sample goes from the lowest possible
to the highest possible value and back, the
only waveform that can hit those three samples
is a sine wave at the Nyquist frequency of
20 kilohertz. We can’t describe any frequencies
higher than 20 kilohertz, but just two samples
per cycle is enough to define our highest
possible frequency.
When you band-limit a signal, you eliminate
all of the high frequency harmonics that could
define a square wave with greater detail than
this. Now before you cry foul that this is
creating detail where there shouldn’t be
detail, just remember that your hearing is
just as bandlimited as the output of the low-pass
filter. And also keep in mind that the whole
chain here is messy. Everything in nature
will oscillate with a sudden impulse of energy.
Even if a true square wave were to burst on
the scene, your eardrums will oscillate between
the peaks in energy, sort of like the wiggly
wobblies in this bandlimited waveform.
So the key here is--
don’t worry about it.
This drove me crazy for days but the fact
is--this is just how sound waves and nature
work. Any signal can be represented as a sum
of sine waves, and bandlimiting it simply
forces these oscillations into existence.
The most pressing issue with this representation
is that it can create ringing artifacts as
a result of the Gibbs phenomenon, but now
we’re getting into really nitpicky stuff,
and you could easily argue that this would
happen even in analog systems due once again
to natural oscillations either in the physical
realm, such as the fact that both a phonograph
needle and loudspeaker driver have mass and
thus cannot move instantaneously so they will
oscillate at their own harmonic frequency
and create their own ringing artifacts anyway,
and also the electrical realm because the
nature of any circuit will have some oscillations,
too. So again,
don’t worry about it.
Now comes the part that sounds crazy but is
completely true, and I can show it to you
in a moment. Nyquist-Shannon tells us that
if we simply have twice the number of samples
per second as the frequency of our signal’s
band limit, the exact bandlimited signal can
be reproduced perfectly, and I mean literally
perfectly, using only those samples.
This is admittedly weird, so let’s talk through
it.
Imagine an ADC is recording a sound. Every
one forty-thousandth of a second it takes an instantaneous
reading of the signal it receives. By quantizing
it on a digital scale of our choosing, it
creates 40 thousand discrete samples every
second. But remember, that signal has passed
through a low-pass filter before it reached
the ADC, so it does not contain any frequencies
above 20 kilohertz. The truly mind-blowing
part about Nyquist-Shannon is that the samples
we get from this bandlimited signal can ONLY
reproduce the original signal. There is only
ONE signal that can possibly produce the exact
series of samples that the ADC recorded.
Again, this is because we are dealing with a bandlimit.
Without a bandlimit in place, the samples
could be defining parts of other strange waveforms
due to aliasing, but when adhering to this
bandlimit the resulting string of samples
can only define exactly one waveform. There
is literally only one mathematical solution
for the bandlimited waveform
that would pass through all samples.
(Mind blow)
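Here is a minimal sketch of the sampling step just described, in Python. It assumes an ideal ADC, a 16-bit signed output scale, and an input that has already been band-limited. The `tone` function is a hypothetical stand-in for the filtered analog input, and real converters are more involved than this.

```python
import math

SAMPLE_RATE_HZ = 40_000   # the theoretical rate used in this video
BITS = 16                 # assumption: 16-bit signed samples


def quantize(value, bits=BITS):
    """Map a reading in [-1.0, 1.0] to the nearest signed integer code."""
    full_scale = 2 ** (bits - 1) - 1          # 32767 for 16 bits
    clipped = max(-1.0, min(1.0, value))      # anything louder just clips
    return round(clipped * full_scale)


def sample(signal, duration_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Take one instantaneous, quantized reading every 1/sample_rate seconds."""
    count = round(duration_s * sample_rate_hz)
    return [quantize(signal(n / sample_rate_hz)) for n in range(count)]


def tone(t, freq_hz=1_000.0):
    """Hypothetical input: a 1 kHz sine, well under the 20 kHz band limit."""
    return math.sin(2 * math.pi * freq_hz * t)


codes = sample(tone, duration_s=1.0)
print(len(codes), codes[:5])
```

One second of input becomes 40 thousand integers, and those integers are all that gets stored.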
This is some complicated stuff here. But just
know that if I have any series of samples,
and I assume these are representing a bandlimited
signal, then they can only possibly satisfy
one waveform. And that, ladies and gentlemen,
is why the myth that digital sound creates
a stair step pattern in the output is false.
But the weirder bit is that the DAC very well
might. But before you freak out--that doesn’t
mean any stair-steppy signal has ever come
out of a DAC or CD player or anything. And
that’s because of the same low-pass filter
that originally bandlimited the input signal.
Many digital to analog converters are actually
quite simple. They use a resistor ladder,
which is tied to the actual bits in each discrete
sample, to produce the appropriate voltage.
I don’t want to go on too much of a tangent
here but they are really neat and explain
how the simplest DACs work. Each bit of the
sample is tied to a resistor. If it’s a
1 the resistor is activated and passes voltage
through it, and if it’s a 0 the resistor
is not. The network of resistors will create
a unique voltage for each possible combination
of bits, and thus you end up with zeros
and ones producing a voltage as finely graded
as you want. A 16 bit DAC, like those used in
the Compact Disc standard
(most of the time--we’ll get to that)
will have 16 resistors, each
controlled by one bit of the datastream. These
feed into intermediary resistors to create
all of the possible voltages. But the more
bits you add, the more accurate these resistors
have to become, which helps explain why the
earliest DACs were very expensive. The technology
to produce resistors in an integrated circuit
to an accuracy of approximately 0.0015% (one part in 65,536)
was expensive for a while.
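To make the bits-to-voltage idea concrete, here is a sketch of an idealized converter in Python. It models only the arithmetic, not a real resistor network: each bit that is set contributes a binary-weighted share of a reference voltage. The `v_ref` parameter and the unsigned input code are assumptions for illustration (CD samples are actually signed values).

```python
def dac_output(code, bits=16, v_ref=1.0):
    """Ideal binary-weighted DAC: every set bit adds its share of v_ref."""
    if not 0 <= code < 2 ** bits:
        raise ValueError("code out of range for this many bits")
    voltage = 0.0
    for position in range(bits):
        if (code >> position) & 1:
            # Bit 0 is worth 1/2**bits of v_ref, the top bit is worth half.
            voltage += v_ref * 2 ** position / 2 ** bits
    return voltage


def step_size(bits=16, v_ref=1.0):
    """Voltage difference between two neighbouring codes."""
    return v_ref / 2 ** bits


print(dac_output(0x8000), step_size())
```

With 16 bits, one step is 1/65,536 of the reference voltage, which hints at how tightly the resistors have to be matched.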
Anyway, these R-2R DACs, as they’re sometimes
called, will produce a stair-step waveform
from the output of the resistor ladder. This
is what’s called sample-and-hold. Each sample
sustains the given voltage level until the
next sample is received by the DAC. This has
led to many, many, far too many people believing
that this is the signal that comes out of
your CD player and goes into your amplifier.
It is easy to imagine this blocky-looking
waveform screwing around with your favorite
recording of Beethoven’s 9th.
But you forget, dear audiophile, that the
stair-steppy waveform will pass through a
low-pass filter on its way out. And that filter
will create the same bandlimit on the output
from the DAC as was placed on the input of
the ADC. Now, what this means, is that the
output from the DAC is also bandlimited to
20 kilohertz. And why does that matter? Because
the very stair-steppy nature of the resistor
ladder’s output is impossible with a bandlimit
of 20 kilohertz. Just like our square wave
example, these vertical components require
infinite bandwidth. Good luck with that.
But what’s even weirder, and kinda difficult
to grasp, is that because the output of the
DAC has the same bandlimit as the ADC did,
now we are dealing with Nyquist-Shannon again.
And the truly strange-but-true part of this,
is that the only possible result of the output
from the low pass filter is the original waveform
that the ADC recorded. Remember, with a bandlimited
signal, we can represent all of the detail
within that signal with discrete samples,
and with only a sample rate that is twice
the bandlimit frequency. If we create a waveform
that passes through all of the samples, then
it must be the original waveform recorded
by the ADC.
The fact that the waveform comes out of the
resistor ladder all choppy-like doesn’t
matter in the slightest, because the low pass
filter will bandlimit it and get rid of the
choppies. Now it can only contain frequencies
of 20 kilohertz or below. Remember, the vertical
parts of the stair-step pattern are impossible
with that bandlimit, so they just get smoothed
out. And since we know that the DAC was outputting
the correct voltage level with each sample,
all of the samples must have been satisfied.
Which means that after the LPF smooths the
waveform, it must have passed through all
of the samples. And because there’s a bandlimit
in place, Nyquist-Shannon proves that the
output signal is the exact same one as the input.
To provide some evidence to back this claim
up, take a look at this CD player. This is
a Sony CD changer from 1993. It has a relatively
rudimentary DAC in part because it’s a cheaper
machine and in part because it’s older.
Let’s hook an oscilloscope up to it and
take a look at the output coming from its
RCA jacks.
This is just some music it’s playing right
now. Notice that there’s nothing in here
that looks remotely stair-steppy. But let’s
take it even further. I’ve created a CD
with various tones generated in Audacity.
Let’s start with a 1 kilohertz square wave.
Even though in Audacity the samples look like
this--straight up, then hold, with a completely
straight line between peaks--the output from
the CD player is that wiggly wavy thing. That
happens because those wiggly wavies are the
only way to make this square wave with a 20
kilohertz bandlimit, and the wiggly bits are
passing through each of those samples.
Now let’s switch to some sine waves. This
is again 1 kilohertz. This looks perfectly
smooth, no stair-steps to be seen. To be fair,
though, even in Audacity it looks pretty good.
Let’s move up to a 10 kilohertz sine wave.
Now in Audacity it looks really gnarly, with
the connections between the samples making
a barely intelligible wave. There aren’t
even 5 samples per cycle, so how can the smooth
detail of the sine wave possibly be reproduced?
Well, take a look. There’s a perfectly smooth
sine wave for you, right there.
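If you do not have an oscilloscope handy, the same result can be sketched numerically. The Python below samples a 10 kilohertz sine at the CD's 44.1 kilohertz rate, then rebuilds a point halfway between two samples using sinc interpolation (the Whittaker-Shannon formula), which is the mathematical idealization of the low-pass filter. It is not how a real CD player is built, and because the sum is cut off at 2001 samples instead of running forever, the match is close rather than exact.

```python
import math

SAMPLE_RATE_HZ = 44_100.0
TONE_HZ = 10_000.0


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, t, sample_rate_hz=SAMPLE_RATE_HZ):
    """Whittaker-Shannon interpolation: one sinc per sample, summed at time t."""
    return sum(s * sinc(t * sample_rate_hz - n) for n, s in enumerate(samples))


# 2001 samples of the tone; fewer than 5 of them fall in each cycle.
samples = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ) for n in range(2001)]

# Evaluate halfway between two samples near the middle of the recording,
# where cutting off the (ideally infinite) sum matters least.
t_mid = 1000.5 / SAMPLE_RATE_HZ
estimate = reconstruct(samples, t_mid)
truth = math.sin(2 * math.pi * TONE_HZ * t_mid)

print(f"samples per cycle: {SAMPLE_RATE_HZ / TONE_HZ:.2f}")
print(f"reconstructed {estimate:+.4f}, original {truth:+.4f}")
```

The value in between the samples comes back out of the samples alone, with no straight lines or stair-steps involved.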
This is why some of you cringed when I drew
straight lines between the samples. That’s
only sort of what happens, and even then it’s
not that accurate. But it does serve as a
sort of blend between the two realities. There
is a stair-step pattern in the intermediate stage
between the resistor ladder and the low pass
filter. So the DAC does connect the dots,
but like this. Then the LPF smooths out the
connections between the dots, but that only
happens as a side-effect of the fact that
it’s creating a bandlimit so the high frequency
components, that’s the vertical parts here,
get tossed out. What you’re left with is
the only possible waveform that can both hit
all the samples, and which does not contain
frequency components above Nyquist. Simple,
right?
Ah! We haven’t even really talked about
the CD itself yet! And this is pushing into
the 14 or 15 minute mark already, if my gauge
of time per written page is at all correct.
OK, I guess we’re going to push the technology
of the CD into another video. But that’s
OK, since we covered what makes sound out
of numbers. And hopefully we’ve destroyed
the myth that digital audio cannot produce
smooth waveforms. It does.
Much of the information from this video (and
indeed some selected clips) came from a lovely
video by Monty at xiph.org. I’ve linked
to a great article of his down below, and
a card will pop up now heading to his video.
Many, many people brought this to my attention
on Twitter and elsewhere, so thank you. He’s
got some much better demonstrations than I
do that cover this topic. He also explains
why the bit depth affects noise, and not clarity,
what dithering is and how it reduces quantization
noise, and much more.
But I will give you one last tid-bit before
I sign off. You may have noticed that in the
video we’ve been discussing a theoretical
sampling rate of 40 kilohertz, as this Nyquist
sampling rate could perfectly capture all
of what human hearing can pick up. But the
CD standard’s sampling rate is 44.1 kilohertz.
Why the extra 4.1? That seems awfully specific.
Well, that’s due to the fact that low-pass
filters aren’t perfect. They can’t just
cut off frequencies above a point, they instead
have a transition window where the frequencies
degrade to zero.
The 44.1 kilohertz sampling rate is there to accommodate
that window. Without a hard-cut on the
low-pass filter, aliasing could occur because
the samples might define a waveform of a higher
frequency than Nyquist. This is precisely
why we need an LPF. Both of these waveforms
satisfy all the samples, so to prevent one
of them from coming through, we need to decide
a limit in frequency. If the red waveform
is above the Nyquist limit, then it won’t
get reproduced. But if the low-pass filter
could let slip some signals above half our chosen
sample rate, scenarios like this might occur.
Therefore, the sample rate was chosen to be
44.1 kilohertz, that way it exceeds the transition
window for our desired 20 kilohertz cutoff.
And by sampling a bit beyond the audible range,
we don’t have to worry about spurious aliasing
artifacts from samples in the transition band.
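Here is a small numerical sketch of that aliasing problem, in Python. The helper names are invented for this example. It shows that a tone above the Nyquist frequency yields the same samples as a lower "folded" tone, and it works out how much room 44.1 kilohertz leaves between the 20 kilohertz cutoff and Nyquist.

```python
import math


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def alias_hz(tone_hz, sample_rate_hz):
    """Frequency at which a sampled tone shows up, folded into 0..Nyquist."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def cosine_samples(tone_hz, sample_rate_hz, count=40):
    """Instantaneous samples of a cosine tone."""
    return [math.cos(2 * math.pi * tone_hz * n / sample_rate_hz) for n in range(count)]


# At a 40 kHz sample rate, a 30 kHz tone and a 10 kHz tone hit the same samples.
high = cosine_samples(30_000, 40_000)
low = cosine_samples(10_000, 40_000)
same_samples = all(abs(h - l) < 1e-9 for h, l in zip(high, low))

# Room between the 20 kHz cutoff and the Nyquist frequency at 44.1 kHz.
transition_band_hz = nyquist_hz(44_100) - 20_000

print(same_samples, alias_hz(24_100, 44_100), transition_band_hz)
```

At 44.1 kilohertz the Nyquist frequency is 22.05 kilohertz, so the filter gets 2.05 kilohertz in which to roll off. A stray tone would have to be at 24.1 kilohertz or higher before it folded back down to 20 kilohertz or below, so anything that slips through under that lands above the audible range.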
But the more interesting thing about the
44.1 kilohertz rate is that it was also chosen
for easy digital sound storage before a CD
gets pressed. This was the perfect sampling
rate for storing sound on both an NTSC and
a PAL U-Matic videocassette recorder. The
commercial VCR format from Sony was among
the earliest ways to store a digital audio
stream, using the video signal sort of like
a giant QR code. It’s not literally being
read in that sense, but the data is stored
on the tape as a field of black and white
bars in each line, so if watching the output on-screen
it would look like a flashing screen of QR
codes whizzing by 60 times per second. 44.1
kilohertz would, on both NTSC and PAL signals,
work out to 3 samples per video line. So in
a strange twist, the world of analog video
dictated how digital sound would work.
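The arithmetic behind that is easy to check. In the Python sketch below, the 3 samples per line come from the video, but the field rates and the number of usable lines per field (245 for 60 Hz NTSC, 294 for 50 Hz PAL) are assumptions not stated in the video, though they are the figures commonly cited for this scheme.

```python
SAMPLES_PER_LINE = 3  # stated in the video

# Assumed values, not given in the video: fields per second and the number
# of lines in each field that actually carry audio data.
VIDEO_SYSTEMS = {
    "NTSC (60 fields/s)": {"fields_per_second": 60, "lines_per_field": 245},
    "PAL (50 fields/s)": {"fields_per_second": 50, "lines_per_field": 294},
}


def sample_rate_hz(fields_per_second, lines_per_field,
                   samples_per_line=SAMPLES_PER_LINE):
    """Audio samples stored per second of video tape."""
    return fields_per_second * lines_per_field * samples_per_line


rates = {name: sample_rate_hz(**spec) for name, spec in VIDEO_SYSTEMS.items()}
for name, rate in rates.items():
    print(f"{name}: {rate} samples per second")
```

Both systems land on 44,100 samples per second under these assumptions, which is what made one sampling rate workable on either kind of recorder.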
Thanks for watching, I hope you enjoyed the
video. As I end all of my videos apparently,
if this is your first time coming across the
channel and you liked what you saw, please
consider subscribing. Also, and I know this
can sound weird, but for those that follow
my videos--you might want to make sure you
are actually subscribed. Oftentimes YouTube
just serves you content because it knows you
like it, and you might think you are subscribed
when you aren’t. Same goes for all channels,
but I’ll be the selfish one today and post
this reminder.
As always, thank you to everyone who supports
this channel on Patreon, especially the fine
folks who are scrolling up your screen. Supporters
on Patreon have turned my weird hobby of making
videos about technology into a job, and you
all deserve my thanks. If you would like to
pledge some support and help the channel grow,
please check out my Patreon page.
Thanks for your consideration, and I’ll see you next
time!
