Welcome to the "Do You Know" series on semiconductors.
I am Sathyan Munirathinam.
Many people ask me what the difference is between a CPU,
a GPU, and a TPU, and I thought this was a
perfect way
to start this program: explaining the
difference
with an animation. The central processing
unit,
or CPU, works as the brain of the computer
and performs its basic operations.
The graphics processing unit, or GPU,
is a specialized electronic circuit
designed to render 2D and 3D
graphics together with your CPU.
The tensor processing unit, or TPU,
is a custom-built integrated circuit
developed specifically for machine
learning and deep learning applications.
Let's take a
quick look
at the computing timeline. One of the greatest
inventions is the CPU, a
microprocessor
consisting of many transistors that form
gates
and logic functions.
With the advent of the internet, data
is exploding at an exponential rate,
and in order to manage and harvest that
information,
AI chips like the GPU and the TPU were born
as part of the new generation of
computing chips.
When we talk about processing units, it's
all about computing power:
to teach machines to think like humans,
you need a lot of compute power and data.
So let's take a closer look at how each
of these processing units works
and why some operate at higher performance
and speed.
Artificial neural networks are inspired
by the way
the biological nervous system operates
in the brain.
A neural network is a bunch of
multiplications
and additions over millions of numbers,
which is called a matrix operation.
Activation functions are mathematical equations
that determine
the output of a neural network. The
function
is attached to each neuron in the
network
and determines whether that neuron should be
activated, or fired,
based on whether the neuron's
input is relevant for the model's
prediction.
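The idea of a neuron as multiply-adds followed by an activation can be sketched in a few lines of Python. This is a minimal illustration, not any particular framework's API; the input values and weights below are made up for the example, and ReLU is used as the activation.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum (multiplications and
    additions) followed by an activation function."""
    z = np.dot(inputs, weights) + bias   # the multiply-and-add step
    return max(0.0, z)                   # ReLU: the neuron "fires" only if z > 0

# Illustrative numbers: three inputs and three learned weights.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, 0.1])
print(neuron(x, w, bias=0.0))
```

If the weighted sum is negative, the ReLU output is zero, i.e. the neuron does not fire for that input.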
Machine learning is the study of
computer algorithms
that allow computer programs to
automatically
improve through experience.
Imagine that you are in charge of
building a machine learning prediction
system that tries
to distinguish images of dogs and
cats.
The first step would be to gather a
large number of
labeled images, tagged "dog" for dogs
and "cat" for cats. Second, we would train the
computer to look for
specific patterns in the images that identify
dogs and cats respectively.
Every image is nothing but a
matrix of pixels,
and those patterns of pixels and
patterns of colors
let the model guess, or predict, the target based on
its learning experience.
Let's take a closer look at how each
of these processing units works.
As I mentioned before, neural networks are
all about
matrix operations, so the question is how each
processing unit handles
the matrix multiplications and additions.
The CPU handles matrix multiplication
in the form of scalar operations, and it can
handle tens of
operations per cycle. Moving to the GPU,
it handles the matrix operation in the
form of vectors,
and it can handle tens of thousands of
operations per cycle.
But looking at the TPU, it handles the
matrix operation
in the form of tensors, and it can handle
up to hundreds of thousands of
operations per cycle.
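The scalar-versus-vector distinction can be made concrete with a small NumPy sketch. The triple loop below performs one scalar multiply-add at a time, in the spirit of the CPU description, while the single `@` call expresses the same matrix product as one vectorized operation, in the spirit of the GPU/TPU description. The matrix sizes are arbitrary; both paths produce the same result.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((64, 64))
B = rng.random((64, 64))

# CPU-style: one scalar multiply-add at a time.
C_scalar = np.zeros((64, 64))
for i in range(64):
    for j in range(64):
        for k in range(64):
            C_scalar[i, j] += A[i, k] * B[k, j]

# GPU/TPU-style: the whole matrix product as one vectorized operation.
C_vector = A @ B

print(np.allclose(C_scalar, C_vector))
```

The results match; the difference is how many multiply-adds each style can issue per step, which is exactly the tens versus tens-of-thousands versus hundreds-of-thousands comparison above.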
MNIST is the "hello world" of
neural networks: a large
database
of handwritten digits that is commonly used
for training
various image processing systems. You can
see a few examples here.
Imagine that we are using a neural network
to recognize a handwritten image
of the digit 8. The image is a grid of
28 × 28 grayscale pixels,
and it is converted to a vector with
784 values,
called dimensions. The neuron that
recognizes
the digit 8 takes those values and multiplies them
by the parameter values,
shown as the red line here.
The parameter works as a filter that
extracts features from the data and
tells how similar the image is
to the shape of an 8. In short,
neural networks require a massive amount
of multiplications
and additions between data and
parameters, so the problem is:
how can you execute large matrix
multiplications
as fast as possible with less power
consumption?
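The calculation just described can be sketched as follows: flatten the 28 × 28 image into a 784-value vector and take its dot product with the neuron's 784 parameters, which is 784 multiplications plus additions in one step. Random values stand in here for a real MNIST image and for trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((28, 28))   # stand-in for a 28 x 28 grayscale digit image
weights = rng.random(784)      # stand-in for the neuron's learned parameters

x = image.reshape(784)         # 28 x 28 grid -> 784-dimensional vector
score = np.dot(x, weights)     # 784 multiplications and additions

print(x.shape, score)
```

A higher score means the image looks more like the shape the parameters act as a filter for; in a real network this score would then go through an activation function.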
The CPU is a very fast general-purpose
processor based on the von Neumann
architecture,
which means the CPU works with software and
memory.
A CPU has to store the calculation
result in memory
for every single calculation, and this
memory access becomes the downside of
the CPU architecture, called
the von Neumann bottleneck.
Even though the huge scale of neural
network calculations
means that the upcoming steps are
entirely predictable,
the CPU's arithmetic logic units
execute them one by one, accessing the
memory
every time, limiting the total throughput
and consuming significant energy.
On the other side, the CPU is so powerful
because it can run millions of different
applications
just by changing the software.
Still, you can clearly see from the time it takes
a CPU
to train the model that it's really slow.
To gain higher throughput than a CPU,
a GPU uses a simple strategy: why not
have thousands of ALUs, arithmetic logic
units, in a processor?
A modern GPU usually has
2,500 to 5,000 ALUs
in a single processor, which means you
can execute thousands of
multiplications and additions simultaneously.
So it's super efficient on
applications with massively parallel
tasks, such as matrix operations.
As you can see, both the CPU and the GPU
read and write intermediate results
to memory at every calculation,
so that they can support a wide variety
of different
algorithms as general-purpose processors.
But this leads back to our fundamental
problem,
the von Neumann bottleneck: for every
single calculation in the thousands of
ALUs,
the GPU needs to access registers or
shared memory to read and store
the intermediate calculation results.
Because the GPU performs more parallel
calculations
on its thousands of ALUs, it also spends
proportionally more energy accessing
memory, and the complex wiring
increases the footprint of the GPU.
Google designed the TPU as a
domain-specific architecture: instead
of designing a general-purpose
processor,
Google designed it as a matrix processor
specialized for neural network workloads.
TPUs cannot run word processors,
control rocket engines, or execute bank
transactions,
but they can handle the massive
multiplications
and additions for neural networks at
blazingly fast speeds,
while consuming much less power and
within a smaller physical footprint.
With the introduction of the systolic array
architecture,
the goal is to reduce the von
Neumann bottleneck. Let's see how a
systolic
array executes the neural network
calculations.
At first, the TPU loads the parameters from
memory
into the matrix of multipliers and
adders.
Then the TPU loads data from memory.
As each multiplication is executed,
the result is passed to the next
multipliers
while a summation is taken at the same time,
so the output will be the sum of
all the multiplication
results between the data and the parameters.
During this whole process of
massive calculation and data passing, no
memory
access is required at all. That's why
the TPU can achieve high computational
throughput
on neural network calculations with much
less power consumption
and a smaller footprint.
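The data flow described above can be sketched as a toy simulation: the weights are pre-loaded into a grid of multiply-accumulate cells, each input row streams through, and the partial sums are passed from cell to cell instead of being written back to memory. This is a heavy simplification of a real systolic array (no pipelining or skewed timing), intended only to show the passing of running sums.

```python
import numpy as np

def systolic_matmul(X, W):
    """Toy weight-stationary systolic array: cell row k holds W[k, :];
    partial sums flow through the cells, with no intermediate memory
    writes for each multiply-accumulate."""
    n, k = X.shape
    k2, m = W.shape
    assert k == k2
    out = np.zeros((n, m))
    for i in range(n):        # each input row streams through the array
        acc = np.zeros(m)     # running sums passed between cells
        for kk in range(k):   # cell row kk multiplies by its stored weights
            acc += X[i, kk] * W[kk, :]
        out[i] = acc          # result leaves the array only at the end
    return out

X = np.arange(6.0).reshape(2, 3)
W = np.ones((3, 2))
print(systolic_matmul(X, W))
```

The result equals the ordinary matrix product `X @ W`; the point is that each intermediate sum lives inside the array rather than in memory, which is where the power and throughput advantage comes from.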
I hope this video gave you some insight
into the comparison
between the CPU, the GPU, and the TPU.
Thank you for watching, and see you next week
with
another semiconductor topic.
