In this lecture we shall be starting our discussion
on the ARM microcontrollers.
What is their architecture like, what are
their specific features, and how are they
different from the earlier generation of microcontrollers?
The topic of this lecture is Architecture
of ARM Microcontroller, this is the first
part of the lecture.
In this lecture we shall be covering some
general ideas about the ARM series of microcontrollers,
how they have evolved and some of the important
architectural features.
Let us look into the history.
The architectural ideas that have evolved
into this ARM class of microcontrollers were
developed long back, in 1983.
There was a company called Acorn Computers
that was the first to develop and evolve such
ideas.
These ideas were a little different because
they were based on the RISC architecture
concept.
At that time there was a very popular
microprocessor called the 6502, from a company
called MOS Technology, that was used in one of
the very popular microcomputers, the BBC Micro.
The first attempt of these people was to replace
that processor by a more powerful processor,
which will make the BBC micro faster and more
powerful.
This resulted in one of the first commercial
RISC implementations.
It was not called ARM at that time, but
it evolved into the ARM architecture.
The company ARM itself was finally founded
in 1990.
The name ARM is the acronym for Advanced RISC
Machine.
So, you see in the name itself the word RISC
is embedded.
The ARM architecture essentially borrows its
concepts from RISC architecture.
Initially the company ARM was jointly formed
and owned by Acorn, which was the initiator,
Apple, and another company called VLSI
Technology.
These three companies came together and formed
this new company called ARM.
Now what is so interesting about ARM?
Why do we have to talk specifically about
ARM?
You may ask in this course why are we specifically
trying to use ARM as the vehicle for teaching
embedded systems.
The reason is that ARM processors are increasingly
used in many applications; they are the most popular
category of microcontrollers in serious embedded
system applications.
Let us take some examples that you all know
about.
Apple's iPod music players had an ARM processor
inside.
BenQ and Sony Ericsson are very well known
manufacturers of consumer electronics such as
phones and audio-visual equipment, and there are
ARM processors inside their products.
Typically they started with ARM9, but subsequently
they upgraded to later versions of ARM
processors.
The Apple iPhone, which all of us are familiar with,
and some of the very popular Nokia phones
have ARM11 processors inside them.
This is a fairly old statistic, but by 2010
about 90% of all serious embedded applications
had ARM processors inside them.
When you talk about embedded system application,
depending on the application you decide how
much power you need from the processor.
If it is a very simple application you do
not need a processor as powerful as ARM.
You can use 8-bit PIC microcontrollers.
ARM processors are typically 32 bit and above.
So, you use ARM processor when you need reasonably
powerful computation capability that will
make the heart of your embedded system.
Another thing is that ARM processors have
very low power consumption and, of course,
reasonably good performance.
Because of this low power consumption, they
are very widely used in battery-operated devices
such as mobile phones.
If you look at this diagram, it shows the evolution
of ARM-based processors and ASICs over the years.
First thing is that, ARM is essentially a
RISC based architecture.
We shall go into some detail what a RISC architecture
is.
It borrows some advanced architectural ideas
in contrast to conventional microcontrollers
that had very primitive kind of designs.
I talked about 8051 that was a very popular
microcontroller, but architecture wise it
was pretty primitive, it did not use any kind
of architectural enhancement or advanced features.
As I told you this ARM processor is not just
one, but a whole family of processors and
the most important thing is that in order
to maintain backward compatibility, all these
share essentially a common instruction set.
Of course, in the later generations some additional
instructions have been added, but the older
instructions are also carried through, so
that a program developed for an older
generation will still run on the next
generation.
The design philosophy here was that we need a
small processor, so that power consumption is
lower and it can be used in embedded system
applications.
So, the processor should be small, it should
consume low power, and it should have high code
density.
As I told you, in microcontrollers the program
memory and data memory are all inside the
chip.
Chip area is scarce, so you cannot put a very
large memory inside.
So, there is a maximum limit to the size of
the program that you can run.
Let us say the program memory size is 100
kilobytes.
Whatever I write must fit within this 100
kilobytes.
So, if my instruction set lets me pack my code
compactly into these 100 kilobytes, I can implement
more functionality than with a competing
architecture that would need much more memory
to implement the same thing.
What I mean is: suppose an application X
needs 120 kilobytes in a conventional
architecture, but in the ARM architecture I can
fit it within 100 kilobytes.
This is what is called high code density.
There are some instruction features that allow
us to reduce the number of instructions required.
This helps us cope with limited memory
and physical size restrictions.
And of course, there is a lot of flexibility
in the interface.
We can interface with a wide variety of memory
systems, both slow and relatively fast.
And of course, the die size is reduced: when
you fabricate the chip, the silicon area of
the processor is very small, so when you build
an ASIC around it, the processor occupies only
a small portion of the die.
You can use the remaining space for additional
functionality to make the ASIC very powerful.
Some of the popular ARM architectures are
shown here.
There are many; I am only showing three, ARM7,
ARM9 and ARM10, but there are ARM11 and beyond.
ARM7 has 3 pipeline stages; we shall be talking
about pipelining later.
The pipeline stages describe the steps in which
an instruction gets executed.
There are 3 stages: fetch, decode, execute.
It supports high code density and low power
consumption.
You do not need high-performance ARM processors
everywhere: what you need inside a mobile
phone you will possibly not need inside a
refrigerator, where only very simple
calculations are done.
Coming to ARM9, the first thing is that these
are all backward compatible, but the pipeline
stages are increased to 5: fetch, decode,
execute, memory, write.
The concept of cache memory also came in, with
a separate instruction cache and a separate
data cache.
In ARM7, instructions and data were both in
the same memory.
So, it was like a von Neumann architecture,
but from ARM9 onwards the architecture started
shifting towards the Harvard architecture.
Moving to ARM10, the main difference was that
the pipeline was further enhanced by adding
another stage called issue.
In this way the basic architecture started
evolving making the processor more powerful
and faster by adding novel architectural concepts.
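To make these stage counts concrete, here is a small Python sketch of an ideal pipeline. It assumes one instruction enters per clock with no stalls, branches or cache misses, which real ARM cores do not guarantee. The stage names are the ones listed above; the function name pipeline_table is made up for this illustration.

```python
# Ideal in-order pipeline: one new instruction enters every clock cycle,
# no stalls, branches or memory waits (simplifying assumptions).
STAGES = {
    "ARM7": ["fetch", "decode", "execute"],
    "ARM9": ["fetch", "decode", "execute", "memory", "write"],
    "ARM10": ["fetch", "issue", "decode", "execute", "memory", "write"],
}


def pipeline_table(stages, n_instructions):
    """Return one dict per clock cycle mapping stage name -> instruction index."""
    total_cycles = len(stages) + n_instructions - 1
    table = []
    for cycle in range(total_cycles):
        row = {}
        for s, name in enumerate(stages):
            instr = cycle - s  # instruction i occupies stage s during cycle i + s
            if 0 <= instr < n_instructions:
                row[name] = instr
        table.append(row)
    return table


for core, stages in STAGES.items():
    cycles = len(pipeline_table(stages, 10))
    print(f"{core}: {len(stages)} stages, 10 instructions finish in {cycles} cycles")
```

Running it shows 10 instructions finishing in 12, 14 and 15 cycles. A deeper pipeline does not reduce the cycle count by itself; its benefit is that each stage does less work, so the clock can be made faster.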
This table gives a quick comparison among
4 ARM family members: ARM7, ARM9, ARM10 and ARM11.
You can also see the year when each was first
introduced.
The first thing is pipeline depth; depth means
how many stages the pipeline has.
So, we are enhancing the number of stages
in the pipeline to make the execution faster
in some sense.
One common measure of a processor's speed is
how fast we can make its clock.
In ARM7 it was 80 MHz, then 150, 260 and 335 MHz
for the next three.
So, the clock frequencies are increasing and
the processors are becoming faster.
Power consumption also depends on the clock
frequency: the faster the clock, the higher
the power consumption.
So, you should estimate the power consumption
with respect to the clock frequency, because
every microcontroller has a range of permissible
clock frequencies.
It depends on what clock frequency you want;
if you can operate at a lower clock frequency
and still serve your purpose, that is fine, you
will be consuming less power.
That is why power consumption in microcontrollers
is typically specified in milliwatts per megahertz.
Similarly, throughput in microcontrollers is
typically measured in millions of instructions
per second per megahertz (MIPS/MHz).
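The per-megahertz figures turn into absolute numbers by multiplying by the clock frequency. A small Python sketch, assuming power scales linearly with clock at a fixed supply voltage; the 0.2 mW/MHz and 1.0 MIPS/MHz ratings are hypothetical values for illustration, not figures for any particular ARM core. Since mW/MHz is the same as nanojoules per clock cycle, dividing it by MIPS/MHz (instructions per cycle) gives energy per instruction, which does not depend on the clock at all.

```python
def power_mw(mw_per_mhz: float, clock_mhz: float) -> float:
    """Estimated power in milliwatts, assuming power scales linearly with clock."""
    return mw_per_mhz * clock_mhz


def throughput_mips(mips_per_mhz: float, clock_mhz: float) -> float:
    """Estimated throughput in millions of instructions per second."""
    return mips_per_mhz * clock_mhz


def energy_nj_per_instruction(mw_per_mhz: float, mips_per_mhz: float) -> float:
    """mW/MHz equals nJ per cycle; dividing by instructions per cycle gives nJ per instruction."""
    return mw_per_mhz / mips_per_mhz


# Hypothetical core rated at 0.2 mW/MHz and 1.0 MIPS/MHz (illustrative numbers only).
for clock in (20, 50, 100):
    print(f"{clock} MHz: {power_mw(0.2, clock):.1f} mW, "
          f"{throughput_mips(1.0, clock):.0f} MIPS")
```

Halving the clock halves both power and throughput, so running slower only saves energy overall if the work still fits the time available and the processor can idle or sleep in between.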
ARM7 was based on the von Neumann architecture,
but subsequently there was a move towards the
Harvard architecture, and inside the processor
there is a built-in multiplier.
There was an 8 x 32 multiplier in the first
two generations, whereas the next two
generations had a 16 x 32 multiplier.
A multiplier is built in because many
applications frequently require multiplication,
and having it in hardware speeds up such
operations quite significantly.
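Why does a wider multiplier help? A rough Python model: a multiplier that handles 8 bits of one operand per cycle needs up to four cycles for a 32-bit operand, while one that handles 16 bits needs up to two. This is a simplified unsigned model that stops early once the remaining multiplier bits are zero; actual ARM multipliers use Booth encoding and their exact cycle counts differ, so treat the numbers as illustrative. The function name chunked_multiply is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def chunked_multiply(a, b, chunk_bits):
    """Multiply two unsigned 32-bit values, consuming chunk_bits of b per cycle.

    Returns (low 32 bits of the product, cycles used). Stops early once the
    remaining bits of b are all zero (a simplified early-termination model).
    """
    a &= MASK32
    b &= MASK32
    acc = 0
    cycles = 0
    shift = 0
    while True:
        chunk = (b >> shift) & ((1 << chunk_bits) - 1)
        acc += (a * chunk) << shift  # one (chunk_bits x 32) partial product per cycle
        cycles += 1
        shift += chunk_bits
        if shift >= 32 or (b >> shift) == 0:
            break
    return acc & MASK32, cycles


for width in (8, 16):
    product, cycles = chunked_multiply(0x12345678, 0x9ABCDEF0, width)
    print(f"{width}x32 multiplier: {cycles} cycles, low word {product:#010x}")
```

Both widths produce the same result; the 16 x 32 version simply needs half as many steps for large operands, and small operands finish early in either case.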
The point to note is that ARM is based on the
RISC architecture, and RISC is characterized by
some architectural features, as follows.
With respect to instructions, there are fewer
instructions.
Instructions are simple, so that each instruction
can be executed in a single cycle.
They are all of fixed length, so that decoding
an instruction becomes very easy and the
hardware for the control unit becomes very
simple.
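Because every ARM instruction is 32 bits long, its fields sit at fixed bit positions, so decoding is just shifting and masking; in hardware this amounts to wiring the right bits to the right places. As an illustration, the Python sketch below splits an instruction from ARM's data-processing class, where bits 31-28 hold the condition, bit 25 says whether the second operand is an immediate, bits 24-21 hold the opcode, bit 20 says whether flags are set, bits 19-16 and 15-12 name the first operand and destination registers, and bits 11-0 hold the second operand. It handles only this one class and is not a full decoder; the function name is made up here. The word 0xE0821003 encodes ADD r1, r2, r3, with condition 1110 meaning "always".

```python
def decode_data_processing(word):
    """Split a 32-bit ARM data-processing instruction into its fixed fields."""
    return {
        "cond": (word >> 28) & 0xF,       # condition code, 0xE = always
        "immediate": (word >> 25) & 0x1,  # 1 if operand2 is an immediate
        "opcode": (word >> 21) & 0xF,     # 0b0100 = ADD
        "set_flags": (word >> 20) & 0x1,  # S bit
        "rn": (word >> 16) & 0xF,         # first operand register
        "rd": (word >> 12) & 0xF,         # destination register
        "operand2": word & 0xFFF,         # here: register r3, no shift
    }


fields = decode_data_processing(0xE0821003)  # ADD r1, r2, r3
print(fields)
```

No field depends on any other field's value to find its position, which is exactly what makes single-stage decoding possible.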
Then, with respect to the pipeline, we
shall see later that instructions are
executed in a pipeline in all modern-day processors.
If the instructions are simple and of
fixed length, decoding an instruction
becomes very easy.
You can decode in one stage itself; you do
not have to look at the instruction again
to find out what it was.
There is no need for microprogramming, which
is the standard norm for complex instruction
set computers (CISC).
You can directly implement the control unit
in hardware; this is also much faster, and
you can run it at a higher clock.
One characteristic of RISC architecture is
that there is a large number of general-purpose
registers, typically 32 or more.
There are very few special-purpose registers,
unlike CISC, where there are many special-purpose
registers like the program counter, stack
pointer, base registers, and so on.
Another important thing is that RISC is
based on a load/store architecture: only the
load and store instructions transfer data
between registers and memory.
All other instructions work only on registers;
they do not access memory.
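A toy Python model of a load/store machine may help. Only load and store touch memory, while add works purely on registers. The class, method names and addresses are hypothetical, chosen for illustration; real ARM registers such as r13 to r15 have special roles that this model ignores.

```python
class LoadStoreMachine:
    """Toy load/store machine: only load/store access memory; add uses registers only."""

    def __init__(self, memory):
        self.mem = dict(memory)  # address -> 32-bit word
        self.reg = [0] * 16      # r0..r15, all treated as general purpose here

    def load(self, rd, addr):
        self.reg[rd] = self.mem[addr]    # memory -> register

    def store(self, rs, addr):
        self.mem[addr] = self.reg[rs]    # register -> memory

    def add(self, rd, rn, rm):
        self.reg[rd] = (self.reg[rn] + self.reg[rm]) & 0xFFFFFFFF  # registers only


# c = a + b, with a, b, c at hypothetical addresses 0x100, 0x104, 0x108
m = LoadStoreMachine({0x100: 7, 0x104: 35, 0x108: 0})
m.load(0, 0x100)
m.load(1, 0x104)
m.add(2, 0, 1)
m.store(2, 0x108)
print(m.mem[0x108])  # 42
```

Computing c = a + b takes four instructions here, where a CISC instruction set might offer a single instruction with memory operands; in exchange, every instruction stays simple and uniform.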
As I mentioned earlier, even today's CISC machines,
like the Intel class of processors, use
microprogramming: they translate those complex
instructions into micro-operations that look
more like RISC instructions.
So, they also implement the RISC concept in some
way: they make an initial translation, after
which they execute using standard RISC techniques.
Now, talking about ARM: although the R in the
name ARM stands for RISC,
strictly speaking ARM is not a pure RISC architecture.
Some features have been introduced into the
architecture because they are very useful in
embedded system applications, even though
they are not RISC characteristics.
Some of the differences are as follows.
Certain instructions require a variable number
of clock cycles for execution.
While talking of RISC, I said all instructions
should execute in a single clock cycle,
but in ARM some instructions are more
complex and can require multiple clocks.
One classical example is the multiple-register
load/store.
Normally we load one value from memory into
one register, but ARM allows a single instruction
to load values from consecutive memory
locations into, let us say, 4 registers.
Writing into those 4 registers needs multiple
clock cycles, roughly one per register, so it
cannot be done in one cycle. Such multiple
data transfer instructions are supported in
ARM.
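The multiple-register load can be modelled in a few lines of Python. In ARM assembly this kind of transfer is written, for example, as LDMIA r0, {r4-r7}: consecutive words starting at the address in r0 go into r4 to r7, with the lowest-numbered register taking the lowest address. The one-cycle-per-register count returned below is a rough simplification, since real timing depends on the core and the memory system, and the function name is hypothetical.

```python
def load_multiple(mem, base_addr, reg_list, regs):
    """Simplified LDMIA-style load: consecutive 32-bit words starting at base_addr
    go into the listed registers, lowest-numbered register from the lowest address.
    Returns a rough cycle estimate of one cycle per register (an assumption)."""
    for i, r in enumerate(sorted(reg_list)):
        regs[r] = mem[base_addr + 4 * i]
    return len(reg_list)


memory = {0x200: 11, 0x204: 22, 0x208: 33, 0x20C: 44}
registers = [0] * 16
cycles = load_multiple(memory, 0x200, [4, 5, 6, 7], registers)
print(registers[4:8], cycles)  # [11, 22, 33, 44] 4
```

One instruction replaces four separate loads, which saves program memory even though the transfer still takes several cycles.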
And there is something called a barrel shifter,
which is a very common architectural concept.
It is a hardware unit that can shift a value by
multiple bit positions in a single cycle.
This barrel shifter is part of the ARM architecture,
and many instructions directly use this
shifting capability.
Let me take an example.
Suppose there is an ADD instruction which
adds 2 registers, let us say r2 and r3.
I can also ask it to add r2 and r3 shifted
left by 4 positions.
The shift left by 4 positions is done by
the barrel shifter without any additional
time; everything is done in that single
clock cycle.
Because of the barrel shifter, such
shift-and-operate instructions are possible.
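In ARM assembly the example above is written as ADD r1, r2, r3, LSL #4, where r1 is the destination register. A minimal Python model of what happens: the value of r3 passes through the shifter, and the ALU adds the result to r2, with everything kept to 32 bits. The function names are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left of a 32-bit value, as done by the barrel shifter."""
    return (value << amount) & MASK32


def add_shifted(rn_value, rm_value, shift):
    """Model of ADD rd, rn, rm, LSL #shift: rm goes through the shifter, then the ALU adds."""
    return (rn_value + lsl(rm_value, shift)) & MASK32


r2, r3 = 5, 3
r1 = add_shifted(r2, r3, 4)  # r1 = r2 + (r3 << 4) = 5 + 48
print(r1)  # 53
```

Using the same register for both operands gives multiplication by a constant, for example x + 16x = 17x, in one instruction without the multiplier.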
Another feature is that you can run
ARM in Thumb mode.
Thumb is a subset of the ARM instructions
encoded in 16 bits.
Normally ARM processors are 32-bit processors,
but in many applications you do not need
that full power.
You can then use the Thumb instruction set,
whose instructions are 16 bits long, while the
processor still works on 32-bit registers.
Using smaller instructions can further shorten
the total code size, so your code density
improves further.
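A back-of-the-envelope calculation shows why 16-bit instructions help even if more of them are needed. The program size and the 30% instruction-count increase below are hypothetical numbers chosen for illustration, not measurements of real Thumb code.

```python
def code_size_bytes(n_instructions: int, bytes_per_instruction: int) -> int:
    """Total program size for fixed-length instructions."""
    return n_instructions * bytes_per_instruction


arm_count = 10_000                  # hypothetical program, 32-bit ARM instructions
thumb_count = arm_count * 13 // 10  # assume Thumb needs 30% more instructions

arm_bytes = code_size_bytes(arm_count, 4)      # 4 bytes per ARM instruction
thumb_bytes = code_size_bytes(thumb_count, 2)  # 2 bytes per Thumb instruction
print(arm_bytes, thumb_bytes, thumb_bytes / arm_bytes)
```

Under these assumptions the Thumb version takes 26,000 bytes against 40,000, about 65% of the size. In general Thumb code stays smaller as long as it needs fewer than twice as many instructions as the ARM version.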
There is another very interesting feature,
which we shall discuss in detail, called
conditional execution.
You can say: add these 2 numbers provided
the zero flag is set.
In conventional processors you would test the
zero flag with a JZ or JNZ kind of instruction
and jump around the code accordingly; that means
you need many jump instructions.
But with a conditional instruction,
like "add if the zero flag is set", you
avoid the jump instruction altogether.
The number of instructions is also reduced.
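The difference can be sketched with a tiny interpreter in Python. It supports only ADD and a branch B, and only the conditions AL (always), EQ (zero flag set) and NE (zero flag clear). Real ARM has more condition codes and sets the flags through instructions such as compares, which this sketch skips by passing the zero flag in directly. The program format and function names are assumptions for illustration.

```python
def condition_passes(cond, z_flag):
    """Only the three condition codes used here; real ARM defines more."""
    return {"AL": True, "EQ": z_flag, "NE": not z_flag}[cond]


def run(program, regs, z_flag):
    """Execute (op, cond, operands) tuples; return how many instructions were executed."""
    pc = 0
    executed = 0
    while pc < len(program):
        op, cond, args = program[pc]
        executed += 1
        pc += 1
        if not condition_passes(cond, z_flag):
            continue  # condition false: the instruction acts as a no-op
        if op == "ADD":
            rd, rn, rm = args
            regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF
        elif op == "B":
            (target,) = args
            pc = target  # branch: continue at another instruction
    return executed


# "if the zero flag is set then r0 = r1 + r2", written two ways
with_branch = [
    ("B", "NE", (2,)),         # BNE past the add when the zero flag is clear
    ("ADD", "AL", (0, 1, 2)),  # ADD r0, r1, r2
]
conditional = [
    ("ADD", "EQ", (0, 1, 2)),  # ADDEQ r0, r1, r2
]
```

The conditional version is one instruction long and never branches. On a pipelined processor a taken branch discards the instructions already fetched behind it, so avoiding short branches also saves cycles.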
And of course, there are some enhancements
like multiply-and-add; such instructions
have been added to the instruction set.
Because of these, ARM deviates slightly
from pure RISC, but it is still a fairly
powerful processor mostly based on RISC; only
in a few cases does it deviate, and for very
good reasons of course.
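The multiply-and-add instructions mentioned above combine a multiplication and an addition in one instruction; for example, ARM's MLA rd, rm, rs, rn computes rd = rm * rs + rn. Below is a minimal Python model, with a dot product as a typical use, keeping only the low 32 bits as a 32-bit register would. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mla(rm, rs, rn):
    """Model of MLA rd, rm, rs, rn: rd = rm * rs + rn, low 32 bits kept."""
    return (rm * rs + rn) & MASK32


def dot_product(xs, ys):
    """Sum of pairwise products, one multiply-accumulate per element pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mla(x, y, acc)
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
```

Loops like this, common in signal processing and filtering, need one instruction per element for the arithmetic instead of a separate multiply and add.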
Now, talking about the von Neumann and Harvard
architectures that we already discussed earlier:
ARM7 and even older processors were
based on von Neumann, with a single memory.
The later processors have 2 separate memories,
instruction memory and data memory, and inside
there is also an instruction cache and a data
cache.
Another feature is that ARM processors
do not have separate instructions for input/output;
they use something called memory-mapped IO.
Let us say this is your total memory
area.
One part of this memory is reserved
for the IO devices.
Normally when you access memory, the data
is stored in memory, but when you access
some address in this reserved region, the
decoding circuitry automatically accesses
the IO ports instead of the memory.
The same decoder serves both memory and IO
operations.
The load and store instructions typically
used to transfer data between memory and registers
are the same ones used for reading from IO
ports or writing into IO ports.
There are no separate instructions for input
and output.
The address used is simply the address
corresponding to the IO device.
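Memory-mapped IO can be sketched as an address decoder in Python: one store path, with the target picked by the address alone. The memory map (64 KB of RAM at address 0 and one output-port register at 0x4000_0000), the little-endian byte order, and the read-back behaviour of the port are all hypothetical choices for illustration; every real chip defines its own map in its datasheet.

```python
# Hypothetical memory map (illustration only; every real chip defines its own):
RAM_BASE = 0x0000_0000
RAM_SIZE = 0x1_0000          # 64 KB of RAM
IO_PORT_ADDR = 0x4000_0000   # a single output-port register


class Bus:
    """Address decoder: the same load/store operations reach RAM or the IO port."""

    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.port_writes = []  # values written to the hypothetical output port

    def _ram_offset(self, addr):
        if RAM_BASE <= addr <= RAM_BASE + RAM_SIZE - 4:
            return addr - RAM_BASE
        raise ValueError(f"unmapped address {addr:#x}")

    def store_word(self, addr, value):
        value &= 0xFFFFFFFF
        if addr == IO_PORT_ADDR:
            self.port_writes.append(value)  # the store reaches the IO device
        else:
            off = self._ram_offset(addr)
            self.ram[off:off + 4] = value.to_bytes(4, "little")

    def load_word(self, addr):
        if addr == IO_PORT_ADDR:
            # Assumption: the port reads back the last value written (0 if none).
            return self.port_writes[-1] if self.port_writes else 0
        off = self._ram_offset(addr)
        return int.from_bytes(self.ram[off:off + 4], "little")


bus = Bus()
bus.store_word(0x10, 0x12345678)    # ordinary memory store
bus.store_word(IO_PORT_ADDR, 0xFF)  # same operation, but it reaches the port
print(hex(bus.load_word(0x10)), bus.port_writes)
```

From the program's point of view, writing to the port looks exactly like writing to memory; only the address differs.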
This is the typical ARM architecture; I just
wanted to show you a snapshot of it.
You have the register bank, and here we
have the arithmetic logic unit.
One operand comes directly from the
register bank, and for the other one
you see there is a barrel shifter sitting
here.
So, the other operand can be shifted and then
applied to the ALU, or it can go through
without any shifting.
The multiplier is sitting here; whenever you
need the multiplier hardware, you can multiply
and bring the result onto the A bus.
There are some other features as well.
There is an address register and an address
incrementer, which drive the address bus, and
here is the data bus for interfacing with memory.
But the interesting part inside is that the
multiplier and the barrel shifter sit before
the ALU.
This makes the implementation of some
instructions very efficient.
So, with this we come to the end of this lecture.
We shall continue this discussion over
the next couple of lectures, where we shall
look at some additional features of the ARM
instruction set and the ARM architecture.
Thank you.
