I'm very happy to be here.
When I was a young professor, Donald Pederson,
for whom the award before the break is named,
was a force of nature at Berkeley.
His idea was, if we're going to build software
in universities, we should give it away for
free, including the source, so he kind of started
the open source movement and the rest of us followed.
I remember being told before I gave my first
talk here that this is the world's best organized
conference, and it still is 40 years later.
So, I have three parts to my talk.
First, for you circuit designers who may not
have been keeping up on computer architecture,
I'm going to do 50 years in 15 minutes to set
the foundation for two exciting new trends:
domain-specific architectures and open architectures.
So, in the 1960s, IBM had this big problem.
They had four different incompatible lines
of computers.
They had their own instruction sets, they
had their own software stack, they had their
own operating systems, they had their own
market niches.
So, the engineers at IBM made this bet that
they could invent a single instruction set
that would handle all the markets.
Basically, they invented what we call today
binary compatibility.
The same program would run on the 8-bit version
and the 64-bit version and all the software
would be the same.
Now, that was a bold thing to try, and then
as now, the hardest part of processor design
isn't the datapath, isn't the number-crunching
part; it's the brain of it, it's the control.
How are they going to pull all those things
together?
Maurice Wilkes, one of the computing pioneers,
had written some papers as an academic arguing
that there was a better way to design control.
Since at that time logic was slower than ROM
or RAM, and ROM was faster and cheaper than
RAM, his idea was to specify control as a matrix
stored in ROM.
He referred to every word of that ROM as a
microinstruction, so it's more like programming.
So, IBM decided to make that bet; they bet
the company that they could pull it off.
Then in 1964, IBM announced the IBM 360, the
biggest announcement in the history of the company,
and you can see on the left there are 8-bit
computers and on the right there are 64-bit computers.
They all ran the same instruction set.
The microcode varied; basically, the more powerful
the hardware, the wider the microinstruction
had to be.
In this case, the fastest ones were 3K words
by 87 bits, and the narrower one, which had
simpler hardware and less ROM, was 50 bits
wide, but it took more microinstructions since
it was a little bit slower to get there.
IBM won that bet.
That instruction set is a classic; it's here
50 years later, there are descendants of it
still for sale today, and they still dominate
the mainframe market.
What happened next is we moved to the minicomputer
industry and semiconductors.
Logic, RAM, and ROM are all built from the
same transistors, so RAM and ROM are about
the same speed as logic, but with Moore's law
you could have a much bigger control store,
so microcode got bigger.
What this meant was, you could build a bigger
instruction set interpreter, which meant you
could have more complicated instruction sets.
A classic example of this is the VAX minicomputer
from Digital Equipment Corporation and it
had 5K microinstructions, each 96 bits wide.
In the semiconductor industry down here in California,
Gordon Moore, who was leading Intel, believed
that the next instruction set they did after
the 8080 was one they'd be stuck with for the
whole life of the company, just like IBM was
with the 360.
He wanted to make it a really great instruction
set so he hired lots of PhDs in computer science,
sent them to Oregon to invent the next great
instruction set.
Eventually they came back with an incredibly
ambitious one; it had 32-bit addressing where
they only needed 16-bit addressing, and it was
what's called a capability-based architecture
today.
The instructions weren't multiples of bytes,
they were multiples of bits; they could be
2 bits wide or 19 bits wide.
A lot of microcode.
They even wrote their own custom operating
system in an esoteric programming language.
Of course, when it was released, it was released
to a tremendous amount of fanfare, the greatest
thing since electromechanical analog computers.
Alas, it had big performance problems, the
microprocessor didn't actually fit on one chip,
and it had usability problems, so it didn't
turn out as well as they hoped.
In fact, the team warned Gordon Moore several
years into it that they weren't going to make
the deadline, so Moore was forced to start
an emergency project.
He tapped a small group of Intel engineers
and gave them 52 weeks to design an instruction
set, build a microprocessor, and get it to market.
That gave them only 3 weeks to design the
instruction set, about 10 person-weeks altogether.
In that time, they basically took the old instruction
set, widened it to 16 bits, and added a few things
so it could be assembly-language compatible.
The 8086 was announced, but it wasn't that
exciting a product and there wasn't tremendous
fanfare.
Fortunately for Intel, at the same time, IBM
had decided to enter the personal computer
market following Apple.
They really liked the Motorola 68000 instruction
set, since it was like the IBM instruction
set, but it was late, so they ended up taking
the 8-bit-bus version of the 8086, the 8088.
They thought they could sell a quarter of
a million, but they were wrong.
They sold a hundred million personal computers
and that made the 8086 family an overnight
success.
The binary compatibility with PC software
meant this was an incredibly valuable franchise.
But what happened to these microcoded machines?
This is a picture of John Cocke, one of the
heroes of computing.
He and the group at IBM Research were building
something that didn't need microcode and they
had built a nice compiler for it.
They said, well, what would happen if we tried
that with the IBM 360 instruction set: suppose
we just used the simple instructions and didn't
use the complicated ones.
Well, programs ran quite a bit faster, three
times faster.
Then over at Digital Equipment Corporation
in Boston, some engineers took a look at the
microcode to see how it was used and found
that 20% of the instructions accounted for 60%
of the microcode but were almost never used.
It wasn't clear that all that microcode was necessary.
That led to the reduced instruction set ideas.
Instead of having RAM with a microcode interpreter
in it, we would replace that with an instruction
cache that would hold the instructions of the
program, and you could just recompile and change
what's in that RAM.
You'd use really simple instructions.
You can think of them as being as simple as
microinstructions, just not as wide.
That made it easier to pipeline them to get
even better performance.
Then with Moore's law, we were able to get
that 32-bit datapath all into a single chip
with no chip crossings and it was a very attractive
design.
The problem was, in those early days of RISC,
trying to explain why it was a good idea.
This formula was the key to explaining it.
So, the time per program, how long it takes
to run, can be broken into three factors:
the number of instructions you execute per
program, times the average number of clock
cycles per instruction, times the time per
clock cycle.
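As a minimal sketch of that equation (my own illustration in Python, not something from the slides):

```python
# The performance equation from the talk, often called the "iron law":
# time/program = (instructions/program) x (clock cycles/instruction) x (time/clock cycle)
def time_per_program(instructions, cycles_per_instruction, seconds_per_cycle):
    """Execution time in seconds for one program."""
    return instructions * cycles_per_instruction * seconds_per_cycle
```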
Well, people noticed that with RISC architectures
and their simple instructions you have to execute
more of them, so that seems like a bad idea.
But DEC engineers, to their credit, published
a few years later what the actual facts were.
The CISC machines did in fact execute fewer
instructions, maybe half as many, but the
average number of clock cycles per instruction
was about a factor of 6 bigger.
So, RISC was about a factor of 3 faster, and
RISC architectures took off.
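Here is a hedged worked example of that argument; the absolute instruction counts and the 100 MHz clock are made up for illustration, only the ratios (half the instructions, about 6 times the cycles per instruction) come from the talk:

```python
# Made-up baseline numbers; only the ratios reflect the DEC measurements described above.
cisc_instructions, cisc_cpi = 0.5e9, 6.0   # CISC: fewer instructions, higher CPI
risc_instructions, risc_cpi = 1.0e9, 1.0   # RISC: more instructions, lower CPI
seconds_per_cycle = 10e-9                  # assume the same 100 MHz clock for both
cisc_time = cisc_instructions * cisc_cpi * seconds_per_cycle
risc_time = risc_instructions * risc_cpi * seconds_per_cycle
print(cisc_time / risc_time)               # -> 3.0, i.e. RISC about a factor of 3 faster
```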
Intel cleverly had the hardware resources to
translate the old CISC instructions of the
x86 internally into RISC instructions.
Then anything the RISC guys came up with,
they could include in their microprocessor
as well.
This was very successful, and Intel grew the
market to 350 million x86 microprocessors per year,
and eventually that technology and that instruction
set dominated the market and took over servers
as well as laptops.
That's the PC era.
What happened in the post-PC era?
Let's say that started with the introduction
of the iPhone in 2007.
Now, instead of buying a microprocessor from
a company like Intel, it's the IP that goes
onto an SoC.
Which means you value not just performance;
size and energy are incredibly valuable.
So what's happened in the post-PC era is, now
there are more than 20 billion microprocessors
sold every year, but the x86 hasn't penetrated
most of that market.
It peaked in about 2011; it's been declining
about 8% a year.
In fact, they sold fewer of them a couple of
years ago than they did in 2007.
The x86 servers are dominating the cloud, but
the cloud is not that big in terms of number
of chips.
The chips are expensive but there aren't that
many; maybe there are 10 million servers in
the cloud today, which is almost nothing next
to 20 billion.
Today, 99% of the processors made in the world
are RISC processors; RISC won that war.
The next idea in architecture that came along
was very long instruction word or VLIW.
That was supposed to replace RISC and CISC.
The idea was, this longer instruction could
specify lots of operations at the same time
and you could put a very powerful datapath
with lots of ALUs in it.
How are you going to control it?
Well, the hardware was really simple: some
things would take one clock cycle, some take
three or four.
The job was up to the programmer, more importantly
the compiler, to fill in the slots to make
it all work right.
If you didn't put the right distance between
them, you wouldn't get the right answer.
There were no hardware interlocks like you
see in a pipeline.
That was the idea of very long instruction
words.
You shifted all the work to the compiler and
the advocates said the compilers are ready
for this.
They can handle that complexity and schedule
things appropriately and it's all going to
work.
Around this same time that VLIW was getting
a lot of notoriety, Intel had to make a business
decision.
They had successfully taken the 8086 from
16 bits to 32 bits with the 80386; they just
made the registers wider.
They could do that again for the 64-bit version
because programs were getting bigger than
the 32-bit address space.
However, AMD had the rights to make the x86
processors.
This wasn't a great thing for business and
you know the architecture itself was that
thing that was designed in three weeks, so
it wasn't the greatest architecture.
So, they decided to embrace the VLIW.
In fact, they decided to do that in conjunction
with Hewlett Packard.
They called it EPIC for explicitly parallel
instruction computing.
It was basically a binary-compatible version
of VLIW, and we had these two big companies
pushing it.
Intel called it IA-64 and the old one IA-32,
so the 64-bit version was this VLIW and AMD
was squeezed out.
Now, AMD had no choice; because it was squeezed
out, it had to make a 64-bit version of the x86.
It took a while for them to get their first
chip, it was several years late but they built
several more.
A bunch of companies at the time, given that
these powerhouses HP and Intel were saying
this is the future, just gave up on their
RISC processors and pledged to use Itanium,
because of the power of the companies and
the arguments that were being made.
So, what happens in computer architecture
is, we make these arguments that maybe you'd
have in a bar here at ISSCC and then we spend
billions of dollars to see who's right.
So, what happened?
What happened is that VLIW was an EPIC failure,
that's what happened.
The problem was that unpredictable branches
were hard to schedule in software, when you
had a cache miss it was hard to figure out
how to schedule around it, and the final one
was, it's called a very long instruction word,
so the programs got a lot bigger, and that's
a problem when you have million-line programs.
Donald Knuth, the famous computer scientist
from Stanford said the Itanium approach was
supposed to be fantastic but no one could
build the compilers.
Because it had gotten so much notoriety, it
was ridiculed by the chip industry, and somebody
changed the name from Itanium to the Itanic,
inspired by the infamous Titanic.
So, it didn't work.
What happened was, Intel was forced to follow
AMD's lead with the 64-bit version of the x86
instruction set.
Finishing up part one, what's the consensus
on instruction sets today?
It's not CISC; no one's introduced one in 30 years.
For general-purpose things, it's not VLIW;
no one's tried that in 15 years.
The good news for VLIW architectures is that
they work pretty well in digital signal processing,
because the programs are small, the branches
are easy, and there are no caches; they have
software-controlled memories.
So, it's found a home, but not in general-purpose
computing.
So what�s left?
RISC!
35 years later, to my shock, RISC is still
the best idea going forward for instruction
set design.
Okay, let's go to part 2, domain-specific
architectures.
The end of Dennard scaling and the end of
Moore's law that we talk about a lot here,
those are big things.
There are also architectural limits: with binary
compatibility, below the covers we use out-of-order
execution ideas, but those kind of ran out
of steam about 15 years ago.
So, we went to multicore because we couldn't
build better single cores; they needed a bigger
power budget than we had.
But we're limited by how many cores you can
usefully put together, like eight or so for
a lot of the applications.
Gene Amdahl, a famous computer architect,
said in Amdahl's law that if one eighth of
the time is sequential, even if you have a
thousand processors, the maximum speed-up
is a factor of eight.
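As a quick sanity check of that claim, here's a minimal sketch of Amdahl's law (my own illustration, with the sequential fraction as a parameter):

```python
# Amdahl's law: speedup = 1 / (s + (1 - s) / n), where s is the sequential fraction.
def amdahl_speedup(sequential_fraction, n_processors):
    return 1.0 / (sequential_fraction + (1.0 - sequential_fraction) / n_processors)

print(amdahl_speedup(1/8, 1000))        # ~7.94 with a thousand processors
print(amdahl_speedup(1/8, 1_000_000))   # approaches the limit of 8
```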
That kind of limits the benefit there, so
we've got to do something new.
In fact, you can see this reflected in the
first figure in Hennessy's and my computer
architecture textbook.
In the good old days, we had the CISC architecture
only doubling performance every three and a
half years, and then we had this RISC era,
which was doubling every eighteen months,
and you couldn't wait to get the next laptop
because it was so much faster than the one
you had.
That speed-up is what many people thought of
as Moore's law.
With the end of Dennard scaling and power budgets,
we had to switch to multicore, so you can see
it slowed down to about doubling every three
and a half years, and then with Amdahl's law
limiting the benefits of multicore, it slowed
down to doubling every six years.
I was shocked that in the latest edition of
the book, Intel servers on standard benchmarks
are only like three percent faster per year,
doubling every 20 years.
So, we've come to kind of a grinding halt.
What are we going to do?
Transistors aren't much better, the power budget
is limited, we've already done the multicore
trick, and the only thing left that architects
know how to do is what we call domain-specific
architectures.
We're going to design for a narrow class of
things; it won't have to do everything well,
but what it does, it will do very well.
An example of domain-specific architecture
happened at Google.
Google, about five years ago, thought that
with neural networking they were going to be
able to do things like speech translation.
They calculated that if they didn't do anything
to accelerate it and used standard CPUs,
Google would have to double their cloud, double
their datacenters, to handle the load, which
would be extraordinarily expensive.
So, they started an emergency project to build
a custom hardware chip to do the inference
part of neural networking.
The heart of that chip is the matrix-multiply
unit; that is 65,000 8-bit multiply-accumulate
units, which is kind of an amazing number,
and at a 700 MHz clock rate, that's more than
90 trillion operations per second.
That's, in terms of MACs, 25 to 100 times
more than other chips have.
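As a rough sanity check of those numbers (my own arithmetic; the 256-by-256 array size is my assumption about where the roughly 65,000 figure comes from):

```python
# Sanity check of the peak-rate claim above, not Google's own slide.
macs = 256 * 256            # a 256x256 systolic array: 65,536 multiply-accumulate units
clock_hz = 700e6            # 700 MHz
ops_per_mac_per_cycle = 2   # count the multiply and the add separately
peak_ops_per_second = macs * clock_hz * ops_per_mac_per_cycle
print(peak_ops_per_second / 1e12)   # ~91.8, i.e. more than 90 trillion operations per second
```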
It's got 4 MiB of memory to handle the results
and another large memory that's a buffer to
handle other operations; that's 24 MiB on
the chip.
Now, the so-called parameters, or weights,
that you need for neural networking don't fit
on the chip, so they have to be brought in
from DRAM over a DDR3 interface.
Here's the floorplan of the chip: the matrix
unit is the yellow one on the right; that's
a quarter of the chip.
The unified buffer and the accumulators are
very large memories; that's about another
40% of the chip.
So, two-thirds of the chip is those two things.
In architecture, it's hard to compare on an
absolute scale, so you compare to other things.
The contemporary chips of the time were a
CPU and a GPU, and if you look at the leftmost
column, the TPU is only about half the size
of those other chips, and it's also about half
the power.
What about the performance?
To explain the performance, it's not just peak
performance that matters, it's the actual
delivered performance.
A useful graph to show that is called the
roofline model.
The roofline of the chip has a flat part and
a diagonal part.
The flat part is kind of the speed of light
of the chip; for the TPU, you can't do more
than 92 tera-operations per second, it's physically
the fastest it will go.
The slanted part is when you're limited by
the memory system.
Typically, you're either limited by computation
or limited by the memory system.
What the graph uses to decide where your program
runs is the average number of operations per
byte of memory fetched: if you have a lot of
operations per memory byte, you are over on
the right-hand side where you are computation
limited; if it's few operations per memory
byte, you're in the slanted part and memory
limited.
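Here's a minimal sketch of that roofline calculation; the 92 TOPS roof is from the talk, while the 34 GB/s DDR3-class memory bandwidth is my assumption for illustration:

```python
# A generic roofline, not the TPU's exact published curve.
def attainable_tops(ops_per_byte, peak_tops=92.0, mem_bandwidth_gb_s=34.0):
    """Attainable performance is the lower of the compute roof and the memory roof."""
    memory_roof_tops = mem_bandwidth_gb_s * 1e9 * ops_per_byte / 1e12
    return min(peak_tops, memory_roof_tops)

for intensity in (10, 100, 1000, 10000):          # operations per byte of memory traffic
    print(intensity, attainable_tops(intensity))   # memory-limited until ~2700 ops/byte
```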
What does the TPU look like?
The TPU looks like a really long slanted roof,
because it didn't have that great a memory
system but it had huge computation.
You can see four of the six neural networking
apps are memory limited, but they're pretty
close to the roofline, so they're operating
pretty efficiently.
Here's the CPU.
You know, it's got a lower roofline, but the
apps are close to the top.
But the GPU, the apps are quite a bit below
it.
So, why is that?
One of the surprises when analyzing the results,
which we didn't know in advance, was that response
time, 99th-percentile response time, was vital
for these user-facing neural networking applications.
As you can see between the green and the red
on these figures, for the CPU and GPU, with
a 99th-percentile 7 ms response-time limit
they're far off the peak of what they could
do if you didn't care about response time,
whereas the TPU is pretty close; there's only
a small difference.
That's one of the advantages, or one of the
requirements, that the TPU handled well.
If we put all of them together, you can see
on a log-log scale what the relative performance
looks like, but I don't think in log-log.
I have to think linear, so that's what it
looks like.
So, it's a lot faster, but how much faster?
If we compare our six benchmarks, the GPU is
about twice as fast as the CPU, the TPU is
about 30 times faster than the CPU, so the
TPU is about 15 times faster than the GPU.
Now, in a datacenter, which is where the TPU
is placed, you don't just care about performance,
you care about total cost of ownership, which
is the cost of the chip as well as the operating
costs for electricity and things like that.
Companies don't reveal their total cost of
ownership, but it is related to the performance
per watt, which we can talk about.
This slide shows the performance per watt
two different ways.
Since it's an accelerator, it goes into an
Intel server.
You can either include the host power or not
but if we don't include the host power, it's
a factor of 80 in the performance per watt
over the CPU and a factor of 30 over the GPU.
Now, remember I said it was an emergency project,
and it just used standard DDR3 interfaces,
whereas the GPU has GDDR5 interfaces, which
are much faster.
So, a natural question is, suppose we had had
the time to use GDDR5 memory instead, what
would have happened to these results?
It would be kind of astronomical.
The TPU would have been a factor of 200 times
performance per watt of the CPU and a factor
of 70 times performance per watt of the GPU.
Alright, summarizing part two: with the TPU,
we're doing this because we can't make general-purpose
processors go a lot faster.
We picked neural networks, which are useful
for a lot of things; they help with many tasks.
A key part is this very large matrix-multiply
unit, and it's two-dimensional, which matches
neural networking needs.
The CPU has about a dozen or so 1D ones, and
so does the GPU.
It uses 8-bit integers instead of 32-bit floating-point
numbers, and it drops a bunch of features so
it can be a small, low-power chip, because
you don't need those general-purpose features
if it's domain-specific.
It has single-threaded execution; one program
runs at a time, whereas the other ones run
a dozen or so, which makes it hard for them
to meet that response-time limit.
It can run the neural networking applications
of today; it's flexible enough, it's not hardwired.
TensorFlow lets it be programmed at a very
high level.
Alright, that's part two.
Part three, open instruction set architectures.
We've already said RISC is the consensus best
instruction set, but how many instruction sets
are there on an SoC?
About a dozen.
Amazingly, there are a dozen different instruction
sets, for the reasons you see on this slide,
with separate software stacks.
Do we need that?
Why do they all have different instruction
sets?
And they are proprietary.
Our colleagues in software have open sourced
operating systems, open sourced databases,
why don't we have open instruction sets?
Could you do that?
And that's what I'm going to talk about now.
RISC-V: it's called RISC-V because I was
involved in four RISC projects at Berkeley,
and they kindly decided to name this one RISC-V.
We were doing our research about 8 years ago
or so, and we had to pick an instruction set
to do our research in. The obvious choices
were ARM and Intel, but a) they were too complicated
and b) we were prohibited by law from using
them, so we had to do our own.
We started a short project to do our own instruction
set, a clean slate, taking advantage of starting
25 years later to do a better design.
I was involved, it was led by my colleague
Krste Asanovic, and the two graduate students
who led the work were Andrew Waterman and
Yunsup Lee.
It took us a lot longer than a few months
to release it, but you know, when you're in
academia, industry complains all the time
about what you do, about you not doing a good
job at education.
So, we were used to that, but we started getting
complaints along the lines of: how come you
changed your instruction set between the fall
courses and the spring courses, why did you
do that?
That was an unusual complaint, so we investigated
why they cared which instruction set we used
in our courses.
We found out that apparently a need had developed
for an open instruction set architecture, and
people looked around at what was available,
liked RISC-V, and started using it.
Well, once we realized there was this need
for an open architecture, we thought that was
a great idea, and kind of following Donald
Pederson's tradition, we would try to help
industry be successful using those ideas and
make it open.
What's different about RISC-V?
First of all, it's a really simple instruction
set architecture.
It's far smaller than the others.
The x86 manual is like 2,500 pages; it would
take you a month to read if you read it every
day, 8 hours a day.
RISC-V is 200 pages; you can read it in an
afternoon.
It's a clean slate; we had no baggage and we
learned from the mistakes of others.
It's modular: there's a very small base instruction
set that's almost identical to the architecture
presented here 34 years ago, RISC-I.
It's designed to support domain-specific
architectures with extensibility.
There are actually two instruction set designs,
a 32-bit and a 64-bit one.
The optional extensions include multiply/divide,
atomic operations, single- and double-precision
floating point, and a very nice vector architecture.
Opcode space is set aside so you can do domain-specific
extensions, which keeps the base small.
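As a hedged illustration of that modularity, this is the standard RISC-V naming convention (my own example, not a slide from the talk): a base ISA plus whichever optional extension letters an implementation chooses.

```python
# Standard RISC-V naming convention, shown for illustration only.
base = "RV64I"          # 64-bit base integer ISA (RV32I is the 32-bit one)
extensions = [
    "M",  # integer multiply/divide
    "A",  # atomic operations
    "F",  # single-precision floating point
    "D",  # double-precision floating point
    "V",  # vector extension
]
print(base + "".join(extensions))   # -> "RV64IMAFDV"
```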
If it's just a university project, it's hard
to bet your company on it, so like other
open-source efforts, we started a foundation
and it's taking responsibility for the architecture.
There are at least a hundred members involved.
I'd just like to mention two of them.
NVIDIA announced at a workshop that there will
be a RISC-V core in every NVIDIA GPU chip in
the future, so that'll be tens of millions
per year, and Western Digital at the last
workshop announced they would put a RISC-V
core in every disk that they make, so that'll
be billions of RISC-V cores per year.
An interesting question is, because it's a
minimal instruction set with optional extensions,
how do we add more extensions?
Classically what happens in architecture is
a company announces some new extension and
then all the software people in the world
complain about how they got it wrong.
This time we're doing the complaining upfront.
We are having groups of individuals and companies
get together and propose extensions.
It's almost like a standard done in advance,
maybe like JEDEC does, and you agree on the
right ideas in an open conversation before
you put it in.
Andrew, who's the other architect of it, and
I did a book you can get hold of.
Let me go ahead and summarize these slides
and take a few minutes for a more historical
perspective.
Like I said, I can't believe that RISC is
still the best idea in instruction set architecture,
but it is.
RISC-V is free and open, people ask what that
means, that means there's no proprietary lock-in.
If you join the RISC-V foundation you can
use it.
Anyone can buy it or sell it.
It's much simpler, which helps because people
want to verify that the instruction set is
correct, and security and verification are
very important.
So, it's popular with security people.
The simplicity of the instruction set makes
a big difference at the low end in terms of
area and power and it tends to be faster too.
As you get to a much bigger design, the instruction
set plays less of a role, but you can have
very fast implementations.
It's very easily extensible, which we need
for domain-specific architectures, as opcodes
are set aside.
The big difference is, right now with x86,
there are two companies that build those: AMD
and Intel.
If you don't have one of their badges, you
can't design their processors.
For ARM, you either work for ARM or you're
one of about a dozen companies who've paid
millions of dollars for the rights to design
their own, but there's a small number of organizations
that can design ARM processors.
With RISC-V, everyone, universities, every
company, everybody, can design processor
architectures, and we're starting to see people
putting them into the open-source space.
Berkeley and other universities, I think ETH
in Switzerland, have put out open-source
versions, which people are just downloading,
using, and putting on their chips, and they've
been used many times.
I think we're going to see a lot faster innovation
in this much more competitive environment
because it's an open instruction set architecture.
You know, our modest goal for the project
is just to, you know, become world dominating,
to be the instruction set that you run everywhere.
What I wanted to do is, take advantage of
my return to ISSCC, to point out just how
popular these ideas already are.
Here are our talks that are related.
The first talk, talk 2.5, is about security,
and our fearless leader, the ISSCC general
chair, is one of the co-authors; they're using
a RISC-V processor for security.
Tonight, Norm Jouppi, the lead architect of
the TPU, is going to bring both TPU version
one, the one I talked about a little bit,
which, if you heard the earlier talk, is for
inference, and TPU version two, which is for
training.
He'll be here at this new thing called the
industrial showcase tonight, and he'll bring
hardware with him and talk about it.
Frans Sijstermans, I hope I said that right,
is also giving a presentation, on this idea
of open-source hardware.
NVIDIA has decided to make their edge inference
device, the deep learning accelerator, free
to anybody who wants to use it.
The sources are going to be available.
Frans and I are on the RISC-V foundation board
of directors, so I know him well.
So, that's another exciting thing, that's
tonight.
My colleague from France who works in Germany,
Olivier Temam, who was an early leader in
these things, is going to give a talk about
devices at the edge; that's 13.1 on Tuesday
at 1:30.
Then on Thursday, the whole day is dedicated
to a short course on machine learning.
There are four sessions there.
Okay, the last slide, the conclusion, where
I wax philosophic.
This ending of Moore's law and Dennard scaling
means, you know, performance and energy gains
are going to have to come from innovations
in architectural design, something that's
visible to the programmer and the compiler.
I think domain-specific architectures and
RISC-V are going to play a big role in this
change in architecture.
To put the TPU in perspective, these factors
of 15 to 30 better are amazing for commercial
products.
Intel dominates the server market because
its microprocessors are maybe one and a half
times better than AMD's, and they dominate
the market, right.
These are factors you don't usually see between
commercial products, so it indicates why people
are excited about domain-specific architectures.
You know, reflecting over my career, I think
we're entering another renaissance era in
computer architecture.
In the 1980s, there was a lot of excitement,
lots of startups, lots of hardware, you know,
it attracted great people to the field.
Just as a random example of the great people
in the field, my colleague John Hennessy became
the president of Stanford, and he was the
president of Stanford for 16 years.
Not only was he probably the greatest president
in Stanford's history, he was the greatest
college president in the United States during
his era.
So, we were able to attract people like Hennessy
to the field.
I talked to my colleagues about the incoming
PhD class in computer architecture at Berkeley,
the people that are applying to enter in the
fall.
It's the best class that they can remember.
They couldn't accept as many as they wanted
to because there were so many fantastic applicants.
So, I think undergraduate students are getting
excited about what's going on and designing
hardware again.
It might be partly this neural networking,
but we're getting great people coming in.
Then there are startups, hardware startups.
Last month's New York Times talked about
45 startups around the things I was talking
about, domain-specific architectures: five
companies with hundred-million-dollar investments,
and more than one and a half billion dollars
in total.
These are exciting times.
I'd like to thank you all for inviting me
here, and I look forward to seeing what you
guys do next.
Thank you.
