Hello, everyone out there and welcome.
I'm Vicki Hanson, the president of ACM
and I'm really thrilled
to be here tonight.
ACM initiated the Turing Award in 1966
to recognize contributions of
lasting and major
technical importance to the
computing field.
We’re indeed fortunate
that the 45th International
Symposium on Computer Architecture
is the venue for this year's
ACM A.M. Turing Lecture.
As a result of their
fundamental contributions
to the development of energy-efficient
RISC-based processors,
the two recipients of
the 2017 Turing Award
have played a key role both in the mobile
and in the internet of things revolutions.
John L. Hennessy, former president of
Stanford University
and David A. Patterson, retired professor
of the University of California, Berkeley,
and former president of ACM,
conceived of using a set of simple
and general instructions that
needed fewer transistors,
reducing the amount of work
a computer must perform.
The concept was revolutionary.
The work led by Hennessy at Stanford
and Patterson at Berkeley
would result in a systematic,
quantitative approach
to designing faster,
lower power and
reduced instruction
set microprocessors.
Patterson's Berkeley team coined
the term RISC
and built the RISC-I processor in 1982.
It would later be commercialized
by Sun Microsystems
in the SPARC microarchitecture.
Hennessy co-founded MIPS
Computer Systems in 1984
to commercialize the Stanford team’s work.
Over the past quarter-century,
their textbook, Computer Architecture:
A Quantitative Approach,
which is now in its sixth edition,
has influenced generations of engineers
and computer designers
who would adopt and further
refine their ideas.
It would become the foundation
for our ability to model
and analyze the architectures
of new processors
thus accelerating advances
in microprocessor design.
Today, 99% of the more than 16 billion
microprocessors produced
annually are RISC processors.
They are found in nearly all
smartphones, tablets,
and in the billions of embedded devices
that comprise the internet of things.
It's my privilege to introduce
our 2017 Turing Laureates,
John Hennessy, Chairman of
the Board of Alphabet Inc.
and Director of the Knight-Hennessy
Scholars program at Stanford University
and Dave Patterson, distinguished
engineer at Google
and Vice Chair of the Board
of the RISC-V Foundation.
John and Dave, welcome.
Alright.
So, when John and I were
figuring out how to do this,
it just seemed like it would be
crazy if we gave two independent
lectures, given our careers.
So, we're going to do this as a tag team.
Part one is going to be history;
we're going to do the history
of computer architecture
in about 20 minutes,
and I'm going to do that part.
John is going to do part two,
talking about the
challenges facing our field,
and then we're going to tag team
part three, with John doing the first part
on domain-specific architectures
and me doing the last part.
And then there's time for
questions and answers,
and we’re looking forward to that.
Okay, let's go back 50-some years,
to the early 1960s. IBM had this problem:
IBM had four incompatible
lines of computers,
and when I say incompatible,
the instruction sets are different,
the software stacks are different,
the operating systems are different,
the I/O is different,
and the markets are different.
So, IBM engineers had this idea
that they would bet the company,
bet the whole company, that they
could have a single instruction set
that would handle all
four independent lines,
that they'd unify around that.
To pull that off, then and now,
the hard part is control.
But Maurice Wilkes, the second
person to win the Turing Award,
and from our field, had
this idea of how to make it easier
to build control, and his idea
was called microprogramming.
The insight was that logic,
in the technology of the time,
was more expensive than ROM or RAM,
but ROM was cheaper than
RAM and was faster.
So, let's specify control
as a read-only memory,
and they called each word of that
control memory a microinstruction.
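The idea Dave describes can be sketched in a few lines: each user-visible instruction is interpreted by stepping through a fixed sequence of microinstructions held in a read-only control store. This is a toy illustration, not any real machine's microcode; all the opcodes and micro-op names here are hypothetical.

```python
# Toy microcoded machine: the control store is a read-only table mapping
# each macro-opcode to its microinstruction sequence. Opcodes and micro-op
# names are invented for illustration only.

CONTROL_STORE = {
    "ADD":  ["read_srcs", "alu_add", "write_dst"],
    "LOAD": ["read_addr", "mem_read", "write_dst"],
}

def run(program, regs, mem):
    """Interpret each macro-instruction by stepping through its microcode."""
    for op, dst, *srcs in program:
        latch = {}                        # internal datapath latches
        for micro in CONTROL_STORE[op]:   # step through the microprogram
            if micro == "read_srcs":
                latch["a"], latch["b"] = regs[srcs[0]], regs[srcs[1]]
            elif micro == "alu_add":
                latch["result"] = latch["a"] + latch["b"]
            elif micro == "read_addr":
                latch["addr"] = srcs[0]
            elif micro == "mem_read":
                latch["result"] = mem[latch["addr"]]
            elif micro == "write_dst":
                regs[dst] = latch["result"]
    return regs

regs = {"r1": 2, "r2": 3, "r3": 0, "r4": 0}
mem = {100: 40}
run([("ADD", "r3", "r1", "r2"), ("LOAD", "r4", 100)], regs, mem)
# regs["r3"] == 5, regs["r4"] == 40
```

The same interpreter loop also shows why a wider microinstruction (more micro-ops firing per word) shortens the microprogram, which is the horizontal-versus-vertical tradeoff described next.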
So, IBM bet the company that they
were going to pull this off
using microprogramming.
In April 1964, they made
the biggest announcement
in the company's history,
and here are examples of four
different machines. You can see
the datapath widths go from 8 bits to 64 bits
and the microinstructions
go from 50 bits to 87.
Back then with microcode,
the wider the hardware,
the wider the microinstruction, but it
didn't take as many microinstructions
to interpret an instruction, so the
microprogram was shorter; that was called
horizontal microprogramming.
The one on the left is 50 bits wide,
but the microprogram is longer because
it takes more clock cycles to execute.
So, that was called
vertical microprogramming.
They bet the company, and back then,
in today's dollars, the small one
cost a million and a half dollars,
and they won the bet.
Bet the company,
won the bet. So, IBM dominated
the mainframe computing
industry, and still to this day
that instruction set is available
and dominates the mainframes.
And the second computer
architect to get a Turing Award,
Fred Brooks, played a big
role in that effort.
Moore's law comes along,
semiconductor technology
comes along, and then
the minicomputer.
Now logic, RAM, and ROM are
all made from the same transistors,
so they all cost about the same,
and RAM is the same speed as ROM.
With Moore's law, we can
have bigger control stores,
and also, because the control store
is now RAM, you could have bigger
control stores because you could fix
the bugs in them.
So, this led to these more
complicated instruction sets,
and the classic example
of that was Digital Equipment's
VAX instruction set; you can
see it had 96-bit-wide
microinstructions, 5,000 of them.
An idea came along, because the
control store was in RAM,
called Writable Control Store:
since it was alterable,
rather than just run the
standard instruction set,
you could put in tweaks to tailor it
exactly to your application,
hence Writable Control Store.
So, microprogramming became very
popular in academia, and that's
when I was a graduate student.
My PhD thesis was in this area;
my first paper was at SIGMICRO,
and SIGMICRO was actually,
you may not know this,
the International Workshop
on Microprogramming.
Later it changed its focus, but it was
microprogramming in the beginning.
Surely, the most famous machine
with writable control
store was the Xerox Alto,
built in 1973.
This was the progenitor of
the ideas we all use today.
It was the first computer with
a graphical user interface
and the first computer with Ethernet,
and this was all written in microcode
in the writable control store;
it could do the bitmapped display
and the Ethernet controller in microcode.
There's a picture of the Alto and
of the third computer architect to win
the Turing Award, Chuck
Thacker, in part for
his contributions to the Alto.
Now, microprocessors
were behind the times.
Microprocessors would follow
what the big guys did.
MOS technology was rapidly improving,
and they would just imitate
what the big machines would do.
There were these microprocessor wars:
because everybody was still writing
in assembly language,
they'd say, here's my new
instruction, and look what we can do,
and they would counter
back and forth, people
inventing new instructions.
Surely,
the most ambitious microprocessor,
maybe of all time but certainly in
the 1970s, was the Intel 432.
Gordon Moore of Moore's law fame
was a visionary, and they
had the 8-bit 8080 microprocessor;
he believed the next instruction set
they did was the one they were
going to be stuck with forever,
that it would last as long
as the company lasted.
So, he hired a bunch of
PhDs in computer science,
sent them up to Oregon to invent
the next great instruction set,
and this was a very ambitious project.
In that era, it had 32-bit capabilities,
it was an object-oriented architecture,
it had a custom operating system written
in an exotic programming language,
so big ideas. Alas, the ideas were big,
but they were late.
So, it didn't fit on one chip,
it was spread across a few chips,
it had performance problems, and
it was going to be years late.
So, the people from Oregon
had to tell Gordon Moore,
sorry, we're not going to be done in time.
So, what did Gordon have to do?
He had to start an emergency
project. It was called the 8086.
The team had 52 weeks
to upgrade a stopgap
16-bit processor:
instruction set, architecture,
chip, everything in 52 weeks.
They took three weeks of elapsed time, 10
person-weeks, to design the instruction set.
They basically
extended the 8080 to 16 bits,
and it was announced to
not very much [inaudible].
The great news for Intel was that IBM
decided to pick the
8-bit bus version of the 8086.
They actually liked the 68000;
it had a more elegant
instruction set, closer to the
360, but it was late.
So, they went with the 8088.
IBM thought at the time
that they'd be able to
sell maybe 250,000 PCs;
instead they sold a hundred million. So,
the 8086 became an overnight success,
and thanks to the binary compatibility
that IBM had invented earlier,
everything was binary compatible with PC software,
and so, a really bright
future for the 8086.
So, now researchers started
taking a look at these
microcoded machines.
This is a picture of
the fourth computer architect
to win the award, John Cocke.
What was happening was this
transition from assembly language
programming to programming
in high-level languages.
High-level languages were
popular, but people said you couldn't
write operating systems in them.
UNIX disproved that.
UNIX was written in a
high-level language, so we
could write everything in
a high-level language.
So, now it wasn't what assembly
language programmers did,
it was the output of the
compiler that mattered.
John Cocke and his group at IBM
built this hardware, this ECL machine,
but also particularly advanced
compiler technology. They said,
let's take our compiler technology and
this IBM mainframe instruction set,
but only use the simple instructions,
the loads and stores and the
register-register ones.
What would happen to performance?
It went three times faster
using a subset. Well, that
was kind of a shocking result,
and then computer architects Joel Emer
and Doug Clark did a study of that VAX
architecture that I showed
you a few slides earlier.
What did they find? First of all,
the average number of clock
cycles per instruction was 10,
so with that microcode interpreter,
it took on average 10
microinstructions to execute an instruction.
And the other thing they
found is that 20% of the
instruction set was 60% of the
microcode and was almost never used.
So, wow, why are we doing that?
I came on the scene, I
joined Berkeley in 1976,
and then, kind of
strangely as an assistant professor,
I did a sabbatical three years later
at DEC, because I had done my
dissertation in microprogramming
and they wanted help with
microprogramming bugs.
And I came away kind of astounded at how
hard it was to get the bugs out of the
VAX instruction
set architecture.
So, as soon as I came back,
I wrote a paper. I said, look,
if the microprocessor people
are going to follow the trends
of the minicomputers and mainframes and
build more complicated instruction sets,
they're going to have a microcode problem,
and we'd have to be able
to repair the microprocessors.
I proposed using
writable control store for that.
So, what happened to that paper?
I come back from my sabbatical, do
all that work, and it was rejected.
I remember the reviewers
saying, this is a stupid way
to build microprocessors,
with repairable microcode.
So, I figured, if making them
more complicated means you have to repair them,
but it's a stupid way
to build microprocessors,
then, well, why
are we making them more complicated at all?
So, this was the transition
from CISC to RISC.
Take the SRAM, the fast memory
inside the processor:
instead of using
it for a microcode interpreter,
let's just make it a cache
of user-visible instructions.
So, the contents of that memory will change
depending on what program is running.
Let's keep the instruction
set so simple
you don't need an interpreter; you can
basically get a pipelined implementation.
So, you can think of the instructions as almost
as simple as microinstructions, just not as wide.
By the way, with CISC, you know, the
compilers only use
a few of those complicated
CISC instructions,
so you're not losing
that much anyway.
Around that era, the chips were getting
bigger due to Moore's law,
and we could get a
whole 32-bit datapath
and a cache on a single chip, so that
made RISC more attractive.
Then, there was a breakthrough
in register allocation
by Greg Chaitin,
using graph coloring,
that made register-register
architectures much more
efficient than in the past.
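To make the graph-coloring idea concrete, here is a minimal sketch: build an interference graph over variables that are live at the same time, then color it with k colors, one per register. This is only a greedy illustration of the concept, not Chaitin's actual algorithm, and all variable names are hypothetical.

```python
# Toy graph-coloring register allocator. interference maps each variable
# to the set of variables live at the same time; k is the register count.
# Variables that cannot be colored are "spilled" to memory (None).

def color(interference, k):
    # Color highest-degree nodes first (a simple heuristic, not Chaitin's
    # simplify/spill worklist).
    order = sorted(interference,
                   key=lambda v: len(interference[v]), reverse=True)
    assignment = {}
    for v in order:
        taken = {assignment.get(n) for n in interference[v]}
        free = [r for r in range(k) if r not in taken]
        assignment[v] = free[0] if free else None  # None means spill
    return assignment

# a interferes with b and c, but b and c never overlap,
# so two registers suffice: b and c can share one.
g = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
alloc = color(g, 2)
```

The payoff for RISC was that with most values kept in registers, the simple register-register instructions ran fast without needing the memory-operand addressing modes of CISC.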
So, that's about when Berkeley
and Stanford came on the scene.
We did our work originally
with a series of
graduate courses;
four graduate courses
investigated the architecture
that became RISC-I,
which was mentioned earlier.
Two of the graduate students
decided to build a more
efficient version at about
the same time as Hennessy
and his students at Stanford
built the MIPS one.
So, these were all done
contemporaneously.
You know, we wish we'd had this
explanation early on, but eventually,
through, I think, Clark's
work evaluating the
VAX and the RISC architectures
we talked about,
we could do this iron law factoring
into three things, and as
Dileep Bhandarkar, who is here,
wrote in a paper after
the RISC work: look,
CISC executes fewer instructions,
maybe three-fourths as many as RISC,
but takes a lot more clock cycles
per instruction, so there's
this kind of factor-of-four advantage for RISC.
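The iron law factoring mentioned here can be written out explicitly; this is the standard form of the equation, with the Bhandarkar comparison annotated as a comment:

```latex
% The "iron law" of processor performance:
\frac{\text{Time}}{\text{Program}} =
  \frac{\text{Instructions}}{\text{Program}} \times
  \frac{\text{Clock cycles}}{\text{Instruction}} \times
  \frac{\text{Time}}{\text{Clock cycle}}
% CISC executed fewer instructions per program, but its much higher
% cycles-per-instruction gave RISC the net advantage at a similar
% clock period.
```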
So, now I'm going to go back
in history,
and you're going to see a couple of
things. First of all,
people who look something like
John and me but with a lot more hair,
and you're going to see these
little fancy pieces of plastic;
this is the way we used to
do presentations, for you younger people.
The great thing about when we used
these, they were called transparencies,
is you would put them on
the projector and start the class.
It would take 5 seconds to start. Today
it takes about 30 minutes to start.
Alright, here we go.
This was the long video.
So, we’ll stop right there.
Alright, so what happened?
Our colleagues at Intel
were able to muster teams
of 500 people to build microprocessors,
much more than the RISC
people could,
and they had great technology,
and they had the idea of
translating the x86 instructions
into RISC-like microinstructions internally.
So, any of the good ideas the RISC people
had, they could use,
and so they started to dominate
and got up to 350 million
chips a year, which is
amazing, and they not only dominated
the desktop, but servers as well.
But in the post-PC era, which let's
say starts with
the iPhone in 2007,
now it's not buying chips
from Intel,
it's getting
intellectual property to integrate
on the SoC that
you're designing yourself,
and so that's different, and of course,
in this marketplace, the value is
area and energy as much as just
plain performance, so it's
hard for Intel to compete there, and
actually, last year there were
more than 20 billion chips
with 32-bit processors in them.
The x86 peaked in 2011
with the dropping sales of PCs,
and they're actually selling
fewer now than they used to.
The cloud is still big, but as this paper
estimated, there are only about
10 million servers in the cloud, so it's
not that many chips.
So, 99% of the
processors today are RISC.
Okay, what's next in the history of
computer architecture is something
that was going to replace
RISC and CISC, which is VLIW.
The champion
of VLIW was Josh Fisher;
he actually did his dissertation
in microprogramming.
You can think of VLIW as
horizontal microcode, right:
really wide instructions
controlling a lot of datapath,
but the compiler does all the work.
It was time for Intel to expand
their architecture.
They decided to go to 64 bits;
they had done the transition
from 16 to 32, and now
it was time for 32 to 64,
and they decided they
were going to bet on VLIW,
and they named their bet EPIC,
and in fact, they joined
forces with Hewlett-Packard,
who was also working on
VLIW, and together
they were doing
the EPIC architecture,
which is a VLIW with
binary compatibility, and
they started during 1994
to considerable fanfare.
And what this meant was,
for business reasons, you know,
they had been using this emergency
instruction set for
20 years, and so
it kind of made sense
to get a better technical
foundation.
There was also a
business advantage:
AMD had the rights
to make the x86,
but given a new instruction set, they
didn't have the rights to that,
so they weren't allowed to make it. So, AMD was
forced to just extend the x86 to 64
bits while this new architecture
was going to take over the world.
A bunch of people, when Intel
and HP joined forces
in the 90s, said this is the future,
and a bunch of companies just believed them.
Wow, that's going to happen
whether we like it or not,
so they just quit what they were doing,
dropped their RISC architectures,
and started to embrace EPIC.
What happened? EPIC failure.
Alright, so
what happened was, you
know, the compiler scheduling that might work
well on floating-point code didn't
really work on integer code, right.
The pointers caused a problem,
and in particular the problems were:
the code size, it's VLIW,
right, so the programs
got a lot bigger,
they're long instructions,
and that's a problem.
Unpredictable branches were a
problem, particularly for
these integer codes,
and then cache misses,
which were unpredictable.
So, all three of
those made it --
well, two of those
made it really hard to compile,
and the programs
were really a lot bigger.
The out-of-order techniques
handled the
cache latencies
better than VLIW did.
So, out-of-order
kind of subsumed it,
but the biggest thing
was the compilers,
which the VLIW bet was on: I
think compilers can handle
all this complexity, schedule
all these wide instructions.
It turned out, as Don
Knuth and other Turing
Award winners said, they were
impossible to write.
Now, given all the publicity around
the Itanium, as the EPIC chip was called,
when it started not
to work, people noticed,
and so some wag, instead
of calling it the Itanium,
re-christened it the Itanic.
So, then you can see
it sinking into the future.
So, that's kind of what we do
in computer architecture, right.
We have these arguments, then companies go
spend billions of dollars
betting on both sides, and
then we let the marketplace
figure it out, and in
this case, it failed.
So, wrapping up my part before
I hand off to my colleague:
the consensus on instruction sets
today is that it's not CISC. No one has
proposed one of these microcode-interpreter
instruction sets in more than 30 years.
VLIW didn't work for general purpose, for
some of the reasons we said. However,
you know, it found a place in more
embedded and DSP things, because,
you know, the
branches are easier, it
doesn't have caches, and the
programs are smaller. So,
VLIW worked there; it didn't work for
general purpose. So, what's left? RISC.
Who would have guessed: 35 years
later, it's still the best idea.
Okay, with that, I will tag John
and he’ll take over.
Okay, and now for something
really different.
Okay, so what we're going to talk
about now is what
some of the current challenges are.
I know most of you are
familiar with these kinds of things.
Technology changes: we're
in an era of lots of change right now.
The end of Dennard
scaling means that power
and energy become the
key design constraints,
and then there's the ending of
Moore's law, not the complete end;
as Gordon Moore said to me,
all exponentials come to an end.
Right. So, we're in the slowdown
phase of Moore's law,
but we're also faced
with similar
kinds of challenges around our
architectural ideas,
because our architectural ideas,
as we pushed them ever harder,
became less and less efficient,
whether they were
ideas about multicore
and Amdahl's law,
or concepts
for exploiting instruction-level
parallelism.
We were pushing the envelope
more and more,
and as the inefficiencies in those
fundamental architectural ideas
became larger and larger,
the fact that we were
at the end of Dennard scaling
and the end of Moore's law
made it more and more difficult.
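Amdahl's law, mentioned above as the limit on multicore, is worth writing down, since it quantifies why adding cores ran out of steam:

```latex
% Amdahl's law: if a fraction f of execution time can be sped up by a
% factor s (e.g., parallelized across s cores), the overall speedup is
\text{Speedup} = \frac{1}{(1 - f) + f/s}
% Even with f = 0.9 and unlimited cores (s -> infinity), the overall
% speedup is capped at 10x by the serial remainder.
```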
So, what's happened
in terms of processors?
Well, early on, with the early CISC
processors, we were getting about
22% performance improvement per year. Then, we got
into this incredible phase
where we were getting dramatic
performance gains, 50% improvement per year,
and then we sort of ran out of steam
with the ILP [inaudible].
The end of Dennard scaling
came along; we moved to multicore.
It worked pretty well,
and then things got even slower,
and finally, if you look
at the last two years,
we're basically looking
at 3% improvement
in processor performance per year.
It's the end of a dramatic phase;
we've got to rethink what we're doing, we
have to approach problems differently.
If you break this down and begin to
look at things, you can see that
turnover in Moore's
law with respect to DRAMs.
Of course, DRAMs are
a very particular technology;
they rely on a trench
capacitor design,
and so they're basically seeing
the tail-off faster than conventional logic.
But even if you look
at the number of transistors
in an Intel processor,
you can begin to see the end
of Moore's law, right:
first a little bit,
and then we gathered steam again,
but if you look at that
curve since about 2000,
we're falling off.
If we'd stayed on that curve,
we'd have 10 times as
many transistors in
the typical microprocessor
we have today.
So, we've really differentiated
and separated away from that curve,
and that's caused all of us to think:
what are we going to do next?
And of course,
Dennard scaling is a similar
kind of problem,
even more acute.
I think everybody
who does chip design would say
energy is job one now.
Power is job one; think
about that in the design.
So, as the technologies
continue to improve,
we've got this curve,
power consumption,
that's going in the other direction,
and if you look at how that curve
really takes off after about 2008,
it just goes up and up and up.
Now, of course, that's power;
how it translates to energy
depends on how efficiently
we use the transistors.
Unfortunately, the techniques
we have for using transistors
have become increasingly inefficient.
Think about caches,
something we all love,
one of the truly great ideas
in computing, okay.
But of course, the larger and larger
you make your cache, the
less and less effective it is
at speeding up the program.
So that's our challenge, we’ve got to
find new ways to think about how to use
...the capability that
we have more efficiently.
We're also in a pretty
sorry state with respect
to security, as we heard from
the panel at lunchtime.
The simple thing I'd say
about security is:
if airplanes malfunctioned
as often as computers malfunction,
nobody would be at this conference
who didn't live in Southern California.
You'd all be home, because
you'd never get on a plane, right.
If cars malfunctioned that way, we'd
never get in a car, right.
We've got a big problem.
Now, some of us are old enough
to remember when there
was a lot of emphasis on security.
In the 70s there were great projects,
like Multics; there was a lot of focus on it.
We invented domains, rings,
even capabilities,
some of the ideas that are just
coming back into computer architecture.
They were piloted in 1971.
What happened?
Those ideas were not widely used.
We also had not
yet developed the
architectural techniques
to make them fast.
Think of things like translation
lookaside buffers, which make
virtual memory possible. Imagine
virtual memory without a TLB:
every single memory
access requires two
accesses to main memory.
So, we didn't have high-performance
ways to do it.
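The two-accesses point can be made concrete with a toy model: a single-level page table lives in main memory, so without a TLB every data access also pays a page-table read, and a TLB hit skips it. The page size and structures here are illustrative, not any real MMU.

```python
# Toy single-level address translation with a TLB. A TLB miss costs an
# extra main-memory access to read the page-table entry; a hit does not.
# stats counts total main-memory accesses (page-table reads + the data
# access itself). All structures are simplified for illustration.

PAGE = 4096

def translate(vaddr, page_table, tlb, stats):
    vpn, offset = divmod(vaddr, PAGE)
    if vpn in tlb:                       # TLB hit: translation is free
        frame = tlb[vpn]
    else:                                # miss: read the page table in memory
        stats["mem_accesses"] += 1
        frame = page_table[vpn]
        tlb[vpn] = frame                 # cache the translation
    stats["mem_accesses"] += 1           # the data access itself
    return frame * PAGE + offset

page_table = {0: 7, 1: 3}
tlb, stats = {}, {"mem_accesses": 0}
translate(100, page_table, tlb, stats)   # cold miss: 2 memory accesses
translate(200, page_table, tlb, stats)   # same page, hit: 1 memory access
```

Without the TLB, every access would take the miss path, doubling memory traffic, which is exactly the overhead that made the 1970s protection schemes too slow.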
The techniques didn't seem to help,
they had lots of overhead,
and they were abandoned.
At the same time, we thought
formal verification was going
to solve all our problems.
We were going to verify
all our software.
In fact, I remember the rise of kernel-
based and microkernel operating systems.
The kernel, the part of the operating system
that controlled security,
was only going to be 1,000
or 2,000 lines of code.
There's no kernel with less than
a million lines of code out there.
So, we basically didn't get
the security thing right,
the way we thought we
were going to solve it.
Almost all software has bugs.
All of you who buy a new piece
of software get a 15-page
disclosure which basically says, if this
software doesn't work, too bad for you.
Right? That's what it says, and you
all check the box and get the software.
So, we, the hardware
community, the architecture community,
have to step
up, working hand in hand
with the operating system community
and with the people
who think about security,
to really get this problem right.
If we don't do it,
we're going to have a
community of users out there
who become increasingly
unhappy with us.
There are a lot of holes in this.
If you look at it, here's
just one simple example:
Intel processors have
this management engine
that runs a bunch of code
before any kernel or operating
system code runs, and it sets up the machine.
Who reads that code and makes
sure it's doing the right thing?
None of us read that code.
So, we've got real problems.
With large instruction sets,
people can test random
opcodes and find holes in the
instruction set definition;
there are lots of issues that we've got to
get right, and we've got to rethink.
And of course, we have
the Spectre computer
architecture problem, as was
pointed out at lunch.
This is going to require us
to rethink our definition
of instruction set
architecture,
because we never thought
about timing-based attacks.
Now, I'm old enough to remember
when timing-based attacks
were first discovered
back in the 1970s.
There was a timing-based
attack on TOPS-10 and
TOPS-20, the OSs that ran on DEC-
system-10s and 20s back then,
and then we just kind
of forgot about it
and assumed it was
an operating system
problem, purely
a software problem.
Now it's an architecture problem,
and it's an architecture problem that's
been there for 20 years
and we really didn't even know it.
So, we've got to rethink
what we mean by
security, what our role is, what the role
of the architecture community is,
and how we work collaboratively,
and I think,
as was pointed out at lunch,
there are lots more microarchitecture
attacks on the way; lots more timing-
based side-channel attacks are coming.
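The essence of a timing side channel, like the 1970s password-check attacks just mentioned, fits in a few lines: an early-exit comparison runs longer the more leading characters of a guess are correct, so execution time leaks the secret one position at a time. This is a deliberately simplified illustration, not any specific historical attack; real attacks measure wall-clock time over many trials.

```python
# A timing side channel in miniature. leaky_compare returns at the first
# mismatch, so its running time reveals the length of the matching prefix.
# constant_time_compare always inspects every byte, leaking nothing via
# its duration. (Python's stdlib offers hmac.compare_digest for this.)

def leaky_compare(secret, guess):
    for s, g in zip(secret, guess):
        if s != g:
            return False          # early exit: time depends on the data
    return len(secret) == len(guess)

def constant_time_compare(secret, guess):
    if len(secret) != len(guess):
        return False
    diff = 0
    for s, g in zip(secret, guess):
        diff |= ord(s) ^ ord(g)   # accumulate mismatches, no data branch
    return diff == 0
```

Spectre moves the same principle down into the microarchitecture: the "early exit" is replaced by cache state left behind by speculative execution, which is why it can't be patched in software alone.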
This is not going to be an easy problem;
it's going to mean really rethinking
how we think about security.
Okay, so this is a new time
to think about this problem...
...and we’re going to have to redefine
our notion of computer architecture.
So, here we are:
it sounds like a tragedy
about to unfold.
Slowdown in Moore's law,
no more Dennard scaling,
architecture security is a mess;
what are we going
to do about this?
So, I've always taken the view of a
great quote from John Gardner.
He said, what we're facing
is a set of great opportunities
disguised as seemingly
insoluble problems,
and that's where we are.
We have great opportunities.
So, what are those opportunities?
Think about software.
Maurice Wilkes said to me
about 25 years ago, when I asked, Maurice,
what happens if Moore's law
ever slows down?
He said, then we're going to have
to think a lot more carefully about
how we write our software and pay a
lot more attention to efficiency.
So, we've got software-
centric opportunities, right.
We all write in these modern scripting
languages. They're interpreted.
They're dynamically typed.
They encourage reuse.
They give lots of power,
but they run terribly.
They're incredibly inefficient:
great for programmers, bad for execution.
Hardware-centric approaches:
I think Dave and I both believe,
and I think lots of people in the
architecture community take this view,
that the only path forward
is something that's
more domain-specific.
So, don't try to build a general-purpose
processor that does everything well.
Build a processor that does
a few tasks incredibly well,
and figure out how to
build a heterogeneous architecture
using those techniques.
Of course, there are combinations;
hardware and software go together.
We've got to think about not only
domain-specific architectures,
but also the languages that are used
to program them, at the same time.
So, here's a great
chart out of a
paper called There's
Plenty of Room at the Top,
by Leiserson and a group
of colleagues at MIT,
and it looks at a simple
example, admittedly
a simple example:
matrix multiply.
Take the version
in Python;
how much faster does
the same code just rewritten in C run?
47 times faster. 47 times.
Now, I worked in compilers
before I worked in computer architecture.
A factor of two would make you
a star in the compiler community.
If you just got half of that 47,
you'd have a factor of 23.
You'd be a hero; you'd win
the Turing Award.
Then you take it
onto an 18-core Intel processor.
You find the parallel loops,
because there's no way
our software systems can
find them automatically,
and that gives you another factor of 8.
Then you lay out
the memory, optimizing so
the caches actually work
correctly with a large matrix,
which they typically don't,
and that gives you another factor of 20.
And finally,
you take advantage of
domain-specific hardware,
the Intel AVX instructions:
you use the vector units,
and that gives you another
factor of 10 performance improvement.
The final version is 62,000 times
faster than the initial version.
That's a lot of performance to get
without any new hardware.
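One rung of that ladder, the memory-layout step, can be shown in pure Python. This only illustrates the idea (the paper measures C, parallelism, and AVX, which pure Python cannot reproduce): the naive loop walks B column by column, which is cache-hostile for row-major storage, while transposing B first makes both inner-loop accesses sequential.

```python
# Illustrates the data-layout rung of the Leiserson et al. speedup ladder.
# naive_matmul strides through B column-wise (poor locality in row-major
# storage); transposed_matmul transposes B once so the hot inner loop
# reads both operands sequentially. Results are identical.

def naive_matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]   # strided access to B
    return C

def transposed_matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    Bt = [list(col) for col in zip(*B)]        # transpose once up front
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        Ai = A[i]
        for j in range(p):
            Btj = Bt[j]
            # both Ai and Btj are now walked sequentially
            C[i][j] = sum(Ai[k] * Btj[k] for k in range(m))
    return C
```

On large matrices in a compiled language, this kind of layout change is where the factor-of-20 in the talk comes from; the interpreter overhead of Python mostly hides it, which is itself the point of the 47x Python-to-C rung.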
So, this is just a great opportunity.
Domain-specific architectures
achieve higher efficiency by
tailoring the architecture to the
characteristics of the domain,
and by domain-specific
I mean to include
programmable
infrastructure, right: a
processor architecture
that does a range of applications
that are characteristic
of a particular domain,
not just one
hardwired application.
Neural networks for machine
learning, GPUs for graphics...
...programmable network switches
are all good examples.
So, if you know Dave
and me, you know
we'd like to see a quantitative
explanation
of why these
techniques are faster,
and I think that's important
because it's not black magic here.
They're effective because they
make more effective use of parallelism:
SIMD is less flexible than MIMD,
but when it works, it's more efficient,
no doubt about it, right. VLIW
versus speculative out-of-order: more
efficient when the VLIW structures work.
So, what about these
domain-specific languages?
I think the key thing is that it's simply
too hard to start with C or Python
and extract the level of knowledge
you need to map
to the hardware efficiently,
unless you have domain-specific
information in the program.
It's simply too hard a problem.
Lots of us in the compiler
community worked on that problem;
it's just too hard.
So, you need a higher-level language
that talks about matrices,
vectors or other
high-level structures and
specifies operations
at that higher-level.
It still means there are
interesting compiler challenges,
because we want a
domain specific program
to still be relatively independent
from the architecture.
So, we have
interesting compiler challenges
there in mapping
that domain specific program
to a particular architecture, and the
architecture may vary
from version to version.
So, I think there are lots of
terrific research opportunities here,
make Python programs run like C,
you'll be a hero if you do that.
It's sort of déjà vu
when I think about it, because
what we were trying to do in the
RISC days was to make
high level languages run efficiently
on the architecture of the time.
This is the same challenge,
the same kind of challenge.
Domain specific applications:
what are the right targets,
what are the right languages,
what's the right compiler technology,
how do you build domain
specific languages and applications
which can port from
one generation to the next
so we're not
constantly rewriting software?
Well, what problem might you work on?
Well, there's one area
where the number of
papers is growing as
fast as Moore's law,
which you can see on this plot
and that’s machine learning.
So, there's an obvious area,
it's computationally intensive,
provides lots of interesting things
and the number of applications
is growing by leaps and bounds.
So, why not work on machine learning.
So, of course,
the tensor processing unit
is one example of this;
Google decided to
deploy these machines, and I think
from the viewpoint of the Google people,
their view is: if we
don't deploy this technology,
we won't be able to afford
to run lots of
machine learning applications;
they'll be computationally too expensive.
The architectures look
radically different, right.
So rather than build large caches and take
up lots of the architecture for control,
instead we build memory
that's targeted at the application use
because memory access
is such a big power consumer.
You think very hard
about trying to get
memory accesses on-chip
rather than off chip.
You think about building
lots of computational bandwidth
to match the kind of
application you're running.
In the case of doing
neural network inference,
it's doing lots of matrix multiplies,
right, so you build a systolic array.
Here's an old idea that's come back
in its prime, 30 years
after it was created.
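The systolic-array idea can be sketched in a few lines. This is an illustrative cycle-by-cycle simulation of my own (the names and structure are assumptions, not the TPU's actual design): operands stream in skewed from the left and top, and every processing element does one multiply-accumulate per cycle while forwarding its operands to its neighbors.

```python
# Sketch of an output-stationary N x N systolic array for matrix multiply.
import numpy as np

def systolic_matmul(A, B):
    """Simulate C = A @ B on an n x n grid of processing elements (PEs).

    Row i of A enters at the left edge skewed by i cycles; column j of B
    enters at the top edge skewed by j cycles. Each cycle, every PE
    multiplies the operand pair it holds, adds it to a local accumulator,
    and passes A-values right and B-values down.
    """
    n = A.shape[0]
    acc = np.zeros((n, n))      # one accumulator per PE
    a_reg = np.zeros((n, n))    # A operand latched in each PE
    b_reg = np.zeros((n, n))    # B operand latched in each PE
    for t in range(3 * n - 2):  # enough cycles to drain the pipeline
        # shift operands one PE to the right / down
        a_reg[:, 1:] = a_reg[:, :-1]
        b_reg[1:, :] = b_reg[:-1, :]
        # feed the skewed edges (zeros outside the valid range)
        for i in range(n):
            k = t - i
            a_reg[i, 0] = A[i, k] if 0 <= k < n else 0.0
        for j in range(n):
            k = t - j
            b_reg[0, j] = B[k, j] if 0 <= k < n else 0.0
        # every PE performs one multiply-accumulate this cycle
        acc += a_reg * b_reg
    return acc

A = np.arange(9.0).reshape(3, 3)
B = np.eye(3)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

The skewing guarantees that A[i, k] and B[k, j] meet in PE (i, j) on the same cycle, which is why no global wiring or large cache is needed; data marches through the array in lockstep.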
And if you look at performance,
you can get dramatic performance
improvements in terms
of performance per watt.
Of course, one of the challenges
we'll have here is
what applications,
what benchmarks; just
as in an earlier era
we invented SPEC as a way
to normalize and get ways
to compare different architectures,
we're going to have to think
about how we do that for machine learning
and other domain environments as well.
The one thing we've
learned in the architecture community
is that if we have a good
set of benchmarks
that's reasonable
and can't be tampered
with, it provides a stimulus
for everybody to think
about how to bring
their ideas out and
test them against it.
So, summary, lots of opportunities,
but a new approach
to computer architecture is needed.
We need renaissance computer architects.
Instead of having people
that only understand a
small sliver of the vertical stack,
we need to think about how to build teams
that put together people
who understand applications,
people who understand
languages and domain specific
languages and related
compiler technology,
together with people who
understand architecture
and the underlying
implementation technology.
For me, it's a return to the past,
it's a return to a time
when computer companies
were vertically integrated rather
than horizontally decomposed
and I think that provides
an exciting opportunity
for people both in academic computer
science as well as in industry.
So, thank you for your
attention, thank everybody
for organizing this
terrific conference
and thanks to all the
colleagues who’ve collaborated
with this over the years and
Dave is going to finish.
Yes, I forgot to thank everybody,
so John filled in for me.
But yeah, we found out about the
award not that long ago,
and Pat Ryan of ACM contacted
the organizing committee and said,
what would you think,
can you squeeze a Turing
Award lecture in,
and they did, and so we
are really appreciative.
Okay, so this is the last part
before questions and answers and
you know, John and I are
looking forward to them.
So, I was always jealous of my colleagues
in operating systems and compilers,
that they could work
on industrial-strength software
and make contributions the
whole world could use,
because people use open-source
operating systems.
So, why can’t we do that in architecture?
Okay, so let me tell you about
RISC-V. It's called RISC-V,
V because it's kind of the
fifth Berkeley RISC project.
So, basically it was a time at
Berkeley when Krste Asanovic
was leading the way and we
needed an instruction set,
and the problem with
the obvious candidates
is not only were they complicated,
we just wouldn't
be allowed to use them.
Intel was one of our sponsors,
but they didn’t let us use the
x86 because it was controlled.
So, Krste decided that he and
graduate students were going
to do a clean slate instruction set.
So, let's start over, kind of a radical
idea, but it would only take three months,
okay. And the lead graduate students
were Andrew Waterman and Yunsup Lee,
and Andrew is here as well;
I saw him and
talked to him about it.
Well, it took us four years,
but they built a lot of chips
in that time, and I helped some,
but it's really Krste, Andrew
and Yunsup that did it.
Then this weird thing happened,
you know, if you're in academia,
you always get complaints
about what you're doing.
But then we started getting
complaints about us changing
internal details of the RISC-V
instruction set from fall to spring, right?
So, okay, you guys complain about
everything, but why do you care if we change
our instruction set at
Berkeley for our courses?
And what we discovered
in talking to people:
there was this thirst for
an open instruction set.
They looked at a whole bunch of them.
They uncovered RISC-V and
they started using it.
They were going to use it themselves.
So, once we heard and understood there was
a demand for an open instruction set,
we thought well that’s a great
idea, let’s help make that happen.
So, what's different about RISC-V?
It's really simple, it is far simpler
than other instruction sets.
The manual is about 200 pages;
Andrew and Krste
wrote most of that, and,
you know, the x86 manual is 10 times that.
It's a clean slate design;
you know, it's easier
if you start 25 years later
to look at all the mistakes
of the past, of the MIPS
and SPARC architectures,
and not make those mistakes; don't
tie the microarchitecture in,
which is easy to see in retrospect.
It's a modular instruction set:
there's a standard base
that everybody has to have,
that all software runs on,
and then there are optional extensions,
which you include or not depending
on your application.
Because we knew domain specific
architectures were going to be important,
there's lots of opcode space set aside;
some architectures kind of lose
that with bigger address fields.
And a big deal is it's community designed.
The base and standard extensions are
finished; that's not going to change.
If you want to add extensions,
it's a community effort
where we bring in
the experts beforehand.
Typically, what happens
in computer architecture
is a company announces a new set of
instructions
and then all of the software people
tell you what was wrong with it.
We have those conversations upfront,
and it's a foundation that's
going to maintain it.
You know, universities lose
attention and get less excited,
so a non-profit
foundation will run it,
and, you know, like
operating systems and compilers,
advances happen for technical reasons,
not really marketing reasons.
So, there are actually a few
different instruction sets:
a 32-bit and a 64-bit and
one even for embedded;
the standard extensions,
optional multiply and divide,
even atomic instructions,
single and double precision,
compressed instructions,
and then vector, which
is more elegant than
the classic SIMD ones;
a simple instruction format,
supported by the foundation.
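A hedged sketch of what that simplicity buys: with fixed 32-bit formats, decoding a base-ISA R-type instruction is just shifts and masks. The field positions follow the published base ISA; the helper name below is illustrative, not from any real toolchain.

```python
# Decode the fields of a RISC-V R-type instruction word.
# Field layout per the base ISA: opcode[6:0], rd[11:7], funct3[14:12],
# rs1[19:15], rs2[24:20], funct7[31:25].
def decode_rtype(word):
    return {
        "opcode": word & 0x7F,
        "rd":     (word >> 7)  & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1":    (word >> 15) & 0x1F,
        "rs2":    (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
    }

# 0x002081B3 encodes "add x3, x1, x2" in the base integer ISA.
fields = decode_rtype(0x002081B3)
assert fields == {"opcode": 0x33, "rd": 3, "funct3": 0,
                  "rs1": 1, "rs2": 2, "funct7": 0}
```

Every major format keeps rs1, rs2, and rd in the same bit positions, which is part of why the hardware decoder, and the 200-page manual, stay small.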
So, the foundation members are
growing up and to the right;
in fact, there are more than
100 of them after a couple of years.
NVIDIA announced at a workshop
they were going to replace
their microcontrollers with RISC-V, so
that's, you know, 20 or 40 million a year.
Western Digital announced at a workshop
that they're going to put them in disks,
so they're going to
bring computing to the
disk, and that's going
to be billions per year.
And at our practice talk
at Stanford on Thursday,
two people came up to me from Changhong
and Anyka, and they
announced they're going to
RISC-V and they're going
to be shipping 30 million a
year starting next year.
So, it's -- it's really
starting to catch on.
In terms of the standards groups
for the extensions, the pieces,
I think there are people here who
have worked on these pieces,
but, you know, it's nice that
because it's open you get all the experts
together and have these conversations,
something like a standards
committee, before
you embrace it into
the instruction set.
And RISC-V is just one example.
NVIDIA, to its credit, has an open
domain specific accelerator,
just what John was talking about,
and everything is open: the
software stack is open,
the instruction set
architecture is open,
implementations are open.
It's a scalable design
that you can use, and
it comes either with a
RISC-V core as a host
or not, it's up to you.
So that's another example
of open architectures.
And then, you know,
as motivated this morning and
at lunchtime, there's security.
Security people like this idea
of open architectures;
they don't believe in
security through obscurity.
They believe in openness.
So, companies are worried,
countries are worried about
trapdoors; that's a serious worry.
This paper that's referenced
here changed one line of Verilog
to insert a trapdoor so that they
could take over a machine.
So, they'd like there to be open
implementations that you could look at.
And then the big thing is, you
know, clearly, I think
what you'd pick up from the
lunchtime conversation,
security is going to be a big
challenge for computer architecture.
We need everybody
who wants to working on it;
right now with proprietary architectures
you have to work for Intel or
Arm. Here everybody can work on them,
including all of academia,
who have a lot of value to add.
And I think
what's exciting about the
opportunities is that, given
the great advances in FPGAs
and open-source implementations
and open software stacks,
you can do novel architectures
and put them online.
You can connect to the
internet, and you'd be
subject to attacks, or you
could offer rewards for attacks.
So, you really have an active
adversary on the other side,
not just, you know,
being defensive on this thing.
And even though it runs at 100 MHz,
that's fast enough to run real software;
you could have users, and
because it's an FPGA you
could iterate in weeks
rather than years as with
standard hardware.
And so, my guess, and I
talked to people, table-talk,
is that probably
RISC-V will be the exemplar;
probably people will use
RISC-V if we do co-design
with architects and security
people to advance it.
That will probably happen there first;
then other people could use the ideas,
but it will probably happen first in RISC-V.
So, summarizing open
architectures: free, it's actually free.
Anybody can use it. There are no contracts
or anything, just like Linux.
It's simpler out of the gate,
and there won't be marketing
reasons to expand it.
When I talk to people in
commercial companies,
why do you do that? Well, you know,
it's easier to sell a new
instruction than,
you know, a better implementation.
It makes a big difference at the low end.
I've been surprised how minute,
how small they want it to be.
At the high end, you know, the
architecture doesn't matter as much,
but there's no reason
it can't be as fast at the high end.
We can support the DSAs,
we have the opcode space
for that and I think with just
more people building processors,
it's going to be a
more competitive market
which probably means
faster innovation
and I think as I said I think security
experts are going to rally around RISC-V.
And our modest goal is world domination.
We can't think of any
reason why, you know,
a single instruction set
architecture wouldn't
work well from the small
end on up; why not,
you know, it's all RISC processors.
So, we're hoping it will be the
Linux of processors.
So, the last part of the talk,
okay, is agile hardware development.
So, a little over 15 years ago,
there was this breakthrough in
software engineering. Instead
of the idea that software would be better
with elaborate planning and phases,
called the waterfall model,
this was a rebellion, and the rebellion
was: we're going to do it agile.
We’re going to do short development.
We’re going to make working
prototypes that are incomplete,
go to the customer and
see what they want next
and we’ll do this rapid
iteration and this has
been a revolution in
software engineering.
One of the models is what's
called a Scrum organization;
it's from rugby, and
you have a small team
and they do the work
in these sprints,
where you build a
prototype, then pause
and see what's next, then
work hard in the next sprint.
The good news is modern
CAD software enables us
to use some of these software
development techniques and
so, small teams can do some of
that with abstraction and reuse.
Here's an example from Berkeley of
three different designs;
these are all RISC-V examples.
The leftmost column is a
three-stage pipeline,
32-bit design; the middle
one is a classic five-stage,
64-bit design; and the right one
is an out-of-order machine, 64-bit design.
Looking at the unique lines of code,
less than half of the code is unique;
you can share across them,
and in the Rocket design
hardly any of it is unique,
so it's a huge amount.
You know, raising the
level of abstraction reduces
the lines of code, but the big
deal is getting code reuse.
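The reuse comes from writing parameterized generators. The Berkeley designs are written in Chisel, a hardware construction language embedded in Scala; the Python sketch below is a toy of my own (not Chisel's API) that shows the underlying idea: one parameterized description is reused to emit both 32-bit and 64-bit variants instead of writing each by hand.

```python
# Toy hardware "generator": one parameterized function emits a netlist
# for any datapath width, so different designs share the same source.
def ripple_adder(width):
    """Emit a gate-level netlist for a width-bit ripple-carry adder."""
    netlist = []
    for i in range(width):
        # sum bit: s_i = a_i XOR b_i XOR carry-in
        netlist.append(f"xor s{i} = a{i} ^ b{i} ^ c{i}")
        # carry-out: majority of a_i, b_i, carry-in
        netlist.append(f"maj c{i+1} = (a{i}&b{i}) | (a{i}&c{i}) | (b{i}&c{i})")
    return netlist

# The same source yields both datapath widths; only the parameter changes.
rv32_adder = ripple_adder(32)
rv64_adder = ripple_adder(64)
assert len(rv64_adder) == 2 * len(rv32_adder)
```

This is the sense in which less than half the code in those three pipelines is unique: the width, pipeline depth, and feature choices are parameters to shared generators rather than separate hand-written designs.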
Now, you know,
how do you do this one-month turnaround
when you're doing hardware?
Well, what the group evolved
was this iterative model.
First of all, if you can
do your innovations
in the simulator, do them there;
then it's just like software.
But you can't run
that many clock cycles;
you can't run billions or
trillions of clock cycles.
If you go to the FPGA, you can.
So, if you want
to see how it really works,
then you move the design to the FPGA.
The great news that's happened
over the last few years
is that there are instances
of FPGAs in the cloud.
You don't even have to buy
hardware to do FPGAs;
you can just go to the cloud and use it.
Somebody else sets it
all up and maintains it,
and there are
examples at this conference like that.
But, you know, if we're really going
to do energy and really
going to care about cost,
we have to get to layout, right,
and so that's the next iteration.
Once the ideas work in
the simulator and FPGA,
you actually do layout;
now, the layout flow gives
you some estimates.
There's more work that you have to do
to really be ready to tape out,
which we call tape-in,
but then it's pretty good.
Now, we could stop there,
because you really can estimate
the area and power with good accuracy,
and the clock rate and all that stuff.
So, why not stop there?
Because we're hardware people.
The advantage we have
over software people
is that they're stuck in cyberspace;
you know, the program is there,
it's not physical.
We build things.
We get something physical back, and
there's the excitement
when the chip comes back:
is it going to work, how
fast is it going to run,
how much power, and stuff.
So, the reason to build
a chip is the reward,
the excitement for everybody involved,
the graduate students or the
company, of getting the chips back.
Now, that must be really expensive.
No, it's not really expensive.
We've had test chips
forever that were cheap.
So, you could get 100 one-by-one,
you know, 10-millimeter-squared
chips in 28 nm for $14,000.
Why is this exciting now?
Because with Moore's law where we
are that's millions of transistors,
you can get a RISC-V core and an NVIDIA DLA
in a tiny test chip and it's only $14,000.
So, everybody can afford it.
Now, if you want to
build a really big chip
that would be the last step and of course
that will be more expensive
but everybody can afford
to build chips today.
As an example, at Berkeley,
led by Krste Asanovic,
you know, they built
10 chips in five years.
So with this agile model, right,
you don't kind of wait several years like
I did and then check it
out and see what happens;
they just, you know, did the next iteration
and taped it out. So the graduates
that came out of the program
felt very confident
that the chips were going
to work, because they built
a lot of them, each one
getting a little bit better.
The agile model is superior.
So, wrapping up before we do questions,
John and I think we’re
entering a new Golden Age.
The end of Dennard scaling
and Moore's law means
architecture is where we can innovate
if you want to make
things do a lot better;
it's not going to happen at
the microarchitecture level.
Security clearly needs
innovation as well;
software-only
solutions are not going
to lead to more secure systems.
Domain specific languages
raise the level of abstraction
to make it easier for the programmer;
that also makes it easier for architects
to innovate, and domain
specific architectures
are getting these factors
of 20 or 40, not 5 or 10%.
Open architectures
and open implementations
reduce the effort to get involved;
you can go in, like your compiler
and operating system colleagues,
and make enhancements
to these devices, and everybody
gets to work on it.
Cloud FPGAs make it even easier
for everybody to build
what looks like custom hardware,
and agile development means all of us
can afford to make chips.
So, as John said, like our time when
we were young architects in the 1980s,
this is a great time to be an architect,
in academia or industry, and
with that we'll take questions.
That was the fastest talk
I ever heard, Dave.
I thought John was going
to cut me off, but no --
So, thanks for the talk.
I want to go back to what John
said, that it's very easy to get a 62,000x...
...speedup from scripting languages.
In reality, a system like
the V8 Google compiler...
...is extremely efficient.
It has millions of lines of code...
...and if you look at
where the performance is lost...
...there is no silver bullet.
It's all over the place.
There is the type system.
There are multiple layers of compilers.
There are inline caches...
...garbage collection.
So, and caches don't work.
Nobody said it was going to be easy.
I think what he said was
the potential is there for 62,000.
Yeah, you don't have to get all of it.
If you get a factor of 1000
you’re a hero, right and...
...you just have to get one
160th of what's available.
I think you're right, it
is spread around.
It's going to -- I think
the interesting question will be...
...how to combine compiler
approaches with...
...perhaps new kinds of
hardware support, which...
...will help you put the
two back together.
Remember some of you are old enough
to remember the age of [inaudible]...
...machine and SPUR and things like that.
We had a bunch of ideas there.
Well, now we’ve got programming
leverage that’s substantial.
What are the right architecture ideas
to match with that compiler technology?
Don't separate the two; don't
put the architects over
here, walled away from
the -- -Compiler people.
-But...
...they're mixed together; you get the
compiler people, let them mix together.
Yes, I agree. Just finally
pointing out that
the development took millions
of engineer-hours already, so...
We’re researchers, right, if, you know...
...somebody could have told us that
when we were, you know, in our 30s...
...right, it was like you can’t
build a microprocessor.
That takes, you know, hundreds
of Intel engineers, you can't do that.
This is an opportunity.
We have no choice, right.
If Moore's law was still going on
and microprocessors were
doubling every 18 months, I don't
know if people would work on this.
But we've got to do something,
and here's an opportunity.
And you should think about what
actually happened in the whole VLSI era...
...the academic community built
a set of design tools.
There weren’t design tools out there.
I remember...
...going down and seeing
Zilog designing Z80s...
...and there was a piece
of mylar pasted on...
...the wall that was about
20 feet by 20 feet...
...and that’s how they
were doing the design.
If we had adopted
that design methodology...
...we never would've
been able to design anything.
The grad students
would have all quit.
They'd still be grad students.
I'm Ling [inaudible] student from Duke.
I'm working on domain
specific architecture.
So, for domain specific
computing people
here, actually, I think we
have two approaches...
...one is that we run a new
domain application on conventional...
computing [inaudible].
So, we can find that where
is the [inaudible] improve that.
So, [inaudible]...
...they first find that...
...the computing [inaudible]...
...accelerator and then they find...
...that the instruction set
accelerator is [inaudible]...
...so they proposed the [inaudible].
This is the [inaudible]...
...and the other approach for the
domain specific computing is that...
this is like a [inaudible]
design approach.
So, in this approach
and we just proposed a...
solution directly from
the software [inaudible]...
...hardware architecture.
So, one example is --
Okay, what's the question?
[inaudible] said earlier
what's the question?
So, the question --
here is the question.
So, for domain specific
computing people here,
we have two approaches;
do you have any comments
or suggestions about these
two approaches? Thank you.
I think both approaches have merit.
I think to say one approach will dominate
the entire space is probably wrong.
...and in the end, the advantages
of benchmarks for various areas...
...will bear out which approach
works better...
...that’s my view. Yeah, okay.
A quantitative approach.
Very nice talk.
So, Jason Mars, Michigan...
So, we're kind of in an era
right now where...
...we’re going specialized, right. So,
you can think of it as an extreme CISC...
...where we’re really specializing
for particular complex use cases...
...and, you know, so you guys
are hard-core advocates for RISC...
...point of view on the world
and so, I wonder...
...what kind of lessons can we take
from your experience,
having had this battle
before, CISC versus RISC?
What kind of RISC
lessons can we take,
given that the community
as a whole has been so
focused on specialization,
acceleration, etc.?
Put the hardware
and the software together.
Don’t separate out the hardware
from the software, think about...
...the problem in an integrated fashion.
I think that's the most...
...important thing. I mean, the
RISC stuff just wouldn't
have come about without the insights...
...about what compiler technology
needed to look like...
right, and that's the key insight.
So, it’s the same thing here.
Don't split them apart,
but optimize across...
...that boundary and think
how they fit together.
And like we said, you know,
it's serendipitous, right,
that these domain specific
languages are being created
to make programmers more
efficient in these domains,
and that's independent
of all of this.
But the good news for
us: raising the level of
abstraction, which may ease
programming, also raises
where we can innovate, right.
And so,
yeah, like John said,
the vertically
integrated people -- so
it means more for us.
Back in the old SPEC days,
people didn't even know what was...
...you know, I don't know,
it's program four, I'm
going to study it and
make it run faster -- now...
...because you didn't win
anything if you studied the
SPEC program, because you
weren't allowed to change it.
This is a really different world, right.
You're going to -- the opportunities
we see here are...
I think that's why it’s happening
at [inaudible] companies.
These companies have the skills
at many different levels...
...and they’re finding, you
know, big opportunities.
Thank you.
First and foremost, thank
you very much for...
...your contribution,
really appreciate it.
So, the question I have is really
about domain specific architectures.
Now, I know the entire
architecture community
at this point in time
is talking about
domain specific
architectures being the
solution for a lot of
the problems we have.
I've worked on mobile
computer architecture for...
...most of my time over
the past several years
and this device that you have here,
it's got more than just a CPU and GPU.
These folks have been doing
domain specific architectures...
...for over a decade at this point.
So, I'm curious:
while on the surface
we've gone from CPUs and GPUs
to TPUs today,
these things have
10 times that number in terms
of integration on the SoC.
So, how do you look at domain
specific architectures in this domain?
Is it new or is it really old,
and what are the lessons we can borrow?
I'll try to be shorter,
I usually go on for long...
...but basically the
difference is programmability...
...right. Like, I was
involved with the
project that's called the
Pixel Visual Core at Google
on my sabbatical...
...and there was hardware
called ISPs; they just do
this all in hardware. It
was a fixed hardware pipeline.
The idea was to make this
more programmable, and
actually the domain specific
language is called [inaudible]
that they tie in there, but I think
that's the difference. Certainly there's...
...lots of special purpose
accelerators that are...
...very energy efficient that
kind of do one-ish thing,
and so, can we get software
programmability and acceleration, right.
That's exactly right.
So, Vivek [inaudible]. So,
Dave and John, I'd like to tap into your
40 years of experience in education
and think about how computer science
has been taught over this 40-year period.
And, you know, as you look
for opportunities in this Golden Era,
do you see gaps in the preparation
of undergraduate and graduate students...
to address this era?
So, clearly security. I mean
clearly we're not...
...teaching our students
enough about the...
...importance of security,
how to think about it,
there was more -- if you go back...
to textbooks of 20 or 30 years
ago, you’ll probably find more...
...stuff on security than you
will in some modern textbook.
We need to fix that. That's
obviously -- obviously crucial.
I think for better or worse, our
field has expanded so dramatically
that it's impossible for
an undergraduate or even...
...a graduate student to
master everything, right.
When I was a graduate
student, I could go to...
...basically any PhD defense
and understand what's...
going on. I could go to any job
talk and understand what was going on.
To say that you can do
this today, it's just...
...impossible. The field has
grown by leaps and bounds...
...so we’ve got to
accept that we have...
...to teach our
students some basic...
algorithms, computation,
complexity, but we’ve also...
...got to teach them some
important system concepts.
...right, parallelism, security, things
like that...
...and we’ve got to realize
that we’re going to...
...have to teach them to
be lifelong learners...
...because they're going to have
to relearn and reengage...
...because our field is going
to continue to expand and blossom.
I think the other thing
is because of this...
...excitement about data
science, I think statistics...
...now depending on the school
they may get plenty...
of statistics or they may not, but
I think computer science getting...
...closer to statistics and vice
versa is going to be a thing, so...
you know, I guess you'd look
at your curriculum and see.
Sarita [inaudible], University
of Illinois, so thank you.
So, you just talked about entering
a Golden Age of computer architecture
and you also talked about how
it’s going to be important...
...for hardware and software
people to work together...
right, and I think this
group gets that, right.
We get that we've got to work
with software folks;
we are actually used to doing that.
But, the other way is not clear, right.
Generally, culturally, I think...
...software people don’t quite get
hardware, it’s not part of...
...the computer science department’s
culture, so my question is...
what can we do to move that needle?
So, the first thing
you have to tell them is...
...guys, we've been giving you
a free ride for 30 years
while you write your crummy
software and we make it faster...
...right, that's over, you know.
You know...
...at Berkeley in like 2003, I went
to the faculty lunches...
...and, you know, I'm going to give
a talk: the future is [inaudible],
single cores are not getting any faster.
I don't know how many of my
colleagues believed me when I
said it; you know,
everybody has got an opinion.
But I think, look, our job is to tell
them it's over. The free ride is over.
There's no magic, there's no -- the
cavalry is not coming over the hill.
The opportunity going forward is going
to be hardware, software codesign
and architects are going
to play this big role. If
you want things to get
faster and lower energy,
this is the only path left, you know...
and I think it's actually
all of your job to explain that
to your colleagues, and the faculty
that get that are going
to be at an advantage,
right, because we're right,
right. There's nothing else
that's going to happen.
We’re never wrong.
We -- the architecture
community, you know, we know...
...what's going on in hardware
and stuff like this...
And, you know, you may get, of
course, well, what about quantum computing?
No, quantum computing is a really
exciting thing, but it's not tomorrow,
alright, it's not going
to happen tomorrow.
It's about the same schedule as fusion.
So, I get that, but you’re
being live streamed and
...now everybody is listening
to you -- so that’s my --
You can video clip us.
Mark [inaudible], Microsoft.
So, clearly it's a renaissance for
architects and microarchitects...
...that are building very
innovative chips for AI.
Some of them requiring,
you know, thousands...
...if not tens of
thousands of cores.
What I don't see is as much of
a renaissance of compiler folks,
...you know, [inaudible] going
more vertical, but...
...gone are the days that we
needed to compile to a...
...simple RISC ISA, now it’s
compiling for these gigantic...
distributed machines. So,
do you think that we’re...
...feeding the pipe with
talent in that space?
Well, I think it’s an opportunity.
My view is that...
...we kind of hit the wall...
...on compiler -- we pushed compiler
technology as far as it could go.
We were talking about compiling
from low-level high-level
languages, right, like C,
Fortran, things like that.
We pushed that technology
as far as it would go;
we couldn’t push it any further,
and lots of people who worked on it
kind of gave up on doing
things like reorganizing memory
to improve memory performance.
We got as far as we could get.
For simple programs you can do it,
matrix multiply I can do automatically,
but take a large application
or something beyond matrices
and forget it, the system breaks.
So, now we -- there's an opportunity...
...to reengage and renovate that
whole area and rejuvenate it...
and I think that's an exciting
opportunity, and I think those
people will get the most
leverage if they’re working
hand in hand with the hardware
people, with the architects.
And, you know, [inaudible]
for domain-specific languages
are hopeful, right; matrices
are primitives, right,
and that makes it easier to do.
There's a project at Google
called XLA that is trying
to do optimization
from this very high level,
which is new -- well,
I believe that’s a new
challenge for the compiler --
It’s a different approach.
It’s a different approach. So, yeah,
and we can also, you know...
...we’re probably not going
to do all the work...
John and I. What we can do is kind
of set up opportunities and this is...
...clearly an opportunity
and, you know, we’re...
going from an era of, you know,
papers with little tiny improvements,
right, to talking about these
giant numbers out there,
and, you know, like --
-I like what John said.
-And you work in this field --
Future Turing awardees.
Exactly. There's only one requirement,
you have to have more hair than Dave.
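[Editor's note: XLA, mentioned above, optimizes whole computation graphs rather than single statements; one of its basic moves is fusing a chain of small elementwise operations into one kernel. A toy Python sketch of that fusion idea follows; every name in it is invented for illustration and has no relation to XLA's actual implementation.]

```python
# Toy sketch (not XLA itself): fusing adjacent elementwise operations
# in a tiny computation graph, the kind of whole-program rewrite an
# XLA-style compiler performs before emitting device code.

class Node:
    """One operation in a linear chain of kernels (illustrative only)."""
    def __init__(self, op, fn, inputs):
        self.op, self.fn, self.inputs = op, fn, inputs

ELEMENTWISE = {"add", "mul", "relu"}

def fuse_elementwise(nodes):
    """Merge runs of elementwise ops into single 'fused' kernels."""
    fused = []
    for node in nodes:
        if (fused and node.op in ELEMENTWISE
                and fused[-1].op in ELEMENTWISE | {"fused"}):
            prev = fused.pop()
            g, f = prev.fn, node.fn
            # Compose the two kernels into one; defaults pin the closures.
            fused.append(Node("fused", lambda x, g=g, f=f: f(g(x)), prev.inputs))
        else:
            fused.append(node)
    return fused

# A chain x -> mul2 -> add1 -> relu, written as three separate kernels.
pipeline = [
    Node("mul", lambda x: x * 2, ["x"]),
    Node("add", lambda x: x + 1, ["t0"]),
    Node("relu", lambda x: max(x, 0), ["t1"]),
]
optimized = fuse_elementwise(pipeline)
print(len(optimized))       # 1: one fused kernel instead of three
print(optimized[0].fn(-3))  # relu(-3*2 + 1) = 0
print(optimized[0].fn(4))   # relu(4*2 + 1) = 9
```

The real XLA works on an HLO graph with device cost models; the sketch only shows the shape of the idea, i.e. why optimizing from "this very high level" differs from classic statement-at-a-time compilation.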
I’m [inaudible] student and,
you know, you talk about how we are
in the Golden Age of architecture,
that we can develop domain-specific
languages for neural networks,
and how do you think neural
networks will affect our
community, for example,
with neural networks maybe --
Let me take that one -- how
can neural networks actually --
So, Cliff Young here gave
a keynote address at...
...a workshop called
[inaudible], which it was...
...how can we use machine
learning to design computers, right...
...and I think that's a really
interesting idea, right. There's...
...something in machine learning
called auto machine learning where...
...you use machine learning
to design the models.
So, you know, machine learning is
revolutionizing many fields;
it would be interesting to use it here,
so, like, should branch predictors
be based on machine learning principles,
prefetchers on machine learning?
It’s kind of more a question
of how well we could
design machines better by
using machine learning; that's
another, you know, exciting,
potentially revolutionary idea --
see Cliff’s keynote.
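[Editor's note: the "branch predictors based on machine learning principles" idea raised here already has a classic hardware instance, the perceptron branch predictor. A minimal Python sketch follows; the history length, training threshold, and class names are chosen arbitrarily for illustration, not taken from any shipped design.]

```python
# Toy perceptron branch predictor, one simple instance of the
# "branch prediction via machine learning" idea. History bits are
# +1 (taken) / -1 (not taken); weights are small signed integers.

HIST_LEN = 8
THRESHOLD = 16  # training threshold; real designs derive it from HIST_LEN

class PerceptronPredictor:
    def __init__(self):
        self.weights = [0] * (HIST_LEN + 1)  # weights[0] is the bias
        self.history = [1] * HIST_LEN        # most recent outcome first

    def predict(self):
        # Dot product of weights with the branch-history register.
        y = self.weights[0] + sum(w * h for w, h in
                                  zip(self.weights[1:], self.history))
        return y, y >= 0  # predict taken when the sum is non-negative

    def update(self, outcome_taken):
        y, pred = self.predict()
        t = 1 if outcome_taken else -1
        # Train on a misprediction, or while confidence is still low.
        if pred != outcome_taken or abs(y) <= THRESHOLD:
            self.weights[0] += t
            for i, h in enumerate(self.history):
                self.weights[i + 1] += t * h
        self.history = [t] + self.history[:-1]  # shift in the new outcome

# A branch that strictly alternates taken / not-taken is learned quickly.
p = PerceptronPredictor()
correct = 0
for i in range(200):
    _, guess = p.predict()
    actual = (i % 2 == 0)
    correct += (guess == actual)
    p.update(actual)
print(correct)  # the large majority of 200; a static predictor gets ~100
```

The alternating pattern is linearly separable in the history bits, so after a short warm-up the predictor is essentially always right, which is the intuition behind putting a learned dot product where a saturating counter used to be.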
Hi, I'm Lee from Huawei. I work
on heterogeneous accelerators.
I just want to look
at the system level:
you have all this
domain-specific hardware,
so now we're talking
about applications
on the cloud on all
of these devices.
We need something to aggregate
all of this hardware.
What's your comment on this?
Because I think, in my opinion,
there are, like, bigger gaps
on the software side
and the hardware side
for [inaudible].
Yeah, there's a big gap.
There's a big gap in both. I...
...mean I think that's what
makes it an exciting time.
There's lots of opportunity
to kind of rethink...
...how we program, how we
organize our architecture...
...and that provides -- that's why we
think it’s a new Golden Age...
...right. We've had this run up.
We ran up all...
...this curve and we pushed
everything out there.
Those ideas are done. Time
to rethink the problem and that...
means there's a great opportunity for --
Maybe we should make
it clear what we mean by Golden Age.
So, we’re kind of researchers,
right, and what we’re talking about
could be really scary for
companies, which make a lot of gold.
It isn’t like, oh,
there’s money on the floor;
it’s, wow, we don't know what to do.
So, people like us have
an opportunity to lead the way.
When it’s crystal clear
that companies can keep...
...doing the same thing and
make a lot of money,
you know, it's very
comforting for companies...
...but not so exciting
for researchers.
So, there's just a
target-rich environment for
researchers in architecture
to make big contributions
to, you know, society.
Okay. They’ve told us that
the reception is about to start...
So, we got to thank everybody
for all your great...
...questions and Vicki
wants to just close.
Thank you.
I just wanted to have
everyone give them another...
...big hand. Thank you for
that wonderful talk.
