
@BigEd
Created June 23, 2015 18:22
Robert Tomasulo transcript
Transcript of the talk at https://www.youtube.com/watch?v=S6weTM1tNzQ
Thank you very much for that kind welcome!
What I intend to do with my time today
is divide it into two pieces
not necessarily equal.
The first piece will be a very cursory
examination of 20, 30 years of computer design
from the model 91 on forward to when it finally got replaced
as an out-of-order execution machine
and then I'll spend the rest of my
time answering your questions.
By the way interrupt me any time you have a
question or a comment that you think is appropriate.
So, to come way ahead, to roughly 1990 or thereabouts
IBM finally brought out an out of order machine
and this was such a shocking thing that
they felt it necessary to offer
an explanation or two as to why it
took them 30 years to do this
and they did
and the main explanation which was
quite a good one
was that once you have a cache
with a very rapid access to memory
(not counting misses of course which we'll discuss a little bit later)
and you don't have to worry about floating point
and you don't care about long execution
instructions like floating point then there's really no need
for what the model 91 offered
and so naturally they got rid of it.
But as time marched on it became apparent that
they were going to need something.
Now, IBM moves in mysterious ways -
they probably move a lot slower than
I would have liked or other people would
have liked but ultimately they get it done.
So they brought out this machine
and they provided a reasonable rationale
the only flaw in which was that as time went on
(because we're talking now about a 30 year span)
it became less and less applicable
to think that way because memories kept getting slower
as they always have
I mean you can think of it as kind of a Golden Age
of the cache
that we had this period where you could get away with
two cycle access to main memory
and Gee, if you missed the main memory you had like
five cycle access to whatever backed it up
instead of, in today's machines,
you've got 70 cycle access to whatever backed it up.
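The arithmetic behind this "golden age" observation can be sketched as a single-level average-memory-access-time model. The cycle counts are the ones quoted in the talk; the 95% hit rate is an illustrative assumption, not a figure from the talk.

```python
# Average memory access time (AMAT) for a single-level cache model.
# Cycle counts follow the talk; the hit rate is an assumed example value.
def amat(hit_cycles, miss_cycles, hit_rate):
    """Expected cycles per access: hits at hit_cycles, misses at miss_cycles."""
    return hit_rate * hit_cycles + (1 - hit_rate) * miss_cycles

# "Golden age": 2-cycle cache access, ~5 cycles to whatever backed it up.
golden = amat(2, 5, 0.95)    # ~2.15 cycles per access
# Later machines: same fast hit, but a ~70-cycle miss.
later = amat(2, 70, 0.95)    # ~5.4 cycles per access
print(golden, later)
```

With a cheap miss, in-order execution loses little; as the miss penalty grows toward 70 cycles, the case for overlapping work around stalled accesses comes back.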
5:14
so
after... let me get straight what I want to say...
I'm going to jump around somewhat...
So with all these caveats we didn't do an out of order machine
and maybe we could have a little bit sooner
than otherwise.
Now we jump back to the Model 91 timeframe
and why didn't they do an OOO execution machine then?
Well, they did, they did. It was a machine - I forget
what the nomenclature was now - it was a full OOO
machine. It was completely logically designed
it wasn't physically designed. So it's hard facts.
And, it featured some very nice innovations.
Including a branch prediction table which was
a new, a relatively new thing.
What else did they have? They had something in there specially
for RAS purposes, it's slipped my mind, but no-one cares
about RAS anyway. Although we should.
So, this machine was carried through to everything
except final physical design. It was good as gold, it
was completely debugged and conformed to the
architecture and all that other good stuff.
It was deemed that it was too expensive
which it probably was.
By the way, it had a 7ns cycle, which for
this period of time was pretty good. Pretty good.
Contemporary IBM machines were like 40, 50ns
cycle at that time.
7:36
So, now we have nothing. We don't have the model 91 any more
and we didn't bring out a successor machine.
So in a sense nothing happened, from that point of view, for machine
design, for some 30-odd years. Which was sad, I think
but that's OK.
> Any questions?
... just leapfrog to the rest of my talk. Or rather, the rest of your talk.
I don't have to do too much talking - as you see I don't have too much to
do because IBM made it too easy, right? They made one - I wouldn't
say it was a halfhearted attempt - they made a good attempt, to make
a really good machine, but it wasn't on the cards, and that was it.
Then 30 years go by, and then it's in the stars, and they can start over
again.
So that's really the main thing I wanted to cover. Obviously I can cover
more things if you want, obviously machine design didn't stand still.
During this time there were improvements in RAS, and all kinds of things.
But for our purposes I didn't think I'd want to devote too much
time to those things.
9:40
> Now onto questions.
[The questions are difficult to hear clearly enough to transcribe]
[Roughly when was this OOO design done?]
My memory isn't that good... early 70s, 72.
[Something about instructions]
Yes, in a limited way. What it did was classify the instructions
into fixed point, floating point, and decimal. And one instruction
in each class could be executed along with an instruction
from another class.
So it didn't rely really heavily on OOO, because you have to
remember that it's still true that, with the cache, a lot of the
gain of OOO evaporates. So there's not much point pursuing
it just for the sake of pursuing it.
[How large a team of people]
Well, the model 91 was special in the sense that they did
a whole new technology, they did a whole new design
automation system, and they did a whole new machine.
So they had a lot of change on their plates.
[How many people?]
The model 91, because it was developing the design
automation system and the software and everything, it had a lot of
people, altogether. It didn't have that many, if you focus in on
the designers, people like I was, there might have been
twenty, maybe, order of magnitude.
[laughter]
Why is order of magnitude funny? You think I'm trying to
hide a hundred? I wouldn't do that!
[Most trouble in 91?]
We had trouble until we discovered the OOO algorithms,
that cleared up one source of trouble.
The whole machine was stretching.
It was a strange machine because it had
really pitiful memory access. This was before
caches, so it had like 10 cycles, minimum
memory access. Now of course it had 16-way interleaved
memory, so you're not necessarily waiting 10 cycles
for every single access, but it's
pretty pitiful.
So memory was a big bottleneck for that machine.
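The interleaving scheme he describes can be sketched as a toy model: with low-order interleaving, consecutive word addresses fall in consecutive banks, so a sequential stream keeps all 16 banks busy, while an unlucky stride serializes behind the full latency. The bank count and 10-cycle figure are from the talk; the mapping function is a standard illustration, not the Model 91's actual addressing.

```python
# Toy model of 16-way low-order interleaved memory. Bank count and the
# ~10-cycle access latency come from the talk; everything else is illustrative.
NUM_BANKS = 16
ACCESS_CYCLES = 10  # latency of a single bank access

def bank_of(word_address):
    """Low-order interleaving: bank = address mod number of banks."""
    return word_address % NUM_BANKS

# Sequential accesses land in a different bank each time, so their
# latencies overlap...
seq = [bank_of(a) for a in range(16)]
# ...while a stride equal to the bank count hits one bank repeatedly,
# paying the full latency on every access.
stride16 = [bank_of(a) for a in range(0, 160, 16)]
print(seq)       # banks 0 through 15
print(stride16)  # bank 0 every time
```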
When they finally brought out... part of the 90 line...
they may have brought out a high speed version.
I seem to remember they made two versions
of thin-film memory machine, which was very fast
memory. Unfortunately it couldn't have the huge
number of megabytes you can get with conventional
memories.
[You went on to work with STC on one of the first microprocessor based...
how did those servers differ from PCs]
As is often the case, a big company like IBM, they're not
necessarily first out of the gate with new things. It may take them
a while, especially if they don't have competition.
So part of what happened is due to that kind
of phenomenon.
But I don't know if the rest is due to that.
16:22
Understand, this machine, the first one we did
at STC - the only one we did at STC - was
not supposed to be a high performance machine.
It was supposed to be below the IBM machines.
Because it was supposed to serve as a server, or a
lead-in to the IBM machine.
Reality intruded. The first thing that happened, the
technology was 2x slower than we thought it was.
But fortunately it was also 2x faster than we thought
it was, so we recouped back.
But we were still in the hole, and we ended up
pulling some pretty sophisticated tricks
and you don't want to do that
you're dealing with a group - not all - of neophytes
and you don't want to be tackling complicated
things if you can avoid it.
And that was, partially, what did them in.
It was too complicated, for them.
17:51
[Something about marketing as the PC?]
Oh, Yeah! God love marketing people, I wanted to strangle them.
[laughter]
we were going for three years on this project,
sweating bullets, to try and wring out
every last bit of performance
because these guys, the marketing people, were telling us
"we absolutely need that performance, we're going to
put you in a certain environment, you're not going to
sell that many machines"
and what happens, is that
"oh no, we don't really have to be in that environment
we want a cheaper machine that doesn't go as fast"
Sheesh, I wanted to strangle them, all of them.
It happens.
18:43
BTW this is not the end of the programming, it's the
second part where you get to ask questions.
[question about moving on from the system/360 to consulting
how did you find your job had changed]
Oddly enough, I don't think, that much.
Except that it was a mistake, striving for too much
performance, was bad. We got ourselves a lot
of grief from doing that.
Because, you know, the architecture of the machine
is semi-stable. It changes, you have to upgrade with
the times, but it is semi-stable.
20:14
[what was the original motivation for the OOO architecture
and how come it took so long to be used in production.
Why was the idea ahead of its time]
The short answer, we've already discussed it. One, there was
a machine, a successor, which had got scrapped. Without that
machine, and with the advances in cache, there was no point,
really, in out of order execution. At least not until the 80s.
You can argue about at what point it might have made
sense.
[Given that, why try to do OOO]
At the time, we were young
[laughter]
young and bold and we wanted to go for everything we could
get. And if we hadn't had the idea, we would have built a
perfectly good machine which in the best case
might be 20 or 30% faster thanks to the floating point
guy. Because he speeded up the floating point.
So it wasn't that big a deal. But it was a coup.
It was something Seymour Cray didn't have.
No-one had it. So IBM could get some bragging rights
out of the whole thing.
[Back to 60s-70s what was it like to convince designers
and architects that OOO was good]
Pretty smart team! They didn't take any convincing.
I had the idea on a weekend, went in Monday morning
sketched out the bulk of the important parts of the idea.
They were thorough, they made sure they covered things,
they made sure there were no serious quibbles.
And then we were off and running.
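The idea he sketched out that Monday morning - tags, reservation stations, and a common data bus - can be illustrated with a minimal toy model. This is a sketch of the renaming concept only, not the Model 91 hardware; the names `Station`, `issue`, and `broadcast` are illustrative.

```python
# Toy sketch of tag-based renaming: a register holds either a value or
# the tag of the reservation station that will produce it, so later
# instructions wait on tags instead of stalling on registers.

class Station:
    """One reservation station: an op plus two sources, each either
    ('val', x) if the operand is ready or ('tag', t) if it is pending."""
    def __init__(self, tag, op, src1, src2):
        self.tag, self.op = tag, op
        self.src1, self.src2 = src1, src2

regs = {'F0': ('val', 3.0), 'F1': ('val', 4.0)}  # architectural registers
stations = []

def issue(op, dst, s1, s2, tag):
    # Capture the sources as they are *now* (value or pending tag), then
    # rename the destination so later readers wait on this station's tag.
    stations.append(Station(tag, op, regs[s1], regs[s2]))
    regs[dst] = ('tag', tag)

def broadcast(tag, value):
    """Common data bus: a finished result wakes everything waiting on its tag."""
    for st in stations:
        if st.src1 == ('tag', tag): st.src1 = ('val', value)
        if st.src2 == ('tag', tag): st.src2 = ('val', value)
    for r, v in regs.items():
        if v == ('tag', tag):
            regs[r] = ('val', value)

# F2 = F0 * F1, then F3 = F2 + F0: the add issues immediately, holding
# the multiply's tag instead of waiting for F2 to be written.
issue('mul', 'F2', 'F0', 'F1', tag='RS1')
issue('add', 'F3', 'F2', 'F0', tag='RS2')
broadcast('RS1', 12.0)  # multiply completes; the add's operand fills in
print(regs['F2'])       # ('val', 12.0)
```

The key move is that `issue` never blocks on a busy register: it records the producer's tag, and the broadcast later fills in the value wherever that tag appears.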
Now, you know that you can do more with OOO
than we did. And in particular one thing which is very
important for the operating system is to be able to do
loads out of order. So that you can stack up loads which
might be delayed for whatever reason, and then maybe
get some other instruction through. Because that's the
main ... once you get rid of floating point - let's say long
instructions - then that's basically it. Even the OOO that
I was just talking about is not really a barn-burner kind of thing.
It's good to have. Like all of these things, you get to build
on them. We didn't get to build on them for like 30 years
but you get to build on them.
And out of the closet they come and you find you can do
something you couldn't do before.
25:00
[What kind of design automation did you have]
Ha ha, none!
[laughter]
We were the first machine in IBM to simulate the logic of
the machine. And we could simulate 1000 gates at a time.
[laughter]
It's pitiful! I mean, the model 91 isn't by any means a huge
machine but it's like 40 or 50 thousand gates.
So that's nothing.
And this was something that developed towards the end of
the project.
Like I say, I give loads of credit, we had smart people.
Including this one guy and gal who were really DA people
and they were pushing what they could do, to improve
the performance of the machine.
26:28
[Debugging efforts and kinds of problems/bugs, and the process
to fix]
Yes. And an interesting sideline to that, we had this programmer
who eventually wrote a simulation of the machine.
And he discovered two bugs, in the machine - I think it had to do
with fetching - because in those days we had a pretty sophisticated
fetching algorithm, where you start out with nothing and you try and
catch a loop. So we really didn't have much of anything in the way
of debugging. It was coming, but not for us. Which is too bad.
[Followup, how to debug the machine]
Well, that's somewhat of an art. Especially in those days.
28:38
We had an interesting little experience. We were plagued by something
called the 'cracked stripe'
which none of you probably ever heard of. But it was a fault in the
technology
due to the extremely high current density that they had pushed the
circuits to, such that the wind, the electron wind, going through these very
fine circuits was blowing the atoms away. And you would get faults.
You would get open circuits.
So that was an interesting problem that we had to deal with.
[How to find that this was going on? How to deal with it?]
The 'cracked stripe' was special. We were experiencing one
failure every day. Now, most of you don't have experience debugging
a machine, but you can't debug a complex machine if you have one failure
a day. There are just too many things to find.
We were in real trouble.
And the answer was technology, and they had to fix the technology.
And in the case of the 91, they remade all the technology, in the case of
some of the slower machines in the 360 line they only partially remade them
and some they didn't remake at all.
Because it was a time-dependent thing. The faster your circuit was, the more
it was prone to this problem.
30:52
[how long were these systems under development, from
Thomas Watson saying I want a fast computer]
well, what I have to do, which doesn't make a nice clean
picture, is the following.
We commenced on the model 91 in about 1963
possibly late 62.
Because of the cracked stripe problem, it took us a very long
time to debug the machine.
If there'd been no cracked stripe problem we probably would have
brought up the machine two years earlier than we ultimately
brought it up.
So that was a real devastating blow to us.
And you know, it's really hard, when your hardware is failing under you,
it's hard to make progress
and it's failing in random ways
and sometimes you take it out of the machine
and it doesn't fail!
Now what do you do?
Thank your lucky stars if it fails next time.
[During development of 91 did you think about compiler optimisations]
We didn't have much to say about compilers
I was in the hardware group
we were conversant with some of their problems
later on there was more back and forth
because I always had - after the initial model 91 thing - a dual role of architect
in the early phases which is really
software architecture and then implementing the machine.
Does that answer your question?
[You mentioned Seymour Cray, he was still at Control Data
was there a lot of competition
did you take his machines apart?]
We did a little more of that later on
[laughter]
but no we didn't do that that much.
He had, it's very interesting how these things work out
because he had a jump start on us, okay
because he was already working on
his machine
and we were just starting on ours - we didn't even
have the technology to build our machine
So we were in real trouble.
So, what are we to do in this circumstance? How can we
make up lost ground? We tried all kinds of things, perhaps not
well-founded, to try and make up this lost ground
34:38
But it's difficult. And what happened was, we were saved. We had an
assessment of how fast the Cray machine was. So we said
okay, we think it's this fast, they are going to be two years after us
so we've got to be twice as fast as them so we can ... come out.
That sounds feasible.
Well, it turns out the Cray is four times faster
[laughter]
than we thought. Meanwhile our machine is two times faster
than we thought it was.
The net result of all this fiddling around was rough parity.
There were some things we did faster
some kinds of problems they did faster.
But they still had - perhaps undeserved - the reputation
for raw compute speed.
And I think all of the customers, like the Atomic Energy Commission
laboratories, who were supposedly interested in that, they all went
to Cray, and IBM sold to - how to characterise it - database kind
of applications, that need a lot of memory and concurrency, a lot
of I/O running and all that kind of stuff. Doesn't particularly need a lot of floating point performance,
computational performance in general
36:21
[Tell us about how IBM culture changed over your career]
That's a tough one for me inside.
First, I have to divide my career into two parts. First, the five or six
years when I did my machine design - not my machine design - and then after that.
How did IBM change? It became less interesting and cutting edge.
In the beginning it was wide open, crazy ideas and if you could implement them
and it would buy some performance, you got it.
As time went on, you get more and more constrained, by the architecture
and by the necessity of other machines. You're not just allowed to design
for the model 91 class, you have to design a machine, you know, the next class
down, we had three or four [classes] by that time. So there were all kinds of
things standing in the way of pure performance. And you just had to
live with those. There's no way around it.
38:38
[How about backward compatibility?]
Oh yeah, that was a must, that was a no-brainer, we weren't allowed to touch
that with a ten foot pole. In fact we had to get a special dispensation, because
the model 91, because of its out of order floating point, the effect on interrupts
was actually in violation of the architecture, and they had to get a special
dispensation for the model 91.
I don't think anyone ever suffered from this dispensation in those days, but
nevertheless you had to get it.
39:20
[Any specific ideas from your team that wouldn't have worked]
You're asking me if any ideas sort of died?
That's really hard to answer. You like to think that you wring out the most
performance you can get out of your technology, from what you've got,
but that didn't really... really and truly there are all manner of compromises
that have to be made - not have to be made, some have to be made, others
don't have to be made, but you're not omniscient, you don't know everything,
so it's really hard to say how much that affects machine design.