More registers in modern day CPUs
Date: 2007-09-06 (07:18)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] More registers in modern day CPUs
On Thu, 2007-09-06 at 08:20 +0200, Tom wrote:
> (This question may not be OCaml specific, 

you'd be surprised ..

> However, would it be possible to "emulate" cpu registers using
> software? By keeping registers in the main memory, but accessing them
> often enough to keep them in primary cache? That would be quite fast I
> believe... 

The technique is called 'boxing'. This is one reason why OCaml
is so fast, when you'd expect the extra dereferences required
all the time to be a big penalty. Instead, if the address is
used but not the data (e.g. in a generic operation), cache space
is saved compared to an expanded representation. The cache is
loaded only when the pointer is dereferenced, and subsequent
derefs are effectively free provided only a small number of
boxes are opened. There is an extra cost of one word for the
address, which is the price of the lazy loading, and it is
amortised away by generic operations.
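
A minimal sketch of the point about generic operations, assuming a hypothetical record type `big` and hypothetical helper names (none of these are from the original message). The generic reversal only moves one-word pointers and never reads the records, while the summing function has to open each box:

```ocaml
(* Hypothetical example type: records are heap-allocated in OCaml,
   so a value of type [big] is represented by a pointer to a box. *)
type big = { id : int; payload : float array }

(* Generic operation: reverses any list by copying one-word pointers.
   It never reads [id] or [payload], so the contents of the boxes
   do not have to be brought into cache. *)
let rec rev_append acc = function
  | [] -> acc
  | x :: rest -> rev_append (x :: acc) rest

(* Non-generic operation: dereferences each pointer to read a field,
   which is the point at which the box's data gets loaded. *)
let sum_ids xs = List.fold_left (fun s b -> s + b.id) 0 xs

let boxes = List.init 4 (fun i -> { id = i; payload = Array.make 1024 0.0 })
let reversed = rev_append [] boxes
let total = sum_ids reversed
```

The reversed list shares the original boxes rather than copying them, so the only per-element cost of the generic step is the one word holding the address.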

This is even faster than one might think because the cache
can speculatively preload the pointed-to data.
[Does OCaml bother to generate prefetch instructions?]

IMHO, the main purpose of registers is to organise
the interleaving of parallel operations (memory reads
mainly) based on dependencies. They differ from main
memory (and cache) in that they're usually thread-local
(whereas all the other stuff is shared), so they
express the coupling between data and flow of control.

For example, in:

	R1 = a
	R2 = b
	R3 = R1 + R2
	R4 = c 
	R5 = d
	R6 = R4 + R5

you'd be mostly wrong to think of these instructions as operating
on data. No. Not today. These instructions are chopping up the
control flow into parallel threads:

	a b c d
	| | | |
	V V V V
	 +   +
	 |   |

I think that's the main reason to have registers rather than
memory operands. Registers only need a few bits to name, so the
dispatching to functional units is easier to calculate with less
hardware.
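
The dependency argument above can be sketched as a toy model. This assumes a hypothetical `instr` record, unit latency, and unlimited functional units, which is a simplification and not a description of any real CPU. Each instruction can issue one step after the latest register it reads was written:

```ocaml
(* Hypothetical model: an instruction writes [dst] and reads the
   registers listed in [srcs]. Memory operands (a, b, c, d) are not
   registers, so the loads have no register dependencies. *)
type instr = { dst : string; srcs : string list }

let program = [
  { dst = "R1"; srcs = [] };             (* R1 = a *)
  { dst = "R2"; srcs = [] };             (* R2 = b *)
  { dst = "R3"; srcs = ["R1"; "R2"] };   (* R3 = R1 + R2 *)
  { dst = "R4"; srcs = [] };             (* R4 = c *)
  { dst = "R5"; srcs = [] };             (* R5 = d *)
  { dst = "R6"; srcs = ["R4"; "R5"] };   (* R6 = R4 + R5 *)
]

(* Earliest step at which each instruction could issue: one more than
   the latest step of any register it reads, or 0 if it reads none. *)
let schedule prog =
  let step_of = Hashtbl.create 16 in
  let issue acc i =
    let ready =
      List.fold_left
        (fun m r ->
          match Hashtbl.find_opt step_of r with
          | Some s -> max m (s + 1)
          | None -> m)
        0 i.srcs
    in
    Hashtbl.replace step_of i.dst ready;
    (i.dst, ready) :: acc
  in
  List.rev (List.fold_left issue [] prog)
```

Under these assumptions the four loads all land in step 0 and the two additions in step 1, matching the two parallel columns in the diagram. The register names are the only information the scheduler needs to find that structure.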

John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: