More registers in modern day CPUs
From: Brian Hurt <bhurt@j...>
Subject: Re: [Caml-list] More registers in modern day CPUs
Chris King wrote:

>On 9/6/07, Tom <tom.primozic@gmail.com> wrote:
>
>>However, would it be possible to "emulate" cpu registers using software? By
>>keeping registers in the main memory, but accessing them often enough to
>>keep them in primary cache? That would be quite fast I believe...
>
>This makes me wonder... why have registers to begin with?  I wonder
>how feasible a chip with a, say, 256-byte "register-level" cache would
>be.

Such chips exist.  The Itanium is one example.

The problem is gate delays.  The purpose of registers is to be faster 
than L1 cache (which typically has a 2-3 clock delay associated with 
it).  But the more registers you have, the more gate delays you need to 
read or write a register: the naive implementation takes O(log N) gate 
delays to access one of N registers, and reality is more complicated 
than this.  But the rule "more registers = more gate delays" holds true.  
And these gate delays translate into a slower chip one way or another: 
either you have to lower your clock rate, or add more pipeline stages, 
or both, to deal with the larger register file.  Of course, more 
registers make compilers happy and lower the pressure on cache 
bandwidth (as the compiler doesn't need to spill/refill registers quite 
so often).  This is why the 64-bit x86 is generally faster than the 
32-bit x86: going from 8 (6 in practice) to 16 (14 in practice) 
registers was a big step up.  The Itanium has a large enough register 
set that its performance is probably getting hurt by it, but it's hard 
to tell with everything else going on.
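
To put rough numbers on the O(log N) figure, here is a minimal Python 
sketch.  It assumes the simplest possible read path, a balanced tree of 
2:1 multiplexers selecting one register out of N, and counts only the 
mux levels.  The function name mux_tree_depth is made up for this 
illustration, and the model ignores wire delay, fan-out and the number 
of read/write ports, which is where reality gets more complicated.

```python
def mux_tree_depth(num_registers: int) -> int:
    """Levels of 2:1 muxes needed to select one of num_registers.

    Assumption: a balanced binary mux tree, so the depth is
    ceil(log2(N)).  Integer arithmetic avoids floating-point rounding.
    """
    if num_registers < 1:
        raise ValueError("need at least one register")
    return (num_registers - 1).bit_length()


if __name__ == "__main__":
    # 8 and 16: 32-bit and 64-bit x86 general-purpose registers.
    # 64: top of the suggested sweet spot.
    # 128: Itanium's general register count.
    for n in (8, 16, 64, 128):
        print(f"{n:4d} registers -> {mux_tree_depth(n)} mux levels")
```

Under this model each doubling of the register count costs one more mux 
level on every register access, so the growth is slow but it never 
comes for free.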

The sweet spot for register sets seems to be in the 16-64 range: fewer 
than that, and you're being hurt by the increased memory pressure; more 
than that, and you're probably being hurt by the slower register addressing.

Brian