Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] FP's and HyperThreading Processors
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Xavier Leroy <xavier.leroy@i...>
Subject: Re: [Caml-list] FP's and HyperThreading Processors
> On an old P2 single processor machine at 350 MHz, I am seeing almost 95%+
> CPU utilization. But on a new 3 GHz P4 with HyperThreading enabled (dual
> register banks for fast context switching and minimum cache coherence
> overhead), this same program provides much less than 5% CPU utilization. 

How do you measure "CPU utilization"?  Different tools have different
notions of CPU utilization.  For instance, OS-level tools such as
"top", "ps" or "uptime" treat the time spent by the CPU while stalled
by cache misses and the like as user CPU time, not as idle time.
Only processor-level performance monitor counters can distinguish
between active cycles and stalled cycles.

> The net result is that this program runs only twice as fast on the
> new 3 GHz P4 as it runs on the old 350 MHz P2.

Some operations take more cycles on the P4 than on the PPro/P2/P3,
e.g. shifts, integer multiplication and division.  I saw some programs
run slightly slower on a 2 Ghz P4 than on a 1 Ghz P3.  But that
doesn't suffice to explain the factor of 5 that you observed.

> I suspect, but have yet to prove, that the low utilization is due to a low
> CPU to memory bandwidth and to the failure of the L1 and L2 caches to supply
> needed operations and data to the CPU. This, I would hypothesize, is going
> to be demonstrated by any language that prefers fresh memory allocation for
> results, e.g., OCaml, ML, Lisp, Smalltalk, etc.
> 
> If I am correct, then it implies that our hardware friends are moving
> rapidly in the opposite direction to our advanced software systems. I
> mention this in order to tickle the imaginations of both camps.

This was a concern in the early to mid 90s, when processor speed
increased much more than the performances of the memory subsystems.
(I remember Caml benchmarks on an early Alpha system where the processor
was stalled 9 cycles out of 10.)

But since then the hardware manufacturers have done a very good job at
maintaining a decent balance between the memory subsystem and the ALU.
In particular, out-of-order execution is quite effective at hiding
cache miss latencies.  

My general feeling (although I have no proof of that) is that the
memory usage patterns of Caml programs aren't radically different from
those of more mainstream programs.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners