Performance of threaded interpreter on hyper-threaded CPU
Date: -- (:)
From: Till Varoquaux <till.varoquaux@g...>
Subject: Re: [Caml-list] Re: Performance of threaded interpreter on hyper-threaded CPU
I might just add that hyperthreading is pretty far from a real
multiprocessor setup...
It works best when the various threads are using different units of the
CPU, which is less likely to happen when the threads are doing
basically the same thing. A friend of mine has been experimenting on Xeons
recently: with the exact same code (i.e. multithreaded in both cases), he
gains 12.5% when using hyperthreading. This might be an extreme example...
However, supposing you were in the same case, it is quite conceivable that the
few percent you gain are lost in the machinery required to get
multithreading working properly (mutexes etc.).
Could you try running your multithreaded code on only one of the virtual CPUs
to see the improvement hyperthreading really brings?
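
One way to try this on Linux is to restrict the benchmark to a single
logical CPU before launching it. The following is only a minimal sketch:
it assumes a Linux system (os.sched_setaffinity is not available
elsewhere), the helper name run_on_one_cpu is made up for illustration,
and which logical CPUs share a physical core depends on the machine.

```python
import os
import subprocess
import sys


def run_on_one_cpu(command):
    """Run `command` while this process is restricted to one logical CPU.

    Hypothetical helper, Linux only. Child processes inherit the
    affinity mask, so the benchmark started here is pinned as well.
    """
    allowed = os.sched_getaffinity(0)   # logical CPUs we may currently use
    cpu = min(allowed)                  # arbitrary choice: lowest-numbered CPU
    os.sched_setaffinity(0, {cpu})
    try:
        return subprocess.run(command, check=False).returncode
    finally:
        os.sched_setaffinity(0, allowed)  # restore the original mask


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python pin.py <benchmark command and its arguments>
    sys.exit(run_on_one_cpu(sys.argv[1:]))
```

Timing the same command with and without the pinning gives a rough idea
of what the second virtual CPU contributes. Note that pinning does not
switch hyperthreading off: other processes may still be scheduled on the
sibling logical CPU.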
Till

On 4/18/06, Michel Schinz <Michel.Schinz@epfl.ch> wrote:
>
> Xavier Leroy <Xavier.Leroy@inria.fr> writes:
>
> >  > When the ratio given in the last column is greater than 1, then
> >  > threaded code is faster than the switch-based solution. As you can
> >  > see, this is only true in my case for non-hyper-threaded
> >  > architectures.
> >
> > Which version(s) of gcc do you use for compiling the bytecode
> > interpreter?  Is it the same version on all machines?
>
> No, unfortunately not. Here are the various versions used (I realise
> this variety is annoying, but I have no control over what software
> runs on these machines):
>
> 1.25 GHz PPC G4
>   powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1
>    (Apple Computer, Inc. build 5247)
> 1.70 GHz P4
>   gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
> 3.0 GHz hyper-threaded P4
>   gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)
> dual 3.0 GHz hyper-threaded Xeon
>   gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)
>
> I'm aware of the problem due to gcc's cross-jumping "optimisation"
> (described, as you mention, by Ertl in [1]). For the record, I tried
> disabling it with -fno-crossjumping, but as Ertl mentions, this didn't
> change anything. However, judging by the versions of gcc I'm using,
> cross-jumping should also be performed on the second machine, for
> which threaded code provides a noticeable gain...
>
> However, your remark motivated me to measure the performance of a
> single ocamlrun executable running on the various Pentium 4s I have at
> hand, and the results are interesting...
>
> Using the executable produced by gcc 3.2.2, I obtain the following
> timings:
>
> | architecture                      | switch | threaded |   ratio |
> |-----------------------------------+--------+----------+---------|
> | 1.70 GHz Pentium 4                |   6.34 |     4.82 |  1.3154 |
> | 3.0 GHz Pentium 4, hyper-threaded |   2.62 |     3.46 | 0.75723 |
> | dual 3.0 GHz Xeon, hyper-threaded |   3.36 |     2.59 |  1.2973 |
>
> while using the executable produced by gcc 3.4.4, I obtain the
> following timings:
>
> | architecture                      | switch | threaded |   ratio |
> |-----------------------------------+--------+----------+---------|
> | 1.70 GHz Pentium 4                |   6.26 |     6.70 | 0.93433 |
> | 3.0 GHz Pentium 4, hyper-threaded |   2.51 |     6.15 | 0.40813 |
> | dual 3.0 GHz Xeon, hyper-threaded |   3.32 |     3.58 | 0.92737 |
>
> Finally, I noticed that gcc 4.0.0 was also available on the second
> machine, so I gave it a try, and obtained the following timings:
>
> | architecture                      | switch | threaded |   ratio |
> |-----------------------------------+--------+----------+---------|
> | 1.70 GHz Pentium 4                |   7.27 |     6.62 |  1.0982 |
> | 3.0 GHz Pentium 4, hyper-threaded |   2.37 |     4.75 | 0.49895 |
> | dual 3.0 GHz Xeon, hyper-threaded |   3.91 |     3.56 |  1.0983 |
>
> So the threaded code version of the OCaml VM is always slower on the
> hyper-threaded P4, albeit not always by the same amount.
>
> Michel.
>
> [1] http://www.complang.tuwien.ac.at/forth/threading/
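
The ratio columns in the tables above can be rechecked with a few lines
of Python. This is a sketch that assumes the ratio is the switch time
divided by the threaded time, which is consistent with every row shown;
the timings are copied from the tables, while the names TIMINGS, ratio
and report are made up for illustration.

```python
# (switch, threaded) timings copied from the three tables in the message.
TIMINGS = {
    "gcc 3.2.2": {
        "1.70 GHz Pentium 4": (6.34, 4.82),
        "3.0 GHz Pentium 4, hyper-threaded": (2.62, 3.46),
        "dual 3.0 GHz Xeon, hyper-threaded": (3.36, 2.59),
    },
    "gcc 3.4.4": {
        "1.70 GHz Pentium 4": (6.26, 6.70),
        "3.0 GHz Pentium 4, hyper-threaded": (2.51, 6.15),
        "dual 3.0 GHz Xeon, hyper-threaded": (3.32, 3.58),
    },
    "gcc 4.0.0": {
        "1.70 GHz Pentium 4": (7.27, 6.62),
        "3.0 GHz Pentium 4, hyper-threaded": (2.37, 4.75),
        "dual 3.0 GHz Xeon, hyper-threaded": (3.91, 3.56),
    },
}


def ratio(switch, threaded):
    """Assumed definition: switch time / threaded time (> 1: threaded wins)."""
    return switch / threaded


def report(timings):
    """Return (compiler, machine, ratio, threaded_is_faster) for every row."""
    rows = []
    for compiler, machines in timings.items():
        for machine, (switch, threaded) in machines.items():
            r = ratio(switch, threaded)
            rows.append((compiler, machine, r, r > 1.0))
    return rows


for compiler, machine, r, faster in report(TIMINGS):
    verdict = "threaded faster" if faster else "switch faster"
    print(f"{compiler:10} {machine:34} {r:.4f}  {verdict}")
```

With these numbers, every row for the hyper-threaded 3.0 GHz Pentium 4
comes out below 1, whichever compiler produced the executable.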