[Caml-list] 32 bit floats, SSE instructions
Date: 2004-06-09 (00:26)
From: Jon Harrop <jdh30@c...>
Subject: Re: [Caml-list] 32 bit floats, SSE instructions
On Tuesday 08 June 2004 20:24, Brandon J. Van Every wrote:
> ...
> What utter nonsense!

Yo Mama.

> You ever written a 3D device driver? 

Are you trying to write a device driver in OCaml?

> You do *not* 
> engineer your most basic data structures for infinite flexibility.

You appear to be approximating the two in "two transpose formats" with 
infinity. Are you an astrophysicist?

> Almost nobody has the luxury of defining things so abstractly that they
> can switch SoA for AoS whenever they like.  It's a highly invasive
> change of programming model.

I think you are exaggerating the cost of the "abstraction" of reordering the 
arguments of a function.
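
As a minimal sketch of what I mean (the type and function names below are 
illustrative assumptions, not anything from an existing library): if the 
algorithm only ever reaches the data through an accessor, switching between 
AoS and SoA means passing a different accessor.

```ocaml
(* Two layouts for n 3D points. All names here are hypothetical. *)

(* AoS: one flat array with x, y, z interleaved per point (length 3 * n). *)
type aos = float array

(* SoA: one array per coordinate. *)
type soa = { xs : float array; ys : float array; zs : float array }

(* Accessors with the same shape: layout -> index -> (x, y, z). *)
let get_aos (a : aos) i = (a.(3 * i), a.(3 * i + 1), a.(3 * i + 2))
let get_soa (s : soa) i = (s.xs.(i), s.ys.(i), s.zs.(i))

(* An algorithm written against an accessor does not depend on the layout. *)
let sum_sq_lengths get n data =
  let acc = ref 0.0 in
  for i = 0 to n - 1 do
    let (x, y, z) = get data i in
    acc := !acc +. x *. x +. y *. y +. z *. z
  done;
  !acc

(* The same two points, (1, 2, 2) and (0, 3, 4), in both layouts. *)
let aos_pts : aos = [| 1.0; 2.0; 2.0; 0.0; 3.0; 4.0 |]
let soa_pts = { xs = [| 1.0; 0.0 |]; ys = [| 2.0; 3.0 |]; zs = [| 2.0; 4.0 |] }

let () =
  Printf.printf "%g %g\n"
    (sum_sq_lengths get_aos 2 aos_pts)
    (sum_sq_lengths get_soa 2 soa_pts)
```

Whether the tuple returned by the accessor costs anything at run time depends 
on the compiler and its settings; this sketch only shows that the change of 
layout is local.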

> The experiment of SoA has been tried in the 3D graphics industry and
> found wanting.  All the HW is AoS.

Apart from that "CPU" thing. ;-)

> The SoA methodology is possible with 
> the 3D APIs.  As one poster hinted, it's probably borne of the software
> rendering era mentality.  I distinctly recall when the DirectX 5.0 guys
> were implementing those features, shortly after the OpenGL guys did
> IIRC.  That would have been a 1997 timeframe.  They were thinking how
> "neat" it would all be.  Seen from the vantage of 2004, it all gave way
> to AoS and programmable shaders.  All your data for one vertex or one
> pixel at a time.

My point is that there are likely to be much more productive, higher-level 
optimisations that you could be doing.

> > The only algorithms which would be significantly affected are
> > those for which
> > accesses are to (x_i, y_i, z_i) for random "i" rather than to
> Oh, the 'only' algorithms.  Crikey.

Are you saying that most of your algorithms require random access of that 
form? Can these algorithms not be transformed so that they access more 

> You ever written a 3D graphics pipeline???

I have dabbled in 3D graphics.

> You think it's all based on handing over some huge matrix 
> that could jolly well be in whatever order?

That is the objective, yes.

> Get real. 

Well, you're hardly going to be using complex numbers to represent vertex 
coordinates... ;-)

> The vast 
> majority of 3D graphics processing is accept / reject testing.  You want
> your data here, now, so you can decide what to do with it.  So you can
> retire it once you're done deciding, and not have it pollute your cache
> any further.

I'd like more, specific examples here. What determines the accept or reject? 
What is the consequence of an accept or reject?

> > > For example, transforming a large
> > > number of XYZW vectors by a 4x4 matrix is a 'pat' problem
> > > that occurs at some point in 3D graphics processing.
> >
> > If you want high performance, which you seem to want, the
> > hardware should be doing those for you.
> Well hand me a general purpose GPU with an incredible 2-way memory bus,
> smart guy.

The task of optimising 3D graphics software is to design your approach such 
that you don't need the results back, and all further computation occurs on 
the card. You make as much of the data available as possible and control data 
flow through the pipeline at the highest level with state changes.
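
For concreteness, the CPU-side version of the "pat" problem quoted above might 
look like the sketch below. It assumes the vectors live in a Bigarray of 
32-bit floats in XYZW order and the matrix is a row-major 16-element float 
array; the names `vecs`, `transform` and `translate_x_by_1` are mine. The 
arithmetic is done in double precision and only the storage is 32-bit.

```ocaml
open Bigarray

(* n XYZW vectors stored contiguously as 32-bit floats: x0 y0 z0 w0 x1 ... *)
type vecs = (float, float32_elt, c_layout) Array1.t

(* m is a row-major 4x4 matrix held in an ordinary 16-element float array.
   Each vector is read in full before anything is written for it. *)
let transform (m : float array) (src : vecs) (dst : vecs) =
  let n = Array1.dim src / 4 in
  for i = 0 to n - 1 do
    let b = 4 * i in
    let x = src.{b} and y = src.{b + 1}
    and z = src.{b + 2} and w = src.{b + 3} in
    for r = 0 to 3 do
      dst.{b + r} <-
        m.(4 * r) *. x +. m.(4 * r + 1) *. y
        +. m.(4 * r + 2) *. z +. m.(4 * r + 3) *. w
    done
  done

(* Example matrix: translate by 1 along x (uses w = 1). *)
let translate_x_by_1 =
  [| 1.; 0.; 0.; 1.;
     0.; 1.; 0.; 0.;
     0.; 0.; 1.; 0.;
     0.; 0.; 0.; 1. |]

let src : vecs =
  Array1.of_array float32 c_layout [| 1.; 2.; 3.; 1.; 4.; 5.; 6.; 1. |]
let dst : vecs = Array1.create float32 c_layout (Array1.dim src)

let () = transform translate_x_by_1 src dst
```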

If your programs are bottlenecked by lots of very low-level arithmetic over 
huge, flat data structures then you will almost certainly benefit from using 
more structured, hierarchical (ideally, multiresolution) representations. 
Derive the (possibly asymptotic) complexities of any suitable algorithms and 
make an educated decision on the basis of that quantification. This is likely 
to give you much better performance than very low-level optimisations such as 
fiddling with 32-bit floats.
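
One hypothetical illustration of the hierarchical approach, assuming the 
accept/reject test is "which side of a plane is this point on" and that the 
points have been grouped under bounding spheres (the types, the stored point 
counts and the test counter are all assumptions made for this sketch):

```ocaml
type vec = { x : float; y : float; z : float }

type tree =
  | Leaf of vec array
  (* bounding-sphere centre, radius, number of points below, children *)
  | Node of vec * float * int * tree list

let dot a b = a.x *. b.x +. a.y *. b.y +. a.z *. b.z

(* Counts how many plane tests were actually performed. *)
let tests = ref 0

(* Count the points p with dot normal p >= d, for a unit-length normal. *)
let rec count_visible normal d = function
  | Leaf ps ->
      Array.fold_left
        (fun acc p -> incr tests; if dot normal p >= d then acc + 1 else acc)
        0 ps
  | Node (c, r, n_pts, children) ->
      incr tests;
      let s = dot normal c -. d in
      if s < -. r then 0              (* whole subtree rejected in one test *)
      else if s > r then n_pts        (* whole subtree accepted in one test *)
      else
        List.fold_left
          (fun acc child -> acc + count_visible normal d child) 0 children

(* Two clusters of 100 points each, far on either side of the plane x = 0. *)
let cluster x0 =
  Array.init 100 (fun i -> { x = x0 +. float_of_int i /. 100.0; y = 0.0; z = 0.0 })

let scene =
  Node ({ x = 0.0; y = 0.0; z = 0.0 }, 12.0, 200,
        [ Node ({ x = -10.5; y = 0.0; z = 0.0 }, 1.0, 100, [ Leaf (cluster (-11.0)) ]);
          Node ({ x = 10.5; y = 0.0; z = 0.0 }, 1.0, 100, [ Leaf (cluster 10.0) ]) ])

let visible = count_visible { x = 1.0; y = 0.0; z = 0.0 } 0.0 scene

let () = Printf.printf "visible = %d after %d tests\n" visible !tests
```

Here 200 points are classified with three sphere tests instead of 200 point 
tests. The saving obviously depends on how the data cluster and on how the 
tree is built, which this sketch does not address.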

> Hint: commodity 3D graphics cards are fast when you write to 
> them, damn slow when you read them, by design.

In general, you can't read the results of T&L from the card (sorry for being 
so off topic, guys) so the driver resorts to software T&L in OGL feedback 

If you absolutely must use flat data structures then perhaps you should 
consider using the GPU as a CPU.

> OCaml has a somewhat practical focus, but maybe it's not sufficiently
> practical for me.  I do find myself re-evaluating languages in terms of
> 3 overriding problems:
> - the available C++ migration path and its efficiency

Why C++? Is your objective to always prototype in OCaml and convert to C++?

> - the support of basic 3D graphics types, i.e. 32 bit floats

Why not "ease of use of trees, graphs etc."? Algorithmic optimisations are so 
much more productive...

> - ability to work with imperative, object oriented designs and libraries

OO is overrated, IMHO. Imperative is excusable for UI level things but I'd 
prefer a functional style for everything but the simplest of algorithms. 
OCaml can play with libraries fairly well but, yes, it takes a lot of time to 
get some things working. That isn't the fault of the OCaml creators though, 
of course, it's the fault of those dim-wits at Bell Labs...

> OCaml currently has 1 out of 3.
> See for point of reference, "Why No One Uses Functional Languages."
> http://cm.bell-labs.com/cm/cs/who/wadler/papers/sigplan-why/sigplan-why.ps.gz

Speak of the devil.

> 1.5 out of 3 if one considers OCaml SWIG to be an available, slow,
> optimizable path.

Given the huge differences between C/C++ and OCaml (like safety), it would be 
overly optimistic to expect migration of code from OCaml to C to be much more 
efficient than it is now.

> ...Write my own little Python script to emit a lot of redundant,
> boring filename.i files with #include filename.h %include filename.h
> directives in them.

Yes, but the OCaml bit of SWIG is very alpha, AFAIK. SWIG wasn't even designed 
to deal with languages like OCaml.

Just out of curiosity, do you use the STL much? An interface to the STL might 
be interesting. I wouldn't use it any more though - I no longer have any need 
for C++.

> > > The reality is that 32 bit floats get used in the real
> > > world all over the place by 3D graphics guys.
> >
> > Don't read that, Xavier.
> Oh, is that about Xavier exploding if he hears 'real world' again?

Yes. Personally, I think the INRIA are doing a superb job with OCaml. It is an 
excellent implementation of an excellent language. I do all of my work in 
OCaml now. I'm getting offered jobs because I am so much more productive as a 
consequence. The programs I write whilst doing my 3D graphics research are 
more robust and faster than ever now that they have been converted entirely 
into OCaml from C++.

> > I don't believe Python was designed for doing 3D graphics.
> It wasn't.  It was designed to be flexible and easy to program, not
> fast.  Now that Python is growing in popularity, people are coming in
> post-hoc to try to make it fast.  Maybe in 3 years it'll be a good
> language in that regard.

I'm no expert, but I suspect there are numerous, rigorous theoretical reasons 
why it can't be made much more efficient (at least not by people who choose 
to program in Python ;-).

> > ...
> > Can you give an example of this?
> I'm feeling mentally challenged on specifics right now.  Generally
> speaking, 3D graphics problems are pipelines with N stages you might
> turn on or off.  This creates 2^N path possibilities.  Often you'd like
> to coalesce the operations at the various stages.

So you've got a large quantity of data in flat containers like arrays which 
you want to perform a sequence of algorithms on?

What sort of data is in the containers and what sorts of algorithms are you 

Is the problem that you would like to hoist the inner loops of all of the 
algorithms so that each datum has each algorithm applied to it rather than 
feeding all of the data through each algorithm in turn (i.e. deforestation)?
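
To make the question concrete, here is a minimal sketch of the two styles. 
The stages `scale` and `offset` are hypothetical stand-ins, since the thread 
does not name any, and the fusion is done by hand rather than assumed of the 
compiler.

```ocaml
(* Two hypothetical per-element stages. *)
let scale k x = k *. x
let offset d x = x +. d

(* Stage by stage: each pass walks the whole array and builds an
   intermediate array for the next pass. *)
let staged a = Array.map (offset 1.0) (Array.map (scale 2.0) a)

(* Fused by hand: a single pass in which each datum goes through every
   stage before the next datum is touched. No intermediate array. *)
let fused a = Array.map (fun x -> offset 1.0 (scale 2.0 x)) a

(* With N optional stages, compose the enabled ones once (first stage in
   the list is applied first), then make a single pass. *)
let compose stages =
  List.fold_left (fun f g -> fun x -> g (f x)) (fun x -> x) stages

let run stages a = Array.map (compose stages) a

let data = [| 1.0; 2.0; 3.0 |]

let () =
  assert (staged data = fused data);
  assert (run [ scale 2.0; offset 1.0 ] data = fused data)
```

Composing the enabled stages at run time avoids writing out all 2^N paths by 
hand, at the cost of an indirect call per stage per element.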

