Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] 32 bit floats, SSE instructions
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jon Harrop <jdh30@c...>
Subject: Re: [Caml-list] 3D graphics debate

> > Are you trying to write a device driver in OCaml?
>
> 3D device driver problems are indicative of graphics problems in
> general.  There's not much 'new' at any level of the game.

I would contest that, but then I am trying to make some "new" stuff...

> Just things 
> that have been committed to HW vs. things that are still waiting to be
> committed to HW, with some interrupting complexities in between.
> Traversing a cell grid database is no different from rasterizing a
> triangle.  3D graphics is one giant self-similar fractal.

Ironically, I would say that more things should be done hierarchically, e.g. 
octrees rather than cell grids. :-)

> > I think you are exaggerating cost of the "abstraction" of
> > reordering the arguments of a function.
>
> And I'm thinking you've never done high performance anything in a big
> architecture for a living.  Prove me wrong.

Oh ok then - I've just passed my PhD in computational and theoretical physics 
from the University of Cambridge. While I was there I developed numerous high 
performance programs, both for simulation (condensed matter physics) and 
visualisation (using OpenGL). I also invented and published research on 
wavelets so I've studied a lot of multiresolution analyses and hierarchical 
data structures...

I also did some fun projects which, I believe, are valid research into the 
usefulness of hierarchical data structures over flat ones in the context of 
2D and 3D graphics.

One is a multiresolution representation of a planet which is dynamically 
tesselated to give just enough detail for rendering:

http://www.chem.pwf.cam.ac.uk/~jdh30/programming/opengl/lod/index.html

if the planet were represented to enough detail using flat data structures 
(equivalent to the cell grid) for that demo it would require 2^80 floats. Had 
I known OCaml at the time, this would have been a much smaller and faster 
program. :-)

Another ongoing project of mine is a 2D vector graphics renderer which takes a 
similar, hierarchical approach to the rendering of 2D graphics (which are 
quite different to 3D because you're now interested in rendering the interior 
defined by a manifold, rather than the manifold itself):

http://www.chem.pwf.cam.ac.uk/~jdh30/programming/opengl/smoke/index.html

I have converted that C++ version into OCaml and optimised it so that it is 
now much faster, more robust and better behaved wrt real-time rendering. It 
is about 100 times faster than every other general purpose vector graphics 
renderer that I have ever come across. Of course, because it is 
multiresolution, it can render 1Gb vector graphics files in real-time whilst 
all the other programs fall over. This makes it uniquely suitable for 
real-time, interactive graphics which is why I intend to commercialise it...

So yes, I've dabbled. :-)

> > Apart from that "CPU" thing. ;-)
>
> I suppose you think CPU approaches are divorced from the realities of
> what underlying HW expects?  You want to retire things quickly, you
> don't spread them out all over memory.  Not unless you've got some kind
> of super duper programmer visible scatter-gather DMA, and I'm not aware
> of anyone building PCs in such configurations.

If you want to retire things quickly, you use a hierarchical representation 
which allows you to retire things in O(ln n) rather than O(n) time. Who 
knows, it may even let you reduce your storage from O(n^2) to O(n).

> > My point is that there are likely to be much more productive,
> > higher-level optimisations that you could be doing.
>
> It is a wasted point.  Performance jock do that *and* the low-level
> optimizations.  The key is sane architecture so you don't have to do
> more than a handful of low-level optimizations.

Ideally. But the vast majority of "performance jocks" ignore the mathematical 
proofs and stick to twiddling 32-bit floats.

> No, I said that *in addition to* that rather common class of algorithms,
> even sequential processing is usually accept / reject.  Let's say you've
> got SoA for a 4-field vector.  When you accept, you're going to be
> touching all 4 arrays.  If you often accept, then your cache is going to
> be polluted by all the rejects.  You have made your problem 4 times less
> cache coherent than it otherwise would be.  The reality of 3D graphics
> processing is you usually need all that info in the structure.  That's
> why it's a structure.

Aren't we talking about a 4xn matrix rather than an nx4 matrix?

> > Can these algorithms not be transformed so that they
> > access more coherently?
>
> No, dammit!  You Just Don't Get It [TM].  When someone tells you
> something is "an invasive programming model," they're telling you
> they're not looking for the opportunity to rewrite their whole software
> architecture from scratch just because you personally think SoA might be
> more kewl than AoS.

Well, if you want to optimise your program and you wrote it badly the first 
time... ;-)

> > I'd like more, specific examples here. What determines the
> > accept or reject?
> > What is the consequence of an accept or reject?
>
> I don't have time to educate you on the fundamentals of 3D graphics
> processing.  You can find all the classical, basic information via
> newsgroups such as comp.graphics.algorithms.  If you want to save a lot
> of time going up the learning curve, you could post something like "AoS
> or SoA?" and see what people say.  Or Google the archives for them.

I would say that I am somewhat familiar with the concepts involved. My concern 
is that I have only ever come across your optimisation problems in the 
context of shoehorning algorithms into using flat containers.

> So was it 'dabbler' or not then?  I suppose you think that the
> implementation complexity of multiresolution representations is
> desireable in an industry where a significantly faster card ships every
> 6 months?

Absolutely. It separates the men from the boys...

Cheers,
Jon.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners