HLVM stuff
| Date: | -- (:) |
| From: | Jon Harrop <jon@f...> |
| Subject: | Re: [Caml-list] HLVM stuff |
On Sunday 27 September 2009 22:58:59 David McClain wrote:
> On Sep 27, 2009, at 12:25 PM, Jon Harrop wrote:
> > where the "kthSmallest" and "Array2D.parallelInit" functions are both
> > polymorphic. The former handles implicit sequences of any comparable type
> > and the latter handles 2D arrays of any element type. This use of
> > polymorphic
>
> But facing a situation with 2^26 pixels to process, I would never do
> that.
Here is a better one-line F# solution:
images |> Array2D.map (fun xs -> Array.sortInPlaceWith compare xs; xs.[m/2])
This solves your problem from the REPL in 0.34s. Moreover, you can easily
parallelize it in F#:
Parallel.For(0, n, fun y ->
  for x=0 to n-1 do
    Array.sortInPlaceWith compare images.[y, x])
images |> Array2D.map (fun xs -> xs.[m/2])
On this 8-core box, the time taken is reduced to 0.039s (finally a superlinear
speedup on my Intel box, yay!).
Here is the OCaml equivalent:
Array.map (Array.map (fun gs -> Array.sort compare gs; gs.(m/2))) images
This solves your problem non-interactively in 32s, which is 821x slower than
the parallel F# version (and about 94x slower than the sequential one).
This huge performance discrepancy is a direct result of the elegant solution
using polymorphic functions. HLVM's solution to polymorphism solves this
problem, offering polymorphism with no performance degradation whatsoever.
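To make the cost concrete, here is a minimal OCaml sketch (not the benchmark code, and with no timings claimed) contrasting the generic median with a hand-specialized one. The names median_poly and median_int are made up for illustration, the int pixel type is an assumption, and the input is copied rather than sorted in place so the functions stay pure. Both assume a non-empty array.

```ocaml
(* Generic version: "compare" is used at an unknown type, so every
   comparison goes through the runtime's polymorphic comparison. *)
let median_poly (gs : 'a array) : 'a =
  let gs = Array.copy gs in
  Array.sort compare gs;
  gs.(Array.length gs / 2)

(* Hand-specialized version: the int annotations let the compiler
   use an integer-specific comparison instead of the generic one. *)
let median_int (gs : int array) : int =
  let gs = Array.copy gs in
  Array.sort (fun (a : int) (b : int) -> compare a b) gs;
  gs.(Array.length gs / 2)
```

The two functions return the same result on int arrays; the difference is only in how each comparison is carried out, which is the work a specializing compiler would otherwise do for you.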
> I would write a type-specific function to apply.
Why waste your time doing by hand what the compiler can do for you?
> Why dispatch of every pixel of the aggregate, when I could dispatch once at
> the top, to decide what kind of homogeneous array...
Why dispatch at all when a JIT compiler would already know all of the types
involved and could partially specialize your code for them?
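For comparison, the "dispatch once at the top" approach might look roughly like the following OCaml sketch. The stack and medians types, the choice of int and float pixels, and the function names are all assumptions for illustration; nothing here comes from the original code. The tag is inspected once per image stack, and each branch then runs a comparison fixed to a single element type.

```ocaml
(* Hypothetical tagged representation: one tag per stack, not per pixel. *)
type stack =
  | Ints of int array array array
  | Floats of float array array array

type medians =
  | Int_medians of int array array
  | Float_medians of float array array

(* Median of one non-empty pixel stack under a given comparison.
   The input is copied so the caller's data is left untouched. *)
let median_with cmp gs =
  let gs = Array.copy gs in
  Array.sort cmp gs;
  gs.(Array.length gs / 2)

(* Dispatch once on the tag, then use a type-specific comparison. *)
let medians_of = function
  | Ints images ->
    let cmp (a : int) (b : int) = compare a b in
    Int_medians (Array.map (Array.map (median_with cmp)) images)
  | Floats images ->
    let cmp (a : float) (b : float) = compare a b in
    Float_medians (Array.map (Array.map (median_with cmp)) images)
```

Every new element type needs another constructor and another branch written by hand, which is exactly the duplication a compiler that already knows the types could generate itself.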
FWIW, a completed HLVM would solve this problem extremely efficiently despite
having a naive garbage collector because the entire program only does a
single allocation. This is not at all uncommon in technical computing and is
exactly the characteristic I was referring to: these solutions leverage
features of the OCaml language like higher-order functions, currying and
partial application but they have completely different performance
requirements to those of Coq. In the context of technical computing, the
benefits of shared-memory parallelism far outweigh those of efficient
single-threaded allocation and collection of small values.
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e