Version française
Home     About     Download     Resources     Contact us    
Browse thread
Vec: a functional implementation of extensible arrays
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Luca de Alfaro <luca@d...>
Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays
On 7/20/07, Jon Harrop <jon@ffconsultancy.com> wrote:
>
> On Friday 20 July 2007 08:35:39 Loup Vaillant wrote:
> > > ;-).  I am open to new ideas.  In part, I wanted a simple data
> structure
> > > (easier to extend, among other things).  Also, I use Set, Map, etc,
> quite
> > > often, and those are also balanced trees: I thought that if I can live
> > > with those, I can probably live with Vec as well.
>
> This is the beginnings of an awesome data structure!


Thanks!

> So can I. Your current implementation is already very attractive, and
> > looks very usable. For the new idea, have you thought of making (or
> > specifying) syntactic sugar to use your array?
>
> Should be very easy using the new camlp4. You might like to add a slicing
> notation as well. :-)


I have to study how to do it ... this would be very interesting.
Would you be interested in helping?

> About improving performance...
>
> I have two suggestions:
>
> 1. Add an extra node representing single elements that replaces
> Node(Empty, _,
> Empty). The reduces GC stress enormously and makes the whole thing ~30%
> faster.


This is easy.  I can give it a try soon, and see if I get something
reasonable, or if the code blows up.

2. Allow unbalanced sub trees. Balancing is slow and folds and maps don't
> need
> to rebalance, but "get" should force rebalancing. Extracting subarrays
> should
> return an unbalanced result.


This is almost easy.  I would need to add a bit to each node to keep track
of whether it's balanced...
The penalty would be that the balancing function would need to do slightly
more work to find out what has to be balanced.
So perhaps it's not a good idea for append, insert, but it could make sense
for concat (?), and especially for filter and sub...
But I am hesitant.  If one does concat, or one does sub to extract a
sub-array, I wrote the code already so that sharing is maximized. What is
the percentage of cases in which you get a Vec, but then don't do any
get/set on it, and only iterate?
Especially since you already have iterators on subranges?  Do you think it's
worth it?  Anyone has advice?


> > Is there a way to obtain a efficient filter?
>
> Yes. I discovered a most-excellent way to do this. It requires arbitrary
> metadata in every node, a constructor that composes subnodes to create the
> metadata for the parent and a filter function that can cull branches from
> the
> search tree.
>
> I used this in my Mathematica implementation to provide asymptotically
> fast
> filtering based upon lazily evaluated sets of symbols in each subnode.
> This
> gave huge performance improvements with no significant performance
> overhead.


I don't provide filter because..., well, I guess because I forgot: of all
iterators, filter is the one I need most rarely.
I should at least provide a simple implementation of it...

Another operation I would like to implement is splice:

splice i v1 v2

replaces the element in position i of vec v2 with vec v1.  A sort of
generalized insert.

Dr Jon D Harrop, Flying Frog Consultancy Ltd.
> OCaml for Scientists
> http://www.ffconsultancy.com/products/ocaml_for_scientists/?e
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>


BTW, Jon (and anyone else as well), let me know if you would like to help...
I could create a Google Code project so that we get a svn repository for the
code.

Luca