Version française
Home     About     Download     Resources     Contact us    
Browse thread
Parallelized parsing
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] Parallelized parsing
On Tuesday 21 April 2009 08:19:33 you wrote:
> On Mon, Apr 20, 2009 at 23:15, Jon Harrop <jon@ffconsultancy.com> wrote:
> > For example, Mathematica syntax for nested lists of integers looks like:
> >
> >  {{{1, 2}}, {{3, 4}, {4, 5}}, ..}
> >
> > and there are obvious divide-and-conquer approaches to lexing and parsing
> > that grammar. You can recursively subdivide the string (e.g. memory
> > mapped from a file) to build a tree of where the tokens { , and } appear
> > by index and then recursively convert the tree into an AST.
> >
> > What other grammars can be lexed and/or parsed efficiently in parallel?
>
> Is it of any use? The overhead of parsing a single file in parallel is
> so high that you won't have any speedup, especially compared to the
> much simpler approach of parsing *several* files in parallel.

I'm seeing near-linear speedups parsing nested Mathematica lists in F# using 
the algorithms I described. Moreover, dumping large quantities of data from 
Mathematica in that format is much easier than dividing it into separate 
files because there is no clear break in the tree.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e