Browse thread
Parallelized parsing
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Jon Harrop <jon@f...> |
| Subject: | Re: [Caml-list] Parallelized parsing |
On Tuesday 21 April 2009 08:19:33 you wrote:
> On Mon, Apr 20, 2009 at 23:15, Jon Harrop <jon@ffconsultancy.com> wrote:
> > For example, Mathematica syntax for nested lists of integers looks like:
> >
> > {{{1, 2}}, {{3, 4}, {4, 5}}, ..}
> >
> > and there are obvious divide-and-conquer approaches to lexing and parsing
> > that grammar. You can recursively subdivide the string (e.g. memory
> > mapped from a file) to build a tree of where the tokens { , and } appear
> > by index and then recursively convert the tree into an AST.
> >
> > What other grammars can be lexed and/or parsed efficiently in parallel?
>
> Is it of any use? The overhead of parsing a single file in parallel is
> so high that you won't have any speedup, especially compared to the
> much simpler approach of parsing *several* files in parallel.
I'm seeing near-linear speedups parsing nested Mathematica lists in F# using
the algorithms I described. Moreover, dumping large quantities of data from
Mathematica in that format is much easier than dividing it into separate
files because there is no clear break in the tree.
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e