Browse thread
Parallelized parsing
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: | 2009-04-21 (15:57) |
From: | Jon Harrop <jon@f...> |
Subject: | Re: [Caml-list] Parallelized parsing |
On Tuesday 21 April 2009 08:19:33 you wrote: > On Mon, Apr 20, 2009 at 23:15, Jon Harrop <jon@ffconsultancy.com> wrote: > > For example, Mathematica syntax for nested lists of integers looks like: > > > > {{{1, 2}}, {{3, 4}, {4, 5}}, ..} > > > > and there are obvious divide-and-conquer approaches to lexing and parsing > > that grammar. You can recursively subdivide the string (e.g. memory > > mapped from a file) to build a tree of where the tokens { , and } appear > > by index and then recursively convert the tree into an AST. > > > > What other grammars can be lexed and/or parsed efficiently in parallel? > > Is it of any use? The overhead of parsing a single file in parallel is > so high that you won't have any speedup, especially compared to the > much simpler approach of parsing *several* files in parallel. I'm seeing near-linear speedups parsing nested Mathematica lists in F# using the algorithms I described. Moreover, dumping large quantities of data from Mathematica in that format is much easier than dividing it into separate files because there is no clear break in the tree. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e