English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
Re: Why OCaml sucks
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2008-05-14 (04:40)
From: Kuba Ober <ober.14@o...>
Subject: Re: [Caml-list] Re: Why OCaml sucks
On Tuesday 13 May 2008, Robert Fischer wrote:
> > The problem, as I understand it, is in writting parsers.  Your standard
> > finite automata based regular expression library or lexical analyzer is
> > based, at it's heart, on a table lookup- you have a 2D array, whose size
> > is the number of input characters times the number of states.  For ASCII
> > input, the number of possible input characters is small- 256 at most.
> > 256 input characters times hundreds of states isn't that big of a table-
> > we're looking at sizes in 10's of K- easily handlable even in the bad
> > old days of 64K segments.  Even going to UTF-16 ups the number of input
> > characters from 256 to 65,536- and now a moderately large state machine
> > (hundreds of states) weighs in at tens of megabytes of table space.
> > And, of course, if you try to handle the entire 31-bit full unicode
> > point space, welcome to really large tables :-).
> >
> > The solution, I think, is to change the implementation of your finite
> > automata to use some data structure smarter than a flat 2D array, but
> > that's me.
> Yes.  It is certainly possible to write slow code to solve this problem.

With "slow code" you could have been meaning two things:

1. Table lookup globally replaced by compares-and-jumps. The latter benefit 
from branch prediction and speculative execution. So it's not slow anymore.

2. Table "compression" used, where a few compare-and-jumps remove 
huge "unused" swaths of the table. By "unused" I meant "bomb out with an 
internal error".

I think you're being silly. Stop it.

Cheers, Kuba