Browse thread
Re: Why OCaml sucks
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Kuba Ober <ober.14@o...> |
| Subject: | Re: [Caml-list] Re: Why OCaml sucks |
On Tuesday 13 May 2008, Robert Fischer wrote: > > The problem, as I understand it, is in writting parsers. Your standard > > finite automata based regular expression library or lexical analyzer is > > based, at it's heart, on a table lookup- you have a 2D array, whose size > > is the number of input characters times the number of states. For ASCII > > input, the number of possible input characters is small- 256 at most. > > 256 input characters times hundreds of states isn't that big of a table- > > we're looking at sizes in 10's of K- easily handlable even in the bad > > old days of 64K segments. Even going to UTF-16 ups the number of input > > characters from 256 to 65,536- and now a moderately large state machine > > (hundreds of states) weighs in at tens of megabytes of table space. > > And, of course, if you try to handle the entire 31-bit full unicode > > point space, welcome to really large tables :-). > > > > The solution, I think, is to change the implementation of your finite > > automata to use some data structure smarter than a flat 2D array, but > > that's me. > > Yes. It is certainly possible to write slow code to solve this problem. With "slow code" you could have been meaning two things: 1. Table lookup globally replaced by compares-and-jumps. The latter benefit from branch prediction and speculative execution. So it's not slow anymore. 2. Table "compression" used, where a few compare-and-jumps remove huge "unused" swaths of the table. By "unused" I meant "bomb out with an internal error". I think you're being silly. Stop it. Cheers, Kuba