[Caml-list] ocamllex+ocamlyacc and not parsing properly
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] ocamllex+ocamlyacc and not parsing properly
On Sunday 07 August 2005 22:39, Jonathan Roewen wrote:
> I'm having some trouble with a lexer+parser I've written to parse IRC
> strings. Just about all strings are parsed correctly, but I'm having a
> few minor issues.
>
> Here are two strings that fail to parse correctly:
> :Sovereign.Wyldryde.org 254 dst 112 :holodeck programs running
> :
> :Sovereign.Wyldryde.org 333 dst #bfos Helio 112025589
I just added "irc_types.ml":
type command = JOIN | PART | MODE | TOPIC | NAMES | LIST | INVITE
| KICK | PRIVMSG | NOTICE | QUIT | PING | Numeric of int
and compiled with:
ocamllex irc_lexer.mll
ocamlyacc irc_parser.mly
ocamlc -c irc_types.ml irc_parser.mli irc_parser.ml irc_lexer.ml
ocamlmktop irc_types.cmo irc_parser.cmo irc_lexer.cmo -o irc.top
ran the custom top-level with "./irc.top" and asked it to lex the first of
your example strings:
# let lexbuf = Lexing.from_string ":Sovereign.Wyldryde.org 254 dst 112 :holodeck programs running";;
val lexbuf : Lexing.lexbuf =
{Lexing.refill_buff = <fun>;
Lexing.lex_buffer =
":Sovereign.Wyldryde.org 254 dst 112 :holodeck programs running";
Lexing.lex_buffer_len = 62; Lexing.lex_abs_pos = 0;
Lexing.lex_start_pos = 0; Lexing.lex_curr_pos = 0;
Lexing.lex_last_pos = 0; Lexing.lex_last_action = 0;
Lexing.lex_eof_reached = true; Lexing.lex_mem = [||];
Lexing.lex_start_p =
{Lexing.pos_fname = ""; Lexing.pos_lnum = 1; Lexing.pos_bol = 0;
Lexing.pos_cnum = 0};
Lexing.lex_curr_p =
{Lexing.pos_fname = ""; Lexing.pos_lnum = 1; Lexing.pos_bol = 0;
Lexing.pos_cnum = 0}}
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.STRING "Sovereign.Wyldryde.org"
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.COMMAND (Irc_types.Numeric 254)
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.STRING "dst"
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.COMMAND (Irc_types.Numeric 112)
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.STRING "holodeck programs running"
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.EOL
So your lexer is emitting the tokens str, com, str, com, str, eol but your
parser looks as though it is expecting str, com, str, str, str, eol.
I'm guessing the error is in the lexer because the grammar in the parser is
very simple. As I read it, ":Sovereign.Wyldryde.org" is lexed by "message"
into str; the following " " invokes "command", which lexes 254 into com; the
next " " invokes "param", which lexes "dst" into str; and "param" should then
lex everything that remains into strs. However, that can't be what actually
happens, because the lexer has clearly gone back into "command" in order to
emit "Irc_types.Numeric 112".
It's just a guess, but have you assumed that each time the parser invokes the
lexer it resumes in the rule it was left in when, in fact, the parser invokes
the "message" rule every time?
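
If that is the problem, one possible fix is to hand the parser a wrapper that
remembers which rule to call next, instead of "message" itself. The following
is only a sketch under assumptions, since I have not seen your .mll: the
"cursor" type is a toy stand-in for Lexing.lexbuf, the three *_rule functions
are hand-written stand-ins for your ocamllex rules, and the names "state",
"make_token" and "tokens_of" are made up for illustration.

```ocaml
(* Stand-ins for the types in irc_types.ml and irc_parser.mly (assumed). *)
type command = Numeric of int
type token = STRING of string | COMMAND of command | EOL

(* Toy cursor standing in for Lexing.lexbuf. *)
type cursor = { text : string; mutable pos : int }

(* Read one space-delimited word and skip the space that follows it. *)
let word c =
  let n = String.length c.text in
  let stop = try String.index_from c.text c.pos ' ' with Not_found -> n in
  let w = String.sub c.text c.pos (stop - c.pos) in
  c.pos <- min n (stop + 1);
  w

(* "message": skip the leading ':' if present and return the prefix. *)
let message_rule c =
  if c.text.[c.pos] = ':' then c.pos <- c.pos + 1;
  STRING (word c)

(* "command": a numeric reply becomes COMMAND; anything else is kept as a
   STRING here to keep the sketch short. *)
let command_rule c =
  let w = word c in
  match int_of_string_opt w with
  | Some n -> COMMAND (Numeric n)
  | None -> STRING w

(* "param": a leading ':' means the rest of the line is a single parameter. *)
let param_rule c =
  if c.text.[c.pos] = ':' then begin
    let n = String.length c.text in
    let rest = String.sub c.text (c.pos + 1) (n - c.pos - 1) in
    c.pos <- n;
    STRING rest
  end else STRING (word c)

(* The wrapper: it records which rule comes next, so every call made by the
   parser resumes where the previous call left off. *)
type state = Message | Command | Param

let make_token () =
  let state = ref Message in
  fun c ->
    if c.pos >= String.length c.text then EOL
    else match !state with
      | Message -> state := Command; message_rule c
      | Command -> state := Param; command_rule c
      | Param -> param_rule c

(* Collect every token of a line, up to and including EOL. *)
let tokens_of s =
  let c = { text = s; pos = 0 } in
  let next = make_token () in
  let rec loop acc =
    match next c with
    | EOL -> List.rev (EOL :: acc)
    | t -> loop (t :: acc)
  in
  loop []
```

With the real generated modules the same idea would mean passing the wrapper,
rather than Irc_lexer.message, as the first argument of the parser's entry
point, and creating a fresh wrapper (or resetting the state) for each line.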
> BTW: As an aside, if the lexer doesn't cover all the bases, it doesn't
> throw an exception, just screws up my OS (Bounds check error, followed
> by seg-fault).
Any idea what is causing the segfault?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists