[Caml-list] ocamllex+ocamlyacc and not parsing properly
Date: -- (:)
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] ocamllex+ocamlyacc and not parsing properly
On Sunday 07 August 2005 22:39, Jonathan Roewen wrote:
> I'm having some trouble with a lexer+parser I've written to parse IRC
> strings. Just about all strings are parsed correctly, but I'm having a
> few minor issues.
>
> Here are two strings that fail to parse correctly:
> :Sovereign.Wyldryde.org 254 dst 112 :holodeck programs running
> :
> :Sovereign.Wyldryde.org 333 dst #bfos Helio 112025589

I just added "irc_types.ml":

type command = JOIN | PART | MODE | TOPIC | NAMES | LIST | INVITE
               | KICK | PRIVMSG | NOTICE | QUIT | PING | Numeric of int

and compiled with:

ocamllex irc_lexer.mll
ocamlyacc irc_parser.mly
ocamlc -c irc_types.ml irc_parser.mli irc_parser.ml irc_lexer.ml
ocamlmktop irc_types.cmo irc_parser.cmo irc_lexer.cmo -o irc.top

ran the custom top-level with "./irc.top" and asked it to lex the first of 
your example strings:

# let lexbuf = Lexing.from_string ":Sovereign.Wyldryde.org 254 dst 112 :holodeck programs running";;
val lexbuf : Lexing.lexbuf =
  {Lexing.refill_buff = <fun>;
   Lexing.lex_buffer =
    ":Sovereign.Wyldryde.org 254 dst 112 :holodeck programs running";
   Lexing.lex_buffer_len = 62; Lexing.lex_abs_pos = 0;
   Lexing.lex_start_pos = 0; Lexing.lex_curr_pos = 0;
   Lexing.lex_last_pos = 0; Lexing.lex_last_action = 0;
   Lexing.lex_eof_reached = true; Lexing.lex_mem = [||];
   Lexing.lex_start_p =
    {Lexing.pos_fname = ""; Lexing.pos_lnum = 1; Lexing.pos_bol = 0;
     Lexing.pos_cnum = 0};
   Lexing.lex_curr_p =
    {Lexing.pos_fname = ""; Lexing.pos_lnum = 1; Lexing.pos_bol = 0;
     Lexing.pos_cnum = 0}}
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.STRING "Sovereign.Wyldryde.org"
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.COMMAND (Irc_types.Numeric 254)
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.STRING "dst"
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.COMMAND (Irc_types.Numeric 112)
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.STRING "holodeck programs running"
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.EOL

So your lexer is emitting the tokens str, com, str, com, str, eol but your 
parser looks as though it is expecting str, com, str, str, str, eol.

I'm guessing the error is in the lexer because the grammar in the parser is 
very simple. As I read it, you intend ":Sovereign.Wyldryde.org" to be lexed by 
"message" into a str, the following " " to invoke "command", which lexes 254 
into a com, and the next " " to invoke "param", which lexes "dst" into a str 
and then keeps lexing the remaining parameters into strs.

However, that can't be correct because the lexer has clearly gone back into 
"command" in order to emit "Irc_types.Numeric 112".

It's just a guess, but have you assumed that each time the parser invokes the 
lexer it resumes in the rule it was left in when, in fact, the parser invokes 
the "message" rule every time?
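
If that is what is happening, one workaround is to keep the current rule in a 
reference and hand the parser a single dispatching function. The following is 
only a minimal, self-contained sketch: the token type, the three toy rules and 
the word-list "buffer" are hypothetical stand-ins for your ocamllex-generated 
code, and it assumes every line starts with a ":" prefix followed by a command.

```ocaml
(* Hypothetical stand-ins for the parser's token type and the lexer rules. *)
type token = STRING of string | COMMAND of int | EOL

(* The rule that the next call to [token] should use. *)
type rule = Message | Command | Param

(* Toy "lexbuf": the space-separated words not yet consumed. *)
type buf = { mutable words : string list }

let current = ref Message

(* Drop a leading ':' if there is one. *)
let strip_colon w =
  if String.length w > 0 && w.[0] = ':'
  then String.sub w 1 (String.length w - 1)
  else w

(* Each rule consumes one token and records which rule comes next. *)
let message b =
  match b.words with
  | [] -> current := Message; EOL
  | w :: rest -> b.words <- rest; current := Command; STRING (strip_colon w)

let command b =
  match b.words with
  | [] -> current := Message; EOL
  | w :: rest ->
    b.words <- rest;
    current := Param;
    (match int_of_string_opt w with
     | Some n -> COMMAND n
     | None -> STRING w)

let param b =
  match b.words with
  | [] -> current := Message; EOL
  | w :: rest when String.length w > 0 && w.[0] = ':' ->
    (* Trailing parameter: everything up to the end of the line. *)
    b.words <- [];
    STRING (String.concat " " (strip_colon w :: rest))
  | w :: rest -> b.words <- rest; STRING w

(* The single entry point a parser would call for every token. *)
let token b =
  match !current with
  | Message -> message b
  | Command -> command b
  | Param -> param b

(* Collect all tokens of one line, up to and including EOL. *)
let tokens_of_line line =
  current := Message;
  let b = { words = String.split_on_char ' ' line } in
  let rec loop acc =
    match token b with
    | EOL -> List.rev (EOL :: acc)
    | t -> loop (t :: acc)
  in
  loop []
```

Because "token" consults the reference on every call, the second numeric 
argument is lexed by "param" rather than by "command". With a real ocamllex 
lexer the same idea would mean passing a wrapper like "token" to the parser's 
entry point instead of "message".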

> BTW: As an aside, if the lexer doesn't cover all the bases, it doesn't
> throw an exception, just screws up my OS (Bounds check error, followed
> by seg-fault).

Any idea what is causing the segfault?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists