lexing from a sub-string with an explicit position
Date: 2006-02-16 (19:23)
From: Basile STARYNKEVITCH <basile@s...>
Subject: lexing from a sub-string with an explicit position

Dear All,

I'm painfully coding a scriptable wiki (i.e. web users can interactively
edit, in their browser, scripts that run within the wiki; the edited wiki
data may contain a mixture of document and script code). The code (and
the design) is already messy...

For example, the login web window is scripted as follows:


    myloginfun(user,password) =
      let loginstatus= login_user_failed(user,password) in
        if loginstatus then
           block [{
             [para.big Failed login: [$loginstatus]]
           }]
        else
           block [{ [para.big Welcome]  }]


  block [{

     [para.huge Welcome [i here]]
        [title Please log in.]
        [fitem your name or account] [formtext userv]
        [fitem your password] [formpassword passwordv]
        [formaction ["login"] [( myloginfun(user,password) )] ]
  }]



In the above syntax, let, block, if, ... are scripting keywords.
Document chunks are strings bracketed by [{ }] following the block
keyword. Square brackets like [i here] (similar to HTML <i>here</i>)
are wiki markup. Document chunks also contain expressions bracketed
by [( )], which describe the actions executed in forms. [$foo] is the
variable foo expanded as a displayed string (actually a marked-up wiki
document subchunk).
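
To illustrate the bracket conventions above, here is a minimal sketch of
pulling a [{ }] document chunk out of a larger text while remembering where
it started. The name extract_block is hypothetical, and the sketch assumes
that [{ }] delimiters may nest and that there is no escaping mechanism;
neither assumption is confirmed by the actual wiki code.

```ocaml
(* Hypothetical helper, not from the wiki code base: given [text] and the
   index [start] of an opening "[{", return the chunk between the delimiters
   together with the offset at which that chunk begins in [text]. *)
let extract_block (text : string) (start : int) : (string * int) option =
  let len = String.length text in
  (* true when the two characters at index i are exactly a then b *)
  let is_at i a b =
    i >= 0 && i + 1 < len && text.[i] = a && text.[i + 1] = b in
  if not (is_at start '[' '{') then None
  else
    (* depth counts the "[{" delimiters that are currently open *)
    let rec scan i depth =
      if i >= len then None (* unterminated block *)
      else if is_at i '[' '{' then scan (i + 2) (depth + 1)
      else if is_at i '}' ']' then
        if depth = 1 then
          let body_start = start + 2 in
          Some (String.sub text body_start (i - body_start), body_start)
        else scan (i + 2) (depth - 1)
      else scan (i + 1) depth
    in
    scan (start + 2) 1
```

The returned offset is what a caller would need in order to report
positions relative to the enclosing text instead of the copied chunk.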

Given that the wiki syntax should be rather contextual and flexible, and
that the scripting language (basically Scheme-like semantics:
dynamic typing, first-class functions, letrec, ... with a Pascal-like
syntax) is more rigid, I chose (probably a bad design...) to
parse the wiki syntax with handcrafted code and to parse the embedded
script code using Menhir (BTW, thanks to Francois Pottier & Yan for
their help). I was not able to write a Menhir grammar with two modes, one
for the wiki (flexible) and one for scripting.

So my wiki parser (a handcrafted recursive descent parser) calls the
expression parser (Menhir-generated LR), which itself calls the wiki
parser, etc. So I am copying substrings quite often and maybe even
backtracking (but parsing performance is not that important).
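
Since each parser hands copied substrings to the other, it helps to record
where every substring began in the enclosing text. A minimal sketch,
assuming positions are byte offsets and lines are separated by '\n';
position_at and its fname parameter are hypothetical, not part of the wiki
code:

```ocaml
(* Hypothetical helper: compute the Lexing.position of byte [offset] in
   [text], so that a copied substring can later be lexed with the position
   it had in the enclosing text. [fname] is whatever name should be
   reported in error messages. *)
let position_at ~(fname : string) (text : string) (offset : int)
  : Lexing.position =
  let lnum = ref 1 and bol = ref 0 in
  (* count the newlines that occur strictly before [offset] *)
  for i = 0 to min offset (String.length text) - 1 do
    if text.[i] = '\n' then begin
      incr lnum;
      bol := i + 1
    end
  done;
  { Lexing.pos_fname = fname;
    pos_lnum = !lnum;   (* 1-based line number *)
    pos_bol = !bol;     (* offset of the start of that line *)
    pos_cnum = offset } (* absolute offset in [text] *)
```

The column of the substring start is then pos_cnum - pos_bol.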

For example, the Menhir & ocamllex parser for the scripting language copies
the bracketed string [{ [para.big Welcome] }] and calls the wiki
recursive descent parser (which, for bracketed expressions like [(
myloginfun(user,password) )], will call the LR parser, etc.).

So I am copying a substring and calling a Lexing-based parser with a lexbuf,
etc. The problem is to keep positions accurate. Just using
Lexing.from_string and then setting the lex_curr_p and lex_start_p
fields manually does not work.

The following does not work as I want either. I want the string to be
parsed, and I want to give it a starting position (remember, the string is
extracted from some bigger text).

let lexbuf_from_positioned_string lpos str =
  let posr = ref 0
  and strlen = String.length str in
  (* refill function: copy at most tocnt characters of str into tostr *)
  let stread tostr tocnt =
    let oldposr = !posr in
    if oldposr >= strlen then 0 (* end of input *)
    else begin
      let rcnt = min tocnt (strlen - oldposr) in
      Bytes.blit_string str oldposr tostr 0 rcnt;
      posr := oldposr + rcnt;
      rcnt
    end
  in
  let lxbu = Lexing.from_function stread in
  lxbu.Lexing.lex_start_p <- lpos;
  lxbu.Lexing.lex_curr_p <- lpos;
  lxbu

(* then called with *)

let common_expression_string_parser ?(startpos : Lexing.position option) s =
  let lexbuf =
    match startpos with
    | None -> Lexing.from_string s
    | Some po -> lexbuf_from_positioned_string po s
  in
  Lexer.exprtoken lexbuf
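
A possible direction, stated as an assumption and not as a confirmed fix:
the standard Lexing engine appears to recompute the pos_cnum of lex_curr_p
after each token as lex_abs_pos + lex_curr_pos, so a manually assigned
lex_curr_p would be overwritten unless lex_abs_pos is shifted as well. The
sketch below (positioned_lexbuf is a hypothetical name) sets all three
fields:

```ocaml
(* Sketch only: build a lexbuf over [str] whose reported positions start at
   [lpos]. Assumes the Lexing engine derives pos_cnum from
   lex_abs_pos + lex_curr_pos after each token. *)
let positioned_lexbuf (lpos : Lexing.position) (str : string)
  : Lexing.lexbuf =
  let lexbuf = Lexing.from_string str in
  (* shift the absolute offset so that pos_cnum continues from lpos *)
  lexbuf.Lexing.lex_abs_pos <- lpos.Lexing.pos_cnum;
  lexbuf.Lexing.lex_start_p <- lpos;
  lexbuf.Lexing.lex_curr_p <- lpos;
  lexbuf
```

Note that pos_lnum and pos_bol only advance when the lexer's own actions
update them at newlines, so line numbers still depend on the lexer rules.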

Any clues?

Don't tell me I should not do that... (I'm becoming ashamed of doing it).

Regards, and thanks for reading...

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France