Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Again: Lexer becomes very slow for large tokens #3216

Closed
vicuna opened this issue Feb 20, 2002 · 2 comments
Closed

Again: Lexer becomes very slow for large tokens #3216

vicuna opened this issue Feb 20, 2002 · 2 comments
Labels

Comments

@vicuna
Copy link

vicuna commented Feb 20, 2002

Original bug ID: 908
Reporter: administrator
Status: closed
Resolution: fixed
Priority: normal
Severity: minor
Category: ~DO NOT USE (was: OCaml general)

Bug description

Full_Name: Gerd Stolpmann
Version: 3.04
OS: Linux
Submission from: p50816832.dip0.t-ipconnect.de (80.129.104.50)

Hello again,

I think I solved problem #900 as shown below (lexing.ml). This code passes
a lot of tests, so I think it works.

Another improvement would be that from_function passes directly the
lexbuf to the function, avoiding many blits. Of course, the function
needs the start position of the free slice in the buffer. Furthermore,
it is not as safe as the current solution because the function
can access the other parts of the buffer, but it would be a nice
addition.

Gerd

--

let lex_refill read_fun aux_buffer lexbuf =
let read =
read_fun aux_buffer (String.length aux_buffer) in
let n =
if read > 0
then read
else (lexbuf.lex_eof_reached <- true; 0) in
(* Is there enough space at the end of the buffer? )
let space = String.length lexbuf.lex_buffer - lexbuf.lex_buffer_len in
if space < n then begin
(
No )
(
First try to remove the first lex_start_pos bytes of the buffer )
let s = lexbuf.lex_start_pos in
let space' = space - s in
let efflen = lexbuf.lex_buffer_len - s in
if space' >= n then begin
(
There is enough space at the beginning of the buffer )
String.(unsafe_)blit lexbuf.lex_buffer s lexbuf.lex_buffer 0 efflen;
end
else begin
(
Allocate a bigger buffer )
let oldlen = String.length lexbuf.lex_buffer in
let newlen = max (oldlen * 2) (n + efflen) in
let newbuf = String.create newlen in
String.(unsafe_)blit lexbuf.lex_buffer s newbuf 0 efflen;
lexbuf.lex_buffer <- newbuf;
end;
(
Anyway, the first s bytes have been removed. Update the positions. )
lexbuf.lex_abs_pos <- lexbuf.lex_abs_pos + s;
lexbuf.lex_curr_pos <- lexbuf.lex_curr_pos - s;
lexbuf.lex_start_pos <- 0;
lexbuf.lex_last_pos <- lexbuf.lex_last_pos - s;
lexbuf.lex_buffer_len <- efflen;
end;
(
There is now enough space at the end of the buffer *)
String.unsafe_blit aux_buffer 0
lexbuf.lex_buffer lexbuf.lex_buffer_len
n;
lexbuf.lex_buffer_len <- lexbuf.lex_buffer_len + n

let from_function f =
{ refill_buff = lex_refill f (String.create 512);
lex_buffer = String.create 1024;
lex_buffer_len = 0;
lex_abs_pos = 0;
lex_start_pos = 0;
lex_curr_pos = 0;
lex_last_pos = 0;
lex_last_action = 0;
lex_eof_reached = false }

let from_channel ic =
from_function (fun buf n -> input ic buf 0 n)

let from_string s =
{ refill_buff = (fun lexbuf -> lexbuf.lex_eof_reached <- true);
lex_buffer = s ^ "";
lex_buffer_len = String.length s;
lex_abs_pos = 0;
lex_start_pos = 0;
lex_curr_pos = 0;
lex_last_pos = 0;
lex_last_action = 0;
lex_eof_reached = true }

let lexeme lexbuf =
let len = lexbuf.lex_curr_pos - lexbuf.lex_start_pos in
let s = String.create len in
String.unsafe_blit lexbuf.lex_buffer lexbuf.lex_start_pos s 0 len;
s

let lexeme_char lexbuf i =
String.get lexbuf.lex_buffer (lexbuf.lex_start_pos + i)

let lexeme_start lexbuf =
lexbuf.lex_abs_pos + lexbuf.lex_start_pos
and lexeme_end lexbuf =
lexbuf.lex_abs_pos + lexbuf.lex_curr_pos

@vicuna
Copy link
Author

vicuna commented Mar 11, 2002

Comment author: administrator

Fixed mostly as suggested, 2002-03-11, XL

@vicuna vicuna closed this as completed Mar 11, 2002
@vicuna
Copy link
Author

vicuna commented Mar 11, 2002

Comment author: administrator

Dear Gerd,

I think I solved problem #900 as shown below (lexing.ml). This code passes
a lot of tests, so I think it works.

Thanks for pointing out the problem and suggested a fix. The buffer
resizing code in Lexing was awful indeed. I have essentially merged
in your proposed fix.

Another improvement would be that from_function passes directly the
lexbuf to the function, avoiding many blits. Of course, the function
needs the start position of the free slice in the buffer. Furthermore,
it is not as safe as the current solution because the function
can access the other parts of the buffer, but it would be a nice
addition.

I can't remember the rationale for the current design, which has been
around for a very long time (the beginning of Caml Light, I guess).
I'd rather not break existing code and tolerate the extra blits --
those add only a constant factor, unlike the old resizing strategy
which could be arbitrarily expensive.

Cheers,

  • Xavier Leroy

@vicuna vicuna added the bug label Mar 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant