Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

input_line is #3155

Closed
vicuna opened this issue Jan 15, 2002 · 11 comments
Closed

input_line is #3155

vicuna opened this issue Jan 15, 2002 · 11 comments
Labels

Comments

@vicuna
Copy link

vicuna commented Jan 15, 2002

Original bug ID: 813
Reporter: administrator
Status: closed
Resolution: fixed
Priority: normal
Severity: minor
Category: ~DO NOT USE (was: OCaml general)

Bug description

Full_Name: Lukasz Lew
Version: 3.04
OS: Win 2000
Submission from: 54-dzi-3.acn.waw.pl (62.121.66.54)

The read_line function from pervasives.ml is ineffective.
It is O(n^2) because cost of string concatenation. So i propose sth like this:

let input_line chan =
let rec input_line accu len =
let n = input_scan_line chan in
if n = 0 then raise End_of_file
else if n > 0 then begin
let res = string_create (n-1) in
ignore (unsafe_input chan res 0 (n-1));
ignore (input_char chan);
(len+n-1, (n-1, res) :: accu)
end else begin
let res = string_create (-n) in
ignore (unsafe_input chan res 0 (-n));
try input_line ((-n, res) :: accu) (len-n) with
End_of_file -> len, accu
end
in
let len, lst = input_line [] 0 in
let res = string_create len
and pos = ref len in
List.iter
(fun (len, str) -> pos := !pos - len; string_blit str 0 res !pos len)
lst;
res

This function at first reads blocks provided by unsafe_input
and push on list. And when total lenght is known it creates result string, copy
everything and retuns it.

@vicuna
Copy link
Author

vicuna commented Jan 16, 2002

Comment author: administrator

The read_line function from pervasives.ml is ineffective.
It is O(n^2) because cost of string concatenation.

I'm not too concerned about this, because it is fairly rare that more
than one refill of the input buffer is needed to reach the final newline.
Did you observe the quadratic behavior, or is this just a theoretical concern?

So i propose sth like this:

Reasonable, although this could be written more concisely with
String.concat, or (even shorter) Buffer.add_string.

  • Xavier Leroy

@vicuna
Copy link
Author

vicuna commented Jan 16, 2002

Comment author: administrator

The read_line function from pervasives.ml is ineffective.
It is O(n^2) because cost of string concatenation.

I'm not too concerned about this, because it is fairly rare that more
than one refill of the input buffer is needed to reach the final newline.
Did you observe the quadratic behavior, or is this just a theoretical concern?

I heve to write ACM-like programs, and i have some limits for execution
time. The input for the program is taken from stdin and often happens that
there are many integers in one line (up to 10MB). My programs always had
judgment "time limit exceeded". I tried to test my programs at home.
Everything was fast ... until lenght of one line reached 1MB.
Over this limit everyting was very slow because of big HDD access time.

So i propose sth like this:

Reasonable, although this could be written more concisely with
String.concat, or (even shorter) Buffer.add_string.
Yes, now i see.

I have two requests/questions:

  1. Could read_int () behave like scanf("%d",...) rather than
    int_of_string( read_line() ) ?
  2. Would it ever be posible to write things like this:

let add a b = a +. b;;

val add : float -> float -> float =

let add a b = a + b;;

val add : int -> int -> int =

add 3.5 4.5;;

  • : float = 8

add 3 4;;

  • : int = 7

?

Lukasz Lew

@vicuna
Copy link
Author

vicuna commented Jan 16, 2002

Comment author: administrator

Date: Wed, 16 Jan 2002 17:37:02 +0100 (MET)
From: xavier.leroy@inria.fr

The read_line function from pervasives.ml is ineffective.
It is O(n^2) because cost of string concatenation.
I'm not too concerned about this, because it is fairly rare that more than one
refill of the input buffer is needed to reach the final newline. Did you
observe the quadratic behavior, or is this just a theoretical concern?
I did, because I had noticed the problem in input_line, so I tried to
read a 16 Mb file with no \n (on purpose). What I didn't understand
is that even when I enable GC compacting, my test program still died
by swap space exhaustion (500 Mb) -- maybe this is another bug report.

So I propose sth like this:
Reasonable, although this could be written more concisely with
String.concat, or (even shorter) Buffer.add_string.
I don't know if Pervasives should depend on String or Buffer.

Just a note: I had to implement a slightly different version of
read_line for cash, and my first solution was very much like the one
proposed. I changed it because the most frequently used path (short
lines) was noticeably slower (not `a lot', but noticeably). But my
current version is even less concise... (I'm willing to give it, of
course).

Bruno.

@vicuna
Copy link
Author

vicuna commented Jan 16, 2002

Comment author: administrator

From: ll189417@zodiac.mimuw.edu.pl
Subject: Re: input_line is (#3155)
Date: Wed, 16 Jan 2002 18:06:49 +0100 (MET)
Message-ID: 200201161706.SAA16542@pauillac.inria.fr

  1. Would it ever be posible to write things like this:

let add a b = a +. b;;

val add : float -> float -> float =

let add a b = a + b;;

val add : int -> int -> int =

add 3.5 4.5;;

  • : float = 8

add 3 4;;

  • : int = 7

You here try to overload the add identifier so that
it can be used for both the additions of ints and floats.
It is impossible in O'Caml... at this moment.

In your example, it is clear that "add" in add 3.5 4.5 should refer to
the one for floats and add 3 4 refers to the one for integers.
But how about the following ?

let double x = add x x;;

What the type of double ? Is it for integers, floats, or both ?
Or rejected by the type checker because of the ambiguity of add ?
In the polymorphic languages like ML, this overloading problem
is not so trivial...


Jun P. Furuse Jun.Furuse@inria.fr
INRIA
Institut National de Recherche en Informatique et en Automatique

@vicuna
Copy link
Author

vicuna commented Jan 16, 2002

Comment author: administrator

  1. Would it ever be posible to write things like this:

let add a b = a +. b;;

val add : float -> float -> float =

let add a b = a + b;;

val add : int -> int -> int =

add 3.5 4.5;;

  • : float = 8

add 3 4;;

  • : int = 7

You here try to overload the add identifier so that
it can be used for both the additions of ints and floats.
It is impossible in O'Caml... at this moment.

In your example, it is clear that "add" in add 3.5 4.5 should refer to
the one for floats and add 3 4 refers to the one for integers.
But how about the following ?

let double x = add x x;;

What the type of double ? Is it for integers, floats, or both ?
Or rejected by the type checker because of the ambiguity of add ?
In the polymorphic languages like ML, this overloading problem
is not so trivial...
So have you any ideas/plans about overloading?
Are there any papers with suggestions how to implement overloading in
Ocaml?

Please read this code:

let add = function
I x -> ( function I y -> I (x+y) | F _-> failwith "int+float")
| F x -> ( function F y -> F (x+.y) | I _-> failwith "int+float")
;;

let double x = add x x;;

Am i right that we can emulate overloading with polymorphic variants?
If true, am i rigth that we don't make it in core because of Weaknesses of
polymorphic variants pointed in 4.2.1 chapter of Ocaml documentation?

I can't clearly understand relationship between polymorphic variants and
overloading, could you explain me this?

Does any functional languages have overloading?
How it is solved there?
Are they hard typed?

Uhh... so much questions..sorry :)

Lukasz Lew

@vicuna
Copy link
Author

vicuna commented Jan 17, 2002

Comment author: administrator

On Wed, Jan 16, 2002 at 06:20:05PM +0100, Bruno.Verlyck@inria.fr wrote:

Date: Wed, 16 Jan 2002 17:37:02 +0100 (MET)
From: xavier.leroy@inria.fr

The read_line function from pervasives.ml is ineffective.
It is O(n^2) because cost of string concatenation.
I'm not too concerned about this, because it is fairly rare that more than one
refill of the input buffer is needed to reach the final newline. Did you
observe the quadratic behavior, or is this just a theoretical concern?
I did, because I had noticed the problem in input_line, so I tried to
read a 16 Mb file with no \n (on purpose). What I didn't understand
is that even when I enable GC compacting, my test program still died
by swap space exhaustion (500 Mb) -- maybe this is another bug report.

C'est un problème que j'ai déjà rencontré à plusieurs reprises.
Lorsque l'on alloue des chaînes de caractères assez longues et de plus
en plus grandes, OCaml alloue une zone mémoire pour chaque chaîne et
ne peut pas réutiliser ces zones mémoires pour les chaînes suivantes
car elles sont trop petites. De plus, ces zones ne sont jamais
libérées (à moins de faire tourner le compacteur).

Peut-être faudrait-il un traitement spécial pour les objets de grande
taille : par exemple, les allouer dans des zones spéciales crées à
l'aide de mmap (pour éviter une fragmentation de bas niveau) et qui
seraient libérées avec l'objet.

-- Jérôme

@vicuna
Copy link
Author

vicuna commented Jan 17, 2002

Comment author: administrator

Did you observe the quadratic behavior, or is this just a
theoretical concern?

I heve to write ACM-like programs, and i have some limits for execution
time. The input for the program is taken from stdin and often happens that
there are many integers in one line (up to 10MB). My programs always had
judgment "time limit exceeded". I tried to test my programs at home.
Everything was fast ... until lenght of one line reached 1MB.

OK, I'm convinced :-) We will fix it one way or another. Bruno's
comment that Buffer and String aren't usable in Pervasives is very
well taken, I forgot about that...

  • Xavier Leroy

@vicuna
Copy link
Author

vicuna commented Jan 17, 2002

Comment author: administrator

From: Jerome Vouillon jerome.vouillon@inria.fr

ne peut pas réutiliser ces zones mémoires pour les chaînes suivantes
car elles sont trop petites. De plus, ces zones ne sont jamais
libérées (à moins de faire tourner le compacteur).

En fait, meme en faisant tourner le compacteur, elles ne seront pas
liberees (parce qu'il y a une autre zone occupee apres dans la
memoire). Y'a peut-etre moyen de patcher le GC.

Peut-être faudrait-il un traitement spécial pour les objets de grande
taille : par exemple, les allouer dans des zones spéciales crées à
l'aide de mmap (pour éviter une fragmentation de bas niveau) et qui
seraient libérées avec l'objet.

Ce genre de bidouille implique generalement un overhead pas
negligeable sur tout le systeme, et un paquet de bugs en prime. La
vraie solution c'est que l'utilisateur change le parametre
[major_heap_increment].

-- Damien

@vicuna
Copy link
Author

vicuna commented Jan 23, 2002

Comment author: administrator

I've been thinking about the problem with Pervasives.read_line.
This was designed for reading input from users or from network
protocols (where the line size is bounded, usually by 1k). It was
never intended for heavy-duty use.

If you want to read arbitrary-length lines, not only it has quadratic
behaviour, but it cannot handle an input line larger than the maximum
string size, which is a bit less than 16M on 32-bit machines.

I think we should set a reasonable limit (like 64k) to the line
length in Pervasives.read_line and maybe give another function (in
some other module) for more extreme usage.

I did, because I had noticed the problem in input_line, so I tried to
read a 16 Mb file with no \n (on purpose). What I didn't understand
is that even when I enable GC compacting, my test program still died
by swap space exhaustion (500 Mb) -- maybe this is another bug report.

As far as I can tell, it should have died with
Invalid_argument ("create_string").

-- Damien

@vicuna
Copy link
Author

vicuna commented Jan 23, 2002

Comment author: administrator

Date: Wed, 23 Jan 2002 17:17:55 +0100 (MET)
From: damien.doligez@inria.fr

I've been thinking about the problem with Pervasives.read_line.
[..]
it cannot handle an input line larger than the maximum string size,
which is a bit less than 16M on 32-bit machines.
Bonne remarque.
I think we should set a reasonable limit (like 64k) to the line
length in Pervasives.read_line and maybe give another function (in
some other module) for more extreme usage.
Oui, pourquoi pas.

I did, because I had noticed the problem in input_line, so I tried
to read a 16 Mb file with no \n (on purpose). What I didn't
understand is that even when I enable GC compacting, my test
program still died by swap space exhaustion (500 Mb) -- maybe this
is another bug report.

As far as I can tell, it should have died with
Invalid_argument ("create_string").
Non, mon fichier faisait plus de 16e6 mais moins de 2**24. De toutes
manières, le programme n'a jamais eu l'occasion d'atteindre cette
limite, sans doute à cause du comportement' de Gc.compact dont tu parlais dans ton mail précédent (du 17 Jan 2002 17:20:34 +0100) -- à moins que tu ne sous-entendes ici: avec quelques Go de swap en
plus et après un certain temps'.

Penses-tu le modifier (le comportement) ?

Bruno.

@vicuna
Copy link
Author

vicuna commented Feb 11, 2002

Comment author: administrator

Reimplemented input_line 2002-02-11 XL

@vicuna vicuna closed this as completed Feb 11, 2002
@vicuna vicuna added the bug label Mar 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant