Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In String (and Bytes), index_from raises Not_found instead of Invalid_argument on the string's length #7183

Closed
vicuna opened this issue Mar 16, 2016 · 3 comments

Comments

@vicuna
Copy link

vicuna commented Mar 16, 2016

Original bug ID: 7183
Reporter: Niols
Status: closed (set by @damiendoligez on 2016-03-23T13:42:31Z)
Resolution: not a bug
Priority: low
Severity: minor
Version: 4.02.3
Category: standard library
Monitored by: @gasche @hcarty

Bug description

In toplevel:

        OCaml version 4.02.3

# let s = "any string";;
val s : string = "any string"

# String.index_from s (String.length s) '.';;
Exception: Not_found.

# String.index_from s (String.length s + 1) '.';;
Exception: Invalid_argument "String.index_from / Bytes.index_from".

# s.[String.length s];;
Exception: Invalid_argument "index out of bounds".

#

while in the String lib reference, we have:

val index_from : string -> int -> char -> int

String.index_from s i c returns the index of the first occurrence of character c in string s after position i. String.index s c is equivalent to String.index_from s 0 c.

Raise Invalid_argument if i is not a valid position in s. Raise Not_found if c does not occur in s after position i.

I was thus expecting

# String.index_from s (String.length s) '.';;

to raise

Exception: Invalid_argument "String.index_from / Bytes.index_from".

I believe this is because of the line 228 in bytes.ml:

if i < 0 || i > l then invalid_arg "String.index_from / Bytes.index_from"

Is this the expected behaviour?

@vicuna
Copy link
Author

vicuna commented Mar 16, 2016

Comment author: @gasche

I think this is a bug. You can use the grep command
git grep -n "<[^=>].||.[^<]>[^=]" **/*.ml
to find other occurences of this pattern. In particular,

stdlib/buffer.ml:116: if len < 0 || len > Sys.max_string_length then (* #5004 *)

looks like a potential bug on 32bit architectures to me. It's in the Buffer.resize function, so it's not directly callable, but I wonder what would happen with

let () =
let buf = Buffer.create 1000 in
let str = String.make (Sys.max_string_length - 1000) '0' in
Buffer.add_string buf str

on a 32bit machines. (A 64bit machine will fail with Out_of_memory at the String.make call)

@vicuna
Copy link
Author

vicuna commented Mar 17, 2016

Comment author: Niols

I'm not sure the one you mentionned would be a bug, since the length may range from 0 to Sys.max_string_length included.
An index however may range from 0 to Sys.max_string_length excluded. This seems to be the source of the problem in this issue.

Concerning stdlib/buffer.ml:166, in add_channel, I must admit I don't see the point of this test, since:

  • either len > Sys.max_string_length is indeed a problem, but then that's more a problem in really_input
  • or the interesting value is Sys.max_string_length - b.length (or maybe p.position?) but then that's not really the right test

@vicuna
Copy link
Author

vicuna commented Mar 23, 2016

Comment author: @damiendoligez

This is expected behavior and it conforms exactly to the documentation:

Given a string s of length l,
[...]
A position is the point between two characters or at the beginning or end of the string. We call a position valid in [s] if it falls within the range [0...l] (inclusive).

When you call [index_from s p c], you are saying "search for c in the substring of s that starts after character p". When p = String.length s, this is the empty substring, but it's a valid use case.

Otherwise, you couldn't just use 0 to search in the whole string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant