Suggestion to add the "bytes" type string constants #7797

Closed
vicuna opened this issue May 19, 2018 · 5 comments

Comments

@vicuna

vicuna commented May 19, 2018

Original bug ID: 7797
Reporter: xvilka
Status: acknowledged (set by @xavierleroy on 2018-05-21T10:08:56Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.06.1
Category: language features
Monitored by: @nojb @gasche

Bug description

Right now OCaml has support for string literals, but with the clear distinction between bytes and strings there is a need for "bytes" type literals, for example when you need to compare

if some_bytes_var = "\x00"

Since OCaml 4.06 this won't work anymore, and Bytes.of_string "\x00" would be overkill and an unneeded copy. What I am suggesting is to introduce a new kind of literal for bytes, with syntax like b"\x00" or similar.

So instead of

if some_bytes_var = (Bytes.of_string "\x00")

it would be possible to write

if some_bytes_var = b"\x00"
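For reference, here is a minimal sketch of the workaround available today, assuming OCaml 4.06 or later with immutable strings as the default. The name `some_bytes_var` follows the description above; `is_nul` is a hypothetical helper introduced only for illustration.

```ocaml
(* A bytes value to test; in real code it would come from I/O or a buffer. *)
let some_bytes_var = Bytes.make 1 '\x00'

(* Comparing some_bytes_var directly with the literal "\x00" is rejected by
   the type checker, because string and bytes are distinct types. The literal
   is therefore converted first, which allocates a fresh copy. *)
let is_nul (b : bytes) : bool = Bytes.equal b (Bytes.of_string "\x00")

let () = Printf.printf "%b\n" (is_nul some_bytes_var)
```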

@vicuna
Author

vicuna commented May 19, 2018

Comment author: @gasche

If we had bytes literals, the only safe general way to give them a meaning would be to define your proposed b"foo" as essentially an alias for Bytes.of_string "foo". You can already write the same thing with the same performance profile. Doing any better would require bytes-function-specific optimizations, and we could already perform them today if we wanted to. Given that the choice of a good grammar for string literals is problematic (b"foo" is valid OCaml code today, after (let b s = "Hello "^s) for example), I don't think that this feature proposal is worth the trouble.

The reason why we would need a copy in the general case is that string literals in the code are allocated globally for the module, and will thus be shared, while bytes cannot be shared so each occurrence of a byte literal must result in a fresh allocation. For example, consider

for i = 0 to n do
  f "foo";
done

With string literals, "foo" is a piece of module-global data, and all invocations of (f) get a pointer to the same string. With a bytes literal b"foo", doing the same thing would be deeply unsound: if the first iteration of (f) modifies the string to "bar", you don't want the next iteration to be invoked with "bar" instead of "foo" as an argument -- this is a bug that you could actually observe with older OCaml versions using mutable strings. So you need to allocate a fresh copy of the literal b"foo", and there is no faster way than to allocate the memory and then do a blit from a global string constant, which is exactly what (Bytes.of_string "foo") does.
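The loop above can be sketched with today's bytes API to show why the fresh allocation matters. This is an illustration only: `f` is a hypothetical function that overwrites its argument in place, and `observed` is a hypothetical helper that records what each iteration sees before `f` runs.

```ocaml
(* Hypothetical callee that mutates its argument, turning "foo" into "bar". *)
let f (b : bytes) : unit = Bytes.blit_string "bar" 0 b 0 3

(* Run the loop from 0 to n and collect the contents seen by each iteration. *)
let observed (n : int) : string list =
  let seen = ref [] in
  for _i = 0 to n do
    (* Bytes.of_string allocates a new buffer and copies the literal into it. *)
    let b = Bytes.of_string "foo" in
    seen := Bytes.to_string b :: !seen;
    f b
  done;
  List.rev !seen
```

Because every iteration gets its own copy, each one observes "foo" even though the previous call to `f` overwrote its buffer.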

(It would be possible to optimize slightly by statically computing the length of the new string to be allocated, but we could do this optimization for all String.length calls on a string literal, no need to special-case byte literals.)

I argued that in the general case you always need to allocate a fresh value for a bytes literal, but note that (some_var = "foo") is a special case: there you don't actually need to allocate a new string, because no mutation will occur as part of the equality test. So you could actually write (Bytes.unsafe_to_string some_var = "foo"), which performs no copy, and which I find nicer than casting "foo" to bytes, as temporarily pretending that things are immutable is nicer than the other way around. It's easy for the user to write this; the compiler could figure out by itself that a Bytes.of_string operation on "foo" here is not necessary and rewrite it to the unsafe cast, but then it could do this in all cases of comparison with strings, not just those arising from a byte literal. Again, byte literals bring you no benefit.
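The copy-free comparison described here could be wrapped as follows. This is a sketch, and `bytes_is` is a hypothetical name. It relies on the documented contract of `Bytes.unsafe_to_string`: the resulting string must not be observed after the underlying buffer is mutated.

```ocaml
(* Compare a bytes value with a string without copying either of them.
   Bytes.unsafe_to_string reinterprets the buffer as a string. That is only
   safe here because the temporary string does not escape the comparison
   and nothing mutates the buffer while the comparison runs. *)
let bytes_is (b : bytes) (s : string) : bool =
  String.equal (Bytes.unsafe_to_string b) s
```

A call such as `bytes_is some_var "foo"` then replaces `some_var = Bytes.of_string "foo"` without allocating.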

@vicuna
Author

vicuna commented May 21, 2018

Comment author: @xclerc

@gasche I am a bit confused; why would bytes literals be different from array literals?

@vicuna
Author

vicuna commented May 21, 2018

Comment author: @xavierleroy

There are no array literals strictly speaking. There is a construct [|e1; ... eN|] to build arrays in extension from the values of the expressions e1, ..., eN. When all expressions are compile-time constants the compiler tries to implement this construct more efficiently, by copying a statically-allocated array, as @gasche mentioned.

The equivalent for bytes would be a construct that builds a byte sequence of length N from N expressions of type char. This is not what this feature request is about.
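To make the distinction concrete, here is a small sketch. `fresh_array` shows that the array construct yields a new mutable array on every evaluation, and `bytes_of_chars` is a hypothetical helper (not a standard library function) that approximates the bytes equivalent described above by taking the char expressions as a list.

```ocaml
(* Each call evaluates the array construct again and returns a new array,
   even though all elements are compile-time constants. *)
let fresh_array () : int array = [| 1; 2; 3 |]

(* Hypothetical helper: build a byte sequence of length N from N chars. *)
let bytes_of_chars (cs : char list) : bytes =
  let b = Bytes.create (List.length cs) in
  List.iteri (fun i c -> Bytes.set b i c) cs;
  b
```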

I agree with @gasche that the use cases for byte literals are probably too few to justify special syntax and semantics. It could make sense, however, to add some "mixed" bytes-and-strings operations to the Bytes module, such as "compare a bytes with a string".
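A "mixed" operation of this kind might look like the sketch below. The name `equal_string` and its signature are assumptions made for illustration; nothing in this thread specifies what such a function would be called or where it would live.

```ocaml
(* Hypothetical mixed comparison: true when b and s have the same length
   and the same characters. Neither argument is copied or mutated. *)
let equal_string (b : bytes) (s : string) : bool =
  let n = Bytes.length b in
  let rec same_from i =
    i = n || (Bytes.get b i = String.get s i && same_from (i + 1))
  in
  n = String.length s && same_from 0
```

With such a helper, the comparison from the original report becomes `equal_string some_bytes_var "\x00"`, with no conversion in either direction.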

@vicuna
Author

vicuna commented May 21, 2018

Comment author: @xclerc

There are no array literals strictly speaking. There is a construct [|e1; ... eN|] to build arrays in extension from the values of the expressions e1, ..., eN. When all expressions are compile-time constants the compiler tries to implement this construct more efficiently, by copying a statically-allocated array, as @gasche mentioned.

The equivalent for bytes would be a construct that builds a byte sequence of length N from N expressions of type char. This is not what this feature request is about.

I misinterpreted the request as "a construct that builds a byte sequence of length N from the N characters of the quoted literal".

@xavierleroy
Contributor

xavierleroy commented Mar 16, 2019

Re-reading this discussion, I am convinced that Bytes.of_string "literal" is about the best we can do, from a code speed and code size point of view. If you have lots of those, you can shorten the notation:

let b = Bytes.of_string
b"foo" ... b"bar" ...
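Spelled out as a complete program, the shortened notation could be used like this. It is a sketch: the `greeting` binding and the use of `Bytes.cat` are only there to give the two values something to do.

```ocaml
(* Short alias for the conversion, as suggested above. *)
let b = Bytes.of_string

(* b"foo" parses as the ordinary application b "foo", so every occurrence
   allocates a fresh, independently mutable bytes value. *)
let greeting : bytes = Bytes.cat (b"foo") (b"bar")

let () = print_endline (Bytes.to_string greeting)
```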

So, I'll go ahead and close this issue.


2 participants