New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
caml_alloc_string allocates empty strings #7145
Comments
Comment author: junsli physical equality does not hold for any strings that
Maybe I don't fully understand the problem. But I do feel the inconsistency here:
|
Comment author: @mmottl Interesting, I thought that strings behave like arrays in that respect. I think they should. It's also worth noting that the "string" type has recently been made "pure", i.e. it can only be manipulated in purely functional ways. The above behavior seems even more odd in that case. |
Comment author: @xavierleroy Strings and arrays have different representations. The empty array is a block of size 0 words, which therefore must be allocated statically outside the heap. The empty string is a block of size 1 word. So, the caml_atom_table trick cannot be used for the empty string. As to re-sharing of string literals in general, it could happen in the future, but not before -safe-string becomes the default. |
Comment author: @mmottl Ah, I see, I wasn't aware that unlike arrays empty string blocks have non-zero size. Looking forward to -safe-string becoming the default... |
Comment author: @alainfrisch I don't understand what would prevent us from sharing empty strings (at least in caml_alloc_string). Even without -safe-string, the empty string is immutable. |
Comment author: @bobzhang as a short-cut, can we have |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
I don’t know if there have been any changes to the handling of string constants. Strings have been immutable for a long time now so it would seem worthwhile to reconsider sharing them. |
Looking at this as well (hi @mmottl!):
No strong opinion on whether the feature still matters. |
Keep in mind that even for empty arrays the sharing is not 100% reliable: There is actually already a solution for constants that absolutely need to be shared (namely, predefined exceptions). This works by reserving symbols for these constants on the C side, but making the OCaml linker generate the code for them. |
Nowadays, ocamlopt shares identical string literals within one compilation unit, and ocamlc does not. This is the same behavior as with immutable structured constants (e.g. I think ocamlc could do better and also share within compilation units. I agree with @lthls that sharing across compilation units is a pain and not something we should strive for. I'll go one step further. Immutable data structures offer more possibilities for the compiler to share them, but this is strictly an optimization: sharing should not be mandated in any way, and user code should not rely on it. I move to close this issue. |
There is certainly no need to make the compilation process unnecessarily more complex to share across compilation units. It's good to know that May I suggest also adding Maybe I can't close this issue myself since my report was copied from Mantis. Please feel free to do so. |
I agree with Xavier: this is not a problem, as physical equality is never guaranteed for immutable structures. And if we want to share strings, we should probably strive to share more than just the empty string. I'm closing the issue. If anyone wants to work on sharing in the compiler, feel free to open a PR. |
Original bug ID: 7145
Reporter: @mmottl
Status: acknowledged (set by @damiendoligez on 2016-02-24T15:51:32Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.02.3
Target version: later
Category: runtime system and C interface
Monitored by: @bobzhang @diml @jmeber @mmottl
Bug description
caml_alloc_string in byterun/alloc.c does not check whether the length of the requested string is zero and hence always allocates a fresh value even for empty strings. As a consequence, physical equality does not hold in e.g. String.copy "" == ""
This does not happen with e.g. empty arrays. The caml_atom_table apparently contains an entry for string tags, too. Not sure, but it may be safe to tweak that case, which could be beneficial for bindings that allocate a lot of empty strings (for optional / empty fields).
The text was updated successfully, but these errors were encountered: