Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should String_val return 'const char *' when safe-string is globally enabled? #7594

Closed
vicuna opened this issue Jul 21, 2017 · 7 comments
Closed
Milestone

Comments

@vicuna
Copy link

vicuna commented Jul 21, 2017

Original bug ID: 7594
Reporter: @yallop
Assigned to: @yallop
Status: resolved (set by @yallop on 2017-08-03T13:27:57Z)
Resolution: fixed
Priority: normal
Severity: minor
Target version: 4.06.0 +dev/beta1/beta2/rc1
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: runtime system and C interface
Monitored by: @gasche @dbuenzli

Bug description

Since GPR 687 (#687) it's possible to enable -safe-string globally at configure time.

There's a second GPR open (#1252) that makes configuration-time -safe-string the default.

At present, -safe-string only affects OCaml code; C stubs that mutate OCaml strings will continue to compile without error. This may lead to subtle bugs, not least since the compiler now takes string immutability into account during optimizations (#703).

Should we update String_val to return 'const char *' when -safe-string is enabled, like this?

#ifdef CAML_SAFE_STRING
#define String_val(x) ((const char *) Bp_val(x))
#else
#define String_val(x) ((char *) Bp_val(x))
#endif

We'd also need a new 'Bytes_val' that always returned 'char *'.

This change would be slightly backwards-incompatible, but probably not any more so than defaulting to -safe-string. And it seems likely to improve the quality of code, by making it more explicit which C code can actually modify OCaml string/bytes values.

@vicuna
Copy link
Author

vicuna commented Jul 21, 2017

Comment author: @xavierleroy

If we do this, the following common idiom

char * p = String_val(v);

and probably many others will warn or break compilation if -warn-error. That means breakage not only in .ml files but also on .c files.

If anyone wants to experiment with this approach and report on the breakage, that would be most welcome.

@vicuna
Copy link
Author

vicuna commented Jul 21, 2017

Comment author: @xavierleroy

And of course we need

#define Bytes_val(x) ((char *) Bp_val(x))

in <caml/mlvalues.h>.

On second thoughts, make that "(unsigned char *)" just to gain more machine independence. ("char" can be signed or unsigned.)

@vicuna
Copy link
Author

vicuna commented Jul 21, 2017

Comment author: @yallop

I ran some preliminary experiments, changing String_val in trunk to return 'const char *', and looking at what needed to be changed in the distribution. The branch is here:

yallop/ocaml@trunk...yallop:const-string-val

Summary:

  • The changes in the OCaml distribution are relatively modest (66 additions, 53 deletions)

  • The signatures of several public functions need additional 'const'. Example:

    -extern void unix_error (int errcode, char * cmdname, value arg)
    +extern void unix_error (int errcode, const char * cmdname, value arg)

    This is unlikely to affect calls, since 'char ' arguments can be passed where 'const char' is expected.

  • Unsurprisingly, several internal uses of String_val need adjustment, either to

    const char *c = String_val(...)

    or to

    unsigned char *c = Bytes_val(...)

  • Using 'unsigned char *' as the return value of Bytes_val occasionally needs extra casts in calling code. For example, caml_alloc_sprintf uses vsnprintf to write to a string, and vsnprintf's first argument is 'char *', not 'unsigned char *'

It'd also be helpful to look at some packages that use C extensions. I plan to look at a few such packages shortly.

@vicuna
Copy link
Author

vicuna commented Jul 21, 2017

Comment author: @xavierleroy

Thanks for the experiment, these are interesting data points.

Using 'unsigned char *' as the return value of Bytes_val occasionally needs extra casts in calling code. For example, caml_alloc_sprintf uses vsnprintf to write to a string, and vsnprintf's first argument is 'char *', not 'unsigned char *'

Just one remark: I think Bytes_val should be reserved for data that has type bytes on the Caml side, not to gain write access to a Caml string. For the latter use, "(char *) String_val(v)" makes the intent clearer, in my opinion.

@vicuna
Copy link
Author

vicuna commented Jul 24, 2017

Comment author: @yallop

Just one remark: I think Bytes_val should be reserved for data that has type bytes on the Caml side, not to gain write access to a Caml string. For the latter use, "(char *) String_val(v)" makes the intent clearer, in my opinion.

Thanks for the clarification. I agree that what you suggest is better.

Ok, here are a few statistics about String_val and OPAM packages.

Of the OPAM packages that call String_val (and that aren't explicitly constrained to an earlier OCaml version)

  • 130 install without problems

  • 94 aren't installable on trunk because of version constraints in dependencies

  • 10 are not installable on my system because of external dependency issues

  • 17 don't compile with OCaml trunk due to problems in OCaml code or unrelated C problems

  • 4 packages actually fail with String_val related-problems when String_val is changed to return 'const char *':
    delimcc
    ocaml-expat
    cairo
    milter

@vicuna
Copy link
Author

vicuna commented Jul 31, 2017

Comment author: @mshinwell

I'm in favour of a change along these lines.

@vicuna
Copy link
Author

vicuna commented Aug 2, 2017

Comment author: @yallop

I've opened a pull request: #1274

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant