[PATCH] handle SIGBUS on shrinked Bigarray mapped files #5471

Closed
vicuna opened this issue Jan 10, 2012 · 7 comments


@vicuna

vicuna commented Jan 10, 2012

Original bug ID: 5471
Reporter: @edwintorok
Assigned to: @protz
Status: closed (set by @xavierleroy on 2013-08-31T10:44:26Z)
Resolution: fixed
Priority: normal
Severity: minor
Platform: Linux
OS: Debian GNU/Linux
OS Version: Linux 3.2.0
Version: 3.12.1
Fixed in version: 3.13.0+dev
Category: ~DO NOT USE (was: OCaml general)
Monitored by: mehdi

Bug description

Accesses to bigarrays are bounds-checked, but for Bigarray.*.map_file the bounds
are established when the file is mapped.
Later, the file may shrink or grow if another (or the same) process modifies it.
If the file shrinks and you then read or write past the new end of file,
the OS may raise SIGBUS (Linux does, for example).
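The stale bounds can be observed without triggering the fault. The sketch below is written against the current API, where `Unix.map_file` (OCaml 4.06 and later) takes the place of `Bigarray.Array1.map_file`; the temporary file and the 1024-byte size are arbitrary choices. It only compares sizes and never touches the mapped memory after the truncation, since that access is exactly what raises SIGBUS.

```ocaml
(* Link with the unix library. Assumes Linux semantics for truncating a
   file that is currently mapped. *)
let dims_after_shrink () : int * int =
  let path = Filename.temp_file "shrink" ".bin" in
  let fd = Unix.openfile path [ Unix.O_RDWR ] 0o644 in
  Unix.ftruncate fd 1024;
  (* Dimension -1 means "derive it from the file size now": 1024 chars. *)
  let m =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.char Bigarray.c_layout true [| -1 |])
  in
  (* Shrink the file behind the mapping's back. *)
  Unix.ftruncate fd 0;
  let file_size = (Unix.fstat fd).Unix.st_size in
  (* The bounds check still believes 1024 bytes are valid, so m.{0} would
     pass it and then fault. We deliberately do not perform that access. *)
  let dim = Bigarray.Array1.dim m in
  Unix.close fd;
  Sys.remove path;
  (dim, file_size)

let () =
  let dim, size = dims_after_shrink () in
  Printf.printf "bigarray dim = %d, file size = %d\n" dim size
```

On Linux this prints `bigarray dim = 1024, file size = 0`: every index from 0 to 1023 passes the bounds check, yet none of them is backed by the file any more.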

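It is not possible to handle SIGBUS purely on the OCaml side: the runtime only
records the signal and defers the OCaml handler to a later safe point, so
returning from the low-level handler re-executes the faulting access and
SIGBUS is re-delivered infinitely.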

The attached patch (for OCaml trunk) implements SIGBUS handling in the byterun/asmrun runtimes.
It also introduces a new page flag, In_mapped_area, which Bigarray sets on the pages it maps.
This way we know the SIGBUS came from our mapping, and not from some other
native code mapping.
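The decision the handler has to make can be modelled in a few lines. This is a hypothetical OCaml model, not the patch itself: the real runtime sets a flag per page in its C page table and would read the faulting address from the `si_addr` field of `siginfo_t`, whereas here a list of address ranges and plain integers stand in for both. The names `register_mapping`, `classify_sigbus` and so on are made up for illustration.

```ocaml
(* Hypothetical model of the In_mapped_area check; not runtime code. *)
type region = { start : int; len : int }

(* Stands in for the page-table entries flagged In_mapped_area. *)
let mapped_regions : region list ref = ref []

(* Where Bigarray would mmap a file and flag its pages. *)
let register_mapping start len =
  mapped_regions := { start; len } :: !mapped_regions

(* Where Bigarray would munmap it and clear the flag. *)
let unregister_mapping start =
  mapped_regions := List.filter (fun r -> r.start <> start) !mapped_regions

let fault_in_mapped_area addr =
  List.exists (fun r -> addr >= r.start && addr < r.start + r.len)
    !mapped_regions

type action = Raise_exception | Default_crash

(* Only faults inside our own mappings become OCaml exceptions; a SIGBUS
   from some other native mapping keeps its default behaviour. *)
let classify_sigbus addr =
  if fault_in_mapped_area addr then Raise_exception else Default_crash
```

For example, after `register_mapping 0x1000 4096`, a fault at `0x1800` is classified as `Raise_exception` and a fault at `0x9000` as `Default_crash`.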

Patch note: the stdlib/printexc.ml change requires a 'make coreboot' run.
If you 'git apply attachedpatch.patch' then it'll update boot/ for you.

Steps to reproduce

  1. Compile the below code with ocamlc and ocamlopt

  2. Run xbyte and xnat

  3. Current results: both crash
    $ ./xbyte (or ./xnat)
    trying to read invalid mem area
    Bus error

  4. Uncomment the Sys.set_signal line and run again. Now you get an infinite loop (SIGBUS is delivered over and over again and the OCaml signal handler is never reached).

  5. Expected result: no crash; an OCaml exception is raised (as with stack overflow, bounds errors, etc.):

Expected xbyte output:

trying to read invalid mem area
exception caught: Bus error
Raised by primitive operation at unknown location
trying to write invalid mem area
Raised by primitive operation at unknown location
exception caught: Bus error

Expected xnat output:

trying to read invalid mem area
exception caught: Bus error
Raised by primitive operation at file "pervasives.ml", line 250, characters 21-28
trying to write invalid mem area
Raised by primitive operation at file "pervasives.ml", line 250, characters 21-28
exception caught: Bus error

Additional information

```ocaml
(* Test code:
     ocamlc bigarray.cma unix.cma x.ml -o xbyte; ./xbyte
     ocamlopt bigarray.cmxa unix.cmxa x.ml -o xnat; ./xnat
*)

open Bigarray
open Unix

exception Sigbus

let sigbus_handler _ =
  raise Sigbus;;

let _ =
  Printexc.record_backtrace true;
  (* Signal 7 is SIGBUS on Linux/x86; uncommenting this gives step 4. *)
  (* Sys.set_signal 7 (Sys.Signal_handle sigbus_handler); *)
  (* Create /tmp/x with a size of 1024 bytes and map all of it. *)
  close (openfile "/tmp/x" [O_CREAT] 0o644);
  truncate "/tmp/x" 1024;
  let fd = openfile "/tmp/x" [O_RDWR] 0 in
  let m = Array1.map_file fd char c_layout false (-1) in
  Printf.printf "%c" m.{0};
  (* Shrink the file to 0 bytes; the bigarray still has dimension 1024. *)
  ftruncate fd 0;
  begin try
    Printf.printf "trying to read invalid mem area\n";
    flush_all ();
    Printf.printf "%c" m.{0}
  with
  | e ->
    Printf.printf "exception caught: %s\n" (Printexc.to_string e);
    Printexc.print_backtrace Pervasives.stdout;
    flush_all ()
  end;
  begin try
    Printf.printf "trying to write invalid mem area\n";
    Printexc.print_backtrace Pervasives.stdout;
    flush_all ();
    m.{0} <- 'a'
  with
  | e ->
    Printf.printf "exception caught: %s\n" (Printexc.to_string e);
    flush_all ()
  end;;
```


@vicuna

vicuna commented Jan 10, 2012

Comment author: @edwintorok

P.S.: the patch doesn't raise Bus_error on Win32, but AFAIK Win32 locks mapped files, so you wouldn't be able to shrink them anyway.

@vicuna

vicuna commented Jan 10, 2012

Comment author: @edwintorok

Added a patch without the boot/ changes for easier review:
http://caml.inria.fr/mantis/file_download.php?file_id=565&type=bug

@vicuna

vicuna commented Jan 10, 2012

Comment author: gerd

Interesting idea. I have already run into this problem myself, so I'd really appreciate a solution.

Edwin, I don't understand why it is always safe to raise an exception when the bus error occurs, even if we only consider the case this patch was made for, namely trapping illegal bigarray accesses. If OCaml code is running at that time, the code may not be prepared for an exception at that point (probably easy to fix). If C code is running, anything can go wrong (e.g. memory may remain uninitialized). If we could recognize the latter case, that would be better.

There are also other reasons for SIGBUS, and they are quite platform-dependent. We should exclude these in the signal handler (somehow).

A safe solution would be probably to enable this special sigbus handler immediately before the bigarray access, and to disable it after the access. But this would make bigarray access a bit more expensive.

@vicuna

vicuna commented Jan 10, 2012

Comment author: @edwintorok

btw for OCamlNet you'd need to set the In_mapped_area flag (perhaps wrapped
in an #ifdef In_mapped_area) for the file when calling mmap/munmap, and you would get the same behaviour as Bigarray (see patch comments).

I agree that we should check the SIGBUS reason (in a platform-specific way?), as it can also be an unaligned access (or something else, like stack overflow/underflow).
The asmrun/ patch already checks that the fault came from a Bigarray mapped file
(In_mapped_area page flag), so that only leaves two other fault possibilities:

  • unaligned access, but the compiler should already prevent this
  • permission problem, i.e. write to a PROT_READ page, or read from PROT_NONE page. Don't know if this can generate a SIGBUS on some platforms.

The byterun/ code only checks that the fault comes from a Bigarray mapped file (In_mapped_area flag), as I don't know a way to check that the fault came from OCaml code (it's the caml_ba_get_N and caml_ba_set_generic C functions that die in this case, or the copy* functions they call). If this is too unsafe,
then we could set the SIGBUS handler only for native code.

Instead of raising the Bus_error (or Failure, or other exception) we could stat() the file to determine the new bounds, and update Bigarray's idea of what the bounds should be. And then raise a bounds violation error. But that seems like overkill for an otherwise rare condition.
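A user-level variant of the stat() idea can be sketched without touching the runtime: re-read the file size before each access and reject indices past the current end of file. This is a hypothetical helper (`checked_get` and `demo` are not part of Bigarray or Unix) using the current `Unix.map_file` API. It does not close the race, because another process can still shrink the file between the `fstat` and the load, and it costs a system call per access, which illustrates why this feels like overkill.

```ocaml
(* Link with the unix library. Hypothetical helper, not a library API. *)
let checked_get fd
    (m : (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t)
    i =
  (* Re-derive the valid bounds from the file's current size. *)
  let size = (Unix.fstat fd).Unix.st_size in
  if i < 0 || i >= min size (Bigarray.Array1.dim m) then
    invalid_arg "checked_get: index beyond current end of file"
  else m.{i}

(* Map a 1024-byte file, read it, shrink it, then try again. *)
let demo () =
  let path = Filename.temp_file "guard" ".bin" in
  let fd = Unix.openfile path [ Unix.O_RDWR ] 0o644 in
  Unix.ftruncate fd 1024;
  let m =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.char Bigarray.c_layout true [| -1 |])
  in
  m.{0} <- 'x';
  let before = checked_get fd m 0 in
  Unix.ftruncate fd 0;
  (* The guard raises before the faulting load is ever attempted. *)
  let after = try Some (checked_get fd m 0) with Invalid_argument _ -> None in
  Unix.close fd;
  Sys.remove path;
  (before, after)
```

Here `demo ()` returns `('x', None)`: the first read succeeds and the second is turned into an ordinary `Invalid_argument` instead of a SIGBUS.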

@vicuna

vicuna commented Jan 14, 2012

Comment author: @xavierleroy

Thanks for the suggestion and the clever patch. I like the idea of recording mmap-ed areas in Caml's master page table. I see a problem, though:

On ports that cache runtime variables in registers (like the amd64 port, the one we care most about), turning a signal into a synchronous exception is possible only if the signal occurs while executing ocamlopt-generated code. It's not possible if we're inside C code. Your patch correctly accounts for this. However, it means that invalid bigarray accesses performed from C will still crash the program on an unhandled SIGBUS.

Bytecode programs always go through the C functions caml_ba_{get,set}_* to access bigarrays. Native-code programs also go through these functions in a number of cases (high-dimension bigarrays and bigarrays whose type isn't fully statically known). It's only in a few special cases that the bigarray access can be inlined by ocamlopt.
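The difference between inlined and C-mediated accesses can be seen from the types alone. In the sketch below, `get_mono` fixes the kind and layout, which is the situation in which ocamlopt can compile the access inline; `get_poly` leaves the kind polymorphic, so the compiler cannot know the element size and calls the C primitive instead. Whether a given access is inlined depends on the compiler version and flags (for example, flambda may specialise `get_poly` at its call site), so inspecting the assembly from `ocamlopt -S` for calls to `caml_ba_get_1` is the reliable check.

```ocaml
open Bigarray

(* Kind and layout statically known: a candidate for inline code. *)
let get_mono (a : (float, float64_elt, c_layout) Array1.t) i = a.{i}

(* Kind polymorphic: compiled as a call into the C runtime, where a SIGBUS
   could not be turned into an OCaml exception on amd64. *)
let get_poly (a : ('a, 'b, c_layout) Array1.t) i = a.{i}

let a =
  let a = Array1.create float64 c_layout 4 in
  Array1.fill a 0.0;
  a.{2} <- 3.5;
  a

let () = Printf.printf "%g %g\n" (get_mono a 2) (get_poly a 2)
```

Both functions return the same value (this prints `3.5 3.5`); they differ only in the code ocamlopt generates, and therefore in whether a fault happens in OCaml code or inside C.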

I don't think there is any easy solution to this issue. The most reasonable thing might be to document the issue and tell users "don't do that" (memory-mapping a file whose size can change asynchronously).

@vicuna

vicuna commented Jan 14, 2012

Comment author: @edwintorok

Most of the time the "don't do that" is outside the control of the OCaml application, though, so it'd be nice if a solution could be found on the OCaml side.
It's a rare situation, though: for a C application (ClamAV) it took about 6 years until the first SIGBUS crash got reported. So if a nice OCaml solution can't be found, just documenting the issue would be fine.

I think we could set/reset a flag in caml_ba_{get,set}, as gerd suggested,
and change the signal handler to check for that flag.
The handler would then raise the exception whenever the flag is set, even if we are not in OCaml code. In that case, however, it should leave caml_young_ptr/caml_exception_pointer untouched. The same flag could be used in the bytecode runtime to make sure the SIGBUS came from Bigarray.

Actually all the signal handler setup and signal handling could be moved to Bigarray (as ocp suggested), leaving only the page-table changes in the byte/asm runtimes.

This should be safe:

  • caml_ba_{get,set} doesn't enter a blocking section, so we know only one
    thread runs it, and that no OCaml code is running at that time
  • caml_ba_{get,set} is not noalloc, so it is called via caml_c_call and the register-cached variables are saved to global variables (caml_young_ptr/exception_pointer)
  • all we have to do is to leave the global pointers alone in the signal handler as they already have the correct values, and just raise the exception
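The flag protocol proposed above can be modelled in OCaml to make its control flow explicit. This is a hypothetical model, not runtime code: in the proposal the flag would be a C global set inside caml_ba_{get,set} and read by a C SIGBUS handler, and `ba_get`, `on_sigbus` and `Not_ours` are names made up here. One detail the model brings out: the raised exception unwinds past the access function, so its own "clear the flag" step never runs, and the handler has to clear the flag itself before raising.

```ocaml
(* Hypothetical model of the proposed flag protocol; not runtime code. *)
exception Bus_error

(* Stands in for a C global set only while caml_ba_{get,set} runs. *)
let in_bigarray_access = ref false

(* Returned when the SIGBUS did not come from a bigarray access; a real
   handler would then fall back to the default action and crash. *)
type verdict = Not_ours

(* Models the SIGBUS handler. Per the reasoning above, the globals were
   already saved by caml_c_call, so it only needs to raise. *)
let on_sigbus () =
  if !in_bigarray_access then begin
    (* The raise skips the rest of ba_get, so clear the flag here. *)
    in_bigarray_access := false;
    raise Bus_error
  end
  else Not_ours

(* Models caml_ba_get: [load] stands in for the possibly faulting access. *)
let ba_get load =
  in_bigarray_access := true;
  let v = load () in
  in_bigarray_access := false;
  v
```

A normal access such as `ba_get (fun () -> 42)` returns `42` and leaves the flag cleared; a "fault" simulated by calling `on_sigbus` inside `load` surfaces as `Bus_error`, and the same call outside any access returns `Not_ours`.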

@vicuna

vicuna commented Jan 18, 2012

Comment author: @protz

Hi edwin,

We had a lengthy discussion about this yesterday, and we feel there's no good solution; trying to make things right would take us too far.

Therefore, I have implemented your first suggestion, namely documenting this behavior clearly so that users are aware of this "shortcoming". This is r12042.

Thanks,

jonathan
