|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0005813||OCaml||OCaml runtime system||public||2012-11-06 17:36||2014-07-17 10:20|
|Target Version||after-4.02.0||Fixed in Version|
|Summary||0005813: Out of memory when using the Marshal module|
Repeatedly unmarshalling large data structures with the Marshal module causes the process to run out of memory, even in cases where constant memory usage is expected. This happens with the 32-bit version of OCaml on Windows.
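let () =
  let data = Array.make_matrix 10 (1024*1024) 0 in
  let data = Marshal.to_string data [] in
  for i = 0 to 100 do
    let content : int array array = Marshal.from_string data 0 in
    ignore content;
    (* The loop body is cut off in this copy; per the note below it printed i. *)
    print_int i; print_newline ()
  done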
This loop dies at iteration 45 with "Fatal error: exception Out_of_memory".
What we think happens (after looking at the RTS code) is that the OCaml heap is extended each time "Marshal.from_string" is called. Eventually this causes the process to run out of memory and finally a call to C's "malloc" in the RTS fails.
Inserting a Gc.full_major in the loop fixes the problem, which is a work-around for now. However we consider this behaviour to be a bug in the Marshal module/RTS.
Shouldn't the RTS do a garbage collection when running out of memory or before unmarshalling large data structures?
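The workaround mentioned above can be sketched as follows. This is a minimal illustration, not code from the report: the function names `make_payload` and `unmarshal_repeatedly` are hypothetical, and the payload dimensions are parameters so that the sketch can be tried at a small size first.

```ocaml
(* Build a marshaled int matrix; [rows] and [cols] are illustrative parameters. *)
let make_payload ~rows ~cols : string =
  Marshal.to_string (Array.make_matrix rows cols 0) []

(* Unmarshal [data] [iterations] times, forcing a full major collection after
   each round, as the report's workaround does.
   Returns the number of completed iterations. *)
let unmarshal_repeatedly ~iterations (data : string) : int =
  let completed = ref 0 in
  for _i = 1 to iterations do
    let content : int array array = Marshal.from_string data 0 in
    ignore content;
    Gc.full_major ();
    incr completed
  done;
  !completed
```

Calling `unmarshal_repeatedly ~iterations:101 (make_payload ~rows:10 ~cols:(1024*1024))` corresponds to the sizes used in the test case above.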
|Tags||No tags attached.|
|Attached Files||urge_major_slice.diff (632 bytes) 2014-02-20 10:54|
|(Note: this is with the MSVC port, we did not try the mingw one.)|
edited on: 2013-01-04 15:14
Reproduced with 4.01.0+dev on Mac OS X.
Note that the bug doesn't occur in bytecode. It also doesn't occur if you replace your print_int/print_newline with a printf. This is probably because in native code the loop doesn't allocate in the minor heap, so it doesn't check for the "urgent GC" condition.
The solution is probably to have Marshal.from_string check that condition before extending the heap.
|We've seen the error in real code, and I doubt that that code does not allocate at all in the minor heap. Anyway, it sounds like a good idea to check the "urgent GC" condition. (We will then test this fix on our code to see if it addresses our problem.)|
Tentative fix on SVN trunk, commit r13755. Will be in 4.01. We just call caml_check_urgent_gc at the end of the unmarshaling functions. (The comment in intern_alloc() explains why it cannot be done just after caml_alloc_shr, as is usually done.) With this fix, the repro case runs in constant heap space.
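One way to check the "constant heap space" claim from user code is to record the major heap size after each round. The sketch below is an assumption-laden illustration: the helper name `heap_words_after_rounds` is hypothetical, and it assumes that the `heap_words` field returned by `Gc.quick_stat` is a good enough proxy for the heap growth discussed here.

```ocaml
(* Unmarshal [data] [rounds] times and return the major heap size (in words)
   observed after each round, in order. *)
let heap_words_after_rounds ~rounds (data : string) : int list =
  let samples = ref [] in
  for _i = 1 to rounds do
    let v : int array array = Marshal.from_string data 0 in
    ignore v;
    samples := (Gc.quick_stat ()).Gc.heap_words :: !samples
  done;
  List.rev !samples
```

A sequence that levels off suggests constant heap usage; a sequence that grows on every round matches the behaviour described in this issue.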
|On my system, using the 32-bit MSVC port, the test case still behaves the same after the fix. That is, the loop still dies at iteration 45 with an out of memory exception.|
edited on: 2014-02-20 09:53
Confirmed on my machine (tested with OCaml trunk) with the MSVC 32-bit port.
Contrary to what Damien's note suggests, bytecode is affected as well, and the error persists if we do a [Printf.printf "%i\n%!" i] inside the loop.
I believe the problem is that we are in the case wosize > Max_wosize in intern.c/intern_alloc, and we end up extending the heap with new pages (caml_alloc_for_heap). AFAIK, the only way to release some pages is by running the compacter. We could tweak the demarshaller to give more incentive to run the compacter, but it is kind of sad that the submitted code sample depends on the compacter, even if there is no actual fragmentation issue.
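To see why the test case falls into the wosize > Max_wosize case, the sizes can be worked out by hand. The sketch below assumes the usual 32-bit header layout (22 bits for the block size, so Max_wosize is 2^22 - 1 words) and assumes that the size considered by the demarshaller is the total size of the structure, headers included; the function names are hypothetical.

```ocaml
(* Largest block size in words on a 32-bit runtime: 22-bit size field. *)
let max_wosize_32 = (1 lsl 22) - 1

(* Total words, headers included, of an int matrix: one outer block with
   [rows] fields, plus [rows] inner blocks with [cols] fields each. *)
let matrix_whsize ~rows ~cols = (1 + rows) + rows * (1 + cols)

(* True when the whole structure cannot fit in a single heap block. *)
let exceeds_max_wosize_32 ~rows ~cols =
  matrix_whsize ~rows ~cols - 1 > max_wosize_32
```

For the 10 x (1024*1024) matrix of the test case, `matrix_whsize` gives 10485781 words, well above the 4194303-word limit, whereas a single row stays below it.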
How difficult would it be to allocate each block of the unmarshaled structure using normal allocation functions (instead of pre-allocating a single big block), and how would it affect performance? We would of course need to register intermediate values as roots (or put all of them in an array which is itself a root), and ensure that partial values are valid w.r.t. GC invariants.
if (caml_allocated_words > Wsize_bsize (caml_minor_heap_size))
after bumping caml_allocated_words in intern_add_to_heap() (patch attached). I added it since I saw the same thing in memory.c after bumping caml_allocated_words. It seems to have the effect of causing a compaction when running the test case on my system, as the test case succeeds without an out of memory exception. Perhaps it's a heuristic we should add?
|Does anybody have a suggestion on how to best address this problem?|
Xavier told me that allocating blocks on the fly while demarshaling (and making sure all intermediate values are properly registered as roots) would probably be quite slow (it was how the first version of the code was written, and performance was not good enough at that time).
What about allocating a pool of N "big blocks" ahead of demarshaling instead of just one? The demarshaling code would fill in the first one, then switch to the second one, etc. It just needs to know in advance how many and how big these big blocks should be, but it should not be too costly to instrument the marshaling code to collect this information (creating a new big block before emitting a new block if it would overflow the current one, considering the maximum block size on a 32-bit machine) and store it.
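The bookkeeping this proposal asks of the marshaling code can be modelled in a few lines. This is only a sketch of the counting rule described above, not runtime code: `plan_big_blocks` is a hypothetical name, block sizes are given in words with headers included, and the capacity is assumed to be the 32-bit Max_wosize (2^22 - 1 words).

```ocaml
(* Assumed capacity of one "big block" on a 32-bit runtime, in words. *)
let max_wosize_32 = (1 lsl 22) - 1

(* Walk the block sizes in emission order and return the size of each big
   block, opening a new one whenever the next block would overflow the
   current one. *)
let plan_big_blocks ~capacity (sizes : int list) : int list =
  let close current acc = if current > 0 then current :: acc else acc in
  let rec go current acc = function
    | [] -> List.rev (close current acc)
    | s :: rest ->
      if s > capacity then
        invalid_arg "plan_big_blocks: block larger than capacity"
      else if current + s > capacity then go s (close current acc) rest
      else go (current + s) acc rest
  in
  go 0 [] sizes
```

Under these assumptions, the matrix from the test case (one 11-word outer block followed by ten rows of 1048577 words each) would need four big blocks rather than one.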
I know that this issue is specific to 32-bit, and thus quite boring, but it's quite important for us (Windows 32-bit is still alive), and I assume 32-bit will remain around for some time on smaller devices.
|Xavier, Damien, others: do you have an opinion on the best approach to fix this problem?|
|A pure OCaml implementation of the demarshaler exists (by Damien, for Frama-C). It could be interesting to switch to it instead of extending the heap (when the demarshaled data is too big, on 32-bit machines). It will be slower, but this would avoid triggering compaction too often.|
|2012-11-06 17:36||waern||New Issue|
|2012-11-06 17:42||frisch||Note Added: 0008432|
|2013-01-04 15:13||doligez||Note Added: 0008692|
|2013-01-04 15:13||doligez||Status||new => acknowledged|
|2013-01-04 15:13||doligez||Target Version||=> 4.01.0+dev|
|2013-01-04 15:14||doligez||Note Edited: 0008692||View Revisions|
|2013-01-05 11:21||frisch||Note Added: 0008697|
|2013-06-07 16:10||xleroy||Note Added: 0009433|
|2013-06-07 16:10||xleroy||Status||acknowledged => resolved|
|2013-06-07 16:10||xleroy||Resolution||open => fixed|
|2013-06-07 16:10||xleroy||Fixed in Version||=> 4.01.0+dev|
|2014-02-19 17:39||waern||Note Added: 0010956|
|2014-02-20 09:50||frisch||Note Added: 0010960|
|2014-02-20 09:50||frisch||Assigned To||=> frisch|
|2014-02-20 09:50||frisch||Status||resolved => confirmed|
|2014-02-20 09:51||frisch||Product Version||4.00.1 => 4.01.1+dev|
|2014-02-20 09:51||frisch||Fixed in Version||4.01.0+dev =>|
|2014-02-20 09:51||frisch||Target Version||4.01.0+dev => 4.02.0+dev|
|2014-02-20 09:51||frisch||Assigned To||frisch =>|
|2014-02-20 09:53||frisch||Note Edited: 0010960||View Revisions|
|2014-02-20 10:43||frisch||Note Added: 0010961|
|2014-02-20 10:53||waern||Note Added: 0010962|
|2014-02-20 10:54||waern||File Added: urge_major_slice.diff|
|2014-02-26 12:09||frisch||Note Added: 0010979|
|2014-02-26 12:09||frisch||OS||Windows =>|
|2014-02-26 12:09||frisch||OS Version||7 =>|
|2014-02-26 12:09||frisch||Platform||=> 32-bit|
|2014-03-13 11:03||frisch||Note Added: 0011040|
|2014-06-02 19:13||frisch||Note Added: 0011629|
|2014-07-08 11:22||doligez||Resolution||fixed => open|
|2014-07-08 11:22||doligez||Target Version||4.02.0+dev => after-4.02.0|
|2014-07-17 10:20||frisch||Note Added: 0011870|