Out of memory when using the Marshal module #5813

Closed
vicuna opened this issue Nov 6, 2012 · 15 comments

vicuna commented Nov 6, 2012

Original bug ID: 5813
Reporter: waern
Assigned to: @damiendoligez
Status: assigned (set by @damiendoligez on 2015-02-06T18:26:28Z)
Resolution: open
Priority: normal
Severity: major
Platform: 32-bit
Version: 4.01.1+dev
Target version: later
Category: runtime system and C interface
Tags: patch
Monitored by: enrico @jmeber @alainfrisch

Bug description

Hi,

Repeatedly unmarshalling large data structures with the Marshal module causes the process to run out of memory, even in cases where constant memory usage is expected. This happens on the 32-bit version of OCaml on Windows.

Example code:

let () =
  let data = Array.make_matrix 10 (1024 * 1024) 0 in
  let data = Marshal.to_string data [] in
  for i = 0 to 100 do
    print_int i;
    print_newline ();
    let (_ : int array array) = Marshal.from_string data 0 in
    ()
  done

This loop dies at iteration 45 with "Fatal error: exception Out_of_memory".

What we think happens (after looking at the RTS code) is that the OCaml heap is extended each time "Marshal.from_string" is called. Eventually this causes the process to run out of memory and finally a call to C's "malloc" in the RTS fails.

Inserting a Gc.full_major in the loop fixes the problem, which gives us a work-around for now. However, we consider this behaviour a bug in the Marshal module/RTS.
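
For reference, a minimal sketch of that work-around applied to the example above (the only change is the explicit full major collection at the end of each iteration):

let () =
  let data = Marshal.to_string (Array.make_matrix 10 (1024 * 1024) 0) [] in
  for i = 0 to 100 do
    print_int i;
    print_newline ();
    let (_ : int array array) = Marshal.from_string data 0 in
    (* work-around: collect the now-dead previous copy before the next
       unmarshal extends the heap again *)
    Gc.full_major ()
  done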

Shouldn't the RTS do a garbage collection when running out of memory or before unmarshalling large data structures?

vicuna commented Nov 6, 2012

Comment author: @alainfrisch

(Note: this is with the MSVC port, we did not try the mingw one.)

vicuna commented Jan 4, 2013

Comment author: @damiendoligez

Reproduced with 4.01.0+dev on Mac OS X.

Note that the bug doesn't occur in bytecode. It also doesn't occur if you replace your print_int/print_newline with a printf. This is probably because in native code the loop doesn't allocate in the minor heap, so it doesn't check for the "urgent GC" condition.

The solution is probably to have Marshal.from_string check that condition before extending the heap.
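
A sketch illustrating this hypothesis (not part of the original comment): forcing a minor-heap allocation inside the loop should give native code an allocation point where the pending "urgent GC" work can run. Sys.opaque_identity (available since 4.03) is used here only to keep the allocation from being optimized away:

let () =
  let data = Marshal.to_string (Array.make_matrix 10 (1024 * 1024) 0) [] in
  for i = 0 to 100 do
    (* a minor allocation, hence a GC check point, on every iteration *)
    ignore (Sys.opaque_identity (ref i));
    let (_ : int array array) = Marshal.from_string data 0 in
    ()
  done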

vicuna commented Jan 5, 2013

Comment author: @alainfrisch

We've seen the error in real code, and I doubt that code never allocates in the minor heap. Anyway, it sounds like a good idea to check the "urgent GC" condition. (We will then test this fix on our code to see whether it addresses our problem.)

vicuna commented Jun 7, 2013

Comment author: @xavierleroy

Tentative fix on SVN trunk, commit r13755. Will be in 4.01. We just call caml_check_urgent_gc at the end of the unmarshaling functions. (The comment in intern_alloc() explains why it cannot be done just after caml_alloc_shr, as is usually done.) With this fix, the repro case runs in constant heap space.
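
One way to check the constant-heap-space claim from the OCaml side (a diagnostic sketch, not part of the commit; heap_words is the major-heap size in words, as reported by Gc.quick_stat):

let () =
  let data = Marshal.to_string (Array.make_matrix 10 (1024 * 1024) 0) [] in
  let before = (Gc.quick_stat ()).Gc.heap_words in
  for _i = 0 to 100 do
    let (_ : int array array) = Marshal.from_string data 0 in ()
  done;
  let after = (Gc.quick_stat ()).Gc.heap_words in
  (* with the fix, the growth should be near zero rather than
     ~100 extra heap chunks *)
  Printf.printf "major heap growth: %d words\n" (after - before)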

vicuna commented Feb 19, 2014

Comment author: waern

On my system, using the 32-bit MSVC port, the test case still behaves the same after the fix. That is, the loop still dies at iteration 45 with an out-of-memory exception.

vicuna commented Feb 20, 2014

Comment author: @alainfrisch

Confirmed on my machine (tested with OCaml trunk) with the MSVC 32-bit port.

Contrary to what Damien's note suggests, bytecode is affected as well, and the error persists if we do a [Printf.printf "%i\n%!" i] inside the loop.

vicuna commented Feb 20, 2014

Comment author: @alainfrisch

I believe the problem is that we are in the wosize > Max_wosize case in intern.c/intern_alloc, and we end up extending the heap with new pages (caml_alloc_for_heap). AFAIK, the only way to release such pages is to run the compacter. We could tweak the demarshaller to give more incentive to run the compacter, but it is kind of sad that the submitted code sample depends on the compacter even though there is no actual fragmentation issue.
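
A rough check of this size analysis (a sketch, assuming the standard 32-bit header layout, where the size field is 22 bits, so Max_wosize = 2^22 - 1 words):

let () =
  let max_wosize_32bit = (1 lsl 22) - 1 in          (* 4_194_303 words *)
  let repro_words = 10 * (1024 * 1024 + 1) + 11 in  (* rows + headers + outer array *)
  Printf.printf "repro needs ~%d words; Max_wosize = %d\n"
    repro_words max_wosize_32bit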

How difficult would it be to allocate each block of the unmarshaled structure using normal allocation functions (instead of pre-allocating a single big block), and how would that affect performance? We would of course need to register intermediate values as roots (or put all of them in an array which is itself a root), and ensure that partial values are valid w.r.t. GC invariants.

vicuna commented Feb 20, 2014

Comment author: waern

I added:

/* Same heuristic as in memory.c: once the words allocated since the
   last slice exceed the minor heap size, schedule a major slice. */
if (caml_allocated_words > Wsize_bsize (caml_minor_heap_size))
  caml_urge_major_slice ();

after bumping caml_allocated_words in intern_add_to_heap() (patch attached). I added it because I saw the same check in memory.c after caml_allocated_words is bumped there. It seems to have the effect of causing a compaction when running the test case on my system, as the test case now succeeds without an out-of-memory exception. Perhaps it's a heuristic we should add?

vicuna commented Feb 26, 2014

Comment author: @alainfrisch

Does anybody have a suggestion on how to best address this problem?

vicuna commented Mar 13, 2014

Comment author: @alainfrisch

Xavier told me that allocating blocks on the fly while demarshaling (and making sure all intermediate values are properly registered as roots) would probably be quite slow (that is how the first version of the code was written, and performance was not good enough at the time).

What about allocating a pool of N "big blocks" ahead of demarshaling instead of just one? The demarshaling code would fill in the first one, then switch to the second one, etc. It just needs to know in advance how many big blocks are needed and how big each should be, and it should not be too costly to instrument the marshaling code to collect this information (creating a new big block before emitting a block that would overflow the current one, given the maximum block size on a 32-bit machine) and store it.
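
To make that bookkeeping concrete, here is a sketch of the marshaling-side pass this would need (the function name is illustrative, not existing runtime API): given the word sizes of the blocks the marshaler emits, in order, it packs them into fixed-capacity big blocks and records each big block's size, so the demarshaler could pre-allocate the same pool.

(* Illustrative only: compute the size of each "big block" in the pool
   that the demarshaler would have to pre-allocate. *)
let plan_big_blocks ~max_words block_sizes =
  let flush used acc = if used = 0 then acc else used :: acc in
  let rec go used acc = function
    | [] -> List.rev (flush used acc)
    | b :: rest ->
        if used + b > max_words
        then go b (flush used acc) rest  (* current big block is full: start a new one *)
        else go (used + b) acc rest
  in
  go 0 [] block_sizes

(* On a 32-bit machine, max_words would be Max_wosize = (1 lsl 22) - 1. *)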

I know that this issue is specific to 32-bit, and thus quite boring, but it's quite important for us (Windows 32-bit is still alive), and I assume 32-bit will remain around for some time on smaller devices.

vicuna commented Jun 2, 2014

Comment author: @alainfrisch

Xavier, Damien, others: do you have an opinion on the best approach to fix this problem?

vicuna commented Jul 17, 2014

Comment author: @alainfrisch

A pure OCaml implementation of the demarshaler exists (written by Damien for Frama-C). It could be interesting to switch to it when the demarshaled data is too big on 32-bit machines, instead of extending the heap. It would be slower, but it would avoid triggering compaction too often.

vicuna commented Mar 11, 2015

Comment author: @alainfrisch

Since there doesn't seem to be a short-term solution for this problem, I'll postpone this ticket.

FWIW, while I still consider this a big issue for 32-bit platforms, LexiFi is no longer directly exposed to it (we wrote a custom binary serializer for the specific datatype that triggered the problem).

@xavierleroy commented:

I confirm this issue is still there (4.09.0+dev): on Ubuntu x86-32, the test dies at iteration 98, having extended the heap to 4102464k.

Is there a chance that this issue will be fixed by @damiendoligez's ongoing work on the free list allocator?

If not, I suggest closing this issue, since it is specific to 32-bit platforms, and those platforms are not our main focus.

vicuna added the bug label Mar 20, 2019
@damiendoligez commented:

> Is there a chance that this issue will be fixed by @damiendoligez's ongoing work on the free list allocator?

No way. The problem is that these allocations are not from the heap (hence not from the free list), so a clever free-list allocator will not help at all.

I'm following @xavierleroy's suggestion. We'll reopen if someone feels this is really important.
