GC fails to collect large array #5389
Comments
Comment author: @lefessan Actually, the GC collects the block, but fails to free the chunk during compaction. The reason is simple: it is the first chunk in the list of chunks (because Linux gives it the smallest address in memory), and the compaction algorithm never frees the first chunk. Clearly, the compaction algorithm should use a different strategy to free chunks: it should first sort the chunks by increasing probability of being freed during compaction, so that it does not use a chunk that is 99% free as the target for copying the other chunks.
Comment author: @lefessan Another (simpler) solution would be to truncate the chunk when it is clearly much bigger than necessary.
Comment author: @ygrek I am seeing similar behaviour quite often: after some heap growth and shrinkage, Gc reports a small overall heap size while the OS reports large chunks allocated to the process. I never got around to investigating it closely, so thanks a lot for your explanation. Looking forward to a patch.
Comment author: @mjambon I found a workaround which may or may not work in other situations. It consists of allocating a string of about the same length in bytes (multiply the array length by 8 or 4) after allocating the array and before the garbage collector performs any compaction. See alloc_array_fix and without_gc_compaction in the newly uploaded program test_gc_workaround.ml.
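The workaround described above might be sketched roughly as follows (a hypothetical reconstruction; the actual code is in the attached test_gc_workaround.ml, and on OCaml 3.12 the padding would use String.create rather than Bytes):

```ocaml
(* Hypothetical sketch of the workaround above: right after allocating
   the big array, allocate a string of roughly the same byte size, so
   the array's chunk is not the one compaction refuses to free. *)
let alloc_array_fix n =
  let a = Array.make n 0.0 in          (* the large float array we want *)
  let pad = Bytes.create (n * 8) in    (* ~same size in bytes: 8 per float on 64-bit *)
  ignore (Sys.opaque_identity pad);    (* Sys.opaque_identity (4.03+) keeps pad live here *)
  a
```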
Comment author: @lefessan Your "workaround" will probably lead to a memory leak sooner or later. The difference between arrays and strings is not that the string is freed and the array is not; the real difference is that strings are not initialized, so they are not completely allocated by the kernel (pages in the middle will be mapped on demand). What you see in your workaround is that the chunk containing the array is reclaimed by the compaction, but the chunk containing the string (almost the same size) is not. Since the string is not initialized, it looks like everything has been collected, but that is not the case: as the chunk is progressively used for other data, pages will be mapped by the kernel, and the memory footprint of your program is going to grow up to the size of the chunk (1 GB). However, this might happen slowly enough that it won't bother your application.
Comment author: @mjambon Thanks for the explanation. Is there a document that describes how OCaml's GC works and that would help me understand the problem better and find a workaround? Alternatively: are chunks created as large as needed, rounded up to the nearest power of two (here 1 GB)? What is the smallest chunk size? Can I artificially create a small first chunk? I am thinking that creating a list that uses at least as many bytes as my array would do the trick (i.e. allocating a large array, then a long list, then compacting), because it would have to create a new chunk, but not a huge one. It seems to work in my naive test. Is it supposed to?
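The list-based experiment described above might look roughly like this (a sketch under the stated assumptions, not a guaranteed fix; the names are hypothetical):

```ocaml
(* Sketch of the experiment above: allocate the big array, then a list
   of at least as many bytes, then compact, hoping the list forces a
   new (smaller) chunk so the array's chunk can be freed. *)
let rec make_list n acc =
  if n = 0 then acc else make_list (n - 1) (n :: acc)

let experiment n =
  let a = Array.make n 0.0 in   (* n floats = 8n bytes on 64-bit *)
  let l = make_list n [] in     (* n int cons cells ~ 3 words each, > 8n bytes *)
  Gc.compact ();
  (a, l)
```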
Comment author: @lefessan Unfortunately, there is no public description of the OCaml garbage collection algorithm, probably because it has been tuned so much that a complete description would not be easier to read than the sources... ;-) If your goal is to use Ancient to move the array outside the heap, you might consider allocating outside the heap in the first place.
Comment author: @ygrek I for one would suggest that the first chunk should be truncated. Otherwise, to the user it looks like the program eats much more memory than it really uses.
Comment author: @lefessan Yes, this PR should be kept open, at least as a feature wish: "need to truncate the first chunk after compaction".
Comment author: varobert I stumbled upon this bug today while fuzz-testing my application. Basically, I would get some length from the fuzzed input and allocate a string of that size. This could go up to 2^31-ish in my runs. The string is just used for a comparison of up to a few characters, then completely discarded (yes, what a waste to allocate it in the first place!). http://caml.inria.fr/mantis/file_download.php?file_id=615&type=bug < This shows exactly what I'm doing. The output shows pretty clearly what is happening: 268435456 536870912 1073741824 [...] You can see that the difference in the major heap's size between "before" and "after" compaction is equal to its previous size. Since I run several of these fuzz tests in parallel, my OS is painfully DoS-ed by these OCaml programs, each claiming to need ~4GB of the RAM cake!
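The kind of measurement quoted above can be reproduced with the standard Gc.stat interface; a minimal sketch (the sizes are illustrative, not the original test):

```ocaml
(* Print the major heap size before and after compaction, the kind of
   measurement the numbers above come from. *)
let report label =
  let st = Gc.stat () in
  Printf.printf "%s: heap_words = %d, heap_chunks = %d\n"
    label st.Gc.heap_words st.Gc.heap_chunks

let () =
  let s = Bytes.create (1 lsl 28) in   (* a large (~256 MB) discarded string *)
  ignore (Sys.opaque_identity s);      (* s is dead after this point *)
  report "before";
  Gc.compact ();
  report "after"   (* with the bug, heap_words stays large here *)
```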
Comment author: gerd varobert: a possible workaround exists (not really nice, but it should solve your problem). The idea is to allocate a large buffer as a bigarray (which is malloced memory, and can even be reused in each iteration), and then initialize the bigarray so it looks like a string. Ocamlnet contains the required C functions for this (feel free to copy them): let buffer = Netsys_mem.alloc_memory_pages (n+64) Now s is a string outside OCaml's heap, and you have detailed control over memory management. The only "danger" here is that the bigarray is deallocated when buffer becomes unreachable while s is still in use. You can avoid that by always passing (buffer,s) to any function. For questions about using this, please contact me directly (gerd (at) gerd-stolpmann.de)
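Without Ocamlnet, the same idea can be approximated with the standard Bigarray library alone, whose data is malloced outside the OCaml heap (this sketch only gives a char buffer, not something that looks like an OCaml string):

```ocaml
(* A buffer outside the OCaml heap: Bigarray data is malloced, so it is
   neither scanned nor moved by the GC, and it is freed automatically
   when the bigarray value itself is collected. *)
let make_buffer n =
  Bigarray.Array1.create Bigarray.char Bigarray.c_layout n

let () =
  let buf = make_buffer (1 lsl 20) in   (* 1 MB outside the heap *)
  Bigarray.Array1.fill buf '\000';
  (* use buf for the comparisons instead of an OCaml string *)
  ignore (Sys.opaque_identity buf)
```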
Comment author: @alainfrisch Are you sure this is enough? In native code, the garbage collector knows, for each point in the code where it can be triggered, which values are live (used in the rest of the function). So you need to make sure that buffer is explicitly used "at the end" of the function, to be sure it won't be collected. It seems very easy to forget about this and get a very rare bug (difficult to track down).
Comment author: @damiendoligez
lefessan: Truncating the chunk is not an option, because it is malloced memory, and realloc may move the block when resizing it.
ygrek: If the overall heap size is small, then the OCaml GC has already freed the memory and it's your libc that fails to give it back to the OS. There's not much we can do in that case, short of reimplementing our own malloc.
Martin Jambon: Fabrice is right, your workaround is likely to fail in long-running programs. You cannot create a small first chunk, because we don't control the placement of the chunks in memory.
protz and lefessan: don't close this PR. I consider this a performance bug, not a mere feature wish. I think I have a solution, but it involves changes to the compaction algorithm that will break one undocumented invariant: as it is now, compaction doesn't change the relative order of blocks in memory; after the change, it will. I know at least one library out there relies on the relative order of pointers. However foolish that is, I want to give the authors advance warning about this change. The only problem is, I don't remember who is doing that...
Comment author: @damiendoligez I couldn't reproduce the problem with either test_gc.ml or Gcfail.ml, but I managed to trigger it with a small synthetic example (compact_problem.ml). I have a patch that seems to solve the problem. This is how it works: This patch is only lightly tested for the moment. If you guys could try it out, that would be great. |
Comment author: varobert 268435456 536870912 1073741824 It indeed solves my issue with Gcfail.ml (in both 3.12.1 and 4.01.0+dev1_2012-03-31). |
Comment author: @damiendoligez Patch applied to 4.00 (commit 12364) and trunk (12365). |
Comment author: @ygrek Is it expected to work on 3.11? I've applied this patch to 3.11.2 and observe a crash in do_compaction (on the second invocation) in a large application (with some potentially buggy C bindings, so I am not absolutely sure that this patch is the cause; maybe it is just a trigger). P.S. The patch doesn't update caml_stat_heap_chunks - is that OK?
Comment author: @hcarty I'm able to build trunk, and the problem is fixed for me there, but version/4.00 does not build for me: make[2]: Entering directory
Comment author: @damiendoligez About version/4.00: this is an independent problem that was fixed (#5582). About caml_stat_heap_chunks: well spotted. I've fixed this in 4.00 (commit 12524) and trunk (commit 12525). About 3.11: I have no idea whether it might work with this patch. |
Original bug ID: 5389
Reporter: @mjambon
Assigned to: @damiendoligez
Status: closed (set by @xavierleroy on 2016-12-07T10:36:56Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 3.12.1
Target version: 4.00.0+dev
Fixed in version: 4.00.0+dev
Category: ~DO NOT USE (was: OCaml general)
Related to: #5757
Monitored by: varobert jmeister @protz @lefessan mehdi @ygrek @glondu @hcarty @mjambon "Christoph Bauer"
Bug description
A large array initialized with zeros fails to be collected after a call to Gc.compact. However, a string or a bigarray of the same physical size is collected. This behavior is observed on amd64 Linux with OCaml 3.11.2 and 3.12.1.
See attached code.
In practice we are experiencing this problem with a large read-only hash table loaded from a file using Marshal and copied using Ancient in order to increase performance. The old copy produced by Marshal is not reclaimed, which doubles the expected memory usage.
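A minimal reproduction along the lines of the bug description might look like this (an assumed shape; the actual attachment is authoritative):

```ocaml
(* Minimal reproduction sketch: a large zero-filled float array should
   be reclaimable after compaction, but the live heap stays large. *)
let () =
  let words () = (Gc.stat ()).Gc.heap_words in
  let a = Array.make (64 * 1024 * 1024) 0.0 in  (* ~512 MB on 64-bit *)
  ignore (Sys.opaque_identity a);               (* a is dead after this *)
  Gc.compact ();
  Printf.printf "heap after compaction: %d words\n" (words ())
```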
File attachments